174 changes: 174 additions & 0 deletions .agents/skills/testing-infrastructure/SKILL.md
@@ -0,0 +1,174 @@
---
name: testing-infrastructure
description: Test 54Bank infrastructure-as-code configs (Terraform, OpenStack Heat, On-Premise K8s, K8s manifests, compliance docs). Use when verifying IaC changes, deployment config updates, or compliance documentation.
---

# Testing 54Bank Infrastructure Configs

## Prerequisites

- Python 3.12+ with `pyyaml` package (for YAML validation)
- No Terraform CLI, Ansible, or OpenStack CLI needed — all validation is structural
- Go 1.21+, Rust/Cargo, Python 3.12+ (for service compilation spot-checks)

## Devin Secrets Needed

None — all testing is local structural validation.

## Key Gotcha: Multi-Document YAML

K8s manifest files use `---` separators to hold multiple resources in one file. You MUST parse them with `yaml.safe_load_all()`, not `yaml.safe_load()`: the latter raises an error as soon as it encounters a second document. Expected document counts:

| File | Docs |
|------|------|
| `k8s/vault/vault-deployment.yaml` | 8 |
| `k8s/crossplane/hybrid-cloud.yaml` | 11 |
| `k8s/external-secrets/external-secrets-operator.yaml` | 9 |
| `k8s/ingress/apisix-gateway.yaml` | 7 |
| `k8s/dr/disaster-recovery.yaml` | 5 |
| `onpremise/rook-ceph/ceph-cluster.yaml` | 6 |
| `onpremise/metallb/metallb-config.yaml` | 6 |
| `onpremise/kubeadm/cluster-config.yaml` | 4 |

Single-doc files: `audit-policy.yaml`, `encryption-config.yaml`, `site.yaml`, `inventory.yaml`, `54bank-stack.yaml`, `cluster-template.yaml`.
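
The gotcha above can be reproduced on an in-memory manifest. This is a minimal sketch, assuming `pyyaml` is installed as listed in the prerequisites; the `MANIFEST` string and the helper names `count_documents` and `single_load_fails` are illustrative and not part of the repository.

```python
import yaml

# Hypothetical two-resource manifest, used only for illustration.
MANIFEST = """\
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
"""


def count_documents(text: str) -> int:
    """Count the non-empty YAML documents in a multi-document stream."""
    return sum(1 for doc in yaml.safe_load_all(text) if doc is not None)


def single_load_fails(text: str) -> bool:
    """Return True if yaml.safe_load() rejects the stream."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return True
    return False
```

Comparing `count_documents()` against the table above for each file would catch a resource that was accidentally dropped or duplicated.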

## Step 1: YAML Syntax Validation

```bash
cd /home/ubuntu/repos/corebanking
python3 -c "
import yaml
files = [
'k8s/vault/vault-deployment.yaml',
'k8s/dr/disaster-recovery.yaml',
'k8s/external-secrets/external-secrets-operator.yaml',
'k8s/ingress/apisix-gateway.yaml',
'k8s/crossplane/hybrid-cloud.yaml',
'onpremise/kubeadm/cluster-config.yaml',
'onpremise/kubeadm/audit-policy.yaml',
'onpremise/kubeadm/encryption-config.yaml',
'onpremise/metallb/metallb-config.yaml',
'onpremise/rook-ceph/ceph-cluster.yaml',
'onpremise/ansible/site.yaml',
'onpremise/ansible/inventory.yaml',
'openstack/heat/54bank-stack.yaml',
'openstack/magnum/cluster-template.yaml',
]
for f in files:
    # safe_load_all handles both single- and multi-document files
    with open(f) as fh:
        docs = [d for d in yaml.safe_load_all(fh) if d is not None]
    print(f'OK {f} ({len(docs)} docs)')
"
```

**Expected**: All 14 files parse without exceptions.

## Step 2: Terraform HCL Structural Validation

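The `terraform` CLI is not available in the environment, so validate the files structurally. A brace-balance check catches truncated or badly merged blocks, but not semantic errors: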

```bash
python3 -c "
import os
for root, dirs, files in os.walk('terraform'):
    for f in files:
        if f.endswith('.tf'):
            path = os.path.join(root, f)
            with open(path) as fh:
                content = fh.read()
            # Every opening brace needs a matching closing brace
            if content.count('{') != content.count('}'):
                print(f'ERR {path}: unbalanced braces')
            else:
                print(f'OK {path}')
"
```

**Key assertions**:
- `terraform/environments/production/main.tf` contains: `module "vpc"`, `module "eks"`, `module "rds"`, `module "elasticache"`, `module "msk"`
- `terraform/environments/production/dr.tf` contains: VPC peering, DR EKS/RDS modules
- All 6 module directories have `output` blocks
- At least 4 `aws_kms_key` references (encryption at rest for EKS, RDS, MSK, S3)
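
The module and KMS assertions above could be automated with a couple of regular expressions. The sketch below is one possible approach, not an HCL parser: the helper names and the `SAMPLE` snippet (including its `source` path) are hypothetical, and the real check would read `terraform/environments/production/main.tf` instead.

```python
import re

REQUIRED_MODULES = ("vpc", "eks", "rds", "elasticache", "msk")

# Hypothetical HCL fragment, used only to exercise the helpers.
SAMPLE = '''
module "vpc" {
  source = "../../modules/vpc"
}
module "eks" {
  kms_key_arn = aws_kms_key.eks.arn
}
resource "aws_kms_key" "eks" {}
'''


def missing_modules(hcl_text, required=REQUIRED_MODULES):
    """Return the required module names that are not declared in hcl_text."""
    declared = set(
        re.findall(r'^\s*module\s+"([^"]+)"\s*\{', hcl_text, flags=re.MULTILINE)
    )
    return [name for name in required if name not in declared]


def count_kms_references(hcl_text):
    """Count whole-word occurrences of aws_kms_key (declarations and uses)."""
    return len(re.findall(r"\baws_kms_key\b", hcl_text))
```

Because the match is textual, a module declared inside a comment would also count, so treat a pass as a sanity check rather than proof.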

## Step 3: OpenStack Heat Template

```bash
python3 -c "
import yaml
doc = yaml.safe_load(open('openstack/heat/54bank-stack.yaml'))
print('version:', doc.get('heat_template_version'))
print('resources:', sorted(doc.get('resources', {}).keys()))
print('params:', sorted(doc.get('parameters', {}).keys()))
"
```

**Key assertions**:
- `heat_template_version` is `2021-04-16`
- Resources include: `internal_network`, `k8s_cluster`, `postgres_cluster`, `api_loadbalancer` (note: NOT `load_balancer` — uses `api_` prefix)
- Parameters include: `cluster_name`, `key_name`, `external_network`
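
These assertions can be turned into a small checker that works on the already-parsed template. This is a sketch with a hypothetical function name, `check_heat_template`. One detail worth noting: PyYAML parses an unquoted `2021-04-16` as a `datetime.date`, so the version is compared as text.

```python
EXPECTED_VERSION = "2021-04-16"
REQUIRED_RESOURCES = {
    "internal_network",
    "k8s_cluster",
    "postgres_cluster",
    "api_loadbalancer",
}
REQUIRED_PARAMETERS = {"cluster_name", "key_name", "external_network"}


def check_heat_template(doc):
    """Return a list of problems found in a parsed Heat template dict."""
    problems = []

    # str() covers both a quoted string and a datetime.date produced by PyYAML.
    version = str(doc.get("heat_template_version"))
    if version != EXPECTED_VERSION:
        problems.append(f"unexpected heat_template_version: {version}")

    missing_resources = REQUIRED_RESOURCES - set(doc.get("resources") or {})
    if missing_resources:
        problems.append(f"missing resources: {sorted(missing_resources)}")

    missing_parameters = REQUIRED_PARAMETERS - set(doc.get("parameters") or {})
    if missing_parameters:
        problems.append(f"missing parameters: {sorted(missing_parameters)}")

    return problems
```

An empty list means all three assertions hold; passing the `doc` loaded in the command above is the intended usage.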

## Step 4: On-Premise Security Checks

```bash
# Encryption at rest
grep -c 'aescbc' onpremise/kubeadm/encryption-config.yaml

# Audit policy covers secrets + RBAC
grep -c 'secrets' onpremise/kubeadm/audit-policy.yaml

# MetalLB IP pools
grep -c 'IPAddressPool' onpremise/metallb/metallb-config.yaml

# Ceph 3+ mons
grep 'count:' onpremise/rook-ceph/ceph-cluster.yaml

# HAProxy TLS 1.2+ and rate limiting
grep 'TLSv1.2' onpremise/haproxy/haproxy.cfg
grep 'stick-table' onpremise/haproxy/haproxy.cfg

# Both sites (Lagos + Abuja DR)
grep -i 'lagos\|abuja' onpremise/ansible/inventory.yaml
```
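
The `grep 'count:'` check prints every `count:` line, so it still needs a manual read. A stricter variant could inspect the parsed documents instead. The sketch below assumes the monitor count lives at `spec.mon.count` of a `CephCluster` resource, which is the usual Rook layout but is not confirmed by this document; the helper names are hypothetical.

```python
def ceph_mon_count(docs):
    """Return spec.mon.count from the first CephCluster document, or None."""
    for doc in docs:
        if isinstance(doc, dict) and doc.get("kind") == "CephCluster":
            return doc.get("spec", {}).get("mon", {}).get("count")
    return None


def has_mon_quorum(docs, minimum=3):
    """True when a CephCluster declares at least `minimum` monitors."""
    count = ceph_mon_count(docs)
    return isinstance(count, int) and count >= minimum
```

Here `docs` would be the list produced by `yaml.safe_load_all()` for `onpremise/rook-ceph/ceph-cluster.yaml`.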

## Step 5: K8s Manifest Content Validation

Parse YAML and check specific values:
- **Vault**: StatefulSet `replicas: 3`, Raft storage, audit logging
- **APISIX**: Deployment `replicas: 3`, etcd StatefulSet `replicas: 3`
- **External Secrets**: ClusterSecretStore references `vault`
- **DR ConfigMap**: `rto_minutes: 15`, `rpo_minutes: 1`
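
The replica checks above can be expressed as a helper over the parsed documents. This is an illustrative sketch: the function names are hypothetical, and it only covers the `replicas` assertions, not Raft storage or audit logging.

```python
def resources_of_kind(docs, kind):
    """Select the documents whose `kind` field matches."""
    return [d for d in docs if isinstance(d, dict) and d.get("kind") == kind]


def wrong_replicas(docs, kind, expected=3):
    """Return (name, replicas) for resources of `kind` not at `expected`."""
    bad = []
    for res in resources_of_kind(docs, kind):
        replicas = res.get("spec", {}).get("replicas")
        if replicas != expected:
            name = res.get("metadata", {}).get("name", "<unnamed>")
            bad.append((name, replicas))
    return bad
```

For example, `wrong_replicas(docs, "StatefulSet")` on the documents from `k8s/vault/vault-deployment.yaml` should return an empty list if the Vault StatefulSet is configured as described.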

## Step 6: Service Compilation Spot-Check

Sample 15 Go + 10 Rust + 10 Python services randomly:

```bash
# Go
ls -d services/*-go/ | shuf -n 15 | while read -r svc; do
  # Subshell: a failed cd or build cannot change the loop's working directory
  (cd "/home/ubuntu/repos/corebanking/$svc" && go build ./... && go test ./... && echo "PASS $(basename "$svc")") || echo "FAIL $(basename "$svc")"
done

# Rust (takes ~10-15s each)
ls -d services/*-rs/ | shuf -n 10 | while read -r svc; do
  (cd "/home/ubuntu/repos/corebanking/$svc" && cargo check && echo "PASS $(basename "$svc")") || echo "FAIL $(basename "$svc")"
done

# Python
ls -d services/*-py/ | shuf -n 10 | while read -r svc; do
  (cd "/home/ubuntu/repos/corebanking/$svc" && python3 -m py_compile main.py && python3 -m pytest test_main.py -x -q && echo "PASS $(basename "$svc")") || echo "FAIL $(basename "$svc")"
done
```

## Step 7: Compliance Docs

- `docs/compliance/PCI-DSS-v4.0-Compliance.md`: Check for "Requirement 1" through "Requirement 12"
- `docs/compliance/NDPR-Compliance.md`: Check for "data residency" and "72 hour" breach notification
- `docs/compliance/CBN-IT-Standards.md`: Check for "RTO" and "RPO" values (should be 15min / 1min)
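
A plain substring search for "Requirement 1" also matches "Requirement 10" through "Requirement 12", so the PCI-DSS check benefits from word boundaries. A minimal sketch, with a hypothetical helper name:

```python
import re


def missing_pci_requirements(text):
    """Return the requirement numbers 1..12 that never appear in text."""
    return [
        n
        for n in range(1, 13)
        if not re.search(rf"\bRequirement {n}\b", text)
    ]
```

Calling it on the contents of `docs/compliance/PCI-DSS-v4.0-Compliance.md` should return an empty list when all twelve requirements are covered.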

## Testing Tips

- This is all shell-only testing — do NOT start a recording
- Rust `cargo check` can be slow on first run (compiling dependencies) — allow 15-30s per service
- The Rust compilation might fail due to disk space if many services are checked sequentially — monitor with `df -h`
- Go and Python tests are fast (<1s each)
- ML inference server testing is covered by the `testing-ml-pipeline` skill
76 changes: 34 additions & 42 deletions .agents/skills/testing-ml-pipeline/SKILL.md
@@ -27,14 +27,16 @@ fuser -k 8500/tcp 2>/dev/null; fuser -k 8501/tcp 2>/dev/null
```bash
cd /home/ubuntu/repos/corebanking
python -m ml.inference.server 2>&1 &
sleep 3
sleep 5
curl -s http://localhost:8500/healthz
```

**Expected**: `{"status": "healthy", "models_loaded": 6, "device": "cpu"}`

If `models_loaded` < 6, check that all .pt weight files exist in `ml/weights/`.

**Tip**: The server starts fast (~0.06s to load all 6 models on CPU). If it takes longer than 10s, something is wrong.

## Step 2: Test Inference Endpoints

All 5 working endpoints (note: `/v1/gnn/predict` might not be routed — check `do_POST` in `ml/inference/server.py`):
@@ -67,10 +69,10 @@ curl -s -X POST http://localhost:8500/v1/churn/predict \
```

**Key assertions**:
- Fraud: `fraud_probability` > 0.5 for suspicious input, `predictions` is an array
- Credit: `credit_score` is a float, `credit_band` in [poor, fair, good, excellent]
- AML: `suspicious_probability` is a float, `risk_tier` in [low, medium, high, critical]
- Anomaly: `anomaly_score` >= 0, `is_anomaly` is boolean
- Fraud: `fraud_probability` > 0.5 for suspicious input (typically ~0.935), `predictions` is an array
- Credit: `credit_score` is a float (e.g. 733.0), `credit_band` in [poor, fair, good, excellent]
- AML: `suspicious_probability` is a float (1.0 for high-risk PEP), `risk_tier` in [low, medium, high, critical]
- Anomaly: `anomaly_score` >= 0 (typically ~0.018 for normal), `is_anomaly` is boolean (false for normal)
- Churn: `attention_weights` is list of 12 floats summing to ~1.0, `critical_months` has 3 entries
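
The churn assertion can be checked programmatically once the JSON response is parsed. The sketch below uses a hypothetical helper name and the 0.95-1.05 tolerance mentioned in the testing tips; it assumes the response is a flat dict carrying `attention_weights` and `critical_months`.

```python
def check_churn_response(resp, tol=0.05):
    """Return a list of problems found in a parsed churn prediction."""
    problems = []

    weights = resp.get("attention_weights", [])
    if len(weights) != 12:
        problems.append(f"expected 12 attention weights, got {len(weights)}")
    elif abs(sum(weights) - 1.0) > tol:
        problems.append(f"attention weights sum to {sum(weights):.3f}")

    months = resp.get("critical_months", [])
    if len(months) != 3:
        problems.append(f"expected 3 critical months, got {len(months)}")

    return problems
```

An empty list means the response satisfies the churn assertion; the same pattern would extend to the other four endpoints.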

## Step 3: Test Continuous Training Pipeline
@@ -91,6 +93,8 @@ python -m ml.continuous_training.orchestrator --mode full --model credit_scorer
- Champion-challenger: `recommendation` is one of: promote, keep_champion, inconclusive
- Pipeline result saved to `ml/weights/ct_pipeline_*.json`

**Note**: Training requires parquet dataset files in `ml/data/datasets/`. If these don't exist (0 files found), training will fail but inference still works.

## Step 4: Test Model Promoter

```bash
@@ -108,52 +112,40 @@ try:
p.promote_to_production('fraud_detector', approved_by='auto')
print('ERROR: Should have raised PermissionError')
except PermissionError as e:
print(f'PASS: {e}')
except FileNotFoundError as e:
print(f'SKIP (no staging): {e}')
print(f'Approval gate works: {e}')
"
```

**Key assertions**:
- 6 models in status, all with `production: True`
- fraud_detector and aml_scorer require human approval (REQUIRES_APPROVAL set)
- `get_model_status()` returns dict with all 6 model names
- Promoting `fraud_detector` with `approved_by='auto'` raises `PermissionError` (human approval required for high-risk models)

## Step 5: Test Monitoring Server
## Step 5: Test Monitoring Dashboard API

```bash
cd /home/ubuntu/repos/corebanking
python -m ml.continuous_training.monitoring 2>&1 &
sleep 2
python -m ml.monitoring.dashboard 2>&1 &
sleep 3
curl -s http://localhost:8501/api/metrics | python3 -m json.tool | head -20
curl -s http://localhost:8501/api/drift | python3 -m json.tool | head -20
```

curl -s http://localhost:8501/monitoring/healthz
curl -s http://localhost:8501/monitoring/status | python3 -m json.tool
curl -s http://localhost:8501/monitoring/prometheus
curl -s http://localhost:8501/monitoring/dashboard | head -5
**Key assertions**:
- `/api/metrics` returns JSON with model performance metrics
- `/api/drift` returns JSON with drift detection results

## Step 6: Cleanup

# Manual retrain trigger
curl -s -X POST http://localhost:8501/monitoring/trigger/credit_scorer
curl -s -X POST http://localhost:8501/monitoring/trigger/nonexistent_model
```bash
fuser -k 8500/tcp 2>/dev/null
fuser -k 8501/tcp 2>/dev/null
```

**Key assertions**:
- `/monitoring/status`: 6 models with production/staging/canary flags
- `/monitoring/prometheus`: `ml_model_weight_exists{model="..."}` = 1 for all 6
- `/monitoring/dashboard`: HTML starts with `<!DOCTYPE html>`, title "54Bank ML Monitoring"
- Valid trigger returns `{"status": "triggered"}`, invalid returns 400

## Step 6: Browser Dashboard Verification

Open `http://localhost:8501/monitoring/dashboard` in the browser. Verify:
- Title "54Bank ML Model Monitoring"
- 6 model cards with AUC-ROC, F1, Parameters, Weight Size, Epochs
- PROD/STAGING badges on cards
- Active Alerts section at bottom

## Known Issues / Gotchas

- **Port conflicts**: Always kill existing processes on 8500/8501 before starting servers (`fuser -k PORT/tcp`)
- **GNN endpoint**: The `/v1/gnn/predict` endpoint might not be wired in `do_POST` — the model loads but the router might be missing the route. Check `ml/inference/server.py` POST handler.
- **cd + && in exec**: Some shell environments block `cd X && cmd`. Use separate commands or absolute paths.
- **Retrain time**: credit_scorer retrain takes ~45-60s on CPU. Don't set short timeouts.
- **Staging files**: After forced retrain, `credit_scorer_staging.pt` remains in `ml/weights/`. The monitoring dashboard will show a STAGING badge.
- **System restarts**: If the VM restarts, all running servers are killed. Files on disk are preserved — just restart the servers.
## Testing Tips

- This is all shell-only testing — do NOT start a recording
- The inference server loads all 6 models in ~0.06s on CPU — if it hangs, check for port conflicts
- Fraud detection model is highly sensitive — the test input with international + new beneficiary + high amount at 2AM reliably produces >0.9 probability
- AML model with PEP flag + structuring score 0.9 consistently returns 1.0 suspicious probability
- The churn model uses attention mechanism — verify that `attention_weights` sums to ~1.0 (within 0.95-1.05 tolerance)
- If disk space is low, model weights are ~1.8MB total — not a concern