munisp · munisp · Jun 16, 2026 · Jun 10, 2026
diff --git a/.agents/skills/testing-infrastructure/SKILL.md b/.agents/skills/testing-infrastructure/SKILL.md
@@ -0,0 +1,174 @@
+---
+name: testing-infrastructure
+description: Test 54Bank infrastructure-as-code configs (Terraform, OpenStack Heat, On-Premise K8s, K8s manifests, compliance docs). Use when verifying IaC changes, deployment config updates, or compliance documentation.
+---
+
+# Testing 54Bank Infrastructure Configs
+
+## Prerequisites
+
+- Python 3.12+ with `pyyaml` package (for YAML validation)
+- No Terraform CLI, Ansible, or OpenStack CLI needed — all validation is structural
+- Go 1.21+, Rust/Cargo, and `pytest` (for service compilation spot-checks)
+
+## Devin Secrets Needed
+
+None — all testing is local structural validation.
+
+## Key Gotcha: Multi-Document YAML
+
+K8s manifest files use `---` separators for multiple resources in one file. You MUST use `yaml.safe_load_all()` (not `yaml.safe_load()`) to parse them. Expected counts of non-empty documents (as printed by Step 1):
+
+| File | Docs |
+|------|------|
+| `k8s/vault/vault-deployment.yaml` | 8 |
+| `k8s/crossplane/hybrid-cloud.yaml` | 11 |
+| `k8s/external-secrets/external-secrets-operator.yaml` | 9 |
+| `k8s/ingress/apisix-gateway.yaml` | 7 |
+| `k8s/dr/disaster-recovery.yaml` | 5 |
+| `onpremise/rook-ceph/ceph-cluster.yaml` | 6 |
+| `onpremise/metallb/metallb-config.yaml` | 6 |
+| `onpremise/kubeadm/cluster-config.yaml` | 4 |
+
+Single-doc files: `audit-policy.yaml`, `encryption-config.yaml`, `site.yaml`, `inventory.yaml`, `54bank-stack.yaml`, `cluster-template.yaml`.
+
+## Step 1: YAML Syntax Validation
+
+```bash
+cd /home/ubuntu/repos/corebanking
+python3 -c "
+import yaml
+files = [
+    'k8s/vault/vault-deployment.yaml',
+    'k8s/dr/disaster-recovery.yaml',
+    'k8s/external-secrets/external-secrets-operator.yaml',
+    'k8s/ingress/apisix-gateway.yaml',
+    'k8s/crossplane/hybrid-cloud.yaml',
+    'onpremise/kubeadm/cluster-config.yaml',
+    'onpremise/kubeadm/audit-policy.yaml',
+    'onpremise/kubeadm/encryption-config.yaml',
+    'onpremise/metallb/metallb-config.yaml',
+    'onpremise/rook-ceph/ceph-cluster.yaml',
+    'onpremise/ansible/site.yaml',
+    'onpremise/ansible/inventory.yaml',
+    'openstack/heat/54bank-stack.yaml',
+    'openstack/magnum/cluster-template.yaml',
+]
+for f in files:
+    docs = [d for d in yaml.safe_load_all(open(f)) if d is not None]
+    print(f'OK  {f} ({len(docs)} docs)')
+"
+```
+
+**Expected**: All 14 files parse without exceptions.
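+
+Step 1 only prints each count. To compare the counts against the multi-document table automatically, a minimal sketch like the following could be used. It assumes `pyyaml` is installed (see Prerequisites); the module name `check_doc_counts.py` used in the usage note is hypothetical.
+
+```python
+"""Compare non-empty YAML document counts with the expected table (sketch)."""
+import os
+
+import yaml  # pyyaml, see Prerequisites
+
+# Expected counts copied from the "Key Gotcha: Multi-Document YAML" table.
+EXPECTED_DOCS = {
+    "k8s/vault/vault-deployment.yaml": 8,
+    "k8s/crossplane/hybrid-cloud.yaml": 11,
+    "k8s/external-secrets/external-secrets-operator.yaml": 9,
+    "k8s/ingress/apisix-gateway.yaml": 7,
+    "k8s/dr/disaster-recovery.yaml": 5,
+    "onpremise/rook-ceph/ceph-cluster.yaml": 6,
+    "onpremise/metallb/metallb-config.yaml": 6,
+    "onpremise/kubeadm/cluster-config.yaml": 4,
+}
+
+
+def count_docs(path):
+    """Count non-empty documents; safe_load_all splits on '---' separators."""
+    with open(path, encoding="utf-8") as fh:
+        return sum(1 for doc in yaml.safe_load_all(fh) if doc is not None)
+
+
+def check_counts(expected, root="."):
+    """Return mismatch messages; an empty list means every count matches."""
+    problems = []
+    for rel, want in expected.items():
+        path = os.path.join(root, rel)
+        if not os.path.isfile(path):
+            problems.append(f"{rel}: file not found")
+            continue
+        got = count_docs(path)
+        if got != want:
+            problems.append(f"{rel}: expected {want} docs, got {got}")
+    return problems
+```
+
+From the repo root: `python3 -c "from check_doc_counts import *; print(check_counts(EXPECTED_DOCS) or 'all counts match')"`.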
+
+## Step 2: Terraform HCL Structural Validation
+
+The `terraform` CLI is not available in the environment, so validate structurally:
+
+```bash
+python3 -c "
+import os
+for root, dirs, files in os.walk('terraform'):
+    for f in files:
+        if f.endswith('.tf'):
+            path = os.path.join(root, f)
+            content = open(path).read()
+            if content.count('{') != content.count('}'):
+                print(f'ERR {path}: unbalanced braces')
+            else:
+                print(f'OK  {path}')
+"
+```
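+
+The brace count above is a rough heuristic: braces inside comments, quoted strings, `${...}` interpolations, or heredocs (for example inline IAM policy JSON) are counted too, which can raise false alarms or hide a real imbalance. The sketch below strips comments, double-quoted strings, and heredocs before counting. It is still not an HCL parser and does not model every construct (nested quotes inside interpolations, for instance); `terraform validate` remains the authoritative check wherever the CLI is available.
+
+```python
+import os
+import re
+
+# A heredoc: <<EOF or <<-EOF, up to a line holding only the same marker.
+HEREDOC = re.compile(r"<<-?([A-Za-z_]\w*)\n.*?^\s*\1\s*$", re.DOTALL | re.MULTILINE)
+
+
+def strip_hcl_noise(src):
+    """Drop heredocs, double-quoted strings, and #, //, /* */ comments."""
+    src = HEREDOC.sub('""', src)
+    out, i, n = [], 0, len(src)
+    while i < n:
+        if src[i] == '"':
+            i += 1
+            while i < n and src[i] != '"':
+                i += 2 if src[i] == "\\" else 1  # skip escaped characters
+            i += 1  # step past the closing quote
+        elif src[i] == "#" or src.startswith("//", i):
+            newline = src.find("\n", i)
+            i = n if newline == -1 else newline
+        elif src.startswith("/*", i):
+            end = src.find("*/", i + 2)
+            i = n if end == -1 else end + 2
+        else:
+            out.append(src[i])
+            i += 1
+    return "".join(out)
+
+
+def brace_balance(src):
+    """0 when balanced; negative as soon as a '}' closes nothing."""
+    depth = 0
+    for ch in strip_hcl_noise(src):
+        if ch == "{":
+            depth += 1
+        elif ch == "}":
+            depth -= 1
+            if depth < 0:
+                return depth
+    return depth
+
+
+def check_tree(root="terraform"):
+    """Map each .tf path under root to True (balanced) or False."""
+    results = {}
+    for dirpath, _dirs, files in os.walk(root):
+        for name in sorted(files):
+            if name.endswith(".tf"):
+                path = os.path.join(dirpath, name)
+                with open(path, encoding="utf-8") as fh:
+                    results[path] = brace_balance(fh.read()) == 0
+    return results
+```
+
+Saved as a hypothetical `tf_braces.py`, it can be run from the repo root with `python3 -c "from tf_braces import check_tree; print([p for p, ok in check_tree().items() if not ok])"`; an empty list means no imbalance was found.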
+
+**Key assertions**:
+- `terraform/environments/production/main.tf` contains: `module "vpc"`, `module "eks"`, `module "rds"`, `module "elasticache"`, `module "msk"`
+- `terraform/environments/production/dr.tf` contains: VPC peering, DR EKS/RDS modules
+- All 6 module directories have `output` blocks
+- At least 4 `aws_kms_key` references (encryption at rest for EKS, RDS, MSK, S3)
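+
+The key assertions can also be scripted. This sketch matches blocks with simple regular expressions rather than parsing HCL, and it assumes the module directories live directly under `terraform/modules/` (an assumed path; adjust it to the real layout).
+
+```python
+import os
+import re
+
+REQUIRED_MODULES = ("vpc", "eks", "rds", "elasticache", "msk")
+
+
+def read_tf(directory):
+    """Concatenate the .tf files directly inside `directory`."""
+    parts = []
+    for name in sorted(os.listdir(directory)):
+        if name.endswith(".tf"):
+            with open(os.path.join(directory, name), encoding="utf-8") as fh:
+                parts.append(fh.read())
+    return "\n".join(parts)
+
+
+def missing_modules(text, required=REQUIRED_MODULES):
+    """Required module names with no `module "<name>" {` block in text."""
+    return [
+        m for m in required
+        if not re.search(r'^\s*module\s+"' + re.escape(m) + r'"\s*\{', text, re.MULTILINE)
+    ]
+
+
+def modules_without_outputs(modules_root):
+    """Module dirs (assumed: direct children of modules_root) lacking an output block."""
+    return [
+        name
+        for name in sorted(os.listdir(modules_root))
+        if os.path.isdir(os.path.join(modules_root, name))
+        and not re.search(r'^\s*output\s+"[^"]+"\s*\{',
+                          read_tf(os.path.join(modules_root, name)), re.MULTILINE)
+    ]
+
+
+def kms_key_refs(root):
+    """Whole-word occurrences of aws_kms_key in all .tf files under root."""
+    total = 0
+    for dirpath, _dirs, files in os.walk(root):
+        for name in files:
+            if name.endswith(".tf"):
+                with open(os.path.join(dirpath, name), encoding="utf-8") as fh:
+                    total += len(re.findall(r"\baws_kms_key\b", fh.read()))
+    return total
+```
+
+From the repo root, `missing_modules(read_tf('terraform/environments/production'))` and `modules_without_outputs('terraform/modules')` should both return `[]`, and `kms_key_refs('terraform')` should be at least 4. The `\b` boundaries keep names such as `aws_kms_key_policy` from being counted.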
+
+## Step 3: OpenStack Heat Template
+
+```bash
+python3 -c "
+import yaml
+doc = yaml.safe_load(open('openstack/heat/54bank-stack.yaml'))
+print('version:', doc.get('heat_template_version'))
+print('resources:', sorted(doc.get('resources', {}).keys()))
+print('params:', sorted(doc.get('parameters', {}).keys()))
+"
+```
+
+**Key assertions**:
+- `heat_template_version` is `2021-04-16`
+- Resources include: `internal_network`, `k8s_cluster`, `postgres_cluster`, `api_loadbalancer` (note: NOT `load_balancer` — uses `api_` prefix)
+- Parameters include: `cluster_name`, `key_name`, `external_network`
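+
+A scripted version of these assertions might look like this. It takes the dict returned by `yaml.safe_load`, so it needs no YAML import itself. PyYAML turns an unquoted `2021-04-16` into a `datetime.date`, so the version is compared as a string.
+
+```python
+EXPECTED_VERSION = "2021-04-16"
+REQUIRED_RESOURCES = {"internal_network", "k8s_cluster", "postgres_cluster", "api_loadbalancer"}
+REQUIRED_PARAMS = {"cluster_name", "key_name", "external_network"}
+
+
+def check_heat(doc):
+    """Return problems found in a parsed Heat template (empty list = pass)."""
+    problems = []
+    # str() also normalizes the datetime.date PyYAML produces for 2021-04-16.
+    version = str(doc.get("heat_template_version"))
+    if version != EXPECTED_VERSION:
+        problems.append(f"heat_template_version is {version}")
+    missing = REQUIRED_RESOURCES - set(doc.get("resources") or {})
+    if missing:
+        problems.append(f"missing resources: {sorted(missing)}")
+    missing = REQUIRED_PARAMS - set(doc.get("parameters") or {})
+    if missing:
+        problems.append(f"missing parameters: {sorted(missing)}")
+    return problems
+```
+
+`check_heat(yaml.safe_load(open('openstack/heat/54bank-stack.yaml')))` should return `[]`; a resource named `load_balancer` instead of `api_loadbalancer` shows up as a missing resource.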
+
+## Step 4: On-Premise Security Checks
+
+```bash
+# Encryption at rest: expect a count >= 1
+grep -c 'aescbc' onpremise/kubeadm/encryption-config.yaml
+
+# Audit policy covers secrets + RBAC: expect a count >= 1 for each
+grep -c 'secrets' onpremise/kubeadm/audit-policy.yaml
+grep -c 'rbac.authorization.k8s.io' onpremise/kubeadm/audit-policy.yaml
+
+# MetalLB IP pools: expect a count >= 1
+grep -c 'IPAddressPool' onpremise/metallb/metallb-config.yaml
+
+# Ceph: the mon `count:` should be 3 or more
+grep 'count:' onpremise/rook-ceph/ceph-cluster.yaml
+
+# HAProxy TLS 1.2+ and rate limiting: each should print at least one line
+grep 'TLSv1.2' onpremise/haproxy/haproxy.cfg
+grep 'stick-table' onpremise/haproxy/haproxy.cfg
+
+# Both sites (Lagos + Abuja DR): expect matches for each site
+grep -i 'lagos\|abuja' onpremise/ansible/inventory.yaml
+```
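+
+The `grep` calls only show that a string is present. Two properties are worth checking on the parsed YAML instead: in a Kubernetes `EncryptionConfiguration` the first provider listed is the one used to encrypt new writes, so `identity` must not come first for `secrets`; and the Rook `CephCluster` should set `spec.mon.count` to 3 or more. A sketch that takes documents already loaded with `yaml.safe_load` / `yaml.safe_load_all`:
+
+```python
+def secrets_encrypted_first(enc_doc):
+    """True if the EncryptionConfiguration entry for secrets does not list identity first."""
+    for entry in enc_doc.get("resources") or []:
+        if "secrets" in (entry.get("resources") or []):
+            providers = entry.get("providers") or []
+            # Each provider is a one-key mapping such as {"aescbc": {...}}.
+            first = next(iter(providers[0]), None) if providers else None
+            return first not in (None, "identity")
+    return False
+
+
+def ceph_mon_count(docs):
+    """spec.mon.count of the first CephCluster document, or None if absent."""
+    for doc in docs:
+        if doc and doc.get("kind") == "CephCluster":
+            return ((doc.get("spec") or {}).get("mon") or {}).get("count")
+    return None
+```
+
+`secrets_encrypted_first(yaml.safe_load(open('onpremise/kubeadm/encryption-config.yaml')))` should be `True`, and `ceph_mon_count(yaml.safe_load_all(open('onpremise/rook-ceph/ceph-cluster.yaml')))` should be 3 or more.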
+
+## Step 5: K8s Manifest Content Validation
+
+Parse YAML and check specific values:
+- **Vault**: StatefulSet `replicas: 3`, Raft storage, audit logging
+- **APISIX**: Deployment `replicas: 3`, etcd StatefulSet `replicas: 3`
+- **External Secrets**: ClusterSecretStore references `vault`
+- **DR ConfigMap**: `rto_minutes: 15`, `rpo_minutes: 1`
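+
+One possible sketch for these checks, over documents loaded with `yaml.safe_load_all`. The name fragments (`vault`, `apisix`, `etcd`) and the assumption that `rto_minutes` / `rpo_minutes` are top-level keys under the ConfigMap's `data` are guesses about the manifests; adjust them after inspecting the files. `spec.provider.vault` is where the External Secrets Operator expects a Vault provider on a `ClusterSecretStore`.
+
+```python
+# Name fragments and the data-key layout are assumptions about the manifests.
+
+
+def find_docs(docs, kind, name_part):
+    """Docs of `kind` whose metadata.name contains `name_part`."""
+    return [
+        d for d in docs
+        if d and d.get("kind") == kind
+        and name_part in ((d.get("metadata") or {}).get("name") or "")
+    ]
+
+
+def replicas_ok(docs, kind, name_part, want=3):
+    """True if at least one matching doc exists and every match has `want` replicas."""
+    matches = find_docs(docs, kind, name_part)
+    return bool(matches) and all((d.get("spec") or {}).get("replicas") == want for d in matches)
+
+
+def store_uses_vault(docs):
+    """True if some ClusterSecretStore configures spec.provider.vault."""
+    return any(
+        "vault" in ((d.get("spec") or {}).get("provider") or {})
+        for d in docs
+        if d and d.get("kind") == "ClusterSecretStore"
+    )
+
+
+def dr_targets(docs):
+    """(rto, rpo) as strings from the first ConfigMap carrying rto_minutes, else None."""
+    for d in docs:
+        if d and d.get("kind") == "ConfigMap":
+            data = d.get("data") or {}
+            if "rto_minutes" in data:
+                return str(data["rto_minutes"]), str(data.get("rpo_minutes"))
+    return None
+```
+
+For example, with `docs = list(yaml.safe_load_all(open('k8s/vault/vault-deployment.yaml')))`, `replicas_ok(docs, 'StatefulSet', 'vault')` should be `True`; for the DR file, `dr_targets(docs)` should be `('15', '1')`.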
+
+## Step 6: Service Compilation Spot-Check
+
+Sample 15 Go + 10 Rust + 10 Python services randomly:
+
+```bash
+cd /home/ubuntu/repos/corebanking
+
+# Go: each check runs in a subshell, so the working directory never changes
+ls -d services/*-go/ | shuf -n 15 | while read -r svc; do
+  (cd "$svc" && go build ./... && go test ./...) && echo "PASS $(basename "$svc")" || echo "FAIL $(basename "$svc")"
+done
+
+# Rust (takes ~10-15s each)
+ls -d services/*-rs/ | shuf -n 10 | while read -r svc; do
+  (cd "$svc" && cargo check) && echo "PASS $(basename "$svc")" || echo "FAIL $(basename "$svc")"
+done
+
+# Python
+ls -d services/*-py/ | shuf -n 10 | while read -r svc; do
+  (cd "$svc" && python3 -m py_compile main.py && python3 -m pytest test_main.py -x -q) && echo "PASS $(basename "$svc")" || echo "FAIL $(basename "$svc")"
+done
+```
+
+## Step 7: Compliance Docs
+
+- `docs/compliance/PCI-DSS-v4.0-Compliance.md`: Check for "Requirement 1" through "Requirement 12"
+- `docs/compliance/NDPR-Compliance.md`: Check for "data residency" and "72 hour" breach notification
+- `docs/compliance/CBN-IT-Standards.md`: Check for "RTO" and "RPO" values (should be 15min / 1min)
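+
+A plain substring search for `Requirement 1` would also match `Requirement 10` through `Requirement 12`, so a word-boundary regex is safer. A sketch; the NDPR and CBN patterns (accepting `72-hour` as well as `72 hour`) are assumptions about how the documents phrase these terms. Read each file with `open(path).read()` and pass the text in; the RTO/RPO values still need a manual look.
+
+```python
+import re
+
+# Pattern wording is an assumption about how the compliance docs phrase these terms.
+NDPR_PATTERNS = [r"data residency", r"72[- ]hours?"]
+CBN_PATTERNS = [r"\bRTO\b", r"\bRPO\b"]
+
+
+def missing_requirements(text, last=12):
+    r"""PCI DSS numbers 1..last never mentioned as `Requirement <n>`.
+
+    The \b boundaries stop `Requirement 1` from matching inside `Requirement 10`.
+    """
+    return [n for n in range(1, last + 1) if not re.search(rf"\bRequirement {n}\b", text)]
+
+
+def missing_patterns(text, patterns):
+    """Patterns (case-insensitive) that do not occur anywhere in text."""
+    return [p for p in patterns if not re.search(p, text, re.IGNORECASE)]
+```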
+
+## Testing Tips
+
+- This is all shell-only testing — do NOT start a recording
+- Rust `cargo check` can be slow on first run (compiling dependencies) — allow 15-30s per service
+- The Rust compilation might fail due to disk space if many services are checked sequentially — monitor with `df -h`
+- Go and Python tests are fast (<1s each)
+- ML inference server testing is covered by the `testing-ml-pipeline` skill
diff --git a/.agents/skills/testing-ml-pipeline/SKILL.md b/.agents/skills/testing-ml-pipeline/SKILL.md
@@ -27,14 +27,16 @@ fuser -k 8500/tcp 2>/dev/null; fuser -k 8501/tcp 2>/dev/null
 ```bash
 cd /home/ubuntu/repos/corebanking
 python -m ml.inference.server 2>&1 &
-sleep 3
+sleep 5
 curl -s http://localhost:8500/healthz
 ```
 
 **Expected**: `{"status": "healthy", "models_loaded": 6, "device": "cpu"}`
 
 If `models_loaded` < 6, check that all .pt weight files exist in `ml/weights/`.
 
+**Tip**: The server starts fast (~0.06s to load all 6 models on CPU). If `/healthz` does not respond within 10s, check for a port conflict on 8500.
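+
+Instead of a fixed `sleep`, the health check can poll `/healthz` until it answers, giving up after the 10s mentioned in the tip. A minimal standard-library sketch; the expected body is the one shown in Step 1:
+
+```python
+import json
+import time
+import urllib.request
+
+
+def wait_for_health(url="http://localhost:8500/healthz", timeout=10.0, interval=0.5):
+    """Poll `url` until it returns JSON; return the parsed body, or None on timeout."""
+    deadline = time.monotonic() + timeout
+    while time.monotonic() < deadline:
+        try:
+            with urllib.request.urlopen(url, timeout=interval) as resp:
+                return json.loads(resp.read().decode("utf-8"))
+        except (OSError, ValueError):
+            # Server not up yet (refused/timed out) or the body is not JSON yet.
+            time.sleep(interval)
+    return None
+
+
+def health_problems(body, expected_models=6):
+    """List deviations from {"status": "healthy", "models_loaded": 6, ...}."""
+    if body is None:
+        return ["no /healthz response before timeout"]
+    problems = []
+    if body.get("status") != "healthy":
+        problems.append(f"status is {body.get('status')!r}")
+    if body.get("models_loaded") != expected_models:
+        problems.append(f"models_loaded is {body.get('models_loaded')}, expected {expected_models}")
+    return problems
+```
+
+After starting the server, `print(health_problems(wait_for_health()) or 'healthy')` reports either `healthy` or what differs from the expected body.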
+
 ## Step 2: Test Inference Endpoints
 
 All 5 working endpoints (note: `/v1/gnn/predict` might not be routed — check `do_POST` in `ml/inference/server.py`):
@@ -67,10 +69,10 @@ curl -s -X POST http://localhost:8500/v1/churn/predict \
 ```
 
 **Key assertions**:
-- Fraud: `fraud_probability` > 0.5 for suspicious input, `predictions` is an array
-- Credit: `credit_score` is a float, `credit_band` in [poor, fair, good, excellent]
-- AML: `suspicious_probability` is a float, `risk_tier` in [low, medium, high, critical]
-- Anomaly: `anomaly_score` >= 0, `is_anomaly` is boolean
+- Fraud: `fraud_probability` > 0.5 for suspicious input (typically ~0.935), `predictions` is an array
+- Credit: `credit_score` is a float (e.g. 733.0), `credit_band` in [poor, fair, good, excellent]
+- AML: `suspicious_probability` is a float (1.0 for high-risk PEP), `risk_tier` in [low, medium, high, critical]
+- Anomaly: `anomaly_score` >= 0 (typically ~0.018 for normal), `is_anomaly` is boolean (false for normal)
 - Churn: `attention_weights` is list of 12 floats summing to ~1.0, `critical_months` has 3 entries
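+
+These assertions can be collected into one table of checks. The sketch assumes each field is a top-level key of the JSON response body (an assumption; adjust the lookups if the server nests them) and applies the 0.95 to 1.05 tolerance from Testing Tips to the churn attention weights.
+
+```python
+CREDIT_BANDS = {"poor", "fair", "good", "excellent"}
+RISK_TIERS = {"low", "medium", "high", "critical"}
+
+
+def _num(x):
+    """JSON numbers arrive as int or float; bool is excluded explicitly."""
+    return isinstance(x, (int, float)) and not isinstance(x, bool)
+
+
+def _weights_ok(w):
+    return isinstance(w, list) and len(w) == 12 and all(map(_num, w)) and 0.95 <= sum(w) <= 1.05
+
+
+# model -> [(description, predicate on the parsed JSON body)]
+# Assumes every field is a top-level key of the response body.
+CHECKS = {
+    "fraud": [
+        ("fraud_probability > 0.5", lambda b: _num(b.get("fraud_probability")) and b["fraud_probability"] > 0.5),
+        ("predictions is an array", lambda b: isinstance(b.get("predictions"), list)),
+    ],
+    "credit": [
+        ("credit_score is a number", lambda b: _num(b.get("credit_score"))),
+        ("credit_band is a known band", lambda b: b.get("credit_band") in CREDIT_BANDS),
+    ],
+    "aml": [
+        ("suspicious_probability is a number", lambda b: _num(b.get("suspicious_probability"))),
+        ("risk_tier is a known tier", lambda b: b.get("risk_tier") in RISK_TIERS),
+    ],
+    "anomaly": [
+        ("anomaly_score >= 0", lambda b: _num(b.get("anomaly_score")) and b["anomaly_score"] >= 0),
+        ("is_anomaly is a boolean", lambda b: isinstance(b.get("is_anomaly"), bool)),
+    ],
+    "churn": [
+        ("12 attention_weights summing to ~1.0", lambda b: _weights_ok(b.get("attention_weights"))),
+        ("3 critical_months", lambda b: len(b.get("critical_months") or []) == 3),
+    ],
+}
+
+
+def failed_checks(model, body):
+    """Descriptions of the checks that `body` fails for `model`."""
+    return [desc for desc, ok in CHECKS[model] if not ok(body)]
+```
+
+For example, `failed_checks('fraud', json.loads(response_text))` should return `[]` for the suspicious test input.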
 
 ## Step 3: Test Continuous Training Pipeline
@@ -91,6 +93,8 @@ python -m ml.continuous_training.orchestrator --mode full --model credit_scorer
 - Champion-challenger: `recommendation` is one of: promote, keep_champion, inconclusive
 - Pipeline result saved to `ml/weights/ct_pipeline_*.json`
 
+**Note**: Training requires parquet dataset files in `ml/data/datasets/`. If these don't exist (0 files found), training will fail but inference still works.
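+
+A quick pre-flight check before running training, as a sketch. It searches `ml/data/datasets/` recursively for `*.parquet`; whether the pipeline also reads subdirectories is an assumption.
+
+```python
+from pathlib import Path
+
+
+def parquet_files(dataset_dir="ml/data/datasets"):
+    """Sorted .parquet paths under dataset_dir; empty means training will fail."""
+    # Recursive search is an assumption about the dataset layout.
+    root = Path(dataset_dir)
+    return sorted(root.rglob("*.parquet")) if root.is_dir() else []
+```
+
+Run it from the repo root; if `parquet_files()` returns an empty list, expect the training run to fail while inference keeps working.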
+
 ## Step 4: Test Model Promoter
 
 ```bash
@@ -108,52 +112,40 @@ try:
     p.promote_to_production('fraud_detector', approved_by='auto')
     print('ERROR: Should have raised PermissionError')
 except PermissionError as e:
-    print(f'PASS: {e}')
-except FileNotFoundError as e:
-    print(f'SKIP (no staging): {e}')
+    print(f'Approval gate works: {e}')
 "
 ```
 
 **Key assertions**:
-- 6 models in status, all with `production: True`
-- fraud_detector and aml_scorer require human approval (REQUIRES_APPROVAL set)
+- `get_model_status()` returns dict with all 6 model names
+- Promoting `fraud_detector` with `approved_by='auto'` raises `PermissionError` (human approval required for high-risk models)
 
-## Step 5: Test Monitoring Server
+## Step 5: Test Monitoring Dashboard API
 
 ```bash
 cd /home/ubuntu/repos/corebanking
-python -m ml.continuous_training.monitoring 2>&1 &
-sleep 2
+python -m ml.monitoring.dashboard 2>&1 &
+sleep 3
+curl -s http://localhost:8501/api/metrics | python3 -m json.tool | head -20
+curl -s http://localhost:8501/api/drift | python3 -m json.tool | head -20
+```
 
-curl -s http://localhost:8501/monitoring/healthz
-curl -s http://localhost:8501/monitoring/status | python3 -m json.tool
-curl -s http://localhost:8501/monitoring/prometheus
-curl -s http://localhost:8501/monitoring/dashboard | head -5
+**Key assertions**:
+- `/api/metrics` returns JSON with model performance metrics
+- `/api/drift` returns JSON with drift detection results
+
+## Step 6: Cleanup
 
-# Manual retrain trigger
-curl -s -X POST http://localhost:8501/monitoring/trigger/credit_scorer
-curl -s -X POST http://localhost:8501/monitoring/trigger/nonexistent_model
+```bash
+fuser -k 8500/tcp 2>/dev/null
+fuser -k 8501/tcp 2>/dev/null
 ```
 
-**Key assertions**:
-- `/monitoring/status`: 6 models with production/staging/canary flags
-- `/monitoring/prometheus`: `ml_model_weight_exists{model="..."}` = 1 for all 6
-- `/monitoring/dashboard`: HTML starts with `<!DOCTYPE html>`, title "54Bank ML Monitoring"
-- Valid trigger returns `{"status": "triggered"}`, invalid returns 400
-
-## Step 6: Browser Dashboard Verification
-
-Open `http://localhost:8501/monitoring/dashboard` in the browser. Verify:
-- Title "54Bank ML Model Monitoring"
-- 6 model cards with AUC-ROC, F1, Parameters, Weight Size, Epochs
-- PROD/STAGING badges on cards
-- Active Alerts section at bottom
-
-## Known Issues / Gotchas
-
-- **Port conflicts**: Always kill existing processes on 8500/8501 before starting servers (`fuser -k PORT/tcp`)
-- **GNN endpoint**: The `/v1/gnn/predict` endpoint might not be wired in `do_POST` — the model loads but the router might be missing the route. Check `ml/inference/server.py` POST handler.
-- **cd + && in exec**: Some shell environments block `cd X && cmd`. Use separate commands or absolute paths.
-- **Retrain time**: credit_scorer retrain takes ~45-60s on CPU. Don't set short timeouts.
-- **Staging files**: After forced retrain, `credit_scorer_staging.pt` remains in `ml/weights/`. The monitoring dashboard will show a STAGING badge.
-- **System restarts**: If the VM restarts, all running servers are killed. Files on disk are preserved — just restart the servers.
+## Testing Tips
+
+- This is all shell-only testing — do NOT start a recording
+- The inference server loads all 6 models in ~0.06s on CPU — if it hangs, check for port conflicts
+- Fraud detection model is highly sensitive — the test input with international + new beneficiary + high amount at 2AM reliably produces >0.9 probability
+- AML model with PEP flag + structuring score 0.9 consistently returns 1.0 suspicious probability
+- The churn model uses attention mechanism — verify that `attention_weights` sums to ~1.0 (within 0.95-1.05 tolerance)
+- If disk space is low, model weights are ~1.8MB total — not a concern