From cbc6322b01c8a348456cb329f555fb59b7e666db Mon Sep 17 00:00:00 2001
From: "devin-ai-integration[bot]" <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Wed, 10 Jun 2026 08:56:51 +0000
Subject: [PATCH] Add infrastructure testing skill for 54Bank platform

Co-authored-by: Patrick Munis
---
 .../skills/testing-infrastructure/SKILL.md  | 174 ++++++++++++++++++
 .agents/skills/testing-ml-pipeline/SKILL.md |  76 ++++----
 2 files changed, 208 insertions(+), 42 deletions(-)
 create mode 100644 .agents/skills/testing-infrastructure/SKILL.md

diff --git a/.agents/skills/testing-infrastructure/SKILL.md b/.agents/skills/testing-infrastructure/SKILL.md
new file mode 100644
index 000000000..ba783ad58
--- /dev/null
+++ b/.agents/skills/testing-infrastructure/SKILL.md
@@ -0,0 +1,174 @@
+---
+name: testing-infrastructure
+description: Test 54Bank infrastructure-as-code configs (Terraform, OpenStack Heat, On-Premise K8s, K8s manifests, compliance docs). Use when verifying IaC changes, deployment config updates, or compliance documentation.
+---
+
+# Testing 54Bank Infrastructure Configs
+
+## Prerequisites
+
+- Python 3.12+ with `pyyaml` package (for YAML validation)
+- No Terraform CLI, Ansible, or OpenStack CLI needed — all validation is structural
+- Go 1.21+, Rust/Cargo, Python 3.12+ (for service compilation spot-checks)
+
+## Devin Secrets Needed
+
+None — all testing is local structural validation.
+
+## Key Gotcha: Multi-Document YAML
+
+K8s manifest files use `---` separators for multiple resources in one file. You MUST use `yaml.safe_load_all()` (not `yaml.safe_load()`) to parse them. Expected document counts:
+
+| File | Docs |
+|------|------|
+| `k8s/vault/vault-deployment.yaml` | 8 |
+| `k8s/crossplane/hybrid-cloud.yaml` | 11 |
+| `k8s/external-secrets/external-secrets-operator.yaml` | 9 |
+| `k8s/ingress/apisix-gateway.yaml` | 7 |
+| `k8s/dr/disaster-recovery.yaml` | 5 |
+| `onpremise/rook-ceph/ceph-cluster.yaml` | 6 |
+| `onpremise/metallb/metallb-config.yaml` | 6 |
+| `onpremise/kubeadm/cluster-config.yaml` | 4 |
+
+Single-doc files: `audit-policy.yaml`, `encryption-config.yaml`, `site.yaml`, `inventory.yaml`, `54bank-stack.yaml`, `cluster-template.yaml`.
+
+## Step 1: YAML Syntax Validation
+
+```bash
+cd /home/ubuntu/repos/corebanking
+python3 -c "
+import yaml
+files = [
+    'k8s/vault/vault-deployment.yaml',
+    'k8s/dr/disaster-recovery.yaml',
+    'k8s/external-secrets/external-secrets-operator.yaml',
+    'k8s/ingress/apisix-gateway.yaml',
+    'k8s/crossplane/hybrid-cloud.yaml',
+    'onpremise/kubeadm/cluster-config.yaml',
+    'onpremise/kubeadm/audit-policy.yaml',
+    'onpremise/kubeadm/encryption-config.yaml',
+    'onpremise/metallb/metallb-config.yaml',
+    'onpremise/rook-ceph/ceph-cluster.yaml',
+    'onpremise/ansible/site.yaml',
+    'onpremise/ansible/inventory.yaml',
+    'openstack/heat/54bank-stack.yaml',
+    'openstack/magnum/cluster-template.yaml',
+]
+for f in files:
+    docs = [d for d in yaml.safe_load_all(open(f)) if d is not None]
+    print(f'OK {f} ({len(docs)} docs)')
+"
+```
+
+**Expected**: All 14 files parse without exceptions.
+
+## Step 2: Terraform HCL Structural Validation
+
+No `terraform` CLI available in the environment. Validate structurally:
+
+```bash
+python3 -c "
+import os
+for root, dirs, files in os.walk('terraform'):
+    for f in files:
+        if f.endswith('.tf'):
+            path = os.path.join(root, f)
+            content = open(path).read()
+            if content.count('{') != content.count('}'):
+                print(f'ERR {path}: unbalanced braces')
+            else:
+                print(f'OK {path}')
+"
+```
+
+**Key assertions**:
+- `terraform/environments/production/main.tf` contains: `module "vpc"`, `module "eks"`, `module "rds"`, `module "elasticache"`, `module "msk"`
+- `terraform/environments/production/dr.tf` contains: VPC peering, DR EKS/RDS modules
+- All 6 module directories have `output` blocks
+- At least 4 `aws_kms_key` references (encryption at rest for EKS, RDS, MSK, S3)
+
+## Step 3: OpenStack Heat Template
+
+```bash
+python3 -c "
+import yaml
+doc = yaml.safe_load(open('openstack/heat/54bank-stack.yaml'))
+print('version:', doc.get('heat_template_version'))
+print('resources:', sorted(doc.get('resources', {}).keys()))
+print('params:', sorted(doc.get('parameters', {}).keys()))
+"
+```
+
+**Key assertions**:
+- `heat_template_version` is `2021-04-16`
+- Resources include: `internal_network`, `k8s_cluster`, `postgres_cluster`, `api_loadbalancer` (note: NOT `load_balancer` — uses `api_` prefix)
+- Parameters include: `cluster_name`, `key_name`, `external_network`
+
+## Step 4: On-Premise Security Checks
+
+```bash
+# Encryption at rest
+grep -c 'aescbc' onpremise/kubeadm/encryption-config.yaml
+
+# Audit policy covers secrets + RBAC
+grep -c 'secrets' onpremise/kubeadm/audit-policy.yaml
+
+# MetalLB IP pools
+grep -c 'IPAddressPool' onpremise/metallb/metallb-config.yaml
+
+# Ceph 3+ mons
+grep 'count:' onpremise/rook-ceph/ceph-cluster.yaml
+
+# HAProxy TLS 1.2+ and rate limiting
+grep 'TLSv1.2' onpremise/haproxy/haproxy.cfg
+grep 'stick-table' onpremise/haproxy/haproxy.cfg
+
+# Both sites (Lagos + Abuja DR)
+grep -i 'lagos\|abuja' onpremise/ansible/inventory.yaml
+```
+
+## Step 5: K8s Manifest Content Validation
+
+Parse YAML and check specific values:
+- **Vault**: StatefulSet `replicas: 3`, Raft storage, audit logging
+- **APISIX**: Deployment `replicas: 3`, etcd StatefulSet `replicas: 3`
+- **External Secrets**: ClusterSecretStore references `vault`
+- **DR ConfigMap**: `rto_minutes: 15`, `rpo_minutes: 1`
+
+## Step 6: Service Compilation Spot-Check
+
+Sample 15 Go + 10 Rust + 10 Python services randomly:
+
+```bash
+# Go
+ls -d services/*-go/ | shuf -n 15 | while read svc; do
+  cd /home/ubuntu/repos/corebanking/$svc && go build ./... && go test ./... && echo "PASS $(basename $svc)"
+  cd /home/ubuntu/repos/corebanking
+done
+
+# Rust (takes ~10-15s each)
+ls -d services/*-rs/ | shuf -n 10 | while read svc; do
+  cd /home/ubuntu/repos/corebanking/$svc && cargo check && echo "PASS $(basename $svc)"
+  cd /home/ubuntu/repos/corebanking
+done
+
+# Python
+ls -d services/*-py/ | shuf -n 10 | while read svc; do
+  cd /home/ubuntu/repos/corebanking/$svc && python3 -m py_compile main.py && python3 -m pytest test_main.py -x -q && echo "PASS $(basename $svc)"
+  cd /home/ubuntu/repos/corebanking
+done
+```
+
+## Step 7: Compliance Docs
+
+- `docs/compliance/PCI-DSS-v4.0-Compliance.md`: Check for "Requirement 1" through "Requirement 12"
+- `docs/compliance/NDPR-Compliance.md`: Check for "data residency" and "72 hour" breach notification
+- `docs/compliance/CBN-IT-Standards.md`: Check for "RTO" and "RPO" values (should be 15min / 1min)
+
+## Testing Tips
+
+- This is all shell-only testing — do NOT start a recording
+- Rust `cargo check` can be slow on first run (compiling dependencies) — allow 15-30s per service
+- The Rust compilation might fail due to disk space if many services are checked sequentially — monitor with `df -h`
+- Go and Python tests are fast (<1s each)
+- ML inference server testing is covered by the `testing-ml-pipeline` skill
diff --git a/.agents/skills/testing-ml-pipeline/SKILL.md b/.agents/skills/testing-ml-pipeline/SKILL.md
index ea4f009e5..7823adbd5 100644
--- a/.agents/skills/testing-ml-pipeline/SKILL.md
+++ b/.agents/skills/testing-ml-pipeline/SKILL.md
@@ -27,7 +27,7 @@ fuser -k 8500/tcp 2>/dev/null; fuser -k 8501/tcp 2>/dev/null
 ```bash
 cd /home/ubuntu/repos/corebanking
 python -m ml.inference.server 2>&1 &
-sleep 3
+sleep 5
 curl -s http://localhost:8500/healthz
 ```
 
@@ -35,6 +35,8 @@ curl -s http://localhost:8500/healthz
 
 If `models_loaded` < 6, check that all .pt weight files exist in `ml/weights/`.
 
+**Tip**: The server starts fast (~0.06s to load all 6 models on CPU). If it takes longer than 10s, something is wrong.
+
 ## Step 2: Test Inference Endpoints
 
 All 5 working endpoints (note: `/v1/gnn/predict` might not be routed — check `do_POST` in `ml/inference/server.py`):
@@ -67,10 +69,10 @@ curl -s -X POST http://localhost:8500/v1/churn/predict \
 ```
 
 **Key assertions**:
-- Fraud: `fraud_probability` > 0.5 for suspicious input, `predictions` is an array
-- Credit: `credit_score` is a float, `credit_band` in [poor, fair, good, excellent]
-- AML: `suspicious_probability` is a float, `risk_tier` in [low, medium, high, critical]
-- Anomaly: `anomaly_score` >= 0, `is_anomaly` is boolean
+- Fraud: `fraud_probability` > 0.5 for suspicious input (typically ~0.935), `predictions` is an array
+- Credit: `credit_score` is a float (e.g. 733.0), `credit_band` in [poor, fair, good, excellent]
+- AML: `suspicious_probability` is a float (1.0 for high-risk PEP), `risk_tier` in [low, medium, high, critical]
+- Anomaly: `anomaly_score` >= 0 (typically ~0.018 for normal), `is_anomaly` is boolean (false for normal)
 - Churn: `attention_weights` is list of 12 floats summing to ~1.0, `critical_months` has 3 entries
 
 ## Step 3: Test Continuous Training Pipeline
@@ -91,6 +93,8 @@ python -m ml.continuous_training.orchestrator --mode full --model credit_scorer
 - Champion-challenger: `recommendation` is one of: promote, keep_champion, inconclusive
 - Pipeline result saved to `ml/weights/ct_pipeline_*.json`
 
+**Note**: Training requires parquet dataset files in `ml/data/datasets/`. If these don't exist (0 files found), training will fail but inference still works.
+
 ## Step 4: Test Model Promoter
 
 ```bash
@@ -108,52 +112,40 @@ try:
     p.promote_to_production('fraud_detector', approved_by='auto')
     print('ERROR: Should have raised PermissionError')
 except PermissionError as e:
-    print(f'PASS: {e}')
-except FileNotFoundError as e:
-    print(f'SKIP (no staging): {e}')
+    print(f'Approval gate works: {e}')
 "
 ```
 
 **Key assertions**:
-- 6 models in status, all with `production: True`
-- fraud_detector and aml_scorer require human approval (REQUIRES_APPROVAL set)
+- `get_model_status()` returns dict with all 6 model names
+- Promoting `fraud_detector` with `approved_by='auto'` raises `PermissionError` (human approval required for high-risk models)
 
-## Step 5: Test Monitoring Server
+## Step 5: Test Monitoring Dashboard API
 
 ```bash
 cd /home/ubuntu/repos/corebanking
-python -m ml.continuous_training.monitoring 2>&1 &
-sleep 2
+python -m ml.monitoring.dashboard 2>&1 &
+sleep 3
+curl -s http://localhost:8501/api/metrics | python3 -m json.tool | head -20
+curl -s http://localhost:8501/api/drift | python3 -m json.tool | head -20
+```
 
-curl -s http://localhost:8501/monitoring/healthz
-curl -s http://localhost:8501/monitoring/status | python3 -m json.tool
-curl -s http://localhost:8501/monitoring/prometheus
-curl -s http://localhost:8501/monitoring/dashboard | head -5
+**Key assertions**:
+- `/api/metrics` returns JSON with model performance metrics
+- `/api/drift` returns JSON with drift detection results
+
+## Step 6: Cleanup
 
-# Manual retrain trigger
-curl -s -X POST http://localhost:8501/monitoring/trigger/credit_scorer
-curl -s -X POST http://localhost:8501/monitoring/trigger/nonexistent_model
+```bash
+fuser -k 8500/tcp 2>/dev/null
+fuser -k 8501/tcp 2>/dev/null
 ```
 
-**Key assertions**:
-- `/monitoring/status`: 6 models with production/staging/canary flags
-- `/monitoring/prometheus`: `ml_model_weight_exists{model="..."}` = 1 for all 6
-- `/monitoring/dashboard`: HTML starts with ``, title "54Bank ML Monitoring"
-- Valid trigger returns `{"status": "triggered"}`, invalid returns 400
-
-## Step 6: Browser Dashboard Verification
-
-Open `http://localhost:8501/monitoring/dashboard` in the browser. Verify:
-- Title "54Bank ML Model Monitoring"
-- 6 model cards with AUC-ROC, F1, Parameters, Weight Size, Epochs
-- PROD/STAGING badges on cards
-- Active Alerts section at bottom
-
-## Known Issues / Gotchas
-
-- **Port conflicts**: Always kill existing processes on 8500/8501 before starting servers (`fuser -k PORT/tcp`)
-- **GNN endpoint**: The `/v1/gnn/predict` endpoint might not be wired in `do_POST` — the model loads but the router might be missing the route. Check `ml/inference/server.py` POST handler.
-- **cd + && in exec**: Some shell environments block `cd X && cmd`. Use separate commands or absolute paths.
-- **Retrain time**: credit_scorer retrain takes ~45-60s on CPU. Don't set short timeouts.
-- **Staging files**: After forced retrain, `credit_scorer_staging.pt` remains in `ml/weights/`. The monitoring dashboard will show a STAGING badge.
-- **System restarts**: If the VM restarts, all running servers are killed. Files on disk are preserved — just restart the servers.
+## Testing Tips
+
+- This is all shell-only testing — do NOT start a recording
+- The inference server loads all 6 models in ~0.06s on CPU — if it hangs, check for port conflicts
+- Fraud detection model is highly sensitive — the test input with international + new beneficiary + high amount at 2AM reliably produces >0.9 probability
+- AML model with PEP flag + structuring score 0.9 consistently returns 1.0 suspicious probability
+- The churn model uses attention mechanism — verify that `attention_weights` sums to ~1.0 (within 0.95-1.05 tolerance)
+- If disk space is low, model weights are ~1.8MB total — not a concern
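
The churn assertions in the patch (12 attention weights summing to roughly 1.0 within the 0.95-1.05 tolerance, and 3 critical months) could be checked with a small helper like the one below. This is only a sketch: the function name `check_churn_response` is hypothetical, and it assumes the response body is a flat JSON object with `attention_weights` and `critical_months` at the top level, which the patch does not confirm.

```python
import math
from typing import Any


def check_churn_response(resp: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a churn prediction response.

    Assumes a flat response object; the real server may nest these keys.
    """
    problems: list[str] = []

    weights = resp.get("attention_weights")
    if not isinstance(weights, list) or len(weights) != 12:
        problems.append("attention_weights must be a list of 12 numbers")
    elif not all(
        isinstance(w, (int, float)) and not isinstance(w, bool) for w in weights
    ):
        problems.append("attention_weights must contain only numbers")
    else:
        # fsum avoids accumulating float rounding error across the 12 terms.
        total = math.fsum(weights)
        if not 0.95 <= total <= 1.05:
            problems.append(
                f"attention_weights sum to {total:.4f}, outside 0.95-1.05"
            )

    months = resp.get("critical_months")
    if not isinstance(months, list) or len(months) != 3:
        problems.append("critical_months must have 3 entries")

    return problems


if __name__ == "__main__":
    # Made-up payload for illustration only, not real server output.
    sample = {"attention_weights": [1 / 12] * 12, "critical_months": [3, 7, 11]}
    print(check_churn_response(sample) or "PASS")
```

In practice the dict would come from `json.loads` applied to the body returned by the `/v1/churn/predict` endpoint. Returning a list of problems rather than raising on the first one lets a single run report every failed assertion.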