feat(k8s): deploy smartem-frontend across dev/staging/production by vredchenko · Pull Request #205 · DiamondLightSource/smartem-devtools

vredchenko · 2026-05-21T11:05:05Z

Summary

Phase B of the smartem-frontend k8s deploy work. Adds Deployment + Service + ConfigMap for the frontend image produced by smartem-frontend#94 (v0.2.0, now on GHCR), plus Ingress for staging/production. The image from Phase A is environment-agnostic — it ships a placeholder config.json with dev defaults and reverse-proxies /api/ to the backend service through its own nginx — so a single tag deploys to every environment with only ConfigMap and env-var differences.

Also bundles a small cleanup pass on dead Keycloak config: KEYCLOAK_AUTH_REQUIRED, which the backend stopped reading in smartem-decisions#285, and KEYCLOAK_CLIENT_ID, which this branch had earlier replaced with the KEYCLOAK_ALLOWED_AZP allow-list. It also removes a stale root-level k8s/ingress.yaml left over from the original December scaffolding and refreshes the local-dev Keycloak docs to match the SPA's current runtime config.json mechanism.

What lands per environment

Each k8s/environments/<env>/smartem-frontend.yaml carries three documents:

- ConfigMap smartem-frontend-config: the runtime config.json (Keycloak URL/realm/clientId + authEnabled). The Deployment subPath-mounts this onto /usr/share/nginx/html/config.json, overriding the placeholder shipped in the image.
- Deployment smartem-frontend: pulls ghcr.io/diamondlightsource/smartem-frontend:latest and sets BACKEND_HOST so the SPA pod's nginx proxies /api/ internally. The value started as the short name smartem-http-api-service; a later commit on this branch switches it to the in-cluster FQDN for each environment's namespace, because nginx's explicit `resolver` directive does not consult the search list in /etc/resolv.conf.
- Service smartem-frontend-service: NodePort 30100 in development (next free slot in the 30000s range used by smartem-http-api/Keycloak/RabbitMQ/Postgres/Adminer), ClusterIP in staging and production.
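
The Service split above can be sketched as a small Python helper that builds the manifest as a plain dict. This is only an illustration, not the YAML that ships in this PR: the function name `frontend_service`, the `app: smartem-frontend` selector label and the `targetPort` are assumptions. The Service name, port 80, NodePort 30100 for development and ClusterIP elsewhere come from the description.

```python
def frontend_service(env: str) -> dict:
    """Return a Service manifest for the given environment as a plain dict."""
    if env not in ("development", "staging", "production"):
        raise ValueError(f"unknown environment: {env}")

    port = {"port": 80, "targetPort": 80}  # targetPort is an assumption
    spec = {
        "selector": {"app": "smartem-frontend"},  # hypothetical selector label
        "ports": [port],
    }

    if env == "development":
        # Development exposes the SPA on a fixed NodePort.
        spec["type"] = "NodePort"
        port["nodePort"] = 30100
    else:
        # Staging and production sit behind an Ingress instead.
        spec["type"] = "ClusterIP"

    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "smartem-frontend-service"},
        "spec": spec,
    }
```

Calling `frontend_service("development")` yields a NodePort Service, while the other two environments get a ClusterIP Service with no `nodePort` field.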

Per-environment config.json values

| Env | `keycloak.url` | `authEnabled` |
| --- | --- | --- |
| development | `http://localhost:30090` (Keycloak mock NodePort, browser-reachable) | `true` |
| staging | `https://identity-test.diamond.ac.uk` | `true` |
| production | `https://identity.diamond.ac.uk` | `true` |

All three set realm: dls, clientId: SmartEM_User (post-rename in smartem-devtools#198), and authEnabled: true. The backend enforces Keycloak Bearer-token validation unconditionally on every non-exempt request since smartem-decisions#285 — there is no opt-out, so the SPA must always complete the login ceremony to talk to /api/.
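
To make the per-environment differences concrete, here is a minimal Python sketch that renders a config.json for each environment. The URLs, realm, clientId and authEnabled values are taken from the table and paragraph above. The exact JSON nesting (a `keycloak` object next to a top-level `authEnabled`) and the names `KEYCLOAK_URLS` and `render_config_json` are assumptions made for illustration; the real file shape is defined by the smartem-frontend repo.

```python
import json

# Values from the per-environment table in this PR description.
KEYCLOAK_URLS = {
    "development": "http://localhost:30090",
    "staging": "https://identity-test.diamond.ac.uk",
    "production": "https://identity.diamond.ac.uk",
}


def render_config_json(env: str) -> str:
    """Render the runtime config.json for one environment (assumed layout)."""
    config = {
        "keycloak": {
            "url": KEYCLOAK_URLS[env],
            "realm": "dls",
            "clientId": "SmartEM_User",
        },
        # The backend enforces auth unconditionally, so this is always true.
        "authEnabled": True,
    }
    return json.dumps(config, indent=2)
```

Only `keycloak.url` varies between environments, which is why a single image tag can serve all three.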

Ingress

Staging and production only — development keeps the NodePort pattern, consistent with everything else in k8s/environments/development/. Each ingress.yaml routes a single host to smartem-frontend-service on port 80. The SPA's nginx handles /api/ proxying internally, so one route covers both the SPA and API traffic.

| Env | Hostname |
| --- | --- |
| staging | `smartem-staging.diamond.ac.uk` |
| production | `smartem.diamond.ac.uk` |
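
The single-host routing described above could look roughly like the following, expressed as a Python dict in the shape of a `networking.k8s.io/v1` Ingress. The hostnames, Service name and port 80 come from this PR. The Ingress name `smartem-frontend`, the `/` prefix path and the helper name `frontend_ingress` are assumptions, and the actual ingress.yaml files may carry extra fields (annotations, TLS) not shown here.

```python
INGRESS_HOSTS = {
    "staging": "smartem-staging.diamond.ac.uk",
    "production": "smartem.diamond.ac.uk",
}


def frontend_ingress(env: str) -> dict:
    """Build an Ingress that routes one host to the frontend Service."""
    host = INGRESS_HOSTS[env]  # KeyError for development, which has no Ingress
    backend = {"service": {"name": "smartem-frontend-service", "port": {"number": 80}}}
    # One catch-all path: the SPA pod's nginx handles /api/ itself.
    path = {"path": "/", "pathType": "Prefix", "backend": backend}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "smartem-frontend"},  # hypothetical name
        "spec": {"rules": [{"host": host, "http": {"paths": [path]}}]},
    }
```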

dev-k8s.sh

The access-URLs section gains:

- Keycloak (mock): http://localhost:30090 (was missed when smartem-devtools#198 landed)
- SmartEM Frontend: http://localhost:30100

Dead-config cleanup (`KEYCLOAK_AUTH_REQUIRED`, `KEYCLOAK_CLIENT_ID`)

The backend stopped reading KEYCLOAK_AUTH_REQUIRED in smartem-decisions#285 (auth is unconditional, no opt-out). This branch had already removed the key from k8s/environments/{development,staging}/configmap.yaml, but scripts/k8s/dev-k8s.sh was still reading it from .env, defaulting it to "false", and re-injecting it into smartem-config on every dev deploy — silently undoing the YAML cleanup. Same story for KEYCLOAK_CLIENT_ID, which the staging ConfigMap replaced with KEYCLOAK_ALLOWED_AZP="SmartEM_User,SmartEM_Agent".
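
For readers unfamiliar with the allow-list semantics, the following Python sketch shows one way a comma-separated KEYCLOAK_ALLOWED_AZP value could be interpreted, including the "unset = any realm token" behaviour this PR preserves. It is not the backend's auth.py; the function names `parse_allowed_azp` and `azp_allowed` are hypothetical, and the real validation also covers signature, issuer and expiry checks that are out of scope here.

```python
from typing import Optional, Set


def parse_allowed_azp(raw: Optional[str]) -> Optional[Set[str]]:
    """Parse a comma-separated allow-list; None means 'no restriction'."""
    if raw is None or not raw.strip():
        return None
    return {item.strip() for item in raw.split(",") if item.strip()}


def azp_allowed(token_azp: str, raw_allow_list: Optional[str]) -> bool:
    """Return True if the token's azp claim passes the allow-list."""
    allowed = parse_allowed_azp(raw_allow_list)
    # With no allow-list configured, any realm token is accepted.
    return allowed is None or token_azp in allowed
```

With the staging value `SmartEM_User,SmartEM_Agent`, only those two clients pass; with the variable unset, as in development, every azp passes.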

Cleanup:

- env-examples/.env.example.k8s.{development,staging}: drop both dead keys. Staging gains KEYCLOAK_ALLOWED_AZP=SmartEM_User,SmartEM_Agent to mirror the YAML; development leaves it commented out (any valid realm token accepted in local dev).
- scripts/k8s/dev-k8s.sh: drop both vars from the override check, defaults, log line, and kubectl create configmap args. Append KEYCLOAK_ALLOWED_AZP only when explicitly set, preserving the "unset = any realm token" semantics.
- Staging + production smartem-frontend.yaml comments now point readers at KEYCLOAK_ALLOWED_AZP instead of the deleted KEYCLOAK_CLIENT_ID.
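
The dev-k8s.sh change can be sketched in Python as a filter over the `--from-literal` arguments passed to `kubectl create configmap`. The real script is bash and is not reproduced here; `configmap_literals` and `DEAD_KEYS` are hypothetical names used only to show the intended behaviour: dead keys are never emitted, and KEYCLOAK_ALLOWED_AZP is emitted only when it has a value.

```python
from typing import Dict, List, Optional

# Keys the backend no longer reads; they must not reach the ConfigMap.
DEAD_KEYS = {"KEYCLOAK_AUTH_REQUIRED", "KEYCLOAK_CLIENT_ID"}


def configmap_literals(env_vars: Dict[str, Optional[str]]) -> List[str]:
    """Build --from-literal args, skipping dead keys and an unset allow-list."""
    args = []
    for key, value in env_vars.items():
        if key in DEAD_KEYS:
            continue
        if key == "KEYCLOAK_ALLOWED_AZP" and not value:
            # Leaving it out preserves "unset = any realm token".
            continue
        args.append(f"--from-literal={key}={value}")
    return args
```

The returned list would be appended to a `kubectl create configmap smartem-config` invocation.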

Orphan ingress removal

k8s/ingress.yaml at the repo root was added in the initial k8s scaffolding (a5e88da, 2025-12-08) as an early sketch for exposing the backend HTTP API directly via ingress. It pre-dated the current SPA-pod-nginx-proxies-/api/ architecture, was not referenced by any kustomization, lived in the dev namespace (which uses NodePort, not ingress), and carried a placeholder host. Deleted — recoverable from git history if needed. The remaining backend-facing ingress use case (Windows agent connectivity) is tracked in #206.

Frontend-dev Keycloak doc refresh

The SPA stopped reading VITE_KEYCLOAK_* / VITE_AUTH_ENABLED build-time vars; apps/smartem/src/main.tsx fetches /config.json at boot and apps/smartem/src/auth/config.ts is the only consumer (no fallback to import.meta.env). The smartem-frontend repo updated its own apps/smartem/.env.example to reflect this, but the local-dev docs in this repo still instructed developers to edit a .env.local file with VITE_* keys that no code reads.

The same docs also lagged behind other realm changes made by two earlier commits on this branch, `feat: add SmartEM_Agent client, rename SmartEM to SmartEM_User` and `fix(keycloak-mock): allow NodePort SPA redirect URI`:

- Client name SmartEM predated the rename to SmartEM_User + SmartEM_Agent.
- Redirect URIs listed only 5173 / 5174; the realm now also allows http://localhost:30100/*.
- Seeded users list still mentioned valuser/valpass (not in the realm anymore; only devuser ships).
- The "Disabling auth entirely" section described VITE_AUTH_ENABLED=false as a clean opt-out, but #285 made backend auth unconditional, so setting authEnabled:false only bypasses the SPA's login screen and every /api/ call still 401s. Reframed with the caveat: only use it together with MSW (VITE_ENABLE_MOCKS=true) or for views that don't fetch from the backend.

Updated docs/development/local-keycloak.md and keycloak-mock/README.md to point at apps/smartem/public/config.json as the dev-time source of truth.

Local verification

Static rendering:

- kubectl kustomize k8s/environments/development: 926 lines, clean
- kubectl kustomize k8s/environments/staging: 579 lines, clean
- kubectl kustomize k8s/environments/production: 577 lines, clean
- grep -c KEYCLOAK_AUTH_REQUIRED in rendered output: 0 across all three envs
- grep -c KEYCLOAK_CLIENT_ID in rendered output: 0 across all three envs
- Rendered staging ingress resolves to smartem-staging.diamond.ac.uk; production unchanged at smartem.diamond.ac.uk
- bash -n scripts/k8s/dev-k8s.sh: syntax OK
- grep for VITE_KEYCLOAK_ / VITE_AUTH_ENABLED / valuser in the repo: 0 hits (was 9 before this branch)
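
The stale-reference check in the last item can be approximated in Python for anyone who wants to rerun it without grep. This is a sketch only: it counts matching lines in a string, much as `grep -c` counts matching lines in a file, and the name `count_stale_hits` is hypothetical.

```python
import re

# Patterns that should no longer appear anywhere in the docs.
STALE_PATTERN = re.compile(r"VITE_KEYCLOAK_|VITE_AUTH_ENABLED|valuser")


def count_stale_hits(text: str) -> int:
    """Count lines that still mention a removed build-time var or user."""
    return sum(1 for line in text.splitlines() if STALE_PATTERN.search(line))
```

Running it over the contents of docs/development/local-keycloak.md and keycloak-mock/README.md should return 0 after this branch.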

End-to-end auth loop, driven against the live local k3s cluster (backend 0.1.1rc48.dev0+g5bc8e22b3.d20260521, post-#285):

- curl http://localhost:30100/version returns 200 with the SPA version JSON
- curl http://localhost:30100/config.json returns 200 with the dev ConfigMap content: authEnabled: true, clientId: SmartEM_User, keycloak.url: http://localhost:30090
- curl http://localhost:30100/api/health returns 200 (the /health path is in EXEMPT_PATHS in smartem_backend/auth.py, so it bypasses Bearer validation; the SPA pod's nginx strips /api/ and proxies to the backend's /health)
- curl http://localhost:30100/api/acquisitions (no token) returns 401 with www-authenticate: Bearer and {"detail":"Missing or malformed Authorization header"}, which confirms unconditional auth on a real route
- Browser opens http://localhost:30100, SPA renders the auth-gate sign-in screen (smartem-frontend#05b2c9d); SIGN IN redirects to http://localhost:30090/realms/dls/protocol/openid-connect/auth?client_id=SmartEM_User&redirect_uri=… (code + PKCE flow); login as devuser/devpass returns to http://localhost:30100/, header shows "Dev User", and the /acquisitions route fires an authenticated GET /api/acquisitions that returns 200
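
The unauthenticated curl checks above lend themselves to a small repeatable smoke test. The sketch below uses only the Python standard library; the paths and expected status codes are the ones reported in this section, while the names `EXPECTED_STATUS`, `http_status` and `run_smoke_checks` are hypothetical. It does not cover the browser login flow.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Expected status per path, as observed against the local k3s cluster.
EXPECTED_STATUS = {
    "/version": 200,
    "/config.json": 200,
    "/api/health": 200,        # exempt from Bearer validation
    "/api/acquisitions": 401,  # no token supplied
}


def http_status(url: str) -> int:
    """Return the HTTP status for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_smoke_checks(
    get_status: Callable[[str], int] = http_status,
    base_url: str = "http://localhost:30100",
) -> Dict[str, bool]:
    """Map each path to whether it returned the expected status."""
    return {
        path: get_status(base_url + path) == expected
        for path, expected in EXPECTED_STATUS.items()
    }
```

Passing `get_status` as a parameter keeps the check testable without a cluster; against a running dev deploy, `all(run_smoke_checks().values())` should be true.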

Out of scope / follow-ups

Agent backend-ingress connectivity: the Windows agent (running on EPU workstations, outside the cluster) needs its own connectivity story for staging/production. Today it uses the smartem-http-api NodePort in dev. Tracked in #206 ("Plan agent connectivity to the backend API from outside the cluster"), to land as a per-env file under k8s/environments/<env>/ once the route shape is decided.

Test plan

- CI passes on this PR
- curl http://localhost:30100/version returns {"frontend": "0.2.0", ...}
- curl http://localhost:30100/config.json returns the dev ConfigMap content (authEnabled: true)
- curl http://localhost:30100/api/health returns 200 (the /health path is in EXEMPT_PATHS and reaches the backend through the SPA pod's nginx)
- curl http://localhost:30100/api/acquisitions without a token returns 401 (auth is unconditional on non-exempt routes)
- Browser opens http://localhost:30100, SPA redirects to the Keycloak mock login, sign-in as devuser/devpass returns to the SPA, and /api/acquisitions succeeds with the bearer token
- Post-merge: ./scripts/k8s/dev-k8s.sh down && ./scripts/k8s/dev-k8s.sh rolls the smartem-config ConfigMap so the stale KEYCLOAK_AUTH_REQUIRED/KEYCLOAK_CLIENT_ID keys (left over from an older dev-k8s.sh run) disappear from the deployed state. Functionally this is a no-op since the backend ignores them, but it is worth doing for hygiene

Adds Kubernetes manifests for the smartem-frontend image produced by smartem-frontend#94 (v0.2.0, now on GHCR). The image is environment-agnostic: it ships a placeholder config.json with dev defaults and proxies /api/ to the backend service via its own nginx, deferring DNS to request time.

What lands per environment (k8s/environments/<env>/smartem-frontend.yaml):
- ConfigMap smartem-frontend-config carries the runtime config.json (Keycloak URL/realm/clientId + authEnabled). The Deployment subPath-mounts this onto /usr/share/nginx/html/config.json, overriding the placeholder shipped in the image.
- Deployment smartem-frontend pulls ghcr.io/diamondlightsource/smartem-frontend:latest, sets BACKEND_HOST=smartem-http-api-service so the SPA pod's nginx proxies /api/ to the backend service.
- Service smartem-frontend-service: NodePort 30100 for development (next free in the 30000s range; matches the existing smartem-http-api / Keycloak / RabbitMQ / Postgres / Adminer pattern), ClusterIP for staging and production.

Per-environment config.json values:
- development: keycloak.url http://localhost:30090 (the Keycloak mock NodePort, browser-reachable), authEnabled false to match KEYCLOAK_AUTH_REQUIRED=false on the dev backend. Flip to true to exercise the full login chain.
- staging: identity-test.diamond.ac.uk, authEnabled true.
- production: identity.diamond.ac.uk, authEnabled true.

Ingress (staging and production only; dev keeps the NodePort pattern):
- k8s/environments/{staging,production}/ingress.yaml route a single host to smartem-frontend-service. The SPA pod's nginx handles /api/ proxying to the backend internally, so one route covers everything. Hostnames are placeholders (smartem-staging.example.com / smartem.example.com) and flagged TODO until real values are decided.

scripts/k8s/dev-k8s.sh: print the new http://localhost:30100 (frontend) and http://localhost:30090 (Keycloak, missed when #198 landed) in the access-URLs section.

Verified locally: kubectl kustomize build is clean for all three environments. End-to-end browser flow (SPA login, authenticated /api call) will be exercised on the user's local k3s after merge.

smartem-decisions#285 removed the KEYCLOAK_AUTH_REQUIRED flag from the backend; Bearer-token validation now runs unconditionally. The dev frontend ConfigMap needs authEnabled: true to match, otherwise the SPA skips the login ceremony and every /api/ call comes back 401.

Fold the configmap and ingress cleanups that PR #205's frontend work exposed:

- KEYCLOAK_AUTH_REQUIRED was set in dev (false) and staging (true) configmaps but the backend stopped reading it after smartem-decisions#285 (commit 2ec937d, "remove KEYCLOAK_AUTH_REQUIRED flag, enforce azp allow-list"). Auth is always enforced; the entry is dead config and misleading. Removed from both.
- KEYCLOAK_CLIENT_ID="SmartEM" in both configmaps is also dead config for the backend. The backend's auth.py reads KEYCLOAK_ALLOWED_AZP (comma-separated list), not KEYCLOAK_CLIENT_ID. The SmartEM Agent reads KEYCLOAK_CLIENT_ID from its own local config file, not the cluster configmap. Removed from both backend configmaps.
- Staging gains KEYCLOAK_ALLOWED_AZP="SmartEM_User,SmartEM_Agent" so the azp allow-list is actually populated (was the intent of the old KEYCLOAK_CLIENT_ID line; now expressed in the var the backend reads). Dev stays permissive; a comment documents the env var if someone wants to restrict.
- production/ingress.yaml host changes from the smartem.example.com placeholder to the real smartem.diamond.ac.uk. Staging's host remains a placeholder pending the real value.

nginx's explicit `resolver` directive in the SPA image doesn't consult /etc/resolv.conf's search list, so the short name `smartem-http-api-service` returns NXDOMAIN and the SPA's /api/ proxy 502s. Switch to the in-cluster FQDN per environment namespace.
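
As a small illustration of the fix, the in-cluster FQDN of a Service follows the standard Kubernetes DNS form `<service>.<namespace>.svc.<cluster-domain>`. The Python helper below composes that name; `service_fqdn` is a hypothetical helper, the namespace in the usage note is a made-up example rather than one of this repo's namespaces, and `cluster.local` is the common default cluster domain, which a given cluster may override.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Compose the fully qualified in-cluster DNS name of a Service."""
    for part in (service, namespace, cluster_domain):
        if not part:
            raise ValueError("service, namespace and cluster_domain must be non-empty")
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

For example, `service_fqdn("smartem-http-api-service", "example-ns")` gives `smartem-http-api-service.example-ns.svc.cluster.local`, a name that resolves without relying on the pod's DNS search list.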

The SmartEM_User client only listed Vite dev ports (5173/5174) in redirectUris/webOrigins. The k8s dev deploy serves the SPA at NodePort 30100, so Keycloak rejected the auth flow. Add 30100 alongside. The same realm file is mounted by both the kustomize ConfigMap (k3s local dev) and keycloak-mock/docker-compose.yml (frontend-only dev) - single source of truth, no mirroring needed.
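
Why the missing entry broke the login can be seen from a simplified model of redirect URI matching, sketched in Python below. It treats a trailing `*` as a prefix wildcard and everything else as an exact match, which is a deliberate simplification of Keycloak's real rules. The 30100 pattern is quoted from this PR; the exact form of the 5173 and 5174 entries and the name `redirect_allowed` are assumptions.

```python
from typing import Sequence

ALLOWED_REDIRECT_URIS = [
    "http://localhost:5173/*",   # Vite dev port (pattern form assumed)
    "http://localhost:5174/*",   # Vite dev port (pattern form assumed)
    "http://localhost:30100/*",  # k8s dev NodePort, added on this branch
]


def redirect_allowed(uri: str, patterns: Sequence[str] = ALLOWED_REDIRECT_URIS) -> bool:
    """Return True if the URI matches an allowed pattern (simplified model)."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # Trailing wildcard: compare against the prefix before the '*'.
            if uri.startswith(pattern[:-1]):
                return True
        elif uri == pattern:
            return True
    return False
```

Before the 30100 entry was added, a redirect back to `http://localhost:30100/` would have matched nothing in the list, which is consistent with Keycloak rejecting the flow.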

…re KEYCLOAK_ALLOWED_AZP

smartem-decisions#285 made backend auth unconditional and removed the KEYCLOAK_AUTH_REQUIRED env var entirely. KEYCLOAK_CLIENT_ID was likewise superseded by KEYCLOAK_ALLOWED_AZP (the azp allow-list) when the backend ConfigMap was cleaned up earlier in this branch. The YAMLs already dropped both keys, but dev-k8s.sh kept reading them from .env and re-injecting them into smartem-config via `kubectl create configmap --from-literal=...`, undoing the YAML cleanup on every dev deploy. The env-examples advertised the same dead knobs.

Now:
- env-examples/.env.example.k8s.{development,staging}: drop KEYCLOAK_AUTH_REQUIRED and KEYCLOAK_CLIENT_ID. Staging gains KEYCLOAK_ALLOWED_AZP=SmartEM_User,SmartEM_Agent to mirror the YAML; development leaves it commented (any valid realm token accepted).
- scripts/k8s/dev-k8s.sh: drop both vars from the override check, defaults, log line, and `kubectl create configmap` args. Append KEYCLOAK_ALLOWED_AZP only when explicitly set (preserves the "unset = any realm token" semantics).
- k8s/environments/{staging,production}/smartem-frontend.yaml: comment now points readers at KEYCLOAK_ALLOWED_AZP instead of the deleted KEYCLOAK_CLIENT_ID.

Local verification:
- kubectl kustomize k8s/environments/development: 926 lines, clean
- kubectl kustomize k8s/environments/staging: 579 lines, clean
- kubectl kustomize k8s/environments/production: 577 lines, clean
- grep KEYCLOAK_AUTH_REQUIRED in rendered output: 0 hits all three
- grep KEYCLOAK_CLIENT_ID in rendered output: 0 hits all three
- bash -n scripts/k8s/dev-k8s.sh: OK

- k8s/environments/staging/ingress.yaml: smartem-staging.example.com -> smartem-staging.diamond.ac.uk. Production was already smartem.diamond.ac.uk (since becc1cd), so all hostnames are now real.
- k8s/ingress.yaml: deleted. Added in the initial k8s scaffolding (a5e88da, 2025-12-08) as an early sketch for exposing the backend HTTP API directly via ingress. Pre-dates the current architecture where the SPA pod's nginx reverse-proxies /api/ to the backend internally, so a single frontend ingress is sufficient for browser traffic. Was not referenced by any kustomization, lived in the dev namespace (which uses NodePort, not ingress), and carried a placeholder host. The remaining backend-facing ingress use case is the Windows agent's connectivity story, which will follow the existing per-env pattern under k8s/environments/<env>/ rather than this root-level file. Recoverable from git history if needed.

Local verification:
- kubectl kustomize k8s/environments/{development,staging,production} renders clean (926/579/577 lines)
- staging ingress now resolves to smartem-staging.diamond.ac.uk; production unchanged at smartem.diamond.ac.uk

…son + client rename

The smartem-frontend SPA stopped reading VITE_KEYCLOAK_* / VITE_AUTH_ENABLED build-time vars some time ago: main.tsx now fetches /config.json at boot and apps/smartem/src/auth/config.ts is the only consumer, with no fallback to import.meta.env. The smartem-frontend repo updated its own apps/smartem/.env.example to reflect this, but the docs in this repo (keycloak-mock/README.md, docs/development/local-keycloak.md) still instructed developers to edit a .env.local file with VITE_* keys that no code reads.

Other staleness in the same docs:
- Client name "SmartEM" predates the rename in smartem-devtools#198 to SmartEM_User (browser) + SmartEM_Agent (Windows agent). Both docs only mentioned the old single client.
- Redirect URIs listed 5173 + 5174 only; the realm now also allows http://localhost:30100/* for the SPA pod's k3s NodePort (added in this branch's earlier `fix(keycloak-mock): allow NodePort SPA redirect URI`).
- Seeded users list mentioned valuser/valpass; the realm only ships devuser now.
- The "Disabling auth entirely" section described VITE_AUTH_ENABLED=false as a clean opt-out, but smartem-decisions#285 made backend auth unconditional, so setting authEnabled:false in config.json just bypasses the SPA's login screen and every /api/ call still 401s. Only useful when paired with MSW (VITE_ENABLE_MOCKS=true) or for views that don't fetch from the backend. Reframed to call out the caveat explicitly.

Updated both docs to reflect the runtime-config.json mechanism (edit apps/smartem/public/config.json for `npm run dev:smartem`; the k8s ConfigMap mount overrides it in deploys), the two-client realm layout, the full redirect URI set, the actual seeded user, and the auth-disable caveat. No code or manifest changes.


github-actions Bot added the devops CI/CD, deployment, infrastructure, or tooling work label May 21, 2026

vredchenko added 7 commits May 21, 2026 12:14

github-actions Bot added the documentation Improvements or additions to project documentation label May 22, 2026

vredchenko mentioned this pull request May 22, 2026

Plan agent connectivity to the backend API from outside the cluster #206

Open

vredchenko merged commit 9e27873 into main May 22, 2026
7 checks passed

vredchenko deleted the feat/k8s-smartem-frontend branch May 22, 2026 13:43

vredchenko merged 8 commits into main from feat/k8s-smartem-frontend
