feat(k8s): deploy smartem-frontend across dev/staging/production #205

Merged
vredchenko merged 8 commits into main from feat/k8s-smartem-frontend
May 22, 2026

vredchenko commented May 21, 2026

Summary

Phase B of the smartem-frontend k8s deploy work. Adds Deployment + Service + ConfigMap for the frontend image produced by smartem-frontend#94 (v0.2.0, now on GHCR), plus Ingress for staging/production. The image from Phase A is environment-agnostic — it ships a placeholder config.json with dev defaults and reverse-proxies /api/ to the backend service through its own nginx — so a single tag deploys to every environment with only ConfigMap and env-var differences.

Also bundles a small cleanup pass on dead Keycloak config that the backend stopped reading in smartem-decisions#285 (KEYCLOAK_AUTH_REQUIRED) and on this branch's earlier rename (KEYCLOAK_CLIENT_ID -> KEYCLOAK_ALLOWED_AZP allow-list), removes a stale root-level k8s/ingress.yaml orphan from the original December scaffolding, and refreshes the local-dev Keycloak docs to match the SPA's current runtime-config.json mechanism.

What lands per environment

Each k8s/environments/<env>/smartem-frontend.yaml carries three documents:

  • ConfigMap smartem-frontend-config — the runtime config.json (Keycloak URL/realm/clientId + authEnabled). The Deployment subPath-mounts this onto /usr/share/nginx/html/config.json, overriding the placeholder shipped in the image.
  • Deployment smartem-frontend — pulls ghcr.io/diamondlightsource/smartem-frontend:latest, sets BACKEND_HOST=smartem-http-api-service so the SPA pod's nginx proxies /api/ internally.
  • Service smartem-frontend-service — NodePort 30100 in development (next free slot in the 30000s range used by smartem-http-api/Keycloak/RabbitMQ/Postgres/Adminer), ClusterIP in staging and production.
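
To make the subPath mount concrete, here is a minimal Python sketch that builds the ConfigMap and Deployment as plain dictionaries (kubectl accepts JSON manifests as well as YAML). The resource names, image, BACKEND_HOST variable, and mount path come from the list above; the `app` label, the container name, and the `runtime-config` volume name are assumptions made for illustration and may not match the real smartem-frontend.yaml.

```python
import json


def frontend_manifests(backend_host: str, config: dict) -> list:
    """Build ConfigMap + Deployment dicts for the frontend (illustrative only)."""
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "smartem-frontend-config"},
        # The whole runtime config travels as a single key named config.json.
        "data": {"config.json": json.dumps(config, indent=2)},
    }
    container = {
        "name": "smartem-frontend",  # assumed container name
        "image": "ghcr.io/diamondlightsource/smartem-frontend:latest",
        "env": [{"name": "BACKEND_HOST", "value": backend_host}],
        "volumeMounts": [
            {
                "name": "runtime-config",  # assumed volume name
                # subPath mounts one key as one file, so the rest of the
                # html directory shipped in the image stays untouched.
                "mountPath": "/usr/share/nginx/html/config.json",
                "subPath": "config.json",
            }
        ],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "smartem-frontend"},
        "spec": {
            "selector": {"matchLabels": {"app": "smartem-frontend"}},  # assumed label
            "template": {
                "metadata": {"labels": {"app": "smartem-frontend"}},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {
                            "name": "runtime-config",
                            "configMap": {"name": "smartem-frontend-config"},
                        }
                    ],
                },
            },
        },
    }
    return [config_map, deployment]
```

Because only the single file is mounted, the placeholder config.json is the only thing replaced; everything else under /usr/share/nginx/html still comes from the image.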

Per-environment config.json values

| Env | keycloak.url | authEnabled |
| --- | --- | --- |
| development | http://localhost:30090 (Keycloak mock NodePort, browser-reachable) | true |
| staging | https://identity-test.diamond.ac.uk | true |
| production | https://identity.diamond.ac.uk | true |

All three set realm: dls, clientId: SmartEM_User (post-rename in smartem-devtools#198), and authEnabled: true. The backend enforces Keycloak Bearer-token validation unconditionally on every non-exempt request since smartem-decisions#285 — there is no opt-out, so the SPA must always complete the login ceremony to talk to /api/.
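
As a rough illustration of how little differs between environments, the following Python sketch generates the per-environment config.json payload. The URLs, realm, client ID, and authEnabled value are taken from the table and paragraph above; the exact JSON nesting (a `keycloak` object next to a top-level `authEnabled`) is an assumption inferred from the `keycloak.url` column header, not confirmed against the SPA's schema.

```python
import json

# Keycloak base URLs per environment, as listed in the PR description.
KEYCLOAK_URLS = {
    "development": "http://localhost:30090",
    "staging": "https://identity-test.diamond.ac.uk",
    "production": "https://identity.diamond.ac.uk",
}


def runtime_config(env: str) -> dict:
    """Return the runtime config for one environment (assumed key layout)."""
    return {
        "keycloak": {
            "url": KEYCLOAK_URLS[env],
            "realm": "dls",
            "clientId": "SmartEM_User",
        },
        # Always true: the backend validates Bearer tokens unconditionally.
        "authEnabled": True,
    }


if __name__ == "__main__":
    for env in KEYCLOAK_URLS:
        print(env, json.dumps(runtime_config(env)))
```

Only `keycloak.url` varies, which is why a single image tag can serve all three environments.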

Ingress

Staging and production only — development keeps the NodePort pattern, consistent with everything else in k8s/environments/development/. Each ingress.yaml routes a single host to smartem-frontend-service on port 80. The SPA's nginx handles /api/ proxying internally, so one route covers both the SPA and API traffic.

| Env | Hostname |
| --- | --- |
| staging | smartem-staging.diamond.ac.uk |
| production | smartem.diamond.ac.uk |
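
The reason one ingress route is enough can be modelled in a few lines of Python. This is a simplified sketch of the routing decision made by the SPA pod's nginx, not its actual configuration: requests under /api/ have the prefix stripped and go to the backend, everything else is served by the SPA pod itself. The function name and the "backend"/"spa" labels are made up for illustration.

```python
API_PREFIX = "/api/"


def route(path: str) -> tuple:
    """Return (target, upstream_path) for a request path hitting the SPA pod."""
    if path.startswith(API_PREFIX):
        # Strip the /api/ prefix so /api/health reaches the backend as /health.
        return ("backend", "/" + path[len(API_PREFIX):])
    # Static assets, config.json, and SPA routes stay in the frontend pod.
    return ("spa", path)
```

For example, `route("/api/acquisitions")` yields `("backend", "/acquisitions")`, while `route("/config.json")` stays with the SPA.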

dev-k8s.sh

The access-URLs section gains:

  • Keycloak (mock): http://localhost:30090 — was missed when smartem-devtools#198 landed
  • SmartEM Frontend: http://localhost:30100

Dead-config cleanup (KEYCLOAK_AUTH_REQUIRED, KEYCLOAK_CLIENT_ID)

The backend stopped reading KEYCLOAK_AUTH_REQUIRED in smartem-decisions#285 (auth is unconditional, no opt-out). This branch had already removed the key from k8s/environments/{development,staging}/configmap.yaml, but scripts/k8s/dev-k8s.sh was still reading it from .env, defaulting it to "false", and re-injecting it into smartem-config on every dev deploy — silently undoing the YAML cleanup. Same story for KEYCLOAK_CLIENT_ID, which the staging ConfigMap replaced with KEYCLOAK_ALLOWED_AZP="SmartEM_User,SmartEM_Agent".
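
For readers unfamiliar with the azp allow-list, the semantics described in this PR can be sketched as follows. This is a hypothetical model, not the code in smartem_backend/auth.py: the function names are invented, and treating an empty string the same as an unset variable is an assumption.

```python
from typing import Optional


def parse_allowed_azp(raw: Optional[str]) -> Optional[frozenset]:
    """Parse a comma-separated KEYCLOAK_ALLOWED_AZP value; None means unrestricted."""
    if raw is None or not raw.strip():
        return None
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def azp_allowed(claims: dict, allowed: Optional[frozenset]) -> bool:
    """Check the token's azp claim against the allow-list."""
    if allowed is None:
        # Unset allow-list: any valid realm token is accepted (dev behaviour).
        return True
    return claims.get("azp") in allowed
```

With the staging value `"SmartEM_User,SmartEM_Agent"`, a token issued to either client passes and any other client's token is rejected.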

Cleanup:

  • env-examples/.env.example.k8s.{development,staging}: drop both dead keys. Staging gains KEYCLOAK_ALLOWED_AZP=SmartEM_User,SmartEM_Agent to mirror the YAML; development leaves it commented out (any valid realm token accepted in local dev).
  • scripts/k8s/dev-k8s.sh: drop both vars from the override check, defaults, log line, and kubectl create configmap args. Append KEYCLOAK_ALLOWED_AZP only when explicitly set, preserving the "unset = any realm token" semantics.
  • Staging + production smartem-frontend.yaml comments now point readers at KEYCLOAK_ALLOWED_AZP instead of the deleted KEYCLOAK_CLIENT_ID.
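
The "append only when explicitly set" behaviour in dev-k8s.sh can be illustrated with a short Python sketch that assembles the `kubectl create configmap` argument list. The real script is Bash and its other literals are not shown in this PR, so `literals` stands in for them; treating an empty value as unset is an assumption.

```python
from typing import Mapping, Optional


def configmap_args(literals: Mapping[str, str],
                   allowed_azp: Optional[str] = None) -> list:
    """Build kubectl arguments for the smartem-config ConfigMap (illustrative)."""
    args = ["kubectl", "create", "configmap", "smartem-config"]
    args += [f"--from-literal={key}={value}" for key, value in literals.items()]
    if allowed_azp:
        # Only emitted when the caller set it, so an unset variable keeps
        # the "any realm token" behaviour instead of injecting an empty list.
        args.append(f"--from-literal=KEYCLOAK_ALLOWED_AZP={allowed_azp}")
    return args
```

Neither KEYCLOAK_AUTH_REQUIRED nor KEYCLOAK_CLIENT_ID appears anywhere in the assembled arguments, which is the point of the cleanup.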

Orphan ingress removal

k8s/ingress.yaml at the repo root was added in the initial k8s scaffolding (a5e88da, 2025-12-08) as an early sketch for exposing the backend HTTP API directly via ingress. It pre-dated the current SPA-pod-nginx-proxies-/api/ architecture, was not referenced by any kustomization, lived in the dev namespace (which uses NodePort, not ingress), and carried a placeholder host. Deleted — recoverable from git history if needed. The remaining backend-facing ingress use case (Windows agent connectivity) is tracked in #206.

Frontend-dev Keycloak doc refresh

The SPA stopped reading VITE_KEYCLOAK_* / VITE_AUTH_ENABLED build-time vars; apps/smartem/src/main.tsx fetches /config.json at boot and apps/smartem/src/auth/config.ts is the only consumer (no fallback to import.meta.env). The smartem-frontend repo updated its own apps/smartem/.env.example to reflect this, but the local-dev docs in this repo still instructed developers to edit a .env.local file with VITE_* keys that no code reads.

The same docs also lagged behind other realm changes from this branch's commits `feat: add SmartEM_Agent client, rename SmartEM to SmartEM_User` and `fix(keycloak-mock): allow NodePort SPA redirect URI`:

  • Client name SmartEM predated the rename to SmartEM_User + SmartEM_Agent.
  • Redirect URIs listed only 5173 / 5174; the realm now also allows http://localhost:30100/*.
  • Seeded users list still mentioned valuser/valpass (not in the realm anymore — only devuser ships).
  • The "Disabling auth entirely" section described VITE_AUTH_ENABLED=false as a clean opt-out, but smartem-decisions#285 made backend auth unconditional, so setting authEnabled:false only bypasses the SPA's login screen; every /api/ call still returns 401. Reframed with the caveat: it is only useful with MSW (VITE_ENABLE_MOCKS=true) or for views that don't fetch from the backend.
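
To show why the missing http://localhost:30100/* entry broke the login flow, here is a deliberately simplified Python model of redirect URI matching with a trailing wildcard. It is an approximation for illustration, not Keycloak's implementation, and the function name is made up.

```python
def redirect_allowed(redirect_uri: str, patterns: list) -> bool:
    """Return True if redirect_uri matches any allowed pattern."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # A trailing * is treated as a prefix match on everything before it.
            if redirect_uri.startswith(pattern[:-1]):
                return True
        elif redirect_uri == pattern:
            return True
    return False
```

With only Vite dev-port patterns registered, a redirect back to http://localhost:30100/ matches nothing; adding `http://localhost:30100/*` lets the NodePort-served SPA complete the flow.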

Updated docs/development/local-keycloak.md and keycloak-mock/README.md to point at apps/smartem/public/config.json as the dev-time source of truth.

Local verification

Static rendering:

  • kubectl kustomize k8s/environments/development — 926 lines, clean
  • kubectl kustomize k8s/environments/staging — 579 lines, clean
  • kubectl kustomize k8s/environments/production — 577 lines, clean
  • grep -c KEYCLOAK_AUTH_REQUIRED in rendered output: 0 across all three envs
  • grep -c KEYCLOAK_CLIENT_ID in rendered output: 0 across all three envs
  • Rendered staging ingress resolves to smartem-staging.diamond.ac.uk; production unchanged at smartem.diamond.ac.uk
  • bash -n scripts/k8s/dev-k8s.sh: syntax OK
  • grep for VITE_KEYCLOAK_ / VITE_AUTH_ENABLED / valuser in the repo: 0 hits (was 9 before this branch)
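
The dead-key checks above could be scripted along these lines. This is a sketch, not part of the PR: `render` shells out to `kubectl kustomize` and therefore needs kubectl on the PATH, while `dead_key_hits` mirrors `grep -c` by counting matching lines in the rendered output.

```python
import subprocess

DEAD_KEYS = ("KEYCLOAK_AUTH_REQUIRED", "KEYCLOAK_CLIENT_ID")


def render(env: str) -> str:
    """Render one environment's manifests with kubectl kustomize."""
    result = subprocess.run(
        ["kubectl", "kustomize", f"k8s/environments/{env}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def dead_key_hits(rendered: str) -> dict:
    """Count lines mentioning each dead key, like grep -c."""
    lines = rendered.splitlines()
    return {key: sum(1 for line in lines if key in line) for key in DEAD_KEYS}
```

A clean environment is one where every count returned by `dead_key_hits(render(env))` is zero.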

End-to-end auth loop, driven against the live local k3s cluster (backend 0.1.1rc48.dev0+g5bc8e22b3.d20260521, post-#285):

  • curl http://localhost:30100/version — 200, returns the SPA version JSON
  • curl http://localhost:30100/config.json — 200, returns dev ConfigMap content with authEnabled: true, clientId: SmartEM_User, keycloak.url: http://localhost:30090
  • curl http://localhost:30100/api/health — 200 (the /health path is in EXEMPT_PATHS in smartem_backend/auth.py, so it bypasses Bearer validation; the SPA pod's nginx strips /api/ and proxies to the backend's /health)
  • curl http://localhost:30100/api/acquisitions (no token) — 401 with www-authenticate: Bearer and {"detail":"Missing or malformed Authorization header"} — confirms unconditional auth on a real route
  • Browser opens http://localhost:30100, SPA renders the auth-gate sign-in screen (smartem-frontend#05b2c9d); SIGN IN redirects to http://localhost:30090/realms/dls/protocol/openid-connect/auth?client_id=SmartEM_User&redirect_uri=… (code + PKCE flow); login as devuser/devpass returns to http://localhost:30100/, header shows "Dev User", and /acquisitions route fires an authenticated GET /api/acquisitions that returns 200
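
The 200-versus-401 behaviour observed above can be modelled with a small Python sketch of the gate that runs before token validation. It is a hypothetical reconstruction, not the code in smartem_backend/auth.py: only /health is known from this PR to be in EXEMPT_PATHS, and the function name and return shape are invented for illustration.

```python
from typing import Optional, Tuple

# Only /health is confirmed by this PR; the real set may contain more paths.
EXEMPT_PATHS = frozenset({"/health"})


def precheck(path: str, authorization: Optional[str]) -> Tuple[str, Optional[dict]]:
    """Decide whether to skip auth, reject early, or go on to validate the token."""
    if path in EXEMPT_PATHS:
        return ("skip", None)
    scheme, _, token = (authorization or "").partition(" ")
    if scheme != "Bearer" or not token.strip():
        return ("reject", {
            "status": 401,
            "headers": {"www-authenticate": "Bearer"},
            "body": {"detail": "Missing or malformed Authorization header"},
        })
    # Signature, issuer, and azp checks would follow; they are not modelled here.
    return ("validate", {"token": token.strip()})
```

Note that `path` is the path the backend sees, i.e. after the SPA pod's nginx has stripped the /api/ prefix.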

Out of scope / follow-ups

  • Agent backend-ingress connectivity: the Windows agent (running on EPU workstations, outside the cluster) needs its own connectivity story for staging/production. Today it uses the smartem-http-api NodePort in dev. Tracked in #206 ("Plan agent connectivity to the backend API from outside the cluster"), to land as a per-env file under k8s/environments/<env>/ once the route shape is decided.

Test plan

  • CI passes on this PR
  • curl http://localhost:30100/version returns {"frontend": "0.2.0", ...}
  • curl http://localhost:30100/config.json returns the dev ConfigMap content (authEnabled: true)
  • curl http://localhost:30100/api/health returns 200 (the /health path is in EXEMPT_PATHS and reaches the backend through the SPA pod's nginx)
  • curl http://localhost:30100/api/acquisitions without a token returns 401 (auth is unconditional on non-exempt routes)
  • Browser opens http://localhost:30100, SPA redirects to the Keycloak mock login, sign-in as devuser/devpass returns to the SPA, and /api/acquisitions succeeds with the bearer token
  • Post-merge: ./scripts/k8s/dev-k8s.sh down && ./scripts/k8s/dev-k8s.sh rolls the smartem-config ConfigMap so the stale KEYCLOAK_AUTH_REQUIRED/KEYCLOAK_CLIENT_ID keys (left over from an older dev-k8s.sh run) disappear from the deployed state — functionally a no-op since the backend ignores them, but worth doing for hygiene

Adds Kubernetes manifests for the smartem-frontend image
produced by smartem-frontend#94 (v0.2.0, now on GHCR). The
image is environment-agnostic: it ships a placeholder
config.json with dev defaults and proxies /api/ to the
backend service via its own nginx, deferring DNS to request
time.

What lands per environment (k8s/environments/<env>/
smartem-frontend.yaml):

- ConfigMap smartem-frontend-config carries the runtime
  config.json (Keycloak URL/realm/clientId + authEnabled).
  The Deployment subPath-mounts this onto
  /usr/share/nginx/html/config.json, overriding the
  placeholder shipped in the image.
- Deployment smartem-frontend pulls
  ghcr.io/diamondlightsource/smartem-frontend:latest, sets
  BACKEND_HOST=smartem-http-api-service so the SPA pod's
  nginx proxies /api/ to the backend service.
- Service smartem-frontend-service: NodePort 30100 for
  development (next free in the 30000s range; matches the
  existing smartem-http-api / Keycloak / RabbitMQ / Postgres
  / Adminer pattern), ClusterIP for staging and production.

Per-environment config.json values:

- development: keycloak.url http://localhost:30090 (the
  Keycloak mock NodePort - browser-reachable), authEnabled
  false to match KEYCLOAK_AUTH_REQUIRED=false on the dev
  backend. Flip to true to exercise the full login chain.
- staging: identity-test.diamond.ac.uk, authEnabled true.
- production: identity.diamond.ac.uk, authEnabled true.

Ingress (staging and production only - dev keeps the
NodePort pattern):

- k8s/environments/{staging,production}/ingress.yaml route
  a single host to smartem-frontend-service. The SPA pod's
  nginx handles /api/ proxying to the backend internally,
  so one route covers everything. Hostnames are placeholders
  (smartem-staging.example.com / smartem.example.com) and
  flagged TODO until real values are decided.

scripts/k8s/dev-k8s.sh: print the new
  http://localhost:30100 (frontend) and http://localhost:30090
  (Keycloak, missed when #198 landed) in the access-URLs
  section.

Verified locally: kubectl kustomize build is clean for all
three environments. End-to-end browser flow (SPA login,
authenticated /api call) will be exercised on the user's
local k3s after merge.

github-actions bot added the devops label (CI/CD, deployment, infrastructure, or tooling work) May 21, 2026

smartem-decisions#285 removed the KEYCLOAK_AUTH_REQUIRED flag from the
backend; Bearer-token validation now runs unconditionally. The dev
frontend ConfigMap needs authEnabled: true to match, otherwise the SPA
skips the login ceremony and every /api/ call comes back 401.

Fold the configmap and ingress cleanups that PR #205's frontend work
exposed:

- KEYCLOAK_AUTH_REQUIRED was set in dev (false) and staging (true)
  configmaps but the backend stopped reading it after
  smartem-decisions#285 (commit 2ec937d, "remove KEYCLOAK_AUTH_REQUIRED
  flag, enforce azp allow-list"). Auth is always enforced; the entry
  is dead config and misleading. Removed from both.

- KEYCLOAK_CLIENT_ID="SmartEM" in both configmaps is also dead config
  for the backend. The backend's auth.py reads KEYCLOAK_ALLOWED_AZP
  (comma-separated list), not KEYCLOAK_CLIENT_ID. The SmartEM Agent
  reads KEYCLOAK_CLIENT_ID from its own local config file, not the
  cluster configmap. Removed from both backend configmaps.

- Staging gains KEYCLOAK_ALLOWED_AZP="SmartEM_User,SmartEM_Agent" so
  the azp allow-list is actually populated (was the intent of the old
  KEYCLOAK_CLIENT_ID line; now expressed in the var the backend
  reads). Dev stays permissive — comment documents the env var if
  someone wants to restrict.

- production/ingress.yaml host changes from the smartem.example.com
  placeholder to the real smartem.diamond.ac.uk. Staging's host
  remains a placeholder pending the real value.

nginx's explicit `resolver` directive in the SPA image doesn't consult
/etc/resolv.conf's search list, so the short name `smartem-http-api-service`
returns NXDOMAIN and the SPA's /api/ proxy 502s. Switch to the in-cluster
FQDN per environment namespace.
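
A minimal Python sketch of the name the proxy needs instead of the short service name. It assumes the default `cluster.local` cluster domain; the namespace is a parameter because the per-environment namespace names are not stated here.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    # Fully qualified, so it resolves without relying on resolv.conf search domains.
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

For a hypothetical namespace `example-ns`, `service_fqdn("smartem-http-api-service", "example-ns")` gives `smartem-http-api-service.example-ns.svc.cluster.local`.
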
The SmartEM_User client only listed Vite dev ports (5173/5174) in
redirectUris/webOrigins. The k8s dev deploy serves the SPA at NodePort
30100, so Keycloak rejected the auth flow. Add 30100 alongside.

The same realm file is mounted by both the kustomize ConfigMap (k3s
local dev) and keycloak-mock/docker-compose.yml (frontend-only dev) -
single source of truth, no mirroring needed.

…re KEYCLOAK_ALLOWED_AZP

smartem-decisions#285 made backend auth unconditional and removed the
KEYCLOAK_AUTH_REQUIRED env var entirely. KEYCLOAK_CLIENT_ID was likewise
superseded by KEYCLOAK_ALLOWED_AZP (the azp allow-list) when the
backend ConfigMap was cleaned up earlier in this branch.

The YAMLs already dropped both keys, but dev-k8s.sh kept reading them
from .env and re-injecting them into smartem-config via
`kubectl create configmap --from-literal=...`, undoing the YAML cleanup
on every dev deploy. The env-examples advertised the same dead knobs.

Now:

- env-examples/.env.example.k8s.{development,staging}: drop
  KEYCLOAK_AUTH_REQUIRED and KEYCLOAK_CLIENT_ID. Staging gains
  KEYCLOAK_ALLOWED_AZP=SmartEM_User,SmartEM_Agent to mirror the YAML;
  development leaves it commented (any valid realm token accepted).
- scripts/k8s/dev-k8s.sh: drop both vars from the override check,
  defaults, log line, and `kubectl create configmap` args. Append
  KEYCLOAK_ALLOWED_AZP only when explicitly set (preserves the
  "unset = any realm token" semantics).
- k8s/environments/{staging,production}/smartem-frontend.yaml: comment
  now points readers at KEYCLOAK_ALLOWED_AZP instead of the deleted
  KEYCLOAK_CLIENT_ID.

Local verification:

- kubectl kustomize k8s/environments/development - 926 lines, clean
- kubectl kustomize k8s/environments/staging - 579 lines, clean
- kubectl kustomize k8s/environments/production - 577 lines, clean
- grep KEYCLOAK_AUTH_REQUIRED in rendered output: 0 hits all three
- grep KEYCLOAK_CLIENT_ID in rendered output: 0 hits all three
- bash -n scripts/k8s/dev-k8s.sh: OK

- k8s/environments/staging/ingress.yaml: smartem-staging.example.com ->
  smartem-staging.diamond.ac.uk. Production was already
  smartem.diamond.ac.uk (since becc1cd), so all hostnames are now real.

- k8s/ingress.yaml: deleted. Added in the initial k8s scaffolding
  (a5e88da, 2025-12-08) as an early sketch for exposing the backend
  HTTP API directly via ingress. Pre-dates the current architecture
  where the SPA pod's nginx reverse-proxies /api/ to the backend
  internally, so a single frontend ingress is sufficient for browser
  traffic. Was not referenced by any kustomization, lived in the dev
  namespace (which uses NodePort, not ingress), and carried a
  placeholder host. The remaining backend-facing ingress use case is
  the Windows agent's connectivity story, which will follow the
  existing per-env pattern under k8s/environments/<env>/ rather than
  this root-level file. Recoverable from git history if needed.

Local verification:
- kubectl kustomize k8s/environments/{development,staging,production}
  renders clean (926/579/577 lines)
- staging ingress now resolves to smartem-staging.diamond.ac.uk;
  production unchanged at smartem.diamond.ac.uk

…son + client rename

The smartem-frontend SPA stopped reading VITE_KEYCLOAK_* / VITE_AUTH_ENABLED
build-time vars some time ago - main.tsx now fetches /config.json at boot
and apps/smartem/src/auth/config.ts is the only consumer, with no fallback
to import.meta.env. The smartem-frontend repo updated its own
apps/smartem/.env.example to reflect this, but the docs in this repo
(keycloak-mock/README.md, docs/development/local-keycloak.md) still
instructed developers to edit a .env.local file with VITE_* keys that no
code reads.

Other staleness in the same docs:

- Client name "SmartEM" predates the rename in smartem-devtools#198 to
  SmartEM_User (browser) + SmartEM_Agent (Windows agent). Both docs only
  mentioned the old single client.
- Redirect URIs listed 5173 + 5174 only - the realm now also allows
  http://localhost:30100/* for the SPA pod's k3s NodePort (added in this
  branch's earlier `fix(keycloak-mock): allow NodePort SPA redirect URI`).
- Seeded users list mentioned valuser/valpass; the realm only ships
  devuser now.
- The "Disabling auth entirely" section described VITE_AUTH_ENABLED=false
  as a clean opt-out, but smartem-decisions#285 made backend auth
  unconditional, so setting authEnabled:false in config.json just bypasses
  the SPA's login screen - every /api/ call still 401s. Only useful when
  paired with MSW (VITE_ENABLE_MOCKS=true) or for views that don't fetch
  from the backend. Reframed to call out the caveat explicitly.

Updated both docs to reflect the runtime-config.json mechanism (edit
apps/smartem/public/config.json for `npm run dev:smartem`; the k8s
ConfigMap mount overrides it in deploys), the two-client realm layout,
the full redirect URI set, the actual seeded user, and the auth-disable
caveat. No code or manifest changes.

github-actions bot added the documentation label (Improvements or additions to project documentation) May 22, 2026
vredchenko merged commit 9e27873 into main May 22, 2026
7 checks passed
vredchenko added a commit that referenced this pull request May 22, 2026
vredchenko deleted the feat/k8s-smartem-frontend branch May 22, 2026 13:43