Conversation
* Add Matomo Tag Manager as third analytics tracking mode Adds Matomo Tag Manager support alongside existing Google Analytics and Piwik Pro integrations. Includes settings.json configuration (url + tag), build-time script injection via hook-analytics.py, Klaro GDPR consent banner integration, and runtime consent granting via MTM data layer API. https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h * Fix Matomo Tag Manager snippet to match official docs - Accept full container JS URL instead of separate url + tag fields, supporting both self-hosted and Matomo Cloud URL patterns - Match the official snippet: var _mtm alias, _mtm.push shorthand - Remove redundant type="text/javascript" attribute - Remove unused "tag" field from settings.json https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h * Split Matomo config into base url + tag fields Separate the Matomo setting into `url` (base URL, e.g. https://cdn.matomo.cloud/openms.matomo.cloud) and `tag` (container ID, e.g. yDGK8bfY), consistent with how other providers use a tag field. The script constructs the full path: {url}/container_{tag}.js https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h * install matomo tag --------- Co-authored-by: Claude <noreply@anthropic.com>
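A minimal sketch of how the build-time hook could assemble the container script from the split settings fields described above. The `{url}/container_{tag}.js` path and the `_mtm` snippet shape come from the commit messages; the function name and the `analytics`/`matomo` settings layout are assumptions, not the actual hook-analytics.py code.

```python
# Illustrative sketch only — the real logic lives in hook-analytics.py (names assumed).
def build_mtm_snippet(settings: dict) -> str:
    matomo = settings["analytics"]["matomo"]  # assumed settings.json layout with "url" + "tag"
    container_url = f"{matomo['url'].rstrip('/')}/container_{matomo['tag']}.js"
    # Mirrors the official Matomo Tag Manager snippet (_mtm alias, push shorthand),
    # without the redundant type="text/javascript" attribute.
    return (
        "<script>"
        "var _mtm = window._mtm = window._mtm || [];"
        "_mtm.push({'mtm.startTime': (new Date().getTime()), 'event': 'mtm.Start'});"
        "var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];"
        f"g.async=true; g.src='{container_url}'; s.parentNode.insertBefore(g,s);"
        "</script>"
    )
```

The same constructed URL works for both self-hosted and Matomo Cloud containers, since only the base URL differs between the two.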
* Initial plan * fix: remove duplicate address entry in config.toml Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>
…til.SameFileError (#349) * Initial plan * Fix integration test failures: restore sys.modules mocks, handle SameFileError, update CI workflow Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com> * Remove unnecessary pyopenms mock from test_topp_workflow_parameter.py, simplify test_parameter_presets.py Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com> * Fix Windows build: correct site-packages path in cleanup step Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>
…(#351) On Windows, 0.0.0.0 is not a valid connect address — the browser fails to open http://0.0.0.0:8501. By removing the address entry from the bundled .streamlit/config.toml, Streamlit defaults to localhost, which works correctly for local deployments. Docker deployments are unaffected as they pass --server.address 0.0.0.0 on the command line. https://claude.ai/code/session_016amsLCZeFogTksmtk1geb5 Co-authored-by: Claude <noreply@anthropic.com>
* Add CLAUDE.md and Claude Code skills for webapp development Adds project documentation (CLAUDE.md) and 6 skills to help developers scaffold and extend OpenMS web applications built from this template: - /create-page: add a new Streamlit page with proper registration - /create-workflow: scaffold a full TOPP workflow (class + 4 pages) - /add-python-tool: add a custom Python analysis script with auto-UI - /add-presets: add parameter presets for workflows - /configure-deployment: set up Docker and CI/CD for a new app - /add-visualization: add pyopenms-viz or OpenMS-Insight visualizations https://claude.ai/code/session_01WYotmLfqRtB8WJXj1Eosiz * Strengthen MS domain context in CLAUDE.md and skills Make it clear to Claude that this is THE framework for building mass spectrometry web applications for proteomics and metabolomics research. Add domain-specific context about MS data types, TOPP tool pipelines, and scientific visualization needs. https://claude.ai/code/session_01WYotmLfqRtB8WJXj1Eosiz --------- Co-authored-by: Claude <noreply@anthropic.com>
* Add Kubernetes manifests and CI workflows for de.NBI migration Decompose the monolithic Docker container into Kubernetes workloads: - Streamlit Deployment with health probes and session affinity - Redis Deployment + Service for job queue - RQ Worker Deployment for background workflows - CronJob for workspace cleanup - Ingress with WebSocket support and cookie-based sticky sessions - Shared PVC (ReadWriteMany) for workspace data - ConfigMap for runtime configuration (replaces build-time settings) - Kustomize base + template-app overlay for multi-app deployment Code changes: - Remove unsafe enableCORS=false and enableXsrfProtection=false from config.toml - Make workspace path configurable via WORKSPACES_DIR env var in clean-up-workspaces.py CI/CD: - Add build-and-push-image.yml to push Docker images to ghcr.io - Add k8s-manifests-ci.yml for manifest validation and kind integration tests https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix kubeconform validation to skip kustomization.yaml kustomization.yaml is a Kustomize config file, not a standard K8s resource, so kubeconform has no schema for it. Exclude it via -ignore-filename-pattern. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Add matrix strategy to test both Dockerfiles in integration tests The integration-test job now uses a matrix with Dockerfile_simple and Dockerfile. Each matrix entry checks if its Dockerfile exists before running — all steps are guarded with an `if` condition so they skip gracefully when a Dockerfile is absent. This allows downstream forks that only have one Dockerfile to pass CI without errors. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Adapt K8s base manifests for de.NBI Cinder CSI storage - Switch workspace PVC from ReadWriteMany to ReadWriteOnce with cinder-csi storage class (required by de.NBI KKP cluster) - Increase PVC storage to 500Gi - Add namespace: openms to kustomization.yaml - Reduce pod resource requests (1Gi/500m) and limits (8Gi/4 CPU) so all workspace-mounting pods fit on a single node https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Add pod affinity rules to co-locate all workspace pods on same node The workspaces PVC uses ReadWriteOnce (Cinder CSI block storage) which requires all pods mounting it to run on the same node. Without explicit affinity rules, the scheduler was failing silently, leaving pods in Pending state with no events. Adds a `volume-group: workspaces` label and podAffinity with requiredDuringSchedulingIgnoredDuringExecution to streamlit deployment, rq-worker deployment, and cleanup cronjob. This ensures the scheduler explicitly co-locates all workspace-consuming pods on the same node. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix CI: wait for ingress-nginx admission webhook before deploying The controller pod being Ready doesn't guarantee the admission webhook service is accepting connections. Add a polling loop that waits for the webhook endpoint to have an IP assigned before applying the Ingress resource, preventing "connection refused" errors during kustomize apply. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix CI: add -n openms namespace to integration test steps The kustomize overlay deploys into the openms namespace, but the verification steps (Redis wait, Redis ping, deployment checks) were querying the default namespace, causing "no matching resources found". 
https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix CI: retry kustomize deploy for webhook readiness Replace the unreliable endpoint-IP polling with a retry loop on kubectl apply (up to 5 attempts with backoff). This handles the race where the ingress-nginx admission webhook has an endpoint IP but isn't yet accepting TCP connections. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix REDIS_URL to use prefixed service name in overlay Kustomize namePrefix renames the Redis service to template-app-redis, but the REDIS_URL env var in streamlit and rq-worker deployments still referenced the unprefixed name "redis", causing the rq-worker to CrashLoopBackOff with "Name or service not known". Add JSON patches in the overlay to set the correct prefixed hostname. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Add Traefik IngressRoute for direct LB IP access The cluster uses Traefik, not nginx, so the nginx Ingress annotations are ignored. Add a Traefik IngressRoute with PathPrefix(/) catch-all routing and sticky session cookie for Streamlit session affinity. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix CI: skip Traefik IngressRoute CRD in validation and integration tests kubeconform doesn't know the Traefik IngressRoute CRD schema, and the kind cluster in integration tests doesn't have Traefik installed. Skip the IngressRoute in kubeconform validation and filter it out with yq before applying to the kind cluster. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Fix IngressRoute service name for kustomize namePrefix Kustomize namePrefix doesn't rewrite service references inside CRDs, so the IngressRoute was pointing to 'streamlit' instead of 'template-app-streamlit', causing Traefik to return 404. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * fix: use ConfigMap as settings override instead of full replacement The ConfigMap was replacing the entire settings.json, losing keys like "version" and "repository-name" that the app expects (causing KeyError). Now the ConfigMap only contains deployment-specific overrides, which are merged into the Docker image's base settings.json at container startup using jq. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * fix: add set -euo pipefail to fail fast on settings merge error Addresses CodeRabbit review: if jq merge fails, the container should not start with unmerged settings. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * fix: change imagePullPolicy to Always for mutable main tag With IfNotPresent, rollout restarts reuse the cached image even when a new version has been pushed with the same tag. Always ensures Kubernetes pulls the latest image on every pod start. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * fix: build full Dockerfile instead of Dockerfile_simple Switch CI to build the full Docker image with OpenMS and TOPP tools, not the lightweight pyOpenMS-only image. https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ * Scope IngressRoute to hostname and drop unused nginx Ingress Traefik is the only ingress controller on the cluster; the nginx Ingress in k8s/base/ingress.yaml was orphaned (no nginx class available) and the overlay was patching it instead of the active Traefik IngressRoute. 
- Add Host() match to the base IngressRoute (placeholder filled by overlays) - template-app overlay patches the IngressRoute with template.webapps.openms.de - Remove ingress.yaml from the base kustomization resources list (file kept in the repo for nginx-based consumers) https://claude.ai/code/session_01YNDYJTx1eSKaL9vQe1GQzV * fix: use PVC mount for workspaces in online mode In online mode, src/common/common.py hard-coded workspaces_dir to the literal ".." which, from WORKDIR /app, resolved to /. Workspace UUID directories were therefore created on each pod's ephemeral local filesystem instead of the shared PVC mounted at /workspaces-streamlit-template, so the Streamlit pod and the RQ worker each saw their own disconnected copy. The worker's params.json load in tasks.py then hit an empty dict, producing `KeyError: 'mzML-files'` as soon as Workflow.execution() ran. - common.py: in the online branch, use WORKSPACES_DIR env var (default /workspaces-streamlit-template) so Streamlit, the RQ worker, and the cleanup cronjob (which already reads WORKSPACES_DIR) all agree on one location. - k8s streamlit & rq-worker deployments: set WORKSPACES_DIR explicitly so the env is overridable and visible at deploy time. - WorkflowManager.start_workflow: call save_parameters() before dispatch so the latest session state is flushed to disk, closing a small race where a fragment rerun could leave params.json stale when the worker picked up the job. https://claude.ai/code/session_01TsxtENPpuCZ1Ap3mX2ZpHr --------- Co-authored-by: Claude <noreply@anthropic.com>
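A sketch of the online-mode workspace resolution described above, using the env-var default quoted in the commit message; the exact structure of src/common/common.py may differ.

```python
import os
from pathlib import Path

# Online mode: Streamlit, the RQ worker, and the cleanup cronjob must all resolve
# the same shared PVC mount, so the location comes from one env var.
workspaces_dir = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))
workspace = workspaces_dir / "some-session-uuid"  # per-session workspace directory (illustrative)
```

The same commit also flushes session state with save_parameters() before the job is enqueued, so the worker always reads a current params.json rather than a stale one.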
* fix(ci): pin OpenMS contrib download to matching release tag
The Windows build step downloaded contrib_build-Windows.tar.gz from
OpenMS/contrib without a --tag, always pulling the latest release.
When the GH Actions cache (7-day eviction) expired, a newer contrib
got pulled that was incompatible with the pinned OpenMS release/3.5.0
source tree, breaking MSVC compilation in DIAPrescoring.cpp.
Pin the download to release/${OPENMS_VERSION} and tie the cache key
to the OpenMS version so contrib stays in lockstep with the source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): pass release tag as positional arg to gh release download
`gh release download` takes the tag as a positional argument, not a
`--tag` flag. The download silently failed to match on Windows with the system error
"The system cannot find the file specified".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: allow contrib version override via OPENMS_CONTRIB_VERSION
Adds OPENMS_CONTRIB_VERSION env var that falls back to OPENMS_VERSION
when empty. Lets us point OPENMS_VERSION at a non-release branch (e.g.
develop) while keeping the Windows contrib download pinned to a known
release tag, so CI doesn't fail on a missing contrib release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
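The fallback itself is a one-liner; here it is rendered in Python for illustration (the real implementation is an env expression in the GitHub Actions workflow, and the default tag shown is only an example).

```python
import os

# OPENMS_CONTRIB_VERSION overrides the contrib tag; empty or unset falls back to OPENMS_VERSION.
openms_version = os.environ.get("OPENMS_VERSION", "release/3.5.0")
contrib_version = os.environ.get("OPENMS_CONTRIB_VERSION") or openms_version
print(f"downloading contrib for {contrib_version}")
```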
* chore: ignore docs/superpowers/ (local design notes)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Remove stale patches from template-app overlay The Deployment/streamlit patch with Ingress-shaped path /spec/rules/0/host never applied and produced a silent no-op. The duplicate IngressRoute service-name patch was redundant with the first IngressRoute patch block. This brings the on-disk overlay in line with the production cluster's running version. * Rename configure-deployment skill to configure-docker-compose-deployment First step of splitting the skill into three focused skills (configure-app-settings, configure-docker-compose-deployment, configure-k8s-deployment). Rename is in its own commit so git log --follow traces the docker-compose content cleanly. * Scope docker-compose skill to docker-compose-only Removes app-level content (settings.json, Dockerfile choice, production app examples) that will live in configure-app-settings. Adds a prerequisite note pointing to configure-app-settings. * Add configure-app-settings skill Covers app-level configuration (settings.json, Dockerfile choice, README, dependencies) shared by every deployment mode. Prerequisite for configure-docker-compose-deployment and configure-k8s-deployment. * Fix settings.json key-field list inconsistency The Key fields prose listed max_threads (not in the JSON sample) and omitted enable_workspaces (which is in the sample). Align the prose with the sample and describe max_threads separately since it is a nested object rather than a flat field. * Add configure-k8s-deployment skill New skill walking through Kustomize overlay creation and kubectl apply for deploying a forked app to Kubernetes. Patch list reflects the three-patch canonical shape (IngressRoute match + service, streamlit Redis URL, rq-worker Redis URL). * Fix inline-code rendering in k8s skill The Host(`...`) escape syntax produced literal backslashes that broke the inline-code span when rendered by markdown parsers. Rewrite as Host(...) without nested backticks so the span renders cleanly. * Add K8s deployment doc — overview and architecture sections * Add K8s deployment doc — manifest reference section * Add K8s deployment doc — fork-and-deploy guide * Add K8s deployment doc — CI/CD pipeline section * Clarify PR-blocking behavior depends on branch protection The workflow does not block merges directly — it produces a check status that a branch-protection rule can gate on. Make the preconditions explicit. * Register Kubernetes Deployment page in Streamlit documentation * Cross-link docs/deployment.md to Kubernetes deployment page Adds a preamble listing both deployment paths and introduces a ## Docker Compose heading above the existing content. The existing docker-compose content is preserved verbatim. * Add smoke test for Kubernetes Deployment documentation page Extends the parametrized test_documentation cases to cover the new Documentation page added by this branch, closing the gap where it was the only selectbox entry without test coverage.
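As a rough illustration of the parametrized smoke test mentioned in the last bullet: the page file path and selectbox labels below are assumptions, not the repository's actual test code.

```python
import pytest
from streamlit.testing.v1 import AppTest  # Streamlit's app-testing harness

@pytest.mark.parametrize(
    "page",
    ["User Guide", "Kubernetes Deployment"],  # hypothetical selectbox entries
)
def test_documentation(page):
    # Load the documentation page, pick an entry from the selectbox, and
    # assert the rerun raises no exception.
    at = AppTest.from_file("content/documentation.py").run()  # assumed page path
    at.selectbox[0].select(page).run()
    assert not at.exception
```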
ci: unified docker workflow (shadow mode)
github.repository preserves the original casing (OpenMS/streamlit-template). Docker OCI references require lowercase, so cache-from/cache-to fail with 'invalid reference format'. docker/metadata-action handles this internally for tags, but the cache refs bypass it. Compute IMAGE_NAME_LC once and use it in both cache refs.
ci: lowercase image name for OCI cache refs
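The normalisation itself is trivial; shown here in Python for illustration — the workflow computes IMAGE_NAME_LC in a shell step, and the cache tag name below is assumed.

```python
# github.repository keeps the original casing; OCI references must be lowercase.
repository = "OpenMS/streamlit-template"           # value of ${{ github.repository }}
image_name_lc = repository.lower()                 # -> "openms/streamlit-template"
cache_ref = f"ghcr.io/{image_name_lc}:buildcache"  # assumed cache tag name
```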
With push: true, docker/build-push-action pushes every tag in its tags input. A bare name like 'openms-streamlit:simple-test' (no registry prefix) gets resolved to Docker Hub and fails with 401 unauthorized, because the workflow's GHCR token has no rights on docker.io. The local tag was only needed for the kind retag step. Since load: true already loads the image into the runner's docker daemon, we can create the stable local alias with a plain 'docker tag' step after build, picking any tag from docker/metadata-action's output.
ci: don't pass unprefixed local tag to buildx push
ci: cut over from old docker workflows to build-and-test
The @V3 floating tag does not exist on snok/container-retention-policy (v2 is the latest floating major tag; v3 only has v3.0.0 and v3.0.1 as exact version tags). The workflow fails to resolve the action with 'unable to find version v3'. Pin to v3.0.1 (latest v3 release).
ci: pin container-retention-policy to v3.0.1
The ENV GH_TOKEN=${GITHUB_TOKEN} at the top baked the per-run token
into an early layer, so every workflow run rebuilt from scratch.
Moved the ARG next to the one RUN that uses it (gh release download)
so earlier layers stay cacheable.
The committed Dockerfile builds OpenMS from release/3.5.0 on python 3.10, but requirements.txt pinned pyopenms==3.3.0 and ci.yml ran on python 3.11, causing test_gui.py to fail with AttributeError on MSExperiment.to_df() (the API was renamed from get_df to to_df in 3.5).
`MSExperiment.to_df()` exists only on the OpenMS develop branch and is not in the published pyopenms 3.5.0 wheel that CI installs from PyPI, causing AttributeError in the raw data viewer. Switch to `get_df()` and `get_df(long=True)` — both return the same column names that the existing rename logic expects (rt/ms_level/mz_array/intensity_array for the wide form, rt/mz/intensity for the long form).
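A short sketch of the viewer-side usage implied above, relying only on the column names stated in the commit message; the input file and the rename mapping are illustrative.

```python
import pyopenms as poms

exp = poms.MSExperiment()
poms.MzMLFile().load("example.mzML", exp)  # hypothetical input file

# Wide form: one row per spectrum (rt, ms_level, mz_array, intensity_array, ...)
df_wide = exp.get_df()

# Long form: one row per peak (rt, mz, intensity)
df_long = exp.get_df(long=True)
df_long = df_long.rename(columns={"rt": "RT", "mz": "m/z", "intensity": "Intensity"})  # illustrative rename
```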
CI pods crashlooped because a `subPath: secrets.toml` file mount cannot resolve when the optional Secret is absent. Mount the Secret as a directory at /app/admin-secrets/ instead, and register that path via [secrets].files in .streamlit/config.toml so st.secrets picks it up without shadowing the baked-in config.toml / credentials.toml.
Mirrors the base example with overlay-specific guidance: `namePrefix` only rewrites Kustomize-managed resources, so imperative Secrets must still use the literal name `streamlit-secrets`.
k8s: mount admin password from streamlit-secrets Secret
Factor node placement and memory sizing out of the base manifests into reusable Kustomize components (memory-tier-low / memory-tier-high), so each fork picks its tier with a single line in its overlay. - base: remove per-pod `resources` from streamlit and rq-worker Deployments; sizing now comes from the tier component - base: promote redis to Guaranteed QoS (requests == limits for both cpu and memory) so it bottoms the kernel OOM list - base: add LimitRange so containers without explicit resources inherit safe defaults (512Mi/250m request, 2Gi/2 limit, 64Gi/16 max) - components/memory-tier-low: nodeSelector=low, streamlit 512Mi/2Gi, rq-worker 1Gi/16Gi (Burstable) - components/memory-tier-high: nodeSelector=high, streamlit 512Mi/4Gi, rq-worker 2Gi/180Gi (Burstable — uniform across heavy workers so a single active app can burst into the shared pool) - overlays: rename template-app/ to prod/ (one overlay per repo; the repo itself identifies the app) and pull in memory-tier-low - docs & skill: document the new overlays/prod/ path and the one-line tier selector; update CI to kustomize the renamed overlay https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
The memory-tier-low component adds nodeSelector openms.de/memory-tier=low to every Deployment. kind clusters have no such label, so after the rename to overlays/prod all pods stayed Pending and 'Wait for Redis to be ready' timed out. Label --all kind nodes in both the nginx and Traefik integration jobs before deploying so the nodeSelector matches. Also raise the LimitRange max.memory from 64Gi to 200Gi. The original cap was written before memory-tier-high settled on a 180Gi rq-worker limit; without the bump, a high-tier fork (e.g. OpenDIAKiosk) would be rejected by admission when deployed into the shared openms namespace after the template's LimitRange is applied. https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
…tp://127.0.0.1:34609/git/OpenMS/streamlit-template into claude/parallel-webapp-memory-optimization-RoNnJ
Completes the overlay rename started in 6c61365 now that the branch has merged main, which added the example file under the old path. Also rewrite two remaining docs references to overlays/<your-app-name>/ and the CI description to the new prod overlay. https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Spin up a 2-node kind cluster (control-plane labeled memory-tier=low + ingress-ready, worker labeled memory-tier=high) so the Build-and-Test job passes regardless of which memory-tier component a fork's overlay pulls in. Previously we labeled --all nodes with a single tier after creation, which broke as soon as a fork flipped memory-tier-low to memory-tier-high. - .github/kind-config.yaml: 2-node topology with per-node labels. - .github/workflows/build-and-test.yml: point both helm/kind-action invocations (nginx build + traefik-integration) at the config and drop the now-redundant dynamic label step. https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Previous run (2f28ed9) showed build + traefik-integration jobs still timing out on 'Wait for Redis'. Root cause: multi-node kind clusters apply node-role.kubernetes.io/control-plane:NoSchedule to the control-plane, which untolerated app pods can't land on even though the nodeSelector matches. The single-node kind used previously had no such taint, which is why CI worked until we added a second node. Add a kubeadmConfigPatches stanza setting nodeRegistration.taints to the empty list so the control-plane is schedulable. Labels and cluster shape (1 control-plane + 1 worker) stay the same. https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
…imization-RoNnJ Refactor K8s deployment to use memory-tier components
Adds a seed-demos initContainer to the Streamlit Deployment that merges image-shipped demos into /workspaces-streamlit-template/.demos/ with cp -rn, so new demos in an image appear after redeploy while admin-saved demos and edits persist across redeploys. - Point demo_workspaces.source_dirs at the PV path via the ConfigMap override (both streamlit and rq-worker pick this up through the jq settings merge at startup). - Make get_demo_target_dir() settings-driven so "Save as Demo" writes to the PV, with backwards-compatible fallbacks for the legacy source_dir string and for environments without settings (tests). - Skip hidden top-level dirs in clean-up-workspaces.py so the nightly cron does not garbage-collect .demos/. - Document the .demos/ layout and the re-seed flow. https://claude.ai/code/session_01Y87aULHSdyBobPdaD4L6tW
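A sketch of the hidden-directory guard in clean-up-workspaces.py; the variable names and the surrounding loop are assumed.

```python
import os
from pathlib import Path

workspaces_dir = Path(os.environ.get("WORKSPACES_DIR", "/workspaces-streamlit-template"))

for directory in workspaces_dir.iterdir():
    # Hidden top-level directories such as .demos/ hold seeded demo workspaces
    # and must never be garbage-collected by the nightly cleanup.
    if not directory.is_dir() or directory.name.startswith("."):
        continue
    # ... age check and removal of stale workspace directories goes here ...
```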
…-azhkG Support configurable demo workspace source directories
The Secret used to be an out-of-band copy-the-example step, so forgetting the resources-list edit left the pod booting with an empty admin-secrets mount and a user-facing "Admin not configured" error for a feature that was never wired up in the first place. Now the Secret is committed to the base with an empty admin password and included in k8s/base/kustomization.yaml, so kubectl apply -k always creates it. The "Save as Demo" expander is gated on a non-empty password and is hidden entirely (no error box) when not configured. Operators enable the feature by patching the live Secret or by editing the file locally with git update-index --skip-worktree, both documented. Exception handling in is_admin_configured() is tightened to also catch StreamlitSecretNotFoundError so a missing secrets file never raises. https://claude.ai/code/session_01V1noocAR7uXWjWsC9oLGhz
Hide Save-as-Demo UI when admin password is not configured
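Roughly, the gating described above looks like the following; the function and label names are taken from the commit text, and the exact UI code may differ.

```python
import streamlit as st
from src.common.admin import is_admin_configured  # assumed import path

# Render the expander only when an admin password is configured —
# no error box is shown otherwise.
if is_admin_configured():
    with st.expander("Save as Demo"):
        password = st.text_input("Admin password", type="password")
        if st.button("Save") and password:
            ...  # validate the password and copy the workspace into .demos/
```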
Split the build+test flow into three stages so the traefik ingress
test no longer rebuilds Dockerfile_simple from scratch:
build (matrix: full, simple)
-> uploads each image as a workflow artifact
test-nginx (matrix: full, simple)
-> downloads artifact, kind loads, tests nginx ingress
test-traefik (simple only)
-> downloads simple artifact, kind loads, tests traefik ingress
Artifacts (not GHCR) are used because the build job only pushes on
non-PR events and fork PRs cannot auth to GHCR at all, so registry
sharing would not work for every PR path.
Mirror the build/test-nginx matrix so the traefik ingress test also covers the full and simple variants instead of just simple.
test-traefik (simple) failed in the combined "Wait for Redis and deployments to be ready" step because the deployment took longer than 120s to become available, and unlike the test-nginx wait the failure was not soft. Align test-traefik with test-nginx: - Split Redis wait (hard, 60s) from deployment wait (soft, `|| true`). - Bump deployment timeout 120s -> 180s in both jobs. - Widen the curl warm-up loop from 5x2s to 30x2s in both jobs so a marginally late deployment is tolerated; a real failure still surfaces via the trailing unconditional curl.
The previous skill was a manual find-and-replace checklist that assumed Claude could run kubectl against the cluster. Restructure it as an interview-driven file-editing guide with a clear handoff to a human operator (or CI) for cluster apply. - Drop kubectl, kubectl kustomize, and rollout-verification steps that Claude can't actually execute. - Drop nginx ingress fallback; production is Traefik-only. - Add a Step 1 recon over a fixed set of base/overlay/CI files so defaults are derived from the repo, and the skill bails on layouts it doesn't recognize. - Replace the manual checklist with six interview questions, each paired with what it controls in the running deployment, the proposed default, and the reasoning. Slug, GHCR ref, image tag, ingress subdomain, memory tier, workspace storage size. - Make storage a single 1-line edit to k8s/base/workspace-pvc.yaml when the user picks a non-default size; keep the PVC base name unchanged (namePrefix scopes it per-fork, no collisions). - Pin the default storage size to 500 Gi to match the stock base, so the default needs zero file edits. - Explain that images[0].name is a Kustomize match key and must not change.
Refactor CI workflow to build images once and reuse across jobs
Refactor k8s deployment skill to interview-driven overlay editing
📝 Walkthrough
This PR introduces comprehensive Kubernetes deployment infrastructure for the OpenMS Streamlit web app, including base manifests, memory-tier components, production overlays, GitHub CI/CD workflows, and eight new Claude skill guides documenting app extension and deployment processes. Code updates enable workspace directory configuration via environment variables and admin-gated UI features.
Sequence Diagram

sequenceDiagram
participant Client as Client/Browser
participant Ingress as Ingress/Traefik
participant Streamlit as Streamlit<br/>(2 replicas)
participant Redis as Redis
participant RQWorker as RQ Worker
Client->>Ingress: HTTP Request<br/>(sticky cookie)
Ingress->>Streamlit: Route to same replica<br/>(sticky-session affinity)
Streamlit->>Streamlit: Render page,<br/>handle user input
rect rgba(100, 150, 200, 0.5)
Note over Streamlit: Workflow Submission
Streamlit->>Streamlit: Save parameters
Streamlit->>Redis: Queue job<br/>(RQ)
Streamlit-->>Client: Return job ID
end
rect rgba(150, 100, 200, 0.5)
Note over RQWorker: Background Processing
RQWorker->>Redis: Poll for jobs
Redis-->>RQWorker: Dequeue job
RQWorker->>RQWorker: Execute workflow<br/>(conda env)
RQWorker->>RQWorker: Write results
end
Client->>Ingress: Poll results
Ingress->>Streamlit: Route request
Streamlit->>Streamlit: Read job status<br/>from Redis/disk
Streamlit-->>Client: Display results
🚥 Pre-merge checks: 4 passed, 1 failed (inconclusive).
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.claude/skills/add-visualization.md (1)
127-135: 🛠️ Refactor suggestion | 🟠 Major

Expand the checklist to include OpenMS-Insight cache and StateManager patterns.
The checklist is missing verification items for the cache-based OpenMS-Insight pattern used in the codebase.
✅ Suggested checklist additions
## Checklist
- [ ] Correct library chosen for the use case
- [ ] Dependencies present in `requirements.txt`
- [ ] `show_fig()` used for pyopenms-viz plots (consistent download/export behavior)
- [ ] `backend="plotly"` specified for all pyopenms-viz plots
-- [ ] Unique `key=` values for all OpenMS-Insight components
+- [ ] OpenMS-Insight components initialized with `cache_id` and `cache_path`
+- [ ] `StateManager()` initialized once and shared across linked components
+- [ ] Components rendered by calling them with `state_manager` parameter
+- [ ] Unique `key=` values passed during component rendering (not initialization)
- [ ] Cross-component linking set up if multiple views of same data

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/add-visualization.md around lines 127 - 135, Add checklist items verifying the OpenMS-Insight cache and StateManager patterns: confirm that the OpenMS-Insight cache is used where intended, that a StateManager instance (e.g., StateManager or equivalent class) is created and wired to components, that component `key=` values match cache keys, that cache invalidation/refresh logic is present and tested (including persistence/serialization and eviction behavior), and that cache-backed views are covered by tests to ensure consistent state across linked components.src/common/admin.py (1)
12-27: ⚠️ Potential issue | 🔴 Critical

Critical: import breaks page load — `StreamlitSecretNotFoundError` is not exported from `streamlit.errors`.

The import at line 12 fails because `StreamlitSecretNotFoundError` does not exist in `streamlit.errors`. While the exception was added to Streamlit in version 1.38.0+, it lives in the internal module `streamlit.runtime.secrets`, not the public `streamlit.errors` API. Since the broad `except Exception` block at lines 26–27 already handles all secrets-access failures (including missing secrets and malformed configurations), the safest fix is to drop the unstable import and let the generic handler cover all cases.

🛡️ Proposed fix — drop the unstable import
 import streamlit as st
-from streamlit.errors import StreamlitSecretNotFoundError

 def is_admin_configured() -> bool:
@@ -18,8 +17,8 @@ def is_admin_configured() -> bool:
     """
     try:
         return bool(st.secrets.get("admin", {}).get("password"))
-    except (FileNotFoundError, KeyError, StreamlitSecretNotFoundError):
-        return False
     except Exception:
+        # Covers FileNotFoundError, KeyError, and StreamlitSecretNotFoundError
+        # across Streamlit versions where the symbol's import path varies.
         return False

If you need granular exception handling, guard the import:
try:
    from streamlit.runtime.secrets import StreamlitSecretNotFoundError
except ImportError:
    StreamlitSecretNotFoundError = Exception  # type: ignore[assignment,misc]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/common/admin.py` around lines 12 - 27, Remove the unstable import from streamlit.errors and let the generic exception handler cover secrets failures: delete the line importing StreamlitSecretNotFoundError and update the is_admin_configured function to not rely on that symbol (keep the try/except that returns False on FileNotFoundError, KeyError, and the broad Exception). If you prefer granular handling, wrap an import-from streamlit.runtime.secrets in a try/except ImportError and fall back to a generic Exception alias for StreamlitSecretNotFoundError; locate these changes around the module import and the is_admin_configured function.
🧹 Nitpick comments (12)
.claude/skills/add-visualization.md (1)
14-23: Consider adding `streamlit_plotly_events` to the library comparison.

Based on learnings, `streamlit_plotly_events` is used for interactive visualizations in this codebase. The table should include guidance on when to use it (e.g., for custom click/selection handlers on Plotly charts) versus OpenMS-Insight components.

📊 Suggested addition to the table

| Use Case | Library | Why |
|----------|---------|-----|
| Quick spectrum/chromatogram/peak map plots | **pyopenms-viz** | Simple one-liner API, publication quality |
+| Custom click/selection handlers on Plotly charts | **streamlit_plotly_events** | Event handling for user interactions |
| Large datasets (millions of points) | **OpenMS-Insight** | Server-side pagination, intelligent downsampling |

Based on learnings: "Use Plotly and streamlit_plotly_events for interactive visualizations"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/add-visualization.md around lines 14 - 23, Add a new row to the library comparison table recommending streamlit_plotly_events for custom Plotly interaction handlers (click/selection events) and how it differs from OpenMS-Insight: use streamlit_plotly_events with plotly.express and show_fig() when you need lightweight client-side click/selection callbacks on Plotly charts, whereas OpenMS-Insight (Table, link_id, server-side filtering/downsampling) should be used for cross-linked views, large datasets, and server-driven interactions; mention it complements pyopenms-viz for one-liners and OpenMS-Insight for heavy-duty UIs.k8s/base/streamlit-secrets.yaml (1)
12-14: Consider an alternative to `git update-index --skip-worktree` for managing the live admin password.

`skip-worktree` is fragile: the bit is local-only, can be lost on `git stash`, branch switches, or fresh clones, and silently risks the real password being committed if it gets cleared. For a production secret, a more robust pattern would be to keep this committed file with the empty placeholder and manage the real value out-of-band via `kubectl patch`/SealedSecrets/External Secrets/SOPS, then drop the `skip-worktree` suggestion to avoid encouraging an error-prone workflow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/streamlit-secrets.yaml` around lines 12 - 14, The current comment recommends using git update-index --skip-worktree in k8s/base/streamlit-secrets.yaml which is fragile; instead keep the committed file with the empty placeholder admin password and remove the skip-worktree recommendation, and update the surrounding docs/comments to instruct operators to manage the live admin password out-of-band using kubectl patch, SealedSecrets/External Secrets/SOPS (or other secret manager) so production secrets are never stored locally or relied on skip-worktree; reference the streamlit-secrets.yaml placeholder and the docs/kubernetes-deployment.md "Configuring the admin password" section to make this change.k8s/base/limitrange.yaml (1)
6-16: LGTM, with one operational note.

The defaults/max are consistent with the per-tier patches (low streamlit 2Gi/4 CPU, high worker 180Gi/20 CPU all fit under 200Gi/20 CPU). Note that any container in the namespace without explicit `resources` will now silently get 2Gi/2 CPU as its limit, which can cap heavy workloads (e.g., Redis or future sidecars) without an obvious failure mode — worth keeping in mind when adding new workloads.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/limitrange.yaml` around lines 6 - 16, The LimitRange currently sets a default limit (limits[].type: Container -> default) which will silently apply 2Gi/2 CPU to any pod without explicit resources; to avoid capping unintended workloads remove the default block (keep defaultRequest and max) or lower the default values, and add a clear comment/annotation documenting that pods must declare resources; update the LimitRange resource (limits[].default, limits[].defaultRequest, limits[].max) accordingly so new workloads won’t be silently limited.Dockerfile (1)
220-228: Consider using BuildKit secret mount for `GITHUB_TOKEN` instead of `ARG`.

BuildKit's `--mount=type=secret` is the recommended approach for handling secrets in Docker builds. While the current code doesn't expose the token value in echo statements or image layers (it only passes the token as an environment variable to `gh`), Trivy flags this pattern (DS-0031) because build-args can inadvertently leak through logs or build history. BuildKit secrets provide stronger guarantees that the secret never appears in image metadata or build history.

♻️ Proposed refactor
 # Download latest OpenMS App executable as a ZIP file.
-# ARG declared here (not at the top) — otherwise the per-run token busts the cache.
-ARG GITHUB_TOKEN
-RUN if [ -n "$GITHUB_TOKEN" ]; then \
-    echo "GITHUB_TOKEN is set, proceeding to download the release asset..."; \
-    gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p "OpenMS-App.zip" -D /app; \
-  else \
-    echo "GITHUB_TOKEN is not set, skipping the release asset download."; \
-  fi
+# Use a BuildKit secret to keep the token out of image layers/history.
+# Build with: DOCKER_BUILDKIT=1 docker build --secret id=github_token,env=GITHUB_TOKEN ...
+RUN --mount=type=secret,id=github_token,env=GITHUB_TOKEN \
+  if [ -n "${GITHUB_TOKEN:-}" ]; then \
+    echo "GITHUB_TOKEN is set, proceeding to download the release asset..."; \
+    gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p "OpenMS-App.zip" -D /app; \
+  else \
+    echo "GITHUB_TOKEN is not set, skipping the release asset download."; \
+  fi

Update `.github/workflows/build-and-test.yml` to pass the secret via the `secrets` parameter instead of `build-args`:
.github/workflows/build-and-test.ymlto pass the secret via thesecretsparameter instead ofbuild-args:build-args: | - GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }} +secrets: | + github_token=${{ secrets.GITHUB_TOKEN }}Also update local build instructions to use:
DOCKER_BUILDKIT=1 docker build --secret id=github_token,env=GITHUB_TOKEN ...

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 220 - 228, Replace the ARG GITHUB_TOKEN pattern with a BuildKit secret mount: remove ARG GITHUB_TOKEN and change the RUN that calls gh (the block containing gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p "OpenMS-App.zip" -D /app) to use a --mount=type=secret,id=github_token and set the token inside the container from the secret (e.g., export GITHUB_TOKEN="$(cat /run/secrets/github_token)" && gh release download ...), so the secret never becomes a build-arg; also update your CI (build-and-test.yml) to pass the token as a build secret (not as build-args) and document local build usage as DOCKER_BUILDKIT=1 docker build --secret id=github_token,env=GITHUB_TOKEN ... so the secret is provided via BuildKit.clean-up-workspaces.py (1)
26-28: Hidden-dir skip is correct — `.startswith` on `Path.name` is fine; minor robustness nit.

If a user ever creates a directory containing a literal newline or unusual chars, behavior is unchanged. No action needed; flagging only because the seeded `.demos/` semantics now depend entirely on this convention — worth documenting next to the demo seeding code so a future contributor doesn't rename `.demos/` to `demos/` and silently break it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clean-up-workspaces.py` around lines 26 - 28, The hidden-dir skip using directory.name.startswith(".") is fine but relies on the convention that seeded demo workspaces live in a top-level ".demos" directory; add a brief comment next to the demo seeding logic and/or the directory filter (reference the check using directory.name.startswith(".")) clarifying that seeded demos must be placed in a literal ".demos" top-level dir and that renaming to "demos" will break the skip behavior, so future contributors know why the dot-prefix convention is required.

src/common/admin.py (1)
64-75: Consider sharing demo-dir resolution with `get_demo_source_dirs` to avoid drift.

`src/common/common.py:get_demo_source_dirs()` already implements list/string/legacy fallback resolution for `demo_workspaces`. `get_demo_target_dir()` reimplements the same precedence inline; if the schema evolves (e.g., per-tier sources, env override), the two functions can drift silently — `demo_exists()` and `save_workspace_as_demo()` would then write to a different root than what's listed for browsing. Reusing `get_demo_source_dirs()[0]` (or extracting a shared helper) keeps a single source of truth.

Sketch
```diff
-def get_demo_target_dir() -> Path:
-    default = Path("example-data/workspaces")
-    try:
-        demo_config = st.session_state.settings.get("demo_workspaces", {})
-    except (AttributeError, KeyError):
-        return default
-
-    dirs = demo_config.get("source_dirs", demo_config.get("source_dir"))
-    if isinstance(dirs, str):
-        return Path(dirs)
-    if isinstance(dirs, list) and dirs:
-        return Path(dirs[0])
-    return default
+def get_demo_target_dir() -> Path:
+    from src.common.common import get_demo_source_dirs  # deferred to avoid cycles
+    dirs = get_demo_source_dirs()
+    return dirs[0] if dirs else Path("example-data/workspaces")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/common/admin.py` around lines 64 - 75, get_demo_target_dir currently duplicates the demo_workspaces resolution logic already implemented in get_demo_source_dirs, which risks divergent behavior; update get_demo_target_dir to reuse get_demo_source_dirs() (e.g., return Path(get_demo_source_dirs()[0]) with a safe default if the list is empty) or move the resolution into a shared helper used by both get_demo_source_dirs and get_demo_target_dir so demo_exists and save_workspace_as_demo operate against the same computed demo root.

k8s/components/memory-tier-high/streamlit-resources.yaml (1)
10-16: Wide request-to-limit ratio may cause overcommit pressure.

Memory limit (4Gi) is 8× the request (512Mi) and CPU limit (4) is 8× the request (500m). On contended nodes, scheduling decisions are made on requests, so co-located pods can collectively burst far above what the node can sustain, leading to evictions or throttling. If that headroom is intentional for peaky workloads, it's fine; otherwise consider tightening requests closer to typical steady-state usage.
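For illustration only, a tightened block might look like the sketch below; the request values are placeholders showing the shape of the change, not measured steady-state numbers:

```yaml
resources:
  requests:
    memory: "1Gi"   # illustrative: raise toward observed steady-state usage
    cpu: "1"
  limits:
    memory: "4Gi"   # keep the burst ceiling
    cpu: "4"
```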
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@k8s/components/memory-tier-high/streamlit-resources.yaml` around lines 10 - 16, The resource requests vs limits in the streamlit-resources.yaml pod spec are too wide (memory request "512Mi" vs limit "4Gi" and cpu request "500m" vs limit "4"), which risks node overcommit and evictions; adjust the values in the resources.requests and resources.limits block to bring requests closer to typical steady-state usage (or reduce limits) — update the memory and cpu entries under the resources: requests: and limits: sections so requests reflect observed steady consumption and limits only allow reasonable burst headroom for the Streamlit workload.

k8s/overlays/prod/kustomization.yaml (1)
12-13: Migrate from deprecated `commonLabels` to `labels`.

`commonLabels` is deprecated in Kustomize. Use `labels:` instead with `includeSelectors: true` and `includeTemplates: true` to match the existing behavior. You can run `kustomize edit fix` to automate the migration, or manually convert to:

```yaml
labels:
  - pairs:
      app: template-app
    includeSelectors: true
    includeTemplates: true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/overlays/prod/kustomization.yaml` around lines 12 - 13, Replace the deprecated commonLabels block with the new labels structure: remove commonLabels: app: template-app and add a labels entry that uses pairs to set the app label and sets includeSelectors: true and includeTemplates: true to preserve behavior; specifically update the Kustomize manifest where commonLabels is defined and ensure the new labels stanza contains pairs with app: template-app plus includeSelectors and includeTemplates flags (or run kustomize edit fix to perform this migration automatically).

.github/workflows/build-and-test.yml (2)
142-145: Pin ingress-nginx to a release tag.

`raw.githubusercontent.com/.../ingress-nginx/main/...` follows whatever is on `main` upstream. An incompatible breaking change there will silently break this CI overnight with no repo change. Pin to a specific tag (e.g. `controller-v1.11.2`) and bump deliberately.

♻️ Suggested fix
```diff
-          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
+          kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.11.2/deploy/static/provider/kind/deploy.yaml
```

Pick the latest stable tag at the time of merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/build-and-test.yml around lines 142 - 145, Replace the unstable raw URL used in the "Install nginx ingress controller" step where kubectl apply fetches from ingress-nginx main; pin the manifest to a specific release tag (e.g., controller-v1.11.2) instead of "main" in the kubectl apply command, and document/update the tag when you intentionally bump it (pick the latest stable tag at merge time).
97-102: Avoid interpolating `${{ }}` directly into shell — use an env var.

Direct interpolation of GitHub Actions expressions into `run:` blocks is a script-injection vector (GitHub Security Lab guidance). `steps.meta.outputs.tags` is derived from refs and shas so the practical risk here is low, but the safer idiom is to bind via `env:` and let the shell expand the variable.

♻️ Proposed refactor
```diff
       - name: Retag for kind (stable local tag)
+        env:
+          META_TAGS: ${{ steps.meta.outputs.tags }}
         run: |
           # load:true above loaded all meta-action tags into local docker.
           # Retag the first one to the stable name the kustomize overlay expects.
-          FIRST_TAG=$(printf '%s\n' "${{ steps.meta.outputs.tags }}" | head -n 1)
+          FIRST_TAG=$(printf '%s\n' "$META_TAGS" | head -n 1)
           docker tag "$FIRST_TAG" openms-streamlit:test
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/build-and-test.yml around lines 97 - 102, The workflow currently interpolates the GitHub Actions expression directly into the run block; instead bind the expression to an env var (e.g., META_TAGS: ${{ steps.meta.outputs.tags }}) and in the Retag for kind step use shell expansion to derive FIRST_TAG (e.g., FIRST_TAG=$(printf '%s\n' "$META_TAGS" | head -n 1)) before calling docker tag "$FIRST_TAG" openms-streamlit:test; update the step to set env and replace any direct `${{ steps.meta.outputs.tags }}` usage in the run script accordingly.

k8s/base/streamlit-deployment.yaml (1)
75-80: Sanity-check liveness probe timing.

`livenessProbe.initialDelaySeconds: 30` assumes the conda activation + `jq` merge + streamlit boot finishes within 30s on a cold start. If image pulls or PVC attach are slow on the production cluster, the kubelet will start liveness probes before streamlit binds 8501 and restart the pod. Consider adding `failureThreshold` (default 3 × periodSeconds 30 = 90s slack), or use `startupProbe` to gate the liveness probe.

♻️ Optional hardening with startupProbe
```diff
+        startupProbe:
+          httpGet:
+            path: /_stcore/health
+            port: 8501
+          failureThreshold: 30
+          periodSeconds: 5
         livenessProbe:
           httpGet:
             path: /_stcore/health
             port: 8501
           initialDelaySeconds: 30
           periodSeconds: 30
```
`startupProbe` gives up to 150s for first boot, after which liveness takes over.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@k8s/base/streamlit-deployment.yaml` around lines 75 - 80, The livenessProbe's initialDelaySeconds (30s) is likely too short for cold starts; update the probe in the Deployment by either increasing initialDelaySeconds and/or adding failureThreshold to give more slack (e.g., keep periodSeconds: 30 and set failureThreshold to 3 or higher), or add a startupProbe that targets the same path (/_stcore/health) and port (8501) with a longer total timeout (e.g., periodSeconds: 10 and failureThreshold: 15 for ~150s) so the startupProbe gates the livenessProbe until the app binds; ensure livenessProbe remains pointing at httpGet path /_stcore/health and port 8501 so normal health checks continue after startup.

.claude/skills/configure-k8s-deployment.md (1)
117-130: JSON patch is positional — pin the env order in the base manifests.

`/spec/template/spec/containers/0/env/0/value` silently targets whatever is at index 0 in the base manifest's `env` array. Today that's `REDIS_URL` in both `streamlit-deployment.yaml` and `rq-worker-deployment.yaml`, so this works. If a future contributor reorders env entries (e.g., adds a new var first, or alphabetizes), this patch will overwrite the wrong variable without any error.

Two low-cost mitigations worth considering (option B is sketched after the list):
- Add an inline comment in both base deployments warning that `env[0]` must remain `REDIS_URL` because the prod overlay patches that index.
- Or switch the overlay patches to strategic merge (which matches by `name: REDIS_URL`) instead of JSON Patch.
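A rough sketch of the strategic-merge option, assuming the base deployment and container are both named `streamlit` and reusing the `<slug>` placeholder from this review; the patch file name is hypothetical:

```yaml
# k8s/overlays/prod/streamlit-redis-url-patch.yaml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streamlit
spec:
  template:
    spec:
      containers:
        - name: streamlit
          env:
            - name: REDIS_URL   # strategic merge matches env entries by name, not index
              value: "redis://<slug>-redis:6379/0"
```

Referenced from the overlay's `kustomization.yaml` under `patches:`, this stays correct even if the base env list is reordered.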
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.claude/skills/configure-k8s-deployment.md around lines 117 - 130, The overlay's JSON patches use positional paths (/spec/template/spec/containers/0/env/0/value) which will silently modify whichever env entry is at index 0; update the overlay to avoid positional JSON patches by either (A) adding a clear inline comment in the base Deployment manifests (streamlit and rq-worker) next to the env array that REDIS_URL must remain at index 0, or (B) convert the overlay patches to strategic-merge style patches that match the env entry by name (targeting the streamlit and rq-worker Deployment resources and setting spec.template.spec.containers[].env[] where name: REDIS_URL to value: "redis://<slug>-redis:6379/0") so the patch matches by name instead of index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/add-visualization.md:
- Around line 84-91: The docs show using a deprecated link_id parameter for
cross-component syncing; update the example to create a single StateManager
instance and pass it into each component instead: import and instantiate
StateManager, initialize Table, LinePlot, Heatmap with their cache_id/cache_path
constructors (Table, LinePlot, Heatmap) and then render each by calling the
component with the same state_manager argument (e.g.,
table(state_manager=state_manager, ...), line_plot(state_manager=state_manager,
...), heatmap(state_manager=state_manager, ...)) so selections synchronize via
the StateManager rather than a link_id column.
- Around line 98-114: The results() fragment should use the cache-based
OpenMS-Insight pattern instead of passing DataFrame directly: create cache_dir =
Path(self.workflow_dir, "results", "insight_cache"), set a cache_id (e.g.
"results_table"), check for (cache_dir / f"table_{cache_id}").is_dir() and if
present instantiate StateManager() and Table(cache_id=f"table_{cache_id}",
cache_path=str(cache_dir)) and call table(state_manager=state_manager,
height=533); keep the pyopenms_viz spectrum plotting (df.plot.ms_spectrum +
show_fig) but move the Table usage to the cached flow and show st.warning when
the cache is missing.
- Around line 61-78: Update the example to use the cache-based component pattern
used in the codebase: import and instantiate StateManager, create components
with cache_id and cache_path (e.g., Table(cache_id=..., cache_path=...),
Heatmap(...), LinePlot(...), SequenceView(...)) rather than passing data, then
render them by calling the component instances with state_manager and runtime
args (e.g., key, height) — for example create state_manager = StateManager(),
instantiate table/heatmap/line_plot variables with cache_id/cache_path, and call
table(state_manager=state_manager, height=...), line_plot(key=...,
state_manager=state_manager, height=...), etc.
In @.claude/skills/configure-app-settings.md:
- Around line 30-37: The documentation incorrectly states that the full
Dockerfile installs Python dependencies from environment.yml; update the doc
block describing **Dockerfile** to say it uses requirements.txt (not
environment.yml) so it matches the actual Dockerfile implementation (see
dependency install in Dockerfile lines that currently reference
requirements.txt), while leaving the **Dockerfile_simple** note that it also
uses requirements.txt; keep references to executor.run_topp(), TOPP tools,
app.py and pyOpenMS unchanged.
In @.claude/skills/configure-docker-compose-deployment.md:
- Around line 28-32: The docs instruct editing clean-up-workspaces.py to
hardcode workspaces_directory, but the script already reads WORKSPACES_DIR from
the environment; update the step to tell users to set the WORKSPACES_DIR env var
for the service instead of editing clean-up-workspaces.py—mention the script
name clean-up-workspaces.py and the env var WORKSPACES_DIR so maintainers know
where the source-of-truth is and to configure the compose service environment
accordingly.
In @.github/workflows/build-and-test.yml:
- Around line 171-175: Remove the "|| true" that masks failures from the kubectl
wait call; instead, run kubectl wait -n openms --for=condition=available
deployment -l app=template-app --timeout=180s and if it fails, explicitly dump
diagnostics (run kubectl get pods -n openms -l app=template-app and kubectl get
services -n openms -l app=template-app) and then exit with a non‑zero status so
the workflow fails; apply the same change to the analogous test-traefik block
(the kubectl wait/get commands in that section).
- Around line 94-95: The workflow currently passes GITHUB_TOKEN as a build-arg
(build-args: GITHUB_TOKEN) which leaks into image metadata; change to using
BuildKit secrets: remove GITHUB_TOKEN from build-args and instead pass it via
the actions/setup-buildx or build-push action secrets mounting feature as a
secret named e.g. github_token, then update the Dockerfile to stop using ARG
GITHUB_TOKEN and instead use a RUN that reads the secret via a secret mount (RUN
--mount=type=secret,id=github_token ...) and set GITHUB_TOKEN from
/run/secrets/github_token before calling gh release download (the gh release
download invocation and the Dockerfile RUN that downloads OpenMS-App.zip are the
locations to change).
In @.github/workflows/build-windows-executable-app.yaml:
- Around line 16-19: The workflow defines OPENMS_CONTRIB_VERSION but never uses
it; add a step (e.g., "Resolve contrib ref") that computes
CONTRIB_REF="${OPENMS_CONTRIB_VERSION:-$OPENMS_VERSION}" and exports it to
GITHUB_ENV, then replace usages of env.OPENMS_VERSION for the contrib cache key
and the gh release download with env.CONTRIB_REF (i.e., update the cache key
expression and the `gh release download release/...` target to reference
CONTRIB_REF) so both caching and download consistently use the resolved contrib
ref.
In @.github/workflows/ghcr-cleanup.yml:
- Around line 19-28: The container-retention step using
snok/container-retention-policy@v3.0.1 is passing the GITHUB_TOKEN, which does
not support the complex negation globs in the image-tags input; replace token:
${{ secrets.GITHUB_TOKEN }} with a personal access token or GitHub App token
secret (e.g., token: ${{ secrets.CLEANUP_TOKEN }}) and ensure that secret is
provisioned with appropriate repo permissions so the negated patterns ("!v*-full
!v*-simple !main-full !main-simple !latest") are honored and protected images
are not deleted.
In @.gitignore:
- Line 15: The .gitignore currently blanket-ignores the entire docs/superpowers/
tree (entry 'docs/superpowers/'), which prevents source-controlled documentation
from being committed; replace that broad entry with a narrower pattern that only
excludes generated artifacts (e.g., build/output folders or specific generated
file extensions inside docs/superpowers) so source files remain tracked—update
the 'docs/superpowers/' line to target only generated paths (for example a
docs/superpowers/build/ or docs/superpowers/**/*.html-generated pattern) and
keep real source docs under version control.
In `@docs/kubernetes-deployment.md`:
- Around line 119-121: Update the two kubectl commands that target the streamlit
deployment to include the namespace flag so they work for non-default contexts:
change the commands that start with "kubectl exec deploy/streamlit" and "kubectl
rollout restart deploy/streamlit" to add "-n openms", and also note that the
actual deployment name may be prefixed (use "<your-app-name>-streamlit" or the
repo's <namePrefix>streamlit as shown elsewhere in the docs).
- Around line 274-281: The docs are out of sync: update the CI description so
that Job "build" is described as only building and uploading the image artifact
(no kind integration), and replace the existing "Job 2 — build" / "Job 3 —
traefik-integration" wording with accurate behavior: there are two test jobs
("test-nginx" and "test-traefik") that both need "build" and both run a matrix
over [full, simple] (so Traefik is exercised for both variants), and ensure
notes about when images are pushed, cache behavior, and PR vs push/tag semantics
remain consistent with that flow; refer to the job names "build", "test-nginx",
and "test-traefik" and the matrix `[full, simple]` when editing the doc.
In `@k8s/base/traefik-ingressroute.yaml`:
- Around line 6-9: The base IngressRoute currently exposes entryPoints: web and
routes.match using PathPrefix(`/`) (i.e., an overly permissive IngressRoute) so
ensure every deployment applies the prod overlay patch that tightens
routes.match to include the host restrictions and updates the service name;
update CI/CD kustomize/helm pipeline to always build/deploy using the prod
overlay (or add a conservative default host match in the base) and add a
pipeline check that verifies the IngressRoute produced for deployment has the
patched match expression (not just PathPrefix(`/`)) before allowing deployment.
In `@k8s/base/workspace-pvc.yaml`:
- Around line 6-11: The PVC currently uses accessModes: ReadWriteOnce with
storageClassName: cinder-csi which will cause Multi-Attach errors when mounted
by streamlit-deployment, rq-worker-deployment, and cleanup-cronjob across
multiple nodes; update the PVC to use a ReadWriteMany-capable backend (replace
storageClassName cinder-csi with your Manila/NFS CSI class and set accessModes
to ReadWriteMany) so multiple pods on different nodes can mount the volume, or
alternatively implement nodeAffinity/podAffinity on streamlit-deployment and
rq-worker-deployment to co-locate them on one node (or merge streamlit and
rq-worker into the same pod) if changing storage backend isn’t possible.
In `@k8s/components/memory-tier-high/worker-resources.yaml`:
- Around line 11-16: The resource spec in worker-resources.yaml has an unsafe
request-to-limit gap (memory request "2Gi" vs limit "180Gi" and cpu "2" vs "20")
which can cause node-level OOMs; update the Resources block (requests.memory,
limits.memory, requests.cpu, limits.cpu) to set requests closer to the realistic
working set (e.g., increase requests.memory to a value that reflects typical
usage, and scale requests.cpu accordingly) or reduce limits to a reasonable
burst factor (e.g., ≤2–4×), and/or add pod antiAffinity or a nodeSelector/taints
for high-memory nodes in the same manifest (use the same Deployment/PodSpec that
references these resources) so that pods requiring very large limits are
scheduled only on capacity-matched nodes.
---
Outside diff comments:
In @.claude/skills/add-visualization.md:
- Around line 127-135: Add checklist items verifying the OpenMS-Insight cache
and StateManager patterns: confirm that the OpenMS-Insight cache is used where
intended, that a StateManager instance (e.g., StateManager or equivalent class)
is created and wired to components, that component `key=` values match cache
keys, that cache invalidation/refresh logic is present and tested (including
persistence/serialization and eviction behavior), and that cache-backed views
are covered by tests to ensure consistent state across linked components.
In `@src/common/admin.py`:
- Around line 12-27: Remove the unstable import from streamlit.errors and let
the generic exception handler cover secrets failures: delete the line importing
StreamlitSecretNotFoundError and update the is_admin_configured function to not
rely on that symbol (keep the try/except that returns False on
FileNotFoundError, KeyError, and the broad Exception). If you prefer granular
handling, wrap an import-from streamlit.runtime.secrets in a try/except
ImportError and fall back to a generic Exception alias for
StreamlitSecretNotFoundError; locate these changes around the module import and
the is_admin_configured function.
---
Nitpick comments:
In @.claude/skills/add-visualization.md:
- Around line 14-23: Add a new row to the library comparison table recommending
streamlit_plotly_events for custom Plotly interaction handlers (click/selection
events) and how it differs from OpenMS-Insight: use streamlit_plotly_events with
plotly.express and show_fig() when you need lightweight client-side
click/selection callbacks on Plotly charts, whereas OpenMS-Insight (Table,
link_id, server-side filtering/downsampling) should be used for cross-linked
views, large datasets, and server-driven interactions; mention it complements
pyopenms-viz for one-liners and OpenMS-Insight for heavy-duty UIs.
In @.claude/skills/configure-k8s-deployment.md:
- Around line 117-130: The overlay's JSON patches use positional paths
(/spec/template/spec/containers/0/env/0/value) which will silently modify
whichever env entry is at index 0; update the overlay to avoid positional JSON
patches by either (A) adding a clear inline comment in the base Deployment
manifests (streamlit and rq-worker) next to the env array that REDIS_URL must
remain at index 0, or (B) convert the overlay patches to strategic-merge style
patches that match the env entry by name (targeting the streamlit and rq-worker
Deployment resources and setting spec.template.spec.containers[].env[] where
name: REDIS_URL to value: "redis://<slug>-redis:6379/0") so the patch matches by
name instead of index.
In @.github/workflows/build-and-test.yml:
- Around line 142-145: Replace the unstable raw URL used in the "Install nginx
ingress controller" step where kubectl apply fetches from ingress-nginx main;
pin the manifest to a specific release tag (e.g., controller-v1.11.2) instead of
"main" in the kubectl apply command, and document/update the tag when you
intentionally bump it (pick the latest stable tag at merge time).
- Around line 97-102: The workflow currently interpolates the GitHub Actions
expression directly into the run block; instead bind the expression to an env
var (e.g., META_TAGS: ${{ steps.meta.outputs.tags }}) and in the Retag for kind
step use shell expansion to derive FIRST_TAG (e.g., FIRST_TAG=$(printf '%s\n'
"$META_TAGS" | head -n 1)) before calling docker tag "$FIRST_TAG"
openms-streamlit:test; update the step to set env and replace any direct `${{
steps.meta.outputs.tags }}` usage in the run script accordingly.
In `@clean-up-workspaces.py`:
- Around line 26-28: The hidden-dir skip using directory.name.startswith(".") is
fine but rely on the convention that seeded demo workspaces live in a top-level
".demos" directory; add a brief comment next to the demo seeding logic and/or
the directory filter (reference the check using directory.name.startswith("."))
clarifying that seeded demos must be placed in a literal ".demos" top-level dir
and that renaming to "demos" will break the skip behavior, so future
contributors know why the dot-prefix convention is required.
In `@Dockerfile`:
- Around line 220-228: Replace the ARG GITHUB_TOKEN pattern with a BuildKit
secret mount: remove ARG GITHUB_TOKEN and change the RUN that calls gh (the
block containing gh release download -R ${GITHUB_USER}/${GITHUB_REPO} -p
"OpenMS-App.zip" -D /app) to use a --mount=type=secret,id=github_token and set
the token inside the container from the secret (e.g., export GITHUB_TOKEN="$(cat
/run/secrets/github_token)" && gh release download ...), so the secret never
becomes a build-arg; also update your CI (build-and-test.yml) to pass the token
as a build secret (not as build-args) and document local build usage as
DOCKER_BUILDKIT=1 docker build --secret id=github_token,env=GITHUB_TOKEN ... so
the secret is provided via BuildKit.
In `@k8s/base/limitrange.yaml`:
- Around line 6-16: The LimitRange currently sets a default limit
(limits[].type: Container -> default) which will silently apply 2Gi/2 CPU to any
pod without explicit resources; to avoid capping unintended workloads remove the
default block (keep defaultRequest and max) or lower the default values, and add
a clear comment/annotation documenting that pods must declare resources; update
the LimitRange resource (limits[].default, limits[].defaultRequest,
limits[].max) accordingly so new workloads won’t be silently limited.
In `@k8s/base/streamlit-deployment.yaml`:
- Around line 75-80: The livenessProbe's initialDelaySeconds (30s) is likely too
short for cold starts; update the probe in the Deployment by either increasing
initialDelaySeconds and/or adding failureThreshold to give more slack (e.g.,
keep periodSeconds: 30 and set failureThreshold to 3 or higher), or add a
startupProbe that targets the same path (/_stcore/health) and port (8501) with
a longer total timeout (e.g., periodSeconds: 10 and failureThreshold: 15 for
~150s) so the startupProbe gates the livenessProbe until the app binds; ensure
livenessProbe remains pointing at httpGet path /_stcore/health and port 8501 so
normal health checks continue after startup.
In `@k8s/base/streamlit-secrets.yaml`:
- Around line 12-14: The current comment recommends using git update-index
--skip-worktree in k8s/base/streamlit-secrets.yaml which is fragile; instead
keep the committed file with the empty placeholder admin password and remove the
skip-worktree recommendation, and update the surrounding docs/comments to
instruct operators to manage the live admin password out-of-band using kubectl
patch, SealedSecrets/External Secrets/SOPS (or other secret manager) so
production secrets are never stored locally or relied on skip-worktree;
reference the streamlit-secrets.yaml placeholder and the
docs/kubernetes-deployment.md "Configuring the admin password" section to make
this change.
In `@k8s/components/memory-tier-high/streamlit-resources.yaml`:
- Around line 10-16: The resource requests vs limits in the
streamlit-resources.yaml pod spec are too wide (memory request "512Mi" vs limit
"4Gi" and cpu request "500m" vs limit "4"), which risks node overcommit and
evictions; adjust the values in the resources.requests and resources.limits
block to bring requests closer to typical steady-state usage (or reduce limits)
— update the memory and cpu entries under the resources: requests: and limits:
sections so requests reflect observed steady consumption and limits only allow
reasonable burst headroom for the Streamlit workload.
In `@k8s/overlays/prod/kustomization.yaml`:
- Around line 12-13: Replace the deprecated commonLabels block with the new
labels structure: remove commonLabels: app: template-app and add a labels entry
that uses pairs to set the app label and sets includeSelectors: true and
includeTemplates: true to preserve behavior; specifically update the Kustomize
manifest where commonLabels is defined and ensure the new labels stanza contains
pairs with app: template-app plus includeSelectors and includeTemplates flags
(or run kustomize edit fix to perform this migration automatically).
In `@src/common/admin.py`:
- Around line 64-75: get_demo_target_dir currently duplicates the
demo_workspaces resolution logic already implemented in get_demo_source_dirs,
which risks divergent behavior; update get_demo_target_dir to reuse
get_demo_source_dirs() (e.g., return Path(get_demo_source_dirs()[0]) with a safe
default if the list is empty) or move the resolution into a shared helper used
by both get_demo_source_dirs and get_demo_target_dir so demo_exists and
save_workspace_as_demo operate against the same computed demo root.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9b69bba5-fb56-40cd-8089-57b2df09f7c0
📒 Files selected for processing (44)
- .claude/skills/add-presets.md
- .claude/skills/add-python-tool.md
- .claude/skills/add-visualization.md
- .claude/skills/configure-app-settings.md
- .claude/skills/configure-docker-compose-deployment.md
- .claude/skills/configure-k8s-deployment.md
- .claude/skills/create-page.md
- .claude/skills/create-workflow.md
- .github/kind-config.yaml
- .github/workflows/build-and-test.yml
- .github/workflows/build-docker-images.yml
- .github/workflows/build-windows-executable-app.yaml
- .github/workflows/ghcr-cleanup.yml
- .gitignore
- .streamlit/config.toml
- .streamlit/secrets.toml.example
- Dockerfile
- clean-up-workspaces.py
- docs/kubernetes-deployment.md
- k8s/base/cleanup-cronjob.yaml
- k8s/base/configmap.yaml
- k8s/base/ingress.yaml
- k8s/base/kustomization.yaml
- k8s/base/limitrange.yaml
- k8s/base/namespace.yaml
- k8s/base/redis.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/streamlit-deployment.yaml
- k8s/base/streamlit-secrets.yaml
- k8s/base/streamlit-service.yaml
- k8s/base/traefik-ingressroute.yaml
- k8s/base/workspace-pvc.yaml
- k8s/components/memory-tier-high/kustomization.yaml
- k8s/components/memory-tier-high/nodeselector.yaml
- k8s/components/memory-tier-high/streamlit-resources.yaml
- k8s/components/memory-tier-high/worker-resources.yaml
- k8s/components/memory-tier-low/kustomization.yaml
- k8s/components/memory-tier-low/nodeselector.yaml
- k8s/components/memory-tier-low/streamlit-resources.yaml
- k8s/components/memory-tier-low/worker-resources.yaml
- k8s/overlays/prod/kustomization.yaml
- src/common/admin.py
- src/common/common.py
- src/workflow/WorkflowManager.py
💤 Files with no reviewable changes (1)
- .github/workflows/build-docker-images.yml
```python
from openms_insight import Table, LinePlot, Heatmap, VolcanoPlot, SequenceView

# Interactive table with server-side pagination
Table(df, key="results-table", page_size=50)

# Stick-style mass spectrum
LinePlot(df, x="mz", y="intensity", key="spectrum-plot")

# 2D heatmap for large datasets (auto-downsampling)
Heatmap(df, x="rt", y="mz", z="intensity", key="peak-heatmap")

# Volcano plot for differential expression
VolcanoPlot(df, x="log2fc", y="neg_log10_pval", key="volcano")

# Peptide sequence with fragment ion annotations
SequenceView(sequence="PEPTIDER", ions=ion_df, key="sequence")
```
Correct the OpenMS-Insight API pattern to match the actual implementation.
The documented pattern shows direct data passing (Table(df, key="results-table")), but the actual implementation in the codebase uses a cache-based approach. Components are initialized with cache_id and cache_path, then called with state_manager and other parameters.
✅ Correct pattern from content/results_database_search.py
```python
from openms_insight import Table, LinePlot, Heatmap, SequenceView, StateManager

# Initialize state manager for cross-component linking
state_manager = StateManager()

# Load components from cache (no data parameter needed)
table = Table(cache_id=f"table_{cache_id_prefix}", cache_path=str(cache_dir))
heatmap = Heatmap(cache_id=f"heatmap_{cache_id_prefix}", cache_path=str(cache_dir))
line_plot = LinePlot(cache_id=f"lineplot_{cache_id_prefix}", cache_path=str(cache_dir))

# Render components by calling them
st.subheader("PSM Table")
table(state_manager=state_manager, height=533)

st.subheader("MS2 Spectrum")
line_plot(key=f"lineplot_{cache_id_prefix}", state_manager=state_manager, height=450)
```

Key differences:

- Components are initialized with `cache_id` and `cache_path` (not data)
- Components are rendered by calling them as functions
- `state_manager` is passed when rendering, not `link_id`
- `key` is passed during rendering, not initialization
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/add-visualization.md around lines 61 - 78, Update the example
to use the cache-based component pattern used in the codebase: import and
instantiate StateManager, create components with cache_id and cache_path (e.g.,
Table(cache_id=..., cache_path=...), Heatmap(...), LinePlot(...),
SequenceView(...)) rather than passing data, then render them by calling the
component instances with state_manager and runtime args (e.g., key, height) —
for example create state_manager = StateManager(), instantiate
table/heatmap/line_plot variables with cache_id/cache_path, and call
table(state_manager=state_manager, height=...), line_plot(key=...,
state_manager=state_manager, height=...), etc.
```python
# Selecting a row in the table highlights the corresponding point in the plot
Table(df, key="linked-table", link_id="feature_id")
LinePlot(df, x="mz", y="intensity", key="linked-plot", link_id="feature_id")
Heatmap(df, x="rt", y="mz", z="intensity", key="linked-heatmap", link_id="feature_id")
```

All components sharing the same `link_id` column are automatically synchronized.
Correct the cross-component linking pattern to use StateManager.
The documented pattern shows using a link_id parameter, but the actual implementation uses StateManager for cross-component synchronization. The link_id column approach is not consistent with the real codebase.
✅ Correct linking pattern
```python
from openms_insight import Table, LinePlot, Heatmap, StateManager

# Initialize state manager ONCE for all linked components
state_manager = StateManager()

# Initialize components from cache
table = Table(cache_id="table_id", cache_path=str(cache_dir))
line_plot = LinePlot(cache_id="lineplot_id", cache_path=str(cache_dir))
heatmap = Heatmap(cache_id="heatmap_id", cache_path=str(cache_dir))

# Render components with the SAME state_manager instance
table(state_manager=state_manager, height=533)
line_plot(state_manager=state_manager, height=450)
heatmap(state_manager=state_manager, height=350)
```

Selections in one component automatically update all components sharing the same `state_manager` instance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/add-visualization.md around lines 84 - 91, The docs show
using a deprecated link_id parameter for cross-component syncing; update the
example to create a single StateManager instance and pass it into each component
instead: import and instantiate StateManager, initialize Table, LinePlot,
Heatmap with their cache_id/cache_path constructors (Table, LinePlot, Heatmap)
and then render each by calling the component with the same state_manager
argument (e.g., table(state_manager=state_manager, ...),
line_plot(state_manager=state_manager, ...),
heatmap(state_manager=state_manager, ...)) so selections synchronize via the
StateManager rather than a link_id column.
```python
@st.fragment
def results(self) -> None:
    result_file = Path(self.workflow_dir, "results", "step-name", "output.tsv")
    if result_file.exists():
        df = pd.read_csv(result_file, sep="\t")

        # pyopenms-viz for spectrum plots
        import pyopenms_viz
        fig = df.plot.ms_spectrum(backend="plotly")
        show_fig(fig, "results-spectrum")

        # Or OpenMS-Insight for interactive exploration
        from openms_insight import Table
        Table(df, key="results-table")
    else:
        st.warning("No results found. Please run the workflow first.")
```
Align the workflow integration example with the cache-based pattern.
The OpenMS-Insight example in the results() method shows direct data passing (Table(df, key="results-table")), which is inconsistent with the cache-based pattern used throughout the actual codebase (see content/results_database_search.py).
🔄 Suggested alignment with actual pattern
```python
@st.fragment
def results(self) -> None:
    cache_dir = Path(self.workflow_dir, "results", "insight_cache")

    # pyopenms-viz for spectrum plots
    result_file = Path(self.workflow_dir, "results", "step-name", "output.tsv")
    if result_file.exists():
        df = pd.read_csv(result_file, sep="\t")
        import pyopenms_viz
        fig = df.plot.ms_spectrum(backend="plotly")
        show_fig(fig, "results-spectrum")

    # OpenMS-Insight for interactive exploration (cache-based)
    cache_id = "results_table"
    if (cache_dir / f"table_{cache_id}").is_dir():
        from openms_insight import Table, StateManager
        state_manager = StateManager()
        table = Table(cache_id=f"table_{cache_id}", cache_path=str(cache_dir))
        table(state_manager=state_manager, height=533)
    else:
        st.warning("No cached results found. Please run the workflow first.")
```

Alternatively, if direct data passing is a valid alternative pattern, clarify when to use each approach.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/add-visualization.md around lines 98 - 114, The results()
fragment should use the cache-based OpenMS-Insight pattern instead of passing
DataFrame directly: create cache_dir = Path(self.workflow_dir, "results",
"insight_cache"), set a cache_id (e.g. "results_table"), check for (cache_dir /
f"table_{cache_id}").is_dir() and if present instantiate StateManager() and
Table(cache_id=f"table_{cache_id}", cache_path=str(cache_dir)) and call
table(state_manager=state_manager, height=533); keep the pyopenms_viz spectrum
plotting (df.plot.ms_spectrum + show_fig) but move the Table usage to the cached
flow and show st.warning when the cache is missing.
- **`Dockerfile`** (full): builds OpenMS from source with TOPP tools. Use when the app runs TOPP tools via `executor.run_topp()`. Update:
  - `GITHUB_USER` and `GITHUB_REPO` environment variables
  - Python dependencies in `environment.yml`
  - Entrypoint if main file is not `app.py`

- **`Dockerfile_simple`** (lightweight): installs pyOpenMS via pip only. Use when the app only uses pyOpenMS Python API. Update:
  - `GITHUB_USER` and `GITHUB_REPO` environment variables
  - Python dependencies in `requirements.txt`
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify which dependency files actually exist and are referenced by the Dockerfiles.
fd -HI -t f '^(environment\.yml|requirements\.txt)$'
rg -nP -C2 'environment\.yml|requirements\.txt' Dockerfile Dockerfile_simple 2>/dev/null

Repository: OpenMS/quantms-web
Length of output: 313
Update Dockerfile dependency reference in documentation.
The skill doc at lines 30-37 incorrectly states the full Dockerfile uses environment.yml for Python dependencies. The actual Dockerfile installs dependencies from requirements.txt (lines 88-89). Update the documentation to match the implementation.
Suggested change
```diff
 - **`Dockerfile`** (full): builds OpenMS from source with TOPP tools. Use when the app runs TOPP tools via `executor.run_topp()`. Update:
   - `GITHUB_USER` and `GITHUB_REPO` environment variables
-  - Python dependencies in `environment.yml`
+  - Python dependencies in `requirements.txt`
   - Entrypoint if main file is not `app.py`
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/configure-app-settings.md around lines 30 - 37, The
documentation incorrectly states that the full Dockerfile installs Python
dependencies from environment.yml; update the doc block describing
**Dockerfile** to say it uses requirements.txt (not environment.yml) so it
matches the actual Dockerfile implementation (see dependency install in
Dockerfile lines that currently reference requirements.txt), while leaving the
**Dockerfile_simple** note that it also uses requirements.txt; keep references
to executor.run_topp(), TOPP tools, app.py and pyOpenMS unchanged.
2. **Update `clean-up-workspaces.py`**:

   ```python
   workspaces_directory = Path("/workspaces-your-repo-name")
   ```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm clean-up-workspaces.py reads WORKSPACES_DIR.
fd -t f 'clean-up-workspaces.py' --exec cat {}

Repository: OpenMS/quantms-web
Length of output: 2086
Update documentation to reflect environment variable pattern in clean-up-workspaces.py.
The file already resolves workspaces_directory from the WORKSPACES_DIR environment variable (with a fallback default), but step 2 instructs users to hardcode the path in Python. This creates confusion and leads developers to edit code that's no longer the source of truth.
Suggested rewrite
````diff
-2. **Update `clean-up-workspaces.py`**:
-
-   ```python
-   workspaces_directory = Path("/workspaces-your-repo-name")
-   ```
+2. **Set `WORKSPACES_DIR` for the service** so the cleanup script targets the right volume:
+
+   ```yaml
+   services:
+     your-app-name:
+       environment:
+         - WORKSPACES_DIR=/workspaces-your-repo-name
+   ```
````

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
2. **Set `WORKSPACES_DIR` for the service** so the cleanup script targets the right volume:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/configure-docker-compose-deployment.md around lines 28 - 32,
The docs instruct editing clean-up-workspaces.py to hardcode
workspaces_directory, but the script already reads WORKSPACES_DIR from the
environment; update the step to tell users to set the WORKSPACES_DIR env var for
the service instead of editing clean-up-workspaces.py—mention the script name
clean-up-workspaces.py and the env var WORKSPACES_DIR so maintainers know where
the source-of-truth is and to configure the compose service environment
accordingly.
```bash
kubectl exec deploy/streamlit -- rm -rf /workspaces-streamlit-template/.demos/<name>
kubectl rollout restart deploy/streamlit
```
Add -n openms to the demo-reseed example.
These commands omit the namespace, but every other command in the doc uses -n openms. Without it, users running with a different default-context namespace will get "deployment not found".
📝 Proposed fix
```diff
-kubectl exec deploy/streamlit -- rm -rf /workspaces-streamlit-template/.demos/<name>
-kubectl rollout restart deploy/streamlit
+kubectl -n openms exec deploy/<your-app-name>-streamlit -- rm -rf /workspaces-streamlit-template/.demos/<name>
+kubectl -n openms rollout restart deploy/<your-app-name>-streamlit
```

Note: post-overlay the deployment name is `<namePrefix>streamlit`, so use `<your-app-name>-streamlit` to match the rest of the guide.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
kubectl -n openms exec deploy/<your-app-name>-streamlit -- rm -rf /workspaces-streamlit-template/.demos/<name>
kubectl -n openms rollout restart deploy/<your-app-name>-streamlit
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/kubernetes-deployment.md` around lines 119 - 121, Update the two kubectl
commands that target the streamlit deployment to include the namespace flag so
they work for non-default contexts: change the commands that start with "kubectl
exec deploy/streamlit" and "kubectl rollout restart deploy/streamlit" to add "-n
openms", and also note that the actual deployment name may be prefixed (use
"<your-app-name>-streamlit" or the repo's <namePrefix>streamlit as shown
elsewhere in the docs).
- **Job 2 — `build`** (`needs: lint-manifests`, matrix over `[full, simple]`):
  - Builds `Dockerfile` (full, includes TOPP tools) or `Dockerfile_simple` (pyOpenMS only) depending on the matrix leg.
  - **Buildx registry cache** (`type=registry,…,mode=max`) stored at `ghcr.io/<repo>/cache:full` and `:simple`. A `cache-from` read is attempted on every event; `cache-to` write only on push/tag/workflow_dispatch (fork PRs can't write). Repeat builds with an unchanged Dockerfile finish in minutes.
  - **Push** on push/tag/workflow_dispatch events (not on PRs). Tags: `<branch>-full` / `<branch>-simple`, `v<version>-full` / `v<version>-simple`, `<sha>-full` / `<sha>-simple`. `latest` is emitted only for the full variant on push to `main`.
  - **Kind integration** runs per variant: creates a kind cluster, loads the just-built image, installs the nginx ingress controller, applies the kustomized `prod` overlay (filtering Traefik `IngressRoute`, forcing `imagePullPolicy: Never` and `storageClassName: standard`), asserts Redis + deployments become ready, and curls both `.de` and `.org` hostnames through the nginx ingress to verify dual-host routing.
- **Job 3 — `traefik-integration`** (`needs: lint-manifests`, runs once on `Dockerfile_simple`): builds the simple image, brings up a second kind cluster, installs Traefik via Helm (`service.type=ClusterIP`), applies the full kustomized overlay without filtering the `IngressRoute` (still patching `imagePullPolicy: Never` and `storageClassName: standard` for kind compatibility), and curls both hostnames through Traefik. Catches IngressRoute-syntax regressions that the nginx-side test cannot.
- **Auth:** uses the workflow's `GITHUB_TOKEN` for GHCR login and as a build argument for in-image private-resource access. Fork PRs skip login (their `GITHUB_TOKEN` is read-only) but can still read the public cache.
- **PR behavior:** all three jobs run on pull requests. No tags are pushed and no cache is written. The kind integration still runs, exercising manifests end-to-end. If branch protection requires these checks, a failure blocks merge.
CI description doesn't match the actual workflow.
The doc describes:
- Job 2 `build` includes kind integration "per variant"
- Job 3 `traefik-integration` with `needs: lint-manifests`, runs once on `Dockerfile_simple`

The actual .github/workflows/build-and-test.yml has:

- Job 2 `build` does not run kind integration — it just builds and uploads an image artifact.
- Jobs 3/4 are `test-nginx` and `test-traefik`, both `needs: build`, both matrixed over `[full, simple]` (so traefik runs twice, not once).
Please update the doc so future readers don't trust a stale CI graph description.
📝 Suggested wording
```diff
-- **Job 2 — `build`** (`needs: lint-manifests`, matrix over `[full, simple]`):
+- **Job 2 — `build`** (`needs: lint-manifests`, matrix over `[full, simple]`):
   - Builds `Dockerfile` (full, includes TOPP tools) or `Dockerfile_simple` (pyOpenMS only) depending on the matrix leg.
   - **Buildx registry cache** ...
   - **Push** on push/tag/workflow_dispatch events (not on PRs). ...
-  - **Kind integration** runs per variant: creates a kind cluster, ...
-- **Job 3 — `traefik-integration`** (`needs: lint-manifests`, runs once on `Dockerfile_simple`): ...
+  - Uploads the built image as a `.tar` artifact for downstream test jobs.
+- **Job 3 — `test-nginx`** (`needs: build`, matrix over `[full, simple]`): downloads the image artifact, creates a kind cluster, installs nginx-ingress, applies the kustomized `prod` overlay (filtering out the Traefik `IngressRoute`, forcing `imagePullPolicy: Never` and `storageClassName: standard`), waits for Redis + deployments, and curls both `.de` and `.org` hostnames through the nginx ingress controller.
+- **Job 4 — `test-traefik`** (`needs: build`, matrix over `[full, simple]`): same shape but installs Traefik via Helm and applies the full overlay including the `IngressRoute`, then curls both hostnames through the Traefik service. Catches IngressRoute-syntax regressions the nginx-side test cannot.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/kubernetes-deployment.md` around lines 274 - 281, The docs are out of
sync: update the CI description so that Job "build" is described as only
building and uploading the image artifact (no kind integration), and replace the
existing "Job 2 — build" / "Job 3 — traefik-integration" wording with accurate
behavior: there are two test jobs ("test-nginx" and "test-traefik") that both
need "build" and both run a matrix over [full, simple] (so Traefik is exercised
for both variants), and ensure notes about when images are pushed, cache
behavior, and PR vs push/tag semantics remain consistent with that flow; refer
to the job names "build", "test-nginx", and "test-traefik" and the matrix
`[full, simple]` when editing the doc.
```yaml
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/`)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify all environments using this IngressRoute apply an overlay that patches host/match.
fd -t f kustomization.yaml k8s/
echo "---"
rg -nP -C2 'streamlit-traefik|IngressRoute' k8s/

Repository: OpenMS/quantms-web
Length of output: 914
🏁 Script executed:
# Read the prod overlay patch section in full
sed -n '20,50p' k8s/overlays/prod/kustomization.yaml

Repository: OpenMS/quantms-web
Length of output: 819
🏁 Script executed:
# List all overlay directories
find k8s/overlays -maxdepth 1 -type d | sort

Repository: OpenMS/quantms-web
Length of output: 93
🏁 Script executed:
# Check all overlays for IngressRoute patches
for ov in $(find k8s/overlays -maxdepth 1 -type d ! -name "overlays"); do
echo "=== $ov ==="
if [ -f "$ov/kustomization.yaml" ]; then
rg -A 10 'IngressRoute|streamlit-traefik' "$ov/kustomization.yaml" || echo "Not found"
fi
done

Repository: OpenMS/quantms-web
Length of output: 481
🏁 Script executed:
# Check the base kustomization to understand default resource loading
cat -n k8s/base/kustomization.yaml

Repository: OpenMS/quantms-web
Length of output: 565
🏁 Script executed:
# Read the full base IngressRoute
cat -n k8s/base/traefik-ingressroute.yaml

Repository: OpenMS/quantms-web
Length of output: 548
🏁 Script executed:
# Search for any other kustomization patterns or deployment references that might use the base directly
rg -i 'traefik.*ingressroute|kustomization' k8s/ --type yaml -B 1 -A 1 | head -50

Repository: OpenMS/quantms-web
Length of output: 549
Confirm that all deployments use the prod overlay patch for the IngressRoute.
The base IngressRoute is intentionally permissive—web entrypoint only (HTTP), PathPrefix(/), and a generic service name—allowing the prod overlay to patch it with specific host restrictions and environment-appropriate service naming. The prod patch correctly restricts match to (Host(template.webapps.openms.de) || Host(template.webapps.openms.org)) && PathPrefix(/) and updates the service name.
However, only one overlay (prod) exists in the repository. If the base is ever deployed standalone in dev, test, or other environments without applying the prod overlay patches, the service would be exposed over plain HTTP on all hosts. Ensure deployment pipelines consistently use the overlay, not the base directly.
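For reference, the rendered IngressRoute after the prod patch should contain roughly the following route. Only the `match` expression is quoted from the patch described above; the resource name, service name, and port are assumptions, and older Traefik installations may use the `traefik.containo.us/v1alpha1` apiVersion instead:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: template-app-streamlit-traefik   # assumed post-namePrefix name
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: (Host(`template.webapps.openms.de`) || Host(`template.webapps.openms.org`)) && PathPrefix(`/`)
      services:
        - name: template-app-streamlit   # assumed prod service name
          port: 8501
```

A pipeline check can grep the `kustomize build` output for a `Host(` clause in the match before allowing deployment.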
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/traefik-ingressroute.yaml` around lines 6 - 9, The base IngressRoute
currently exposes entryPoints: web and routes.match using PathPrefix(`/`) (i.e.,
an overly permissive IngressRoute) so ensure every deployment applies the prod
overlay patch that tightens routes.match to include the host restrictions and
updates the service name; update CI/CD kustomize/helm pipeline to always
build/deploy using the prod overlay (or add a conservative default host match in
the base) and add a pipeline check that verifies the IngressRoute produced for
deployment has the patched match expression (not just PathPrefix(`/`)) before
allowing deployment.
```yaml
  accessModes:
    - ReadWriteOnce
  storageClassName: cinder-csi
  resources:
    requests:
      storage: 500Gi
```
🧩 Analysis chain
🌐 Web query:
Does OpenStack Cinder CSI support ReadWriteMany access mode for PersistentVolumeClaims?
💡 Result:
No, OpenStack Cinder CSI does not support ReadWriteMany (RWX) access mode for PersistentVolumeClaims in the standard filesystem mode. It primarily supports ReadWriteOnce (RWO), which maps to SINGLE_NODE_WRITER in CSI. The driver advertises only SINGLE_NODE_WRITER as its volume capability access mode in its source code. All official examples and documentation use ReadWriteOnce. Multi-attach volumes (via volume types with multiattach=True in OpenStack Cinder) allow attachment to multiple nodes, but this is intended for block mode (volumeMode: Block) use cases like databases handling their own concurrency or raw block access (e.g., KubeVirt). Using ext4 or xfs filesystems (default for filesystem mode) on multi-attach volumes is unsafe for concurrent read-write access across nodes, as these are not cluster filesystems. Recent GitHub issues (up to 2026) confirm users struggle with shared filesystem access on multi-attach Cinder volumes, with no built-in support for clustered filesystems like GFS2. For true RWX filesystem sharing, use OpenStack Manila CSI (NFS/share storage) instead.
Citations:
- 1: openstack cinder storageclass does not provide ReadWriteMany accessmode kubernetes/cloud-provider-openstack#1315
- 2: [cinder-csi-plugin] multi-attach shared file-system kubernetes/cloud-provider-openstack#3031
- 3: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/examples.md
- 4: https://docs.openshift.org/latest/storage/container_storage_interface/persistent-storage-csi-cinder.html
- 5: https://creodias.docs.cloudferro.com/en/latest/kubernetes/Kubernetes-Persistent-Volume-Claims-on-Creodias.html
- 6: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md
- 7: [cinder-csi-plugin] Cinder CSI plugin should not provision ROX or RWX volumes kubernetes/cloud-provider-openstack#1367
- 8: https://openmetal.io/docs/manuals/kubernetes-guides/configuring-openstack-cinder-with-kubernetes
- 9: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/features.md
- 10: https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/driver.go
ReadWriteOnce PVC shared by multiple Deployments — will fail on multi-node clusters.
This PVC is mounted by streamlit-deployment, rq-worker-deployment, and the cleanup-cronjob. With accessModes: ReadWriteOnce, the volume can only be mounted by pods on a single node at a time. When the scheduler places the streamlit pod and rq-worker pod on different nodes, pod startup fails with Multi-Attach error.
This also breaks correctness of the queued-workflow path: rq-worker pods cannot read params.json from the workspace volume if they land on a different node than streamlit.
Mitigation options:
- Switch to a ReadWriteMany-capable storage backend. On OpenStack, Cinder CSI does not support ReadWriteMany; use Manila CSI (NFS share) instead and set accessModes: [ReadWriteMany] (see the PVC sketch after the prompt below).
- Pin all consumer pods to a single node via nodeAffinity/podAffinity (limits horizontal scale).
- Co-locate streamlit and rq-worker containers in the same pod.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/base/workspace-pvc.yaml` around lines 6 - 11, The PVC currently uses
accessModes: ReadWriteOnce with storageClassName: cinder-csi which will cause
Multi-Attach errors when mounted by streamlit-deployment, rq-worker-deployment,
and cleanup-cronjob across multiple nodes; update the PVC to use a
ReadWriteMany-capable backend (replace storageClassName cinder-csi with your
Manila/NFS CSI class and set accessModes to ReadWriteMany) so multiple pods on
different nodes can mount the volume, or alternatively implement
nodeAffinity/podAffinity on streamlit-deployment and rq-worker-deployment to
co-locate them on one node (or merge streamlit and rq-worker into the same pod)
if changing storage backend isn’t possible.
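If a Manila/NFS storage class is available in the cluster, the PVC change itself is small; a minimal sketch, assuming a class named csi-manila-nfs (the class name is a placeholder and must match whatever the cluster actually provides):

```yaml
# Sketch of an RWX-capable workspace PVC; storageClassName is an assumed
# Manila (NFS) CSI class, not a value taken from this repository.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes:
    - ReadWriteMany  # shareable across nodes, unlike ReadWriteOnce on Cinder
  storageClassName: csi-manila-nfs
  resources:
    requests:
      storage: 500Gi
```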
requests:
  memory: "2Gi"
  cpu: "2"
limits:
  memory: "180Gi"
  cpu: "20"
Very wide request-to-limit gap — node-level OOM risk.
Memory request 2Gi vs limit 180Gi (90×) and CPU 2 vs 20 (10×) means many workers can be scheduled per node based on requests, but a few of them simultaneously bursting toward their limits can exceed node capacity. Under memory pressure, kubelet eviction will OOM-kill pods (not necessarily the offender). For workloads that legitimately need ~180Gi, consider raising the request closer to a typical working set, or sizing one worker per high-mem node via anti-affinity / a node selector with capacity-matched nodes.
🧰 Tools
🪛 Checkov (3.2.524)
[medium] 1-16: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-16: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Trivy (0.69.3)
[error] 9-16: Root file system is not read-only
Container 'rq-worker' of Deployment 'rq-worker' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 9-16: Default security context configured
container rq-worker in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 7-16: Default security context configured
deployment rq-worker in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
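As an aside on the Checkov/Trivy findings above, a hardened container securityContext could look like the sketch below; whether the rq-worker image can actually run as a non-root user with a read-only root filesystem is an assumption that has to be verified against the image.

```yaml
# Illustrative securityContext for the rq-worker container; runAsUser and the
# read-only root filesystem are assumptions and may require writable emptyDir
# mounts for tmp/cache paths.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```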
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@k8s/components/memory-tier-high/worker-resources.yaml` around lines 11 - 16,
The resource spec in worker-resources.yaml has an unsafe request-to-limit gap
(memory request "2Gi" vs limit "180Gi" and cpu "2" vs "20") which can cause
node-level OOMs; update the Resources block (requests.memory, limits.memory,
requests.cpu, limits.cpu) to set requests closer to the realistic working set
(e.g., increase requests.memory to a value that reflects typical usage, and
scale requests.cpu accordingly) or reduce limits to a reasonable burst factor
(e.g., ≤2–4×), and/or add pod antiAffinity or a nodeSelector/taints for
high-memory nodes in the same manifest (use the same Deployment/PodSpec that
references these resources) so that pods requiring very large limits are
scheduled only on capacity-matched nodes.
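A sketch of what a narrower gap could look like; the concrete numbers are placeholders to be sized against the worker's real working set, and the nodeSelector label is hypothetical.

```yaml
# Illustrative resources block with roughly a 2x burst factor instead of 90x.
resources:
  requests:
    memory: "90Gi"
    cpu: "10"
  limits:
    memory: "180Gi"
    cpu: "20"
# Optional: restrict these workers to capacity-matched nodes (label is hypothetical).
# nodeSelector:
#   node-pool: high-memory
```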
- Drop StreamlitSecretNotFoundError import from admin.py — the symbol doesn't exist in pinned streamlit==1.43.0; the existing broad `except Exception` already covers missing-secrets cases. - Remove the `simple` variant from build/test-nginx/test-traefik matrices — Dockerfile_simple isn't present in this fork. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary by CodeRabbit
Release Notes
New Features
Improvements
Documentation