logicalclocks · o-alex · May 15, 2026
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -6,11 +6,14 @@
 uv venv && uv pip install -r requirements-docs.txt # setup
 uv pip install "git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python" # install Python API (needed for API docs section)
 touch docs/javadoc; uv run mkdocs build -s; rm docs/javadoc # build (strict)
-uv run mkdocs serve # preview with live reload
+uv run mike deploy <version> latest --update-aliases # build a versioned bundle and commit it to the local gh-pages branch (use repo's current version, e.g. 4.4); first time only: `uv run mike set-default latest`
+uv run mike serve # serve the gh-pages branch locally (preview); does NOT live-reload from source, so re-run `mike deploy` after edits
 npx markdownlint-cli2 "**/*.md" # lint Markdown (requires Node.js)
 uv tool install md-snakeoil && snakeoil --line-length 88 --rules "E,F,B,C4,ISC,PIE,PYI,Q,RSE,RET,SIM,TC,I,W,D2,D3,D4,INP,UP,FA" docs # lint Python code blocks
 ```
 
+`uv run mkdocs serve` is available too, but its livereload watcher does not fire rebuilds in this repo's plugin combination on macOS — `mike serve` is the canonical preview tool per the repo README.
+
 ## Rules
 
 - One sentence per line in all Markdown prose

diff --git a/.claude/docs/README.md b/.claude/docs/README.md
@@ -8,12 +8,16 @@ There is no application code — all work is writing Markdown under `docs/` and
 ```bash
 uv venv && uv pip install -r requirements-docs.txt # setup
 uv pip install "git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python" # needed for Python API section
-touch docs/javadoc; uv run mkdocs serve; rm docs/javadoc # preview with live reload
 touch docs/javadoc; uv run mkdocs build -s; rm docs/javadoc # build in strict mode
+uv run mike deploy <version> latest --update-aliases # build a versioned bundle (use repo's current version, e.g. 4.4); first time only: `uv run mike set-default latest`
+uv run mike serve # serve the versioned bundle locally (canonical preview, per repo README); no source live-reload — re-run `mike deploy` after edits
 npx markdownlint-cli2 "**/*.md" # lint Markdown (requires Node.js)
 uv tool install md-snakeoil && snakeoil --line-length 88 --rules "E,F,B,C4,ISC,PIE,PYI,Q,RSE,RET,SIM,TC,I,W,D2,D3,D4,INP,UP,FA" docs # lint Python code blocks
 ```
 
+`uv run mkdocs serve` is available as a static dev server, but its livereload watcher does not fire rebuilds in this repo's plugin combination on macOS.
+Use `mike serve` for previews.
+
 `docs/javadoc` is a directory generated by CI from the `hopsworks-api` Java source.
 Locally it must exist as a stub (`touch`) for the build to pass.
 

diff --git a/docs/assets/images/guides/project/scheduler/compute_resources_usage.png b/docs/assets/images/guides/project/scheduler/compute_resources_usage.png
diff --git a/docs/assets/images/guides/project/scheduler/compute_resources_usage_filtered.png b/docs/assets/images/guides/project/scheduler/compute_resources_usage_filtered.png
diff --git a/...sets/images/guides/project/scheduler/compute_resources_usage_queue_dropdown.png b/...sets/images/guides/project/scheduler/compute_resources_usage_queue_dropdown.png
diff --git a/docs/setup_installation/admin/compute_resources.md b/docs/setup_installation/admin/compute_resources.md
@@ -0,0 +1,78 @@
+---
+description: Cluster configuration for the per-project Compute Resources Usage view
+---
+
+# Configure the Compute Resources Usage view
+
+## Introduction
+
+This page is for cluster administrators.
+It explains how the per-project node list in the **Compute Resources Usage** card is derived, and the RBAC required for that derivation to work.
+For the end-user view of the same card, see [Compute Resources Usage][compute-resources-usage].
+
+When Kueue is installed, the card limits the node list to nodes a given project can actually schedule on.
+The mapping is driven entirely by standard Kueue objects: **LocalQueue → ClusterQueue → ResourceFlavor**.
+There is no Hopsworks-specific configuration on top.
+
+## How project → node visibility is derived
+
+For each project, Hopsworks walks the queue hierarchy bound to the project's Kubernetes namespace.
+
+- Start from every `LocalQueue` in the project's namespace.
+- Follow each `LocalQueue.spec.clusterQueue` to its `ClusterQueue`.
+- For each `ClusterQueue`, collect the `ResourceFlavor`s named in `spec.resourceGroups[].flavors[].name`.
+
+The resulting set of `ResourceFlavor`s is the project's "reachable flavors".
+A node is included in the project's Node Resources view only if it matches at least one reachable flavor.
+
+The per-queue node filter in the UI is built from the same walk, but kept keyed by `LocalQueue` name so users can narrow the view to a single queue.
+
+## How a ResourceFlavor matches a node
+
+A node matches a `ResourceFlavor` when both of these hold.
+
+- **Labels:** every key/value in `ResourceFlavor.spec.nodeLabels` is present on the node with the same value.
+  Extra labels on the node are fine — the flavor's label set must be a subset of the node's labels.
+- **Taints:** every taint on the node with effect `NoSchedule` or `NoExecute` is covered, either by a matching entry in `ResourceFlavor.spec.nodeTaints` or by a matching entry in `ResourceFlavor.spec.tolerations`.
+  Taints with effect `PreferNoSchedule` are soft and do not block matching.
+
+Both rules mirror Kueue's own admission logic, so the view reflects exactly which nodes Kueue would dispatch work to for that flavor.
+
+Cordoned nodes (`spec.unschedulable: true`) and nodes the metrics server cannot report on are dropped from the view regardless of flavor matching, because no useful capacity figure can be produced for them.
+
+## Required RBAC
+
+Hopsworks needs read access to the Kueue CRDs in order to walk the queue hierarchy.
+The Hopsworks Helm chart ships a `ClusterRole` and binding that grant these permissions, so a default install needs no extra action.
+
+If you are managing RBAC manually (e.g. an externally provisioned `hopsworks` service account), grant at least the following:
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: hopsworks-kueue-reader
+rules:
+  - apiGroups: ["kueue.x-k8s.io"]
+    resources: ["localqueues", "clusterqueues", "resourceflavors"]
+    verbs: ["get", "list"]
+```
+
+Bind this role to the service account Hopsworks runs as.
+The walk uses `get` and `list` only; no `watch`, `create`, `update`, or `delete` is needed.
+
+## Troubleshooting
+
+The view surfaces several distinct situations.
+Use the table below to map a symptom to a likely cause.
+
+| Symptom | Likely cause |
+| --- | --- |
+| Node Resources sub-section is empty and the access notice says *"None of the queues available in this project currently match any nodes in the cluster."* | The project's LocalQueues resolve to flavors that don't match any node — check `ResourceFlavor.spec.nodeLabels` and `nodeTaints`/`tolerations` against the actual node labels and taints. |
+| Node Resources lists every schedulable node and there is no Queue filter or Queue Resources sub-section | Kueue is not installed, the project namespace has no LocalQueues, or Hopsworks lacks the Kueue RBAC above. All three cases fall through to the legacy non-Kueue path, with no access notice. To distinguish: `kubectl get crd resourceflavors.kueue.x-k8s.io` (absent means Kueue isn't installed), then `kubectl auth can-i list localqueues.kueue.x-k8s.io -n <project-ns> --as=system:serviceaccount:<hopsworks-ns>:<hopsworks-sa>` (`no` means apply the `ClusterRole` and binding above). The `-n` flag is required because `LocalQueue` is namespaced; `--as=` requires the caller to have ServiceAccount impersonation rights (granted by `cluster-admin`). |
+| A node you expect to see is missing | The node is cordoned, missing from the metrics server, or not matched by any reachable flavor. Check `kubectl describe node` for `Unschedulable: true` and confirm that the node's labels and taints satisfy the flavor rules above. |
+
+## See also
+
+- [Compute Resources Usage][compute-resources-usage] — the end-user view this configuration drives.
+- [Kueue][kueue-details] — overview of the Kueue abstractions referenced above.
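The queue walk and the label/taint rules described in `docs/setup_installation/admin/compute_resources.md` above can be sketched in Python as follows.
This is a minimal illustration, not the Hopsworks implementation: the function names (`reachable_flavors`, `node_matches_flavor`, `_tolerates`) are hypothetical, the objects are plain dicts that mirror the Kubernetes field names, and the toleration check is a simplified assumption that ignores `tolerationSeconds`.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is soft and ignored


def reachable_flavors(local_queues, cluster_queues):
    """Return the ResourceFlavor names reachable from one namespace.

    local_queues: list of LocalQueue-like dicts from the project's namespace.
    cluster_queues: dict mapping ClusterQueue name -> ClusterQueue-like dict.
    """
    names = set()
    for lq in local_queues:
        cq = cluster_queues.get(lq["spec"]["clusterQueue"])
        if cq is None:
            continue  # assumption: a dangling reference contributes nothing
        for group in cq["spec"].get("resourceGroups", []):
            for flavor in group.get("flavors", []):
                names.add(flavor["name"])
    return names


def _tolerates(toleration, taint):
    """Simplified toleration check (assumption, not the full Kubernetes rule)."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        key = toleration.get("key")
        return not key or key == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def node_matches_flavor(node, flavor):
    """Apply the label-subset rule and the hard-taint coverage rule."""
    spec = flavor.get("spec", {})
    labels = node.get("metadata", {}).get("labels", {})
    # Labels: every flavor nodeLabel must be on the node with the same value.
    if any(labels.get(k) != v for k, v in spec.get("nodeLabels", {}).items()):
        return False
    # Taints: every hard taint must be declared in nodeTaints or tolerated.
    for taint in node.get("spec", {}).get("taints", []):
        if taint["effect"] not in HARD_EFFECTS:
            continue
        declared = any(
            t["key"] == taint["key"]
            and t.get("value", "") == taint.get("value", "")
            and t["effect"] == taint["effect"]
            for t in spec.get("nodeTaints", [])
        )
        tolerated = any(_tolerates(t, taint) for t in spec.get("tolerations", []))
        if not (declared or tolerated):
            return False
    return True
```

A node would then appear in a project's view if `node_matches_flavor` holds for at least one flavor whose name is in `reachable_flavors(...)`, subject to the cordon and metrics-server exclusions the page describes.
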
diff --git a/docs/user_guides/projects/scheduling/compute_resources.md b/docs/user_guides/projects/scheduling/compute_resources.md
@@ -0,0 +1,95 @@
+---
+description: Reading and filtering the Compute Resources Usage view
+---
+
+# Compute Resources Usage
+
+## Introduction
+
+The **Compute Resources Usage** card shows you how much capacity is currently available to your project on the cluster.
+It is meant as a planning aid before submitting work that will consume cluster resources.
+Numbers refresh automatically and reflect the live state of the nodes your project can schedule on.
+
+The same card appears at the top of three pages, so you see it wherever you launch work:
+
+- **Jobs** — above the job list.
+- **Jupyter** — on the Jupyter overview, above the server controls.
+- **Model Deployments** — above the deployments list.
+
+Expand it to see a breakdown of resources per node, namespace, and queue.
+
+![Compute Resources Usage view, fully expanded](../../../assets/images/guides/project/scheduler/compute_resources_usage.png)
+
+## Reading the summary
+
+The collapsed header shows three totals across all the nodes your project can reach: **Memory free**, **CPU free**, and **GPU free**.
+"Free" on each node is its allocatable capacity minus the maximum of utilized and requested resources, and the header is the **sum** of those per-node figures.
+
+These totals give you a sense of the cluster-wide capacity available to your project, but they do not tell you the size of the largest job you can launch.
+A job runs on exactly one node, so the biggest job that will fit is bounded by the single node with the most free resources — not by the sum.
+Always cross-check the **Node Resources** sub-section before sizing a heavy job: a header that reads *100 GB free* can hide the fact that no individual node has more than, say, 30 GB free, in which case a 50 GB job will not start anywhere.
+
+Expanding the card reveals three sub-sections:
+
+- **Node Resources** — per-node breakdown of free Memory, CPU, and GPU.
+- **Namespace Resources** — quotas applied at the project's Kubernetes namespace level.
+- **Queue Resources** — per-queue nominal and borrowable capacity from the Kueue queues you have access to.
+
+## Filter nodes
+
+Two filters sit above the node list: **Queue:** on the left, **Labels:** on the right.
+By default both are inactive — Queue is set to *any* and Labels is empty — so the node list shows the **union** of every node reachable through any of your project's queues.
+
+Use either filter on its own, or both together.
+When both are active, a node is shown only if it passes *both* filters (intersection).
+
+### Queue filter
+
+Choose a queue from the **Queue:** dropdown to narrow the node list to just the nodes reachable through that queue.
+
+![Queue dropdown listing the project's LocalQueues](../../../assets/images/guides/project/scheduler/compute_resources_usage_queue_dropdown.png)
+
+The options are:
+
+- **any** (default) — every node reachable through *any* of your queues.
+- The name of each queue your project has access to — only the nodes reachable through that one queue.
+
+Picking a specific queue shrinks the node list to just the nodes Kueue would actually dispatch to for jobs submitted to that queue.
+
+![Node Resources filtered to the other queue](../../../assets/images/guides/project/scheduler/compute_resources_usage_filtered.png)
+
+The Queue Resources sub-section below is unaffected by this filter — it always lists every queue you have access to.
+
+### Labels filter
+
+Pick one or more labels in the **Labels:** dropdown to narrow the node list to nodes that carry every selected label.
+The dropdown is populated from the labels your project administrator has made available; if no labels are configured for the project, the list is empty.
+
+The Queue and Labels filters compose: with Queue set to *pool-a* and Labels set to `tier:workload`, the view shows only nodes that pool-a can reach *and* that carry `tier:workload`.
+
+## The access notice
+
+When Kueue is configured and your project has at least one LocalQueue, an info icon appears next to **Node Resources**.
+Hover over it to see one of two messages.
+
+- **"Reachable through the queues available in this project.
+  See Queue Resources below for the list."**
+  This is the normal case — the listed nodes are the ones your queues route work to.
+  The Queue Resources sub-section names each queue, so you can cross-check which queue claims which capacity.
+
+- **"None of the queues available in this project currently match any nodes in the cluster."**
+  Your project has queues, but none of them currently resolve to any nodes in the cluster.
+  This typically means the queue's underlying configuration (resource flavor) is looking for nodes that don't exist, or all matching nodes are unschedulable.
+  Ask your administrator to review the queue configuration.
+
+## When Kueue is not in use
+
+If the cluster is not running Kueue, or your project has no LocalQueues at all, the Node Resources sub-section lists every schedulable node in the cluster instead.
+There is no Queue filter, no access notice, and no Queue Resources sub-section in that case.
+Jobs run through the standard Kubernetes scheduler rather than a queue.
+
+## See also
+
+- Administrators: see [Configure the Compute Resources Usage view][configure-the-compute-resources-usage-view] for the underlying queue → node mapping and the cluster role permissions required for this view to work.
+- [Kueue][kueue-details] — overview of Kueue's abstractions (ResourceFlavor, ClusterQueue, LocalQueue) used by Hopsworks.
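
The header arithmetic described under "Reading the summary" in `docs/user_guides/projects/scheduling/compute_resources.md` above can be checked with a short Python sketch.
The function names and the per-node memory figures (in GB) are hypothetical, chosen so that the totals reproduce the page's own example of a 100 GB header with no node above 30 GB; the formula itself is the one the page states.

```python
def node_free(allocatable, utilized, requested):
    """Free capacity on one node: allocatable minus max(utilized, requested)."""
    return allocatable - max(utilized, requested)


def header_total(nodes):
    """The collapsed header: sum of the per-node free figures."""
    return sum(node_free(**n) for n in nodes.values())


def largest_single_node(nodes):
    """Upper bound on a single job: the node with the most free capacity."""
    return max((node_free(**n) for n in nodes.values()), default=0)


def job_fits(nodes, job_size):
    """A job needs one node with enough free capacity, not a large enough sum."""
    return job_size <= largest_single_node(nodes)


# Hypothetical memory figures in GB for four nodes.
NODES = {
    "node-a": {"allocatable": 64, "utilized": 30, "requested": 34},  # 30 free
    "node-b": {"allocatable": 64, "utilized": 34, "requested": 20},  # 30 free
    "node-c": {"allocatable": 48, "utilized": 23, "requested": 10},  # 25 free
    "node-d": {"allocatable": 32, "utilized": 17, "requested": 12},  # 15 free
}

print(header_total(NODES))         # header reads 100
print(largest_single_node(NODES))  # best single node has 30
print(job_fits(NODES, 50))         # False: no node can host a 50 GB job
```

With these figures the header shows 100 GB free, yet a 50 GB job cannot start because the largest per-node figure is 30 GB.
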
+
diff --git a/docs/user_guides/projects/scheduling/kueue_details.md b/docs/user_guides/projects/scheduling/kueue_details.md
@@ -2,7 +2,7 @@
 description: Kueue abstractions
 ---
 
-# Kueue
+# Kueue { #kueue-details }
 
 ## Introduction
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -177,6 +177,7 @@ nav:
           - Kubernetes Scheduling:
               - Base: user_guides/projects/scheduling/kube_scheduler.md
               - Kueue: user_guides/projects/scheduling/kueue_details.md
+              - Compute Resources Usage: user_guides/projects/scheduling/compute_resources.md
 
           - Airflow: user_guides/projects/airflow/airflow.md
           - OpenSearch:
@@ -257,6 +258,7 @@ nav:
           - Configure Alerts: setup_installation/admin/alert.md
           - IAM Role Chaining: setup_installation/admin/roleChaining.md
           - Configure Project Mapping: setup_installation/admin/configure-project-mapping.md
+          - Configure Compute Resources Usage View: setup_installation/admin/compute_resources.md
           - Monitoring:
               - Services Dashboards: setup_installation/admin/monitoring/grafana.md
               - Export metrics: setup_installation/admin/monitoring/export-metrics.md