docs(container-gateway): fix Docker driver setup for containerized gateway #1419

Open
ericcurtin wants to merge 1 commit into NVIDIA:main from ericcurtin:docs-container-gateway-docker-driver/ec

Contributor

@ericcurtin ericcurtin commented May 17, 2026

Summary

The container-gateway docs were missing or misstating several requirements for running the gateway as a Docker container with the Docker compute driver. Validated by deploying on a Fedora Kinoite (bootc) system.

Related Issue

N/A — discovered during hands-on deployment on a bootc system.

Changes

  • Add OPENSHELL_GRPC_ENDPOINT to all Docker driver examples (required; gateway refuses to start without it)
  • Add supervisor binary extraction step — the binary must exist on the host filesystem at the same path mounted into the gateway container, because the host Docker daemon uses that path as a bind-mount source when creating sandbox containers
  • Keep port binding as 127.0.0.1:8080 — the Docker driver automatically binds the gateway to the bridge network interface via gateway_bind_addresses(), so exposing on 0.0.0.0 is unnecessary
  • Add group_add: [docker] to the compose service — the gateway image runs as nvs:nvs (UID 1000) which needs the docker group to access the Docker socket
  • Add remote gateway registration instructions (--remote flag for LAN access)
  • Add --server-san host.openshell.internal to generate-certs in the mTLS section — sandbox containers resolve host.openshell.internal to reach the gateway, so this SAN must be present in the server cert
  • Complete the mTLS docker run with the missing docker driver requirements (--group-add docker, supervisor binary mount, OPENSHELL_GRPC_ENDPOINT, OPENSHELL_DOCKER_SUPERVISOR_BIN)
  • Add deploy/docker/gateway.toml — TOML config for the Docker driver; binds to 127.0.0.1 since the Docker driver adds the bridge listener automatically
  • Add deploy/docker/docker-compose.yml — mounts gateway.toml as the primary config source; sets command: [] to clear the default CMD (which passes --bind-address 0.0.0.0 as a CLI flag that would otherwise beat the TOML in the merge order); keeps only three env vars that cannot be expressed in TOML (OPENSHELL_DB_URL is blocked from the config file by design, XDG_DATA_HOME/HOME are OS-level path vars outside the gateway config schema)
  • Clarify DooD vs DinD in compose comments — no --privileged needed, only socket read/write access
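The first two requirements in this list can be checked on the host before the gateway is started. The following is a minimal sketch, not part of OpenShell: `check_gateway_prereqs` is a hypothetical helper, and it assumes that both variables are exported in the calling shell and that OPENSHELL_DOCKER_SUPERVISOR_BIN holds the host path of the extracted supervisor binary. Only the two variable names come from this PR.

```shell
# Hypothetical preflight helper (illustrative only, not shipped with OpenShell).
# Returns 0 when both Docker-driver prerequisites look satisfied, 1 otherwise.
check_gateway_prereqs() {
    rc=0

    # The PR states that the gateway refuses to start without this variable.
    if [ -z "${OPENSHELL_GRPC_ENDPOINT:-}" ]; then
        echo "missing: OPENSHELL_GRPC_ENDPOINT is not set" >&2
        rc=1
    fi

    # Assumption: this variable holds the host path of the supervisor binary,
    # which must be a regular, executable file on the host filesystem.
    bin="${OPENSHELL_DOCKER_SUPERVISOR_BIN:-}"
    if [ -z "$bin" ]; then
        echo "missing: OPENSHELL_DOCKER_SUPERVISOR_BIN is not set" >&2
        rc=1
    elif [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
        echo "missing: $bin is not an executable file" >&2
        rc=1
    fi

    return "$rc"
}
```

A wrapper script could call `check_gateway_prereqs` and only run `docker compose up -d` when it returns 0, so a missing variable or binary is reported on the host rather than as a gateway startup failure.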

Testing

  • mise run pre-commit passes (markdownlint clean; python:proto failure is pre-existing env issue unrelated to this change)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@copy-pr-bot

copy-pr-bot Bot commented May 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread docs/about/container-gateway.mdx Outdated
Comment thread docs/about/container-gateway.mdx Outdated
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch 2 times, most recently from c1ff3e7 to 05176ab Compare May 18, 2026 11:42
Comment thread deploy/docker/gateway.toml Outdated
Comment thread deploy/docker/docker-compose.yml Outdated
Comment thread deploy/docker/docker-compose.yml
Comment thread deploy/docker/docker-compose.yml
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch from 05176ab to a8cc6c0 Compare May 19, 2026 10:55
@ericcurtin
Contributor Author

@drew @elezar PTAL

Comment thread deploy/docker/docker-compose.yml Outdated
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch from a8cc6c0 to 0193302 Compare May 29, 2026 00:00
…teway

The existing docs omitted or misstated several requirements when running
the gateway as a container with the Docker compute driver:

1. OPENSHELL_GRPC_ENDPOINT is required. The Docker driver rejects
   startup if this env var is missing, but it was not mentioned.

2. The supervisor binary must be extracted to a host path before
   starting the gateway. The gateway validates the path at startup
   from inside the container, and the host Docker daemon uses the
   same path as a bind-mount source when creating sandbox containers.
   Extracting to a path inside the gateway container alone is
   insufficient.

3. Docker socket access requires adding the docker group. The gateway
   image runs as nvs:nvs (UID 1000) which does not have access to the
   Docker socket by default.

4. Port binding should remain 127.0.0.1. The Docker driver
   automatically binds the gateway to the bridge network interface
   (gateway_bind_addresses in the driver) so sandbox containers can
   reach it without exposing the port on 0.0.0.0.

5. The mTLS setup section was missing --server-san host.openshell.internal
   on generate-certs. Sandbox containers resolve host.openshell.internal
   to reach the gateway, so this SAN must be present in the server cert.
   The mTLS docker run was also missing --group-add docker, the supervisor
   binary mount, OPENSHELL_GRPC_ENDPOINT, and OPENSHELL_DOCKER_SUPERVISOR_BIN.
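
The SAN requirement in item 5 can be checked against the generated server certificate. This is a minimal sketch that assumes the OpenSSL command-line tool is installed; `cert_covers_host` is a hypothetical helper, and the certificate path is whatever generate-certs wrote (it is not specified in this commit message).

```shell
# Hypothetical helper: succeeds only if the certificate at $1 is valid for
# the hostname $2 (default: host.openshell.internal).
cert_covers_host() {
    cert="$1"
    host="${2:-host.openshell.internal}"
    # `openssl x509 -checkhost` prints "Hostname <name> does match certificate"
    # or "Hostname <name> does NOT match certificate"; match on that text
    # rather than on the exit status of openssl.
    openssl x509 -in "$cert" -noout -checkhost "$host" \
        | grep -q 'does match certificate'
}
```

Calling `cert_covers_host path/to/server-cert.pem` (path is a placeholder) returns non-zero when the SAN is absent, which is the situation sandbox containers would hit when they connect to host.openshell.internal.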

Validated by deploying OpenShell on a Fedora Kinoite (bootc) system
using the updated compose.yml.
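
For item 3, one way to see which group the gateway needs is to read the GID that owns the Docker socket on the host. The sketch below uses a hypothetical helper, `socket_gid`; the default path /var/run/docker.sock is an assumption (the usual Docker default), and `stat -c` is GNU coreutils syntax.

```shell
# Hypothetical helper: prints the numeric GID that owns the Docker socket.
# Pass a different path as $1 if the socket lives elsewhere.
socket_gid() {
    sock="${1:-/var/run/docker.sock}"
    # GNU stat syntax; BSD/macOS stat would need `stat -f '%g'` instead.
    stat -c '%g' "$sock"
}
```

The printed GID can be compared with the output of `getent group docker`. If the two differ, the group named docker does not own the socket on that host, and adding it to the gateway container would not grant access.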
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch from 0193302 to d25036a Compare May 29, 2026 08:36
@TaylorMutch
Collaborator

/ok to test d25036a

Comment on lines +15 to +18
<Card title="Docker Compose Setup" href="/get-started/tutorials/docker-compose">

Run the OpenShell gateway as a Docker Compose service and create agent sandboxes including OpenClaw.
</Card>
Collaborator

Can we append this to the bottom of the tutorials? Rather than inserting new tutorials at the top, we can append them as they appear.

Collaborator

@TaylorMutch TaylorMutch left a comment

One comment; otherwise LGTM.

Comment on lines +120 to +122
# Database URL cannot be set in the TOML config file — it is explicitly
# blocked there to prevent secrets from being committed to VCS.
OPENSHELL_DB_URL: "sqlite:/var/lib/openshell/gateway.db?mode=rwc"
Member

Is the same concern (secrets being committed to VCS) not also valid here? Or are we saying that that concern is not a real concern?


# TOML config — all gateway and driver settings live here.
- type: bind
source: ./gateway.toml
Member

Question: Does . here represent the deploy/docker folder or the folder where docker compose is being run from?


```shell
mkdir -p ~/openshell/supervisor
docker create --name tmp-supervisor ghcr.io/nvidia/openshell/supervisor:latest
Member

This needs to match the setting in gateway.toml, correct?

--restart unless-stopped \
--group-add docker \
-p 127.0.0.1:8080:8080 \
-v openshell-state:/var/openshell \
Member

The compose example explained that the path on the host and the path in the gateway container need to be the same. Why is that not required here? (I don't see /var/lib/openshell mentioned here.)

chmod +x ~/openshell/supervisor/openshell-sandbox
```

Save the following as `~/openshell/compose.yml`, substituting your home directory for `HOME`:
Member

Question: Is it better to refer to the compose file we have defined to prevent drift?
