refactor(sandbox): remove binary path allowlisting entirely #1496

Closed
cgwalters wants to merge 1 commit into NVIDIA:main from cgwalters:drop-binary-allowlisting

Conversation

@cgwalters (Contributor) commented May 21, 2026

Binary identity enforcement via /proc/<pid>/exe is nearly impossible to make reliable. There is a reason that security systems like SELinux assign security "domains" to a process that are preserved across execve().
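For readers unfamiliar with the mechanism, a check of this kind can be sketched roughly as below. This is a minimal illustration, not the project's actual implementation: the helper names `exe_of` and `is_allowed` are hypothetical, and it assumes a Linux system with `/proc` mounted.

```shell
#!/bin/sh
# Hypothetical sketch of an exe-path allowlist check (not the project's code).

# exe_of PID: print the path that /proc/PID/exe resolves to.
exe_of() {
    readlink "/proc/$1/exe"
}

# is_allowed PID ALLOWED_PATH: succeed only when the executable path of PID
# is exactly ALLOWED_PATH. Note that this inspects the file the process was
# started from, not the code the process is currently running.
is_allowed() {
    [ "$(exe_of "$1")" = "$2" ]
}

# Demo: this shell trivially matches its own executable path.
self_exe=$(exe_of "$$")
if is_allowed "$$" "$self_exe"; then
    echo "pid $$ allowed as $self_exe"
fi
```

The check answers "which file was exec'd", which is a weaker question than "which code is making this request".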

One of the biggest problems is LD_PRELOAD and LD_LIBRARY_PATH: the glibc dynamic linker will readily load arbitrary shared libraries into the address space, even for "fixed purpose" binaries. And most uses of agents will have some writable directory where such a library can be placed.

But a much bigger problem is that many binaries (including claude) effectively include the ability to execute arbitrary code inside that process - they are interpreters.
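The interpreter point can be illustrated with `sh` itself. The sketch below (assuming Linux with `/proc` and an `sh` on `PATH`; the variable names are illustrative) runs the same interpreter binary with two different caller-chosen programs and has each child report its own executable path.

```shell
#!/bin/sh
# Snippet each child runs to report its own executable. readlink runs inside
# a command substitution so that $$ still names the child shell itself.
report='exe=$(readlink "/proc/$$/exe"); echo "$exe"'

# Same interpreter binary, two different caller-supplied programs.
benign=$(sh -c "$report")
other=$(sh -c "echo 'caller-chosen code ran' >&2; $report")

# An exe-path check sees the same identity for both runs.
if [ "$benign" = "$other" ]; then
    echo "same exe for different code: $benign"
fi
```

Both children resolve to the same path, so a check keyed on that path cannot tell the two programs apart.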

For example, claude is a Bun single-file executable. A sandbox policy allowlisting only claude to reach api.anthropic.com is bypassed with:

  printf '%s\n' "process.stderr.write('INJECTED\n')" > /tmp/e.js
  BUN_OPTIONS="--preload /tmp/e.js" claude --version
  # => INJECTED
  # => 2.1.146 (Claude Code)

Arbitrary JS runs inside the claude process before the app starts — with full access to credentials and sockets — while /proc/<pid>/exe still shows claude. The binary check passes; the exfiltration is permitted.

It would absolutely be possible to craft an execution environment for an agent that closes some of these loopholes, but my opinion is that anyone doing that is already in the business of minimizing what code goes into the container, at which point they are actually doing something better: restricting what binaries are present at all.
That said, doing anything truly strong here gets into mechanisms like trusted_for.
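As one example of closing a single loophole, a launcher could scrub the environment before starting the agent. The sketch below is illustrative only: `run_scrubbed` is a hypothetical helper, and it addresses environment-variable injection alone, not config files, writable library paths, or the interpreter's other entry points.

```shell
#!/bin/sh
# run_scrubbed CMD [ARGS...]: run CMD with an empty environment except PATH,
# so variables such as LD_PRELOAD or BUN_OPTIONS are not inherited.
run_scrubbed() {
    env -i PATH="$PATH" "$@"
}

# Simulate an attacker-controlled variable in the parent environment.
BUN_OPTIONS="--preload /tmp/e.js"
export BUN_OPTIONS

leaked=$(sh -c 'echo "${BUN_OPTIONS:-unset}"')
scrubbed=$(run_scrubbed sh -c 'echo "${BUN_OPTIONS:-unset}"')

echo "inherited: $leaked"
echo "scrubbed:  $scrubbed"
```

This only helps when the launcher itself sits outside the agent's control, which is the same trust boundary the paragraph above is describing.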

With this change, policy is enforced against whatever code runs within the sandbox, the same way that many other network enforcement tools work. We do not claim that we can reliably drill down to the binary-name level.

Assisted-by: OpenCode (claude-sonnet-4-6@default)

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@copy-pr-bot (Bot) commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cgwalters cgwalters marked this pull request as draft May 21, 2026 13:45
Binary identity enforcement via `/proc/<pid>/exe` is nearly
impossible to make reliable. There is a reason that security
systems like SELinux assign security "domains" to a process
that are preserved across `execve()`.

One of the biggest problems is `LD_PRELOAD` and `LD_LIBRARY_PATH`:
the glibc dynamic linker will easily load arbitrary shared
libraries into the address space even for "fixed purpose"
binaries. And most uses of agents will have *some* writable
directory.

But a much bigger problem is that many binaries (including
`claude`) effectively include the ability to execute
arbitrary code inside that process - they are interpreters.

For example, `claude` is a Bun single-file executable.
A sandbox policy allowlisting only `claude` to reach
api.anthropic.com is bypassed with:

  printf '%s\n' "process.stderr.write('INJECTED\n')" > /tmp/e.js
  BUN_OPTIONS="--preload /tmp/e.js" claude --version
  # => INJECTED
  # => 2.1.146 (Claude Code)

Arbitrary JS runs inside the claude process before the app
starts — with full access to credentials and sockets — while
`/proc/<pid>/exe` still shows `claude`. The binary check
passes; the exfiltration is permitted.

It would absolutely be possible to try to craft an execution
environment for an agent that closed some of these loopholes,
but my opinion is that anyone doing that is already in
the business of minimizing what code goes into the container,
at which point they are actually doing something better:
restricting what binaries *are present at all*.
That said, to do anything truly strong here gets into
things like [trusted_for](https://lwn.net/Articles/832959/)
etc.

Policy evaluation is now just enforced based on code running
within the sandbox - the same way that many other network
enforcement tools work. We do not claim that we can reliably
drill down into the binary-name level.

Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters cgwalters force-pushed the drop-binary-allowlisting branch from 02de554 to 66b3bbb Compare May 21, 2026 13:59
@cgwalters cgwalters marked this pull request as ready for review May 21, 2026 14:00
@derekwaynecarr (Collaborator) commented

Before dropping this goal, we will meet across the contributor community to see where things stand once the proxy / sandbox boundary items under discussion start to land.

related: #981

@cgwalters (Contributor, Author) commented

Sounds good to me! For the record, I had this commit as part of some other WIP changes in the direction of #981, and it was my mistake not to coordinate it with that work. Apologies.

@ericcurtin (Contributor) commented May 21, 2026

@derekwaynecarr when is that meeting?

I do agree with @cgwalters that the allowlisting is janky (in the current implementation) and that an alternative solution is needed. It will be an interesting conversation.

We also have to keep in mind SELinux alternatives such as AppArmor.

@derekwaynecarr (Collaborator) commented

@ericcurtin we are trying to get that forum set up now and will try to publish details soon. We look forward to your participation and input.
