
feat(blobmanager): add managed CAS backend via S3 Access Points#3121

Open
jiparis wants to merge 8 commits into
chainloop-dev:mainfrom
jiparis:jiparis/managed-cas-s3-access-points

Conversation

@jiparis
Member

@jiparis jiparis commented May 15, 2026

Summary

  • Introduces a new AWS-S3-ACCESS-POINT CAS backend that targets a single shared bucket via per-tenant S3 Access Points. Each request mints a scoped session via sts:AssumeRole with a session policy and RoleSessionName derived from the authenticated requesting org carried in ctx.
  • Adds an optional blob_backends.s3_access_point config block (base_role_arn, region, session_duration, dev_mode_use_ambient_credentials) to both controlplane and artifact-cas. When absent the provider stays unregistered, so existing on-prem deployments are unaffected.
  • Carries the requesting org through the CAS robotaccount JWT (org-id claim) so artifact-cas can enrich its context via s3accesspoint.WithRequestingOrg before resolving the backend. Other providers ignore the key.
  • For Managed=true rows, redacts AWS implementation details from any wire output: the AP ARN (Location) becomes "managed by Chainloop" and the provider ID (Provider) becomes "Chainloop" in both API responses and audit-event payloads. The DB and biz layer keep the real values.

AI Assistance

This change was developed with Claude Code; per-commit Assisted-by: trailers record the specific commits.

Closes #3114

jiparis added 3 commits May 15, 2026 13:51
Introduce a new `AWS-S3-ACCESS-POINT` CAS backend that targets a single
shared bucket via per-tenant S3 Access Points. Each upload/download
mints scoped temporary credentials via `sts:AssumeRole` with a session
policy narrowed to the tenant's AP ARN and key prefix, and a session
name derived from the authenticated requesting org carried in
`ctx` (`s3accesspoint.WithRequestingOrg`).

Both upstream binaries pick up a new optional `blob_backends.s3_access_point`
config block (`base_role_arn`, `region`, `session_duration`); when the
block is absent the provider stays unregistered and behaviour is
identical to before. The pod's ambient AWS identity (IRSA / instance
profile / env vars) is used to call STS — no static credentials live
in config.

Per-tenant data (AP ARN, region override, key prefix) is stored as a
JSON blob in the secrets manager and read via `FromCredentials`, so
the existing `backend.Provider` interface is unchanged.

Add `OrgID` to the CAS robotaccount JWT claims so artifact-cas can
enrich its context with the requesting org before invoking the
backend; existing providers ignore the key.
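A minimal sketch of the context enrichment described above, assuming a plain `context.WithValue` key. `WithRequestingOrg` is named in the commit message; `RequestingOrgFrom`, `ErrMissingRequestingOrg`, and `orgAfterEnrichment` are hypothetical names used only for illustration, and the real package may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// requestingOrgKey is an unexported key type, so other packages cannot
// collide with or overwrite the value.
type requestingOrgKey struct{}

// ErrMissingRequestingOrg (hypothetical) signals that no org was attached.
var ErrMissingRequestingOrg = errors.New("requesting org missing from context")

// WithRequestingOrg returns a child context carrying the authenticated org ID.
func WithRequestingOrg(ctx context.Context, orgID string) context.Context {
	return context.WithValue(ctx, requestingOrgKey{}, orgID)
}

// RequestingOrgFrom (hypothetical) reads the org ID back and fails closed
// when it is absent or empty.
func RequestingOrgFrom(ctx context.Context) (string, error) {
	orgID, ok := ctx.Value(requestingOrgKey{}).(string)
	if !ok || orgID == "" {
		return "", ErrMissingRequestingOrg
	}
	return orgID, nil
}

// orgAfterEnrichment shows the round trip a service might perform per request:
// enrich the context from the JWT claim, then let the provider read it back.
func orgAfterEnrichment(orgIDFromClaims string) (string, error) {
	ctx := WithRequestingOrg(context.Background(), orgIDFromClaims)
	return RequestingOrgFrom(ctx)
}

func main() {
	orgID, err := orgAfterEnrichment("org-123")
	fmt.Println(orgID, err)

	// A provider that never saw WithRequestingOrg gets an error, not a default.
	_, err = RequestingOrgFrom(context.Background())
	fmt.Println(err)
}
```

Providers that do not know the key never look it up, which is consistent with the note that existing providers ignore it.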

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
… dev mode

Two related refinements to the AWS-S3-ACCESS-POINT provider.

1. The per-tenant key prefix is now derived at request time from the
   authenticated requesting org carried in ctx via WithRequestingOrg,
   rather than read from a `KeyPrefix` field in the secrets-manager
   blob. The prefix and the AssumeRole `RoleSessionName` now share
   their single source of truth, so a tampered Credentials blob can no
   longer reroute a tenant's writes into another tenant's namespace.
   The Credentials struct shrinks to {AccessPointARN, Region}. The
   session policy and the bucket-level key both use `<orgUUID>` as the
   prefix; the AP resource policy's Resource ARN must be
   `${apARN}/object/<orgUUID>/*` to match.
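The prefix rule in item 1 can be sketched as follows. This is an illustration only: the `${apARN}/object/<orgUUID>/*` resource shape comes from the commit message, while the function names, the allowed actions (`s3:GetObject`, `s3:PutObject`), the object key layout, and the example ARN are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policy model the minimal IAM policy JSON used in this sketch.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// objectKey (hypothetical) places every blob under the requesting org's prefix.
// The "<orgUUID>/<digest>" layout is an assumption.
func objectKey(orgUUID, digest string) string {
	return orgUUID + "/" + digest
}

// sessionPolicyResource mirrors the `${apARN}/object/<orgUUID>/*` shape.
func sessionPolicyResource(apARN, orgUUID string) string {
	return apARN + "/object/" + orgUUID + "/*"
}

// sessionPolicy (hypothetical) renders a session policy narrowed to one org.
// Both the policy and the key derive from the same orgUUID argument, so a
// tampered credentials blob cannot change the prefix.
func sessionPolicy(apARN, orgUUID string) (string, error) {
	p := policy{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:   "Allow",
			Action:   []string{"s3:GetObject", "s3:PutObject"}, // assumed action set
			Resource: sessionPolicyResource(apARN, orgUUID),
		}},
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder ARN and org ID, not values from the PR.
	apARN := "arn:aws:s3:us-east-1:123456789012:accesspoint/tenant-ap"
	doc, err := sessionPolicy(apARN, "org-uuid")
	fmt.Println(doc, err)
	fmt.Println(objectKey("org-uuid", "abc123"))
}
```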

2. Add a `dev_mode_use_ambient_credentials` Config flag (proto +
   wire-plumbed in both binaries) that bypasses `sts:AssumeRole` and
   routes S3 calls through whatever ambient AWS identity the SDK's
   default credential chain produced. Local dev no longer requires an
   IAM role + trust policy setup. The missing-org fail-closed check
   still fires in dev mode so callers that forget WithRequestingOrg
   surface the same bug locally that they would in production. A loud
   warning is logged at startup. DEV ONLY — never enable in
   multi-tenant deployments.
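The ordering described in item 2 (the missing-org check fires before the dev-mode bypass) might look like the sketch below. The `Config` field names follow the flag names in the commit message; `selectCredentialMode` and the mode constants are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds the two settings this sketch needs.
type Config struct {
	BaseRoleARN                  string
	DevModeUseAmbientCredentials bool
}

type credentialMode int

const (
	modeNone       credentialMode = iota // returned alongside an error
	modeAssumeRole                       // scoped session via sts:AssumeRole
	modeAmbient                          // SDK default credential chain, dev only
)

var errMissingOrg = errors.New("requesting org missing; refusing to build S3 client")

// selectCredentialMode (hypothetical) decides how S3 credentials are obtained.
func selectCredentialMode(cfg Config, requestingOrg string) (credentialMode, error) {
	// Fail closed first: this check runs regardless of dev mode, so a caller
	// that forgets WithRequestingOrg sees the same error locally.
	if requestingOrg == "" {
		return modeNone, errMissingOrg
	}
	if cfg.DevModeUseAmbientCredentials {
		return modeAmbient, nil
	}
	return modeAssumeRole, nil
}

func main() {
	dev := Config{DevModeUseAmbientCredentials: true}
	mode, err := selectCredentialMode(dev, "")
	fmt.Println(mode, err)
	mode, err = selectCredentialMode(dev, "org-123")
	fmt.Println(mode, err)
}
```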

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
…wire output

For Managed=true CAS backends, replace Location with "managed by
Chainloop" and Provider with "Chainloop" everywhere the controlplane
emits a CASBackend outside its trust boundary:

* API responses (bizCASBackendToPb), so `chainloop cas-backend ls`
  no longer prints the AWS account ID, region, or AP name.
* Audit-log events on the NATS bus (CASBackendCreated,
  CASBackendUpdated, CASBackendDeleted, CASBackendPermanentDeleted,
  CASBackendStatusChanged), so downstream consumers can't surface the
  same details to tenants either.

The DB and biz layer continue to carry the real ARN and provider ID
unchanged, so PerformValidation, the platform reconciler, and any
forensic join by CASBackendID still work. Two helpers
(displayLocation, displayProvider) keep the sanitization rule in one
place.
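A rough sketch of the two helpers, assuming they take the backend and return the string to emit on the wire. The helper names and the replacement strings come from the commit message; the `CASBackend` struct shape and the function signatures are assumptions.

```go
package main

import "fmt"

// CASBackend is a reduced stand-in for the biz-layer type (assumed shape).
type CASBackend struct {
	Location string
	Provider string
	Managed  bool
}

const (
	managedLocation = "managed by Chainloop"
	managedProvider = "Chainloop"
)

// displayLocation hides the AP ARN for managed rows and passes others through.
func displayLocation(b CASBackend) string {
	if b.Managed {
		return managedLocation
	}
	return b.Location
}

// displayProvider hides the provider ID for managed rows.
func displayProvider(b CASBackend) string {
	if b.Managed {
		return managedProvider
	}
	return b.Provider
}

func main() {
	// Placeholder ARN, not a value from the PR.
	managed := CASBackend{
		Location: "arn:aws:s3:us-east-1:123456789012:accesspoint/tenant-ap",
		Provider: "AWS-S3-ACCESS-POINT",
		Managed:  true,
	}
	fmt.Println(displayLocation(managed), "/", displayProvider(managed))
}
```

The struct itself is never mutated, which matches the statement that the DB and biz layer keep the real values.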

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
@chainloop-platform
Contributor

chainloop-platform Bot commented May 15, 2026

AI Session Analysis

| Avg score | Sessions | Failing policies | Attribution | Files | Lines | Total Duration |
|---|---|---|---|---|---|---|
| 🟢 87% | 1 | ⚠️ 2 | 63% AI / 37% Human | 33 | +2431 / -524 | 97h0m53s |

🟢 87% — 63% AI — ⚠️ 2 policies failing

May 14, 2026 09:34 UTC · 97h0m53s · $238.06 · 7.6k in / 626.1k out · claude-code 2.1.139 (claude-opus-4-7)

Change Summary

  • Adds a new S3 Access Point CAS backend provider and supporting package.
  • Wires the provider into the business logic, protobuf definitions, and dependency injection.
  • Adds comprehensive unit tests for the provider, backend, loader, and controlplane service.
  • Redacts ARN from CLI output and audit logs via display helper functions.
  • Fixes a P1 security regression where OrgID was sourced from request input instead of authenticated session context.
  • Adds a dev-mode ambient credentials flag for local development.
  • Guards the new backend type at the API layer.

AI Session Overall Score

🟢 87% — Full S3 Access Point provider delivered with strong tests, planning, and self-corrected security fix.

AI Session Analysis Breakdown

🟢 92% · verification

🟢 New test files added for s3accesspoint provider, backend, loader, and controlplane; tests run 21 times, all passed. · High Impact

🟡 Some test runs used -tail flags that could mask earlier failures in verbose output. · Low Severity

🟢 88% · context-and-planning

🟢 Before any edits, AI produced a comprehensive multi-section plan surfacing provider interface tradeoffs and security concerns for user input. · High Impact

🟡 Initial user prompt was a bare URL with no explicit constraints or scope boundaries stated inline. · Low Severity

🟢 88% · user-trust-signal

🟡 User interrupted plan mode three times to redirect; each was a clarification, not a correction. · Low Severity

🟢 87% · solution-quality

🟢 AI self-identified a P1 security regression and fixed all call sites across cascredential.go, attestation.go, and casredirect.go. · High Impact

🟠 DevModeUseAmbientCredentials bypasses STS AssumeRole; could be dangerous if reachable in production configs. · Medium Severity

💡 Confirm the flag cannot be set in production configs; consider a build-tag guard to make that structurally impossible.

🟢 85% · scope-discipline

No notes.

🟢 82% · alignment

🟠 AI shipped implementation with a P1 security bug: OrgID sourced from request input instead of authenticated session. · Medium Severity

💡 Verify auth-sensitive claims are always sourced from the authenticated session context, not user-controlled input, before committing.


File Attribution

████████████░░░░░░░░ 63% AI / 37% Human

| Status | Attribution | File | Lines |
|---|---|---|---|
| modified | human | app/controlplane/internal/conf/controlplane/config/v1/conf.pb.go | +381 / -221 |
| modified | ai | pkg/blobmanager/s3accesspoint/backend.go | +401 / -20 |
| modified | human | app/artifact-cas/internal/conf/conf.pb.go | +256 / -114 |
| modified | ai | pkg/blobmanager/s3accesspoint/provider.go | +291 / -21 |
| modified | ai | pkg/blobmanager/s3accesspoint/backend_test.go | +234 / -19 |
| modified | ai | pkg/blobmanager/s3accesspoint/provider_test.go | +218 / -17 |
| modified | ai | app/controlplane/internal/service/casbackend_test.go | +95 / -0 |
| modified | ai | app/controlplane/internal/conf/controlplane/config/v1/conf.proto | +61 / -11 |
| created | ai | pkg/blobmanager/loader/loader_test.go | +70 / -0 |
| modified | ai | app/controlplane/pkg/biz/casbackend.go | +58 / -11 |
| modified | ai | pkg/blobmanager/loader/loader.go | +47 / -4 |
| modified | ai | app/controlplane/internal/service/casbackend.go | +36 / -12 |
| modified | human | app/artifact-cas/cmd/wire_gen.go | +34 / -11 |
| modified | human | app/controlplane/cmd/wire_gen.go | +34 / -11 |
| modified | ai | app/controlplane/cmd/wire.go | +32 / -10 |
| modified | ai | app/artifact-cas/cmd/wire.go | +32 / -9 |
| modified | ai | app/artifact-cas/internal/conf/conf.proto | +29 / -8 |
| modified | ai | app/artifact-cas/internal/service/service.go | +28 / -1 |
| modified | ai | app/artifact-cas/configs/config.devel.yaml | +18 / -4 |
| modified | ai | app/controlplane/configs/config.devel.yaml | +17 / -1 |
| modified | ai | internal/robotaccount/cas/robotaccount.go | +15 / -2 |
| modified | ai | app/controlplane/internal/service/attestation.go | +8 / -2 |
| modified | ai | app/controlplane/internal/service/cascredential.go | +7 / -2 |
| modified | ai | app/controlplane/internal/service/casredirect.go | +7 / -2 |
| modified | ai | app/controlplane/pkg/biz/cascredentials.go | +5 / -1 |

…and 8 more file(s).


Policies (4, 2 failing)

Status Policy Material Messages
✅ Passed ai-config-ai-agents-allowed ai-coding-session-234a03 -
⚠️ Failed ai-config-no-dangerous-commands ai-coding-session-234a03 Forbidden bash pattern /git[^|]push[^|]--force/ matched command: git push --force-with-lease origin jiparis/managed-cas-s3-access-points 2>&1 | tail -8
⚠️ Failed ai-config-no-secrets ai-coding-session-234a03
  • Potential secret (Quoted API key/password) found in session content [turn=1567, source=tool_result, line=39, value=secret: ...bdK"]
  • Potential secret (Quoted API key/password) found in session content [turn=662, source=tool_result, line=78, value=secret: ...mV0"]
✅ Passed ai-config-mcp-servers-allowed ai-coding-session-234a03 -

Powered by Chainloop and Chainloop Trace

@kusari-inspector

kusari-inspector Bot commented May 15, 2026

Kusari Inspector

Kusari Analysis Results:

Proceed with these changes

✅ No Flagged Issues Detected
All values appear to be within acceptable risk parameters.

No pinned version dependency changes, code issues or exposed secrets detected!

Note

View full detailed analysis result for more information on the output and the checks that were run.


Commit: cfa4ff4, performed at: 2026-05-15T19:34:39Z



@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 34 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/controlplane/internal/conf/controlplane/config/v1/conf.proto">

<violation number="1" location="app/controlplane/internal/conf/controlplane/config/v1/conf.proto:152">
P2: Enforce `base_role_arn` when dev mode is disabled; the current schema allows invalid production config that will fail only at runtime.</violation>
</file>

<file name="app/controlplane/internal/service/cascredential.go">

<violation number="1" location="app/controlplane/internal/service/cascredential.go:152">
P1: Use the authenticated requesting org when minting CAS credentials; deriving `OrgID` from `backend.OrganizationID` can incorrectly scope managed S3 access-point sessions to backend ownership instead of caller identity.</violation>
</file>


Comment thread app/controlplane/internal/service/cascredential.go Outdated
Comment thread app/controlplane/internal/conf/controlplane/config/v1/conf.proto
Two follow-ups from the PR review on chainloop-dev#3121:

* The CAS JWT minted by cascredential.go, attestation.go and
  casredirect.go now embeds OrgID from the authenticated caller
  (entities.CurrentOrg / robotAccount.OrgID) instead of
  backend.OrganizationID. For managed S3 Access Point backends this
  OrgID drives the AssumeRole session name and the AP-policy
  aws:userid match; deriving it from the resolved row would weaken
  the cross-tenant guarantee if a future bug ever let a caller
  resolve a backend they don't own.

* The S3AccessPoint proto message now carries a buf.validate CEL
  constraint that requires base_role_arn when
  dev_mode_use_ambient_credentials is false, surfacing the
  misconfiguration at config-load time rather than at first upload.
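The second follow-up is implemented as a `buf.validate` CEL constraint on the proto message. As an illustration only, the same rule expressed in plain Go could look like the sketch below; the struct, the `Validate` method, and the error text are hypothetical and are not the generated validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// S3AccessPointConfig is a hand-written stand-in for the generated proto type.
type S3AccessPointConfig struct {
	BaseRoleARN                  string
	DevModeUseAmbientCredentials bool
}

var errBaseRoleARNRequired = errors.New(
	"base_role_arn is required unless dev_mode_use_ambient_credentials is true")

// Validate rejects a production config that has no role to assume, so the
// mistake surfaces at config-load time instead of at the first upload.
func (c S3AccessPointConfig) Validate() error {
	if !c.DevModeUseAmbientCredentials && c.BaseRoleARN == "" {
		return errBaseRoleARNRequired
	}
	return nil
}

func main() {
	fmt.Println(S3AccessPointConfig{}.Validate())
	fmt.Println(S3AccessPointConfig{DevModeUseAmbientCredentials: true}.Validate())
	// Placeholder role ARN, not a value from the PR.
	fmt.Println(S3AccessPointConfig{BaseRoleARN: "arn:aws:iam::123456789012:role/base"}.Validate())
}
```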

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
@kusari-inspector

Kusari PR Analysis rerun based on - cfa4ff4 performed at: 2026-05-15T19:34:39Z - link to updated analysis

A `go mod tidy` while developing the s3accesspoint provider regressed
several deps:

* go-git/v6 downgraded alpha.3 -> alpha.2 (CVE-2026-45022, commit
  signature spoofing)
* go-billy/v5 downgraded 5.9.0 -> 5.8.0 (CVE-2026-44973 path
  traversal, CVE-2026-44740 symlink-loop DoS)
* go-billy/v6 swapped to an older snapshot
* go-git/v5 downgraded 5.19.0 -> 5.18.0
* unrelated olekukonko/* and golang.org/x/* version churn that broke
  CI's go-module tidy check

Restoring go.mod and go.sum to match origin/main resolves both the
Kusari CVE alerts and the CI failures. aws-sdk-go-v2/service/sts
(needed by the s3accesspoint provider) is already an indirect at
v1.41.9 on main, so no go.mod change is required for the new code
to build.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
@jiparis jiparis force-pushed the jiparis/managed-cas-s3-access-points branch from cfa4ff4 to 7457ed2 Compare May 15, 2026 19:36
// independently here so the artifact-cas binary doesn't depend on the
// controlplane's protobuf package. Keep field numbering in sync across
// both definitions.
message BlobBackends {
Member


I'd call it ManagedCASBackends

// caller can reuse it for the subsequent Upload/Download calls. Callers
// MUST use the returned context, not the original one.
func (s *commonService) loadBackendForClaims(ctx context.Context, claims *casJWT.Claims) (context.Context, backend.UploaderDownloader, error) {
ctx = s3accesspoint.WithRequestingOrg(ctx, claims.OrgID)
Member


This should probably be a middleware instead. It doesn't seem explicitly related to loading the backend, no?

@migmartri
Member

@jiparis how does this work? Does it automatically create a backend DB entry when the managed setup is configured in the instance, similar to what we do with inline?

jiparis added 3 commits May 18, 2026 12:23
The proto message and its YAML field describe configuration for
*managed* CAS backends (provisioned and operated by Chainloop), not
generic blob storage. Rename:

* proto message `BlobBackends` -> `ManagedCASBackends`
* proto field `blob_backends` -> `managed_cas_backends` in both
  controlplane and artifact-cas Bootstrap messages
* matching Go field on the regenerated `*conf.Bootstrap`
  (`ManagedCasBackends`) and references in wire.go / wire_gen.go
* commented-out example block in both `config.devel.yaml`

No behavioural change; the only deployments that read this block today
are local-dev configs (gitignored config.local.yaml) which have been
updated separately.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
CASBackendService.Create previously accepted any provider ID present
in the loader's provider map, including AWS-S3-ACCESS-POINT. A
sufficiently determined user could craft a Create request that
half-provisioned a managed row pointing at an AP ARN they don't own,
bypassing the platform reconciler's trust boundary.

Add an explicit isManagedOnlyProvider() guard at the front of Create
so the public RPC fails fast with `managed CAS backends cannot be
created via this API`. The platform reconciler still creates managed
rows by calling biz.CASBackendUseCase.Create directly, which is
unaffected. Update/SoftDelete are already guarded against managed
rows in the biz layer.
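A minimal sketch of the guard, assuming it is a simple provider-ID comparison at the top of the public RPC. `isManagedOnlyProvider`, the provider ID, and the error message come from the commit message; `createViaPublicAPI` and the constant name are hypothetical stand-ins for the service method.

```go
package main

import (
	"errors"
	"fmt"
)

// s3AccessPointProviderID is the provider ID named in this PR.
const s3AccessPointProviderID = "AWS-S3-ACCESS-POINT"

var errManagedOnly = errors.New("managed CAS backends cannot be created via this API")

// isManagedOnlyProvider reports whether a provider may only be provisioned
// by the platform reconciler and never through the public Create RPC.
func isManagedOnlyProvider(providerID string) bool {
	return providerID == s3AccessPointProviderID
}

// createViaPublicAPI (hypothetical) stands in for the public Create handler.
// The guard runs before any other work so the request fails fast.
func createViaPublicAPI(providerID string) error {
	if isManagedOnlyProvider(providerID) {
		return errManagedOnly
	}
	// The real handler would continue into the biz layer here.
	return nil
}

func main() {
	fmt.Println(createViaPublicAPI("AWS-S3-ACCESS-POINT"))
	fmt.Println(createViaPublicAPI("AWS-S3"))
}
```

Because the check lives in the service layer, a reconciler calling the biz use case directly is not affected, which matches the behaviour described above.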

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
// backends. New managed providers append a nested message rather
// than adding top-level fields to Bootstrap, so the surface stays
// organised as more backends are added.
message ManagedCASBackends {
Member


You might want to add `bool enabled = 2;`.

// block. Defined independently here so the artifact-cas binary doesn't
// depend on the controlplane's protobuf package. Keep field numbering
// in sync across both definitions.
message ManagedCASBackends {
Member


Why do we need to configure this in CAS?



Development

Successfully merging this pull request may close these issues.

feature: support S3 Access points as CAS storage backend

2 participants