
OCPBUGS-83335: dont save CAPI secrets#10498

Open
patrickdillon wants to merge 1 commit into openshift:main from patrickdillon:capi-secrets

@patrickdillon
Contributor

@patrickdillon patrickdillon commented Apr 13, 2026

The installer saves all CAPI manifests to .clusterapi_output for debugging purposes. On some platforms, these may include secrets, which are an unnecessary security risk because they don't help with debugging. Our CI scrubs these, but users shouldn't need to handle it.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Secrets are no longer written to manifest artifact files during cluster operations, preventing sensitive data from being stored in plaintext files.

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 13, 2026
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-83335, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@coderabbitai

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9ae6815f-76a8-4e59-ad2c-33acb637dcd6

📥 Commits

Reviewing files that changed from the base of the PR and between 075b809 and f29204d.

📒 Files selected for processing (1)
  • pkg/infrastructure/clusterapi/clusterapi.go

Walkthrough

The collectManifests function in the cluster API infrastructure module now conditionally skips YAML marshaling and artifact file creation for Secret objects, whereas previously all manifest objects were serialized and written to disk regardless of kind.

Changes

Cohort / File(s) | Summary
Secret filtering in manifest collection (pkg/infrastructure/clusterapi/clusterapi.go) | Added logic to detect GroupVersionKind of each manifest and skip YAML marshaling and artifact file creation when the kind is "Secret".
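The filtering described in the walkthrough can be sketched roughly as follows. This is a minimal Python illustration, not the installer's Go code: the `collect_manifests` name and the dict-based manifest shape are hypothetical, and the `<Kind>-<namespace>-<name>.yaml` file naming is an assumption inferred from the artifact listings reported in this thread.

```python
import logging
from pathlib import Path

logger = logging.getLogger("capi-sketch")


def collect_manifests(manifests, out_dir):
    """Write each manifest to out_dir, skipping objects whose kind is Secret.

    `manifests` is a list of dicts with hypothetical keys: kind, namespace,
    name, and body (the already-serialized YAML text).
    Returns the list of file names that were written, in input order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for m in manifests:
        if m["kind"] == "Secret":
            # Secrets do not help with debugging, so they are never persisted.
            logger.debug("Skipping secret manifest %s/%s", m["namespace"], m["name"])
            continue
        # Assumed naming scheme: <Kind>-<namespace>-<name>.yaml
        file_name = f'{m["kind"]}-{m["namespace"]}-{m["name"]}.yaml'
        (out / file_name).write_text(m["body"])
        written.append(file_name)
    return written
```

Filtering before serialization, rather than deleting files afterwards, means the secret data never reaches disk at all.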

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly identifies the specific change: preventing CAPI secrets from being saved, which directly matches the main objective and code changes of skipping Secret objects in manifest collection.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Stable And Deterministic Test Names | ✅ Passed | PR only modifies infrastructure code in clusterapi.go to skip secret artifacts; no Ginkgo test files introduced or modified.
Test Structure And Quality | ✅ Passed | This pull request does not include any Ginkgo test code and therefore this check is not applicable.
Microshift Test Compatibility | ✅ Passed | This PR modifies infrastructure code in clusterapi.go, not adding any new Ginkgo e2e tests that need MicroShift compatibility verification.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add any new Ginkgo e2e tests. Changes are limited to production infrastructure code modifying the collectManifests function to skip serializing Secrets for security purposes.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR filters Secrets from debug artifacts; no scheduling constraints, deployment manifests, or topology-aware changes introduced.
Ote Binary Stdout Contract | ✅ Passed | Code changes only filter Secret manifests and do not introduce stdout writes. This is installer infrastructure code, not an OTE binary.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies only infrastructure code by adding a conditional check to skip saving Secret objects, with no new e2e tests added.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@openshift-ci openshift-ci bot requested review from andfasano and tthvo April 13, 2026 22:14
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-83335, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Member

@tthvo tthvo left a comment


/lgtm
/approve

I guess this is just part of the full solution to avoid log-bundle being removed since gitleak analysis showed more "leaks"...

The PR achieved the intention in the bug though 👍 Tested locally with GCP, CAPI secrets are no longer written to disk.

$ cat .openshift_install.log | grep -i 'Skipping secret manifest'
time="2026-04-13T15:47:40-07:00" level=debug msg="Skipping secret manifest openshift-cluster-api-guests/thvo-pchqm-bootstrap"
time="2026-04-13T15:47:40-07:00" level=debug msg="Skipping secret manifest openshift-cluster-api-guests/thvo-pchqm-master"
time="2026-04-13T15:47:40-07:00" level=debug msg="Skipping secret manifest openshift-cluster-api-guests/thvo-pchqm-worker"
$ ls -la --time-style=+"%Y-%m-%d %H:%M" .clusterapi_output/ | awk '{print $6, $7, $8}'
2026-04-13 15:47 .
2026-04-13 15:41 ..
2026-04-13 15:47 Cluster-openshift-cluster-api-guests-thvo-pchqm.yaml
2026-04-13 15:41 envtest.kubeconfig
2026-04-13 15:41 etcd
2026-04-13 15:56 etcd.log
2026-04-13 15:47 GCPCluster-openshift-cluster-api-guests-thvo-pchqm.yaml
2026-04-13 15:47 GCPMachine-openshift-cluster-api-guests-thvo-pchqm-bootstrap.yaml
2026-04-13 15:47 GCPMachine-openshift-cluster-api-guests-thvo-pchqm-master-0.yaml
2026-04-13 15:47 GCPMachine-openshift-cluster-api-guests-thvo-pchqm-master-1.yaml
2026-04-13 15:47 GCPMachine-openshift-cluster-api-guests-thvo-pchqm-master-2.yaml
2026-04-13 15:51 kube-apiserver.log
2026-04-13 15:47 Machine-openshift-cluster-api-guests-thvo-pchqm-bootstrap.yaml
2026-04-13 15:47 Machine-openshift-cluster-api-guests-thvo-pchqm-master-0.yaml
2026-04-13 15:47 Machine-openshift-cluster-api-guests-thvo-pchqm-master-1.yaml
2026-04-13 15:47 Machine-openshift-cluster-api-guests-thvo-pchqm-master-2.yaml
2026-04-13 15:47 Namespace--openshift-cluster-api-guests.yaml
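A check like the manual listing above could also be scripted. The following is a small sketch, assuming that a persisted Secret manifest would be named `Secret-*.yaml` (the `find_secret_manifests` helper is hypothetical and not part of the installer):

```python
from pathlib import Path


def find_secret_manifests(output_dir):
    """Return sorted names of files in output_dir that look like Secret manifests.

    Assumes manifests are named <Kind>-<namespace>-<name>.yaml, so a Secret
    would show up as a file starting with "Secret-".
    """
    return sorted(
        p.name
        for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.startswith("Secret-") and p.suffix == ".yaml"
    )
```

An empty result for `.clusterapi_output` would match the listing shown here, where only Cluster, GCPCluster, GCPMachine, Machine, and Namespace manifests appear.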

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tthvo


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
@tthvo
Member

tthvo commented Apr 14, 2026

@coderabbitai Can you look into the PR's e2e-aws-ovn job and check:

  • Whether the install completes successfully
  • Whether the install's clusterapi artifacts contain any secret manifests
  • Whether the install log-bundle is downloadable (i.e. not removed due to sensitive information). I expect it to still be removed, so this is just to confirm.

That's all. Make me proud 😁

@coderabbitai

coderabbitai bot commented Apr 14, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR checks/statuses for PR `#10498`
gh pr checks 10498 --repo openshift/installer 2>&1 | head -80

Length of output: 3748


@tthvo: I've checked the current CI status for PR #10498, but unfortunately the ci/prow/e2e-aws-ovn job is still pending (in progress) — so I'm unable to verify the three items you asked about just yet:

Check | Status
Install completes successfully | ⏳ Job still running
Clusterapi artifacts contain no secret manifests | ⏳ Job still running
Log-bundle downloadable (not removed) | ⏳ Job still running

You can track the job here: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/10498/pull-ci-openshift-installer-main-e2e-aws-ovn/2043814984063913984

All other checks (gofmt, golint, govet, unit, images, etc.) are passing ✅. Once the e2e job completes, feel free to ping me again and I'll do a thorough check of the artifacts and log-bundle! 🙌

@tthvo
Member

tthvo commented Apr 14, 2026

@coderabbitai The step e2e-aws-ovn-ipi-install-install in e2e-aws-ovn has already completed. You can check the install artifacts as normal without waiting for the entire job to finish.

@coderabbitai

coderabbitai bot commented Apr 14, 2026


🧩 Analysis chain

Got it! Let me check the install step artifacts directly.


🏁 Script executed:

#!/bin/bash
# Check the overall job status and find the install step artifacts
JOB_BASE="https://storage.googleapis.com/test-platform-results/pr-logs/pull/openshift_installer/10498/pull-ci-openshift-installer-main-e2e-aws-ovn/2043814984063913984"

echo "=== Job finished.json ==="
curl -sfL "${JOB_BASE}/finished.json" 2>&1 || echo "Not available"

echo ""
echo "=== Artifacts directory listing ==="
curl -sfL "${JOB_BASE}/artifacts/" 2>&1 | head -100 || echo "Not available"

Length of output: 76


Let me try different URL patterns for the artifacts.


🏁 Script executed:

#!/bin/bash
JOB_ID="2043814984063913984"
JOB_NAME="pull-ci-openshift-installer-main-e2e-aws-ovn"

# Try alternate GCS bucket paths
echo "=== Attempt 1: origin-ci-test bucket ==="
curl -sfL "https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}/finished.json" 2>&1 || echo "Not found"

echo ""
echo "=== Attempt 2: Prow deck API for job details ==="
curl -sfL "https://prow.ci.openshift.org/prowjob?prowjob=${JOB_ID}" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); print(json.dumps(d.get('status',{}), indent=2))" 2>&1 || echo "Not available"

echo ""
echo "=== Attempt 3: Prow spyglass artifacts ==="
curl -sfL "https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}" 2>&1 | grep -oP 'gs://[^"]+' | head -20 || echo "Not available"

echo ""
echo "=== Attempt 4: Check GCS via storage API with listing ==="
curl -sfL "https://storage.googleapis.com/storage/v1/b/test-platform-results/o?prefix=pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}/&delimiter=/&maxResults=50" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); [print(i.get('name','')) for i in d.get('items',[])] or [print(p) for p in d.get('prefixes',[])]" 2>&1 || echo "Not available"

Length of output: 1414


🏁 Script executed:

#!/bin/bash
JOB_ID="2043814984063913984"
JOB_NAME="pull-ci-openshift-installer-main-e2e-aws-ovn"
BASE_PREFIX="pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}"

echo "=== prowjob.json status ==="
curl -sfL "https://storage.googleapis.com/test-platform-results/${BASE_PREFIX}/prowjob.json" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); s=d.get('status',{}); print('state:', s.get('state')); print('startTime:', s.get('startTime')); print('completionTime:', s.get('completionTime','N/A'))" 2>&1 || echo "Not available"

echo ""
echo "=== List artifacts/ top-level prefixes ==="
curl -sfL "https://storage.googleapis.com/storage/v1/b/test-platform-results/o?prefix=${BASE_PREFIX}/artifacts/&delimiter=/&maxResults=50" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); [print(p) for p in d.get('prefixes',[])]" 2>&1 || echo "Not available"

echo ""
echo "=== List e2e-aws-ovn step artifacts ==="
STEP_PREFIX="${BASE_PREFIX}/artifacts/e2e-aws-ovn/ipi-install-install"
curl -sfL "https://storage.googleapis.com/storage/v1/b/test-platform-results/o?prefix=${STEP_PREFIX}/&delimiter=/&maxResults=100" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); [print(i.get('name','')) for i in d.get('items',[])] or [print(p) for p in d.get('prefixes',[])]" 2>&1 || echo "Not available"

Length of output: 913
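One caveat about the inline listing one-liners above: `[print(...) for i in items] or [print(p) for p in prefixes]` prints the prefixes only when `items` is empty, because a non-empty list is truthy even if it holds only `None`. A sketch that reports both, assuming the response has the `items` and `prefixes` fields those scripts already read (the `list_entries` helper name is hypothetical):

```python
import json


def list_entries(listing_json):
    """Split a bucket listing response into object names and sub-prefixes.

    `listing_json` is the raw JSON text of a listing response. Both keys are
    optional, so missing ones are treated as empty lists.
    """
    data = json.loads(listing_json)
    names = [item.get("name", "") for item in data.get("items", [])]
    prefixes = list(data.get("prefixes", []))
    return names, prefixes
```

Returning both lists leaves it to the caller to decide what to print, which avoids silently hiding sub-directories when a level also contains files.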


🏁 Script executed:

#!/bin/bash
JOB_ID="2043814984063913984"
JOB_NAME="pull-ci-openshift-installer-main-e2e-aws-ovn"
BASE_PREFIX="pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}"
GCS_BASE="https://storage.googleapis.com/test-platform-results"
API_BASE="https://storage.googleapis.com/storage/v1/b/test-platform-results/o"

echo "=== Install step finished.json ==="
curl -sfL "${GCS_BASE}/${BASE_PREFIX}/artifacts/e2e-aws-ovn/ipi-install-install/finished.json" 2>&1

echo ""
echo "=== List all steps under e2e-aws-ovn/ ==="
curl -sfL "${API_BASE}?prefix=${BASE_PREFIX}/artifacts/e2e-aws-ovn/&delimiter=/&maxResults=100" 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); [print(p) for p in d.get('prefixes',[])]" 2>&1

echo ""
echo "=== Search for clusterapi artifacts ==="
curl -sfL "${API_BASE}?prefix=${BASE_PREFIX}/artifacts/e2e-aws-ovn/&maxResults=500" 2>&1 | python3 -c "
import sys,json
d=json.load(sys.stdin)
items = d.get('items',[])
for i in items:
    name = i.get('name','')
    if 'clusterapi' in name.lower() or 'secret' in name.lower() or 'log-bundle' in name.lower() or 'gather' in name.lower():
        print(name)
" 2>&1 || echo "None found"

echo ""
echo "=== Search for log-bundle in all artifacts ==="
curl -sfL "${API_BASE}?prefix=${BASE_PREFIX}/artifacts/&maxResults=1000" 2>&1 | python3 -c "
import sys,json
d=json.load(sys.stdin)
items = d.get('items',[])
for i in items:
    name = i.get('name','')
    if 'log-bundle' in name.lower() or 'clusterapi' in name.lower():
        print(name)
" 2>&1 || echo "None found"

Length of output: 10438


🏁 Script executed:

#!/bin/bash
JOB_ID="2043814984063913984"
JOB_NAME="pull-ci-openshift-installer-main-e2e-aws-ovn"
BASE_PREFIX="pr-logs/pull/openshift_installer/10498/${JOB_NAME}/${JOB_ID}"
GCS_BASE="https://storage.googleapis.com/test-platform-results"
API_BASE="https://storage.googleapis.com/storage/v1/b/test-platform-results/o"

LOG_BUNDLE_PATH="${BASE_PREFIX}/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/log-bundle-20260413232252.tar"

echo "=== Log-bundle GCS metadata ==="
curl -sfL "${API_BASE}/$(python3 -c "import urllib.parse; print(urllib.parse.quote('${LOG_BUNDLE_PATH}', safe=''))")" 2>&1 | python3 -c "
import sys,json
d=json.load(sys.stdin)
print('name:', d.get('name',''))
print('size:', d.get('size',''))
print('contentType:', d.get('contentType',''))
print('updated:', d.get('updated',''))
" 2>&1 || echo "Not available"

echo ""
echo "=== Log-bundle HTTP HEAD check ==="
curl -sI "${GCS_BASE}/${LOG_BUNDLE_PATH}" 2>&1 | head -10

echo ""
echo "=== Verify no Secret files in clusterapi output (double-check full listing) ==="
curl -sfL "${API_BASE}?prefix=${BASE_PREFIX}/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/clusterapi_output-1776123589/&maxResults=200" 2>&1 | python3 -c "
import sys,json
d=json.load(sys.stdin)
items = d.get('items',[])
print('Total files in clusterapi_output:', len(items))
for i in items:
    name = i.get('name','').split('/')[-1]
    print(' -', name)
secret_files = [i for i in items if 'secret' in i.get('name','').lower()]
print()
print('Secret files found:', len(secret_files))
" 2>&1

Length of output: 1804


@tthvo: Here are the results from the e2e-aws-ovn-ipi-install-install step — all three checks confirmed! 🎉


✅ 1. Install completes successfully

The install step's finished.json shows:

{"passed": true, "result": "SUCCESS"}

✅ 2. No secret manifests in clusterapi artifacts

The clusterapi_output-1776123589/ directory contains 14 files — zero of which are Secret-*.yaml:

File
AWSCluster-...-l9qrs.yaml
AWSClusterControllerIdentity--default.yaml
AWSMachine-...-bootstrap.yaml
AWSMachine-...-master-0/1/2.yaml
Cluster-...-l9qrs.yaml
Machine-...-bootstrap.yaml
Machine-...-master-0/1/2.yaml
Namespace--openshift-cluster-api-guests.yaml
etcd.log
kube-apiserver.log

The PR's fix is working correctly — secrets are excluded from the debug artifacts. 🔒


✅ 3. Log-bundle still scrubbed by CI (as expected)

The log-bundle-20260413232252.tar is technically present in GCS and returns HTTP 200, but it's only 76 bytes with contentType: text/plain — clearly a stub/placeholder left by CI scrubbing, not an actual tar archive. So the log-bundle content has indeed been removed due to sensitive information, confirming the expected CI behavior is unchanged.
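The stub-versus-archive judgment above rests on size and content type. A more direct check is to ask whether the downloaded file parses as a tar archive at all. A minimal sketch, assuming the file has already been downloaded locally (the `looks_scrubbed` helper name is hypothetical):

```python
import tarfile


def looks_scrubbed(path):
    """Return True if the file at `path` is not a readable tar archive.

    A small text placeholder left in place of a removed log-bundle fails
    this check, while a genuine (possibly compressed) tar archive passes.
    """
    return not tarfile.is_tarfile(path)
```

This only tells a real archive from a placeholder. It says nothing about why the bundle was replaced.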

@patrickdillon
Contributor Author

I guess this is just part of the full solution to avoid log-bundle being removed since gitleak analysis showed more "leaks"...

I missed the PR but discussed with @jcpowermac separately. Until now, I mistakenly thought the cause was the root cluster cloud creds. I didn't realize it was the local control plane.

@tthvo
Member

tthvo commented Apr 14, 2026

/test e2e-aws-ovn

@openshift-ci
Contributor

openshift-ci bot commented Apr 14, 2026

@patrickdillon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-aws-ovn | f29204d | link | true | /test e2e-aws-ovn


@patrickdillon
Contributor Author

/verified by e2e-aws-ovn

The previous run had (empty) secrets; this one does not.

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 14, 2026
@openshift-ci-robot
Contributor

@patrickdillon: This PR has been marked as verified by e2e-aws-ovn.


@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 075b809 and 2 for PR HEAD f29204d in total
