✨ e2e: isolate tests with per-scenario dynamic catalogs#2651
✨ e2e: isolate tests with per-scenario dynamic catalogs#2651joelanford wants to merge 7 commits intooperator-framework:mainfrom
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Reworks the E2E testing approach to eliminate shared, static test catalogs by generating and pushing per-scenario bundles/catalogs at runtime (with scenario-specific naming), enabling safer isolation and paving the way for parallel execution. It also removes the legacy “push static fixtures into a registry” tooling and relocates bundle/CSV testing helpers into a dedicated internal testing package.
Changes:
- Add per-scenario catalog/bundle builder (
test/internal/catalog) and in-cluster registry deploy helper (test/internal/registry) to support dynamic OCI image generation/push. - Convert Gherkin features and step implementations to use dynamic catalogs and
${SCENARIO_ID}/${PACKAGE:...}/${CATALOG:...}substitution. - Remove legacy static catalog/bundle fixtures and push tooling under
testdata/, and update Makefile targets accordingly.
Reviewed changes
Copilot reviewed 77 out of 79 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| testdata/push/push.go | Removed legacy image-building/pushing tool for static test fixtures. |
| testdata/push/go.sum | Removed legacy module deps for the deleted push tool. |
| testdata/push/go.mod | Removed legacy module for the deleted push tool. |
| testdata/push/README.md | Removed documentation for the deleted push tool. |
| testdata/images/catalogs/test-catalog/v2/configs/catalog.yaml | Removed static catalog fixture content. |
| testdata/images/catalogs/test-catalog/v2/configs/.indexignore | Removed static catalog fixture ignore file. |
| testdata/images/catalogs/test-catalog/v1/configs/catalog.yaml | Removed static catalog fixture content. |
| testdata/images/catalogs/test-catalog/v1/configs/.indexignore | Removed static catalog fixture ignore file. |
| testdata/images/bundles/test-operator/v1.2.0/metadata/properties.yaml | Removed static bundle fixture properties. |
| testdata/images/bundles/test-operator/v1.2.0/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/test-operator/v1.2.0/manifests/testoperator.networkpolicy.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.2.0/manifests/testoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/test-operator/v1.2.0/manifests/script.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.2.0/manifests/olm.operatorframework.com_olme2etest.yaml | Removed static bundle fixture CRD. |
| testdata/images/bundles/test-operator/v1.2.0/manifests/bundle.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.3/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/test-operator/v1.0.3/manifests/testoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/test-operator/v1.0.3/manifests/bundle.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.2/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/test-operator/v1.0.2/manifests/testoperator.networkpolicy.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.2/manifests/testoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/test-operator/v1.0.2/manifests/olm.operatorframework.com_olme2etest.yaml | Removed static bundle fixture CRD. |
| testdata/images/bundles/test-operator/v1.0.2/manifests/bundle.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.0/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/testoperator.networkpolicy.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/testoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/script.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/olm.operatorframework.com_olme2etest.yaml | Removed static bundle fixture CRD. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/dummy.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/bundle.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/single-namespace-operator/v1.0.0/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/single-namespace-operator/v1.0.0/manifests/singlenamespaceoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/single-namespace-operator/v1.0.0/manifests/olm.operatorframework.com_singlenamespaces.yaml | Removed static bundle fixture CRD. |
| testdata/images/bundles/own-namespace-operator/v1.0.0/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/own-namespace-operator/v1.0.0/manifests/ownnamespaceoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/images/bundles/own-namespace-operator/v1.0.0/manifests/olm.operatorframework.com_ownnamespaces.yaml | Removed static bundle fixture CRD. |
| testdata/images/bundles/large-crd-operator/v1.0.0/metadata/annotations.yaml | Removed static bundle fixture annotations. |
| testdata/images/bundles/large-crd-operator/v1.0.0/manifests/script.configmap.yaml | Removed static bundle fixture manifest. |
| testdata/images/bundles/large-crd-operator/v1.0.0/manifests/largecrdoperator.clusterserviceversion.yaml | Removed static bundle fixture CSV. |
| testdata/build-test-registry.sh | Removed legacy script that deployed registry + push job for static fixtures. |
| testdata/Dockerfile | Removed legacy image build that packaged static fixtures and push binary. |
| test/upgrade-e2e/features/operator-upgrade.feature | Updated upgrade E2E feature to use dynamic per-scenario catalogs/packages. |
| test/internal/registry/registry.go | Added Go helper to deploy an in-cluster TLS-enabled registry via SSA for tests. |
| test/internal/catalog/catalog_test.go | Added unit tests for bundle/catalog generation logic and parameterization. |
| test/internal/catalog/catalog.go | Added catalog builder that generates/pushes per-scenario bundle + catalog OCI images. |
| test/internal/catalog/bundle.go | Added bundle builder producing parameterized manifests (CRD/CSV/Deploy/etc). |
| test/helpers/helpers.go | Removed legacy E2E helper package (static catalog assumptions). |
| test/helpers/feature_gates.go | Removed legacy feature-gate detection helper package. |
| test/extension-developer-e2e/setup.sh | Switched setup to env-var driven config; adjusted build/push commands. |
| test/extension-developer-e2e/extension_developer_test.go | Added TestMain orchestration for registry + setup; updated catalog ref wiring. |
| test/e2e/steps/upgrade_steps.go | Updated/added upgrade steps for per-scenario catalogs and reconciliation checks. |
| test/e2e/steps/testdata/test-catalog-template.yaml | Removed static catalog template. |
| test/e2e/steps/testdata/extra-catalog-template.yaml | Removed static catalog template. |
| test/e2e/steps/hooks.go | Updated scenario context to track per-scenario catalogs/packages; cleanup changes. |
| test/e2e/features_test.go | Increased Godog scenario concurrency (parallel execution). |
| test/e2e/features/user-managed-fields.feature | Updated feature to use per-scenario names and dynamic catalogs. |
| test/e2e/features/update.feature | Updated feature to use per-scenario names and dynamic catalogs/catalog updates. |
| test/e2e/features/uninstall.feature | Updated feature to use per-scenario names and dynamic catalogs. |
| test/e2e/features/status.feature | Updated feature to use per-scenario names and dynamic catalogs. |
| test/e2e/features/recover.feature | Updated feature to use per-scenario names and dynamic catalogs; updated messages. |
| test/e2e/features/install.feature | Updated feature to use per-scenario names and dynamic catalogs; revised scenarios. |
| test/e2e/README.md | Updated E2E docs for new variable substitution and dynamic catalog architecture. |
| internal/testing/bundle/fs/bundlefs_test.go | Updated tests to new internal testing bundle/fs package path/name. |
| internal/testing/bundle/fs/bundlefs.go | Renamed package to fs (new location/namespace for bundle fs builder). |
| internal/testing/bundle/csv/builder_test.go | Updated tests to new internal testing bundle/csv package path/name. |
| internal/testing/bundle/csv/builder.go | Renamed package to csv (new location/namespace for CSV builder). |
| internal/operator-controller/rukpak/render/render_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/rukpak/render/registryv1/validators/validator_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/rukpak/render/registryv1/registryv1_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/rukpak/render/registryv1/generators/generators_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/rukpak/bundle/source/source_test.go | Updated imports to use new internal testing bundle/fs and bundle/csv builders. |
| internal/operator-controller/config/error_formatting_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/config/config_test.go | Updated imports to use new internal testing bundle/csv builder. |
| internal/operator-controller/applier/provider_test.go | Updated imports to use new internal testing bundle/fs and bundle/csv builders. |
| internal/operator-controller/applier/boxcutter_test.go | Updated imports to use new internal testing bundle/fs and bundle/csv builders. |
| docs/superpowers/specs/2026-04-13-e2e-test-isolation-design.md | Added design spec documenting the per-scenario dynamic catalog architecture. |
| Makefile | Removed legacy image-registry target; updated env exports and unit-test dirs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if err := wait.PollUntilContextTimeout(ctx, time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) { | ||
| if err := c.Get(ctx, deployKey, d); err != nil { | ||
| return false, nil | ||
| } |
There was a problem hiding this comment.
In the deployment availability wait loop, c.Get errors are swallowed (return false, nil). This can mask real failures (RBAC/connection issues) and only surface as a timeout. Consider returning the error unless it’s a NotFound, so the caller sees the real root cause quickly.
5077134 to
6597bfd
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2651 +/- ##
==========================================
- Coverage 68.92% 68.02% -0.90%
==========================================
Files 140 144 +4
Lines 9905 10563 +658
==========================================
+ Hits 6827 7186 +359
- Misses 2566 2857 +291
- Partials 512 520 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6597bfd to
9d85b1b
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 77 out of 79 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Have you been able to check that this setup will also work in the downstream environment? |
|
@dtfranz That's a good point. I have not checked that. I'll make a PR downstream as well to see if I broke anything there. |
| @@ -0,0 +1,103 @@ | |||
| # E2E Test Isolation: Per-Scenario Catalogs via Dynamic OCI Image Building | |||
There was a problem hiding this comment.
@joelanford should we get it merged?
It might be nice have the design of the tests but should this dir be the correct one/
I think it was generated by CLAUDE to address the need.
There was a problem hiding this comment.
Do we have a directory for this kind of thing? I do think we should merge this kind of doc, and I was definitely intending to have this exact discussion.
| ``` | ||
| Scenario starts | ||
| -> Generate parameterized bundle manifests (CRD names, deployments, etc. include scenario ID) | ||
| -> Build + push bundle OCI images to e2e registry via go-containerregistry |
There was a problem hiding this comment.
Overall, I’m okay with the proposed changes.
We should sync with @tmshort to confirm whether there’s any reason this might not work downstream. Additionally, I suggest holding off on merging this until we can coordinate an upstream/downstream sync. That way, we can have a dedicated PR downstream containing only this change, which will make it easier to isolate and address any issues if they arise. PS> I saw either: openshift/operator-framework-operator-controller#699 to evaluate it in downstream which is a great idea 🚀
It also looks like the following doc may have been added unintentionally and should be reviewed/removed if not needed:
docs/superpowers/specs/2026-04-13-e2e-test-isolation-design.md
Lastly, looping in @pedjak for review as well, since this area is now primarily owned by him too.
9d85b1b to
cb8d120
Compare
|
I've got a downstream equivalent PR here to make sure I don't break downstream: openshift/operator-framework-operator-controller#699 This commit has the downstream changes that are required: openshift/operator-framework-operator-controller@1964d6b We can probably step this in by:
|
Replace the localhost:30000 hostPort-based registry access with Kubernetes port-forwarding. This makes the test runner work regardless of the cluster network topology (not just kind with extraPortMappings). - Remove port 30000 extraPortMappings from kind configs - Add PortForward() to the registry package using SPDY - Use port-forward in e2e steps and extension-developer-e2e - Remove LOCAL_REGISTRY_HOST env var (no longer needed) - Keep NodePort service for containerd on kind nodes (hosts.toml)
cb8d120 to
1b1ef51
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 82 out of 85 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Apply TLS certificate | ||
| cert := cmapplyconfig.Certificate(secretName, namespace). | ||
| WithSpec(cmapplyconfig.CertificateSpec(). | ||
| WithSecretName(secretName). | ||
| WithIsCA(true). | ||
| WithDNSNames( | ||
| fmt.Sprintf("%s.%s.svc", name, namespace), | ||
| fmt.Sprintf("%s.%s.svc.cluster.local", name, namespace), | ||
| ). | ||
| WithPrivateKey(cmapplyconfig.CertificatePrivateKey(). | ||
| WithRotationPolicy(certmanagerv1.RotationPolicyAlways). | ||
| WithAlgorithm(certmanagerv1.ECDSAKeyAlgorithm). | ||
| WithSize(256), | ||
| ). | ||
| WithIssuerRef(cmmetaapplyconfig.IssuerReference(). | ||
| WithName("olmv1-ca"). | ||
| WithKind("ClusterIssuer"). | ||
| WithGroup("cert-manager.io"), | ||
| ), | ||
| ) | ||
| if err := c.Apply(ctx, cert, fieldOwner, client.ForceOwnership); err != nil { | ||
| return fmt.Errorf("failed to apply certificate: %w", err) | ||
| } | ||
|
|
||
| // Apply deployment | ||
| deploy := appsv1ac.Deployment(name, namespace). | ||
| WithLabels(map[string]string{"app": "registry"}). | ||
| WithSpec(appsv1ac.DeploymentSpec(). | ||
| WithReplicas(1). | ||
| WithSelector(metav1ac.LabelSelector(). | ||
| WithMatchLabels(map[string]string{"app": "registry"}), | ||
| ). | ||
| WithTemplate(corev1ac.PodTemplateSpec(). | ||
| WithLabels(map[string]string{"app": "registry"}). | ||
| WithSpec(corev1ac.PodSpec(). | ||
| WithContainers(corev1ac.Container(). | ||
| WithName("registry"). | ||
| WithImage("registry:3"). | ||
| WithImagePullPolicy(corev1.PullIfNotPresent). | ||
| WithVolumeMounts(corev1ac.VolumeMount(). | ||
| WithName("certs-vol"). | ||
| WithMountPath("/certs"), | ||
| ). | ||
| WithEnv( | ||
| corev1ac.EnvVar().WithName("REGISTRY_HTTP_TLS_CERTIFICATE").WithValue("/certs/tls.crt"), | ||
| corev1ac.EnvVar().WithName("REGISTRY_HTTP_TLS_KEY").WithValue("/certs/tls.key"), | ||
| ), |
There was a problem hiding this comment.
The deployed registry is configured for TLS via REGISTRY_HTTP_TLS_CERTIFICATE/KEY, but the service port is named "http" and the rest of the test code pushes with crane.Insecure (HTTP). With TLS enabled, HTTP pushes will fail (and in-cluster pulls may also fail unless trust/insecure config is wired up). Either deploy the registry without TLS, or update the push/pull clients to use HTTPS with an appropriate trust policy/CA.
| tag := fmt.Sprintf("%s/bundles/%s:v%s", localRegistry, paramPkgName, bd.version) | ||
| if err := crane.Push(img, tag, crane.Insecure); err != nil { | ||
| return nil, fmt.Errorf("failed to push bundle image %s: %w", tag, err) | ||
| } |
There was a problem hiding this comment.
crane.Push is called with crane.Insecure, which forces plain HTTP. However the in-cluster registry deployment is configured with TLS, so these pushes are likely to fail. Either push over HTTPS with a transport that trusts the registry cert (or skips verification), or deploy the registry in HTTP mode to match crane.Insecure.
| catalogTag := fmt.Sprintf("%s/e2e/%s-catalog-%s:%s", localRegistry, c.name, c.scenarioID, tag) | ||
| if err := crane.Push(catalogImg, catalogTag, crane.Insecure); err != nil { | ||
| return nil, fmt.Errorf("failed to push catalog image %s: %w", catalogTag, err) |
There was a problem hiding this comment.
The catalog image push also uses crane.Insecure (HTTP). If the registry is serving HTTPS (as configured in test/internal/registry/registry.go), this will fail. Align registry deployment and crane push options (either HTTP everywhere, or HTTPS with trust configuration).
| if cfg.staticBundleDir != "" { | ||
| files, err := readBundleDir(cfg.staticBundleDir) | ||
| if err != nil { | ||
| panic(fmt.Errorf("failed to read static bundle dir %s: %w", cfg.staticBundleDir, err)) | ||
| } |
There was a problem hiding this comment.
buildBundle panics if StaticBundleDir cannot be read. Since this is test support code used from steps, a panic will abort the whole test run and skip cleanup, making failures harder to debug. Prefer returning an error from buildBundle (and threading it through callers) so the scenario fails cleanly with context.
| var opts = godog.Options{ | ||
| Format: "pretty", | ||
| Paths: []string{"features"}, | ||
| Output: colors.Colored(os.Stdout), | ||
| Concurrency: 1, | ||
| Concurrency: 100, | ||
| NoColors: true, |
There was a problem hiding this comment.
godog Concurrency is hard-coded to 100. This can easily overwhelm the API server and shared in-cluster components (registry, catalogd/operator-controller) and may cause non-deterministic flakes in CI. Consider defaulting to a conservative value and making higher concurrency opt-in via env/flag (e.g. GODOG_CONCURRENCY) once the suite is proven stable.
| // Push images via crane through the port-forward. | ||
| // setup.sh tags images with CLUSTER_REGISTRY_HOST. We export them | ||
| // from the container runtime via "save" and push through the | ||
| // port-forward with crane. | ||
| containerRuntime := os.Getenv("CONTAINER_RUNTIME") | ||
| bundlePath := "bundles/registry-v1/registry-bundle:v0.0.1" | ||
| for _, path := range []string{bundlePath, catalogTag} { | ||
| srcRef := fmt.Sprintf("%s/%s", clusterRegistryHost, path) | ||
| saveCmd := exec.Command(containerRuntime, "save", srcRef) //nolint:gosec | ||
| tarData, err := saveCmd.Output() | ||
| if err != nil { | ||
| panic(fmt.Sprintf("failed to save image %s: %v", srcRef, err)) | ||
| } | ||
| tag, err := name.NewTag(srcRef) | ||
| if err != nil { | ||
| panic(fmt.Sprintf("failed to parse tag %s: %v", srcRef, err)) | ||
| } | ||
| img, err := tarball.Image(func() (io.ReadCloser, error) { | ||
| return io.NopCloser(bytes.NewReader(tarData)), nil | ||
| }, &tag) | ||
| if err != nil { | ||
| panic(fmt.Sprintf("failed to load image %s from tarball: %v", srcRef, err)) | ||
| } | ||
| pushRef := fmt.Sprintf("%s/%s", localAddr, path) | ||
| if err := crane.Push(img, pushRef, crane.Insecure); err != nil { | ||
| panic(fmt.Sprintf("failed to push image %s: %v", pushRef, err)) | ||
| } |
There was a problem hiding this comment.
This push uses crane.Insecure (HTTP). If the in-cluster registry is configured with TLS (as in test/internal/registry/registry.go), the push will fail. Align the registry deployment and crane push options (HTTP registry, or HTTPS push with trust/skip-verify transport).
Description
Replace shared, static test catalogs with dynamically generated per-scenario catalogs and bundles. Each scenario now builds and pushes its own OCI images at runtime, parameterized by scenario ID, making all cluster-scoped resource names unique. This eliminates cascading failures and enables future parallel test execution (~1,950 lines deleted net).
Commit 1 — Design spec: Add design spec documenting the test isolation architecture, builder API, feature file conventions, and naming scheme.
Commit 2 — Move test helpers: Relocate
bundlefsandcsvbuilders frominternal/operator-framework/rukpak/util/testing/tointernal/testing/bundle/.Commit 3 — Add new libraries: Composable catalog/bundle builder API (
test/internal/catalog/) usinggo-containerregistryand Go-based registry deployment (test/internal/registry/). Addstest/internal/...to unit test coverage.Commit 4 — Convert all tests: Update all feature files, step definitions, hooks, and README to use dynamic per-scenario catalogs with
${PACKAGE:...}and${CATALOG:...}variable substitution. Removeimage-registrydependency fromtest-e2eandtest-experimental-e2etargets.Commit 5 — Extension-developer-e2e: Add
TestMainto deploy registry and runsetup.shfrom Go. Constants in Go are the single source of truth for catalog tag and package name (passed tosetup.shvia env vars). Clean up Makefile exports.Commit 6 — Remove old infrastructure: Delete static bundles, catalogs, templates, push tooling, dead code (
test/helpers/,testdata/build-test-registry.sh), and theimage-registryMakefile target.Commit 7 — Port-forward for registry access: Replace localhost:30000 hostPort-based registry access with Kubernetes port-forwarding via SPDY. Remove NodePort service (now ClusterIP), kind
extraPortMappingsfor port 30000, containerdhosts.toml, andLOCAL_REGISTRY_HOSTenv var. In extension-developer-e2e, images are exported from the container runtime and pushed via crane through the port-forward, avoiding Docker daemon network context issues.Reviewer Checklist
Generated with Claude Code