WIP merge wait methods in by denik · Pull Request #5391 · databricks/cli

denik · 2026-06-01T08:12:57Z

Changes

Why

Tests

eng-dev-ecosystem-bot · 2026-06-01T08:54:45Z

Commit: 79818c6

	Env	🟨KNOWN	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
🟨	aws linux	7		15	261	925	7:10
🟨	aws windows	7		15	263	923	14:48
💚	aws-ucws linux		7	15	357	839	7:06
💚	aws-ucws windows		7	15	359	837	8:44
💚	azure linux		1	17	264	923	6:11
💚	azure windows		1	17	266	921	7:52
💚	azure-ucws linux		1	17	362	835	6:32
💚	azure-ucws windows		1	17	364	833	9:03
💚	gcp linux		1	17	260	926	7:25
💚	gcp windows		1	17	262	924	10:34

22 interesting tests: 15 SKIP, 7 KNOWN

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
🟨	TestAccept	🟨K	🟨K	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/invariant/no_drift	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/replace_existing	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_projects/update_display_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/grants/select	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/ssh/connection	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S

Top 24 slowest tests (at least 2 minutes):

duration	env	testname
5:04	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:49	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:25	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:54	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:37	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:30	gcp windows	TestAccept
3:25	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:21	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:21	azure-ucws windows	TestAccept
3:20	azure windows	TestAccept
3:09	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:09	aws-ucws windows	TestAccept
3:04	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:57	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:45	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:44	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:43	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:38	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:36	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:35	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:34	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:32	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

…for intermediate state saves Resources with multi-step deployments (apps, clusters, model serving endpoints, etc.) previously split their logic between DoCreate/DoUpdate and WaitAfterCreate/WaitAfterUpdate at an arbitrary point, making it hard to persist state incrementally. Replace the WaitAfterXxx methods with an *Engine parameter on DoCreate and DoUpdate. Engine.SetID + Engine.SaveState can be called immediately after the initial API call succeeds, before any long-running wait, so the resource is tracked in state even if deployment is interrupted mid-wait (preventing orphaned resources). Simple resources pass _ *Engine and are unaffected. Complex resources (apps, clusters, database instances, model serving endpoints, vector search endpoints and indexes) now inline their wait logic and call engine.SetID/SaveState at the appropriate point. Co-authored-by: Denis Bilenko

Make testserver app DELETE asynchronous (sets DELETING state) to match real API behaviour. AppsGet auto-removes the app after returning it in DELETING state, so the next request sees it as gone. app_test.go: use real SDK calls (Create + DeleteByName) to put the app in DELETING state before testing DoCreate retry logic, instead of injecting state directly. RetriesWhenGetReturnsNotFound uses a one-shot POST override so GET returns 404 naturally without a custom GET handler. all_test.go: retry DoRead once after DoDelete to let async deletions (DELETING → gone) clear before asserting the resource is absent. Add Server.GetWorkspace to testserver for pre-seeding state in tests. Co-authored-by: Denis Bilenko

Use per-test [[Repls]] rules (matching raw digits) in the two publish-failure tests so the URL parameter is normalized to ?[WSPARAM]=[NUMID] regardless of whether the testserver or cloud environment is used. Revert the over-broad parent-level rule that broke detect-change's existing ?[ow]=... cleanup. Co-authored-by: Denis Bilenko

macOS ships bash 3.2 which does not support &>> (bash 4+). Replace with >> file 2>&1 which works on all bash versions. Co-authored-by: Denis Bilenko

MSYS_NO_PATHCONV=1 (set in the parent test.toml) prevents MSYS2 from converting POSIX paths to Windows paths, causing Python to receive a broken path (/c/a/... instead of C:\a\...) when invoking fault.py. Unset it before the fault.py call, matching the pattern used by other dashboard scripts before their bin helper invocations. Co-authored-by: Denis Bilenko

…d=false Engine.SaveState: - Accepts resourceKey for logging: "SaveState: resources.X id=Y N bytes: {...}" - Skips WAL write if state is unchanged (structdiff.IsEqual), logging a skip message. - Records the last saved value in e.lastSaved for subsequent comparisons. Dashboard DoCreate: - Saves intermediate state with Published=false (the actual draft state) instead of the user's Published=true. This ensures the planner sees a real diff (false→true) on the next deploy if publish is interrupted, rather than treating the resource as up-to-date and silently skipping the publish. publish-failure-retry acceptance test: - Adds `bundle plan -o json` and extracts the published change to verify old=false, new=true in the plan after a failed publish. README.md: update stale SetID+SaveState reference to current SaveState(ctx, id, state) API. Co-authored-by: Denis Bilenko

…reate/DoUpdate Mergiraf reintroduced the standalone WaitAfterCreate and WaitAfterUpdate methods during the rebase against main (sql_warehouse.go had a merge conflict). Inline their wait logic directly into DoCreate and DoUpdate and remove the standalone methods, consistent with the Engine callback pattern used by other resources. Co-authored-by: Denis Bilenko

…ublish-failure-retry - Replace fixed hex regex [[Repls]] with replace_ids.py called after the first deploy; the real dashboard ID is captured from state as [DASHBOARD1_ID]. - Inline the plan -o json | jq output directly into output.txt instead of saving to out.plan_published_change.json and trace cat-ing it. Co-authored-by: Denis Bilenko

…dashboard DoUpdate Two more cases where state could be persisted earlier to avoid orphaning or stale-etag conflicts: sql_warehouse DoCreate: the warehouse is created, then DoCreate polls for RUNNING (and may Stop it). If interrupted during the wait, the warehouse was orphaned. Now SaveState is called right after Create returns the id, before the wait. dashboard DoUpdate: Update() bumps the server-side etag, then publishDashboard() can fail. Mirror the DoCreate fix — save the new etag with Published=false before publishing. This keeps the etag in sync (a stale etag makes the next Update fail with a conflict) and records published=false so the planner re-publishes next deploy. Resolves the pre-existing TODO. Add publish-failure-retry-on-update acceptance test: deploy, then trigger an update with an injected publish failure, and verify the update issued a PATCH (no CREATE) and the plan records published old=false. Co-authored-by: Denis Bilenko

DoUpdate applies up to four sequential mutating calls (tags, AI gateway, config, notifications) and then waits up to 35 minutes for the endpoint to finish updating. If that wait was interrupted, none of the applied changes were persisted to state, forcing them to be re-applied on the next deploy. Save the new config right after the mutating calls and before the wait, mirroring DoCreate which already saves before waitForEndpointReady. The id (endpoint name) matches the one DoCreate persists, so there is no id mismatch. Co-authored-by: Denis Bilenko

…, not resource id) The previous attempt to save state before the async wait in postgres DoCreate used engine.SaveState(ctx, waiter.Name(), config). waiter.Name() returns the LRO operation name (e.g. .../branches/foo/operations/UUID), not the resource name (…/branches/foo) that DoCreate ultimately returns. Using the operation name as the id creates a dead WAL entry and does not help with orphan recovery. Replace with a TODO comment explaining the two options for a proper fix: 1. Derive the resource name from request inputs (Parent + resource-type + Id). 2. Use waiter.Metadata() if it surfaces the resource name before Wait completes. No panic in production: the framework saves state via db.SaveState (not through the Engine) after DoCreate returns, so the mismatch between operation-name and real-id never hits the Engine's id-mismatch check. The test panic was a test artifact where the same Engine was reused across Create→Update. Co-authored-by: Denis Bilenko

Previously apply.go called db.SaveState directly after DoCreate/DoUpdate returned, bypassing the engine. This meant a DoCreate that called engine.SaveState with a wrong id (e.g. an LRO operation name instead of the real resource name) would go undetected — the final save would silently use the correct id. Route both final saves through the engine instead: - If DoCreate/DoUpdate called engine.SaveState with a wrong id, the engine's id-mismatch check panics, catching the bug immediately in tests. - If DoCreate/DoUpdate already saved identical state (e.g. the resource saved before a long wait), the engine's dedup skips the redundant write. - State-save I/O errors are logged internally by the engine and no longer abort the deployment (the resource was already created/updated successfully; aborting would confuse the user). UpdateWithID is left unchanged: DoUpdateWithID takes no Engine parameter. Co-authored-by: Denis Bilenko

The type does exactly one thing: save state. StateSaver names that directly. NewNopEngine -> NewNopStateSaver, engine.go -> state_saver.go. Co-authored-by: Denis Bilenko

…aver - SaveStateWith[F any]: type-safe helper that temporarily sets a field to an intermediate value before saving, then restores it. Used to save Published=false before publishDashboard and Lifecycle=nil before warehouse/cluster lifecycle management, so the planner sees a real diff if deployment is interrupted mid-way. - Change saveFunc signature to func(id string, b json.RawMessage) error and add DeploymentState.SaveStateJSON to accept pre-marshaled bytes. StateSaver already marshals x to JSON for dedup; passing those bytes directly avoids a redundant marshal on every save. - Fix lastSaved aliasing: store []byte (JSON snapshot) instead of any pointer. SaveStateWith modifies the config, saves, then restores; a pointer-based lastSaved would point at the restored value, making the subsequent final save look like a no-op and leaving Published=false in the WAL. Co-authored-by: Isaac

Co-authored-by: Isaac

denik temporarily deployed to test-trigger-is June 1, 2026 08:13 — with GitHub Actions Inactive

denik force-pushed the denik/wait-method-removal branch from b00a6f3 to 786f578 Compare June 1, 2026 13:05

denik temporarily deployed to test-trigger-is June 1, 2026 13:06 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 2, 2026 13:32 — with GitHub Actions Inactive

denik force-pushed the denik/wait-method-removal branch from f0e65f7 to 9f7fc23 Compare June 2, 2026 19:10

denik temporarily deployed to test-trigger-is June 2, 2026 19:11 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 2, 2026 20:05 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 2, 2026 20:21 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 2, 2026 20:41 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 3, 2026 08:52 — with GitHub Actions Inactive

denik force-pushed the denik/wait-method-removal branch from bb3c2e3 to cdd95c8 Compare June 3, 2026 15:15

denik temporarily deployed to test-trigger-is June 3, 2026 15:16 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 3, 2026 15:35 — with GitHub Actions Inactive

denik added 2 commits June 4, 2026 11:41

denik added 5 commits June 4, 2026 11:41

acceptance/dashboards: fix &>> redirect for bash 3.2 compatibility

4b584f1

macOS ships bash 3.2 which does not support &>> (bash 4+). Replace with >> file 2>&1 which works on all bash versions. Co-authored-by: Denis Bilenko

denik force-pushed the denik/wait-method-removal branch from df19a64 to bf602f1 Compare June 4, 2026 09:41

denik temporarily deployed to test-trigger-is June 4, 2026 09:42 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 4, 2026 16:50 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 5, 2026 13:03 — with GitHub Actions Inactive

denik added 2 commits June 5, 2026 17:07

denik temporarily deployed to test-trigger-is June 5, 2026 15:19 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 5, 2026 15:32 — with GitHub Actions Inactive

direct: rename Engine -> StateSaver

339e386

The type does exactly one thing: save state. StateSaver names that directly. NewNopEngine -> NewNopStateSaver, engine.go -> state_saver.go. Co-authored-by: Denis Bilenko

denik temporarily deployed to test-trigger-is June 5, 2026 15:39 — with GitHub Actions Inactive

denik temporarily deployed to test-trigger-is June 5, 2026 17:47 — with GitHub Actions Inactive

direct: fix SaveStateWith parameter order (ctx first)

79818c6

Co-authored-by: Isaac

denik temporarily deployed to test-trigger-is June 5, 2026 18:16 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP merge wait methods in#5391

WIP merge wait methods in#5391
denik wants to merge 21 commits into
mainfrom
denik/wait-method-removal

denik commented Jun 1, 2026

Uh oh!

eng-dev-ecosystem-bot commented Jun 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

denik commented Jun 1, 2026

Changes

Why

Tests

Uh oh!

eng-dev-ecosystem-bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eng-dev-ecosystem-bot commented Jun 1, 2026 •

edited

Loading