Skip to content

WIP merge wait methods in#5391

Draft
denik wants to merge 21 commits into
mainfrom
denik/wait-method-removal
Draft

WIP merge wait methods in#5391
denik wants to merge 21 commits into
mainfrom
denik/wait-method-removal

Conversation

@denik
Copy link
Copy Markdown
Contributor

@denik denik commented Jun 1, 2026

Changes

Why

Tests

@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:13 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:13 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

eng-dev-ecosystem-bot commented Jun 1, 2026

Commit: 79818c6

Run: 27032227448

Env 🟨​KNOWN 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 15 261 925 7:10
🟨​ aws windows 7 15 263 923 14:48
💚​ aws-ucws linux 7 15 357 839 7:06
💚​ aws-ucws windows 7 15 359 837 8:44
💚​ azure linux 1 17 264 923 6:11
💚​ azure windows 1 17 266 921 7:52
💚​ azure-ucws linux 1 17 362 835 6:32
💚​ azure-ucws windows 1 17 364 833 9:03
💚​ gcp linux 1 17 260 926 7:25
💚​ gcp windows 1 17 262 924 10:34
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 24 slowest tests (at least 2 minutes):
duration env testname
5:04 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:49 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:25 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:54 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:37 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:30 gcp windows TestAccept
3:25 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:21 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:21 azure-ucws windows TestAccept
3:20 azure windows TestAccept
3:09 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:09 aws-ucws windows TestAccept
3:04 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:57 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:45 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:44 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:43 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:38 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:36 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:35 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:34 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:32 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

@denik denik force-pushed the denik/wait-method-removal branch from b00a6f3 to 786f578 Compare June 1, 2026 13:05
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:06 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:06 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 13:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 13:32 — with GitHub Actions Inactive
@denik denik force-pushed the denik/wait-method-removal branch from f0e65f7 to 9f7fc23 Compare June 2, 2026 19:10
@denik denik temporarily deployed to test-trigger-is June 2, 2026 19:11 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 19:11 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:05 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:05 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:21 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:21 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:41 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:41 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 08:52 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 08:52 — with GitHub Actions Inactive
@denik denik force-pushed the denik/wait-method-removal branch from bb3c2e3 to cdd95c8 Compare June 3, 2026 15:15
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:16 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:16 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:35 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:35 — with GitHub Actions Inactive
denik added 2 commits June 4, 2026 11:41
…for intermediate state saves

Resources with multi-step deployments (apps, clusters, model serving endpoints, etc.)
previously split their logic between DoCreate/DoUpdate and WaitAfterCreate/WaitAfterUpdate
at an arbitrary point, making it hard to persist state incrementally.

Replace the WaitAfterXxx methods with an *Engine parameter on DoCreate and DoUpdate.
Engine.SetID + Engine.SaveState can be called immediately after the initial API call
succeeds, before any long-running wait, so the resource is tracked in state even if
deployment is interrupted mid-wait (preventing orphaned resources).

Simple resources pass _ *Engine and are unaffected. Complex resources (apps, clusters,
database instances, model serving endpoints, vector search endpoints and indexes) now
inline their wait logic and call engine.SetID/SaveState at the appropriate point.

Co-authored-by: Denis Bilenko
Make testserver app DELETE asynchronous (sets DELETING state) to match
real API behaviour. AppsGet auto-removes the app after returning it in
DELETING state, so the next request sees it as gone.

app_test.go: use real SDK calls (Create + DeleteByName) to put the app
in DELETING state before testing DoCreate retry logic, instead of
injecting state directly. RetriesWhenGetReturnsNotFound uses a one-shot
POST override so GET returns 404 naturally without a custom GET handler.

all_test.go: retry DoRead once after DoDelete to let async deletions
(DELETING → gone) clear before asserting the resource is absent.

Add Server.GetWorkspace to testserver for pre-seeding state in tests.

Co-authored-by: Denis Bilenko
denik added 5 commits June 4, 2026 11:41
Use per-test [[Repls]] rules (matching raw digits) in the two
publish-failure tests so the URL parameter is normalized to
?[WSPARAM]=[NUMID] regardless of whether the testserver or cloud
environment is used. Revert the over-broad parent-level rule that
broke detect-change's existing ?[ow]=... cleanup.

Co-authored-by: Denis Bilenko
macOS ships bash 3.2 which does not support &>> (bash 4+).
Replace with >> file 2>&1 which works on all bash versions.

Co-authored-by: Denis Bilenko
MSYS_NO_PATHCONV=1 (set in the parent test.toml) prevents MSYS2 from
converting POSIX paths to Windows paths, causing Python to receive a
broken path (/c/a/... instead of C:\a\...) when invoking fault.py.
Unset it before the fault.py call, matching the pattern used by other
dashboard scripts before their bin helper invocations.

Co-authored-by: Denis Bilenko
…d=false

Engine.SaveState:
- Accepts resourceKey for logging: "SaveState: resources.X id=Y N bytes: {...}"
- Skips WAL write if state is unchanged (structdiff.IsEqual), logging a skip message.
- Records the last saved value in e.lastSaved for subsequent comparisons.

Dashboard DoCreate:
- Saves intermediate state with Published=false (the actual draft state) instead of
  the user's Published=true. This ensures the planner sees a real diff (false→true)
  on the next deploy if publish is interrupted, rather than treating the resource as
  up-to-date and silently skipping the publish.

publish-failure-retry acceptance test:
- Adds `bundle plan -o json` and extracts the published change to verify
  old=false, new=true in the plan after a failed publish.

README.md: update stale SetID+SaveState reference to current SaveState(ctx, id, state) API.

Co-authored-by: Denis Bilenko
…reate/DoUpdate

Mergiraf reintroduced the standalone WaitAfterCreate and WaitAfterUpdate
methods during the rebase against main (sql_warehouse.go had a merge conflict).
Inline their wait logic directly into DoCreate and DoUpdate and remove
the standalone methods, consistent with the Engine callback pattern used
by other resources.

Co-authored-by: Denis Bilenko
@denik denik force-pushed the denik/wait-method-removal branch from df19a64 to bf602f1 Compare June 4, 2026 09:41
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:42 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:42 — with GitHub Actions Inactive
…ublish-failure-retry

- Replace fixed hex regex [[Repls]] with replace_ids.py called after the first
  deploy; the real dashboard ID is captured from state as [DASHBOARD1_ID].
- Inline the plan -o json | jq output directly into output.txt instead of
  saving to out.plan_published_change.json and trace cat-ing it.

Co-authored-by: Denis Bilenko
@denik denik temporarily deployed to test-trigger-is June 4, 2026 16:50 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 4, 2026 16:50 — with GitHub Actions Inactive
…dashboard DoUpdate

Two more cases where state could be persisted earlier to avoid orphaning or
stale-etag conflicts:

sql_warehouse DoCreate: the warehouse is created, then DoCreate polls for RUNNING
(and may Stop it). If interrupted during the wait, the warehouse was orphaned.
Now SaveState is called right after Create returns the id, before the wait.

dashboard DoUpdate: Update() bumps the server-side etag, then publishDashboard()
can fail. Mirror the DoCreate fix — save the new etag with Published=false before
publishing. This keeps the etag in sync (a stale etag makes the next Update fail
with a conflict) and records published=false so the planner re-publishes next
deploy. Resolves the pre-existing TODO.

Add publish-failure-retry-on-update acceptance test: deploy, then trigger an
update with an injected publish failure, and verify the update issued a PATCH
(no CREATE) and the plan records published old=false.

Co-authored-by: Denis Bilenko
@denik denik temporarily deployed to test-trigger-is June 5, 2026 13:03 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 13:03 — with GitHub Actions Inactive
denik added 2 commits June 5, 2026 17:07
DoUpdate applies up to four sequential mutating calls (tags, AI gateway,
config, notifications) and then waits up to 35 minutes for the endpoint to
finish updating. If that wait was interrupted, none of the applied changes
were persisted to state, forcing them to be re-applied on the next deploy.

Save the new config right after the mutating calls and before the wait,
mirroring DoCreate which already saves before waitForEndpointReady. The id
(endpoint name) matches the one DoCreate persists, so there is no id mismatch.

Co-authored-by: Denis Bilenko
…, not resource id)

The previous attempt to save state before the async wait in postgres DoCreate
used engine.SaveState(ctx, waiter.Name(), config). waiter.Name() returns the
LRO operation name (e.g. .../branches/foo/operations/UUID), not the resource
name (…/branches/foo) that DoCreate ultimately returns. Using the operation
name as the id creates a dead WAL entry and does not help with orphan recovery.

Replace with a TODO comment explaining the two options for a proper fix:
  1. Derive the resource name from request inputs (Parent + resource-type + Id).
  2. Use waiter.Metadata() if it surfaces the resource name before Wait completes.

No panic in production: the framework saves state via db.SaveState (not through
the Engine) after DoCreate returns, so the mismatch between operation-name and
real-id never hits the Engine's id-mismatch check. The test panic was a test
artifact where the same Engine was reused across Create→Update.

Co-authored-by: Denis Bilenko
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:19 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:19 — with GitHub Actions Inactive
Previously apply.go called db.SaveState directly after DoCreate/DoUpdate
returned, bypassing the engine. This meant a DoCreate that called
engine.SaveState with a wrong id (e.g. an LRO operation name instead of
the real resource name) would go undetected — the final save would
silently use the correct id.

Route both final saves through the engine instead:

- If DoCreate/DoUpdate called engine.SaveState with a wrong id, the
  engine's id-mismatch check panics, catching the bug immediately in tests.
- If DoCreate/DoUpdate already saved identical state (e.g. the resource
  saved before a long wait), the engine's dedup skips the redundant write.
- State-save I/O errors are logged internally by the engine and no longer
  abort the deployment (the resource was already created/updated
  successfully; aborting would confuse the user).

UpdateWithID is left unchanged: DoUpdateWithID takes no Engine parameter.

Co-authored-by: Denis Bilenko
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:32 — with GitHub Actions Inactive
The type does exactly one thing: save state. StateSaver names that
directly. NewNopEngine -> NewNopStateSaver, engine.go -> state_saver.go.

Co-authored-by: Denis Bilenko
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:39 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:39 — with GitHub Actions Inactive
…aver

- SaveStateWith[F any]: type-safe helper that temporarily sets a field to
  an intermediate value before saving, then restores it. Used to save
  Published=false before publishDashboard and Lifecycle=nil before
  warehouse/cluster lifecycle management, so the planner sees a real diff
  if deployment is interrupted mid-way.

- Change saveFunc signature to func(id string, b json.RawMessage) error
  and add DeploymentState.SaveStateJSON to accept pre-marshaled bytes.
  StateSaver already marshals x to JSON for dedup; passing those bytes
  directly avoids a redundant marshal on every save.

- Fix lastSaved aliasing: store []byte (JSON snapshot) instead of any
  pointer. SaveStateWith modifies the config, saves, then restores; a
  pointer-based lastSaved would point at the restored value, making the
  subsequent final save look like a no-op and leaving Published=false in
  the WAL.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 5, 2026 17:47 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 17:47 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 18:16 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 18:16 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants