DestinationAppwrite: tolerate existing schema on Skip/Upsert re-migration#171
DestinationAppwrite: tolerate existing schema on Skip/Upsert re-migration#171premtsd-code wants to merge 15 commits intomainfrom
Conversation
Re-migration needs the destination to tolerate existing schema resources.
The database library stays minimal — migration owns the reconciliation
decision since it has both source and destination metadata in hand.
Per-resource behavior:
- Database: pre-check `databases` metadata. If exists, hydrate sequence
from the existing document and return. Both Skip and Upsert tolerate
unconditionally — databases contain the entire tree of collections +
rows, dropping would destroy everything.
- Table: pre-check `database_{sequence}` metadata. Same pattern —
tolerate unconditionally, hydrate sequence, never drop. Attribute and
index reconciliation happens per-resource at the lower layer.
- Attribute: pre-check `attributes` metadata by composite id
({databaseSeq}_{tableSeq}_{key}). Skip tolerates. Upsert compares
source's updatedAt vs destination's — if source is strictly newer the
column is dropped (via `dbForDatabases->deleteAttribute`) and recreated
so the spec matches source. Column data is wiped on drop; rows will be
repopulated by the row-level Upsert path that follows.
- Index: same pattern against `indexes` metadata. Drop is cheap because
indexes carry no user data — only rebuild time.
Approach is pre-check rather than try/catch for control flow: migration
is sequential single-writer, so the race condition justification for
try/catch doesn't apply, and pre-check reads top-to-bottom with no
exception-as-control-flow.
Row-level dispatch (unchanged, already committed):
OnDuplicate::Upsert → $dbForDatabases->upsertDocuments(...)
OnDuplicate::Skip → $dbForDatabases->skipDuplicates(fn () => createDocuments(...))
OnDuplicate::Fail → plain createDocuments
Both row-level primitives are existing library APIs — no database-library
changes are required for this feature.
Helpers:
shouldTolerateSchemaDuplicate() // onDuplicate !== Fail
sourceSpecIsNewer($src, $dst) // strtotime compare, false on empty
Greptile SummaryThis PR adds re-migration tolerance to
Confidence Score: 3/5Not safe to merge — two P1 bugs will cause Upsert re-migrations to fail with false limit/validation errors for collections at their attribute or index ceiling. Two confirmed P1 ordering bugs remain that surfaced in multiple prior review rounds: stale src/Migration/Destinations/Appwrite.php — specifically Important Files Changed
|
Guard against malformed datetime strings (e.g. "0000-00-00 00:00:00") that PHP's strtotime() returns false for. Without the explicit check, false gets loose-compared as 0 and the helper silently returns false, meaning a corrupted source updatedAt would always tolerate the existing destination entry and never trigger drop+recreate. Appwrite itself always emits parseable RFC 3339 timestamps, so this is mainly defensive for non-Appwrite sources (Supabase, NHost, CSV). Flagged by greptile P2.
On a first-run migration createField writes TWO metadata documents for a
two-way relationship attribute: parent-side at
{dbSeq}_{tableSeq}_{key} (line 763) and child-side at
{dbSeq}_{relatedTableSeq}_{twoWayKey} (line 819). Migration operates
directly on the database via $dbForProject — it doesn't rely on
Appwrite's attribute-event workers to derive the child side — so both
documents are physically written by migration itself.
The Upsert drop block was only removing the parent-side document +
column. On the recreate pass that followed, line 819 re-wrote the
child-side document and immediately hit DuplicateException because the
stale child doc from the previous migration run was still sitting there.
The inner catch rolled back the parent and the whole attribute aborted
with "Attribute already exists" — breaking Upsert re-migration for every
two-way relationship whose source spec had changed.
Fix: in the Upsert drop branch, when the resource is a two-way
relationship, also delete the child-side metadata document and the
child-side physical column on the related table, then purge caches.
Both deletes are wrapped in try/catch to tolerate the case where a prior
interrupted run (or utopia-php/database's relationship handling) already
cleaned one side — the goal is to guarantee a clean slate for the
downstream recreate, not to require both sides to still exist.
Flagged by greptile P1. Reproducing scenario:
- OnDuplicate::Upsert
- Destination already contains the attribute
- Source's two-way relationship spec was modified between runs
(source.updatedAt > dest.updatedAt)
DestinationAppwrite's shouldTolerateSchemaDuplicate() and sourceSpecIsNewer() were private helpers that encoded mode-specific behavior. Moving them onto OnDuplicate itself puts the behavior on the type that owns it and makes the call sites self-documenting: $this->onDuplicate->toleratesSchemaDuplicate() $this->onDuplicate->shouldReconcileSchema($src, $dst) shouldReconcileSchema() also subsumes the explicit Skip check at the call sites — it returns false for Skip and Fail, so the compound "`Skip || !sourceSpecIsNewer(...)`" condition collapses to a single "`!shouldReconcileSchema(...)`". No behavior change. Lint + PHPStan L3 clean.
OnDuplicate previously exposed two methods (toleratesSchemaDuplicate +
shouldReconcileSchema) that call sites had to combine. That split meant
the decision was spread across two methods and the call site — adding a
new mode later would require updating all three places in sync.
Replace with a single resolveSchemaAction() that returns one of three
SchemaAction enum cases (Create / Tolerate / DropAndRecreate). The
SchemaAction enum lives in the same file as OnDuplicate so the complete
decision surface is in one place.
For data-preserving containers (databases, tables) the method accepts
allowDrop=false so it can never return DropAndRecreate — callers that
use that path don't need to worry about accidentally dropping the
container. In practice the container call sites just use the inline
`$this->onDuplicate !== OnDuplicate::Fail` gate and then a simple
"is empty?" branch; resolveSchemaAction() is reserved for the leaves
(attributes, indexes) that actually benefit from the three-way choice.
Fail-mode short-circuit is preserved at every call site: no
getDocument pre-check when onDuplicate is Fail, so first-time
migrations (the common default-mode path) pay zero overhead. The
short-circuit stays at the call site rather than inside the enum method
so the "don't query on Fail" invariant is visible where the query is
issued.
Call sites:
- createDatabase / createEntity: inline `!== Fail` gate, tolerate on
existing. No drop ever.
- createField: Fail short-circuit, else resolveSchemaAction + match.
DropAndRecreate triggers the two-way relationship cleanup.
- createIndex: same pattern, no two-way concern.
…havioral comments
Three review improvements to the schema-tolerance work:
1. OnDuplicate::resolveSchemaAction — remove the $allowDrop parameter.
No caller passes it; all four call sites rely on the default. Containers
(databases, tables) bypass the method entirely with an inline
"!== Fail" check. The parameter advertised tuning that wasn't real,
confusing readers about what was actually gated. Docstring tightened
to mention that this method is only safe for leaf resources.
2. Extract deleteAttributeCompletely() as a reusable primitive that
fully removes an attribute from destination — both metadata documents
AND physical columns, mirroring createField's two-document write for
two-way relationships. The Upsert drop block in createField shrinks
from ~40 nested lines to a single method call; the two-way cleanup
is now a named concern that future non-Upsert paths (Merge mode,
explicit rollback, etc.) can reuse without duplication.
3. Replace line-number comments ("the first-run createField wrote TWO
metadata docs (parent at line 763, child at line 819)") with
behavioral descriptions ("createField writes a second `attributes`
document on the related table"). Line references rot on every edit;
behavioral descriptions don't.
Pure refactor — no functional change. Pint + PHPStan L3 clean.
… handling resolveSchemaAction is the load-bearing decision point for re-migration tolerance. Unit test locks the full mode × existence × timestamp matrix so a future refactor can't silently shift the mapping. 14 tests, sub-millisecond to run, no destination spin-up required. Coverage: - Not-exists returns Create for all modes. - Fail + exists → Create (routed through create flow so library DuplicateException surfaces as designed, not silently tolerated). - Skip ignores timestamps entirely — "don't touch" contract enforced by verifying both timestamp orderings produce Tolerate. - Upsert with source strictly newer → DropAndRecreate. - Upsert with dest newer or equal timestamps → Tolerate (strict > gate avoids unnecessary destructive rebuild when nothing has changed). - Upsert with unparseable timestamps → Tolerate (conservative, dataset covers empty strings, '0000-00-00 00:00:00', and non-date garbage). - OnDuplicate::values() case ordering guarded against accidental rename/reorder that would break SDK param validators. Writing the test surfaced a real gap in sourceIsNewer: PHP's strtotime() accepts '0000-00-00 00:00:00' leniently and returns a large negative epoch (-62169984000) rather than false. The previous `=== false` check missed the MySQL zero-date sentinel and would have returned true when source was a valid date and dest was '0000-00-00', triggering a destructive drop on garbage input. Harden sourceIsNewer to additionally reject non-positive epochs (<= 0) — any legitimate Appwrite updatedAt is well past 1970-01-01.
Extends the Upsert reconciliation to cover container metadata drift.
Previously databases and tables always tolerated an existing destination
under Upsert — source renames, enable toggles, and (for tables)
permission changes silently never reached destination on re-migration.
Now Upsert + source strictly newer triggers an updateDocument on the
container's own metadata doc, leaving children (rows, sub-resources)
untouched.
Mechanism:
- New SchemaAction::UpdateInPlace case alongside Create / Tolerate /
DropAndRecreate.
- OnDuplicate::resolveSchemaAction gains a canDrop parameter (default
false — safer default, destructive reconciliation must be opt-in):
canDrop: true → DropAndRecreate on Upsert-newer (attributes, indexes)
canDrop: false → UpdateInPlace on Upsert-newer (databases, tables)
- createField / createIndex pass canDrop: true.
- createDatabase / createEntity rely on the default and handle
SchemaAction::UpdateInPlace by calling updateDocument with a scoped
field set:
Database → name, search, enabled, type, originalId, database, $updatedAt
Table → name, search, enabled, $permissions, documentSecurity, $updatedAt
Immutable fields ($id, $createdAt, internal sequences) are never touched.
Source-wins semantics for table permissions follow the full-Upsert
contract: if the caller wants destination drift preserved across re-runs,
they should pick Skip mode, not Upsert. Row Upsert already works this
way; containers now match.
Unit tests cover both canDrop branches and confirm Skip never returns
UpdateInPlace regardless of timestamps (reinforcing the "don't touch"
contract). 16 tests, pint + phpstan L3 clean.
- Track attribute/index keys seen from source per (db, table); after rows land per table (and in a final sweep at end-of-run) drop destination attributes/indexes whose keys source did not emit. Source rename (delete+create) on source now propagates as a drop on destination without leaving the old column orphaned. Gated to Upsert mode only. - resolveSchemaAction now accepts nullable timestamps — attribute docs with missing $updatedAt no longer throw TypeError, they fall through to Tolerate (same conservative handling as zero-date/garbage input). - Tolerate branches mark resources as SKIPPED instead of implicit SUCCESS so re-migration counters reflect 'already exists' no-ops. - Internal refactor: consolidated 3 orphan-tracker state fields into one $orphanScopes map; extracted dropOrphansByKind(), tolerateMissing(), dropTwoWayMirror() helpers; renamed deleteAttributeCompletely to dropAttributeForRecreate.
- $orphanScopes -> $orphansByTable - orphanScopeKey() -> tableIdentity() - cleanupUpsertOrphansForScope() -> cleanupUpsertOrphansForTable() - $scopeKey local -> $tableId - $scope local -> $tracked (what source declared for this table) No behavior change. The old names were consumer-oriented jargon; the new names name the actual unit (one entry per table).
Processing one side of a two-way relationship already reconciles both metadata docs and both physical columns (via createRelationship on Create/DropAndRecreate, or via the mirror in dropTwoWayMirror on DropAndRecreate). Re-processing the partner side would either waste work or — worse — drop the mirror after the partner table's rows already migrated, destroying relationship data. Track a canonical pair-key (sorted (table, key) tuples scoped to the database) as soon as createField finishes a two-way attribute. On the partner's second pass the check at the top of createField sees the pair in $processedTwoWayPairs and short-circuits with STATUS_SKIPPED. Covers the scenario where one side's source _updatedAt is newer than the other: whichever side is processed first wins; the partner is SKIPPED regardless of its own timestamp. Data safety preferred over honoring a second, potentially-divergent schema decision.
| $destDocs = $this->dbForProject->find($metaCollection, [ | ||
| Query::equal('databaseInternalId', [$database->getSequence()]), | ||
| Query::equal('collectionInternalId', [$table->getSequence()]), | ||
| Query::limit(PHP_INT_MAX), | ||
| ]); |
There was a problem hiding this comment.
PHP_INT_MAX passed to Query::limit() is dangerous. If the underlying adapter enforces a maximum page size (as Appwrite's internal dbForProject may do), the query will throw a validation exception and cleanupUpsertOrphans() / cleanupUpsertOrphansForTable() will fail entirely — silently if called from the end-of-migration sweep, or fatally if called from createRecord. Attributes per collection are bounded by getLimitForAttributes() and indexes by getLimitForIndexes(); use those values (or a safe constant like 5000) instead.
| $destDocs = $this->dbForProject->find($metaCollection, [ | |
| Query::equal('databaseInternalId', [$database->getSequence()]), | |
| Query::equal('collectionInternalId', [$table->getSequence()]), | |
| Query::limit(PHP_INT_MAX), | |
| ]); | |
| $destDocs = $this->dbForProject->find($metaCollection, [ | |
| Query::equal('databaseInternalId', [$database->getSequence()]), | |
| Query::equal('collectionInternalId', [$table->getSequence()]), | |
| Query::limit(5000), | |
| ]); |
There was a problem hiding this comment.
Relationships can't take the DropAndRecreate path cleanly: - utopia's deleteAttribute throws for VAR_RELATIONSHIP - Two-way drops cascade to the partner table's physical column and destroy its row data (partner's rows migrated earlier in the run won't get re-migrated) Route relationships through SchemaAction::UpdateInPlace instead: - canDrop: !$isRelationship on resolveSchemaAction - New updateRelationshipInPlace() helper calls $dbForDatabases->updateRelationship() for the fields utopia supports (newTwoWayKey, twoWay, onDelete) + updateDocument() on Appwrite's application-level doc - relationType changes can't be updated in place and would silently diverge utopia's internal metadata from Appwrite's doc; throw a clear error asking the caller to drop+recreate manually Each side of a two-way is authoritative for its own Appwrite-level doc. utopia's updateRelationship cascades internal metadata on both sides in one call and is idempotent for same-target writes, so neither side's pass stomps the other. Dead-code cleanup: delete dropTwoWayMirror() and the relationship branch of dropAttributeForRecreate — both were only reachable from paths that no longer fire. Simplifies dropAttributeForRecreate signature (drops unused $type and $relatedTable params). Fix dropOrphanAttribute for relationship orphans: deleteAttribute throws for VAR_RELATIONSHIP, so use deleteRelationship which cascades both sides' physical columns + utopia metadata. Clean the two Appwrite-level docs separately. All 34 unit tests pass, pint + phpstan L3 clean.
Summary
Re-migration against a destination Appwrite project needs the destination to tolerate existing schema resources — otherwise the second run throws
DuplicateExceptionon every database/table/attribute/index that already exists. This PR makesDestinationAppwritehandle reconciliation locally, with no changes required inutopia-php/database.Schema decisions are routed through a single method on the enum —
OnDuplicate::resolveSchemaAction()— which returns one of fourSchemaActioncases:Create,Tolerate,DropAndRecreate, orUpdateInPlace. The$canDropparameter (defaultfalse) picks the Upsert-newer strategy:$canDrop = true→ leaves (attributes, indexes) →DropAndRecreateon source-newer$canDrop = false→ containers (databases, tables) →UpdateInPlaceon source-newerDefault-false keeps destructive reconciliation opt-in at the call site.
Per-resource behavior
name,search,enabled,type,originalId,database,$updatedAtname,search,enabled,$permissions,documentSecurity,$updatedAtskipDuplicates(createDocuments)upsertDocumentsupsertDocumentscreateDocumentsTolerate branches set
Resource::STATUS_SKIPPEDand returnfalseso migration counters showskipinstead ofsuccessfor no-op re-migrations. Immutable container fields ($id,$createdAt, internal sequences) are never touched on UpdateInPlace.Upsert orphan cleanup
DestinationAppwritenow tracks every attribute and index key source declares per (database, table) during the run in an$orphansByTablemap. At cleanup time it queries the destination'sattributes/indexescollections for that table and drops any whose key wasn't in the tracked set.When cleanup fires:
createRecordon first row batch flush for a given table. Ensures the Structure validator sees the post-cleanup schema, so rows aren't rejected on orphan required columns.run()override after source streaming completes. Only visits tables that had no rows (and so didn't get cleaned insidecreateRecord).Gating: tracking and cleanup are both short-circuited when
$this->onDuplicate !== OnDuplicate::Upsert. Skip and Fail modes never touch destination state they weren't explicitly asked to write.Fail-open on partial failure:
run()callscleanupUpsertOrphans()afterparent::run()returns — if the migration throws mid-stream, the cleanup is skipped and the destination is preserved as-is (no accidental destruction when we don't have a full picture of what source has).Two-way relationships: orphan drops extract relationship metadata from the destination's attribute doc and mirror the drop to the related table via the shared
dropTwoWayMirror()helper (same helper used by DropAndRecreate).Two-way relationship handling
Two-way relationships share a metadata doc and physical column on both tables. Any action on one side (Tolerate, DropAndRecreate, Create) reconciles both sides —
createField's create path writes both metadata docs and callscreateRelationshipto provision both physical columns;dropAttributeForRecreatedrops both viadropTwoWayMirror.If the partner side is later re-encountered by the source iteration (some sources emit each side as a distinct resource), re-processing would either waste work (double drop + recreate) or destroy row data on the partner table if its rows already migrated between the two calls.
Prevention: a canonical pair-key (
{dbSeq}@{min(tuple)}<->{max(tuple)}where tuple ={tableId}::{attributeKey}) is recorded in$processedTwoWayPairsafter the first side completes. At the top ofcreateField, the partner's second pass checks the set and short-circuits withSTATUS_SKIPPED.Covers divergent-timestamp edge case: if source has one side newer and the other older (e.g. from manual DB tampering), whichever side is processed first wins — the partner is SKIPPED regardless of its own timestamp. Data safety preferred over honoring a potentially-divergent second decision.
Why pre-check over try/catch
Migration is a sequential single-writer, so the race-condition justification for exception-driven control flow doesn't apply. Pre-check reads top-to-bottom without using exceptions for control flow. On
OnDuplicate::Fail(the default) the pre-check is gated off entirely — fresh-migration paths pay zero overhead.Why UpdateInPlace for containers, DropAndRecreate for leaves
Zero-date and null-timestamp hardening
sourceIsNewer()treats null, empty, garbage, and MySQL zero-date ('0000-00-00 00:00:00') timestamps as "unknown" → Tolerate rather than risk a destructive drop. Specifically: PHP'sstrtotime('0000-00-00 00:00:00')returns a large negative epoch rather thanfalse, so the naive=== falsecheck would have missed the sentinel. Also, attribute docs with missing$updatedAt(legitimate edge case on older destinations) no longer throw TypeError — the comparison arguments are nullable.Zero library changes
All of this uses existing
utopia-php/databaseAPIs:getDocument(collection, id)for pre-checkupdateDocumentfor container UpdateInPlacedeleteAttribute/deleteIndexfor leaf DropAndRecreate and orphan dropsfind('attributes' | 'indexes', ...)for orphan discoveryskipDuplicates()(5.3.22) for row-level SkipupsertDocuments()(existing) for row-level UpsertNo database library bump required.
Tests
resolveSchemaAction, including null/zero-date/unparseable-timestamp datasets andOnDuplicate::values()case ordering. Sub-millisecond run, no destination spin-up.testAppwriteMigrationRowsOnDuplicate— rows under all three modestestAppwriteMigrationReRunIsIdempotent— Skip re-run as no-optestAppwriteMigrationUpsertUpdatesContainerMetadata— Upsert reconciles database/table drift (UpdateInPlace)testAppwriteMigrationSkipPreservesContainerDrift— Skip preserves dest drift even when source is newertestAppwriteMigrationUpsertDropsOrphanColumn— Upsert drops destination columns source no longer declarestestAppwriteMigrationSkipKeepsOrphanColumn— Skip preserves orphan columns (cleanup is Upsert-only)Test plan
skipfor tolerated resources.vendor/bin/pint --testpasses.vendor/bin/phpstan analyse --level 3 src testspasses.