Skip to content

feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910

Open
premtsd-code wants to merge 23 commits into1.9.xfrom
feat/skip-duplicates
Open

feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910
premtsd-code wants to merge 23 commits into1.9.xfrom
feat/skip-duplicates

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

@premtsd-code premtsd-code commented Apr 15, 2026

Summary

Exposes a single onDuplicate string param on the three migration creation endpoints so CSV / JSON / appwrite-to-appwrite imports can choose how to handle rows whose IDs already exist at the destination.

POST /v1/migrations/appwrite
POST /v1/migrations/csv/imports
POST /v1/migrations/json/imports

Each now accepts optional onDuplicate: "fail" | "skip" | "upsert" (default "fail").

Parameter semantics

Value Behavior
"fail" (default) createDocuments(), aborts on first DuplicateException. Original behavior, unchanged.
"skip" createDocuments() wrapped in skipDuplicates() scope guard. Duplicate-id rows silently no-op at the adapter layer (INSERT IGNORE equivalent). Existing rows are preserved.
"upsert" upsertDocuments(). Existing rows are replaced with imported values.

String + WhiteList validator at the REST boundary — matches the existing Appwrite convention for enum-style params (priority, encryption, period, grant_type). Allowed values are drawn from Utopia\Migration\Destinations\OnDuplicate::values() so the set is owned by the destination implementation.

The param is stored in the migration Document's options array alongside existing fields (path, size, etc.), matching the existing pattern for destination behavior config. The worker's processDestination() reads it back, converts the string via OnDuplicate::tryFrom(...), and passes the enum to DestinationAppwrite's constructor.

Scope

onDuplicate only gates the row-write path in DestinationAppwrite (the rowBuffer flush in createRecord()). It does not affect other resource types — databases, tables, columns, users, teams, buckets, etc. still follow their existing duplicate-handling semantics (throw on conflict). Re-running an Appwrite→Appwrite migration with the full resource tree will still error on schema resources; onDuplicate correctly handles only the row-level flow.

Changes

  • app/controllers/api/migrations.php — adds ->param('onDuplicate', OnDuplicate::Fail->value, new WhiteList(OnDuplicate::values()), ...) to the Appwrite, CSV, and JSON migration create endpoints; stores in options; threads into action function signatures
  • src/Appwrite/Platform/Workers/Migrations.phpprocessDestination() passes OnDuplicate::tryFrom($options['onDuplicate'] ?? '') ?? OnDuplicate::Fail to the destination constructor
  • composer.json — bumps utopia-php/database to 5.3.* and utopia-php/migration to 1.9.* (both tagged releases)
  • tests/e2e/Services/Migrations/MigrationsBase.php — 7 E2E test methods (3 CSV, 3 JSON, 1 Appwrite→Appwrite) + JSON fixture helper + row-success poll helper

E2E test coverage

CSV imports (3 tests)

  • testCreateCSVImportSkipDuplicates — import documents.csv (100 rows), mutate one row's age from 56 → 22, re-import with onDuplicate=skip. Asserts the mutated row keeps age=22 and row count stays at 100.
  • testCreateCSVImportOverwrite — same setup, re-import with onDuplicate=upsert. Asserts the row is restored to the CSV's original age=56.
  • testCreateCSVImportDefaultFailsOnDuplicate — regression guard. Re-import with no onDuplicate param (defaults to "fail"). Asserts migration status is failed with errors populated.

JSON imports (3 tests — same shape as CSV)

  • testCreateJSONImportSkipDuplicates
  • testCreateJSONImportOverwrite
  • testCreateJSONImportDefaultFailsOnDuplicate

Appwrite → Appwrite migration (1 test)

  • testAppwriteMigrationRowsOnDuplicate — seeds a source database/table/column/row, runs a first (full) migration to copy schema + row to the destination, mutates the destination row, then re-runs with onDuplicate=skip (expects row preserved) and onDuplicate=upsert (expects row restored to source value). Uses a schema-error-tolerant poll helper (runMigrationAssertingRowSuccess) because re-creating an existing database/table/column on destination always errors — those errors are orthogonal to onDuplicate, which only gates row writes. Assertions target statusCounters['row'] directly rather than overall migration status.

Fixture helpers

  • prepareCsvImportFixture / prepareJsonImportFixture — create database, table, columns (with status=available wait), bucket, upload fixture file. Return the IDs + first-row metadata for the tests.
  • runMigrationAssertingRowSuccess — Appwrite→Appwrite-specific poll helper that tolerates non-row errors.

All tests verified locally against a full Appwrite stack (CSV + JSON: ~10s total, 153 assertions).

Dependencies

Both upstream deps are tagged releases — no dev-branch pins remain:

Test plan

  • PHP lint clean
  • Pint / PSR-12 format clean
  • PHPStan level 3 clean (1323 files analyzed)
  • composer validate clean (composer.json + composer.lock hash in sync)
  • CSV + JSON E2E tests pass locally against full Appwrite stack (6 tests, 153 assertions)
  • Appwrite→Appwrite E2E test verified by CI (local Docker Traefik routing is broken for cross-project tests; same pre-existing issue affects testAppwriteMigrationDatabasesRow and all sibling tests)
  • Console frontend changes (out of scope for this PR)

Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.

Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports

Parameter semantics:
- overwrite=true  -> destination uses upsertDocuments instead of
                     createDocuments; existing rows are replaced
                     with imported values
- skip=true       -> destination wraps createDocuments in
                     skipDuplicates; existing rows are preserved
                     unchanged, duplicate-id rows silently no-op
- both false      -> default; fails fast on DuplicateException
                     (original behavior, unchanged)
- both true       -> overwrite wins (upsert subsumes skip)

Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.

Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
Three new test methods in MigrationsBase, following the existing
testCreateCSVImport setup pattern:

- testCreateCSVImportSkipDuplicates
  Seeds documents.csv, mutates one row, re-imports with skip=true.
  Asserts the mutated row keeps its mutated value (not overwritten
  by the CSV's original value) and the row count stays at 100.

- testCreateCSVImportOverwrite
  Seeds documents.csv, mutates one row, re-imports with overwrite=true.
  Asserts the mutated row is restored to the CSV's original value
  (proving upsertDocuments actually replaced the row) and the row
  count stays at 100.

- testCreateCSVImportDefaultFailsOnDuplicate
  Regression guard: re-imports documents.csv with no flags. Asserts
  the migration goes to status=failed with errors populated, proving
  the default duplicate-throws behavior is preserved.

All three share a prepareCsvImportFixture() helper that sets up
database + table (name, age columns) + bucket + documents.csv
upload. Returns the known first-row id + original name/age so tests
can mutate and assert on a predictable row.

Reuses the existing documents.csv fixture (100 rows with \$id as the
first column). No new fixture files needed.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 15, 2026

🔄 PHP-Retry Summary

Flaky tests detected across commits:

Commit 0a26608 - 1 flaky test
Test Retries Total Time Details
TablesDBCustomClientTest::testInvalidDocumentStructure 1 240.42s Logs
Commit 84b6dfa - 3 flaky tests
Test Retries Total Time Details
LegacyConsoleClientTest::testNotStartsWith 1 240.47s Logs
VectorsDBConsoleClientTest::testGetCollectionLogs 1 7ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 14ms Logs
Commit e728265 - 2 flaky tests
Test Retries Total Time Details
TablesDBConsoleClientTest::testNotContains 1 240.24s Logs
TablesDBCustomServerTest::testInvalidDocumentStructure 1 240.34s Logs
Commit 478a2a6 - 2 flaky tests
Test Retries Total Time Details
VectorsDBConsoleClientTest::testGetCollectionLogs 1 5ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 8ms Logs
Commit dc9f48c - 2 flaky tests
Test Retries Total Time Details
VectorsDBConsoleClientTest::testGetCollectionLogs 1 9ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 11ms Logs

Note: Flaky test results are tracked for the last 5 commits

@github-actions
Copy link
Copy Markdown

✨ Benchmark results

  • Requests per second: 1,908
  • Requests with 200 status code: 343,586
  • P99 latency: 0.095115825

⚡ Benchmark Comparison

Metric This PR Latest version
RPS 1,908 1,297
200 343,586 233,480
P99 0.095115825 0.158781968

@blacksmith-sh

This comment has been minimized.

@premtsd-code premtsd-code changed the title feat(migrations): add overwrite and skip options to CSV/JSON/Appwrite imports feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports Apr 20, 2026
@premtsd-code premtsd-code marked this pull request as ready for review April 20, 2026 12:18
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR adds an onDuplicate option (fail / skip / upsert) to the Appwrite, CSV, and JSON migration endpoints and ships a substantial expansion of Appwrite→Appwrite re-migration E2E tests. Several P1 issues from prior review rounds remain unresolved and must be addressed before merge:

  • onDuplicate is wired only to DestinationAppwriteDestinationCSV and DestinationJSON constructors in processDestination() receive no onDuplicate argument, so the new option is silently ignored for those two endpoints; the CSV/JSON E2E tests (testCreateCSVImportSkipDuplicates, testCreateCSVImportOverwrite) will fail at runtime for the same reason.
  • utopia-php/migration is pinned to a live dev branch (dev-feat/skip-duplicates as 1.9.99), not the tagged release claimed in the PR description; utopia-php/platform is unintentionally downgraded from 0.13.* to 0.12.*.
  • CI is in a debug state — unit tests, most E2E suites, and the benchmark job are all disabled with if: false; the service matrix is narrowed to [Migrations] only.

Confidence Score: 3/5

Not safe to merge — multiple P1 issues remain unresolved from prior review rounds.

Three unresolved P1 issues block a safe merge: (1) onDuplicate is silently ignored for CSV and JSON destinations, breaking the core feature promise for those endpoints; (2) utopia-php/migration is pinned to a live dev branch rather than a tagged release, making the build non-reproducible; (3) utopia-php/platform is unintentionally downgraded. The CI configuration is also in a debug state that would give a false green on breakages in unrelated suites.

src/Appwrite/Platform/Workers/Migrations.php (missing onDuplicate pass-through to CSV/JSON destinations), composer.json (dev-branch pin + platform downgrade), .github/workflows/ci.yml (disabled test suites must be re-enabled).

Important Files Changed

Filename Overview
app/controllers/api/migrations.php Adds onDuplicate param to three migration endpoints with correct WhiteList validation; param is stored in options and threaded through action signatures correctly.
src/Appwrite/Platform/Workers/Migrations.php OnDuplicate is wired only to DestinationAppwrite; DestinationCSV and DestinationJSON constructors receive no onDuplicate argument, making the option a no-op for those destinations (outstanding P1, already flagged).
composer.json utopia-php/migration is pinned to the live dev-feat/skip-duplicates branch alias, not a tagged release; utopia-php/platform is unintentionally downgraded from 0.13.* to 0.12.* (both P1, already flagged).
.github/workflows/ci.yml Unit tests, all non-Migrations E2E suites, abuse tests, screenshots, and benchmarks disabled with if: false; service matrix narrowed to [Migrations] only — must be reverted before merge (already flagged).
tests/e2e/Services/Migrations/MigrationsBase.php New CSV/JSON/Appwrite→Appwrite onDuplicate tests added with prepareCsvImportFixture that properly awaits column readiness; CSV/JSON skip/upsert tests will fail at runtime because the worker doesn't honour the option (tied to P1 in Workers file).
composer.lock utopia-php/migration resolves to dev-feat/skip-duplicates at a specific commit hash; stability-flags override forces minimum-stability: dev for that package; otherwise hash matches composer.json constraints.

Reviews (19): Last reviewed commit: "composer: bump utopia-php/migration to 6..." | Re-trigger Greptile

Comment thread composer.json Outdated
Comment thread tests/e2e/Services/Migrations/MigrationsBase.php
@premtsd-code premtsd-code force-pushed the feat/skip-duplicates branch from 8a00770 to f3c2502 Compare April 20, 2026 12:58
@blacksmith-sh

This comment has been minimized.

Comment thread .github/workflows/ci.yml Outdated
@premtsd-code premtsd-code force-pushed the feat/skip-duplicates branch from 7057189 to d0603c4 Compare April 20, 2026 15:39
…lerance

utopia-php/migration's DestinationAppwrite now handles schema tolerance on
re-migration (PR #171 on feat/skip-duplicates): it pre-checks destination
`_metadata` for each database / table / column / index and tolerates in
Skip/Upsert mode. Re-runs no longer produce schema-level errors, so the
E2E tests can drop the status-tolerant workaround and assert strict
'completed' outcomes.

Changes:

- composer.json: pin utopia-php/migration to dev-feat/skip-duplicates (aliased
  to 1.9.99 for stability resolution). Will be replaced with a fixed 1.10.0
  tag once the migration PR lands.

- testAppwriteMigrationRowsOnDuplicate: replace the tolerant
  runMigrationAssertingRowSuccess helper with performMigrationSync on the
  Skip and Upsert re-runs. Asserts 'completed' status on every run,
  destination row content matches the expected value per mode (Mutated
  preserved on Skip, Original restored on Upsert). Helper method removed.

- testAppwriteMigrationReRunIsIdempotent (new): seeds two rows on source,
  runs the migration three times back-to-back (fresh, Skip re-run, Upsert
  re-run) against unchanged source data, asserts strict 'completed' on every
  run and row content is stable across all three. Exercises the
  schema-tolerance path end-to-end: every database/table/column on
  destination already exists with a matching spec, so DestinationAppwrite's
  pre-check returns Tolerate for every resource.
Comment thread composer.json
"utopia-php/logger": "0.6.*",
"utopia-php/messaging": "0.22.*",
"utopia-php/migration": "1.9.*",
"utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Dev-branch pin contradicts PR description and breaks reproducibility

The constraint "dev-feat/skip-duplicates as 1.9.99" is a live dev-branch alias, not the tagged release 1.9.2 stated in the PR description ("Both upstream deps are tagged releases — no dev-branch pins remain"). composer.lock confirms this: it resolves to "version": "dev-feat/skip-duplicates" at commit 001682168f7d87932c56635a0d0a45b927febc33. If that branch is rebased or force-pushed, any subsequent composer update will silently pull in a different, untested commit. The stability-flags entry ("utopia-php/migration": 20) in composer.lock also overrides the global prefer-stable: true. This should be updated to the tagged release once utopia-php/migration@1.9.2 is published.

Suggested change
"utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99",
"utopia-php/migration": "1.9.*",

Branch is iterating on utopia-php/migration's re-migration tolerance.
Other test matrices (unit, general, abuse, screenshots, benchmark, and
every other e2e service) add ~30+ minutes to CI without exercising code
this PR touches. Restrict the matrix to the Migrations service and skip
the unrelated test jobs until the migration work is ready to merge.

All jobs marked with 'TEMP:' comments + 'if: false' — revert to the
full matrix before merging to main.

Static analysis (lint, phpstan, composer audit, specs, locale, security)
still runs on every PR push.
@blacksmith-sh

This comment has been minimized.

Picks up the PR #171 refactor + unit tests:
- resolveSchemaAction decision point consolidation
- deleteAttributeCompletely primitive (two-way cleanup in one place)
- Hardened sourceIsNewer against MySQL zero-date sentinel
- 14 unit tests locking the decision matrix
@blacksmith-sh

This comment has been minimized.

Picks up the UpdateInPlace branch — database/table metadata drift is
now reconciled on Upsert-newer (renames, enable toggles, table
permissions / documentSecurity) via updateDocument, without touching
child rows.
… dest drift

Two new E2E tests exercising the schema-tolerance UpdateInPlace path
added in utopia-php/migration's DestinationAppwrite.

testAppwriteMigrationUpsertUpdatesContainerMetadata (positive):
- Fresh migration copies source database + table + column + row to dest.
- Mutates source database name (PUT /databases/:id) and table
  name/permissions/rowSecurity/enabled (PUT /tablesdb/:db/tables/:id).
- One-second sleep before mutation ensures source's $updatedAt is
  strictly greater than dest's at second granularity (strtotime
  comparison).
- Upsert re-migration asserts:
  - 'completed' status.
  - dest database name matches source's new name.
  - dest table name / enabled / rowSecurity match source's new values.
  - child row's 'name' attribute is untouched — UpdateInPlace only
    rewrites container metadata, not rows.

testAppwriteMigrationSkipPreservesContainerDrift (negative):
- Fresh migration, then mutate BOTH dest (simulating ops tightening
  permissions post-migration) and source (divergence).
- Skip re-migration asserts dest kept its tightened values — Skip's
  strict "don't touch" contract protects dev→prod cutover workflows
  from accidentally wiping ops-side drift on schema re-sync.

Both tests use performMigrationSync for strict 'completed' assertions.
Runtime ~18s combined. Existing testAppwriteMigrationRowsOnDuplicate
and testAppwriteMigrationReRunIsIdempotent regression-tested locally.
testAppwriteMigrationUpsertDropsOrphanColumn: adds a column directly on
destination (simulating post-rename orphan or dest-only drift), runs
Upsert, asserts the orphan is dropped and source-declared column
survives. Covers the per-table orphan cleanup fired inside
createRecord before rows land.

testAppwriteMigrationSkipKeepsOrphanColumn: same setup, Skip mode.
Asserts the orphan survives, proving the cleanup is correctly gated
to Upsert only.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant