
Export partition to apache iceberg #1618

Open

arthurpassos wants to merge 66 commits into antalya-26.1 from export_partition_iceberg

Conversation

@arthurpassos
Collaborator

arthurpassos commented Apr 6, 2026

Export partition mechanics changes:

  1. Ping the restarting thread in case of a ZooKeeper session failure.
  2. Add a few failpoints to make testing easier.
  3. Make export_merge_tree_partition_system_table_prefer_remote_information false by default (I am considering removing it completely).
  4. Add a commit retry count / max retries to prevent a task from living forever when the commit keeps failing; fail the entire task once commit retries exceed max retries (see the sketch after this list).
  5. Fix a race condition in ExportPartitionManifestUpdatingTask by draining the status queue while holding only the status lock, instead of holding both the status lock and the export partition lock.
  6. Abstract away common functions like getContextCopyWithTaskSettings to avoid code duplication.
  7. Add a task timeout. If a task exceeds the timeout, it is killed with reason "timeout exceeded". This helps with Apache Iceberg idempotency vs. old manifest cleanup, and with tasks stuck in the pending state forever due to missing parts or a missing destination table.
  8. Rename enable_experimental_export_merge_tree_partition_feature to allow_experimental_export_merge_tree_partition.
  9. Throw on exports if allow_experimental_insert_into_iceberg is not enabled.
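
For item 4, the retry budget logic is roughly the following (a sketch only; the struct and function names are illustrative, not the actual code):

```cpp
#include <cstddef>
#include <string>

/// Illustrative sketch of the commit retry budget; names do not match the real code.
struct ExportTaskState
{
    size_t commit_attempts = 0;
    size_t max_commit_retries = 3;
};

void failTask(ExportTaskState & state, const std::string & reason);   // terminal FAILED transition
void scheduleCommitRetry(ExportTaskState & state);                    // task stays PENDING, retried later

void handleCommitFailure(ExportTaskState & state)
{
    ++state.commit_attempts;
    if (state.commit_attempts >= state.max_commit_retries)
        failTask(state, "commit retries exceeded max retries");
    else
        scheduleCommitRetry(state);
}
```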

Apache Iceberg specifics:

  1. Store the Apache Iceberg metadata JSON in the ZooKeeper task.
  2. Derive destination partition values from the source MergeTree part (no recalculation).
  3. Preserve write_full_path_in_iceberg_metadata in the ZooKeeper task.
  4. ExportPartTask now has a commit step that is only executed when the task is not an export partition, because we need to commit even a single part. Maybe I should rethink this architecture.
  5. Some vibe-coded structures for Iceberg stats.
  6. Write f_clickhouse_export_partition_transaction_id to the Apache Iceberg manifest so we can check it before committing twice (see the sketch after this list).
  7. Copy, paste, and adapt the IcebergStorageSink commit phase into IcebergMetadata so we can commit export partition operations.
  8. Create a sidecar file to persist file-level statistics so they can be used at commit time; those are downloaded / read at commit time.
  9. Create a simple IcebergImportSink.
  10. Add per-file stats to MultipleFileWriter.
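
For item 6, the pre-commit check is conceptually this (sketch; the key name is from this PR, the surrounding types are illustrative):

```cpp
#include <map>
#include <string>
#include <vector>

/// Illustrative stand-in for an Iceberg manifest entry's key/value metadata.
struct ManifestEntry
{
    std::map<std::string, std::string> key_value_metadata;
};

/// True if a previous attempt with the same transaction id already committed,
/// so a retried commit can be skipped instead of duplicating data.
bool alreadyCommitted(const std::vector<ManifestEntry> & entries, const std::string & transaction_id)
{
    for (const auto & entry : entries)
    {
        auto it = entry.key_value_metadata.find("f_clickhouse_export_partition_transaction_id");
        if (it != entry.key_value_metadata.end() && it->second == transaction_id)
            return true;
    }
    return false;
}
```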

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Make changes to the export partition background engine and support experimental exports to Apache Iceberg.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions Bot commented Apr 6, 2026

Workflow [PR], commit [dbf7866]

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b0e833565


```cpp
{
auto * object_storage = dynamic_cast<StorageObjectStorageCluster *>(dest_storage.get());

auto * iceberg_metadata = dynamic_cast<IcebergMetadata *>(object_storage->getExternalMetadata(query_context));
```

P1: Guard IcebergMetadata casts with USE_AVRO

IcebergMetadata is defined only under #if USE_AVRO in IcebergMetadata.h, but this new dynamic_cast<IcebergMetadata *> is compiled unconditionally. In non-AVRO builds (USE_AVRO=0), this translation unit (and the same pattern added in StorageReplicatedMergeTree.cpp) cannot compile, so the change breaks optional-AVRO build configurations.
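
A minimal sketch of the suggested guard (assuming the surrounding code stays as quoted; the non-AVRO branch is one possible choice, not necessarily what the PR will do):

```cpp
#if USE_AVRO
    auto * object_storage = dynamic_cast<StorageObjectStorageCluster *>(dest_storage.get());
    auto * iceberg_metadata = dynamic_cast<IcebergMetadata *>(object_storage->getExternalMetadata(query_context));
    /// ... proceed with the Iceberg commit path ...
#else
    throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
                    "Export to Iceberg requires a ClickHouse build with USE_AVRO enabled");
#endif
```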


Comment on lines +1780 to +1781
```cpp
const String sidecar_path = replaceFileExtensionWithAvro(
    filename_generator.convertMetadataPathToStoragePath(path));
```

P1: Use storage paths directly when reading export sidecars

The export path list is populated from filename.path_in_storage, but commit now treats each entry as a metadata path and calls convertMetadataPathToStoragePath before reading sidecars. With write_full_path_in_iceberg_metadata=1, table_dir is an URI prefix (for example s3://...) while these entries are plain storage paths (/...), so the conversion throws and EXPORT PARTITION cannot commit.
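
If the entries really are filename.path_in_storage values, the fix may be as simple as dropping the conversion (a sketch based only on the quoted lines):

```cpp
/// The export path list already holds storage paths (filename.path_in_storage),
/// so no metadata-path -> storage-path conversion should be needed here.
const String sidecar_path = replaceFileExtensionWithAvro(path);
```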


Comment thread src/Storages/ExportReplicatedMergeTreePartitionManifest.h
@arthurpassos
Collaborator Author

@codex review

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c6194814d


Comment thread src/Storages/MergeTree/MergeTreeData.cpp
Comment thread src/Storages/StorageReplicatedMergeTree.cpp Outdated
```cpp
context_copy->makeQueryContextForExportPart();
context_copy->setCurrentQueryId(manifest.query_id);
context_copy->setSetting("output_format_parallel_formatting", manifest.parallel_formatting);
context_copy->setSetting("output_format_parquet_parallel_encoding", manifest.parquet_parallel_encoding);
```
Collaborator

Got a failing test trying to use output_format_parquet_compression_method ZSTD and SNAPPY. Looks like this setting isn't propagated here?

Collaborator Author

Yeah.. this is the reason, but only for partition export. For part export that should be working.

I still need to find the best approach for persisting these settings.
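
If the manifest grows a field for it, the fix would presumably mirror the lines quoted above (sketch; manifest.parquet_compression_method is a hypothetical field name):

```cpp
/// Hypothetical: persist the compression method in the export manifest and
/// replay it when reconstructing the background task's context.
context_copy->setSetting("output_format_parquet_compression_method", manifest.parquet_compression_method);
```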

mkmkme previously approved these changes Apr 22, 2026

@mkmkme
Collaborator

There's no way I could finish this review properly in time. From skimming the code, it looks good. We discussed some of the AI findings on the call; they are probably worth addressing in future PRs related to this feature.

Approving.

Comment thread docs/en/antalya/partition_export.md Outdated
```diff
 - **Type**: `Bool`
 - **Default**: `false`
-- **Description**: Ignore existing partition export and overwrite the ZooKeeper entry. Allows re-exporting a partition to the same destination before the manifest expires.
+- **Description**: Ignore existing partition export and overwrite the ZooKeeper entry. Allows re-exporting a partition to the same destination before the manifest expires. **IMPORTANT:** this is dangerou because it can lead to duplicated data, use it with caution.
```
Collaborator

"dangerou" :)

@arthurpassos
Collaborator Author

TODO (arthur): check the key-value pair string, because it may be duplicated.

@DimensionWieldr
Collaborator

DimensionWieldr commented Apr 23, 2026

@arthurpassos For anyone upgrading CH from a version that didn't have export to one that does, existing/old tables' ZooKeeper trees are missing the exports folder.

I have a test for this failing when it tries to run EXPORT PARTITION on a table whose ZooKeeper tree doesn't have the exports/ folder yet. CH tries to create a file inside that folder but ZooKeeper refuses because the parent folder doesn't exist. The EXPORT command fails with a "no such node" error.

@arthurpassos
Collaborator Author

> For anyone upgrading CH from a version that didn't have export to one that does, existing/old tables' ZooKeeper trees are missing the exports folder. […] The EXPORT command fails with a "no such node" error.

This is weird, I should have it covered. Let's discuss this internally

@DimensionWieldr
Collaborator

@arthurpassos Truncate and export seem to disagree on REST catalog.

Basically when I EXPORT PARTITION A, CH writes the manifest-list path as a bucket-relative path: /data/iceberg_.../metadata/snap-797704761-...avro.
Then TRUNCATE the destination, this time CH writes the manifest-list path as a full S3 URI, something like s3://warehouse/data/iceberg_.../metadata/snap-1022122687-...avro.
Then I EXPORT PARTITION B and IcebergWrites.cpp inspects the existing files, sees both s3://... and /data/... paths coexisting, and refuses to commit with Code: 36. DB::Exception: Paths in Iceberg must use a consistent format.

This looks like kinda the same issue I mentioned before, which should be fixed by ClickHouse#100420. So I guess we leave it for now?

@arthurpassos
Collaborator Author

> Truncate and export seem to disagree on REST catalog. […] This looks like kinda the same issue I mentioned before, which should be fixed by ClickHouse#100420. So I guess we leave it for now?

How does regular insert behave in this case? 'Who' does it agree with?

@arthurpassos
Collaborator Author

arthurpassos commented Apr 23, 2026

> […] This is weird, I should have it covered. Let's discuss this internally

Discussed this internally; it was an error on the test side. A restart is required, as the nodes are created upon initialization.

@DimensionWieldr
Collaborator

> […] How does regular insert behave in this case? 'Who' does it agree with?

Insert is bucket-relative too, so same as export. Looks like truncate is the outlier.

@arthurpassos
Collaborator Author

> […] Insert is bucket-relative too, so same as export. Looks like truncate is the outlier.

So... here's the deal: I trusted the current upstream Iceberg writes implementation. It writes full paths for the manifest files (not the data files) if, and only if, write_full_path_in_iceberg_metadata is true. The data files are not respecting it, IIRC, and that PR should fix it (maybe it does more than that).

On the other hand, the truncate implementation by @il9ue writes full paths if write_full_path_in_iceberg_metadata is true OR the catalog is transactional (this is the piece regular writes and exports are not following at the moment).

This is where the inconsistency lives. The question is: who is right? For clarity, the two predicates are sketched below.
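
(A sketch of my reading; the names are illustrative, not the exact code:)

```cpp
/// Regular INSERT and EXPORT (current upstream behaviour, as I read it):
bool insert_writes_full_paths = settings.write_full_path_in_iceberg_metadata;

/// TRUNCATE (the divergent predicate):
bool truncate_writes_full_paths =
    settings.write_full_path_in_iceberg_metadata || catalog->isTransactional();
```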

@DimensionWieldr
Collaborator

Tests are pretty much done and passing (with the exception of some issues discussed above with Iceberg writes) and cover the REST and Glue catalogs.

Did one more pass of AI audit and it came back with one issue.

Medium: Commit precondition no-op leaves export `PENDING` without commit retry budget
  • Impact: After all parts are in processed/, commit() can return without calling Iceberg commit or updating ZK status, and without calling handleCommitFailure. The task can stay
    `PENDING` indefinitely (until manual kill, TTL on a later terminal state, or optional task timeout), with `commit_attempts` stuck at 0.
  • Anchor: ExportPartitionUtils::commit (ExportPartitionUtils.cpp, early returns after getExportedPaths); callers ExportPartitionTaskScheduler::handlePartExportSuccess and
    ExportPartitionManifestUpdatingTask.cpp (tryCleanup commit fix-up).
  • Trigger: getExportedPaths returns empty or fewer paths than manifest.parts.size() while ZK already shows no processing/* children (e.g. ZK read inconsistency, partial processed
    data, or path accounting bug).
  • Why this is a defect: The state model explicitly adds commit_attempts / FAILED for exceptional commit failures, but this branch is a silent failure — no terminal transition and
    no budget consumption; `tryCleanup` still returns `true` after a no-op `commit()`, which mislabels cleanup success relative to the documented tryCleanup contract.
  • Fix direction (short): Treat empty / incomplete exported paths as an error: throw (so existing handleCommitFailure runs) or call handleCommitFailure / a dedicated counter before
    return; make tryCleanup return `false` unless ZK status is actually terminalized.
  • Regression test direction (short): Integration or unit test: empty processed/ listing at commit time → assert commit_attempts increases and eventual FAILED (or explicit throw),
    not infinite PENDING.

@arthurpassos Thoughts?

@arthurpassos
Collaborator Author

> Did one more pass of AI audit and it came back with one issue. Medium: Commit precondition no-op leaves export `PENDING` without commit retry budget […] @arthurpassos Thoughts?

This is VERY unlikely; it can only happen if the ZooKeeper data got corrupted between calls. And for those cases, the task timeout should do its job.

Regardless, treating those cases as errors and bumping commit_retry_attempts is a good idea. I will implement it, roughly as sketched below.
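
(Sketch; the exception text and exact wiring are illustrative — the point is that the existing failure path then consumes the retry budget:)

```cpp
/// Treat an incomplete processed/ listing as a commit error instead of a silent no-op.
if (exported_paths.size() != manifest.parts.size())
    throw Exception(ErrorCodes::LOGICAL_ERROR,
                    "Export commit precondition failed: expected {} exported parts, found {}",
                    manifest.parts.size(), exported_paths.size());
/// The throw lands in the existing catch block, which calls handleCommitFailure(),
/// bumps commit_retry_attempts, and terminalizes the task as FAILED once the budget is spent.
```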

@arthurpassos
Collaborator Author

> […] Regardless, treating those cases as errors and bumping commit_retry_attempts is a good idea. I will implement it.

Should be fixed by af3dda1

@DimensionWieldr
Collaborator

DimensionWieldr commented Apr 24, 2026

We don't have time to iron out everything, but I think we've covered the main user scenarios. Since we're looking to release very soon, I'll put down a few more of AI's nitpicks that we can address if there is time. Otherwise, we can leave them for the next release.

Zero-row export/import path can dereference null writer internals
    • Impact: export-to-Iceberg can crash in finalization when no data file was opened.
    • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp (IcebergImportSink::finalizeBuffers), src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
       (MultipleFileWriter::finalize).
    • Trigger: export pipeline produces no chunks (e.g., part rows fully masked by delete mask).
    • Why defect: finalizeBuffers() always calls writer->finalize(), but finalize() unconditionally dereferences output_format/buffer, which are initialized only after startNewFile()
       from consume().
    • Fix direction (short): guard finalize()/finalizeBuffers() for the "no file opened" state (see the sketch after this list).

Replicated export tasks can fail despite query-level Iceberg enablement
    • Impact: EXPORT PARTITION ... TO Iceberg can fail in background with SUPPORT_IS_DISABLED even when initiating query enabled Iceberg writes.
    • Anchor: src/Storages/MergeTree/ExportPartitionUtils.cpp (getContextCopyWithTaskSettings), src/Storages/MergeTree/MergeTreeData.cpp (exportPartToTable).
    • Trigger: global/profile allow_experimental_insert_into_iceberg=0, session enables it only for the export query (default scheduler mode).
    • Why defect: background task context reconstructs selected settings but does not propagate allow_experimental_insert_into_iceberg; part export re-checks it and throws.
    • Fix direction (short): persist and replay allow_experimental_insert_into_iceberg in manifest/task context.

Transactional truncate constructs malformed catalog metadata location
    • Impact: truncate on transactional catalogs can publish an invalid metadata URI (double-prefixed location), risking failed or broken catalog updates.
    • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp (truncate), plus src/Storages/ObjectStorage/DataLakes/Iceberg/FileNamesGenerator.cpp
      (generateMetadataName).
    • Trigger: catalog->isTransactional() true in truncate flow.
    • Why defect: generator is initialized with table_dir=location, so metadata_name is already location-prefixed; code then prepends location again.
    • Fix direction (short): use metadata_name directly when already absolute/prefixed.

KILLED can still be overwritten to COMPLETED by racing commit
    • Impact: user-visible state machine is non-monotonic (PENDING -> KILLED -> COMPLETED) under kill/commit race.
    • Anchor: src/Storages/MergeTree/ExportPartitionUtils.cpp (commit), src/Storages/StorageReplicatedMergeTree.cpp (killExportPartition).
    • Trigger: KILL EXPORT PARTITION while commit already passed data commit and is updating status.
    • Why defect: kill uses version-checked CAS from PENDING; commit uses unconditional trySet(..., -1) to COMPLETED.
    • Fix direction (short): make final COMPLETED transition conditional on expected pending state/version.
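
For reference, the guard from the first nitpick is small (sketch; member names follow the audit's anchors but are illustrative):

```cpp
void MultipleFileWriter::finalize()
{
    /// Guard the "no file opened" state: output_format and buffer are only
    /// initialized by startNewFile(), which runs on the first consume().
    if (!output_format)
        return;

    output_format->finalize();
    buffer->finalize();
}
```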

DimensionWieldr added the verified (Approved for release) label Apr 24, 2026
@arthurpassos
Collaborator Author

> We don't have time to iron out everything […] I'll put down a few more of AI's nitpicks that we can address if there is time. […]
>
> 1. Zero-row export/import path can dereference null writer internals […]
> 2. Replicated export tasks can fail despite query-level Iceberg enablement […]
> 3. Transactional truncate constructs malformed catalog metadata location […]
> 4. KILLED can still be overwritten to COMPLETED by racing commit […]

  1. It can't happen; there is no way to construct a zero-row part, AFAIK.
  2. True, but I'm not sure what the right course of action is here.
  3. Truncate-related.
  4. We have already discussed this 100 times; not an issue.
