Summary
A RIGHT OUTER JOIN between s3Cluster and icebergS3Cluster table functions with object_storage_cluster_join_mode='global' returns 6 rows instead of the expected 2. The 2 matched rows appear correctly, but 4 additional spurious rows with NULL on the left side are emitted — as if those right-side rows had no match on the left, even though every right-side row does match a left-side row.
Test Parameters
| Parameter |
Value |
| Left table |
s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data3/data/**.parquet', ...) |
| Right table |
icebergS3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data1', ...) |
| Join clause |
RIGHT OUTER JOIN |
| Join condition |
t1.boolean_col = t2.boolean_col AND t1.string_col = t2.string_col |
object_storage_cluster_join_mode |
global |
object_storage_cluster |
replicated_cluster |
| Format |
Values |
| Order by |
tuple(*) |
All tables have same two rows:
SELECT *
FROM merge_tree_table_d435968e_3cab_11f1_a333_de7b9eea348f
Query id: 4ed08192-c8f0-4221-82f3-2058f558586c
┌─boolean_col─┬─long_col─┬─double_col─┬─string_col─┬──────────────timestamp_col─┬───date_col─┬────time_col─┬────────────timestamptz_col─┬─integer_col─┬─float_col─┬─decimal_col─┐
1. │ true │ 1000 │ 456.78 │ Alice │ 2024-01-01 13:00:00.000000 │ 2024-01-01 │ 43200000000 │ 2024-01-01 12:00:00.000000 │ 1000 │ 456.78 │ 456.78 │
2. │ false │ 2000 │ 456.78 │ Bob │ 2024-01-01 13:00:00.000000 │ 2024-01-01 │ 43200000000 │ 2024-01-01 12:00:00.000000 │ 2000 │ 456.78 │ 456.78 │
└─────────────┴──────────┴────────────┴────────────┴────────────────────────────┴────────────┴─────────────┴────────────────────────────┴─────────────┴───────────┴─────────────┘
2 rows in set. Elapsed: 0.010 sec.
What Happened
Step 1 — Baseline Query (MergeTree tables, passed)
The test first materialized both data sources into local MergeTree tables via
CREATE TABLE ... ENGINE = MergeTree ORDER BY tuple() AS SELECT * FROM <cluster_function>(...)
and ran the same RIGHT OUTER JOIN to establish the expected result. This produced 2 rows — one matched row for Bob and one matched row for Alice, with no NULL-left rows (every right-side row has a match on the left).
Expected output (2 rows):
(false,2000,456.78,'Bob','2024-01-01 12:00:00.000000','2024-01-01','1970-01-01 12:00:00.000000','2024-01-01 12:00:00.000000',2000,456.78,456.78,false,2000,456.78,'Bob','2024-01-01 13:00:00.000000','2024-01-01',43200000000,'2024-01-01 12:00:00.000000',2000,456.78,456.78),
(true,1000,456.78,'Alice','2024-01-01 12:00:00.000000','2024-01-01','1970-01-01 12:00:00.000000','2024-01-01 12:00:00.000000',1000,456.78,456.78,true,1000,456.78,'Alice','2024-01-01 13:00:00.000000','2024-01-01',43200000000,'2024-01-01 12:00:00.000000',1000,456.78,456.78)
Step 2 — Actual Query (cluster table functions, failed)
The same join was then executed using the s3Cluster and icebergS3Cluster table functions directly, with settings:
object_storage_cluster_join_mode='global'
object_storage_cluster='replicated_cluster'
This produced 6 rows — the 2 correct matched rows, plus 4 extra rows with NULL values in all left-side columns (2 spurious (NULL, ..., Bob) rows and 2 spurious (NULL, ..., Alice) rows).
Actual output (6 rows):
(false,2000,...,'Bob',...,false,2000,...,'Bob',...),
(true,1000,...,'Alice',...,true,1000,...,'Alice',...),
(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,false,2000,...,'Bob',...),
(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,false,2000,...,'Bob',...),
(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,true,1000,...,'Alice',...),
(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,true,1000,...,'Alice',...)
Assertion Failure
assert result.output == expected_result.output
The diff shows the first two rows match, followed by 4 extra NULL-left rows that should not exist.
Root Cause Analysis
The RIGHT OUTER JOIN contract guarantees that every right-side row appears exactly once in the result — matched against its corresponding left-side row(s) where matches exist, or padded with NULLs on the left where no match exists globally. The MergeTree baseline confirms this behavior: every right-side row matches a left-side row, so there are no NULL-left rows.
However, when the query is distributed across the cluster via s3Cluster/icebergS3Cluster with object_storage_cluster_join_mode='global', the "outer" side of the join is not evaluated globally:
- Each node processes a partition of the data.
- When a given right-side row does not find a match in the local left-side partition, the node emits a
NULL-left row, treating it as "unmatched".
- Those right-side rows do have matches on other nodes, but the distributed join executor does not deduplicate or cross-check matches across nodes before emitting the
NULL-padded rows.
- The final result therefore contains both the correctly matched rows and spurious
NULL-left rows produced by nodes where the match did not happen locally.
This is a ClickHouse bug in the global join mode for cluster table functions: outer join semantics are applied per-node rather than globally, resulting in spurious NULL-padded rows for right-side rows that are globally matched but locally unmatched.
Why the Test's Non-Stable Handling Did Not Catch This
The test has logic (lines 335–468 in joins.py) that relaxes the output assertion for certain "non-stable" table-type combinations. However, all those exemptions check for object_storage_cluster_join_mode == "allow". This failure occurs with object_storage_cluster_join_mode == "global", which is not covered by the exemptions. The global mode exhibits the same non-stable behavior as allow for this table-type combination, but the test was not updated to account for it.
Relationship to Existing Issue
This bug is closely related to Altinity/ClickHouse#1661, which reports the same category of problem (distributed cluster joins producing incorrect row counts with object_storage_cluster_join_mode='global'), but for INNER ANY JOIN — where matched rows are duplicated. This failure is a different manifestation of the same underlying issue: the distributed join executor does not enforce global join semantics across cluster nodes.
| Issue |
Join Type |
Symptom |
| #1661 |
INNER ANY JOIN |
Matched rows duplicated (6 rows instead of 2) |
| This failure |
RIGHT OUTER JOIN |
Matched rows correct, but 4 spurious NULL-left rows added (6 rows instead of 2) |
Affected Table-Type Combination
| Left Table Type |
Right Table Type |
Join Mode |
Join Clause |
Status |
s3Cluster_table_function |
icebergS3Cluster_table_function |
allow |
any non-stable |
Handled (relaxed assertion) |
s3Cluster_table_function |
icebergS3Cluster_table_function |
global |
RIGHT OUTER JOIN |
NOT handled — this failure |
Possible Resolutions
Option A — ClickHouse Fix (Preferred)
File a bug (or extend #1661) against ClickHouse for RIGHT OUTER JOIN emitting spurious NULL-left rows when using cluster table functions with object_storage_cluster_join_mode='global'. The outer-join semantics must be enforced globally, not per-node — a right-side row should only produce a NULL-left row if it has no match on any node.
Option B — Test Workaround
Extend the non-stable result handling in check_join() to also cover object_storage_cluster_join_mode == "global" for the s3Cluster + icebergS3Cluster combination (and likely other cluster function pairs), relaxing the assertion to only check exitcode == 0.
Option C — Combined
File the ClickHouse bug and add the test relaxation with an xfail or similar marker referencing the upstream issue, so that the test suite is not blocked while waiting for a fix.
Reproduction
Input data (both s3Cluster and icebergS3Cluster return 8 rows each; every right-side row has a matching left-side row on (boolean_col, string_col)).
SELECT *
FROM s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data3/data/**.parquet', '<user>', '<password>') AS t1
RIGHT OUTER JOIN icebergS3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data1', '<user>', '<password>') AS t2
ON t1.boolean_col = t2.boolean_col AND t1.string_col = t2.string_col
ORDER BY tuple(*)
SETTINGS object_storage_cluster_join_mode='global', object_storage_cluster='replicated_cluster'
FORMAT Values
Compare against the same query against MergeTree copies of the same data — the MergeTree version returns 2 rows (all matched), while the cluster version returns 6 rows (2 matched + 4 spurious NULL-left rows).
Summary
A
RIGHT OUTER JOINbetweens3ClusterandicebergS3Clustertable functions withobject_storage_cluster_join_mode='global'returns 6 rows instead of the expected 2. The 2 matched rows appear correctly, but 4 additional spurious rows withNULLon the left side are emitted — as if those right-side rows had no match on the left, even though every right-side row does match a left-side row.Test Parameters
s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data3/data/**.parquet', ...)icebergS3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data1', ...)RIGHT OUTER JOINt1.boolean_col = t2.boolean_col AND t1.string_col = t2.string_colobject_storage_cluster_join_modeglobalobject_storage_clusterreplicated_clusterValuestuple(*)All tables have same two rows:
What Happened
Step 1 — Baseline Query (MergeTree tables, passed)
The test first materialized both data sources into local MergeTree tables via
CREATE TABLE ... ENGINE = MergeTree ORDER BY tuple() AS SELECT * FROM <cluster_function>(...)and ran the same
RIGHT OUTER JOINto establish the expected result. This produced 2 rows — one matched row for Bob and one matched row for Alice, with noNULL-left rows (every right-side row has a match on the left).Expected output (2 rows):
Step 2 — Actual Query (cluster table functions, failed)
The same join was then executed using the
s3ClusterandicebergS3Clustertable functions directly, with settings:object_storage_cluster_join_mode='global'object_storage_cluster='replicated_cluster'This produced 6 rows — the 2 correct matched rows, plus 4 extra rows with
NULLvalues in all left-side columns (2 spurious(NULL, ..., Bob)rows and 2 spurious(NULL, ..., Alice)rows).Actual output (6 rows):
Assertion Failure
The diff shows the first two rows match, followed by 4 extra
NULL-left rows that should not exist.Root Cause Analysis
The
RIGHT OUTER JOINcontract guarantees that every right-side row appears exactly once in the result — matched against its corresponding left-side row(s) where matches exist, or padded withNULLs on the left where no match exists globally. The MergeTree baseline confirms this behavior: every right-side row matches a left-side row, so there are noNULL-left rows.However, when the query is distributed across the cluster via
s3Cluster/icebergS3Clusterwithobject_storage_cluster_join_mode='global', the "outer" side of the join is not evaluated globally:NULL-left row, treating it as "unmatched".NULL-padded rows.NULL-left rows produced by nodes where the match did not happen locally.This is a ClickHouse bug in the
globaljoin mode for cluster table functions: outer join semantics are applied per-node rather than globally, resulting in spuriousNULL-padded rows for right-side rows that are globally matched but locally unmatched.Why the Test's Non-Stable Handling Did Not Catch This
The test has logic (lines 335–468 in
joins.py) that relaxes the output assertion for certain "non-stable" table-type combinations. However, all those exemptions check forobject_storage_cluster_join_mode == "allow". This failure occurs withobject_storage_cluster_join_mode == "global", which is not covered by the exemptions. Theglobalmode exhibits the same non-stable behavior asallowfor this table-type combination, but the test was not updated to account for it.Relationship to Existing Issue
This bug is closely related to Altinity/ClickHouse#1661, which reports the same category of problem (distributed cluster joins producing incorrect row counts with
object_storage_cluster_join_mode='global'), but forINNER ANY JOIN— where matched rows are duplicated. This failure is a different manifestation of the same underlying issue: the distributed join executor does not enforce global join semantics across cluster nodes.INNER ANY JOINRIGHT OUTER JOINNULL-left rows added (6 rows instead of 2)Affected Table-Type Combination
s3Cluster_table_functionicebergS3Cluster_table_functionallows3Cluster_table_functionicebergS3Cluster_table_functionglobalRIGHT OUTER JOINPossible Resolutions
Option A — ClickHouse Fix (Preferred)
File a bug (or extend #1661) against ClickHouse for
RIGHT OUTER JOINemitting spuriousNULL-left rows when using cluster table functions withobject_storage_cluster_join_mode='global'. The outer-join semantics must be enforced globally, not per-node — a right-side row should only produce aNULL-left row if it has no match on any node.Option B — Test Workaround
Extend the non-stable result handling in
check_join()to also coverobject_storage_cluster_join_mode == "global"for thes3Cluster+icebergS3Clustercombination (and likely other cluster function pairs), relaxing the assertion to only checkexitcode == 0.Option C — Combined
File the ClickHouse bug and add the test relaxation with an
xfailor similar marker referencing the upstream issue, so that the test suite is not blocked while waiting for a fix.Reproduction
Input data (both
s3ClusterandicebergS3Clusterreturn 8 rows each; every right-side row has a matching left-side row on(boolean_col, string_col)).Compare against the same query against MergeTree copies of the same data — the MergeTree version returns 2 rows (all matched), while the cluster version returns 6 rows (2 matched + 4 spurious
NULL-left rows).