Dev runtime partition prune by BiteTheDDDDt · Pull Request #62589 · apache/doris

BiteTheDDDDt · 2026-04-17T09:29:28Z

-- =============================================================================
-- PART A — RANGE-partitioned fact table
--   4096 partitions, ~50M rows
-- =============================================================================

DROP TABLE IF EXISTS rf_perf_fact;
CREATE TABLE rf_perf_fact (
    id        BIGINT       NOT NULL,
    part_col  INT          NOT NULL,
    payload   VARCHAR(64)  NOT NULL
)
DUPLICATE KEY(id)
PARTITION BY RANGE(part_col) (
    -- 4096 buckets of width 1024 covering [0, 4096*1024)
    FROM (0) TO (4194304) INTERVAL 1024
)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES ("replication_num" = "1");

-- 50M rows; part_col uniformly spread across the 4096 partitions
INSERT INTO rf_perf_fact
SELECT
    number                              AS id,
    CAST(number % 4194304 AS INT)       AS part_col,
    CONCAT('p', CAST(number AS STRING)) AS payload
FROM numbers("number" = "50000000");

DROP TABLE IF EXISTS rf_perf_dim;
CREATE TABLE rf_perf_dim (
    dim_key  INT NOT NULL
)
DUPLICATE KEY(dim_key)
DISTRIBUTED BY HASH(dim_key) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- 5 keys, all in [0, 1024)  => all hit partition #0
INSERT INTO rf_perf_dim VALUES (1), (7), (42), (100), (999);

ANALYZE TABLE rf_perf_fact WITH SYNC;
ANALYZE TABLE rf_perf_dim  WITH SYNC;

-- =============================================================================
-- Baseline: feature OFF — BE must scan every partition and filter row by row
-- =============================================================================
SET enable_runtime_filter_partition_prune = false;

-- warm up file cache / page cache
SELECT COUNT(*) FROM rf_perf_fact f JOIN rf_perf_dim d ON f.part_col = d.dim_key;

SELECT /*+ SET_VAR(runtime_filter_type='IN_OR_BLOOM_FILTER') */
       'rf_prune_OFF' AS tag,
       COUNT(*)       AS matched_rows
FROM rf_perf_fact f
JOIN rf_perf_dim  d ON f.part_col = d.dim_key;

-- =============================================================================
-- Optimized: feature ON — BE drops 1022 of 1024 partitions before scanning
-- =============================================================================
SET enable_runtime_filter_partition_prune = true;

-- warm up under the new path too
SELECT COUNT(*) FROM rf_perf_fact f JOIN rf_perf_dim d ON f.part_col = d.dim_key;

SELECT /*+ SET_VAR(runtime_filter_type='IN_OR_BLOOM_FILTER') */
       'rf_prune_ON' AS tag,
       COUNT(*)      AS matched_rows
FROM rf_perf_fact f
JOIN rf_perf_dim  d ON f.part_col = d.dim_key;

before:

- ExecTime: 468.760ms
- NumScanners: 2.048K (2048)
- TabletNum: 2.048K (2048)

after:

- ExecTime: 43.52ms
- TabletsPrunedByRuntimeFilter: 2.047K (2047)
- NumScanners: 1

Fragment RPC Phase1 Time: 247ms -> 335ms
Wait and Fetch Result Time: 541ms ->35ms
Total: 868ms -> 450ms

fix fix [improvement](be) Add thrift definitions for BE-side runtime filter partition pruning Problem Summary: Add thrift structures needed for BE-side runtime filter partition pruning: partition_boundaries container in TOlapScanNode, TTargetExprMonotonicity enum, and target_expr_monotonicity field in TRuntimeFilterDesc. Also add detailed comments to existing TPartitionBoundary and partiton_to_tablets fields. Key design decisions reflected in this thrift change: - No is_default_partition field: FE omits partitions it does not want pruned - No is_range_partition field: partition type inferred from TPartitionBoundary field presence (range_start/range_end → Range, list_values → List) - Single TTargetExprMonotonicity enum (NON_MONOTONIC / MONOTONIC_INCREASING / MONOTONIC_DECREASING) instead of two separate booleans None - Test: No need to test (thrift definition only, no runtime behavior change) - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

hello-stephen · 2026-04-17T09:29:35Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot

Pull request overview

This PR adds BE-side runtime-filter-based partition pruning for OLAP scans by having FE send partition boundary metadata to BE, then using runtime filter values to mark partitions (and associated tablets/scanners) as prunable during scan execution. It also introduces a dedicated regression test suite to validate pruning counters across range/list partitions and multiple data types.

Changes:

Extend Thrift plan structures to carry partition boundary descriptors (and related RF metadata) from FE to BE.
FE: populate partition boundary descriptors in OlapScanNode thrift; annotate runtime filter targets with a monotonicity hint.
BE: add a reusable runtime-filter partition pruner, integrate it into scan local state and scanner scheduling, and expose profile counters for pruned/total partitions.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
regression-test/suites/query_p0/runtime_filter/rf_partition_pruning.groovy	New regression tests asserting partition pruning via profile counters across partition types and expressions.
regression-test/data/query_p0/runtime_filter/rf_partition_pruning.out	Golden output for the new regression suite queries.
regression-test/conf/regression-conf.groovy	Changes default JDBC/HTTP ports used by regression framework configuration.
gensrc/thrift/PlanNodes.thrift	Adds `TPartitionBoundary`, adds scan-node fields for partition pruning, and adds `TTargetExprMonotonicity` + RF descriptor field.
fe/fe-core/src/main/java/org/apache/doris/planner/RuntimeFilter.java	Sets `target_expr_monotonicity` for eligible RF targets.
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java	Populates `partition_boundaries` on `TOlapScanNode` based on selected partitions.
be/src/exec/scan/scanner_scheduler.cpp	Stops scanner work early when its partition has been pruned after late RF processing.
be/src/exec/scan/scanner.h	Adds a virtual hook for scanners to report partition-pruned state.
be/src/exec/scan/olap_scanner.{h,cpp}	Implements partition-pruned checks for OLAP scanners.
be/src/exec/runtime_filter/runtime_filter_partition_pruner.{h,cpp}	New BE component to parse boundaries and prune partitions based on RF conjuncts.
be/src/exec/operator/scan_operator.{h,cpp}	Integrates pruner into scan local state and adds profile counters.
be/src/exec/operator/olap_scan_operator.{h,cpp}	Parses FE-provided boundaries and filters tablets whose partitions are pruned.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+  // Partition-id → tablet-id list mapping; used together with partition_boundaries
+  // for BE-side runtime filter partition pruning.
+  26: optional map<Types.TPartitionId, list<Types.TTabletId>> partiton_to_tablets
+  // Partition boundary descriptors for BE-side runtime filter partition pruning.
+  // Only partitions that are candidates for pruning are included; partitions FE
+  // does not want pruned (e.g. default catch-all) are omitted from this list.
+  27: optional list<TPartitionBoundary> partition_boundaries


+// When the target expression is a plain SlotRef (identity), FE may omit this
+// field; BE treats an absent value as NON_MONOTONIC (conservative).


+        for (RuntimeFilterTarget target : targets) {
+            if (target.expr instanceof SlotRef && target.node instanceof OlapScanNode) {
+                SlotRef slotRef = (SlotRef) target.expr;
+                Column col = slotRef.getColumn();
+                if (col != null) {
+                    OlapScanNode scanNode = (OlapScanNode) target.node;
+                    OlapTable table = scanNode.getOlapTable();
+                    PartitionType partType = table.getPartitionInfo().getType();
+                    if (partType == PartitionType.RANGE || partType == PartitionType.LIST) {
+                        for (Column partCol : table.getPartitionInfo().getPartitionColumns()) {
+                            if (partCol.getName().equalsIgnoreCase(col.getName())) {
+                                tFilter.setTargetExprMonotonicity(
+                                        TTargetExprMonotonicity.MONOTONIC_INCREASING);
+                                break;
+                            }
+                        }
+                    }
+                }
+            }
+            // Only check the first target; if there are multiple targets on
+            // different tables, monotonicity may differ. We conservatively
+            // set it based on the first target only.
+            break;
+        }


+        for (const auto& pb : boundaries_it->second) {
+            if (_pruned_partition_ids.contains(pb.partition_id) ||
+                newly_pruned.contains(pb.partition_id)) {
+                continue;
+            }


+jdbcUrl = "jdbc:mysql://127.0.0.1:9333/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"
+targetJdbcUrl = "jdbc:mysql://127.0.0.1:9333/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"


 feSyncerPassword = ""

-feHttpAddress = "127.0.0.1:8030"
+feHttpAddress = "127.0.0.1:8333"


BiteTheDDDDt · 2026-04-17T10:27:06Z

run buildall

BiteTheDDDDt · 2026-04-17T10:27:13Z

/review

github-actions · 2026-04-17T10:31:26Z

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24560451334

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

hello-stephen · 2026-04-17T10:54:59Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	78.48% (1798/2291)
Line Coverage	64.12% (32283/50345)
Region Coverage	65.03% (16235/24967)
Branch Coverage	55.54% (8675/15620)

BiteTheDDDDt · 2026-04-17T13:35:13Z

/review

hello-stephen · 2026-04-17T14:31:38Z

BE Regression && UT Coverage Report

Increment line coverage 84.54% (257/304) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.71% (27513/37328)
Line Coverage	57.47% (298118/518704)
Region Coverage	54.54% (247670/454115)
Branch Coverage	56.05% (107208/191288)

hello-stephen · 2026-04-17T14:32:39Z

FE Regression Coverage Report

Increment line coverage 77.32% (75/97) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-04-17T15:31:27Z

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24567893050

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

BiteTheDDDDt · 2026-04-20T01:51:40Z

/review

github-actions

This PR is not ready to merge yet.

Key blocking issues:

be/src/exec/operator/olap_scan_operator.cpp: the OLAP path ignores the existing enable_runtime_filter_partition_prune session variable, so SET enable_runtime_filter_partition_prune = false no longer disables runtime-filter partition pruning for OLAP scans.
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java: multi-column RANGE partitions can be pruned incorrectly because FE serializes only the first partition column as an independent range, but Doris range partition membership is lexicographic over the full PartitionKey.
be/src/exec/runtime_filter/runtime_filter_partition_pruner.cpp: nullable LIST partitions are unsafe because FE now serializes NULL_LITERAL boundary values, while BE converts every boundary literal into a concrete CppType instead of preserving NULL semantics.

Critical checkpoint conclusions:

Goal of task: add BE-side runtime-filter partition pruning for OLAP scans. The simple single-column non-null cases are covered, but the implementation is not correct for all supported partition shapes and it regresses the existing disable switch.
Modification size/focus: mostly focused, but it also adds thrift/API surface (partition_to_tablets, planId_to_target_monotonicity) that is not consumed end-to-end in this PR.
Concurrency: no new deadlock or lock-order issue stood out in the reviewed path; the new pruning state uses the existing _conjuncts_lock plus a shared_mutex.
Lifecycle/static initialization: no special lifecycle or SIOF issue found.
Configuration items: the existing enable_runtime_filter_partition_prune behavior is regressed on the OLAP path.
Compatibility: the new thrift fields are optional, so wire compatibility looks acceptable.
Functionally parallel code paths: external/file scans still honor the session flag, while OLAP scans do not.
Special conditional checks: the multi-column RANGE handling is not safe as implemented.
Test coverage: good single-column happy-path coverage, but no multi-column RANGE coverage, no nullable LIST-partition coverage, and the suite avoids late-arrival timing by forcing runtime_filter_wait_infinitely=true and parallel_pipeline_task_num=1.
Test result file changes: the new .out file looks consistent with the cases the suite actually covers.
Observability: the added counters are useful; no blocking observability issue found.
Transaction/persistence/data-write checkpoints: not applicable here.
FE-BE variable passing: the new partition-boundary payload is wired through, but its semantics are incomplete for multi-column and nullable partition cases.
Performance: no separate blocking perf issue beyond the unexpected extra work caused by ignoring the disable flag.
Other issues: none beyond the blockers above.

Please fix the blockers above before approval.

github-actions · 2026-04-20T02:17:33Z

@@ -91,6 +95,9 @@ Status OlapScanLocalState::init(RuntimeState* state, LocalStateInfo& info) {

    RETURN_IF_ERROR(Base::init(state, info));
    RETURN_IF_ERROR(_sync_cloud_tablets(state));


This new OLAP path bypasses the existing enable_runtime_filter_partition_prune session flag. FileScanner still gates runtime-filter partition pruning on query_options().enable_runtime_filter_partition_prune, but here we always parse partition boundaries and later run _do_partition_pruning_by_rf() unconditionally from _process_conjuncts() / _on_runtime_filter_update(). As a result, SET enable_runtime_filter_partition_prune = false no longer disables the optimization for OLAP scans, which is a user-visible behavior regression.

github-actions · 2026-04-20T02:17:33Z

+    }
+
+    private void addRangeBoundaries(List<TPartitionBoundary> boundaries, long partitionId,
+            RangePartitionItem rangeItem, List<Column> partColumns,


This is not safe for multi-column RANGE partitions. Doris range partition membership is based on lexicographic PartitionKey comparison, not on an independent range over the first column. For example, a legal partition like [(1, 1), (1, 5)) becomes a first-column boundary [1, 1), so BE will see an empty interval and can prune the partition even when k1 = 1 should still match rows inside it. Unless FE can serialize a provably correct projection, this optimization needs to be disabled for multi-column RANGE partitions.

github-actions · 2026-04-20T02:17:33Z

+        bool is_range = tb.__isset.range_start || tb.__isset.range_end;                       \
+        if (!is_list && !is_range) break;                                                     \
+        ColumnValueRange<TYPE_##NAME> cvr(slot->col_name(), is_nullable, precision, scale);   \
+        auto parse_texpr_node = [&](const TExprNode& node) -> std::pair<CppType, ColumnPtr> { \


FE now emits NULL_LITERAL entries for nullable LIST partitions (OlapScanNode.addListBoundaries()), but this parser assumes every boundary literal can be converted into a concrete CppType. For a NULL_LITERAL, VLiteral builds a ColumnNullable; get_data_at(0) returns {nullptr, 0}, so the non-string branch dereferences a null pointer and the string branch collapses NULL to "". That makes nullable LIST partitions unsafe here. We need to propagate NULL through ColumnValueRange::contain_null() (or skip NULL boundaries for now) instead of treating it as an ordinary fixed value.

…time_filter_partition_prune ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: The new internal-table runtime filter partition pruning feature on this branch was always emitting partition boundaries (OlapScanNode) and per-target monotonicity (RuntimeFilter) into the thrift plan, regardless of the existing session variable `enable_runtime_filter_partition_prune`. This made the switch ineffective for turning the new feature off. Gate both FE-side hooks on the session variable. When the switch is off, the BE receives no partition_boundaries and no plan_id_to_target_monotonicity, so the BE-side pruner stays empty and all pruning code paths short-circuit naturally. No BE change required. ### Release note None ### Check List (For Author) - Test: No need to test (gating-only change; behavior covered by existing rf_partition_pruning regression suite which already toggles the variable) - Behavior changed: Yes — `enable_runtime_filter_partition_prune=false` now actually disables the new internal-table feature. - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… RANGE and nullable LIST partitions ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: 1. Multi-column RANGE: Doris RANGE partition membership is lexicographic PartitionKey comparison across all partition columns, not an independent per-column range. Projecting a multi-column range like [(1,1),(1,5)) onto its first column yields the empty interval [1,1), which the BE pruner would then use to wrongly prune the partition even though k1=1 rows still match. Until we can serialize a provably correct projection, only emit RANGE boundaries when there is exactly one partition column. 2. Nullable LIST partitions: addListBoundaries() emitted NULL_LITERAL into list_values, but the BE parser builds a typed CppType from each literal via VLiteral::get_column_ptr()->get_data_at(0). For a NULL literal that returns {nullptr, 0}, so the non-string branch dereferences a null pointer and the string branch silently collapses NULL into the empty string. Skip the whole column boundary for any partition whose values for that column include NULL — leaving the partition unpruned is always safe; emitting an incorrect boundary is not. ### Release note None ### Check List (For Author) - Test: No need to test (correctness fix; existing rf_partition_pruning regression suite covers single-column RANGE and non-null LIST cases that remain enabled) - Behavior changed: Yes — multi-column RANGE and nullable LIST partitions are no longer pruned by RF (previously incorrect). - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…e NULL in LIST runtime filter partition pruning ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: Two correctness issues in the runtime filter partition pruning path: 1. Multi-column RANGE partitions previously projected the partition's first column as the half-open interval [L1, U1), which is unsafe. For example, the partition [(1,1), (1,5)) only contains rows whose first key equals 1, so projecting to [1, 1) would mark the range as empty and incorrectly prune the partition. The mathematically safe projection is the closed interval [L1, U1]. A new optional thrift field 'range_end_inclusive' on TPartitionBoundary is introduced so the FE can tell BE to treat range_end as inclusive for multi-column RANGE; single-column RANGE keeps the original half-open semantics. 2. LIST partitions can legally contain NULL. The previous code sent a string literal for NULL, which caused the BE parser to read from a null pointer (UB / crash) and silently treat it as empty string. The FE now emits a real NULL_LITERAL TExprNode, and the BE parser recognises it, marks the column-value-range as 'contain_null', and the partition pruner conservatively skips any partition whose boundary contains NULL until the pruner is taught to reason about null_aware runtime filters. This is always safe (never prunes a partition that could match). ### Release note None ### Check List (For Author) - Test: - Manual test (BE single-file compile verified; full regression pending) - Behavior changed: No (gated by enable_runtime_filter_partition_prune which defaults to true but the new logic is strictly more correct than before) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tition pruner ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: Previously the BE pruner conservatively skipped any partition whose boundary contained NULL (set via FE-emitted NULL_LITERAL on LIST keys). That was correct but pessimistic: a NULL row in a partition can match the runtime filter only when the RF itself accepts NULL, which is already encoded as HybridSet::contain_null() (FilterBase folds in the filter's null_aware bit). This change uses that signal to enable pruning in two more cases: 1. "Only-NULL" LIST partitions (e.g. PARTITION pNull VALUES IN ((NULL))): tracked by a new ParsedBoundary::only_null flag. Pruned when the RF does not accept NULL (typical EQ / MinMax / IN-without- null), kept otherwise. 2. Mixed LIST partitions (NULL plus concrete values): if the RF does not accept NULL, the NULL rows can never match, so we ignore the contain_null bit and let the regular value intersection decide prunability. Still kept if the RF accepts NULL. For BINARY_PRED runtime filters (=, <, >, <=, >=) NULL never compares true, so they are treated as not accepting NULL. ### Release note None ### Check List (For Author) - Test: - Manual test (BE single-file compile verified) - Behavior changed: No (gated by enable_runtime_filter_partition_prune; strictly more partitions can be pruned, never fewer, vs. the previous conservative behaviour) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BiteTheDDDDt · 2026-04-20T03:14:21Z

/review

BiteTheDDDDt · 2026-04-20T03:29:18Z

run buildall

github-actions

This PR is much closer, but one blocking correctness issue remains on the current head.

Blocking issue:

be/src/exec/runtime_filter/runtime_filter_partition_pruner.cpp: mixed NULL+value LIST partitions are encoded as NULL-only. parse_boundaries() first intersects the concrete list values into cvr, but ColumnValueRange::set_contain_null(true) then calls set_empty_value_range() and drops those fixed values. For a partition like {NULL, 5}, the BE stores only the NULL marker, so a later RF on 5 can incorrectly prune the partition.

Critical checkpoint conclusions:

Goal: add FE->BE runtime-filter partition pruning for OLAP scans. The current head now wires the main flow correctly and the earlier config-gating / multi-column RANGE issues look addressed, but mixed-null LIST pruning is still incorrect.
Scope/focus: still reasonably focused; the remaining bug is localized to BE LIST-boundary parsing.
Concurrency: reviewed the pruner integration again; _conjuncts_lock serializes pruning, _prune_mutex protects shared state, and I did not find a new lock-order or lifecycle issue.
Lifecycle/static initialization: no new lifecycle or static-init problem found.
Configuration: enable_runtime_filter_partition_prune is now gated correctly on the FE side.
Compatibility: the new thrift fields remain optional; no new rolling-upgrade issue stood out.
Parallel code paths: the OLAP path now matches the session-variable behavior of the other scan paths.
Special conditions: the mixed-null LIST case needs a dedicated representation because set_contain_null(true) is only safe for NULL-only predicates, not NULL plus fixed values.
Tests: the regression suite is broad for single-column happy paths, but it still lacks a LIST partition case containing both NULL and concrete values; that missing negative case is what let this bug through.
Test results: the new .out file looks consistent with the cases the suite actually covers.
Observability: the added counters are sufficient for diagnosis.
Transaction/persistence/data-write checkpoints: not applicable here.
FE-BE variable passing: no additional wiring gap found on the current head.
Performance: no separate blocking performance issue found on the current head.
Other issues: none beyond the blocker above.

Please fix the mixed NULL LIST handling and add coverage for a LIST partition that contains both NULL and concrete values.

github-actions · 2026-04-20T03:32:45Z

+                cvr.intersection(empty_cvr);                                                   \
+            }                                                                                  \
+            if (list_has_null && is_nullable) {                                                \
+                cvr.set_contain_null(true);                                                    \


When a LIST boundary contains both NULL and concrete values, cvr.intersection(empty_cvr) first materializes the concrete fixed-value set, but set_contain_null(true) immediately calls set_empty_value_range() and discards those fixed values. That turns a partition like {NULL, 5} into a NULL-only boundary, so _try_prune_by_single_rf() can later prune it even when the runtime filter contains 5. Please preserve both the concrete values and the NULL marker here instead of resetting the CVR.

…ns in RF pruner ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: When a LIST boundary contained both NULL and concrete values (e.g. PARTITION p VALUES IN ((NULL),(5))), the parser called cvr.intersection(empty_cvr) to materialize the concrete fixed-value set and then cvr.set_contain_null(true). The latter unconditionally calls set_empty_value_range() (see ColumnValueRange::set_contain_null), which discards the fixed values that were just inserted. The boundary collapsed to a NULL-only CVR, so a non-null-aware RF that does NOT contain 5 would still be considered "empty intersection" and the partition would be wrongly pruned even though row 5 actually matches the RF. Track NULL membership separately on ParsedBoundary (`contains_null`) and let the CVR carry only the concrete fixed values; the pruner already consults the new flag instead of cvr.contain_null(). ### Release note None ### Check List (For Author) - Test: - Manual test (BE single-file compile verified) - Behavior changed: No (RF partition pruning is gated by enable_runtime_filter_partition_prune; this fixes a correctness bug introduced in the previous commit on the same path) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… more types in RF partition pruning suite ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: Extends regression-test/suites/query_p0/runtime_filter/rf_partition_pruning.groovy with additional cases that exercise the recently-added correctness fixes in the runtime filter partition pruner: - NULL handling on LIST partitions - only-NULL partition pruned when RF does not accept NULL - mixed (NULL + concrete value) partition: concrete value must survive (regression test for the set_contain_null destructive overwrite bug) - non-matching RF on mixed partition still prunes (NULL row alone does not save the partition) - null-safe <=> join on mixed partition (smoke test for null_aware RF semantics) - Multi-column RANGE projection - partition [(1,1),(1,5)) keeps rows when RF probes value 1, validating the closed [L1, U1] projection of the first partition column - Type coverage for RANGE partitions: TINYINT, SMALLINT, LARGEINT, DECIMAL(10,2), DATE, DATETIME, VARCHAR - Switch coverage: enable_runtime_filter_partition_prune=false should disable the new path completely (no boundaries reported, no pruning) ### Release note None ### Check List (For Author) - Test: - Regression test (added; .out to be regenerated by running the suite) - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BiteTheDDDDt · 2026-04-20T04:04:18Z

run buildall

BiteTheDDDDt · 2026-04-20T04:04:24Z

/review

github-actions

Blocking issue:

be/src/exec/runtime_filter/runtime_filter_partition_pruner.cpp: NULL-aware runtime filters are still treated as !rf_contains_null unless they are direct-IN filters. get_set_func() is null for bloom filters and for null-aware min/max predicates, so pb.only_null partitions are pruned even though <=> semantics should keep NULL probe rows. This can return wrong results for LIST partitions like VALUES IN ((NULL)) when the runtime filter is realized as BLOOM or NULL_AWARE_BINARY_PRED.

Critical checkpoint conclusions:

Goal of task: add BE-side runtime-filter partition pruning for OLAP scans. The main path is wired and the regression suite covers many single-column/non-null cases, but the current head is still incorrect for NULL-only LIST partitions under null-aware non-IN runtime filters, so the goal is not fully met.
Modification size/focus: The PR stays reasonably focused on planner metadata, BE pruning integration, and regression coverage.
Concurrency: I reviewed the new state flow and lock usage. _conjuncts_lock serializes pruning updates and _prune_mutex protects shared pruned-partition state; I did not find a blocking deadlock or lock-order issue.
Lifecycle/static initialization: No special lifecycle or static-initialization issue found.
Configuration items: The FE-side enable_runtime_filter_partition_prune gating now looks correct on the current head.
Compatibility/incompatible changes: The new thrift fields are optional, so wire compatibility looks acceptable.
Functionally parallel code paths: The OLAP path is wired, and I did not find another remaining FE/BE path mismatch on the current head.
Special conditional checks: The NULL-handling shortcut in the pruner is still incorrect for null-aware non-IN filters.
Test coverage: The suite is broad for single-column and non-null cases, but it does not force the null-safe <=> case down a real BLOOM or null-aware MIN/MAX path, so the remaining bug is uncovered.
Test result changes: The new .out file looks consistent with the queries the suite actually runs.
Observability: The added partition-pruning counters are useful and sufficient.
Transaction/persistence-related modifications: Not applicable.
Data writes/modifications: Not applicable.
FE-BE variable passing: partition_boundaries and null_aware are passed through, but BE does not yet interpret NULL-matching semantics correctly for all RF implementations.
Performance: No separate blocking performance issue found on the current head.
Other issues: None beyond the blocker above.

github-actions · 2026-04-20T04:46:44Z

+    // contain_null() (see FilterBase::contain_null = _null_aware && _contain_null).
+    // For BINARY_PRED predicates (=, <, >, <=, >=) NULL never compares true,
+    // so they never match a NULL probe row -- treat as not containing NULL.
+    auto hybrid_set_for_null = impl->get_set_func();


rf_contains_null is only derived from get_set_func(), which means it works for direct-IN filters but stays false for the other null-aware RF implementations. In particular, BLOOM_PRED carries its NULL bit on get_bloom_filter_func()->contain_null(), and null-aware min/max filters arrive as NULL_AWARE_BINARY_PRED rather than BINARY_PRED. With the current logic, a LIST partition like VALUES IN ((NULL)) is pruned by the pb.only_null branch for a null-safe <=> join as soon as the RF is realized as bloom or null-aware min/max, even though NULL probe rows should match and the partition must be kept.

…artition pruner ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: The previous version determined whether a runtime filter accepts a NULL probe row only via `impl->get_set_func()->contain_null()`. That covers the IN flavour, but real RFs ship in two more shapes: * BLOOM_PRED: NULL bit lives on get_bloom_filter_func()->contain_null() * NULL_AWARE_BINARY_PRED: a Min/Max RF built for null-safe equal joins; the NULL semantic is encoded in the node type itself (see runtime_filter/utils.cpp::create_vbin_predicate). For these shapes, `rf_contains_null` was always false, so a LIST partition like `VALUES IN ((NULL))` would be incorrectly pruned by the `pb.only_null` branch even when a null-safe `<=>` join with a NULL on the build side should keep the partition. Detect both shapes so the NULL-aware semantics are honoured. Also extend the value-pruning branch to accept NULL_AWARE_BINARY_PRED so we still benefit from concrete-bound min/max pruning on partitions that do not contain NULL. ### Release note None ### Check List (For Author) - Test: - Manual test (BE single-file compile verified). Existing regression suite covers null-safe join smoke path; pruning counts for the bloom path will surface in profile-based assertions when run. - Behavior changed: No (gated by enable_runtime_filter_partition_prune; the change is strictly more conservative against false pruning) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

### What problem does this PR solve? Issue Number: close #xxx Problem Summary: The suite asserts on FE query-profile counters (TotalPartitionsForRFPruning / PartitionsPrunedByRuntimeFilter). Even though each query embeds a unique UUID token used to locate its own profile, the FE profile list is bounded (max_query_profile_num) and shared across sessions. Heavy parallel regression traffic can evict our profile before the poller (15 attempts, 500ms apart) finds it, causing flaky failures with total=0/pruned=0. Move the suite into the nonConcurrent group so profile-counter assertions are deterministic. ### Release note None ### Check List (For Author) - Test: - No need to test (regression metadata only) - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BiteTheDDDDt · 2026-04-20T06:24:10Z

/review

BiteTheDDDDt · 2026-04-20T06:24:15Z

run buildall

github-actions

Findings:

regression-test/data/query_p0/runtime_filter/rf_partition_pruning.out is missing the five new golden sections added by the suite, so the new regression case does not pass as submitted.
be/src/exec/runtime_filter/runtime_filter_partition_pruner.cpp does not handle BLOOM_PRED, so internal-table partition pruning becomes a no-op once an IN_OR_BLOOM_FILTER consumer materializes a real Bloom filter, or when runtime_filter_type='BLOOM_FILTER' is used directly.
regression-test/suites/query_p0/runtime_filter/rf_partition_pruning.groovy still treats profile not found within 7.5s as zero counters, which leaves the suite timing-sensitive on slow FE/CI runs.

Critical checkpoint conclusions:

Goal of the task: partially met. The PR adds OLAP-side runtime-filter partition pruning and valuable NULL/multi-column RANGE fixes, but the regression artifact is incomplete and one RF shape (Bloom) is still unhandled.
Scope/focus: mostly focused on planner/scan/pruner/test changes; no blocking scope creep beyond the new optional thrift fields.
Concurrency: reviewed. Late-arrival RF updates are serialized by _conjuncts_lock, and the shared-mutex use around pruned-partition lookups is acceptable; no blocking race found.
Lifecycle/static init: reviewed. Scanner/pruner lifetimes and literal-column ownership look intentional; no blocking lifecycle issue found.
Config changes: no new config was added. Existing enable_runtime_filter_partition_prune gating is correctly honored on the FE side.
Compatibility: optional thrift additions look backward-compatible; no blocking rolling-upgrade issue found.
Parallel/related code paths: not fully covered. The external/file-scan partition-pruning path can evaluate actual RF conjuncts, but the new OLAP pruner still skips the Bloom RF shape.
Special condition checks: the new NULL and multi-column RANGE guards are well-motivated and conservative.
Test coverage: insufficient as submitted. The new ordered-query cases are not all reflected in the golden file, and the profile polling helper remains timing-sensitive.
Modified test results: incorrect/incomplete because the .out file is missing new sections.
Observability: the new partition counters are useful; no blocking observability issue beyond the flaky profile lookup in tests.
Transaction/persistence/data-write impacts: not applicable here; no blocking issue found.
Performance: Bloom RF partition pruning currently provides no benefit in the OLAP path, which leaves a real runtime-filter shape behind.

Requesting changes until the missing regression outputs are added and the remaining unsupported/flaky paths are addressed.

github-actions · 2026-04-20T06:53:33Z

+                    using CvrType = std::decay_t<decltype(boundary_cvr)>;
+                    using CppType = typename CvrType::CppType;
+
+                    auto hybrid_set = impl->get_set_func();


VBloomPredicate falls through this pruning logic today. In the real Bloom case (runtime_filter_type='BLOOM_FILTER', or IN_OR_BLOOM_FILTER after the wrapper switches past runtime_filter_max_in_num), impl->node_type()==BLOOM_PRED, get_set_func()==nullptr, and we never enter the MinMax branch either. The result is that the new OLAP partition-pruning path becomes a no-op for Bloom RFs, so LIST-partition scans lose pruning entirely once the filter stops being an IN set.

github-actions · 2026-04-20T06:53:33Z

+    // ---- Profile utilities ----
+    def profileAction = new ProfileAction(context)
+
+    def getProfileByToken = { String token ->


This helper still turns profile was not visible yet into total=0/pruned=0: if the profile list does not contain the token within 15 * 500ms, getProfileByToken() returns "" and extractCounterSum() quietly reads that as zero. On a slow or loaded FE that makes the suite fail even though pruning worked. Please fail explicitly when the profile is still missing, or keep polling until it appears / the query is known to be gone.

github-actions · 2026-04-20T06:53:33Z

+-- !span_minmax --
+13	250	m
+3	50	c
+


The suite now defines five more ordered result checks, but this golden file still stops at span_minmax. The missing sections are list_only_null_pruned, list_only_null_minmax, list_mixed_value_kept, list_mixed_nullsafe, and range_multi_first_col, so run-regression-test.sh -d query_p0 -s rf_partition_pruning will fail at the first new order_qt_* case.

hello-stephen · 2026-04-20T07:00:31Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	78.02% (1842/2361)
Line Coverage	64.60% (32919/50958)
Region Coverage	65.22% (16326/25034)
Branch Coverage	55.76% (8707/15616)

hello-stephen · 2026-04-20T07:47:40Z

FE UT Coverage Report

Increment line coverage 22.33% (23/103) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-20T08:00:05Z

BE UT Coverage Report

Increment line coverage 1.42% (5/353) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.19% (20328/38216)
Line Coverage	36.74% (191335/520768)
Region Coverage	33.02% (148832/450786)
Branch Coverage	33.99% (65012/191278)

hello-stephen · 2026-04-20T09:36:08Z

BE Regression && UT Coverage Report

Increment line coverage 81.59% (288/353) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.81% (27625/37429)
Line Coverage	57.53% (298704/519189)
Region Coverage	54.62% (248515/454969)
Branch Coverage	56.06% (107555/191868)

hello-stephen · 2026-04-20T09:48:09Z

FE Regression Coverage Report

Increment line coverage 57.89% (77/133) 🎉
Increment coverage report
Complete coverage report

BiteTheDDDDt and others added 2 commits April 17, 2026 11:36

support runtime partition prune

483d2e7

Copilot AI review requested due to automatic review settings April 17, 2026 09:29

Copilot started reviewing on behalf of BiteTheDDDDt April 17, 2026 09:30 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

update

141c78d

github-actions bot requested changes Apr 20, 2026

View reviewed changes

BiteTheDDDDt and others added 4 commits April 20, 2026 10:19

github-actions bot requested changes Apr 20, 2026

View reviewed changes

BiteTheDDDDt and others added 2 commits April 20, 2026 11:41

github-actions bot requested changes Apr 20, 2026

View reviewed changes

		// When the target expression is a plain SlotRef (identity), FE may omit this
		// field; BE treats an absent value as NON_MONOTONIC (conservative).

		jdbcUrl = "jdbc:mysql://127.0.0.1:9333/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"
		targetJdbcUrl = "jdbc:mysql://127.0.0.1:9333/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"

		@@ -91,6 +95,9 @@ Status OlapScanLocalState::init(RuntimeState* state, LocalStateInfo& info) {

		RETURN_IF_ERROR(Base::init(state, info));
		RETURN_IF_ERROR(_sync_cloud_tablets(state));

Conversation

BiteTheDDDDt commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hello-stephen commented Apr 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

BiteTheDDDDt commented Apr 17, 2026

Uh oh!

BiteTheDDDDt commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

hello-stephen commented Apr 17, 2026

Cloud UT Coverage Report

Uh oh!

BiteTheDDDDt commented Apr 17, 2026

Uh oh!

hello-stephen commented Apr 17, 2026

BE Regression && UT Coverage Report

Uh oh!

hello-stephen commented Apr 17, 2026

FE Regression Coverage Report

Uh oh!

github-actions bot commented Apr 17, 2026

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

BiteTheDDDDt commented Apr 20, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

BiteTheDDDDt commented Apr 17, 2026 •

edited

Loading