
feat(parquet): make PushBuffers boundary-agnostic for prefetch IO#9697

Open
HippoBaro wants to merge 1 commit into apache:main from HippoBaro:boundary_agnostic_pushbuffers

Conversation

@HippoBaro
Contributor

Which issue does this PR close?

Rationale for this change

The PushDecoder (introduced in #7997, #8080) is designed to decouple IO and CPU. It holds non-contiguous byte ranges, with a NeedsData/push_range protocol. However, it requires each logical read to be satisfied in full by a single physical buffer: has_range, get_bytes, and Read::read all search for one buffer that entirely covers the requested range.

This assumption conflates two orthogonal IO strategies:

  • Coalescing: the IO layer merges adjacent requested ranges into fewer, larger fetches.
  • Prefetching: the IO layer pushes data ahead of what the decoder has requested. This is an inversion of control: the IO layer speculatively fills buffers at offsets not yet requested and for arbitrary buffer sizes.

These two strategies interact poorly with the current release mechanism (clear_ranges), which matches buffers by exact range equality:

  • Coalescing is both rewarded and punished. It is load-bearing because without it, the number of physical buffers scales with the number of requested ranges, and clear_ranges performs an O(N×M) scan to remove consumed ranges, producing quadratic overhead on wide schemas. But it is also punished because a coalesced buffer never exactly matches any individual requested range, so clear_ranges silently skips it: the buffer leaks in PushBuffers until the decoder finishes or the caller manually calls release_all_ranges (#9624). This increases peak RSS proportionally to the amount of data coalesced ahead of the current row group.

  • Prefetching is structurally impossible: speculatively pushed buffers will straddle future read boundaries, so the decoder cannot consume them, and clear_ranges cannot release them.

What changes are included in this PR?

This commit makes PushBuffers boundary-agnostic, completing the prefetching story, and changes the internals to scale with buffer count instead of range count:

  • Buffer stitching: has_range, get_bytes, and Read::read resolve logical ranges across multiple contiguous physical buffers via binary search, so the IO layer is free to push arbitrarily-sized parts without knowing future read boundaries. This is a nice improvement because some IO layers can be made much more efficient using uniform buffers and vectorized reads.

  • Incremental release (release_through): replaces clear_ranges with a watermark-based release that drops all buffers below a byte offset, trimming straddling buffers via zero-copy Bytes::slice. The decoder calls this automatically at row-group boundaries.
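For illustration, the stitching lookup described above can be sketched as follows. This is a simplified stand-in, not the PR's actual code: `Vec<u8>` replaces `Bytes` (so this copies, where the real implementation can return zero-copy slices), and the function name and signature are hypothetical.

```rust
use std::ops::Range;

// Hypothetical sketch of "buffer stitching": resolve a logical byte range
// across multiple contiguous physical buffers. `buffers` must be sorted by
// range start; `Vec<u8>` stands in for `Bytes`.
fn get_bytes(buffers: &[(Range<u64>, Vec<u8>)], want: Range<u64>) -> Option<Vec<u8>> {
    // Binary search: index of the last buffer whose range starts at or
    // before `want.start`. If none exists, the range cannot be covered.
    let idx = buffers.partition_point(|(r, _)| r.start <= want.start).checked_sub(1)?;
    let mut out = Vec::with_capacity((want.end - want.start) as usize);
    let mut pos = want.start;
    for (range, data) in &buffers[idx..] {
        if pos >= want.end {
            break;
        }
        if range.start > pos {
            return None; // gap between buffers: range not fully covered
        }
        if range.end <= pos {
            continue; // buffer ends before the current position
        }
        let lo = (pos - range.start) as usize;
        let hi = (want.end.min(range.end) - range.start) as usize;
        out.extend_from_slice(&data[lo..hi]);
        pos = range.end.min(want.end);
    }
    (pos >= want.end).then_some(out)
}
```

The key property is that the IO layer never needs to know where the decoder's reads will fall: any tiling of the byte range by contiguous buffers satisfies the lookup.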

Are these changes tested?

Significant coverage added, all tests passing. Benchmark results (vs baseline in #9696):

  push_decoder/1buf/1000ranges       321.9 µs   (was 323.5 µs,  −1%)
  push_decoder/1buf/10000ranges       3.26 ms   (was  3.25 ms,  +0%)
  push_decoder/1buf/100000ranges      34.9 ms   (was  34.6 ms,  +1%)
  push_decoder/1buf/500000ranges     192.2 ms   (was 185.3 ms,  +4%)
  push_decoder/Nbuf/1000ranges       363.9 µs   (was 437.2 µs, −17%)
  push_decoder/Nbuf/10000ranges       3.82 ms   (was  10.7 ms, −64%)
  push_decoder/Nbuf/100000ranges      42.1 ms   (was 711.6 ms, −94%)

Are there any user-facing changes?

PushBuffers::clear_all_ranges is marked as deprecated in favor of the new PushBuffers::release_all.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 13, 2026
@HippoBaro HippoBaro force-pushed the boundary_agnostic_pushbuffers branch from 5eab5b9 to 5d60935 Compare April 13, 2026 07:11
@HippoBaro
Contributor Author

@alamb This PR changes code you previously wrote, and I’d value your take on the direction.

Beyond fixing the quadratic complexity issue, it also pushes PushBuffers—pun intended—further along the path you laid out. More broadly, better support for I/O coalescing and speculative prefetching would be very valuable for us, and this feels like a step in that direction.

@alamb
Contributor

alamb commented Apr 13, 2026

I will review it -- I am currently working on reviewing

Thanks @HippoBaro

@HippoBaro HippoBaro force-pushed the boundary_agnostic_pushbuffers branch 2 times, most recently from 504035a to 2f891e7 Compare April 13, 2026 21:16
@HippoBaro
Contributor Author

Rebased onto main to resolve conflicts and pick up the new benchmarks (#9696). We can now run the fancy bot and check whether it shows the expected speedups.

@HippoBaro HippoBaro force-pushed the boundary_agnostic_pushbuffers branch from 2f891e7 to d2ea6c4 Compare April 13, 2026 21:26
Contributor

@etseidl etseidl left a comment


Thanks @HippoBaro. This looks interesting. I still have to do a deep dive on #9653 before diving into this.

/// This buffer does not coalesce (merging adjacent ranges of bytes into a ingle
/// single range). Coalescing at this level would require copying the data but
/// the caller may already have the needed data in a single buffer which would
/// require no copying.
Contributor


Suggested change
/// This buffer does not coalesce (merging adjacent ranges of bytes into a ingle
/// This buffer does not coalesce (merging adjacent ranges of bytes into a single

@nathanb9
Contributor

Nice, @HippoBaro. One finding: this change introduces a regression in reverse row-group scans, which surfaced in a DataFusion test:
https://github.com/apache/datafusion/blob/9dab336ee08eb778e23cd04acdd05f93fd0196f4/datafusion/datasource-parquet/src/opener.rs#L2471-L2477

// Test reverse scan
let opener = make_opener(true);
let stream = opener.open(file.clone()).unwrap().await.unwrap();
let reverse_values = collect_int32_values(stream).await;

// The forward scan should return data in the order written
assert_eq!(forward_values, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);

// With reverse scan, row groups are reversed, so we expect:
// Row group 3 (7,8,9), then row group 2 (4,5,6), then row group 1 (1,2,3)
assert_eq!(reverse_values, vec![7, 8, 9, 4, 5, 6, 1, 2, 3]);

result: panic ... buffers.rs:193:9: has_range(57..110) below watermark 163

Basically, the incremental release watermark invariant conflicts with reverse row-group traversal. From what I understand, incremental release decodes a row group, marks its ending byte offset as the watermark, and then expects all subsequent ranges to stay at or above that offset, so a reverse traversal cannot safely release buffers this way.

Could we potentially support a symmetric reverse version of the same monotonic watermark tracking you've introduced here?
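For context, the watermark mechanics under discussion might look roughly like this. This is a simplified sketch, not the actual PushBuffers code: `Vec<u8>` stands in for `Bytes`, so trimming copies where the real implementation uses zero-copy `Bytes::slice`, and the field and method names are illustrative.

```rust
use std::ops::Range;

// Simplified model of watermark-based release: drop every buffer entirely
// below `offset`, trim a buffer that straddles it, and remember `offset`
// so later queries below the watermark can be rejected (the real code
// panics, as in the reverse-scan failure reported above).
struct PushBuffers {
    ranges: Vec<Range<u64>>,
    buffers: Vec<Vec<u8>>,
    watermark: u64,
}

impl PushBuffers {
    fn release_through(&mut self, offset: u64) {
        let mut i = 0;
        while i < self.ranges.len() {
            if self.ranges[i].end <= offset {
                // Fully below the watermark: release the buffer.
                self.ranges.remove(i);
                self.buffers.remove(i);
            } else {
                if self.ranges[i].start < offset {
                    // Straddles the watermark: trim the consumed prefix.
                    let cut = (offset - self.ranges[i].start) as usize;
                    self.buffers[i].drain(..cut);
                    self.ranges[i].start = offset;
                }
                i += 1;
            }
        }
        self.watermark = self.watermark.max(offset);
    }
}
```

A forward scan ratchets the watermark up at each row-group boundary; a reverse scan asks for offsets below it, which is exactly the invariant violation in the panic above.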


/// Use [`Self::release_all`] instead.
#[deprecated(since = "58.1.0", note = "Use `release_all` instead")]
pub fn clear_all_ranges(&mut self) {

This comment was marked as off-topic.

Contributor Author


Honestly, we can keep the symbol—it’s just a rename. I’m trying not to expose the concept of “ranges” on the release side, since the watermark mechanism supersedes it.

///
/// Because IO completions are expected to generally arrive in-order,
/// `push_range` appends without sorting. We instead delay sorting until
/// conumption to amortize its cost, if necessary.
Contributor


Suggested change
/// conumption to amortize its cost, if necessary.
/// consumption to amortize its cost, if necessary.

@alamb
Contributor

alamb commented Apr 14, 2026

run benchmark push_decoder

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4244008571-1224-mblm4 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing boundary_agnostic_pushbuffers (d2ea6c4) to 711fac8 (merge-base) diff
BENCH_NAME=push_decoder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench push_decoder
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu): same as above.
Details

group                             boundary_agnostic_pushbuffers          main
-----                             -----------------------------          ----
push_decoder/1buf/100000ranges    1.05     53.0±0.48ms        ? ?/sec    1.00     50.7±0.64ms        ? ?/sec
push_decoder/1buf/10000ranges     1.02      4.8±0.04ms        ? ?/sec    1.00      4.7±0.05ms        ? ?/sec
push_decoder/1buf/1000ranges      1.01    476.2±2.55µs        ? ?/sec    1.00    469.4±2.66µs        ? ?/sec
push_decoder/1buf/500000ranges    1.00    282.2±2.24ms        ? ?/sec    1.06   298.0±43.35ms        ? ?/sec
push_decoder/Nbuf/100000ranges    1.00     57.5±0.40ms        ? ?/sec    14.52  834.3±76.53ms        ? ?/sec
push_decoder/Nbuf/10000ranges     1.00      5.5±0.13ms        ? ?/sec    2.25     12.4±0.03ms        ? ?/sec
push_decoder/Nbuf/1000ranges      1.00    530.3±5.49µs        ? ?/sec    1.10    582.9±2.43µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 183.4s
Peak memory 5.8 GiB
Avg memory 5.1 GiB
CPU user 179.1s
CPU sys 4.1s
Peak spill 0 B

branch

Metric Value
Wall time 102.7s
Peak memory 5.7 GiB
Avg memory 4.7 GiB
CPU user 101.4s
CPU sys 1.3s
Peak spill 0 B


@HippoBaro
Contributor Author

@nathanb9 Thank you for the feedback; I'll see what I can do there

@alamb
Contributor

alamb commented Apr 14, 2026

push_decoder/Nbuf/100000ranges 1.00 57.5±0.40ms ? ?/sec 14.52 834.3±76.53ms ? ?/sec

That is certainly nice looking

@alamb
Contributor

alamb commented Apr 14, 2026

run benchmark arrow_reader

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4246036406-1241-wplv5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu): same as above.

Comparing boundary_agnostic_pushbuffers (d2ea6c4) to 711fac8 (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete



Contributor

@alamb alamb left a comment


Thank you @HippoBaro and @nathanb9 -- this is a great start.

In general I have two major concerns:

  1. The assumption of range usage in the parquet decoder
  2. The manual management of the sorted flag / parallel buffers

Let me know what you think

/// Clear any staged ranges currently buffered for future decode work.
pub fn clear_all_ranges(&mut self) {
self.buffers.clear_all_ranges();
/// Release all staged ranges currently buffered for future decode work.
Contributor


I think the clear_all_ranges API was introduced in #9673 by @nathanb9 (and since we haven't released that yet) this change isn't a breaking API

// All data for this row group has been extracted into the
// InMemoryRowGroup. Release physical buffers up to the end
// of this row group so streaming IO can reclaim memory.
self.buffers
Contributor


You can configure the parquet reader to read row groups in some arbitrary order with_row_groups

Also technically there is no reason that row groups have to be written in order (though most writers will do that) -- for example, you could have a file where the bytes for row group 0 are after the bytes for row group 1.

So I think assuming that the reader will never want any bytes prior to the current row group should be reconsidered.

Can we instead perhaps release the data between the start/end of the row group, rather than just a one-sided range?

self.file_offset
}

/// Returns the byte offset just past the last column chunk in this row group.
Contributor


I think practically speaking most parquet files will have the column chunks for one row group written contiguously in the file, but I am not sure the spec requires this. I do think it effectively requires all pages for a column to be in a contiguous range

///
/// Thus, the implementation defers to the caller to coalesce subsequent requests
/// if desired.
/// # No Speculative Prefetching
Contributor


👍

/// The buffers of data that can be used to decode the Parquet file
buffers: Vec<Bytes>,
/// High-water mark set by [`Self::release_through`]. After a release,
/// no push, has_range, or read may target offsets below this value.
Contributor


it is probably good to point out what "may not" means (e.g., does the code panic if it is tried?)

/// binary search (`has_range`, `get_bytes`, `release_through`,
/// `Read::read`). Callers that hold `&mut PushBuffers` should call this
/// once before lending `&PushBuffers` to read-side code.
pub fn ensure_sorted(&mut self) {
Contributor


does it need to be pub?

return;
}

// Insertion sort: zero-allocation and linear on nearly-sorted input
Contributor


this is O(n²) on reverse-sorted input though, right?

watermark: u64,
/// Whether `ranges`/`buffers` are sorted by range start.
/// Set to `false` on every `push_range`, restored lazily before reads.
sorted: bool,
Contributor


What if we encoded the sort invariant in the type system rather than relying on the flag to be set correctly? Something like

enum Buffers {
  Sorted {
    ranges: Vec<Range<u64>>,
    buffers: Vec<Bytes>,
  }
  UnSorted {
    ranges: Vec<Range<u64>>,
    buffers: Vec<Bytes>,
  }
}

Maybe it is overly complicated, but it makes it much clearer that all paths correctly update the sorting

@@ -48,6 +59,13 @@ pub(crate) struct PushBuffers {
ranges: Vec<Range<u64>>,
Contributor


If the goal is to keep a list sorted by start range, did you consider using a BTreeSet? You could then define some sort of wrapper over Range/Bytes like

struct RangeAndData {
  range: Range<u64>,
  buffer: Bytes
}
impl PartialOrd for RangeAndData {
  // define comparison from start range
}

pub(crate) struct PushBuffers {
...
    buffers: BTreeSet<RangeAndData>,
}

That would probably simplify the accounting significantly
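A runnable version of this suggestion might look like the following. Assumptions: `Vec<u8>` stands in for `Bytes`, and both equality and ordering are keyed on range start so the `Ord`/`Eq` contract stays consistent (a `BTreeSet` treats two elements with equal keys as the same element).

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::ops::Range;

// Buffers keyed by range start: the BTreeSet keeps them sorted on every
// insert, so no separate `sorted` flag or deferred sort pass is needed.
struct RangeAndData {
    range: Range<u64>,
    buffer: Vec<u8>, // stand-in for Bytes
}

// Equality and ordering are both defined by range start, keeping the
// Eq/Ord contract consistent (equal keys compare as Ordering::Equal).
impl PartialEq for RangeAndData {
    fn eq(&self, other: &Self) -> bool {
        self.range.start == other.range.start
    }
}
impl Eq for RangeAndData {}

impl PartialOrd for RangeAndData {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for RangeAndData {
    fn cmp(&self, other: &Self) -> Ordering {
        self.range.start.cmp(&other.range.start)
    }
}

// Iterating a BTreeSet always yields elements in ascending key order.
fn sorted_starts(set: &BTreeSet<RangeAndData>) -> Vec<u64> {
    set.iter().map(|b| b.range.start).collect()
}
```

One trade-off versus the sorted-Vec approach: a BTreeSet pays its ordering cost on every insert, while the lazy-sort design amortizes it across the mostly-in-order pushes the PR expects.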



Development

Successfully merging this pull request may close these issues.

PushBuffers::clear_ranges is quadratic and leaks

5 participants