Skip to content

perf+feat(decoding): block-subset partial decode (range + recovery) + per-block decompressed byte ranges #175

@polaz

Description

@polaz

⚠️ Deferred — post-Phase 6 drop-in parity (2026-05-19)

Project priority sequence per #28: complete encoder rewrite (#111 incl. #23) → speed/ratio optimizations (#178, #180) → params API (#27) → magicless (#26) → Phase 6 C-ABI / CLI drop-in (#126/#127/#128/#130/#131/#132) → THEN this track. lsm-tree bilateral coordination accepted 2026-05-18 — commitment preserved, but execution defers until drop-in parity ships. Pre-Phase-6 work on this issue will not be scheduled.


⚠️ Feature gate (mandatory): all Rust code added by this issue is compiled only when the lsm Cargo feature is enabled (#[cfg(feature = "lsm")] on every new public item — module, struct, enum variant, impl block, function). Feature is default off, opt-in for downstream consumers. Without lsm: build is byte-identical to today, no new public symbols, cdylib from Phase 6 stays strict drop-in for donor libzstd v1.5.7. C FFI surface is unaffected regardless of feature state.


Status

Confirmed scope (2026-06-06). Originally framed as a recovery-only spike ("decode as much as possible past corruption"). Re-scoped after checking the downstream consumer (lsm-tree #257) and the live ECC path:

So this lands as a block-subset primitive decode_blocks_partial(src, start_block, end_block) that serves performance first, with best-effort recovery as a general-purpose secondary mode of the same call (useful to non-encrypted callers; lsm-tree's corruption recovery lives one layer down on Page ECC). No longer a "may close without landing" spike: the perf path is a concrete, measurable deliverable.

Context

FrameDecoder::decode_blocks always decodes a frame from its first block and treats any block-decode failure as terminal. Two distinct consumer needs are unmet:

  1. Performance (primary): a range query needs only a subset of keys from an SST block. Mapping that key range to a decompressed byte range and then to inner-zstd-block indices, the caller wants to decode only the covering blocks, skip the rest, and avoid materialising / scanning the non-target output.
  2. Recovery (secondary): when ECC has exhausted its parity budget on a block and a zstd block is still unrecoverable, a caller may still want "give me whatever decoded successfully before the corruption" rather than discarding the whole frame.

A single block-subset partial decode covers both: a half-open [start_block, end_block) range for the perf path, and graceful stop-with-preserved-prefix for the recovery path.

Proposed scope

Decode side — decode_blocks_partial

#[cfg(feature = "lsm")]
impl FrameDecoder {
    /// Decode the inner blocks `[start_block, end_block)` of the current
    /// frame. Blocks before `start_block` are decoded into the window
    /// (required for match resolution — zstd blocks share one window, so
    /// leading blocks generally cannot be skipped) but their output is not
    /// returned; blocks at/after `end_block` are not decoded at all (the
    /// trailing-block perf win). Returns the decompressed bytes of the
    /// in-range blocks as one contiguous owned buffer.
    ///
    /// Doubles as best-effort recovery: if a block decode fails, decoding
    /// stops, the clean prefix of in-range output is preserved in `data`,
    /// and the failure is reported via `stopped_at`. `[0, u32::MAX)` decodes
    /// the whole frame, stopping at the first corrupt block (pure recovery).
    pub fn decode_blocks_partial(
        &mut self,
        src: impl Read,
        start_block: u32,
        end_block: u32, // exclusive; u32::MAX = to end of frame
    ) -> Result<PartialDecode, FrameDecoderError>;
}

#[cfg(feature = "lsm")]
pub struct PartialDecode {
    /// Decompressed bytes of the in-range blocks actually decoded
    /// (contiguous; `data.len()` == sum of their decompressed sizes).
    pub data: Vec<u8>,
    /// First block whose output is in `data` (== the requested start_block).
    pub start_block: u32,
    /// Count of in-range blocks successfully decoded into `data`.
    pub blocks_decoded: u32,
    /// `Some((block_index, error))` if decode stopped on a corrupt block
    /// before reaching `end_block`; `None` if the range decoded cleanly
    /// (or the frame's last block was reached first).
    pub stopped_at: Option<(u32, FrameDecoderError)>,
    /// True if the frame's last block was reached during this decode.
    pub frame_finished: bool,
}

Returns an owned Vec<u8> rather than the read()-cursor interface: a range query needs a contiguous buffer to scan keys anyway (the ring backend exposes two slices), and an owned buffer removes the window-retention ambiguity (the leading blocks' window bytes are physically required for in-range match resolution but must not leak into the returned output). A misused range (start_block > end_block) returns Err(FrameDecoderError::InvalidBlockRange { .. }), distinct from a corrupt-frame stop (reported via stopped_at).

Encode side — per-block decompressed byte ranges (consumer prerequisite)

#257 maps a key range → decompressed byte range → (start_block, end_block) and references FrameEmitInfo.blocks[*].decompressed_byte_range "from #173". That field was never specced or shipped in #173: FrameEmitInfo carries only compressed-side coordinates (offset_in_frame, body_size, ...). A Compressed block's decompressed size is not recoverable from its wire header (Block_Size is the compressed length), so the mapping must be captured encode-side. This issue adds it:

#[cfg(feature = "lsm")]
pub struct FrameBlock {
    // ...existing fields...
    /// Decompressed (regenerated) size of this block's output in bytes.
    pub decompressed_size: u32,
}

#[cfg(feature = "lsm")]
impl FrameEmitInfo {
    /// Half-open decompressed byte range `[start, end)` of `blocks[i]`
    /// in the frame's full decompressed output (prefix sum of
    /// `decompressed_size`). `None` if `i` is out of range.
    pub fn decompressed_byte_range(&self, block_index: usize) -> Option<core::ops::Range<u64>>;
}

Populated from the encoder's actual per-block input chunk sizes, threaded through both emit paths (one-shot borrowed + streaming) at the same block-emit sites as the existing per-block XXH64 sidecar (including post-split partitions).

Why this is hard

  • Subset correctness: zstd blocks share one window, so leading blocks [0, start_block) cannot simply be skipped — they must be decoded into the window so the in-range blocks' match offsets resolve, then their output discarded from the front once decoding is complete (match resolution done). The win is on the trailing side (blocks past end_block are never touched) plus not materialising / scanning the leading output.
  • Recovery boundary: the fused sequence executor already rolls the output buffer back to a pre-block checkpoint on a sequence error, so blocks 0..F-1 stay clean; the exact in-range byte count is tracked via per-block length deltas, and any trailing garbage from the failing block is simply left undrained. No FSE/window unwinding is needed because decode stops rather than resumes.

Acceptance criteria

  • Happy-path full decode (no corruption, [0, u32::MAX)) yields bytes identical to decode_blocks for the same frame.
  • Subset [s, e) returns PartialDecode.data byte-identical to the [decompressed_byte_range(s).start .. decompressed_byte_range(e-1).end) slice of the full decode, for every boundary in {1 block, 2 blocks, half, all}.
  • Trailing blocks past end_block are not decoded (assert via a sentinel / instrumented reader that the source past the last in-range block's body is not consumed).
  • Corrupt block at index N within range → stopped_at == Some((N, _)), data holds exactly the decompressed bytes of the in-range blocks start_block..N, blocks_decoded == N - start_block.
  • start_block > end_blockErr(InvalidBlockRange).
  • FrameEmitInfo::decompressed_byte_range(i) matches the actual decoded output offsets for Raw / RLE / Compressed blocks, including post-split frames.
  • No happy-path perf regression on compare_ffi decompress benches (< 1% delta) — the new path is lsm-gated and off the default decode path.
  • All lib tests pass.
  • At least one integration test demonstrating lsm-tree-style range-query partial recovery: encode a multi-block frame, use decompressed_byte_range to pick (start, end), decode the subset, assert it equals the corresponding slice of a full decode.

Related


ADDENDUM (2026-05-18): feature gating

decode_blocks_partial, PartialDecode, the new FrameBlock::decompressed_size field, and FrameEmitInfo::decompressed_byte_range are all gated behind the lsm Cargo feature (default off) — same gate as #171/#172/#173/#174. Default-build cdylib from #126/#127 remains strict drop-in for donor v1.5.7 (donor has no partial-decode primitive, so its absence in the no-feature build is correct).


Bilateral cross-reference

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3-lowLow priority — nice to haveenhancementNew feature or requestperformancePerformance optimization

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions