
[Draft] Add compression ratio calculation and per-column compression stats (#18184)#18185

Draft
johnsolomonj wants to merge 16 commits into apache:master from johnsolomonj:feature/compression-stats-tracking

Conversation

johnsolomonj (Contributor) commented Apr 13, 2026

Labels: feature, release-notes, observability

Summary

Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.

Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:

  • Track uncompressed forward index sizes at write time in all raw column writers (BaseChunkForwardIndexWriter subclasses, VarByteChunkForwardIndexWriterV4/V5/V6, CLPForwardIndexCreatorV2)
  • Persist uncompressed size and compression codec to metadata.properties per column
  • Expose compressionStats, columnCompressionStats, and storageBreakdown on both GET /tables/{table}/size and GET /tables/{table}/metadata
  • Add TABLE_COMPRESSION_RATIO_PERCENT and TABLE_TIERED_STORAGE_SIZE controller gauges with tier lifecycle management
  • Gated by table-level indexingConfig.compressionStatsEnabled flag (default: off, zero overhead when disabled)
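As an illustration of the new fields, a response from either endpoint might look roughly like the sketch below. Field names are taken from this PR's description and commit messages; the exact nesting and all values are hypothetical pending design review:

```json
{
  "compressionStats": {
    "rawForwardIndexSizePerReplicaInBytes": 1073741824,
    "compressedForwardIndexSizePerReplicaInBytes": 268435456,
    "compressionRatio": 4.0,
    "segmentsWithStats": 42,
    "totalSegments": 42,
    "isPartialCoverage": false
  },
  "columnCompressionStats": [
    {
      "column": "logLine",
      "uncompressedSizeInBytes": 536870912,
      "compressedSizeInBytes": 134217728,
      "compressionRatio": 4.0,
      "codec": "ZSTANDARD",
      "hasDictionary": false
    }
  ],
  "storageBreakdown": {
    "default": {"segmentCount": 40, "sizeInBytes": 268435456},
    "coldTier": {"segmentCount": 2, "sizeInBytes": 13421772}
  }
}
```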

Design document

See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.

Key design decisions

  • Per-value tracking: Uncompressed size tracked at individual put*() callsites, capturing raw ingested data size without chunk headers or alignment padding
  • Shared codec resolution: ForwardIndexType.resolveCompressionType() handles CLP codec variants, used by both BaseSegmentCreator and ForwardIndexHandler
  • Dict columns out of scope: Dictionary-encoded columns report -1 for uncompressed size since writers only see dictionary IDs
  • Backward compatible: New metadata fields are additive; old segments gracefully return defaults
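The per-value tracking decision can be sketched as follows. This is a minimal illustration of the idea only; the class and method names are hypothetical, not Pinot's actual writer API:

```java
// Hypothetical sketch: accumulate raw value lengths at put*() call sites,
// so chunk headers and alignment padding are never counted.
public class UncompressedSizeTracker {
  private long _uncompressedBytes;

  // Var-byte path: called from putBytes(byte[] value) with the raw value.
  public void trackBytes(byte[] value) {
    _uncompressedBytes += value.length;
  }

  // Fixed-byte path: called from putInt/putLong/putFloat/putDouble with
  // the fixed width of the value type (e.g. 4 for int, 8 for long).
  public void trackFixedWidth(int widthInBytes) {
    _uncompressedBytes += widthInBytes;
  }

  public long getUncompressedSize() {
    return _uncompressedBytes;
  }
}
```

Tracking at the call site, rather than from `buffer.remaining()` at chunk-write time, is what keeps chunk-format overhead out of the number.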

Test plan

  • Unit tests for writer uncompressed size tracking (fixed-byte, var-byte V1-V3, V4/V5/V6)
  • Unit tests for CLP V2 sub-stream size aggregation
  • Unit tests for ForwardIndexType.resolveCompressionType() codec resolution
  • Unit tests for ForwardIndexHandler compression stats persistence on reload
  • Controller aggregation tests (dict sentinel preservation, negative ratio guards, partial coverage)
  • Integration test for end-to-end compression stats API response
  • Verify zero overhead when compressionStatsEnabled = false

…to table size API

This feature enables tracking and reporting of forward index compression
effectiveness across Pinot segments. When `compressionStatsEnabled` is set
in table config's indexing config, segment creation records uncompressed
forward index sizes and compression codec in metadata.properties. The
server-side table size endpoint now returns per-segment and per-column
raw/compressed forward index sizes. The controller aggregates these into
table-level compression ratio metrics (raw/compressed), with partial
coverage tracking for mixed-version clusters. Three new ControllerGauge
metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA,
TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring.
ForwardIndexHandler is updated to persist compression metadata during
segment reload operations (compression type change and dict-to-raw conversion).
…feature

- Add 6 new test files covering writer-level tracking, segment creation,
  corner cases, ForwardIndexHandler reload, and integration tests for both
  offline and realtime (Kafka) ingestion paths
- Merge redundant dual-loop in TableSizeReader into a single pass over
  server info, improving performance during table size aggregation
- Fix offline integration test teardown to properly wait for table data
  manager removal before stopping servers
- Wrap second table cleanup in offline test in finally block to prevent
  resource leaks on assertion failure
…tier breakdown, and stale metadata cleanup

- Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL)
- Add StorageBreakdown with per-tier segment count and size (always reported)
- Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments)
- Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF
- Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist
- Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload
- Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties
- Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF
- Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions
- Update integration tests for nested compressionStats JSON structure
…API, and comprehensive tests

- Gate uncompressed size tracking in forward index writers via compressionStatsEnabled flag
  (ForwardIndexCreatorFactory, ForwardIndexHandler, all raw index creators)
- Add per-column compression stats aggregation to server TablesResource and
  ServerSegmentMetadataReader with MIXED codec detection
- Extend TableMetadataInfo DTO with columnCompressionStats field (NON_NULL suppression)
- Fix integration test schema name mismatch for disabled-stats table
- Add 7 new test classes: IndexingConfigCompressionFlagTest, SegmentGeneratorConfigPropagationTest,
  CLPForwardIndexCreatorV2StatsTest, ServerTableSizeReaderRawBytesTest,
  TableMetadataReaderCompressionTest, TableMetadataInfoCompressionTest,
  ForwardIndexHandlerCompressionStatsTest updates
…feature flag

- Replace Map<String, ColumnCompressionStatsInfo> with List<ColumnCompressionStatsInfo> array
  containing all required fields: column, uncompressedSizeInBytes, compressedSizeInBytes,
  compressionRatio, codec, hasDictionary, indexes
- Gate columnCompressionStats in both server endpoints (TablesResource metadata,
  TableSizeResource size) on compressionStatsEnabled feature flag
- Add controller-side suppression in PinotTableRestletResource for safety against
  old servers that may still emit stats when flag is off
- Fix forward index size accumulation: use getIndexSizeFor(StandardIndexes.forward())
  directly per segment instead of cumulative variable
- Sort columnCompressionStats array by column name for deterministic output
- Update all tests and DTOs for the new array schema
…, and metadata endpoint gaps

Resolve default compression codec (LZ4/PASS_THROUGH) in BaseSegmentCreator and
ForwardIndexHandler when table config leaves chunkCompressionType null. Include
dictionary-encoded columns in columnCompressionStats with hasDictionary=true.
Clear stale controller compression metrics when no segments report stats. Suppress
zeroed compressionStats for dict-only tables. Add compressionStats summary to the
metadata endpoint aggregated from per-column data. Add tests for all fixes.
…ession

- Move columnCompressionStats to top-level field on TableSubTypeSizeDetails
  instead of nesting inside CompressionStats inner class
- Remove unused ColumnCompressionDetail inner class; use shared
  ColumnCompressionStatsInfo DTO from pinot-common
- Fix metadata endpoint field names to match size endpoint:
  rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes
- Suppress compressionStats and columnCompressionStats for dict-only tables
  and when feature flag is OFF
- Dictionary columns report -1 for uncompressedSizeInBytes to distinguish
  from zero-size raw columns on both size and metadata endpoints
- Use LinkedHashSet for index deduplication in per-column aggregation
…adata endpoint

- Create CompressionStatsSummary DTO in pinot-common with
  rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes,
  and compressionRatio
- Create StorageBreakdownInfo DTO in pinot-common with per-tier count and size
- Add both as @JsonInclude(NON_NULL) fields on TableMetadataInfo with
  backward-compatible constructors
- Server computes compressionStats summary and storageBreakdown during segment
  iteration and includes them in TableMetadataInfo response
- Controller aggregates both fields across servers in ServerSegmentMetadataReader
  (divides by numReplica like other fields)
- Remove manual addCompressionStatsSummary() JSON manipulation from
  PinotTableRestletResource; controller now just strips fields when flag is OFF
- Fix use-after-release: tier accumulation moved inside try block before
  segments are released in the finally block
…racking accuracy

- Remove ObjectNode.remove() pattern from PinotTableRestletResource; pass
  compressionStatsEnabled flag through TableMetadataReader and
  ServerSegmentMetadataReader so DTOs are constructed with null at creation
  time when the flag is OFF (storageBreakdown always preserved)
- Add segmentsWithStats, totalSegments, isPartialCoverage to
  CompressionStatsSummary so metadata endpoint has identical JSON schema
  to the size endpoint; populate during server and controller aggregation
- Fix VarByteChunkForwardIndexWriterV4 uncompressed size tracking: track
  raw value byte lengths in putBytes() instead of buffer.remaining() in
  write() which included chunk-format header overhead
- Move segment stats counting inside the try block in TablesResource to
  avoid accessing segment metadata after segments are released
- Add test for compression stats suppression when flag is disabled
Old segments lacking uncompressed size metadata (pre-flag segments) were
contributing their compressed forward index size to the denominator while
adding nothing to the numerator, deflating the compression ratio. Now
only dictionary-encoded columns enter the compressed-only accumulation
path; raw columns on old segments without codec/uncompressed data are
skipped entirely from both numerator and denominator.
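The exclusion logic described above can be sketched roughly as below, assuming the ratio is expressed as a percentage of raw over compressed size; all names are illustrative, not the actual aggregation code:

```java
public class CompressionRatioAggregator {
  // Aggregates a table-level compression ratio. Entries without real
  // uncompressed stats (sentinel -1 or zero, e.g. pre-flag segments) are
  // skipped from BOTH totals, so they cannot deflate the ratio by
  // contributing to the denominator alone.
  public static double ratioPercent(long[] uncompressed, long[] compressed) {
    long rawTotal = 0;
    long compressedTotal = 0;
    for (int i = 0; i < uncompressed.length; i++) {
      if (uncompressed[i] <= 0 || compressed[i] <= 0) {
        continue;  // skip -1 sentinels and segments lacking stats
      }
      rawTotal += uncompressed[i];
      compressedTotal += compressed[i];
    }
    if (rawTotal <= 0 || compressedTotal <= 0) {
      return -1;  // no usable coverage
    }
    return 100.0 * rawTotal / compressedTotal;
  }
}
```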
- Track uncompressed size per value in putInt/putLong/putFloat/putDouble
  (FixedByteChunkForwardIndexWriter) and putBytes (VarByteChunkForwardIndexWriter)
  instead of using chunk buffer remaining bytes which overcounts due to
  chunk-internal offset tables
- Gate server size endpoint compression field collection on feature flag
  so no metadata access or accumulation occurs when flag is OFF
- Exclude old raw segments lacking a persisted compression codec from
  per-column stats to prevent sentinel values leaking into aggregation
- Guard Math.max in per-column accumulation against INDEX_NOT_FOUND (-1)
  sentinel values
- Preserve per-column compression stats for dict-only tables even when
  no segments have raw forward index data (segmentsWithStats == 0)
- Rename TABLE_COMPRESSION_RATIO_HUNDREDTHS gauge to
  TABLE_COMPRESSION_RATIO_PERCENT for consistency
- Hoist IndexService.getInstance() outside per-column inner loop
- Fix negative compression ratio for dict columns in metadata endpoint:
  require both uncompressed > 0 and compressed > 0 before computing
  ratio in ServerSegmentMetadataReader (per-column and summary level)
- Track emitted tier gauge keys per table in TableSizeReader so stale
  tier-suffixed gauges (tableName.tierKey) are removed when tiers
  disappear or table is deleted via SegmentStatusChecker cleanup
- Resolve CLP V2 actual compression type (ZSTANDARD) from the
  CompressionCodec before falling back to ForwardIndexConfig's
  chunkCompressionType, which maps all CLP variants to PASS_THROUGH
Extract resolveCompressionType() as a shared utility in ForwardIndexType
that correctly maps CLP codec variants to their actual compression types
(CLPV2/CLPV2_ZSTD → ZSTANDARD, CLPV2_LZ4 → LZ4, CLP → PASS_THROUGH).
This fixes ForwardIndexHandler using incorrect compression types during
codec changes and dict-to-raw conversions. BaseSegmentCreator now uses the
same shared method, handling nullable fieldType for schema evolution cases.

Also document CLPForwardIndexCreatorV2.getUncompressedSize() semantics:
returns pre-compression sub-stream byte total, not original UTF-8 length.
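A rough sketch of the codec mapping described above; the enum members mirror the CLP variants named in the commit message, and the real shared helper is `ForwardIndexType.resolveCompressionType()`:

```java
public class CodecResolution {
  enum CompressionCodec { PASS_THROUGH, LZ4, ZSTANDARD, SNAPPY, CLP, CLPV2, CLPV2_ZSTD, CLPV2_LZ4 }
  enum ChunkCompressionType { PASS_THROUGH, LZ4, ZSTANDARD, SNAPPY }

  // Maps CLP codec variants to the compression type actually applied to
  // their sub-streams, per the mapping in the commit message above.
  static ChunkCompressionType resolve(CompressionCodec codec) {
    switch (codec) {
      case CLPV2:
      case CLPV2_ZSTD:
        return ChunkCompressionType.ZSTANDARD;
      case CLPV2_LZ4:
        return ChunkCompressionType.LZ4;
      case CLP:
        return ChunkCompressionType.PASS_THROUGH;
      default:
        // Non-CLP codecs map to the chunk compression type of the same name.
        return ChunkCompressionType.valueOf(codec.name());
    }
  }
}
```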
…umns

Size endpoint flag-OFF path now uses the 5-arg SegmentSizeInfo constructor
to pass through tier information, fixing storageBreakdown flattening all
segments into the "default" tier when compression stats are disabled.

Metadata endpoint aggregation now skips columns from old raw segments that
have no persisted compression codec and no dictionary, preventing zero-filled
entries from appearing in per-column compression stats.
Dictionary columns report -1 as uncompressed forward index size. When
aggregating across replicas, skip accumulation for negative sentinels
(using >= 0 guard) and reconstruct -1 in the output for dict columns
that have no real uncompressed data.

Also guard compressionStatsSummary construction on segmentsWithStats > 0
after replica division, avoiding a degenerate summary when integer
division rounds the count to zero.
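The sentinel handling across replicas can be sketched as follows (illustrative names, not the actual aggregation code):

```java
public class ReplicaAggregation {
  // Sums per-replica uncompressed sizes for one column, skipping the -1
  // sentinel reported for dictionary columns (the >= 0 guard), and
  // reconstructing -1 in the output when no replica had real data.
  static long aggregateUncompressed(long[] perReplicaSizes) {
    long total = 0;
    boolean sawData = false;
    for (long size : perReplicaSizes) {
      if (size >= 0) {
        total += size;
        sawData = true;
      }
    }
    return sawData ? total : -1;
  }
}
```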
…stats

Tier gauge emission was passing the tier-suffixed key (e.g.
"myTable_OFFLINE.coldTier") to isLeaderForTable(), which expects the
base table name. Hoist the leader check to use the canonical
tableNameWithType before the tier loop, then emit gauges directly.

On the server metadata endpoint, move computeIfAbsent for per-column
compression accumulators inside the conditional branches so old raw
segments without a persisted codec no longer create zero-filled entries.
codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 57.60135% with 251 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.34%. Comparing base (34d3fd6) to head (2563a62).
⚠️ Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...che/pinot/server/api/resources/TablesResource.java 31.50% 42 Missing and 8 partials ⚠️
...t/controller/util/ServerSegmentMetadataReader.java 51.25% 27 Missing and 12 partials ⚠️
.../apache/pinot/controller/util/TableSizeReader.java 78.34% 18 Missing and 16 partials ⚠️
.../pinot/server/api/resources/TableSizeResource.java 8.33% 29 Missing and 4 partials ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV2.java 0.00% 19 Missing ⚠️
...mon/restlet/resources/CompressionStatsSummary.java 0.00% 14 Missing ⚠️
...segment/spi/index/metadata/ColumnMetadataImpl.java 0.00% 14 Missing ⚠️
...oller/api/resources/PinotTableRestletResource.java 0.00% 8 Missing ⚠️
...ment/index/forward/ForwardIndexCreatorFactory.java 68.18% 4 Missing and 3 partials ⚠️
...local/segment/creator/impl/BaseSegmentCreator.java 79.16% 0 Missing and 5 partials ⚠️
... and 10 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18185      +/-   ##
============================================
+ Coverage     63.13%   63.34%   +0.20%     
- Complexity     1610     1627      +17     
============================================
  Files          3213     3230      +17     
  Lines        195730   197198    +1468     
  Branches      30240    30520     +280     
============================================
+ Hits         123583   124916    +1333     
+ Misses        62281    62255      -26     
- Partials       9866    10027     +161     
Flag Coverage Δ
custom-integration1 100.00% <ø> (?)
integration 100.00% <ø> (+100.00%) ⬆️
integration1 100.00% <ø> (?)
integration2 0.00% <ø> (ø)
java-11 63.29% <57.60%> (+0.18%) ⬆️
java-21 63.30% <57.60%> (+0.20%) ⬆️
temurin 63.34% <57.60%> (+0.20%) ⬆️
unittests 63.34% <57.60%> (+0.20%) ⬆️
unittests1 55.26% <40.50%> (-0.11%) ⬇️
unittests2 35.08% <57.43%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown.

