[Draft] Add compression ratio calculation and per-column compression stats (#18184) #18185
Draft
johnsolomonj wants to merge 16 commits into apache:master from
Conversation
…to table size API

This feature enables tracking and reporting of forward index compression effectiveness across Pinot segments. When `compressionStatsEnabled` is set in the table config's indexing config, segment creation records uncompressed forward index sizes and the compression codec in metadata.properties. The server-side table size endpoint now returns per-segment and per-column raw/compressed forward index sizes. The controller aggregates these into table-level compression ratio metrics (raw/compressed), with partial coverage tracking for mixed-version clusters. Three new ControllerGauge metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA, TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring. ForwardIndexHandler is updated to persist compression metadata during segment reload operations (compression type change and dict-to-raw conversion).
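For illustration, a minimal table config enabling the flag might look like the sketch below. This assumes the `compressionStatsEnabled` field serializes under the `tableIndexConfig` section of the table config JSON (where Pinot's indexing config lives); the surrounding field values are placeholders, not part of this PR.

```json
{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": { "replication": "1" },
  "tableIndexConfig": {
    "compressionStatsEnabled": true
  },
  "tenants": {},
  "metadata": {}
}
```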
…feature

- Add 6 new test files covering writer-level tracking, segment creation, corner cases, ForwardIndexHandler reload, and integration tests for both offline and realtime (Kafka) ingestion paths
- Merge redundant dual-loop in TableSizeReader into a single pass over server info, improving performance during table size aggregation
- Fix offline integration test teardown to properly wait for table data manager removal before stopping servers
- Wrap second table cleanup in offline test in finally block to prevent resource leaks on assertion failure
…tier breakdown, and stale metadata cleanup

- Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL)
- Add StorageBreakdown with per-tier segment count and size (always reported)
- Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments)
- Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF
- Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist
- Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload
- Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties
- Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF
- Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions
- Update integration tests for nested compressionStats JSON structure
…API, and comprehensive tests

- Gate uncompressed size tracking in forward index writers via compressionStatsEnabled flag (ForwardIndexCreatorFactory, ForwardIndexHandler, all raw index creators)
- Add per-column compression stats aggregation to server TablesResource and ServerSegmentMetadataReader with MIXED codec detection
- Extend TableMetadataInfo DTO with columnCompressionStats field (NON_NULL suppression)
- Fix integration test schema name mismatch for disabled-stats table
- Add 7 new test classes: IndexingConfigCompressionFlagTest, SegmentGeneratorConfigPropagationTest, CLPForwardIndexCreatorV2StatsTest, ServerTableSizeReaderRawBytesTest, TableMetadataReaderCompressionTest, TableMetadataInfoCompressionTest, ForwardIndexHandlerCompressionStatsTest updates
…feature flag

- Replace Map&lt;String, ColumnCompressionStatsInfo&gt; with List&lt;ColumnCompressionStatsInfo&gt; array containing all required fields: column, uncompressedSizeInBytes, compressedSizeInBytes, compressionRatio, codec, hasDictionary, indexes
- Gate columnCompressionStats in both server endpoints (TablesResource metadata, TableSizeResource size) on compressionStatsEnabled feature flag
- Add controller-side suppression in PinotTableRestletResource for safety against old servers that may still emit stats when flag is off
- Fix forward index size accumulation: use getIndexSizeFor(StandardIndexes.forward()) directly per segment instead of a cumulative variable
- Sort columnCompressionStats array by column name for deterministic output
- Update all tests and DTOs for the new array schema
…, and metadata endpoint gaps

Resolve default compression codec (LZ4/PASS_THROUGH) in BaseSegmentCreator and ForwardIndexHandler when the table config leaves chunkCompressionType null. Include dictionary-encoded columns in columnCompressionStats with hasDictionary=true. Clear stale controller compression metrics when no segments report stats. Suppress zeroed compressionStats for dict-only tables. Add a compressionStats summary to the metadata endpoint aggregated from per-column data. Add tests for all fixes.
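The null-codec fallback described above can be sketched as follows. This is an illustrative sketch, not Pinot's actual API: the enum values and method name are assumptions, and only the decision rule (explicit config wins; otherwise PASS_THROUGH for dictionary-encoded columns and LZ4 for raw columns) comes from the commit message.

```java
// Illustrative sketch of the default-codec fallback: when the table config
// leaves chunkCompressionType null, fall back to LZ4 for raw columns and
// PASS_THROUGH for dictionary-encoded columns. Names are hypothetical.
public class CompressionDefaults {
  enum ChunkCompressionType { LZ4, ZSTANDARD, PASS_THROUGH }

  static ChunkCompressionType resolveDefault(ChunkCompressionType configured, boolean hasDictionary) {
    if (configured != null) {
      return configured;  // an explicit table-config setting always wins
    }
    return hasDictionary ? ChunkCompressionType.PASS_THROUGH : ChunkCompressionType.LZ4;
  }
}
```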
…ession

- Move columnCompressionStats to top-level field on TableSubTypeSizeDetails instead of nesting inside CompressionStats inner class
- Remove unused ColumnCompressionDetail inner class; use shared ColumnCompressionStatsInfo DTO from pinot-common
- Fix metadata endpoint field names to match size endpoint: rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes
- Suppress compressionStats and columnCompressionStats for dict-only tables and when feature flag is OFF
- Dictionary columns report -1 for uncompressedSizeInBytes to distinguish from zero-size raw columns on both size and metadata endpoints
- Use LinkedHashSet for index deduplication in per-column aggregation
…adata endpoint

- Create CompressionStatsSummary DTO in pinot-common with rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes, and compressionRatio
- Create StorageBreakdownInfo DTO in pinot-common with per-tier count and size
- Add both as @JsonInclude(NON_NULL) fields on TableMetadataInfo with backward-compatible constructors
- Server computes compressionStats summary and storageBreakdown during segment iteration and includes them in TableMetadataInfo response
- Controller aggregates both fields across servers in ServerSegmentMetadataReader (divides by numReplica like other fields)
- Remove manual addCompressionStatsSummary() JSON manipulation from PinotTableRestletResource; controller now just strips fields when flag is OFF
- Fix use-after-release: tier accumulation moved inside try block before segments are released in the finally block
…racking accuracy

- Remove ObjectNode.remove() pattern from PinotTableRestletResource; pass compressionStatsEnabled flag through TableMetadataReader and ServerSegmentMetadataReader so DTOs are constructed with null at creation time when the flag is OFF (storageBreakdown always preserved)
- Add segmentsWithStats, totalSegments, isPartialCoverage to CompressionStatsSummary so metadata endpoint has identical JSON schema to the size endpoint; populate during server and controller aggregation
- Fix VarByteChunkForwardIndexWriterV4 uncompressed size tracking: track raw value byte lengths in putBytes() instead of buffer.remaining() in write(), which included chunk-format header overhead
- Move segment stats counting inside the try block in TablesResource to avoid accessing segment metadata after segments are released
- Add test for compression stats suppression when flag is disabled
Old segments lacking uncompressed size metadata (pre-flag segments) were contributing their compressed forward index size to the denominator while adding nothing to the numerator, deflating the compression ratio. Now only dictionary-encoded columns enter the compressed-only accumulation path; raw columns on old segments without codec/uncompressed data are skipped entirely from both numerator and denominator.
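The aggregation rule above can be sketched as a small accumulator. This is a minimal illustration under assumptions, not Pinot's actual aggregation code: class, field, and method names are hypothetical, and only the skip rule (raw columns without uncompressed-size metadata contribute to neither numerator nor denominator) comes from the description.

```java
// Hypothetical accumulator illustrating the fix: pre-flag raw columns with
// no uncompressed-size metadata are skipped entirely, so they can no longer
// inflate the denominator and deflate the table-level compression ratio.
public class RatioAggregator {
  static final long MISSING = -1;  // sentinel for "no uncompressed size recorded"
  long _uncompressed;
  long _compressed;

  void addRawColumn(long uncompressedBytes, long compressedBytes) {
    if (uncompressedBytes == MISSING) {
      return;  // old segment: skip both numerator and denominator
    }
    _uncompressed += uncompressedBytes;
    _compressed += compressedBytes;
  }

  double ratio() {
    return _compressed > 0 ? (double) _uncompressed / _compressed : 1.0;
  }
}
```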
- Track uncompressed size per value in putInt/putLong/putFloat/putDouble (FixedByteChunkForwardIndexWriter) and putBytes (VarByteChunkForwardIndexWriter) instead of using chunk buffer remaining bytes, which overcounts due to chunk-internal offset tables
- Gate server size endpoint compression field collection on feature flag so no metadata access or accumulation occurs when flag is OFF
- Exclude old raw segments lacking a persisted compression codec from per-column stats to prevent sentinel values leaking into aggregation
- Guard Math.max in per-column accumulation against INDEX_NOT_FOUND (-1) sentinel values
- Preserve per-column compression stats for dict-only tables even when no segments have raw forward index data (segmentsWithStats == 0)
- Rename TABLE_COMPRESSION_RATIO_HUNDREDTHS gauge to TABLE_COMPRESSION_RATIO_PERCENT for consistency
- Hoist IndexService.getInstance() outside per-column inner loop
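The per-value tracking in the first bullet can be illustrated with a stripped-down counter. This is a sketch under assumptions (the class and method names mirror the writer methods named above but are not the real Pinot writers): the point is that each `put*()` call adds only the raw value's byte width, so chunk headers and offset tables never enter the count.

```java
// Minimal sketch of per-value uncompressed-size tracking: count raw value
// bytes at each put*() call rather than measuring the chunk buffer, which
// would include offset tables and header overhead. Names are illustrative.
public class UncompressedSizeTracker {
  private long _uncompressedBytes;

  void putInt(int v)       { _uncompressedBytes += Integer.BYTES; }
  void putLong(long v)     { _uncompressedBytes += Long.BYTES; }
  void putFloat(float v)   { _uncompressedBytes += Float.BYTES; }
  void putDouble(double v) { _uncompressedBytes += Double.BYTES; }
  void putBytes(byte[] v)  { _uncompressedBytes += v.length; }

  long getUncompressedSize() { return _uncompressedBytes; }
}
```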
- Fix negative compression ratio for dict columns in metadata endpoint: require both uncompressed &gt; 0 and compressed &gt; 0 before computing ratio in ServerSegmentMetadataReader (per-column and summary level)
- Track emitted tier gauge keys per table in TableSizeReader so stale tier-suffixed gauges (tableName.tierKey) are removed when tiers disappear or the table is deleted via SegmentStatusChecker cleanup
- Resolve CLP V2 actual compression type (ZSTANDARD) from the CompressionCodec before falling back to ForwardIndexConfig's chunkCompressionType, which maps all CLP variants to PASS_THROUGH
Extract resolveCompressionType() as a shared utility in ForwardIndexType that correctly maps CLP codec variants to their actual compression types (CLPV2/CLPV2_ZSTD → ZSTANDARD, CLPV2_LZ4 → LZ4, CLP → PASS_THROUGH). This fixes ForwardIndexHandler using incorrect compression types during codec changes and dict-to-raw conversions. BaseSegmentCreator now uses the same shared method, handling nullable fieldType for schema evolution cases. Also document CLPForwardIndexCreatorV2.getUncompressedSize() semantics: returns pre-compression sub-stream byte total, not original UTF-8 length.
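The CLP variant mapping described above can be sketched as a switch. This is an illustrative stand-in, not Pinot's actual `ForwardIndexType.resolveCompressionType()`: the enum constants mirror the variant names in the text, but the enclosing class and any non-CLP cases are assumptions.

```java
// Hypothetical sketch of the CLP codec-variant mapping described above:
// CLPV2 and CLPV2_ZSTD resolve to ZSTANDARD, CLPV2_LZ4 to LZ4, and plain
// CLP to PASS_THROUGH. Not the real Pinot implementation.
public class ClpCodecResolution {
  enum CompressionCodec { CLP, CLPV2, CLPV2_ZSTD, CLPV2_LZ4 }
  enum ChunkCompressionType { LZ4, ZSTANDARD, PASS_THROUGH }

  static ChunkCompressionType resolve(CompressionCodec codec) {
    switch (codec) {
      case CLPV2:
      case CLPV2_ZSTD:
        return ChunkCompressionType.ZSTANDARD;  // CLP V2's actual sub-stream codec
      case CLPV2_LZ4:
        return ChunkCompressionType.LZ4;
      case CLP:
      default:
        return ChunkCompressionType.PASS_THROUGH;
    }
  }
}
```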
…umns

Size endpoint flag-OFF path now uses the 5-arg SegmentSizeInfo constructor to pass through tier information, fixing storageBreakdown flattening all segments into the "default" tier when compression stats are disabled. Metadata endpoint aggregation now skips columns from old raw segments that have no persisted compression codec and no dictionary, preventing zero-filled entries from appearing in per-column compression stats.
Dictionary columns report -1 as uncompressed forward index size. When aggregating across replicas, skip accumulation for negative sentinels (using >= 0 guard) and reconstruct -1 in the output for dict columns that have no real uncompressed data. Also guard compressionStatsSummary construction on segmentsWithStats > 0 after replica division, avoiding a degenerate summary when integer division rounds the count to zero.
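The sentinel handling above can be sketched as a small aggregation helper. This is a sketch under assumptions (the function name and array-based input are hypothetical); the two rules it encodes come from the description: accumulate only non-negative values, and restore the `-1` sentinel when no replica contributed real uncompressed data.

```java
// Hypothetical helper illustrating sentinel-aware aggregation: -1 values
// (dictionary columns with no real uncompressed data) are skipped during
// accumulation, and -1 is reconstructed when nothing real was seen.
public class SentinelAggregation {
  static long aggregate(long[] perReplicaUncompressed) {
    long total = 0;
    boolean sawReal = false;
    for (long v : perReplicaUncompressed) {
      if (v >= 0) {        // guard: ignore -1 sentinels
        total += v;
        sawReal = true;
      }
    }
    return sawReal ? total : -1;  // dict-only column: keep the sentinel in output
  }
}
```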
…stats

Tier gauge emission was passing the tier-suffixed key (e.g. "myTable_OFFLINE.coldTier") to isLeaderForTable(), which expects the base table name. Hoist the leader check to use the canonical tableNameWithType before the tier loop, then emit gauges directly. On the server metadata endpoint, move computeIfAbsent for per-column compression accumulators inside the conditional branches so old raw segments without a persisted codec no longer create zero-filled entries.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18185 +/- ##
============================================
+ Coverage 63.13% 63.34% +0.20%
- Complexity 1610 1627 +17
============================================
Files 3213 3230 +17
Lines 195730 197198 +1468
Branches 30240 30520 +280
============================================
+ Hits 123583 124916 +1333
+ Misses 62281 62255 -26
- Partials 9866 10027 +161
Flags with carried forward coverage won't be shown.
Labels: feature, release-notes, observability

Summary
Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.
Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:
- Uncompressed forward index size and compression codec recorded in metadata.properties per column
- compressionStats, columnCompressionStats, and storageBreakdown on both GET /tables/{table}/size and GET /tables/{table}/metadata
- TABLE_COMPRESSION_RATIO_PERCENT and TABLE_TIERED_STORAGE_SIZE controller gauges with tier lifecycle management
- indexingConfig.compressionStatsEnabled flag (default: off, zero overhead when disabled)

Design document
See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.
Key design decisions
- Uncompressed size is tracked at individual put*() callsites, capturing raw ingested data size without chunk headers or alignment padding
- ForwardIndexType.resolveCompressionType() handles CLP codec variants, used by both BaseSegmentCreator and ForwardIndexHandler
- Dictionary columns report -1 for uncompressed size since writers only see dictionary IDs

Test plan
- ForwardIndexType.resolveCompressionType() codec resolution
- ForwardIndexHandler compression stats persistence on reload
- compressionStatsEnabled = false