[opt](memory) Remove unused dbId field from CloudReplica#62079
[opt](memory) Remove unused dbId field from CloudReplica#62079dataroaring merged 1 commit intoapache:masterfrom
Conversation
…plica) The dbId field in CloudReplica is never read by any external caller. Remove it to save 8 bytes per replica object. Gson silently ignores unknown keys when deserializing old images, so this is backward compatible. The constructor parameter is retained but no longer stored to avoid changing the constructor signature across multiple callers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes FE cloud metadata memory usage by removing the persisted dbId field from CloudReplica, relying on TabletMeta.getDbId() via the inverted index instead while keeping constructor signature compatibility.
Changes:
- Removed
dbIdfield and its@SerializedName("dbId")fromCloudReplicato reduce per-replica footprint. - Stopped storing
dbIdin the constructor (parameter remains for source compatibility). - Removed the
getDbId()accessor fromCloudReplica.
Comments suppressed due to low confidence (2)
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:117
- The
dbIdconstructor parameter is now intentionally unused (the field was removed), but the code doesn’t document this. This makes it easy for future readers to assumedbIdis still persisted/meaningful. Consider renaming the parameter (e.g.,unusedDbId) and/or adding an inline comment explaining it’s kept only for source compatibility and is not stored anymore.
public CloudReplica(long replicaId, Long backendId, ReplicaState state, long version, int schemaHash,
long dbId, long tableId, long partitionId, long indexId, long idx) {
super(replicaId, -1, state, version, schemaHash);
this.tableId = tableId;
this.partitionId = partitionId;
this.indexId = indexId;
this.idx = idx;
}
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:65
- This change modifies CloudReplica’s persisted Gson schema by removing the
dbIdfield/key. The PR description mentions backward compatibility (old JSON containingdbIdshould still deserialize), but there’s no unit test covering that contract. Please add a serialization/deserialization test that (1) deserializes an old-format JSON containingdbIdintoCloudReplicawithout errors and (2) verifies the new JSON output no longer containsdbId, to prevent regressions in future refactors.
@SerializedName(value = "bes")
private ConcurrentHashMap<String, List<Long>> primaryClusterToBackends = null;
@SerializedName(value = "be")
private ConcurrentHashMap<String, Long> primaryClusterToBackend = new ConcurrentHashMap<>();
@SerializedName(value = "tableId")
private long tableId = -1;
@SerializedName(value = "partitionId")
private long partitionId = -1;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
run buildall |
TPC-H: Total hot run time: 28816 ms |
TPC-DS: Total hot run time: 178171 ms |
|
PR approved by anyone and no changes requested. |
|
PR approved by at least one committer and no changes requested. |
## Summary - Remove the `dbId` field and its `@SerializedName` from `CloudReplica`, saving 8 bytes per replica object - No external callers read `dbId` from CloudReplica — all callers use `TabletMeta.getDbId()` via the inverted index instead - Constructor parameter is kept for source compatibility but the value is no longer stored - Gson silently ignores unknown keys when deserializing old images, so backward compatibility is maintained ## Memory Impact -8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763 MB saved. ## Test plan - [ ] Existing cloud tests pass (`mvn test -pl fe/fe-core -Dtest="*Cloud*"`) - [ ] Gson backward compatibility: old-format JSON with "dbId" key deserializes without errors (silently ignored) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
schemaHash in the base Replica class wastes 4 bytes per CloudReplica. Cloud storage paths don't use schema_hash, and callers like MetadataViewer already guard with getSchemaHash() != -1. This follows the same pattern as dbId removal in apache#62079. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
schemaHash in the base Replica class wastes 4 bytes per CloudReplica. Cloud storage paths don't use schema_hash, and callers like MetadataViewer already guard with getSchemaHash() != -1. This follows the same pattern as dbId removal in apache#62079. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove the `dbId` field and its `@SerializedName` from `CloudReplica`, saving 8 bytes per replica object - No external callers read `dbId` from CloudReplica — all callers use `TabletMeta.getDbId()` via the inverted index instead - Constructor parameter is kept for source compatibility but the value is no longer stored - Gson silently ignores unknown keys when deserializing old images, so backward compatibility is maintained -8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763 MB saved. - [ ] Existing cloud tests pass (`mvn test -pl fe/fe-core -Dtest="*Cloud*"`) - [ ] Gson backward compatibility: old-format JSON with "dbId" key deserializes without errors (silently ignored) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> (cherry picked from commit 186a53e) Generated by ThinkOps
|
Cherry-picked to branch-4.0 in #62597. Trivial conflict resolved (preserved branch-4.0's existing — ThinkOps 🤖 |
Summary
dbIdfield and its@SerializedNamefromCloudReplica, saving 8 bytes per replica objectdbIdfrom CloudReplica — all callers useTabletMeta.getDbId()via the inverted index insteadMemory Impact
-8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763 MB saved.
Test plan
mvn test -pl fe/fe-core -Dtest="*Cloud*")🤖 Generated with Claude Code