Skip to content

[opt](memory) Remove unused dbId field from CloudReplica#62079

Merged
dataroaring merged 1 commit intoapache:masterfrom
dataroaring:opt/cloud-replica-remove-dbId
Apr 3, 2026
Merged

[opt](memory) Remove unused dbId field from CloudReplica#62079
dataroaring merged 1 commit intoapache:masterfrom
dataroaring:opt/cloud-replica-remove-dbId

Conversation

@dataroaring
Copy link
Copy Markdown
Contributor

Summary

  • Remove the dbId field and its @SerializedName from CloudReplica, saving 8 bytes per replica object
  • No external callers read dbId from CloudReplica — all callers use TabletMeta.getDbId() via the inverted index instead
  • Constructor parameter is kept for source compatibility but the value is no longer stored
  • Gson silently ignores unknown keys when deserializing old images, so backward compatibility is maintained

Memory Impact

-8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763 MB saved.

Test plan

  • Existing cloud tests pass (mvn test -pl fe/fe-core -Dtest="*Cloud*")
  • Gson backward compatibility: old-format JSON with "dbId" key deserializes without errors (silently ignored)

🤖 Generated with Claude Code

…plica)

The dbId field in CloudReplica is never read by any external caller.
Remove it to save 8 bytes per replica object. Gson silently ignores
unknown keys when deserializing old images, so this is backward compatible.

The constructor parameter is retained but no longer stored to avoid
changing the constructor signature across multiple callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 04:02
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes FE cloud metadata memory usage by removing the persisted dbId field from CloudReplica, relying on TabletMeta.getDbId() via the inverted index instead while keeping constructor signature compatibility.

Changes:

  • Removed dbId field and its @SerializedName("dbId") from CloudReplica to reduce per-replica footprint.
  • Stopped storing dbId in the constructor (parameter remains for source compatibility).
  • Removed the getDbId() accessor from CloudReplica.
Comments suppressed due to low confidence (2)

fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:117

  • The dbId constructor parameter is now intentionally unused (the field was removed), but the code doesn’t document this. This makes it easy for future readers to assume dbId is still persisted/meaningful. Consider renaming the parameter (e.g., unusedDbId) and/or adding an inline comment explaining it’s kept only for source compatibility and is not stored anymore.
    public CloudReplica(long replicaId, Long backendId, ReplicaState state, long version, int schemaHash,
            long dbId, long tableId, long partitionId, long indexId, long idx) {
        super(replicaId, -1, state, version, schemaHash);
        this.tableId = tableId;
        this.partitionId = partitionId;
        this.indexId = indexId;
        this.idx = idx;
    }

fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:65

  • This change modifies CloudReplica’s persisted Gson schema by removing the dbId field/key. The PR description mentions backward compatibility (old JSON containing dbId should still deserialize), but there’s no unit test covering that contract. Please add a serialization/deserialization test that (1) deserializes an old-format JSON containing dbId into CloudReplica without errors and (2) verifies the new JSON output no longer contains dbId, to prevent regressions in future refactors.
    @SerializedName(value = "bes")
    private ConcurrentHashMap<String, List<Long>> primaryClusterToBackends = null;
    @SerializedName(value = "be")
    private ConcurrentHashMap<String, Long> primaryClusterToBackend = new ConcurrentHashMap<>();
    @SerializedName(value = "tableId")
    private long tableId = -1;
    @SerializedName(value = "partitionId")
    private long partitionId = -1;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@doris-robot
Copy link
Copy Markdown

TPC-H: Total hot run time: 28816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5e426add6d83dd5126bfdf17ba47f3db60dc0284, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17837	3888	3775	3775
q2	q3	10699	856	599	599
q4	4676	466	359	359
q5	7479	1352	1157	1157
q6	194	176	146	146
q7	929	939	781	781
q8	9305	1423	1341	1341
q9	5529	5327	5251	5251
q10	6245	2042	1736	1736
q11	483	298	273	273
q12	639	408	286	286
q13	18053	2798	2163	2163
q14	290	285	261	261
q15	q16	854	857	787	787
q17	996	1165	740	740
q18	6431	5674	5487	5487
q19	1167	1220	1079	1079
q20	542	424	288	288
q21	4002	2342	1964	1964
q22	453	403	343	343
Total cold run time: 96803 ms
Total hot run time: 28816 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4052	4019	3998	3998
q2	q3	4573	4694	4158	4158
q4	2072	2107	1381	1381
q5	4870	4994	5151	4994
q6	207	177	138	138
q7	1987	1777	1612	1612
q8	3825	3250	3243	3243
q9	8305	8326	8314	8314
q10	4460	4431	4262	4262
q11	656	453	431	431
q12	680	746	521	521
q13	2802	3415	2325	2325
q14	312	314	287	287
q15	q16	775	819	719	719
q17	1393	1308	1279	1279
q18	8145	7115	7088	7088
q19	1132	1123	1135	1123
q20	2246	2215	1961	1961
q21	6233	5537	5160	5160
q22	570	556	447	447
Total cold run time: 59295 ms
Total hot run time: 53441 ms

@doris-robot
Copy link
Copy Markdown

TPC-DS: Total hot run time: 178171 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5e426add6d83dd5126bfdf17ba47f3db60dc0284, data reload: false

query5	4367	668	512	512
query6	329	222	201	201
query7	4302	582	329	329
query8	333	243	222	222
query9	8694	3925	3908	3908
query10	445	343	294	294
query11	6655	5475	5151	5151
query12	194	138	127	127
query13	1306	639	434	434
query14	5649	5179	4792	4792
query14_1	4088	4065	4067	4065
query15	207	204	180	180
query16	976	425	418	418
query17	920	740	612	612
query18	2479	494	380	380
query19	216	201	166	166
query20	137	134	126	126
query21	219	142	121	121
query22	13610	13488	13409	13409
query23	17942	17452	16995	16995
query23_1	16838	16836	17312	16836
query24	7724	1812	1410	1410
query24_1	1504	1418	1394	1394
query25	649	515	491	491
query26	1447	332	205	205
query27	3047	624	387	387
query28	4547	1979	1891	1891
query29	1037	679	581	581
query30	307	243	203	203
query31	1130	1092	993	993
query32	86	69	73	69
query33	560	355	296	296
query34	1276	1202	713	713
query35	807	804	693	693
query36	1347	1402	1184	1184
query37	145	99	88	88
query38	3130	3060	2965	2965
query39	932	905	876	876
query39_1	867	855	874	855
query40	244	163	134	134
query41	61	58	60	58
query42	112	109	108	108
query43	327	345	298	298
query44	
query45	211	208	196	196
query46	1286	1333	843	843
query47	2409	2403	2307	2307
query48	411	442	311	311
query49	638	529	415	415
query50	779	288	218	218
query51	4243	4249	4197	4197
query52	107	107	95	95
query53	254	283	203	203
query54	308	282	253	253
query55	98	92	87	87
query56	296	313	312	312
query57	1665	1786	1592	1592
query58	296	265	304	265
query59	2886	2983	2720	2720
query60	341	336	328	328
query61	158	150	156	150
query62	693	631	569	569
query63	245	202	194	194
query64	5412	1429	1042	1042
query65	
query66	1549	486	390	390
query67	24343	24491	24171	24171
query68	
query69	449	362	310	310
query70	956	1022	932	932
query71	313	285	282	282
query72	3128	2747	2455	2455
query73	838	768	465	465
query74	9910	9768	9584	9584
query75	2766	2606	2296	2296
query76	2373	1117	768	768
query77	401	399	323	323
query78	11318	11337	10757	10757
query79	1509	1068	864	864
query80	691	558	492	492
query81	472	275	235	235
query82	1349	154	119	119
query83	390	288	263	263
query84	311	149	118	118
query85	867	499	432	432
query86	413	360	305	305
query87	3286	3176	3077	3077
query88	3660	2741	2736	2736
query89	439	380	349	349
query90	1984	188	183	183
query91	177	171	143	143
query92	81	74	72	72
query93	975	958	571	571
query94	535	356	293	293
query95	647	368	428	368
query96	1033	749	341	341
query97	2658	2666	2552	2552
query98	243	241	227	227
query99	1077	1050	946	946
Total cold run time: 258823 ms
Total hot run time: 178171 ms

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR approved by anyone and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 186a53e into apache:master Apr 3, 2026
37 of 39 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 3, 2026
## Summary
- Remove the `dbId` field and its `@SerializedName` from `CloudReplica`,
saving 8 bytes per replica object
- No external callers read `dbId` from CloudReplica — all callers use
`TabletMeta.getDbId()` via the inverted index instead
- Constructor parameter is kept for source compatibility but the value
is no longer stored
- Gson silently ignores unknown keys when deserializing old images, so
backward compatibility is maintained

## Memory Impact
-8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763
MB saved.

## Test plan
- [ ] Existing cloud tests pass (`mvn test -pl fe/fe-core
-Dtest="*Cloud*"`)
- [ ] Gson backward compatibility: old-format JSON with "dbId" key
deserializes without errors (silently ignored)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring added a commit to dataroaring/incubator-doris that referenced this pull request Apr 4, 2026
schemaHash in the base Replica class wastes 4 bytes per CloudReplica.
Cloud storage paths don't use schema_hash, and callers like
MetadataViewer already guard with getSchemaHash() != -1.

This follows the same pattern as dbId removal in apache#62079.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring added a commit to dataroaring/incubator-doris that referenced this pull request Apr 17, 2026
schemaHash in the base Replica class wastes 4 bytes per CloudReplica.
Cloud storage paths don't use schema_hash, and callers like
MetadataViewer already guard with getSchemaHash() != -1.

This follows the same pattern as dbId removal in apache#62079.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring added a commit to dataroaring/incubator-doris that referenced this pull request Apr 17, 2026
- Remove the `dbId` field and its `@SerializedName` from `CloudReplica`,
saving 8 bytes per replica object
- No external callers read `dbId` from CloudReplica — all callers use
`TabletMeta.getDbId()` via the inverted index instead
- Constructor parameter is kept for source compatibility but the value
is no longer stored
- Gson silently ignores unknown keys when deserializing old images, so
backward compatibility is maintained

-8 bytes/replica. At 10M replicas: ~76 MB saved. At 100M replicas: ~763
MB saved.

- [ ] Existing cloud tests pass (`mvn test -pl fe/fe-core
-Dtest="*Cloud*"`)
- [ ] Gson backward compatibility: old-format JSON with "dbId" key
deserializes without errors (silently ignored)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 186a53e)

Generated by ThinkOps
@dataroaring
Copy link
Copy Markdown
Contributor Author

Cherry-picked to branch-4.0 in #62597. Trivial conflict resolved (preserved branch-4.0's existing primaryClusterToBackend = null initialization; only the dbId field, constructor assignment, and getter were removed as per this PR's scope).

— ThinkOps 🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.0.x-conflict dev/4.1.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants