[opt](memory) Remove unused fields from CloudReplica #62125

Merged
gavinchou merged 3 commits into apache:master from dataroaring:remove-unused-cloud-replica-fields
Apr 7, 2026

@dataroaring
Contributor

Summary

  • Remove the segmentCount and rowsetCount fields and their getter/setter overrides from CloudReplica, moving the storage to the base Replica class
  • Remove the memClusterToBackends in-memory cache and the initMemClusterToBackends() method, simplifying the multi-replica backend selection path in getBackendIdImpl()
  • This follows the same pattern as #62079 ([opt](memory) Remove unused dbId field from CloudReplica), which removed the unused dbId field

Test plan

  • Verify FE compilation passes
  • Run full regression tests via run buildall

🤖 Generated with Claude Code

Remove segmentCount, rowsetCount, and memClusterToBackends from
CloudReplica. Move segmentCount and rowsetCount storage to the base
Replica class so all replica types can use them directly. Remove the
memClusterToBackends in-memory cache and its associated
initMemClusterToBackends method, simplifying the multi-replica
backend selection path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 5, 2026 02:04
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring
Contributor Author

run buildall

dataroaring and others added 2 commits April 4, 2026 19:06
Keep the base Replica class lean by using no-op setters instead of
storing the values. These counts are not needed for non-cloud replicas.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
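
The no-op setter approach described in this commit message can be sketched as follows. This is a minimal illustration, not the Doris code: the class name NoOpStatReplica is hypothetical, and the getters returning 0 is an assumption, since the commit message only says the setters no longer store anything.

```java
// Hypothetical stand-in for a base replica class; NOT the real org.apache.doris.catalog.Replica.
final class NoOpStatReplica {
    // No segmentCount / rowsetCount fields: instances carry no per-object cost for these stats.

    public long getSegmentCount() {
        return 0L; // assumption: with nothing stored, the getter reports a constant
    }

    public void setSegmentCount(long segmentCount) {
        // intentionally a no-op: the value is dropped
    }

    public long getRowsetCount() {
        return 0L; // assumption, as above
    }

    public void setRowsetCount(long rowsetCount) {
        // intentionally a no-op: the value is dropped
    }

    public static void main(String[] args) {
        NoOpStatReplica replica = new NoOpStatReplica();
        replica.setRowsetCount(5L);
        // The write above is ignored, so the getter still reports the constant.
        System.out.println(replica.getRowsetCount());
    }
}
```

Existing callers keep compiling with this shape, but any value they set is silently discarded, which is presumably why the next commit removes the accessors and their call sites entirely.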
…llers

Remove getSegmentCount/setSegmentCount/getRowsetCount/setRowsetCount
from Replica base class and clean up all call sites in
CloudTabletStatMgr and Checkpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dataroaring
Contributor Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

Reduces memory footprint in cloud FE metadata by removing unused per-CloudReplica fields/caches and consolidating replica stat storage.

Changes:

  • Removed segmentCount / rowsetCount fields and overrides from CloudReplica.
  • Removed memClusterToBackends cache and related init/update logic from cloud multi-replica backend selection.
  • Added segmentCount / rowsetCount storage + serialization to base Replica.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File and description:

  • fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java: Drops unused fields/cache and simplifies multi-replica BE selection path.
  • fe/fe-core/src/main/java/org/apache/doris/catalog/Replica.java: Adds stat fields + (de)serialization and enables base getters/setters to store values.

Comments suppressed due to low confidence (2)

fe/fe-core/src/main/java/org/apache/doris/catalog/Replica.java:110

  • Moving segmentCount / rowsetCount storage into the base Replica class adds two longs to all replica types (e.g. LocalReplica), even though these stats appear to be cloud-only (Prometheus export is gated by Config.isNotCloudMode()). This is likely a net memory increase for non-cloud deployments and seems at odds with the PR’s memory-optimization goal. Consider keeping these fields in CloudReplica (with overrides) or introducing a cloud-specific replica base/type so local replicas don’t pay the per-object cost.

    public Replica() {
    }
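
To put the per-object cost mentioned in this comment into rough numbers, the sketch below multiplies the raw size of two long fields by a replica count. The class name and the replica count of 10,000,000 are hypothetical, and the figure ignores JVM object alignment and header layout, so it is an estimate rather than a measurement.

```java
// Back-of-the-envelope estimate; ReplicaFieldCost is a hypothetical helper, not part of Doris.
final class ReplicaFieldCost {
    private ReplicaFieldCost() {
    }

    /** Raw payload of two long fields (2 * 8 bytes) across replicaCount objects. */
    static long addedFieldBytes(long replicaCount) {
        return 2L * Long.BYTES * replicaCount;
    }

    public static void main(String[] args) {
        long replicas = 10_000_000L; // hypothetical deployment size
        long bytes = addedFieldBytes(replicas);
        // 16 bytes per replica, so 160,000,000 bytes for 10 million replicas.
        System.out.println(bytes + " bytes (~" + bytes / (1024 * 1024) + " MiB)");
    }
}
```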

fe/fe-core/src/main/java/org/apache/doris/catalog/Replica.java:110

  • CloudReplica previously defaulted rowsetCount to 1 (// [0-1] rowset), but the new base field initializes it to 0. If persisted images (or newly constructed replicas) rely on the prior default, this changes reported stats and could affect the tablet-stats polling interval logic that compares previous vs new rowset counts. Consider preserving the prior default for cloud replicas (e.g., initialize to 1 in CloudReplica constructors or document/justify the new default).
    public Replica() {
    }
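
The concern about the default changing from 1 to 0 can be illustrated with a tiny comparison. The looksChanged helper below is hypothetical and only stands in for "logic that compares previous vs new rowset counts"; the actual polling code in Doris is not shown in this thread.

```java
// Hypothetical illustration of a previous-vs-new comparison; not the Doris stat manager.
final class RowsetCountDefault {
    static final long OLD_DEFAULT = 1L; // CloudReplica's former initial rowsetCount
    static final long NEW_DEFAULT = 0L; // initial value of a plain long field

    private RowsetCountDefault() {
    }

    /** Returns true when the freshly fetched count differs from the stored one. */
    static boolean looksChanged(long previous, long fetched) {
        return previous != fetched;
    }

    public static void main(String[] args) {
        long fetched = 1L; // assumed first value reported for a replica
        // With the old default the first fetch looks unchanged; with the new one it looks changed.
        System.out.println(looksChanged(OLD_DEFAULT, fetched));
        System.out.println(looksChanged(NEW_DEFAULT, fetched));
    }
}
```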


Comment on lines 313 to 317
    if (Config.enable_cloud_multi_replica) {
        int indexRand = rand.nextInt(Config.cloud_replica_num);
        int coldReadRand = rand.nextInt(100);
        boolean allowColdRead = coldReadRand < Config.cloud_cold_read_percent;
        initMemClusterToBackends();
        boolean replicaEnough = memClusterToBackends.get(clusterId) != null
                && memClusterToBackends.get(clusterId).size() > indexRand;

        long backendId = -1;
        if (replicaEnough) {
            backendId = memClusterToBackends.get(clusterId).get(indexRand);
        }

        if (!replicaEnough && !allowColdRead && primaryClusterToBackend.containsKey(clusterId)) {
            backendId = primaryClusterToBackend.get(clusterId);
        }

        if (backendId > 0) {
            Backend be = Env.getCurrentSystemInfo().getBackend(backendId);
            if (be != null && be.isQueryAvailable()) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("backendId={} ", backendId);
                }
                return backendId;
            }
        }

        List<Long> res = hashReplicaToBes(clusterId, false, Config.cloud_replica_num);
        if (res.size() < indexRand + 1) {

Copilot AI Apr 5, 2026

enable_cloud_multi_replica path now returns a backend ID from hashReplicaToBes(...) without validating that the selected backend is query-available. hashReplicaToBes filters by isAlive() / heartbeat time, not isQueryAvailable(), so this can route queries to an unavailable backend. Consider filtering by be.isQueryAvailable() (like hashReplicaToBe) or checking availability for the chosen ID and falling back to another candidate (or primary/secondary mapping) before returning.
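
One way to read the suggestion above is: validate the chosen candidate, and fall back to another candidate before returning. The sketch below shows that shape in isolation. BackendPicker, pickAvailable, and the predicate parameter are hypothetical names for illustration; in Doris the check would presumably go through Backend.isQueryAvailable(), and the real fallback order (other candidates versus primary/secondary mapping) is a design decision this thread does not settle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical helper; not part of CloudReplica.
final class BackendPicker {
    private BackendPicker() {
    }

    /**
     * Returns the candidate at preferredIndex if it is query-available, otherwise the
     * first query-available candidate in list order, otherwise -1.
     */
    static long pickAvailable(List<Long> candidates, int preferredIndex, LongPredicate isQueryAvailable) {
        if (preferredIndex >= 0 && preferredIndex < candidates.size()) {
            long preferred = candidates.get(preferredIndex);
            if (isQueryAvailable.test(preferred)) {
                return preferred;
            }
        }
        for (long id : candidates) {
            if (isQueryAvailable.test(id)) {
                return id;
            }
        }
        return -1L; // caller decides what to do when nothing is available
    }

    public static void main(String[] args) {
        List<Long> candidates = Arrays.asList(10L, 11L, 12L);
        // Backend 11 is treated as unavailable, so the pick falls back to 10.
        System.out.println(pickAvailable(candidates, 1, id -> id != 11L));
    }
}
```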

@doris-robot

TPC-H: Total hot run time: 29262 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit c235d14bcd10fd25ab8474552b35e227e85ba272, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17618	3802	3880	3802
q2	q3	10709	849	626	626
q4	4679	469	374	374
q5	7448	1348	1159	1159
q6	189	171	142	142
q7	918	948	777	777
q8	9314	1497	1327	1327
q9	5551	5320	5267	5267
q10	6296	2046	1782	1782
q11	482	278	277	277
q12	692	406	295	295
q13	18132	2791	2174	2174
q14	284	283	261	261
q15	q16	882	859	782	782
q17	1041	1078	835	835
q18	6423	5662	5579	5579
q19	1304	1302	1092	1092
q20	568	439	290	290
q21	4888	2480	2056	2056
q22	504	452	365	365
Total cold run time: 97922 ms
Total hot run time: 29262 ms
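
The report does not label its columns. The Round 1 totals are consistent with reading each row as query, cold run, hot run 1, hot run 2, best hot run (all in ms), with the cold total summing the first timing and the hot total summing the smaller of the two hot runs. The sketch below checks that reading against the rows above; the column interpretation is an inference from the numbers, not something the bot states, and TpchTotals is a hypothetical name.

```java
// Recomputes the Round 1 totals under the assumed column meaning.
final class TpchTotals {
    // Rows copied from Round 1: {cold, hot run 1, hot run 2} in ms.
    // q2 and q15 have no timings in the report and are omitted.
    static final long[][] ROUND1 = {
        {17618, 3802, 3880}, {10709, 849, 626},   {4679, 469, 374},    {7448, 1348, 1159},
        {189, 171, 142},     {918, 948, 777},     {9314, 1497, 1327},  {5551, 5320, 5267},
        {6296, 2046, 1782},  {482, 278, 277},     {692, 406, 295},     {18132, 2791, 2174},
        {284, 283, 261},     {882, 859, 782},     {1041, 1078, 835},   {6423, 5662, 5579},
        {1304, 1302, 1092},  {568, 439, 290},     {4888, 2480, 2056},  {504, 452, 365},
    };

    private TpchTotals() {
    }

    /** Sum of the cold-run column. */
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += row[0];
        }
        return sum;
    }

    /** Sum of the faster of the two hot runs for each query. */
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += Math.min(row[1], row[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("cold = " + totalCold(ROUND1) + " ms"); // 97922, matching the report
        System.out.println("hot  = " + totalHot(ROUND1) + " ms");  // 29262, matching the report
    }
}
```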

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4531	4529	4562	4529
q2	q3	4609	4710	4167	4167
q4	2080	2087	1336	1336
q5	4910	4930	5178	4930
q6	205	169	142	142
q7	1961	1767	1644	1644
q8	3265	3086	3109	3086
q9	8213	8195	8231	8195
q10	4434	4527	4269	4269
q11	623	436	467	436
q12	658	761	498	498
q13	2888	3078	2441	2441
q14	303	312	271	271
q15	q16	762	780	689	689
q17	1297	1259	1190	1190
q18	7986	7016	7054	7016
q19	1166	1159	1148	1148
q20	2202	2224	1968	1968
q21	6154	5443	5181	5181
q22	562	508	449	449
Total cold run time: 58809 ms
Total hot run time: 53585 ms

@doris-robot

TPC-DS: Total hot run time: 178368 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c235d14bcd10fd25ab8474552b35e227e85ba272, data reload: false

query5	4339	673	539	539
query6	331	226	204	204
query7	4267	554	339	339
query8	332	239	238	238
query9	8759	3880	3922	3880
query10	454	349	306	306
query11	6685	5492	5117	5117
query12	192	136	130	130
query13	1371	608	458	458
query14	5697	5193	4794	4794
query14_1	4107	4082	4110	4082
query15	213	197	184	184
query16	998	472	416	416
query17	1105	738	626	626
query18	2604	487	368	368
query19	215	201	162	162
query20	145	134	126	126
query21	224	142	117	117
query22	13897	14791	14793	14791
query23	17986	17146	16649	16649
query23_1	16672	16711	16830	16711
query24	8039	2030	1491	1491
query24_1	1399	1392	1363	1363
query25	580	493	434	434
query26	1238	322	171	171
query27	2640	622	385	385
query28	5024	1885	1887	1885
query29	962	681	563	563
query30	297	237	197	197
query31	1096	1072	947	947
query32	87	79	69	69
query33	530	355	296	296
query34	1189	1197	682	682
query35	797	813	683	683
query36	1288	1242	1049	1049
query37	147	95	82	82
query38	3111	3028	2989	2989
query39	925	902	866	866
query39_1	828	844	845	844
query40	228	149	130	130
query41	61	58	57	57
query42	112	106	103	103
query43	308	317	277	277
query44	
query45	206	193	190	190
query46	1097	1231	836	836
query47	2365	2331	2213	2213
query48	422	421	311	311
query49	659	517	427	427
query50	744	279	223	223
query51	4323	4260	4198	4198
query52	110	106	99	99
query53	243	267	201	201
query54	319	290	259	259
query55	104	91	88	88
query56	304	313	291	291
query57	1692	1645	1746	1645
query58	297	273	271	271
query59	2862	3009	2707	2707
query60	325	322	329	322
query61	155	145	153	145
query62	679	618	579	579
query63	234	198	193	193
query64	5265	1406	1065	1065
query65	
query66	1464	525	366	366
query67	24365	24231	24187	24187
query68	
query69	448	343	305	305
query70	1030	1003	952	952
query71	309	276	277	276
query72	2911	2725	2175	2175
query73	860	767	456	456
query74	9895	9751	9573	9573
query75	2745	2608	2283	2283
query76	2319	1132	742	742
query77	398	434	332	332
query78	11272	11308	10785	10785
query79	1553	1064	825	825
query80	1363	573	509	509
query81	512	282	247	247
query82	1283	152	118	118
query83	354	289	259	259
query84	256	151	128	128
query85	911	504	449	449
query86	434	355	320	320
query87	3295	3201	3070	3070
query88	3564	2694	2715	2694
query89	458	394	351	351
query90	1976	179	174	174
query91	175	172	142	142
query92	80	72	68	68
query93	979	976	561	561
query94	708	326	297	297
query95	657	447	335	335
query96	1068	720	303	303
query97	2657	2668	2561	2561
query98	247	230	217	217
query99	1098	1072	983	983
Total cold run time: 258680 ms
Total hot run time: 178368 ms

@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

PR approved by anyone and no changes requested.

@github-actions github-actions Bot added the approved label (indicates a PR has been approved by one committer) on Apr 7, 2026
@gavinchou gavinchou merged commit d224c08 into apache:master Apr 7, 2026
30 of 33 checks passed
@github-actions
Contributor

github-actions Bot commented Apr 7, 2026

PR approved by at least one committer and no changes requested.

github-actions Bot pushed a commit that referenced this pull request Apr 7, 2026
- Remove segmentCount, rowsetCount fields from CloudReplica, moving storage to base Replica class
- Remove memClusterToBackends in-memory cache and initMemClusterToBackends() method
- Simplifies multi-replica backend selection path in getBackendIdImpl()
iaorekhov-1980 pushed a commit to iaorekhov-1980/doris that referenced this pull request Apr 7, 2026
- Remove segmentCount, rowsetCount fields from CloudReplica, moving storage to base Replica class
- Remove memClusterToBackends in-memory cache and initMemClusterToBackends() method
- Simplifies multi-replica backend selection path in getBackendIdImpl()
Labels

approved (indicates a PR has been approved by one committer), dev/4.0.x, dev/4.0.x-conflict, dev/4.1.x, reviewed

6 participants