Skip to content

[opt](memory) Move schemaHash from Replica to LocalReplica#62122

Open
dataroaring wants to merge 3 commits intoapache:masterfrom
dataroaring:opt/replica-move-schemaHash
Open

[opt](memory) Move schemaHash from Replica to LocalReplica#62122
dataroaring wants to merge 3 commits intoapache:masterfrom
dataroaring:opt/replica-move-schemaHash

Conversation

@dataroaring
Copy link
Copy Markdown
Contributor

Summary

  • Move schemaHash field from base Replica class to LocalReplica, saving 4 bytes per CloudReplica instance
  • Base Replica.getSchemaHash() now returns -1 (stub), setSchemaHash() is a no-op
  • LocalReplica overrides both methods with actual storage
  • Follows the same pattern as dbId removal in [opt](memory) Remove unused dbId field from CloudReplica #62079

Details

Cloud storage paths don't use schema_hash, and callers like MetadataViewer already guard with getSchemaHash() != -1, so CloudReplica returning -1 is safe and correct.

Changes:

  • Replica.java: Remove schemaHash field; getSchemaHash() returns -1; setSchemaHash() is no-op; toString() uses getSchemaHash() for polymorphism
  • LocalReplica.java: Add schemaHash field; override getSchemaHash()/setSchemaHash(); set in constructor
  • CloudReplica.java: No changes needed (schemaHash param harmlessly ignored by super constructor)

Test plan

  • CloudReplica.getSchemaHash() returns -1
  • LocalReplica.getSchemaHash() returns actual value
  • MetadataViewer SCHEMA_ERROR check unaffected (guards with != -1)
  • FE unit tests pass (ReplicaTest)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 4, 2026 14:49
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 4, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Moves schemaHash storage from the base Replica class into LocalReplica to reduce per-instance memory usage for CloudReplica (which does not need schemaHash).

Changes:

  • Remove schemaHash field from Replica; make getSchemaHash() return -1 and setSchemaHash() a no-op in the base class.
  • Add schemaHash field to LocalReplica and override getSchemaHash() / setSchemaHash() to store and return the real value.
  • Update Replica.toString() to call getSchemaHash() to preserve polymorphic behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
fe/fe-core/src/main/java/org/apache/doris/catalog/Replica.java Removes schemaHash storage from the base type and stubs schema-hash accessors for non-local replicas.
fe/fe-core/src/main/java/org/apache/doris/catalog/LocalReplica.java Adds schemaHash storage and overrides accessors so local replicas still expose the actual schema hash.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 140 to 151
public long getVersion() {
return this.version;
}

public int getSchemaHash() {
return schemaHash;
return -1;
}

// for compatibility
public void setSchemaHash(int schemaHash) {
this.schemaHash = schemaHash;
// no-op in base class; overridden in LocalReplica
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replica schemaHash behavior has changed (base returns -1/no-op; LocalReplica stores the value). Please add/adjust unit tests (e.g., in ReplicaTest) to assert that LocalReplica.getSchemaHash() returns the constructor value and that a CloudReplica (or any non-LocalReplica Replica) reports -1, so regressions are caught during refactors.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 7f5e6d3. The existing test tried to instantiate abstract Replica directly — merged the base-class no-op assertion into the CloudReplica test, which inherits the same stubs. All three schemaHash polymorphism tests now compile and cover the requested scenarios.

— ThinkOps 🤖

@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@doris-robot
Copy link
Copy Markdown

TPC-H: Total hot run time: 28801 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e391c659dca156919f176a2b16f79897be29820e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17848	3870	3751	3751
q2	q3	10689	872	607	607
q4	4672	459	357	357
q5	7440	1319	1156	1156
q6	194	168	137	137
q7	914	959	767	767
q8	9318	1464	1294	1294
q9	5481	5348	5245	5245
q10	6305	2035	1765	1765
q11	483	276	280	276
q12	643	409	288	288
q13	18047	2790	2135	2135
q14	292	289	259	259
q15	q16	895	871	781	781
q17	998	1194	834	834
q18	6479	5587	5509	5509
q19	1339	1252	1002	1002
q20	545	428	294	294
q21	5673	2490	1989	1989
q22	482	412	355	355
Total cold run time: 98737 ms
Total hot run time: 28801 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4659	4423	4322	4322
q2	q3	4603	4765	4222	4222
q4	2010	2191	1336	1336
q5	4899	4993	5185	4993
q6	217	170	132	132
q7	2028	1792	1668	1668
q8	3495	3073	3032	3032
q9	8344	8597	8274	8274
q10	4418	4457	4195	4195
q11	571	425	402	402
q12	864	710	478	478
q13	2667	3219	2476	2476
q14	339	307	284	284
q15	q16	775	782	684	684
q17	1309	1258	1230	1230
q18	7934	6947	6842	6842
q19	1151	1151	1154	1151
q20	2215	2209	1911	1911
q21	5933	5400	4742	4742
q22	550	536	442	442
Total cold run time: 58981 ms
Total hot run time: 52816 ms

@doris-robot
Copy link
Copy Markdown

TPC-DS: Total hot run time: 178496 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e391c659dca156919f176a2b16f79897be29820e, data reload: false

query5	4324	660	518	518
query6	338	232	205	205
query7	4220	582	330	330
query8	339	246	237	237
query9	8751	3891	3896	3891
query10	458	376	308	308
query11	6865	5497	5119	5119
query12	190	142	129	129
query13	1286	660	442	442
query14	5724	5156	4808	4808
query14_1	4098	4130	4141	4130
query15	218	205	188	188
query16	1020	446	442	442
query17	1162	776	651	651
query18	2724	500	386	386
query19	234	211	182	182
query20	144	131	133	131
query21	222	144	118	118
query22	13983	15406	14780	14780
query23	18125	17597	16643	16643
query23_1	16643	16746	16670	16670
query24	8259	1772	1353	1353
query24_1	1351	1372	1404	1372
query25	667	503	439	439
query26	1268	321	187	187
query27	2614	652	381	381
query28	4409	1851	1864	1851
query29	980	677	559	559
query30	300	235	190	190
query31	1103	1052	939	939
query32	97	74	71	71
query33	539	350	289	289
query34	1202	1149	672	672
query35	746	765	658	658
query36	1215	1217	1064	1064
query37	154	94	84	84
query38	3109	3067	2995	2995
query39	911	894	854	854
query39_1	828	841	833	833
query40	231	148	132	132
query41	64	60	96	60
query42	111	106	108	106
query43	317	322	287	287
query44	
query45	203	201	188	188
query46	1097	1272	787	787
query47	2319	2350	2206	2206
query48	404	432	294	294
query49	649	543	444	444
query50	766	292	222	222
query51	4332	4266	4223	4223
query52	113	109	99	99
query53	255	288	214	214
query54	318	276	290	276
query55	100	95	90	90
query56	300	319	309	309
query57	1780	1754	1658	1658
query58	299	270	276	270
query59	2887	2971	2706	2706
query60	341	334	324	324
query61	162	155	165	155
query62	685	632	575	575
query63	243	195	195	195
query64	5297	1362	956	956
query65	
query66	1446	470	415	415
query67	24189	24246	24120	24120
query68	
query69	445	351	318	318
query70	1003	990	962	962
query71	320	284	274	274
query72	2887	2684	2397	2397
query73	855	835	451	451
query74	9830	9748	9523	9523
query75	2736	2624	2300	2300
query76	2305	1104	761	761
query77	404	395	348	348
query78	11294	11307	10767	10767
query79	3367	1146	795	795
query80	1832	584	486	486
query81	543	274	242	242
query82	917	158	119	119
query83	334	289	275	275
query84	271	147	124	124
query85	926	490	451	451
query86	441	344	326	326
query87	3289	3186	3094	3094
query88	3652	2727	2715	2715
query89	446	384	346	346
query90	1990	169	180	169
query91	182	170	144	144
query92	79	73	72	72
query93	1355	965	559	559
query94	720	304	284	284
query95	656	363	350	350
query96	1024	748	334	334
query97	2679	2689	2574	2574
query98	248	232	234	232
query99	1097	1063	980	980
Total cold run time: 261334 ms
Total hot run time: 178496 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking findings.

Critical checkpoints:

  • Goal: The change is narrowly aimed at removing schemaHash storage from the base Replica object while preserving real schema-hash state for LocalReplica. Reviewing Replica.java, LocalReplica.java, CloudReplica.java, MetadataViewer.java, LocalTabletInvertedIndex.java, EnvFactory.java, CloudEnvFactory.java, OlapTable.java, CloudInternalCatalog.java, CloudRestoreJob.java, and GsonUtils.java, the goal is met. LocalReplica still stores and returns the real value; cloud replicas now consistently expose -1 through the base implementation.
  • Minimality: The patch is small and focused to the two touched classes. No extra protocol, config, or storage-format changes were introduced.
  • Concurrency: No new concurrent state, lock ordering, or thread-safety concerns were introduced.
  • Lifecycle / static initialization: No special lifecycle or static-initialization risk is involved.
  • Config: No configuration changes.
  • Compatibility: Safe. schemaHash was not part of Replica's Gson-serialized state before this patch, so moving the live field from Replica to LocalReplica does not change FE image / replay payloads. No FE-BE protocol compatibility issue was introduced.
  • Parallel paths: I checked both local and cloud creation paths. Local-mode paths still populate schema hash in LocalReplica; cloud-mode paths do not depend on per-replica schema-hash state.
  • Conditional checks: Existing replica.getSchemaHash() != -1 guards in MetadataViewer remain correct.
  • Test coverage: No targeted test was added for the new polymorphic behavior, so there is still a small coverage gap around LocalReplica.get/setSchemaHash() versus the cloud sentinel behavior.
  • Observability: Cloud proc / debug paths that print replica.getSchemaHash() will now consistently show -1. That is acceptable because schema hash is intentionally not meaningful for CloudReplica, but it is a small user-visible diagnostic change.
  • Transaction / persistence / data-write correctness: Not involved.
  • FE-BE variable passing: Not involved.
  • Performance: The intended memory optimization is plausible and I did not find any obvious hot-path regression.
  • Other issues: None found.

Overall opinion: this looks good to merge.

@dataroaring
Copy link
Copy Markdown
Contributor Author

Added unit tests for the schemaHash polymorphism in commit 8711734:

  • testLocalReplicaGetSchemaHash: verifies LocalReplica.getSchemaHash() returns the constructor value
  • testLocalReplicaSetSchemaHash: verifies setSchemaHash() updates the stored value
  • testBaseReplicaSchemaHashReturnsNegativeOne: verifies base Replica returns -1 and setSchemaHash() is a no-op
  • testCloudReplicaSchemaHashReturnsNegativeOne: verifies CloudReplica inherits the -1 stub (schemaHash param harmlessly ignored)

This addresses the review feedback requesting regression tests for the schemaHash behavior change.

— ThinkOps 🤖

@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

2 similar comments
@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@dataroaring
Copy link
Copy Markdown
Contributor Author

Fixed the compile error in ReplicaTest.java — the testBaseReplicaSchemaHashReturnsNegativeOne test tried to instantiate the abstract Replica class directly (new Replica()), which doesn't compile since Replica has abstract method getBackendId().

Merged the base-class no-op setSchemaHash() assertion into the existing testCloudReplicaSchemaHashReturnsNegativeOne test. Since CloudReplica extends Replica without overriding getSchemaHash()/setSchemaHash(), it exercises the exact same base-class code paths.

The final test coverage for the Copilot-requested schemaHash polymorphism:

  • testLocalReplicaGetSchemaHash — constructor value is stored and returned ✅
  • testLocalReplicaSetSchemaHash — setter updates the value ✅
  • testCloudReplicaSchemaHashReturnsNegativeOne — inherits base -1 stub + setSchemaHash is no-op ✅

— ThinkOps 🤖

@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (7/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

/review

dataroaring and others added 3 commits April 17, 2026 05:00
schemaHash in the base Replica class wastes 4 bytes per CloudReplica.
Cloud storage paths don't use schema_hash, and callers like
MetadataViewer already guard with getSchemaHash() != -1.

This follows the same pattern as dbId removal in apache#62079.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four tests to ReplicaTest covering the schemaHash behavior change:
- testLocalReplicaGetSchemaHash: constructor value is returned
- testLocalReplicaSetSchemaHash: setter updates the value
- testBaseReplicaSchemaHashReturnsNegativeOne: base class returns -1, setter is no-op
- testCloudReplicaSchemaHashReturnsNegativeOne: CloudReplica inherits -1 stub

Generated by ThinkOps
…ract Replica

The testBaseReplicaSchemaHashReturnsNegativeOne test tried to instantiate
abstract class Replica directly, causing a compile error. Merged the
no-op setSchemaHash assertion into the existing CloudReplica test, which
exercises the same base-class code paths since CloudReplica inherits
getSchemaHash()/setSchemaHash() without overriding them.

Generated by ThinkOps
@dataroaring dataroaring force-pushed the opt/replica-move-schemaHash branch from 7f5e6d3 to 6083321 Compare April 17, 2026 12:00
@dataroaring
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉
Increment coverage report
Complete coverage report

@github-actions
Copy link
Copy Markdown
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24563880330

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (7/7) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants