[test](regression) Isolate local Hive regression side effects #62579

Draft
xylaaaaa wants to merge 5 commits into apache:master from xylaaaaa:codex/hive-idempotent-pr

@xylaaaaa
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:
This PR reduces shared local Hive Docker side effects across regression suites by moving many local-Hive write paths to per-run temporary Hive objects and best-effort cleanup. The goal is to make local Hive state safer for CI reuse and reruns.

Covered in this PR:

  • add shared helper methods in the regression suite framework for per-run temporary Hive object naming and cleanup
  • isolate many local Hive Docker write suites to temporary tables or databases
  • add cleanup in finally blocks so failed runs also attempt to recover local Hive state
  • avoid direct writes to several shared baseline Hive tables by copying data into temporary write targets first
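
The pattern in the list above can be sketched roughly as follows. This is an illustrative Python sketch, not the Groovy helpers added to the regression framework; `temp_hive_name`, `write_via_temp_copy`, and the `execute` callback are hypothetical names introduced only for this example.

```python
import uuid


def temp_hive_name(base):
    """Hypothetical helper: append a 12-hex-character per-run suffix to `base`."""
    return f"{base}_{uuid.uuid4().hex[:12]}"


def write_via_temp_copy(execute, baseline_table, write_sql_template):
    """Copy a shared baseline table into a per-run table and write only to the copy.

    `execute` is an assumed callback that runs one SQL statement against Hive.
    `write_sql_template` must contain a `{table}` placeholder.
    """
    tmp_table = temp_hive_name(baseline_table)
    try:
        # The shared baseline table is only read, never written.
        execute(f"CREATE TABLE {tmp_table} AS SELECT * FROM {baseline_table}")
        execute(write_sql_template.format(table=tmp_table))
    finally:
        # Best-effort cleanup: runs on failure too, and never masks the original error.
        try:
            execute(f"DROP TABLE IF EXISTS {tmp_table}")
        except Exception as exc:
            print(f"cleanup of {tmp_table} failed: {exc}")
    return tmp_table
```

Because the cleanup sits in `finally`, a failed write still attempts to drop the temporary table, which is what makes a rerun against the same Hive Docker instance safe.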

Remaining follow-up after this PR:

  • regression-test/suites/external_table_p0/hive/ddl/test_hive_ctas.groovy
  • regression-test/suites/external_table_p0/hive/ddl/test_hive_ddl.groovy
  • regression-test/suites/external_table_p0/hive/test_hive_case_sensibility.groovy

Release note

None

Check List (For Author)

  • Test: No need to test (workflow preparation only; git diff --check was run in this environment)
  • Behavior changed: Yes (local Hive regression suites use temporary Hive objects and clean up more aggressively)
  • Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Several local Hive regression suites mutate shared Docker Hive state through fixed table names, fixed database names, or writes against shared baseline tables. That makes it unsafe to reuse a Hive Docker environment across multiple pipelines or failed reruns.

### Release note

None

### Check List (For Author)

- Test: No need to test (workflow preparation only; only git diff check was run in this environment)
- Behavior changed: Yes (regression suites now prefer per-run temporary Hive objects and best-effort cleanup)
- Does this need documentation: No
@Thearas
Contributor

Thearas commented Apr 17, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@xylaaaaa
Contributor Author

run buildall

The previous implementation included suite name and hivePrefix in the suffix,
causing redundant and overly long names like:
  hive2_test_partitions_catalog_test_hive_partitions_hive2_ec0f5f5905ff (69 chars)

New implementation generates cleaner names:
  hive2_test_partitions_catalog_ec0f5f5905ff (42 chars)

This fixes test failures where catalog names were too verbose.
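
As a rough illustration of the naming change, the two schemes can be compared like this. The composition order of the old name is inferred from the example string above, and `verbose_name` and `short_name` are hypothetical functions written in Python for illustration, not the framework's Groovy code.

```python
def verbose_name(base, suite, hive_prefix, suffix):
    """Old scheme (as inferred from the example): base + suite name + hivePrefix + suffix."""
    return f"{base}_{suite}_{hive_prefix}_{suffix}"


def short_name(base, suffix):
    """New scheme: base + per-run suffix only."""
    return f"{base}_{suffix}"


old = verbose_name("hive2_test_partitions_catalog", "test_hive_partitions", "hive2", "ec0f5f5905ff")
new = short_name("hive2_test_partitions_catalog", "ec0f5f5905ff")
print(new, len(new))  # hive2_test_partitions_catalog_ec0f5f5905ff 42
```

The suite name and prefix add nothing to uniqueness here, since the random suffix already distinguishes runs.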
@xylaaaaa
Contributor Author

run buildall

Fix variable scope issue where catalog_name was defined inside try block
but accessed in finally block, causing MissingPropertyException.
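
The shape of the fix, sketched in Python as an analogy (the real suites are Groovy, where the failure shows up as `MissingPropertyException`; in Python the comparable mistake raises `UnboundLocalError`). `run_with_temp_catalog`, `body`, and `execute` are hypothetical names for this example.

```python
def run_with_temp_catalog(catalog_name, body, execute):
    """Run `body` against a temporary catalog and always attempt to drop it.

    `catalog_name` is bound before the try block, so the finally block can
    always read it, even when `body` fails on its first statement.
    """
    try:
        body(catalog_name)
    finally:
        execute(f"DROP CATALOG IF EXISTS {catalog_name}")
```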
@xylaaaaa
Contributor Author

run buildall

- test_hive_ddl_text_format: Add database prefix to hive_docker queries
- test_hive_partition_values_tvf: Add cleanup for partitionValuesDb in finally block

These tests were failing because:
1. hive_docker queries need database.table format when using temp databases
2. Missing cleanup caused database not empty errors in subsequent runs
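
Both points can be sketched as follows, in illustrative Python rather than the suites' Groovy. `qualified` and `cleanup_statements` are hypothetical helpers; the sketch drops tables before the database because Hive's `DROP DATABASE` refuses a non-empty database unless `CASCADE` is given.

```python
def qualified(db, table):
    """Return the database.table form needed when the table lives in a temp database."""
    return f"{db}.{table}"


def cleanup_statements(db, tables):
    """Build cleanup SQL: drop every table first, then the now-empty database."""
    stmts = [f"DROP TABLE IF EXISTS {qualified(db, t)}" for t in tables]
    stmts.append(f"DROP DATABASE IF EXISTS {db}")
    return stmts


for stmt in cleanup_statements("tmp_db", ["t1", "t2"]):
    print(stmt)
```

Running these statements from a `finally` block leaves no temporary database behind for the next run to trip over.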
@xylaaaaa
Contributor Author

run buildall

- test_hms_event_notification_multi_catalog: Add null checks for db1/db2 in finally block
- test_hive_use_meta_cache_true: Replace order_qt with assertions to handle dynamic temp names

These tests were failing because:
1. db1/db2 could be null if try block failed before assignment
2. order_qt expects fixed output but temp names are dynamic with UUID
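
A minimal Python sketch of both fixes, under the assumption that query results arrive as rows whose first column is the object name. `drop_if_created` and `assert_lists_temp_objects` are hypothetical names, not helpers from the regression framework.

```python
def drop_if_created(execute, *db_names):
    """Drop only the databases that were actually assigned before the failure."""
    for db in db_names:
        if db is None:
            continue  # the try block failed before this name was assigned
        execute(f"DROP DATABASE IF EXISTS {db}")


def assert_lists_temp_objects(rows, expected_names):
    """Check membership instead of comparing against a fixed golden output.

    Names carry a per-run random suffix, so recorded output can never match.
    """
    listed = {row[0] for row in rows}
    missing = sorted(set(expected_names) - listed)
    assert not missing, f"temporary objects not listed: {missing}"
```

For example, `drop_if_created(execute, db1, db2)` with `db2` still `None` issues a single `DROP DATABASE` for `db1` and skips the unassigned name.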
@xylaaaaa
Contributor Author

run buildall
