[Enhancement] Add compatibility rewrites for common date and string functions#62583
[Enhancement] Add compatibility rewrites for common date and string functions#62583tonywasson wants to merge 2 commits intoapache:masterfrom
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add SQL-visible Doris product version functions while keeping version() pinned to the MySQL compatibility string, and align @@version_comment with the richer Doris build identity.
### Release note
Expose current_version() and version_long() in FE SQL metadata while preserving version() compatibility.
### Check List (For Author)
- Test: Checkstyle and targeted FE test updates
- Manual test / No need to test (with reason)
- Behavior changed: Yes (adds new SQL functions and updates @@version_comment output)
- Does this need documentation: Yes
Add parser-level compatibility rewrites for portable date and string function spellings in Nereids.\n\nThe patch lowers date_part, dateadd, to_timestamp, to_char, strpos, regexp_substr, char_length, and split to existing Doris functions, and adds parser coverage for the new aliases.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Found blocking issues.
Critical Checkpoints
- Goal: Partially met. The PR aims to add compatibility rewrites and new version metadata functions, but
DATEADD(unit, amount, ts)regresses an already-supported syntax path andcurrent_version()/version_long()only work while FE constant folding stays enabled. The added tests do not prove the broken paths. - Scope: Mostly focused, but several parser rewrites are broader than needed because they intercept calls before normal function resolution.
- Concurrency: No concurrency-sensitive code involved.
- Lifecycle / static initialization: No special lifecycle issues found.
- Config: No new configuration items.
- Compatibility / parallel code paths: Not preserved. Existing
DATEADD(unit, amount, ts)handling inDatetimeFunctionBinderis bypassed, existing alias/UDF resolution for names likesplit/char_lengthis skipped, and the new version functions are not implemented on the non-folded FE -> BE execution path. - Special conditional checks: The new string-literal requirement for
dateaddis not justified because Doris already supports the identifier-unit form. - Test coverage: Insufficient. Added tests cover happy-path rewrites but miss
DATEADD(DAY, ...),prefer_udf_over_builtin=trueon already-supported aliases, anddebug_skip_fold_constant=truefor the new version functions. - Test result updates: No
.outupdates were needed; modified assertions do not cover the broken paths above. - Observability: No additional observability appears necessary.
- Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
- Performance: No major hot-path issue found, but the parser rewrites unnecessarily bypass normal function resolution.
- Other issues:
to_timestampis not type-equivalent tostr_to_date; with a date-only format literal the rewrite yields a DATE instead of a timestamp/datetime value.
Blocking findings are annotated inline.
| return Optional.empty(); | ||
| } | ||
|
|
||
| String normalizedName = functionName.toLowerCase(Locale.ROOT); |
There was a problem hiding this comment.
These parser-level rewrites run before FunctionRegistry.findFunctionBuilder(), so names that were already supported aliases no longer participate in normal builtin-vs-UDF resolution. Doris explicitly supports prefer_udf_over_builtin=true, but after this change a user-defined split or char_length can never win because the parser rewrites the call to a different builtin name first. For existing aliases this is a behavior regression, not just additive compatibility.
| case "date_part": | ||
| return Optional.of(rewriteDatePart(params)); | ||
| case "dateadd": | ||
| return Optional.of(rewriteDateAdd(params)); |
There was a problem hiding this comment.
to_timestamp is being lowered to str_to_date, but those functions do not have the same type contract. StrToDate.computeSignature() returns DATEV2 when the format literal has no time part, so to_timestamp('2026-04-16', '%Y-%m-%d') will analyze as a DATE instead of a timestamp/datetime value. A compatibility alias needs to preserve the original return-type semantics, not just map to a roughly similar function name.
| Expression delta = params.get(1); | ||
| Expression source = params.get(2); | ||
| String normalizedUnit = unit.trim().toLowerCase(Locale.ROOT); | ||
|
|
There was a problem hiding this comment.
DATEADD was already supported in Nereids via DatetimeFunctionBinder with the existing DATEADD(unit, amount, ts) form (TIMESTAMP_ADD_FUNCTION_NAMES includes DATEADD, and DatetimeFunctionBinderTest#testTimestampAdd covers that path). This parser rewrite now intercepts every dateadd(...) call first and requires the unit to be a string literal, so previously valid inputs like DATEADD(DAY, 1, ts) are rejected before the binder can run. Please preserve the old binder path or make the rewrite accept both forms instead of narrowing the syntax.
| @@ -373,6 +378,11 @@ public Expression visitCurrentCatalog(CurrentCatalog currentCatalog, ExpressionR | |||
There was a problem hiding this comment.
current_version() and version_long() only get semantics here through FE constant folding, but unlike version() there is no BE/runtime implementation for these names. FoldConstantRule.evaluateOrThrow() returns the expression unchanged when debug_skip_fold_constant=true, and ExpressionTranslator.visitScalarFunction() will then emit a plain builtin FunctionCallExpr for BE execution. Since BE only registers version() (be/src/exprs/function/function_utility.cpp), the new functions can fail on that existing execution path. The new regression test only checks version() under debug_skip_fold_constant=true, so this break is currently uncovered.
What\nAdd Nereids parser compatibility rewrites for portable date and string function spellings.\n\n## Changes\n- date_part -> native unit extraction\n- dateadd -> *_add functions\n- to_timestamp -> str_to_date\n- to_char -> date_format\n- strpos -> locate\n- regexp_substr -> regexp_extract(..., 0)\n- char_length -> character_length\n- split -> split_by_string\n\n## Test\n- JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home mvn -pl fe-core -am test -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=org.apache.doris.nereids.parser.NereidsParserTest