Skip to content

[Enhancement] Add compatibility rewrites for common date and string functions#62583

Open
tonywasson wants to merge 2 commits intoapache:masterfrom
tonywasson:codex/compat-functions-pr
Open

[Enhancement] Add compatibility rewrites for common date and string functions#62583
tonywasson wants to merge 2 commits intoapache:masterfrom
tonywasson:codex/compat-functions-pr

Conversation

@tonywasson
Copy link
Copy Markdown

What\nAdd Nereids parser compatibility rewrites for portable date and string function spellings.\n\n## Changes\n- date_part -> native unit extraction\n- dateadd -> *_add functions\n- to_timestamp -> str_to_date\n- to_char -> date_format\n- strpos -> locate\n- regexp_substr -> regexp_extract(..., 0)\n- char_length -> character_length\n- split -> split_by_string\n\n## Test\n- JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home mvn -pl fe-core -am test -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=org.apache.doris.nereids.parser.NereidsParserTest

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add SQL-visible Doris product version functions while keeping version() pinned to the MySQL compatibility string, and align @@version_comment with the richer Doris build identity.

### Release note

Expose current_version() and version_long() in FE SQL metadata while preserving version() compatibility.

### Check List (For Author)

- Test: Checkstyle and targeted FE test updates
    - Manual test / No need to test (with reason)
- Behavior changed: Yes (adds new SQL functions and updates @@version_comment output)
- Does this need documentation: Yes
Add parser-level compatibility rewrites for portable date and string function spellings in Nereids.\n\nThe patch lowers date_part, dateadd, to_timestamp, to_char, strpos, regexp_substr, char_length, and split to existing Doris functions, and adds parser coverage for the new aliases.
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 17, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found blocking issues.

Critical Checkpoints

  • Goal: Partially met. The PR aims to add compatibility rewrites and new version metadata functions, but DATEADD(unit, amount, ts) regresses an already-supported syntax path and current_version() / version_long() only work while FE constant folding stays enabled. The added tests do not prove the broken paths.
  • Scope: Mostly focused, but several parser rewrites are broader than needed because they intercept calls before normal function resolution.
  • Concurrency: No concurrency-sensitive code involved.
  • Lifecycle / static initialization: No special lifecycle issues found.
  • Config: No new configuration items.
  • Compatibility / parallel code paths: Not preserved. Existing DATEADD(unit, amount, ts) handling in DatetimeFunctionBinder is bypassed, existing alias/UDF resolution for names like split / char_length is skipped, and the new version functions are not implemented on the non-folded FE -> BE execution path.
  • Special conditional checks: The new string-literal requirement for dateadd is not justified because Doris already supports the identifier-unit form.
  • Test coverage: Insufficient. Added tests cover happy-path rewrites but miss DATEADD(DAY, ...), prefer_udf_over_builtin=true on already-supported aliases, and debug_skip_fold_constant=true for the new version functions.
  • Test result updates: No .out updates were needed; modified assertions do not cover the broken paths above.
  • Observability: No additional observability appears necessary.
  • Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
  • Performance: No major hot-path issue found, but the parser rewrites unnecessarily bypass normal function resolution.
  • Other issues: to_timestamp is not type-equivalent to str_to_date; with a date-only format literal the rewrite yields a DATE instead of a timestamp/datetime value.

Blocking findings are annotated inline.

return Optional.empty();
}

String normalizedName = functionName.toLowerCase(Locale.ROOT);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parser-level rewrites run before FunctionRegistry.findFunctionBuilder(), so names that were already supported aliases no longer participate in normal builtin-vs-UDF resolution. Doris explicitly supports prefer_udf_over_builtin=true, but after this change a user-defined split or char_length can never win because the parser rewrites the call to a different builtin name first. For existing aliases this is a behavior regression, not just additive compatibility.

case "date_part":
return Optional.of(rewriteDatePart(params));
case "dateadd":
return Optional.of(rewriteDateAdd(params));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_timestamp is being lowered to str_to_date, but those functions do not have the same type contract. StrToDate.computeSignature() returns DATEV2 when the format literal has no time part, so to_timestamp('2026-04-16', '%Y-%m-%d') will analyze as a DATE instead of a timestamp/datetime value. A compatibility alias needs to preserve the original return-type semantics, not just map to a roughly similar function name.

Expression delta = params.get(1);
Expression source = params.get(2);
String normalizedUnit = unit.trim().toLowerCase(Locale.ROOT);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DATEADD was already supported in Nereids via DatetimeFunctionBinder with the existing DATEADD(unit, amount, ts) form (TIMESTAMP_ADD_FUNCTION_NAMES includes DATEADD, and DatetimeFunctionBinderTest#testTimestampAdd covers that path). This parser rewrite now intercepts every dateadd(...) call first and requires the unit to be a string literal, so previously valid inputs like DATEADD(DAY, 1, ts) are rejected before the binder can run. Please preserve the old binder path or make the rewrite accept both forms instead of narrowing the syntax.

@@ -373,6 +378,11 @@ public Expression visitCurrentCatalog(CurrentCatalog currentCatalog, ExpressionR
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_version() and version_long() only get semantics here through FE constant folding, but unlike version() there is no BE/runtime implementation for these names. FoldConstantRule.evaluateOrThrow() returns the expression unchanged when debug_skip_fold_constant=true, and ExpressionTranslator.visitScalarFunction() will then emit a plain builtin FunctionCallExpr for BE execution. Since BE only registers version() (be/src/exprs/function/function_utility.cpp), the new functions can fail on that existing execution path. The new regression test only checks version() under debug_skip_fold_constant=true, so this break is currently uncovered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants