
regex: accept literal [ in bracket expressions #439

Open
kevinburke wants to merge 2 commits into uutils:main from kevinburke:fix-literal-open-bracket-class

@kevinburke


POSIX allows '[' to represent itself inside a bracket expression unless it starts a character class, collating symbol, or equivalence class construct. The parser was consuming the following character after a literal '[', which made '[[]' look unterminated, and the Rust regex backend also rejects unescaped literal '[' in classes.
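To make the POSIX rule concrete, here is a minimal Rust sketch of locating the `]` that closes a bracket expression. The function `find_bracket_end` and its signature are hypothetical, not taken from the uutils code, and the sketch assumes plain POSIX semantics (a backslash inside a class is literal; GNU extensions are ignored). The point it illustrates is that a `[` not followed by `:`, `.`, or `=` advances the scan by one character only, so the `]` in `[[]` still terminates the expression.

```rust
/// Hypothetical helper (not the uutils implementation): given that
/// `chars[open] == '['`, return the index of the `]` that closes the
/// bracket expression, or `None` if it is unterminated.
fn find_bracket_end(chars: &[char], open: usize) -> Option<usize> {
    let mut i = open + 1;
    // A leading `^` negates the set.
    if chars.get(i) == Some(&'^') {
        i += 1;
    }
    // A `]` right after `[` or `[^` is a literal member, not the terminator.
    if chars.get(i) == Some(&']') {
        i += 1;
    }
    while i < chars.len() {
        match chars[i] {
            ']' => return Some(i),
            '[' => match chars.get(i + 1) {
                // `[:`, `[.`, `[=` open a character class, collating symbol,
                // or equivalence class: skip to the matching `:]`, `.]`, `=]`.
                Some(&d) if d == ':' || d == '.' || d == '=' => {
                    let mut j = i + 2;
                    loop {
                        if j + 1 >= chars.len() {
                            return None; // construct never closed
                        }
                        if chars[j] == d && chars[j + 1] == ']' {
                            break;
                        }
                        j += 1;
                    }
                    i = j + 2;
                }
                // Otherwise `[` is literal: step past it only, leaving the
                // next character for normal parsing.
                _ => i += 1,
            },
            _ => i += 1,
        }
    }
    None
}
```

With this sketch, `[[]` yields `Some(2)` and `[[:alpha:]]` yields `Some(10)`; consuming the character after the literal `[` would instead skip the closing `]` of `[[]` and report it as unterminated.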

Reproduce the compatibility gap by comparing these commands:

    printf 'x\n' | ./target/debug/sed -E 's/[[]/X/'
    printf 'x\n' | gsed -E 's/[[]/X/'
    printf 'x\n' | ./target/debug/sed -E 's/[^[]/X/'
    printf 'x\n' | gsed -E 's/[^[]/X/'

Before this change, uutils sed rejected both scripts as unterminated or invalid regexes, while GNU sed accepted them. After this change, uutils sed matches GNU sed: 's/[[]/X/' prints x (the input contains no '['), and 's/[^[]/X/' prints X.

Leave the character after a literal '[' available for normal class parsing, then escape literal '[' characters inside parsed classes before compiling with the regex backend. Add parser, compiler, and command-level regression tests for the failing sed -E substitution cases.
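The escaping step could look roughly like the following Rust sketch, which uses only the standard library. `escape_open_brackets_in_classes` is a hypothetical name, not the PR's function; the sketch assumes the sed delimiter has already been stripped, treats `\x` outside a class as an escaped pair, and copies `[:name:]`, `[.x.]`, and `[=x=]` constructs through unchanged. It handles only `[`: other characters the `regex` crate treats specially inside classes, such as `\` or the `&&` set operator, are left alone.

```rust
/// Hypothetical sketch (not the uutils implementation): escape every literal
/// `[` inside a POSIX bracket expression so the Rust `regex` crate does not
/// read it as the start of a nested class.
fn escape_open_brackets_in_classes(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Outside a class, copy `\x` pairs verbatim so an escaped `\[`
            // is not treated as a class opener.
            '\\' => {
                out.push('\\');
                if let Some(&next) = chars.get(i + 1) {
                    out.push(next);
                }
                i += 2;
            }
            '[' => {
                out.push('[');
                i += 1;
                // An optional `^` negates the set.
                if chars.get(i) == Some(&'^') {
                    out.push('^');
                    i += 1;
                }
                // A `]` in first position is a literal member.
                if chars.get(i) == Some(&']') {
                    out.push(']');
                    i += 1;
                }
                // Copy members up to the closing `]`. Backslashes here are
                // copied as-is; a full translation would need to handle them.
                while i < chars.len() && chars[i] != ']' {
                    match (chars[i], chars.get(i + 1).copied()) {
                        // `[:`, `[.`, `[=` open a named construct: copy it
                        // through its matching `:]`, `.]`, or `=]` unchanged.
                        ('[', Some(d)) if d == ':' || d == '.' || d == '=' => {
                            out.push('[');
                            out.push(d);
                            i += 2;
                            while i < chars.len()
                                && !(chars[i] == d && chars.get(i + 1) == Some(&']'))
                            {
                                out.push(chars[i]);
                                i += 1;
                            }
                            if i < chars.len() {
                                out.push(d);
                                out.push(']');
                                i += 2;
                            }
                        }
                        // Any other `[` is a literal member: escape it.
                        ('[', _) => {
                            out.push_str("\\[");
                            i += 1;
                        }
                        (c, _) => {
                            out.push(c);
                            i += 1;
                        }
                    }
                }
                // Copy the closing `]` if the expression is terminated.
                if i < chars.len() {
                    out.push(']');
                    i += 1;
                }
            }
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}
```

Under these assumptions, `[[]` becomes `[\[]` and `[^[]` becomes `[^\[]`, while `[[:alpha:]]` and an already escaped `\[` pass through unchanged.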

Observed this while trying to install Python 3.14.5 with Pyenv; with uutils sed on the PATH, the install failed.

codecov Bot commented May 23, 2026


Codecov Report

❌ Patch coverage is 98.24561% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.40%. Comparing base (b90c599) to head (dd3ddc0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sed/compiler.rs | 97.77% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   83.12%   83.40%   +0.27%     
==========================================
  Files          13       13              
  Lines        5885     5994     +109     
  Branches      339      347       +8     
==========================================
+ Hits         4892     4999     +107     
- Misses        990      992       +2     
  Partials        3        3              
| Flag | Coverage Δ |
|---|---|
| macos_latest | 84.06% <98.24%> (+0.26%) ⬆️ |
| ubuntu_latest | 84.14% <98.24%> (+0.26%) ⬆️ |
| windows_latest | 0.00% <0.00%> (ø) |

Flags with carried forward coverage won't be shown.


@kevinburke force-pushed the fix-literal-open-bracket-class branch from a5d4b13 to cdd35dd on May 23, 2026 05:19
codspeed-hq Bot commented May 23, 2026


Merging this PR will not alter performance

✅ 11 untouched benchmarks


Comparing kevinburke:fix-literal-open-bracket-class (dd3ddc0) with main (b90c599)


Comment thread src/sed/compiler.rs

    result
    }

    fn escape_literal_open_brackets_in_classes(pattern: &str) -> String {

Contributor

please add a rustdoc comment here :)

Author

Done

@kevinburke force-pushed the fix-literal-open-bracket-class branch from cdd35dd to 8979d08 on June 5, 2026 18:57
Comment thread src/sed/compiler.rs

    result
    }

    /// Escape literal `[` characters that appear inside a bracket expression.

Contributor

Well, it is a bit long now :/

Author

Oh, sorry, ok.

Author

How is this?

@kevinburke force-pushed the fix-literal-open-bracket-class branch from 0d140be to 5dced76 on June 5, 2026 19:42
@kevinburke force-pushed the fix-literal-open-bracket-class branch from 5dced76 to dd3ddc0 on June 15, 2026 18:54

2 participants