regex: accept literal [ in bracket expressions #439
Open
kevinburke wants to merge 2 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | main | #439 | +/- |
|---|---|---|---|
| Coverage | 83.12% | 83.40% | +0.27% |
| Files | 13 | 13 | |
| Lines | 5885 | 5994 | +109 |
| Branches | 339 | 347 | +8 |
| Hits | 4892 | 4999 | +107 |
| Misses | 990 | 992 | +2 |
| Partials | 3 | 3 | |
a5d4b13 to cdd35dd
sylvestre (Contributor) reviewed Jun 5, 2026

        result
    }

    fn escape_literal_open_brackets_in_classes(pattern: &str) -> String {

please add a rustdoc comment here :)
cdd35dd to 8979d08
sylvestre (Contributor) reviewed Jun 5, 2026

        result
    }

    /// Escape literal `[` characters that appear inside a bracket expression.

Well it is a bit long now :/
0d140be to 5dced76
POSIX allows '[' to represent itself inside a bracket expression
unless it starts a character class, collating symbol, or equivalence
class construct. The parser was consuming the following character
after a literal '[', which made '[[]' look unterminated, and the Rust
regex backend also rejects unescaped literal '[' in classes.
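
The scanning rule described above can be sketched as follows. This is a minimal illustration, not the parser from this PR: the function name `bracket_expression_end` is hypothetical, and the sketch assumes that only `[:`, `[.`, and `[=` open a nested construct, while any other `[` is an ordinary member whose following character is left for the next iteration.

```rust
/// Hypothetical helper: return the byte index just past the `]` that closes
/// the bracket expression starting at `start`, or `None` if `start` does not
/// point at `[` or the expression is unterminated.
fn bracket_expression_end(pattern: &str, start: usize) -> Option<usize> {
    let bytes = pattern.as_bytes();
    if bytes.get(start) != Some(&b'[') {
        return None;
    }
    let mut i = start + 1;
    // An optional `^` negates the class.
    if bytes.get(i) == Some(&b'^') {
        i += 1;
    }
    // A `]` in first position is a literal member, not the terminator.
    if bytes.get(i) == Some(&b']') {
        i += 1;
    }
    while i < bytes.len() {
        match bytes[i] {
            b']' => return Some(i + 1),
            b'[' if matches!(bytes.get(i + 1), Some(b':' | b'.' | b'=')) => {
                // `[:`, `[.`, and `[=` open a construct that is closed by
                // the same delimiter followed by `]`.
                let delim = bytes[i + 1];
                let mut j = i + 2;
                loop {
                    if j + 1 >= bytes.len() {
                        return None;
                    }
                    if bytes[j] == delim && bytes[j + 1] == b']' {
                        break;
                    }
                    j += 1;
                }
                i = j + 2;
            }
            // Any other byte, including a lone `[`, stands for itself.
            // Only that one byte is consumed.
            _ => i += 1,
        }
    }
    None
}
```

Under these assumptions, `[[]` and `[^[]` are both terminated by their final `]`, whereas a scanner that skips the character after a literal `[` would swallow that `]` and report the expression as unterminated.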
Reproduce the compatibility gap by comparing these commands:

    printf 'x\n' | ./target/debug/sed -E 's/[[]/X/'
    printf 'x\n' | gsed -E 's/[[]/X/'
    printf 'x\n' | ./target/debug/sed -E 's/[^[]/X/'
    printf 'x\n' | gsed -E 's/[^[]/X/'

Before this change, uutils sed rejected the scripts as unterminated or
invalid regexes while GNU sed parsed them cleanly. After this change
the outputs match GNU sed: x for 's/[[]/X/' (the input contains no
literal '[') and X for 's/[^[]/X/'.
Leave the following character available for normal class parsing, then
escape literal '[' characters inside parsed classes before compiling
with the regex backend. Add parser, compiler, and command-level
regressions for the failing sed -E substitution cases.
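
A rough sketch of the escaping step, for illustration only. The names `escape_class_open_brackets` and `copy_class` are hypothetical and the body is an assumption about one way to do this, not the code in this PR. It rewrites a lone `[` inside a bracket expression to `\[`, copies `[:...:]`, `[....]`, and `[=...=]` constructs unchanged, and leaves every other translation concern aside.

```rust
/// Hypothetical sketch: escape each literal `[` found inside a bracket
/// expression so a backend that rejects a bare `[` in a class accepts it.
fn escape_class_open_brackets(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 4);
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Outside a class, copy a backslash escape as a unit so that
            // `\[` is not mistaken for the start of a class.
            '\\' if i + 1 < chars.len() => {
                out.push('\\');
                out.push(chars[i + 1]);
                i += 2;
            }
            '[' => i = copy_class(&chars, i, &mut out),
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}

/// Copy the class that starts at `chars[start] == '['` into `out`, escaping
/// literal `[` members. Returns the index of the first char after the class.
fn copy_class(chars: &[char], start: usize, out: &mut String) -> usize {
    out.push('[');
    let mut i = start + 1;
    if chars.get(i) == Some(&'^') {
        out.push('^');
        i += 1;
    }
    // A leading `]` is a literal member rather than the terminator.
    if chars.get(i) == Some(&']') {
        out.push(']');
        i += 1;
    }
    while i < chars.len() {
        match chars[i] {
            ']' => {
                out.push(']');
                return i + 1;
            }
            '[' if matches!(chars.get(i + 1), Some(':' | '.' | '=')) => {
                // Copy `[:name:]`, `[.sym.]`, or `[=cls=]` unchanged.
                let delim = chars[i + 1];
                let open = i + 2;
                out.push('[');
                out.push(delim);
                i = open;
                while i < chars.len() {
                    let c = chars[i];
                    out.push(c);
                    i += 1;
                    // The construct ends at `]` preceded by the delimiter,
                    // provided that delimiter is not the opening one.
                    if c == ']' && i - 1 > open && chars[i - 2] == delim {
                        break;
                    }
                }
            }
            '[' => {
                // A lone `[` is a literal member: escape it.
                out.push_str("\\[");
                i += 1;
            }
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    i
}
```

With this sketch, `[[]` becomes `[\[]` and `[^[]` becomes `[^\[]`, while `[[:alpha:]]` and an already escaped `\[` outside a class pass through unchanged.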
Observed this while trying to install Python 3.14.5 with Pyenv and
uutils `sed` on the PATH, which failed.
5dced76 to dd3ddc0