Skip to content

fix: CN-272 whiteout regex bug#690

Open
parker-snyk wants to merge 23 commits intomainfrom
CN-272-whiteout-regex-bug
Open

fix: CN-272 whiteout regex bug#690
parker-snyk wants to merge 23 commits intomainfrom
CN-272-whiteout-regex-bug

Conversation

@parker-snyk
Copy link
Copy Markdown
Contributor

@parker-snyk parker-snyk commented Aug 8, 2025

We were using .wh. to look for whited out files, but we didn't escape the ., so it was a match-any character

We also weren't following the OpenContainer spec: https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts

.wh. needs to be the prefix of the filename. So it either needs to prefix everything (no slashes, just filename) or it needs to prefix the filename after the last slash (full path)

Our current implementation just looks for .wh. to appear anywhere, so we are incorrectly whiting out files. This includes libraries, so all of the node 'which' packages were being skipped, for example

@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch 3 times, most recently from 7f7dd67 to f59fe4a Compare August 8, 2025 19:30
@parker-snyk parker-snyk marked this pull request as ready for review August 8, 2025 19:34
@parker-snyk parker-snyk requested a review from a team as a code owner August 8, 2025 19:34
@parker-snyk parker-snyk enabled auto-merge (squash) August 8, 2025 19:35
@parker-snyk parker-snyk disabled auto-merge August 8, 2025 20:37
This was referenced Aug 11, 2025
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch from ea113ba to ef6a033 Compare August 18, 2025 18:18
@parker-snyk parker-snyk requested a review from bgardiner August 19, 2025 13:37
export function removeWhiteoutPrefix(filename: string): string {
// Replace .wh. that appears at the start or after the last slash,
// and ensure no slashes come after .wh.
return filename.replace(/^(.*\/)?\.wh\.([^\/]*)$/, "$1$2");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think this should be "+" instead? I guess we expect 0 times to be a bug in the generator (i.e., impossible) but we should still remove it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The comment applied to the * in [^\/]*)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with * intentionally to handle the degenerate case where someone has a file literally named .wh. with nothing after it. In practice I doubt we'd ever see it, but figured it's safer to handle it than to silently skip it.

Happy to switch to + if you feel strongly though.

*/
export function isWhitedOutFile(filename: string) {
return filename.match(/.wh./gm);
const lastSlashIndex = filename.lastIndexOf("/");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes unix path separator -- do we need to handle windows too?

Copy link
Copy Markdown
Contributor Author

@parker-snyk parker-snyk Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated to use path.basename so it should account for this now - https://nodejs.org/api/path.html#windows-vs-posix
I didn't know about this function until you mentioned it, so let me know if I'm incorrect

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, path.basename only splits on the platform's native separator, so on Linux it wouldn't handle backslashes. I extracted a getBasename helper that splits on both / and \ regardless of platform, and isWhitedOutFile and isOpaqueWhiteout both use it now

* https://www.madebymikal.com/interpreting-whiteout-files-in-docker-image-layers
* https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
*/
export function isWhitedOutFile(filename: string) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this function also need to account for opaque whiteouts: https://github.com/opencontainers/image-spec/blob/main/layer.md#opaque-whiteout?

In other words, check that the filename is not .wh..wh..opq? Looking above, it's not clear our extractor logic even handles opaque whiteouts at all 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as it starts with .wh it should account for it, it looks like out logic checks if a file is or within any whited out file or directory, so if a directory starts with .wh, any files under it should be excluded. I can manually verify though

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I misunderstood how .wh..wh..opq worked, I'll fix this

Copy link
Copy Markdown
Contributor Author

@parker-snyk parker-snyk Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right, I didn't realize opaque means "nuke everything in this directory from lower layers." I've pushed a fix that adds isOpaqueWhiteout() detection and tracks opaque directories during layer merging so files from older layers in those dirs get properly excluded. Tests included

@parker-snyk parker-snyk marked this pull request as draft November 7, 2025 20:45
Refactor isWhitedOutFile to use cross-platform path.basename and update removeWhiteoutPrefix regex to support both POSIX and Windows path separators. This resolves a regression introduced by the initial whiteout regex fix where Windows-style paths would not be correctly identified or parsed.
…me handling

- Handle .wh..wh..opq opaque whiteouts per OCI image spec: files in
  opaque-whiteout directories from older layers are now properly excluded
- Replace path.basename with a custom getBasename helper that handles
  both / and \ separators regardless of platform, keeping isWhitedOutFile
  and isOpaqueWhiteout consistent with removeWhiteoutPrefix
- Add tests for isOpaqueWhiteout
Opaque whiteouts should not affect files from the same layer — only
files from older layers. Collecting opaque dirs in a per-layer set and
merging after the inner loop prevents new files in the same layer from
being incorrectly excluded.
@parker-snyk parker-snyk marked this pull request as ready for review April 6, 2026 20:34
@snyk-pr-review-bot

This comment has been minimized.

@snyk-pr-review-bot

This comment has been minimized.

@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch from 05f6f4a to 3e278f7 Compare April 6, 2026 20:54
@snyk-pr-review-bot

This comment has been minimized.

isFileInOpaqueDir was using path.dirname and path.sep directly, which
breaks on Windows where path.sep is \ but Docker paths use /. Reuse
the existing isFileInFolder helper which normalizes paths first —
same pattern as isFileInARemovedFolder.
@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch from 3e278f7 to 40159a2 Compare April 6, 2026 21:01
@parker-snyk parker-snyk requested a review from bgardiner April 6, 2026 21:01
@snyk-pr-review-bot
Copy link
Copy Markdown

PR Reviewer Guide 🔍

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Platform Incompatibility 🟠 [major]

The isFileInFolder function (used via isFileInOpaqueDir) uses path.sep and path.join. Since container image layers always use POSIX paths (/), this logic will fail on Windows hosts because path.join will insert backslashes (\), making startsWith return false for all nested files. The code should use a hardcoded / or path.posix for image path manipulation.

isFileInFolder(filename, opaqueDir),
Regex Logic Error 🟡 [minor]

The regex in removeWhiteoutPrefix uses ^(.*[\/\\])?, where the ? makes the path prefix optional. However, the subsequent \.wh\. expects a literal dot. In the root directory (e.g., .wh.file), the first capture group $1 is empty. The replacement $1$2 will work for root files, but the unit tests for root files (.wh.hosts -> hosts) only pass because the regex matches; if the path contains multiple .wh. sequences, the greedy .* might behave unexpectedly compared to the standard getBasename logic used in detection.

return filename.replace(/^(.*[\/\\])?\.wh\.([^\/\\]*)$/, "$1$2");
📚 Repository Context Analyzed

This review considered 18 relevant code sections from 14 files (average relevance: 0.96)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants