Conversation
7f7dd67 to
f59fe4a
Compare
ea113ba to
ef6a033
Compare
lib/extractor/index.ts
Outdated
| export function removeWhiteoutPrefix(filename: string): string { | ||
| // Replace .wh. that appears at the start or after the last slash, | ||
| // and ensure no slashes come after .wh. | ||
| return filename.replace(/^(.*\/)?\.wh\.([^\/]*)$/, "$1$2"); |
There was a problem hiding this comment.
do you think this should be "+" instead? I guess we expect 0 times to be a bug in the generator (i.e., impossible) but we should still remove it?
There was a problem hiding this comment.
(The comment applied to the * in [^\/]*)
There was a problem hiding this comment.
I went with * intentionally to handle the degenerate case where someone has a file literally named .wh. with nothing after it. In practice I doubt we'd ever see it, but figured it's safer to handle it than to silently skip it.
Happy to switch to + if you feel strongly though.
lib/extractor/index.ts
Outdated
| */ | ||
| export function isWhitedOutFile(filename: string) { | ||
| return filename.match(/.wh./gm); | ||
| const lastSlashIndex = filename.lastIndexOf("/"); |
There was a problem hiding this comment.
this assumes unix path separator -- do we need to handle windows too?
There was a problem hiding this comment.
I just updated to use path.basename so it should account for this now - https://nodejs.org/api/path.html#windows-vs-posix
I didn't know about this function until you mentioned it, so let me know if I'm incorrect
There was a problem hiding this comment.
You're right, path.basename only splits on the platform's native separator, so on Linux it wouldn't handle backslashes. I extracted a getBasename helper that splits on both / and \ regardless of platform, and isWhitedOutFile and isOpaqueWhiteout both use it now
lib/extractor/index.ts
Outdated
| * https://www.madebymikal.com/interpreting-whiteout-files-in-docker-image-layers | ||
| * https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts | ||
| */ | ||
| export function isWhitedOutFile(filename: string) { |
There was a problem hiding this comment.
does this function also need to account for opaque whiteouts: https://github.com/opencontainers/image-spec/blob/main/layer.md#opaque-whiteout?
In other words, check that the filename is not .wh..wh..opq? Looking above, it's not clear our extractor logic even handles opaque whiteouts at all 🤔
There was a problem hiding this comment.
I think as long as it starts with .wh it should account for it, it looks like out logic checks if a file is or within any whited out file or directory, so if a directory starts with .wh, any files under it should be excluded. I can manually verify though
There was a problem hiding this comment.
Okay, I misunderstood how .wh..wh..opq worked, I'll fix this
There was a problem hiding this comment.
You were right, I didn't realize opaque means "nuke everything in this directory from lower layers." I've pushed a fix that adds isOpaqueWhiteout() detection and tracks opaque directories during layer merging so files from older layers in those dirs get properly excluded. Tests included
Refactor isWhitedOutFile to use cross-platform path.basename and update removeWhiteoutPrefix regex to support both POSIX and Windows path separators. This resolves a regression introduced by the initial whiteout regex fix where Windows-style paths would not be correctly identified or parsed.
…me handling - Handle .wh..wh..opq opaque whiteouts per OCI image spec: files in opaque-whiteout directories from older layers are now properly excluded - Replace path.basename with a custom getBasename helper that handles both / and \ separators regardless of platform, keeping isWhitedOutFile and isOpaqueWhiteout consistent with removeWhiteoutPrefix - Add tests for isOpaqueWhiteout
Opaque whiteouts should not affect files from the same layer — only files from older layers. Collecting opaque dirs in a per-layer set and merging after the inner loop prevents new files in the same layer from being incorrectly excluded.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
05f6f4a to
3e278f7
Compare
This comment has been minimized.
This comment has been minimized.
isFileInOpaqueDir was using path.dirname and path.sep directly, which breaks on Windows where path.sep is \ but Docker paths use /. Reuse the existing isFileInFolder helper which normalizes paths first — same pattern as isFileInARemovedFolder.
3e278f7 to
40159a2
Compare
PR Reviewer Guide 🔍
|
We were using
.wh.to look for whited out files, but we didn't escape the., so it was a match-any characterWe also weren't following the OpenContainer spec: https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
.wh. needs to be the prefix of the filename. So it either needs to prefix everything (no slashes, just filename) or it needs to prefix the filename after the last slash (full path)
Our current implementation just looks for .wh. to appear anywhere, so we are incorrectly whiting out files. This includes libraries, so all of the node 'which' packages were being skipped, for example