py/tarslip: detect shutil.unpack_archive and subprocess tar extraction#21720
py/tarslip: detect shutil.unpack_archive and subprocess tar extraction#21720
Conversation
…urces and sinks - Add test cases for shutil.unpack_archive and subprocess.run(["tar", ...]) to tarslip.py - Add ShutilUnpackArchiveSource/Sink for shutil.unpack_archive calls with non-literal filenames - Add SubprocessTarExtractionSource/Sink for subprocess calls invoking tar with extraction flags - Update TarSlip.expected with expected test output for new cases Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Use regexpMatch instead of matches to avoid false positives: - Command name: regexpMatch(\"(.*/)?tar\") to match only \"tar\" or paths ending in \"/tar\" - Extraction flag: regexpMatch(\"-[a-zA-Z]*x[a-zA-Z]*\") to match only single-dash flags containing x Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Replace regexpMatch for command name with explicit equality check: - cmd = \"tar\" or cmd.matches(\"%/tar\") - clearly matches only exact \"tar\" or paths ending with \"/tar\" Keep flag check as regexpMatch since it naturally excludes double-dash flags Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
| * | ||
| * The archive filename is not hardcoded, so it may come from user input. | ||
| */ | ||
| class ShutilUnpackArchiveSource extends Source { |
There was a problem hiding this comment.
Remove (should not be a source).
There was a problem hiding this comment.
Done in 5084883 — ShutilUnpackArchiveSource removed.
| class ShutilUnpackArchiveSink extends Sink { | ||
| ShutilUnpackArchiveSink() { | ||
| this = API::moduleImport("shutil").getMember("unpack_archive").getACall() and | ||
| not this.(DataFlow::CallCfgNode).getArg(0).getALocalSource().asExpr() instanceof StringLiteral |
There was a problem hiding this comment.
Remove this line, not really needed.
There was a problem hiding this comment.
Done in 5084883 — changed this to call.getArg(0) (the filename argument), which makes the literal check redundant and removed it.
| * A call to `subprocess` functions that invokes `tar` for archive extraction, | ||
| * considered as a flow source. | ||
| */ | ||
| class SubprocessTarExtractionSource extends Source { |
There was a problem hiding this comment.
Remove (should not be a source).
There was a problem hiding this comment.
Done in 5084883 — SubprocessTarExtractionSource removed.
| * considered as a flow sink. | ||
| */ | ||
| class SubprocessTarExtractionSink extends Sink { | ||
| SubprocessTarExtractionSink() { isSubprocessTarExtraction(this) } |
There was a problem hiding this comment.
It should not be the call that is the sink, but instead the element in the command list that corresponds to the file to extract.
There was a problem hiding this comment.
Done in 5084883 — the sink is now cmdList.getElement(i) (the specific non-literal element in the command list, i.e. the archive filename), not the call node.
- Remove ShutilUnpackArchiveSource (should not be a source) - Change ShutilUnpackArchiveSink to target getArg(0) (the filename arg, not the whole call); removes the now-redundant literal check - Remove SubprocessTarExtractionSource (should not be a source) - Change SubprocessTarExtractionSink to target the specific non-literal element in the command list (the filename), not the call itself - Remove private isSubprocessTarExtraction predicate (inlined into the sink) - Revert TarSlip.expected to its pre-PR state (the new sinks require proper source taint flow to fire) Agent-Logs-Url: https://github.com/github/codeql/sessions/833673da-f868-4c3b-8bff-62364ee0ed19 Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
The
py/tarslipquery missed archive extraction viashutil.unpack_archiveand systemtarcommands invoked throughsubprocess, both common patterns in real-world code.Changes
New sinks in
TarSlipCustomizations.qllShutilUnpackArchiveSink: The first argument ofshutil.unpack_archive(filename, ...)is modelled as a sink. Tainted archive filenames flowing into this argument will be flagged.SubprocessTarExtractionSink: The specific non-literal element in a subprocess command list is modelled as a sink, when the command is"tar"(or a path ending in"/tar", e.g./usr/bin/tar) and an extraction flag is present (single-dash flags containingxsuch as-x,-xf,-xvf, or--extract). Detection operates at the CFG level usingSequenceNode.getElement(i)to inspect inline list literals.Test cases