Skip to content

py/tarslip: detect shutil.unpack_archive and subprocess tar extraction#21720

Draft
Copilot wants to merge 5 commits intomainfrom
copilot/fix-shutil-unpack-archive-detection
Draft

py/tarslip: detect shutil.unpack_archive and subprocess tar extraction#21720
Copilot wants to merge 5 commits intomainfrom
copilot/fix-shutil-unpack-archive-detection

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 16, 2026

The py/tarslip query missed archive extraction via shutil.unpack_archive and system tar commands invoked through subprocess, both common patterns in real-world code.

Changes

New sinks in TarSlipCustomizations.qll

  • ShutilUnpackArchiveSink: The first argument of shutil.unpack_archive(filename, ...) is modelled as a sink. Tainted archive filenames flowing into this argument will be flagged.

  • SubprocessTarExtractionSink: The specific non-literal element in a subprocess command list is modelled as a sink, when the command is "tar" (or a path ending in "/tar", e.g. /usr/bin/tar) and an extraction flag is present (single-dash flags containing x such as -x, -xf, -xvf, or --extract). Detection operates at the CFG level using SequenceNode.getElement(i) to inspect inline list literals.

Test cases

shutil.unpack_archive(unsafe_filename_tar, "out")          # sink: first argument
shutil.unpack_archive("safe.tar", "out")                   # not a sink (hardcoded literal)

subprocess.run(["tar", "-xf", unsafe_filename_tar])        # sink: unsafe_filename_tar element
subprocess.check_call(["tar", "-xf", unsafe_filename_tar]) # sink: unsafe_filename_tar element
subprocess.run(["tar", "-xf", "safe.tar"])                 # not a sink (all literal)
subprocess.run(["echo", unsafe_filename_tar])              # not a sink (not a tar extraction)

Copilot AI and others added 3 commits April 16, 2026 08:34
…urces and sinks

- Add test cases for shutil.unpack_archive and subprocess.run(["tar", ...]) to tarslip.py
- Add ShutilUnpackArchiveSource/Sink for shutil.unpack_archive calls with non-literal filenames
- Add SubprocessTarExtractionSource/Sink for subprocess calls invoking tar with extraction flags
- Update TarSlip.expected with expected test output for new cases

Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f

Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Use regexpMatch instead of matches to avoid false positives:
- Command name: regexpMatch(\"(.*/)?tar\") to match only \"tar\" or paths ending in \"/tar\"
- Extraction flag: regexpMatch(\"-[a-zA-Z]*x[a-zA-Z]*\") to match only single-dash flags containing x

Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f

Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Replace regexpMatch for command name with explicit equality check:
- cmd = \"tar\" or cmd.matches(\"%/tar\") - clearly matches only exact \"tar\" or paths ending with \"/tar\"
Keep flag check as regexpMatch since it naturally excludes double-dash flags

Agent-Logs-Url: https://github.com/github/codeql/sessions/f31a3622-9b18-415f-85f1-62ec14a8319f

Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix false negative on shutil.unpack_archive in detection rule py/tarslip: detect shutil.unpack_archive and subprocess tar extraction Apr 16, 2026
Copilot AI requested a review from hvitved April 16, 2026 08:44
*
* The archive filename is not hardcoded, so it may come from user input.
*/
class ShutilUnpackArchiveSource extends Source {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove (should not be a source).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5084883ShutilUnpackArchiveSource removed.

class ShutilUnpackArchiveSink extends Sink {
ShutilUnpackArchiveSink() {
this = API::moduleImport("shutil").getMember("unpack_archive").getACall() and
not this.(DataFlow::CallCfgNode).getArg(0).getALocalSource().asExpr() instanceof StringLiteral
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line, not really needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5084883 — changed this to call.getArg(0) (the filename argument), which makes the literal check redundant and removed it.

* A call to `subprocess` functions that invokes `tar` for archive extraction,
* considered as a flow source.
*/
class SubprocessTarExtractionSource extends Source {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove (should not be a source).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5084883SubprocessTarExtractionSource removed.

* considered as a flow sink.
*/
class SubprocessTarExtractionSink extends Sink {
SubprocessTarExtractionSink() { isSubprocessTarExtraction(this) }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be the call that is the sink, but instead the element in the command list that corresponds to the file to extract.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5084883 — the sink is now cmdList.getElement(i) (the specific non-literal element in the command list, i.e. the archive filename), not the call node.

- Remove ShutilUnpackArchiveSource (should not be a source)
- Change ShutilUnpackArchiveSink to target getArg(0) (the filename arg, not the whole call); removes the now-redundant literal check
- Remove SubprocessTarExtractionSource (should not be a source)
- Change SubprocessTarExtractionSink to target the specific non-literal element in the command list (the filename), not the call itself
- Remove private isSubprocessTarExtraction predicate (inlined into the sink)
- Revert TarSlip.expected to its pre-PR state (the new sinks require proper source taint flow to fire)

Agent-Logs-Url: https://github.com/github/codeql/sessions/833673da-f868-4c3b-8bff-62364ee0ed19

Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
Copilot AI requested a review from hvitved April 16, 2026 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

False negative: unzip using subprocess or shutil.unpack_archive is not covered in py/tarslip

2 participants