gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers#148652
Open
mjbommar wants to merge 2 commits intopython:mainfrom
Open
gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers#148652mjbommar wants to merge 2 commits intopython:mainfrom
mjbommar wants to merge 2 commits intopython:mainfrom
Conversation
…gv-g5p9-9qqq)
TYPE_TUPLE, TYPE_LIST, TYPE_DICT, and TYPE_SET used R_REF() to register
containers in p->refs immediately after allocation, before populating
their slots. A crafted payload containing a TYPE_REF back-reference to
the partial container could reach a hashing or iteration site with NULL
slots, causing tuplehash/PyObject_Hash(NULL) -> SIGSEGV.
Fix: use the existing two-phase r_ref_reserve()/r_ref_insert() pattern
(already used by TYPE_FROZENSET, TYPE_CODE, and TYPE_SLICE) for all
four container types. r_ref_reserve() places Py_None as a placeholder
in p->refs; the TYPE_REF handler (line 1675) already detects Py_None
and raises ValueError("bad marshal data (invalid reference)"). After
the container is fully populated, r_ref_insert() replaces the
placeholder with the real object.
Includes regression tests for tuple, list, set, and dict self-reference
payloads.
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Author
|
Trivial self-reference is broken. Pushing fix in a moment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR summary
Fix a deterministic SIGSEGV in
marshal.loads()caused byTYPE_TUPLE,TYPE_LIST,TYPE_DICT, andTYPE_SETregistering containers inp->refsviaR_REF()before their slots are populated. A crafted payload with aTYPE_REFback-reference to the partial container can reach a hashing site (PySet_Addcallingtuplehash), which callsPyObject_Hash(NULL)on unfilled slots.The fix adopts the existing two-phase
r_ref_reserve()/r_ref_insert()pattern already used byTYPE_FROZENSET,TYPE_CODE, andTYPE_SLICE. ThePy_Noneplaceholder inp->refsis detected by the existingTYPE_REFhandler at line 1675, which raisesValueError("bad marshal data (invalid reference)")instead of crashing.16-byte reproducer:
Includes regression tests for tuple, list, set, and dict self-reference payloads.
Originally filed as GHSA-m7gv-g5p9-9qqq; PSRT assessed as outside the security threat model (marshal.loads is documented as not secure against malicious data). Converting to public issue + PR per their guidance.
First time committer introduction
I use Python extensively in data science and legal tech. I found this while building an automated vulnerability scanner seeded from prior CPython CVE fix diffs (CVE-2018-20406, CVE-2022-48560). Happy to iterate on the patch.
AI Disclosure
Claude Code (Anthropic) assisted with grepping the codebase, drafting the byte-stream reproducer, and constructing the regression tests. I reviewed and understand all code changes. The fix follows the existing
r_ref_reserve/r_ref_inserttwo-phase pattern already used by TYPE_FROZENSET/TYPE_CODE/TYPE_SLICE in the same file.