
gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers #148652

Open

mjbommar wants to merge 2 commits into python:main from mjbommar:marshal-self-ref-fix

mjbommar commented on Apr 16, 2026

PR summary

Fix a deterministic SIGSEGV in marshal.loads() caused by TYPE_TUPLE, TYPE_LIST, TYPE_DICT, and TYPE_SET registering containers in p->refs via R_REF() before their slots are populated. A crafted payload can place a TYPE_REF back-reference to such a partially built container at a site that hashes it (for example, PySet_Add() calling tuplehash()), which then calls PyObject_Hash(NULL) on an unfilled slot.

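The fix switches these four types to the two-phase r_ref_reserve() / r_ref_insert() pattern already used by TYPE_FROZENSET, TYPE_CODE, and TYPE_SLICE. While a container is being populated, its entry in p->refs holds a Py_None placeholder; the existing TYPE_REF handler at line 1675 detects that placeholder and raises ValueError("bad marshal data (invalid reference)") instead of crashing. Once the container is complete, r_ref_insert() replaces the placeholder with the real object.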
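The two-phase idea can be modeled in a few lines of Python. This is a toy model, not CPython's C implementation: RefTable, reserve(), insert(), and lookup() are hypothetical names that only loosely mirror p->refs, r_ref_reserve(), r_ref_insert(), and the TYPE_REF handler. It shows why a back-reference to an unfinished container is rejected instead of being handed out half-built.

```python
class RefTable:
    """Toy model of the p->refs list that marshal.loads() maintains.

    Hypothetical names; only loosely mirrors r_ref_reserve(),
    r_ref_insert(), and the TYPE_REF handler in Python/marshal.c.
    """

    PLACEHOLDER = None  # the PR says marshal.c uses Py_None as the placeholder

    def __init__(self) -> None:
        self._refs: list = []

    def reserve(self) -> int:
        """Phase 1: claim an index before the container's items are read."""
        self._refs.append(self.PLACEHOLDER)
        return len(self._refs) - 1

    def insert(self, idx: int, obj: object) -> object:
        """Phase 2: publish the object once it is fully populated."""
        self._refs[idx] = obj
        return obj

    def lookup(self, idx: int) -> object:
        """Resolve a back-reference (TYPE_REF) to a previously read object."""
        # In this model, out-of-range indices and still-reserved slots
        # are both rejected with the error message quoted in the PR.
        if not 0 <= idx < len(self._refs) or self._refs[idx] is self.PLACEHOLDER:
            raise ValueError("bad marshal data (invalid reference)")
        return self._refs[idx]


# The reproducer's shape: a tuple is reserved, then referenced before it is done.
refs = RefTable()
idx = refs.reserve()            # tuple header read, slots still empty
try:
    refs.lookup(idx)            # TYPE_REF from inside the nested set
    outcome = "resolved"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)                  # ValueError: bad marshal data (invalid reference)

# Well-formed data: the container is finished before anything refers to it.
ok = RefTable()
i = ok.reserve()
ok.insert(i, ("a", "b"))
assert ok.lookup(i) == ("a", "b")
```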

16-byte reproducer:

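```python
import marshal

# Crashes with SIGSEGV on an interpreter without this fix; with the fix it
# raises ValueError: bad marshal data (invalid reference)
marshal.loads(b'\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00')
```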
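Decoded field by field, the reproducer is a 2-tuple flagged for registration as reference 0, whose second slot is a set containing a TYPE_REF back to that same, still half-filled tuple. Adding the tuple to the set hashes it, and hashing reaches the empty second slot. The sketch below reassembles the 16 bytes from their parts without loading them. The constant names mirror the type codes in Python/marshal.c but are defined locally here, since the marshal module does not export them.

```python
import struct

# Values copied from Python/marshal.c; these module-level names are our own.
FLAG_REF = 0x80          # "register this object in p->refs"
TYPE_NONE = ord("N")
TYPE_TUPLE = ord("(")
TYPE_SET = ord("<")
TYPE_REF = ord("r")


def u32(n: int) -> bytes:
    """Sizes and reference indices are little-endian 32-bit integers."""
    return struct.pack("<i", n)


payload = b"".join([
    bytes([TYPE_TUPLE | FLAG_REF]), u32(2),  # 2-tuple, registered as ref 0 before its slots are filled
    bytes([TYPE_NONE]),                      # slot 0: None
    bytes([TYPE_SET]), u32(1),               # slot 1: a set with one element...
    bytes([TYPE_REF]), u32(0),               # ...which is ref 0, the half-built tuple
])

# Only compares bytes; do not pass these to marshal.loads() on an unpatched build.
assert payload == b"\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00"
assert len(payload) == 16
```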

Includes regression tests for tuple, list, set, and dict self-reference payloads.
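Because the unpatched failure is a hard crash rather than an exception, a check that stays usable on any interpreter can load the payload in a child process and inspect how it exits. This is a minimal sketch, assuming a POSIX platform (where a child killed by a signal reports a negative return code); load_in_child() and classify() are hypothetical helpers, not the regression tests in this PR.

```python
import subprocess
import sys

# The 16-byte reproducer from the PR description.
PAYLOAD = b"\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00"

# Child script: prints "rejected" if marshal.loads raises ValueError and
# "loaded" if it unexpectedly succeeds. A segfault prints nothing.
CHILD = """\
import marshal, sys
data = bytes.fromhex(sys.argv[1])
try:
    marshal.loads(data)
except ValueError:
    print("rejected")
else:
    print("loaded")
"""


def classify(returncode: int, stdout: str) -> str:
    """Map a child's exit status and output to an outcome label."""
    if returncode < 0:
        return "crashed"      # killed by a signal (e.g. SIGSEGV) on POSIX
    if returncode != 0:
        return "error"        # any other failure, e.g. an uncaught exception
    return stdout.strip()     # "rejected" or "loaded"


def load_in_child(payload: bytes) -> str:
    """Run marshal.loads(payload) in a fresh interpreter and classify the result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, payload.hex()],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return classify(proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Expected: "rejected" with the fix, "crashed" on an unpatched POSIX build.
    print(load_in_child(PAYLOAD))
```

Isolating the load in a subprocess keeps a segfault from taking down the parent test runner. On Windows an access violation surfaces as a large positive exit code, which this sketch reports as "error".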

Originally filed as GHSA-m7gv-g5p9-9qqq; the PSRT (Python Security Response Team) assessed it as outside the security threat model, since marshal.loads is documented as not secure against malicious data. Per their guidance, I am converting it to a public issue and PR.

First-time committer introduction

I use Python extensively in data science and legal tech. I found this while building an automated vulnerability scanner seeded from prior CPython CVE fix diffs (CVE-2018-20406, CVE-2022-48560). Happy to iterate on the patch.

AI Disclosure

Claude Code (Anthropic) assisted with grepping the codebase, drafting the byte-stream reproducer, and constructing the regression tests. I reviewed and understand all code changes. The fix follows the existing r_ref_reserve/r_ref_insert two-phase pattern already used by TYPE_FROZENSET/TYPE_CODE/TYPE_SLICE in the same file.

…gv-g5p9-9qqq)

TYPE_TUPLE, TYPE_LIST, TYPE_DICT, and TYPE_SET used R_REF() to register
containers in p->refs immediately after allocation, before populating
their slots. A crafted payload containing a TYPE_REF back-reference to
the partial container could reach a hashing or iteration site with NULL
slots, causing tuplehash/PyObject_Hash(NULL) -> SIGSEGV.

Fix: use the existing two-phase r_ref_reserve()/r_ref_insert() pattern
(already used by TYPE_FROZENSET, TYPE_CODE, and TYPE_SLICE) for all
four container types. r_ref_reserve() places Py_None as a placeholder
in p->refs; the TYPE_REF handler (line 1675) already detects Py_None
and raises ValueError("bad marshal data (invalid reference)"). After
the container is fully populated, r_ref_insert() replaces the
placeholder with the real object.

Includes regression tests for tuple, list, set, and dict self-reference
payloads.

python-cla-bot (bot) commented on Apr 16, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

bedevere-app (bot) commented on Apr 16, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

mjbommar changed the title from "Fix SIGSEGV in marshal.loads via self-referencing containers" to "gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers" on Apr 16, 2026
serhiy-storchaka self-requested a review on April 16, 2026 at 17:13
mjbommar (author) commented:

Trivial self-reference is broken. Pushing fix in a moment
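The thread does not say which case broke; one likely candidate (an assumption here) is a legitimately recursive container. marshal can already serialize a list or dict that contains itself: the writer emits a TYPE_REF back to the outer object, and the reader resolves it while the outer container is still being filled. Under a reserve-then-insert scheme, that inner reference would find the Py_None placeholder and be rejected. On an interpreter without the patch, this round trip succeeds:

```python
import marshal

# A list that contains itself, and a dict whose value is itself.
lst = []
lst.append(lst)
dct = {}
dct["self"] = dct

lst2 = marshal.loads(marshal.dumps(lst))
dct2 = marshal.loads(marshal.dumps(dct))

# The inner back-reference resolves to the outer container being rebuilt.
assert lst2[0] is lst2
assert dct2["self"] is dct2
```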
