Skip to content

gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652

Open
mjbommar wants to merge 1 commit intopython:mainfrom
mjbommar:marshal-self-ref-fix
Open

gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652
mjbommar wants to merge 1 commit intopython:mainfrom
mjbommar:marshal-self-ref-fix

Conversation

@mjbommar
Copy link
Copy Markdown

@mjbommar mjbommar commented Apr 16, 2026

PR summary

Fix a deterministic SIGSEGV in marshal.loads() caused by TYPE_TUPLE, TYPE_LIST, TYPE_DICT, and TYPE_SET registering containers in p->refs via R_REF() before their slots are populated. A crafted payload with a TYPE_REF back-reference to the partial container can reach a hashing site (PySet_Add calling tuplehash), which calls PyObject_Hash(NULL) on unfilled slots.

The fix adopts the existing two-phase r_ref_reserve() / r_ref_insert() pattern already used by TYPE_FROZENSET, TYPE_CODE, and TYPE_SLICE. The Py_None placeholder in p->refs is detected by the existing TYPE_REF handler at line 1675, which raises ValueError("bad marshal data (invalid reference)") instead of crashing.

16-byte reproducer:

import marshal
marshal.loads(b'\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00')

Includes regression tests for tuple, list, set, and dict self-reference payloads.

Originally filed as GHSA-m7gv-g5p9-9qqq; PSRT assessed as outside the security threat model (marshal.loads is documented as not secure against malicious data). Converting to public issue + PR per their guidance.

First time committer introduction

I use Python extensively in data science and legal tech. I found this while building an automated vulnerability scanner seeded from prior CPython CVE fix diffs (CVE-2018-20406, CVE-2022-48560). Happy to iterate on the patch.

AI Disclosure

Claude Code (Anthropic) assisted with grepping the codebase, drafting the byte-stream reproducer, and constructing the regression tests. I reviewed and understand all code changes. The fix follows the existing r_ref_reserve/r_ref_insert two-phase pattern already used by TYPE_FROZENSET/TYPE_CODE/TYPE_SLICE in the same file.

@python-cla-bot
Copy link
Copy Markdown

python-cla-bot bot commented Apr 16, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Apr 16, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@mjbommar mjbommar changed the title Fix SIGSEGV in marshal.loads via self-referencing containers gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers Apr 16, 2026
@serhiy-storchaka serhiy-storchaka self-requested a review April 16, 2026 17:13
@mjbommar
Copy link
Copy Markdown
Author

Trivial self-reference is broken. Pushing fix in a moment

@mjbommar mjbommar force-pushed the marshal-self-ref-fix branch from 8b1fa09 to 7c214ea Compare April 16, 2026 17:40
@mjbommar mjbommar changed the title gh-148653: Fix SIGSEGV in marshal.loads via self-referencing containers gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples Apr 16, 2026
@mjbommar
Copy link
Copy Markdown
Author

@serhiy-storchaka - should I force push to trigger a retest? There was a flakey upstream apt failure

@StanFromIreland
Copy link
Copy Markdown
Member

I re-ran it.

@serhiy-storchaka
Copy link
Copy Markdown
Member

Try to load b'\xa9\x01[\x01\x00\x00\x00r\x00\x00\x00\x00'. Currently it works, but with this PR it fails.

@mjbommar
Copy link
Copy Markdown
Author

Try to load b'\xa9\x01[\x01\x00\x00\x00r\x00\x00\x00\x00'. Currently it works, but with this PR it fails.

Good catch! I found a few other weird cases that regress:

  - Direct self-referential tuple: t = (t,)
    b'\xa9\x01r\x00\x00\x00\x00'
  - Tuple through dict value: ({'x': t},)
    b'\xa9\x01\xfb\xda\x01xr\x00\x00\x00\x000'
  - Tuple through list with multiple refs back to the same tuple
    b'\xa9\x01[\x02\x00\x00\x00r\x00\x00\x00\x00r\x00\x00\x00\x00'
  - Non-tuple root, but inner tuple is still the back-ref target
    Shape: [( [t], )]
    b'\xdb\x01\x00\x00\x00\xa9\x01\xdb\x01\x00\x00\x00r\x01\x00\x00\x00'
  1. @serhiy-storchaka , would you want me to improve the test cases in ./Lib/test/test_marshal.py to add these all or leave that as-is?

  2. I assume you had already thought of this too, which was the point of your example, but perhaps the real solution is to avoid changing r_object() and add the guard in tuple_hash() instead. If so, would you go with ValueError or SystemError?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants