gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652
gh-148653: Fix SIGSEGV in marshal.loads for self-referencing tuples#148652mjbommar wants to merge 1 commit intopython:mainfrom
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Trivial self-reference is broken. Pushing fix in a moment |
8b1fa09 to
7c214ea
Compare
|
@serhiy-storchaka - should I force push to trigger a retest? There was a flakey upstream apt failure |
|
I re-ran it. |
|
Try to load |
Good catch! I found a few other weird cases that regress:
|
PR summary
Fix a deterministic SIGSEGV in
marshal.loads()caused byTYPE_TUPLE,TYPE_LIST,TYPE_DICT, andTYPE_SETregistering containers inp->refsviaR_REF()before their slots are populated. A crafted payload with aTYPE_REFback-reference to the partial container can reach a hashing site (PySet_Addcallingtuplehash), which callsPyObject_Hash(NULL)on unfilled slots.The fix adopts the existing two-phase
r_ref_reserve()/r_ref_insert()pattern already used byTYPE_FROZENSET,TYPE_CODE, andTYPE_SLICE. ThePy_Noneplaceholder inp->refsis detected by the existingTYPE_REFhandler at line 1675, which raisesValueError("bad marshal data (invalid reference)")instead of crashing.16-byte reproducer:
Includes regression tests for tuple, list, set, and dict self-reference payloads.
Originally filed as GHSA-m7gv-g5p9-9qqq; PSRT assessed as outside the security threat model (marshal.loads is documented as not secure against malicious data). Converting to public issue + PR per their guidance.
First time committer introduction
I use Python extensively in data science and legal tech. I found this while building an automated vulnerability scanner seeded from prior CPython CVE fix diffs (CVE-2018-20406, CVE-2022-48560). Happy to iterate on the patch.
AI Disclosure
Claude Code (Anthropic) assisted with grepping the codebase, drafting the byte-stream reproducer, and constructing the regression tests. I reviewed and understand all code changes. The fix follows the existing
r_ref_reserve/r_ref_inserttwo-phase pattern already used by TYPE_FROZENSET/TYPE_CODE/TYPE_SLICE in the same file.