gh-149590: Remove faulthandler_traverse#150023
Conversation
`faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not by the module instance. With multi-phase init allowing multiple module instances, each instance's GC traversal decrements `gc_refs` on the same runtime-owned objects, driving it negative when two instances are collected simultaneously.
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Thanks @armaan-v924 for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15. |
|
Sorry, @armaan-v924 and @JelleZijlstra, I could not cleanly backport this to |
|
GH-150037 is a backport of this pull request to the 3.15 branch. |
|
Sorry, @armaan-v924 and @JelleZijlstra, I could not cleanly backport this to |
|
Thanks @armaan-v924! Are you able to create the backport PRs for 3.13 and 3.14? It should be possible with cherry-picker plus some small merge conflict fixes. |
gh-149590: Remove faulthandler_traverse (GH-150023) `faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not by the module instance. With multi-phase init allowing multiple module instances, each instance's GC traversal decrements `gc_refs` on the same runtime-owned objects, driving it negative when two instances are collected simultaneously. (cherry picked from commit 5673748) Co-authored-by: Armaan Vakharia <43391096+armaan-v924@users.noreply.github.com>
Yea, for sure. I'll take care of it tomorrow. |
tldr;
faulthandler_traversevisits Python objects owned by_PyRuntime, not by the module instance. With multi-phase init allowing multiple module instances, each instance's GC traversal decrementsgc_refson the same runtime-owned objects, driving it negative when two instances are collected simultaneously. Cleanup is already handled well, traversal is redundant, and harmless with a single instantiation.Expanded Explanation:
RCA:
BG:
faulthandleruses multi-phase init, so deleting it fromsys.modulesand re-importing produces a second, independent module object. Both module objects are tracked by the garbage collector and both havemd_state_traverse = faulthandler_traversestored.subtract_refsiterates every object in the GC generation list.Py_TYPE(op)->tp_traverse— for module objects that'smodule_traverseinObjects/moduleobject.cmodule_traversecallsmd_state_traverse(i.e.faulthandler_traverse) then visitsmd_dictfaulthandler_traversecallsPy_VISIT(fatal_error.file)(expands tovisit_decref(_PyRuntime.faulthandler.fatal_error.file))gc_refsonfatal_error.file(stderr, by default) is decremented twicefatal_error.file's real refcount only has one reference from_PyRuntime;gc_refsgoes negative → debug assertion fires, silent UAF in releaseSafety
m_traverseexists to tell the GC about Python objects owned by per-module C state (md_state, allocated whenm_size > 0).faulthandlerhasm_size = 0, so there is no per-module C state.fatal_error.file,thread.file, anduser_signals[signum].fileare all owned by_PyRuntime, not by any module instance._PyRuntimeis not a Python object and is never incontainers, so its references are never passed tovisit_decref, they contribute toob_refcntbut are never subtracted, meaninggc_refs >= 1aftersubtract_refs. GC will never mark these objects as unreachable regardless of whether the module traverse visits them. Cleanup is already handled correctly:faulthandler_disable()callsPy_CLEAR(fatal_error.file)(and equivalents), and_PyFaulthandler_Fini()callsfaulthandler_disable()at shutdown. No GC involvement is needed.Functionality
From main:

After removal:
