Skip to content

gh-149146: Add dealloc-depth counter fallback to trashcan trigger#151595

Open
bhuvi27 wants to merge 1 commit into
python:mainfrom
bhuvi27:gh-149146-tuple-memoryerror-cleanup
Open

gh-149146: Add dealloc-depth counter fallback to trashcan trigger#151595
bhuvi27 wants to merge 1 commit into
python:mainfrom
bhuvi27:gh-149146-tuple-memoryerror-cleanup

Conversation

@bhuvi27

@bhuvi27 bhuvi27 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Fixes #149146.

while True: b = (b, None) under RLIMIT_AS segfaults during cleanup:

b = 9
while True:
    b = (b, None)

Docker with ulimit -v 524288: MemoryErrorlost sys.stderr → segfault, exit 139.

The trashcan deposit in _Py_Dealloc only fires when _Py_RecursionLimit_GetMargin(tstate) < 2, which compares the stack pointer against c_stack_soft_limit. Under RLIMIT_AS the kernel SIGSEGVs trying to grow the stack while SP is still nowhere near the soft limit — issue #150722 has an LLDB trace from the same reporter showing SP ~7 MB above c_stack_hard_limit at the fault. So the trashcan never deposits and the recursive tuple_dealloc → Py_XDECREF → _Py_Dealloc → tuple_dealloc chain just runs into the wall.

This adds a per-thread c_dealloc_depth counter and uses it as a secondary trigger inside _Py_Dealloc, depositing at depth 50 (matches the old _PyTrash_UNWIND_LEVEL). The stack-pointer margin stays the primary signal; the counter only kicks in when SP can't.

One thing I missed on the first pass: the drain side has the same problem. margin >= 4 is also true in this scenario, so every _Py_Dealloc exit would re-enter _PyTrash_thread_destroy_chain from inside the deallocator it just called. Fix: destroy_chain bumps the counter for the duration of the drain (same idea as the old delete_nesting), and the drain check in _Py_Dealloc becomes just c_dealloc_depth == 0. Counter is only touched for GC types so non-GC types pay nothing.

Tests: new test_trashcan_nested_tuple_deep in test_gc.py (100k nested tuples, must clean up without crash); test_gc test_exceptions test_capi all green locally. Docker repro now exits 1 with MemoryError instead of 139 with Segmentation fault.

#150722 is the same SP-signal blindness in the comparison path (Py_EnterRecursiveCall) — separate code path, separate PR.

Under memory pressure (for example RLIMIT_AS), Python's trashcan trigger
in _Py_Dealloc never fires. The trigger relies on
_Py_RecursionLimit_GetMargin, which compares the machine stack pointer
against c_stack_soft_limit. When RLIMIT_AS prevents the kernel from
growing the C stack, the kernel SIGSEGVs while the stack pointer is
still megabytes above the soft limit (see sibling issue pythongh-150722 for
an LLDB trace showing SP ~7.2 MB above c_stack_hard_limit at the
SIGSEGV), so the trashcan never deposits and the recursive tuple_dealloc
chain runs out the stack.

Add a per-thread c_dealloc_depth counter as a fallback trigger,
mirroring the historical _PyTrash_UNWIND_LEVEL=50 protection that was
removed when the trashcan was consolidated into _Py_Dealloc.
_Py_RecursionLimit_GetMargin remains the primary signal; the counter
only kicks in when the stack-pointer signal cannot fire.

_PyTrash_thread_destroy_chain itself bumps c_dealloc_depth for the
duration of the drain so any _Py_Dealloc invoked while draining cannot
recursively re-enter destroy_chain (which would rebuild the same
unbounded recursion the trashcan exists to prevent). This mirrors the
historical delete_nesting bookkeeping.

The counter is per-tstate (no atomics needed in the free-threaded
build) and is only touched for GC types, so non-GC types pay zero
overhead.

Add a regression test in test_gc that builds a 100,000-deep
(b, None) tuple chain and asserts that ``del b`` cleans it up
without a C-stack overflow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segfault when cleaning up after a MemoryError with deeply nested objects

1 participant