Bug report
Bug description:
Summary
xml.etree.ElementTree has no limit on element nesting depth when parsing XML. A /* FIXME: check stack size? */ comment in Modules/_elementtree.c (inside treebuilder_done) has existed since at least Python 3.x, acknowledging this gap.
While Expat itself parses iteratively, the resulting Element tree is a deeply linked structure. When Python later traverses or garbage-collects a deeply nested element tree, it does so recursively (e.g., element_gc_clear → Py_DECREF → destructor chain). This recursive traversal can overflow the C stack and crash the interpreter (SIGSEGV) for sufficiently deep documents.
Note: element.__deepcopy__ already uses _Py_EnterRecursiveCall (line ~822), so the risk from recursive traversal was evidently known, but no parse-time guard was ever added.
Minimal Reproducible Example
import xml.etree.ElementTree as ET
# Deeply nested XML — no limit is enforced during parsing
depth = 100_000
xml_data = b'<a>' * depth + b'</a>' * depth
# Parse succeeds, but the resulting tree can cause a C stack overflow
# during garbage collection or deepcopy (SIGSEGV / crash)
root = ET.fromstring(xml_data)
del root  # dropping the last reference tears the tree down recursively; potential crash
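Because a crash would take the interpreter down with it, it may be more convenient to run the reproduction in a child process and inspect the exit code. The following is only a sketch: run_repro and REPRO are hypothetical names introduced here, not part of CPython, and the depths in the usage note are arbitrary.

```python
import subprocess
import sys

# Hypothetical helper script (not from the original report): same steps as the
# example above, with the nesting depth taken from the command line.
REPRO = """
import sys
import xml.etree.ElementTree as ET
depth = int(sys.argv[1])
root = ET.fromstring(b'<a>' * depth + b'</a>' * depth)
del root
"""


def run_repro(depth: int) -> int:
    """Run the reproduction in a child interpreter and return its exit code.

    An exit code of 0 means the child parsed and released the tree normally;
    any other value means it raised or was terminated.
    """
    result = subprocess.run([sys.executable, "-c", REPRO, str(depth)])
    return result.returncode
```

Calling run_repro(10) should return 0 on any build; trying increasing depths (for example 1_000, 100_000) shows whether and where a given build starts to fail, without losing the calling process.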
Root Cause
treebuilder_handle_start() in Modules/_elementtree.c pushes to self->stack with no upper bound check on self->index (the current nesting depth). The stack list grows without limit.
The FIXME at treebuilder_done() reads:
/* FIXME: check stack size? */
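To confirm how deep a parsed tree really is without adding more recursion, the depth can be measured with an explicit stack. This is a minimal sketch; max_depth is a hypothetical helper name, not an ElementTree API.

```python
import xml.etree.ElementTree as ET


def max_depth(root: ET.Element) -> int:
    """Return the deepest nesting level in the tree, counting the root as 1."""
    deepest = 0
    stack = [(root, 1)]  # explicit stack of (element, level) pairs
    while stack:
        element, level = stack.pop()
        deepest = max(deepest, level)
        # Iterating over an Element yields its direct children.
        stack.extend((child, level + 1) for child in element)
    return deepest
```

For a document built as b'<a>' * n + b'</a>' * n, this returns n, which matches the observation that no bound is applied while the tree is built.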
Expected Behavior
Parsing XML whose nesting depth exceeds a safe limit should raise xml.etree.ElementTree.ParseError with a descriptive message, instead of silently building a tree that can crash the interpreter later.
Proposed Fix
Add a MAX_XML_NESTING_DEPTH constant (e.g., 5000) and check self->index against it at the top of treebuilder_handle_start():
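/* File scope in Modules/_elementtree.c */
#define MAX_XML_NESTING_DEPTH 5000

/* At the top of treebuilder_handle_start(), before the element is pushed */
if (self->index >= MAX_XML_NESTING_DEPTH) {
    PyErr_Format(st->parseerror_obj,
                 "xml nesting depth limit (%d levels) exceeded",
                 MAX_XML_NESTING_DEPTH);
    return NULL;
}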
Also update Doc/library/xml.rst to document "deeply nested elements" as a known XML attack vector (it is currently missing from the security section, which lists only 4 vectors: billion laughs, quadratic blowup, decompression bomb, large tokens).
Note on _Py_EnterRecursiveCall
_Py_EnterRecursiveCall is not suitable here. In Python 3.12+, it checks the actual C machine stack pointer — but Expat calls treebuilder_handle_start iteratively at a constant C depth, so the stack pointer never triggers the check. A dedicated depth counter is required.
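The dedicated-counter idea can be illustrated in pure Python, as a sketch of the proposed behavior rather than the proposed patch. DepthLimitedTreeBuilder and fromstring_limited are hypothetical names introduced here; the limit value and message text are taken from the proposal above. Because the target is a TreeBuilder subclass, the parser calls its Python-level start() and end() methods, so this is likely slower than the built-in path.

```python
import xml.etree.ElementTree as ET

# Same example value as the proposed C constant; an assumption, not a CPython setting.
MAX_XML_NESTING_DEPTH = 5000


class DepthLimitedTreeBuilder(ET.TreeBuilder):
    """Hypothetical TreeBuilder that keeps its own nesting-depth counter."""

    def __init__(self, max_depth=MAX_XML_NESTING_DEPTH, **kwargs):
        super().__init__(**kwargs)
        self._max_depth = max_depth
        self._depth = 0  # number of currently open elements

    def start(self, tag, attrs):
        # Mirrors the proposed check: refuse to open an element past the limit.
        if self._depth >= self._max_depth:
            raise ET.ParseError(
                f"xml nesting depth limit ({self._max_depth} levels) exceeded"
            )
        self._depth += 1
        return super().start(tag, attrs)

    def end(self, tag):
        self._depth -= 1
        return super().end(tag)


def fromstring_limited(data, max_depth=MAX_XML_NESTING_DEPTH):
    """Parse data, raising ParseError if nesting exceeds max_depth."""
    parser = ET.XMLParser(target=DepthLimitedTreeBuilder(max_depth=max_depth))
    parser.feed(data)
    return parser.close()
```

The counter is tied to element start and end events rather than to the C call stack, which is why it works even though Expat drives the handlers iteratively.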
CPython versions tested on:
3.16
Operating systems tested on:
Windows
Linked PRs