gh-151618: Add nesting depth limit to xml.etree.ElementTree TreeBuilder#151620

Closed
NaveenKumarG-dev wants to merge 3 commits into python:main from NaveenKumarG-dev:gh-151618-elementtree-nesting-depth-limit

Conversation

@NaveenKumarG-dev

Fixes gh-151618.

Summary

xml.etree.ElementTree had no limit on XML element nesting depth. A /* FIXME: check stack size? */ comment in Modules/_elementtree.c acknowledged this gap. Without a guard, parsing deeply nested XML silently builds an arbitrarily deep Element tree that can crash the interpreter (SIGSEGV) when Python's GC traverses it recursively (element_gc_clear → Py_DECREF → destructor chain).
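To make the shape of the input concrete, here is a minimal sketch of how such a document can be generated and how deep the resulting tree is. The helper names `nested_xml` and `tree_depth` are hypothetical and not part of the PR, and the depth is deliberately kept small (1000) so the example stays far from any crash; it only shows that the tree depth tracks the input nesting one to one.

```python
import xml.etree.ElementTree as ET


def nested_xml(depth):
    """Return an XML document made of `depth` nested <a> elements."""
    return "<a>" * depth + "</a>" * depth


def tree_depth(root):
    """Measure element depth with an explicit stack, so the measurement
    itself never recurses."""
    deepest = 0
    stack = [(root, 1)]
    while stack:
        elem, level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in elem)
    return deepest


# A modest depth, chosen to stay well within safe territory.
root = ET.fromstring(nested_xml(1000))
print(tree_depth(root))  # 1000
```

The parser accepts the document without complaint, and nothing in the resulting tree records that it is unusually deep.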

Changes

Modules/_elementtree.c

  • Added #define MAX_XML_NESTING_DEPTH 5000 above treebuilder_handle_start
  • Added depth check: raises ParseError("xml nesting depth limit (5000 levels) exceeded") when self->index >= MAX_XML_NESTING_DEPTH
  • Replaced the FIXME comment in treebuilder_done with an accurate comment
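The C change itself is not reproduced here, but the idea of the guard can be sketched in pure Python as a parser target that counts open elements. This is an illustration under stated assumptions, not the PR's implementation: `DepthLimitedBuilder` and `parse_limited` are hypothetical names, the counter stands in for `self->index`, and the exact boundary (how many levels are accepted before the error) may differ from the C code.

```python
import xml.etree.ElementTree as ET

# Mirrors the PR's MAX_XML_NESTING_DEPTH; everything else here is hypothetical.
MAX_DEPTH = 5000


class DepthLimitedBuilder:
    """Parser target that wraps TreeBuilder and rejects overly deep nesting."""

    def __init__(self, max_depth=MAX_DEPTH):
        self._builder = ET.TreeBuilder()
        self._depth = 0  # number of currently open elements
        self._max_depth = max_depth

    def start(self, tag, attrs):
        # Check before opening the element, as the PR does in
        # treebuilder_handle_start.
        if self._depth >= self._max_depth:
            raise ET.ParseError(
                f"xml nesting depth limit ({self._max_depth} levels) exceeded"
            )
        self._depth += 1
        return self._builder.start(tag, attrs)

    def end(self, tag):
        self._depth -= 1
        return self._builder.end(tag)

    def data(self, text):
        return self._builder.data(text)

    def close(self):
        return self._builder.close()


def parse_limited(text, max_depth=MAX_DEPTH):
    """Parse `text`, raising ParseError if nesting exceeds `max_depth`."""
    parser = ET.XMLParser(target=DepthLimitedBuilder(max_depth))
    parser.feed(text)
    return parser.close()


try:
    parse_limited("<a>" * 6 + "</a>" * 6, max_depth=5)
except ET.ParseError as exc:
    print(exc)
```

In this sketch a document with exactly `max_depth` levels parses normally and one more level raises `ParseError`. A wrapper like this is also a way for applications to enforce their own limit without any change to the C accelerator.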

Lib/test/test_xml_etree.py

  • Added NestingDepthTest with 6 tests (deep nesting, shallow nesting, exact boundary conditions, TreeBuilder, XMLParser)
  • Tests are skipped for the pure-Python ET and run fully via test_xml_etree_c.py

Doc/library/xml.rst

  • Added "deeply nested elements" as a 5th XML security attack vector

Misc/NEWS.d/

  • Added NEWS blurb

Why not _Py_EnterRecursiveCall?

In Python 3.12+, _Py_EnterRecursiveCall checks the C machine stack pointer, not a counter. Expat calls treebuilder_handle_start iteratively at a constant C depth — the pointer check never triggers. A dedicated counter on self->index is required.
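The "iterative at constant depth" part of this argument can be observed from Python with `xml.parsers.expat`. The sketch below counts Python frames inside the start-element callback for every element of a nested document; the helper names are hypothetical. It looks only at Python frames, not at the C stack, so it is an analogy for the claim rather than a check of what `_Py_EnterRecursiveCall` does.

```python
import sys
import xml.parsers.expat


def frame_depth():
    """Count the Python frames on the current call stack."""
    depth = 0
    frame = sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def callback_stack_depths(xml_text):
    """Return the set of call-stack depths seen across all start callbacks."""
    depths = set()
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        depths.add(frame_depth())

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return depths


# 200 nested elements, yet every callback runs at the same stack depth.
depths = callback_stack_depths("<a>" * 200 + "</a>" * 200)
print(len(depths))  # 1
```

Because the callback for the 200th nested element runs at the same stack depth as the first, a check based on stack position has nothing to react to, which is why the PR counts open elements instead.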

Test Results

Ran 255 tests in 2.786s
OK (skipped=7)

0 regressions. 6 new NestingDepthTest cases all pass via test_xml_etree_c.

…eBuilder

Add a MAX_XML_NESTING_DEPTH constant (5000 levels) in treebuilder_handle_start() to prevent C stack overflows caused by deeply nested XML documents. When the limit is exceeded, ParseError is raised with a descriptive message instead of silently building a tree that could crash the interpreter during GC or deepcopy.

The FIXME comment /* FIXME: check stack size? */ in treebuilder_done() is replaced with an accurate comment explaining where the guard lives.

Note: _Py_EnterRecursiveCall is not suitable here because in Python 3.12+ it checks the C machine stack pointer, but Expat calls treebuilder_handle_start iteratively at a constant depth, so the pointer check never triggers.

Also document 'deeply nested elements' as an XML DoS attack vector in Doc/library/xml.rst, which previously listed only 4 vectors.
@read-the-docs-community

read-the-docs-community Bot commented Jun 18, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #33197813 | 📁 Comparing cec6e61 against main (8d7c6dc)

  🔍 Preview build  

6 files changed · ± 6 modified

@picnixz

picnixz commented Jun 18, 2026

Member

I'm closing this PR because it's premature. We need a discussion first, especially to consider whether this is really an issue or not.

@picnixz picnixz closed this Jun 18, 2026
Development

Successfully merging this pull request may close these issues.

xml.etree.ElementTree: missing nesting depth limit allows DoS via deeply nested XML