xml.etree.ElementTree: missing nesting depth limit allows DoS via deeply nested XML #151618

@NaveenKumarG-dev

Bug report

Bug description:

Summary

xml.etree.ElementTree has no limit on element nesting depth when parsing XML. A /* FIXME: check stack size? */ comment in Modules/_elementtree.c (inside treebuilder_done) has existed since at least Python 3.x, acknowledging this gap.

While Expat itself parses iteratively, the resulting Element tree is a deeply linked structure. When Python later traverses or garbage-collects a deeply nested element tree, it does so recursively (e.g., element_gc_clear → Py_DECREF → destructor chain). This recursive traversal can overflow the C stack and crash the interpreter (SIGSEGV) for sufficiently deep documents.

Note: element.__deepcopy__ already uses _Py_EnterRecursiveCall (line ~822), proving the risk from recursive traversal was known — but the parse-time guard was never added.

Minimal Reproducible Example

import xml.etree.ElementTree as ET

# Deeply nested XML — no limit is enforced during parsing
depth = 100_000
xml_data = b'<a>' * depth + b'</a>' * depth

# Parse succeeds, but the resulting tree can cause a C stack overflow
# during garbage collection or deepcopy (SIGSEGV / crash)
root = ET.fromstring(xml_data)
del root  # GC traverses the tree recursively — potential crash

Root Cause

treebuilder_handle_start() in Modules/_elementtree.c pushes to self->stack with no upper bound check on self->index (the current nesting depth). The stack list grows without limit.

The FIXME at treebuilder_done() reads:

/* FIXME: check stack size? */
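
As a rough illustration of the depth that self->index tracks, the sketch below counts element nesting with xml.parsers.expat callbacks and builds no Element tree at all. The helper name max_nesting_depth is hypothetical (it is not part of the standard library) and only mirrors the counter conceptually; it is not how _elementtree.c is implemented.

```python
from xml.parsers import expat


def max_nesting_depth(data):
    """Return the deepest element nesting level found in `data` (bytes or str).

    Hypothetical helper: it mirrors the idea of a per-parse depth counter
    and never allocates Element objects.
    """
    depth = 0
    deepest = 0

    def start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)

    def end(name):
        nonlocal depth
        depth -= 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)  # True marks this as the final chunk
    return deepest
```

A caller could use such a pre-scan to reject untrusted input before handing it to ET.fromstring, at the cost of parsing the document twice.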

Expected Behavior

Parsing XML whose nesting depth exceeds a safe limit should raise xml.etree.ElementTree.ParseError with a descriptive message, instead of silently building a tree that can crash the interpreter later.
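
The expected behavior can be approximated today in pure Python, as a minimal sketch. It assumes a TreeBuilder subclass passed as the XMLParser target; the names DepthLimitedTreeBuilder and fromstring_limited are hypothetical, and the default limit of 5000 is an arbitrary value chosen for illustration.

```python
import xml.etree.ElementTree as ET


class DepthLimitedTreeBuilder(ET.TreeBuilder):
    """Hypothetical TreeBuilder that refuses documents nested too deeply."""

    def __init__(self, max_depth=5000, **kwargs):
        super().__init__(**kwargs)
        self._max_depth = max_depth
        self._depth = 0

    def start(self, tag, attrs):
        # Reject the element before it is added to the tree.
        if self._depth >= self._max_depth:
            raise ET.ParseError(
                f"xml nesting depth limit ({self._max_depth} levels) exceeded"
            )
        self._depth += 1
        return super().start(tag, attrs)

    def end(self, tag):
        self._depth -= 1
        return super().end(tag)


def fromstring_limited(data, max_depth=5000):
    """Parse `data` like ET.fromstring, but enforce a nesting depth limit."""
    parser = ET.XMLParser(target=DepthLimitedTreeBuilder(max_depth=max_depth))
    parser.feed(data)
    return parser.close()
```

Because the target is a subclass rather than the built-in TreeBuilder itself, every element goes through Python-level method calls, so this is slower than the default path. It is a workaround sketch, not a substitute for a check in the C code.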

Proposed Fix

Add a MAX_XML_NESTING_DEPTH constant (e.g., 5000) and check self->index against it at the top of treebuilder_handle_start():

#define MAX_XML_NESTING_DEPTH 5000

if (self->index >= MAX_XML_NESTING_DEPTH) {
    PyErr_Format(st->parseerror_obj,
                 "xml nesting depth limit (%d levels) exceeded",
                 MAX_XML_NESTING_DEPTH);
    return NULL;
}

Also update Doc/library/xml.rst to document "deeply nested elements" as a known XML attack vector (it is currently missing from the security section, which lists only 4 vectors: billion laughs, quadratic blowup, decompression bomb, large tokens).

Note on _Py_EnterRecursiveCall

_Py_EnterRecursiveCall is not suitable here. In Python 3.12+, it checks the actual C machine stack pointer — but Expat calls treebuilder_handle_start iteratively at a constant C depth, so the stack pointer never triggers the check. A dedicated depth counter is required.
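
The constant-depth claim can be observed from Python, using the Python frame depth as a stand-in for the C stack depth (an assumption of this sketch, not a direct measurement of the C stack). The helper handler_frame_depths is hypothetical; it records how many Python frames are active each time Expat invokes the start-element callback.

```python
import sys
from xml.parsers import expat


def handler_frame_depths(data):
    """Return the Python frame depth seen at each start-element callback."""
    depths = []

    def frame_depth():
        # Walk back from the caller's frame and count the active frames.
        frame = sys._getframe(1)
        count = 0
        while frame is not None:
            count += 1
            frame = frame.f_back
        return count

    def start(name, attrs):
        depths.append(frame_depth())

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return depths
```

For a document such as b'<a>' * 200 + b'</a>' * 200, every recorded value is the same, because each callback is dispatched from the same Parse call rather than from a nested one. A guard based on call-stack depth therefore has nothing to detect at parse time.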

CPython versions tested on:

3.16

Operating systems tested on:

Windows

Metadata

Labels

extension-modules: C modules in the Modules dir
pending: The issue will be closed if no feedback is provided
topic-XML
type-bug: An unexpected behavior, bug, or error
