gh-150942: Speed up ElementTree attribute parsing #151209

Open
deadlovelll wants to merge 2 commits into python:main from deadlovelll:gh-150942-elementtree-attrib-take2
@deadlovelll deadlovelll commented Jun 9, 2026

Contributor

While building the attrib dictionary, expat_start_handler() in Modules/_elementtree.c calls PyDict_SetItem() followed by two Py_DECREF() calls. This PR replaces that sequence with the reference-stealing _PyDict_SetItem_Take2() and drops the pair of Py_DECREF() calls.
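
For context, here is a minimal sketch of the dictionary in question as seen from Python. It only illustrates the observable behaviour, which this change is meant to leave untouched; it says nothing about the C implementation. The sample element is an assumption, shaped like the ones in the benchmark documents.

```python
import xml.etree.ElementTree as ET

# Parse a single element shaped like those in the benchmark documents
# (the concrete attribute names here are illustrative only).
elem = ET.fromstring('<item a0="v0" a1="v1"/>')

# Each XML attribute becomes one key/value entry in the element's
# attrib mapping, which is a plain dict.
print(type(elem.attrib).__name__, elem.attrib)
```

Every attribute parsed adds one entry to this dict, which is why the benchmark varies the number of attributes per element.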

Benchmark:

import pyperf
import xml.etree.ElementTree as ET


def make_doc(n_elements, n_attrs):
    attrs = " ".join(f'a{i}="v{i}"' for i in range(n_attrs))
    body = "".join(f"<item {attrs}/>" for _ in range(n_elements))
    return f"<root>{body}</root>"


DOCS = {
    "many-attrs-per-elem": make_doc(n_elements=2000, n_attrs=16),
    "few-attrs-many-elems": make_doc(n_elements=20000, n_attrs=2),
    "wide-attrs": make_doc(n_elements=200, n_attrs=64),
}


if __name__ == "__main__":
    runner = pyperf.Runner()
    for name, xml in DOCS.items():
        runner.bench_func(f"ET.fromstring {name}", ET.fromstring, xml)

Results:

+------------------------------------+---------+-----------------------+
| Benchmark                          | base    | opt                   |
+====================================+=========+=======================+
| ET.fromstring many-attrs-per-elem  | 9.07 ms | 8.51 ms: 1.07x faster |
+------------------------------------+---------+-----------------------+
| ET.fromstring few-attrs-many-elems | 20.3 ms | 19.6 ms: 1.03x faster |
+------------------------------------+---------+-----------------------+
| ET.fromstring wide-attrs           | 3.29 ms | 3.08 ms: 1.07x faster |
+------------------------------------+---------+-----------------------+
| Geometric mean                     | (ref)   | 1.06x faster          |
+------------------------------------+---------+-----------------------+
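
As a sanity check on the summary row, the sketch below recomputes the per-benchmark speedups and their geometric mean from the rounded timings in the table. It assumes the geometric mean is taken over the base/opt ratios; pyperf works from unrounded measurements, so individual ratios recomputed this way can differ from the table in the last digit.

```python
import math

# (base, opt) mean timings in milliseconds, copied from the results table.
timings = {
    "many-attrs-per-elem": (9.07, 8.51),
    "few-attrs-many-elems": (20.3, 19.6),
    "wide-attrs": (3.29, 3.08),
}

# Speedup of the optimised build relative to the baseline.
speedups = {name: base / opt for name, (base, opt) in timings.items()}

# Geometric mean: the n-th root of the product of the n ratios.
geo_mean = math.prod(speedups.values()) ** (1 / len(speedups))

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.3f}x")
print(f"geometric mean: {geo_mean:.3f}x")
```

Rounded to two decimals, the recomputed geometric mean agrees with the 1.06x reported in the table.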

@eendebakpt eendebakpt left a comment

Contributor

Thanks!

Member

This is a pretty small optimisation, and the entry does not need internal details, so I suggest removing it.

Contributor Author

Thanks for the feedback! The news entry has been removed.

picnixz commented Jun 9, 2026

Member

I'm not very fond of using private functions in extension modules. Is there a plan to make those _Py* functions part of the public C API? This is one reason why we switched to PyUnicodeWriter even though we could have kept the private implementation.
