gh-148284: Block inlining of gigantic functions in ceval.c for clang 22 #148334

Merged
Fidget-Spinner merged 6 commits into python:main from Fidget-Spinner:block_inlining_3.15
Apr 10, 2026

@Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Apr 10, 2026

It seems that with clang-22, the inliner is too aggressive on _PyEval_EvalFrameDefault when building the computed-goto interpreter. Combined with some strange interaction with the stackref buffer, this makes the function require 40 kB of stack space (!!!) instead of the usual 1-2 kB.

This restricts inlining in ceval.c to functions that use at most 512 bytes of stack space (1/4 of the normal value). I checked the disassembly and the new function uses about 2 kB of stack.
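The PR title says the limit applies to clang 22 only. As a rough illustration of that kind of compiler-version gate, here is a hypothetical Python sketch: `ceval_cflags` and its `min_major` parameter are made up for this example, and the real check lives in CPython's build configuration, where the exact version condition may differ (for example, it may test for exactly 22).

```python
import re

# The flag this PR adds for ceval.c (CFLAGS_CEVAL).
INLINE_LIMIT_FLAG = "-finline-max-stacksize=512"


def ceval_cflags(cc_version_output: str, min_major: int = 22) -> str:
    """Hypothetical: return extra CFLAGS for ceval.c from `$CC --version` text.

    The `>= min_major` condition is an assumption made for illustration.
    """
    match = re.search(r"clang version (\d+)\.", cc_version_output)
    if match is None:
        # Not clang (e.g. GCC): leave the inliner settings alone.
        return ""
    major = int(match.group(1))
    return INLINE_LIMIT_FLAG if major >= min_major else ""


print(repr(ceval_cflags("clang version 22.1.3")))  # '-finline-max-stacksize=512'
print(repr(ceval_cflags("clang version 21.1.8")))  # ''
print(repr(ceval_cflags("gcc (GCC) 15.2.1")))      # ''
```

Keying the flag on the compiler version keeps other compilers and older clang releases on their default inlining behaviour.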


📚 Documentation preview 📚: https://cpython-previews--148334.org.readthedocs.build/

@Fidget-Spinner
Member Author

Tested with

RANLIB="/usr/bin/llvm-ranlib-22" LLVM_PROFDATA="/usr/bin/llvm-profdata-22" LLVM_AR="/usr/bin/llvm-ar-22" CC="clang-22" CXX="clang++-22" LDFLAGS="-fuse-ld=lld-22" ./configure --enable-optimizations --with-lto --enable-shared  && make clean && make -j18

test_call fails on main and passes on this branch. Please help me check it if you can. Thank you!

@Fidget-Spinner
Member Author

So it seems that whether the original bug reproduces on main is nondeterministic 😨. However, I still believe this fix is right.

@Fidget-Spinner Fidget-Spinner requested a review from vstinner April 10, 2026 14:50
@vstinner
Member

I built the Python main branch (without the fix) 3 times in a row from scratch (git clean -fdx): the first two builds didn't reproduce the issue (stack memory per call: 1.2 kB), but the third build did (stack memory per call: 33.1 kB).


I built the Python main branch on Fedora Rawhide (clang 22.1.3) with:

./configure --enable-optimizations --with-lto --enable-shared CC=clang LD=clang
time make -j14

On the first two builds, LD_LIBRARY_PATH=$PWD ./python -m test -v test_call passed (a failure was expected). On the third build, the test failed.

Script:

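import _testcapi
import functools

def py_call():
    # Current C stack pointer, returned as an integer address
    return _testcapi.stack_pointer()

def call():
    return py_call()

start = _testcapi.stack_pointer()
# functools.partial is implemented in C, so calling through it makes the
# call go through C code and re-enter the interpreter loop
call = functools.partial(call)
top = call()
# The C stack grows downward on common platforms, so start - top is the
# stack consumed by that nested call
diff = start - top
print("stack memory per call: %.1f kB" % (diff / 1024))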

Output on the first two builds:

stack memory per call: 1.2 kB

Output on the third build:

stack memory per call: 33.1 kB

Note: I installed clang and its dependencies using sudo dnf install clang llvm-ar compiler-rt.
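To put the two measurements in perspective, here is a back-of-the-envelope sketch of how many such nested calls would fit in a fixed C stack. The 8 MiB stack size is an assumption (a common Linux default for the main thread), not something measured in this thread, and the estimate ignores everything else on the stack; `max_nesting` is a hypothetical helper.

```python
# ASSUMPTION: 8 MiB C stack (a common Linux main-thread default).
STACK_KB = 8 * 1024


def max_nesting(stack_kb: float, kb_per_call: float) -> int:
    """Rough count of nested calls that fit, ignoring all other stack use."""
    return int(stack_kb / kb_per_call)


# Per-call figures reported by the script above.
for label, kb in {"good build": 1.2, "bad build": 33.1}.items():
    print(f"{label}: ~{max_nesting(STACK_KB, kb)} nested calls")

print(f"bad/good ratio: {33.1 / 1.2:.1f}x")
```

Under that assumption, the bad build fits roughly 247 nested calls instead of about 6826, about 27.6 times fewer.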

@vstinner
Member

To get a more reproducible output, I tried using ./python -I -c pass as the "profile task".

Using ./configure --enable-optimizations --with-lto --enable-shared CC=clang LD=clang PROFILE_TASK='-I -c pass', I get a smaller stack memory usage: stack memory per call: 0.8 kB.

I built the Python main branch 3 times with these options and got the same result each time: test_call passed and stack memory per call: 0.8 kB. I can't reproduce the issue with these options.

@vstinner
Member

Ok, now testing the fix. I built Python with the fix 5 times in a row and test_call succeeded each time. The fix is reliable!

  • Build 1: success (stack memory per call: 1.2 kB)
  • Build 2: success (stack memory per call: 1.2 kB)
  • Build 3: success (stack memory per call: 1.2 kB)
  • Build 4: success (stack memory per call: 1.2 kB)
  • Build 5: success (stack memory per call: 1.2 kB)

Commands:

./configure --enable-optimizations --with-lto --enable-shared CC=clang LD=clang
time make -j14
LD_LIBRARY_PATH=$PWD ./python -m test -v test_call

Result: test_call passes and my script outputs stack memory per call: 1.2 kB.

Moreover, the C flags are set as expected in Makefile:

$ grep ^CFLAGS_CEVAL Makefile
CFLAGS_CEVAL= -finline-max-stacksize=512
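The grep check can also be scripted, for example in a CI step. The following is a minimal sketch: `make_var` is a hypothetical helper that handles only simple `NAME= value` assignments and ignores `+=`, `:=`, line continuations and included makefiles.

```python
import re
from typing import Optional


def make_var(makefile_text: str, name: str) -> Optional[str]:
    """Return the value of a simple `NAME= value` line, or None if absent."""
    pattern = rf"^{re.escape(name)}\s*=(.*)$"
    match = re.search(pattern, makefile_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Same line as in the grep output above.
sample = "CFLAGS_CEVAL= -finline-max-stacksize=512\n"
print(make_var(sample, "CFLAGS_CEVAL"))  # -finline-max-stacksize=512
```

When run from the build directory, calling `make_var(open("Makefile").read(), "CFLAGS_CEVAL")` should return the same value as the grep.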

@vstinner vstinner left a comment

LGTM. The fix works as expected according to my manual tests (see above).

In general, I'm not a fan of using a compiler option that depends on the compiler version. In this case, I'm fine with it: it's a good trade-off.

@Fidget-Spinner Fidget-Spinner merged commit e007631 into python:main Apr 10, 2026
53 checks passed
@Fidget-Spinner Fidget-Spinner deleted the block_inlining_3.15 branch April 10, 2026 16:52
@Fidget-Spinner Fidget-Spinner added the needs backport to 3.14 bugs and security fixes label Apr 10, 2026
@Fidget-Spinner
Member Author

Thanks @vstinner for the very thorough testing and review :).

@miss-islington-app

Thanks @Fidget-Spinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 10, 2026
…clang 22 (pythonGH-148334)

(cherry picked from commit e007631)

Co-authored-by: Ken Jin <kenjin@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-app

bedevere-app bot commented Apr 10, 2026

GH-148349 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Apr 10, 2026
Fidget-Spinner added a commit that referenced this pull request Apr 10, 2026
… clang 22 (GH-148334) (GH-148349)

gh-148284: Block inlining of gigantic functions in ceval.c for clang 22 (GH-148334)
(cherry picked from commit e007631)

Co-authored-by: Ken Jin <kenjin@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Member

Since the bug occurs randomly with clang 22 (rough estimate: 1 in 3 builds affected), I double-checked clang 21.

I built the Python main branch (without the fix) multiple times on Fedora Stable (clang 21.1.8) and could not reproduce the issue. I confirm that the issue is specific to clang 22.

  • Build 1: success (stack memory per call: 1.2 kB)
  • Build 2: success (stack memory per call: 1.2 kB)
  • Build 3: success (stack memory per call: 1.2 kB)
  • Build 4: success (stack memory per call: 1.2 kB)
  • Build 5: success (stack memory per call: 1.2 kB)
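As a rough way to gauge how much five clean builds say, here is a small worked calculation. It assumes the "1 in 3 builds affected" estimate above holds and that builds are independent. Both assumptions are only approximations. `p_all_clean` is a hypothetical helper.

```python
from fractions import Fraction

# Rough estimate from this thread: about 1 in 3 clang 22 builds affected.
P_AFFECTED = Fraction(1, 3)


def p_all_clean(n_builds: int, p_affected: Fraction = P_AFFECTED) -> Fraction:
    """Probability that n independent builds all look fine at that bug rate."""
    return (1 - p_affected) ** n_builds


p5 = p_all_clean(5)
print(f"{p5} = {float(p5):.1%}")  # 32/243 = 13.2%
```

At that rate, five clean builds would still happen about 13% of the time even if the bug were present. That makes them good, but not conclusive, evidence. The same applies to the five clean builds with the fix.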

By the way, we do have Fedora Rawhide Clang buildbots (AArch64, AMD64, PPC64LE, s390x) with Clang 22.1.3, but they don't test the LTO+PGO case:

  • "Clang" flavor tests ./configure CC=clang LD=clang --with-pydebug (clang -Og)
  • "Clang Installed" flavor tests ./configure CC=clang LD=clang (clang -O3)

@Fidget-Spinner
Member Author

@vstinner if you want to test: 3.14 reliably crashes every single time I build it with clang-22 before my patch was applied (i.e. at commit 429c1d3). You can try it with clang-21 to see whether it crashes.

@vstinner
Member

I built commit 429c1d3 with clang 21 (Fedora Stable) and could not reproduce the issue: test_call passes and my script gives stack memory per call: 2.3 kB.
