Use Sphinx 1.4.9 for now by methane · Pull Request #15 · python/cpython

methane · 2017-02-11T02:09:51Z

Sphinx 1.5 is stricter.
We should fix the problems it reports before using Sphinx 1.5 on Travis.

methane · 2017-02-11T02:33:49Z

The test failed even with Sphinx 1.4.9.
#16 may fix it.

Win arm32 fix tests

TODO:
- news etc.?
- test somehow? at least make sure semantic tests are adequate
- that "older version" path... shouldn't it be MAYBE?
- mention explicitly in commit message that *this* is the actual algorithm from UAX #15
- think if there are counter-cases where this is slower. If caller treats MAYBE same as NO... e.g. if caller actually just wants to normalize? May need to parametrize and offer both behaviors.

This lets us return a NO answer instead of MAYBE when that's what a Quick_Check property tells us; or also when that's what the canonical combining classes tell us, after a Quick_Check property has said "maybe".

At a quick test on my laptop, the existing code takes about 6.7 ms/MB (so 6.7 ns per byte) when the quick check returns MAYBE and it has to do the slow comparison:

    $ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
        'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 6.67 msec per loop

With this patch, it gets the answer instantly (78 ns) on the same 1 MB string:

    $ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
        'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 78 nsec per loop
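The quick-check loop that this commit message refers to can be sketched in Python. This is only an illustration of the loop's shape as given in UAX #15, not the C code from the patch. The names `QC`, `nfd_quick_check_property`, and `quick_check` are made up for this sketch, and the NFD property is approximated from `unicodedata.decomposition` (canonical decomposition or Hangul syllable means NO), since the `unicodedata` module does not expose the Quick_Check properties directly.

```python
import unicodedata
from enum import Enum


class QC(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


def nfd_quick_check_property(ch: str) -> QC:
    """Approximate NFD_Quick_Check for one character (assumption, see lead-in)."""
    decomposition = unicodedata.decomposition(ch)
    # Compatibility decompositions are tagged like "<compat> ..."; skip those.
    has_canonical = bool(decomposition) and not decomposition.startswith("<")
    # Hangul syllables decompose algorithmically and report "" above.
    is_hangul_syllable = 0xAC00 <= ord(ch) <= 0xD7A3
    return QC.NO if has_canonical or is_hangul_syllable else QC.YES


def quick_check(s: str, property_of=nfd_quick_check_property) -> QC:
    """Quick-check loop: stop at the first character that proves NO."""
    last_ccc = 0
    result = QC.YES
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: definitely not normalized.
        if ccc != 0 and last_ccc > ccc:
            return QC.NO
        check = property_of(ch)
        if check is QC.NO:
            return QC.NO
        if check is QC.MAYBE:
            result = QC.MAYBE
        last_ccc = ccc
    return result
```

In this sketch, `quick_check("\uf900" * 500000)` returns `QC.NO` after inspecting a single character, which is the situation the benchmark above measures.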

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop
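The relationship between the fast function and the "just write the comparison" version can be checked directly. The sketch below is a small illustration: `is_normalized_slow` is a name made up for this example, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata


def is_normalized_slow(form: str, s: str) -> bool:
    """Reference answer: normalize the whole string, then compare."""
    return s == unicodedata.normalize(form, s)


samples = ["abc", "\uf900" * 4, "e\u0301", "\u00e9", "a\u0301\u0323"]
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for s in samples:
        # The fast function must agree with the slow definition.
        assert unicodedata.is_normalized(form, s) == is_normalized_slow(form, s)
```

Both functions give the same answers; the patch only changes how often the fast one can avoid building the normalized string.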

…H-15558)

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

    $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 561 usec per loop

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 512 usec per loop

…orithm. (pythonGH-15558)

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

    $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 561 usec per loop

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 512 usec per loop

(cherry picked from commit 2f09413)

Co-authored-by: Greg Price <gnprice@gmail.com>


Now we can also remove `__setstate__`.

16: Warn for specific thread module methods r=ltratt a=nanjekyejoannah

Don't merge until python#13 and python#14 are merged; some helper code cuts across.

This replaces python#15

Threading module Notes

Python 2:

```
>>> from thread import get_ident
>>> from threading import get_ident
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name get_ident
>>> import threading
>>> from threading import _get_ident
>>>
```

Python 3:

```
>>> from threading import get_ident
>>> from thread import get_ident
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'thread'
>>>
```

**Note:** There is no neutral way of porting

Co-authored-by: Joannah Nanjekye <jnanjekye@python.org>
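Since the two sessions above show that no single import line works on both versions, code that must run on both typically falls back from one import to the other. The following is a minimal sketch of that common workaround, not something taken from the commit itself; it assumes only the module names shown in the sessions above.

```python
try:
    # Python 3: get_ident lives in the threading module.
    from threading import get_ident
except ImportError:
    # Python 2: get_ident is only exported by the thread module.
    from thread import get_ident

# The identifier of the current thread is an integer on both versions.
current_thread_id = get_ident()
```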

This commit updates the build system to automatically detect cargo and enable or disable _base64 without needing to pass a flag. If cargo is unavailable, _base64 is disabled. It also updates cpython-sys to use a hand-written header (which is what Linux seems to do) and splits off the parser bindings to be handled in the future (since the files are included differently).

Co-authored-by: Emma Smith <emma@emmatyping.dev>

borrowed_regs stored PhxRegState* pointers into live_regs.values (flat hash map). When phx_sm_get_or_create triggered sm_grow (at 16+ entries), all values relocated to new memory, leaving borrowed_regs with dangling pointers. The next invalidate_bs_impl read freed memory → SIGSEGV.

C++ unordered_map had reference stability (node-based storage); our C PhxStateMap (open-addressing) does not.

Fix: store model register keys (void*) instead of value pointers. Look up PhxRegState by key via phx_sm_get on each access. Model keys are HIR Register* pointers that live in the graph, not the hash map.

Bug python#15: triggered by the nbody benchmark (~28 simultaneous live registers from nested loops + float temporaries, exceeding the 16-entry growth threshold of the initial capacity of 32).
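The failure pattern described here can be imitated in Python, with slot indices standing in for C pointers. This is a toy model only: `FlatMap` and its methods are invented for this sketch and are not the PhxStateMap API. It shows why a cached slot reference goes stale when an open-addressing table grows, while a lookup by key stays valid.

```python
class FlatMap:
    """Tiny open-addressing map; entries move to new slots when it grows."""

    def __init__(self, capacity=4):
        self.slots = [None] * capacity  # each slot is None or (key, value)
        self.count = 0

    def _probe(self, key, slots):
        # Linear probing: first slot that is empty or already holds key.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def slot_of(self, key):
        i = self._probe(key, self.slots)
        return i if self.slots[i] is not None else None

    def get(self, key):
        i = self.slot_of(key)
        return None if i is None else self.slots[i][1]

    def put(self, key, value):
        # Grow once the table would be more than half full.
        if (self.count + 1) * 2 > len(self.slots):
            self._grow()
        i = self._probe(key, self.slots)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def _grow(self):
        new_slots = [None] * (len(self.slots) * 2)
        for entry in self.slots:
            if entry is not None:
                new_slots[self._probe(entry[0], new_slots)] = entry
        self.slots = new_slots


m = FlatMap(capacity=4)
m.put(5, "state for r5")
cached_slot = m.slot_of(5)   # analogous to caching a value pointer
m.put(2, "state for r2")
m.put(3, "state for r3")     # third insert triggers growth and relocation
stale = m.slots[cached_slot]  # the cached slot no longer holds key 5
fresh = m.get(5)              # re-lookup by key still finds the value
```

In C the stale access reads freed memory rather than an empty slot, which is why the fix in the commit re-looks up by key on every access.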

phx_rc_kill_registers cached PhxRegState* pointers from phx_sm_get into a local RegCopy array, then iterated calling phx_rc_kill_register, which calls phx_sm_erase. Erase rehashes subsequent probe-chain entries, potentially moving them and invalidating cached pointers.

Fix: store model keys and kind (int) in RegCopy, re-lookup PhxRegState via phx_sm_get before each kill_register call. Same pattern as bug python#15.

Per supervisor 2026-04-22 03:06:55Z + theologian 03:07:12Z + pythia python#58: push 44 introduces the W3 R4 oracle dispatcher in compiler.cpp behind #ifdef RC_ORACLE. The push-44 nm production-binary check is one-shot; we need a STANDING gate assertion so future compiler.cpp edits cannot silently leak RC_ORACLE dispatch into production.

Failure mode caught: Any future commit that drops, inverts, or accidentally hard-defines the #ifdef RC_ORACLE guard would leak the C++ rc_oracle dispatch path (linked from libphoenix_rc_oracle.a) into the production python binary. Without this assertion, the leak is invisible until the next manual nm audit. Same silent-failure class as the cp-||-true loophole (catch python#4, push 38): accepted bad state silently.

Implementation (5 LOC after BINARY_MATCH (clean) ✓):

    RC_ORACLE_LEAK=$(nm $PYTHON | grep -c 'rc_oracle')
    if [ $RC_ORACLE_LEAK -ne 0 ]; then
      echo BINARY_RC_ORACLE_LEAK_DETECTED ...
      exit 1
    fi
    echo BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)

Verbatim wording per gatekeeper item python#15 (03:07:25Z):
- PASS: 'BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)'
- FAIL: 'BINARY_RC_ORACLE_LEAK_DETECTED' + FATAL + exit 1
- Mirrors BINARY_DIRTY discipline (catch silent failure structurally)

Verification (compile-clean pre-commit):

    bash -n scripts/gate_phoenix.sh: SYNTAX OK

Inserted at line 120 (immediately after BINARY_MATCH block at line 119).

Bundled into push 44 (rather than standalone push 45) because the dispatcher lands in this push: the leak-check guards it from day 1 instead of leaving a one-push window where item python#15 isn't enforced.

Push 44 batch grows 3 → 4 commits:
- THIS COMMIT: gate item python#15 (RC_ORACLE leak assertion)
- 63568c0: W3 Step 5 expansion (4 injection classes + invariant python#7)
- 4f591a1: W3 Step 5 v1 (rc_oracle_self_test.sh)
- a99db92: W3 Steps 1-4 (scratch lib + dispatcher)

ABBA cap usage: 17 → 18 (4 commits this push).
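The decision logic of the gate check can be restated in Python for clarity. This is a sketch under assumptions: the function names are invented here, the input is the text that `nm` would print (the sample symbol lines below are hypothetical), and the real check is the shell snippet in scripts/gate_phoenix.sh.

```python
def count_matching_symbols(nm_output: str, needle: str = "rc_oracle") -> int:
    """Count lines containing needle, mirroring what `grep -c` reports."""
    return sum(1 for line in nm_output.splitlines() if needle in line)


def gate_result(nm_output: str):
    """Return (exit_code, message) using the PASS/FAIL wording quoted above."""
    leaks = count_matching_symbols(nm_output)
    if leaks != 0:
        return 1, "BINARY_RC_ORACLE_LEAK_DETECTED"
    return 0, "BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)"


# Hypothetical nm output for a clean binary and for a leaking one.
clean_output = "0000000000401000 T main\n0000000000401200 T Py_Initialize\n"
leaky_output = clean_output + "0000000000409000 T rc_oracle_run\n"

clean_code, clean_message = gate_result(clean_output)
leaky_code, leaky_message = gate_result(leaky_output)
```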

Bug surfaced during Step 5 execution at HEAD 9cbf413:

    /data/users/alexturner/phoenix/cpython/Python/jit/compiler.cpp:136:10: error: expected unqualified-id
    ...:138:5: error: use of undeclared identifier 'rc_oracle_run'

Root cause: my push 44 W3 dispatcher (a99db92 Step 3.5, also in 6450421c93 amended → a99db92) declared `extern "C" int rc_oracle_run(...)` inside the function body. C++17 [dcl.link] forbids linkage-specifications in block scope; they may only appear at namespace scope.

Production builds (RC_ORACLE undefined) MASKED the bug because the entire #ifdef block is absent; the linkage-specification only enters the parser when RC_ORACLE is defined. The push 44 gate caught nothing because no python build defines RC_ORACLE.

Fix: move `extern "C" int rc_oracle_run(void *func);` to file scope (line 32, between the #include block and `namespace jit`), guarded by the same #ifdef RC_ORACLE so production builds remain unaffected. Inside the function body, just call rc_oracle_run() (now visible via the file-scope declaration).

Verification:
- Production cmake --build phoenix_jit + make python: PASS, 0 errors (RC_ORACLE undefined → both forward-decl and dispatcher absent)
- Out-of-band -DRC_ORACLE=1 compile of compiler.cpp: PASS (3.2 MB compiler_rc_oracle.cpp.o), 0 errors
- nm production python | grep rc_oracle: 0 matches (Item python#15 falsifier still satisfied)

Push 45 (gate-script fix follow-up). Discovery + fix is itself the push 44 W3 oracle's first real value: the synthetic-injection infrastructure surfaced a real linkage bug in the dispatcher that production gates would never have caught (RC_ORACLE not defined → block absent → bug invisible).

Use Sphinx 1.4.9 for now

d9c54db

the-knights-who-say-ni added the CLA signed label Feb 11, 2017

vstinner approved these changes Feb 11, 2017

methane closed this Feb 11, 2017

methane deleted the sphinx-1.4 branch February 11, 2017 02:33

paulmon added a commit to paulmon/cpython that referenced this pull request Jan 10, 2019

Merge pull request python#15 from paulmon/win-arm32-fix-tests

abaaa92

Win arm32 fix tests

gnprice mentioned this pull request Aug 28, 2019

bpo-37966: Fully implement the UAX #15 quick-check algorithm. #15558

Merged

gnprice added a commit to gnprice/cpython that referenced this pull request Aug 29, 2019

Move UAX #15 link to doc-comment.

27e8122

miss-islington mentioned this pull request Sep 4, 2019

[3.8] closes bpo-37966: Fully implement the UAX GH-15 quick-check algorithm. (GH-15558) #15671

Merged

emmatyping referenced this pull request in emmatyping/cpython Mar 16, 2020

Make __parameters__ lazy (#15)

e50136d

Now we can also remove `__setstate__`.

pablogsal mentioned this pull request Jun 12, 2020

bpo-40958: Avoid buffer overflow in the parser when indexing the current line #20842

Closed

itachaaa mentioned this pull request Aug 22, 2022

Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?) #91414

Open

mdboom mentioned this pull request Aug 25, 2022

Assert and incorrect error message when loading source file containing invalid UTF-8 #96268

Closed

mdboom mentioned this pull request Nov 22, 2022

Type punning (and strict aliasing) issue in Py_CLEAR() and Py_SETREF() macros: Python --enable-pystats is miscompiled #99701

Closed

ziegenbalg mentioned this pull request Nov 30, 2022

double free in io.TextIOWrapper #72573

Closed

gvanrossum mentioned this pull request Aug 22, 2023

heap-use-after-free in _PyFunction_LookupByVersion #108253

Closed

LinanV mentioned this pull request Oct 23, 2023

Objects/typeobject.c: No such file or directory. #111203

Closed

stasos24 mentioned this pull request Oct 24, 2023

Modules/cjkcodecs/_codecs_iso2022.c - read out of bounds #101180

Closed

williamhu020 mentioned this pull request Nov 5, 2023

Use the API C of 'Py-NewInterpreterFromConfig' to exit unexpectedly in multiple threads. #111751

Closed

kcatss mentioned this pull request Nov 15, 2023

Use-after-free in unregister() of atexit module #112127

Closed

kcatss mentioned this pull request Jan 14, 2024

crash in long_vectorcall in longobject.c #114050

Closed

kcatss mentioned this pull request Feb 12, 2024

Segmentation Fault in pthread_getcpuclockid function in time module #115378

Closed

kcatss mentioned this pull request Feb 20, 2024

Use After Free at _heapreplace_max #115706

Closed

ngoldbaum mentioned this pull request Sep 23, 2024

Crash running PyO3 tests with --test-threads=1000 #124375

Closed

Dean6767 mentioned this pull request Feb 10, 2025

🚨 Segmentation Fault in Python 3.10.12 when using OpenSSL (libcrypto.so.3) #129974

Closed

This was referenced Feb 11, 2025

segmentfault when pip installing setuptools #129992

Open

segmentfault when pip installing setuptools #129993

Closed

prashanthallu mentioned this pull request Mar 10, 2025

test_math.test_fma_zero_result() fails with the musl C library #131032

Closed

johndoe31415 mentioned this pull request May 8, 2025

Heisenbug that kills process via SIGALRM #133687

Open

DominiquePACCO mentioned this pull request Jun 3, 2025

PdfWriter add_page adds (re)writes old page instead of new #135062

Closed

efimov-mikhail mentioned this pull request Nov 5, 2025

memory leak in threading stack size #141044

Open

sergey-miryanov mentioned this pull request Dec 18, 2025

Observed memory leak in ssl library: Python 3.14 GC issue #142516

Open

Eclips4 pushed a commit to Eclips4/cpython that referenced this pull request Jan 1, 2026

Introduce rustfmt, clippy and fix their errors (python#15)

15c8e58

Co-authored-by: Emma Smith <emma@emmatyping.dev>

Qanux mentioned this pull request Feb 11, 2026

heap-buffer-overflow in functools.partial.__repr__() #144475

Closed

ngoldbaum mentioned this pull request Mar 6, 2026

Remove deprecated and undocumented function ctypes.SetPointerType #133866

Open

Use Sphinx 1.4.9 for now #15
methane wants to merge 1 commit into python:master from methane:sphinx-1.4

3 participants
