Closed
vstinner approved these changes (Feb 11, 2017)

Member, Author: test failed even with sphinx-1.4.9
paulmon added a commit to paulmon/cpython that referenced this pull request (Jan 10, 2019)

Win arm32 fix tests
gnprice added a commit to gnprice/cpython that referenced this pull request (Aug 28, 2019)

TODO:
- news etc.?
- test somehow? at least make sure semantic tests are adequate
- that "older version" path... shouldn't it be MAYBE?
- mention explicitly in commit message that *this* is the actual algorithm from UAX #15
- think if there are counter-cases where this is slower. If caller treats MAYBE same as NO... e.g. if caller actually just wants to normalize? May need to parametrize and offer both behaviors.

This lets us return a NO answer instead of MAYBE when that's what a Quick_Check property tells us; or also when that's what the canonical combining classes tell us, after a Quick_Check property has said "maybe".

At a quick test on my laptop, the existing code takes about 6.7 ms/MB (so 6.7 ns per byte) when the quick check returns MAYBE and it has to do the slow comparison:

    $ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
        'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 6.67 msec per loop

With this patch, it gets the answer instantly (78 ns) on the same 1 MB string:

    $ ./python -m timeit -s 'import unicodedata; s = "\uf900"*500000' -- \
        'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 78 nsec per loop
gnprice added a commit to gnprice/cpython that referenced this pull request (Aug 28, 2019)

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop
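The commit message defines the contract. `is_normalized(form, s)` must give the same answer as `normalize(form, s) == s`, and only the speed may differ. Below is a minimal sketch of that reference semantics. It assumes Python 3.8+ for `unicodedata.is_normalized` and skips the comparison on older versions. `is_normalized_slow`, `fast_path_agrees`, and the sample strings are illustrative and do not come from the commit.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_normalized_slow(form: str, s: str) -> bool:
    """Reference semantics: s is normalized iff normalizing it changes nothing."""
    return unicodedata.normalize(form, s) == s


def fast_path_agrees(samples) -> bool:
    """Compare unicodedata.is_normalized (3.8+) with the slow definition."""
    if not hasattr(unicodedata, "is_normalized"):
        return True  # nothing to compare against on older Pythons
    return all(
        unicodedata.is_normalized(form, s) == is_normalized_slow(form, s)
        for s in samples
        for form in FORMS
    )


# Precomposed e-acute, decomposed e + U+0301, a CJK compatibility ideograph,
# a repeated combining overlay, and a Hangul syllable.
samples = ["abc", "\u00e9", "e\u0301", "\uf900" * 3, "\u0338" * 3, "\uac00"]
print(fast_path_agrees(samples))
```

A quick-check shortcut is only valid if it never changes these answers. In the benchmark above, both versions return the same result, and only the amount of work differs.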
gnprice added a commit to gnprice/cpython that referenced this pull request (Aug 29, 2019)
benjaminp pushed a commit that referenced this pull request (Sep 4, 2019)

…H-15558)

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master!

    $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 561 usec per loop

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 512 usec per loop
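UAX #15 describes the quick-check loop as follows:
- Track the previous canonical combining class.
- Answer NO if a nonzero class appears out of order.
- Answer NO if a character's Quick_Check property is No.
- Otherwise answer YES, or MAYBE if any character was Maybe.

The sketch below specializes that loop to NFD, where the property has no Maybe values, so the answer is always definitive. It assumes NFD_QC can be derived by normalizing a single character: NFD_QC is No exactly for characters with a canonical decomposition. CPython's real implementation is C code driven by generated tables. `nfd_qc` and `quick_check_nfd` are hypothetical names.

```python
import unicodedata

YES, NO = "YES", "NO"


def nfd_qc(ch: str) -> str:
    """Derive NFD_Quick_Check for one code point.

    Assumption: a character is NFD_QC=No exactly when NFD would change it,
    i.e. when it has a canonical decomposition (including Hangul syllables).
    """
    return NO if unicodedata.normalize("NFD", ch) != ch else YES


def quick_check_nfd(s: str) -> str:
    """UAX #15-style quick check, specialized to NFD (no MAYBE results)."""
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)  # canonical combining class
        # Nonzero combining classes must be in non-decreasing order.
        if last_ccc > ccc and ccc != 0:
            return NO
        # A character with a canonical decomposition can never appear in NFD.
        if nfd_qc(ch) == NO:
            return NO
        last_ccc = ccc
    return YES


print(quick_check_nfd("a\u0316\u0301"))    # classes 220 then 230: in order
print(quick_check_nfd("\uf900" * 500000))  # stops at the first character
```

In this sketch, the benchmark string `"\uf900"*500000` is rejected at its first character, because U+F900 has a canonical decomposition. That is why the answer does not depend on the string's length. A conservative variant that returns MAYBE there instead would force the full normalize-and-compare.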
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request (Sep 4, 2019)

…orithm. (GH-15558)

The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.

Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:

    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
        -- 'unicodedata.is_normalized("NFD", s)'
    5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master!

    $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 561 usec per loop

    $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
        -- 'unicodedata.normalize("NFD", s)'
    500 loops, best of 5: 512 usec per loop

(cherry picked from commit 2f09413)

Co-authored-by: Greg Price <gnprice@gmail.com>
miss-islington added a commit that referenced this pull request (Sep 4, 2019)
lisroach pushed a commit to lisroach/cpython that referenced this pull request (Sep 10, 2019)
DinoV pushed a commit to DinoV/cpython that referenced this pull request (Jan 14, 2020)
emmatyping referenced this pull request in emmatyping/cpython (Mar 16, 2020)

Now we can also remove `__setstate__`.
nanjekyejoannah added a commit to nanjekyejoannah/cpython that referenced this pull request (Dec 1, 2022)

16: Warn for specific thread module methods r=ltratt a=nanjekyejoannah

Don't merge until #13 and #14 are merged; some helper code cuts across. This replaces #15.

Threading module notes

Python 2:

```
>>> from thread import get_ident
>>> from threading import get_ident
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name get_ident
>>> import threading
>>> from threading import _get_ident
>>>
```

Python 3:

```
>>> from threading import get_ident
>>> from thread import get_ident
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'thread'
>>>
```

**Note:** There is no neutral way of porting.

Co-authored-by: Joannah Nanjekye <jnanjekye@python.org>
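As the note says, no single import statement works on both versions. The usual workaround is a try/except import fallback. The sketch below assumes that approach; it is not taken from the commit, and `current_thread_id` is a hypothetical helper name.

```python
# Compatibility shim (an assumption, not part of the commit above):
# prefer the Python 3 location and fall back to the Python 2 `thread` module.
try:
    from threading import get_ident  # Python 3.3+
except ImportError:
    # Python 2: threading has no public get_ident; the low-level module does.
    from thread import get_ident


def current_thread_id():
    """Return an integer identifying the calling thread while it is alive."""
    return get_ident()
```

This keeps the version check in one place, so the rest of the code calls `current_thread_id()` without caring which module supplied it.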
This was referenced Feb 11, 2025
Eclips4 pushed a commit to Eclips4/cpython that referenced this pull request (Nov 17, 2025)

This commit updates the build system to automatically detect cargo and enable or disable _base64 without needing to pass a flag. If cargo is unavailable, _base64 is disabled. It also updates cpython-sys to use a handwritten header (which is what Linux seems to do) and splits off the parser bindings to be handled in the future (since those files are included differently).
Eclips4 pushed a commit to Eclips4/cpython that referenced this pull request (Jan 1, 2026)

Co-authored-by: Emma Smith <emma@emmatyping.dev>
SonicField added a commit to SonicField/cpython that referenced this pull request (Apr 19, 2026)

borrowed_regs stored PhxRegState* pointers into live_regs.values (a flat hash map). When phx_sm_get_or_create triggered sm_grow (at 16+ entries), all values were relocated to new memory, leaving borrowed_regs with dangling pointers. The next invalidate_bs_impl call read freed memory → SIGSEGV.

C++ unordered_map had reference stability (node-based storage); our C PhxStateMap (open addressing) does not.

Fix: store model register keys (void*) instead of value pointers, and look up PhxRegState by key via phx_sm_get on each access. Model keys are HIR Register* pointers that live in the graph, not in the hash map.

Bug #15: triggered by the nbody benchmark (~28 simultaneous live registers from nested loops + float temporaries, past the 16-entry sm_grow threshold; initial capacity 32).
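This failure mode is generic to open-addressing tables. Entries live inline in the slot array, so a grow moves them and any saved location goes stale. Looking an entry up again by key stays correct.

Python has no raw pointers, so the sketch below uses a slot index as a stand-in for a cached `PhxRegState*`. `OpenAddressingMap` and its methods are hypothetical and only loosely mirror `PhxStateMap`. The sketch also assumes CPython's `hash(n) == n` for small ints, which makes the slot positions predictable.

```python
class OpenAddressingMap:
    """Toy linear-probing map whose entries live inline in a slot array."""

    def __init__(self, capacity: int = 8) -> None:
        self._slots = [None] * capacity  # each slot is None or (key, value)
        self._size = 0

    def _probe(self, key) -> int:
        """Return the slot holding key, or the empty slot where it would go."""
        mask = len(self._slots) - 1  # capacity stays a power of two
        i = hash(key) & mask
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) & mask
        return i

    def _grow(self) -> None:
        """Double the table and re-insert every entry, so entries move."""
        old = self._slots
        self._slots = [None] * (2 * len(old))
        for entry in old:
            if entry is not None:
                self._slots[self._probe(entry[0])] = entry

    def set(self, key, value) -> None:
        if 2 * (self._size + 1) > len(self._slots):  # keep load factor <= 0.5
            self._grow()
        i = self._probe(key)
        if self._slots[i] is None:
            self._size += 1
        self._slots[i] = (key, value)

    def slot_of(self, key) -> int:
        """'Borrow' the current location of key (unsafe across mutations)."""
        i = self._probe(key)
        if self._slots[i] is None:
            raise KeyError(key)
        return i

    def entry_at(self, slot: int):
        return self._slots[slot]

    def get(self, key):
        """Safe access: look the key up again every time."""
        return self._slots[self.slot_of(key)][1]


m = OpenAddressingMap(capacity=8)
m.set(9, "r9")                # lands in slot 9 & 7 == 1
borrowed_slot = m.slot_of(9)  # like caching a PhxRegState* (the bug)
borrowed_key = 9              # like caching the model key (the fix)
for k in (1, 2, 3, 4):        # the insert of key 4 triggers _grow()
    m.set(k, f"r{k}")
```

After the grow, `borrowed_slot` (1) holds key 1's entry rather than key 9's. This is the analogue of the dangling pointer. `m.get(borrowed_key)` still returns `"r9"`.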
SonicField added a commit to SonicField/cpython that referenced this pull request (Apr 19, 2026)

phx_rc_kill_registers cached PhxRegState* pointers from phx_sm_get in a local RegCopy array, then iterated over it, calling phx_rc_kill_register, which calls phx_sm_erase. Erase rehashes subsequent probe-chain entries, potentially moving them and invalidating the cached pointers.

Fix: store model keys and kind (int) in RegCopy, and re-look up PhxRegState via phx_sm_get before each kill_register call. Same pattern as bug #15.
SonicField added a commit to SonicField/cpython that referenced this pull request (Apr 22, 2026)

Per supervisor 2026-04-22 03:06:55Z + theologian 03:07:12Z + pythia #58: push 44 introduces the W3 R4 oracle dispatcher in compiler.cpp behind #ifdef RC_ORACLE. The push-44 nm production-binary check is one-shot; we need a STANDING gate assertion so future compiler.cpp edits cannot silently leak RC_ORACLE dispatch into production.

Failure mode caught: any future commit that drops, inverts, or accidentally hard-defines the #ifdef RC_ORACLE guard would leak the C++ rc_oracle dispatch path (linked from libphoenix_rc_oracle.a) into the production python binary. Without this assertion, the leak is invisible until the next manual nm audit. Same silent-failure class as the cp-||-true loophole (catch #4, push 38): a bad state accepted silently.

Implementation (5 LOC after BINARY_MATCH (clean) ✓):

    RC_ORACLE_LEAK=$(nm "$PYTHON" | grep -c 'rc_oracle')
    if [ "$RC_ORACLE_LEAK" -ne 0 ]; then
        echo BINARY_RC_ORACLE_LEAK_DETECTED ...
        exit 1
    fi
    echo "BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)"

Verbatim wording per gatekeeper item #15 (03:07:25Z):
- PASS: 'BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)'
- FAIL: 'BINARY_RC_ORACLE_LEAK_DETECTED' + FATAL + exit 1
- Mirrors BINARY_DIRTY discipline (catch silent failure structurally)

Verification (compile-clean pre-commit):
- bash -n scripts/gate_phoenix.sh: SYNTAX OK
- Inserted at line 120 (immediately after the BINARY_MATCH block at line 119).

Bundled into push 44 (rather than a standalone push 45) because the dispatcher lands in this push. The leak check therefore guards it from day 1 instead of leaving a one-push window where item #15 isn't enforced.

Push 44 batch grows 3 → 4 commits:
- THIS COMMIT: gate item #15 (RC_ORACLE leak assertion)
- 63568c0: W3 Step 5 expansion (4 injection classes + invariant #7)
- 4f591a1: W3 Step 5 v1 (rc_oracle_self_test.sh)
- a99db92: W3 Steps 1-4 (scratch lib + dispatcher)

ABBA cap usage: 17 → 18 (4 commits this push).
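The same leak check can be expressed in Python for tooling that consumes `nm` output outside bash. Only the PASS/FAIL marker strings come from the commit. The function names and the `(passed, message)` return shape are assumptions and not part of gate_phoenix.sh. Parsing text rather than piping keeps the check unit-testable without a built binary.

```python
import subprocess
from typing import Tuple

PASS_MSG = "BINARY_RC_ORACLE_OK: production binary clean (0 rc_oracle symbols)"
FAIL_MSG = "BINARY_RC_ORACLE_LEAK_DETECTED"


def count_matching_lines(nm_output: str, needle: str) -> int:
    """Count lines containing needle, mirroring `grep -c needle`."""
    return sum(1 for line in nm_output.splitlines() if needle in line)


def rc_oracle_gate(nm_output: str) -> Tuple[bool, str]:
    """Return (passed, message) for the production-binary leak check."""
    leaks = count_matching_lines(nm_output, "rc_oracle")
    if leaks:
        return False, f"{FAIL_MSG}: {leaks} rc_oracle symbol line(s)"
    return True, PASS_MSG


def check_binary(path: str) -> Tuple[bool, str]:
    """Run nm on a binary and apply the gate (requires nm on PATH)."""
    result = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return rc_oracle_gate(result.stdout)
```

A caller would exit nonzero when `passed` is false, matching the `exit 1` in the shell version.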
SonicField added a commit to SonicField/cpython that referenced this pull request (Apr 22, 2026)

Bug surfaced during Step 5 execution at HEAD 9cbf413:

    /data/users/alexturner/phoenix/cpython/Python/jit/compiler.cpp:136:10: error: expected unqualified-id
    ...:138:5: error: use of undeclared identifier 'rc_oracle_run'

Root cause: my push 44 W3 dispatcher (a99db92 Step 3.5, also in 6450421c93 amended → a99db92) declared `extern "C" int rc_oracle_run(...)` inside the function body. C++17 [dcl.link] forbids linkage-specifications in block scope; they may only appear at namespace scope. Production builds (RC_ORACLE undefined) MASKED the bug because the entire #ifdef block is absent, so the linkage-specification only reaches the parser when RC_ORACLE is defined. The push 44 gate caught nothing because no python build defines RC_ORACLE.

Fix: move `extern "C" int rc_oracle_run(void *func);` to file scope (line 32, between the #include block and `namespace jit`), guarded by the same #ifdef RC_ORACLE so production builds remain unaffected. Inside the function body, just call rc_oracle_run() (now visible via the file-scope declaration).

Verification:
- Production cmake --build phoenix_jit + make python: PASS, 0 errors (RC_ORACLE undefined → both forward declaration and dispatcher absent)
- Out-of-band -DRC_ORACLE=1 compile of compiler.cpp: PASS (3.2 MB compiler_rc_oracle.cpp.o), 0 errors
- nm production python | grep rc_oracle: 0 matches (item #15 falsifier still satisfied)

Push 45 (gate-script fix follow-up). This discovery and fix are themselves the push 44 W3 oracle's first real value. The synthetic-injection infrastructure surfaced a real linkage bug in the dispatcher that production gates would never have caught (RC_ORACLE not defined → block absent → bug invisible).
Sphinx 1.5 is stricter.
We should fix the problems it reports before using Sphinx 1.5 on Travis.