Merged
79 commits
2b74021
Declare explicit interning routines
encukou Jun 10, 2024
0aedb83
Use _PyUnicode_InternStatic for the statically allocated stuff
encukou Jun 10, 2024
aad79b2
Add _PyUnicode_InternStatic and extra checks
encukou Jun 10, 2024
5413223
Check against immortalizing un-interned strings
encukou Jun 10, 2024
23845fb
Add _PyUnicode_InternImmortal & make `marshal` use it
encukou Jun 10, 2024
9ece220
Remove special case that makes previously-immortal strings STATIC
encukou Jun 10, 2024
7c62641
Split _PyUnicode_InternMortal and _PyUnicode_InternImmortal
encukou Jun 10, 2024
62a35a1
Do the refcnt dance in _PyUnicode_InternMortal & _PyUnicode_ClearInte…
encukou Jun 10, 2024
ac55a05
Deallocate mortal interned strings
encukou Jun 10, 2024
2466ae9
Use _PyUnicode_InternMortal in codecs
encukou Jun 10, 2024
032ac17
Start a notes document
encukou Jun 10, 2024
198d9c6
Handle attempts to "overwrite" interned heap types by static ones
encukou Jun 10, 2024
86ccb08
Intern statically allocated non-identifier strings at init
encukou Jun 11, 2024
e34b8da
Parenthesize the LATIN1 macro argument
encukou Jun 11, 2024
66338fe
Don't create the per-interp interned_dict until after InitStaticStrings
encukou Jun 11, 2024
9f16cb0
Move hashmap destroy to _PyUnicode_ClearInterned (symmetry with creat…
encukou Jun 14, 2024
e27abfc
Special-case short string singletons
encukou Jun 11, 2024
89f24df
Verify we don't add process-global entries after a per-interp dict ex…
encukou Jun 11, 2024
b965acf
More editing of the InternalDocs write-up
encukou Jun 11, 2024
4b69712
Only readjust refleak tests for *immortal* interned strings
encukou Jun 12, 2024
b2c9865
Be pedantic with the ref total
encukou Jun 12, 2024
cf7cb72
Split InternInPlace in sysmodule
encukou Jun 12, 2024
85f9fe0
Split InternInPlace in import
encukou Jun 12, 2024
a288389
Split InternInPlace in getargs
encukou Jun 12, 2024
6036cb1
Use mortal strings in type_setattro
encukou Jun 12, 2024
01f2dbf
Use mortal string in type_module
encukou Jun 12, 2024
73f7fb3
Use mortal strings for object attributes
encukou Jun 12, 2024
13be4e7
Use mortal strings for code object names
encukou Jun 12, 2024
1930919
Use mortal strings for code constants
encukou Jun 12, 2024
d45f20b
Use mortals in pickle
encukou Jun 12, 2024
348d95c
Use mortals for PyDict_SetItemString keys
encukou Jun 12, 2024
04f080e
Use mortals in operator: methodcaller_new
encukou Jun 12, 2024
afe5400
Use mortals in operator: attrgetter_new
encukou Jun 12, 2024
fe0b8c5
Simplify logic in _PyUnicode_InternImmortal
encukou Jun 12, 2024
1116191
Immortalize ill interned strings in the free-threaded build
encukou Jun 12, 2024
f10e521
Rewrite the write-up
encukou Jun 12, 2024
0787f8f
Restore immortalization for PyDict_SetItemString.
encukou Jun 12, 2024
0d56eba
Intern single-byte (latin1) strings at startup in free-threaded build
encukou Jun 12, 2024
8b32762
Make the three sets of singletons disjoint
encukou Jun 12, 2024
24bf76a
One more single-char string in _Py_STR
encukou Jun 13, 2024
2fb04fd
Use a less unwieldy name
encukou Jun 14, 2024
26fa26e
Adjust comments & writeup
encukou Jun 14, 2024
9b14dbb
Don't call _Py_SetImmortal on strings.
encukou Jun 14, 2024
ac6dfae
Beef up the tests
encukou Jun 14, 2024
a9e91b1
Fix comment
encukou Jun 14, 2024
ee0f068
A bit more thought-through error handling for failed PyDict_Pop
encukou Jun 14, 2024
61bf404
Switch parser to PyUnicode_InternImmortal
encukou Jun 14, 2024
80ce95b
Touch up comments
encukou Jun 14, 2024
70aa294
Switch public PyUnicode API to _PyUnicode_InternImmortal
encukou Jun 14, 2024
de2ff7f
Add an assert to _PyUnicode_EqualToASCIIId
encukou Jun 14, 2024
0e6744e
Remove #ifdef Py_DEBUG from the body of _PyUnicode_ClearInterned.
encukou Jun 14, 2024
c50e151
Consolidate the interning logic
encukou Jun 14, 2024
08798d0
Fix the free-threading initialization
encukou Jun 14, 2024
62959cd
Typo in comments
encukou Jun 14, 2024
f62ccc6
Add blurb
encukou Jun 14, 2024
86cf124
Guard call to debug function
encukou Jun 14, 2024
f2e857e
Avoid -bb warnings in tests
encukou Jun 14, 2024
6011c05
Add typing to a clinic function
encukou Jun 14, 2024
7e8d727
Work around build failure on macOS clang
encukou Jun 14, 2024
ccb7f42
Silence a mypy error
encukou Jun 17, 2024
e0bb1c2
_PyCodec_Lookup: Immortalize key on success
encukou Jun 17, 2024
975e4ba
getargs.c: Immortalize the kwtuple keys
encukou Jun 17, 2024
1c05a60
Don't re-mortalize interned immortals at interpreter shutdown (in non…
encukou Jun 17, 2024
5ac3c5f
Avoid `case` label on a declaration (invalid in standard C and, fortu…
encukou Jun 17, 2024
686d2b6
Merge in the main branch
encukou Jun 18, 2024
7d79d10
Remove PyUnicode_InternImmortal from the header
encukou Jun 19, 2024
d4eb879
Move _Py_LATIN1_CHR to pycore_global_strings.h
encukou Jun 19, 2024
f7df09a
Remove mistaken check in _pickle.c
encukou Jun 19, 2024
fe7fb13
Comment/doc clarifications, rewordings; PEP-7 style
encukou Jun 19, 2024
7a75099
Add a pedantic DECREF
encukou Jun 19, 2024
ac402d8
Use more straightforward signatures for the internal functions
encukou Jun 19, 2024
aa58c01
Group _PyUnicode_Intern funcs in the header
encukou Jun 19, 2024
929d0bc
Break out init_global_interned_strings & clear_global_interned_strings
encukou Jun 19, 2024
44c0192
Merge in the main branch
encukou Jun 19, 2024
9e3ce44
Fix function declaration
encukou Jun 20, 2024
2ebf8a0
Fix return value
encukou Jun 20, 2024
bf49f61
Convert check to assert
encukou Jun 20, 2024
6d668e6
Limit _PyUnicode_InternStatic to runtime initialization
encukou Jun 20, 2024
fd8ca83
Add a comment for _Py_hashtable_new_full destroys
encukou Jun 20, 2024
More editing of the InternalDocs write-up
encukou committed Jun 14, 2024
commit b965acfc671b4fa3978557e79ed05dc5fb1edac5
48 changes: 36 additions & 12 deletions InternalDocs/string_interning.md
@@ -9,9 +9,16 @@ Strings (`PyUnicode_*`) can have the following properties:
- *immortal*: As with other immortal objects: exists until interpreter
finalization. Reference counting operations don't have any effect,
and can be safely skipped.
- *statically allocated*: `static`. Shared between interpreters.
- *singleton*: statically allocated in the core or deep-frozen modules
(but not extensions). Indicated by the `statically_allocated` flag.
These are:
- `_Py_ID` and `_Py_STR` strings collected by `make regen-global-objects`
(`Tools/build/generate_global_objects.py`); initialized and interned at
runtime initialization time, and
- the 256 length-1 latin-1 strings. These are not interned at init,
but special-cased.

The intern state is checked with PyUnicode_CHECK_INTERNED(s), and can:
The intern state is checked with PyUnicode_CHECK_INTERNED(s), and can be:

- `SSTATE_NOT_INTERNED`
- `SSTATE_INTERNED_MORTAL`
@@ -21,13 +28,9 @@ The intern state is checked with PyUnicode_CHECK_INTERNED(s), and can:
Every object starts out as `SSTATE_NOT_INTERNED`, and can be interned using
the following functions:

- `_PyUnicode_InternStatic`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternMortal`

All of these intern an object in place. That is: Before the call you should own
a strong reference to `*p`, and after the call you still own a reference
to `*p` but it might not be the same object.
- `_PyUnicode_InternStaticSingleton`

Note that interning a *statically allocated* singleton is *not* guaranteed to
give you a statically allocated object back. (This happens when someone
@@ -44,9 +47,17 @@ All the interning functions delegate to `_PyUnicode_InternStatic` when given
a *statically allocated* string.
Use the function directly if you know you have one.

`_PyUnicode_InternStatic` is guaranteed to give you an *immortal* string.
However, it does *not* guarantee a *statically allocated* object.
(Someone might have interned an equivalent heap string before you!)
If an equivalent heap string was interned *before* a call to
`_PyUnicode_InternStatic`, you'll get the previously interned string back.
That means this function does *not* guarantee a *statically allocated* object
(i.e. the interned string might not be shared between interpreters).
However, you are guaranteed to get an *immortal* string.

Statically allocated strings are interned in their own, process-wide lookup
table. However, the interpreter-specific table takes priority; for example:
- interpreter A interns heap string "foo" into its dict
- interpreter B interns static string "foo" into the shared lookup table
- A interns a new string "foo"; it must get the one it interned previously
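
The priority rule in the example above can be illustrated with a toy model. This is only a sketch: `Str`, `Process`, and `Interp` are hypothetical names, plain Python dicts stand in for the C-level tables, and the fallback from a heap string to the shared table is an assumption rather than something this write-up states.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity, like distinct C objects
class Str:
    value: str
    static: bool = False  # stands in for "statically allocated"


class Process:
    def __init__(self):
        # Hypothetical stand-in for the process-wide table of static strings.
        self.static_table = {}


class Interp:
    def __init__(self, process):
        self.process = process
        # Hypothetical stand-in for the per-interpreter interned dict.
        self.interned = {}

    def intern(self, s):
        # 1. The interpreter-specific table takes priority.
        found = self.interned.get(s.value)
        if found is not None:
            return found
        # 2. Static strings go into (or come from) the shared table.
        if s.static:
            return self.process.static_table.setdefault(s.value, s)
        # 3. Assumption: heap strings reuse an existing shared entry.
        found = self.process.static_table.get(s.value)
        if found is not None:
            return found
        self.interned[s.value] = s
        return s


proc = Process()
interp_a, interp_b = Interp(proc), Interp(proc)

heap_foo = Str("foo")
static_foo = Str("foo", static=True)

first = interp_a.intern(heap_foo)       # A interns a heap "foo"
shared = interp_b.intern(static_foo)    # B interns a static "foo"
again = interp_a.intern(Str("foo"))     # A interns a new "foo"
```

In this model `again` is the heap object that A interned first, not the static one, which matches the third step of the example.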


## Interning with immortalization
@@ -57,5 +68,18 @@ having it *statically allocated*.

As with any immortal object, you should only use this for strings that will
live until interpreter shutdown.
We currently also use it for strings contained in code objects and similar;
these are “close enough” to immortal.
We currently also use it for strings contained in code objects and similar.
These are “close enough” to immortal: even in use cases like hot reloading
or `eval`-ing user input, the number of distinct variable names and string
constants is expected to stay low.


## Interning mortal strings

For mortal strings, 2 references from the interned dict (key and value) are
excluded from the object's refcount. When a string is deallocated,

Reference counting: `_PyUnicode_InternMortal` takes ownership of (“steals”)
the reference to its argument, and updates the argument with a new reference.
This makes it “reference neutral”. Beware that the function must not be called
with a borrowed reference.
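
The reference bookkeeping itself is only visible from C, but the "argument is updated" part has a Python-level analogue in the public `sys.intern` function: the caller hands in one object and continues with whatever comes back. A minimal sketch, assuming nothing about which internal interning routine `sys.intern` uses:

```python
import sys

# Build two equal strings at runtime so they are separate objects
# rather than one shared compile-time constant.
a = "".join(["inter", "ning demo"])
b = "".join(["inter", "ning demo"])

# Interning maps every equal string to one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)

# The caller should keep using the returned object, which may differ
# from the one passed in (here, `ib` need not be `b`).
same_object = ia is ib
```

The C functions express the same idea by overwriting the caller's pointer instead of returning a new one, which is why the caller must own the reference it passes in.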