Bug report
Bug description:
Issue
ShareableList allocates string slots based on character count instead of UTF-8 byte count, so strings containing multi-byte characters can be truncated and then fail to decode on read. Additionally, the rstrip(b'\x00') applied when reading values strips legitimate trailing null bytes from bytes values.
Note: The trailing null bytes issue was originally reported in #106939 (July 2023) and documented as a known issue with a workaround. The proposed fix attempts to resolve both that long-standing issue and the newly discovered UTF-8 corruption bug.
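The size mismatch can be illustrated without shared memory. The following is a minimal sketch, not the actual ShareableList code: `ALIGNMENT` and `slot_size` are hypothetical names, and the rounding rule is an assumption meant to mirror how slots are padded to 8-byte multiples.

```python
import struct

ALIGNMENT = 8  # assumption: mirrors the 8-byte slot alignment used by ShareableList


def slot_size(length: int) -> int:
    """Smallest multiple of ALIGNMENT strictly greater than length (hypothetical helper)."""
    return ALIGNMENT * (length // ALIGNMENT + 1)


text = '0\U00010000\U00010000'
encoded = text.encode('utf-8')

char_slot = slot_size(len(text))     # sized from the 3 characters
byte_slot = slot_size(len(encoded))  # sized from the 9 UTF-8 bytes

# struct's 's' format silently truncates data that is longer than the slot.
truncated = struct.pack(f'{char_slot}s', encoded)

try:
    truncated.rstrip(b'\x00').decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    # The last 4-byte character was cut off after its third byte.
    decode_failed = True

# A slot sized from the byte count holds the whole value.
intact = struct.pack(f'{byte_slot}s', encoded)
round_tripped = intact.rstrip(b'\x00').decode('utf-8')
```

Sizing the slot from `len(item.encode('utf-8'))` rather than `len(item)` is the change this sketch argues for; whether the real fix takes exactly that form is not stated here.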
Reproducer
from multiprocessing.shared_memory import ShareableList
# String corruption: 3 characters but 9 UTF-8 bytes, so the slot is too small
sl = ShareableList(['0\U00010000\U00010000'])
print(sl[0])  # raises UnicodeDecodeError
sl.shm.close(); sl.shm.unlink()
# Bytes corruption: the trailing null byte is stripped on read
sl = ShareableList([b'\x00'])
print(repr(sl[0]))  # prints b'' instead of b'\x00'
sl.shm.close(); sl.shm.unlink()
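Until the stripping is fixed, one way to keep trailing null bytes is to append a non-null marker byte before storing and remove it after reading. The sketch below is illustrative only: `SENTINEL`, `wrap`, `unwrap`, and `simulate_slot_round_trip` are hypothetical names, and the round trip is a simplified model of padding plus `rstrip(b'\x00')`, not the real ShareableList code path.

```python
import struct

SENTINEL = b'\x01'  # hypothetical marker; any single non-null byte would do


def wrap(value: bytes) -> bytes:
    """Append the marker so the stored value never ends in a null byte."""
    return value + SENTINEL


def unwrap(stored: bytes) -> bytes:
    """Drop the marker that wrap() appended."""
    return stored[:-len(SENTINEL)]


def simulate_slot_round_trip(value: bytes, slot: int = 8) -> bytes:
    """Pad to the slot size, then strip nulls on read (assumed model of the bug)."""
    return struct.pack(f'{slot}s', value).rstrip(b'\x00')


original = b'\x00'
lost = simulate_slot_round_trip(original)                # trailing null is stripped
kept = unwrap(simulate_slot_round_trip(wrap(original)))  # trailing null survives
```

With a real ShareableList the same idea would be `sl[0] = wrap(value)` on write and `unwrap(sl[0])` on read, at the cost of one extra byte per value.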
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs