Feature or enhancement
Proposal:
Code reading data in pure Python tends to make a buffer variable, call os.read() (which returns a separate, newly allocated buffer of data), then copy/append that data onto the pre-allocated buffer [0]. That creates unnecessary extra buffer objects as well as unnecessary copies. Provide os.readinto for directly filling a Buffer Protocol object.

os.readinto should closely mirror _Py_read, which underlies os.read, in order to get the same retry behavior as well as its well-tested cross-platform support.

Move simple cases that use os.read (ex. [0]) to the new API when it makes the code simpler and more efficient. Potentially adding readinto to more readable/writable file-like proxy objects or objects which transform the data (ex. Lib/_compression) is out of scope for this issue.
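A minimal sketch of the proposed call, assuming it lands as os.readinto(fd, buffer) and returns the number of bytes read; readinto_fallback is a hypothetical shim for interpreters that lack it:

```python
import os

def readinto_fallback(fd, buffer):
    # Emulate the proposed os.readinto: read into a temporary bytes
    # object, then copy (the real API avoids this extra allocation).
    view = memoryview(buffer).cast('B')
    data = os.read(fd, len(view))
    view[:len(data)] = data
    return len(data)

readinto = getattr(os, 'readinto', readinto_fallback)

# Fill a preallocated bytearray straight from a pipe.
r, w = os.pipe()
try:
    os.write(w, b'hello')
    buf = bytearray(16)
    n = readinto(r, buf)  # -> 5; buf[:5] == b'hello'
finally:
    os.close(r)
    os.close(w)
```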
[0]

cpython/Lib/subprocess.py, lines 1914 to 1921 in 298dda5:

```python
# Wait for exec to fail or succeed; possibly raising an
# exception (limited in size)
errpipe_data = bytearray()
while True:
    part = os.read(errpipe_read, 50000)
    errpipe_data += part
    if not part or len(errpipe_data) > 50000:
        break
```
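For illustration, the loop above could instead fill a preallocated buffer in place, the direction this issue proposes; drain_errpipe is a hypothetical helper, and the os.read branch is a fallback for Pythons without os.readinto:

```python
import os

def drain_errpipe(errpipe_read, limit=50000):
    # Preallocate limit + 1 bytes: filling past index `limit` means
    # the writer sent more than `limit` bytes, matching the size
    # check in the original loop.
    buf = bytearray(limit + 1)
    view = memoryview(buf)
    pos = 0
    while pos <= limit:
        if hasattr(os, 'readinto'):
            n = os.readinto(errpipe_read, view[pos:])
        else:
            chunk = os.read(errpipe_read, len(buf) - pos)
            n = len(chunk)
            view[pos:pos + n] = chunk
        if n == 0:  # EOF: writer closed the pipe
            break
        pos += n
    return bytes(buf[:pos])
```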
cpython/Lib/multiprocessing/forkserver.py, lines 384 to 392 in 298dda5:

```python
def read_signed(fd):
    data = b''
    length = SIGNED_STRUCT.size
    while len(data) < length:
        s = os.read(fd, length - len(data))
        if not s:
            raise EOFError('unexpected EOF')
        data += s
    return SIGNED_STRUCT.unpack(data)[0]
```
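A hedged sketch of the same function filling a preallocated buffer directly; SIGNED_STRUCT is assumed here to be struct.Struct('q'), and the os.read branch emulates os.readinto where it is unavailable:

```python
import os
import struct

SIGNED_STRUCT = struct.Struct('q')  # assumption: matches forkserver's struct

def read_signed_into(fd):
    # Fill a preallocated buffer in place instead of concatenating
    # bytes objects; uses os.readinto when available (the API this
    # issue proposes), else an os.read-plus-copy fallback.
    buf = bytearray(SIGNED_STRUCT.size)
    view = memoryview(buf)
    pos = 0
    while pos < len(buf):
        if hasattr(os, 'readinto'):
            n = os.readinto(fd, view[pos:])
        else:
            chunk = os.read(fd, len(buf) - pos)
            n = len(chunk)
            view[pos:pos + n] = chunk
        if n == 0:
            raise EOFError('unexpected EOF')
        pos += n
    return SIGNED_STRUCT.unpack(buf)[0]
```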
cpython/Lib/_pyio.py, lines 1695 to 1701 in 298dda5:

```python
def readinto(self, b):
    """Same as RawIOBase.readinto()."""
    m = memoryview(b).cast('B')
    data = self.read(len(m))
    n = len(data)
    m[:n] = data
    return n
```
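The memoryview(b).cast('B') step is what lets that default accept any writable buffer, not just bytearray. A self-contained sketch of the same copy-based pattern (copy_readinto and the lambda data source are illustrative):

```python
import array

def copy_readinto(read, b):
    # Mirrors the _pyio default above: flatten any writable buffer
    # to a byte view, read that many bytes, copy them in.
    m = memoryview(b).cast('B')
    data = read(len(m))
    n = len(data)
    m[:n] = data
    return n

# Works even when the target buffer has multi-byte items.
buf = array.array('i', [0, 0])
n = copy_readinto(lambda size: b'\x01' * size, buf)
```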
os.read loops to migrate

Well contained os.read loops

- multiprocessing.forkserver read_signed - @cmaloney - gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- [x] subprocess Popen._execute_child - @cmaloney - gh-129205: Use os.readinto() in subprocess errpipe_read #129498

os.read loop interleaved with other code

- _pyio FileIO.read, FileIO.readall, FileIO.readinto - see Reduce copies when reading files in pyio, match behavior of _io #129005 -- @cmaloney
- _pyrepl.unix_console UnixConsole.input_buffer -- fixed-length underlying buffer with "pos" / window on top.
- pty _copy. Operates around a "high water level" / attempts to keep a fixed-ish size buffer. Wraps os.read with a _read function.
- subprocess Popen.communicate. Note: this feels like something non-contiguous Py_buffer would be really good for, particularly in self.text_mode where currently all the bytes are "copied" into a contiguous bytes to then turn into text...
- tarfile _Stream._read and _Stream.__read. Note: builds _LowLevelFile around os.read, but other read methods are also available.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

#129005 (comment)

Linked PRs