gh-151814: Fix unbounded memory growth from repeated empty writes to io.TextIOWrapper#151817
gh-151814: Fix unbounded memory growth from repeated empty writes to io.TextIOWrapper#151817StanFromIreland wants to merge 1 commit into
io.TextIOWrapper#151817Conversation
| else if (!PyList_CheckExact(self->pending_bytes)) { | ||
| PyObject *list = PyList_New(2); | ||
| if (list == NULL) { | ||
| if (bytes_len > 0) { |
There was a problem hiding this comment.
Git seems to render the diff poorly, locally I see with -w (--ignore-all-space):
$ git show -w HEAD -- Modules/_io/textio.c
commit c6b5163133619febd0fbe8c327e52399b1a54ffd (HEAD -> textio-acc, origin/textio-acc)
Author: Stan Ulbrych <stan@python.org>
Date: Sat Jun 20 21:16:54 2026 +0100
Fix unbounded memory growth from repeated empty writes to io.TextIOWrapper
diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c
index 24e08cec88f..5b2a20a30c2 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -1820,6 +1820,7 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
}
}
+ if (bytes_len > 0) {
if (self->pending_bytes == NULL) {
assert(self->pending_bytes_count == 0);
self->pending_bytes = b;
@@ -1846,6 +1847,11 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
}
self->pending_bytes_count += bytes_len;
+ }
+ else {
+ Py_DECREF(b);
+ }
+
if (self->pending_bytes_count >= self->chunk_size || needflush ||
text_needflush) {
if (_textiowrapper_writeflush(self) < 0)io.TextIOWrapper
| self.assertRaises(TypeError, txt.writelines, None) | ||
| self.assertRaises(TypeError, txt.writelines, b'abc') | ||
|
|
||
| def test_write_empty_stress(self): |
There was a problem hiding this comment.
This test passes for me without this patch. Maybe expose privately pending_bytes to be able to test it isn't getting longer?
There was a problem hiding this comment.
This test passes for me without this patch.
I wanted a simple little test to stress the patch, it does pass on modern systems with vast amounts of resources. I'm not sure about exposing new attributes, even privately. We could do messy things like gc.get_referents(txt), but I worry it might break in the future for other reasons.
There was a problem hiding this comment.
👍 to not relying on GC (in theory we should get the zero-length immortal bytes object here; lots of internals).
This test for my machine only peeks at ~700MB of ram while taking ~2.5 seconds. Most that time is in _pyio which looks like it needs a similar improvement (although probably in BufferedWriter for that one...).
I'm okay with this test if it's tagged walltime but would really prefer a test which fails if the empty string path is removed.
There was a problem hiding this comment.
This test for my machine only peeks at ~700MB of ram while taking ~2.5 seconds. Most that time is in _pyio which looks like it needs a similar improvement (although probably in BufferedWriter for that one...).
I looked both at _pyio.TextIOWrapper.write which writes straight through and has no accumulation. And BufferedWriter which extends a bytearrary, so adding an empty string shouldn't be a problem there.
I'm okay with this test if it's tagged walltime but would really prefer a test which fails if the empty string path is removed.
As Zach noted on the issue, this is a "degenerate case," I don't think the complexity of testing this precisely is warranted here.
There was a problem hiding this comment.
In that case I'd prefer no test and just comment well in the code.
If the test doesn't regress it's not adding a lot of value for the complexity/runtime. For me on debug linux ./python -m test test_io -j12 current takes 8.3 seconds. Adding 2.5 seconds with this test which doesn't fail if the new code is removed isn't worth it.
Uh oh!
There was an error while loading. Please reload this page.