12 changes: 12 additions & 0 deletions Lib/test/test_io/test_textio.py
@@ -797,6 +797,18 @@ def test_writelines_error(self):
         self.assertRaises(TypeError, txt.writelines, None)
         self.assertRaises(TypeError, txt.writelines, b'abc')

+    def test_write_empty_stress(self):

Contributor

This test passes for me without this patch. Maybe expose pending_bytes privately so a test can check that it isn't getting longer?

Member Author

> This test passes for me without this patch.

I wanted a simple little test to stress the patch; it does pass on modern systems with vast amounts of resources. I'm not sure about exposing new attributes, even privately. We could do messy things like gc.get_referents(txt), but I worry it might break in the future for other reasons.

Contributor

👍 to not relying on GC (in theory we should get the zero-length immortal bytes object here; lots of internals).

On my machine this test only peaks at ~700 MB of RAM while taking ~2.5 seconds. Most of that time is in _pyio, which looks like it needs a similar improvement (although probably in BufferedWriter for that one...).

I'm okay with this test if it's tagged walltime but would really prefer a test which fails if the empty string path is removed.
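
One possible shape for a test that is sensitive to the empty-string path, without exposing pending_bytes or relying on gc, is to measure allocation growth with tracemalloc. This is only a sketch under assumptions: measure_empty_writes is a hypothetical helper, not something proposed in this thread, and any threshold applied to its result would have to be chosen and validated against both the patched and unpatched builds.

```python
import io
import tracemalloc


def measure_empty_writes(n_writes):
    """Return (peak_growth_bytes, final_output) for n_writes empty writes.

    Hypothetical helper for illustration; not part of the patch under review.
    """
    buf = io.BytesIO()
    txt = io.TextIOWrapper(buf, encoding="utf-8")

    tracemalloc.start()
    try:
        # Traced memory right after tracing starts is the baseline.
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(n_writes):
            txt.write('')
        # Peak traced memory observed since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    # The wrapper should still work normally after the empty writes.
    txt.write('S')
    txt.flush()
    return peak - baseline, buf.getvalue()


if __name__ == "__main__":
    growth, data = measure_empty_writes(100_000)
    print(f"peak growth: {growth} bytes, output: {data!r}")
```

A real test would compare the reported growth against a bound that is small relative to n_writes; whether such a bound is stable across platforms and build configurations is an open question.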

Member Author

> On my machine this test only peaks at ~700 MB of RAM while taking ~2.5 seconds. Most of that time is in _pyio, which looks like it needs a similar improvement (although probably in BufferedWriter for that one...).

I looked at both _pyio.TextIOWrapper.write, which writes straight through and has no accumulation, and BufferedWriter, which extends a bytearray, so adding an empty string shouldn't be a problem there.

> I'm okay with this test if it's tagged walltime but would really prefer a test which fails if the empty string path is removed.

As Zach noted on the issue, this is a "degenerate case"; I don't think the complexity of testing this precisely is warranted here.
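
The difference between the two buffering styles mentioned above can be shown in a few lines. This is a minimal illustration only: write_buf stands in for a bytearray-backed write buffer and pending for a list of pending chunks; neither is the actual _pyio or C implementation.

```python
# bytearray-style buffering: extending with empty bytes adds nothing.
write_buf = bytearray()
for _ in range(1000):
    write_buf.extend(b'')

# list-of-chunks buffering: every append adds an entry, even for empty bytes.
pending = []
for _ in range(1000):
    pending.append(b'')

print(len(write_buf))  # 0
print(len(pending))    # 1000
```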

Contributor

In that case I'd prefer no test and just comment well in the code.

If the test doesn't catch a regression, it's not adding a lot of value for the complexity/runtime. For me on debug Linux, ./python -m test test_io -j12 currently takes 8.3 seconds. Adding 2.5 seconds for a test which doesn't fail if the new code is removed isn't worth it.

+        # gh-151814: repeatedly writing the empty string shouldn't accumulate
+        # in the pending-write buffer.
+        buf = self.BytesIO()
+        txt = self.TextIOWrapper(buf, encoding="utf-8")
+        for _ in range(1_000_000):
+            txt.write('')
+        self.assertEqual(buf.getvalue(), b'')
+        txt.write('S')
+        txt.flush()
+        self.assertEqual(buf.getvalue(), b'S')
+
     def test_issue1395_1(self):
         txt = self.TextIOWrapper(self.BytesIO(self.testdata), encoding="ascii")

@@ -0,0 +1,2 @@
+Fix unbounded memory growth in :class:`io.TextIOWrapper` when repeatedly
+writing an empty string.
44 changes: 25 additions & 19 deletions Modules/_io/textio.c
@@ -1820,32 +1820,38 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
         }
     }

-    if (self->pending_bytes == NULL) {
-        assert(self->pending_bytes_count == 0);
-        self->pending_bytes = b;
-    }
-    else if (!PyList_CheckExact(self->pending_bytes)) {
-        PyObject *list = PyList_New(2);
-        if (list == NULL) {
+    if (bytes_len > 0) {

@StanFromIreland Jun 20, 2026

Member Author

Git seems to render the diff poorly; locally, with -w (--ignore-all-space), I see:

$ git show -w HEAD -- Modules/_io/textio.c
commit c6b5163133619febd0fbe8c327e52399b1a54ffd (HEAD -> textio-acc, origin/textio-acc)
Author: Stan Ulbrych <stan@python.org>
Date:   Sat Jun 20 21:16:54 2026 +0100

    Fix unbounded memory growth from repeated empty writes to io.TextIOWrapper

diff --git a/Modules/_io/textio.c b/Modules/_io/textio.c
index 24e08cec88f..5b2a20a30c2 100644
--- a/Modules/_io/textio.c
+++ b/Modules/_io/textio.c
@@ -1820,6 +1820,7 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
         }
     }
 
+    if (bytes_len > 0) {
         if (self->pending_bytes == NULL) {
             assert(self->pending_bytes_count == 0);
             self->pending_bytes = b;
@@ -1846,6 +1847,11 @@ _io_TextIOWrapper_write_impl(textio *self, PyObject *text)
         }
 
         self->pending_bytes_count += bytes_len;
+    }
+    else {
+        Py_DECREF(b);
+    }
+
     if (self->pending_bytes_count >= self->chunk_size || needflush ||
         text_needflush) {
         if (_textiowrapper_writeflush(self) < 0)
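
Why empty writes accumulated can be seen in a toy Python model of the bookkeeping in the diff above. The flush is driven by the pending byte count, so zero-length chunks add list entries without ever moving the count toward chunk_size. PendingBytesModel and its skip_empty switch are hypothetical names for illustration and the chunk_size default is illustrative; this is not CPython's code.

```python
class PendingBytesModel:
    """Toy model of the pending-bytes bookkeeping; not CPython's code."""

    def __init__(self, chunk_size=8192, skip_empty=True):
        self.chunk_size = chunk_size
        self.skip_empty = skip_empty   # True models the `bytes_len > 0` guard
        self.pending = []              # stands in for self->pending_bytes
        self.pending_count = 0         # stands in for self->pending_bytes_count
        self.sink = bytearray()        # stands in for the underlying buffer

    def write(self, text):
        b = text.encode("utf-8")
        if len(b) > 0 or not self.skip_empty:
            self.pending.append(b)
            self.pending_count += len(b)
        # Flushing depends on the byte count, so empty chunks never trigger it.
        if self.pending_count >= self.chunk_size:
            self.flush()
        return len(text)

    def flush(self):
        self.sink += b"".join(self.pending)
        self.pending.clear()
        self.pending_count = 0


old = PendingBytesModel(skip_empty=False)  # behaviour before the guard
new = PendingBytesModel(skip_empty=True)   # behaviour with the guard
for _ in range(1000):
    old.write('')
    new.write('')

print(len(old.pending), len(new.pending))  # 1000 0
```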

cmaloney marked this conversation as resolved.
+        if (self->pending_bytes == NULL) {
+            assert(self->pending_bytes_count == 0);
+            self->pending_bytes = b;
+        }
+        else if (!PyList_CheckExact(self->pending_bytes)) {
+            PyObject *list = PyList_New(2);
+            if (list == NULL) {
+                Py_DECREF(b);
+                return NULL;
+            }
+            // Since Python 3.12, allocating GC object won't trigger GC and release
+            // GIL. See https://github.com/python/cpython/issues/97922
+            assert(!PyList_CheckExact(self->pending_bytes));
+            PyList_SET_ITEM(list, 0, self->pending_bytes);
+            PyList_SET_ITEM(list, 1, b);
+            self->pending_bytes = list;
+        }
+        else {
+            if (PyList_Append(self->pending_bytes, b) < 0) {
+                Py_DECREF(b);
+                return NULL;
+            }
             Py_DECREF(b);
-            return NULL;
         }
-        // Since Python 3.12, allocating GC object won't trigger GC and release
-        // GIL. See https://github.com/python/cpython/issues/97922
-        assert(!PyList_CheckExact(self->pending_bytes));
-        PyList_SET_ITEM(list, 0, self->pending_bytes);
-        PyList_SET_ITEM(list, 1, b);
-        self->pending_bytes = list;
+
+        self->pending_bytes_count += bytes_len;
     }
     else {
-        if (PyList_Append(self->pending_bytes, b) < 0) {
-            Py_DECREF(b);
-            return NULL;
-        }
         Py_DECREF(b);
     }

-    self->pending_bytes_count += bytes_len;
     if (self->pending_bytes_count >= self->chunk_size || needflush ||
         text_needflush) {
         if (_textiowrapper_writeflush(self) < 0)