gh-138270: Use `PyUnicodeWriter` in `csv.writer` #138271

maurycy · 2025-08-30T20:51:30Z

The purpose of this PR is not performance but switching to the modern PyUnicodeWriter API (https://docs.python.org/dev/c-api/unicode.html#c.PyUnicodeWriter), similarly to gh-125196.

There's a risk that the code is slower, as turned out to be the case in gh-133968. I'd prefer to optimize it after getting an ack that this is the right direction.

Similarly to #138214 (comment), I'm not sure what the best benchmarking strategy is, beyond a simple snippet. Perhaps we need something like https://github.com/nineteendo/jsonyx-performance-tests, but for CSV.

I believe that csv.reader (ReaderObj) could also use PyUnicodeWriter. If my reasoning is sound, there's interest, and this code is OK, I can handle that as well.

Issue: csv module should use PyUnicodeWriter #138270

Modules/_csv.c

maurycy · 2025-08-30T21:59:00Z

cc @vstinner

picnixz

Please:

don't add comments for self-explanatory code;
follow PEP 7 for C code;
revert unrelated changes;
provide benchmarks to show whether this speeds things up or not.

picnixz · 2025-08-30T23:34:46Z

Modules/_csv.c

+                    c == dialect->escapechar ||
+                    c == dialect->quotechar) {
+                    if (dialect->escapechar == NOT_SET) {
+                        PyErr_SetString(self->error_obj, "need to escape, but no escapechar set");


Don't change this.
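
For context on this hunk: when a field character has to be escaped (for example a delimiter, quotechar or escapechar under QUOTE_NONE), the writer prefixes it with escapechar, and if no escapechar is configured it fails with "need to escape, but no escapechar set". The C sketch below is a simplified, standalone model of that rule only. The function name and the ESCAPE_NOT_SET sentinel are hypothetical; the real _csv.c also checks the line terminator and handles the other quoting modes.

```c
#include <stddef.h>

/* Hypothetical sentinel for "no escapechar configured" (not _csv.c's NOT_SET). */
#define ESCAPE_NOT_SET (-1)

/* Simplified QUOTE_NONE model for a single field: every delimiter,
   quotechar, escapechar, '\r' or '\n' is prefixed with escapechar.
   Returns the length written to out (NUL-terminated), -1 if a character
   needs escaping but no escapechar is set, or -2 if out is too small. */
static long
escape_field_quote_none(const char *field, char delimiter, char quotechar,
                        int escapechar, char *out, size_t out_size)
{
    size_t len = 0;

    for (const char *p = field; *p != '\0'; p++) {
        char c = *p;
        int special = (c == delimiter || c == quotechar ||
                       c == '\r' || c == '\n' ||
                       (escapechar != ESCAPE_NOT_SET &&
                        c == (char)escapechar));
        if (special) {
            if (escapechar == ESCAPE_NOT_SET) {
                return -1;  /* "need to escape, but no escapechar set" */
            }
            if (len + 1 >= out_size) {
                return -2;  /* keep room for the terminating NUL */
            }
            out[len++] = (char)escapechar;
        }
        if (len + 1 >= out_size) {
            return -2;
        }
        out[len++] = c;
    }
    if (out_size == 0) {
        return -2;
    }
    out[len] = '\0';
    return (long)len;
}
```

With escapechar '\\', the field `a,b` becomes `a\,b`; with ESCAPE_NOT_SET the same field returns -1, which corresponds to the error path in the hunk.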


picnixz · 2025-08-30T23:48:45Z

Modules/_csv.c

+    if (writer) {
+        PyUnicodeWriter_Discard(writer);
+    }


Suggested change:

-    if (writer) {
-        PyUnicodeWriter_Discard(writer);
-    }
+    PyUnicodeWriter_Discard(writer);

I think the error label is only jumped to when writer is not NULL (and PyUnicodeWriter_Discard() is a no-op for NULL anyway, but check that).
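
The CPython documentation states that PyUnicodeWriter_Discard() does nothing when passed NULL, which is what makes the `if (writer)` guard redundant. The same goto-error pattern in standalone C, using a hypothetical toy writer (the `tw_*` functions and `join_row` are made up for illustration and are not the CPython API) whose discard function is NULL-safe in the same way:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer (a growable char buffer); NOT the CPython API. */
typedef struct {
    char *buf;
    size_t len;
    size_t cap;
} ToyWriter;

static ToyWriter *
tw_create(void)
{
    ToyWriter *w = malloc(sizeof(*w));
    if (w == NULL) {
        return NULL;
    }
    w->len = 0;
    w->cap = 16;
    w->buf = malloc(w->cap);
    if (w->buf == NULL) {
        free(w);
        return NULL;
    }
    return w;
}

/* Same contract as documented for PyUnicodeWriter_Discard(): NULL is a no-op. */
static void
tw_discard(ToyWriter *w)
{
    if (w == NULL) {
        return;
    }
    free(w->buf);
    free(w);
}

/* Append s, growing the buffer; one spare byte is kept for the final NUL. */
static int
tw_write_str(ToyWriter *w, const char *s)
{
    size_t n = strlen(s);
    size_t cap = w->cap;
    while (cap < w->len + n + 1) {
        cap *= 2;
    }
    if (cap != w->cap) {
        char *p = realloc(w->buf, cap);
        if (p == NULL) {
            return -1;
        }
        w->buf = p;
        w->cap = cap;
    }
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    return 0;
}

/* Hand the NUL-terminated buffer to the caller and free the writer. */
static char *
tw_finish(ToyWriter *w)
{
    char *s = w->buf;
    s[w->len] = '\0';
    free(w);
    return s;
}

/* Join fields with sep; returns a malloc'd string, or NULL on failure. */
static char *
join_row(const char *const *fields, size_t n, const char *sep)
{
    ToyWriter *writer = tw_create();
    if (writer == NULL) {
        goto error;  /* writer is NULL on this path */
    }
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && tw_write_str(writer, sep) < 0) {
            goto error;
        }
        if (tw_write_str(writer, fields[i]) < 0) {
            goto error;
        }
    }
    return tw_finish(writer);

error:
    tw_discard(writer);  /* no `if (writer)` guard needed */
    return NULL;
}
```

Because the discard function tolerates NULL, every failure path can share a single unconditional cleanup call, whether or not the writer was ever created.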

picnixz · 2025-08-30T23:50:34Z

Modules/_csv.c

+    bool first_field_was_empty_like = false;
+    bool first_field_was_none = false;
+    bool first_field_was_quoted_in_loop = false;


Why are those now needed?
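
The thread does not say why these flags are needed. One possibly relevant rule (background, not stated in the PR): CPython's writer quotes a record that consists of a single empty field, so `writerow([''])` produces `""` followed by the line terminator instead of a blank line that would read back as an empty row; as far as I know, with QUOTE_NONE it raises an error instead. Whether the new flags exist to track this case is an assumption. The C sketch below models only that rule; `format_row` and the `MODEL_QUOTE_*` constants are hypothetical, and fields are assumed to contain no special characters.

```c
#include <stddef.h>

/* Hypothetical quoting modes for this model only. */
enum model_quoting { MODEL_QUOTE_MINIMAL, MODEL_QUOTE_NONE };

/* Join plain fields with delimiter into out (NUL-terminated). A record made
   of one empty field is written as two quotechars so it cannot be confused
   with an empty line. Returns the length written, -1 if that record would
   need quoting under MODEL_QUOTE_NONE, or -2 if out is too small. */
static long
format_row(const char *const *fields, size_t n, char delimiter,
           char quotechar, enum model_quoting quoting,
           char *out, size_t out_size)
{
    size_t len = 0;

    if (out_size == 0) {
        return -2;
    }
    if (n == 1 && fields[0][0] == '\0') {
        if (quoting == MODEL_QUOTE_NONE) {
            return -1;  /* a lone empty field cannot be made visible */
        }
        if (out_size < 3) {
            return -2;
        }
        out[0] = quotechar;
        out[1] = quotechar;
        out[2] = '\0';
        return 2;
    }
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            if (len + 1 >= out_size) {
                return -2;
            }
            out[len++] = delimiter;
        }
        for (const char *p = fields[i]; *p != '\0'; p++) {
            if (len + 1 >= out_size) {
                return -2;
            }
            out[len++] = *p;
        }
    }
    out[len] = '\0';
    return (long)len;
}
```

Note that two empty fields need no quoting, since the delimiter alone already makes the record non-empty.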

bedevere-app · 2025-08-30T23:51:11Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

picnixz · 2025-08-30T23:52:42Z

And yes, it could be worthwhile to use PyUnicodeWriter instead of manual buffer construction, but we need to check whether this actually improves things before deciding whether we can do it.

csv.writer w/ PyUnicodeWriter

efe921c

bedevere-app bot mentioned this pull request Aug 30, 2025

csv module should use PyUnicodeWriter #138270

Open

style; need to turn some vscode ext

d5c8539

StanFromIreland reviewed Aug 30, 2025

Modules/_csv.c (outdated)

blurb

f6db601

maurycy marked this pull request as ready for review August 30, 2025 21:58

bedevere-app bot added the awaiting review label Aug 30, 2025

picnixz requested changes Aug 30, 2025


bedevere-app bot removed the awaiting review label Aug 30, 2025

bedevere-app bot added the awaiting changes label Aug 30, 2025

maurycy added 5 commits August 31, 2025 02:52

pep7, redundant comments

3c3f7ec

defensive check

a9e6d3e

pep7

07ad8d0

s/is_none/null_field/

66cf1b0

one-line cond

dbd0b3d
