maurycy (Contributor) commented Aug 30, 2025

The goal of this PR is not performance but adopting the modern PyUnicodeWriter API (https://docs.python.org/dev/c-api/unicode.html#c.PyUnicodeWriter), as was done in gh-125196.

There's a risk that the new code is slower, as turned out to be the case in gh-133968. I'd prefer to optimize it only after getting an ack that this is the right direction.

As in #138214 (comment), I'm not sure what the best benchmarking strategy is, beyond a simple snippet. Perhaps we need something like https://github.com/nineteendo/jsonyx-performance-tests, but for CSV.
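
A simple snippet of that kind might look like the sketch below. It only times csv.writer.writerows on an in-memory buffer, using timeit from the standard library. The helper names (make_rows, write_rows, bench), the row shape, and the repeat counts are all assumptions chosen for illustration, not part of this PR. It would have to be run against both a build of main and a build of this branch to compare.

```python
import csv
import io
import timeit


def make_rows(n_rows=1_000, n_cols=10):
    """Build rows that mix fields needing quoting, plain strings, and ints."""
    row = []
    for i in range(n_cols):
        if i % 3 == 0:
            row.append(f'needs "quoting", {i}')  # contains quotechar and delimiter
        elif i % 3 == 1:
            row.append(f"plain{i}")
        else:
            row.append(i)
    return [row] * n_rows


def write_rows(rows):
    """Serialize rows with csv.writer into an in-memory buffer."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def bench(rows, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: write_rows(rows))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    rows = make_rows()
    print(f"csv.writer.writerows: {bench(rows) * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes. A micro-benchmark like this says nothing about realistic workloads, which is why a dedicated suite may still be worth having.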

I believe csv.reader (ReaderObj) could also use PyUnicodeWriter. If that reasoning is sound, there's interest, and this code is OK, I can take that on as well.

@maurycy maurycy marked this pull request as ready for review August 30, 2025 21:58
maurycy (Contributor Author) commented Aug 30, 2025

cc @vstinner

picnixz (Member) left a comment

Please:

  • don't add comments for self-explanatory code;
  • follow PEP 7 for C code;
  • revert unrelated changes;
  • provide benchmarks to show whether this speeds things up or not.

c == dialect->escapechar ||
c == dialect->quotechar) {
if (dialect->escapechar == NOT_SET) {
PyErr_SetString(self->error_obj, "need to escape, but no escapechar set");

Don't change this.
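
For context, the string in the hunk above is visible to users as the text of a csv.Error. The sketch below triggers it from Python by writing a field that contains the delimiter with quoting disabled and no escapechar set. That callers or tests depend on the exact wording is an assumption here, not something stated in this review.

```python
import csv
import io

buf = io.StringIO()
# QUOTE_NONE disables quoting; escapechar is left at its default of None.
writer = csv.writer(buf, quoting=csv.QUOTE_NONE)

try:
    # The delimiter inside the field would have to be escaped.
    writer.writerow(["a,b"])
except csv.Error as exc:
    message = str(exc)
else:
    message = None

print(message)
```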

Comment on lines +1372 to +1374
if (writer) {
    PyUnicodeWriter_Discard(writer);
}

Suggested change
-if (writer) {
-    PyUnicodeWriter_Discard(writer);
-}
+PyUnicodeWriter_Discard(writer);

I think the error label is only jumped to when writer is not NULL (and Discard is a no-op for NULL, but check that).

Comment on lines +1244 to +1246
bool first_field_was_empty_like = false;
bool first_field_was_none = false;
bool first_field_was_quoted_in_loop = false;

Why are those now needed?

bedevere-app bot commented Aug 30, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

picnixz (Member) commented Aug 30, 2025

And yes, it could make sense to use PyUnicodeWriter instead of constructing the buffer manually, but we need to check whether this actually improves things before deciding whether to do it.
