-
-
Notifications
You must be signed in to change notification settings - Fork 32.7k
gh-138270: Use PyUnicodeWriter in csv.writer
#138271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
cc @vstinner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please:
- don't add comments for self-explanatory code;
- follow PEP-7 for C code;
- revert unrelated changes;
- provide benchmarks to show whether this speeds things up or not.
| c == dialect->escapechar || | ||
| c == dialect->quotechar) { | ||
| if (dialect->escapechar == NOT_SET) { | ||
| PyErr_SetString(self->error_obj, "need to escape, but no escapechar set"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't change this.
| if (writer) { | ||
| PyUnicodeWriter_Discard(writer); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (writer) { | |
| PyUnicodeWriter_Discard(writer); | |
| } | |
| PyUnicodeWriter_Discard(writer); |
The error label is only jumped to when writer is not NULL I think (and Discard is a no-op for NULL, but check that)
| bool first_field_was_empty_like = false; | ||
| bool first_field_was_none = false; | ||
| bool first_field_was_quoted_in_loop = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are those now needed?
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
And yes, it could be meaningful to use PyUnicodeWriter instead of manual buffer constructions, but we need to check if this really improves things or not before deciding whether we can do it. |
The purpose of this PR is not performance but using the modern https://docs.python.org/dev/c-api/unicode.html#c.PyUnicodeWriter API, similarly to gh-125196.
There's a risk that the code is slower, as it turned out in gh-133968. I'd prefer optimizing it after getting an ack that this is the correct direction.
Similarly to #138214 (comment), I'm not sure what is the best benchmarking strategy, besides a simple snippet. Perhaps we need https://github.com/nineteendo/jsonyx-performance-tests but for CSV.
I believe that
csv.reader(ReaderObj) could also usePyUnicodeWriter. If my thinking is sound, if there's any interest and this code is OK, I can handle it.csvmodule should usePyUnicodeWriter#138270