bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly#29468
Conversation
No longer use len() to get the length of the input data. For some buffer protocol objects, the length obtained by using len() is wrong. Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.
| if isinstance(data, (bytes, bytearray)): | ||
| nbytes = len(data) | ||
| else: | ||
| data = memoryview(data) |
There was a problem hiding this comment.
I think it's recommended to manually release a memoryview. Implementations other than CPython may use a garbage collector that doesn't immediately finalize unreferenced objects. Maybe it's simpler to always use a memoryview:
# Accept any data that supports the buffer protocol
with memoryview(data) as data:
nbytes = data.nbytes
self._file_size += nbytes
self._crc = crc32(data, self._crc)
if self._compressor:
data = self._compressor.compress(data)
self._compress_size += len(data)
self._fileobj.write(data)
return nbytesThere was a problem hiding this comment.
Good point.
There are some sites in stdlib have this problem, maybe they can be solved in another issue together.
In addition, I suspect if other Python implementations don't release the underlying buffer of memoryview, the program will run abnormally.
Let @serhiy-storchaka decide.
update: I ran a benchmark, the performances are no significant different.
There was a problem hiding this comment.
@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.
Fix this in another issue.
|
This PR is stale because it has been open for 30 days with no activity. |
MaxwellDupre
left a comment
There was a problem hiding this comment.
Ran 258 tests in 27.542s
OK (skipped=1)
== Tests result: SUCCESS ==
Look ok to me.
| with zipfile.ZipFile(io.BytesIO(), 'w') as zip: | ||
| with zip.open('data', 'w') as data: | ||
| self.assertEqual(data.write(q), LENGTH) | ||
| self.assertEqual(data._file_size, LENGTH) |
There was a problem hiding this comment.
_file_size is a private attribute. It is better to use a public API in tests. For example read the content of the file.
| # quickly. | ||
| self.assertRaises(OSError, zipfile.ZipFile, TESTFN) | ||
|
|
||
| def test_issue44439(self): |
There was a problem hiding this comment.
It would be better to move this test to AbstractWriterTests and test with different compressions.
| if isinstance(data, (bytes, bytearray)): | ||
| nbytes = len(data) | ||
| else: | ||
| data = memoryview(data) |
There was a problem hiding this comment.
@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.
| @@ -0,0 +1,2 @@ | |||
| Fix in `_ZipWriteFile.write()` method, when the input data is an object that | |||
There was a problem hiding this comment.
_ZipWriteFile is a private class. It would be better to reword the NEWS entry in terms of the public API.
|
Thanks for your review. Close and reopen to trigger tests. |
|
@serhiy-storchaka |
|
Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9. |
|
Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. |
|
GH-31755 is a backport of this pull request to the 3.9 branch. |
…thonGH-29468) Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
|
GH-31756 is a backport of this pull request to the 3.10 branch. |
…thonGH-29468) Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
…thonGH-29468) Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
https://bugs.python.org/issue44439