Skip to content

bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly#29468

Merged
serhiy-storchaka merged 6 commits into
mainfrom
unknown repository
Mar 8, 2022
Merged

bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly#29468
serhiy-storchaka merged 6 commits into
mainfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Nov 8, 2021

wjssz and others added 2 commits November 8, 2021 20:42
No longer use len() to get the length of the input data. For some buffer protocol objects,
the length obtained by using len() is wrong.

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.
Comment thread Lib/zipfile.py
if isinstance(data, (bytes, bytearray)):
nbytes = len(data)
else:
data = memoryview(data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's recommended to manually release a memoryview. Implementations other than CPython may use a garbage collector that doesn't immediately finalize unreferenced objects. Maybe it's simpler to always use a memoryview:

        # Accept any data that supports the buffer protocol
        with memoryview(data) as data:
            nbytes = data.nbytes
            self._file_size += nbytes
            self._crc = crc32(data, self._crc)
            if self._compressor:
                data = self._compressor.compress(data)
                self._compress_size += len(data)
            self._fileobj.write(data)
            return nbytes

Copy link
Copy Markdown
Author

@ghost ghost Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
There are some sites in stdlib have this problem, maybe they can be solved in another issue together.
In addition, I suspect if other Python implementations don't release the underlying buffer of memoryview, the program will run abnormally.

Let @serhiy-storchaka decide.

update: I ran a benchmark, the performances are no significant different.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Dec 15, 2021
Copy link
Copy Markdown
Contributor

@MaxwellDupre MaxwellDupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran 258 tests in 27.542s

OK (skipped=1)

== Tests result: SUCCESS ==
Look ok to me.

Comment thread Lib/test/test_zipfile.py Outdated
with zipfile.ZipFile(io.BytesIO(), 'w') as zip:
with zip.open('data', 'w') as data:
self.assertEqual(data.write(q), LENGTH)
self.assertEqual(data._file_size, LENGTH)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_file_size is a private attribute. It is better to use a public API in tests. For example read the content of the file.

Comment thread Lib/test/test_zipfile.py Outdated
# quickly.
self.assertRaises(OSError, zipfile.ZipFile, TESTFN)

def test_issue44439(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to move this test to AbstractWriterTests and test with different compressions.

Comment thread Lib/zipfile.py
if isinstance(data, (bytes, bytearray)):
nbytes = len(data)
else:
data = memoryview(data)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

@@ -0,0 +1,2 @@
Fix in `_ZipWriteFile.write()` method, when the input data is an object that
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ZipWriteFile is a private class. It would be better to reword the NEWS entry in terms of the public API.

@serhiy-storchaka serhiy-storchaka removed the stale Stale PR or inactive for long period of time. label Mar 7, 2022
Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

@ghost
Copy link
Copy Markdown
Author

ghost commented Mar 7, 2022

Thanks for your review.

Close and reopen to trigger tests.

@ghost ghost closed this Mar 7, 2022
@ghost ghost reopened this Mar 7, 2022
@ghost
Copy link
Copy Markdown
Author

ghost commented Mar 7, 2022

@serhiy-storchaka
I modified the NEWS file slightly, the tests are all green now.

@serhiy-storchaka serhiy-storchaka merged commit 36dd739 into python:main Mar 8, 2022
@serhiy-storchaka serhiy-storchaka added needs backport to 3.9 needs backport to 3.10 only security fixes type-bug An unexpected behavior, bug, or error labels Mar 8, 2022
@miss-islington
Copy link
Copy Markdown
Contributor

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Copy Markdown
Contributor

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link
Copy Markdown

GH-31755 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 8, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Mar 8, 2022
@bedevere-bot
Copy link
Copy Markdown

GH-31756 is a backport of this pull request to the 3.10 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 8, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Mar 8, 2022
…-29468)

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Mar 8, 2022
…-29468)

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
@ghost ghost deleted the fix_compression branch March 8, 2022 12:11
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <marcoffee@users.noreply.github.com>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants