feat(storage): log additional bytes received from GCS in read path#17423
feat(storage): log additional bytes received from GCS in read path#17423nidhiii-27 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces warning logs when more bytes are downloaded than requested from GCS, implemented in both _download.py and blob.py, along with a new unit test. The review feedback highlights several critical and medium-severity issues: an undefined headers variable in _download.py that will cause a NameError, a missing function call to _get_headers in blob.py that will fail at runtime, and a flawed unit test that does not actually execute the download loop due to incorrect mocking of download.finished and consume_next_chunk. Additionally, suggestions are provided to ensure fallback values are correctly applied when bucket or object names are None, and to align the range-handling logic in blob.py with _download.py for consistency.
| requested_length = self.end + 1 | ||
|
|
||
| if requested_length is not None and self._bytes_downloaded > requested_length: | ||
| if headers.get("x-goog-stored-content-encoding") != "gzip": |
There was a problem hiding this comment.
The variable headers is not defined in the scope of _process_response. This will raise a NameError at runtime when this code path is executed. It should be changed to response.headers.
| if headers.get("x-goog-stored-content-encoding") != "gzip": | |
| if response.headers.get("x-goog-stored-content-encoding") != "gzip": |
| if response is not None and not media_helpers._is_decompressive_transcoding( | ||
| response, download._get_headers | ||
| ): |
There was a problem hiding this comment.
The method download._get_headers is passed as a reference instead of being called. Since _is_decompressive_transcoding expects a dictionary of headers, this will raise an error at runtime when response is not None. It should be called as download._get_headers().
The unit test test__do_download_log_extra_bytes did not catch this because download.finished is a Mock object, which makes not download.finished evaluate to False. As a result, the while loop is skipped, response remains None, and the condition response is not None prevents _is_decompressive_transcoding from being called.
| if response is not None and not media_helpers._is_decompressive_transcoding( | |
| response, download._get_headers | |
| ): | |
| if response is not None and not media_helpers._is_decompressive_transcoding( | |
| response, download._get_headers() | |
| ): |
| bucket_name = getattr(self, "client_info_bucket_name", "unknown") | ||
| object_name = getattr(self, "client_info_object_name", "unknown") |
There was a problem hiding this comment.
Since self.client_info_bucket_name and self.client_info_object_name are initialized in __init__ (and thus always exist on self), getattr(self, ..., "unknown") will return None if they were passed as None, rather than falling back to "unknown". To ensure they fall back to "unknown" when they are None, you should use or "unknown".
| bucket_name = getattr(self, "client_info_bucket_name", "unknown") | |
| object_name = getattr(self, "client_info_object_name", "unknown") | |
| bucket_name = getattr(self, "client_info_bucket_name", None) or "unknown" | |
| object_name = getattr(self, "client_info_object_name", None) or "unknown" |
| requested_length = None | ||
| if start is not None and end is not None: | ||
| requested_length = end - start + 1 | ||
| elif start is None and end is not None: | ||
| requested_length = end + 1 |
There was a problem hiding this comment.
For consistency with the logic in _media/_download.py, blob.py should also handle the case where start is negative and end is None (which requests the last -start bytes of the blob).
| requested_length = None | |
| if start is not None and end is not None: | |
| requested_length = end - start + 1 | |
| elif start is None and end is not None: | |
| requested_length = end + 1 | |
| requested_length = None | |
| if start is not None and start < 0 and end is None: | |
| requested_length = -start | |
| elif start is not None and end is not None: | |
| requested_length = end - start + 1 | |
| elif start is None and end is not None: | |
| requested_length = end + 1 |
| download = patched.return_value | ||
| download._bytes_downloaded = 10 | ||
|
|
||
| mock_response = mock.Mock() | ||
| mock_response.headers = {} | ||
| download.consume.return_value = mock_response | ||
| download._get_headers.return_value = {} |
There was a problem hiding this comment.
The test does not actually execute the while not download.finished: loop because download.finished is a Mock object (which is truthy, making not download.finished evaluate to False). Additionally, the test mocks download.consume instead of download.consume_next_chunk.
To make the test effective and ensure the loop runs once, you should mock download.finished to return False then True, and mock download.consume_next_chunk instead of download.consume.
| download = patched.return_value | |
| download._bytes_downloaded = 10 | |
| mock_response = mock.Mock() | |
| mock_response.headers = {} | |
| download.consume.return_value = mock_response | |
| download._get_headers.return_value = {} | |
| download = patched.return_value | |
| download._bytes_downloaded = 10 | |
| type(download).finished = mock.PropertyMock(side_effect=[False, True]) | |
| mock_response = mock.Mock() | |
| mock_response.headers = {} | |
| download.consume_next_chunk.return_value = mock_response | |
| download._get_headers.return_value = {} |
It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes. Fixes: 475824752 [Generated-by: AI]
d7a9e73 to
ad8a555
Compare
|
Addressed all PR review comments. |
It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes.
Fixes: 475824752