Skip to content

feat(storage): log additional bytes received from GCS in read path#17423

Draft
nidhiii-27 wants to merge 2 commits into
mainfrom
feat/log-additional-bytes
Draft

feat(storage): log additional bytes received from GCS in read path#17423
nidhiii-27 wants to merge 2 commits into
mainfrom
feat/log-additional-bytes

Conversation

@nidhiii-27

Copy link
Copy Markdown
Contributor

It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes.

Fixes: 475824752

@nidhiii-27 nidhiii-27 requested a review from a team as a code owner June 11, 2026 09:40

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces warning logs when more bytes are downloaded than requested from GCS, implemented in both _download.py and blob.py, along with a new unit test. The review feedback highlights several critical and medium-severity issues: an undefined headers variable in _download.py that will cause a NameError, a missing function call to _get_headers in blob.py that will fail at runtime, and a flawed unit test that does not actually execute the download loop due to incorrect mocking of download.finished and consume_next_chunk. Additionally, suggestions are provided to ensure fallback values are correctly applied when bucket or object names are None, and to align the range-handling logic in blob.py with _download.py for consistency.

requested_length = self.end + 1

if requested_length is not None and self._bytes_downloaded > requested_length:
if headers.get("x-goog-stored-content-encoding") != "gzip":

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The variable headers is not defined in the scope of _process_response. This will raise a NameError at runtime when this code path is executed. It should be changed to response.headers.

Suggested change
if headers.get("x-goog-stored-content-encoding") != "gzip":
if response.headers.get("x-goog-stored-content-encoding") != "gzip":

Comment on lines +1138 to +1140
if response is not None and not media_helpers._is_decompressive_transcoding(
response, download._get_headers
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The method download._get_headers is passed as a reference instead of being called. Since _is_decompressive_transcoding expects a dictionary of headers, this will raise an error at runtime when response is not None. It should be called as download._get_headers().

The unit test test__do_download_log_extra_bytes did not catch this because download.finished is a Mock object, which makes not download.finished evaluate to False. As a result, the while loop is skipped, response remains None, and the condition response is not None prevents _is_decompressive_transcoding from being called.

Suggested change
if response is not None and not media_helpers._is_decompressive_transcoding(
response, download._get_headers
):
if response is not None and not media_helpers._is_decompressive_transcoding(
response, download._get_headers()
):

Comment on lines +507 to +508
bucket_name = getattr(self, "client_info_bucket_name", "unknown")
object_name = getattr(self, "client_info_object_name", "unknown")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since self.client_info_bucket_name and self.client_info_object_name are initialized in __init__ (and thus always exist on self), getattr(self, ..., "unknown") will return None if they were passed as None, rather than falling back to "unknown". To ensure they fall back to "unknown" when they are None, you should use or "unknown".

Suggested change
bucket_name = getattr(self, "client_info_bucket_name", "unknown")
object_name = getattr(self, "client_info_object_name", "unknown")
bucket_name = getattr(self, "client_info_bucket_name", None) or "unknown"
object_name = getattr(self, "client_info_object_name", None) or "unknown"

Comment on lines +1127 to +1131
requested_length = None
if start is not None and end is not None:
requested_length = end - start + 1
elif start is None and end is not None:
requested_length = end + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the logic in _media/_download.py, blob.py should also handle the case where start is negative and end is None (which requests the last -start bytes of the blob).

Suggested change
requested_length = None
if start is not None and end is not None:
requested_length = end - start + 1
elif start is None and end is not None:
requested_length = end + 1
requested_length = None
if start is not None and start < 0 and end is None:
requested_length = -start
elif start is not None and end is not None:
requested_length = end - start + 1
elif start is None and end is not None:
requested_length = end + 1

Comment on lines +1391 to +1397
download = patched.return_value
download._bytes_downloaded = 10

mock_response = mock.Mock()
mock_response.headers = {}
download.consume.return_value = mock_response
download._get_headers.return_value = {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test does not actually execute the while not download.finished: loop because download.finished is a Mock object (which is truthy, making not download.finished evaluate to False). Additionally, the test mocks download.consume instead of download.consume_next_chunk.

To make the test effective and ensure the loop runs once, you should mock download.finished to return False then True, and mock download.consume_next_chunk instead of download.consume.

Suggested change
download = patched.return_value
download._bytes_downloaded = 10
mock_response = mock.Mock()
mock_response.headers = {}
download.consume.return_value = mock_response
download._get_headers.return_value = {}
download = patched.return_value
download._bytes_downloaded = 10
type(download).finished = mock.PropertyMock(side_effect=[False, True])
mock_response = mock.Mock()
mock_response.headers = {}
download.consume_next_chunk.return_value = mock_response
download._get_headers.return_value = {}

It has been found that GCS can occasionally send additional bytes while reading from stream.
This scenario should be logged properly for debugging and tracking purposes.

Fixes: 475824752

[Generated-by: AI]
@nidhiii-27 nidhiii-27 force-pushed the feat/log-additional-bytes branch from d7a9e73 to ad8a555 Compare June 11, 2026 10:23
@nidhiii-27

Copy link
Copy Markdown
Contributor Author

Addressed all PR review comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant