Skip to content

chore(storage): add pytest for checksum overhead benchmarking#17422

Open
chandra-siri wants to merge 1 commit into
mainfrom
perf-test-checksum-overhead
Open

chore(storage): add pytest for checksum overhead benchmarking#17422
chandra-siri wants to merge 1 commit into
mainfrom
perf-test-checksum-overhead

Conversation

@chandra-siri

Copy link
Copy Markdown
Contributor

Add a pytest file to run checksum overhead benchmarking.

@chandra-siri chandra-siri requested a review from a team as a code owner June 11, 2026 05:44
@chandra-siri chandra-siri changed the title chore: add pytest for checksum overhead benchmarking chore(storage): add pytest for checksum overhead benchmarking Jun 11, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new microbenchmark script, test_checksum_overhead.py, to measure the performance overhead of checksum validation during range downloads from Google Cloud Storage. The feedback focuses on improving the robustness and efficiency of the benchmark: avoiding hardcoded personal bucket defaults, skipping tests when the required environment variables are missing, making the long warmup duration configurable to speed up test execution, and properly resetting the global event loop state after closing to prevent side effects on subsequent tests.

AsyncMultiRangeDownloader,
)

DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding a specific personal bucket (chandrasiri-gcsfs-zb) as a default is a maintainability and security risk. If other developers run the benchmark without setting the environment variable, it will attempt to use this bucket, which they likely do not have access to. It is better to default to None and skip the test if the environment variable is not set.

Suggested change
DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb")
DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET")

Comment on lines +133 to +134
if not enable_checksum and download_size == object_size - 1:
pytest.skip("Skip Full-1 range download when checksum is disabled")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Skip the benchmark if the DEFAULT_RAPID_ZONAL_BUCKET environment variable is not configured, rather than attempting to run with a None bucket name.

Suggested change
if not enable_checksum and download_size == object_size - 1:
pytest.skip("Skip Full-1 range download when checksum is disabled")
if not DEFAULT_BUCKET:
pytest.skip("DEFAULT_RAPID_ZONAL_BUCKET environment variable is not set")
if not enable_checksum and download_size == object_size - 1:
pytest.skip("Skip Full-1 range download when checksum is disabled")
References
  1. Use pytest.skip() to skip an entire test. To skip only a portion of a test (e.g., a final verification step), use a conditional return instead, as pytest.skip() would mark the entire test as skipped.

)

DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb")
DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Add a configurable warmup duration constant to avoid hardcoding the 10-second warmup period, which makes the test suite run extremely slowly by default.

Suggested change
DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5"))
DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5"))
DEFAULT_WARMUP_SEC = float(os.environ.get("BENCHMARK_WARMUP_SEC", "2.0"))

# 2. warmup
warmup_start = time.perf_counter()
warmup_chunk_size = min(10 * 1024 * 1024, object_size)
while time.perf_counter() - warmup_start < 10.0:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the configurable warmup duration instead of a hardcoded 10.0 seconds. A 10-second warmup per parameterized test case (24 combinations) results in at least 4 minutes of warmup alone, which is highly inefficient.

Suggested change
while time.perf_counter() - warmup_start < 10.0:
while time.perf_counter() - warmup_start < DEFAULT_WARMUP_SEC:

task.cancel()
if tasks:
loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True))
loop.close()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

After closing the event loop, the global/thread-local event loop state is not reset. This can cause subsequent tests or async operations in the same thread to fail with RuntimeError: Event loop is closed when they attempt to get the current event loop. Calling asyncio.set_event_loop(None) after closing the loop ensures a clean state.

Suggested change
loop.close()
loop.close()
asyncio.set_event_loop(None)

@parthea

parthea commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

@chandra-siri, Please could you address the feedback from gemini-code-assist?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants