chore(storage): add pytest for checksum overhead benchmarking#17422
chore(storage): add pytest for checksum overhead benchmarking#17422chandra-siri wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new microbenchmark script, test_checksum_overhead.py, to measure the performance overhead of checksum validation during range downloads from Google Cloud Storage. The feedback focuses on improving the robustness and efficiency of the benchmark: avoiding hardcoded personal bucket defaults, skipping tests when the required environment variables are missing, making the long warmup duration configurable to speed up test execution, and properly resetting the global event loop state after closing to prevent side effects on subsequent tests.
| AsyncMultiRangeDownloader, | ||
| ) | ||
|
|
||
| DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb") |
There was a problem hiding this comment.
Hardcoding a specific personal bucket (chandrasiri-gcsfs-zb) as a default is a maintainability and security risk. If other developers run the benchmark without setting the environment variable, it will attempt to use this bucket, which they likely do not have access to. It is better to default to None and skip the test if the environment variable is not set.
| DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb") | |
| DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET") |
| if not enable_checksum and download_size == object_size - 1: | ||
| pytest.skip("Skip Full-1 range download when checksum is disabled") |
There was a problem hiding this comment.
Skip the benchmark if the DEFAULT_RAPID_ZONAL_BUCKET environment variable is not configured, rather than attempting to run with a None bucket name.
| if not enable_checksum and download_size == object_size - 1: | |
| pytest.skip("Skip Full-1 range download when checksum is disabled") | |
| if not DEFAULT_BUCKET: | |
| pytest.skip("DEFAULT_RAPID_ZONAL_BUCKET environment variable is not set") | |
| if not enable_checksum and download_size == object_size - 1: | |
| pytest.skip("Skip Full-1 range download when checksum is disabled") |
References
- Use
pytest.skip()to skip an entire test. To skip only a portion of a test (e.g., a final verification step), use a conditionalreturninstead, aspytest.skip()would mark the entire test as skipped.
| ) | ||
|
|
||
| DEFAULT_BUCKET = os.environ.get("DEFAULT_RAPID_ZONAL_BUCKET", "chandrasiri-gcsfs-zb") | ||
| DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5")) |
There was a problem hiding this comment.
Add a configurable warmup duration constant to avoid hardcoding the 10-second warmup period, which makes the test suite run extremely slowly by default.
| DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5")) | |
| DEFAULT_ROUNDS = int(os.environ.get("BENCHMARK_ROUNDS", "5")) | |
| DEFAULT_WARMUP_SEC = float(os.environ.get("BENCHMARK_WARMUP_SEC", "2.0")) |
| # 2. warmup | ||
| warmup_start = time.perf_counter() | ||
| warmup_chunk_size = min(10 * 1024 * 1024, object_size) | ||
| while time.perf_counter() - warmup_start < 10.0: |
There was a problem hiding this comment.
Use the configurable warmup duration instead of a hardcoded 10.0 seconds. A 10-second warmup per parameterized test case (24 combinations) results in at least 4 minutes of warmup alone, which is highly inefficient.
| while time.perf_counter() - warmup_start < 10.0: | |
| while time.perf_counter() - warmup_start < DEFAULT_WARMUP_SEC: |
| task.cancel() | ||
| if tasks: | ||
| loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True)) | ||
| loop.close() |
There was a problem hiding this comment.
After closing the event loop, the global/thread-local event loop state is not reset. This can cause subsequent tests or async operations in the same thread to fail with RuntimeError: Event loop is closed when they attempt to get the current event loop. Calling asyncio.set_event_loop(None) after closing the loop ensures a clean state.
| loop.close() | |
| loop.close() | |
| asyncio.set_event_loop(None) |
|
@chandra-siri, Please could you address the feedback from |
Add a pytest file to run checksum overhead benchmarking.