perf: add microbenchmarks for crc32c and MRD reads#17392
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces two new microbenchmark scripts: "benchmark_crc32c.py" to measure the execution time of "google_crc32c.value", and "benchmark_mrd_reads.py" to benchmark GCS Object Range Downloads using "AsyncMultiRangeDownloader" (MRD) with and without checksums. Feedback on "benchmark_mrd_reads.py" highlights two potential issues: hardcoding the object size to 10 GiB (which can cause out-of-range errors if a smaller object is specified) and a potential crash in "random.randint" if the requested download size exceeds the object size. Suggestions were provided to dynamically fetch the object size and add a guard clause.
| # 10 GiB in bytes | ||
| object_size_bytes = 10 * 1024 * 1024 * 1024 | ||
|
|
||
| grpc_client = AsyncGrpcClient() |
There was a problem hiding this comment.
Hardcoding object_size_bytes to 10 GiB breaks the benchmark when a different --object is specified via the command line. If the specified object is smaller than 10 GiB, generating random offsets up to 10 GiB will result in out-of-range errors during download.
To fix this, dynamically retrieve the actual object size using the grpc_client.
| # 10 GiB in bytes | |
| object_size_bytes = 10 * 1024 * 1024 * 1024 | |
| grpc_client = AsyncGrpcClient() | |
| grpc_client = AsyncGrpcClient() | |
| try: | |
| object_metadata = await grpc_client.get_object(args.bucket, args.object) | |
| object_size_bytes = object_metadata.size | |
| except Exception as e: | |
| print(f"Warning: Could not retrieve object metadata ({e}). Defaulting to 10GiB.", file=sys.stderr) | |
| object_size_bytes = 10 * 1024 * 1024 * 1024 |
| for size_str, size_bytes in sizes_to_test: | ||
| # Pre-generate random offsets so that both Enabled and Disabled configurations run on the exact same offsets | ||
| offsets = [random.randint(0, object_size_bytes - size_bytes) for _ in range(args.iterations)] |
There was a problem hiding this comment.
If size_bytes is larger than object_size_bytes, random.randint(0, object_size_bytes - size_bytes) will raise a ValueError: empty range for randrange() and crash the script.
Adding a guard clause to skip sizes larger than the object size makes the benchmark much more robust.
for size_str, size_bytes in sizes_to_test:
if size_bytes > object_size_bytes:
print(f"Skipping size {size_str} ({size_bytes} bytes) because it is larger than the object size ({object_size_bytes} bytes).", file=sys.stderr)
continue
# Pre-generate random offsets so that both Enabled and Disabled configurations run on the exact same offsets
offsets = [random.randint(0, object_size_bytes - size_bytes) for _ in range(args.iterations)]…r progress output
Benchmarks for crc32c execution time and MRD reads with checksum enabled vs disabled.