feat: implement Approach A Python-native profiler and baseline traces #17379
hebaalazzeh wants to merge 2 commits into
Code Review
This pull request introduces a Python SDK Import Profiler tool under scripts/import_profiler/, including a master/worker benchmarking script (profiler.py), documentation, a project plan, and status tracking. Feedback on these changes highlights two main areas of improvement: replacing absolute local file paths with relative paths in the markdown documentation (documentation.md and status.md) to prevent broken links, and refactoring the manual command-line argument parsing in profiler.py to use the standard argparse module for robustness.
The profiling tool is located in the [scripts/import_profiler/](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/) directory:

* **[profiler.py](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
* **[architecture.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/architecture.md)**: Describes the design pattern, Master/Worker model, and process-isolation details.
* **[plan.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/plan.md)**: The current project phases and roadmap checklist.
* **[status.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
* **[iteration_comparison_report.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/iteration_comparison_report.md)**: A management/engineering brief comparing statistics (Median, Mean, Standard Deviation, Standard Error) across different sample sizes ($N$ runs) to show diminishing returns and justify the chosen sample count.
* **[style_guide.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/style_guide.md)**: Development conventions for writing clean, low-overhead performance scripts.
The markdown links use absolute local file paths (file:///usr/local/google/home/...) which are specific to your local machine and will be broken for other users or when viewed on GitHub. Please use relative paths instead.
The profiling tool is located in the [scripts/import_profiler/](./) directory:

* **[profiler.py](./profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
* **[architecture.md](./architecture.md)**: Describes the design pattern, Master/Worker model, and process-isolation details.
* **[plan.md](./plan.md)**: The current project phases and roadmap checklist.
* **[status.md](./status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
* **[iteration_comparison_report.md](./iteration_comparison_report.md)**: A management/engineering brief comparing statistics (Median, Mean, Standard Deviation, Standard Error) across different sample sizes ($N$ runs) to show diminishing returns and justify the chosen sample count.
* **[style_guide.md](./style_guide.md)**: Development conventions for writing clean, low-overhead performance scripts.
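The statistics listed for `iteration_comparison_report.md` can be computed with Python's standard library. A minimal sketch, assuming per-run import timings are collected as floats in seconds; the `summarize` helper and the sample timings are hypothetical and are not the report's data. The standard error is the sample standard deviation divided by $\sqrt{N}$, so each additional run shrinks it less than the previous one did, which is the diminishing-returns effect used to pick a sample count.

```python
import math
import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize import timings from N isolated runs (hypothetical helper)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples for a standard deviation")
    stdev = statistics.stdev(samples)  # sample standard deviation (N - 1 denominator)
    return {
        "n": n,
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "stdev": stdev,
        "stderr": stdev / math.sqrt(n),  # standard error of the mean
    }


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    runs = [1.02, 0.98, 1.05, 1.01, 0.97, 1.03, 0.99, 1.00]
    for n in (2, 4, 8):
        stats = summarize(runs[:n])
        print(f"N={n}: median={stats['median']:.3f}s mean={stats['mean']:.3f}s "
              f"stdev={stats['stdev']:.3f}s stderr={stats['stderr']:.4f}s")
```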
if __name__ == "__main__":
    # Parse CLI arguments
    target_module = "google.cloud.compute"
    iterations = 50
    trace = False

    for arg in sys.argv[1:]:
        if arg.startswith("--module="):
            target_module = arg.split("=")[1]
        elif arg.startswith("--iterations="):
            iterations = int(arg.split("=")[1])
        elif arg == "--trace":
            trace = True

    if "--worker" in sys.argv:
        run_worker(target_module)
    elif trace:
        run_trace(target_module)
    else:
        run_master(iterations, target_module)
Manual parsing of sys.argv is fragile and doesn't handle standard CLI conventions well (such as space-separated arguments like --iterations 10 or displaying help messages). Using the standard library's argparse module provides a much more robust, self-documenting, and standard way to handle command-line arguments.
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Python SDK Import Profiler")
    parser.add_argument("--module", default="google.cloud.compute", help="Target module to profile")
    parser.add_argument("--iterations", type=int, default=50, help="Number of iterations")
    parser.add_argument("--trace", action="store_true", help="Generate importtime trace log")
    parser.add_argument("--worker", action="store_true", help="Run as worker process")
    args = parser.parse_args()
    if args.worker:
        run_worker(args.module)
    elif args.trace:
        run_trace(args.module)
    else:
        run_master(args.iterations, args.module)

* Generated: [import_trace_google_cloud_compute.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_compute.log)
* Generated: [import_trace_google_cloud_aiplatform.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_aiplatform.log)
These links also use absolute local file paths. Please update them to use relative paths so they work correctly for other developers and on GitHub.
* Generated: [import_trace_google_cloud_compute.log](./import_trace_google_cloud_compute.log)
* Generated: [import_trace_google_cloud_aiplatform.log](./import_trace_google_cloud_aiplatform.log)
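Links like these can also be fixed mechanically before review. A minimal sketch, assuming every offending target is a POSIX `file://` URL with an absolute path containing no `)` or whitespace; the `relativize_links` helper is hypothetical and not part of the PR:

```python
import os
import re

# Matches the target of a Markdown inline link that uses an absolute file:// URL,
# e.g. "](file:///home/user/repo/docs/a.md)". Assumes no ")" or whitespace in the path.
_FILE_LINK = re.compile(r"\]\(file://(/[^)\s]*)\)")


def relativize_links(markdown: str, doc_dir: str) -> str:
    """Rewrite absolute file:// link targets relative to doc_dir (hypothetical helper)."""

    def _replace(match: re.Match) -> str:
        target = match.group(1)
        rel = os.path.relpath(target, start=doc_dir)
        # Keep the trailing slash on directory links (relpath drops it).
        if target.endswith("/") and not rel.endswith("/"):
            rel += "/"
        # Prefix same-directory targets with "./" to match the suggested style.
        if not rel.startswith("."):
            rel = "./" + rel
        return f"]({rel})"

    return _FILE_LINK.sub(_replace, markdown)
```

Calling it on the two lines above with `doc_dir` set to `/usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler` yields `./import_trace_google_cloud_compute.log` and `./import_trace_google_cloud_aiplatform.log`, matching the suggested change.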
Introduces a profiler harness that measures SDK import (load) times and peak memory (RAM) footprint.
Key Changes:
* `profiler.py`, using `tracemalloc` and `time.perf_counter` with process isolation (a subprocess master/worker design) to bypass the interpreter's module cache (`sys.modules`).
* A `--trace` capability that dumps detailed, hierarchical import times via Python's `-X importtime` flag.
* Baseline traces for `google-cloud-compute` and `google-cloud-aiplatform`.
* A `documentation.md` file.
* Tool files contained within `scripts/import_profiler/` to maintain a minimal layout.
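The process-isolation design in the first bullet can be sketched with only the standard library. This is a hypothetical illustration, not the code in `profiler.py`: the names `measure_once`, `run_benchmark`, and `dump_importtime_trace`, the `python -c` worker (where the real script re-invokes itself with `--worker`), the whitespace-separated stdout handoff, and the `json` default target are all assumptions. Each worker is a fresh interpreter, so the target starts uncached; `tracemalloc` reports peak memory allocated through Python's allocators rather than total process RSS. The trace helper relies on CPython's `-X importtime` option, which writes per-module self and cumulative import times to stderr.

```python
import statistics
import subprocess
import sys

# Worker body, run in a fresh interpreter via "python -c", so the target module
# cannot already be cached in sys.modules from an earlier iteration.
# Caveat: tracemalloc itself imports some stdlib modules, which are then already
# cached and excluded from the target's timing; tracing also slows the import.
_WORKER_SRC = """
import importlib, sys, time, tracemalloc
name = sys.argv[1]
tracemalloc.start()
start = time.perf_counter()
importlib.import_module(name)
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(elapsed, peak)
"""


def measure_once(module: str) -> tuple[float, int]:
    """Spawn one isolated worker; return (import seconds, peak traced bytes)."""
    result = subprocess.run(
        [sys.executable, "-c", _WORKER_SRC, module],
        capture_output=True, text=True, check=True,
    )
    # Use the last stdout line in case the imported module prints something.
    seconds, peak = result.stdout.strip().splitlines()[-1].split()
    return float(seconds), int(peak)


def run_benchmark(module: str, iterations: int) -> dict[str, float]:
    """Master loop: one fresh process per iteration, then aggregate."""
    times, peaks = [], []
    for _ in range(iterations):
        seconds, peak = measure_once(module)
        times.append(seconds)
        peaks.append(peak)
    return {"median_s": statistics.median(times), "max_peak_bytes": max(peaks)}


def dump_importtime_trace(module: str) -> str:
    """Return CPython's -X importtime report, which it writes to stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c",
         "import importlib, sys; importlib.import_module(sys.argv[1])", module],
        capture_output=True, text=True, check=True,
    )
    return result.stderr


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    summary = run_benchmark(target, iterations=5)
    print(f"{target}: median import {summary['median_s'] * 1000:.2f} ms, "
          f"peak traced memory {summary['max_peak_bytes'] / 1024:.1f} KiB")
```

Spawning one interpreter per iteration costs startup time on every run, but only the region between the two `perf_counter` calls is reported, so interpreter startup is excluded from the measurement.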