feat: implement Approach A Python-native profiler and baseline traces #17379

Draft
hebaalazzeh wants to merge 2 commits into main from feat/import-profiler-native

@hebaalazzeh
Contributor

Introduces a profiler harness that measures SDK import (load) time and peak memory (RAM) footprint.

Key Changes:

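  • Finalized and implemented Approach A (pure Python-native profiling) in profiler.py, using tracemalloc for peak memory and time.perf_counter for import time. Each measurement runs in a separate subprocess (master/worker design), so every import starts in a fresh interpreter with an empty module cache instead of reusing modules already loaded in sys.modules.
  • Added a --trace option that dumps a hierarchical, per-module breakdown of import times using Python's -X importtime flag.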
  • Generated baseline trace logs for google-cloud-compute and google-cloud-aiplatform.
  • Consolidated architecture, baseline comparison metrics, and trace guide details into a single documentation.md file.
  • Removed redundant markdown files from scripts/import_profiler/ to keep the directory layout minimal.
  • go/sdk-performance-design
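The process-isolated measurement described in the first bullet can be sketched as below. This is a minimal illustration under assumptions, not the code in profiler.py: the worker body is passed to a fresh interpreter with `-c` instead of re-invoking the script with `--worker`, the median is an assumed summary statistic, and `WORKER_SNIPPET`, `measure_once`, and `profile_import` are hypothetical names.

```python
import json
import statistics
import subprocess
import sys

# Hypothetical worker body. It runs in a brand-new interpreter per sample,
# so the target module is never already cached in sys.modules.
WORKER_SNIPPET = """
import importlib, sys, time, tracemalloc
tracemalloc.start()
start = time.perf_counter()
importlib.import_module(sys.argv[1])
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
import json  # imported only after measuring so it cannot skew the target import
print(json.dumps({"seconds": elapsed, "peak_bytes": peak}))
"""


def measure_once(module: str) -> dict:
    """Run one cold import of `module` in an isolated subprocess (the 'worker')."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SNIPPET, module],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def profile_import(module: str, iterations: int = 5) -> dict:
    """Act as the 'master': repeat the cold import and summarize with medians."""
    samples = [measure_once(module) for _ in range(iterations)]
    return {
        "module": module,
        "iterations": iterations,
        "median_seconds": statistics.median(s["seconds"] for s in samples),
        "median_peak_bytes": statistics.median(s["peak_bytes"] for s in samples),
    }


if __name__ == "__main__":
    print(profile_import("decimal", iterations=3))
```

Two caveats apply to this sketch: tracemalloc only sees allocations made through Python's memory allocators, so memory that C extensions obtain directly from malloc is not counted, and tracing itself adds overhead, so timing and memory runs could be split if the two numbers interfere.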

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a Python SDK Import Profiler tool under scripts/import_profiler/, including a master/worker benchmarking script (profiler.py), documentation, a project plan, and status tracking. Feedback on these changes highlights two main areas of improvement: replacing absolute local file paths with relative paths in the markdown documentation (documentation.md and status.md) to prevent broken links, and refactoring the manual command-line argument parsing in profiler.py to use the standard argparse module for robustness.

Comment on lines +8 to +15
The profiling tool is located in the [scripts/import_profiler/](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/) directory:

* **[profiler.py](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
* **[architecture.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/architecture.md)**: Describes the design pattern, Master/Worker model, and process-isolation details.
* **[plan.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/plan.md)**: The current project phases and roadmap checklist.
* **[status.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
* **[iteration_comparison_report.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/iteration_comparison_report.md)**: A management/engineering brief comparing statistics (Median, Mean, Standard Deviation, Standard Error) across different sample sizes ($N$ runs) to show diminishing returns and justify the chosen sample count.
* **[style_guide.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/style_guide.md)**: Development conventions for writing clean, low-overhead performance scripts.

Priority: medium

The markdown links use absolute local file paths (file:///usr/local/google/home/...) which are specific to your local machine and will be broken for other users or when viewed on GitHub. Please use relative paths instead.

Suggested change
- The profiling tool is located in the [scripts/import_profiler/](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/) directory:
- * **[profiler.py](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
- * **[architecture.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/architecture.md)**: Describes the design pattern, Master/Worker model, and process-isolation details.
- * **[plan.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/plan.md)**: The current project phases and roadmap checklist.
- * **[status.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
- * **[iteration_comparison_report.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/iteration_comparison_report.md)**: A management/engineering brief comparing statistics (Median, Mean, Standard Deviation, Standard Error) across different sample sizes ($N$ runs) to show diminishing returns and justify the chosen sample count.
- * **[style_guide.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/style_guide.md)**: Development conventions for writing clean, low-overhead performance scripts.
+ The profiling tool is located in the [scripts/import_profiler/](./) directory:
+ * **[profiler.py](./profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
+ * **[architecture.md](./architecture.md)**: Describes the design pattern, Master/Worker model, and process-isolation details.
+ * **[plan.md](./plan.md)**: The current project phases and roadmap checklist.
+ * **[status.md](./status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
+ * **[iteration_comparison_report.md](./iteration_comparison_report.md)**: A management/engineering brief comparing statistics (Median, Mean, Standard Deviation, Standard Error) across different sample sizes ($N$ runs) to show diminishing returns and justify the chosen sample count.
+ * **[style_guide.md](./style_guide.md)**: Development conventions for writing clean, low-overhead performance scripts.
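The suggested change can also be applied mechanically. A minimal sketch, assuming the markdown file lives in scripts/import_profiler/ and that only link targets starting with `file://` need rewriting; `relativize_file_links` is a hypothetical helper, not part of this PR.

```python
import posixpath
import re

# Matches a markdown link target such as "](file:///abs/path)" and captures "/abs/path".
FILE_LINK = re.compile(r"\]\(file://(/[^)]*)\)")


def relativize_file_links(markdown: str, doc_dir: str) -> str:
    """Rewrite ](file:///abs/path) link targets as paths relative to doc_dir."""

    def _replace(match: re.Match) -> str:
        relative = posixpath.relpath(match.group(1), doc_dir)
        # relpath returns "." for doc_dir itself; "./" reads better as a link target.
        if relative == ".":
            relative = "./"
        elif not relative.startswith("."):
            relative = "./" + relative
        return f"]({relative})"

    return FILE_LINK.sub(_replace, markdown)
```

For links inside the directory this yields the same `./profiler.py` and `./` targets as the suggestion; files outside it come out as `../` paths.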

Comment on lines +80 to +99
if __name__ == "__main__":
    # Parse CLI arguments
    target_module = "google.cloud.compute"
    iterations = 50
    trace = False

    for arg in sys.argv[1:]:
        if arg.startswith("--module="):
            target_module = arg.split("=")[1]
        elif arg.startswith("--iterations="):
            iterations = int(arg.split("=")[1])
        elif arg == "--trace":
            trace = True

    if "--worker" in sys.argv:
        run_worker(target_module)
    elif trace:
        run_trace(target_module)
    else:
        run_master(iterations, target_module)
No newline at end of file

Priority: medium

Manual parsing of sys.argv is fragile: it does not handle standard CLI conventions such as space-separated values (--iterations 10) or a --help message, and it silently ignores unrecognized or misspelled flags. The standard library's argparse module provides a more robust, self-documenting, and standard way to handle command-line arguments.

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Python SDK Import Profiler")
    parser.add_argument("--module", default="google.cloud.compute", help="Target module to profile")
    parser.add_argument("--iterations", type=int, default=50, help="Number of iterations")
    parser.add_argument("--trace", action="store_true", help="Generate importtime trace log")
    parser.add_argument("--worker", action="store_true", help="Run as worker process")
    args = parser.parse_args()

    if args.worker:
        run_worker(args.module)
    elif args.trace:
        run_trace(args.module)
    else:
        run_master(args.iterations, args.module)
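One way to check the suggestion is to factor the parser into a function and feed it sample argument lists. A minimal sketch, where `build_parser()` is a hypothetical refactor that mirrors the suggested arguments; it is not in profiler.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical factoring of the suggested parser so it can be exercised directly."""
    parser = argparse.ArgumentParser(description="Python SDK Import Profiler")
    parser.add_argument("--module", default="google.cloud.compute", help="Target module to profile")
    parser.add_argument("--iterations", type=int, default=50, help="Number of iterations")
    parser.add_argument("--trace", action="store_true", help="Generate importtime trace log")
    parser.add_argument("--worker", action="store_true", help="Run as worker process")
    return parser


parser = build_parser()

# The space-separated form, which the manual loop silently ignored, now parses
# the same way as the "=" form.
spaced = parser.parse_args(["--iterations", "10"])
joined = parser.parse_args(["--iterations=10"])
assert spaced.iterations == joined.iterations == 10

# Omitted flags fall back to their defaults.
defaults = parser.parse_args([])
print(defaults.module, defaults.iterations, defaults.trace, defaults.worker)
# → google.cloud.compute 50 False False
```

A malformed value such as `--iterations abc` now produces a usage error and exit status 2 instead of an uncaught ValueError from `int(arg.split("=")[1])`.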

Comment on lines +14 to +15
* Generated: [import_trace_google_cloud_compute.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_compute.log)
* Generated: [import_trace_google_cloud_aiplatform.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_aiplatform.log)

Priority: medium

These links also use absolute local file paths. Please update them to use relative paths so they work correctly for other developers and on GitHub.

Suggested change
- * Generated: [import_trace_google_cloud_compute.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_compute.log)
- * Generated: [import_trace_google_cloud_aiplatform.log](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/import_trace_google_cloud_aiplatform.log)
+ * Generated: [import_trace_google_cloud_compute.log](./import_trace_google_cloud_compute.log)
+ * Generated: [import_trace_google_cloud_aiplatform.log](./import_trace_google_cloud_aiplatform.log)
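For reference, a trace like the ones linked above can be captured and skimmed with a sketch like the following. It assumes the log is CPython's standard `-X importtime` output, written to stderr as one `import time: self [us] | cumulative | imported package` row per module, with extra leading spaces in the package column to show nesting. `capture_importtime` and `top_cumulative` are hypothetical helpers, not profiler.py's `run_trace`.

```python
import subprocess
import sys


def capture_importtime(module: str) -> str:
    """Import `module` in a fresh interpreter with -X importtime and return the raw log."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # CPython writes the importtime report to stderr, not stdout.
    return result.stderr


def top_cumulative(log: str, limit: int = 5) -> list[tuple[int, str]]:
    """Return the `limit` rows with the largest cumulative time, in microseconds."""
    rows = []
    for line in log.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        _self_us, cumulative_us, name = fields
        if not cumulative_us.strip().isdigit():
            continue  # header row: "self [us] | cumulative | imported package"
        # Leading spaces in the name column encode nesting depth; strip them here.
        rows.append((int(cumulative_us), name.strip()))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for cumulative_us, name in top_cumulative(capture_importtime("json")):
        print(f"{cumulative_us:>10} us  {name}")
```

Sorting by the cumulative column surfaces the subtrees that dominate load time; the self column would instead point at individual slow modules.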
