Skip to content

Prototype pep 0810 lazy imports#17544

Draft
hebaalazzeh wants to merge 29 commits into
mainfrom
prototype-pep-0810-lazy-imports
Draft

Prototype pep 0810 lazy imports#17544
hebaalazzeh wants to merge 29 commits into
mainfrom
prototype-pep-0810-lazy-imports

Conversation

@hebaalazzeh

Copy link
Copy Markdown
Contributor

Overview

This PR resolves the severe initialization bottlenecks (~10s-13s) experienced by customers in serverless environments (Cloud Run/Functions) caused by eager loading cascades across generated service clients and protobuf types.
Rather than implementing custom __getattr__ hacks that break static analyzers (MyPy) and IDE autocompletion, this PR adopts PEP 0810 (Explicit Lazy Imports) natively for Python 3.15.

Implementation Architecture

We utilize the official PEP 0810 __lazy_modules__ list declaration. On Python 3.15+, this tells the interpreter which modules should be deferred. The standard eager imports follow immediately below it so that older Python versions simply ignore the list and load the clients eagerly, guaranteeing 100% backwards compatibility without needing hacky version guards.

# 1. Define the module paths that should be treated as lazy in Python 3.15+
__lazy_modules__ = [
    f"{__spec__.parent}.services.accelerator_types",
    f"{__spec__.parent}.services.addresses",
    f"{__spec__.parent}.services.autoscalers",
    # ... mapped to all services
]
# 2. Expose the imports normally at the top-level.
from .services.accelerator_types import AcceleratorTypesClient
from .services.addresses import AddressesClient
from .services.autoscalers import AutoscalersClient
# ... all standard eager imports
__all__ = (
    "AcceleratorTypesClient",
    "AddressesClient",
    "AutoscalersClient",
    # ...
)

Related Links:

Design Doc: go/sdk-performance-design
Towards googleapis/python-aiplatform#4749

hebaalazzeh and others added 29 commits June 15, 2026 21:27
- Add fallback to handle single-iteration quantile calculation crashes
- Normalize .pyc file paths during line counting and log exceptions
- Expose worker process stderr to facilitate debugging CalledProcessError
- Fix absolute paths in documentation.md to use relative directory paths
…g, and encodings

- Use importlib.util.source_from_cache for robust .pyc resolution
- Move importlib.util and logging imports to module level
- Refine json.loads to parse only the last line of stdout
- Specify UTF-8 encoding when opening files for writing
- Extract worker logic to _run_worker_and_parse for robust JSON parsing
- Add early validation for iterations parameter in run_master
- Add non-zero exit code check and warning for run_trace failures
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Execute cProfile in a fresh subprocess to ensure cold-start accuracy
- Execute tracemalloc in a fresh subprocess to capture all memory allocations
- Clean up master process exception handling for missing taskset vs crashed workers
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)
- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback
- Read /proc/self/statm inside worker before and after import
- Exposes pure C-extension memory allocations (e.g. grpc/protobuf) missed by Python's internal tracemalloc
- Requires zero dependencies, bypassing the need for psutil
- Recursively wipes __pycache__ and .pyc files via pathlib prior to iteration loops.
- Employs Python's -B flag on worker subprocesses to prevent re-caching during the benchmark, enforcing raw .py parsing for every execution.
- Replaces --clear-pycache with an opt-out --keep-pycache flag
- The profiler now automatically sweeps and enforces disk-level cold starts without requiring explicit flags from the user.
Address PR review comment by moving setup tracking (rss_before, modules_before) outside of the time.perf_counter() window to prevent artificial latency inflation.
Addresses PR feedback to avoid broad exception catching when parsing physical python files for line counts.
Addresses PR review by explicitly documenting the expected PEP 3147/488 error path for irregular .pyc files.
…ssing modules

Addresses PR review by replacing null-checks with a cleaner try/except block that assumes normal module behavior and explicitly logs when a C-extension or built-in is skipped due to a missing __file__ attribute.
Addresses PR review by replacing the 'none' string check with a typed NO_CPU_PINNING = -1 constant for cleaner argument parsing and state tracking.
…askset

Addresses PR review by removing the fallback unpinned execution logic when taskset is not installed. The script now fails immediately with a clear error message, preventing code duplication.
Addresses PR review by collapsing the repeated statistics.quantiles logic into a reusable _calculate_percentiles helper method.
…terations

Addresses PR review by validating that 'loaded_modules' and 'loaded_lines' remain deterministic across cold-start iterations, warning the user if non-deterministic import paths are triggered.
… tracemalloc profiling

Addresses PR review by replacing the hard-to-maintain dynamic code string inside run_mprofile with a properly structured multiprocessing 'spawn' context.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, which includes a detailed documentation guide (documentation.md) and a core profiling script (profiler.py) designed to benchmark import latency, memory usage, and code volume. The feedback suggests a minor improvement to remove a redundant local import of tracemalloc inside the _mprofile_worker function, as it is already imported at the module level.

Comment on lines +269 to +278
def _mprofile_worker(target_module):
import tracemalloc
tracemalloc.start()
importlib.import_module(target_module)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()
top_stats = snapshot.statistics('lineno')
print("\n--- Top 15 memory allocations by line ---")
for stat in top_stats[:15]:
print(stat)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Do not import modules or classes inside functions if they are already imported at the module level. tracemalloc is already imported at the top of the file (line 6), so the local import inside _mprofile_worker is redundant and should be removed.

Suggested change
def _mprofile_worker(target_module):
import tracemalloc
tracemalloc.start()
importlib.import_module(target_module)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()
top_stats = snapshot.statistics('lineno')
print("\n--- Top 15 memory allocations by line ---")
for stat in top_stats[:15]:
print(stat)
def _mprofile_worker(target_module):
tracemalloc.start()
importlib.import_module(target_module)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()
top_stats = snapshot.statistics('lineno')
print("\n--- Top 15 memory allocations by line ---")
for stat in top_stats[:15]:
print(stat)
References
  1. Do not import modules or classes inside functions if they are already imported at the module level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant