feat: add import profiler tool with dynamic loaded lines code volume … by hebaalazzeh · Pull Request #17467 · googleapis/google-cloud-python

hebaalazzeh · 2026-06-15T21:31:01Z

This PR introduces profiler.py, a standalone, process-isolated import profiling tool designed to benchmark Python modules without dynamic cache interference.

To evaluate accurate import load times without observer effect, the profiler runs using a Master/Worker architecture to execute each import in a completely isolated sub-process. This ensures a 100% "cold start" evaluation.

Key Features / Telemetry Vectors:

Initialization Latency (ms): Measured at high resolution natively.
Peak Memory Usage (MB): Measured via tracemalloc.
Dynamic Code Volume (LOC): Tracks the deterministic delta in sys.modules to aggregate the exact physical Lines of Code loaded by the interpreter.
Includes detailed breakdown in scripts/import_profiler/documentation.md.

Code Review

This pull request introduces a process-isolated Python SDK Import Profiler tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler measures initialization latency, peak memory usage, and dynamic code volume across multiple cold-start iterations. Feedback on the implementation highlights several key improvements: adding a guard to prevent crashes during single-iteration runs, replacing absolute local file paths in the documentation with relative links, handling .pyc files and logging exceptions during line counting, and exposing the worker process's stderr to facilitate debugging when subprocesses fail.

- Add fallback to handle single-iteration quantile calculation crashes - Normalize .pyc file paths during line counting and log exceptions - Expose worker process stderr to facilitate debugging CalledProcessError - Fix absolute paths in documentation.md to use relative directory paths

gemini-code-assist

Code Review

This pull request introduces a Python SDK import profiler tool along with its documentation. The profiler is designed to run process-isolated benchmarks to measure import latency, peak memory usage, and dynamic code volume. The feedback provided aims to improve the robustness of the tool. Key recommendations include using importlib.util.source_from_cache to correctly resolve source paths from compiled .pyc files, moving imports to the module level, parsing only the last line of subprocess stdout to handle noisy module initializations safely, and explicitly specifying encoding="utf-8" when writing files to ensure platform-independent behavior.

gemini-code-assist · 2026-06-15T21:42:29Z

+            if file_path.endswith('.pyc'):
+                file_path = file_path[:-1]
+            if file_path.endswith('.py'):


In Python 3, compiled .pyc files are stored in __pycache__ directories with interpreter-specific suffixes (e.g., module.cpython-312.pyc). Simply stripping the last character (file_path[:-1]) results in a non-existent path like __pycache__/module.cpython-312.py, causing the line counter to fail for almost all cached imports.

Use importlib.util.source_from_cache to correctly resolve the source .py file path.

if file_path.endswith('.pyc'): try: file_path = importlib.util.source_from_cache(file_path) except ValueError: pass if file_path.endswith('.py'):

gemini-code-assist · 2026-06-15T21:42:29Z

+import importlib
+import csv
+import os


Import importlib.util and logging at the module level to support robust .pyc path resolution and avoid importing logging inside a loop.

import importlib import importlib.util import csv import os import logging

…g, and encodings - Use importlib.util.source_from_cache for robust .pyc resolution - Move importlib.util and logging imports to module level - Refine json.loads to parse only the last line of stdout - Specify UTF-8 encoding when opening files for writing

hebaalazzeh · 2026-06-15T22:00:55Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK import profiling tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler uses a process-isolated master/worker architecture to accurately measure cold-start import latency, peak memory usage, and dynamic code volume. The review feedback highlights two key areas for improvement: robustly parsing the worker's stdout in run_master to prevent crashes from unexpected output (such as warnings) and validating the iteration count, and checking the subprocess return code in run_trace to warn users if an import fails instead of silently writing the error log.

gemini-code-assist · 2026-06-15T22:03:11Z

+def run_master(iterations, target_module, cpu="0", csv_path=None):
+    """Orchestrates the benchmark."""
+    times, memories = [], []
+    loaded_modules_val, loaded_lines_val = 0, 0
+
+    print(f"Profiling start... Running {iterations} cold-start iterations for {target_module}.")
+    if cpu.lower() != "none":
+        print(f"CPU Pinning enabled: Pinning processes to core {cpu} using taskset.")
+    else:
+        print("CPU Pinning disabled.")
+
+    for i in range(iterations):
+        # Build command line
+        cmd = []
+        if cpu.lower() != "none":
+            cmd += ["taskset", "-c", cpu]
+
+        cmd += [sys.executable, __file__, "--worker", f"--module={target_module}"]
+
+        try:
+            result = subprocess.run(
+                cmd, capture_output=True, text=True, check=True
+            )
+            data = json.loads(result.stdout.strip().splitlines()[-1])
+            times.append(data["time_ms"])
+            memories.append(data["peak_ram_mb"])
+            loaded_modules_val = data.get("loaded_modules", 0)
+            loaded_lines_val = data.get("loaded_lines", 0)
+        except (subprocess.CalledProcessError, FileNotFoundError) as e:
+            # Fallback if taskset is not found or fails
+            if cpu.lower() != "none" and i == 0:
+                print("WARNING: taskset CPU pinning failed or is not available. Falling back to unpinned execution...")
+                # Try running without taskset
+                cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
+                try:
+                    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
+                    data = json.loads(result.stdout.strip().splitlines()[-1])
+                    times.append(data["time_ms"])
+                    memories.append(data["peak_ram_mb"])
+                    loaded_modules_val = data.get("loaded_modules", 0)
+                    loaded_lines_val = data.get("loaded_lines", 0)
+                    cpu = "none" # Disable cpu pinning for remaining iterations
+                except subprocess.CalledProcessError as err:
+                    print(f"Error in worker process:\n{err.stderr}", file=sys.stderr)
+                    raise err
+            else:
+                if isinstance(e, subprocess.CalledProcessError):
+                    print(f"Error in worker process:\n{e.stderr}", file=sys.stderr)
+                raise e


The current implementation of run_master parses the worker's stdout by splitting lines and taking the last line. However, if the worker process prints any warnings, debug messages, or if the imported module itself prints to stdout, the last line might not be valid JSON. This will cause the master process to crash with an unhandled json.JSONDecodeError or IndexError without displaying the worker's actual output or traceback, making debugging extremely difficult.

Additionally, there is no validation that iterations is at least 1, which can lead to division-by-zero or statistics errors later in the function.

To address these issues, we can extract the worker execution and robust parsing logic into a helper function _run_worker_and_parse that catches parsing errors and prints the raw stdout/stderr of the worker before raising, and validate iterations at the start of run_master.

def _run_worker_and_parse(cmd): result = subprocess.run(cmd, capture_output=True, text=True, check=True) try: lines = result.stdout.strip().splitlines() if not lines: raise ValueError("Worker produced no output on stdout.") data = json.loads(lines[-1]) for key in ("time_ms", "peak_ram_mb", "loaded_modules", "loaded_lines"): if key not in data: raise KeyError(f"Missing key '{key}' in worker output") return data except (json.JSONDecodeError, IndexError, KeyError, ValueError) as parse_err: print(f"Error parsing worker output: {parse_err}", file=sys.stderr) print(f"Worker stdout:\\n{result.stdout}", file=sys.stderr) print(f"Worker stderr:\\n{result.stderr}", file=sys.stderr) raise parse_err def run_master(iterations, target_module, cpu="0", csv_path=None): """Orchestrates the benchmark.""" if iterations < 1: raise ValueError("Number of iterations must be at least 1.") times, memories = [], [] loaded_modules_val, loaded_lines_val = 0, 0 print(f"Profiling start... Running {iterations} cold-start iterations for {target_module}.") if cpu.lower() != "none": print(f"CPU Pinning enabled: Pinning processes to core {cpu} using taskset.") else: print("CPU Pinning disabled.") for i in range(iterations): cmd = [] if cpu.lower() != "none": cmd += ["taskset", "-c", cpu] cmd += [sys.executable, __file__, "--worker", f"--module={target_module}"] try: data = _run_worker_and_parse(cmd) times.append(data["time_ms"]) memories.append(data["peak_ram_mb"]) loaded_modules_val = data["loaded_modules"] loaded_lines_val = data["loaded_lines"] except (subprocess.CalledProcessError, FileNotFoundError) as e: if cpu.lower() != "none" and i == 0: print("WARNING: taskset CPU pinning failed or is not available. Falling back to unpinned execution...") cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"] try: data = _run_worker_and_parse(cmd) times.append(data["time_ms"]) memories.append(data["peak_ram_mb"]) loaded_modules_val = data["loaded_modules"] loaded_lines_val = data["loaded_lines"] cpu = "none" except subprocess.CalledProcessError as err: print(f"Error in worker process:\n{err.stderr}", file=sys.stderr) raise err else: if isinstance(e, subprocess.CalledProcessError): print(f"Error in worker process:\n{e.stderr}", file=sys.stderr) raise e

gemini-code-assist · 2026-06-15T22:03:11Z

+def run_trace(target_module):
+    """Generates importtime trace log and writes it to a file."""
+    trace_file = f"import_trace_{target_module.replace('.', '_')}.log"
+    print(f"Generating importtime trace log for {target_module} -> {trace_file}...")
+
+    # We run: python -X importtime -c "import <module>"
+    result = subprocess.run(
+        [sys.executable, "-X", "importtime", "-c", f"import {target_module}"],
+        capture_output=True, text=True
+    )
+
+    with open(trace_file, "w", encoding="utf-8") as f:
+        f.write(result.stderr)
+
+    print(f"Trace log successfully written to {trace_file}")


In run_trace, if the import fails, subprocess.run will complete with a non-zero exit code, but the script will silently write the stderr to the trace file and print a success message without warning the user that the import actually failed. Adding a check for result.returncode and printing a warning with the worker's stdout/stderr will significantly improve usability and prevent users from analyzing incomplete or failed traces.

def run_trace(target_module): """Generates importtime trace log and writes it to a file.""" trace_file = f"import_trace_{target_module.replace('.', '_')}.log" print(f"Generating importtime trace log for {target_module} -> {trace_file}...") # We run: python -X importtime -c "import <module>" result = subprocess.run( [sys.executable, "-X", "importtime", "-c", f"import {target_module}"], capture_output=True, text=True ) if result.returncode != 0: print(f"WARNING: Import failed with exit code {result.returncode}. The trace log may be incomplete or contain errors.", file=sys.stderr) if result.stdout: print(f"Worker stdout:\\n{result.stdout}", file=sys.stderr) if result.stderr: print(f"Worker stderr:\\n{result.stderr}", file=sys.stderr) with open(trace_file, "w", encoding="utf-8") as f: f.write(result.stderr) print(f"Trace log successfully written to {trace_file}")

- Extract worker logic to _run_worker_and_parse for robust JSON parsing - Add early validation for iterations parameter in run_master - Add non-zero exit code check and warning for run_trace failures

hebaalazzeh · 2026-06-15T22:06:48Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler, which includes a comprehensive documentation guide and a self-spawning profiling script (profiler.py) designed to benchmark import latency, peak memory, and dynamic code volume using a process-isolated master/worker architecture. The feedback recommends enhancing the robustness of the worker's stdout parsing by prefixing metrics with a unique identifier to prevent interference from module initialization logs, and replacing the manual CLI argument parsing with the standard argparse library for improved usability and maintainability.

gemini-code-assist · 2026-06-15T22:08:56Z

+if __name__ == "__main__":
+    # Parse CLI arguments
+    target_module = "google.cloud.compute"
+    iterations = 50
+    trace = False
+    cprofile = False
+    mprofile = False
+    cpu = "0"
+    csv_path = None
+
+    for arg in sys.argv[1:]:
+        if arg.startswith("--module="):
+            target_module = arg.split("=")[1]
+        elif arg.startswith("--iterations="):
+            iterations = int(arg.split("=")[1])
+        elif arg.startswith("--cpu="):
+            cpu = arg.split("=")[1]
+        elif arg.startswith("--csv="):
+            csv_path = arg.split("=")[1]
+        elif arg == "--trace":
+            trace = True
+        elif arg == "--cprofile":
+            cprofile = True
+        elif arg == "--mprofile":
+            mprofile = True
+
+    if "--worker" in sys.argv:
+        run_worker(target_module)
+    elif trace:
+        run_trace(target_module)
+    elif cprofile:
+        run_cprofile(target_module)
+    elif mprofile:
+        run_mprofile(target_module)
+    else:
+        run_master(iterations, target_module, cpu, csv_path)


Using manual argument parsing is fragile, does not support standard help flags (-h/--help), and silently ignores invalid arguments. Replacing it with the standard library argparse module improves maintainability, usability, and robustness.

if __name__ == "__main__": import argparse parser = argparse.ArgumentParser(description="Python SDK Import Profiler") parser.add_argument("--module", default="google.cloud.compute", help="Target module to profile") parser.add_argument("--iterations", type=int, default=50, help="Number of iterations") parser.add_argument("--cpu", default="0", help="CPU core to pin to (or 'none')") parser.add_argument("--csv", help="Path to export CSV results") parser.add_argument("--trace", action="store_true", help="Generate importtime trace log") parser.add_argument("--cprofile", action="store_true", help="Run cProfile") parser.add_argument("--mprofile", action="store_true", help="Run tracemalloc memory snapshot") parser.add_argument("--worker", action="store_true", help=argparse.SUPPRESS) args = parser.parse_args() if args.worker: run_worker(args.module) elif args.trace: run_trace(args.module) elif args.cprofile: run_cprofile(args.module) elif args.mprofile: run_mprofile(args.module) else: run_master(args.iterations, args.module, args.cpu, args.csv)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

hebaalazzeh · 2026-06-15T22:24:06Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a detailed documentation guide and a self-spawning, process-isolated benchmarking script (profiler.py) designed to measure cold-start import latency, peak memory usage, and dynamic code volume. The reviewer's feedback focuses on improving the robustness and usability of the script. Key recommendations include converting the cpu parameter to a string to prevent potential runtime errors, automatically creating parent directories for the CSV export path to avoid FileNotFoundError, and replacing manual CLI argument parsing with Python's standard argparse module for a more robust and user-friendly interface.

I am having trouble creating individual review comments. Click here to see my feedback.

scripts/import_profiler/profiler.py (78-81)

If cpu is passed as an integer (e.g., cpu=0), calling cpu.lower() will raise an AttributeError. Additionally, passing an integer to the subprocess command list will cause a TypeError during execution. Converting cpu to a string at the beginning of run_master ensures robust handling of both integer and string inputs.

def run_master(iterations, target_module, cpu="0", csv_path=None):
    cpu = str(cpu) if cpu is not None else "none"
    if iterations < 1:
        raise ValueError("Number of iterations must be at least 1.")

scripts/import_profiler/profiler.py (127-128)

If the provided csv_path contains nested directories that do not exist yet, open(csv_path, "w") will raise a FileNotFoundError. Automatically creating the parent directories using os.makedirs makes the tool more robust and utilizes the otherwise unused os import.

    if csv_path:
        os.makedirs(os.path.dirname(os.path.abspath(csv_path)), exist_ok=True)
        with open(csv_path, "w", newline="", encoding="utf-8") as f:

scripts/import_profiler/profiler.py (232-267)

Using manual argument parsing is error-prone, doesn't support standard --help / -h flags (which currently falls back to running the full benchmark), and only supports specific formats like --module=value instead of --module value. Replacing this with Python's standard argparse module provides robust validation, automatic help documentation, and a much better user experience.

if __name__ == "__main__":
    import argparse
    
    parser = argparse.ArgumentParser(
        description="Python SDK Import Profiler: benchmark and profile module imports."
    )
    parser.add_argument(
        "--module",
        default="google.cloud.compute",
        help="Target module to profile (e.g., google.cloud.aiplatform)"
    )
    parser.add_argument(
        "--iterations",
        type=int,
        default=50,
        help="Number of cold-start iterations to run"
    )
    parser.add_argument(
        "--cpu",
        default="0",
        help="CPU core to pin processes to using taskset (set to 'none' to disable)"
    )
    parser.add_argument(
        "--csv",
        help="Path to export raw iteration metrics as a CSV file"
    )
    parser.add_argument(
        "--trace",
        action="store_true",
        help="Generate an importtime trace log file"
    )
    parser.add_argument(
        "--cprofile",
        action="store_true",
        help="Run cProfile to capture latency stack traces"
    )
    parser.add_argument(
        "--mprofile",
        action="store_true",
        help="Run tracemalloc memory snapshot"
    )
    parser.add_argument(
        "--worker",
        action="store_true",
        help="Run in worker mode (internal use)"
    )
    
    args = parser.parse_args()
            
    if args.worker:
        run_worker(args.module)
    elif args.trace:
        run_trace(args.module)
    elif args.cprofile:
        run_cprofile(args.module)
    elif args.mprofile:
        run_mprofile(args.module)
    else: 
        run_master(args.iterations, args.module, args.cpu, args.csv)

hebaalazzeh · 2026-06-15T22:26:58Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a master/worker profiling script (profiler.py) and its comprehensive documentation (documentation.md). The profiler measures import latency, peak memory, and dynamic code volume in a process-isolated manner to avoid caching issues. The feedback suggests a platform-aware improvement to default the CPU pinning option to 'none' on non-Linux systems, preventing unnecessary warnings and fallback attempts on macOS or Windows.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

hebaalazzeh · 2026-06-15T22:48:43Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a comprehensive documentation guide and a master/worker execution script (profiler.py) designed to benchmark module import performance under cold-start conditions. Feedback on the implementation highlights opportunities to preserve the cold-start design by running both cProfile and tracemalloc profiling in isolated subprocesses rather than directly within the master process. Additionally, it is recommended to separate FileNotFoundError and subprocess.CalledProcessError handling in the master process to avoid misleading warnings about taskset availability when a worker process crashes.

gemini-code-assist · 2026-06-15T22:51:19Z

+def run_cprofile(target_module):
+    """Runs cProfile to capture stack traces for latency."""
+    import cProfile
+    import pstats
+
+    prof_file = f"cprofile_{target_module.replace('.', '_')}.prof"
+    print(f"Generating cProfile data for {target_module} -> {prof_file}...")
+
+    # Run profiling
+    pr = cProfile.Profile()
+    pr.enable()
+    importlib.import_module(target_module)
+    pr.disable()
+
+    # Save for flame charts (e.g. via snakeviz)
+    pr.dump_stats(prof_file)
+    print(f"cProfile stats successfully written to {prof_file}")
+
+    # Print top bottlenecks
+    print("\n--- Top 15 functions by cumulative time ---")
+    ps = pstats.Stats(pr).sort_stats(pstats.SortKey.CUMULATIVE)
+    ps.print_stats(15)


Running cProfile directly in the master process defeats the "cold start" design of the profiler. Since profiler.py already imports heavy standard library modules (like subprocess, json, csv, logging, tracemalloc, statistics), any imports of these modules by the target module will be resolved instantly from Python's sys.modules cache and won't be captured in the profile.

To ensure a true cold-start profile, cProfile should be executed in an isolated subprocess.

def run_cprofile(target_module): """Runs cProfile in a clean subprocess to capture stack traces for latency.""" import pstats prof_file = f"cprofile_{target_module.replace('.', '_')}.prof" print(f"Generating cProfile data for {target_module} -> {prof_file}...") # Run profiling in a clean subprocess to ensure cold-start result = subprocess.run( [sys.executable, "-m", "cProfile", "-o", prof_file, "-c", f"import importlib; importlib.import_module('{target_module}')"], capture_output=True, text=True ) if result.returncode != 0: print(f"Error generating cProfile data:\n{result.stderr}", file=sys.stderr) return print(f"cProfile stats successfully written to {prof_file}") # Print top bottlenecks print("\n--- Top 15 functions by cumulative time ---") ps = pstats.Stats(prof_file).sort_stats(pstats.SortKey.CUMULATIVE) ps.print_stats(15)

- Execute cProfile in a fresh subprocess to ensure cold-start accuracy - Execute tracemalloc in a fresh subprocess to capture all memory allocations - Clean up master process exception handling for missing taskset vs crashed workers

hebaalazzeh · 2026-06-15T22:57:42Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler tool, which includes a comprehensive documentation file (documentation.md) and a core profiling script (profiler.py) designed to measure import latency, peak memory, and dynamic code volume using process-isolated worker runs. The review feedback highlights several critical security and robustness improvements: addressing potential code injection vulnerabilities in run_trace, run_cprofile, and run_mprofile by safely escaping the target_module parameter using json.dumps(), and preventing potential UnicodeDecodeError failures in run_worker when reading source files by adding errors='ignore' to the file-opening logic.

gemini-code-assist · 2026-06-15T22:59:48Z

+        "import tracemalloc\n"
+        "import importlib\n"
+        "tracemalloc.start()\n"
+        f"importlib.import_module('{target_module}')\n"
+        "snapshot = tracemalloc.take_snapshot()\n"
+        "tracemalloc.stop()\n"
+        "top_stats = snapshot.statistics('lineno')\n"
+        "for stat in top_stats[:15]:\n"
+        "    print(stat)\n"
+    )
+    result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


Directly interpolating target_module into the code string is vulnerable to code injection. Use json.dumps() to safely escape the module name.

code = ( "import tracemalloc\n" "import importlib\n" "tracemalloc.start()\n" f"importlib.import_module({json.dumps(target_module)})\n" "snapshot = tracemalloc.take_snapshot()\n" "tracemalloc.stop()\n" "top_stats = snapshot.statistics('lineno')\n" "for stat in top_stats[:15]:\n" " print(stat)\n" )

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)

hebaalazzeh · 2026-06-15T23:08:24Z

/gemini review

ohmayr · 2026-06-15T23:10:06Z

+* **[plan.md](./plan.md)**: The current project phases and roadmap checklist.
+* **[status.md](./status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.


where are these files?

gemini-code-assist

Code Review

This pull request introduces a Python SDK Import Profiler tool, including a detailed documentation guide and a self-spawning script (profiler.py) that benchmarks module imports in process-isolated environments. The reviewer feedback highlights two main improvement opportunities: preventing a potential AttributeError by verifying that the module's __file__ attribute is a string before checking its extension, and refactoring the error handling block in run_master to eliminate duplicate metric collection logic.

gemini-code-assist · 2026-06-15T23:10:15Z

+        if mod and getattr(mod, '__file__', None):
+            file_path = mod.__file__


The __file__ attribute of a module can sometimes be None (e.g., for namespace packages or dynamically created modules) or potentially a non-string object in custom environments. Checking if it is a string before performing string operations like .endswith() prevents potential AttributeErrors.

Suggested change

if mod and getattr(mod, '__file__', None):

file_path = mod.__file__

if mod and isinstance(getattr(mod, '__file__', None), str):

file_path = mod.__file__

gemini-code-assist · 2026-06-15T23:10:15Z

+        try:
+            data = _run_worker_and_parse(cmd)
+            times.append(data["time_ms"])
+            memories.append(data["peak_ram_mb"])
+            loaded_modules_val = data["loaded_modules"]
+            loaded_lines_val = data["loaded_lines"]
+        except FileNotFoundError as e:
+            if cpu.lower() != "none" and i == 0:
+                print("WARNING: taskset CPU pinning is not available. Falling back to unpinned execution...")
+                cpu = "none"
+                cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
+                try:
+                    data = _run_worker_and_parse(cmd)
+                    times.append(data["time_ms"])
+                    memories.append(data["peak_ram_mb"])
+                    loaded_modules_val = data["loaded_modules"]
+                    loaded_lines_val = data["loaded_lines"]
+                except subprocess.CalledProcessError as err:
+                    print(f"Error in worker process:\n{err.stderr}", file=sys.stderr)
+                    raise err
+            else:
+                raise e
+        except subprocess.CalledProcessError as e:
+            print(f"Error in worker process:\n{e.stderr}", file=sys.stderr)
+            raise e


There is significant code duplication in the error handling block where metrics are appended to lists and assigned to variables. Refactoring this block to perform the list appending and variable assignment once after the try-except block simplifies the logic and improves maintainability.

try: data = _run_worker_and_parse(cmd) except FileNotFoundError as e: if cpu.lower() != "none" and i == 0: print("WARNING: taskset CPU pinning is not available. Falling back to unpinned execution...") cpu = "none" cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"] try: data = _run_worker_and_parse(cmd) except subprocess.CalledProcessError as err: print(f"Error in worker process:\\n{err.stderr}", file=sys.stderr) raise err else: raise e except subprocess.CalledProcessError as e: print(f"Error in worker process:\\n{e.stderr}", file=sys.stderr) raise e times.append(data["time_ms"]) memories.append(data["peak_ram_mb"]) loaded_modules_val = data["loaded_modules"] loaded_lines_val = data["loaded_lines"]

References

Remove duplicate lines of code, especially duplicate assertions in tests, to keep the codebase clean and avoid redundancy.

ohmayr · 2026-06-15T23:13:24Z

+**Proposed Solution**
+The selected design for the tracking framework is Hybrid Process Orchestration, implemented via a highly isolated Master/Worker architecture within a single self-spawning script (`profiler.py`). This setup directly resolves the "Cache Problem," where Python caches completed imports inside `sys.modules` and loads them in <0.1ms on subsequent calls, making standard loop benchmarking inaccurate.
+
+*   **The Master Process:** Operates as the central harness controller, looping exactly N times. For each iteration, it calls a clean subprocess using `sys.executable` pointing back to itself with a specialized `--worker` flag. This isolates every single run from Python's runtime memory cache, guaranteeing a true, 100% "cold start" baseline.
+*   **The Worker Process:** Spawns a fresh interpreter instance, executes the dynamic target module import via `importlib.import_module()`, natively tracks system metrics, pipes the final calculation payload out as a clean JSON string to `stdout`, and exits.


Maybe a high level description of usage and the expected output is sufficient instead of solution details?

Something like:

Run this command to collect the metrics

./profiler <arg_1> <arg_2>

Expected output:

package name: load time: memory: package size:

- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback

feat: add import profiler tool with dynamic loaded lines code volume …

7e325db

…metrics

hebaalazzeh self-assigned this Jun 15, 2026

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread scripts/import_profiler/profiler.py Outdated

Comment thread scripts/import_profiler/documentation.md Outdated

Comment thread scripts/import_profiler/profiler.py Outdated

Comment thread scripts/import_profiler/profiler.py Outdated

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

fix: refactor profiler worker execution and handling

9340dc9

- Extract worker logic to _run_worker_and_parse for robust JSON parsing - Add early validation for iterations parameter in run_master - Add non-zero exit code check and warning for run_trace failures

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

hebaalazzeh and others added 2 commits June 15, 2026 15:18

Update scripts/import_profiler/profiler.py

d8c26a2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

f2c998c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

refactor: replace manual sys.argv parsing with argparse

d7ae73f

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread scripts/import_profiler/profiler.py Outdated

Update scripts/import_profiler/profiler.py

6a94207

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

hebaalazzeh and others added 4 commits June 15, 2026 16:00

Update scripts/import_profiler/profiler.py

e15d972

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

051b455

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

9f55f1c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix: resolve code injection vulnerability when importing target_module

fb35fd3

- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)

ohmayr reviewed Jun 15, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 15, 2026

View reviewed changes

ohmayr reviewed Jun 15, 2026

View reviewed changes

hebaalazzeh added 2 commits June 15, 2026 23:16

docs: simplify profiler mechanism section

727d44e

- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback

update documenation md

70c2a24

		* [plan.md](./plan.md): The current project phases and roadmap checklist.
		* [status.md](./status.md): Tracks the active task state and hosts recorded baseline performance metrics.

		if mod and getattr(mod, '__file__', None):
		file_path = mod.__file__

Conversation

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

scripts/import_profiler/profiler.py (78-81)

scripts/import_profiler/profiler.py (127-128)

scripts/import_profiler/profiler.py (232-267)

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hebaalazzeh commented Jun 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment