chore: add a script to do a sparse checkout of the mono repo to make it manageable #13408
Code Review
This pull request introduces a new script, sparse-bootstrap.py, designed to automate sparse checkout and dependency installation for Maven modules. The code review identified several key improvements: resolving issues in parent POM resolution (handling directory-based relative paths and coordinate fallbacks), adding the -N flag to prevent recursive Maven builds when no submodules are targeted, ensuring compatibility with Python versions older than 3.9 by replacing Path.is_relative_to, and filtering out root-level files from the sparse-checkout directory list to prevent Git warnings.
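For the Python compatibility point, a minimal sketch of one way to replace Path.is_relative_to (added in Python 3.9) on older interpreters. The helper name is_relative_to is hypothetical and is not taken from sparse-bootstrap.py; it relies only on Path.relative_to raising ValueError when the path is not under the base.

```python
from pathlib import PurePosixPath


def is_relative_to(path, base):
    """Return True if `path` equals `base` or lies underneath it.

    Hypothetical stand-in for Path.is_relative_to, which only exists
    on Python 3.9 and later.
    """
    try:
        path.relative_to(base)  # raises ValueError when not under base
        return True
    except ValueError:
        return False


inside = is_relative_to(PurePosixPath("a/b/pom.xml"), PurePosixPath("a"))
sibling = is_relative_to(PurePosixPath("ab/pom.xml"), PurePosixPath("a"))
```

The comparison is made on path components, not string prefixes, so a sibling directory such as ab is not treated as being under a.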
    needed_poms, dep_edges = find_needed_modules(seed_dir, pom_contents, coord_to_pom)
    sorted_poms = topo_sort(needed_poms, dep_edges)

    top_dirs = sorted({p.parts[0] for p in needed_poms})
If needed_poms contains the root-level pom.xml, p.parts[0] will be 'pom.xml' for that entry. In Git's sparse-checkout cone mode, passing a file instead of a directory can cause warnings or errors. Since root-level files are always checked out in cone mode, we should filter out any root-level paths (where len(p.parts) <= 1) from top_dirs.
    - top_dirs = sorted({p.parts[0] for p in needed_poms})
    + top_dirs = sorted({p.parts[0] for p in needed_poms if len(p.parts) > 1})
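A small illustration of the difference between the two expressions. The module paths below are made up for the example and are not taken from the repository.

```python
from pathlib import PurePosixPath

# Hypothetical set of needed POMs, including the root-level pom.xml.
needed_poms = [
    PurePosixPath("pom.xml"),
    PurePosixPath("module-a/pom.xml"),
    PurePosixPath("module-a/sub/pom.xml"),
    PurePosixPath("module-b/pom.xml"),
]

# Original expression: the root pom.xml leaks in as if it were a directory.
unfiltered = sorted({p.parts[0] for p in needed_poms})

# Suggested expression: paths with a single component are skipped.
top_dirs = sorted({p.parts[0] for p in needed_poms if len(p.parts) > 1})

print(unfiltered)  # ['module-a', 'module-b', 'pom.xml']
print(top_dirs)    # ['module-a', 'module-b']
```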
This repo is 5 GB, which breaks a lot of tooling. To make it manageable, this script does a sparse clone of the repo, then walks the dependency tree and does a cone-mode checkout of just the relevant modules. The script should work with any module, but currently lives under bigtable until there is interest in moving it to the monorepo root.
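A rough sketch of the Git side of that flow, under assumptions: it only builds the command lines and does not run them, the function name sparse_clone_commands and the URL are hypothetical, and the use of --filter=blob:none is a guess at how the clone is kept small, not something confirmed by the script. It also assumes a recent Git in which sparse-checkout set accepts --cone.

```python
def sparse_clone_commands(repo_url, dest, top_dirs):
    """Build (but do not run) the Git commands for a sparse, cone-mode checkout.

    Hypothetical helper for illustration; the real script may differ.
    """
    return [
        # Clone without file contents and with only root-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, dest],
        # Widen the checkout to the directories that hold the needed modules.
        ["git", "-C", dest, "sparse-checkout", "set", "--cone", *top_dirs],
    ]


commands = sparse_clone_commands(
    "https://example.com/org/monorepo.git",  # placeholder URL
    "monorepo",
    ["module-a", "module-b"],
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) once the set of top-level directories has been computed from the dependency walk.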
Example usage: