chore(bigtable): correct sparse-bootstrap dep discovery and install ordering#13523
chore(bigtable): correct sparse-bootstrap dep discovery and install ordering#13523igorbernstein2 wants to merge 1 commit into
Conversation
…rdering The previous script missed cross-project SNAPSHOT deps and could fail on clean-m2 runs. Fixes: - Walk regular <dependency> edges (not just parent + BOM imports) so monorepo SNAPSHOTs like api-common-java are discovered. - Version-aware edge following: skip edges whose declared version doesn't match the local pom, so pinned released BOM imports (e.g. google-cloud-pubsub-bom:1.140.0 from google-cloud-jar-parent) aren't mistakenly followed to the local SNAPSHOT, which created cycles and phantom projects in the install list. - Verify the local pom's coord+version matches <parent> before accepting the <relativePath> shortcut; the Maven default '../pom.xml' otherwise resolves to the unrelated repo-root pom. - Enqueue each project's aggregator <project>/pom.xml when any pom under it is visited, since 'cd <project> && mvn' forces Maven to load the aggregator regardless of -pl. - Flush the install batch on every top-level project change so interleaved topo orders (A1, A2, B1, A3 when A3 depends on B1) install in dep order instead of being collapsed per project. - Match pom filenames by exact basename so test resources like src/test/resources/gax-example-pom.xml aren't picked up as modules.
There was a problem hiding this comment.
Code Review
This pull request enhances the sparse-bootstrap.py script to resolve and traverse regular <dependency> edges within the monorepo, and updates the install command generation to preserve topological order by grouping consecutive POMs of the same project. The review feedback highlights three key improvements: resolving a namespace issue in get_version that prevents inheriting versions from parent POMs, adding the -N flag to mvn install when no sub-modules are present to prevent recursive builds in sparse checkouts, and caching parsed XML roots to avoid redundant parsing during dependency resolution.
| def get_version(root: ET.Element) -> Optional[str]: | ||
| """The pom's own version (or inherited from <parent><version>).""" | ||
| return child_text(root, 'version') or child_text(root, 'parent/version') |
There was a problem hiding this comment.
In ElementTree, when parsing a namespaced XML document (like a Maven POM), every element in the path must be namespaced. If child_text uses a helper like t(tag) to prepend the namespace, calling child_text(root, 'parent/version') will result in searching for {ns}parent/version (where version is namespace-less), which will fail to match the namespaced <version> element inside <parent>. This means get_version will always return None when a POM inherits its version from its parent.
To fix this, retrieve the parent element first and then get its version child.
| def get_version(root: ET.Element) -> Optional[str]: | |
| """The pom's own version (or inherited from <parent><version>).""" | |
| return child_text(root, 'version') or child_text(root, 'parent/version') | |
| def get_version(root: ET.Element) -> Optional[str]: | |
| """The pom's own version (or inherited from <parent><version>).""" | |
| v = child_text(root, 'version') | |
| if v: | |
| return v | |
| parent = root.find(t('parent')) | |
| if parent is not None: | |
| return child_text(parent, 'version') | |
| return None |
| sub_modules = [m for m in modules if m != '.'] | ||
| cmd = ['mvn', 'install', '-T', '1C', '-DskipTests', '-P', 'quick-build'] | ||
| if sub_modules: | ||
| for m in sub_modules: |
There was a problem hiding this comment.
If sub_modules is empty (meaning only the root POM . is being installed), running mvn install without -pl or -N will cause Maven to recursively build all modules defined in the root POM. In a sparse checkout, this will fail because those module directories are not checked out.
To prevent this, you should append the -N (non-recursive) flag when sub_modules is empty so that only the root POM itself is installed.
| def resolve_local(coord: Coord, declared_version: Optional[str]) -> Optional[Path]: | ||
| """Local pom whose version matches the declared version, or None. | ||
|
|
||
| - No coord match in the monorepo → None. | ||
| - declared_version is None (regular dep inheriting from depMgmt) → follow | ||
| the local pom optimistically; we'd need full Maven evaluation to know | ||
| the resolved version, and in this monorepo unversioned deps virtually | ||
| always inherit from a SNAPSHOT BOM. | ||
| - Either version contains a Maven property (${…}) → follow optimistically. | ||
| - Otherwise the versions must equal. | ||
| """ | ||
| local = coord_to_pom.get(coord) | ||
| if local is None: | ||
| return None | ||
| if declared_version is None: | ||
| return local | ||
| local_version = get_version(ET.fromstring(pom_contents[local])) or '' | ||
| if '${' in declared_version or '${' in local_version: | ||
| return local | ||
| return local if local_version == declared_version else None |
There was a problem hiding this comment.
Parsing the XML content of the same POM files repeatedly (via ET.fromstring(pom_contents[local])) during dependency resolution can be highly inefficient, especially in a large monorepo with hundreds of modules.
We can introduce a simple cache for the parsed XML roots (pom_roots) to ensure each POM is parsed at most once. This cached helper can also be reused at line 260 (local_root = get_pom_root(local_parent)).
pom_roots: dict[Path, ET.Element] = {}
def get_pom_root(path: Path) -> ET.Element:
if path not in pom_roots:
pom_roots[path] = ET.fromstring(pom_contents[path])
return pom_roots[path]
def resolve_local(coord: Coord, declared_version: Optional[str]) -> Optional[Path]:
"""Local pom whose version matches the declared version, or None.
- No coord match in the monorepo → None.
- declared_version is None (regular dep inheriting from depMgmt) → follow
the local pom optimistically; we'd need full Maven evaluation to know
the resolved version, and in this monorepo unversioned deps virtually
always inherit from a SNAPSHOT BOM.
- Either version contains a Maven property (${…}) → follow optimistically.
- Otherwise the versions must equal.
"""
local = coord_to_pom.get(coord)
if local is None:
return None
if declared_version is None:
return local
local_version = get_version(get_pom_root(local)) or ''
if '${' in declared_version or '${' in local_version:
return local
return local if local_version == declared_version else None
The previous script missed cross-project SNAPSHOT deps and could fail on clean-m2 runs. Fixes: