Skip to content

chore(bigtable): correct sparse-bootstrap dep discovery and install ordering#13523

Open
igorbernstein2 wants to merge 1 commit into
googleapis:mainfrom
igorbernstein2:sparse-fix
Open

chore(bigtable): correct sparse-bootstrap dep discovery and install ordering#13523
igorbernstein2 wants to merge 1 commit into
googleapis:mainfrom
igorbernstein2:sparse-fix

Conversation

@igorbernstein2

Copy link
Copy Markdown
Contributor

The previous script missed cross-project SNAPSHOT deps and could fail on clean-m2 runs. Fixes:

  • Walk regular edges (not just parent + BOM imports) so monorepo SNAPSHOTs like api-common-java are discovered.
  • Version-aware edge following: skip edges whose declared version doesn't match the local pom, so pinned released BOM imports (e.g. google-cloud-pubsub-bom:1.140.0 from google-cloud-jar-parent) aren't mistakenly followed to the local SNAPSHOT, which created cycles and phantom projects in the install list.
  • Verify the local pom's coord+version matches before accepting the shortcut; the Maven default '../pom.xml' otherwise resolves to the unrelated repo-root pom.
  • Enqueue each project's aggregator /pom.xml when any pom under it is visited, since 'cd && mvn' forces Maven to load the aggregator regardless of -pl.
  • Flush the install batch on every top-level project change so interleaved topo orders (A1, A2, B1, A3 when A3 depends on B1) install in dep order instead of being collapsed per project.
  • Match pom filenames by exact basename so test resources like src/test/resources/gax-example-pom.xml aren't picked up as modules.

…rdering

The previous script missed cross-project SNAPSHOT deps and could fail on
clean-m2 runs. Fixes:

- Walk regular <dependency> edges (not just parent + BOM imports) so
  monorepo SNAPSHOTs like api-common-java are discovered.
- Version-aware edge following: skip edges whose declared version
  doesn't match the local pom, so pinned released BOM imports (e.g.
  google-cloud-pubsub-bom:1.140.0 from google-cloud-jar-parent) aren't
  mistakenly followed to the local SNAPSHOT, which created cycles and
  phantom projects in the install list.
- Verify the local pom's coord+version matches <parent> before
  accepting the <relativePath> shortcut; the Maven default
  '../pom.xml' otherwise resolves to the unrelated repo-root pom.
- Enqueue each project's aggregator <project>/pom.xml when any pom
  under it is visited, since 'cd <project> && mvn' forces Maven to
  load the aggregator regardless of -pl.
- Flush the install batch on every top-level project change so
  interleaved topo orders (A1, A2, B1, A3 when A3 depends on B1)
  install in dep order instead of being collapsed per project.
- Match pom filenames by exact basename so test resources like
  src/test/resources/gax-example-pom.xml aren't picked up as modules.
@igorbernstein2 igorbernstein2 requested review from a team as code owners June 18, 2026 21:21

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the sparse-bootstrap.py script to resolve and traverse regular <dependency> edges within the monorepo, and updates the install command generation to preserve topological order by grouping consecutive POMs of the same project. The review feedback highlights three key improvements: resolving a namespace issue in get_version that prevents inheriting versions from parent POMs, adding the -N flag to mvn install when no sub-modules are present to prevent recursive builds in sparse checkouts, and caching parsed XML roots to avoid redundant parsing during dependency resolution.

Comment on lines +130 to +132
def get_version(root: ET.Element) -> Optional[str]:
"""The pom's own version (or inherited from <parent><version>)."""
return child_text(root, 'version') or child_text(root, 'parent/version')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In ElementTree, when parsing a namespaced XML document (like a Maven POM), every element in the path must be namespaced. If child_text uses a helper like t(tag) to prepend the namespace, calling child_text(root, 'parent/version') will result in searching for {ns}parent/version (where version is namespace-less), which will fail to match the namespaced <version> element inside <parent>. This means get_version will always return None when a POM inherits its version from its parent.

To fix this, retrieve the parent element first and then get its version child.

Suggested change
def get_version(root: ET.Element) -> Optional[str]:
"""The pom's own version (or inherited from <parent><version>)."""
return child_text(root, 'version') or child_text(root, 'parent/version')
def get_version(root: ET.Element) -> Optional[str]:
"""The pom's own version (or inherited from <parent><version>)."""
v = child_text(root, 'version')
if v:
return v
parent = root.find(t('parent'))
if parent is not None:
return child_text(parent, 'version')
return None

Comment on lines +343 to 346
sub_modules = [m for m in modules if m != '.']
cmd = ['mvn', 'install', '-T', '1C', '-DskipTests', '-P', 'quick-build']
if sub_modules:
for m in sub_modules:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If sub_modules is empty (meaning only the root POM . is being installed), running mvn install without -pl or -N will cause Maven to recursively build all modules defined in the root POM. In a sparse checkout, this will fail because those module directories are not checked out.

To prevent this, you should append the -N (non-recursive) flag when sub_modules is empty so that only the root POM itself is installed.

Comment on lines +214 to +233
def resolve_local(coord: Coord, declared_version: Optional[str]) -> Optional[Path]:
"""Local pom whose version matches the declared version, or None.

- No coord match in the monorepo → None.
- declared_version is None (regular dep inheriting from depMgmt) → follow
the local pom optimistically; we'd need full Maven evaluation to know
the resolved version, and in this monorepo unversioned deps virtually
always inherit from a SNAPSHOT BOM.
- Either version contains a Maven property (${…}) → follow optimistically.
- Otherwise the versions must equal.
"""
local = coord_to_pom.get(coord)
if local is None:
return None
if declared_version is None:
return local
local_version = get_version(ET.fromstring(pom_contents[local])) or ''
if '${' in declared_version or '${' in local_version:
return local
return local if local_version == declared_version else None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Parsing the XML content of the same POM files repeatedly (via ET.fromstring(pom_contents[local])) during dependency resolution can be highly inefficient, especially in a large monorepo with hundreds of modules.

We can introduce a simple cache for the parsed XML roots (pom_roots) to ensure each POM is parsed at most once. This cached helper can also be reused at line 260 (local_root = get_pom_root(local_parent)).

    pom_roots: dict[Path, ET.Element] = {}

    def get_pom_root(path: Path) -> ET.Element:
        if path not in pom_roots:
            pom_roots[path] = ET.fromstring(pom_contents[path])
        return pom_roots[path]

    def resolve_local(coord: Coord, declared_version: Optional[str]) -> Optional[Path]:
        """Local pom whose version matches the declared version, or None.

        - No coord match in the monorepo → None.
        - declared_version is None (regular dep inheriting from depMgmt) → follow
          the local pom optimistically; we'd need full Maven evaluation to know
          the resolved version, and in this monorepo unversioned deps virtually
          always inherit from a SNAPSHOT BOM.
        - Either version contains a Maven property (${…}) → follow optimistically.
        - Otherwise the versions must equal.
        """
        local = coord_to_pom.get(coord)
        if local is None:
            return None
        if declared_version is None:
            return local
        local_version = get_version(get_pom_root(local)) or ''
        if '${' in declared_version or '${' in local_version:
            return local
        return local if local_version == declared_version else None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant