feature/neo4j: property-graph output with Py/PY_-namespaced schema; rename CLI to canpy #33

Merged
rahlk merged 5 commits into main from feature/neo4j on Jun 20, 2026
rahlk (Contributor) commented Jun 20, 2026

Summary

Adds Neo4j property-graph output to the Python analyzer (mirroring codeanalyzer-typescript), namespaces the whole graph schema so multiple language backends can share one database, and renames the CLI command to canpy. One in-memory analysis (PyApplication) can now be emitted three ways via --emit: the canonical analysis.json (default), a Neo4j property graph, or the version-stamped schema contract.

Scope grew after the original description — see Schema namespacing, CLI rename, and Docs auto-generation below.

Neo4j property graph

New codeanalyzer/neo4j/ package (mirrors src/build/neo4j/):

  • catalog.py — declarative schema catalog (single source of truth), SCHEMA_VERSION = 1.0.0
  • project.py — pure projection PyApplication → graph rows
  • cypher.py — self-contained graph.cypher snapshot writer
  • bolt.py — incremental live-push writer (lazy neo4j import; content-hash module diffing, vanished-decl cleanup, full-run orphan prune)
  • rows.py / schema.py / emit.py
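
The "content-hash module diffing" and "vanished-decl cleanup" mentioned for bolt.py can be illustrated with a minimal sketch. This is not the code in bolt.py: the function names, the use of SHA-256, and the idea of a stored path-to-hash map from the previous push are all assumptions made for illustration.

```python
import hashlib


def content_hash(source: str) -> str:
    """Return a stable SHA-256 hex digest of a module's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def diff_modules(previous: dict, current_sources: dict):
    """Classify modules by comparing stored hashes with fresh ones.

    previous:        module path -> hash recorded by the last push (hypothetical store)
    current_sources: module path -> source text seen in this run
    Returns (changed, unchanged, vanished) as sorted lists of module paths.
    """
    current = {path: content_hash(src) for path, src in current_sources.items()}
    # New modules count as changed because they have no stored hash.
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    unchanged = sorted(p for p, h in current.items() if previous.get(p) == h)
    # Modules present last time but absent now are candidates for cleanup.
    vanished = sorted(p for p in previous if p not in current)
    return changed, unchanged, vanished
```

Under these assumptions, only the `changed` modules would need to be re-pushed, and the `vanished` ones would have their nodes removed.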

Emit shapes

  • --emit neo4j -o ./out → ./out/graph.cypher (no driver needed)
  • --emit neo4j --neo4j-uri bolt://… → incremental Bolt push (needs the [neo4j] extra)
  • --emit schema → schema.json (no project required; checked in as schema.neo4j.json)

Schema namespacing (multi-language coexistence)

Every node label is Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, shared MERGE label :PySymbol, PY_CALLS); constraint/index names get a py_ prefix. Labels and relationship types are separate Neo4j namespaces, so both are prefixed — this lets a future Java/JS analyzer share one database without MERGE / rel-type collisions. SCHEMA_VERSION stays 1.0.0 (the schema is new on this branch; no released consumer has seen the unprefixed form).
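
A rough sketch of how the three prefixes could be applied when building Cypher follows. The prefixes themselves come from the description above; the helper names and the `signature` property key are hypothetical and are not taken from catalog.py.

```python
LABEL_PREFIX = "Py"    # node labels, e.g. Class -> PyClass
REL_PREFIX = "PY_"     # relationship types, e.g. CALLS -> PY_CALLS
NAME_PREFIX = "py_"    # constraint / index names


def label(base: str) -> str:
    """Namespace a node label."""
    return f"{LABEL_PREFIX}{base}"


def rel_type(base: str) -> str:
    """Namespace a relationship type."""
    return f"{REL_PREFIX}{base}"


def constraint_name(base: str) -> str:
    """Namespace a constraint or index name."""
    return f"{NAME_PREFIX}{base}"


def merge_call_edge(caller: str, callee: str):
    """Build a parameterized MERGE for one call edge (illustrative only).

    The `signature` property key is an assumption, not the real schema.
    """
    query = (
        f"MERGE (a:{label('Symbol')} {{signature: $caller}}) "
        f"MERGE (b:{label('Symbol')} {{signature: $callee}}) "
        f"MERGE (a)-[:{rel_type('CALLS')}]->(b)"
    )
    return query, {"caller": caller, "callee": callee}
```

Because both the label and the relationship type carry a language prefix, a MERGE issued by another backend against its own prefixed label would not match these nodes.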

CLI rename → canpy

  • The console command is now canpy (parallels the TS sibling's cants). The PyPI package (codeanalyzer-python) and the importable codeanalyzer module are unchanged.
  • codeanalyzer is retained as a deprecated alias that prints a notice to stderr (so piped stdout — e.g. --emit schema — stays clean) and then runs unchanged; to be removed in a future release.
  • CLI args are unchanged: --emit {json,neo4j,schema}, --app-name, --neo4j-uri/-user/-password/-database; -i/--input is optional for --emit schema; -f/--format msgpack retained on the json path.
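
The deprecated-alias behaviour (notice on stderr, stdout untouched, then run unchanged) can be sketched as a thin wrapper. The wrapper name, the notice wording, and the `real_main(argv)` signature are assumptions; the actual shim may be wired differently.

```python
import sys
from typing import Callable, Optional, Sequence

# Hypothetical wording; the real notice text is not shown in this PR.
DEPRECATION_NOTICE = "warning: `codeanalyzer` is deprecated; use `canpy` instead."


def make_deprecated_entry(real_main: Callable[[Optional[Sequence[str]]], int]):
    """Wrap the real CLI entry point with a stderr-only deprecation notice."""

    def deprecated_main(argv: Optional[Sequence[str]] = None) -> int:
        # Write to stderr so piped stdout (e.g. JSON from --emit schema) stays clean.
        print(DEPRECATION_NOTICE, file=sys.stderr)
        # Delegate unchanged: same arguments, same return code.
        return real_main(argv)

    return deprecated_main
```

Keeping the notice off stdout matters for callers that pipe the output into a file or another tool.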

Docs auto-generation

  • scripts/update_readme.py renders canpy --help into the README between <!-- BEGIN/END canpy-help --> markers (the Python twin of TS scripts/update-readme.ts), so documented options can't drift from the CLI. Regenerating surfaced previously-undocumented --ray / --skip-tests / --file-name.
  • release.yml now syncs both the README --help block and schema.neo4j.json from source before publishing, and stages canpy-installer.sh + schema.json as Release assets.
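
The marker-based README sync can be sketched as a single regex substitution. This is not the code in scripts/update_readme.py: the exact marker spelling below is inferred from the "BEGIN/END canpy-help" shorthand above, and the function name is hypothetical.

```python
import re

# Assumed exact marker text, inferred from "<!-- BEGIN/END canpy-help -->".
BEGIN = "<!-- BEGIN canpy-help -->"
END = "<!-- END canpy-help -->"


def replace_help_block(readme: str, help_text: str) -> str:
    """Replace everything between the markers with fresh --help output."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("help markers not found in README")
    block = f"{BEGIN}\n{help_text.rstrip()}\n{END}"
    # A function replacement avoids treating backslashes in help_text as escapes.
    return pattern.sub(lambda _match: block, readme, count=1)
```

Running the function twice with the same help text leaves the README unchanged, which is what makes it safe to run in a release workflow.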

Tests & verification

  • test/test_neo4j_schema.py — always-on anti-drift conformance guard (the emitter can never produce a label/rel/property the catalog doesn't declare; checked-in schema.neo4j.json stays current).
  • test/test_neo4j_bolt.py — opt-in (RUN_CONTAINER_TESTS=1) Neo4j Testcontainers integration test: full push, idempotent re-push, vanished-module prune.
  • Verified against a live Neo4j 5 (Testcontainers via podman): pushed the analysis of this repo and cross-checked the graph against analysis.json — node counts match exactly (2397 nodes), relationship counts match after MERGE de-dup, and every label is Py-prefixed / every rel type PY_-prefixed.
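
The anti-drift guard boils down to a subset check: nothing emitted may be missing from the catalog. A minimal sketch of that check follows, assuming a simplified catalog shape (name mapped to its allowed property keys) and rows as `(name, props)` pairs. Both shapes are hypothetical and much simpler than the real catalog.py and rows.py.

```python
def find_violations(catalog: dict, nodes: list, rels: list) -> list:
    """Return a message for every emitted label, rel type, or property
    that the catalog does not declare.

    catalog: {"labels": {label: allowed_props}, "rel_types": {type: allowed_props}}
    nodes / rels: lists of (name, props_dict) pairs produced by the emitter
    """
    violations = []
    for kind, table, rows in (
        ("label", catalog["labels"], nodes),
        ("rel type", catalog["rel_types"], rels),
    ):
        for name, props in rows:
            declared = table.get(name)
            if declared is None:
                violations.append(f"undeclared {kind}: {name}")
                continue
            # Any property key outside the declared set counts as drift.
            for key in sorted(set(props) - set(declared)):
                violations.append(f"undeclared property: {name}.{key}")
    return violations
```

A test built on this idea would run the emitter on a sample app and assert that the returned list is empty.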

Packaging

  • packaging/install/canpy-installer.sh: curl | sh installer (uv / pipx / pip; CANPY_* env overrides).
  • README consolidated with Neo4j docs; CHANGELOG 0.2.0; version bump 0.1.15 → 0.2.0.
  • Diagrams: neo4j-schema.drawio (the property-graph schema) and schema-uml.drawio (the analysis.json containment tree, relaid out).

Tracked issues

This pull request implements and closes the following:

Closes #34 - Neo4j property-graph projection and graph.cypher snapshot writer
Closes #35 - Incremental Neo4j Bolt writer (live push)
Closes #36 - Declarative schema catalog, --emit schema contract, and conformance test
Closes #37 - Read Neo4j connection options from environment variables
Closes #38 - Namespace graph schema with Py and PY_ prefixes for multi-language coexistence
Closes #39 - Rename CLI command to canpy
Closes #40 - Retain codeanalyzer as a deprecated alias for canpy
Closes #41 - Auto-generate the README --help block from the CLI
Closes #42 - Shell installer (canpy-installer.sh, curl | sh)
Closes #43 - Neo4j Testcontainers integration test

Port the codeanalyzer-typescript v0.4.0 Neo4j feature to Python under the
neo4j feature branch, with the same CLI arg entrypoints.

- codeanalyzer/neo4j: catalog (schema source of truth), project (pure IR ->
  graph rows), cypher snapshot writer, incremental Bolt writer, and the
  output-agnostic rows intermediate. Lazy neo4j-driver import keeps it off the
  default json path.
- CLI: --emit {json,neo4j,schema}, --app-name, --neo4j-uri/-user/-password/
  -database; -i/--input now optional for --emit schema.
- --emit neo4j writes a self-contained graph.cypher, or pushes incrementally to
  a live Neo4j over Bolt; --emit schema emits the version-stamped schema.json
  contract (checked in as schema.neo4j.json).
- Tests: schema-conformance (always runs, anti-drift guard) + opt-in Neo4j
  Testcontainers bolt integration test (RUN_CONTAINER_TESTS=1).
- packaging/install/codeanalyzer-installer.sh: curl|sh installer (uv/pipx/pip).
- release.yml: sync schema.neo4j.json + publish schema.json and installer as
  release assets. README/CHANGELOG updates; schema-uml.drawio. Version 0.2.0.
The four --neo4j-* connection options now fall back to the standard
NEO4J_URI / NEO4J_USERNAME / NEO4J_PASSWORD / NEO4J_DATABASE environment
variables when the flag is omitted (explicit flag wins). Prefer the env var
for the password so it doesn't land in shell history or the process list.
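
The flag-then-environment fallback described above can be sketched as follows. The environment variable names are the ones listed in the commit message; the option keys and the `resolve_option` helper are hypothetical and may not match how the CLI actually wires this up.

```python
import os
from typing import Mapping, Optional

# Option key (hypothetical) -> environment variable named in the commit message.
ENV_VARS = {
    "uri": "NEO4J_URI",
    "user": "NEO4J_USERNAME",
    "password": "NEO4J_PASSWORD",
    "database": "NEO4J_DATABASE",
}


def resolve_option(
    name: str,
    flag_value: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit flag if given, else the environment variable, else None."""
    env = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value  # explicit flag wins
    return env.get(ENV_VARS[name])
```

Passing `environ` explicitly is only a convenience for testing; in normal use the helper reads the process environment.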
rahlk changed the title from "feat(neo4j): project analysis.json into a Neo4j property graph" to "feature/neo4j: project analysis.json into a Neo4j property graph" Jun 20, 2026
rahlk changed the title from "feature/neo4j: project analysis.json into a Neo4j property graph" to "feature/neo4j: project analysis.json into a Neo4j property graph" Jun 20, 2026
rahlk added 3 commits June 20, 2026 14:05
…PY_*)

In a shared Neo4j instance, unprefixed labels and relationship types from
different language analyzers collide: `MERGE (:Application {name})` and
`:Symbol`/`HAS_MODULE` from a future Java/JS backend would fuse with Python's.
Labels and relationship types are separate Neo4j namespaces, so both are
prefixed — every node label gets `Py` (e.g. `:PyClass`, shared MERGE label
`:PySymbol`) and every relationship type gets `PY_` (e.g. `PY_CALLS`).
Constraint/index names are also globally unique per-DB, so they get a `py_`
prefix too.

- catalog.py: the source-of-truth labels, merge labels, and rel types
- schema.py: DDL label refs + constraint/index names
- project.py, cypher.py, bolt.py, rows.py: emitter + both writers
- tests, sample app, README, CHANGELOG, --app-name help, schema.neo4j.json
- neo4j-schema.drawio: new property-graph diagram; schema-uml.drawio: relayout

SCHEMA_VERSION stays 1.0.0 (the schema is new on this branch — no released
consumer has seen the unprefixed 1.0.0).
Rename the CLI command from `codeanalyzer` to `canpy`, paralleling the
TypeScript sibling's `cants`. The PyPI package (`codeanalyzer-python`) and the
importable `codeanalyzer` module are unchanged — only the console-script entry
point, the Typer app name, and user-facing docs/installer change.

- pyproject.toml: console-script entry point `canpy`
- __main__.py: Typer app name `canpy`
- packaging/install: rename installer to canpy-installer.sh; CANPY_* env vars
- scripts/update_readme.py: render `canpy --help` into the README between
  <!-- BEGIN/END canpy-help --> markers (mirrors codeanalyzer-typescript's
  scripts/update-readme.ts), so the documented options can't drift from the CLI
- release.yml: sync the README --help block alongside schema.neo4j.json before
  publishing, and stage canpy-installer.sh as a release asset
- README/CHANGELOG: command + installer references; regenerated --help block
  (also surfaces previously-undocumented --ray/--skip-tests/--file-name)
Re-add the `codeanalyzer` console script for backwards compatibility. It points
to a thin shim that prints a one-line deprecation notice to stderr (so piped
stdout, e.g. from `--emit schema`, stays clean) and then runs the CLI unchanged.
To be removed in a future release.
rahlk changed the title from "feature/neo4j: project analysis.json into a Neo4j property graph" to "feat(neo4j): property-graph output with Py/PY_-namespaced schema; rename CLI to canpy" Jun 20, 2026
rahlk changed the title from "feat(neo4j): property-graph output with Py/PY_-namespaced schema; rename CLI to canpy" to "feature/neo4j: property-graph output with Py/PY_-namespaced schema; rename CLI to canpy" Jun 20, 2026
rahlk added the enhancement label Jun 20, 2026
rahlk self-assigned this Jun 20, 2026
rahlk added the documentation label Jun 20, 2026
rahlk merged commit 6498575 into main Jun 20, 2026
rahlk deleted the feature/neo4j branch June 20, 2026 18:38