feature/neo4j: property-graph output with Py/PY_-namespaced schema; rename CLI to canpy #33
Merged
Port the codeanalyzer-typescript v0.4.0 Neo4j feature to Python under the
neo4j feature branch, with the same CLI arg entrypoints.
- codeanalyzer/neo4j: catalog (schema source of truth), project (pure IR ->
graph rows), cypher snapshot writer, incremental Bolt writer, and the
output-agnostic rows intermediate. Lazy neo4j-driver import keeps it off the
default json path.
- CLI: --emit {json,neo4j,schema}, --app-name, --neo4j-uri/-user/-password/
-database; -i/--input now optional for --emit schema.
- --emit neo4j writes a self-contained graph.cypher, or pushes incrementally to
a live Neo4j over Bolt; --emit schema emits the version-stamped schema.json
contract (checked in as schema.neo4j.json).
- Tests: schema-conformance (always runs, anti-drift guard) + opt-in Neo4j
Testcontainers bolt integration test (RUN_CONTAINER_TESTS=1).
- packaging/install/codeanalyzer-installer.sh: curl|sh installer (uv/pipx/pip).
- release.yml: sync schema.neo4j.json + publish schema.json and installer as
release assets. README/CHANGELOG updates; schema-uml.drawio. Version 0.2.0.
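The always-on schema-conformance guard listed above can be pictured roughly as follows. This is a minimal sketch, not the project's test: `CATALOG`, `NodeRow`, `RelRow`, and `undeclared` are hypothetical names, and the real catalog.py and rows.py layouts may differ.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the catalog; the real catalog.py may be shaped differently.
CATALOG = {
    "labels": {"PySymbol", "PyClass"},
    "rel_types": {"PY_CALLS"},
}


@dataclass
class NodeRow:
    labels: tuple  # every label the emitter put on this node


@dataclass
class RelRow:
    type: str  # relationship type the emitter produced


def undeclared(nodes, rels, catalog):
    """Return every label / rel type that was emitted but is not in the catalog."""
    problems = set()
    for node in nodes:
        for label in node.labels:
            if label not in catalog["labels"]:
                problems.add(f"label:{label}")
    for rel in rels:
        if rel.type not in catalog["rel_types"]:
            problems.add(f"rel:{rel.type}")
    return sorted(problems)
```

A conformance test would then assert that `undeclared(...)` is empty for the rows projected from a sample app, so any drift between emitter and catalog fails the suite.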
The four --neo4j-* connection options now fall back to the standard NEO4J_URI / NEO4J_USERNAME / NEO4J_PASSWORD / NEO4J_DATABASE environment variables when the flag is omitted (explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the process list.
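The flag-then-environment precedence described above could look something like this. It is a sketch under assumptions: `resolve_neo4j_options` and the `flags` dict are hypothetical, and the actual CLI may instead rely on its option parser's built-in environment-variable support.

```python
import os

# Environment variable consulted for each connection option when its flag is omitted.
_ENV_FALLBACKS = {
    "uri": "NEO4J_URI",
    "user": "NEO4J_USERNAME",
    "password": "NEO4J_PASSWORD",
    "database": "NEO4J_DATABASE",
}


def resolve_neo4j_options(flags, env=None):
    """Merge CLI flags with environment fallbacks; an explicit flag always wins."""
    env = os.environ if env is None else env
    resolved = {}
    for name, var in _ENV_FALLBACKS.items():
        flag_value = flags.get(name)  # None means the flag was not passed
        resolved[name] = flag_value if flag_value is not None else env.get(var)
    return resolved
```

Passing `env` explicitly keeps the helper testable without mutating the real process environment.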
analysis.json into a Neo4j property graph
…PY_*)
In a shared Neo4j instance, unprefixed labels and relationship types from
different language analyzers collide: `MERGE (:Application {name})` and
`:Symbol`/`HAS_MODULE` from a future Java/JS backend would fuse with Python's.
Labels and relationship types are separate Neo4j namespaces, so both are
prefixed — every node label gets `Py` (e.g. `:PyClass`, shared MERGE label
`:PySymbol`) and every relationship type gets `PY_` (e.g. `PY_CALLS`).
Constraint/index names are also globally unique per-DB, so they get a `py_`
prefix too.
- catalog.py: the source-of-truth labels, merge labels, and rel types
- schema.py: DDL label refs + constraint/index names
- project.py, cypher.py, bolt.py, rows.py: emitter + both writers
- tests, sample app, README, CHANGELOG, --app-name help, schema.neo4j.json
- neo4j-schema.drawio: new property-graph diagram; schema-uml.drawio: relayout
SCHEMA_VERSION stays 1.0.0 (the schema is new on this branch — no released
consumer has seen the unprefixed 1.0.0).
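As an illustration of the prefixing rule in this commit, the three namespaces can be handled by three small helpers. The helper names, the `symbol_id` constraint name, and the exact MERGE text below are assumptions for the sketch, not taken from catalog.py or schema.py.

```python
LABEL_PREFIX = "Py"   # node labels, e.g. :PyClass
REL_PREFIX = "PY_"    # relationship types, e.g. PY_CALLS
DDL_PREFIX = "py_"    # constraint / index names (unique per database)


def node_label(base):
    """Namespace a node label."""
    return f"{LABEL_PREFIX}{base}"


def rel_type(base):
    """Namespace a relationship type."""
    return f"{REL_PREFIX}{base}"


def ddl_name(base):
    """Namespace a constraint or index name."""
    return f"{DDL_PREFIX}{base}"


def merge_application_cypher():
    """Illustrative MERGE: the prefixed label keeps it from fusing with another analyzer's nodes."""
    return f"MERGE (a:{node_label('Application')} {{name: $name}})"
```

With these, `node_label("Class")` yields `PyClass` and `rel_type("CALLS")` yields `PY_CALLS`, matching the examples in the commit message.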
Rename the CLI command from `codeanalyzer` to `canpy`, paralleling the TypeScript sibling's `cants`. The PyPI package (`codeanalyzer-python`) and the importable `codeanalyzer` module are unchanged; only the console-script entry point, the Typer app name, and user-facing docs/installer change.
- pyproject.toml: console-script entry point `canpy`
- __main__.py: Typer app name `canpy`
- packaging/install: rename installer to canpy-installer.sh; CANPY_* env vars
- scripts/update_readme.py: render `canpy --help` into the README between <!-- BEGIN/END canpy-help --> markers (mirrors codeanalyzer-typescript's scripts/update-readme.ts), so the documented options can't drift from the CLI
- release.yml: sync the README --help block alongside schema.neo4j.json before publishing, and stage canpy-installer.sh as a release asset
- README/CHANGELOG: command + installer references; regenerated --help block (also surfaces previously-undocumented --ray/--skip-tests/--file-name)
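The marker-based README sync might work along these lines. This is a sketch only: the exact marker spelling (`<!-- BEGIN canpy-help -->` / `<!-- END canpy-help -->`), the fenced layout of the rendered block, and the `render_help_block` name are assumptions, not read from scripts/update_readme.py.

```python
import re

BEGIN = "<!-- BEGIN canpy-help -->"
END = "<!-- END canpy-help -->"
FENCE = "`" * 3  # built at runtime so this example does not nest code fences


def render_help_block(readme, help_text):
    """Replace whatever sits between the markers with the current --help output."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("README is missing the canpy-help markers")
    block = f"{BEGIN}\n{FENCE}\n{help_text.rstrip()}\n{FENCE}\n{END}"
    # A callable replacement avoids backslash interpretation in the help text.
    return pattern.sub(lambda _match: block, readme, count=1)
```

Because the replacement is a pure function of the help text, running it twice gives the same README, which is what lets a release job sync the block unconditionally.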
Re-add the `codeanalyzer` console script for backwards compatibility. It points to a thin shim that prints a one-line deprecation notice to stderr (so piped stdout, e.g. `--emit schema`, stays clean) and then runs the CLI unchanged. To be removed in a future release.
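A shim of this kind can be very small. The sketch below assumes a hypothetical `deprecated_main` wrapper that receives the real CLI entry point as `run_cli`; the notice wording and the way the project wires the console script are not taken from the source.

```python
import sys


def deprecated_main(run_cli, stderr=None):
    """Warn on stderr, then hand off to the real CLI without touching stdout."""
    stderr = sys.stderr if stderr is None else stderr
    # stderr only, so piped stdout (e.g. schema JSON) is not polluted by the notice.
    print("`codeanalyzer` is deprecated; use `canpy` instead.", file=stderr)
    return run_cli()
```

Keeping the notice off stdout is what lets something like `codeanalyzer --emit schema > schema.json` keep producing a clean file during the deprecation window.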
Summary
Adds Neo4j property-graph output to the Python analyzer (mirroring codeanalyzer-typescript), namespaces the whole graph schema so multiple language backends can share one database, and renames the CLI command to `canpy`. One in-memory analysis (`PyApplication`) can now be emitted three ways via `--emit`: the canonical `analysis.json` (default), a Neo4j property graph, or the version-stamped schema contract.
Neo4j property graph
New `codeanalyzer/neo4j/` package (mirrors `src/build/neo4j/`):
- `catalog.py`: declarative schema catalog (single source of truth), `SCHEMA_VERSION = 1.0.0`
- `project.py`: pure projection `PyApplication` → graph rows
- `cypher.py`: self-contained `graph.cypher` snapshot writer
- `bolt.py`: incremental live-push writer (lazy `neo4j` import; content-hash module diffing, vanished-decl cleanup, full-run orphan prune)
- `rows.py` / `schema.py` / `emit.py`
Emit shapes
- `--emit neo4j -o ./out` → `./out/graph.cypher` (no driver needed)
- `--emit neo4j --neo4j-uri bolt://…` → incremental Bolt push (needs the `[neo4j]` extra)
- `--emit schema` → `schema.json` (no project required; checked in as `schema.neo4j.json`)
Schema namespacing (multi-language coexistence)
Every node label is `Py`-prefixed and every relationship type is `PY_`-prefixed (e.g. `:PyClass`, shared MERGE label `:PySymbol`, `PY_CALLS`); constraint/index names get a `py_` prefix. Labels and relationship types are separate Neo4j namespaces, so both are prefixed; this lets a future Java/JS analyzer share one database without `MERGE` / rel-type collisions. `SCHEMA_VERSION` stays `1.0.0` (the schema is new on this branch; no released consumer has seen the unprefixed form).
CLI rename → `canpy`
- `canpy` (parallels the TS sibling's `cants`). The PyPI package (`codeanalyzer-python`) and the importable `codeanalyzer` module are unchanged.
- `codeanalyzer` is retained as a deprecated alias that prints a notice to stderr (so piped stdout, e.g. `--emit schema`, stays clean) and then runs unchanged; to be removed in a future release.
- `--emit {json,neo4j,schema}`, `--app-name`, `--neo4j-uri/-user/-password/-database`; `-i/--input` is optional for `--emit schema`; `-f/--format msgpack` retained on the json path.
Docs auto-generation
- `scripts/update_readme.py` renders `canpy --help` into the README between `<!-- BEGIN/END canpy-help -->` markers (the Python twin of TS `scripts/update-readme.ts`), so documented options can't drift from the CLI. Regenerating surfaced previously-undocumented `--ray` / `--skip-tests` / `--file-name`.
- `release.yml` now syncs both the README `--help` block and `schema.neo4j.json` from source before publishing, and stages `canpy-installer.sh` + `schema.json` as Release assets.
Tests & verification
- `test/test_neo4j_schema.py`: always-on anti-drift conformance guard (the emitter can never produce a label/rel/property the catalog doesn't declare; checked-in `schema.neo4j.json` stays current).
- `test/test_neo4j_bolt.py`: opt-in (`RUN_CONTAINER_TESTS=1`) Neo4j Testcontainers integration test covering full push, idempotent re-push, and vanished-module prune.
- `analysis.json`: node counts match exactly (2397 nodes), relationship counts match after `MERGE` de-dup, and every label is `Py`-prefixed / every rel type `PY_`-prefixed.
Packaging
- `packaging/install/canpy-installer.sh`: `curl | sh` installer (uv / pipx / pip; `CANPY_*` env overrides).
- `CHANGELOG` 0.2.0; version bump `0.1.15 → 0.2.0`.
- `neo4j-schema.drawio` (the property-graph schema) and `schema-uml.drawio` (the `analysis.json` containment tree, relaid out).
Tracked issues
This pull request implements and closes the following:
Closes #34 - Neo4j property-graph projection and graph.cypher snapshot writer
Closes #35 - Incremental Neo4j Bolt writer (live push)
Closes #36 - Declarative schema catalog, `--emit schema` contract, and conformance test
Closes #37 - Read Neo4j connection options from environment variables
Closes #38 - Namespace graph schema with Py and PY_ prefixes for multi-language coexistence
Closes #39 - Rename CLI command to `canpy`
Closes #40 - Retain `codeanalyzer` as a deprecated alias for `canpy`
Closes #41 - Auto-generate the README --help block from the CLI
Closes #42 - Shell installer (canpy-installer.sh, curl | sh)
Closes #43 - Neo4j Testcontainers integration test