| title | Quickstart |
|---|---|
| description | Install codeanalyzer-python, run it against a Python project, and read your first analysis — as analysis.json or a Neo4j property graph — in three steps. |
import { Steps, Aside, LinkCard, CardGrid, Tabs, TabItem } from "@astrojs/starlight/components";
Point canpy at a Python project and it produces one typed artifact: the project's symbol table, call graph, and framework entrypoints. The three steps below install it, run it against a project, and read the result. After that, the same analysis is emitted into a Neo4j property graph.
- Install the CLI.
pip install codeanalyzer-python
That installs the `canpy` command. Jedi and Tree-sitter ship with the package; CodeQL is downloaded on demand only if you opt in with `--codeql`.
The command was renamed from `codeanalyzer` to `canpy` (matching the `cants` TypeScript sibling). The old `codeanalyzer` command still works as a deprecated alias and prints a notice to stderr.
- Run it against a project.
Point `--input` at any Python project root and `--output` at a directory for the result.
canpy --input ./my-python-project --output ./out
On the first run `canpy` creates a virtual environment under `.codeanalyzer/`, installs the project's dependencies into it, walks every `.py` file, and writes `./out/analysis.json`. This is the default `--emit json` target.
Omit `--output` to stream the JSON to stdout instead, which is handy for piping into `jq`:
canpy --input ./my-python-project | jq '.entrypoints'
- Read the result.
`analysis.json` is a single `PyApplication` object with three top-level keys.
jq 'keys' ./out/analysis.json # [ "call_graph", "entrypoints", "symbol_table" ]
jq '.symbol_table | length' ./out/analysis.json # modules analyzed
jq '.call_graph | length' ./out/analysis.json # call edges
That's it — a directory of source files is now a typed, queryable model of the program.
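The same three checks can be made from Python. This is a minimal sketch that assumes only what the `jq` commands show: the top-level keys are `call_graph`, `entrypoints`, and `symbol_table`, and each has a length. The `summarize` helper and the tiny sample document (including its dict-shaped `symbol_table`) are hypothetical, there only so the snippet runs without a real analysis.

```python
import json


def summarize(app: dict) -> dict:
    """Return the number of items in each top-level section of the analysis."""
    return {key: len(app[key]) for key in sorted(app)}


# Hypothetical stand-in for ./out/analysis.json; real entries carry far more detail.
sample = json.loads(
    '{"symbol_table": {"app.py": {}, "util.py": {}},'
    ' "call_graph": [{"source": "app.main", "target": "util.helper"}],'
    ' "entrypoints": []}'
)

print(summarize(sample))
```

To run it against a real analysis, replace `sample` with the result of `json.load` on `./out/analysis.json`.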
The call graph is a flat list of source -> target edges keyed by callable signature, so it drops straight into networkx:
import json

import networkx as nx

# Load the analysis written by canpy and close the file when done.
with open("./out/analysis.json") as f:
    app = json.load(f)

# One directed edge per call, keyed by callable signature.
g = nx.DiGraph()
for edge in app["call_graph"]:
    g.add_edge(edge["source"], edge["target"])
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
# Is a sink reachable from an entrypoint? A graph query, not a guess.
# print(nx.has_path(g, entry_sig, sink_sig))
This works well for one application held in memory. When you want the analysis to persist, compose across many applications, or be read by other tools without re-running it, emit it into Neo4j instead.
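For the in-memory case, the reachability question in the commented-out `nx.has_path` line can also be answered without networkx. The sketch below is a plain breadth-first search over edges in the `source`/`target` shape described above. The `reachable` helper and the sample signatures are hypothetical. Unlike `nx.has_path`, which raises an error for a node that is not in the graph, this version returns `False` for unknown signatures.

```python
from collections import deque


def reachable(edges, start, goal):
    """Breadth-first search over source -> target edges.

    Returns True if goal can be reached from start by following calls.
    """
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], []).append(edge["target"])

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical edges in the same shape as app["call_graph"].
edges = [
    {"source": "api.handler", "target": "service.process"},
    {"source": "service.process", "target": "db.execute"},
    {"source": "cli.main", "target": "service.report"},
]

print(reachable(edges, "api.handler", "db.execute"))
print(reachable(edges, "cli.main", "db.execute"))
```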
canpy builds one analysis in memory and can project it into a labeled property graph with --emit neo4j. Every node label is Py-prefixed and every relationship type PY_-prefixed (:PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Each application is anchored at its own :PyApplication node, named by --app-name, so a single Neo4j database holds many applications and you query across them with Cypher instead of loading giant JSON blobs.
There are two ways to get the graph into Neo4j, selected solely by whether you pass --neo4j-uri.
Without --neo4j-uri, canpy writes a self-contained graph.cypher to --output (constraints + indexes, a scoped wipe of this app's prior subgraph, then batched MERGEs). It needs no extra dependencies and expresses the full truth of the analysis:
canpy --input ./my-python-project --emit neo4j --app-name my-service --output ./out
Load it into a running Neo4j with cypher-shell:
cypher-shell < ./out/graph.cypher
The snapshot does a scoped DETACH DELETE of the :PyApplication {name: "my-service"} subtree before reloading, so re-running it replaces this application cleanly without touching other applications in the database.
With --neo4j-uri, canpy pushes to a live Neo4j over Bolt incrementally — it diffs each module's content hash against the database and only rewrites modules that changed, and on a full run it prunes modules whose source file vanished. The prune is scoped to the :PyApplication anchor named by --app-name, so writing one application never deletes another's modules from a shared database.
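The diff-and-prune idea can be illustrated in a few lines. This is a sketch of the general technique, not canpy's implementation: the choice of SHA-256, the `plan_sync` helper, and the module names are all assumptions made for illustration.

```python
import hashlib


def content_hash(source: str) -> str:
    """Hash a module's source text (SHA-256 is an assumption, not canpy's documented choice)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_sync(current_sources: dict, stored_hashes: dict):
    """Compare current module sources against hashes already stored in the graph.

    Returns (to_rewrite, to_prune): modules that are new or changed,
    and modules whose source file no longer exists.
    """
    current_hashes = {path: content_hash(src) for path, src in current_sources.items()}
    to_rewrite = sorted(
        path for path, digest in current_hashes.items()
        if stored_hashes.get(path) != digest
    )
    to_prune = sorted(path for path in stored_hashes if path not in current_hashes)
    return to_rewrite, to_prune


# Hypothetical state: what the database recorded on the previous run...
stored = {
    "a.py": content_hash("x = 1\n"),
    "b.py": content_hash("y = 2\n"),
    "old.py": content_hash("z = 3\n"),
}
# ...and what is on disk now: b.py edited, new.py added, old.py deleted.
current = {"a.py": "x = 1\n", "b.py": "y = 20\n", "new.py": "w = 4\n"}

print(plan_sync(current, stored))
```

Unchanged modules (`a.py` here) appear in neither list, which is what keeps a repeat push cheap.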
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=secret
canpy --input ./my-python-project --emit neo4j --app-name my-service
The live push reads NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, and NEO4J_DATABASE from the environment (an explicit flag wins when set). Prefer the env var for the password so it doesn't land in your shell history or the process list.
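The "explicit flag wins" rule is a common precedence pattern. The sketch below shows one way to express it and is not taken from canpy's source. The `resolve_setting` helper is hypothetical, and the environment is passed in as a plain dict so the example does not depend on your real shell.

```python
import os


def resolve_setting(flag_value, env_name, environ=None):
    """Return the explicit flag value if given, else the environment variable, else None."""
    env = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value
    return env.get(env_name)


# Hypothetical environment, supplied explicitly to keep the example deterministic.
env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j"}

print(resolve_setting(None, "NEO4J_URI", env))                          # falls back to the env var
print(resolve_setting("bolt://graph.internal:7687", "NEO4J_URI", env))  # explicit value wins
print(resolve_setting(None, "NEO4J_DATABASE", env))                     # unset in both places
```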
Once the graph is loaded, query it with Cypher — for example, the call edges out of a single application:
MATCH (:PyApplication {name: "my-service"})-[:PY_HAS_MODULE]->(:PyModule)
-[:PY_DECLARES]->(c:PyCallable)-[:PY_CALLS]->(callee)
RETURN c.signature, callee.signature
LIMIT 25;
The graph is populated out of band by canpy; consumers just read it. The CLDK Python SDK has a read-only Neo4j backend. Point it at the Bolt URI with the same application_name you loaded under, and it reconstructs the same typed PyClass / PyCallable objects and the same networkx call graph as the in-process analyzer, with no JDK, no native binary, and no project source on the consumer. It only needs the graph and read-only credentials.
from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="my-service",  # matches canpy --app-name
    ),
)
classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()  # networkx.DiGraph keyed by callable signatures
print(len(classes), "classes,", cg.number_of_edges(), "call edges")
The Neo4j backend in the SDK is the same optional extra: pip install cldk[neo4j]. See the Neo4j property graph guide for the full schema, incremental semantics, and the SDK read API.
The default run uses Jedi for resolution — fast, no external tooling. Add --codeql to resolve the edges lexical analysis misses (dynamic dispatch, RPC, third-party targets). The CodeQL CLI is downloaded into the project cache on first use and reused thereafter. This augmentation applies to both the json and neo4j emit targets — the same enriched call graph is what gets projected into the property graph.
canpy --input ./my-python-project --output ./out --codeql
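One way to see what `--codeql` contributed is to diff the call graphs of two runs. This is a minimal sketch, assuming one run without the flag and one with it were written to separate output directories, and that edges have the `source`/`target` shape used earlier. The `added_edges` helper and the sample signatures are hypothetical.

```python
def edge_set(app: dict) -> set:
    """Collect call edges as (source, target) pairs."""
    return {(edge["source"], edge["target"]) for edge in app["call_graph"]}


def added_edges(baseline: dict, augmented: dict) -> list:
    """Edges present in the augmented analysis but not in the baseline."""
    return sorted(edge_set(augmented) - edge_set(baseline))


# Hypothetical analyses: the second has one extra edge of the kind --codeql might resolve.
jedi_only = {"call_graph": [{"source": "api.handler", "target": "service.process"}]}
with_codeql = {
    "call_graph": [
        {"source": "api.handler", "target": "service.process"},
        {"source": "service.process", "target": "plugins.run"},
    ]
}

print(added_edges(jedi_only, with_codeql))
```

With real data, load each run's `analysis.json` with `json.load` and pass the two dicts in place of the samples.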