Python SDK (CLDK)
codeanalyzer-java is the JVM analysis engine behind the Java support in CodeLLM-DevKit (CLDK). The SDK doesn’t re-implement Java analysis: it gets the analysis from this engine and wraps it in a typed facade so Python callers never touch the backend directly.
There are two ways it gets that analysis, and you pick between them by the type of object you pass to backend=:
- CodeAnalyzerConfig (the default): the SDK runs codeanalyzer on your project, parses the resulting analysis.json, and builds the model in process. This is the classic flow: source in, models out.
- Neo4jConnectionConfig: the SDK becomes a read-only Cypher client. It reads a Neo4j property graph that some other job already populated with codeanalyzer --emit neo4j, and reconstructs the same typed models from the graph. No JDK, no analyzer binary, no project source on the consumer.
Both produce an identical JavaAnalysis facade. The second is the one that scales across a portfolio — more on that below.
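As a rough illustration of selecting a backend by the type of the config object, here is a minimal sketch. The two dataclasses are local stand-ins, not the SDK’s own classes, and select_backend is a hypothetical helper; only the backend names are taken from the flow described on this page.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CodeAnalyzerConfig:
    """Stand-in for the default config; this field is hypothetical."""
    analysis_backend_path: Optional[str] = None


@dataclass
class Neo4jConnectionConfig:
    """Stand-in for the Neo4j config, reduced to a single field."""
    uri: str


BackendConfig = Union[CodeAnalyzerConfig, Neo4jConnectionConfig, None]


def select_backend(backend: BackendConfig = None) -> str:
    """Return the name of the backend that would serve this config."""
    if backend is None or isinstance(backend, CodeAnalyzerConfig):
        return "JCodeanalyzer"        # runs the analyzer in process
    if isinstance(backend, Neo4jConnectionConfig):
        return "JavaAnalysisBackend"  # read-only Cypher client
    raise TypeError(f"unsupported backend config: {type(backend).__name__}")


print(select_backend())
print(select_backend(Neo4jConnectionConfig(uri="bolt://localhost:7687")))
```

Dispatching on the config type keeps the call site identical for both flows; only the object passed to backend= changes.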
The flow
```mermaid
flowchart LR
    A["CLDK.java(...)"] --> SEL{"backend type?"}
    SEL -->|"CodeAnalyzerConfig<br/>(default)"| B["JCodeanalyzer<br/>backend"]
    B --> C["codeanalyzer<br/>-i project -a level"]
    C --> D["analysis.json (IR)"]
    SEL -->|"Neo4jConnectionConfig"| N["JavaAnalysisBackend<br/>(read-only Cypher)"]
    N --> G["Neo4j property graph<br/>(:JApplication {name})"]
    C -. "out of band:<br/>codeanalyzer --emit neo4j" .-> EMIT{"Bolt URI set?"}
    EMIT -->|"yes (fat jar)"| BOLT["live Bolt push<br/>(incremental, content_hash diff)"]
    EMIT -->|"no"| SNAP["graph.cypher snapshot<br/>(scoped wipe + UNWIND MERGE)"]
    BOLT --> G
    SNAP -.->|"cypher-shell < graph.cypher"| G
    D --> E["typed models<br/>JApplication / JType / JCallable"]
    G --> E
    E --> F["JavaAnalysis facade"]
```
The left half is the in-process backend; the right half is the Neo4j backend. The dotted edges show how the graph gets there in the first place: an analyzer run with --emit neo4j projects the very same IR (the thing that would otherwise become analysis.json) into the graph, either as a re-runnable graph.cypher snapshot or as a live incremental Bolt push. Whichever path filled the graph, the SDK reads it back into the same Pydantic models.
Default backend: running the analyzer
This is the in-process flow. If you don’t pass backend=, CLDK shells out to codeanalyzer, parses analysis.json, and builds the model.
- Binary discovery: if you don’t point the SDK at a specific build, it locates a bundled codeanalyzer distribution from its package resources.
- Invocation: it runs the analyzer over your project at the requested level (codeanalyzer -i <project> -a <level> -o <tmpdir>) and reads back the emitted analysis.json.
- Parsing: the JSON is deserialized into Pydantic models (JApplication for the whole document, JType, JCallable, and the rest) that mirror the output schema.
- Facade: the models are wrapped in JavaAnalysis, which exposes query methods like get_classes(), get_methods_in_class(), get_call_graph(), and get_callers().
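The parsing and facade steps above can be sketched with toy stand-ins. The JSON shape below is a heavily simplified, hypothetical one, and the dataclasses only borrow the names JType, JApplication, and JavaAnalysis; the real models are Pydantic classes with many more fields.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, heavily simplified stand-in for an analysis.json document.
RAW = json.loads("""
{
  "symbol_table": {
    "src/Greeter.java": {
      "type_declarations": {
        "com.example.Greeter": {
          "callable_declarations": {"greet(String)": {}, "main(String[])": {}}
        }
      }
    }
  }
}
""")


@dataclass
class JType:
    callables: List[str] = field(default_factory=list)


@dataclass
class JApplication:
    types: Dict[str, JType] = field(default_factory=dict)

    @classmethod
    def from_json(cls, doc: dict) -> "JApplication":
        """Flatten compilation units into a map of type name -> JType."""
        app = cls()
        for unit in doc["symbol_table"].values():
            for fqn, decl in unit["type_declarations"].items():
                app.types[fqn] = JType(sorted(decl["callable_declarations"]))
        return app


class JavaAnalysis:
    """Facade: callers query the model and never see the raw JSON."""

    def __init__(self, application: JApplication) -> None:
        self._app = application

    def get_classes(self) -> Dict[str, JType]:
        return dict(self._app.types)

    def get_methods_in_class(self, fqn: str) -> List[str]:
        return self._app.types[fqn].callables


analysis = JavaAnalysis(JApplication.from_json(RAW))
print(list(analysis.get_classes()))
print(analysis.get_methods_in_class("com.example.Greeter"))
```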
```python
from cldk import CLDK
from cldk.analysis import AnalysisLevel

# No backend= -> the default in-process JCodeanalyzer backend
analysis = CLDK.java(
    project_path="commons-cli",
    analysis_level=AnalysisLevel.call_graph,  # -> runs with -a 2
)

print(len(analysis.get_classes()), "classes")
print(analysis.get_call_graph())  # -> networkx.DiGraph
```

The analysis_level maps directly onto the analyzer’s -a flag: AnalysisLevel.symbol_table → -a 1, AnalysisLevel.call_graph → -a 2.
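To make the level-to-flag mapping concrete, the sketch below assembles the argument list in the shape shown on this page. The AnalysisLevel enum here is a local stand-in with illustrative values, and build_command is a hypothetical helper; how the SDK actually launches the analyzer process is not shown here and may differ.

```python
from enum import Enum
from typing import List


class AnalysisLevel(str, Enum):
    """Local stand-in; member names follow this page, values are illustrative."""
    symbol_table = "symbol table"
    call_graph = "call graph"


# Mapping stated above: symbol_table -> -a 1, call_graph -> -a 2.
LEVEL_FLAG = {AnalysisLevel.symbol_table: 1, AnalysisLevel.call_graph: 2}


def build_command(project: str, level: AnalysisLevel, out_dir: str) -> List[str]:
    """Assemble argv in the shape: codeanalyzer -i <project> -a <level> -o <tmpdir>."""
    return ["codeanalyzer", "-i", project, "-a", str(LEVEL_FLAG[level]), "-o", out_dir]


print(build_command("commons-cli", AnalysisLevel.call_graph, "/tmp/out"))
```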
Neo4j backend: reading from the graph
The default backend re-analyzes the project on every run, in process. That’s fine for one project on a developer’s laptop. It does not compose across a portfolio: every analysis.json is a standalone document that has to be loaded whole into memory, and forty services means forty JSON blobs and forty re-runs.
The Neo4j backend inverts this. Analysis is produced once, centrally — a CI or Kubernetes job runs codeanalyzer --emit neo4j and pushes an app-scoped subgraph into a shared Neo4j database (see the Neo4j output guide). Every consumer — agents, dashboards, and this SDK — is then a lightweight read-only client that just queries the graph. No analysis happens on the read side at all.
Pass a Neo4jConnectionConfig to backend= and the facade swaps onto the read-only JavaAnalysisBackend:
```python
from cldk import CLDK
from cldk.analysis import AnalysisLevel
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.java(
    analysis_level=AnalysisLevel.call_graph,
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="daytrader8",
    ),
)

symbol_table = analysis.get_symbol_table()  # Dict[str, JCompilationUnit]
cg = analysis.get_call_graph()  # networkx.DiGraph
klass = analysis.get_class("com.example.MyService")  # JType
methods = analysis.get_methods_in_class("com.example.MyService")
```

The driver is an optional dependency; install it with the extra:

```shell
pip install "cldk[neo4j]"   # or: pip install neo4j
```

If the neo4j driver isn’t installed, constructing the backend raises CodeanalyzerExecutionException with that install hint.
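The optional-dependency behavior described above is commonly implemented as a guarded import. The sketch below shows that general pattern, not the SDK’s actual code: the exception class is a local stand-in that reuses the name from this page, and require_module is a hypothetical helper.

```python
import importlib
from types import ModuleType


class CodeanalyzerExecutionException(Exception):
    """Local stand-in for the SDK exception of the same name."""


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise CodeanalyzerExecutionException(
            f"The '{name}' package is required for this backend. "
            f"Install it with: {install_hint}"
        ) from exc


# 'json' is always available; the second module name is deliberately bogus.
print(require_module("json", "n/a").__name__)
try:
    require_module("not_a_real_driver_pkg", 'pip install "cldk[neo4j]"')
except CodeanalyzerExecutionException as err:
    print(err)
```

Deferring the import until the backend is constructed means users of the default backend never need the driver installed.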
Connection config
Neo4jConnectionConfig is a thin wrapper over the official neo4j Python driver. The driver is created with GraphDatabase.driver(uri, auth=(username, password)) and every query runs in session(database=database).
| Field | Default | Notes |
|---|---|---|
| uri | (required) | Bolt URI of the Neo4j server, e.g. bolt://localhost:7687. |
| username | "neo4j" | Read-only credentials are sufficient; the SDK never writes. |
| password | "neo4j" | Read-only credentials are sufficient. |
| database | None | Database name; None uses the server’s default database. |
| application_name | None | The :JApplication anchor to scope every query to. |
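As a sketch of how the fields in the table feed the two driver calls named above, the dataclass below mirrors the table’s names and defaults. ConnectionSettings and its two methods are hypothetical and exist only for illustration; the sketch builds the argument dictionaries without importing the neo4j driver.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical mirror of the table above; not the SDK's own class."""
    uri: str
    username: str = "neo4j"
    password: str = "neo4j"
    database: Optional[str] = None
    application_name: Optional[str] = None

    def driver_args(self) -> Dict[str, Any]:
        # Shape of GraphDatabase.driver(uri, auth=(username, password))
        return {"uri": self.uri, "auth": (self.username, self.password)}

    def session_args(self) -> Dict[str, Any]:
        # database=None leaves the choice to the server's default database
        return {"database": self.database}


cfg = ConnectionSettings(uri="bolt://localhost:7687", application_name="daytrader8")
print(cfg.driver_args())
print(cfg.session_args())
```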
Because the graph is external, project_path is optional for the Neo4j backend — there is no source tree to point at. The backend is also a context manager, so you can scope the driver’s lifetime:
```python
import os

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

with CLDK.java(
    backend=Neo4jConnectionConfig(
        uri="bolt://neo4j.internal:7687",
        username="reader",  # read-only RBAC role
        password=os.environ["NEO4J_PASSWORD"],
        application_name="daytrader8",
    ),
) as analysis:
    entrypoints = analysis.get_entry_point_methods()
    cruds = analysis.get_all_crud_operations()
```

What you get back
The backend doesn’t return raw graph rows. It bulk-fetches nodes and relationships in a handful of Cypher queries and reconstructs the canonical JApplication by handing an analysis.json-shaped payload to JApplication(**payload), which yields exactly the model the in-process analyzer would have built. So the get_* methods return the identical typed objects (JType, JCallable, a networkx.DiGraph call graph) regardless of which backend produced them:
```python
# Same methods, same return types, whichever backend you chose
analysis.get_symbol_table()          # Dict[str, JCompilationUnit]
analysis.get_classes()               # all JType nodes for this application
analysis.get_class(fqn)              # one JType
analysis.get_methods_in_class(fqn)   # callables in a class
analysis.get_callers(...)            # who calls this (level 2)
analysis.get_callees(...)            # what this calls (level 2)
analysis.get_entry_point_methods()
analysis.get_all_crud_operations()
```

A get_call_graph() over the Neo4j backend reads the projected J_CALLS edges directly out of the graph:
```cypher
MATCH (app:JApplication {name: $appName})-[:J_HAS_UNIT]->(:JCompilationUnit)
      -[:J_DECLARES_TYPE]->(:JType)-[:J_HAS_CALLABLE]->(caller:JCallable)
MATCH (caller)-[:J_CALLS]->(callee:JCallable)
RETURN caller.id AS source, callee.id AS target
```

Caveats and version requirements

Choosing a backend
Use the default backend when you have the project source on hand and want a self-contained, one-shot analysis: local development, a single repo in CI, a notebook.

```python
analysis = CLDK.java(
    project_path="my_project",
    analysis_level=AnalysisLevel.call_graph,
)
```

Every run re-analyzes the project; nothing is shared between runs or services.

Use the Neo4j backend when analysis is produced centrally and read in many places (agents, dashboards, cross-service queries) without shipping the JDK, the analyzer binary, or the source to every consumer.

```python
analysis = CLDK.java(
    analysis_level=AnalysisLevel.call_graph,
    backend=Neo4jConnectionConfig(
        uri="bolt://neo4j.internal:7687",
        password=os.environ["NEO4J_PASSWORD"],
        application_name="daytrader8",
    ),
)
```

Analysis is produced once by a separate --emit neo4j job; reads scale independently and cost a Cypher query.
The whole point of the Neo4j backend is that the read side carries no analysis dependency. Forty services analyzed by forty Kubernetes jobs land in one cluster, each anchored at its own :JApplication, and a single SDK client queries across all of them by application_name — a graph traversal, not forty JSON parses.
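Scoping reads by application_name can be pictured as one parameterized query reused per application. The sketch below takes the call-graph query quoted on this page and runs it through a stub in place of a Neo4j session; call_edges, fake_run, the canned rows, and the "billing" application are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]
Runner = Callable[[str, Dict[str, str]], Iterable[Row]]

# Call-graph query as quoted on this page; only $appName varies per application.
CALL_EDGES = (
    "MATCH (app:JApplication {name: $appName})-[:J_HAS_UNIT]->(:JCompilationUnit)"
    "-[:J_DECLARES_TYPE]->(:JType)-[:J_HAS_CALLABLE]->(caller:JCallable) "
    "MATCH (caller)-[:J_CALLS]->(callee:JCallable) "
    "RETURN caller.id AS source, callee.id AS target"
)


def call_edges(run: Runner, app_names: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Run the same query once per application and collect (source, target) pairs."""
    return {
        name: [(row["source"], row["target"]) for row in run(CALL_EDGES, {"appName": name})]
        for name in app_names
    }


# Stub standing in for a Neo4j session: canned rows keyed by application name.
FAKE_GRAPH: Dict[str, List[Row]] = {
    "daytrader8": [{"source": "A.buy()", "target": "B.quote()"}],
    "billing": [],
}


def fake_run(query: str, params: Dict[str, str]) -> List[Row]:
    return FAKE_GRAPH[params["appName"]]


print(call_edges(fake_run, ["daytrader8", "billing"]))
```

Because the application name travels as a query parameter, adding a service to the portfolio changes the input list, not the query.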
Pointing at a custom build
To use an analyzer you built yourself (say, a local development build), pass analysis_backend_path, a directory containing the analyzer distribution:

```python
analysis = CLDK.java(
    project_path="my_project",
    analysis_level=AnalysisLevel.call_graph,
    analysis_backend_path="/path/containing/codeanalyzer",
)
```

This is the bridge between this repo and the SDK: build it (Installation), then point the SDK at your build output. This applies to the in-process backend only; the Neo4j backend has no analyzer to locate, since the graph was populated out of band.
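The precedence between a custom build and the bundled one can be sketched as a simple lookup. This is an assumption-laden illustration: resolve_analyzer is a hypothetical helper, the codeanalyzer-*.jar file pattern is assumed rather than taken from the SDK, and the version numbers in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_analyzer(analysis_backend_path: Optional[str], bundled_dir: Path) -> Path:
    """Prefer a user-supplied build directory, else fall back to the bundled one.

    Assumes (hypothetically) that a distribution is a 'codeanalyzer-*.jar' file.
    """
    search_dir = Path(analysis_backend_path) if analysis_backend_path else bundled_dir
    candidates = sorted(search_dir.glob("codeanalyzer-*.jar"))
    if not candidates:
        raise FileNotFoundError(f"no codeanalyzer build found in {search_dir}")
    return candidates[-1]  # last match in lexicographic order


with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp, "bundled")
    custom = Path(tmp, "custom")
    for folder, jar in ((bundled, "codeanalyzer-1.0.0.jar"),
                        (custom, "codeanalyzer-1.1.0-dev.jar")):
        folder.mkdir()
        (folder / jar).touch()
    print(resolve_analyzer(None, bundled).name)         # bundled build
    print(resolve_analyzer(str(custom), bundled).name)  # custom build wins
```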
See also
- Neo4j graph output: how to populate the graph with --emit neo4j, snapshot vs. live Bolt, and the producer/consumer deployment model.
- Neo4j graph schema: the node labels, J_* relationships, constraints, and indexes the backend reads.
- CLI options: --emit, --app-name, and the --neo4j-* connection flags.
For the bigger picture — concepts, agent recipes, the cross-language API — see the main CodeLLM-DevKit documentation.