Quickstart

This guide gets you from a clone to a working analysis.json in a couple of minutes. For installation alternatives (native binary, pre-built JAR via the Python SDK), see Installation.

Prerequisites

A Linux, macOS, or WSL machine
A JDK, version 11 or above (Java 17 recommended). We suggest installing it with SDKMan!
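
If you want to script the JDK check, the sketch below extracts the major version from the banner that `java -version` prints. The names `java_major` and `installed_java_major` are illustrative and not part of codeanalyzer; the only thing taken from this guide is the JDK 11 minimum.

```python
import re
import subprocess

def java_major(banner):
    """Extract the major Java version from `java -version` output.

    Handles the modern scheme ("17.0.10" -> 17) and the legacy
    "1.x" scheme used up to Java 8 ("1.8.0_292" -> 8).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)

def installed_java_major():
    """Run `java -version` (it prints to stderr) and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major(result.stderr)
```

A build script could then stop early when `installed_java_major() < 11`.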

Build the JAR

Install a JDK (Java 17 shown here, via SDKMan!):
```
sdk install java 17.0.10-sem
sdk use java 17.0.10-sem
```
Clone and build the fat JAR:
```
git clone https://github.com/codellm-devkit/codeanalyzer-java
cd codeanalyzer-java
./gradlew fatJar
```
The build produces a self-contained JAR at build/libs/codeanalyzer-2.3.7.jar.

Confirm it runs:

java -jar build/libs/codeanalyzer-2.3.7.jar --version

Run your first analysis

Symbol table only (fast)

Analysis level 1 parses source and builds the symbol table. It does not require building the target project, so it’s quick:

java -jar build/libs/codeanalyzer-2.3.7.jar \
  -i /path/to/your/project \
  -a 1 \
  -o ./output

This writes ./output/analysis.json containing the symbol_table for every .java file.

Symbol table + call graph

Analysis level 2 additionally builds the WALA call graph. By default, codeanalyzer builds the target project first so that WALA has compiled classes and resolved dependencies to work from:

java -jar build/libs/codeanalyzer-2.3.7.jar \
  -i /path/to/your/project \
  -a 2 \
  -o ./output \
  -v

The -v flag streams progress logs so you can watch the build and call-graph construction.

Analyze a single source string

No project, no build — pass Java source directly and get a symbol table back on stdout:

java -jar build/libs/codeanalyzer-2.3.7.jar \
  -s "public class Hello { public static void main(String[] a){} }" \
  -a 1
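
For scripted runs, the three invocations above can be assembled from Python. This is a minimal sketch that uses only the flags shown in this guide (-i, -s, -a, -o, -v); the helper names `codeanalyzer_command` and `run_codeanalyzer` and their parameters are illustrative, not part of codeanalyzer.

```python
import subprocess

JAR = "build/libs/codeanalyzer-2.3.7.jar"  # path produced by the build step

def codeanalyzer_command(level, project=None, source=None, output=None,
                         verbose=False, jar=JAR):
    """Build the argument list for one codeanalyzer run.

    Exactly one of `project` (-i) or `source` (-s) must be given.
    """
    if (project is None) == (source is None):
        raise ValueError("pass exactly one of project or source")
    cmd = ["java", "-jar", jar]
    cmd += ["-i", project] if project is not None else ["-s", source]
    cmd += ["-a", str(level)]
    if output is not None:
        cmd += ["-o", output]
    if verbose:
        cmd.append("-v")
    return cmd

def run_codeanalyzer(**kwargs):
    """Run the analyzer; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(codeanalyzer_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems, which matters most for the inline source string given to -s.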

Read the output

analysis.json has this top-level shape:

{
  "symbol_table": { "/abs/path/File.java": { /* compilation unit */ } },
  "call_graph": [ /* caller→callee edges, present at level 2 */ ],
  "version": "2.3.7"
}
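
A quick way to sanity-check a run is to load the file and count what it contains. The sketch below relies only on the three top-level keys shown above; `summarize_analysis` is an illustrative name, and the function treats a missing call_graph as empty on the assumption that level 1 output may omit it.

```python
import json
from pathlib import Path

def summarize_analysis(path):
    """Return basic counts from an analysis.json file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    symbol_table = data.get("symbol_table", {})
    call_graph = data.get("call_graph", [])  # assumed absent or empty at level 1
    return {
        "version": data.get("version"),
        "files": len(symbol_table),        # one entry per .java file
        "call_edges": len(call_graph),     # caller-to-callee edges
    }
```

Calling `summarize_analysis("./output/analysis.json")` after either of the project-level runs above returns the tool version, the number of analyzed files, and the number of call-graph edges.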

Continue to the Output schema for the full structure, or the CLI reference for every flag.

Emit to Neo4j

analysis.json is self-contained, but it doesn’t compose: to ask a question across a portfolio you load every blob into memory and stitch it together yourself. --emit neo4j projects the same symbol table and call graph into a Neo4j property graph instead — a queryable system of record that many applications can share. --emit selects one output target, so --emit neo4j returns without writing analysis.json.

The quickest path needs no running database. With no Bolt URI, codeanalyzer renders a self-contained, re-runnable graph.cypher snapshot:

java -jar build/libs/codeanalyzer-2.3.7.jar \
  -i /path/to/your/project \
  -a 2 \
  --emit neo4j \
  --app-name daytrader8 \
  -o ./output
# -> ./output/graph.cypher

--app-name is the tenancy key: it anchors this app’s subgraph at a :JApplication node, so one database can host many apps side by side. Load the snapshot into any Neo4j instance whenever you’re ready; the script declares its constraints and indexes, does a scoped wipe of just this app’s prior subgraph, then MERGE-loads the graph:

cypher-shell -a bolt://localhost:7687 -u neo4j < ./output/graph.cypher

To push live and incrementally to a running cluster — only re-sending the compilation units whose content_hash changed — set a Bolt URI instead. Prefer the NEO4J_PASSWORD environment variable so the secret never lands in shell history:

export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=secret

java -jar build/libs/codeanalyzer-2.3.7.jar \
  -i /path/to/your/project -a 2 \
  --emit neo4j --app-name daytrader8
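
A wrapper script can check the connection settings before launching a live push. The sketch below uses the three variable names from the example above; the helper name `missing_neo4j_env` is illustrative, and treating all three variables as required is an assumption, since this guide does not say which ones codeanalyzer insists on.

```python
import os

NEO4J_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")

def missing_neo4j_env(env=None):
    """Return the names of Neo4j variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in NEO4J_VARS if not env.get(name)]
```

If the returned list is empty, the script can go ahead with the live push; otherwise it can report the missing names and stop, or fall back to the graph.cypher snapshot described earlier.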

Once the graph is populated, the CLDK Python SDK reads it back with no JDK, binary, or project source — only read-only credentials. See the Neo4j graph output guide for the two emit modes, deployment as a Kubernetes Job, and the --emit schema contract, or Python SDK integration to read the graph from Python.