Add Go support #230

Draft
sinha108 wants to merge 6 commits into main from add-go-support

sinha108 commented Jul 3, 2026

Collaborator

Add Go language support to the SDK via the codeanalyzer-go subprocess backend, bringing Go to full parity with the existing Java, Python, and TypeScript integrations.

Motivation and Context

Go is a first-class language in the codeanalyzer-go backend but was absent from the Python SDK. Users analysing polyglot codebases that include Go had no CLDK-native path; they had to shell out to the binary themselves and deserialise the output manually. This PR closes that gap.

How Has This Been Tested?

Mocked unit tests (23, no binary required)

  • Symbol table round-trip: get_symbol_table, get_file, get_types_in_file, get_type
  • Callable queries: get_callables_in_file, get_callable
  • Call graph: get_call_graph, get_callers, get_callees, unknown-node edge case
  • Pydantic model round-trip (GoApplication.model_dump_json → model_validate_json)
  • Model aliases (GoFile.types, GoFile.package_name)
  • Binary flag correctness: --eager present/absent, --target-files repeated per entry / absent when None
  • ABC contract: issubclass(GoCodeanalyzer, GoAnalysisBackend), isinstance(analysis._codeanalyzer, GoAnalysisBackend)
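A minimal sketch of how the flag-correctness checks can assert on argv without the binary. `run_analyzer` and `captured_argv` are hypothetical stand-ins, not the actual GoCodeanalyzer code; only the `--eager` flag comes from this PR, and project/output arguments are left out because they are not described here.

```python
import subprocess
from unittest import mock


def run_analyzer(eager: bool = False) -> str:
    """Hypothetical stand-in for the SDK's subprocess wrapper."""
    argv = ["codeanalyzer-go"]  # project/output arguments omitted (not stated in this PR)
    if eager:
        argv.append("--eager")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def captured_argv(eager: bool) -> list:
    """Run the wrapper with subprocess.run patched and return the argv it received."""
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="{}", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as fake_run:
        run_analyzer(eager=eager)
    # First positional argument of the patched call is the argv list.
    return fake_run.call_args.args[0]


assert "--eager" in captured_argv(eager=True)
assert "--eager" not in captured_argv(eager=False)
```

Patching `subprocess.run` at the call site keeps the test independent of whether `codeanalyzer-go` is installed.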

End-to-end tests against a real binary (25, auto-skipped when codeanalyzer-go is absent)

Fixture project under tests/resources/go/application/ exercises:

  • Multi-file package (calc/calc.go, calc/formatter.go)
  • Cross-file method attachment (value receiver defined in a different file from the struct)
  • Pointer vs value receivers, embedded fields, exported/unexported identifiers
  • Variadic parameters, multiple return types
  • Goroutine call sites, cyclomatic complexity
  • Full call graph: cross-package edges, every endpoint resolvable in the symbol table
  • Cache correctness: second run with eager=False does not re-invoke the binary
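The cache-correctness case can be pictured with a small sketch. This is an assumption about the shape of the logic, not the SDK's implementation: `load_analysis`, the `run_binary` callback, and the placeholder payload are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_analysis(cache_file: Path, run_binary: Callable[[], dict], eager: bool = False) -> dict:
    """Return the cached analysis unless eager is set or no cache file exists."""
    if not eager and cache_file.exists():
        return json.loads(cache_file.read_text())
    data = run_binary()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data


calls = []


def fake_binary() -> dict:
    """Counts invocations in place of the real subprocess call."""
    calls.append(1)
    return {"placeholder": True}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "go" / "analysis.json"
    load_analysis(cache, fake_binary)              # no cache yet: invokes the binary
    load_analysis(cache, fake_binary)              # eager=False: served from the cache
    runs_after_cached = len(calls)
    load_analysis(cache, fake_binary, eager=True)  # eager=True: always re-runs
    runs_after_eager = len(calls)

assert runs_after_cached == 1
assert runs_after_eager == 2
```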

Breaking Changes

None. All additions are new surface; existing Java / Python / TypeScript / C paths are untouched.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the Codellm-Devkit Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

New public surface

from cldk import CLDK

analysis = CLDK.go(
    project_path="/path/to/my-go-project",
    analysis_level="call_graph",   # "symbol_table" (default) or "call_graph"
    eager=True,                    # always re-run the binary
    target_files=["pkg/server.go"],  # narrow analysis to specific files
)

st       = analysis.get_symbol_table()                              # Dict[str, GoFile]
go_file  = analysis.get_file("pkg/server.go")                      # Optional[GoFile]
fn       = analysis.get_callable("example.com/mymod.Server.Handle")  # Optional[GoCallable]
callers  = analysis.get_callers("example.com/mymod.Server.Handle") # List[str]
callees  = analysis.get_callees("example.com/mymod.main")          # List[str]
cg       = analysis.get_call_graph()                               # networkx.DiGraph
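To illustrate what the caller/callee queries above return, here is a standard-library sketch that indexes a list of call edges. It is not the facade's implementation (which exposes a networkx.DiGraph); `CallGraphIndex`, the edge tuples, and the `Server.log` name are hypothetical, and returning an empty list for an unknown node is an assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


class CallGraphIndex:
    """Index (caller, callee) edges for lookups in both directions."""

    def __init__(self, edges: Iterable[Tuple[str, str]]) -> None:
        self._callees = defaultdict(list)
        self._callers = defaultdict(list)
        for source, target in edges:
            self._callees[source].append(target)
            self._callers[target].append(source)

    def get_callers(self, name: str) -> List[str]:
        # .get avoids inserting an entry for unknown nodes.
        return list(self._callers.get(name, []))

    def get_callees(self, name: str) -> List[str]:
        return list(self._callees.get(name, []))


index = CallGraphIndex([
    ("example.com/mymod.main", "example.com/mymod.Server.Handle"),
    ("example.com/mymod.Server.Handle", "example.com/mymod.Server.log"),
])

assert index.get_callers("example.com/mymod.Server.Handle") == ["example.com/mymod.main"]
assert index.get_callees("example.com/mymod.main") == ["example.com/mymod.Server.Handle"]
assert index.get_callers("example.com/mymod.Unknown") == []
```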

Architecture

Follows the established pattern exactly:

| Layer | Go | Java (reference) |
|---|---|---|
| Factory | CLDK.go() | CLDK.java() |
| Facade | GoAnalysis | JavaAnalysis |
| Backend ABC | GoAnalysisBackend | JavaAnalysisBackend |
| Local impl | GoCodeanalyzer (subprocess) | JCodeanalyzer (subprocess JAR) |
| Config type | GoCodeAnalyzerConfig | JCodeAnalyzerConfig |
| Models | cldk/models/go/ | cldk/models/java/ |

Binary requirement

codeanalyzer-go must be on PATH (built from the codeanalyzer-go repo). The SDK raises CodeanalyzerExecutionException with a clear message if it is absent. All unit tests mock shutil.which and subprocess.run so they pass in CI without the binary.
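A rough sketch of the PATH check described above, for reviewers. The exception class here is a local stand-in sharing the SDK's name, `find_binary` is hypothetical, and the message text is illustrative rather than the SDK's wording.

```python
import shutil
from unittest import mock


class CodeanalyzerExecutionException(Exception):
    """Local stand-in for the SDK exception of the same name."""


def find_binary(name: str = "codeanalyzer-go") -> str:
    """Return the absolute path of the binary, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise CodeanalyzerExecutionException(
            f"{name} not found on PATH; build it from the codeanalyzer-go repo and add it to PATH."
        )
    return path


# Simulate a machine without the binary, as the mocked unit tests do.
with mock.patch("shutil.which", return_value=None):
    try:
        find_binary()
        message = ""
    except CodeanalyzerExecutionException as exc:
        message = str(exc)

# Simulate a machine where the binary is installed (path is illustrative).
with mock.patch("shutil.which", return_value="/usr/local/bin/codeanalyzer-go"):
    found = find_binary()

assert "not found on PATH" in message
assert found == "/usr/local/bin/codeanalyzer-go"
```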

Commits

  • 5abe90a — initial Go support (models, codeanalyzer, facade, core factory, README, tests)
  • aac1203 — docs: add Go to README language support table
  • c146713 — merge: bring branch up to date with main, conform Go to new backend-config API
  • f9b52ce — fix: --eager flag never passed to binary
  • d196199 — fix: target_files not wired end-to-end through CLDK.go() → GoAnalysis → binary
  • 31e497f — feat: GoAnalysisBackend ABC; fix e2e test reading analysis.json from wrong path

sinha108 added 6 commits June 17, 2026 10:06
Wires CLDK(language="go") end to end:

Models (cldk/models/go/):
- 13 Pydantic models mirroring codeanalyzer-go schema: GoApplication,
  GoFile, GoType, GoCallable, GoCallsite, GoCallEdge, GoField,
  GoParameter, GoImport, GoComment, GoSymbol, GoVariableDeclaration,
  GoEntrypoint.
- _NullSafeBase coerces JSON null to empty list/dict for nil-slice
  serialization from Go.
- GoFile.types / .package_name aliases for spine compatibility.
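
Background for the null coercion: Go's encoding/json marshals nil slices and nil maps as JSON null, so collection fields can arrive as null rather than [] or {}. The sketch below shows the idea with plain dicts; it is not the Pydantic-based _NullSafeBase from this commit, and `coerce_nulls` and the field names are hypothetical.

```python
import json
from typing import Any, Callable, Dict


def coerce_nulls(record: Dict[str, Any], defaults: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Replace null (or missing) collection fields with a fresh empty container."""
    cleaned = dict(record)
    for key, factory in defaults.items():
        if cleaned.get(key) is None:
            cleaned[key] = factory()
    return cleaned


# Field names below are illustrative, not the codeanalyzer-go schema.
raw = json.loads('{"name": "calc", "items": null, "attrs": null, "imports": ["fmt"]}')
clean = coerce_nulls(raw, {"items": list, "attrs": dict, "imports": list})

assert clean == {"name": "calc", "items": [], "attrs": {}, "imports": ["fmt"]}
```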

Analysis facade (cldk/analysis/go/):
- GoCodeanalyzer subprocess wrapper: shells out to codeanalyzer-go
  binary discovered via shutil.which(); passes --analysis-level as
  integer (1/2), not enum string.
- GoAnalysis facade: get_application_view, get_symbol_table, get_file,
  get_all_types, get_types_in_file, get_type, get_all_callables,
  get_callables_in_file, get_callable, get_call_graph (nx.DiGraph),
  get_callers, get_callees, get_call_graph_edges.

Core dispatch (cldk/core.py):
- elif language == "go" branch; rejects source_code mode.

pyproject.toml: codeanalyzer-go = "0.1.0" pinned under
[tool.backend-versions].

Tests:
- 17 mocked SDK tests (backend patched, no binary required).
- 25 E2E tests against a multi-package Go fixture; skipif when
  codeanalyzer-go is absent from PATH.
- Fixture: calc (Calculator, Operator interface, embedded field,
  pointer/value receivers, multi-return, cross-file method) +
  pipeline (goroutine, variadic, cyclomatic complexity) + main.go.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

- TOC: add Go under Analysis Backends
- Mermaid diagram: add cldk.analysis.go → codeanalyzer-go branch and Go models node
- Update "full support for Java, Python, C" → include Go
- Update cldk.models example list to include cldk.models.go
- Add Go backend section: tools, capabilities, prerequisites, usage snippet

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

Merges origin/main (40 commits) and adapts the Go language support to the
refactored per-language factory pattern introduced on main.

Conflict resolutions:
- pyproject.toml: bump codeanalyzer-java/python/typescript to main versions;
  retain codeanalyzer-go = 0.1.0 in [tool.backend-versions]
- README.md: adopt main's cleaner structure; add Go row to supported-languages
  table, Go node to architecture Mermaid diagram, CLDK.go() in Quick Start
- cldk/core.py: keep main's per-language factory layout; add CLDK.go() factory
  and wire Go into the deprecated analysis() shim via GoCodeAnalyzerConfig
- uv.lock: regenerated from resolved pyproject.toml

API conformance (Go):
- backend_config.py: add GoCodeAnalyzerConfig dataclass, GoBackend union, and
  "go" key in _CACHE_KEYS so cache_subdir computes the right output path
- GoAnalysis.__init__: replace (analysis_backend_path, analysis_json_path,
  cache_dir) with backend: GoBackend | None = None; derive output path via
  cache_subdir(backend.cache_dir, project_dir, "go")
- GoCodeanalyzer.__init__: drop analysis_backend_path (always PATH) and
  cache_dir (folded into analysis_json_path); clean up unused imports
- core.py: add CLDK.go() static factory; update analysis() Go branch to call
  CLDK.go() with GoCodeAnalyzerConfig
- Tests: update GoAnalysis instantiations to use new backend= API

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

…sis=True

Previously GoCodeanalyzer bypassed its own analysis.json cache when
eager_analysis=True, but never forwarded --eager to the binary. The
binary's own internal cache (~/.cldk/go-cache) was therefore still used,
so repeated eager calls could return stale results.

Pass --eager to the subprocess args whenever eager_analysis is set; the
binary then forces a clean rebuild and ignores its cache.

Tests: add test_eager_flag_passed_to_binary and
test_eager_flag_absent_when_not_eager in test_go_analysis.py, patching
subprocess.run at the GoCodeanalyzer layer to assert on the argv.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

…→ binary

CLDK.go() accepted target_files but did not pass it to GoAnalysis, and
neither GoAnalysis nor GoCodeanalyzer forwarded it to the codeanalyzer-go
binary. The --target-files flag therefore never reached the binary, and
incremental mode was silently ignored.

Changes:
- GoCodeanalyzer.__init__: add target_files: Optional[List[str]] = None;
  append one --target-files <path> per entry in _run_and_parse
- GoAnalysis.__init__: add target_files param; forward to GoCodeanalyzer
- CLDK.go(): add target_files param; forward to GoAnalysis
- analysis() shim: forward target_files to CLDK.go()
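
A sketch of the flag assembly this change implies, combining it with the integer --analysis-level noted in the first commit. `build_flags` is hypothetical; the 1/2 mapping to "symbol_table"/"call_graph" and the space-separated flag/value form are assumptions, and project/output arguments are omitted because they are not stated here.

```python
from typing import List, Optional

# Assumed mapping: the PR only says the level is passed as an integer (1/2).
ANALYSIS_LEVELS = {"symbol_table": 1, "call_graph": 2}


def build_flags(
    analysis_level: str = "symbol_table",
    eager: bool = False,
    target_files: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the optional flags; one --target-files entry per path."""
    flags = ["--analysis-level", str(ANALYSIS_LEVELS[analysis_level])]
    if eager:
        flags.append("--eager")
    for path in target_files or []:
        flags += ["--target-files", path]
    return flags


flags = build_flags("call_graph", eager=True, target_files=["a.go", "b.go"])
assert flags == [
    "--analysis-level", "2",
    "--eager",
    "--target-files", "a.go",
    "--target-files", "b.go",
]
assert "--target-files" not in build_flags()
```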

Tests: add test_target_files_passed_to_binary and
test_target_files_absent_when_none in test_go_analysis.py.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>

Java, Python, and TypeScript all define an abstract base class that
every analysis backend implements, letting each facade hold its backend
via the interface rather than the concrete type. This commit brings Go
to parity:

- Add cldk/analysis/go/backend.py with GoAnalysisBackend(ABC) exposing
  six abstract methods: get_application, get_symbol_table, get_all_files,
  get_file, get_all_types, get_all_callables.
- Make GoCodeanalyzer subclass GoAnalysisBackend (Python raises TypeError
  on instantiation if any abstract method is left unimplemented).
- Type GoAnalysis._codeanalyzer as GoAnalysisBackend so static type
  checkers enforce the interface boundary.
- Export GoAnalysisBackend from cldk/analysis/go/__init__.py.
- Add two unit tests: GoCodeanalyzer is a subclass of GoAnalysisBackend
  (issubclass), and GoAnalysis._codeanalyzer is a GoAnalysisBackend
  instance at runtime (subprocess-stub pattern, no mock needed).
- Fix test_e2e_application_round_trips_pydantic: cache_subdir writes
  analysis.json under <cache_dir>/go/, not directly under cache_dir.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
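
For reference, a small sketch of how Python's abc module enforces such a contract: defining a subclass that misses an abstract method succeeds, and the TypeError surfaces when the class is instantiated. The class names and the two-method interface are illustrative, not the six-method GoAnalysisBackend itself.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Backend(ABC):
    """Illustrative two-method interface."""

    @abstractmethod
    def get_symbol_table(self) -> dict: ...

    @abstractmethod
    def get_file(self, path: str) -> Optional[dict]: ...


class Incomplete(Backend):
    """Defining this class is fine even though get_file is missing."""

    def get_symbol_table(self) -> dict:
        return {}


class Complete(Incomplete):
    def get_file(self, path: str) -> Optional[dict]:
        return None


try:
    Incomplete()
    error = ""
except TypeError as exc:
    error = str(exc)

assert "get_file" in error                # the missing method is named in the message
assert issubclass(Complete, Backend)      # same style of check as the unit tests
assert isinstance(Complete(), Backend)
```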
sinha108 requested a review from rahlk July 3, 2026 18:04