Merge pull request #49 from codellm-devkit/feat/jedi-shard-planner #51

Merged
rahlk merged 1 commit into main from feat/level2-pycg-remove-codeql on Jun 27, 2026

@rahlk rahlk commented Jun 27, 2026

Contributor

Replaces CodeQL with PyCG as the level 2 call graph backend, and adds coupling-aware sharding so PyCG scales to large apps.

Motivation and Context

PyCG does not scale past a few hundred files. Sharding by a flat file count forces every shard to be small just to tame the few shards that diverge, which severs many call edges and hurts recall. This PR shards by Jedi module coupling instead, and recovers diverging shards by re-sharding only those.

How Has This Been Tested?

Unit tests for the planner, dep exclusion, max_iter, and the adaptive loop (14 tests, all pass).

End to end on a real app. Benchmark app: Odoo, 1028 modules, level 2, Ray. PyCG edges recovered:

strategy                            edges recovered   files lost   wall time
uniform ceiling 100, timeout 90s    13302             ~100         96s
uniform ceiling 100, timeout 300s   17149             ~100         307s
adaptive (start 100)                22210             20           760s

Adaptive recovers 22210 edges (about 30% more than the best uniform run's 17149), losing only 20 of 1028 files instead of a whole 100-file shard.

Breaking Changes

Yes.

--codeql / --no-codeql   removed, replaced by --analysis-level {1,2}
edge provenance          "codeql" literal becomes "pycg"
new dependency           pycg (Apache 2.0)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the Codellm-Devkit Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Sharding algorithm:

plan = scc_louvain_shards(jedi_module_graph, budget=ceiling)
edges, budget = [], ceiling
while plan.shards:
    converged, runaways = run_pycg(plan.shards, timeout)   # symlink-bounded, Ray
    edges += converged
    if not runaways: break
    budget = max(floor, budget // 2)
    if budget did not shrink or max rounds hit:
        runaways -> jedi_only ; break
    next = []
    for r in runaways:
        sub = scc_louvain_shards(r.files, budget)   # re-shard that runaway alone
        if sub did not split: r -> jedi_only ; continue
        next += sub.shards
    plan.shards = next
return coalesce(edges)
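
For readers who want to step through the loop, here is a minimal runnable sketch of the pseudocode above. It is not the code in this PR: `adaptive_decompose`, `plan_shards`, and `run_shards` are hypothetical names, the `floor` and `max_rounds` defaults are made up, and `coalesce` is assumed to mean simple de-duplication. The planner and runner are passed in as callables, and the two toy stand-ins at the bottom replace the SCC + Louvain planner and the real PyCG run.

```python
from typing import Callable, List, Sequence, Set, Tuple

Shard = List[str]        # file paths analysed together
Edge = Tuple[str, str]   # (caller, callee)


def adaptive_decompose(
    files: Sequence[str],
    plan_shards: Callable[[Sequence[str], int], List[Shard]],
    run_shards: Callable[[List[Shard]], Tuple[List[Edge], List[Shard]]],
    ceiling: int = 100,
    floor: int = 10,       # hypothetical default
    max_rounds: int = 5,   # hypothetical default
) -> Tuple[Set[Edge], List[str]]:
    """Return (edges, files left to the Jedi-only fallback)."""
    shards = plan_shards(files, ceiling)
    edges: Set[Edge] = set()          # a set stands in for coalesce()
    jedi_only: List[str] = []
    budget, rounds = ceiling, 0
    while shards:
        converged, runaways = run_shards(shards)
        edges.update(converged)
        rounds += 1
        if not runaways:
            break
        new_budget = max(floor, budget // 2)
        if new_budget >= budget or rounds >= max_rounds:
            # Budget cannot shrink further, or round cap hit: give up on these.
            for shard in runaways:
                jedi_only.extend(shard)
            break
        budget = new_budget
        next_shards: List[Shard] = []
        for shard in runaways:
            sub = plan_shards(shard, budget)   # re-shard this runaway alone
            if len(sub) <= 1:                  # did not split
                jedi_only.extend(shard)
                continue
            next_shards.extend(sub)
        shards = next_shards
    return edges, jedi_only


# --- toy stand-ins, only to exercise the loop ---
def chunk_planner(files: Sequence[str], budget: int) -> List[Shard]:
    """Hypothetical planner: fixed-size chunks instead of SCC + Louvain."""
    files = list(files)
    return [files[i:i + budget] for i in range(0, len(files), budget)]


def fake_runner(shards: List[Shard]) -> Tuple[List[Edge], List[Shard]]:
    """Hypothetical runner: any shard containing 'bad.py' is a runaway."""
    edges: List[Edge] = []
    runaways: List[Shard] = []
    for shard in shards:
        if "bad.py" in shard:
            runaways.append(shard)
        else:
            edges.append((shard[0], shard[-1]))
    return edges, runaways
```

Calling `adaptive_decompose(["a.py", "b.py", "bad.py", "c.py"], chunk_planner, fake_runner, ceiling=4, floor=1)` halves the budget twice and isolates `bad.py` as the only Jedi-only file, which mirrors how the Odoo run narrowed a 100-file loss down to 20 files.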

New flags:

--analysis-level {1,2}                 default 1
--pycg-shard / --no-pycg-shard         shard level 2 on large projects
--pycg-shard-strategy {jedi,package}   jedi (default) uses the planner
--pycg-shard-ceiling (default 100)     starting budget per shard
--pycg-shard-timeout (default 120)     per-shard wall-clock limit, in seconds
--pycg-max-iter (default 50)           caps PyCG fixpoint passes
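
To make the flag surface concrete, the table above can be mirrored with the standard library's `argparse`. This is only an illustration, not the project's actual CLI code: the parser name is hypothetical, and the default of `False` for `--pycg-shard` is an assumption because the description does not state one. `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the flags listed in this PR."""
    parser = argparse.ArgumentParser(prog="analyzer-sketch")  # hypothetical name
    parser.add_argument("--analysis-level", type=int, choices=[1, 2], default=1)
    # BooleanOptionalAction generates both --pycg-shard and --no-pycg-shard.
    # The default here is an assumption; the PR does not state it.
    parser.add_argument(
        "--pycg-shard", action=argparse.BooleanOptionalAction, default=False
    )
    parser.add_argument(
        "--pycg-shard-strategy", choices=["jedi", "package"], default="jedi"
    )
    parser.add_argument("--pycg-shard-ceiling", type=int, default=100)
    parser.add_argument("--pycg-shard-timeout", type=int, default=120)
    parser.add_argument("--pycg-max-iter", type=int, default=50)
    return parser
```

For example, `build_parser().parse_args(["--analysis-level", "2", "--pycg-shard"])` selects level 2 with sharding on and leaves the ceiling, timeout, and iteration cap at the defaults listed above.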

Caveats:

  1. Adaptive wall time is higher (760s vs 307s for the best uniform run). Decomposition rounds run in sequence, and each round waits the full timeout for its runaways before re-sharding. Tunable later (shorter per-round timeout, overlap rounds).
  2. 20 files stay Jedi-only. They are a true PyCG divergence (the ORM metaclass core), not a bug. PyCG has no convergence guarantee on its field-sensitive access paths.
  3. Numbers are from one app (Odoo). They will vary by codebase.
  4. timeout and max_iter are the only guards against PyCG running forever. With timeout 0 and max_iter -1, a divergent shard never returns.
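
Caveat 4 suggests a cheap pre-flight check that callers could run before launching PyCG. The sketch below is an assumption about how one might guard against that configuration, not something this PR adds: `check_pycg_guards` is a hypothetical helper, and it only rejects the exact combination named above (timeout 0 together with max_iter -1).

```python
def check_pycg_guards(timeout_s: int, max_iter: int) -> None:
    """Raise if both guards against a divergent shard are switched off.

    Hypothetical helper: treats timeout 0 and max_iter -1 as "disabled",
    as described in the caveat above.
    """
    timeout_off = timeout_s == 0
    max_iter_off = max_iter == -1
    if timeout_off and max_iter_off:
        raise ValueError(
            "timeout=0 and max_iter=-1 disable both guards; "
            "a divergent shard would never return"
        )
```

Either guard alone is enough to pass the check, so `check_pycg_guards(0, 50)` and `check_pycg_guards(120, -1)` both return normally.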

PyCG sharding: Jedi planner + adaptive decomposition of runaways
@rahlk rahlk merged commit e07887c into main Jun 27, 2026
@rahlk rahlk deleted the feat/level2-pycg-remove-codeql branch June 27, 2026 20:38