Comparing changes

base repository: codellm-devkit/codeanalyzer-python
base: main
head repository: codellm-devkit/codeanalyzer-python
compare: feat/level3-dataflow-sdg
  • 11 commits
  • 37 files changed
  • 1 contributor

Commits on Jul 2, 2026

  1. feat(dataflow): stage 1 — exceptional statement-level CFG per callable

    Level-3 groundwork (#67): hand-built CFG from the stdlib ast with the
    shared node/edge vocabulary, Python lowering rules (try/except/else/
    finally, with, yield/await resume kinds, break/continue, synthetic
    escape edge for infinite loops), dead-code pruning, and source-span-
    ordered node ids (ENTRY=0, EXIT=last). Dataflow fixture project and
    CFG gate tests included.
    rahlk committed Jul 2, 2026
    2c08649
  2. feat(dataflow): stage 2 — post-dominators and control dependence

    Cooper–Harper–Kennedy iterative post-dominators over the reverse CFG
    (unique root EXIT, guaranteed by stage 1's synthetic escape edges) and
    Ferrante–Ottenstein–Warren control dependence with ENTRY as the region
    root. Gate tests pin exact hand-computed CDG sets for the fixture's
    if/loop/early-return functions. (#67)
    rahlk committed Jul 2, 2026
    6af65e0
  3. feat(dataflow): stage 3 — access paths, reaching definitions, DDG

    k-limited access-path model with per-scope base classification (local/
    param/self/global/capture), header-only facts for compound statements,
    comprehension scoping, closure-capture and call-mutation rules; classic
    worklist reaching definitions with strong kills on exact non-wildcard
    paths; DDG edges via textual interference plus the type-based may-alias
    oracle (the locked MVP points-to substrate — unknown types conservatively
    alias, incompatible types don't). Gate tests cover the loop-carried
    dependency, scope shadowing, and the aliased write/read pair. (#67)
    rahlk committed Jul 2, 2026
    7377dc3
  4. feat(dataflow): stage 4 — PDG assembly and exact backward-slice gate

    PDG = CDG ∪ DDG per callable over the same node ids; intraprocedural
    backward slice as reverse reachability. Gate pins hand-computed exact
    slices: the early-return arm is excluded from the other arm's slice,
    loop slices close over the loop-carried dependency. (#67)
    rahlk committed Jul 2, 2026
    d21fc0a
  5. feat(dataflow): stage 5 — alias oracle wiring, Tarjan SCC, global qualification
    
    Iterative Tarjan SCC condensation of the frozen call-graph oracle
    (reverse topological schedule for bottom-up summaries); call mutations
    become suffixed weak defs so caller-visible mutation is distinguishable
    from local rebinding; global bases gain module::name qualification for
    the interprocedural build. (#67)
    rahlk committed Jul 2, 2026
    789bd1c
  6. feat(dataflow): stages 6–7 — function summaries and SDG assembly

    Relational summaries (params/captures/read-globals → return/mutations/
    written-globals) composed bottom-up over the Tarjan condensation DAG,
    monotone fixpoint within SCCs, callee global footprints injected at
    callsites and reaching definitions re-solved; HRB parameter structure
    (formal/actual in/out nodes in the owning function's id space after
    EXIT), CALL/PARAM_IN/PARAM_OUT edges, SUMMARY edges from composed
    flows, globals as extra formals, closure captures bound at definition
    sites; builder maps symbol-table signatures to AST by (file, line) and
    treats the call graph and Jedi callsite resolutions as frozen oracles.
    Gates: arity, no dangling endpoints, transitive-chain SUMMARY, cross-
    file global flow, deterministic double-run. (#67)
    rahlk committed Jul 2, 2026
    c6f990f
  7. feat(dataflow): stage 8a — two-phase context-sensitive backward slicing

    Classic HRB traversal over the assembled SDG: phase 1 ascends and skips
    across callsites via SUMMARY edges (never PARAM_OUT), phase 2 descends
    (never PARAM_IN/CALL) — call–return matching without re-descent. Gate
    pins an exact hand-computed interprocedural slice (caller_of_mutate →
    mutate) plus cross-file global descent and no-reascend properties. (#67)
    rahlk committed Jul 2, 2026
    43e0e69
  8. feat(dataflow): program_graphs emission, -a 3, --graphs, --graph-field-depth
    
    program_graphs schema section (PyProgramGraphs and friends, versioned
    1.0.0 independently of the application schema) attached to
    PyApplication; -a extended to 3 (cumulative: level 3 keeps PyCG
    enrichment); --graphs cfg,dfg,pdg,sdg selector with strict validation
    (unknown values and level<3 usage exit non-zero, never silently fall
    back); --graph-field-depth k-limit knob recorded in the output. -a 1/2
    emit no program_graphs and their pipeline is untouched. (#67)
    rahlk committed Jul 2, 2026
    6479c04
  9. feat(dataflow): stage 8b — CPG projection through the Neo4j emitter

    CFGNode label (merge key id = <signature>#<node_id>) carrying both CFG
    statements and HRB parameter nodes, plus the shared cross-language edge
    vocabulary HAS_CFG_NODE / CFG_NEXT / CDG / DDG / PARAM_IN / PARAM_OUT /
    SUMMARY (deliberately unprefixed — parity clause). Additive
    schema.neo4j.json bump to 1.2.0; sample app extended so the conformance
    tests exercise every new row family; count-parity and no-dangling gates
    on the real fixture at -a 3. CALL stays at the callable level (PY_CALLS
    twin). (#67)
    rahlk committed Jul 2, 2026
    3e82256
  10. docs(dataflow): analysis levels, Architecture & Tooling, schema decisions
    
    README gains the level table, the locked level-3 substrate decisions
    (CFG from stdlib ast, hand-built reaching defs, type-based may-alias
    MVP, documented unsoundness), a level-3 usage example, and a
    regenerated --help block; CHANGELOG Unreleased entry; level-3 schema
    decision log tracked at .claude/SCHEMA_DECISIONS.md as SDK-model
    input (un-ignored past the global .claude exclude). (#67)
    rahlk committed Jul 2, 2026
    ac7acb3
  11. fix(neo4j): namespace the CPG overlay per language (PyCFGNode, PY_* edges)
    
    Unprefixed CFGNode/CFG_NEXT/CDG/DDG/PARAM_IN/PARAM_OUT/SUMMARY would
    mingle analyzers' dependence edges in a Neo4j database holding more
    than one language's graph — SDK backends scope queries by label/type
    prefix. The vocabulary stays cross-language in shape (same suffixes,
    props, semantics) but is PY_-namespaced in the projection like every
    other row family; the JSON program_graphs section keeps the unprefixed
    contract since each analysis.json is its own namespace. Decision
    recorded in .claude/SCHEMA_DECISIONS.md. (#67)
    rahlk committed Jul 2, 2026
    ddae36c
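
Commit 4 describes the intraprocedural backward slice as reverse reachability over PDG = CDG ∪ DDG. A minimal sketch of that idea follows, assuming edges are (source, target) pairs of node ids where the target depends on the source. The name `backward_slice`, the edge-list representation, and the four-statement example are illustrative assumptions, not the repository's API or fixture.

```python
from collections import deque


def backward_slice(cdg_edges, ddg_edges, criterion):
    """Return every node id the criterion transitively depends on (itself included)."""
    # PDG = CDG ∪ DDG; index it by target so we can walk dependences backwards.
    preds = {}
    for src, dst in set(cdg_edges) | set(ddg_edges):
        preds.setdefault(dst, set()).add(src)

    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for pred in preds.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


# Hypothetical callable, with ENTRY = 0:
#   def f(x):
#       if x > 0:       # node 1
#           return 1    # node 2
#       y = x + 1       # node 3
#       return y        # node 4
CDG = [(0, 1), (1, 2), (1, 3), (1, 4)]   # the early return makes 3 and 4 depend on 1
DDG = [(0, 1), (0, 3), (3, 4)]           # x flows from ENTRY, y flows from 3 to 4

slice_of_return_y = backward_slice(CDG, DDG, 4)
slice_of_return_1 = backward_slice(CDG, DDG, 2)
```

In this hand-built example the slice of node 4 leaves out node 2, which mirrors the property the commit says its gate pins: the early-return arm is excluded from the other arm's slice.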
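
Commit 5 mentions an iterative Tarjan SCC condensation whose reverse topological order schedules the bottom-up summaries. The sketch below shows one way an explicit-stack Tarjan could look, assuming the call graph is a dict mapping each caller to its callees. `tarjan_scc` and the sample graph are hypothetical and not taken from the repository.

```python
def tarjan_scc(graph):
    """Return SCCs of `graph` (node -> iterable of successors), callees before callers."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit DFS stack of (node, successor iterator) replaces recursion.
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, succs = work[-1]
            descended = False
            for nxt in succs:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            # All successors of `node` are done: propagate lowlink to the parent.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            # `node` is the root of an SCC: pop the component off the stack.
            if lowlink[node] == index[node]:
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                sccs.append(component)
    return sccs


# Hypothetical call graph: main calls a, a and b are mutually recursive, b calls c.
call_graph = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
schedule = [sorted(component) for component in tarjan_scc(call_graph)]
```

Tarjan emits components in reverse topological order of the condensation, so a summary pass can walk `schedule` front to back and always see a callee's component before any caller's. The explicit stack avoids Python's recursion limit on deep call chains.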