Skip to content

perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard#601

Open
igerber wants to merge 2 commits into
mainfrom
perf/demean-bincount
Open

perf: factorize-once + bincount MAP demeaning, max_iter 10k parity, FE-spanned regressor guard#601
igerber wants to merge 2 commits into
mainfrom
perf/demean-bincount

Conversation

@igerber

@igerber igerber commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Summary

  • Rewrites the FE-absorption engine (demean_by_groups N>=2 branch): each absorbed dimension is factorized ONCE and every MAP sweep is two O(n) np.bincount passes, replacing pandas rebuilding the group hash table per (iteration x variable x dimension). Convergence semantics unchanged (per-variable sweep order, max|x - x_old| < tol, zero-total-weight inert guard, non-convergence warning). NaN group keys now raise ValueError at every arity (pandas groupby silently dropped them; pd.factorize would code them -1 and mis-index).
  • Raises the MAP iteration cap 100 -> 10,000 in both demean_by_groups and within_transform, matching R fixest (fixef.iter) and pyfixest (fixef_maxiter). Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270 iterations; the old cap warned and returned slightly-off residuals. Worst-case trade-off documented in REGISTRY.
  • Closes an FE-spanned-regressor inference hazard (new utils.snap_absorbed_regressors, wired at DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and the DiD/MPD/TWFE replicate-refit closures): regressors spanned by the absorbed FEs previously demeaned to junk columns that survived rank detection via column equilibration and silently perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale garbage coefficients in the joint-span (a_unit + b_time) case. Detection is two-stage because the MAP stopping rule bounds the last step, not the distance to the limit: fast relative-norm path at 1e-10, then an exact sparse-LSMR span-membership confirmation for truncation-masked candidates; sqrt(w)-weighted norms so zero-weight domain rows cannot mask spanning. Spanned regressors drop deterministically (coefficient NaN) with a cause-specific UserWarning honoring rank_deficient_action.
  • Measured (committed fe_absorption suite baselines, generated on frozen code, CVs 0.1-4.4%): county policy event study 3.87s -> 2.39s (1.6x), firm-churn 2.4M-row SunAbraham 93.0s -> 49.5s (1.9x), scanner TWFE 1.55s -> 0.98s (1.6x), 5M-row geo experiment 2.63s -> 0.97s (2.7x), survey BRR x80 7.39s -> 4.08s (1.8x), small-panel guard unregressed. The correlated-FE stress case now CONVERGES (0.8x wall-clock as the price of correctness: 2.7x the iterations to full convergence plus the exact span confirmation; previously it hit the cap and kept a masked junk column).
  • Identity evidence: --check-estimates gate at 1e-14-1e-16 on all scenarios except the two documented one-time corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - the removed junk influence), passed via the auditable per-run --allow-shift flag. Estimates are now invariant (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5.
  • Dead _within_transform methods removed (twfe.py, sun_abraham.py). docs/doc-deps.yaml gains the utils.py -> "Absorbed Fixed Effects" REGISTRY mapping.

Methodology references (required if estimator / math changes)

  • Method name(s): Method of alternating projections (MAP) for N-way fixed-effects absorption (Frisch-Waugh-Lovell residualization); exact span-membership confirmation via LSMR on the sparse FE incidence
  • Paper / source link(s): fixest/reghdfe/lfe demeaning convention (docs/methodology/REGISTRY.md "Absorbed Fixed Effects with Survey Weights"); Fong & Saunders (2011) LSMR via scipy.sparse.linalg.lsmr
  • Any intentional deviations from the source (and why): Numerical contract change documented in REGISTRY - bincount accumulation is not Kahan-compensated the way pandas' grouped mean was, so demeaned values agree with the prior implementation to ~1e-10 order rather than bit-for-bit (estimates validated unchanged at the benchmark identity gate). FE-spanned regressors now drop to NaN with a cause warning instead of receiving junk coefficients; designs carrying them see identified estimates shift once by up to ~1e-5 (REGISTRY edge cases + CHANGELOG).

Validation

  • Tests added/updated: tests/test_utils.py (bincount-engine parity vs full-dummy ground truth, frozen-loop drift guards, NaN-key guards at every arity, weight-aware snap under zero-weight domains, joint-span LSMR confirmation + identified-low-variation counter-test, max_iter defaults lock, >100-iteration convergence fixture), tests/test_estimators.py (DiD/MPD/TWFE snap behavior, replicate-local spanning fail-closed, joint-span covariate dropped with ATT stable, degenerate unbalanced-TWFE spec now asserts the identification error), tests/test_sun_abraham.py, tests/test_wooldridge.py (hc1 lock recaptured, ~1e-15 shift), tests/test_bacon.py, tests/test_within_transform.py. Targeted sweep: 875 passed.
  • Backtest / simulation / notebook evidence (if applicable): committed before/after baselines in benchmarks/speed_review/baselines/fe_absorption_{before,after}.json (subprocess-isolated multi-run protocol, CVs recorded); findings table auto-regenerated in docs/performance-plan.md; pyfixest yardstick coefficient parity 6e-13 to 2.7e-9 on exact-estimand scenarios.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…rity, FE-spanned regressor guard

Rewrites the fixed-effects absorption engine and closes a silent-inference
hazard the rewrite's identity gate surfaced. Three coupled changes:

1. demean_by_groups N>=2 branch: each absorbed dimension factorized ONCE
   (pd.factorize), MAP sweeps via np.bincount group-sum + gather instead of
   pandas rebuilding the group hash table per (iteration x variable x
   dimension). Same convergence semantics (per-variable sweep order,
   max|x - x_old| < tol, zero-total-weight inert guard, warn_if_not_converged).
   NaN group keys raise ValueError at every arity (one-way included; pandas
   groupby silently dropped NaN keys, factorize would code them -1).
   Numerical contract: bincount accumulation is not Kahan-compensated, so
   outputs agree with the prior pandas loop to ~1e-10 order, not bit-for-bit.

2. max_iter raised 100 -> 10,000 in BOTH demean_by_groups and
   within_transform (fixest fixef.iter / pyfixest fixef_maxiter parity).
   Correlated FE incidence (contiguous unit lifetimes) genuinely needs ~270
   iterations; the old cap warned and returned slightly-off residuals.

3. FE-spanned regressor guard (new utils.snap_absorbed_regressors +
   pre_demean_norms + _fe_span_residual_norm), wired at DiD/MPD absorb=,
   TWFE, SunAbraham, Bacon, Wooldridge AND the DiD/MPD/TWFE replicate-refit
   closures: regressors spanned by the absorbed FEs previously demeaned to
   junk columns that survived rank detection via column equilibration and
   perturbed identified estimates - up to ~3e-3 on ATT with ~1e14-scale
   garbage coefficients in the joint-span (a_unit + b_time) case. Detection
   is TWO-STAGE because the MAP stopping rule bounds the last step, not the
   distance to the limit: fast relative-norm path at 1e-10, then an exact
   sparse-LSMR span-membership confirmation (on the already-demeaned column,
   so it converges in a few iterations) for truncation-masked candidates in
   (1e-10, 1e-3]. Norms are sqrt(w)-weighted so zero-weight domain rows
   cannot mask spanning on the positive-weight sample. Spanned regressors
   drop deterministically (coefficient NaN) with a cause-specific UserWarning
   under rank_deficient_action="warn"; identified estimates become invariant
   (~1e-14) to the demeaning tolerance where they previously swung at ~1e-5.

Measured (fe_absorption suite, AFTER baselines committed, generated on frozen
code): 1.6-2.7x end-to-end on the headline scenarios; the correlated-FE
stress case now converges (0.8x wall-clock as the price of correctness:
2.7x the iterations to full convergence + the exact span confirmation).
Identity gate: 1e-14-1e-16 everywhere except the documented one-time
corrections (survey_absorb ATT 9.8e-6, geo_experiment SE 1e-7 - junk columns
removed). 875 targeted tests pass; byte-identity locks converted to
allclose + full-dummy ground-truth guards; Wooldridge 1e-14 baseline
recaptured (~1e-15 shift); a pre-existing degenerate test spec that asserted
isfinite on garbage output now asserts the identification error. Dead
_within_transform methods removed (twfe, sun_abraham). REGISTRY "Absorbed
Fixed Effects" + CHANGELOG + doc-deps updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings found.

Executive Summary

  • Affected methods: absorbed fixed-effect MAP/FWL residualization, weighted within transforms, and FE-spanned regressor handling across DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and replicate-refit closures.
  • The methodology changes are documented in docs/methodology/REGISTRY.md with explicit Note/Edge case entries, so the numerical drift, max_iter=10_000, NaN group-key guard, and FE-spanned snap behavior are P3 informational, not defects.
  • I did not find new inline inference anti-patterns or partial NaN guards in the modified paths; replicate SE paths still route through safe_inference().
  • Edge-case coverage is strong for NaN group keys, zero-weight domains, joint FE-span confirmation, and replicate-local spanning.
  • I could not execute runtime checks because the review environment is missing numpy.

Methodology

Finding: Documented absorbed-FE numerical contract change

  • Severity: P3 informational, no action required
  • Location: diff_diff/utils.py:L2644-L2848, docs/methodology/REGISTRY.md:L4317-L4344
  • Impact: The factorize-once + np.bincount MAP rewrite changes floating-point accumulation relative to pandas groupby and raises default max_iter from 100 to 10,000. This is a methodology-relevant default/numerical contract change, but it is explicitly documented in the registry.
  • Concrete fix: None required.

Finding: Documented FE-spanned regressor snap behavior

  • Severity: P3 informational, no action required
  • Location: diff_diff/utils.py:L2851-L3033, docs/methodology/REGISTRY.md:L4348-L4378
  • Impact: FE-spanned regressors are now snapped to zero before solve so rank handling drops them deterministically. This can shift estimates that previously included numerical junk, but the correction and assumptions are documented.
  • Concrete fix: None required.

Code Quality

No blocking findings. The snap helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, and weights.

Performance

No blocking findings. The worst-case max_iter=10_000 latency and LSMR confirmation cost are explicitly accepted and documented in REGISTRY.md and docs/performance-plan.md.

Maintainability

No blocking findings. The repeated estimator wiring is somewhat verbose but follows existing local patterns and keeps estimator-specific rank/warning behavior clear.

Tech Debt

No untracked correctness debt found. Existing deferred-work handling in TODO.md does not appear to be used to mask any P0/P1 issue introduced here.

Security

No findings. The changed code does not introduce secret handling, external I/O, shell execution, or unsafe deserialization paths.

Documentation/Tests

Finding: doc-deps.yaml note omits the new snap helper

  • Severity: P3 informational
  • Location: docs/doc-deps.yaml:L904-L907
  • Impact: The utils-to-registry dependency note mentions demeaning/max_iter/NaN-key behavior but not snap_absorbed_regressors, even though that new helper is methodology-relevant and lives in utils.py.
  • Concrete fix: Extend the note to include “FE-spanned regressor snap / exact span confirmation”.

Finding: Local tests not executed in this review environment

  • Severity: P3 informational
  • Location: Review environment
  • Impact: A lightweight Python sanity check failed at import time with ModuleNotFoundError: No module named 'numpy', so I could not independently run the new tests here.
  • Concrete fix: Run the targeted test set in CI or a dependency-complete local env, especially tests/test_utils.py, tests/test_estimators.py, tests/test_sun_abraham.py, tests/test_wooldridge.py, and tests/test_bacon.py.

…view P3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: a03319a048e69d4da0739044e72fe4c3fbdffa58


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings found.

Executive Summary

  • Affected methods: MAP/FWL absorbed fixed-effect demeaning, weighted within transforms, and FE-spanned regressor handling across DiD/MPD absorb=, TWFE, SunAbraham, Bacon, Wooldridge, and replicate-refit closures.
  • The methodology-relevant changes are documented in REGISTRY.md with Note / Edge case labels, so the max-iteration change, bincount numerics, NaN group-key guard, and FE-spanned snap behavior are P3 informational, not defects.
  • Previous P3 doc-dependency issue is resolved: docs/doc-deps.yaml:L904-L907 now names snap_absorbed_regressors and exact LSMR span confirmation.
  • No new inline inference anti-patterns or partial NaN guards were found in modified estimator paths; replicate paths continue to route through safe_inference().
  • I could not execute tests in this environment: numpy is missing and pytest is not installed.

Methodology

Finding: Documented absorbed-FE MAP numerical/default contract change

  • Severity: P3 informational, no action required
  • Location: diff_diff/utils.py:L2646-L2833, docs/methodology/REGISTRY.md:L4317-L4347
  • Impact: demean_by_groups / within_transform now use factorized np.bincount MAP sweeps and raise max_iter to 10_000; NaN absorbed group keys now raise. These are methodology/default behavior changes, but the registry explicitly documents the rationale, numerical drift contract, and edge-case behavior.
  • Concrete fix: None required.

Finding: Documented FE-spanned regressor snap behavior

  • Severity: P3 informational, no action required
  • Location: diff_diff/utils.py:L2851-L3033, docs/methodology/REGISTRY.md:L4348-L4378
  • Impact: FE-spanned regressors are now snapped to exact zero before solve, including weighted norms and LSMR confirmation for truncation-masked joint-span cases. This can change estimates that previously depended on numerical junk columns, but the correction is documented and wired across the affected estimators.
  • Concrete fix: None required.

Code Quality

No blocking findings. The helper is centralized and call sites consistently pass pre-demean norms, absorbed group variables, suffixes, rank-deficiency behavior, display names where useful, and weights: diff_diff/estimators.py:L452-L483, diff_diff/twfe.py:L382-L409, diff_diff/sun_abraham.py:L1491-L1513, diff_diff/wooldridge.py:L1198-L1223, diff_diff/bacon.py:L794-L820.

Performance

No blocking findings. The 10,000-iteration worst case and LSMR confirmation cost are documented as accepted trade-offs in docs/methodology/REGISTRY.md:L4317-L4334 and summarized in docs/performance-plan.md.

Maintainability

No blocking findings. The repeated estimator wiring is verbose but follows existing estimator-specific rank/warning patterns and keeps the snap behavior explicit at each demeaning boundary.

Tech Debt

No untracked correctness debt found. I did not see TODO tracking used to mask a P0/P1 issue introduced by this PR.

Security

No findings. The changes do not add secret handling, external I/O, unsafe deserialization, or shell execution paths.

Documentation/Tests

Finding: Local tests not executed in this review environment

  • Severity: P3 informational
  • Location: Review environment
  • Impact: python import failed with ModuleNotFoundError: No module named 'numpy', and pytest is not installed, so I could not independently run the added tests here.
  • Concrete fix: Run the targeted CI suite covering tests/test_utils.py, tests/test_estimators.py, tests/test_sun_abraham.py, tests/test_wooldridge.py, and tests/test_bacon.py.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant