
Commit 88a4362

Merge pull request igerber#477 from igerber/spillover-conley-wave-e2-followup-lag
SpilloverDiD conley + survey + lag>0 via panel-block composition (Wave E.2 follow-up)
2 parents 955aa4b + 82dda96 commit 88a4362

10 files changed

Lines changed: 1641 additions & 91 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
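The README entry says the survey-aware Conley path works on per-period PSU totals. The following is a minimal sketch of that aggregation step. It assumes the per-observation score rows are available as an `(n, k)` NumPy array. The name `psi` follows the removed error message, but the helper `per_period_psu_totals` is hypothetical and not part of `diff_diff`. The sketch collapses the rows into a period-by-PSU grid:

```python
import numpy as np


def per_period_psu_totals(psi, period, psu):
    """Sum observation-level score rows within each (period, PSU) cell.

    Hypothetical helper for illustration only. Returns the sorted period
    labels, the sorted PSU labels, and a (T, G, k) array of cell totals;
    cells with no observations stay at zero.
    """
    psi = np.asarray(psi, dtype=float)
    if psi.ndim == 1:
        psi = psi[:, None]  # treat a single score column as k = 1
    periods, t_idx = np.unique(np.asarray(period), return_inverse=True)
    psus, g_idx = np.unique(np.asarray(psu), return_inverse=True)
    totals = np.zeros((len(periods), len(psus), psi.shape[1]))
    # Unbuffered add so repeated (period, PSU) cells accumulate correctly.
    np.add.at(totals, (t_idx.ravel(), g_idx.ravel()), psi)
    return periods, psus, totals


# Usage: PSU "a" is observed in both periods, PSU "b" only in period 0.
periods, psus, totals = per_period_psu_totals(
    [[1.0], [2.0], [3.0], [4.0]],
    period=[0, 0, 1, 1],
    psu=["a", "b", "a", "a"],
)
print(totals[:, :, 0])  # rows = periods, columns = PSUs
```

In period 1 the two PSU-"a" rows collapse into a single total of 7. PSU "b" has no rows in that period, so its cell stays at zero.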

TODO.md

Lines changed: 2 additions & 1 deletion
@@ -145,7 +145,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
 | `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low |
 | `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium |
-| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only" — `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront at `spillover.py:fit` on `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) needs to compose with the panel-aware stratified-Conley spatial sandwich — the natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'` per PSU, summed across all PSUs in each stratum, with appropriate Binder FPC scaling — plus a methodology call on whether to include cross-period spatial pairs in the serial term. Regression goldens vs the cross-sectional limit (lag=0, which is now the shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium |
+| Serial Bartlett kernel logic duplicated between `diff_diff/two_stage.py::_compute_stratified_serial_bartlett_meat` (survey path) and `diff_diff/conley.py::_compute_conley_meat` panel-block branch (no-survey path). Both compute `K[t,s] = (1 - |t-s|/(L+1)) * 1{|t-s| <= L, t != s}` over dense panel-period codes. Factor out a shared `_serial_bartlett_kernel_matrix(t_codes, L)` helper and a shared post-meat finite + PSD-warning guard so the survey and no-survey paths can't drift on diagnostics or kernel weights. Cosmetic; refactor doesn't change behavior. | `two_stage.py::_compute_stratified_serial_bartlett_meat`, `conley.py::_compute_conley_meat` | follow-up | Low |
+| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` no-effective-PSU serial Bartlett HAC. Wave E.2 follow-up ships the panel-block composition when an effective PSU exists (explicit `survey_design.psu` OR injected via `cluster=<col>` per `_inject_cluster_as_psu`). Weights-only / strata-only survey designs WITHOUT a cluster fallback raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Fix would either derive a unit-level serial fallback for no-PSU designs (mixes IF allocators with the pseudo-PSU spatial term — needs methodology work) or route the serial loop through `conley_unit` with explicit documentation of the IF-allocator asymmetry. Regression goldens vs the effective-PSU shipped path. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_serial_bartlett_meat` | follow-up (Wave E.2 follow-up tail) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
 | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |

diff_diff/guides/llms.txt

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
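The llms.txt entry calls the serial term the Newey-West (1987) separable form. One way to see what "separable" means is to look at the weight each pair of PSU-period cells receives. The sketch below is illustrative only. Both functions are hypothetical, and the product kernel is included purely as a contrast, not as a description of any library's Driscoll-Kraay implementation. Under the separable form, same-period pairs get the spatial kernel weight and same-PSU pairs get the Bartlett weight. Pairs that differ in both PSU and period get zero.

```python
import numpy as np


def bartlett(gap, lag):
    """Bartlett weight for a time gap; equals 1 at gap 0, 0 beyond lag."""
    return max(0.0, 1.0 - abs(gap) / (lag + 1.0))


def separable_weight(g, t, h, s, spatial_w, lag):
    """Weight on the (PSU g, period t) x (PSU h, period s) score pair."""
    if t == s:
        return float(spatial_w[g, h])  # within-period spatial term
    if g == h:
        return bartlett(t - s, lag)  # within-PSU serial term (t != s)
    return 0.0  # different PSU AND different period: excluded


def product_weight(g, t, h, s, spatial_w, lag):
    """Contrast only: a space-time product kernel weights every pair."""
    return float(spatial_w[g, h]) * bartlett(t - s, lag)


W = np.array([[1.0, 0.5], [0.5, 1.0]])  # toy spatial kernel, 1 on the diagonal
print(separable_weight(0, 0, 1, 1, W, lag=1), product_weight(0, 0, 1, 1, W, lag=1))  # 0.0 0.25
```

The two kernels agree whenever a pair shares a period or a PSU, because `spatial_w[g, g] = 1` and the Bartlett weight at gap 0 is 1. They differ only on cross-PSU, cross-period pairs, which the separable form drops.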

diff_diff/spillover.py

Lines changed: 61 additions & 31 deletions
@@ -2197,37 +2197,23 @@ def fit(
         # check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs`
         # below so it sees the panel columns the validator guarantees.
         #
-        # Wave E.2 scope-limit (upfront, before resolution / panel work):
-        # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT
-        # composed with the survey path in this PR. The stratified-Conley
-        # helper applies a cross-sectional kernel on PSU-aggregated totals;
-        # composing the within-unit serial Bartlett HAC with the within-
-        # stratum cross-PSU spatial kernel requires carrying PSU-by-time
-        # scores into the meat construction, which is a separate Wave E.x
-        # follow-up tracked in TODO.md. Reject upfront with a clear pointer
-        # so users running `survey_design=` + `conley_lag_cutoff > 0` get
-        # the error before stage-1 / 2 work (per `feedback_no_silent_failures`).
-        if (
-            survey_design is not None
-            and self.vcov_type == "conley"
-            and self.conley_lag_cutoff is not None
-            and self.conley_lag_cutoff > 0
-        ):
-            raise NotImplementedError(
-                "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
-                "combined with survey_design= is not supported in Wave E.2. "
-                "The Wave E.2 stratified-Conley sandwich aggregates Psi to "
-                "PSU totals before applying the cross-sectional Conley "
-                "kernel; the panel-block decomposition (within-unit serial "
-                "Bartlett HAC over time) would require carrying PSU-by-time "
-                "scores and composing the serial kernel with the within-"
-                "stratum cross-PSU spatial kernel. This composition is "
-                "queued as a follow-up (see TODO.md). For Wave E.2, use "
-                "conley_lag_cutoff=0 (cross-sectional Conley) with "
-                "survey_design=, or use survey_design= with "
-                "vcov_type='hc1' (+ cluster=<col> for CR1) for the full "
-                "Wave E.1 path."
-            )
+        # Wave E.2 follow-up (shipped): `vcov_type='conley' + conley_lag_cutoff > 0
+        # + survey_design=` is supported via panel-block stratified-Conley
+        # sandwich (spatial Wave E.2 term + within-PSU serial Bartlett HAC)
+        # WHEN there is an effective PSU (explicit `survey_design.psu` OR
+        # injected via `cluster=<col>` per Wave E.1's `_inject_cluster_as_psu`
+        # routing). The orchestrator at
+        # `two_stage.py::_compute_stratified_conley_meat` sums the two terms
+        # with disjoint index sets — matches the no-survey panel-block
+        # decomposition at `conley.py::_compute_conley_meat` (Conley 1999
+        # spatial + Newey-West 1987 serial Bartlett; separable form, NOT
+        # Driscoll-Kraay 2D-HAC). FPC convention: per-period FPC on spatial,
+        # panel-wide stratum-level FPC on serial. The no-effective-PSU
+        # fail-closed gate is downstream at the post-resolution check (see
+        # the `resolved_survey_fit.psu is None` block below the cluster
+        # injection); the gate cannot live up here because at this point
+        # the user-supplied `cluster=<col>` has not yet been injected into
+        # the survey design as the effective PSU.
         # Validate `anticipation` up front: must be a non-negative integer.
         # Accepting fractional or negative values would silently shift
         # treatment timing and ring exposure beyond what the estimator's
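The new comment says the orchestrator sums the spatial and serial terms "with disjoint index sets". The following is a small numerical check of that claim under simplifying assumptions: one stratum, no centering, no FPC, and a made-up spatial kernel `W`. All function names are hypothetical. The panel-block weight matrix can be written as `kron(I_T, W) + kron(K, I_G)`. The first piece is nonzero only on same-period blocks. The second is nonzero only on same-PSU, different-period entries, because `K` has a zero diagonal. The quadratic form therefore splits exactly into the two separately computed meats.

```python
import numpy as np


def bartlett_kernel(T, lag):
    """Serial Bartlett kernel over periods 0..T-1 with a zero diagonal."""
    t = np.arange(T, dtype=float)
    gap = np.abs(t[:, None] - t[None, :])
    return np.where((gap > 0) & (gap <= lag), 1.0 - gap / (lag + 1.0), 0.0)


def spatial_meat(S, W):
    """sum_t sum_{g,h} W[g,h] S[t,g] S[t,h]' (within-period pairs only)."""
    return np.einsum("gh,tgi,thj->ij", W, S, S)


def serial_meat(S, lag):
    """sum_g sum_{t != s} K[t,s] S[t,g] S[s,g]' (within-PSU pairs only)."""
    K = bartlett_kernel(S.shape[0], lag)
    return np.einsum("ts,tgi,sgj->ij", K, S, S)


def panel_block_meat(S, W, lag):
    """Same quantity via one (T*G) x (T*G) weight matrix on stacked scores."""
    T, G, k = S.shape
    omega = np.kron(np.eye(T), W) + np.kron(bartlett_kernel(T, lag), np.eye(G))
    Z = S.reshape(T * G, k)  # row t*G + g holds S[t, g], matching kron's ordering
    return Z.T @ omega @ Z


rng = np.random.default_rng(0)
S = rng.normal(size=(4, 3, 2))  # 4 periods, 3 PSUs, 2 score columns
W = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.6], [0.0, 0.6, 1.0]])
total = panel_block_meat(S, W, lag=2)
print(np.allclose(total, spatial_meat(S, W) + serial_meat(S, lag=2)))  # True
```

Setting `lag=0` zeroes the serial block, so the panel-block meat reduces to the within-period spatial sum. That reduction is the cross-sectional limit the TODO row uses for regression goldens.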
@@ -3100,6 +3086,50 @@ def fit(
             _conley_unit_arg = None
             _conley_lag_arg = None
 
+        # Wave E.2 follow-up gate (post-resolution, post-injection):
+        # fail-closed for `vcov_type="conley" + conley_lag_cutoff > 0` when
+        # the EFFECTIVE PSU is still absent after `_inject_cluster_as_psu`.
+        # Under no-effective-PSU survey designs (weights-only / strata-only
+        # WITHOUT a cluster fallback) the orchestrator falls back to
+        # pseudo-PSU = obs-index in `_compute_stratified_conley_meat`, but
+        # each pseudo-PSU appears in exactly one period, so the per-PSU
+        # serial cross-period loop never contributes anything (silent zero
+        # serial term). Routing the serial loop to `conley_unit` (the panel
+        # unit) instead of pseudo-PSU would mix IF allocators (PSU spatial
+        # vs unit serial), which violates the single-IF-allocator design
+        # pinned by the user-confirmed methodology in the Wave E.2 follow-up
+        # plan. Fail-closed per `feedback_no_silent_failures` until a
+        # no-effective-PSU-specific derivation is queued. Note: this fires
+        # AFTER `_inject_cluster_as_psu` (which runs upstream) so the
+        # documented `cluster=<col> + survey_design(without psu)` surface
+        # — which becomes an effective-PSU layout via injection — passes
+        # through unscathed. R2 P1 fix: original front-door gate at
+        # `spillover.py:2210-2242` (now removed) fired before injection
+        # and broke the cluster-as-PSU survey-Conley surface.
+        if (
+            resolved_survey_fit is not None
+            and resolved_survey_fit.psu is None
+            and self.vcov_type == "conley"
+            and self.conley_lag_cutoff is not None
+            and self.conley_lag_cutoff > 0
+        ):
+            raise NotImplementedError(
+                "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
+                "combined with a no-effective-PSU survey_design "
+                "(weights-only / strata-only WITHOUT a cluster fallback) "
+                "is not supported in Wave E.2 follow-up. Under no-effective-"
+                "PSU survey designs the panel-block serial Bartlett HAC "
+                "would silently contribute zero (each pseudo-PSU = "
+                "obs-index appears in exactly one period, so the within-PSU "
+                "temporal sum has no cross-period pairs to accumulate). "
+                "Routing the serial loop to `conley_unit` would mix IF "
+                "allocators with the spatial term and is not derived in "
+                "this PR. Supply either an explicit `survey_design.psu`, "
+                "or `cluster=<col>` (which is injected as the effective "
+                "PSU per Wave E.1's `_inject_cluster_as_psu` routing), "
+                "or use `conley_lag_cutoff=0` (cross-sectional Wave E.2)."
+            )
+
         # Derive the Wave D variance mode from the PUBLIC contract:
         # - vcov_type="conley" → "conley" (Conley spatial-HAC + GMM)
         # - cluster=<col> supplied → "cluster" (CR1 + GMM)
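The post-injection gate above rests on two facts. First, a pseudo-PSU equal to the observation index never spans two periods, so the within-PSU serial sum has nothing to add. Second, the check must run only after `cluster=<col>` has been injected as the PSU. The sketch below illustrates both under three assumptions:

- The data are a long panel with one row per unit-period.
- An explicit PSU takes precedence over the cluster column.
- `serial_pair_count` and `resolve_psu_then_gate` are hypothetical stand-ins, not the actual `_inject_cluster_as_psu` or `fit` code.

```python
import numpy as np


def serial_pair_count(period, psu, lag):
    """Count ordered observation pairs that share a PSU, sit in different
    periods, and are at most `lag` periods apart. Only these pairs can
    feed a within-PSU serial Bartlett term."""
    period = np.asarray(period)
    psu = np.asarray(psu)
    same_psu = psu[:, None] == psu[None, :]
    gap = np.abs(period[:, None] - period[None, :])
    return int(np.sum(same_psu & (gap > 0) & (gap <= lag)))


def resolve_psu_then_gate(has_survey, survey_psu, cluster, vcov_type, lag):
    """Inject the cluster column as the PSU first, THEN apply the lag gate.

    Assumption: an explicit survey PSU wins over the cluster column.
    """
    effective_psu = survey_psu if survey_psu is not None else cluster
    if (
        has_survey
        and effective_psu is None
        and vcov_type == "conley"
        and lag is not None
        and lag > 0
    ):
        raise NotImplementedError(
            "conley_lag_cutoff > 0 with survey_design= needs an effective PSU"
        )
    return effective_psu


period = np.array([0, 1, 0, 1])  # two units, each observed in two periods
unit = np.array([0, 0, 1, 1])
print(serial_pair_count(period, np.arange(4), lag=1))  # pseudo-PSU = obs index -> 0
print(serial_pair_count(period, unit, lag=1))  # unit column as the PSU -> 4
```

If the gate ran before the injection step, it would see no survey PSU and reject the documented `cluster=<col>` layout. The commit comment describes that failure as the one the R2 P1 fix removes.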