refactor(survey): consolidate panel-to-unit collapse (ContinuousDiD/EfficientDiD)#602
igerber wants to merge 1 commit into
…fficientDiD)

Extract the row_idx->unit survey-design collapse hand-rolled at four sites (continuous_did.py event-study/overall/bootstrap SE + efficient_did.py build-once) into two diff_diff/survey.py helpers:

- ResolvedSurveyDesign.subset_to_units_by_row_idx(row_idx, unit_weights=None): folds the index-and-recount preamble around the existing subset_to_units.
- build_unit_first_row_index(unit_values, unit_order): first-panel-row index per unit (replaces ContinuousDiD's slow df.iterrows() build and EfficientDiD's inline .values scan).

Bit-identical: same np.unique + subset_to_units + arrays; survey-weighted SEs, df_survey, and design-effect metadata are unchanged (oracle test locks it vs the old inline logic). Removes ~95 lines of duplicated collapse.

StackedDiD is deliberately left on its own path (control units are duplicated across sub-experiments, so it re-resolves at stacked granularity rather than collapsing to one row per unit) with a clarifying comment; the residual stacked-specific dedup is parked in TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
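For readers unfamiliar with the pattern, here is a minimal sketch of how a first-panel-row index per unit can be built without row iteration. It is not the code in diff_diff/survey.py: the function name `first_row_index_sketch`, the use of `np.unique(..., return_index=True)`, and the error handling are all assumptions made for illustration.

```python
import numpy as np


def first_row_index_sketch(unit_values, unit_order):
    """Hypothetical stand-in, not the library helper.

    Returns, for each unit in unit_order, the position of that unit's
    first row in the (possibly unsorted) panel column unit_values.
    """
    unit_values = np.asarray(unit_values)
    unit_order = np.asarray(unit_order)
    # np.unique sorts the distinct units and, with return_index=True,
    # reports the index of each unit's first occurrence.
    uniques, first_idx = np.unique(unit_values, return_index=True)
    # Locate every requested unit inside the sorted uniques.
    pos = np.clip(np.searchsorted(uniques, unit_order), 0, len(uniques) - 1)
    if not np.array_equal(uniques[pos], unit_order):
        raise KeyError("unit_order contains units absent from unit_values")
    return first_idx[pos]


# Unsorted toy panel: unit "b" appears first, then "a", then "c".
units = np.array(["b", "a", "b", "c", "a", "c"])
order = np.array(["a", "b", "c"])
row_idx = first_row_index_sketch(units, order)
```

The resulting `row_idx` is the kind of array that a collapse step such as subset_to_units_by_row_idx would take as input; how the real helper consumes it is not shown here.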
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Finding: No unmitigated methodology findings.
Code Quality
Finding: No code quality findings.
Performance
Finding: No performance findings.
Maintainability
Finding: No maintainability findings.
Tech Debt
Finding: Residual
Security
Finding: No security findings.
Documentation/Tests
Finding: Verification environment missing test dependencies.
Summary
- diff_diff/survey.py helpers: ResolvedSurveyDesign.subset_to_units_by_row_idx(row_idx, unit_weights=None) (folds the index-and-recount preamble around the existing subset_to_units) and build_unit_first_row_index(unit_values, unit_order) (first-panel-row index per unit).
- ContinuousDiD: df.iterrows() index build → single helper calls. EfficientDiD: index build + build-once collapse → helper calls. Net ~95 fewer lines of duplicated collapse.
- StackedDiD is deliberately left on its own path; the residual stacked-specific dedup is parked in TODO.md.
- Bit-identical: same np.unique + subset_to_units + arrays; survey-weighted SEs, df_survey, and design-effect metadata are unchanged.

Methodology references (required if estimator / math changes)
Validation
- tests/test_survey.py::TestUnitCollapseHelpers (4 tests): build_unit_first_row_index on an unsorted panel, and subset_to_units_by_row_idx element-for-element equal to the OLD inline preamble + subset_to_units (the frozen oracle), covering both the explicit-unit_weights and default (None) paths plus a replicate-weight design (R-column subset preserved).
- tests/test_survey.py, tests/test_methodology_continuous_did.py, tests/test_efficient_did.py (329 passed locally). No golden re-baselining.
- object-attr errors), zero new.

Security / privacy
Generated with Claude Code
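
The frozen-oracle approach described under Validation can be illustrated with a small, self-contained sketch: keep the old loop-based logic unchanged as a reference and assert that the vectorized replacement matches it element for element. Both function names below are hypothetical, and the example covers only a first-row index build, not the survey-design collapse that the actual tests exercise.

```python
import numpy as np


def frozen_oracle_first_rows(unit_values, unit_order):
    # Reference implementation: a plain Python scan, kept unchanged so it
    # can serve as the oracle for the refactored code.
    first = {}
    for i, unit in enumerate(unit_values.tolist()):
        first.setdefault(unit, i)  # record only the first occurrence
    return np.array([first[unit] for unit in unit_order.tolist()])


def candidate_first_rows(unit_values, unit_order):
    # Vectorized candidate (assumed approach): first-occurrence indices
    # from np.unique, reordered to match unit_order.
    uniques, first_idx = np.unique(unit_values, return_index=True)
    return first_idx[np.searchsorted(uniques, unit_order)]


# Unsorted toy panel: 5 units, 4 rows each, shuffled with a fixed seed.
rng = np.random.default_rng(0)
unit_values = rng.permutation(np.repeat(np.arange(5), 4))
unit_order = np.arange(5)

expected = frozen_oracle_first_rows(unit_values, unit_order)
actual = candidate_first_rows(unit_values, unit_order)

# Element-for-element equality, mirroring the oracle-style assertion.
np.testing.assert_array_equal(actual, expected)
```

Comparing against a frozen copy of the old logic, rather than against stored expected values, is one way to support a bit-identical claim without re-baselining golden files.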