TwoStageDiD Wave E.3 parity: always-treated full-design retention#485
Conversation
…n survey design Mechanical transfer of the Wave E.3 SpilloverDiD invariant (PR #482, merge 24de906) to TwoStageDiD. When the always-treated handler drops units from the OLS sample, the resolved survey design retains its FULL-DOMAIN n_psu / n_strata / df_survey / strata / fpc / psu arrays instead of being subsetted via replace(...). Per-cluster scores aggregate at fit-length then zero-pad onto the full-domain unique-PSU list via two new optional kwargs on _compute_gmm_variance: score_pad_mask and cluster_ids_full. PSUs containing only always-treated rows get zero score rows but still count toward G_full for n_psu / df_survey accounting. Documented synthesis (library-convention adoption): adopts the canonical R survey::svyrecvar(subset()) convention (Lumley 2010 §2.5), already established at imputation.py:2175-2183 (PreTrendsImputation), prep.py:1401-1432 (DCDH cell variance), and spillover.py (PR #482). Implementation: - diff_diff/two_stage.py: delete L1485-1525 design-subset block; promote keep_mask to fit()-level scope (always defined; defaults all-True); cluster injection sources cluster_ids_raw from FULL-DOMAIN data[cluster] (not post-drop df[cluster]) so _inject_cluster_as_psu's zip against resolved_survey.strata stays length-aligned; df["_survey_cluster"] aligned to post-drop length via resolved_survey.psu[keep_mask.values]; _compute_gmm_variance + 3 inner _stage2_* methods gain score_pad_mask / cluster_ids_full kwargs; zero-pad expansion after per-cluster aggregation; strata/fpc obs_idx lookups use cluster_ids_full under padding; G < 2 unidentified gate fires on G_full when padding active. - diff_diff/two_stage.py: _refit_ts callback subsets each replicate weight w_r via keep_mask.values before threading into _fit_untreated_model and _stage2_*, matching the keep_mask-subsetting applied to survey_weights in the main fit (otherwise solve_ols rejects the length mismatch and compute_replicate_refit_variance swallows the ValueError so replicate inference NaNs out). - Always-treated warning text updated to reflect the new contract: weights are subsetted for OLS, design retained for variance. - Replicate variance + always-treated: existing path now works correctly (score_pad_mask_arg stays None on _uses_replicate_ts paths; per-replicate refit handles resampling separately). Tests (tests/test_two_stage.py): - New TestTwoStageDiDWaveE3ParityAlwaysTreated class with 8 tests: (a) no-always-treated baseline, (b) full-domain df_survey preservation under drop, (c) full-domain n_psu reporting, (d) per-cluster zero-pad mock-spy on _compute_stratified_meat_from_psu_scores, (e) subpopulation + always-treated composition, (f) cluster-as-PSU + always-treated, (g) no-survey path unchanged, (h) PSU entirely-always-treated. Tests (tests/test_replicate_weight_expansion.py): - Strengthened test_two_stage_always_treated to assert finite overall_se / overall_p_value / overall_conf_int (was only asserting finite ATT, missing the replicate-SE regression class). - New test_two_stage_always_treated_event_study_and_group_replicate exercising the event-study + group replicate refit branches end-to-end under always-treated drop with aggregate="all"; asserts finite se + p_value + conf_int on non-reference horizons and cohorts. Docs: - docs/methodology/REGISTRY.md: TwoStageDiD section gains "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 follow-up note updated to mark TwoStageDiD parity as shipped. - CHANGELOG.md: Unreleased Added entry leading with documented-synthesis framing. - TODO.md: drop parity follow-up row. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overall Assessment✅ Looks good Executive Summary
MethodologyNo findings. The PR changes the survey-variance realization for Code QualityNo findings. The PR continues to use Performance
MaintainabilityNo findings. Internal propagation of the new split-length contract looks complete across the changed call graph: Tech DebtNo findings. Removing the resolved TODO entry in SecurityNo findings. Documentation/Tests
Execution note: I could not run |
CI codex review P3 #1 (performance): _compute_gmm_variance was taking the zero-pad branch (np.unique + np.searchsorted + full-size c_by_cluster / s2_by_cluster copies) on EVERY analytical survey fit because keep_mask defaults to all-True. Gate on `n_always_treated > 0` so baseline survey fits with no drop pass `None / None` and take the bit-identical pre-PR path. CI codex review P3 #2 (test determinism): seed local np.random.default_rng in the two new always-treated replicate-weight tests so always-treated outcomes are deterministic across runs (was using global np.random.normal which is non-deterministic at import time). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
SpilloverDiD(PR SpilloverDiD Wave E.3: SurveyDesign.subpopulation() full-design retention #482) toTwoStageDiD. When the always-treated handler drops units from the OLS sample, the resolved survey design now retains its FULL-DOMAINn_psu/n_strata/df_survey/strata/fpc/psuarrays. Per-cluster scores are zero-padded onto the full-domain unique-PSU list via new optional kwargs (score_pad_mask,cluster_ids_full) on_compute_gmm_variance; PSUs containing only always-treated rows get zero score rows but still count towardG_full._refit_ts) subsets eachw_rviakeep_mask.valuesbefore threading into stage-1/stage-2, mirroring the same subsetting the main fit applies tosurvey_weights(otherwisesolve_olsrejects the length mismatch andcompute_replicate_refit_varianceswallows the ValueError so replicate inference NaNs out).cluster_ids_rawfrom FULL-DOMAINdata[cluster](not post-dropdf[cluster]) so_inject_cluster_as_psu's zip againstresolved_survey.stratastays length-aligned;df["_survey_cluster"]aligned viaresolved_survey.psu[keep_mask.values].TODO.mdsince PR SpilloverDiD Wave E.3: SurveyDesign.subpopulation() full-design retention #482.Methodology references (required if estimator / math changes)
TwoStageDiD(Gardner 2022) — survey variance under always-treated unit drop.survey::svyrecvar(subset())zero-pad convention). Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943.imputation.py:2175-2183(PreTrendsImputation) andprep.py:1401-1432(DCDH cell variance).Validation
TestTwoStageDiDWaveE3ParityAlwaysTreatedintests/test_two_stage.py(8 tests: no-always-treated baseline, full-domaindf_surveypreservation, full-domainn_psureporting, per-cluster zero-pad mock-spy, subpopulation+always-treated composition, cluster-as-PSU+always-treated, no-survey path unchanged, PSU entirely-always-treated).test_two_stage_always_treatedintests/test_replicate_weight_expansion.pyto assert finiteoverall_se/overall_p_value/overall_conf_intunder always-treated + replicate-weight design.test_two_stage_always_treated_event_study_and_group_replicatein the same file: exercises event-study + group replicate refit branches end-to-end withaggregate="all", asserts finitese+p_value+conf_inton non-reference horizons and cohorts.Security / privacy
Generated with Claude Code