Phase 1b interstitial #2: TripleDifference vcov_type narrow contract#488
Merged
Conversation
Mirrors CS PR #487. TripleDifference uses IF-based variance per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition; analytical-sandwich families {classical, hc2, hc2_bm} have no single design matrix for hat-matrix leverage or BM Satterthwaite DOF and are rejected at __init__ with methodology-rooted messages. conley deferred to follow-up TODO row. vcov_type='hc1' (default) preserves all current behavior bit-equally across {dr, reg, ipw} on default / cluster= / survey_design= paths. vcov_type field added to TripleDifferenceResults + to_dict + summary ("Variance estimator: hc1"). _validate_vcov_type @staticmethod called from __init__ AND fit() so sklearn-style set_params mutations can't silently propagate bad values to Results metadata. 24 new tests in TestTripleDifferenceVcovType: 5-surface contract (default/cluster/survey bit-equal, __init__ rejection per family, fit()-time revalidation) + 8 introspection/convenience-function tests. REGISTRY IF-based taxonomy section now lists TripleDifference alongside CallawaySantAnna in "Enforced today". CHANGELOG, TODO row strikes (L107, L206), and new Conley deferral row added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex P2 (.claude/reviews/local-review-latest.md): the unconditional "Variance estimator: hc1" line in summary() was misleading for clustered fits (actual algebra is CR1 Liang-Zeger on the combined IF, not the raw hc1 sandwich) and for survey-backed fits (actual algebra is TSL on the combined IF). Changes: - Add cluster_name field to TripleDifferenceResults (mirrors CS PR #487) - Thread self.cluster as cluster_name in fit() Results construction - summary() routes through shared _format_vcov_label (results.py:49-89): - Bare: "HC1 heteroskedasticity-robust" - Cluster: "CR1 cluster-robust at <name>, G=<n>" - Survey: line suppressed (Survey Design block is the canonical surface) - to_dict() carries cluster_name when set Tests (3 new) close the codex P3 regression-coverage gap: - test_summary_cluster_label_is_cr1_not_raw_hc1 - test_summary_no_variance_estimator_line_under_survey - test_results_cluster_name_carries_through (+ to_dict assertion) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex R2 P2 (.claude/reviews/local-review-latest.md):
when survey_design.psu overrides a conflicting cluster= argument,
the Results cluster_name + n_clusters fields previously still recorded
the raw self.cluster (the ignored argument). That made to_dict() and
downstream reporting describe the wrong clustering level.
Fix mirrors the variance-estimator line suppression in summary(): on
survey-backed fits the Survey Design block (PSU/strata/df) is the
canonical surface, so cluster_name + n_clusters are suppressed
(set to None) rather than reporting potentially-overridden raw inputs.
Reviewer offered two equivalent options ("reflect effective PSU OR
suppress under survey-backed fits"); suppression is simpler and
consistent with the existing summary() architecture.
Tests:
- test_cluster_name_suppressed_under_survey_design: cluster='state'
+ survey_design(psu='psu') with conflicting partitions; resolver
emits the documented UserWarning; Results carries None for both
cluster_name and n_clusters; to_dict() doesn't leak; Survey Design
block remains in summary()
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex R3 P3 (.claude/reviews/local-review-latest.md): the IF-based taxonomy section's "hc1 with cluster=X" bullet claimed enforced estimators activate CR1 by synthesizing SurveyDesign(psu=X) and routing through _compute_stratified_psu_meat. That's true for CallawaySantAnna but not for TripleDifference, which computes the algebraically equivalent CR1 directly from cluster-summed IFs inline at triple_diff.py:1245-1259 (no SurveyDesign synthesis — the IF is already in scope at the SE call site). Reword the bullet to describe both activation paths and note they produce the same numerical result. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex R4 P3 (.claude/reviews/local-review-latest.md): the post-taxonomy "documented synthesis" note still read as if SurveyDesign(psu=...) synthesis were the universal clustered-hc1 route. After R3 reworded the per-estimator bullet, this generic note needed the same treatment — split the activation path into the CallawaySantAnna synthesis route and the TripleDifference inline-CR1 route. CR1 Liang-Zeger algebra on the IF is Williams (2000) / Hansen (2007) in both cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex R5 P1 (.claude/reviews/local-review-latest.md): the new vcov_type contract was previously only regression-tested on the default, bare-cluster, and analytical-TSL survey paths. The distinct replicate-variance branch in fit() at lines 727-747 (survey_design with replicate_weights + JK1/BRR/Fay/JKn/SDR) was unverified, leaving a coverage hole on a supported inference branch. Add _ddd_replicate_panel helper (mirrors the JK1 pattern from tests/test_survey_phase6.py::test_triple_diff_replicate_all_methods but uses default_rng for reproducibility) + parameterized test_replicate_survey_hc1_bit_equal_baseline over reg/ipw/dr. Asserts: - ATT/SE bit-equal between default and explicit vcov_type="hc1" - results.vcov_type == "hc1" on the replicate branch - summary() still suppresses raw "Variance estimator" line under survey (Survey Design block remains the canonical surface) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-merge-check audit caught a missing surface: diff_diff/guides/llms-full.txt
TripleDifference signature listing (lines 575-583) did not include the new
vcov_type parameter, even though the CallawaySantAnna entry at line 199 has
the exact analog line. The llms-full.txt is the AI-agent contract bundled in
the wheel + published on RTD; the signature listing should mirror the public
TripleDifference.__init__ signature.
Add the vcov_type line with the same "permanently narrow {hc1} per IF-based
variance" framing as CS, citing Ortiz-Villavicencio & Sant'Anna (2025).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local codex R7 P1 (.claude/reviews/local-review-latest.md): TripleDifference(cluster=X, survey_design=SurveyDesign(replicate_weights=...)) was a silent no-op — the replicate-variance branch in fit() computes SE only via compute_replicate_if_variance(psi_rep, resolved_survey) and never consumes PSU/cluster state. Adding cluster='state' produced identical ATT/SE and no warnings relative to the no-cluster replicate fit, while populating cluster_name/n_clusters on Results dishonestly. Fail-closed per feedback_no_silent_failures: raise NotImplementedError at fit() when both cluster= and survey_design.replicate_weights are set. Message points to two valid alternatives: omit cluster= (replicate weights encode design implicitly) or use a non-replicate survey design with explicit strata/psu/fpc. Mirrors CS PR #487 guard at staggered.py:1705-1719 (the audit that caught it on CS originally noted "the other 7 Phase 1b estimators ... handle bare cluster= correctly" — but that audit only checked whether bare-cluster works at all, not the replicate-weight interaction). Tests: - test_cluster_plus_replicate_weights_rejected: parameterized over reg/ipw/dr, asserts NotImplementedError with "replicate-weight" in message Docs: - REGISTRY.md TripleDifference section gains a per-estimator Note on the cluster= + replicate-weight rejection - CHANGELOG entry amended with the new rejection clause Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Owner
Author
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good. No unmitigated P0/P1 findings in the PR diff. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
igerber
added a commit
that referenced
this pull request
May 25, 2026
Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. Source surface: - diff_diff/imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results cluster_name/n_clusters resolution - diff_diff/imputation_results.py: vcov_type/cluster_name/n_clusters fields + new to_dict() + variance-estimator line in summary() routing through shared _format_vcov_label helper - diff_diff/imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators) Tests: 34 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey bit-equality (parametrized over aggregate modes), bootstrap × cluster + bootstrap × survey bit-equality, fit()-time revalidation after set_params bypass, bootstrap n_psu<2 / n_clusters<2 NaN propagation, pretrends bit-equality, and the full introspection + safety-gate surface (8 tests). Docs: REGISTRY.md (IF-based taxonomy + 4 new Notes), CHANGELOG.md, TODO.md (row narrowed, Conley follow-up added), llms-full.txt (vcov_type + pretrends signature drift fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
May 25, 2026
… 3 (Phase 1b interstitial #3) Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. Source surface (diff_diff/): - imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results metadata resolution (cluster_name=unit by default for the Theorem 3 unit-clustered IF variance; suppressed under ANY survey design — analytical OR replicate — because replicate variance ignores cluster/PSU entirely) - imputation_results.py: vcov_type/cluster_name/n_clusters fields, new to_dict() method, summary() variance line via shared _format_vcov_label (default cluster=None renders "CR1 cluster-robust at <unit>, G=<n>"; bootstrap fits suppress the analytical label and render "Inference method: bootstrap" instead, mirroring DiDResults.summary() gate at results.py:213-226) - imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators) Tests: 42 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey + bootstrap × cluster + bootstrap × survey bit-equality (ALL parametrized over aggregate ∈ {None, "event_study", "group"} with per-horizon and per-group SE override branches pinned); fit()-time revalidation after set_params bypass; bootstrap n_psu<2 / n_clusters<2 NaN propagation including coef_var NaN; pretrends=True × vcov_type='hc1' × cluster bit-equality; introspection (default attr, get_params, Results carries, to_dict, summary label default+cluster+bootstrap-suppressed, cluster_name suppression under both analytical AND replicate survey, fit-clone idempotence, convenience function); input rejection on classical/hc2/hc2_bm/conley/ unknown with distinct methodology-keyword pins; cluster+replicate rejection. Full pytest tests/test_imputation.py: 128 passed. Docs: - REGISTRY.md: IF-based taxonomy adds ImputationDiD to "Enforced today" tier; ImputationDiD section gains 4 new Notes (vcov_type contract, cluster+replicate fail-closed, bootstrap n<2 NaN, default unit-cluster CR1 rendering) - CHANGELOG.md: [Unreleased] entry - TODO.md: Phase 1b row narrowed to TwoStageDiD + EfficientDiD; ImputationDiD Conley follow-up row added - guides/llms-full.txt: vcov_type + pretrends signature drift fix + shared staggered-results section advertises new variance metadata fields and to_dict() Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
vcov_typeinput contract (Phase 1b interstitial Add fixed effects and absorb parameters to DifferenceInDifferences #2, permanently narrow to{\"hc1\"}). Mirrors CallawaySantAnna PR callaway-santanna: fix cluster= silent no-op + narrow vcov_type contract #487 template. Analytical-sandwich families{classical, hc2, hc2_bm}andconleyREJECTED at__init__with methodology-rooted messages: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decompositioninf = w3·IF_3 + w2·IF_2 - w1·IF_1; the analytical-sandwich families have no single design matrix for hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF.vcov_type+cluster_namefields added toTripleDifferenceResults;summary()routes through shared_format_vcov_label: bare fits render\"HC1 heteroskedasticity-robust\", clustered fits render\"CR1 cluster-robust at <name>, G=<n>\"(actual algebra is Liang-Zeger CR1 on combined IF), survey-backed fits suppress the variance-estimator line (Survey Design block is the canonical surface).cluster= + SurveyDesign(replicate_weights=[...])raisesNotImplementedErroratfit()— replicate-weight variance ignores PSU/cluster entirely; honoringcluster=would silently have no effect on variance. Mirrors CS PR callaway-santanna: fix cluster= silent no-op + narrow vcov_type contract #487 guard.cluster_name+n_clusterssuppressed under survey designs so they don't misreport the rawcluster=argument whensurvey_design.psuoverrides it._validate_vcov_type@staticmethodcalled from__init__ANDfit()(mirrors CS pattern) so sklearn-styleset_paramsmutations can't silently propagate bad values to Results metadata.{dr, reg, ipw}×{default, cluster, survey, replicate-survey}paths.TestTripleDifferenceVcovType: 5-surface bit-equal matrix (default/cluster/TSL-survey/replicate-survey/fit-time-revalidation) + 8 introspection/convenience tests + 3 codex-driven additions (summary cluster label, summary survey suppression, cluster+replicate rejection).Methodology references
SE = std(w3·IF_3 + w2·IF_2 - w1·IF_1) / sqrt(n); CR1 Liang-Zeger on combined IF whencluster=set.vcov_typenarrowing to{\"hc1\"}is a library-architectural surface restriction, not a methodological deviation — the paper's asymptotic theory is written around influence functions rather than a single hat-matrix-bearing regression. Existing documented deviations on the variance side (no Stata-style(n-1)/(n-p)finite-sample factor on the CR1 algebra because the IF has no design-matrixpin the OLS sense) are now explicitly called out in REGISTRY.md.Validation
tests/test_triple_diff.py::TestTripleDifferenceVcovType(35 tests; +477 LoC). Full file: 76 → 111 passing.tests/test_methodology_triple_diff.py(44 passing, 63 skipped R-parity);tests/test_survey_phase3.py::TestTripleDifferenceSurvey(8 passing)./ai-review-local --backend codexiterated to clean (zero actionable findings). Each round caught a different consumer surface: R1 summary-label rendering, R2 cluster+survey metadata, R3+R4 REGISTRY wording, R5 replicate-weight coverage, R7 cluster+replicate silent no-op (real bug discovered by R5's added test). R8 verdict: ✅ Looks good.Security / privacy