Phase 1b interstitial #2: TripleDifference vcov_type narrow contract by igerber · Pull Request #488 · igerber/diff-diff

igerber · 2026-05-24T15:16:06Z

Summary

TripleDifference vcov_type input contract (Phase 1b interstitial Add fixed effects and absorb parameters to DifferenceInDifferences #2, permanently narrow to {\"hc1\"}). Mirrors CallawaySantAnna PR callaway-santanna: fix cluster= silent no-op + narrow vcov_type contract #487 template. Analytical-sandwich families {classical, hc2, hc2_bm} and conley REJECTED at __init__ with methodology-rooted messages: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition inf = w3·IF_3 + w2·IF_2 - w1·IF_1; the analytical-sandwich families have no single design matrix for hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF.
vcov_type + cluster_name fields added to TripleDifferenceResults; summary() routes through shared _format_vcov_label: bare fits render \"HC1 heteroskedasticity-robust\", clustered fits render \"CR1 cluster-robust at <name>, G=<n>\" (actual algebra is Liang-Zeger CR1 on combined IF), survey-backed fits suppress the variance-estimator line (Survey Design block is the canonical surface).
cluster= + SurveyDesign(replicate_weights=[...]) raises NotImplementedError at fit() — replicate-weight variance ignores PSU/cluster entirely; honoring cluster= would silently have no effect on variance. Mirrors CS PR callaway-santanna: fix cluster= silent no-op + narrow vcov_type contract #487 guard.
cluster_name + n_clusters suppressed under survey designs so they don't misreport the raw cluster= argument when survey_design.psu overrides it.
_validate_vcov_type @staticmethod called from __init__ AND fit() (mirrors CS pattern) so sklearn-style set_params mutations can't silently propagate bad values to Results metadata.
All default behavior bit-equal across {dr, reg, ipw} × {default, cluster, survey, replicate-survey} paths.
35 new tests in TestTripleDifferenceVcovType: 5-surface bit-equal matrix (default/cluster/TSL-survey/replicate-survey/fit-time-revalidation) + 8 introspection/convenience tests + 3 codex-driven additions (summary cluster label, summary survey suppression, cluster+replicate rejection).

Methodology references

Method: TripleDifference (DDD) — Ortiz-Villavicencio & Sant'Anna (2025), "Better Understanding Triple Differences Estimators", arXiv:2505.09942. IF-based variance: SE = std(w3·IF_3 + w2·IF_2 - w1·IF_1) / sqrt(n); CR1 Liang-Zeger on combined IF when cluster= set.
Intentional deviations from the source: None new. The vcov_type narrowing to {\"hc1\"} is a library-architectural surface restriction, not a methodological deviation — the paper's asymptotic theory is written around influence functions rather than a single hat-matrix-bearing regression. Existing documented deviations on the variance side (no Stata-style (n-1)/(n-p) finite-sample factor on the CR1 algebra because the IF has no design-matrix p in the OLS sense) are now explicitly called out in REGISTRY.md.

Validation

Tests added: tests/test_triple_diff.py::TestTripleDifferenceVcovType (35 tests; +477 LoC). Full file: 76 → 111 passing.
Tests still green: tests/test_methodology_triple_diff.py (44 passing, 63 skipped R-parity); tests/test_survey_phase3.py::TestTripleDifferenceSurvey (8 passing).
8 rounds of /ai-review-local --backend codex iterated to clean (zero actionable findings). Each round caught a different consumer surface: R1 summary-label rendering, R2 cluster+survey metadata, R3+R4 REGISTRY wording, R5 replicate-weight coverage, R7 cluster+replicate silent no-op (real bug discovered by R5's added test). R8 verdict: ✅ Looks good.

Security / privacy

Confirm no secrets/PII in this PR: Yes

@staticmethod

Mirrors CS PR #487. TripleDifference uses IF-based variance per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition; analytical-sandwich families {classical, hc2, hc2_bm} have no single design matrix for hat-matrix leverage or BM Satterthwaite DOF and are rejected at __init__ with methodology-rooted messages. conley deferred to follow-up TODO row. vcov_type='hc1' (default) preserves all current behavior bit-equally across {dr, reg, ipw} on default / cluster= / survey_design= paths. vcov_type field added to TripleDifferenceResults + to_dict + summary ("Variance estimator: hc1"). _validate_vcov_type @staticmethod called from __init__ AND fit() so sklearn-style set_params mutations can't silently propagate bad values to Results metadata. 24 new tests in TestTripleDifferenceVcovType: 5-surface contract (default/cluster/survey bit-equal, __init__ rejection per family, fit()-time revalidation) + 8 introspection/convenience-function tests. REGISTRY IF-based taxonomy section now lists TripleDifference alongside CallawaySantAnna in "Enforced today". CHANGELOG, TODO row strikes (L107, L206), and new Conley deferral row added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex P2 (.claude/reviews/local-review-latest.md): the unconditional "Variance estimator: hc1" line in summary() was misleading for clustered fits (actual algebra is CR1 Liang-Zeger on the combined IF, not the raw hc1 sandwich) and for survey-backed fits (actual algebra is TSL on the combined IF). Changes: - Add cluster_name field to TripleDifferenceResults (mirrors CS PR #487) - Thread self.cluster as cluster_name in fit() Results construction - summary() routes through shared _format_vcov_label (results.py:49-89): - Bare: "HC1 heteroskedasticity-robust" - Cluster: "CR1 cluster-robust at <name>, G=<n>" - Survey: line suppressed (Survey Design block is the canonical surface) - to_dict() carries cluster_name when set Tests (3 new) close the codex P3 regression-coverage gap: - test_summary_cluster_label_is_cr1_not_raw_hc1 - test_summary_no_variance_estimator_line_under_survey - test_results_cluster_name_carries_through (+ to_dict assertion) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex R2 P2 (.claude/reviews/local-review-latest.md): when survey_design.psu overrides a conflicting cluster= argument, the Results cluster_name + n_clusters fields previously still recorded the raw self.cluster (the ignored argument). That made to_dict() and downstream reporting describe the wrong clustering level. Fix mirrors the variance-estimator line suppression in summary(): on survey-backed fits the Survey Design block (PSU/strata/df) is the canonical surface, so cluster_name + n_clusters are suppressed (set to None) rather than reporting potentially-overridden raw inputs. Reviewer offered two equivalent options ("reflect effective PSU OR suppress under survey-backed fits"); suppression is simpler and consistent with the existing summary() architecture. Tests: - test_cluster_name_suppressed_under_survey_design: cluster='state' + survey_design(psu='psu') with conflicting partitions; resolver emits the documented UserWarning; Results carries None for both cluster_name and n_clusters; to_dict() doesn't leak; Survey Design block remains in summary() Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex R3 P3 (.claude/reviews/local-review-latest.md): the IF-based taxonomy section's "hc1 with cluster=X" bullet claimed enforced estimators activate CR1 by synthesizing SurveyDesign(psu=X) and routing through _compute_stratified_psu_meat. That's true for CallawaySantAnna but not for TripleDifference, which computes the algebraically equivalent CR1 directly from cluster-summed IFs inline at triple_diff.py:1245-1259 (no SurveyDesign synthesis — the IF is already in scope at the SE call site). Reword the bullet to describe both activation paths and note they produce the same numerical result. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex R4 P3 (.claude/reviews/local-review-latest.md): the post-taxonomy "documented synthesis" note still read as if SurveyDesign(psu=...) synthesis were the universal clustered-hc1 route. After R3 reworded the per-estimator bullet, this generic note needed the same treatment — split the activation path into the CallawaySantAnna synthesis route and the TripleDifference inline-CR1 route. CR1 Liang-Zeger algebra on the IF is Williams (2000) / Hansen (2007) in both cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex R5 P1 (.claude/reviews/local-review-latest.md): the new vcov_type contract was previously only regression-tested on the default, bare-cluster, and analytical-TSL survey paths. The distinct replicate-variance branch in fit() at lines 727-747 (survey_design with replicate_weights + JK1/BRR/Fay/JKn/SDR) was unverified, leaving a coverage hole on a supported inference branch. Add _ddd_replicate_panel helper (mirrors the JK1 pattern from tests/test_survey_phase6.py::test_triple_diff_replicate_all_methods but uses default_rng for reproducibility) + parameterized test_replicate_survey_hc1_bit_equal_baseline over reg/ipw/dr. Asserts: - ATT/SE bit-equal between default and explicit vcov_type="hc1" - results.vcov_type == "hc1" on the replicate branch - summary() still suppresses raw "Variance estimator" line under survey (Survey Design block remains the canonical surface) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pre-merge-check audit caught a missing surface: diff_diff/guides/llms-full.txt TripleDifference signature listing (lines 575-583) did not include the new vcov_type parameter, even though the CallawaySantAnna entry at line 199 has the exact analog line. The llms-full.txt is the AI-agent contract bundled in the wheel + published on RTD; the signature listing should mirror the public TripleDifference.__init__ signature. Add the vcov_type line with the same "permanently narrow {hc1} per IF-based variance" framing as CS, citing Ortiz-Villavicencio & Sant'Anna (2025). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses local codex R7 P1 (.claude/reviews/local-review-latest.md): TripleDifference(cluster=X, survey_design=SurveyDesign(replicate_weights=...)) was a silent no-op — the replicate-variance branch in fit() computes SE only via compute_replicate_if_variance(psi_rep, resolved_survey) and never consumes PSU/cluster state. Adding cluster='state' produced identical ATT/SE and no warnings relative to the no-cluster replicate fit, while populating cluster_name/n_clusters on Results dishonestly. Fail-closed per feedback_no_silent_failures: raise NotImplementedError at fit() when both cluster= and survey_design.replicate_weights are set. Message points to two valid alternatives: omit cluster= (replicate weights encode design implicitly) or use a non-replicate survey design with explicit strata/psu/fpc. Mirrors CS PR #487 guard at staggered.py:1705-1719 (the audit that caught it on CS originally noted "the other 7 Phase 1b estimators ... handle bare cluster= correctly" — but that audit only checked whether bare-cluster works at all, not the replicate-weight interaction). Tests: - test_cluster_plus_replicate_weights_rejected: parameterized over reg/ipw/dr, asserts NotImplementedError with "replicate-weight" in message Docs: - REGISTRY.md TripleDifference section gains a per-estimator Note on the cluster= + replicate-weight rejection - CHANGELOG entry amended with the new rejection clause Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-24T15:23:44Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

TripleDifference is the only affected method; the PR narrows the public vcov_type surface and adds validation/results plumbing, but it does not change the underlying ATT or SE algebra in _core_ddd.
I did not find an undocumented methodology mismatch against the Methodology Registry.
Parameter propagation looks complete across constructor, get_params(), fit-time revalidation, convenience wrapper, results serialization, and summary output.
No new inline inference or partial-NaN anti-patterns were introduced; inference still routes through safe_inference.
Review limitation: I could not run pytest here because the environment is missing pytest and runtime deps such as pandas; verification was static only.

Methodology

Severity: P3. Impact: TripleDifference remains aligned with the repo’s documented IF-based variance contract. The new vcov_type={"hc1"} restriction, fit-time revalidation, and results metadata only formalize the already-existing IF / clustered-IF / survey-TSL paths; I do not see an undocumented estimator, weighting, variance, or assumption drift in the diff. Concrete fix: none. diff_diff/triple_diff.py:L543-L551, diff_diff/triple_diff.py:L619-L684, diff_diff/triple_diff.py:L781-L818, docs/methodology/REGISTRY.md:L321-L360, docs/methodology/REGISTRY.md:L2039-L2045. citeturn1view1
Severity: P3. Impact: The new cluster + replicate_weights rejection is a fail-closed guard against a silent no-op and is explicitly documented in REGISTRY.md, so it is informational rather than blocking. Concrete fix: none. diff_diff/triple_diff.py:L661-L684, docs/methodology/REGISTRY.md:L2045-L2045

Code Quality

No findings. The changed fit path still uses safe_inference, and the modified file does not introduce new inline t_stat = ... / se or partial-NaN inference logic. diff_diff/triple_diff.py:L750-L779

Performance

No findings. The change is validation/metadata/test surface only; no hot-path estimator math was expanded.

Maintainability

No findings. The new parameter is propagated consistently through constructor state, get_params(), fit-time validation, the convenience wrapper, to_dict(), and summary(). diff_diff/triple_diff.py:L516-L556, diff_diff/triple_diff.py:L619-L623, diff_diff/triple_diff.py:L781-L818, diff_diff/triple_diff.py:L2036-L2055, diff_diff/triple_diff.py:L2101-L2205

Tech Debt

Severity: P3. Impact: Deferred conley support is properly tracked in TODO.md, so the deferral is non-blocking under the project’s review policy. Concrete fix: none in this PR. diff_diff/triple_diff.py:L2017-L2027, TODO.md:L107-L112

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The new tests cover the right surfaces for this change: default, cluster, survey, replicate-survey, fit-time revalidation, and results-summary metadata behavior. Concrete fix: none on the PR itself; rerun the targeted TripleDifference tests in CI or a normal dev environment with pytest installed. tests/test_triple_diff.py:L1206-L1618

igerber · 2026-05-24T15:28:40Z

/ai-review

github-actions · 2026-05-24T15:35:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b6fa9bdebc1457f96cf1cac909c74cfdad0879f

Overall Assessment

Looks good. No unmitigated P0/P1 findings in the PR diff.

Executive Summary

TripleDifference is the only affected method; the PR narrows the public vcov_type surface and adds validation/results plumbing, but it does not change the underlying ATT or existing IF / CR1 / survey-TSL variance algebra.
The narrowed vcov_type={"hc1"} contract is consistent with the source paper's IF-based DDD formulation and the registry's documented IF-vs-sandwich split; I did not find an undocumented methodology deviation. (arxiv.org)
The new cluster + replicate_weights guard correctly fail-closes the only silent-no-op risk in the changed code path.
Parameter propagation is complete across constructor, fit-time revalidation, get_params(), the convenience wrapper, results serialization, and summary output.
The prior AI review had no P1+ findings, and I did not identify any new P1+ regressions.
I could not execute pytest here because pytest is not installed.

Methodology

Severity: P3. Impact: The permanent vcov_type={"hc1"} narrowing is consistent with the paper and is now explicitly documented in the Methodology Registry, so this is informational rather than a defect. The paper motivates DDD-specific RA/IPW/DR estimators, shows DDD is not generally just "two DiDs," and builds the asymptotic theory around influence-function / recentered-influence-function representations, with cluster-robust inference treated as a straightforward extension of that setup. Concrete fix: none. diff_diff/triple_diff.py:L407-L415, diff_diff/triple_diff.py:L543-L551, diff_diff/triple_diff.py:L1986-L2034, docs/methodology/REGISTRY.md:L321-L360, docs/methodology/REGISTRY.md:L2039-L2045 (arxiv.org)
Severity: P3. Impact: Rejecting cluster= with replicate-weight survey designs is a documented fail-closed guard against misleading variance metadata and a silent no-op. Concrete fix: none. diff_diff/triple_diff.py:L661-L684, docs/methodology/REGISTRY.md:L2045-L2045

Code Quality

No findings. The diff continues to route inference through safe_inference() and does not introduce inline t_stat = ... / se or partial-NaN patterns. diff_diff/triple_diff.py:L750-L822

Performance

No findings. The change is validation, metadata plumbing, and summary formatting only; the DDD estimation and IF construction paths are unchanged. diff_diff/triple_diff.py:L619-L684, diff_diff/triple_diff.py:L781-L822

Maintainability

No findings. vcov_type propagation is complete across constructor state, fit-time revalidation, get_params(), the convenience wrapper, results serialization, and summary rendering. diff_diff/triple_diff.py:L516-L556, diff_diff/triple_diff.py:L619-L623, diff_diff/triple_diff.py:L781-L822, diff_diff/triple_diff.py:L2036-L2055, diff_diff/triple_diff.py:L2101-L2205

Tech Debt

Severity: P3. Impact: conley remains an explicitly tracked follow-up in TODO.md, so the new rejection message is properly tracked and non-blocking under the project policy. Concrete fix: none in this PR. TODO.md:L107-L113

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The new tests cover the key changed surfaces: default, cluster, survey, replicate-survey, fit-time revalidation, and results-summary metadata. Concrete fix: run tests/test_triple_diff.py -k 'TripleDifferenceVcovType or TripleDifferenceClusterDefensive' in CI or a normal dev environment; I could not run it here because pytest is not installed. tests/test_triple_diff.py:L1205-L1618

@staticmethod

Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. Source surface: - diff_diff/imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results cluster_name/n_clusters resolution - diff_diff/imputation_results.py: vcov_type/cluster_name/n_clusters fields + new to_dict() + variance-estimator line in summary() routing through shared _format_vcov_label helper - diff_diff/imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators) Tests: 34 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey bit-equality (parametrized over aggregate modes), bootstrap × cluster + bootstrap × survey bit-equality, fit()-time revalidation after set_params bypass, bootstrap n_psu<2 / n_clusters<2 NaN propagation, pretrends bit-equality, and the full introspection + safety-gate surface (8 tests). Docs: REGISTRY.md (IF-based taxonomy + 4 new Notes), CHANGELOG.md, TODO.md (row narrowed, Conley follow-up added), llms-full.txt (vcov_type + pretrends signature drift fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@staticmethod

… 3 (Phase 1b interstitial #3) Phase 1b interstitial #3 for ImputationDiD. Mirrors the CallawaySantAnna (PR #487) + TripleDifference (PR #488) template for IF-based estimators: vcov_type is permanently narrow to {"hc1"} because the per-unit influence function aggregation (Borusyak-Jaravel-Spiess 2024 Theorem 3) has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. Source surface (diff_diff/): - imputation.py: vcov_type param + @staticmethod _validate_vcov_type + fit()-time revalidation + cluster+replicate-weights NotImplementedError guard + Results metadata resolution (cluster_name=unit by default for the Theorem 3 unit-clustered IF variance; suppressed under ANY survey design — analytical OR replicate — because replicate variance ignores cluster/PSU entirely) - imputation_results.py: vcov_type/cluster_name/n_clusters fields, new to_dict() method, summary() variance line via shared _format_vcov_label (default cluster=None renders "CR1 cluster-robust at <unit>, G=<n>"; bootstrap fits suppress the analytical label and render "Inference method: bootstrap" instead, mirroring DiDResults.summary() gate at results.py:213-226) - imputation_bootstrap.py: dual-site n_clusters<2 / n_psu<2 NaN guards via new _build_nan_bootstrap_results helper (closes the BLAS-roundoff zero-SE class predicted to recur on IF-based estimators) Tests: 42 new tests in TestImputationDiDVcovType covering default / cluster / TSL-survey / replicate-survey + bootstrap × cluster + bootstrap × survey bit-equality (ALL parametrized over aggregate ∈ {None, "event_study", "group"} with per-horizon and per-group SE override branches pinned); fit()-time revalidation after set_params bypass; bootstrap n_psu<2 / n_clusters<2 NaN propagation including coef_var NaN; pretrends=True × vcov_type='hc1' × cluster bit-equality; introspection (default attr, get_params, Results carries, to_dict, summary label default+cluster+bootstrap-suppressed, cluster_name suppression under both analytical AND replicate survey, fit-clone idempotence, convenience function); input rejection on classical/hc2/hc2_bm/conley/ unknown with distinct methodology-keyword pins; cluster+replicate rejection. Full pytest tests/test_imputation.py: 128 passed. Docs: - REGISTRY.md: IF-based taxonomy adds ImputationDiD to "Enforced today" tier; ImputationDiD section gains 4 new Notes (vcov_type contract, cluster+replicate fail-closed, bootstrap n<2 NaN, default unit-cluster CR1 rendering) - CHANGELOG.md: [Unreleased] entry - TODO.md: Phase 1b row narrowed to TwoStageDiD + EfficientDiD; ImputationDiD Conley follow-up row added - guides/llms-full.txt: vcov_type + pretrends signature drift fix + shared staggered-results section advertises new variance metadata fields and to_dict() Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber and others added 8 commits May 24, 2026 10:16

igerber added ready-for-ci Triggers CI test workflows and removed ready-for-ci Triggers CI test workflows labels May 24, 2026

igerber added the ready-for-ci Triggers CI test workflows label May 24, 2026

igerber merged commit 03b0b13 into main May 24, 2026
62 of 64 checks passed

igerber deleted the triple-diff-vcov-type-phase1b branch May 24, 2026 17:02

igerber mentioned this pull request May 25, 2026

ImputationDiD: thread vcov_type as narrow {hc1} contract per BJS Theorem 3 (Phase 1b interstitial #3) #492

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 1b interstitial #2: TripleDifference vcov_type narrow contract#488

Phase 1b interstitial #2: TripleDifference vcov_type narrow contract#488
igerber merged 8 commits into
mainfrom
triple-diff-vcov-type-phase1b

igerber commented May 24, 2026

Uh oh!

github-actions Bot commented May 24, 2026

Uh oh!

igerber commented May 24, 2026

Uh oh!

github-actions Bot commented May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 24, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 24, 2026

Uh oh!

igerber commented May 24, 2026

Uh oh!

github-actions Bot commented May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant