diff --git a/CHANGELOG.md b/CHANGELOG.md
index f1fa6e1a..fe537a8d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`CallawaySantAnna.cluster=` silent no-op (Phase 1b interstitial).** `CallawaySantAnna(cluster="state").fit(...)` previously accepted the argument, stored it, returned it from `get_params()`, but never consumed it anywhere in the fit / aggregator / bootstrap pipeline (`staggered.py:154-156` docstring claimed "Defaults to unit-level clustering" — but for bare `cluster=X`, the aggregator at `staggered_aggregation.py:193-213` computed per-unit IF variance regardless, and the bootstrap at `staggered_bootstrap.py:323-347` drew per-unit multiplier weights regardless). Users who explicitly set `cluster="state"` got per-unit inference with no warning — typically SE too small under intra-cluster correlation. **Survey-PSU clustering via `survey_design=SurveyDesign(psu="state")` was NOT affected** and continued to cluster correctly via `_compute_stratified_psu_meat`. The fix synthesizes a minimal `SurveyDesign(psu=self.cluster, weight_type="pweight")` when bare `cluster=` is set without an explicit survey design, threading the synthesized PSU through the existing survey-PSU machinery (aggregator + bootstrap). A new dedicated `df_inference` field on `CallawaySantAnnaResults` carries the cluster-level df for the bare-cluster-synthesize path ONLY (where `survey_metadata` is intentionally `None` to preserve the `DiagnosticReport.survey_metadata is not None` skip at `diagnostic_report.py:848-856` + `:1150-1158` for "Original fit used a survey design" reasoning, and the `summary()` survey block render at `staggered_results.py:235-238`). `HonestDiD` at `honest_did.py` prefers `survey_metadata.df_survey` first (the actual CS-internal df, which may be tightened post-resolve for replicate designs) and falls back to `df_inference` for bare-cluster fits — so downstream consumers always see the cluster df without overriding the post-recompute survey df. 
When `survey_design=SurveyDesign(weights=Y)` without PSU is provided AND `cluster=X` is also set, `_inject_cluster_as_psu` injects the bare cluster as the effective PSU AND an `effective_survey_design = replace(survey_design, psu=self.cluster)` is constructed so the downstream `_validate_unit_constant_survey` catches movers (units crossing clusters across periods) on panel data via the now-PSU-bearing design; `survey_metadata` is recomputed to reflect the injected PSU. When both `cluster=X` AND `survey_design.psu=Y` are set, the explicit PSU wins via `_resolve_effective_cluster` (emits `UserWarning` if partitions differ). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`**: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely (`survey.py:104-109` enforces replicate_weights are mutually exclusive with strata/psu/fpc); honoring bare `cluster=` would silently have no effect while populating `cluster_name`/`n_clusters` on Results dishonestly. 
Assertive regression tests pin the fix on both panel and repeated-cross-section paths plus the survey/non-survey contract boundaries: `test_cluster_robust_ses_differ_from_unit_level`, `test_bare_cluster_works_with_panel_false_rcs`, `test_bare_cluster_synthesizes_survey_design`, `test_inject_branch_panel_mover_raises`, `test_replicate_weight_plus_cluster_rejected`, `test_bare_cluster_populates_df_inference` (asserts the dedicated cluster-df carrier is set), `test_bare_cluster_does_not_set_survey_metadata` (asserts the survey/non-survey contract is preserved — DiagnosticReport / summary() must not treat a bare-cluster fit as survey-backed), `test_explicit_survey_design_does_populate_survey_metadata` (asserts the inject-branch path still populates survey_metadata for legitimate user-provided SurveyDesign), and `test_bare_cluster_honest_did_uses_df_inference` (end-to-end: HonestDiD threads df_inference into HonestDiDResults.df_survey, preventing silent normal-theory regression on a future refactor). When `cluster=None` (default), behavior is bit-equal to pre-PR (wiring guarded by `if self.cluster is not None:`). Audit verified the no-op was CS-specific — the other 7 Phase 1b estimators (SunAbraham, StackedDiD, WooldridgeDiD, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD) handle bare `cluster=` correctly.
 
 ### Added
+- **TripleDifference `vcov_type` input contract (Phase 1b interstitial #2, permanently narrow).** `TripleDifference(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CS interstitial. The rejection is **library-architectural, not paper-prescribed**: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) arXiv:2505.09942 — the 3-pairwise-DiD decomposition `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to the remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (`std(inf)/sqrt(n)`); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the combined IF (`(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`, plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p` in the OLS sense); `hc1` with `survey_design=` ≡ TSL on the combined IF (analytical or replicate). All three paths are unchanged at machine precision (default behavior bit-equal across all 3 estimation methods `{dr, reg, ipw}`). `vcov_type` and `cluster_name` fields added to `TripleDifferenceResults`, threaded through `to_dict()`. 
`summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"` (since the actual algebra is Liang-Zeger CR1 on the combined IF), and survey-backed fits suppress the variance-estimator line entirely (the Survey Design block already names design + n_psu + df, and the analytical SE is TSL on the combined IF — a raw "hc1" label would misstate the inference path). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely; honoring bare `cluster=` would silently have no effect on the variance estimate while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the `CallawaySantAnna` guard from PR #487. Under `survey_design.psu` (non-replicate path) `cluster_name`/`n_clusters` on Results are suppressed (set to None) so they can't misreport the raw cluster argument when the resolver picks the survey PSU instead. `set_params(vcov_type=...)` mirrors CS pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR #2** (after CS PR #487) rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by TripleDifference's IF-based variance, not a deferral. New `TestTripleDifferenceVcovType` class in `tests/test_triple_diff.py` covers the 5-surface contract (default/cluster/survey bit-equal, `__init__` rejection per family, `fit()`-time revalidation) plus 8 introspection / convenience-function tests. 
REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `TripleDifference` alongside `CallawaySantAnna` in the "Enforced today" tier. Phase 1b PR 4/8 (full `{classical, hc1, hc2, hc2_bm}` threading) resumes on a different estimator (TwoStageDiD) post-merge; the two remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) follow the same narrow-contract template.
 - **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021) — per-(g,t) doubly-robust / IPW / outcome-regression structure — and has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF activated via the cluster= wiring fix above. Documentation in `docs/methodology/REGISTRY.md` "IF-based variance estimators vs analytical-sandwich estimators" subsection. `vcov_type`, `cluster_name`, `n_clusters`, `df_inference` added to `CallawaySantAnnaResults` (the canonical PSU column wins for `cluster_name` reporting — `survey_design.psu` when explicit PSU is provided, `self.cluster` when bare cluster synthesizes/injects). `set_params(vcov_type=...)` mirrors SA pattern (mutate-then-refresh `_vcov_type_explicit`, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR** rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full {classical, hc1, hc2, hc2_bm} threading) resumes on a different estimator post-merge.
 - **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses` asserting that `TripleDifference(cluster="state")` produces SE differing from `cluster=None` SE by `>1e-6` on a fixed-seed panel with state-level random effects. Defensive coverage closes a test gap identified during the Phase 1b cluster-wiring audit; TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.
 - **TwoStageDiD: parity with SpilloverDiD Wave E.3 — always-treated unit drop preserves full-domain survey design via zero-padded scores.** Closes the parity follow-up tracked at `TODO.md` after PR #482 (SpilloverDiD Wave E.3, merge `24de9062`). When TwoStageDiD detects always-treated units (`first_treat <= min_time`) and removes them from the OLS sample, the resolved survey design retains its FULL-DOMAIN `n_psu` / `n_strata` / `df_survey` / `strata` / `fpc` / `psu` arrays instead of being subsetted via `replace(resolved_survey, ...)`. Per-cluster stage-1 / stage-2 score aggregates are computed at the post-drop fit-sample length and then zero-padded onto the full-domain unique-PSU list before stratified-meat dispatch via two new optional kwargs on `_compute_gmm_variance`: `score_pad_mask` (full-domain boolean keep mask) and `cluster_ids_full` (full-domain post-injection PSU labels). PSUs containing only always-treated rows get zero score rows but still count toward `G_full` for `n_psu` / `df_survey` accounting. **Documented synthesis (library-convention adoption, NOT new methodology):** adopts the canonical "zero-pad scores + retain full-design resolved survey" convention from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation), `diff_diff/prep.py:1401-1432` (DCDH cell variance), and `diff_diff/spillover.py` (PR #482 Wave E.3). 
**Mechanical realization:** `two_stage.py:1485-1525` design-subset block deleted (the `replace(resolved_survey, ...)` subset + `n_psu` / `n_strata` recompute + post-drop `compute_survey_metadata` call); `keep_mask` promoted to `fit()`-level scope (always defined, all-True when no always-treated drop); `survey_weights = survey_weights[keep_mask.values]` retained for stage-1 / stage-2 OLS arithmetic; cluster injection block updated to source `cluster_ids_raw` from FULL-DOMAIN `data[cluster_var].values` (not post-drop `df[cluster_var].values`) so `_inject_cluster_as_psu`'s zip against `resolved_survey.strata` (full-domain) stays length-aligned; `df["_survey_cluster"]` aligned to post-drop length via `resolved_survey.psu[keep_mask.values]`; post-injection `compute_survey_metadata` uses full-domain `raw_w` from `data[survey_design.weights]`. `_compute_gmm_variance` adds the zero-pad expansion after the per-cluster aggregation (mapping fit-sample `unique_clusters` into `unique_clusters_full` positions via `np.searchsorted`) and updates the strata/fpc `obs_idx` lookups to use `cluster_ids_for_lookup = cluster_ids_full` when padding is active. The three inner stage-2 methods (`_stage2_static`, `_stage2_event_study`, `_stage2_group`) thread the new kwargs through; bootstrap-resample call sites keep default `None` (no behavior change on bootstrap path). **Always-treated warning text updated:** "Associated survey weights subsetted for stage-1 / stage-2 OLS; full-domain survey design retained for variance estimation (Wave E.3 parity)." replaces the prior "and design arrays adjusted" claim. **No-survey path unchanged:** when `resolved_survey is None`, both `score_pad_mask` and `cluster_ids_full` default to `None` and the existing post-drop scoring path runs bit-identically. **Replicate variance + always-treated drop:** existing path unchanged (replicate refit handles resampling at the survey-design level; `score_pad_mask_arg` is `None` on `_uses_replicate_ts` paths). 
**Tests:** new `TestTwoStageDiDWaveE3ParityAlwaysTreated` class in `tests/test_two_stage.py` (8 tests: no-always-treated baseline, full-domain `df_survey` preservation under drop, full-domain `n_psu` reporting, per-cluster zero-pad mock-spy on `_compute_stratified_meat_from_psu_scores`, subpopulation + always-treated composition, cluster-as-PSU + always-treated, no-survey path unchanged, PSU entirely-always-treated). REGISTRY.md TwoStageDiD section gains a "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 section updated to mark the TwoStageDiD parity follow-up as shipped.
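Reviewer note on the TripleDifference changelog entry above: it states the two non-survey `hc1` variance forms in words, per-unit `std(inf)/sqrt(n)` and CR1 Liang-Zeger `(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`. The following is a minimal NumPy sketch of those two formulas only, not the library's implementation. The function names `if_se_unit` and `if_se_cr1` are hypothetical, and `ddof=1` in the per-unit form is an assumption, since the entry does not say which normalization `std` uses.

```python
import numpy as np


def if_se_unit(psi):
    """Per-unit IF standard error: std(psi) / sqrt(n).

    ddof=1 is an assumption; the changelog entry does not specify it.
    """
    psi = np.asarray(psi, dtype=float).ravel()
    return float(psi.std(ddof=1) / np.sqrt(psi.size))


def if_se_cr1(psi, cluster_ids):
    """Plain CR1 on the combined IF: (G/(G-1)) * sum_c (sum_{i in c} psi_i)^2 / n^2."""
    psi = np.asarray(psi, dtype=float).ravel()
    ids = np.asarray(cluster_ids).ravel()
    if ids.size != psi.size:
        raise ValueError("psi and cluster_ids must have the same length")
    # Map arbitrary cluster labels to integer codes 0..G-1.
    _, codes = np.unique(ids, return_inverse=True)
    codes = codes.ravel()
    n_clusters = int(codes.max()) + 1
    if n_clusters < 2:
        raise ValueError("CR1 needs at least two clusters")
    # Sum the influence-function values within each cluster.
    cluster_sums = np.bincount(codes, weights=psi, minlength=n_clusters)
    scale = n_clusters / (n_clusters - 1)
    variance = scale * np.sum(cluster_sums**2) / psi.size**2
    return float(np.sqrt(variance))
```

One sanity check on the formulas: when `psi` has mean zero and every unit is its own cluster, the CR1 expression reduces to `s²/n` with the `ddof=1` sample variance, so the two functions agree in that case.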
diff --git a/TODO.md b/TODO.md
index fdbc2c26..47206f88 100644
--- a/TODO.md
+++ b/TODO.md
@@ -104,12 +104,13 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Interstitial PR (post-PR-3/8) addressed `CallawaySantAnna` separately**: CS uses IF-based variance per Callaway & Sant'Anna (2021) Theorem 2, so its `vcov_type` contract is permanently narrow to `{"hc1"}` (analytical-sandwich families don't compose); the interstitial also fixed CS's bare-`cluster=` silent no-op. This row tracks the remaining 4 (ImputationDiD and EfficientDiD are also IF-based and will likely adopt the same narrow contract). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Two interstitial PRs (post-PR-3/8) addressed the IF-based estimators separately, each permanently narrow to `{"hc1"}`**: (a) `CallawaySantAnna` per Callaway & Sant'Anna (2021) Theorem 2 (also fixed CS's bare-`cluster=` silent no-op); (b) `TripleDifference` per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition. Analytical-sandwich families don't compose with IF-based variance for either. This row tracks the remaining 3 (`ImputationDiD` and `EfficientDiD` are also IF-based and will likely adopt the same narrow contract; `TwoStageDiD` is sandwich-class). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`. The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
 | Extend `CallawaySantAnna` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for per-unit influence functions (Conley 1999 spatial kernel × per-(g,t) IF aggregation); no reference implementation exists today. Phase 1b interstitial PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/staggered.py` | follow-up | Low |
+| Extend `TripleDifference` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for the 3-pairwise-DiD influence-function decomposition (Conley 1999 spatial kernel × `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` aggregation); no reference implementation exists today. Phase 1b interstitial #2 PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/triple_diff.py` | follow-up | Low |
 | Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but having two equivalent paths to express the same intent creates redundant surface. Mirrors a similar question for ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. | `diff_diff/staggered.py` | follow-up | Low |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
 <!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
@@ -203,7 +204,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- Thread `vcov_type` through the 4 remaining standalone estimators: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; interstitial post-PR-3/8 narrowed CallawaySantAnna permanently to `{hc1}` per IF-based variance + fixed bare-`cluster=` silent no-op)
+- Thread `vcov_type` through the 3 remaining standalone estimators: `ImputationDiD`, `TwoStageDiD`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; interstitial #1 narrowed CallawaySantAnna permanently to `{hc1}` per IF-based variance + fixed bare-`cluster=` silent no-op; interstitial #2 narrowed TripleDifference permanently to `{hc1}` per IF-based variance on the 3-pairwise-DiD decomposition)
 - SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index b6932d6e..fb86aaff 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -577,6 +577,7 @@ TripleDifference(
     estimation_method: str = "dr",            # "dr", "reg", or "ipw"
     robust: bool = True,
     cluster: str | None = None,
+    vcov_type: str = "hc1",                   # {"hc1"} only — IF-based variance per Ortiz-Villavicencio & Sant'Anna (2025). Analytical-sandwich {classical, hc2, hc2_bm} and conley REJECTED at __init__ (see REGISTRY.md IF-vs-sandwich subsection).
     alpha: float = 0.05,
     pscore_trim: float = 0.01,
     rank_deficient_action: str = "warn",
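Reviewer note on the `vcov_type` contract: the changelog describes validation at `__init__`, a `set_params` that mutates without validating, and a second validation at `fit()`. The sketch below shows that pattern on a toy class. `ToyIFEstimator`, its messages, and the choice of `ValueError` are all hypothetical; the diff does not show the body of `_validate_vcov_type` or the exception type it raises.

```python
# Hypothetical toy estimator; not the diff_diff API.
_ACCEPTED = frozenset({"hc1"})
_SANDWICH_ONLY = frozenset({"classical", "hc2", "hc2_bm"})


class ToyIFEstimator:
    def __init__(self, vcov_type="hc1"):
        # First layer: reject bad constructor arguments immediately.
        self._validate_vcov_type(vcov_type)
        self.vcov_type = vcov_type

    @staticmethod
    def _validate_vcov_type(vcov_type):
        if vcov_type in _ACCEPTED:
            return
        if vcov_type in _SANDWICH_ONLY:
            # ValueError is an assumption made for this sketch.
            raise ValueError(
                f"vcov_type={vcov_type!r} needs a single design matrix, "
                "which an influence-function-based estimator does not have"
            )
        raise ValueError(f"unknown vcov_type={vcov_type!r}; expected 'hc1'")

    def set_params(self, **params):
        # Mutate without validating, as in the sklearn-style pattern.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self):
        # Second layer: catch values that set_params slipped past __init__.
        self._validate_vcov_type(self.vcov_type)
        return {"vcov_type": self.vcov_type}
```

With this layout, a bad value set after construction fails at `fit()` instead of reaching the results metadata.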
diff --git a/diff_diff/triple_diff.py b/diff_diff/triple_diff.py
index fc504507..45b5eb27 100644
--- a/diff_diff/triple_diff.py
+++ b/diff_diff/triple_diff.py
@@ -12,6 +12,11 @@
 unlike naive implementations. Standard errors use the efficient influence
 function: SE = std(IF) / sqrt(n), which is inherently heteroskedasticity-
 robust. Cluster-robust SEs are available via the ``cluster`` parameter.
+The ``vcov_type`` input contract is permanently narrow to ``{"hc1"}``
+because the analytical-sandwich families (classical, hc2, hc2_bm) need
+a single design matrix, which the 3-pairwise-DiD decomposition lacks;
+see REGISTRY.md "IF-based variance estimators vs analytical-sandwich
+estimators" for the structural taxonomy.
 
 The DDD is computed via three pairwise DiD comparisons matching R's
 ``triplediff::ddd()`` package (panel=FALSE mode).
@@ -101,14 +106,14 @@ class TripleDifferenceResults:
     covariate_balance: Optional[pd.DataFrame] = field(default=None, repr=False)
     # Inference details
     inference_method: str = field(default="analytical")
+    vcov_type: str = field(default="hc1")
+    cluster_name: Optional[str] = field(default=None)
     n_bootstrap: Optional[int] = field(default=None)
     n_clusters: Optional[int] = field(default=None)
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
     # EPV diagnostics per subgroup comparison
-    epv_diagnostics: Optional[Dict[int, Dict[str, Any]]] = field(
-        default=None, repr=False
-    )
+    epv_diagnostics: Optional[Dict[int, Dict[str, Any]]] = field(default=None, repr=False)
     epv_threshold: float = 10
     pscore_fallback: str = "error"
 
@@ -164,6 +169,23 @@ def summary(self, alpha: Optional[float] = None) -> str:
             lines.append(f"{'Inference method:':<30} {self.inference_method:>15}")
             if self.n_bootstrap is not None:
                 lines.append(f"{'Bootstrap replications:':<30} {self.n_bootstrap:>15}")
+        # Variance-estimator line. Suppressed under survey designs (the survey
+        # block above already names the design + n_psu + df; the analytical
+        # SE is TSL on the combined IF, not the raw hc1 sandwich). For bare
+        # cluster= fits the actual algebra is CR1 Liang-Zeger on the combined
+        # IF, so route through the shared _format_vcov_label to render a
+        # cluster-aware label rather than raw "hc1".
+        if self.survey_metadata is None:
+            from diff_diff.results import _format_vcov_label
+
+            vcov_label = _format_vcov_label(
+                self.vcov_type,
+                cluster_name=self.cluster_name,
+                n_clusters=self.n_clusters,
+                n_obs=self.n_obs,
+            )
+            if vcov_label:
+                lines.append(f"{'Variance estimator:':<30} {vcov_label:>15}")
         if self.n_clusters is not None:
             lines.append(f"{'Number of clusters:':<30} {self.n_clusters:>15}")
 
@@ -266,6 +288,7 @@ def to_dict(self) -> Dict[str, Any]:
             "n_control_ineligible": self.n_control_ineligible,
             "estimation_method": self.estimation_method,
             "inference_method": self.inference_method,
+            "vcov_type": self.vcov_type,
         }
         if self.r_squared is not None:
             result["r_squared"] = self.r_squared
@@ -273,6 +296,8 @@ def to_dict(self) -> Dict[str, Any]:
             result["n_bootstrap"] = self.n_bootstrap
         if self.n_clusters is not None:
             result["n_clusters"] = self.n_clusters
+        if self.cluster_name is not None:
+            result["cluster_name"] = self.cluster_name
         if self.survey_metadata is not None:
             sm = self.survey_metadata
             result["weight_type"] = sm.weight_type
@@ -320,9 +345,7 @@ def epv_summary(self, show_all: bool = False) -> pd.DataFrame:
             Columns: subgroup, epv, n_events, n_params, is_low.
         """
         if not self.epv_diagnostics:
-            return pd.DataFrame(
-                columns=["subgroup", "epv", "n_events", "n_params", "is_low"]
-            )
+            return pd.DataFrame(columns=["subgroup", "epv", "n_events", "n_params", "is_low"])
         rows = []
         for sg, diag in sorted(self.epv_diagnostics.items()):
             if show_all or diag.get("is_low", False):
@@ -381,6 +404,15 @@ class TripleDifference:
         Column name for cluster-robust standard errors. When provided,
         SEs are computed using the Liang-Zeger cluster-robust variance
         estimator on the influence function.
+    vcov_type : str, default="hc1"
+        Variance estimator. Permanently narrow to ``{"hc1"}`` per the
+        IF-based variance decomposition: TripleDifference uses an
+        efficient influence function and has no single design matrix on
+        which the analytical-sandwich families (``classical``, ``hc2``,
+        ``hc2_bm``) could compute hat-matrix leverage or Bell-McCaffrey
+        Satterthwaite DOF. ``conley`` is rejected for now (deferred).
+        With ``cluster=None`` the SE is ``std(IF)/sqrt(n)``; with
+        ``cluster=<col>`` it is Liang-Zeger CR1 on the combined IF.
     alpha : float, default=0.05
         Significance level for confidence intervals.
     pscore_trim : float, default=0.01
@@ -486,6 +518,7 @@ def __init__(
         estimation_method: str = "dr",
         robust: bool = True,
         cluster: Optional[str] = None,
+        vcov_type: str = "hc1",
         alpha: float = 0.05,
         pscore_trim: float = 0.01,
         rank_deficient_action: str = "warn",
@@ -502,17 +535,25 @@ def __init__(
                 f"got '{rank_deficient_action}'"
             )
         if epv_threshold <= 0:
-            raise ValueError(
-                f"epv_threshold must be > 0, got {epv_threshold}"
-            )
+            raise ValueError(f"epv_threshold must be > 0, got {epv_threshold}")
         if pscore_fallback not in {"error", "unconditional"}:
             raise ValueError(
-                f"pscore_fallback must be 'error' or 'unconditional', "
-                f"got '{pscore_fallback}'"
+                f"pscore_fallback must be 'error' or 'unconditional', got '{pscore_fallback}'"
             )
+        # vcov_type input contract: TripleDifference is permanently narrow
+        # to {"hc1"} because the analytical-sandwich families (classical,
+        # hc2, hc2_bm) require a single regression's hat matrix that
+        # TripleDifference's 3-pairwise-DiD influence-function decomposition
+        # doesn't have. See REGISTRY.md "IF-based variance estimators vs
+        # analytical-sandwich estimators" for the structural taxonomy.
+        # Factored out so fit() can re-run it after sklearn-style
+        # set_params bypasses __init__ validation.
+        self._validate_vcov_type(vcov_type)
+
         self.estimation_method = estimation_method
         self.robust = robust
         self.cluster = cluster
+        self.vcov_type = vcov_type
         self.alpha = alpha
         self.pscore_trim = pscore_trim
         self.rank_deficient_action = rank_deficient_action
@@ -575,6 +616,12 @@ def fit(
         NotImplementedError
             If survey_design is used with wild_bootstrap inference.
         """
+        # Re-validate vcov_type at fit-time so sklearn-style set_params
+        # mutations are caught before they propagate to Results metadata.
+        # __init__ already validated the constructor argument; this is the
+        # second layer for the post-construction mutation path.
+        self._validate_vcov_type(self.vcov_type)
+
         # Reset replicate state from any previous fit
         self._replicate_n_valid = None
 
@@ -611,6 +658,31 @@ def fit(
         if self._cluster_ids is not None and np.any(pd.isna(data[self.cluster])):
             raise ValueError(f"Cluster column '{self.cluster}' contains missing values")
 
+        # Reject replicate-weight + cluster=: replicate IF variance is
+        # computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR)
+        # and ignores PSU/cluster entirely (survey.py:104-109 enforces
+        # replicate_weights are mutually exclusive with strata/psu/fpc).
+        # Honoring bare cluster= here would silently have no effect on
+        # variance while populating cluster_name/n_clusters on Results
+        # dishonestly. Fail-closed per feedback_no_silent_failures.
+        # Mirrors CallawaySantAnna guard at staggered.py:1705-1719.
+        if (
+            self.cluster is not None
+            and survey_design is not None
+            and getattr(survey_design, "replicate_weights", None) is not None
+        ):
+            raise NotImplementedError(
+                f"TripleDifference(cluster={self.cluster!r}) is not "
+                "supported with replicate-weight survey designs. "
+                "Replicate-weight variance is computed by replicate "
+                "reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores "
+                "PSU/cluster entirely — setting cluster= would silently "
+                "have no effect on the variance estimate. Either omit "
+                "cluster= (the replicate weights encode the design "
+                "structure implicitly) or use a non-replicate survey "
+                "design (with explicit strata/psu/fpc)."
+            )
+
         # Resolve effective cluster and inject cluster-as-PSU for survey variance
         if resolved_survey is not None:
             effective_cluster_ids = _resolve_effective_cluster(
@@ -680,17 +752,22 @@ def fit(
         if survey_metadata is not None and survey_metadata.df_survey is not None:
             df = survey_metadata.df_survey
             # Override with effective replicate df only when replicates were dropped
-            if (hasattr(self, '_replicate_n_valid') and self._replicate_n_valid is not None
-                    and resolved_survey is not None
-                    and self._replicate_n_valid < resolved_survey.n_replicates):
+            if (
+                hasattr(self, "_replicate_n_valid")
+                and self._replicate_n_valid is not None
+                and resolved_survey is not None
+                and self._replicate_n_valid < resolved_survey.n_replicates
+            ):
                 df = self._replicate_n_valid - 1
                 survey_metadata.df_survey = self._replicate_n_valid - 1
             # df <= 0 means insufficient rank for t-based inference
             if df is not None and df <= 0:
                 df = 0  # Forces NaN from t-distribution
-        elif (resolved_survey is not None
-              and hasattr(resolved_survey, 'uses_replicate_variance')
-              and resolved_survey.uses_replicate_variance):
+        elif (
+            resolved_survey is not None
+            and hasattr(resolved_survey, "uses_replicate_variance")
+            and resolved_survey.uses_replicate_variance
+        ):
             # Replicate design with undefined df (rank <= 1) — NaN inference
             df = 0  # Forces NaN from t-distribution
         else:
@@ -701,10 +778,21 @@ def fit(
 
         t_stat, p_value, conf_int = safe_inference(att, se, alpha=self.alpha, df=df)
 
-        # Get number of clusters if clustering
-        n_clusters = None
-        if self.cluster is not None:
+        # Resolve cluster_name / n_clusters for Results metadata.
+        # Under survey designs the survey block (PSU/strata/df) is the
+        # canonical surface for cluster reporting — suppress the bare
+        # cluster_name / n_clusters fields so they don't misreport the
+        # raw `cluster=` argument when `survey_design.psu` overrides it.
+        # Mirrors the variance-estimator line suppression in summary().
+        if resolved_survey is not None:
+            cluster_name_for_results: Optional[str] = None
+            n_clusters: Optional[int] = None
+        elif self.cluster is not None:
+            cluster_name_for_results = self.cluster
             n_clusters = data[self.cluster].nunique()
+        else:
+            cluster_name_for_results = None
+            n_clusters = None
 
         # Create results object
         self.results_ = TripleDifferenceResults(
@@ -724,6 +812,8 @@ def fit(
             pscore_stats=pscore_stats,
             r_squared=r_squared,
             inference_method="analytical",
+            vcov_type=self.vcov_type,
+            cluster_name=cluster_name_for_results,
             n_clusters=n_clusters,
             survey_metadata=survey_metadata,
             epv_diagnostics=epv_diag if epv_diag else None,
@@ -862,7 +952,9 @@ def _regression_adjustment(
         X: Optional[np.ndarray],
         survey_weights: Optional[np.ndarray] = None,
         resolved_survey=None,
-    ) -> Tuple[float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]]:
+    ) -> Tuple[
+        float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]
+    ]:
         """
         Estimate ATT using regression adjustment via three-DiD decomposition.
 
@@ -890,7 +982,9 @@ def _ipw_estimation(
         X: Optional[np.ndarray],
         survey_weights: Optional[np.ndarray] = None,
         resolved_survey=None,
-    ) -> Tuple[float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]]:
+    ) -> Tuple[
+        float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]
+    ]:
         """
         Estimate ATT using inverse probability weighting via three-DiD
         decomposition.
@@ -918,7 +1012,9 @@ def _doubly_robust(
         X: Optional[np.ndarray],
         survey_weights: Optional[np.ndarray] = None,
         resolved_survey=None,
-    ) -> Tuple[float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]]:
+    ) -> Tuple[
+        float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]
+    ]:
         """
         Estimate ATT using doubly robust estimation via three-DiD
         decomposition.
@@ -947,7 +1043,9 @@ def _estimate_ddd_decomposition(
         X: Optional[np.ndarray],
         survey_weights: Optional[np.ndarray] = None,
         resolved_survey=None,
-    ) -> Tuple[float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]]:
+    ) -> Tuple[
+        float, float, Optional[float], Optional[Dict[str, float]], Dict[int, Dict[str, Any]]
+    ]:
         """
         Core DDD estimation via three-DiD decomposition.
 
@@ -1032,10 +1130,7 @@ def _estimate_ddd_decomposition(
                             diagnostics_out=diag,
                         )
                     except Exception:
-                        if (
-                            self.pscore_fallback == "error"
-                            or self.rank_deficient_action == "error"
-                        ):
+                        if self.pscore_fallback == "error" or self.rank_deficient_action == "error":
                             raise
                         if w_sub is not None:
                             pos = w_sub > 0
@@ -1888,6 +1983,56 @@ def _safe_ratio(num, denom):
         inf_func = inf_treat - inf_control + inf_eff + inf_or
         return att, inf_func
 
+    @staticmethod
+    def _validate_vcov_type(vcov_type: str) -> None:
+        """Validate ``vcov_type`` membership against TripleDifference's
+        narrow contract.
+
+        Called from ``__init__`` and from ``fit()`` (so sklearn-style
+        ``set_params(vcov_type=...)`` mutations are re-checked at use
+        time rather than silently passing a bad value through to Results).
+        """
+        _accepted_vcov = {"hc1"}
+        _deferred_vcov = {"conley"}
+        _if_incompatible_vcov = {"classical", "hc2", "hc2_bm"}
+        if vcov_type in _if_incompatible_vcov:
+            raise ValueError(
+                f"TripleDifference(vcov_type={vcov_type!r}) is rejected: "
+                "TripleDifference uses influence-function-based variance "
+                "per Ortiz-Villavicencio & Sant'Anna (2025) "
+                "arXiv:2505.09942; the analytical-sandwich families "
+                "{'classical', 'hc2', 'hc2_bm'} are defined on a single "
+                "regression's hat matrix, and TripleDifference's "
+                "3-pairwise-DiD decomposition (DiD_3 + DiD_2 - DiD_1) "
+                "has no equivalent single design matrix to compute "
+                "hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF "
+                "on. The rejection is library-architectural, not "
+                "paper-prescribed. Use vcov_type='hc1' (the default) with "
+                "cluster=<col> for cluster-robust inference (Liang-Zeger "
+                "CR1 on the combined influence function). See "
+                "docs/methodology/REGISTRY.md 'IF-based variance "
+                "estimators vs analytical-sandwich estimators' for the "
+                "structural taxonomy."
+            )
+        if vcov_type in _deferred_vcov:
+            raise ValueError(
+                f"TripleDifference(vcov_type={vcov_type!r}) is not yet "
+                "supported: spatial-HAC (Conley) on the 3-pairwise-DiD "
+                "influence-function decomposition could conceptually "
+                "apply (spatial aggregation of per-unit IFs) but requires "
+                "separate methodology work; no reference implementation "
+                "exists today. Tracked as a follow-up TODO row. Use "
+                "vcov_type='hc1' (the default) with cluster=<col> for "
+                "cluster-robust inference today."
+            )
+        if vcov_type not in _accepted_vcov:
+            raise ValueError(
+                f"TripleDifference(vcov_type={vcov_type!r}) is invalid. "
+                f"Accepted values: {sorted(_accepted_vcov)}. "
+                "TripleDifference is permanently narrow to 'hc1' per "
+                "IF-based variance structure; see REGISTRY.md."
+            )
+
     def get_params(self) -> Dict[str, Any]:
         """
         Get estimator parameters (sklearn-compatible).
@@ -1901,6 +2046,7 @@ def get_params(self) -> Dict[str, Any]:
             "estimation_method": self.estimation_method,
             "robust": self.robust,
             "cluster": self.cluster,
+            "vcov_type": self.vcov_type,
             "alpha": self.alpha,
             "pscore_trim": self.pscore_trim,
             "rank_deficient_action": self.rank_deficient_action,
@@ -1962,6 +2108,7 @@ def triple_difference(
     estimation_method: str = "dr",
     robust: bool = True,
     cluster: Optional[str] = None,
+    vcov_type: str = "hc1",
     alpha: float = 0.05,
     rank_deficient_action: str = "warn",
     epv_threshold: float = 10,
@@ -2001,6 +2148,11 @@ def triple_difference(
         for API compatibility.
     cluster : str, optional
         Column name for cluster-robust standard errors.
+    vcov_type : str, default="hc1"
+        Variance estimator. Permanently narrow to ``{"hc1"}`` per the
+        IF-based variance decomposition; ``classical``/``hc2``/``hc2_bm``
+        and ``conley`` (deferred) raise ``ValueError`` at ``__init__``. See
+        the ``TripleDifference`` class docstring for the structural taxonomy.
     alpha : float, default=0.05
         Significance level for confidence intervals.
     rank_deficient_action : str, default="warn"
@@ -2037,6 +2189,7 @@ def triple_difference(
         estimation_method=estimation_method,
         robust=robust,
         cluster=cluster,
+        vcov_type=vcov_type,
         alpha=alpha,
         rank_deficient_action=rank_deficient_action,
         epv_threshold=epv_threshold,
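
The two-layer `vcov_type` check in the `triple_diff.py` hunks above (validate in `__init__`, then re-validate at the top of `fit()`) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the library's code: `NarrowVcovEstimator` and its simplified `set_params` are hypothetical stand-ins, and only the accepted set `{"hc1"}` is taken from the diff.

```python
from typing import Any, Dict


class NarrowVcovEstimator:
    """Hypothetical stand-in showing validate-at-init plus re-validate-at-fit."""

    _ACCEPTED_VCOV = frozenset({"hc1"})

    def __init__(self, vcov_type: str = "hc1") -> None:
        # Layer 1: reject bad constructor arguments immediately.
        self._validate_vcov_type(vcov_type)
        self.vcov_type = vcov_type

    @classmethod
    def _validate_vcov_type(cls, vcov_type: str) -> None:
        if vcov_type not in cls._ACCEPTED_VCOV:
            raise ValueError(
                f"vcov_type={vcov_type!r} is invalid. "
                f"Accepted values: {sorted(cls._ACCEPTED_VCOV)}."
            )

    def set_params(self, **params: Any) -> "NarrowVcovEstimator":
        # sklearn-style mutation: assigns attributes without validating them.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self) -> Dict[str, Any]:
        # Layer 2: catch values that set_params slipped past __init__.
        self._validate_vcov_type(self.vcov_type)
        return {"vcov_type": self.vcov_type}
```

In this sketch, `NarrowVcovEstimator().set_params(vcov_type="hc4")` succeeds, and the bad value only surfaces as a `ValueError` when `fit()` runs, which is the behavior the fit-time revalidation tests in this PR pin down.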
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index f007c88b..4aea52bc 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -321,14 +321,20 @@ a defined interpretation on the hat-matrix-bearing design (HC2 leverage
 **IF-based estimators** derive variance from an asymptotic influence function
 `Var(θ̂) = (1/n) Σ_i ψ_i²` per estimator-specific derivations (Callaway &
 Sant'Anna 2021 for `CallawaySantAnna`; Borusyak-Jaravel-Spiess 2024 for
-`ImputationDiD`; Sant'Anna & Zhao 2020 for `EfficientDiD`). For these:
+`ImputationDiD`; Sant'Anna & Zhao 2020 for `EfficientDiD`; Ortiz-Villavicencio
+& Sant'Anna 2025 for `TripleDifference`, where the variance is built on the
+3-pairwise-DiD decomposition `inf = w3·IF_3 + w2·IF_2 - w1·IF_1`). For these:
 
 - `hc1` with `cluster=None` ≡ per-unit IF variance — the default
   (Williams 2000 form).
 - `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF:
-  `Var = (G/(G-1)) Σ_c (Σ_{i∈c} ψ_i)² / n²`. Activated by synthesizing
-  `SurveyDesign(psu=X)` internally and routing through the existing PSU-meat
-  machinery (`_compute_stratified_psu_meat`).
+  `Var = (G/(G-1)) Σ_c (Σ_{i∈c} ψ_i)² / n²`. The activation path is
+  estimator-specific: `CallawaySantAnna` synthesizes `SurveyDesign(psu=X)`
+  internally and routes through the shared PSU-meat machinery
+  (`_compute_stratified_psu_meat`); `TripleDifference` computes the
+  algebraically equivalent CR1 directly from cluster-summed IFs inline at
+  `triple_diff.py` (no SurveyDesign synthesis — the IF is already in scope
+  at the SE call site). Both produce the same numerical result.
 - `classical`, `hc2`, `hc2_bm` are **N/A** for IF-based estimators —
   hat-matrix leverage and Bell-McCaffrey Satterthwaite DOF are defined on a
   single regression's design matrix, and IF-based estimators have no
@@ -342,14 +348,16 @@ Sant'Anna 2021 for `CallawaySantAnna`; Borusyak-Jaravel-Spiess 2024 for
 This split is a structural property of the estimator's variance derivation,
 not a missing feature. The `vcov_type` input contract for IF-based estimators
 is **permanently narrow** at `{"hc1"}`. Enforced today on
-`CallawaySantAnna`; the same narrow contract is expected when
-`ImputationDiD` and `EfficientDiD` reach `vcov_type` threading.
+`CallawaySantAnna` and `TripleDifference`; the same narrow contract is
+expected when `ImputationDiD` and `EfficientDiD` reach `vcov_type` threading.
 
-**Note:** This routing is a documented synthesis: the
-`SurveyDesign(psu=...)` synthesis is the new wiring; the downstream
-PSU-meat machinery (`_compute_stratified_psu_meat`) is the established
-survey-side path; the CR1 Liang-Zeger algebra on IF is Williams (2000) /
-Hansen (2007). No new methodology is introduced.
+**Note:** This routing is a documented synthesis. The clustered-`hc1`
+activation path is estimator-specific: `CallawaySantAnna` synthesizes
+`SurveyDesign(psu=X)` internally and routes through the existing
+PSU-meat machinery (`_compute_stratified_psu_meat`); `TripleDifference`
+computes the algebraically equivalent CR1 directly from cluster-summed
+IFs inline. The CR1 Liang-Zeger algebra on the IF is Williams (2000) /
+Hansen (2007) in both cases — no new methodology is introduced.
 
 ---
 
@@ -2033,6 +2041,8 @@ contract changes.
 - [x] ATT and SE match R within <0.001% for all methods and DGP types
 - [x] Survey design support: all methods (reg, IPW, DR) with weighted OLS/logit + TSL on combined influence functions. Weighted solve_logit() for propensity scores in IPW/DR paths.
 - **Note:** TripleDifference survey SE: for IPW/DR, pairwise IFs incorporate survey weights via weighted Riesz representers (`riesz *= weights`), so the combined IF is divided by per-observation survey weights (`inf / sw`) before passing to `compute_survey_vcov()` to prevent double-weighting. For regression (RA), pairwise IFs are already on the unweighted residual scale (WLS fits use weights internally but the IF is not Riesz-multiplied), so the combined IF passes directly to TSL without de-weighting. The OLS nuisance IF corrections in DR mode use weighted cross-products normalized by subgroup row count `n` (not `sum(weights)`).
+- **Note (vcov_type contract):** `vcov_type` is permanently narrow to `{"hc1"}` per the IF-based variance decomposition. Analytical-sandwich families `{classical, hc2, hc2_bm}` are rejected at `__init__` (and re-checked at `fit()`, so a `set_params` mutation cannot bypass the check) with a methodology-rooted message citing Ortiz-Villavicencio & Sant'Anna (2025): the 3-pairwise-DiD decomposition has no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined. `cluster=` continues to invoke Liang-Zeger CR1 on the combined influence function (`(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`; plain CR1, with no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p` in the OLS sense); `survey_design=` continues to invoke TSL on the combined IF. `vcov_type='conley'` is also rejected with `ValueError` for now and is deferred to the TripleDifference Conley follow-up row in `TODO.md`. See ["IF-based variance estimators vs analytical-sandwich estimators"](#if-based-variance-estimators-vs-analytical-sandwich-estimators) above for the structural taxonomy.
+- **Note (`cluster=` + replicate-weight survey rejection):** `TripleDifference(cluster=X)` + `SurveyDesign(replicate_weights=[...], replicate_method=...)` is rejected at `fit()` with `NotImplementedError`. Replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely (the survey-side gate at `survey.py:104-109` enforces `replicate_weights` are mutually exclusive with `strata`/`psu`/`fpc`); honoring `cluster=` here would silently have no effect on the variance estimate while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the `CallawaySantAnna` guard at `staggered.py:1705-1719`. Either omit `cluster=` (the replicate weights encode the design structure implicitly) or use a non-replicate survey design with explicit `strata`/`psu`/`fpc`.
 
 ---
 
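
The CR1 formula quoted in the REGISTRY hunks above, `(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`, can be written out directly from cluster-summed influence-function values. The following is a rough sketch only: `cr1_if_variance` is a hypothetical helper name, not a function in `diff_diff`, and it assumes `psi` is already on the scale the REGISTRY formula expects.

```python
import numpy as np


def cr1_if_variance(psi, cluster_ids) -> float:
    """CR1 variance from per-unit influence-function values.

    Implements (G / (G - 1)) * sum_c (sum_{i in c} psi_i)^2 / n^2,
    where G is the number of distinct clusters and n the number of units.
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    # Map arbitrary cluster labels to codes 0..G-1.
    _, codes = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)
    # Sum psi within each cluster.
    cluster_sums = np.bincount(codes, weights=psi)
    n_clusters = cluster_sums.shape[0]
    if n_clusters < 2:
        raise ValueError("CR1 needs at least two clusters")
    return float(n_clusters / (n_clusters - 1) * np.sum(cluster_sums**2) / n**2)
```

When every unit is its own cluster, `G = n` and the expression reduces to `(n/(n-1)) · Σ_i ψ_i² / n²`, which is a quick sanity check on any implementation of the formula.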
diff --git a/tests/test_triple_diff.py b/tests/test_triple_diff.py
index 18c6f0c8..7910ba47 100644
--- a/tests/test_triple_diff.py
+++ b/tests/test_triple_diff.py
@@ -13,6 +13,7 @@
 import pandas as pd
 import pytest
 
+from diff_diff.survey import SurveyDesign
 from diff_diff.triple_diff import (
     TripleDifference,
     TripleDifferenceResults,
@@ -1139,3 +1140,479 @@ def test_cluster_changes_ses(self):
             f"({res_unit.se:.6f}) — the cluster= parameter may "
             "have regressed to a silent no-op."
         )
+
+
+def _ddd_survey_panel(seed: int = 71, n: int = 400) -> pd.DataFrame:
+    """Cross-sectional DDD data with survey columns for vcov_type bit-equal tests.
+
+    Mirrors ``tests/test_survey_phase3.py::ddd_survey_data`` but uses
+    ``default_rng`` for reproducibility independent of global state.
+    """
+    rng = np.random.default_rng(seed)
+    data = pd.DataFrame(
+        {
+            "outcome": rng.standard_normal(n) + 0.5,
+            "group": rng.choice([0, 1], n),
+            "partition": rng.choice([0, 1], n),
+            "time": rng.choice([0, 1], n),
+            "weight": rng.uniform(0.5, 2.0, n),
+            "stratum": rng.choice([1, 2, 3], n),
+        }
+    )
+    mask = (data["group"] == 1) & (data["partition"] == 1) & (data["time"] == 1)
+    data.loc[mask, "outcome"] += 1.5
+    return data
+
+
+def _ddd_replicate_panel(seed: int = 89, n: int = 200, n_rep: int = 10):
+    """DDD panel with JK1 replicate-weight columns for testing the
+    replicate-variance inference branch. Mirrors the pattern in
+    ``tests/test_survey_phase6.py::test_triple_diff_replicate_all_methods``
+    but uses ``default_rng`` for reproducibility independent of global state.
+
+    Returns (DataFrame with outcome/group/partition/time/weight + rep_0..rep_{n_rep-1},
+    list of replicate column names).
+    """
+    rng = np.random.default_rng(seed)
+    d1 = np.repeat([0, 1], n // 2)
+    d2 = np.tile([0, 1], n // 2)
+    post = rng.choice([0, 1], n)
+    y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + rng.standard_normal(n) * 0.5
+    w = 1.0 + rng.exponential(0.3, n)
+    data = pd.DataFrame(
+        {
+            "outcome": y,
+            "group": d1,
+            "partition": d2,
+            "time": post,
+            "weight": w,
+        }
+    )
+    cluster_size = n // n_rep
+    rep_cols = []
+    for r in range(n_rep):
+        w_r = w.copy()
+        start = r * cluster_size
+        end = min((r + 1) * cluster_size, n)
+        w_r[start:end] = 0.0
+        w_r[w_r > 0] *= n_rep / (n_rep - 1)
+        col = f"rep_{r}"
+        data[col] = w_r
+        rep_cols.append(col)
+    return data, rep_cols
+
+
+class TestTripleDifferenceVcovType:
+    """Phase 1b interstitial #2: vcov_type input contract on TripleDifference.
+
+    TripleDifference uses IF-based variance per Ortiz-Villavicencio &
+    Sant'Anna (2025); vcov_type is permanently narrow to {"hc1"}.
+    Analytical-sandwich families {classical, hc2, hc2_bm} and conley are
+    rejected at __init__ with methodology-rooted messages. Mirrors CS
+    PR #487 template at tests/test_staggered.py.
+
+    5-surface matrix:
+      1. Default preserved bit-equally (3 estimation methods)
+      2. Cluster path preserved bit-equally
+      3. Survey path preserved bit-equally (TSL and replicate-weight)
+      4. Input rejection at __init__ (methodology terminology)
+      5. fit()-time revalidation (set_params can't bypass)
+
+    Plus introspection tests for Results carrier, summary render,
+    to_dict, get_params, fit-clone idempotence, and convenience function.
+    """
+
+    # -- Surface 1: default behavior preserved bit-equally ---------------
+
+    @pytest.mark.parametrize("method", ["dr", "reg", "ipw"])
+    def test_default_hc1_bit_equal_baseline(self, method):
+        """vcov_type='hc1' (explicit) is bit-equal to the default for every
+        estimation method. Guards against drift between __init__ defaults
+        and Results construction when vcov_type was threaded through."""
+        data = generate_ddd_data(n_per_cell=80, true_att=2.0, seed=11)
+
+        r_default = TripleDifference(estimation_method=method).fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        r_explicit = TripleDifference(estimation_method=method, vcov_type="hc1").fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        assert r_default.att == r_explicit.att, f"[{method}] ATT not bit-equal"
+        assert r_default.se == r_explicit.se, f"[{method}] SE not bit-equal"
+
+    # -- Surface 2: cluster path preserved bit-equally -------------------
+
+    def test_cluster_hc1_bit_equal_baseline(self):
+        """cluster=<col> + vcov_type='hc1' bit-equal to cluster=<col> alone."""
+        data = _generate_ddd_data_with_state_clusters(seed=23)
+
+        r_default = TripleDifference(cluster="state").fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        r_explicit = TripleDifference(cluster="state", vcov_type="hc1").fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        assert r_default.att == r_explicit.att
+        assert r_default.se == r_explicit.se
+
+    # -- Surface 3: survey path preserved bit-equally --------------------
+
+    @pytest.mark.parametrize("method", ["dr", "reg", "ipw"])
+    def test_survey_hc1_bit_equal_baseline(self, method):
+        """survey_design + vcov_type='hc1' bit-equal to survey_design alone.
+
+        Pre-empt: CS PR #487 R2 caught a survey_metadata overload bug;
+        same risk class here when threading vcov_type alongside survey_design.
+        """
+        data = _ddd_survey_panel(seed=29)
+        sd = SurveyDesign(weights="weight", strata="stratum")
+
+        r_default = TripleDifference(estimation_method=method).fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            survey_design=sd,
+        )
+        r_explicit = TripleDifference(estimation_method=method, vcov_type="hc1").fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            survey_design=sd,
+        )
+        assert r_default.att == r_explicit.att, f"[{method}] survey ATT not bit-equal"
+        assert r_default.se == r_explicit.se, f"[{method}] survey SE not bit-equal"
+
+    # -- Surface 3b: replicate-weight survey path preserved bit-equally --
+
+    @pytest.mark.parametrize("method", ["dr", "reg", "ipw"])
+    def test_replicate_survey_hc1_bit_equal_baseline(self, method):
+        """Replicate-weight survey design + vcov_type='hc1' bit-equal to
+        replicate-weight survey design alone. Exercises the distinct
+        replicate-df branch in fit() (separate from the TSL branch in
+        Surface 3 above).
+
+        Addresses codex R5 P1 (.claude/reviews/local-review-latest.md):
+        the prior survey bit-equal coverage only exercised the analytical
+        TSL path; the replicate-variance path was unverified."""
+        data, rep_cols = _ddd_replicate_panel(seed=89)
+        sd = SurveyDesign(
+            weights="weight",
+            replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        r_default = TripleDifference(estimation_method=method).fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            survey_design=sd,
+        )
+        r_explicit = TripleDifference(estimation_method=method, vcov_type="hc1").fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            survey_design=sd,
+        )
+        assert r_default.att == r_explicit.att, f"[{method}] replicate ATT not bit-equal"
+        assert r_default.se == r_explicit.se, f"[{method}] replicate SE not bit-equal"
+        # Results-surface assertion: vcov_type carries through on the
+        # replicate path AND summary still suppresses the raw variance
+        # line (survey block remains the canonical surface).
+        assert r_explicit.vcov_type == "hc1"
+        assert r_explicit.survey_metadata is not None
+        out = r_explicit.summary()
+        assert "Survey Design" in out
+        assert "Variance estimator" not in out
+
+    @pytest.mark.parametrize("method", ["dr", "reg", "ipw"])
+    def test_cluster_plus_replicate_weights_rejected(self, method):
+        """cluster= + survey_design(replicate_weights=...) raises
+        NotImplementedError because replicate-weight variance is computed
+        by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores
+        PSU/cluster entirely — honoring the cluster argument would silently
+        have no effect on the variance estimate.
+
+        Addresses codex R7 P1 (.claude/reviews/local-review-latest.md):
+        the silent no-op was caught by direct interpreter inspection of
+        the new JK1 replicate fixture. Mirrors CallawaySantAnna's guard
+        at diff_diff/staggered.py:1705-1719 (CS PR #487)."""
+        data, rep_cols = _ddd_replicate_panel(seed=89)
+        # Add a 'state' column to attempt as the cluster argument
+        rng = np.random.default_rng(seed=89)
+        data["state"] = rng.choice(range(5), size=len(data))
+        sd = SurveyDesign(
+            weights="weight",
+            replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="replicate-weight"):
+            TripleDifference(estimation_method=method, cluster="state").fit(
+                data,
+                outcome="outcome",
+                group="group",
+                partition="partition",
+                time="time",
+                survey_design=sd,
+            )
+
+    # -- Surface 4: input rejection at __init__ --------------------------
+
+    def test_reject_classical_at_init(self):
+        with pytest.raises(ValueError, match="influence-function"):
+            TripleDifference(vcov_type="classical")
+
+    def test_reject_hc2_at_init(self):
+        with pytest.raises(ValueError, match="Ortiz-Villavicencio"):
+            TripleDifference(vcov_type="hc2")
+
+    def test_reject_hc2_bm_at_init(self):
+        with pytest.raises(ValueError, match="hat matrix"):
+            TripleDifference(vcov_type="hc2_bm")
+
+    def test_reject_hc2_bm_at_init_bm_keyword(self):
+        """Distinct keyword pin: Bell-McCaffrey terminology in the message."""
+        with pytest.raises(ValueError, match="Bell-McCaffrey"):
+            TripleDifference(vcov_type="hc2_bm")
+
+    def test_reject_conley_at_init(self):
+        with pytest.raises(ValueError, match="spatial-HAC"):
+            TripleDifference(vcov_type="conley")
+
+    def test_reject_conley_at_init_todo_pointer(self):
+        """Conley rejection cites the TODO follow-up row."""
+        with pytest.raises(ValueError, match="TODO"):
+            TripleDifference(vcov_type="conley")
+
+    def test_reject_unknown_vcov_type(self):
+        """Generic membership rejection for unrecognized values."""
+        with pytest.raises(ValueError, match="invalid"):
+            TripleDifference(vcov_type="hc4")
+
+    # -- Surface 5: fit()-time revalidation (set_params can't bypass) ----
+
+    def test_set_params_bad_vcov_caught_at_fit_time(self):
+        """set_params strictly mirrors sklearn (it mutates without validating),
+        but fit() re-validates, so a bad set_params(vcov_type='hc4') surfaces
+        a clear error at fit-time rather than silently propagating a bad
+        value to Results metadata. Mirrors CS
+        tests/test_staggered.py::test_set_params_bad_vcov_caught_at_fit_time."""
+        td = TripleDifference()
+        # set_params succeeds (sklearn-style mutate-then-validate-at-use)
+        td.set_params(vcov_type="hc4")
+        assert td.vcov_type == "hc4"
+        # fit() re-validates and raises
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=37)
+        with pytest.raises(ValueError, match="hc4"):
+            td.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                partition="partition",
+                time="time",
+            )
+
+    def test_set_params_bad_vcov_classical_caught_at_fit_time(self):
+        """Same as above but with an IF-incompatible family (classical).
+        Catches the silent-propagation path on the methodology-rooted
+        rejection branch."""
+        td = TripleDifference()
+        td.set_params(vcov_type="classical")
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=39)
+        with pytest.raises(ValueError, match="influence-function"):
+            td.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                partition="partition",
+                time="time",
+            )
+
+    # -- Introspection contract -------------------------------------------
+
+    def test_default_vcov_type_is_hc1(self):
+        """Attribute default sanity (pre-fit)."""
+        assert TripleDifference().vcov_type == "hc1"
+
+    def test_get_params_includes_vcov_type(self):
+        td = TripleDifference()
+        params = td.get_params()
+        assert "vcov_type" in params
+        assert params["vcov_type"] == "hc1"
+
+    def test_results_carries_vcov_type(self):
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=43)
+        res = TripleDifference().fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        assert res.vcov_type == "hc1"
+
+    def test_to_dict_includes_vcov_type(self):
+        """CS R7 caught the same Results-introspection gap on the dict surface."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=47)
+        res = TripleDifference().fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        d = res.to_dict()
+        assert "vcov_type" in d
+        assert d["vcov_type"] == "hc1"
+
+    def test_summary_includes_vcov_type(self):
+        """Default (no cluster, no survey) renders the variance-family label
+        via the shared _format_vcov_label, not the raw vcov_type string."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=51)
+        res = TripleDifference().fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        out = res.summary()
+        assert "Variance estimator" in out
+        assert "HC1 heteroskedasticity-robust" in out
+
+    def test_summary_cluster_label_is_cr1_not_raw_hc1(self):
+        """Cluster fit renders the cluster-aware CR1 Liang-Zeger label rather
+        than 'hc1', since the actual algebra is CR1 on the combined IF.
+        Addresses codex local-review P2 — raw 'hc1' line was misleading."""
+        data = _generate_ddd_data_with_state_clusters(seed=53)
+        res = TripleDifference(cluster="state").fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        out = res.summary()
+        assert "CR1 cluster-robust at state" in out
+        # G=<n_clusters> suffix present
+        assert f"G={res.n_clusters}" in out
+
+    def test_summary_no_variance_estimator_line_under_survey(self):
+        """Survey fit suppresses the variance-estimator line; the Survey Design
+        block above already names design + n_psu + df. The analytical SE is
+        TSL on the combined IF (or replicate refit), not the raw hc1 sandwich,
+        so a 'Variance estimator: hc1' line would be misleading. Addresses
+        codex local-review P2 + P3 (summary regression coverage gap)."""
+        data = _ddd_survey_panel(seed=29)
+        sd = SurveyDesign(weights="weight", strata="stratum")
+        res = TripleDifference(estimation_method="reg").fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            survey_design=sd,
+        )
+        out = res.summary()
+        # The Survey Design block remains the canonical surface
+        assert "Survey Design" in out
+        # No misleading variance-estimator line on survey-backed fits
+        assert "Variance estimator" not in out
+
+    def test_results_cluster_name_carries_through(self):
+        """cluster_name field on Results: populated when cluster= set, None otherwise.
+        Mirrors CS PR #487 pattern; consumed by _format_vcov_label in summary()."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=63)
+        r_none = TripleDifference().fit(
+            data, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        assert r_none.cluster_name is None
+
+        data2 = _generate_ddd_data_with_state_clusters(seed=67)
+        r_cluster = TripleDifference(cluster="state").fit(
+            data2, outcome="outcome", group="group", partition="partition", time="time"
+        )
+        assert r_cluster.cluster_name == "state"
+        # And it flows through to_dict
+        d = r_cluster.to_dict()
+        assert d.get("cluster_name") == "state"
+
+    def test_cluster_name_suppressed_under_survey_design(self):
+        """When survey_design overrides the bare cluster= argument, the Results
+        cluster_name + n_clusters fields are suppressed so they don't misreport
+        the ignored argument. The Survey Design block on summary() is the
+        canonical surface for cluster/PSU reporting on survey-backed fits.
+
+        Addresses codex local-review R2 P2 (.claude/reviews/local-review-latest.md):
+        Under cluster='state' + survey_design(psu='psu') with conflicting
+        partitions, _resolve_effective_cluster picks survey_design.psu and
+        warns; the records on Results should reflect that, not the raw
+        `self.cluster` argument the user passed."""
+        # Build DDD survey panel with BOTH a 'state' column (user's cluster=)
+        # and a 'psu' column (survey_design.psu) at DIFFERENT partitions.
+        # The survey-design PSU wins; cluster= is overridden with a warning.
+        data = _ddd_survey_panel(seed=83).copy()
+        rng = np.random.default_rng(seed=83)
+        # 'psu' (5 groups) is drawn independently of 'state' (20 groups),
+        # so the two groupings conflict rather than nest.
+        data["state"] = rng.choice(range(20), size=len(data))
+        data["psu"] = rng.choice(range(5), size=len(data))
+
+        sd = SurveyDesign(weights="weight", psu="psu")
+        with pytest.warns(UserWarning, match="PSU will be used"):
+            res = TripleDifference(estimation_method="reg", cluster="state").fit(
+                data,
+                outcome="outcome",
+                group="group",
+                partition="partition",
+                time="time",
+                survey_design=sd,
+            )
+        # cluster_name + n_clusters suppressed under survey-backed fit
+        assert res.cluster_name is None, (
+            f"cluster_name should be suppressed under survey-backed fit, "
+            f"got {res.cluster_name!r} (the raw cluster= argument)"
+        )
+        assert res.n_clusters is None, (
+            f"n_clusters should be suppressed under survey-backed fit, "
+            f"got {res.n_clusters} (would be raw data['state'].nunique())"
+        )
+        # And to_dict doesn't leak the misleading raw cluster
+        d = res.to_dict()
+        assert d.get("cluster_name") is None
+        assert d.get("n_clusters") is None
+        # Survey block remains the canonical surface for cluster/PSU reporting
+        assert "Survey Design" in res.summary()
+        assert res.survey_metadata is not None
+
+    def test_fit_clone_idempotent_on_vcov_type(self):
+        """get_params -> reconstruct -> refit -> identical SE.
+        Catches drift between __init__ defaults, attribute storage, and
+        Results construction (sklearn clone() pattern)."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=57)
+        td1 = TripleDifference(vcov_type="hc1")
+        r1 = td1.fit(data, outcome="outcome", group="group", partition="partition", time="time")
+        td2 = TripleDifference(**td1.get_params())
+        r2 = td2.fit(data, outcome="outcome", group="group", partition="partition", time="time")
+        assert r1.att == r2.att
+        assert r1.se == r2.se
+        assert r2.vcov_type == "hc1"
+
+    # -- Convenience function threading ----------------------------------
+
+    def test_triple_difference_convenience_func_rejects_invalid_vcov_type(self):
+        """Invalid vcov_type rejected at the function entry point too."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=59)
+        with pytest.raises(ValueError, match="influence-function"):
+            triple_difference(
+                data,
+                outcome="outcome",
+                group="group",
+                partition="partition",
+                time="time",
+                vcov_type="classical",
+            )
+
+    def test_triple_difference_convenience_func_threads_valid_vcov_type(self):
+        """Valid vcov_type='hc1' fits successfully AND lands on Results."""
+        data = generate_ddd_data(n_per_cell=40, true_att=2.0, seed=61)
+        res = triple_difference(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+            vcov_type="hc1",
+        )
+        assert res.vcov_type == "hc1"
+        assert np.isfinite(res.att)
+        assert np.isfinite(res.se)