diff --git a/CHANGELOG.md b/CHANGELOG.md
index ff5e4c7e..bf72670a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+- **`CallawaySantAnna.cluster=` silent no-op (Phase 1b interstitial).** `CallawaySantAnna(cluster="state").fit(...)` previously accepted the argument, stored it, and returned it from `get_params()`, but never consumed it anywhere in the fit / aggregator / bootstrap pipeline. The docstring at `staggered.py:154-156` claimed "Defaults to unit-level clustering", yet for bare `cluster=X` the aggregator at `staggered_aggregation.py:193-213` still computed per-unit IF variance and the bootstrap at `staggered_bootstrap.py:323-347` still drew per-unit multiplier weights. Users who explicitly set `cluster="state"` therefore got per-unit inference with no warning, typically with SEs that are too small under intra-cluster correlation. **Survey-PSU clustering via `survey_design=SurveyDesign(psu="state")` was NOT affected** and continued to cluster correctly via `_compute_stratified_psu_meat`. The fix synthesizes a minimal `SurveyDesign(psu=self.cluster, weight_type="pweight")` when bare `cluster=` is set without an explicit survey design and threads the synthesized PSU through the existing survey-PSU machinery (aggregator + bootstrap). A new dedicated `df_inference` field on `CallawaySantAnnaResults` carries the cluster-level df on the bare-cluster synthesize path ONLY. On that path `survey_metadata` is intentionally left `None`, so the `survey_metadata is not None` checks in `DiagnosticReport` (`diagnostic_report.py:848-856` and `:1150-1158`, which skip with the reason "Original fit used a survey design") and the survey block rendered by `summary()` (`staggered_results.py:235-238`) keep treating the fit as non-survey. `HonestDiD` (`honest_did.py`) reads `survey_metadata.df_survey` first (the df CS actually used, which may be tightened after design resolution for replicate designs) and falls back to `df_inference` for bare-cluster fits, so downstream consumers see the cluster df without overriding a recomputed survey df. 
When `survey_design=SurveyDesign(weights=Y)` is provided without a PSU AND `cluster=X` is also set, `_inject_cluster_as_psu` injects the bare cluster as the effective PSU, and `effective_survey_design = replace(survey_design, psu=self.cluster)` is built so that the downstream `_validate_unit_constant_survey` check, now running on a PSU-bearing design, catches movers (units that switch clusters across periods) in panel data; `survey_metadata` is recomputed to reflect the injected PSU. When both `cluster=X` AND `survey_design.psu=Y` are set, the explicit PSU wins via `_resolve_effective_cluster`, which emits a `UserWarning` if the two partitions differ. **`cluster=` combined with `SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`**: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely (`survey.py:104-109` makes `replicate_weights` mutually exclusive with strata/psu/fpc), so honoring a bare `cluster=` would silently have no effect while misleadingly populating `cluster_name` / `n_clusters` on the Results object. 
Regression tests pin the fix on both the panel and repeated-cross-section paths, plus the survey/non-survey contract boundaries: `test_cluster_robust_ses_differ_from_unit_level`, `test_bare_cluster_works_with_panel_false_rcs`, `test_bare_cluster_synthesizes_survey_design`, `test_inject_branch_panel_mover_raises`, `test_replicate_weight_plus_cluster_rejected`, `test_bare_cluster_populates_df_inference` (asserts the dedicated cluster-df carrier is set), `test_bare_cluster_does_not_set_survey_metadata` (asserts the survey/non-survey contract is preserved: `DiagnosticReport` / `summary()` must not treat a bare-cluster fit as survey-backed), `test_explicit_survey_design_does_populate_survey_metadata` (asserts the inject-branch path still populates `survey_metadata` for a legitimate user-provided `SurveyDesign`), and `test_bare_cluster_honest_did_uses_df_inference` (end-to-end: HonestDiD threads `df_inference` into `HonestDiDResults.df_survey`, guarding against a silent regression to normal-theory inference in a future refactor). With `cluster=None` (the default), behavior is bit-identical to the pre-PR code (the new wiring is guarded by `if self.cluster is not None:`). An audit confirmed the no-op was CS-specific: the other 7 Phase 1b estimators (SunAbraham, StackedDiD, WooldridgeDiD, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD) handle bare `cluster=` correctly.
+
 ### Added
+- **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts only `"hc1"` (the default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021), built from the per-(g,t) doubly-robust / IPW / outcome-regression structure, and has no single design matrix from which to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF. The narrow contract is permanent for CS and is intended to apply to the other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF, activated by the `cluster=` wiring fix above. Documented in the `docs/methodology/REGISTRY.md` subsection "IF-based variance estimators vs analytical-sandwich estimators". `vcov_type`, `cluster_name`, `n_clusters`, and `df_inference` are added to `CallawaySantAnnaResults`; for `cluster_name` the canonical PSU column wins (`survey_design.psu` when an explicit PSU is provided, `self.cluster` when the bare cluster is synthesized or injected). `set_params(vcov_type=...)` mirrors the SunAbraham pattern (mutate, then refresh `_vcov_type_explicit`; no atomic validation), and `fit()` re-validates `vcov_type` at use time so that, for example, a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit time instead of silently propagating into Results metadata. This is an **interstitial PR** rather than the full Phase 1b PR 4/8 `vcov_type` threading: the narrow surface is dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full `{classical, hc1, hc2, hc2_bm}` threading) resumes on a different estimator after this merges.
+- **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses`, asserting that `TripleDifference(cluster="state")` produces SEs that differ from the `cluster=None` SEs by more than `1e-6` on a fixed-seed panel with state-level random effects. This defensive coverage closes a test gap found during the Phase 1b cluster-wiring audit: TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.
 - **TwoStageDiD: parity with SpilloverDiD Wave E.3 — always-treated unit drop preserves full-domain survey design via zero-padded scores.** Closes the parity follow-up tracked at `TODO.md` after PR #482 (SpilloverDiD Wave E.3, merge `24de9062`). When TwoStageDiD detects always-treated units (`first_treat <= min_time`) and removes them from the OLS sample, the resolved survey design retains its FULL-DOMAIN `n_psu` / `n_strata` / `df_survey` / `strata` / `fpc` / `psu` arrays instead of being subsetted via `replace(resolved_survey, ...)`. Per-cluster stage-1 / stage-2 score aggregates are computed at the post-drop fit-sample length and then zero-padded onto the full-domain unique-PSU list before stratified-meat dispatch via two new optional kwargs on `_compute_gmm_variance`: `score_pad_mask` (full-domain boolean keep mask) and `cluster_ids_full` (full-domain post-injection PSU labels). PSUs containing only always-treated rows get zero score rows but still count toward `G_full` for `n_psu` / `df_survey` accounting. **Documented synthesis (library-convention adoption, NOT new methodology):** adopts the canonical "zero-pad scores + retain full-design resolved survey" convention from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation), `diff_diff/prep.py:1401-1432` (DCDH cell variance), and `diff_diff/spillover.py` (PR #482 Wave E.3). 
**Mechanical realization:** `two_stage.py:1485-1525` design-subset block deleted (the `replace(resolved_survey, ...)` subset + `n_psu` / `n_strata` recompute + post-drop `compute_survey_metadata` call); `keep_mask` promoted to `fit()`-level scope (always defined, all-True when no always-treated drop); `survey_weights = survey_weights[keep_mask.values]` retained for stage-1 / stage-2 OLS arithmetic; cluster injection block updated to source `cluster_ids_raw` from FULL-DOMAIN `data[cluster_var].values` (not post-drop `df[cluster_var].values`) so `_inject_cluster_as_psu`'s zip against `resolved_survey.strata` (full-domain) stays length-aligned; `df["_survey_cluster"]` aligned to post-drop length via `resolved_survey.psu[keep_mask.values]`; post-injection `compute_survey_metadata` uses full-domain `raw_w` from `data[survey_design.weights]`. `_compute_gmm_variance` adds the zero-pad expansion after the per-cluster aggregation (mapping fit-sample `unique_clusters` into `unique_clusters_full` positions via `np.searchsorted`) and updates the strata/fpc `obs_idx` lookups to use `cluster_ids_for_lookup = cluster_ids_full` when padding is active. The three inner stage-2 methods (`_stage2_static`, `_stage2_event_study`, `_stage2_group`) thread the new kwargs through; bootstrap-resample call sites keep default `None` (no behavior change on bootstrap path). **Always-treated warning text updated:** "Associated survey weights subsetted for stage-1 / stage-2 OLS; full-domain survey design retained for variance estimation (Wave E.3 parity)." replaces the prior "and design arrays adjusted" claim. **No-survey path unchanged:** when `resolved_survey is None`, both `score_pad_mask` and `cluster_ids_full` default to `None` and the existing post-drop scoring path runs bit-identically. **Replicate variance + always-treated drop:** existing path unchanged (replicate refit handles resampling at the survey-design level; `score_pad_mask_arg` is `None` on `_uses_replicate_ts` paths). 
**Tests:** new `TestTwoStageDiDWaveE3ParityAlwaysTreated` class in `tests/test_two_stage.py` (8 tests: no-always-treated baseline, full-domain `df_survey` preservation under drop, full-domain `n_psu` reporting, per-cluster zero-pad mock-spy on `_compute_stratified_meat_from_psu_scores`, subpopulation + always-treated composition, cluster-as-PSU + always-treated, no-survey path unchanged, PSU entirely-always-treated). REGISTRY.md TwoStageDiD section gains a "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 section updated to mark the TwoStageDiD parity follow-up as shipped.
 - **WooldridgeDiD `vcov_type` parameter, OLS path (Phase 1b PR 3/8).** `WooldridgeDiD(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` on `method="ols"` (defaults to `"hc1"`, preserves prior behavior at machine precision — the WLS-CR1 sandwich is algebraically invariant between the prior within-transform path and the new branched path, differing only by float64 multiplication ordering at sub-ULP scale; the full 106-test `tests/test_wooldridge.py` baseline still passes unchanged). `hc2_bm` auto-routes to a full-dummy saturated design (`[intercept, X_design, unit_dummies, time_dummies]`) + clubSandwich WLS-CR2 algebra (PR #475) — matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at `atol=1e-10` on the new `benchmarks/data/wooldridge_golden.json` fixture. `classical`/`hc2` supported via full-dummy + auto-drop of the unit auto-cluster (one-way families); explicit `cluster="X"` + one-way family raises at the linalg validator. Per-cell + aggregate p-values/CIs on `classical`/`hc2` paths use the residual DOF `n - rank(X)` (matches R `lm()` / `coef_test()` t-distribution), not normal-theory. **Bell-McCaffrey Satterthwaite DOF is threaded across ALL hc2_bm user-facing inference surfaces**: (1) per-cell `group_time_effects[(g, t)]` use `coef_test()$df_Satt` (matches R at atol=1e-6 from CI inversion); (2) overall ATT uses the post-period-aggregation contrast DOF from `_compute_cr2_bm_contrast_dof` (matches R `Wald_test(test="HTZ")$df_denom` at atol=1e-10); (3) `.aggregate("group" | "calendar" | "event")` recomputes contrast-specific BM DOFs lazily from BM artifacts stored on the Results object — the REDUCED kept-column design (`X_red`), cluster_ids, reduced bread matrix, and reduced-space coef-index map (using the reduced kept-column design after rank-deficient drops keeps the bread non-singular and matches the subspace `solve_ols` actually estimated in). 
Fail-closed (all-NaN inference) when BM DOF unavailable, mirrors PR #475 R7 and PR #479 R3. `method ∈ {"logit","poisson"}` + `vcov_type != "hc1"` raises `NotImplementedError` at `__init__` (GLM CR2-BM-on-pseudo-residuals composition needs derivation; deferred to follow-up TODO row). `SurveyDesign` + `vcov_type != "hc1"` raises `NotImplementedError` at `fit()` (survey TSL overrides analytical sandwich). `n_bootstrap > 0` + one-way (`hc2`/`classical`) raises at `fit()` regardless of `cluster=` setting (multiplier bootstrap is intrinsically clustered, but one-way vcov_type does not compose with cluster_ids — either the auto-cluster is dropped when `cluster=None` leaving the bootstrap with no cluster to draw at, or the linalg validator rejects one-way + cluster_ids when `cluster=X`). `conley` rejected at `__init__` with a deferral pointer. `vcov_type`, `cluster_name`, `n_clusters` added to `WooldridgeDiDResults` for downstream introspection (per `feedback_results_vcov_label_cluster_metadata`). Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs to follow: CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
 - **`SpilloverDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention via zero-pad scores (Wave E.3).** Closes the Wave E.1/E.2/follow-up documented limitation at `REGISTRY.md:3249`: `SurveyDesign.subpopulation()`-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design — `n_psu` / `n_strata` / `df_survey` / Binder TSL per-stratum centering reflect the FULL domain rather than the post-`finite_mask` fit sample. **Documented synthesis (library-convention adoption, NOT new methodology):** Wave E.3 adopts the canonical "zero-pad scores to full panel + retain full-design resolved survey" pattern from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation lead regression — Omega_0 scores zero-padded back to full panel length) and `diff_diff/prep.py:1401-1432` (DCDH cell variance — IF zero-padded outside the cell). Wave E.3 propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat. **Mechanical realization (one new `_compute_gmm_corrected_meat` kwarg):** the gamma_hat / Psi build stays on SURVEY-FINITE-MASK inputs (`X_1_sparse_fit`, `X_10_sparse_fit`, `eps_10_fit` built on `survey_finite_mask = finite_mask & survey_weights > 0`; `X_2_kept_gamma`, `eps_2_fit_gamma`, `survey_weights_fit_gamma` projected from the fit-sample frame down to survey_finite_mask) so the drop-first stage-1 FE column space is bit-identical to the pre-E.3 path. `_compute_gmm_corrected_meat` gains a new optional kwarg `score_pad_mask: Optional[np.ndarray] = None`: when supplied, the helper zero-pads the fit-sample `Psi` to full panel length AFTER construction but BEFORE kernel dispatch via `Psi_padded[score_pad_mask] = Psi`. 
Kernel-dispatch arrays (`cluster_ids`, `conley_coords`, `conley_time`, `conley_unit`, `resolved_survey`) are passed at FULL length so the meat helpers (Binder TSL / stratified-Conley / serial Bartlett) see the full-domain PSU / strata / centroid / time geometry. The `_validate_conley_kwargs` call inside the helper reads `n_for_conley = len(score_pad_mask)` when the kwarg is set so the Conley shape checks see the full-length geometry. **`gamma_hat` invariance:** the gamma_hat solve operates on fit-sample inputs throughout — bit-identical to the pre-E.3 path (critical for the case where `_build_butts_fe_design_csr`'s `pd.factorize` re-compaction would drop a different unit's column under a full-length FE build than under a fit-length one). **Bread invariance:** `A_22 = X_2_kept' W X_2_kept` at `spillover.py:3187-3214` still uses fit-length `X_2_kept` because `A_22_full = X_2_full' W_full X_2_full` equals `A_22_kept` when zero-weight rows contribute zero. **A2 invariant:** warn-and-drop and `SurveyDesign.subpopulation()` drops are treated identically — both apply the zero-pad mechanism. The "both mechanisms compose cleanly" case (subpop-excluded row that is ALSO warn-and-dropped) produces `Psi = 0` from either cause; the PSU still counts toward `n_psu_full`. Hand-computation methodology anchor at `_scratch/wave_e3_smoke.py` codifies the A2 invariant on 4 PSU × 4 period × 3 obs synthetic. **Subpopulation parity vs upstream-subset:** `df_survey` matches the full domain regardless of how many rows the subpopulation mask excludes (mirrors R `svyglm(design=subset(d, mask))` vs `svyglm(design=svydesign(data=data[mask], ...))`). SE may differ by design — subpopulation retains zero-padded PSU geometry; upstream-subset drops PSUs entirely. 
**Pre-E.3 baseline parity:** when `finite_mask.all() == True` AND all weights `> 0`, the Wave E.3 zero-pad is a no-op — ATT + SE + n_psu + df_survey match pre-E.3 baseline values via FIXED GOLDEN values at `test_c` (`rtol=1e-12, atol=1e-12`). **Cross-surface n_psu consistency:** top-level `res.n_psu` reads from `len(resolved_survey_fit.weights)` on the implicit-PSU branch (was `int(finite_mask.sum())` pre-codex-R1-P2-fix); this keeps `res.n_psu == res.survey_metadata.n_psu` on weights-only / strata-only survey designs under warn-and-drop. Regression at `test_c2`. **Restrictions inherited:** replicate-weight variance + subpopulation continues to raise `NotImplementedError` at the Wave E.1 gate. TwoStageDiD's analogous `finite_mask + design-subset` pattern at `two_stage.py:567-601` is NOT yet adopted to Wave E.3 — separate parity follow-up tracked in `TODO.md` (an expected-divergence test was attempted but TwoStageDiD's always-treated handling at `two_stage.py:294-336` differs from SpilloverDiD's per-unit Omega_0 check, so the divergence didn't materialize on the standard fixture; the parity follow-up should add its own targeted regression). **Implementation:** `spillover.py:2845-2896` design-subset block deleted; `survey_weights_fit = survey_weights[finite_mask]` retained for the stage-2 OLS solve which still operates on the fit sample; `cluster_ids_full[finite_mask]` subset dropped on the survey path. `_compute_gmm_corrected_meat` call at `spillover.py:3163` now receives FIT-LENGTH gamma_hat-construction inputs (unchanged) plus FULL-LENGTH kernel-dispatch arrays (`cluster_ids_for_meat`, `conley_*_for_meat`, `resolved_survey_fit`) plus the new `score_pad_mask=survey_finite_mask` kwarg; no-survey path passes `score_pad_mask=None` and uses fit-length variables throughout (bit-identical to pre-E.3). 
`_compute_gmm_corrected_meat` at `two_stage.py:62-80` adds one new optional kwarg `score_pad_mask: Optional[np.ndarray] = None` and one post-Psi-construction zero-pad block; the `_validate_conley_kwargs` call uses `n_for_conley = len(score_pad_mask)` when the kwarg is set. Within-unit-constancy validator at `spillover.py:2913` updated to operate on full-length unit array. Second `compute_survey_metadata` recompute at `spillover.py:2954-2959` uses full-length `raw_w`. No `_compute_stratified_meat_from_psu_scores` / `_compute_stratified_conley_meat` / `_compute_stratified_serial_bartlett_meat` signature changes. **Tests:** new `TestSpilloverDiDWaveE3SubpopulationFullDesign` and `TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy` classes in `tests/test_spillover.py` (19 tests: pre-E.3 baseline parity via pinned goldens, n_psu cross-surface consistency on implicit-PSU branch, A2 invariant (zero-pad mechanics via mock-spy), subpopulation × explicit-PSU parity, conley + lag>0 + subpopulation × explicit-PSU / cluster-injection / weights-only branches, cluster-as-PSU + subpopulation parity, unit with BOTH zero weight AND no Omega_0 support, gamma_hat-build sample excludes zero-weight rows, n_obs / n_treated / n_control / n_far_away_obs reflect count_mask, warn-drop SE drift golden, ATT bit-equality under PSU-last-sort exclusion, exact event-study n_obs propagation, event-study on both is_staggered branches with analytical + conley+lag variants). Pre-existing Wave E.1 `test_p2_finite_mask_forces_drop_under_survey` assertion flipped from `n_psu=8` (subset) to `n_psu=10` (full domain) to reflect the new contract.
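The Fixed entry and the `vcov_type` entry above distinguish per-unit IF variance (`cluster=None`) from CR1 Liang-Zeger on the IF (`cluster=X`). The NumPy sketch below illustrates only that distinction. `if_variance` is a hypothetical helper, not the library's code path (which routes clustering through its survey-PSU machinery), and it assumes an unweighted, mean-zero per-unit influence function normalized by `n**2`.

```python
import numpy as np


def if_variance(psi, cluster_ids=None):
    """Variance of an estimator whose per-unit influence function is ``psi``.

    Hypothetical sketch; assumes ``psi`` is mean-zero with one entry per unit.
    cluster_ids=None -> per-unit IF variance: sum(psi**2) / n**2.
    cluster_ids given -> CR1 on cluster sums: G/(G-1) * sum(S_g**2) / n**2.
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    if cluster_ids is None:
        return float(np.sum(psi**2) / n**2)
    # Map arbitrary cluster labels to 0..G-1, then sum the IF within each cluster.
    _, inverse = np.unique(np.asarray(cluster_ids), return_inverse=True)
    inverse = inverse.ravel()
    n_clusters = int(inverse.max()) + 1
    if n_clusters < 2:
        raise ValueError("CR1 needs at least two clusters")
    cluster_sums = np.bincount(inverse, weights=psi, minlength=n_clusters)
    return float(n_clusters / (n_clusters - 1) * np.sum(cluster_sums**2) / n**2)


# Units 0-1 share a cluster and have positively correlated IF values.
psi = np.array([1.0, 1.0, -1.0, -1.0])
print(if_variance(psi))                        # 0.25 (per-unit)
print(if_variance(psi, ["a", "a", "b", "b"]))  # 1.0 (clustered)
```

When IF values are positively correlated within clusters, the clustered variance exceeds the per-unit one, which is why a silently ignored `cluster=` tends to understate SEs.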
diff --git a/TODO.md b/TODO.md
index 8f221f59..f4a6286c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -99,11 +99,13 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD`; Phase 1b PR 3/8 added `WooldridgeDiD` OLS path (this row tracks the remaining 5). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added the `WooldridgeDiD` OLS path. **An interstitial PR (after PR 3/8) handled `CallawaySantAnna` separately**: CS uses IF-based variance per Callaway & Sant'Anna (2021) Theorem 2, so its `vcov_type` contract is permanently narrowed to `{"hc1"}` (the analytical-sandwich families do not apply to it); the interstitial also fixed CS's bare-`cluster=` silent no-op. This row tracks the remaining 4 (ImputationDiD and EfficientDiD are also IF-based and will likely adopt the same narrow contract). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`. The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
+| Extend `CallawaySantAnna` with `vcov_type="conley"`. This would require deriving a spatial-HAC composition for per-unit influence functions (Conley 1999 spatial kernel × per-(g,t) IF aggregation); no reference implementation exists today. The Phase 1b interstitial PR rejects it at `__init__` with a deferral pointer to this row. | `diff_diff/staggered.py` | follow-up | Low |
+| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but two equivalent ways to express the same intent add redundant API surface. The same question would apply to ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. | `diff_diff/staggered.py` | follow-up | Low |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
 <!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
      form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
@@ -196,7 +198,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- Thread `vcov_type` through the 5 remaining standalone estimators: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS)
+- Thread `vcov_type` through the 4 remaining standalone estimators: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; an interstitial PR after 3/8 permanently narrowed CallawaySantAnna to `{hc1}` because of its IF-based variance and fixed the bare-`cluster=` silent no-op)
 - SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)
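The new TODO rows and the CallawaySantAnna Fixed entry describe how a bare `cluster=` interacts with `survey_design=`: synthesize a PSU-only design when none is given, inject the cluster as PSU into a design that lacks one, let an explicit PSU win (warning when the partitions differ), and reject replicate-weight designs. The sketch below encodes only that decision table. `Design`, `same_partition`, and `resolve_design` are hypothetical stand-ins, not `diff_diff.SurveyDesign`, `_inject_cluster_as_psu`, or `_resolve_effective_cluster`, and `data` is assumed to be a plain mapping from column name to row labels.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class Design:
    """Hypothetical stand-in for a survey design (column names only)."""
    weights: Optional[str] = None
    psu: Optional[str] = None
    replicate_weights: Optional[Sequence[str]] = None
    weight_type: str = "pweight"


def same_partition(a, b):
    """True when label sequences a and b group the rows identically."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))


def resolve_design(cluster, design, data):
    """Decision table for a bare ``cluster=`` column (illustrative sketch)."""
    if cluster is None:
        return design  # default path: design left untouched
    if design is None:
        return Design(psu=cluster, weight_type="pweight")  # synthesize
    if design.replicate_weights:
        # Replicate reweighting ignores PSU/cluster, so cluster= would be a no-op.
        raise NotImplementedError("cluster= has no effect on replicate-weight variance")
    if design.psu is None:
        return replace(design, psu=cluster)  # inject cluster as the PSU
    if not same_partition(data[cluster], data[design.psu]):
        warnings.warn("explicit survey PSU overrides cluster=", UserWarning)
    return design  # explicit PSU wins
```

Keeping the rules in one function makes the redundancy raised in the deprecation row concrete: the synthesize branch and an explicit `Design(psu=cluster)` resolve to the same object.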
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 86dbfac9..b6932d6e 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -188,7 +188,7 @@ CallawaySantAnna(
     anticipation: int = 0,                       # Anticipation periods
     estimation_method: str = "dr",               # "dr", "ipw", or "reg"
     alpha: float = 0.05,
-    cluster: str | None = None,                  # Defaults to unit-level clustering
+    cluster: str | None = None,                  # Cluster col; activates CR1 on the IF via synthesized SurveyDesign(psu=col). None → per-unit IF.
     n_bootstrap: int = 0,                        # 0 = analytical SEs, 999+ recommended
     bootstrap_weights: str | None = None,        # "rademacher", "mammen", or "webb"
     seed: int | None = None,
@@ -196,6 +196,7 @@ CallawaySantAnna(
     base_period: str = "varying",                # "varying" or "universal"
     cband: bool = True,                          # Simultaneous confidence bands
     pscore_trim: float = 0.01,                   # Propensity score trimming bound
+    vcov_type: str = "hc1",                      # {"hc1"} only — IF-based variance per Callaway & Sant'Anna (2021). Analytical-sandwich {classical, hc2, hc2_bm} and conley REJECTED at __init__ (see REGISTRY.md IF-vs-sandwich subsection).
 )
 ```
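The `vcov_type` line above and the CHANGELOG describe a two-checkpoint contract: reject non-`hc1` values at `__init__`, then re-validate in `fit()` so that an unvalidated `set_params(vcov_type=...)` mutation cannot leak into Results metadata. A minimal sketch of that pattern follows; `ToyIFEstimator` and `validate_vcov_type` are hypothetical, and the exception type and messages are assumptions rather than the library's wording.

```python
_SANDWICH_FAMILIES = frozenset({"classical", "hc2", "hc2_bm"})


def validate_vcov_type(vcov_type):
    """Accept only "hc1"; explain why other families are rejected (sketch)."""
    if vcov_type == "hc1":
        return vcov_type
    if vcov_type in _SANDWICH_FAMILIES:
        raise ValueError(
            f"vcov_type={vcov_type!r} needs a single design matrix for hat-matrix "
            "leverage; influence-function variance has none. Use 'hc1'."
        )
    if vcov_type == "conley":
        raise ValueError("vcov_type='conley' is deferred for IF-based variance.")
    raise ValueError(f"Unknown vcov_type={vcov_type!r}; expected 'hc1'.")


class ToyIFEstimator:
    """Hypothetical estimator: validate at __init__, re-validate at fit()."""

    def __init__(self, vcov_type="hc1", cluster=None):
        self.vcov_type = validate_vcov_type(vcov_type)
        self.cluster = cluster

    def set_params(self, **params):
        # Mutate without validating, like the SA-style set_params described above.
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self):
        # Re-check at use time so a bad set_params() value fails loudly here.
        vcov_type = validate_vcov_type(self.vcov_type)
        return {"vcov_type": vcov_type, "cluster_name": self.cluster}
```

Validating in both places keeps `set_params` cheap and permissive while ensuring that only `hc1` ever reaches the variance code.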
 
diff --git a/diff_diff/honest_did.py b/diff_diff/honest_did.py
index 714285e3..624f8736 100644
--- a/diff_diff/honest_did.py
+++ b/diff_diff/honest_did.py
@@ -478,7 +478,9 @@ def to_dataframe(self) -> pd.DataFrame:
                     "ub": ub,
                     "ci_lb": ci_lb,
                     "ci_ub": ci_ub,
-                    "is_significant": (np.isfinite(ci_lb) and np.isfinite(ci_ub) and not (ci_lb <= 0 <= ci_ub)),
+                    "is_significant": (
+                        np.isfinite(ci_lb) and np.isfinite(ci_ub) and not (ci_lb <= 0 <= ci_ub)
+                    ),
                 }
             )
         return pd.DataFrame(rows)
@@ -649,13 +651,26 @@ def _extract_event_study_params(
             # Fallback: diagonal from SEs
             sigma = np.diag(np.array(ses) ** 2)
 
-        # Extract survey df. Replicate designs with undefined df → sentinel 0.
+        # Extract inference df. Prefer ``survey_metadata.df_survey`` (the
+        # actual CS-internal df, which may have been tightened post-resolve
+        # for replicate designs) over the dedicated ``df_inference`` field.
+        # ``df_inference`` is the FALLBACK carrier for bare-``cluster=`` CS
+        # fits where ``survey_metadata`` is intentionally None (preserving
+        # the survey/non-survey contract for ``DiagnosticReport`` /
+        # ``summary()``). Reading ``df_inference`` first would silently
+        # overstate the denominator df on panel survey fits whose df was
+        # tightened during aggregation. Replicate designs with undefined
+        # df → sentinel 0.
         df_survey = None
         if hasattr(results, "survey_metadata") and results.survey_metadata is not None:
             sm = results.survey_metadata
             df_survey = getattr(sm, "df_survey", None)
             if df_survey is None and getattr(sm, "replicate_method", None) is not None:
-                df_survey = 0
+                df_survey = 0  # undefined replicate df → NaN inference
+        if df_survey is None:
+            df_inference = getattr(results, "df_inference", None)
+            if df_inference is not None:
+                df_survey = int(df_inference)
 
         return (
             beta_hat,
@@ -817,15 +832,26 @@ def _extract_event_study_params(
                         "or use balance_e to restrict to a balanced subset."
                     )
 
-                # Extract survey df. For replicate designs with undefined df
-                # (rank <= 1), use sentinel df=0 so _get_critical_value returns
-                # NaN, matching the safe_inference contract.
+                # Extract inference df. Prefer ``survey_metadata.df_survey``
+                # (the actual CS-internal df, which may have been tightened
+                # post-resolve for replicate designs) over the dedicated
+                # ``df_inference`` field. ``df_inference`` is the FALLBACK
+                # carrier for bare-``cluster=`` CS fits where
+                # ``survey_metadata`` is intentionally None (preserving the
+                # survey/non-survey contract for ``DiagnosticReport`` /
+                # ``summary()``). Mirrors the MPD branch preference order
+                # at ``_extract_event_study_params`` so all branches agree.
+                # Replicate designs with undefined df → sentinel 0.
                 df_survey = None
                 if hasattr(results, "survey_metadata") and results.survey_metadata is not None:
                     sm = results.survey_metadata
                     df_survey = getattr(sm, "df_survey", None)
                     if df_survey is None and getattr(sm, "replicate_method", None) is not None:
                         df_survey = 0  # undefined replicate df → NaN inference
+                if df_survey is None:
+                    df_inference = getattr(results, "df_inference", None)
+                    if df_inference is not None:
+                        df_survey = int(df_inference)
 
                 return (
                     beta_hat,
@@ -884,8 +910,8 @@ def _extract_event_study_params(
                     if np.isfinite(data.get("se", np.nan))
                 }
 
-                pre_times = sorted(placebo_finite.keys())   # -P, ..., -1
-                post_times = sorted(effects_finite.keys())   # 1, ..., L_max
+                pre_times = sorted(placebo_finite.keys())  # -P, ..., -1
+                post_times = sorted(effects_finite.keys())  # 1, ..., L_max
 
                 if len(pre_times) == 0:
                     raise ValueError(
@@ -975,15 +1001,26 @@ def _largest_consecutive_block(times, boundary_val):
                 beta_hat = np.array(effects)
                 sigma = np.diag(np.array(ses) ** 2)
 
-                # Extract survey df. For replicate designs with undefined df
-                # (rank <= 1), use sentinel df=0 so _get_critical_value returns
-                # NaN, matching the safe_inference contract.
+                # Extract inference df. Prefer ``survey_metadata.df_survey``
+                # (the actual CS-internal df, which may have been tightened
+                # post-resolve for replicate designs) over the dedicated
+                # ``df_inference`` field. ``df_inference`` is the FALLBACK
+                # carrier for bare-``cluster=`` CS fits where
+                # ``survey_metadata`` is intentionally None (preserving the
+                # survey/non-survey contract for ``DiagnosticReport`` /
+                # ``summary()``). Mirrors the MPD branch preference order
+                # at ``_extract_event_study_params`` so all branches agree.
+                # Replicate designs with undefined df → sentinel 0.
                 df_survey = None
                 if hasattr(results, "survey_metadata") and results.survey_metadata is not None:
                     sm = results.survey_metadata
                     df_survey = getattr(sm, "df_survey", None)
                     if df_survey is None and getattr(sm, "replicate_method", None) is not None:
                         df_survey = 0  # undefined replicate df → NaN inference
+                if df_survey is None:
+                    df_inference = getattr(results, "df_inference", None)
+                    if df_inference is not None:
+                        df_survey = int(df_inference)
 
                 return (
                     beta_hat,
@@ -1045,17 +1082,17 @@ def _construct_A_sd(num_pre_periods: int, num_post_periods: int) -> np.ndarray:
     # Row i corresponds to: delta_{-(T-i)} - 2*delta_{-(T-i-1)} + delta_{-(T-i-2)}
     for i in range(T - 2):
         row = np.zeros(total)
-        row[i] = 1        # delta_{t-1}
-        row[i + 1] = -2   # delta_t
-        row[i + 2] = 1    # delta_{t+1}
+        row[i] = 1  # delta_{t-1}
+        row[i + 1] = -2  # delta_t
+        row[i + 2] = 1  # delta_{t+1}
         rows.append(row)
 
     # Boundary constraint at t = -1: delta_{-2} - 2*delta_{-1} + delta_0
     # With delta_0 = 0: delta_{-2} - 2*delta_{-1}
     if T >= 2:
         row = np.zeros(total)
-        row[T - 2] = 1    # delta_{-2}
-        row[T - 1] = -2   # delta_{-1}
+        row[T - 2] = 1  # delta_{-2}
+        row[T - 1] = -2  # delta_{-1}
         # delta_0 = 0, no entry needed
         rows.append(row)
 
@@ -1063,25 +1100,25 @@ def _construct_A_sd(num_pre_periods: int, num_post_periods: int) -> np.ndarray:
     # With delta_0 = 0: delta_{-1} + delta_1
     if T >= 1 and Tbar >= 1:
         row = np.zeros(total)
-        row[T - 1] = 1    # delta_{-1}
-        row[T] = 1         # delta_1
+        row[T - 1] = 1  # delta_{-1}
+        row[T] = 1  # delta_1
         rows.append(row)
 
     # Boundary constraint at t = 1: delta_0 - 2*delta_1 + delta_2
     # With delta_0 = 0: -2*delta_1 + delta_2
     if Tbar >= 2:
         row = np.zeros(total)
-        row[T] = -2        # delta_1
-        row[T + 1] = 1     # delta_2
+        row[T] = -2  # delta_1
+        row[T + 1] = 1  # delta_2
         rows.append(row)
 
     # Pure post-period second differences: event times t = 2, ..., Tbar-1
     # delta_{t+1} - 2*delta_t + delta_{t-1}, all within the post-period block
     for t in range(2, Tbar):
         row = np.zeros(total)
-        row[T + t - 2] = 1    # delta_{t-1}
-        row[T + t - 1] = -2   # delta_t
-        row[T + t] = 1        # delta_{t+1}
+        row[T + t - 2] = 1  # delta_{t-1}
+        row[T + t - 1] = -2  # delta_t
+        row[T + t] = 1  # delta_{t+1}
         rows.append(row)
 
     if not rows:
@@ -1182,12 +1219,12 @@ def _construct_constraints_rm_component(
     # t=1, ..., Tbar-1: |delta_{t+1} - delta_t| <= bound
     for t in range(1, Tbar):
         row_pos = np.zeros(total)
-        row_pos[T + t] = 1       # delta_{t+1}
+        row_pos[T + t] = 1  # delta_{t+1}
         row_pos[T + t - 1] = -1  # -delta_t
         rows.append(row_pos)
         row_neg = np.zeros(total)
-        row_neg[T + t] = -1      # -delta_{t+1}
-        row_neg[T + t - 1] = 1   # delta_t
+        row_neg[T + t] = -1  # -delta_{t+1}
+        row_neg[T + t - 1] = 1  # delta_t
         rows.append(row_neg)
 
     if not rows:
@@ -1290,9 +1327,7 @@ def _solve_rm_bounds_union(
     A_ineq, b_ineq = _construct_constraints_rm_component(
         num_pre_periods, num_post, Mbar, max_pre_fd
     )
-    return _solve_bounds_lp(
-        beta_pre, beta_post, l_vec, A_ineq, b_ineq, num_pre_periods, lp_method
-    )
+    return _solve_bounds_lp(beta_pre, beta_post, l_vec, A_ineq, b_ineq, num_pre_periods, lp_method)
 
 
 def _solve_bounds_lp(
@@ -1550,9 +1585,7 @@ def _build_fd_transform(num_pre: int, num_post: int) -> np.ndarray:
     return C
 
 
-def _build_fd_smoothness_constraints(
-    num_fd: int, M: float
-) -> Tuple[np.ndarray, np.ndarray]:
+def _build_fd_smoothness_constraints(num_fd: int, M: float) -> Tuple[np.ndarray, np.ndarray]:
     """
     Build smoothness constraints in first-difference space.
 
@@ -1839,7 +1872,7 @@ def _setup_moment_inequalities(
     # Change of basis: first column = l direction, rest = complement
     # Use QR on l to get orthogonal complement
     l_full = l.reshape(-1, 1)
-    Q, _ = np.linalg.qr(np.hstack([l_full, np.eye(num_post)[:, :num_post - 1]]))
+    Q, _ = np.linalg.qr(np.hstack([l_full, np.eye(num_post)[:, : num_post - 1]]))
 
     A_tilde_rotated = A_tilde @ Q  # Rotate into (l, complement) basis
 
@@ -1905,10 +1938,12 @@ def _enumerate_vertices(
         # gamma[basis_idx]' @ X_tilde[basis_idx, :] = 0
         # gamma[basis_idx]' @ sigma_tilde_diag[basis_idx] = 1
         if n_nuisance > 0:
-            A_sys = np.vstack([
-                X_tilde[basis_idx, :].T,
-                sigma_tilde_diag[basis_idx].reshape(1, -1),
-            ])
+            A_sys = np.vstack(
+                [
+                    X_tilde[basis_idx, :].T,
+                    sigma_tilde_diag[basis_idx].reshape(1, -1),
+                ]
+            )
         else:
             A_sys = sigma_tilde_diag[basis_idx].reshape(1, -1)
 
@@ -2282,7 +2317,7 @@ def fit(
         M = M if M is not None else self.M
 
         # Extract event study parameters
-        (beta_hat, sigma, num_pre, num_post, pre_periods, post_periods, df_survey) = (
+        beta_hat, sigma, num_pre, num_post, pre_periods, post_periods, df_survey = (
             _extract_event_study_params(results)
         )
 
@@ -2343,8 +2378,15 @@ def fit(
         # Compute bounds based on method
         if self.method == "smoothness":
             lb, ub, ci_lb, ci_ub = self._compute_smoothness_bounds(
-                beta_pre, beta_post, sigma, sigma_post, l_vec,
-                num_pre, num_post, M, df=df_survey,
+                beta_pre,
+                beta_post,
+                sigma,
+                sigma_post,
+                l_vec,
+                num_pre,
+                num_post,
+                M,
+                df=df_survey,
             )
             ci_method = "FLCI"
 
@@ -2425,9 +2467,7 @@ def _compute_smoothness_bounds(
         A_ineq, b_ineq = _construct_constraints_sd(num_pre, num_post, M)
 
         # Solve for identified set bounds with delta_pre = beta_pre pinned
-        lb, ub = _solve_bounds_lp(
-            beta_pre, beta_post, l_vec, A_ineq, b_ineq, num_pre
-        )
+        lb, ub = _solve_bounds_lp(beta_pre, beta_post, l_vec, A_ineq, b_ineq, num_pre)
 
         # Propagate infeasibility: if bounds are NaN, CI is NaN too
         if np.isnan(lb) or np.isnan(ub):
@@ -2436,8 +2476,15 @@ def _compute_smoothness_bounds(
         # Compute optimal FLCI (Rambachan & Roth Section 4.1)
         if sigma_full.shape[0] == num_pre + num_post:
             ci_lb, ci_ub = _compute_optimal_flci(
-                beta_pre, beta_post, sigma_full, l_vec,
-                num_pre, num_post, M, self.alpha, df=df,
+                beta_pre,
+                beta_post,
+                sigma_full,
+                l_vec,
+                num_pre,
+                num_post,
+                M,
+                self.alpha,
+                df=df,
             )
         else:
             # Fallback to naive FLCI when full sigma unavailable
@@ -2472,9 +2519,7 @@ def _compute_rm_bounds(
         inequality transformation.
         """
         # Solve identified set via union of polyhedra
-        lb, ub = _solve_rm_bounds_union(
-            beta_pre, beta_post, l_vec, num_pre, Mbar
-        )
+        lb, ub = _solve_rm_bounds_union(beta_pre, beta_post, l_vec, num_pre, Mbar)
 
         # CI construction for Delta^RM.
         # The paper recommends ARP conditional/hybrid confidence sets
@@ -2520,14 +2565,30 @@ def _compute_combined_bounds(
         )
         # Get smoothness bounds
         lb_sd, ub_sd, _, _ = self._compute_smoothness_bounds(
-            beta_pre, beta_post, sigma_full, sigma_post, l_vec,
-            num_pre, num_post, M, df=df,
+            beta_pre,
+            beta_post,
+            sigma_full,
+            sigma_post,
+            l_vec,
+            num_pre,
+            num_post,
+            M,
+            df=df,
         )
 
         # Get RM bounds (use M as Mbar for combined)
         lb_rm, ub_rm, _, _ = self._compute_rm_bounds(
-            beta_pre, beta_post, sigma_full, sigma_post, l_vec,
-            num_pre, num_post, M, pre_periods, results, df=df,
+            beta_pre,
+            beta_post,
+            sigma_full,
+            sigma_post,
+            l_vec,
+            num_pre,
+            num_post,
+            M,
+            pre_periods,
+            results,
+            df=df,
         )
 
         # Combined bounds are intersection
@@ -2651,6 +2712,7 @@ def _find_breakdown(
 
         Uses binary search for precision.
         """
+
         # Check if any CI includes zero (NaN CIs are treated as undefined, not significant)
         def _ci_includes_zero(ci_lb, ci_ub):
             if not (np.isfinite(ci_lb) and np.isfinite(ci_ub)):
@@ -2709,6 +2771,7 @@ def breakdown_value(
         float or None
             Breakdown value, or None if effect is always significant.
         """
+
         def _ci_covers_zero(r):
             if not (np.isfinite(r.ci_lb) and np.isfinite(r.ci_ub)):
                 return True  # Undefined CIs are not "significant"
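The per-cell variance formula in the `_cluster_robust_se_from_per_gt_if` docstring (added to `diff_diff/staggered.py` in the next file diff) can be checked by hand. The NumPy sketch below assumes a single stratum and no finite-population correction (`f_h = 0`). Under those assumptions the docstring formula reduces to `V = G/(G-1) * sum_j (psi_j - psi_bar)^2`. `cr1_if_variance` is a hypothetical stand-in written for illustration. It is not `diff_diff.survey.compute_survey_if_variance`, and it omits strata, FPC, and lonely-PSU handling.

```python
import numpy as np


def cr1_if_variance(psi: np.ndarray, psu: np.ndarray) -> float:
    """Hypothetical sketch of V = G/(G-1) * sum_j (psi_j - psi_bar)^2.

    Assumes one stratum and no finite-population correction (f_h = 0).
    ``psi`` holds per-index influence-function values; ``psu`` holds the
    cluster label of each index.
    """
    # Sum the IF within each PSU: psi_j = sum_{i in PSU j} psi_i
    labels, inverse = np.unique(psu, return_inverse=True)
    psi_psu = np.zeros(len(labels))
    np.add.at(psi_psu, inverse, psi)
    G = len(labels)
    if G < 2:
        # Clustered variance is unidentified with fewer than two PSUs
        return float("nan")
    centered = psi_psu - psi_psu.mean()
    return float(G / (G - 1) * np.sum(centered**2))


# Scatter treated/control IF contributions onto the index space, mirroring
# the np.add.at step in the helper
n = 6
psi_per_index = np.zeros(n)
treated_idx = np.array([0, 1, 2])
control_idx = np.array([3, 4, 5])
np.add.at(psi_per_index, treated_idx, np.array([0.5, 0.3, -0.2]))
np.add.at(psi_per_index, control_idx, np.array([-0.1, -0.4, -0.1]))
psu = np.array(["a", "a", "b", "b", "c", "c"])
se = np.sqrt(cr1_if_variance(psi_per_index, psu))
```

The example has three PSUs whose IF sums are 0.8, -0.3, and -0.5, so the mean is 0. The sketch gives `V = 1.5 * 0.98 = 1.47`. A design with a single PSU returns NaN, which mirrors the helper's G<2 contract.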
diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py
index a57e8a27..07680e35 100644
--- a/diff_diff/staggered.py
+++ b/diff_diff/staggered.py
@@ -92,6 +92,104 @@ def _linear_regression(
     return beta, residuals
 
 
+def _cluster_robust_se_from_per_gt_if(
+    inf_info: Dict[str, Any],
+    resolved_survey: "Any",
+) -> Optional[float]:
+    """CR1 Liang-Zeger cluster-robust SE for a single (g,t) ATT.
+
+    Builds the per-(g,t) per-index IF vector from ``inf_info`` and routes
+    through ``compute_survey_if_variance`` so that the per-cell variance
+    inherits the SAME design-based machinery as the aggregate path:
+
+        V = sum_h (1 - f_h) * (n_h / (n_h - 1)) * sum_j (psi_hj - psi_h_bar)^2
+
+    where ``psi_hj = sum_{i in PSU j, stratum h} psi_i``. This matches
+    the documented CR1 contract in REGISTRY.md (synthesized
+    ``SurveyDesign(psu=cluster)`` → ``_compute_stratified_psu_meat``)
+    and applies the G/(G-1) finite-sample correction, PSU centering,
+    FPC, and lonely-PSU handling uniformly with overall / event-study
+    inference.
+
+    For the panel path, ``resolved_survey`` is ``resolved_survey_unit``
+    (length n_units) and the IF index space is per-unit. For the RCS
+    path, ``resolved_survey`` is the per-obs ``resolved_survey`` (length
+    n_obs). The helper is index-space agnostic — it just requires
+    ``treated_idx`` / ``control_idx`` in ``inf_info`` to be valid
+    offsets into ``resolved_survey.psu``.
+
+    Return contract (callers depend on this distinction):
+
+    * **float SE** — finite cluster-robust variance; caller uses it.
+    * **NaN** — ``compute_survey_if_variance`` returned NaN (clustered
+      variance unidentified, e.g., G<2 or lonely-PSU removed all strata).
+      Caller MUST propagate this NaN through to ``safe_inference`` so
+      the per-cell inference surface (se / t_stat / p_value / conf_int)
+      is NaN-consistent — NEVER fall back to the unit-level SE. Falling
+      back would silently report a different estimator's variance under
+      a clustered request (``feedback_no_silent_failures``).
+    * **None** — malformed inputs or invariant violations:
+      ``inf_info`` lacks required IF fields, ``resolved_survey.psu`` is
+      None, index alignment cannot be verified, or
+      ``compute_survey_if_variance`` returned a negative variance. In
+      these cases the helper cannot evaluate the contract; caller falls
+      back to the unit-level SE returned by the underlying estimation
+      method, since no clustered variance could be computed for this cell.
+    """
+    if (
+        inf_info is None
+        or "treated_inf" not in inf_info
+        or "control_inf" not in inf_info
+        or "treated_idx" not in inf_info
+        or "control_idx" not in inf_info
+    ):
+        return None
+    treated_idx = np.asarray(inf_info["treated_idx"])
+    control_idx = np.asarray(inf_info["control_idx"])
+    treated_inf = np.asarray(inf_info["treated_inf"])
+    control_inf = np.asarray(inf_info["control_inf"])
+    psu_array = getattr(resolved_survey, "psu", None)
+    if psu_array is None:
+        return None
+    n = len(psu_array)
+    if (
+        treated_idx.size > 0
+        and (treated_idx.max(initial=-1) >= n or treated_idx.min(initial=0) < 0)
+    ) or (
+        control_idx.size > 0
+        and (control_idx.max(initial=-1) >= n or control_idx.min(initial=0) < 0)
+    ):
+        return None
+    psi_per_index = np.zeros(n)
+    if treated_idx.size:
+        np.add.at(psi_per_index, treated_idx, treated_inf)
+    if control_idx.size:
+        np.add.at(psi_per_index, control_idx, control_inf)
+    # Route through the shared survey helper so the per-cell variance
+    # gets the same G/(G-1) finite-sample correction, PSU centering,
+    # FPC handling, and lonely-PSU/G<2→NaN behavior as overall +
+    # event-study inference (per the documented CR1 contract).
+    from diff_diff.survey import compute_survey_if_variance
+
+    var = compute_survey_if_variance(psi_per_index, resolved_survey)
+    # Return contract:
+    #   float SE → use it (finite cluster-robust variance)
+    #   NaN     → propagate NaN so the caller can NaN-out the inference
+    #             surface rather than silently falling back to the
+    #             unit-level SE (per feedback_no_silent_failures: when
+    #             clustered variance is undefined — e.g., G<2, lonely-PSU
+    #             removed all strata — the user-facing per-cell SE must
+    #             reflect that, not silently revert to a different
+    #             estimator).
+    #   None    → malformed (negative variance or other invariant
+    #             violation); caller falls back to the unit-level SE.
+    if np.isnan(var):
+        return float("nan")
+    if var < 0:
+        return None
+    return float(np.sqrt(var))
+
+
 def _safe_inv(
     A: np.ndarray,
     tracker: Optional[list] = None,
@@ -152,8 +250,26 @@ class CallawaySantAnna(
     alpha : float, default=0.05
         Significance level for confidence intervals.
     cluster : str, optional
-        Column name for cluster-robust standard errors.
-        Defaults to unit-level clustering.
+        Column name for cluster-robust standard errors. When set, the
+        influence-function aggregator clusters at the named level via a
+        synthesized ``SurveyDesign(psu=cluster_col)`` threaded through the
+        existing PSU-meat machinery (``_compute_stratified_psu_meat``) and
+        PSU-level multiplier bootstrap. When ``None`` (default), the
+        aggregator uses per-unit IF variance (Williams 2000 form). When
+        ``survey_design=SurveyDesign(psu=...)`` is also provided, the
+        explicit PSU takes precedence; a ``UserWarning`` fires if the bare
+        ``cluster=`` partition differs from the explicit PSU partition.
+    vcov_type : str, default="hc1"
+        Variance family. CallawaySantAnna accepts ``{"hc1"}`` only —
+        ``hc1`` means per-unit IF variance when ``cluster=None`` and CR1
+        Liang-Zeger on the IF when ``cluster=X`` is set. The
+        analytical-sandwich families (``classical``, ``hc2``, ``hc2_bm``)
+        and spatial-HAC (``conley``) are rejected at ``__init__`` because
+        CS's per-(g,t) doubly-robust / IPW / outcome-regression structure
+        has no single design matrix to compute hat-matrix leverage or
+        Bell-McCaffrey Satterthwaite DOF on. See REGISTRY.md "IF-based
+        variance estimators vs analytical-sandwich estimators" for the
+        structural taxonomy.
     n_bootstrap : int, default=0
         Number of bootstrap iterations for inference.
         If 0, uses analytical standard errors.
@@ -321,6 +437,7 @@ def __init__(
         panel: bool = True,
         epv_threshold: float = 10,
         pscore_fallback: str = "error",
+        vcov_type: str = "hc1",
     ):
         import warnings
 
@@ -363,11 +480,27 @@ def __init__(
                 f"base_period must be 'varying' or 'universal', " f"got '{base_period}'"
             )
 
+        # vcov_type input contract: CallawaySantAnna is permanently narrow
+        # to {"hc1"} because the analytical-sandwich families (classical,
+        # hc2, hc2_bm) require a single regression's hat matrix that CS's
+        # per-(g,t) doubly-robust / IPW / outcome-regression structure
+        # doesn't have. See REGISTRY.md "IF-based variance estimators vs
+        # analytical-sandwich estimators" for the structural taxonomy.
+        # Factored out so fit() can re-run it after sklearn-style
+        # set_params bypasses __init__ validation.
+        self._validate_vcov_type(vcov_type)
+
         self.control_group = control_group
         self.anticipation = anticipation
         self.estimation_method = estimation_method
         self.alpha = alpha
         self.cluster = cluster
+        self.vcov_type = vcov_type
+        # Record whether vcov_type differs from the default (a proxy for
+        # "explicitly set"), for future symmetry with SA / StackedDiD /
+        # WooldridgeDiD set_params patterns. Validation makes the flag always
+        # False today; it avoids surprises if the contract ever broadens.
+        self._vcov_type_explicit = vcov_type != "hc1"
         self.n_bootstrap = n_bootstrap
         self.bootstrap_weights = bootstrap_weights
         self.seed = seed
@@ -1000,7 +1133,7 @@ def _compute_all_att_gt_vectorized(
             all_units = precomputed["all_units"]
             treated_positions = np.where(treated_valid)[0]
             control_positions = np.where(control_valid)[0]
-            influence_func_info[(g, t)] = {
+            inf_info_gt = {
                 "treated_idx": treated_positions,
                 "control_idx": control_positions,
                 "treated_units": all_units[treated_positions],
@@ -1008,6 +1141,24 @@ def _compute_all_att_gt_vectorized(
                 "treated_inf": inf_treated,
                 "control_inf": inf_control,
             }
+            influence_func_info[(g, t)] = inf_info_gt
+
+            # Cluster-aware per-(g,t) SE: aggregate the per-(g,t) IF by
+            # PSU when a survey design (explicit OR synthesized from bare
+            # cluster=) provides one. Bit-equal to pre-PR when psu is None.
+            rsu_for_gt = precomputed.get("resolved_survey_unit")
+            if rsu_for_gt is not None and getattr(rsu_for_gt, "psu", None) is not None:
+                se_cluster = _cluster_robust_se_from_per_gt_if(inf_info_gt, rsu_for_gt)
+                # se_cluster is float (use), NaN (cluster-undefined, propagate),
+                # or None (malformed, keep unit-level). Per the helper's
+                # contract, NaN is a deliberate signal that the survey/PSU
+                # variance is unidentified (e.g., G<2, lonely-PSU removed
+                # all strata) — propagating NaN here causes safe_inference
+                # to NaN-out the full per-cell inference surface, which is
+                # the documented CS contract (feedback_no_silent_failures).
+                if se_cluster is not None:
+                    se = se_cluster
+                    group_time_effects[(g, t)]["se"] = se
 
             atts.append(att)
             ses.append(se)
@@ -1344,7 +1495,7 @@ def _compute_all_att_gt_covariate_reg(
                 all_units = precomputed["all_units"]
                 treated_positions = np.where(treated_valid)[0]
                 control_positions = np.where(control_valid)[0]
-                influence_func_info[(g, t)] = {
+                inf_info_gt = {
                     "treated_idx": treated_positions,
                     "control_idx": control_positions,
                     "treated_units": all_units[treated_positions],
@@ -1352,6 +1503,19 @@ def _compute_all_att_gt_covariate_reg(
                     "treated_inf": inf_treated,
                     "control_inf": inf_control,
                 }
+                influence_func_info[(g, t)] = inf_info_gt
+
+                # Cluster-aware per-(g,t) SE — see same pattern in
+                # _compute_all_att_gt_vectorized.
+                rsu_for_gt = precomputed.get("resolved_survey_unit")
+                if rsu_for_gt is not None and getattr(rsu_for_gt, "psu", None) is not None:
+                    se_cluster = _cluster_robust_se_from_per_gt_if(inf_info_gt, rsu_for_gt)
+                    # Propagate NaN (cluster-undefined) per the helper's
+                    # contract — see _compute_all_att_gt_vectorized site
+                    # for the same pattern.
+                    if se_cluster is not None:
+                        se = se_cluster
+                        group_time_effects[(g, t)]["se"] = se
 
                 atts.append(att)
                 ses.append(se)
@@ -1465,6 +1629,12 @@ def fit(
         # cell. Sibling of PR #9 finding #17.
         self._safe_inv_tracker: List[float] = []
 
+        # Re-validate vcov_type at fit-time so sklearn-style set_params
+        # mutations are caught before they propagate to Results metadata.
+        # __init__ already validated the constructor argument; this is the
+        # second layer for the post-construction mutation path.
+        self._validate_vcov_type(self.vcov_type)
+
         if not self.panel:
             warnings.warn(
                 "panel=False uses repeated cross-section DRDID estimators "
@@ -1490,6 +1660,7 @@ def fit(
 
         # Resolve survey design if provided
         from diff_diff.survey import (
+            SurveyDesign,
             _resolve_survey_for_fit,
             _validate_unit_constant_survey,
         )
@@ -1498,10 +1669,119 @@ def fit(
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
 
-        # Validate within-unit constancy for panel survey designs
+        # Wire bare cluster= into the survey-PSU machinery BEFORE the
+        # unit-constant + pweight validators below, so the synthesized
+        # survey design also passes through validation (otherwise movers
+        # on panel data — units crossing cluster boundaries — would
+        # silently get a first-value-wins collapse via
+        # _collapse_survey_to_unit_level). Mirrors estimators.py:497-516,
+        # adapted for CS's IF-based variance: no solve_ols(cluster_ids=...)
+        # fallback exists, so we synthesize a minimal SurveyDesign(psu=...)
+        # to reach the existing PSU-meat aggregator + PSU multiplier
+        # bootstrap. Both consume resolved_survey.psu, so synthesis
+        # transparently activates the same code paths used when the user
+        # passes SurveyDesign(psu=X) explicitly.
+        effective_survey_design = survey_design  # refreshed below if synthesized
+        if self.cluster is not None:
+            if self.cluster not in data.columns:
+                raise ValueError(f"cluster column '{self.cluster}' not found in data")
+            # Pre-validate cluster NaN with a cluster-domain error message.
+            # Without this, SurveyDesign.resolve() raises "PSU column ...
+            # contains missing values" — wrong domain for the user-facing
+            # cluster= API.
+            if data[self.cluster].isna().any():
+                raise ValueError(
+                    f"cluster column '{self.cluster}' contains missing "
+                    "values. All observations must have valid cluster "
+                    "identifiers."
+                )
+            # Reject replicate-weight + cluster=: replicate IF variance is
+            # computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR)
+            # and ignores PSU/cluster entirely (survey.py:104-109 enforces that
+            # replicate_weights are mutually exclusive with strata/psu/fpc).
+            # Honoring bare cluster= here would silently have no effect on
+            # variance while populating cluster_name/n_clusters on Results
+            # dishonestly. Fail-closed per feedback_no_silent_failures.
+            if (
+                survey_design is not None
+                and getattr(survey_design, "replicate_weights", None) is not None
+            ):
+                raise NotImplementedError(
+                    f"CallawaySantAnna(cluster={self.cluster!r}) is not "
+                    "supported with replicate-weight survey designs. "
+                    "Replicate-weight variance is computed by replicate "
+                    "reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores "
+                    "PSU/cluster entirely — setting cluster= would silently "
+                    "have no effect on the variance estimate. Either omit "
+                    "cluster= (the replicate weights encode the design "
+                    "structure implicitly) or use a non-replicate survey "
+                    "design (with explicit strata/psu/fpc)."
+                )
+            cluster_ids = data[self.cluster].values
+
+            if resolved_survey is None:
+                # Bare cluster=, no survey: synthesize minimal
+                # SurveyDesign(psu=...). Activates the same survey-PSU
+                # aggregator + bootstrap machinery already used when the
+                # user passes SurveyDesign(psu=X) explicitly.
+                synthetic_design = SurveyDesign(psu=self.cluster, weight_type="pweight")
+                (
+                    resolved_survey,
+                    survey_weights,
+                    survey_weight_type,
+                    _,
+                ) = _resolve_survey_for_fit(synthetic_design, data, "analytical")
+                effective_survey_design = synthetic_design
+                # survey_metadata stays None — user did NOT provide a
+                # survey design, so downstream consumers that check
+                # `survey_metadata is not None` for "original fit used a
+                # survey design" (DiagnosticReport at diagnostic_report.py:
+                # 848-856 + 1150-1158, summary at staggered_results.py:
+                # 235-238) must continue to see a non-survey fit. The
+                # cluster-level df is carried via the dedicated
+                # `df_inference` field on Results (set below after the
+                # cluster-handling block).
+            elif resolved_survey.psu is None:
+                # User provided survey_design (weights/strata/etc.) without
+                # PSU AND cluster=X. Inject cluster as PSU into the resolved
+                # survey. Construct effective_survey_design with
+                # psu=self.cluster (via dataclasses.replace) so the
+                # downstream _validate_unit_constant_survey catches movers
+                # on panel data (otherwise the validator runs on the user-
+                # provided design which has no PSU column).
+                from dataclasses import replace
+
+                from diff_diff.survey import (
+                    _inject_cluster_as_psu,
+                    compute_survey_metadata,
+                )
+
+                effective_survey_design = replace(survey_design, psu=self.cluster)
+                resolved_survey = _inject_cluster_as_psu(resolved_survey, cluster_ids)
+                # Recompute survey_metadata to reflect the new effective PSU
+                # (so n_psu / df_survey on Results reflect the injected
+                # cluster, not the no-PSU pre-injection state).
+                raw_w = (
+                    data[survey_design.weights].values.astype(np.float64)
+                    if survey_design.weights
+                    else np.ones(len(data), dtype=np.float64)
+                )
+                survey_metadata = compute_survey_metadata(resolved_survey, raw_w)
+            else:
+                # User provided survey_design with psu= AND cluster=X. PSU
+                # wins; _resolve_effective_cluster emits a UserWarning if
+                # partitions differ. Return value intentionally discarded
+                # — helper's purpose is the warning, and the effective PSU
+                # is already resolved_survey.psu (no rewrite needed).
+                from diff_diff.survey import _resolve_effective_cluster
+
+                _resolve_effective_cluster(resolved_survey, cluster_ids, self.cluster)
+
+        # Validate within-unit constancy for panel survey designs (uses
+        # effective_survey_design so synthesized designs are validated too).
         if resolved_survey is not None:
             if self.panel:
-                _validate_unit_constant_survey(data, unit, survey_design)
+                _validate_unit_constant_survey(data, unit, effective_survey_design)
             if resolved_survey.weight_type != "pweight":
                 raise ValueError(
                     f"CallawaySantAnna survey support requires weight_type='pweight', "
@@ -1669,6 +1949,26 @@ def fit(
                     agg_w = rc_result[6] if len(rc_result) > 6 else n_treat
 
                     if att_gt is not None:
+                        # Cluster-aware per-(g,t) SE on the RCS path. RC
+                        # IF indices are per-obs (vs per-unit on the panel
+                        # path); the corresponding PSU array is
+                        # ``resolved_survey.psu`` (length n_obs), not
+                        # ``resolved_survey_unit.psu``. Bit-equal to pre-PR
+                        # when psu is None.
+                        rs_for_gt = precomputed.get("resolved_survey") if precomputed else None
+                        if (
+                            rs_for_gt is not None
+                            and getattr(rs_for_gt, "psu", None) is not None
+                            and inf_info is not None
+                        ):
+                            se_cluster = _cluster_robust_se_from_per_gt_if(inf_info, rs_for_gt)
+                            # Propagate NaN (cluster-undefined) per the
+                            # helper's contract — see
+                            # _compute_all_att_gt_vectorized for the
+                            # pattern.
+                            if se_cluster is not None:
+                                se_gt = se_cluster
+
                         t_stat, p_val, ci = safe_inference(
                             att_gt,
                             se_gt,
@@ -1761,6 +2061,26 @@ def fit(
                     )
 
                     if att_gt is not None:
+                        # Cluster-aware per-(g,t) SE: when a survey PSU is
+                        # in play (explicit OR synthesized from bare
+                        # cluster=), aggregate the per-(g,t) IF by PSU
+                        # and use CR1 Liang-Zeger SE instead of the
+                        # unit-level diff-of-means SE returned by OR/IPW/DR.
+                        # Preserves bit-equality when psu is None.
+                        rsu_for_gt = precomputed.get("resolved_survey_unit") if precomputed else None
+                        if (
+                            rsu_for_gt is not None
+                            and getattr(rsu_for_gt, "psu", None) is not None
+                            and inf_info is not None
+                        ):
+                            se_cluster = _cluster_robust_se_from_per_gt_if(inf_info, rsu_for_gt)
+                            # Propagate NaN (cluster-undefined) per the
+                            # helper's contract — see
+                            # _compute_all_att_gt_vectorized for the
+                            # pattern.
+                            if se_cluster is not None:
+                                se_gt = se_cluster
+
                         t_stat, p_val, ci = safe_inference(
                             att_gt,
                             se_gt,
@@ -2035,6 +2355,42 @@ def fit(
             event_study_vcov = None
             event_study_vcov_index = None
 
+        # Resolve canonical cluster_name + n_clusters for Results metadata.
+        # Canonical PSU column wins when explicit: if survey_design has
+        # psu=, that's the canonical column even if bare cluster= was also
+        # set (a UserWarning would have fired above if partitions differed).
+        if survey_design is not None and getattr(survey_design, "psu", None) is not None:
+            cluster_name_for_results: Optional[str] = survey_design.psu
+        elif self.cluster is not None:
+            cluster_name_for_results = self.cluster
+        else:
+            cluster_name_for_results = None
+        n_clusters_for_results: Optional[int] = (
+            int(np.unique(resolved_survey.psu).size)
+            if (resolved_survey is not None and resolved_survey.psu is not None)
+            else None
+        )
+        # df_inference: cluster-level degrees of freedom for downstream
+        # inference (HonestDiD t-critical selection). Populated ONLY for
+        # the bare-cluster-synthesize path (where survey_metadata is None
+        # because the user did not provide a survey design). For
+        # inject/conflict branches, survey_metadata is populated and
+        # survey_metadata.df_survey carries the actual CS-internal df
+        # (which may have been tightened by overall_effective_df recompute
+        # at the aggregation step around L1995-1999). Narrowing here
+        # prevents HonestDiD from reading a stale/wrong df_inference for
+        # survey fits whose df was tightened post-resolve — fix for
+        # CI codex P1 (PR #487 round 1).
+        df_inference_for_results: Optional[int] = (
+            int(resolved_survey.df_survey)
+            if (
+                resolved_survey is not None
+                and survey_metadata is None
+                and getattr(resolved_survey, "df_survey", None) is not None
+            )
+            else None
+        )
+
         self.results_ = CallawaySantAnnaResults(
             group_time_effects=group_time_effects,
             overall_att=overall_att,
@@ -2063,6 +2419,10 @@ def fit(
             epv_diagnostics=epv_diagnostics if epv_diagnostics else None,
             epv_threshold=self.epv_threshold,
             pscore_fallback=self.pscore_fallback,
+            vcov_type=self.vcov_type,
+            cluster_name=cluster_name_for_results,
+            n_clusters=n_clusters_for_results,
+            df_inference=df_inference_for_results,
         )
 
         self.is_fitted_ = True
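The `cluster_name` / `n_clusters` / `df_inference` resolution that runs just before `CallawaySantAnnaResults(...)` is built can be read as one small pure function. Below is a minimal sketch of that logic, assuming plain Python inputs. `resolve_cluster_metadata` and its parameters are hypothetical stand-ins for the local variables in `fit()`, not library API.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def resolve_cluster_metadata(
    survey_psu_col: Optional[str],
    bare_cluster_col: Optional[str],
    effective_psu: Optional[Sequence],
    resolved_df: Optional[int],
    has_survey_metadata: bool,
) -> Tuple[Optional[str], Optional[int], Optional[int]]:
    """Hypothetical helper mirroring the Results-metadata block in fit()."""
    # An explicit survey PSU column is canonical, even when cluster= is also set.
    cluster_name = survey_psu_col if survey_psu_col is not None else bare_cluster_col
    # Count the distinct PSU labels actually used for variance estimation.
    n_clusters = int(np.unique(effective_psu).size) if effective_psu is not None else None
    # df_inference is populated only on the bare-cluster synthesize path,
    # i.e. when no user-supplied survey design produced survey_metadata.
    df_inference = (
        int(resolved_df)
        if resolved_df is not None and not has_survey_metadata
        else None
    )
    return cluster_name, n_clusters, df_inference
```

Running the three branches through this helper shows that `df_inference` stays `None` whenever a user survey design exists, so it cannot compete with `survey_metadata.df_survey`.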
@@ -3922,6 +4282,51 @@ def _doubly_robust_rc(
         idx_all = None
         return att, se, inf_all, idx_all
 
+    @staticmethod
+    def _validate_vcov_type(vcov_type: str) -> None:
+        """Validate ``vcov_type`` membership against CS's narrow contract.
+
+        Called from ``__init__`` and from ``fit()`` (so sklearn-style
+        ``set_params(vcov_type=...)`` mutations are re-checked at use
+        time rather than silently passing a bad value through to Results).
+        """
+        _accepted_vcov = {"hc1"}
+        _deferred_vcov = {"conley"}
+        _if_incompatible_vcov = {"classical", "hc2", "hc2_bm"}
+        if vcov_type in _if_incompatible_vcov:
+            raise ValueError(
+                f"CallawaySantAnna(vcov_type={vcov_type!r}) is rejected: "
+                "CS uses influence-function-based variance per Callaway & "
+                "Sant'Anna (2021); the analytical-sandwich families "
+                "{'classical', 'hc2', 'hc2_bm'} are defined on a single "
+                "regression's hat matrix, and CS's per-(g,t) doubly-robust "
+                "/ IPW / outcome-regression structure has no equivalent "
+                "single design matrix to compute hat-matrix leverage or "
+                "Bell-McCaffrey Satterthwaite DOF on. The rejection is "
+                "library-architectural, not paper-prescribed. Use "
+                "vcov_type='hc1' (the default) with cluster=<col> for "
+                "cluster-robust inference. See docs/methodology/REGISTRY.md "
+                "'IF-based variance estimators vs analytical-sandwich "
+                "estimators' for the structural taxonomy."
+            )
+        if vcov_type in _deferred_vcov:
+            raise ValueError(
+                f"CallawaySantAnna(vcov_type={vcov_type!r}) is not yet "
+                "supported: spatial-HAC (Conley) on per-unit influence "
+                "functions could conceptually apply (spatial aggregation "
+                "of per-unit IFs) but requires separate methodology work. "
+                "Tracked as a follow-up TODO row. Use vcov_type='hc1' "
+                "(the default) with cluster=<col> for cluster-robust "
+                "inference today."
+            )
+        if vcov_type not in _accepted_vcov:
+            raise ValueError(
+                f"CallawaySantAnna(vcov_type={vcov_type!r}) is invalid. "
+                f"Accepted values: {sorted(_accepted_vcov)}. CS is "
+                "permanently narrow to 'hc1' per IF-based variance "
+                "structure; see REGISTRY.md."
+            )
+
     def get_params(self) -> Dict[str, Any]:
         """Get estimator parameters (sklearn-compatible)."""
         return {
@@ -3930,6 +4335,7 @@ def get_params(self) -> Dict[str, Any]:
             "estimation_method": self.estimation_method,
             "alpha": self.alpha,
             "cluster": self.cluster,
+            "vcov_type": self.vcov_type,
             "n_bootstrap": self.n_bootstrap,
             "bootstrap_weights": self.bootstrap_weights,
             "seed": self.seed,
@@ -3943,12 +4349,23 @@ def get_params(self) -> Dict[str, Any]:
         }
 
     def set_params(self, **params) -> "CallawaySantAnna":
-        """Set estimator parameters (sklearn-compatible)."""
+        """Set estimator parameters (sklearn-compatible).
+
+        Mirrors the SA pattern at ``sun_abraham.py:2150-2161``: setattr
+        first, then refresh ``_vcov_type_explicit`` whenever ``vcov_type``
+        is among the updated parameters. Membership validation of
+        ``vcov_type`` is deferred to the next ``fit()`` call (sklearn-style
+        ``set_params`` is documented as mutate-then-validate-at-use). A bad
+        value such as ``set_params(vcov_type="hc4")`` is stored as-is and
+        raises ``ValueError`` when ``fit()`` re-runs ``_validate_vcov_type``.
+        """
         for key, value in params.items():
             if hasattr(self, key):
                 setattr(self, key, value)
             else:
                 raise ValueError(f"Unknown parameter: {key}")
+        if "vcov_type" in params:
+            self._vcov_type_explicit = self.vcov_type != "hc1"
         return self
 
     def summary(self) -> str:
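The `vcov_type` contract above validates eagerly in `__init__`, lets `set_params` mutate without checking, and re-validates at `fit()`. A minimal sketch of that mutate-then-validate-at-use pattern follows. `VcovNarrowEstimator` is a hypothetical stand-in for `CallawaySantAnna`, and its error messages are shortened placeholders, not the library's wording.

```python
from typing import Any, Dict


class VcovNarrowEstimator:
    """Hypothetical stand-in illustrating CS's narrow vcov_type contract."""

    _ACCEPTED = {"hc1"}
    _IF_INCOMPATIBLE = {"classical", "hc2", "hc2_bm"}
    _DEFERRED = {"conley"}

    def __init__(self, vcov_type: str = "hc1", cluster=None):
        self.vcov_type = vcov_type
        self.cluster = cluster
        self._validate_vcov_type(vcov_type)  # eager check at construction
        self._vcov_type_explicit = vcov_type != "hc1"

    @staticmethod
    def _validate_vcov_type(vcov_type: str) -> None:
        if vcov_type in VcovNarrowEstimator._IF_INCOMPATIBLE:
            raise ValueError(f"vcov_type={vcov_type!r} has no IF-based analogue")
        if vcov_type in VcovNarrowEstimator._DEFERRED:
            raise ValueError(f"vcov_type={vcov_type!r} is not yet supported")
        if vcov_type not in VcovNarrowEstimator._ACCEPTED:
            raise ValueError(f"vcov_type={vcov_type!r} is invalid")

    def set_params(self, **params: Any) -> "VcovNarrowEstimator":
        for key, value in params.items():
            if not hasattr(self, key):
                raise ValueError(f"Unknown parameter: {key}")
            setattr(self, key, value)  # mutate only; no membership check here
        if "vcov_type" in params:
            self._vcov_type_explicit = self.vcov_type != "hc1"
        return self

    def fit(self) -> Dict[str, Any]:
        self._validate_vcov_type(self.vcov_type)  # re-check at use time
        return {"vcov_type": self.vcov_type}
```

The design choice keeps `set_params` cheap and sklearn-compatible (for example, inside `clone`), while guaranteeing that a bad value never reaches the Results object.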
diff --git a/diff_diff/staggered_bootstrap.py b/diff_diff/staggered_bootstrap.py
index 10f9549f..a97d6ed4 100644
--- a/diff_diff/staggered_bootstrap.py
+++ b/diff_diff/staggered_bootstrap.py
@@ -326,11 +326,36 @@ def _run_multiplier_bootstrap(
             or resolved_survey_unit.fpc is not None
         )
 
+        # When the bootstrap routes through PSU-multiplier weights, the
+        # bootstrap variance is unidentified if there are fewer than 2
+        # PSUs (single-cluster designs collapse all multiplier draws to
+        # constants → ≈0 variance from BLAS roundoff, NOT NaN). Without
+        # this guard, downstream safe_inference would silently produce
+        # tight CIs and near-zero p-values for a variance that's actually
+        # undefined. Capture the flag here and NaN-out all bootstrap
+        # inference surfaces before return (per feedback_no_silent_failures).
+        _bootstrap_cluster_variance_unidentified = False
+
         if _use_survey_bootstrap:
             # PSU-level multiplier weights
             psu_weights, psu_ids = _generate_survey_multiplier_weights_batch(
                 self.n_bootstrap, resolved_survey_unit, self.bootstrap_weights, rng
             )
+            if len(psu_ids) < 2:
+                import warnings as _warnings
+
+                _warnings.warn(
+                    f"CallawaySantAnna bootstrap with survey/cluster design "
+                    f"has only {len(psu_ids)} PSU(s); bootstrap variance is "
+                    "unidentified. All bootstrap inference fields "
+                    "(overall_se, group_time_ses, event_study_ses, "
+                    "group_effect_ses, and their CIs / p-values) will be "
+                    "NaN. Use n_bootstrap=0 (analytical IF variance) or "
+                    "a design with at least 2 PSUs.",
+                    UserWarning,
+                    stacklevel=2,
+                )
+                _bootstrap_cluster_variance_unidentified = True
             # Build unit → PSU column map
             if resolved_survey_unit.psu is not None:
                 unit_psu = resolved_survey_unit.psu
@@ -532,6 +557,28 @@ def _run_multiplier_bootstrap(
                 elif n_valid > 0:
                     cband_crit_value = float(np.quantile(sup_t_dist[finite_mask], 1 - self.alpha))
 
+        # NaN-out all bootstrap inference surfaces when clustered
+        # bootstrap variance is unidentified (G<2 PSUs). See guard
+        # added at the top of the bootstrap weight generation.
+        if _bootstrap_cluster_variance_unidentified:
+            overall_se = np.nan
+            overall_ci = (np.nan, np.nan)
+            overall_p_value = np.nan
+            gt_ses = {gt: np.nan for gt in gt_ses} if gt_ses else gt_ses
+            gt_cis = (
+                {gt: (np.nan, np.nan) for gt in gt_cis} if gt_cis else gt_cis
+            )
+            gt_p_values = {gt: np.nan for gt in gt_p_values} if gt_p_values else gt_p_values
+            if event_study_ses:
+                event_study_ses = {k: np.nan for k in event_study_ses}
+                event_study_cis = {k: (np.nan, np.nan) for k in event_study_cis}
+                event_study_p_values = {k: np.nan for k in event_study_p_values}
+            if group_effect_ses:
+                group_effect_ses = {k: np.nan for k in group_effect_ses}
+                group_effect_cis = {k: (np.nan, np.nan) for k in group_effect_cis}
+                group_effect_p_values = {k: np.nan for k in group_effect_p_values}
+            cband_crit_value = None
+
         return CSBootstrapResults(
             n_bootstrap=self.n_bootstrap,
             weight_type=self.bootstrap_weights,
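The G<2 guard in `_run_multiplier_bootstrap` exists because, with a single PSU, every unit shares one multiplier. Each replicate is then `w_b * sum(psi) / n`, which is roughly zero for a mean-zero influence function. A minimal sketch of a PSU-level multiplier bootstrap with the same guard follows. It assumes Rademacher weights (the library also supports other `bootstrap_weights`), and `psu_multiplier_bootstrap_se` is a hypothetical helper. The patch additionally NaNs out CIs, p-values, and the simultaneous-band critical value, which this sketch omits.

```python
import warnings

import numpy as np


def psu_multiplier_bootstrap_se(psi, psu_ids, n_bootstrap=999, seed=0):
    """Hypothetical PSU-level multiplier-bootstrap SE for a mean-zero IF."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    labels, codes = np.unique(psu_ids, return_inverse=True)
    g = labels.size
    if g < 2:
        # With one PSU every replicate is w_b * sum(psi) / n ~ 0, so the
        # bootstrap variance is unidentified; surface NaN instead of ~0.
        warnings.warn(
            f"only {g} PSU(s); bootstrap variance is unidentified, returning NaN",
            UserWarning,
            stacklevel=2,
        )
        return float("nan")
    rng = np.random.default_rng(seed)
    # One Rademacher multiplier per PSU per replicate: shape (n_bootstrap, G).
    w = rng.choice([-1.0, 1.0], size=(n_bootstrap, g))
    # Sum the influence function within each PSU.
    psu_sums = np.bincount(codes, weights=psi, minlength=g)
    # Perturbed estimate deviations, one per replicate.
    boot = w @ psu_sums / n
    return float(np.std(boot, ddof=1))
```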
diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py
index 02b2cabd..886fdc92 100644
--- a/diff_diff/staggered_results.py
+++ b/diff_diff/staggered_results.py
@@ -95,6 +95,35 @@ class CallawaySantAnnaResults:
         Effects aggregated by treatment cohort.
     pscore_trim : float
         Propensity score trimming bound used during estimation.
+    vcov_type : str
+        Variance type used during estimation. CallawaySantAnna accepts
+        only ``"hc1"``; see REGISTRY.md "IF-based variance estimators vs
+        analytical-sandwich estimators" for why the analytical-sandwich
+        families do not compose with the per-(g,t) doubly-robust / IPW /
+        outcome-regression structure.
+    cluster_name : str, optional
+        Canonical cluster column. Set to ``survey_design.psu`` when an
+        explicit survey PSU was provided (regardless of bare ``cluster=``),
+        otherwise to ``self.cluster`` when bare cluster synthesizes or
+        injects a PSU. ``None`` when no clustering is active.
+    n_clusters : int, optional
+        Number of unique clusters (PSUs) used for variance estimation.
+        ``None`` when no clustering is active.
+    df_inference : int, optional
+        Cluster-level degrees of freedom for downstream inference (e.g.,
+        ``HonestDiD`` t-critical-value selection) on the bare-``cluster=``
+        synthesize path ONLY (the case where ``survey_metadata`` is
+        intentionally ``None`` to preserve the survey/non-survey contract
+        for ``DiagnosticReport`` / ``summary()``). When the user provides
+        an explicit ``survey_design=`` (inject or conflict branches),
+        ``df_inference`` stays ``None`` and the canonical df carrier is
+        ``survey_metadata.df_survey`` — which holds the actual CS-internal
+        df, including any post-resolve tightening (e.g., the
+        ``overall_effective_df`` recompute for replicate aggregations).
+        ``HonestDiD`` reads ``survey_metadata.df_survey`` first and falls
+        back to ``df_inference`` only when ``survey_metadata`` is absent.
+        Narrow contract prevents HonestDiD from silently overriding a
+        tightened survey df with the original ``resolved_survey.df_survey``.
     """
 
     group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]
@@ -137,6 +166,26 @@ class CallawaySantAnnaResults:
     )
     epv_threshold: float = 10
     pscore_fallback: str = "error"
+    # Variance / clustering metadata (PR #XXX: narrow vcov_type contract
+    # + cluster= wiring fix). vcov_type is permanently restricted to "hc1"
+    # for CS because its variance is IF-based (REGISTRY.md). cluster_name +
+    # n_clusters surface the effective clustering level for downstream
+    # introspection and label rendering.
+    vcov_type: str = "hc1"
+    cluster_name: Optional[str] = None
+    n_clusters: Optional[int] = None
+    # df_inference: cluster-level degrees of freedom for downstream
+    # inference, populated on the bare-cluster-synthesize path ONLY.
+    # When the user provides an explicit survey_design= (inject or
+    # conflict branches), df_inference stays None and the canonical df
+    # carrier is survey_metadata.df_survey (which holds the actual
+    # CS-internal df, including any post-resolve tightening via the
+    # overall_effective_df recompute at staggered.py:~1995-1999).
+    # HonestDiD reads survey_metadata.df_survey first and falls back to
+    # df_inference only when survey_metadata is absent. Narrow contract
+    # prevents HonestDiD from silently overriding a tightened survey df
+    # with the original resolved_survey.df_survey.
+    df_inference: Optional[int] = None
 
     # --- Inference-field aliases (balance/external-adapter compatibility) ---
     @property
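The `df_inference` docstring above fixes a read order for downstream consumers. `survey_metadata.df_survey` comes first, and `df_inference` applies only when `survey_metadata` is absent. Below is a minimal sketch of that precedence. The dataclasses are hypothetical stand-ins for `CallawaySantAnnaResults` and its survey metadata, and `select_inference_df` is not the actual `HonestDiD` code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyMetadataStub:  # hypothetical stand-in for survey_metadata
    df_survey: Optional[int] = None


@dataclass
class ResultsStub:  # hypothetical stand-in for CallawaySantAnnaResults
    survey_metadata: Optional[SurveyMetadataStub] = None
    df_inference: Optional[int] = None


def select_inference_df(results: ResultsStub) -> Optional[int]:
    """Return the df a downstream consumer should use (None if neither is set)."""
    if results.survey_metadata is not None:
        # Survey fits: df_survey is canonical and may already be tightened.
        return results.survey_metadata.df_survey
    # Bare cluster= fits: survey_metadata is None by design; use df_inference.
    return results.df_inference
```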
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index ec7e57ff..36f97e15 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -303,6 +303,56 @@ This matches the behavior of R's `fixest::feols()` with absorbed FE.
 
 # Modern Staggered Estimators
 
+## IF-based variance estimators vs analytical-sandwich estimators
+
+diff-diff houses two structural families for variance computation, and the
+distinction governs which `vcov_type` values an estimator can accept:
+
+**Analytical-sandwich estimators** fit a single (or per-cohort) linear
+regression and derive variance via `solve_ols(..., vcov_type=...)`, returning
+a sandwich `(X'X)^{-1} M (X'X)^{-1}` whose meat `M` is parameterized by
+`vcov_type ∈ {classical, hc1, hc2, hc2_bm}` (plus `conley` for spatial-HAC).
+Examples: `DifferenceInDifferences`, `MultiPeriodDiD`, `TwoWayFixedEffects`,
+`SunAbraham`, `StackedDiD`, `WooldridgeDiD`, `LinearRegression`. The full
+`vcov_type` contract is methodologically applicable because every family has
+a defined interpretation on the hat-matrix-bearing design (HC2 leverage
+`1/(1-h_ii)`, Bell-McCaffrey Satterthwaite DOF, etc.).
+
+**IF-based estimators** derive variance from an asymptotic influence function,
+`Var(θ̂) ≈ (1/n²) Σ_i ψ_i²` (the singleton-cluster case of the CR1 formula
+below, without the `G/(G-1)` factor), per estimator-specific derivations
+(Callaway & Sant'Anna 2021 for `CallawaySantAnna`; Borusyak-Jaravel-Spiess
+2024 for `ImputationDiD`; Sant'Anna & Zhao 2020 for `EfficientDiD`). For these:
+
+- `hc1` with `cluster=None` ≡ per-unit IF variance — the default
+  (Williams 2000 form).
+- `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF:
+  `Var = (G/(G-1)) Σ_c (Σ_{i∈c} ψ_i)² / n²`. Activated by synthesizing
+  `SurveyDesign(psu=X)` internally and routing through the existing PSU-meat
+  machinery (`_compute_stratified_psu_meat`).
+- `classical`, `hc2`, `hc2_bm` are **N/A** for IF-based estimators —
+  hat-matrix leverage and Bell-McCaffrey Satterthwaite DOF are defined on a
+  single regression's design matrix, and IF-based estimators have no
+  equivalent global hat matrix (they compose per-(g,t) or per-cohort fits
+  with custom IF derivations). Rejected at `__init__` (and re-checked at
+  `fit()` after `set_params`) with methodology-rooted error messages.
+- `conley` (spatial-HAC) — could conceptually apply to the IF (spatial
+  aggregation of per-unit IFs) but requires separate methodology work;
+  deferred.
+
+This split is a structural property of the estimator's variance derivation,
+not a missing feature. The `vcov_type` input contract for IF-based estimators
+is **permanently restricted** to `{"hc1"}`. It is enforced today on
+`CallawaySantAnna`; the same narrow contract is expected when
+`ImputationDiD` and `EfficientDiD` reach `vcov_type` threading.
+
+**Note:** This routing is a documented synthesis of existing pieces. Only the
+`SurveyDesign(psu=...)` synthesis is new wiring; the downstream PSU-meat
+machinery (`_compute_stratified_psu_meat`) is the established survey-side
+path, and the CR1 Liang-Zeger algebra on the IF follows Williams (2000) /
+Hansen (2007). No new methodology is introduced.
+
+---
+
 ## CallawaySantAnna
 
 **Primary source:** [Callaway, B., & Sant'Anna, P.H.C. (2021). Difference-in-Differences with multiple time periods. *Journal of Econometrics*, 225(2), 200-230.](https://doi.org/10.1016/j.jeconom.2020.12.001)
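The two `hc1` variants listed under "IF-based variance estimators vs analytical-sandwich estimators" can be checked numerically. Below is a minimal NumPy sketch, assuming a 1-D array `psi` of per-unit influence-function values scaled so that `θ̂ - θ ≈ (1/n) Σ_i ψ_i`. `if_variance` and `cr1_if_variance` are hypothetical names. The cluster sums are left uncentered, following the formula as written; the library's `_compute_stratified_psu_meat` may additionally center within strata.

```python
import numpy as np


def if_variance(psi):
    """Per-unit IF variance: Var(theta_hat) ~= sum(psi_i^2) / n^2."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    return float(np.sum(psi ** 2) / n ** 2)


def cr1_if_variance(psi, cluster_ids):
    """CR1 Liang-Zeger on the IF: (G/(G-1)) * sum_c (sum_{i in c} psi_i)^2 / n^2."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    # Map cluster labels to 0..G-1, then sum psi within each cluster.
    _, codes = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(codes, weights=psi)
    g = cluster_sums.size
    if g < 2:
        return float("nan")  # a single cluster leaves the variance unidentified
    return float(g / (g - 1) * np.sum(cluster_sums ** 2) / n ** 2)
```

With every unit in its own cluster, the CR1 value equals `n/(n-1)` times the per-unit value, which is how the two bullets line up. Grouping units whose IF values move together inflates the cluster sums and therefore the variance.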
@@ -315,6 +365,18 @@ This matches the behavior of R's `fixest::feols()` with absorbed FE.
 - Limited pre-treatment periods reduce ability to test parallel trends
 - **Note:** The analytical SE paths call `_safe_inv()` on the propensity-score Hessian (`H_psi`) and outcome-regression bread (`X'WX`) across every `(g, t)` cell. When these matrices are rank deficient, `np.linalg.solve` raises `LinAlgError` and `_safe_inv()` falls back to `np.linalg.lstsq`. Previously silent; now `fit()` emits ONE aggregate `UserWarning` at the end of the fit reporting the number of fallbacks and the max condition number, so a rank-deficient analytical SE path can't quietly ship degraded standard errors. Sibling of axis-A finding #17 in the Phase 2 silent-failures audit.
 
+*Variance families (`vcov_type`, IF-based):*
+- `hc1` (default, only accepted value) — per-unit IF variance per Callaway & Sant'Anna (2021) when `cluster=None`; cluster-robust CR1 Liang-Zeger on the IF when `cluster=X` is set (synthesizes `SurveyDesign(psu=X)` internally, threading through the same PSU machinery as explicit survey designs). When `survey_design=SurveyDesign(psu=Y)` is provided, the explicit PSU takes precedence; if `cluster=X` is also set with a different partition, emits a `UserWarning` (PSU wins).
+- `classical`, `hc2`, `hc2_bm`, `conley` — REJECTED at `__init__`. The rejection is **library-architectural, not paper-prescribed**: analytical-sandwich variance families (`classical`, `hc2`, `hc2_bm`) are defined on a single regression's hat matrix, and CS's per-(g,t) doubly-robust / IPW / outcome-regression structure has no equivalent single design matrix to compute hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF on. Spatial-HAC (`conley`) likewise has no defined composition with per-unit IF aggregation today. See ["IF-based variance estimators vs analytical-sandwich estimators"](#if-based-variance-estimators-vs-analytical-sandwich-estimators) above for the structural taxonomy.
+
+*Cluster wiring:*
+Prior to the bare-`cluster=` wiring fix, `CallawaySantAnna(cluster="X")` was a silent no-op — the parameter was stored at `__init__` but never consumed in the fit / aggregator / bootstrap pipeline (users got per-unit IF variance silently, even when they explicitly set `cluster="state"`). The fix synthesizes a minimal `SurveyDesign(psu=X, weight_type="pweight")` when bare `cluster=` is set without an explicit survey design, threading the synthesized PSU through the existing `_compute_stratified_psu_meat` aggregator (`staggered_aggregation.py:735-749`) and PSU-level multiplier bootstrap (`staggered_bootstrap.py:323-347`). Three-branch wiring at `staggered.py:~1500`:
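+  1. Bare `cluster=X` + no `survey_design` → synthesize `SurveyDesign(psu=X)`; re-run `_resolve_survey_for_fit` on the synthetic design; set `effective_survey_design` to the synthetic design so `_validate_unit_constant_survey` runs on it (preventing first-value-wins collapse for units that move between clusters on panel data).
+  2. `survey_design` without PSU + `cluster=X` → set `effective_survey_design = replace(survey_design, psu=X)` so the unit-constancy validator sees the PSU column, call `_inject_cluster_as_psu(resolved_survey, cluster_ids)`, and recompute `survey_metadata` so `n_psu` / `df_survey` reflect the injected cluster.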
+  3. `survey_design` with PSU + `cluster=X` → PSU wins; `_resolve_effective_cluster` emits `UserWarning` if partitions differ.
+
+The `cluster_name` and `n_clusters` fields on `CallawaySantAnnaResults` report the effective clustering level: `survey_design.psu` (canonical column) when explicit PSU is provided, `self.cluster` when bare cluster synthesizes or injects.
+
 *Estimator equation (as implemented):*
 
 Group-time average treatment effect:
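The three-branch cluster wiring listed under *Cluster wiring* can be sketched as a single dispatch on `(survey_design, cluster)`. The sketch below assumes columns are passed as a plain `dict` of lists. `Design` is a hypothetical frozen stand-in for `diff_diff.SurveyDesign`, and `same_partition` / `resolve_design` are hypothetical helpers. The real branch 2 also injects the PSU into the already-resolved survey, and the real branch 3 delegates the partition check to `_resolve_effective_cluster`.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Design:  # hypothetical stand-in for diff_diff.SurveyDesign
    weights: Optional[str] = None
    psu: Optional[str] = None
    weight_type: str = "pweight"


def same_partition(a: List, b: List) -> bool:
    """True when two label columns induce the same grouping of rows."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))


def resolve_design(
    data: Dict[str, List], survey_design: Optional[Design], cluster: Optional[str]
) -> Tuple[Optional[Design], str]:
    if cluster is None:
        return survey_design, "no-cluster"
    if survey_design is None:
        # Branch 1: synthesize a minimal PSU-only pweight design.
        return Design(psu=cluster, weight_type="pweight"), "synthesize"
    if survey_design.psu is None:
        # Branch 2: inject the cluster column as the PSU.
        return replace(survey_design, psu=cluster), "inject"
    # Branch 3: explicit PSU wins; warn only if the groupings differ.
    if not same_partition(data[survey_design.psu], data[cluster]):
        warnings.warn(
            "survey_design.psu and cluster= define different groupings; PSU wins",
            UserWarning,
            stacklevel=2,
        )
    return survey_design, "psu-wins"
```

Comparing partitions rather than column names means a relabeled copy of the cluster column does not trigger a spurious warning.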
diff --git a/tests/test_staggered.py b/tests/test_staggered.py
index f7bedc3e..46bbaefc 100644
--- a/tests/test_staggered.py
+++ b/tests/test_staggered.py
@@ -4172,9 +4172,7 @@ class TestCallawaySantAnnaSafeInvFallback:
     def test_collinear_covariates_emit_safe_inv_warning(self):
         """Perfectly collinear covariates should trigger the aggregate
         `_safe_inv` lstsq-fallback warning across analytical SE paths."""
-        data = generate_staggered_data(
-            n_units=150, n_periods=6, n_cohorts=3, seed=55
-        )
+        data = generate_staggered_data(n_units=150, n_periods=6, n_cohorts=3, seed=55)
         rng = np.random.default_rng(0)
         # Add a covariate and a redundant (collinear) copy — forces rank-
         # deficient X'WX in the OR bread and the PS Hessian within at
@@ -4196,7 +4194,8 @@ def test_collinear_covariates_emit_safe_inv_warning(self):
                 covariates=["x1", "x2"],
             )
         fallback_warnings = [
-            w for w in caught
+            w
+            for w in caught
             if "Rank-deficient matrix encountered" in str(w.message)
             and "analytical SE paths" in str(w.message)
         ]
@@ -4209,9 +4208,7 @@ def test_collinear_covariates_emit_safe_inv_warning(self):
     def test_well_conditioned_no_safe_inv_warning(self):
         """Clean data should NOT trigger the aggregate warning —
         regression-safety for the happy path."""
-        data = generate_staggered_data(
-            n_units=200, n_periods=6, n_cohorts=3, seed=42
-        )
+        data = generate_staggered_data(n_units=200, n_periods=6, n_cohorts=3, seed=42)
         cs = CallawaySantAnna(estimation_method="dr")
         with warnings.catch_warnings(record=True) as caught:
             warnings.simplefilter("always")
@@ -4223,7 +4220,8 @@ def test_well_conditioned_no_safe_inv_warning(self):
                 first_treat="first_treat",
             )
         fallback_warnings = [
-            w for w in caught
+            w
+            for w in caught
             if "Rank-deficient matrix encountered" in str(w.message)
             and "analytical SE paths" in str(w.message)
         ]
@@ -4231,3 +4229,1002 @@ def test_well_conditioned_no_safe_inv_warning(self):
             f"Unexpected _safe_inv fallback warning on clean data: "
             f"{[str(w.message) for w in fallback_warnings]}"
         )
+
+
+def _generate_clustered_staggered_data(
+    n_clusters: int = 20,
+    units_per_cluster: int = 5,
+    n_periods: int = 8,
+    cluster_effect_sd: float = 3.0,
+    seed: int = 7,
+) -> pd.DataFrame:
+    """
+    Generate a staggered panel with strong intra-cluster correlation.
+
+    Each "state" cluster contributes a shared random effect to every
+    unit within it, so cluster-robust SE should differ measurably from
+    per-unit IF SE. Required for the assertive cluster-wiring tests
+    (per ``feedback_homogeneous_dgp_no_twfe_bias`` — homogeneous DGPs
+    produce zero divergence and can't distinguish wired from no-op).
+    """
+    rng = np.random.default_rng(seed)
+    n_units = n_clusters * units_per_cluster
+    state_ids = np.repeat(np.arange(n_clusters), units_per_cluster)
+    cluster_effects = rng.normal(0.0, cluster_effect_sd, n_clusters)
+
+    cohort_choices = [0, 3, 5, 7]  # 0 = never-treated
+    first_treat = rng.choice(cohort_choices, size=n_units, p=[0.4, 0.2, 0.2, 0.2])
+
+    rows = []
+    for u in range(n_units):
+        s = state_ids[u]
+        ft = first_treat[u]
+        for t in range(1, n_periods + 1):
+            y = (
+                cluster_effects[s]
+                + 0.5 * (t - 1)
+                + (2.0 if (ft > 0 and t >= ft) else 0.0)
+                + rng.normal(0.0, 0.5)
+            )
+            rows.append(
+                {
+                    "unit": u,
+                    "state": int(s),
+                    "time": t,
+                    "first_treat": int(ft),
+                    "outcome": y,
+                }
+            )
+    return pd.DataFrame(rows)
+
+
+class TestCallawaySantAnnaClusterWiring:
+    """Cluster wiring fix: bare ``cluster=`` activates cluster-robust IF.
+
+    Prior to the fix, ``CS(cluster="state").fit(...)`` accepted the
+    parameter but never consumed it (silent unit-level inference). These
+    tests pin the fix: bare cluster= synthesizes ``SurveyDesign(psu=X)``
+    and routes through the existing PSU-meat machinery.
+    """
+
+    def test_cluster_robust_ses_differ_from_unit_level(self):
+        """Assertive: cluster=state SE differs from cluster=None SE
+        on a panel with intra-cluster correlation. This is the
+        regression test that pins the silent no-op fix."""
+        data = _generate_clustered_staggered_data(seed=7)
+
+        cs_unit = CallawaySantAnna()
+        res_unit = cs_unit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+
+        cs_cluster = CallawaySantAnna(cluster="state")
+        res_cluster = cs_cluster.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+
+        assert np.isfinite(res_unit.overall_se) and res_unit.overall_se > 0
+        assert np.isfinite(res_cluster.overall_se) and res_cluster.overall_se > 0
+        assert abs(res_unit.overall_se - res_cluster.overall_se) > 1e-6, (
+            f"cluster=state SE ({res_cluster.overall_se:.6f}) is "
+            f"effectively identical to cluster=None SE "
+            f"({res_unit.overall_se:.6f}) — the cluster= parameter "
+            "may not be wired through to the variance machinery."
+        )
+
+    def test_bare_cluster_synthesizes_survey_design(self):
+        """bare cluster= populates Results.cluster_name and n_clusters."""
+        data = _generate_clustered_staggered_data(seed=11)
+        cs = CallawaySantAnna(cluster="state")
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res.cluster_name == "state"
+        assert res.n_clusters is not None and res.n_clusters > 0
+        assert res.vcov_type == "hc1"
+
+    def test_survey_design_psu_overrides_cluster_warns(self):
+        """survey_design.psu wins over bare cluster=; UserWarning fires
+        if partitions differ; cluster_name reflects the canonical PSU."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(n_clusters=20, units_per_cluster=5, seed=13)
+        # Add a coarser "region" partition: 2 regions, each with 10 states.
+        data["region"] = data["state"] // 10
+
+        cs = CallawaySantAnna(cluster="state")
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            res = cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=SurveyDesign(psu="region"),
+            )
+        partition_warnings = [
+            w
+            for w in caught
+            if "psu" in str(w.message).lower()
+            or "partition" in str(w.message).lower()
+            or "different groupings" in str(w.message).lower()
+        ]
+        assert len(partition_warnings) > 0, (
+            f"Expected UserWarning about psu/partition mismatch; "
+            f"caught: {[str(w.message) for w in caught]}"
+        )
+        # Canonical PSU column wins
+        assert res.cluster_name == "region"
+
+    def test_survey_design_without_psu_plus_cluster_injects(self):
+        """survey_design without psu + cluster=X injects cluster as PSU.
+        cluster_name reflects the bare cluster (no explicit PSU to win)."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=17)
+        data["wt"] = 1.0  # uniform weights
+
+        cs = CallawaySantAnna(cluster="state")
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=SurveyDesign(weights="wt"),
+        )
+        assert res.cluster_name == "state"
+        assert res.n_clusters is not None and res.n_clusters > 0
+
+    def test_cluster_none_path_unchanged(self):
+        """cluster=None path: no wiring, no cluster metadata in Results.
+        Verifies the wiring guard ``if self.cluster is not None:`` prevents
+        the wiring block from firing when cluster is not set."""
+        data = _generate_clustered_staggered_data(seed=19)
+        cs = CallawaySantAnna()  # cluster=None default
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res.cluster_name is None
+        assert res.n_clusters is None
+        assert res.vcov_type == "hc1"
+        assert np.isfinite(res.overall_se) and res.overall_se > 0
+
+    def test_invalid_cluster_column_raises(self):
+        """cluster=<nonexistent_col> raises ValueError with column name."""
+        data = _generate_clustered_staggered_data(seed=23)
+        cs = CallawaySantAnna(cluster="nonexistent_col")
+        with pytest.raises(ValueError, match="cluster column"):
+            cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+            )
+
+    def test_cluster_nan_raises_with_cluster_domain_message(self):
+        """cluster column with NaN raises ValueError citing 'cluster'
+        (not 'PSU') — verifies the cluster-domain pre-validator fires
+        BEFORE synthesis, so the error message refers to the right API."""
+        data = _generate_clustered_staggered_data(seed=29)
+        data.loc[0, "state"] = np.nan
+        cs = CallawaySantAnna(cluster="state")
+        with pytest.raises(ValueError, match="cluster column"):
+            cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+            )
+
+    def test_bare_cluster_works_with_panel_false_rcs(self):
+        """RCS coverage: with panel=False, cluster=state must produce a
+        clustered SE that differs from the cluster=None SE. Closes the RCS
+        coverage gap flagged in plan review."""
+        # Build a repeated cross-section: each obs is a distinct unit,
+        # but obs share state-level clusters.
+        rng = np.random.default_rng(31)
+        n_states = 15
+        obs_per_period = 60
+        n_periods = 6
+        state_effects = rng.normal(0.0, 3.0, n_states)
+        rows = []
+        next_unit = 0
+        for t in range(1, n_periods + 1):
+            for _ in range(obs_per_period):
+                s = int(rng.integers(0, n_states))
+                ft = int(rng.choice([0, 3, 5], p=[0.4, 0.3, 0.3]))
+                y = (
+                    state_effects[s]
+                    + 0.3 * (t - 1)
+                    + (1.5 if (ft > 0 and t >= ft) else 0.0)
+                    + rng.normal(0.0, 0.5)
+                )
+                rows.append(
+                    {
+                        "unit": next_unit,
+                        "state": s,
+                        "time": t,
+                        "first_treat": ft,
+                        "outcome": y,
+                    }
+                )
+                next_unit += 1
+        data = pd.DataFrame(rows)
+
+        cs_unit = CallawaySantAnna(panel=False)
+        res_unit = cs_unit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        cs_cluster = CallawaySantAnna(panel=False, cluster="state")
+        res_cluster = cs_cluster.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert np.isfinite(res_unit.overall_se) and res_unit.overall_se > 0
+        assert np.isfinite(res_cluster.overall_se) and res_cluster.overall_se > 0
+        assert abs(res_unit.overall_se - res_cluster.overall_se) > 1e-6, (
+            "RCS path: cluster=state SE not measurably different from "
+            "cluster=None SE — cluster wiring may not reach the RCS code path."
+        )
+
+
+class TestCallawaySantAnnaVcovTypeNarrowContract:
+    """Narrow vcov_type contract: CS accepts {hc1} only; rejects
+    analytical-sandwich families and conley with methodology-rooted
+    messages."""
+
+    def test_default_vcov_type_is_hc1(self):
+        cs = CallawaySantAnna()
+        assert cs.vcov_type == "hc1"
+
+    def test_classical_rejected_at_init(self):
+        with pytest.raises(ValueError, match="influence-function"):
+            CallawaySantAnna(vcov_type="classical")
+
+    def test_hc2_rejected_at_init(self):
+        with pytest.raises(ValueError, match="hat matrix"):
+            CallawaySantAnna(vcov_type="hc2")
+
+    def test_hc2_bm_rejected_at_init(self):
+        with pytest.raises(ValueError, match="Bell-McCaffrey"):
+            CallawaySantAnna(vcov_type="hc2_bm")
+
+    def test_conley_rejected_at_init(self):
+        with pytest.raises(ValueError, match="(conley|spatial-HAC)"):
+            CallawaySantAnna(vcov_type="conley")
+
+    def test_unknown_vcov_type_rejected(self):
+        with pytest.raises(ValueError, match="hc4"):
+            CallawaySantAnna(vcov_type="hc4")
+
+    def test_get_params_includes_vcov_type(self):
+        cs = CallawaySantAnna()
+        params = cs.get_params()
+        assert "vcov_type" in params
+        assert params["vcov_type"] == "hc1"
+
+    def test_set_params_bad_vcov_caught_at_fit_time(self):
+        """set_params is strict-mirror SA (no atomic validation), but
+        fit() re-validates so a bad set_params(vcov_type='hc4')
+        surfaces a clear error at fit-time rather than silently
+        propagating a bad value to Results metadata."""
+        cs = CallawaySantAnna()
+        # set_params succeeds (sklearn-style mutate-then-validate-at-use)
+        cs.set_params(vcov_type="hc4")
+        assert cs.vcov_type == "hc4"
+        # fit() re-validates and raises
+        data = _generate_clustered_staggered_data(seed=37)
+        with pytest.raises(ValueError, match="hc4"):
+            cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+            )
+
+    def test_results_carries_vcov_type(self):
+        data = _generate_clustered_staggered_data(seed=41)
+        cs = CallawaySantAnna()
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res.vcov_type == "hc1"
+
+    def test_fit_clone_idempotent_on_vcov_type(self):
+        """get_params + reconstruct + refit produces same SE."""
+        data = _generate_clustered_staggered_data(seed=43)
+        cs1 = CallawaySantAnna(cluster="state")
+        res1 = cs1.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        cs2 = CallawaySantAnna(**cs1.get_params())
+        res2 = cs2.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res1.overall_se == pytest.approx(res2.overall_se, rel=0, abs=0)
+        assert res1.vcov_type == res2.vcov_type == "hc1"
+        assert res1.cluster_name == res2.cluster_name == "state"
+
+
+class TestCallawaySantAnnaClusterSafetyGates:
+    """Safety gates for the cluster= wiring fix, added in response to local
+    AI review findings (panel-mover validation, replicate-weight rejection,
+    and propagation of the cluster-level df to HonestDiD via df_inference)."""
+
+    def test_inject_branch_panel_mover_raises(self):
+        """survey_design without PSU + cluster=X where a unit changes
+        cluster across periods (a 'mover') must raise via the unit-
+        constancy validator. The validator must see the injected cluster
+        column — earlier versions ran the validator on the user-provided
+        survey_design (no PSU), missing the mover entirely."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=61)
+        data["wt"] = 1.0
+        # Force unit 0 to be a mover: assign it to a different state in the
+        # later half of the panel.
+        unit_0_late_mask = (data["unit"] == 0) & (data["time"] >= 5)
+        original_state_for_unit_0 = data.loc[data["unit"] == 0, "state"].iloc[0]
+        mover_target_state = (int(original_state_for_unit_0) + 1) % 20
+        data.loc[unit_0_late_mask, "state"] = mover_target_state
+
+        cs = CallawaySantAnna(cluster="state")
+        with pytest.raises((ValueError, RuntimeError), match="(unit|constant|invariant)"):
+            cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=SurveyDesign(weights="wt"),
+            )
+
+    def test_replicate_weight_plus_cluster_rejected(self):
+        """SurveyDesign(replicate_weights=[...]) + cluster=X must raise
+        NotImplementedError. Replicate-weight variance ignores the PSU
+        entirely, so honoring bare cluster= would have no effect on the
+        variance estimate while misleadingly populating cluster_name and
+        n_clusters. Fail closed per feedback_no_silent_failures."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=67)
+        data["wt"] = 1.0
+        # Add 4 BRR replicate weights (R survey package convention).
+        for r in range(1, 5):
+            data[f"repwt_{r}"] = 1.0
+
+        cs = CallawaySantAnna(cluster="state")
+        with pytest.raises(NotImplementedError, match="replicate"):
+            cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=SurveyDesign(
+                    weights="wt",
+                    replicate_weights=["repwt_1", "repwt_2", "repwt_3", "repwt_4"],
+                    replicate_method="BRR",
+                ),
+            )
+
+    def test_bare_cluster_populates_df_inference(self):
+        """Bare cluster= must populate Results.df_inference so downstream
+        consumers (e.g., HonestDiD at honest_did.py:~652) see the cluster-
+        level df rather than silently reverting to normal-theory critical
+        values. df_inference is the canonical df carrier for bare-cluster
+        fits; survey_metadata is reserved for user-provided SurveyDesign
+        (see test_bare_cluster_does_not_set_survey_metadata for the other
+        half of the contract)."""
+        data = _generate_clustered_staggered_data(seed=71)
+        cs = CallawaySantAnna(cluster="state")
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res.df_inference is not None and res.df_inference > 0, (
+            f"Bare cluster= must populate Results.df_inference with a "
+            f"positive integer; got {res.df_inference!r}."
+        )
+        # df_inference must equal n_clusters - 1 for the PSU-only design
+        assert res.n_clusters is not None
+        assert res.df_inference == res.n_clusters - 1, (
+            f"df_inference ({res.df_inference}) must equal n_clusters - 1 "
+            f"({res.n_clusters - 1}) for PSU-only synthesized designs."
+        )
+
+    def test_bare_cluster_does_not_set_survey_metadata(self):
+        """Bare cluster= must NOT populate Results.survey_metadata. The
+        user did not provide a SurveyDesign, so downstream consumers that
+        check ``survey_metadata is not None`` for 'original fit used a
+        survey design' must continue to see a non-survey fit. Affected
+        consumers: DiagnosticReport at diagnostic_report.py:848-856 +
+        1150-1158 (Bacon decomp + 2x2 PT skip); CallawaySantAnnaResults.
+        summary() at staggered_results.py:235-238 (survey block render).
+        df_inference carries cluster df separately (see
+        test_bare_cluster_populates_df_inference)."""
+        data = _generate_clustered_staggered_data(seed=73)
+        cs = CallawaySantAnna(cluster="state")
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert res.survey_metadata is None, (
+            "Bare cluster= must NOT populate survey_metadata — that field "
+            "is reserved for user-provided SurveyDesign. Setting it on a "
+            "non-survey fit would cause DiagnosticReport to skip checks "
+            "with 'Original fit used a survey design' and summary() to "
+            "print a misleading survey block."
+        )
+
+    def test_explicit_survey_design_does_populate_survey_metadata(self):
+        """Counterpart to test_bare_cluster_does_not_set_survey_metadata:
+        when the user provides a real SurveyDesign, survey_metadata IS
+        populated (regardless of bare cluster= status). Exercises the
+        'inject' branch: SurveyDesign(weights=...) + cluster=X →
+        survey_metadata populated and df_inference left as None under the
+        narrowed contract. When survey_metadata is present, the canonical
+        df carrier is survey_metadata.df_survey, which holds the CS-internal
+        df after any post-resolve tightening. HonestDiD reads survey_metadata
+        first and uses df_inference only as a fallback."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=75)
+        data["wt"] = 1.0
+        cs = CallawaySantAnna(cluster="state")
+        res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=SurveyDesign(weights="wt"),
+        )
+        assert res.survey_metadata is not None, (
+            "User-provided SurveyDesign must populate survey_metadata."
+        )
+        # df_inference is NARROWED to bare-cluster-synthesize path only:
+        # when survey_metadata is populated, df_inference stays None and
+        # HonestDiD reads df_survey directly from survey_metadata (which
+        # carries the actual CS-internal df, post-recompute). Prevents
+        # HonestDiD from reading a stale/wrong df_inference when CS's
+        # internal df was tightened post-resolve. See honest_did.py:
+        # _extract_event_study_params preference order: survey_metadata
+        # first, df_inference fallback.
+        assert res.df_inference is None, (
+            "Inject/conflict branches must leave df_inference=None — "
+            "survey_metadata.df_survey is the canonical df carrier when "
+            "a survey design is present."
+        )
+        sm_df = getattr(res.survey_metadata, "df_survey", None)
+        assert sm_df is not None and sm_df > 0, (
+            "survey_metadata.df_survey must be populated when an explicit "
+            "SurveyDesign is provided."
+        )
+
+    def test_bare_cluster_honest_did_uses_df_inference(self):
+        """End-to-end integration: HonestDiD.fit() on a bare-cluster CS
+        result must pick up the cluster-level df via df_inference (not
+        revert to normal-theory critical values). A future refactor that
+        stops honoring df_inference in honest_did.py would silently fall
+        back to z-critical values for clustered CS fits without failing
+        the simpler results-object-contract tests. This test pins the
+        end-to-end behavior. Per the R3 codex finding."""
+        from diff_diff.honest_did import HonestDiD
+
+        data = _generate_clustered_staggered_data(seed=79)
+        cs = CallawaySantAnna(cluster="state", base_period="universal")
+        cs_res = cs.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="event_study",
+        )
+
+        # Sanity: the CS fit populated df_inference but not survey_metadata
+        assert cs_res.df_inference is not None and cs_res.df_inference > 0
+        assert cs_res.survey_metadata is None, (
+            "Pre-condition for this test: bare cluster= must NOT populate "
+            "survey_metadata. If this fails, the survey/non-survey "
+            "contract regressed (see test_bare_cluster_does_not_set_survey_metadata)."
+        )
+
+        # Run HonestDiD; assert it threads df_inference into the returned df_survey
+        honest = HonestDiD(method="relative_magnitude", M=1.0)
+        honest_res = honest.fit(cs_res)
+
+        assert honest_res.df_survey is not None, (
+            "HonestDiD must preserve the cluster df from CS's df_inference. "
+            "Reading None means it silently reverted to normal-theory "
+            "critical values — the contract this test exists to guard."
+        )
+        assert int(honest_res.df_survey) == int(cs_res.df_inference), (
+            f"HonestDiDResults.df_survey ({honest_res.df_survey}) must "
+            f"equal CS Results.df_inference ({cs_res.df_inference}). "
+            "A divergence here means df_inference is not being threaded "
+            "through honest_did.py's _extract_event_study_params."
+        )
+
+    def test_bare_cluster_bootstrap_se_differs_from_unit_level(self):
+        """Bootstrap path coverage: bare cluster= must route bootstrap
+        through the PSU-level multiplier-weights branch at
+        staggered_bootstrap.py:323-347 (synthesized SurveyDesign(psu=
+        cluster) sets resolved_survey.psu, triggering the survey-PSU
+        bootstrap path). Without the fix, bootstrap drew per-unit weights
+        regardless of self.cluster — same class of silent no-op as the
+        analytical path. Per CI codex R1 P3 finding."""
+        data = _generate_clustered_staggered_data(seed=83)
+
+        # Low n_bootstrap for speed. Both fits share seed=83, so if cluster= were
+        # ignored they would draw identical per-unit weights and return identical SEs.
+        cs_unit = CallawaySantAnna(n_bootstrap=99, seed=83)
+        res_unit = cs_unit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        cs_cluster = CallawaySantAnna(cluster="state", n_bootstrap=99, seed=83)
+        res_cluster = cs_cluster.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        assert np.isfinite(res_unit.overall_se) and res_unit.overall_se > 0
+        assert np.isfinite(res_cluster.overall_se) and res_cluster.overall_se > 0
+        assert abs(res_unit.overall_se - res_cluster.overall_se) > 1e-6, (
+            f"Bootstrap path: cluster=state SE ({res_cluster.overall_se:.6f}) "
+            f"is effectively identical to cluster=None SE "
+            f"({res_unit.overall_se:.6f}) — the cluster= parameter may "
+            "not be reaching the bootstrap multiplier-weights routing."
+        )
+
+    def test_per_gt_analytical_se_changes_with_cluster(self):
+        """Per-(g,t) analytical SE at results.group_time_effects[(g,t)]
+        ["se"] must change when cluster= is set (mirrors the overall_se
+        contract). Pre-fix, per-(g,t) SEs stayed unit-level even with
+        cluster=; only the aggregate path and bootstrap honored cluster=.
+        Per CI codex R3 P0 finding."""
+        data = _generate_clustered_staggered_data(seed=97)
+
+        cs_unit = CallawaySantAnna()
+        res_unit = cs_unit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+        cs_cluster = CallawaySantAnna(cluster="state")
+        res_cluster = cs_cluster.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+
+        # Pick a representative (g, t) cell that exists in both fits
+        gt_keys = sorted(
+            set(res_unit.group_time_effects.keys()) & set(res_cluster.group_time_effects.keys())
+        )
+        assert len(gt_keys) > 0, "expected overlapping (g, t) keys"
+
+        # At least one (g, t) cell must show measurable SE divergence —
+        # cluster-aware aggregation should differ from unit-level for at
+        # least one cell on a panel with intra-cluster correlation.
+        diffs = []
+        for gt in gt_keys:
+            se_unit = res_unit.group_time_effects[gt]["se"]
+            se_cluster = res_cluster.group_time_effects[gt]["se"]
+            if np.isfinite(se_unit) and np.isfinite(se_cluster):
+                diffs.append(abs(se_unit - se_cluster))
+        max_diff = max(diffs) if diffs else 0.0
+        assert max_diff > 1e-6, (
+            f"Per-(g,t) SEs did not change with cluster= (max diff "
+            f"across {len(diffs)} cells: {max_diff:.6g}). The cluster= "
+            "parameter may not be reaching the per-(g,t) analytical SE "
+            "computation."
+        )
+
+    def test_per_gt_se_matches_explicit_survey_design(self):
+        """When bare cluster=X and explicit SurveyDesign(psu=X) produce
+        equivalent variance contracts, the per-(g,t) SE surfaces must
+        also agree to within floating-point tolerance, since the synthesis
+        path is deterministic. Per CI codex R3 P0 finding."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=101)
+
+        cs_bare = CallawaySantAnna(cluster="state")
+        res_bare = cs_bare.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+
+        cs_explicit = CallawaySantAnna()
+        res_explicit = cs_explicit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=SurveyDesign(psu="state"),
+        )
+
+        gt_keys = sorted(
+            set(res_bare.group_time_effects.keys()) & set(res_explicit.group_time_effects.keys())
+        )
+        assert len(gt_keys) > 0
+
+        for gt in gt_keys:
+            se_bare = res_bare.group_time_effects[gt]["se"]
+            se_explicit = res_explicit.group_time_effects[gt]["se"]
+            if np.isfinite(se_bare) and np.isfinite(se_explicit):
+                assert se_bare == pytest.approx(se_explicit, rel=1e-10, abs=1e-12), (
+                    f"Per-(g,t) SE divergence at {gt}: bare cluster=state "
+                    f"({se_bare}) vs explicit SurveyDesign(psu=state) "
+                    f"({se_explicit}). Both should activate the same CR1 "
+                    "aggregation."
+                )
+
+    def test_per_gt_se_matches_compute_survey_if_variance_helper(self):
+        """The per-(g,t) cluster-aware SE must use the SAME design-based
+        variance machinery as the aggregate path
+        (compute_survey_if_variance / _compute_stratified_psu_meat) —
+        applying G/(G-1) finite-sample correction, PSU centering, and
+        lonely-PSU handling uniformly. Compares per-cell SE against the
+        shared helper on a small-G design (so the finite-sample
+        correction is non-trivial). Per CI codex R4 P1/P2 findings."""
+        from diff_diff.staggered import _cluster_robust_se_from_per_gt_if
+        from diff_diff.survey import (
+            SurveyDesign,
+            _resolve_survey_for_fit,
+            compute_survey_if_variance,
+        )
+
+        # 10 PSUs (states), 4 units each = 40 units total (small-G)
+        n_clusters = 10
+        units_per_cluster = 4
+        n_units = n_clusters * units_per_cluster
+        state_ids = np.repeat(np.arange(n_clusters), units_per_cluster)
+        unit_data = pd.DataFrame({"unit": np.arange(n_units), "state": state_ids})
+
+        synthetic = SurveyDesign(psu="state", weight_type="pweight")
+        rsu, _, _, _ = _resolve_survey_for_fit(synthetic, unit_data, "analytical")
+        assert rsu is not None
+        assert rsu.psu is not None and len(rsu.psu) == n_units
+
+        # Hand-crafted per-(g,t) IF: 5 treated + 10 control units in this cell
+        rng = np.random.default_rng(7)
+        treated_idx = np.arange(0, 5)
+        control_idx = np.arange(5, 15)
+        treated_inf = rng.normal(0.0, 0.1, 5)
+        control_inf = rng.normal(0.0, 0.1, 10)
+        inf_info = {
+            "treated_idx": treated_idx,
+            "control_idx": control_idx,
+            "treated_inf": treated_inf,
+            "control_inf": control_inf,
+        }
+
+        # Helper output (function under test)
+        se_helper = _cluster_robust_se_from_per_gt_if(inf_info, rsu)
+        assert se_helper is not None
+        assert np.isfinite(se_helper) and se_helper > 0
+
+        # Direct reconstruction via compute_survey_if_variance must agree
+        # exactly — verifies the helper routes through the shared
+        # G/(G-1) + PSU centering + FPC machinery, not a bespoke formula.
+        psi_per_unit = np.zeros(n_units)
+        np.add.at(psi_per_unit, treated_idx, treated_inf)
+        np.add.at(psi_per_unit, control_idx, control_inf)
+        var_reference = compute_survey_if_variance(psi_per_unit, rsu)
+        se_reference = float(np.sqrt(var_reference))
+
+        assert se_helper == pytest.approx(se_reference, rel=0, abs=0), (
+            f"Per-(g,t) SE helper ({se_helper}) must equal "
+            f"compute_survey_if_variance reconstruction ({se_reference}) "
+            "— any divergence means the helper bypasses the shared "
+            "G/(G-1) finite-sample correction + PSU centering machinery."
+        )
+
+    def test_per_gt_se_propagates_nan_when_cluster_variance_undefined(self):
+        """When clustered design-based variance is undefined (e.g., G=1:
+        a single cluster, so there is no between-PSU variation to estimate
+        from), the per-(g,t) SE must propagate NaN through the full
+        inference surface (se, t_stat, p_value, conf_int) instead of
+        silently falling back to the unit-level SE. Verifies the helper's
+        NaN-propagation contract end-to-end on a fit. Per CI codex R5
+        P1/P2 findings."""
+        # Build a panel where all units belong to a single cluster.
+        # compute_survey_if_variance returns NaN for G<2 designs (lonely
+        # PSU removed or single-cluster) — the per-cell helper must
+        # propagate this NaN rather than retain the unit-level SE.
+        data = _generate_clustered_staggered_data(n_clusters=2, units_per_cluster=10, seed=109)
+        # Force ALL units into a single cluster (G=1)
+        data["single_cluster"] = 0
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")  # lonely-PSU warnings are expected
+            cs = CallawaySantAnna(cluster="single_cluster")
+            res = cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+            )
+
+        # At least one (g, t) cell should have NaN inference under the
+        # undefined-variance contract. If ALL cells retain finite SE, the
+        # helper is silently falling back to unit-level on the NaN branch.
+        nan_cells = [gt for gt, eff in res.group_time_effects.items() if not np.isfinite(eff["se"])]
+        assert len(nan_cells) > 0, (
+            "Expected at least one (g, t) cell with NaN SE under G=1 "
+            "(undefined clustered variance), but all cells retained "
+            "finite unit-level SE — the helper's NaN-propagation "
+            "contract is broken (cells silently fall back to unit-level)."
+        )
+
+        # For each NaN-SE cell, the full inference surface must be NaN
+        # (matches the safe_inference contract for non-finite SE).
+        for gt in nan_cells:
+            eff = res.group_time_effects[gt]
+            assert np.isnan(eff["se"]), f"{gt}: se should be NaN"
+            assert np.isnan(eff["t_stat"]), f"{gt}: t_stat should be NaN"
+            assert np.isnan(eff["p_value"]), f"{gt}: p_value should be NaN"
+            ci_lo, ci_hi = eff["conf_int"]
+            assert np.isnan(ci_lo) and np.isnan(ci_hi), (
+                f"{gt}: CI bounds should both be NaN, got ({ci_lo}, {ci_hi})"
+            )
+
+    def test_bare_cluster_bootstrap_propagates_nan_when_g_less_than_2(self):
+        """Bootstrap path NaN propagation: when bare cluster= produces
+        G=1 (single cluster), the PSU-multiplier-weights bootstrap path
+        at bootstrap_utils.py:557-562 returns zero PSU multipliers and
+        the downstream zero-SE guards at :365-377/:472-485 must NaN-out
+        the full bootstrap inference surface (overall_se, per-(g,t),
+        aggregate). Per CI codex R7 P3 finding."""
+        data = _generate_clustered_staggered_data(n_clusters=2, units_per_cluster=10, seed=113)
+        data["single_cluster"] = 0  # Force G=1
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")  # lonely-PSU + low-n_bootstrap warnings expected
+            cs = CallawaySantAnna(cluster="single_cluster", n_bootstrap=99, seed=113)
+            res = cs.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                aggregate="event_study",
+            )
+
+        # Overall bootstrap inference must be NaN-consistent
+        assert not np.isfinite(res.overall_se), (
+            f"Bootstrap overall_se should be NaN under G=1 cluster, got {res.overall_se}."
+        )
+        assert np.isnan(res.overall_t_stat)
+        assert np.isnan(res.overall_p_value)
+        assert np.isnan(res.overall_conf_int[0]) and np.isnan(res.overall_conf_int[1])
+
+        # At least one (g, t) cell must have NaN inference (undefined
+        # clustered variance propagating through either the bootstrap or
+        # analytical layer)
+        nan_gt_cells = [
+            gt for gt, eff in res.group_time_effects.items() if not np.isfinite(eff["se"])
+        ]
+        assert len(nan_gt_cells) > 0, (
+            "Expected at least one (g, t) cell with NaN SE under "
+            "G=1 cluster + bootstrap — undefined clustered variance "
+            "must propagate through the bootstrap inference surface."
+        )
+        for gt in nan_gt_cells:
+            eff = res.group_time_effects[gt]
+            assert np.isnan(eff["se"])
+            assert np.isnan(eff["t_stat"])
+            assert np.isnan(eff["p_value"])
+            assert np.isnan(eff["conf_int"][0]) and np.isnan(eff["conf_int"][1])
+
+        # Requested aggregate (event-study) must also be NaN-consistent
+        # for any aggregated horizon whose underlying cells are NaN
+        if res.event_study_effects:
+            for h, ev in res.event_study_effects.items():
+                if not np.isfinite(ev["se"]):
+                    assert np.isnan(ev["t_stat"])
+                    assert np.isnan(ev["p_value"])
+                    assert np.isnan(ev["conf_int"][0]) and np.isnan(ev["conf_int"][1])
+
+    def test_grouped_aggregate_se_changes_with_cluster(self):
+        """The ``aggregate="group"`` aggregation path
+        (``_aggregate_by_group`` at ``staggered_aggregation.py:782-860``)
+        has its own SE computation independent of overall + event-study.
+        Asserts grouped SEs differ between cluster=None and cluster="state"
+        on a panel with intra-cluster correlation, AND that bare cluster=
+        "state" matches explicit SurveyDesign(psu="state") on the grouped
+        surface. Per CI codex R8 P3 finding."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(seed=117)
+
+        cs_unit = CallawaySantAnna()
+        res_unit = cs_unit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="group",
+        )
+
+        cs_cluster = CallawaySantAnna(cluster="state")
+        res_cluster = cs_cluster.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="group",
+        )
+
+        cs_explicit = CallawaySantAnna()
+        res_explicit = cs_explicit.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="group",
+            survey_design=SurveyDesign(psu="state"),
+        )
+
+        assert res_unit.group_effects is not None
+        assert res_cluster.group_effects is not None
+        assert res_explicit.group_effects is not None
+
+        # Grouped SEs must differ under cluster vs unit-level (at least
+        # one group)
+        common_groups = set(res_unit.group_effects.keys()) & set(
+            res_cluster.group_effects.keys()
+        )
+        assert common_groups, "expected overlapping groups"
+
+        diffs = []
+        for g in common_groups:
+            se_unit = res_unit.group_effects[g]["se"]
+            se_cluster = res_cluster.group_effects[g]["se"]
+            if np.isfinite(se_unit) and np.isfinite(se_cluster):
+                diffs.append(abs(se_unit - se_cluster))
+        max_diff = max(diffs) if diffs else 0.0
+        assert max_diff > 1e-6, (
+            f"Grouped SEs did not change with cluster= (max diff: "
+            f"{max_diff:.6g}). aggregate='group' may not be routing "
+            "through the cluster-aware IF aggregation."
+        )
+
+        # Bare cluster vs explicit SurveyDesign must agree on grouped surface
+        common = set(res_cluster.group_effects.keys()) & set(
+            res_explicit.group_effects.keys()
+        )
+        for g in common:
+            se_bare = res_cluster.group_effects[g]["se"]
+            se_explicit = res_explicit.group_effects[g]["se"]
+            if np.isfinite(se_bare) and np.isfinite(se_explicit):
+                assert se_bare == pytest.approx(
+                    se_explicit, rel=1e-10, abs=1e-12
+                ), (
+                    f"Grouped SE divergence at g={g}: bare cluster=state "
+                    f"({se_bare}) vs explicit SurveyDesign(psu=state) "
+                    f"({se_explicit})."
+                )
+
+    def test_survey_design_psu_wins_under_bootstrap(self):
+        """Bootstrap path: when survey_design=SurveyDesign(psu=Y) is
+        explicit AND cluster=X is also set with a different partition,
+        the explicit PSU partition wins for the bootstrap draws (just
+        like for the analytical sandwich). UserWarning fires for the
+        partition mismatch; bootstrap SE matches the explicit-PSU-only
+        fit, not the bare-cluster fit. Per CI codex R1 P3 finding."""
+        from diff_diff import SurveyDesign
+
+        data = _generate_clustered_staggered_data(n_clusters=20, units_per_cluster=5, seed=89)
+        data["region"] = data["state"] // 10  # 2 regions of 10 states
+
+        # Reference: explicit region PSU only (no cluster= confound)
+        cs_ref = CallawaySantAnna(n_bootstrap=99, seed=89)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res_ref = cs_ref.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=SurveyDesign(psu="region"),
+            )
+
+        # Conflict: explicit region PSU + bare cluster=state (different partition)
+        cs_conflict = CallawaySantAnna(cluster="state", n_bootstrap=99, seed=89)
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            res_conflict = cs_conflict.fit(
+                data,
+                outcome="outcome",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=SurveyDesign(psu="region"),
+            )
+
+        partition_warnings = [
+            w
+            for w in caught
+            if "psu" in str(w.message).lower()
+            or "partition" in str(w.message).lower()
+            or "different groupings" in str(w.message).lower()
+        ]
+        assert len(partition_warnings) > 0, (
+            "Conflict case (explicit PSU + bare cluster with different "
+            "partition) must emit UserWarning."
+        )
+        # PSU wins under bootstrap too — SE must match the reference
+        # (explicit-PSU-only) fit at the same seed
+        assert res_conflict.overall_se == pytest.approx(res_ref.overall_se, rel=0, abs=0), (
+            f"Bootstrap precedence: with seed={cs_conflict.seed}, conflict "
+            f"fit SE ({res_conflict.overall_se}) must match explicit-PSU-only "
+            f"reference SE ({res_ref.overall_se}) — both bootstraps must "
+            "draw at the same effective PSU level."
+        )
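The G=1 test above expects NaN, not zero, bootstrap inference. The following is a minimal sketch of why, assuming a mean-zero per-unit influence function `psi` and Rademacher multipliers. A PSU-level multiplier bootstrap draws one weight per cluster, so with a single cluster there is no between-cluster variation left to resample. `cluster_multiplier_bootstrap_se` is a hypothetical helper, not diff_diff's implementation; the real path lives in `bootstrap_utils.py` and may use other weight distributions and scaling.

```python
import numpy as np


def cluster_multiplier_bootstrap_se(psi, clusters, n_bootstrap=999, seed=0):
    """Hypothetical sketch of a PSU-level multiplier bootstrap SE.

    psi:      per-unit influence-function values, assumed mean-zero and scaled
              so that (estimate - truth) is approximately psi.mean()
    clusters: per-unit cluster (PSU) labels, same length as psi
    """
    psi = np.asarray(psi, dtype=float)
    labels, inverse = np.unique(np.asarray(clusters), return_inverse=True)
    n_clusters = labels.size
    if n_clusters < 2:
        # With a single cluster, the mean-zero IF sums to ~0 inside it, so every
        # draw is ~0 and the clustered variance is undefined: report NaN rather
        # than a spurious zero SE.
        return float("nan")
    # Collapse the influence function to one score per cluster.
    cluster_scores = np.bincount(inverse, weights=psi, minlength=n_clusters)
    rng = np.random.default_rng(seed)
    # One Rademacher weight per cluster per draw: all units in a PSU share it.
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_clusters))
    draws = weights @ cluster_scores / psi.size
    return float(np.std(draws, ddof=1))
```

As `n_bootstrap` grows, this SE converges to the analytical clustered SE `sqrt(sum_g S_g**2) / n`, where `S_g` is the summed influence function in cluster `g`. Rademacher weights are one common choice; Mammen or Webb weights are alternatives.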
diff --git a/tests/test_triple_diff.py b/tests/test_triple_diff.py
index 9a56d26d..18c6f0c8 100644
--- a/tests/test_triple_diff.py
+++ b/tests/test_triple_diff.py
@@ -73,15 +73,17 @@ def generate_ddd_data(
                     # Add noise
                     y += rng.normal(0, noise_sd)
 
-                    rows.append({
-                        "outcome": y,
-                        "group": g,
-                        "partition": p,
-                        "time": t,
-                        "x1": x1,
-                        "x2": x2,
-                        "unit_id": len(rows),
-                    })
+                    rows.append(
+                        {
+                            "outcome": y,
+                            "group": g,
+                            "partition": p,
+                            "time": t,
+                            "x1": x1,
+                            "x2": x2,
+                            "unit_id": len(rows),
+                        }
+                    )
 
     return pd.DataFrame(rows)
 
@@ -463,9 +465,11 @@ def test_missing_cell(self, simple_ddd_data):
         """Test error when a cell has no observations."""
         # Remove all observations from one cell
         data = simple_ddd_data[
-            ~((simple_ddd_data["group"] == 1) &
-              (simple_ddd_data["partition"] == 1) &
-              (simple_ddd_data["time"] == 0))
+            ~(
+                (simple_ddd_data["group"] == 1)
+                & (simple_ddd_data["partition"] == 1)
+                & (simple_ddd_data["time"] == 0)
+            )
         ]
 
         ddd = TripleDifference()
@@ -880,12 +884,14 @@ def ddd_data_with_covariates(self):
         """Create DDD data with covariates for testing."""
         np.random.seed(42)
         n = 400
-        data = pd.DataFrame({
-            "group": np.repeat([0, 1], n // 2),
-            "partition": np.tile(np.repeat([0, 1], n // 4), 2),
-            "time": np.tile([0, 1], n // 2),
-            "x1": np.random.randn(n),
-        })
+        data = pd.DataFrame(
+            {
+                "group": np.repeat([0, 1], n // 2),
+                "partition": np.tile(np.repeat([0, 1], n // 4), 2),
+                "time": np.tile([0, 1], n // 2),
+                "x1": np.random.randn(n),
+            }
+        )
 
         # Generate outcome with effect
         data["outcome"] = (
@@ -907,7 +913,7 @@ def test_rank_deficient_action_error_raises(self, ddd_data_with_covariates):
 
         ddd = TripleDifference(
             estimation_method="reg",  # Use regression method to test OLS path
-            rank_deficient_action="error"
+            rank_deficient_action="error",
         )
         with pytest.raises(ValueError, match="[Rr]ank-deficient"):
             ddd.fit(
@@ -916,7 +922,7 @@ def test_rank_deficient_action_error_raises(self, ddd_data_with_covariates):
                 group="group",
                 partition="partition",
                 time="time",
-                covariates=["x1", "x1_dup"]
+                covariates=["x1", "x1_dup"],
             )
 
     def test_rank_deficient_action_silent_no_warning(self, ddd_data_with_covariates):
@@ -928,7 +934,7 @@ def test_rank_deficient_action_silent_no_warning(self, ddd_data_with_covariates)
 
         ddd = TripleDifference(
             estimation_method="reg",  # Use regression method to test OLS path
-            rank_deficient_action="silent"
+            rank_deficient_action="silent",
         )
 
         with warnings.catch_warnings(record=True) as w:
@@ -939,12 +945,15 @@ def test_rank_deficient_action_silent_no_warning(self, ddd_data_with_covariates)
                 group="group",
                 partition="partition",
                 time="time",
-                covariates=["x1", "x1_dup"]
+                covariates=["x1", "x1_dup"],
             )
 
             # No warnings about rank deficiency should be emitted
-            rank_warnings = [x for x in w if "Rank-deficient" in str(x.message)
-                           or "rank-deficient" in str(x.message).lower()]
+            rank_warnings = [
+                x
+                for x in w
+                if "rank-deficient" in str(x.message).lower()
+            ]
             assert len(rank_warnings) == 0, f"Expected no rank warnings, got {rank_warnings}"
 
         # Should still get valid results
@@ -968,7 +977,7 @@ def test_convenience_function_passes_rank_deficient_action(self, ddd_data_with_c
                 time="time",
                 estimation_method="reg",
                 covariates=["x1", "x1_dup"],
-                rank_deficient_action="error"
+                rank_deficient_action="error",
             )
 
 
@@ -993,18 +1002,16 @@ def test_tstat_nan_when_se_zero(self):
         t_stat = results.t_stat
 
         if not np.isfinite(se) or se == 0:
-            assert np.isnan(t_stat), (
-                f"t_stat should be NaN when SE={se}, got {t_stat}"
-            )
+            assert np.isnan(t_stat), f"t_stat should be NaN when SE={se}, got {t_stat}"
             ci = results.conf_int
-            assert np.isnan(ci[0]) and np.isnan(ci[1]), (
-                f"conf_int should be (NaN, NaN) when SE={se}, got {ci}"
-            )
+            assert np.isnan(ci[0]) and np.isnan(
+                ci[1]
+            ), f"conf_int should be (NaN, NaN) when SE={se}, got {ci}"
         else:
             expected = results.att / se
-            assert np.isclose(t_stat, expected), (
-                f"t_stat should be ATT/SE, expected {expected}, got {t_stat}"
-            )
+            assert np.isclose(
+                t_stat, expected
+            ), f"t_stat should be ATT/SE, expected {expected}, got {t_stat}"
 
     def test_tstat_consistency_all_methods(self):
         """t_stat follows NaN pattern across all estimation methods."""
@@ -1031,12 +1038,104 @@ def test_tstat_consistency_all_methods(self):
             t_stat = results.t_stat
 
             if not np.isfinite(se) or se == 0:
-                assert np.isnan(t_stat), (
-                    f"[{method}] t_stat should be NaN when SE={se}, got {t_stat}"
-                )
+                assert np.isnan(
+                    t_stat
+                ), f"[{method}] t_stat should be NaN when SE={se}, got {t_stat}"
             else:
                 expected = results.att / se
                 assert np.isclose(t_stat, expected), (
-                    f"[{method}] t_stat should be ATT/SE, "
-                    f"expected {expected}, got {t_stat}"
+                    f"[{method}] t_stat should be ATT/SE, expected {expected}, got {t_stat}"
                 )
+
+
+def _generate_ddd_data_with_state_clusters(
+    n_states: int = 25,
+    units_per_state: int = 8,
+    state_effect_sd: float = 3.0,
+    true_att: float = 2.0,
+    seed: int = 53,
+) -> pd.DataFrame:
+    """Generate DDD data with state-level random effects.
+
+    Used by the defensive cluster-changes-SE test below. Per
+    ``feedback_homogeneous_dgp_no_twfe_bias``, assertive cluster-vs-no-cluster
+    SE tests need a panel with intra-cluster correlation; without state
+    random effects, cluster-robust SE is roughly equal to per-unit SE.
+    """
+    rng = np.random.default_rng(seed)
+    state_effects = rng.normal(0.0, state_effect_sd, n_states)
+    rows = []
+    next_unit = 0
+    for s in range(n_states):
+        for _ in range(units_per_state):
+            for g in [0, 1]:
+                for p in [0, 1]:
+                    for t in [0, 1]:
+                        y = (
+                            state_effects[s]
+                            + 10.0
+                            + 2 * g
+                            + 1 * p
+                            + 0.5 * t
+                            + 0.3 * g * p
+                            + 0.2 * g * t
+                            + 0.1 * p * t
+                            + (true_att if (g == 1 and p == 1 and t == 1) else 0.0)
+                            + rng.normal(0.0, 0.5)
+                        )
+                        rows.append(
+                            {
+                                "outcome": y,
+                                "group": g,
+                                "partition": p,
+                                "time": t,
+                                "state": s,
+                                "unit_id": next_unit,
+                            }
+                        )
+                        next_unit += 1
+    return pd.DataFrame(rows)
+
+
+class TestTripleDifferenceClusterDefensive:
+    """Defensive: TripleDifference cluster= produces SE differing from
+    cluster=None on a panel with intra-cluster correlation.
+
+    Added because the audit found that TripleDifference's bare-cluster
+    code path (``triple_diff.py:1245-1259``) is correct but had no
+    positive regression test (only an error-handling test for missing
+    cluster columns). Without this assertive test, a future refactor
+    could silently regress the cluster wiring to a no-op (the same class
+    of bug just fixed in CallawaySantAnna). Mirrors
+    ``tests/test_two_stage.py::test_cluster_changes_ses``.
+    """
+
+    def test_cluster_changes_ses(self):
+        data = _generate_ddd_data_with_state_clusters(seed=53)
+
+        td_unit = TripleDifference()  # cluster=None default
+        res_unit = td_unit.fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+        )
+
+        td_cluster = TripleDifference(cluster="state")
+        res_cluster = td_cluster.fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="time",
+        )
+
+        assert np.isfinite(res_unit.se) and res_unit.se > 0
+        assert np.isfinite(res_cluster.se) and res_cluster.se > 0
+        assert abs(res_unit.se - res_cluster.se) > 1e-6, (
+            f"TripleDifference cluster='state' SE ({res_cluster.se:.6f}) "
+            f"is effectively identical to cluster=None SE "
+            f"({res_unit.se:.6f}) — the cluster= parameter may "
+            "have regressed to a silent no-op."
+        )
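The DGP docstring above argues that cluster-vs-no-cluster SE tests need intra-cluster correlation. A small simulation illustrates the point under simplifying assumptions: it estimates the SE of a plain sample mean rather than a DDD ATT, and compares an iid formula against a cluster-robust formula that applies only a CR1-style `G/(G-1)` factor. All function names are hypothetical; the parameter defaults merely mirror `_generate_ddd_data_with_state_clusters`.

```python
import numpy as np


def simulate_clustered_outcomes(
    n_clusters=25, units_per_cluster=8, cluster_sd=3.0, noise_sd=0.5, seed=53
):
    """Outcomes = constant + shared cluster shock + idiosyncratic noise."""
    rng = np.random.default_rng(seed)
    cluster_effects = rng.normal(0.0, cluster_sd, n_clusters)
    clusters = np.repeat(np.arange(n_clusters), units_per_cluster)
    y = 10.0 + cluster_effects[clusters] + rng.normal(0.0, noise_sd, clusters.size)
    return y, clusters


def iid_se_of_mean(y):
    # Treats every observation as independent.
    return float(np.std(y, ddof=1) / np.sqrt(y.size))


def cluster_robust_se_of_mean(y, clusters):
    labels, inverse = np.unique(clusters, return_inverse=True)
    g = labels.size
    if g < 2:
        return float("nan")  # clustered variance is undefined with one cluster
    # Sum residuals within each cluster, then use the between-cluster spread.
    scores = np.bincount(inverse, weights=y - y.mean(), minlength=g)
    return float(np.sqrt(g / (g - 1) * np.sum(scores**2)) / y.size)
```

With the defaults (8 units per cluster, most variance at the cluster level), the cluster-robust SE should come out near sqrt(8) ≈ 2.8 times the iid SE, the design effect under near-perfect intra-cluster correlation. With `cluster_sd=0.0` the ratio should hover around 1, which is why a DGP without cluster shocks cannot detect a silent `cluster=` no-op.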