igerber · igerber · May 23, 2026 · May 23, 2026 · May 23, 2026 · May 23, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -99,11 +99,13 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD`; Phase 1b PR 3/8 added `WooldridgeDiD` OLS path (this row tracks the remaining 5). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Interstitial PR (post-PR-3/8) addressed `CallawaySantAnna` separately**: CS uses IF-based variance per Callaway & Sant'Anna (2021) Theorem 2, so its `vcov_type` contract is permanently narrow to `{"hc1"}` (analytical-sandwich families don't compose); the interstitial also fixed CS's bare-`cluster=` silent no-op. This row tracks the remaining 4 (ImputationDiD and EfficientDiD are also IF-based and will likely adopt the same narrow contract). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`. The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
+| Extend `CallawaySantAnna` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for per-unit influence functions (Conley 1999 spatial kernel × per-(g,t) IF aggregation); no reference implementation exists today. Phase 1b interstitial PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/staggered.py` | follow-up | Low |
+| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but having two equivalent paths to express the same intent creates redundant surface. Mirrors a similar question for ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. | `diff_diff/staggered.py` | follow-up | Low |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
 <!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
      form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
@@ -196,7 +198,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- Thread `vcov_type` through the 5 remaining standalone estimators: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS)
+- Thread `vcov_type` through the 4 remaining standalone estimators: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; interstitial post-PR-3/8 narrowed CallawaySantAnna permanently to `{hc1}` per IF-based variance + fixed bare-`cluster=` silent no-op)
 - SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)

diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
@@ -188,14 +188,15 @@ CallawaySantAnna(
     anticipation: int = 0,                       # Anticipation periods
     estimation_method: str = "dr",               # "dr", "ipw", or "reg"
     alpha: float = 0.05,
-    cluster: str | None = None,                  # Defaults to unit-level clustering
+    cluster: str | None = None,                  # Cluster col; activates CR1 on the IF via synthesized SurveyDesign(psu=col). None → per-unit IF.
     n_bootstrap: int = 0,                        # 0 = analytical SEs, 999+ recommended
     bootstrap_weights: str | None = None,        # "rademacher", "mammen", or "webb"
     seed: int | None = None,
     rank_deficient_action: str = "warn",
     base_period: str = "varying",                # "varying" or "universal"
     cband: bool = True,                          # Simultaneous confidence bands
     pscore_trim: float = 0.01,                   # Propensity score trimming bound
+    vcov_type: str = "hc1",                      # {"hc1"} only — IF-based variance per Callaway & Sant'Anna (2021). Analytical-sandwich {classical, hc2, hc2_bm} and conley REJECTED at __init__ (see REGISTRY.md IF-vs-sandwich subsection).
 )
 ```