diff --git a/CHANGELOG.md b/CHANGELOG.md
index ff5e4c7e..bf72670a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+### Fixed
+- **`CallawaySantAnna.cluster=` silent no-op (Phase 1b interstitial).** `CallawaySantAnna(cluster="state").fit(...)` previously accepted the argument, stored it, returned it from `get_params()`, but never consumed it anywhere in the fit / aggregator / bootstrap pipeline (`staggered.py:154-156` docstring claimed "Defaults to unit-level clustering" — but for bare `cluster=X`, the aggregator at `staggered_aggregation.py:193-213` computed per-unit IF variance regardless, and the bootstrap at `staggered_bootstrap.py:323-347` drew per-unit multiplier weights regardless). Users who explicitly set `cluster="state"` got per-unit inference with no warning — typically SE too small under intra-cluster correlation. **Survey-PSU clustering via `survey_design=SurveyDesign(psu="state")` was NOT affected** and continued to cluster correctly via `_compute_stratified_psu_meat`. The fix synthesizes a minimal `SurveyDesign(psu=self.cluster, weight_type="pweight")` when bare `cluster=` is set without an explicit survey design, threading the synthesized PSU through the existing survey-PSU machinery (aggregator + bootstrap). A new dedicated `df_inference` field on `CallawaySantAnnaResults` carries the cluster-level df for the bare-cluster-synthesize path ONLY (where `survey_metadata` is intentionally `None` to preserve the `DiagnosticReport.survey_metadata is not None` skip at `diagnostic_report.py:848-856` + `:1150-1158` for "Original fit used a survey design" reasoning, and the `summary()` survey block render at `staggered_results.py:235-238`).
`HonestDiD` at `honest_did.py` prefers `survey_metadata.df_survey` first (the actual CS-internal df, which may be tightened post-resolve for replicate designs) and falls back to `df_inference` for bare-cluster fits — so downstream consumers always see the cluster df without overriding the post-recompute survey df. When `survey_design=SurveyDesign(weights=Y)` without PSU is provided AND `cluster=X` is also set, `_inject_cluster_as_psu` injects the bare cluster as the effective PSU AND an `effective_survey_design = replace(survey_design, psu=self.cluster)` is constructed so the downstream `_validate_unit_constant_survey` catches movers (units crossing clusters across periods) on panel data via the now-PSU-bearing design; `survey_metadata` is recomputed to reflect the injected PSU. When both `cluster=X` AND `survey_design.psu=Y` are set, the explicit PSU wins via `_resolve_effective_cluster` (emits `UserWarning` if partitions differ). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`**: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely (`survey.py:104-109` enforces replicate_weights are mutually exclusive with strata/psu/fpc); honoring bare `cluster=` would silently have no effect while populating `cluster_name`/`n_clusters` on Results dishonestly. 
Assertive regression tests pin the fix on both panel and repeated-cross-section paths plus the survey/non-survey contract boundaries: `test_cluster_robust_ses_differ_from_unit_level`, `test_bare_cluster_works_with_panel_false_rcs`, `test_bare_cluster_synthesizes_survey_design`, `test_inject_branch_panel_mover_raises`, `test_replicate_weight_plus_cluster_rejected`, `test_bare_cluster_populates_df_inference` (asserts the dedicated cluster-df carrier is set), `test_bare_cluster_does_not_set_survey_metadata` (asserts the survey/non-survey contract is preserved — DiagnosticReport / summary() must not treat a bare-cluster fit as survey-backed), `test_explicit_survey_design_does_populate_survey_metadata` (asserts the inject-branch path still populates survey_metadata for legitimate user-provided SurveyDesign), and `test_bare_cluster_honest_did_uses_df_inference` (end-to-end: HonestDiD threads df_inference into HonestDiDResults.df_survey, preventing silent normal-theory regression on a future refactor). When `cluster=None` (default), behavior is bit-equal to pre-PR (wiring guarded by `if self.cluster is not None:`). Audit verified the no-op was CS-specific — the other 7 Phase 1b estimators (SunAbraham, StackedDiD, WooldridgeDiD, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD) handle bare `cluster=` correctly.
+
 ### Added
+- **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021) — per-(g,t) doubly-robust / IPW / outcome-regression structure — and has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on.
The narrow contract is permanent and applies to other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF activated via the cluster= wiring fix above. Documentation in `docs/methodology/REGISTRY.md` "IF-based variance estimators vs analytical-sandwich estimators" subsection. `vcov_type`, `cluster_name`, `n_clusters`, `df_inference` added to `CallawaySantAnnaResults` (the canonical PSU column wins for `cluster_name` reporting — `survey_design.psu` when explicit PSU is provided, `self.cluster` when bare cluster synthesizes/injects). `set_params(vcov_type=...)` mirrors SA pattern (mutate-then-refresh `_vcov_type_explicit`, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR** rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full {classical, hc1, hc2, hc2_bm} threading) resumes on a different estimator post-merge.
+- **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses` asserting that `TripleDifference(cluster="state")` produces SE differing from `cluster=None` SE by `>1e-6` on a fixed-seed panel with state-level random effects. Defensive coverage closes a test gap identified during the Phase 1b cluster-wiring audit; TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.
- **TwoStageDiD: parity with SpilloverDiD Wave E.3 — always-treated unit drop preserves full-domain survey design via zero-padded scores.** Closes the parity follow-up tracked at `TODO.md` after PR #482 (SpilloverDiD Wave E.3, merge `24de9062`). When TwoStageDiD detects always-treated units (`first_treat <= min_time`) and removes them from the OLS sample, the resolved survey design retains its FULL-DOMAIN `n_psu` / `n_strata` / `df_survey` / `strata` / `fpc` / `psu` arrays instead of being subsetted via `replace(resolved_survey, ...)`. Per-cluster stage-1 / stage-2 score aggregates are computed at the post-drop fit-sample length and then zero-padded onto the full-domain unique-PSU list before stratified-meat dispatch via two new optional kwargs on `_compute_gmm_variance`: `score_pad_mask` (full-domain boolean keep mask) and `cluster_ids_full` (full-domain post-injection PSU labels). PSUs containing only always-treated rows get zero score rows but still count toward `G_full` for `n_psu` / `df_survey` accounting. **Documented synthesis (library-convention adoption, NOT new methodology):** adopts the canonical "zero-pad scores + retain full-design resolved survey" convention from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation), `diff_diff/prep.py:1401-1432` (DCDH cell variance), and `diff_diff/spillover.py` (PR #482 Wave E.3). 
**Mechanical realization:** `two_stage.py:1485-1525` design-subset block deleted (the `replace(resolved_survey, ...)` subset + `n_psu` / `n_strata` recompute + post-drop `compute_survey_metadata` call); `keep_mask` promoted to `fit()`-level scope (always defined, all-True when no always-treated drop); `survey_weights = survey_weights[keep_mask.values]` retained for stage-1 / stage-2 OLS arithmetic; cluster injection block updated to source `cluster_ids_raw` from FULL-DOMAIN `data[cluster_var].values` (not post-drop `df[cluster_var].values`) so `_inject_cluster_as_psu`'s zip against `resolved_survey.strata` (full-domain) stays length-aligned; `df["_survey_cluster"]` aligned to post-drop length via `resolved_survey.psu[keep_mask.values]`; post-injection `compute_survey_metadata` uses full-domain `raw_w` from `data[survey_design.weights]`. `_compute_gmm_variance` adds the zero-pad expansion after the per-cluster aggregation (mapping fit-sample `unique_clusters` into `unique_clusters_full` positions via `np.searchsorted`) and updates the strata/fpc `obs_idx` lookups to use `cluster_ids_for_lookup = cluster_ids_full` when padding is active. The three inner stage-2 methods (`_stage2_static`, `_stage2_event_study`, `_stage2_group`) thread the new kwargs through; bootstrap-resample call sites keep default `None` (no behavior change on bootstrap path). **Always-treated warning text updated:** "Associated survey weights subsetted for stage-1 / stage-2 OLS; full-domain survey design retained for variance estimation (Wave E.3 parity)." replaces the prior "and design arrays adjusted" claim. **No-survey path unchanged:** when `resolved_survey is None`, both `score_pad_mask` and `cluster_ids_full` default to `None` and the existing post-drop scoring path runs bit-identically. **Replicate variance + always-treated drop:** existing path unchanged (replicate refit handles resampling at the survey-design level; `score_pad_mask_arg` is `None` on `_uses_replicate_ts` paths). 
**Tests:** new `TestTwoStageDiDWaveE3ParityAlwaysTreated` class in `tests/test_two_stage.py` (8 tests: no-always-treated baseline, full-domain `df_survey` preservation under drop, full-domain `n_psu` reporting, per-cluster zero-pad mock-spy on `_compute_stratified_meat_from_psu_scores`, subpopulation + always-treated composition, cluster-as-PSU + always-treated, no-survey path unchanged, PSU entirely-always-treated). REGISTRY.md TwoStageDiD section gains a "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 section updated to mark the TwoStageDiD parity follow-up as shipped.
 - **WooldridgeDiD `vcov_type` parameter, OLS path (Phase 1b PR 3/8).** `WooldridgeDiD(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` on `method="ols"` (defaults to `"hc1"`, preserves prior behavior at machine precision — the WLS-CR1 sandwich is algebraically invariant between the prior within-transform path and the new branched path, differing only by float64 multiplication ordering at sub-ULP scale; the full 106-test `tests/test_wooldridge.py` baseline still passes unchanged). `hc2_bm` auto-routes to a full-dummy saturated design (`[intercept, X_design, unit_dummies, time_dummies]`) + clubSandwich WLS-CR2 algebra (PR #475) — matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at `atol=1e-10` on the new `benchmarks/data/wooldridge_golden.json` fixture. `classical`/`hc2` supported via full-dummy + auto-drop of the unit auto-cluster (one-way families); explicit `cluster="X"` + one-way family raises at the linalg validator. Per-cell + aggregate p-values/CIs on `classical`/`hc2` paths use the residual DOF `n - rank(X)` (matches R `lm()` / `coef_test()` t-distribution), not normal-theory.
**Bell-McCaffrey Satterthwaite DOF is threaded across ALL hc2_bm user-facing inference surfaces**: (1) per-cell `group_time_effects[(g, t)]` use `coef_test()$df_Satt` (matches R at atol=1e-6 from CI inversion); (2) overall ATT uses the post-period-aggregation contrast DOF from `_compute_cr2_bm_contrast_dof` (matches R `Wald_test(test="HTZ")$df_denom` at atol=1e-10); (3) `.aggregate("group" | "calendar" | "event")` recomputes contrast-specific BM DOFs lazily from BM artifacts stored on the Results object — the REDUCED kept-column design (`X_red`), cluster_ids, reduced bread matrix, and reduced-space coef-index map (using the reduced kept-column design after rank-deficient drops keeps the bread non-singular and matches the subspace `solve_ols` actually estimated in). Fail-closed (all-NaN inference) when BM DOF unavailable, mirrors PR #475 R7 and PR #479 R3. `method ∈ {"logit","poisson"}` + `vcov_type != "hc1"` raises `NotImplementedError` at `__init__` (GLM CR2-BM-on-pseudo-residuals composition needs derivation; deferred to follow-up TODO row). `SurveyDesign` + `vcov_type != "hc1"` raises `NotImplementedError` at `fit()` (survey TSL overrides analytical sandwich). `n_bootstrap > 0` + one-way (`hc2`/`classical`) raises at `fit()` regardless of `cluster=` setting (multiplier bootstrap is intrinsically clustered, but one-way vcov_type does not compose with cluster_ids — either the auto-cluster is dropped when `cluster=None` leaving the bootstrap with no cluster to draw at, or the linalg validator rejects one-way + cluster_ids when `cluster=X`). `conley` rejected at `__init__` with a deferral pointer. `vcov_type`, `cluster_name`, `n_clusters` added to `WooldridgeDiDResults` for downstream introspection (per `feedback_results_vcov_label_cluster_metadata`). Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs to follow: CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD). 
- **`SpilloverDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention via zero-pad scores (Wave E.3).** Closes the Wave E.1/E.2/follow-up documented limitation at `REGISTRY.md:3249`: `SurveyDesign.subpopulation()`-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design — `n_psu` / `n_strata` / `df_survey` / Binder TSL per-stratum centering reflect the FULL domain rather than the post-`finite_mask` fit sample. **Documented synthesis (library-convention adoption, NOT new methodology):** Wave E.3 adopts the canonical "zero-pad scores to full panel + retain full-design resolved survey" pattern from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation lead regression — Omega_0 scores zero-padded back to full panel length) and `diff_diff/prep.py:1401-1432` (DCDH cell variance — IF zero-padded outside the cell). Wave E.3 propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat. **Mechanical realization (one new `_compute_gmm_corrected_meat` kwarg):** the gamma_hat / Psi build stays on SURVEY-FINITE-MASK inputs (`X_1_sparse_fit`, `X_10_sparse_fit`, `eps_10_fit` built on `survey_finite_mask = finite_mask & (survey_weights > 0)`; `X_2_kept_gamma`, `eps_2_fit_gamma`, `survey_weights_fit_gamma` projected from the fit-sample frame down to survey_finite_mask) so the drop-first stage-1 FE column space is bit-identical to the pre-E.3 path. `_compute_gmm_corrected_meat` gains a new optional kwarg `score_pad_mask: Optional[np.ndarray] = None`: when supplied, the helper zero-pads the fit-sample `Psi` to full panel length AFTER construction but BEFORE kernel dispatch via `Psi_padded[score_pad_mask] = Psi`.
Kernel-dispatch arrays (`cluster_ids`, `conley_coords`, `conley_time`, `conley_unit`, `resolved_survey`) are passed at FULL length so the meat helpers (Binder TSL / stratified-Conley / serial Bartlett) see the full-domain PSU / strata / centroid / time geometry. The `_validate_conley_kwargs` call inside the helper reads `n_for_conley = len(score_pad_mask)` when the kwarg is set so the Conley shape checks see the full-length geometry. **`gamma_hat` invariance:** the gamma_hat solve operates on fit-sample inputs throughout — bit-identical to the pre-E.3 path (critical for the case where `_build_butts_fe_design_csr`'s `pd.factorize` re-compaction would drop a different unit's column under a full-length FE build than under a fit-length one). **Bread invariance:** `A_22 = X_2_kept' W X_2_kept` at `spillover.py:3187-3214` still uses fit-length `X_2_kept` because `A_22_full = X_2_full' W_full X_2_full` equals `A_22_kept` when zero-weight rows contribute zero. **A2 invariant:** warn-and-drop and `SurveyDesign.subpopulation()` drops are treated identically — both apply the zero-pad mechanism. The "both mechanisms compose cleanly" case (subpop-excluded row that is ALSO warn-and-dropped) produces `Psi = 0` from either cause; the PSU still counts toward `n_psu_full`. Hand-computation methodology anchor at `_scratch/wave_e3_smoke.py` codifies the A2 invariant on 4 PSU × 4 period × 3 obs synthetic. **Subpopulation parity vs upstream-subset:** `df_survey` matches the full domain regardless of how many rows the subpopulation mask excludes (mirrors R `svyglm(design=subset(d, mask))` vs `svyglm(design=svydesign(data=data[mask], ...))`). SE may differ by design — subpopulation retains zero-padded PSU geometry; upstream-subset drops PSUs entirely. 
**Pre-E.3 baseline parity:** when `finite_mask.all() == True` AND all weights `> 0`, the Wave E.3 zero-pad is a no-op — ATT + SE + n_psu + df_survey match pre-E.3 baseline values via FIXED GOLDEN values at `test_c` (`rtol=1e-12, atol=1e-12`). **Cross-surface n_psu consistency:** top-level `res.n_psu` reads from `len(resolved_survey_fit.weights)` on the implicit-PSU branch (was `int(finite_mask.sum())` pre-codex-R1-P2-fix); this keeps `res.n_psu == res.survey_metadata.n_psu` on weights-only / strata-only survey designs under warn-and-drop. Regression at `test_c2`. **Restrictions inherited:** replicate-weight variance + subpopulation continues to raise `NotImplementedError` at the Wave E.1 gate. TwoStageDiD's analogous `finite_mask + design-subset` pattern at `two_stage.py:567-601` is NOT yet adopted to Wave E.3 — separate parity follow-up tracked in `TODO.md` (an expected-divergence test was attempted but TwoStageDiD's always-treated handling at `two_stage.py:294-336` differs from SpilloverDiD's per-unit Omega_0 check, so the divergence didn't materialize on the standard fixture; the parity follow-up should add its own targeted regression). **Implementation:** `spillover.py:2845-2896` design-subset block deleted; `survey_weights_fit = survey_weights[finite_mask]` retained for the stage-2 OLS solve which still operates on the fit sample; `cluster_ids_full[finite_mask]` subset dropped on the survey path. `_compute_gmm_corrected_meat` call at `spillover.py:3163` now receives FIT-LENGTH gamma_hat-construction inputs (unchanged) plus FULL-LENGTH kernel-dispatch arrays (`cluster_ids_for_meat`, `conley_*_for_meat`, `resolved_survey_fit`) plus the new `score_pad_mask=survey_finite_mask` kwarg; no-survey path passes `score_pad_mask=None` and uses fit-length variables throughout (bit-identical to pre-E.3). 
`_compute_gmm_corrected_meat` at `two_stage.py:62-80` adds one new optional kwarg `score_pad_mask: Optional[np.ndarray] = None` and one post-Psi-construction zero-pad block; the `_validate_conley_kwargs` call uses `n_for_conley = len(score_pad_mask)` when the kwarg is set. Within-unit-constancy validator at `spillover.py:2913` updated to operate on full-length unit array. Second `compute_survey_metadata` recompute at `spillover.py:2954-2959` uses full-length `raw_w`. No `_compute_stratified_meat_from_psu_scores` / `_compute_stratified_conley_meat` / `_compute_stratified_serial_bartlett_meat` signature changes. **Tests:** new `TestSpilloverDiDWaveE3SubpopulationFullDesign` and `TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy` classes in `tests/test_spillover.py` (19 tests: pre-E.3 baseline parity via pinned goldens, n_psu cross-surface consistency on implicit-PSU branch, A2 invariant (zero-pad mechanics via mock-spy), subpopulation × explicit-PSU parity, conley + lag>0 + subpopulation × explicit-PSU / cluster-injection / weights-only branches, cluster-as-PSU + subpopulation parity, unit with BOTH zero weight AND no Omega_0 support, gamma_hat-build sample excludes zero-weight rows, n_obs / n_treated / n_control / n_far_away_obs reflect count_mask, warn-drop SE drift golden, ATT bit-equality under PSU-last-sort exclusion, exact event-study n_obs propagation, event-study on both is_staggered branches with analytical + conley+lag variants). Pre-existing Wave E.1 `test_p2_finite_mask_forces_drop_under_survey` assertion flipped from `n_psu=8` (subset) to `n_psu=10` (full domain) to reflect the new contract.
diff --git a/TODO.md b/TODO.md
index 8f221f59..f4a6286c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -99,11 +99,13 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture.
The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD`; Phase 1b PR 3/8 added `WooldridgeDiD` OLS path (this row tracks the remaining 5). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path.
**Interstitial PR (post-PR-3/8) addressed `CallawaySantAnna` separately**: CS uses IF-based variance per Callaway & Sant'Anna (2021) Theorem 2, so its `vcov_type` contract is permanently narrow to `{"hc1"}` (analytical-sandwich families don't compose); the interstitial also fixed CS's bare-`cluster=` silent no-op. This row tracks the remaining 4 (ImputationDiD and EfficientDiD are also IF-based and will likely adopt the same narrow contract). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
 | Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`.
The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
+| Extend `CallawaySantAnna` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for per-unit influence functions (Conley 1999 spatial kernel × per-(g,t) IF aggregation); no reference implementation exists today. Phase 1b interstitial PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/staggered.py` | follow-up | Low |
+| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but having two equivalent paths to express the same intent creates redundant surface. Mirrors a similar question for ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. | `diff_diff/staggered.py` | follow-up | Low |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |