PR-B: WooldridgeDiD tracker promotion + methodology bundle by igerber · Pull Request #486 · igerber/diff-diff

igerber · 2026-05-22T19:01:11Z

Summary

Closes the WooldridgeDiD (ETWFE) methodology-review-tracker promotion (In Progress → Complete) following the primary-source review for Wooldridge (2025) merged in PR-A (#484, "PR-A: WooldridgeDiD primary-source review (Wooldridge 2025 EmpEcon)")
Implementation: opt-in aggregate(weights="cohort_share") (paper Eqs. 7.4 / 7.6; event path restricted to k≥0); cohort_trends=True (paper Section 8 / Eq. 8.1; OLS path; auto-routes to full-dummy; per-cohort pre-period identification check)
R Goldens: extended benchmarks/R/generate_wooldridge_golden.R with etwfe(family="poisson") + etwfe(family="logit") log-link goldens; etwfe pinned in benchmarks/R/requirements.R
Tests: 60 new methodology tests across 6 paper-equation-numbered classes + TestW2025LibraryDeviations + 2 surface R-parity classes for the nonlinear paths (209 tests total in the file, including the existing 12 vcov_type R-parity tests from PR #483, "wooldridge: thread vcov_type through OLS path (Phase 1b 3/8)")
F.L.I.P. consolidation: METHODOLOGY_REVIEW status flip + REGISTRY ### Deviations block + CHANGELOG [Unreleased] ### Added + TODO row 95 closed + paper-review Status inline notes + API doc updates
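
The difference between the default cell-count weighting and the opt-in cohort-share weighting can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the library's code and not a reproduction of the paper's Eqs. 7.4 / 7.6: it assumes a balanced panel (so each cell's treated-observation count equals its cohort size) and assumes cohort-share weighting means "average within cohort, then weight by N_g / N". The data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cell-level ATT estimates keyed by (cohort g, period t),
# plus the number of treated units in each cohort. All numbers are made up.
cell_att = {(3, 3): 1.0, (3, 4): 2.0, (3, 5): 3.0, (5, 5): 4.0}
cohort_size = {3: 10, 5: 30}


def cell_count_average(cell_att, cohort_size):
    """Weight every (g, t) cell by its treated-observation count.

    On a balanced panel that count equals the cohort size, so cohorts
    with more post-treatment periods contribute more cells.
    """
    total = sum(cohort_size[g] for (g, _t) in cell_att)
    weighted = sum(cohort_size[g] * att for (g, _t), att in cell_att.items())
    return weighted / total


def cohort_share_average(cell_att, cohort_size):
    """Average within each cohort first, then weight by cohort share N_g / N.

    ASSUMPTION: this is one plausible reading of cohort-share weighting,
    used here only for illustration.
    """
    by_cohort = defaultdict(list)
    for (g, _t), att in cell_att.items():
        by_cohort[g].append(att)
    n_treated = sum(cohort_size[g] for g in by_cohort)
    return sum(
        (cohort_size[g] / n_treated) * (sum(atts) / len(atts))
        for g, atts in by_cohort.items()
    )


cell_weighted = cell_count_average(cell_att, cohort_size)
share_weighted = cohort_share_average(cell_att, cohort_size)
```

In this toy setup the two schemes give different overall numbers from the same cell estimates, which is why the weighting choice changes the estimand rather than just the presentation.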

Methodology references

Method: WooldridgeDiD (ETWFE) — Section 7 aggregation paths, Section 8 heterogeneous cohort trends, nonlinear R-parity surface
Paper sources:
- Wooldridge, J. M. (2025). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Empirical Economics 69(5), 2545-2587. DOI 10.1007/s00181-025-02807-z. (review on file)
- Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. Econometrics Journal 26(3), C31-C66. (companion review on file)
Intentional deviations from the source: see docs/methodology/REGISTRY.md ## WooldridgeDiD (ETWFE) → ### Deviations from the paper / from R / library extensions for the consolidated list (HC1 finite-sample factor, QMLE sandwich, nonlinear-vs-fixest, logit cohort+time dummies, anticipation + aggregation, cell-count default + opt-in cohort-share, conditional-on-shares inference, response-scale vs log-link nonlinear parity)

Validation

Tests added/updated: tests/test_methodology_wooldridge.py (60 methodology tests; existing 12 vcov_type tests unchanged from PR #483, "wooldridge: thread vcov_type through OLS path (Phase 1b 3/8)")
R goldens: benchmarks/R/generate_wooldridge_golden.R extended (Poisson + logit) + benchmarks/data/wooldridge_golden.json regenerated + benchmarks/data/wooldridge_test_panel.csv augmented with y_pois + y_logit columns
No tutorial / notebook changes (T16 unaffected — new parameters are opt-in)

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-22T19:09:03Z

Overall Assessment

⚠️ Needs changes

Executive Summary

Affected methods are Wooldridge (2025) Section 7 aggregation (weights="cohort_share", Eqs. 7.4 and 7.6) and Section 8 heterogeneous cohort trends (cohort_trends=True, Eq. 8.1). I did not find an unmitigated paper/math mismatch in the implemented estimator paths themselves.
aggregate(weights=...) was not propagated to the plot_event_study() wrapper, so the new Eq. 7.6 cohort-share event aggregation is unreachable from that convenience surface and silently falls back to cell weights on a fresh results object.
The new cohort_trends parameter is not locked across all aggregation methods. The added tests cover event and simple, but not group or calendar, which the review checklist treats as a P1 parameter-interaction gap.
The survey rejection for cohort_trends=True is fail-closed and documented, but the docs say it is tracked in TODO.md and there is no matching TODO row.
This was a static review. I did not execute the test suite in this environment.

Methodology

No unmitigated P0/P1 methodology defect found in the Section 7 / Section 8 estimator math after cross-checking the new code against the Registry and paper-review notes.
Severity: P3 Impact: nonlinear Poisson/logit “R parity” remains surface-only rather than estimand-level, but that limitation is explicitly documented and tracked in TODO.md:L96-L98, docs/methodology/REGISTRY.md:L1580-L1586, and tests/test_methodology_wooldridge.py:L2205-L2315. Concrete fix: none required for this PR.

Code Quality

Severity: P1 Impact: aggregate() now exposes weights, but plot_event_study() still hardcodes aggregate("event") with no pass-through. On a fresh results object, direct plotting can only produce the default cell-weighted event study, so the new paper Eq. 7.6 weights="cohort_share" path is silently unavailable through this wrapper. See diff_diff/wooldridge_results.py:L138-L170 and diff_diff/wooldridge_results.py:L696-L704. Concrete fix: add weights: str = "cell" to plot_event_study(), thread it through to aggregate("event", weights=weights), and add a regression test for direct plotting under both schemes.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 Impact: the new cohort_trends=True + survey_design rejection is documented and fail-closed, but the docs/error text say it is “tracked in TODO” and there is no corresponding TODO row. That is not a correctness issue, but it means the deferred work is not actually on the project’s tech-debt ledger. See diff_diff/wooldridge.py:L554-L569, docs/methodology/REGISTRY.md:L1566-L1574, and TODO.md:L94-L100. Concrete fix: add a TODO row for survey support on the cohort_trends full-dummy path, or remove the “tracked in TODO” wording.

Security

No findings.

Documentation/Tests

Severity: P1 Impact: the new cohort_trends parameter is only cross-product tested with aggregate("event") and aggregate("simple", weights="cohort_share"); there is no corresponding coverage for aggregate("group") or aggregate("calendar") on the new full-dummy cohort_trends=True path. Per the review checklist, new parameter interactions should be exercised across all aggregation methods. See the current cohort_trends coverage in tests/test_methodology_wooldridge.py:L1310-L1417. Concrete fix: add cohort_trends=True tests for group and calendar, ideally on the heterogeneous-trends DGP and one rank-deficient fixture.

Path to Approval

Propagate weights through WooldridgeDiDResults.plot_event_study() and add a regression test that direct plotting from an unaggregated results object honors weights="cohort_share".
Extend the cohort_trends test block with aggregate("group") and aggregate("calendar") coverage on the new full-dummy path.

Closes the WooldridgeDiD (ETWFE) methodology-review-tracker promotion in METHODOLOGY_REVIEW.md (In Progress → Complete), following the primary-source review for Wooldridge (2025) merged in PR-A (#484). Adds two paper-driven implementation surfaces and extends R-parity goldens to the nonlinear paths.

Implementation:
- `aggregate(weights="cohort_share")` on WooldridgeDiDResults implements paper Eqs. 7.4 (simple-overall) and 7.6 (event-time, restricted to k>=0) cohort-share aggregation weights as an opt-in alternative to the default cell-count weighting (matching Stata `jwdid_estat`). Inference fields fail-closed to NaN with UserWarning per paper Section 7.5 conditional-on-shares semantics; raises on `survey_design` (design-consistent totals deferred); raises on `type ∈ {"group","calendar"}` (no paper closed-form); raises on bootstrap fits (no matching bootstrap variant). Closes TODO row 95.
- `cohort_trends=True` on `WooldridgeDiD.__init__` adds linear `dg_i · t` cohort-specific trend interactions (paper Section 8 / Eq. 8.1) for the OLS path. Rejects on logit/poisson per paper Section 8 OLS scope; rejects on survey_design pending full-dummy/TSL validation; enforces per-cohort pre-period identification check (≥ 2 observed pre-periods per treated cohort). Auto-routes to full-dummy mode regardless of vcov_type. Closes the PR-A Requirements Checklist heterogeneous-trends gap.

Tests:
- `tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes (Theorem 3.1, Proposition 5.1, Section 6 event study, Section 7 aggregation paths, Section 8 heterogeneous trends, Section 10 unbalanced panels) + `TestW2025LibraryDeviations` consolidating 5 surviving deviations. Mirrors the HAD PR #473 precedent.
- Two new R-parity surface classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the structural surface against R `etwfe(family=...)` log-link goldens.
- 209 tests total (60 methodology + 149 R-parity + unit regressions).

R Goldens:
- `benchmarks/R/generate_wooldridge_golden.R` extended with Poisson + logit DGPs via R `etwfe`; augmented panel CSV retains the same seed-generated `y_pois` + `y_logit` columns for cross-language reproducibility.
- `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`.

Tracker promotion:
- METHODOLOGY_REVIEW.md L52 status flip with merge date; detail section L583-605 rewritten to the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns template mirroring HAD / ContinuousDiD / DCDH. L27 example re-pointed; priority queue items #7-#10 renumbered to #6-#9.
- REGISTRY.md `## WooldridgeDiD (ETWFE)` extended with `### Deviations from the paper / from R / library extensions` block consolidating 7 surviving deviations + opt-in notes for cohort_share + cohort_trends + survey rejection + bootstrap cohort_share rejection contracts.
- CHANGELOG.md `[Unreleased]` `### Added` documents the new parameters, R-parity extension, and tracker flip.
- `docs/methodology/papers/wooldridge-2025-review.md` Requirements Checklist + Gaps & Uncertainties items 1 + 11 marked `**Status:** Closed in PR-B`.
- `docs/api/wooldridge_etwfe.rst` updated with weighting-scheme notes alongside the existing aggregation table.

Second of two PRs for the WooldridgeDiD methodology-review-tracker promotion. PR-A merged at e416aed (#484).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
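
The per-cohort pre-period identification check mentioned in the commit message (at least 2 observed pre-periods per treated cohort) might look roughly like the sketch below. This is an illustration only: the function name and input shape are hypothetical, the cohort label is assumed to be the first treated period, and anticipation is ignored.

```python
def cohorts_lacking_pre_periods(observations, min_pre=2):
    """Return treated cohorts with fewer than `min_pre` observed pre-periods.

    `observations` is an iterable of (cohort, period) pairs for treated
    units, where `cohort` is assumed to be the first treated period.
    """
    pre_periods = {}
    for cohort, period in observations:
        seen = pre_periods.setdefault(cohort, set())
        if period < cohort:
            seen.add(period)
    return sorted(g for g, seen in pre_periods.items() if len(seen) < min_pre)


# Cohort 2 is observed in periods 1..3, so it has a single pre-period (t=1).
# Cohort 4 is observed in periods 1..5, so it has three pre-periods.
obs = [(2, t) for t in (1, 2, 3)] + [(4, t) for t in (1, 2, 3, 4, 5)]
flagged = cohorts_lacking_pre_periods(obs)
```

A linear trend needs at least two pre-treatment points per cohort to be separated from the cohort's level, which is the intuition behind a threshold of 2.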

…rt_trends × group/calendar coverage + TODO row

github-actions · 2026-05-22T20:13:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 379e065f7455a72da6543a7828dff5c4b3eb066b

Overall Assessment

⚠️ Needs changes

Executive Summary

Affected methods are Wooldridge (2025) Section 7 aggregation (aggregate(weights="cohort_share"), Eqs. 7.4 and 7.6) and Section 8 heterogeneous cohort trends (cohort_trends=True, Eq. 8.1). I did not find an unmitigated paper/Registry mismatch in the estimator math itself.
The previous plot_event_study() propagation finding is only partially resolved: the wrapper now accepts weights, but it does not track which weighting scheme is currently cached, so a later plot_event_study(weights="cell") call can silently reuse stale cohort-share results instead of recomputing the cell-weighted event study. See diff_diff/wooldridge_results.py:L696-L723.
The previous cohort_trends test-coverage gap for aggregate("group") and aggregate("calendar") appears resolved by the new coverage in tests/test_methodology_wooldridge.py:L1378-L1414.
The previous tech-debt tracking issue is also resolved: the new survey/cohort-share/cohort-trends follow-ups are now explicitly tracked in TODO.md:L96-L99.
Nonlinear Poisson/logit “R parity” remains surface-only rather than estimand-level, but that limitation is documented in the Registry and paper review and tracked in TODO, so it is informational rather than blocking. See docs/methodology/REGISTRY.md:L1581-L1587, docs/methodology/papers/wooldridge-2025-review.md:L618-L619, TODO.md:L96-L99.

Methodology

Severity: P3 Impact: The nonlinear Poisson/logit additions still stop at structural/surface parity with R etwfe; they do not attempt cell-level numerical parity because diff-diff returns response-scale ATT while etwfe exposes link-scale coefficients. This is documented in the Methodology Registry and paper review and tracked in TODO, so it is not a defect for this PR. Concrete fix: none required in this PR. See docs/methodology/REGISTRY.md:L1581-L1587, docs/methodology/papers/wooldridge-2025-review.md:L618-L619, TODO.md:L96-L99.

Code Quality

Severity: P1 Impact: WooldridgeDiDResults.plot_event_study(weights=...) still has a stale-cache bug. It recomputes when event_study_effects is None or when the requested weights are "cohort_share", but not when the cached event study was previously built under cohort-share and the caller later requests "cell". That means plot_event_study() can silently plot the wrong aggregation scheme on a reused results object. The new regression test exercises only the cell -> cohort_share direction, not the reverse. Concrete fix: track the cached event-study weighting scheme on the results object and re-aggregate whenever the requested scheme differs, or more simply always call aggregate("event", weights=weights) inside the wrapper; add a regression test that calls plot_event_study(weights="cohort_share") first, then plot_event_study() or plot_event_study(weights="cell"), and asserts that the negative-k placebo leads are restored. See diff_diff/wooldridge_results.py:L711-L723, tests/test_methodology_wooldridge.py:L1416-L1470.
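
The stale-cache pattern described above, and the "always re-aggregate" fix, can be reproduced with a toy class. This is a sketch and not the library's implementation: the class, the method names `plot_keys_buggy` / `plot_keys_fixed`, and the effect values are hypothetical, and the only behavior borrowed from the review is that cohort-share event aggregation keeps k >= 0 while cell weighting also reports negative-k leads.

```python
class EventStudyResults:
    """Toy stand-in for a results object; not the library's class."""

    def __init__(self):
        self.event_study_effects = None

    def aggregate(self, kind, weights="cell"):
        # Cohort-share keeps only k >= 0; cell weighting also has placebo leads.
        ks = range(0, 3) if weights == "cohort_share" else range(-2, 3)
        self.event_study_effects = {
            k: {"att": 0.1 * k, "weights": weights} for k in ks
        }
        return self

    def plot_keys_buggy(self, weights="cell"):
        # Recomputes only when nothing is cached or cohort-share is requested,
        # so a later weights="cell" call reuses whatever is cached.
        if self.event_study_effects is None or weights == "cohort_share":
            self.aggregate("event", weights=weights)
        return sorted(self.event_study_effects)

    def plot_keys_fixed(self, weights="cell"):
        # Always re-aggregate so the plotted scheme matches the request.
        self.aggregate("event", weights=weights)
        return sorted(self.event_study_effects)


buggy = EventStudyResults()
buggy.plot_keys_buggy(weights="cohort_share")
stale = buggy.plot_keys_buggy(weights="cell")  # still the cohort-share keys

fixed = EventStudyResults()
fixed.plot_keys_fixed(weights="cohort_share")
fresh = fixed.plot_keys_fixed(weights="cell")  # placebo leads restored
```

Always re-aggregating trades a little recomputation for not having to track which scheme the cache holds; tracking the cached scheme is the alternative the review mentions.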

Performance

No findings.

Maintainability

No findings beyond the plot_event_study() state bug above.

Tech Debt

Severity: P3 Impact: The remaining Section 7/8 limitations are now properly tracked instead of being left as undocumented deferred work: response-scale/link-scale nonlinear parity, design-consistent survey cohort totals for cohort-share weighting, unconditional cohort-share inference, and cohort_trends=True + survey_design. Concrete fix: none required in this PR. See TODO.md:L96-L99.

Security

No findings.

Documentation/Tests

No new blocking findings. The prior re-review gaps around cohort_trends=True interaction coverage for aggregate("group") and aggregate("calendar") are resolved in tests/test_methodology_wooldridge.py:L1378-L1414.

Path to Approval

Fix WooldridgeDiDResults.plot_event_study() so it refreshes the event-study aggregation whenever the requested weights differ from the cached scheme, including the cohort_share -> cell direction.
Add a regression test that first calls plot_event_study(weights="cohort_share"), then calls plot_event_study() or plot_event_study(weights="cell"), and asserts the second call restores the cell-weighted event-study keys/values.

Static re-review only. I did not run the test suite in this environment because the Python runtime here is missing project dependencies (numpy was not importable).

…hort_share→cell regression

github-actions · 2026-05-22T23:49:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a884867e70ae60de422618ee8141b624ef70e9cc

Overall Assessment

⚠️ Needs changes

Executive Summary

Affected methods are Wooldridge (2025) Section 7 aggregation (aggregate(weights="cohort_share"), Eqs. 7.4 and 7.6) and Section 8 heterogeneous cohort trends (cohort_trends=True, Eq. 8.1). I did not find an unmitigated paper/Registry mismatch in the changed estimator math.
The previous plot_event_study() stale-cache finding is resolved: the wrapper now always re-aggregates under the requested weighting scheme, and the reverse cohort_share -> cell regression is present in diff_diff/wooldridge_results.py:L696-L732 and tests/test_methodology_wooldridge.py:L1416-L1521.
The previous cohort_trends interaction-coverage gap for aggregate("group") and aggregate("calendar") is also resolved in tests/test_methodology_wooldridge.py:L1378-L1414.
Severity: P1 [Newly identified] The new weights="cohort_share" surface is documented as a WooldridgeDiD feature and is threaded into OLS, logit, and Poisson results, but the added regression coverage exercises that weighting only on OLS. The new nonlinear tests stop at fit/surface checks and never call aggregate(..., weights="cohort_share"). Per the parameter-interaction checklist, that is an unmitigated coverage gap.
Remaining cohort-share survey/inference limitations and the nonlinear response-scale vs link-scale parity limitation are fail-closed, documented in docs/methodology/REGISTRY.md, and tracked in TODO.md; those are P3 informational, not blockers.

Methodology

No unmitigated P0/P1 methodology findings.
Severity: P3 Impact: aggregate(weights="cohort_share") now fail-closes on survey-weighted fits instead of mixing design-weighted ATTs with unweighted cohort shares, which would target the wrong estimand. This deviation is documented and tracked, so it is informational, not a defect. Concrete fix: none required in this PR. See diff_diff/wooldridge_results.py:L355-L373, docs/methodology/REGISTRY.md:L1508-L1510, TODO.md:L97-L99.
Severity: P3 Impact: cohort-share aggregation intentionally sets t_stat / p_value / CI to NaN because the current SE is conditional on realized cohort shares, and unconditional inference is deferred. That matches the documented fail-closed contract. Concrete fix: none required in this PR. See diff_diff/wooldridge_results.py:L375-L406, docs/methodology/REGISTRY.md:L1509-L1510, TODO.md:L98-L99.
Severity: P3 Impact: the new Poisson/logit R goldens remain surface-only because diff-diff returns response-scale ATT while etwfe exposes link-scale coefficients. This is explicitly documented and tracked. Concrete fix: none required in this PR. See tests/test_methodology_wooldridge.py:L2350-L2465, docs/methodology/REGISTRY.md:L1587-L1587, TODO.md:L96-L96.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 Impact: the remaining Wooldridge follow-ups are now properly tracked rather than left implicit: nonlinear response-scale/link-scale bridging, survey-consistent cohort totals, unconditional cohort-share inference, and cohort_trends=True + survey_design. Concrete fix: none required in this PR. See TODO.md:L96-L99.

Security

No findings.

Documentation/Tests

Severity: P1 [Newly identified] Impact: docs/api/wooldridge_etwfe.rst advertises aggregate(weights="cohort_share") as a WooldridgeDiD feature generally (docs/api/wooldridge_etwfe.rst:L10-L15, docs/api/wooldridge_etwfe.rst:L149-L161), and the implementation explicitly threads _n_g_per_cohort into both nonlinear result builders (diff_diff/wooldridge.py:L1665-L1672, diff_diff/wooldridge.py:L1916-L1923). But the new Stage-B cohort-share tests are OLS-only (tests/test_methodology_wooldridge.py:L788-L1048, tests/test_methodology_wooldridge.py:L1332-L1562), while the new nonlinear classes only assert fit completion / golden shape (tests/test_methodology_wooldridge.py:L2350-L2465). Existing baseline nonlinear aggregation tests cover only default aggregation paths (tests/test_wooldridge.py:L763-L785). That leaves the new parameter untested across all supported estimator paths. Concrete fix: add deterministic logit and Poisson regressions that call aggregate("simple", weights="cohort_share") and aggregate("event", weights="cohort_share"), asserting finite att/se when estimable, NaN inference fields under the conditional-on-shares contract, and k >= 0 on the event path.
Static review only. I could not run the test suite here because this environment is missing project Python dependencies (numpy was not importable) and pytest is unavailable.

Path to Approval

Add a Poisson regression test on a deterministic staggered panel that exercises aggregate("simple", weights="cohort_share") and aggregate("event", weights="cohort_share"), checking finite point estimates/SEs, NaN inference fields, and k >= 0 event keys.
Add the same regression coverage for method="logit" on a fixture that converges deterministically; if the current fixture is unstable, introduce a purpose-built binary DGP instead of a skip-based surface test.
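
The assertions requested above (finite att/se, NaN inference fields, k >= 0 event keys) could be collected in a small helper like the one below. This is a sketch with a hypothetical name and a hypothetical input shape: it assumes each event-time entry is a dict with `att`, `se`, `t_stat`, `p_value`, and a two-element `conf_int`, which may not match the library's actual result layout.

```python
import math


def check_cohort_share_event_contract(event_effects):
    """Return a list of contract violations for a cohort-share event study.

    `event_effects` is assumed to map event time k to a dict of fields.
    An empty list means every check passed.
    """
    problems = []
    for k, row in event_effects.items():
        if k < 0:
            problems.append(f"k={k}: negative event time present")
        for name in ("att", "se"):
            if not math.isfinite(row[name]):
                problems.append(f"k={k}: {name} is not finite")
        for name in ("t_stat", "p_value"):
            if not math.isnan(row[name]):
                problems.append(f"k={k}: {name} should be NaN")
        if not all(math.isnan(bound) for bound in row["conf_int"]):
            problems.append(f"k={k}: conf_int should be NaN")
    return problems


nan = float("nan")
good = {0: {"att": 0.4, "se": 0.1, "t_stat": nan, "p_value": nan, "conf_int": (nan, nan)}}
bad = {-1: {"att": 0.4, "se": 0.1, "t_stat": 4.0, "p_value": nan, "conf_int": (0.2, 0.6)}}
good_problems = check_cohort_share_event_contract(good)
bad_problems = check_cohort_share_event_contract(bad)
```

In a test, the same helper could be applied to both the Poisson and the logit fixture so the two estimator paths are held to one contract.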

…Poisson paths

github-actions · 2026-05-23T00:02:03Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 426cb3236e6a7269d5e48b1435a591e1a638ed95

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous re-review blocker is addressed: the PR now adds explicit nonlinear weights="cohort_share" coverage for both Poisson and logit, plus the plot_event_study() cache/regression tests. See tests/test_methodology_wooldridge.py:L1014-L1145 and tests/test_methodology_wooldridge.py:L1549-L1654.
The new cohort-share aggregation matches the documented fail-closed contract in the Methodology Registry: survey-weighted fits raise, event-time cohort-share aggregation is restricted to k >= 0, and inference is nulled to NaN while unconditional share uncertainty remains deferred. See diff_diff/wooldridge_results.py:L355-L406, diff_diff/wooldridge_results.py:L496-L529, docs/methodology/REGISTRY.md:L1508-L1514, and TODO.md:L97-L99.
Severity: P1. The new cohort_trends=True surface does not define the all-eventually-treated normalization for the newly exposed cohort_trend_coefs. The code adds one dg_i·t column for every treated cohort and only checks “>= 2 pre-periods,” but the paper review already notes that in all-treated designs the last-cohort terms are dropped. As shipped, one trend coefficient must be absorbed by rank-deficiency, while the docs/results surface still present Dict[g -> δ_g] as if absolute cohort slopes were identified.
I did not find another unmitigated P0/P1 in the changed ATT aggregation math, variance fail-closures, or the newly added nonlinear tests.
Static review only; I could not run the test suite here because the environment is missing project Python dependencies (numpy was not importable).

Methodology

Severity: P1. Impact: cohort_trends=True is documented and surfaced as estimating per-cohort linear trends δ_g, but on all-eventually-treated panels those trend coefficients are only identified up to a baseline normalization. The implementation always appends a trend column for every treated cohort (diff_diff/wooldridge.py:L948-L953), only guards on pre-period count (diff_diff/wooldridge.py:L651-L669), and then exposes cohort_trend_coefs[g] directly from the solved coefficient vector (diff_diff/wooldridge.py:L1022-L1034, diff_diff/wooldridge_results.py:L75-L81). The paper review explicitly notes that when everyone is eventually treated, last-cohort terms are dropped (docs/methodology/papers/wooldridge-2025-review.md:L745-L745). The current REGISTRY/API text does not document that baseline/drop rule for cohort_trend_coefs (docs/methodology/REGISTRY.md:L1567-L1575, docs/api/wooldridge_etwfe.rst:L14-L16). Concrete fix: either explicitly omit the last cohort’s dg_i·t column when there is no never-treated group and document the baseline normalization, or reject cohort_trends=True on all-treated panels until that normalization is implemented and tested.
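
The rank deficiency behind this finding can be checked by hand on a toy panel. When every unit belongs to some treated cohort, the cohort trend columns dg_i·t sum to t, and t is already a linear combination of the time dummies, so one trend coefficient cannot be separately identified. The sketch below verifies that single linear dependence in plain Python; the panel, the helper name, and the use of `None` for never-treated units are assumptions for illustration, and the check does not rule out other dependencies.

```python
def trends_sum_to_time(unit_cohort, periods):
    """Check whether the cohort trend columns sum to t on every row.

    `unit_cohort` maps unit -> first treated period, or None if the unit
    is never treated. If the sum equals t everywhere, the trend columns
    are collinear with a full set of time dummies.
    """
    cohorts = sorted({g for g in unit_cohort.values() if g is not None})
    rows = [(u, t) for u in unit_cohort for t in periods]
    # One column per treated cohort: t for that cohort's units, else 0.
    trend_cols = {
        g: [t if unit_cohort[u] == g else 0 for (u, t) in rows] for g in cohorts
    }
    sum_of_trends = [sum(trend_cols[g][i] for g in cohorts) for i in range(len(rows))]
    # t written as a combination of time dummies: sum_s s * 1[t == s].
    t_from_dummies = [sum(s * (1 if t == s else 0) for s in periods) for (_u, t) in rows]
    return sum_of_trends == t_from_dummies


periods = [1, 2, 3, 4]
all_treated = {"a": 3, "b": 3, "c": 4, "d": 4}
with_never_treated = {"a": 3, "b": 3, "c": 4, "d": None}

collinear_all_treated = trends_sum_to_time(all_treated, periods)
collinear_with_baseline = trends_sum_to_time(with_never_treated, periods)
```

Dropping one cohort's trend column, as the review suggests for all-treated panels, removes this dependence and makes the remaining slopes relative to the omitted cohort.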

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: the remaining cohort-share and nonlinear-parity limitations are properly tracked rather than left implicit: response-scale vs link-scale nonlinear parity, survey-consistent cohort totals, unconditional cohort-share inference, and cohort_trends=True + survey_design. See TODO.md:L96-L99. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

No unmitigated findings. The prior AI-review P1 about missing nonlinear weights="cohort_share" coverage appears resolved by tests/test_methodology_wooldridge.py:L1014-L1145.
Static review only; I could not execute the suite in this environment.

Path to Approval

In diff_diff/wooldridge.py, make the all-treated cohort_trends=True normalization explicit: either drop the last cohort’s trend column deterministically (mirroring the paper’s all-treated dropping rule) or reject the configuration.
Add a regression in tests/test_methodology_wooldridge.py for an all-eventually-treated staggered panel with cohort_trends=True, asserting deterministic cohort_trend_coefs behavior and stable ATT output.
Add a REGISTRY/API note describing the baseline/identification contract for cohort_trend_coefs on all-treated designs.

…TRY note

github-actions · 2026-05-23T00:14:41Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0c5d766042de9aba26fc2e0067fc3253ce444c04

Overall Assessment

✅ Looks good

Executive Summary

The previous P1 on all-treated cohort_trends normalization appears resolved: the OLS full-dummy path now drops the last cohort trend when no never-treated baseline exists, the Registry documents that contract, and regression tests cover both all-treated and never-treated cases.
The new aggregate(weights="cohort_share") surface is fail-closed where it should be: group/calendar raise, event is restricted to k >= 0, survey fits raise, and unconditional share-uncertainty is explicitly deferred in TODO.md.
I did not find a new unmitigated P0/P1 in the changed estimator/results math.
P3: the API-facing docs still do not mirror the Registry’s baseline-normalization semantics for cohort_trend_coefs.
P3: the R golden generator now writes intermediate D_* columns back into the benchmark CSV, which is broader than the documented y_pois/y_logit augmentation.
Static review only; I could not run tests here because pytest is unavailable and importing diff_diff fails with ModuleNotFoundError: numpy.

Methodology

No unmitigated findings. The prior all-treated cohort_trends issue is addressed in diff_diff/wooldridge.py:L947-L960, diff_diff/wooldridge.py:L1043-L1057, docs/methodology/REGISTRY.md:L1567-L1576, and tests/test_methodology_wooldridge.py:L1605-L1682.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: the benchmark generator persists the full working dataframe, so benchmarks/data/wooldridge_test_panel.csv:L1 now carries OLS-stage D_3_3...D_5_6 columns in addition to the new nonlinear outcomes. That broadens the fixture contract unnecessarily and makes the benchmark input noisier than the PR/docs describe. Concrete fix: when writing the augmented panel, subset to the canonical panel columns plus y_pois and y_logit. See benchmarks/R/generate_wooldridge_golden.R:L47-L65 and benchmarks/R/generate_wooldridge_golden.R:L291-L293.
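
The "subset before writing" fix is shown here in Python rather than R, purely to illustrate the idea; the actual generator is an R script. The canonical column names (`unit`, `time`, `cohort`, `y`) are hypothetical placeholders, since the real panel schema is not given in this thread. Only `y_pois`, `y_logit`, and the `D_*` naming come from the review.

```python
import csv
import io

# Hypothetical canonical panel columns; the real schema may differ.
CANONICAL_COLUMNS = ["unit", "time", "cohort", "y"]
KEEP_COLUMNS = CANONICAL_COLUMNS + ["y_pois", "y_logit"]


def write_panel_subset(rows, fieldnames=KEEP_COLUMNS):
    """Serialize rows to CSV text, silently dropping any extra columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Working rows that still carry an intermediate treatment-dummy column.
working_rows = [
    {"unit": 1, "time": 3, "cohort": 3, "y": 0.5, "y_pois": 2, "y_logit": 1, "D_3_3": 1},
    {"unit": 2, "time": 3, "cohort": 5, "y": 0.1, "y_pois": 0, "y_logit": 0, "D_3_3": 0},
]
csv_text = write_panel_subset(working_rows)
```

Naming the kept columns explicitly keeps the fixture contract stable even if later stages add more working columns to the dataframe.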

Tech Debt

No unmitigated findings. The remaining nonlinear response-scale parity gap and the cohort-share survey/inference limitations are explicitly tracked in TODO.md:L96-L99.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the implementation and Registry now define cohort_trend_coefs as baseline-relative, with the last cohort omitted on all-eventually-treated panels, but the API-facing field description still reads like a plain g -> δ_g mapping. That can still mislead users about what the numbers mean even though the estimator is now correct. Concrete fix: mirror the Registry note in the cohort_trend_coefs docstring or API page, explicitly stating “relative to the never-treated baseline; last cohort omitted on all-treated panels.” See diff_diff/wooldridge_results.py:L75-L81, docs/api/wooldridge_etwfe.rst:L60-L76, and docs/methodology/REGISTRY.md:L1567-L1576.

…coefs baseline-normalization docstring

github-actions · 2026-05-23T00:27:45Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 200975d3d753ba138c44bf4d2724be2eab5bfc13

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review notes look addressed: the API docs now explain the baseline-relative cohort_trend_coefs contract, and the benchmark CSV no longer persists the intermediate D_* columns.
The core estimator changes look aligned with the updated registry/review notes: weights="cohort_share" matches the documented Eq. 7.4 / 7.6 surface, survey-weighted cohort-share is fail-closed, k >= 0 is enforced for cohort-share event aggregation, and cohort_trends=True is correctly restricted to OLS.
Severity: P1. WooldridgeDiDResults.plot_event_study(weights="cohort_share") still renders confidence intervals from finite SEs, bypassing the PR’s explicit “NaN inference” contract for cohort-share aggregation.
Severity: P1. The new public cohort_trends / weights surfaces are not fully propagated into results metadata/reporting, so downstream consumers cannot tell which model/estimand produced overall_* or event_study_effects.
Static review only: this environment does not have numpy or pytest, so I could not run the new methodology tests.

Methodology

Severity: P1. Impact: the new cohort-share event-study plotting surface leaks invalid inference back into user-facing output. aggregate(..., weights="cohort_share") intentionally NaNs t_stat/p_value/conf_int because Section 7.5 requires uncertainty from the estimated cohort shares, but WooldridgeDiDResults.plot_event_study() throws those fields away and passes finite se values into the generic plotter, which reconstructs normal-theory CIs from se. That makes the plot contradict the documented fail-closed contract in diff_diff/wooldridge_results.py:L706-L742 and diff_diff/visualization/_event_study.py:L218-L249. Concrete fix: for weights="cohort_share", suppress error bars entirely by passing NaN SEs or explicit NaN CI overrides through the plotting helper, and add a regression that verifies no confidence intervals are drawn on that surface.
No other unmitigated methodology mismatches stood out in the changed estimator math. The cohort-share formulas, the survey fail-close, the k >= 0 restriction, and the all-treated trend normalization are all documented in the registry and paper-review notes.
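
The leak described above comes from rebuilding intervals from a finite standard error. A minimal sketch of the suggested fix follows, assuming a generic plotter that derives normal-theory bounds from `se` and skips the error bar when the bounds are not finite. All function names are hypothetical and do not describe the library's plotting helper.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal critical value


def ci_from_se(att, se):
    """Normal-theory interval; NaN in `se` propagates to both bounds."""
    return (att - Z_95 * se, att + Z_95 * se)


def error_bar(att, se):
    """Return (lo, hi), or None when no error bar should be drawn."""
    lo, hi = ci_from_se(att, se)
    if not (math.isfinite(lo) and math.isfinite(hi)):
        return None
    return (lo, hi)


def se_for_plot(se, weights):
    """Mask the SE under cohort-share so the plot cannot show a CI."""
    return float("nan") if weights == "cohort_share" else se


cell_bar = error_bar(0.5, se_for_plot(0.1, "cell"))
share_bar = error_bar(0.5, se_for_plot(0.1, "cohort_share"))
```

Masking at the wrapper keeps the generic plotter unchanged; passing explicit NaN interval overrides would be the other route named in the finding.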

Code Quality

No separate findings.

Performance

No findings.

Maintainability

Severity: P1. Impact: parameter propagation is incomplete for the new public surfaces. cohort_trends is added on the estimator side in diff_diff/wooldridge.py:L323-L358, but WooldridgeDiDResults never stores an explicit cohort_trends flag; meanwhile aggregate(..., weights="cohort_share") overwrites overall_* in place in diff_diff/wooldridge_results.py:L418-L464, while summary(), to_dataframe(), and __repr__() still label the output as generic “ATT” / “ATT (simple)” with no weighting tag in diff_diff/wooldridge_results.py:L587-L597, diff_diff/wooldridge_results.py:L663-L673, and diff_diff/wooldridge_results.py:L765-L774. Because the registry explicitly allows legitimate omission of trend coefficients on all-treated panels, cohort_trend_coefs is not a reliable proxy for “cohort_trends was on” (docs/methodology/REGISTRY.md:L1567-L1576). Concrete fix: add explicit results metadata for cohort_trends and the active aggregation weighting, thread it from fit() / aggregate(), and surface it in summary(), to_dataframe(), and __repr__() so persisted/reporting outputs stay self-describing.

Tech Debt

No unmitigated findings. The remaining deferred items added in TODO.md for survey-weighted cohort totals, unconditional cohort-share inference, nonlinear response-scale R parity, and survey + cohort_trends composition are appropriate tracked follow-ups, not blockers.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the new plotting tests only assert key-set propagation and stale-cache behavior; they never assert the no-CI contract for cohort-share plots or the missing metadata/reporting labels, so both P1 issues above are currently unguarded. See tests/test_methodology_wooldridge.py:L1549-L1733. Concrete fix: add one regression that verifies cohort-share plots suppress error bars/CI bands, and one that verifies results/reporting surfaces expose the active model/weighting metadata.

Path to Approval

Make plot_event_study(weights="cohort_share") honor the fail-closed inference contract by suppressing CI/error bars instead of recomputing them from finite SEs, and lock that with a regression test.
Add explicit cohort_trends and aggregation-weight metadata to WooldridgeDiDResults, propagate it through fit() / aggregate(), and expose it in summary(), to_dataframe(), and __repr__().

…Results metadata for cohort_trends/aggregation weights

github-actions · 2026-05-23T00:38:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f81b7b6a4ac72751c4b82ffc028ae025a335bfc7

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior plotting P1 is addressed: plot_event_study(weights="cohort_share") now re-aggregates under the requested weighting and suppresses CI/error bars by passing NaN SEs, which the plotter respects at diff_diff/wooldridge_results.py:L733-L782 and diff_diff/visualization/_event_study.py:L242-L252.
I did not find a new paper-vs-code mismatch in the estimator math itself. The new cohort-share aggregation, k >= 0 restriction on cohort-share event aggregation, conditional-on-shares NaN inference, OLS-only cohort_trends, and cohort_trends + survey_design fail-close behavior are all documented in docs/methodology/REGISTRY.md:L1501-L1514, docs/methodology/REGISTRY.md:L1567-L1576, and docs/methodology/papers/wooldridge-2025-review.md:L607-L618.
Severity P1: the replacement metadata/reporting fix is still incorrect. A single global active_aggregation_weights flag can silently mislabel cached outputs after mixed aggregation calls or failed cohort_share calls, so summary() can report the wrong estimand for overall_* / event_study_effects.
to_dataframe() and __repr__() still do not carry the new model/weight metadata, so exported outputs remain ambiguous.
Static review only: this environment cannot import numpy, so I could not execute the new methodology tests.
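The NaN-SE mechanism described in the first item above can be illustrated with a minimal sketch. The helper names `ci_bounds` and `has_drawable_ci` and the 1.96 critical value are assumptions for illustration only; the library's plotter is not reproduced here. The point is simply that NaN standard errors propagate to NaN interval bounds, which a plotting layer can treat as "draw no band".

```python
import math


def ci_bounds(estimate, se, z=1.96):
    """Return (lower, upper); a NaN standard error yields NaN bounds."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


def has_drawable_ci(estimate, se):
    """True only when both interval bounds are finite numbers."""
    lower, upper = ci_bounds(estimate, se)
    return math.isfinite(lower) and math.isfinite(upper)


# Hypothetical event-study point estimates keyed by relative period k >= 0.
effects = {0: 0.8, 1: 1.1, 2: 1.4}

# Fail-closed inference: every SE is NaN, so no band should be drawn.
nan_ses = {k: math.nan for k in effects}
drawable = {k: has_drawable_ci(effects[k], nan_ses[k]) for k in effects}
```

Passing NaN rather than recomputing bounds from finite SEs keeps the point estimates visible while making the absence of inference explicit.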

Methodology

Severity P1. Impact: overall_*, group_effects, calendar_effects, and event_study_effects are cached independently, but active_aggregation_weights is a single global label mutated at the start of aggregate() in diff_diff/wooldridge_results.py:L215-L221, before later fail paths in diff_diff/wooldridge_results.py:L379-L408, and regardless of which cache is actually recomputed in diff_diff/wooldridge_results.py:L443-L564. That means sequences like fit() -> aggregate("event", weights="cohort_share") -> summary("simple") will label the fit-time cell-weighted overall_* from diff_diff/wooldridge.py:L1283-L1379 as cohort_share, and the reverse mislabel happens after later cell-only aggregations. Because the weighting scheme changes the estimand, this is a silent semantic-contract violation, not just a display nit. to_dataframe() / __repr__() also remain unlabeled at diff_diff/wooldridge_results.py:L674-L731 and diff_diff/wooldridge_results.py:L805-L814. Concrete fix: store weighting metadata per cached aggregation surface (simple, group, calendar, event) or clear incompatible caches whenever weights change; only mutate metadata after the requested aggregation succeeds; then make summary(), to_dataframe(), and __repr__() use the per-surface metadata.
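A minimal sketch of the suggested fix, assuming a simplified stand-in for the results object: `AggregationCache`, its attributes, and the `compute` callback are hypothetical names, not the library's API. It shows the two properties the finding asks for: one weighting label per cached surface, and a label that changes only after the requested aggregation has succeeded.

```python
class AggregationCache:
    """Hypothetical stand-in for a results object (not the library's class)."""

    SURFACES = ("simple", "group", "calendar", "event")

    def __init__(self):
        # Assume fit-time results are cell-weighted on every surface.
        self.values = {s: None for s in self.SURFACES}
        self.weights_label = {s: "cell" for s in self.SURFACES}

    def aggregate(self, surface, weights="cell", compute=None):
        if surface not in self.SURFACES:
            raise ValueError(f"unknown aggregation surface: {surface!r}")
        # Run validation and computation first; either may raise.
        result = compute(surface, weights)
        # Commit value and label together, only after success.
        self.values[surface] = result
        self.weights_label[surface] = weights
        return result


def failing(surface, weights):
    """Simulates an aggregation that is rejected (e.g. an unsupported design)."""
    raise ValueError("cohort_share not supported for this result")


cache = AggregationCache()

# Mixed ordering: only the event surface is recomputed under cohort_share.
cache.aggregate("event", "cohort_share", compute=lambda s, w: 1.0)

# Failed call: the simple surface must keep its previous label.
try:
    cache.aggregate("simple", "cohort_share", compute=failing)
except ValueError:
    pass
```

After this sequence the `simple` label is still `cell` while `event` reads `cohort_share`, which is the behavior a per-surface `summary()` would need to report.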

Code Quality

No separate findings.

Performance

No findings.

Maintainability

No separate findings beyond the methodology/reporting contract issue above.

Tech Debt

No unmitigated findings. The remaining unsupported surfaces are properly tracked in TODO.md:L96-L99, so they are informational rather than blockers.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new metadata regression only checks the straight-line case aggregate("simple", weights="cohort_share") -> summary("simple") in tests/test_methodology_wooldridge.py:L1588-L1627. It does not cover mixed cache orderings or failed aggregation attempts, which is why the P1 above is still unguarded. Concrete fix: add regressions for aggregate("event", weights="cohort_share") -> summary("simple"), aggregate("simple", weights="cohort_share") -> aggregate("group") -> summary("simple"), and a failed aggregate(weights="cohort_share") on survey/legacy results leaving metadata unchanged.

Path to Approval

Replace the single global active_aggregation_weights field with per-aggregation metadata, or clear stale cached aggregations whenever a new weighting scheme is applied.
Make metadata mutation atomic: do not change reporting labels until the requested aggregation passes all validation and completes successfully.
Thread the correct per-surface metadata through summary(), to_dataframe(), and __repr__(), and add the three mixed-order / failed-call regressions above.

github-actions · 2026-05-23T00:50:28Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fa46df656629b4e0facf141b52c38f94fa05370f

Overall Assessment

✅ Looks good

Executive Summary

The prior P1 is resolved: weighting metadata is now stored per aggregation surface and updated atomically, so summary() no longer mislabels mixed-order or failed aggregation sequences. The fix is in diff_diff/wooldridge_results.py:L103-L126, diff_diff/wooldridge_results.py:L443-L569, with regression coverage in tests/test_methodology_wooldridge.py:L1588-L1664 and tests/test_methodology_wooldridge.py:L1801-L1849.
I did not find an undocumented paper-vs-code mismatch in the new estimator behavior. The weights="cohort_share" simple/event formulas, the k >= 0 event restriction, the survey fail-close, the conditional-on-shares NaN inference, and the OLS-only cohort_trends=True path with a per-cohort pre-period identification check all match the documented contract in docs/methodology/REGISTRY.md:L1501-L1514 and docs/methodology/REGISTRY.md:L1567-L1576.
Severity P2: to_dataframe() and __repr__() still omit the new weighting/trend metadata, so exported artifacts remain ambiguous after aggregate(weights="cohort_share") or cohort_trends=True. See diff_diff/wooldridge_results.py:L679-L736 and diff_diff/wooldridge_results.py:L810-L819.
The remaining nonlinear R-parity, survey-weighted cohort-share totals, unconditional cohort-share inference, and survey_design + cohort_trends gaps are properly tracked in TODO.md:L96-L99, so they are informational rather than blockers.
Static review only: I could not execute the Wooldridge tests here because this environment lacks pytest and even the package runtime deps needed for import (pandas).

Methodology

No unmitigated findings. The methodology-sensitive changes are either aligned with the paper/registry or explicitly fail-closed and documented.

Code Quality

No findings.

Performance

No findings. The heavier full-dummy cohort_trends=True route is at least surfaced with a design-size warning in diff_diff/wooldridge.py:L911-L932.

Maintainability

Severity P2. Impact: summary() is now self-describing, but to_dataframe() and __repr__() still drop the new aggregation_weights / cohort_trends context, so downstream exports can no longer tell whether an ATT is cell-weighted or cohort-share weighted, or whether it came from the heterogeneous-trends specification. Concrete fix: add cohort_trends plus the active aggregation-weight label to aggregated DataFrame exports and the repr string, and add regressions alongside the new summary metadata tests. Locations: diff_diff/wooldridge_results.py:L679-L736, diff_diff/wooldridge_results.py:L810-L819, tests/test_methodology_wooldridge.py:L1588-L1664.

Tech Debt

Severity P3. Impact: the remaining follow-ups are properly tracked, and the PR fail-closes the risky combinations instead of returning silent mixed-estimand or under-covered inference outputs. Concrete fix: none required in this PR; keep follow-up work tied to TODO.md:L96-L99.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the in-code docstrings were not fully updated for the new public surface. WooldridgeDiD still describes the estimator as Wooldridge (2021) and its parameter block stops at vcov_type, and WooldridgeDiDResults still says to call .aggregate(type) without mentioning weights. Concrete fix: update the in-code docstrings to the 2025 reference and document cohort_trends and aggregate(..., weights=...). Locations: diff_diff/wooldridge.py:L263-L320, diff_diff/wooldridge_results.py:L17-L22.

github-actions · 2026-05-23T01:00:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d584060419e586fbd11b8ae94e5999db483831dc

Overall Assessment

⚠️ Needs changes

Executive Summary

One unmitigated P1 remains in the changed estimator surface: cohort_trends=True is not identified under control_group="never_treated", but the new API/registry/tracker text presents Section 8 heterogeneous trends as generally supported.
The new weights="cohort_share" simple/event aggregation work is otherwise aligned with the documented Eq. 7.4 / 7.6 contract, and the survey-weighted + unconditional-inference gaps are fail-closed and tracked in TODO.md:L97-L99, so I did not count those as defects.
The prior informational gaps from the last review look resolved: the new weighting/trend metadata now flows through summary(), to_dataframe(), and __repr__(), and the estimator/results docstrings were updated for Wooldridge (2025).
Static review only: I could not execute the Wooldridge tests here because the environment is missing runtime deps (pandas import fails).

Methodology

Severity P1 [Newly identified]. Impact: WooldridgeDiD(cohort_trends=True, control_group="never_treated") does not implement the Section 8 model the PR now advertises. The OLS never_treated branch builds a design with every treated-cohort (g,t) dummy, including pre-treatment placebo cells (diff_diff/wooldridge.py:L163-L218), and the new trend path then appends dg_i · t columns on top (diff_diff/wooldridge.py:L915-L988). For any treated cohort g, dg_i · t = Σ_s s·1{g,s} on that design, where s runs over the periods that have a cell dummy, so the trend column is fully spanned by the existing cell dummies; fit() only checks the “≥2 pre-periods” condition and therefore accepts an unidentified specification (diff_diff/wooldridge.py:L660-L691). That conflicts with the paper-review’s Section 8 mapping, which adds dg_i · t to the g_it block while keeping the post-treatment h_it basis from Eq. 5.3 / Appendix A (docs/methodology/papers/wooldridge-2025-review.md:L314-L317), and with the new support claims in the registry/API/tracker (docs/methodology/REGISTRY.md:L1567-L1576, docs/api/wooldridge_etwfe.rst:L14-L27, METHODOLOGY_REVIEW.md:L595-L613). Concrete fix: either reject cohort_trends=True when control_group="never_treated" with a front-door NotImplementedError and document that limitation, or redesign that branch so the Section 8 trend columns are not added to the all-(g,t) placebo-dummy basis; add a regression that explicitly exercises control_group="never_treated", cohort_trends=True.
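The spanning argument can be checked on a toy panel. This is a minimal sketch under stated assumptions: the two-unit panel, the helper names, and the dummy layout are illustrative and do not reproduce the library's design-matrix construction. It only verifies that a cohort-specific linear trend equals a period-weighted sum of that cohort's full set of (g, period) cell dummies.

```python
periods = [1, 2, 3, 4]

# Toy panel rows as (cohort, period): one unit first treated in period 3,
# one never-treated unit coded as cohort 0.
rows = [(g, t) for g in (3, 0) for t in periods]


def cell_dummy(g, s):
    """Indicator column for cohort g observed in period s."""
    return [1 if (row_g == g and row_t == s) else 0 for row_g, row_t in rows]


def trend_column(g):
    """Cohort-specific linear trend: the period value for cohort g, else 0."""
    return [row_t if row_g == g else 0 for row_g, row_t in rows]


g = 3
# Linear combination of ALL (g, s) cell dummies with coefficients s.
combination = [
    sum(s * cell_dummy(g, s)[i] for s in periods) for i in range(len(rows))
]

# True here: the trend adds no new direction to the column space.
is_spanned = combination == trend_column(g)
```

Because the identity holds row by row, adding the trend column to a basis that already contains every cell dummy for the cohort makes the design rank-deficient, so the trend coefficient cannot be separately identified.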

Code Quality

No findings.

Performance

No findings.

Maintainability

No new findings. The prior self-description issue appears resolved in diff_diff/wooldridge_results.py:L583-L850.

Tech Debt

Severity P3. Impact: the remaining cohort-share survey/inference limits and cohort_trends=True + survey_design limit are explicitly fail-closed and tracked, so they are informational rather than blockers (TODO.md:L97-L99). Concrete fix: none required in this PR beyond keeping those follow-ups attached to the existing TODO rows.

Security

No findings.

Documentation/Tests

No separate finding beyond the P1 above. The prior docstring/API metadata gaps appear resolved, but the test suite still lacks a direct regression on control_group="never_treated" + cohort_trends=True; the closest “never-treated” cohort-trends test still instantiates the default control-group path (tests/test_methodology_wooldridge.py:L1777-L1799).

Path to Approval

Add a front-door guard or a corrected implementation for cohort_trends=True with control_group="never_treated" so the Section 8 surface is either truly identified or explicitly unsupported.
Add a methodology test that instantiates WooldridgeDiD(control_group="never_treated", cohort_trends=True) on a panel with never-treated units and asserts the chosen contract, then align REGISTRY.md, docs/api/wooldridge_etwfe.rst, and METHODOLOGY_REVIEW.md with that contract.
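The first option above, a front-door guard, could look roughly like the following sketch. The function name `validate_cohort_trends` and the error message are hypothetical; the review only specifies that the combination should be rejected up front, and suggests NotImplementedError as the exception type.

```python
def validate_cohort_trends(cohort_trends, control_group):
    """Hypothetical front-door check run before any design matrix is built."""
    if cohort_trends and control_group == "never_treated":
        raise NotImplementedError(
            "cohort_trends=True is not identified with "
            "control_group='never_treated'; use 'not_yet_treated' instead."
        )


def is_rejected(cohort_trends, control_group):
    """Helper for the regression-style checks below."""
    try:
        validate_cohort_trends(cohort_trends, control_group)
    except NotImplementedError:
        return True
    return False


# The contract a methodology test would pin down.
rejected_combo = is_rejected(True, "never_treated")
allowed_default = not is_rejected(True, "not_yet_treated")
allowed_without_trends = not is_rejected(False, "never_treated")
```

Raising before estimation keeps the failure cheap and makes the unsupported combination visible in the same place users configure the estimator.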

github-actions · 2026-05-23T01:14:45Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ebb255554d5c873db22f058ea912a3263afcc87b

Overall Assessment

✅ Looks good

Executive Summary

The prior P1 from the last review is resolved: cohort_trends=True + control_group="never_treated" now fail-closes at the estimator boundary and is documented/tested consistently.
The new aggregate(weights="cohort_share") surface is aligned with the W2025 Eq. 7.4 / 7.6 contract, and the unsupported survey/unconditional-inference cases are fail-closed and tracked rather than silently changing the estimand.
I did not find any new unmitigated P0/P1 issues in the changed estimator/results code.
I found two minor documentation issues only: the API overview still describes cohort_trends=True too broadly, and METHODOLOGY_REVIEW.md points the Stata parity follow-up to the wrong TODO.md row.
Static review only: I could not execute the Wooldridge tests here because python -c "import pandas" fails with ModuleNotFoundError.

Methodology

No findings. Re-review check passed: the previously unidentified cohort_trends=True + control_group="never_treated" branch is now rejected at fit() in diff_diff/wooldridge.py:L593-L617, with matching documentation/tests in docs/methodology/REGISTRY.md:L1575-L1576, TODO.md:L99-L100, and tests/test_methodology_wooldridge.py:L1485-L1537.
No findings. The new cohort-share aggregation matches the documented W2025 Section 7 contract and fails closed on the unsupported survey-composition and unconditional-inference cases instead of returning misleading inference, at diff_diff/wooldridge_results.py:L400-L451 and diff_diff/wooldridge_results.py:L545-L579.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The remaining Wooldridge limitations introduced or referenced by this PR are explicitly tracked and fail-closed, so I did not count them against the assessment: TODO.md:L96-L100.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the API overview still presents cohort_trends=True as generally supported, but the runtime contract now excludes at least control_group="never_treated". Users reading the public API page will discover that restriction only at runtime. Concrete fix: mirror the registry note into the overview and the cohort_trends parameter docs. References: docs/api/wooldridge_etwfe.rst:L14-L16, diff_diff/wooldridge.py:L327-L341, diff_diff/wooldridge.py:L606-L617, docs/methodology/REGISTRY.md:L1575-L1576.
Severity P3. Impact: METHODOLOGY_REVIEW.md points the Stata jwdid parity follow-up to TODO row 97, but row 97 is the survey cohort-share follow-up; the Stata parity item is row 102. That makes the tracker harder to trust as a navigation surface. Concrete fix: update the reference to TODO.md:L102 or replace row-number references with a semantic label. References: METHODOLOGY_REVIEW.md:L617-L620, TODO.md:L97-L102.
Static-review note: I could not run tests/test_methodology_wooldridge.py in this environment because pandas is not installed.

github-actions · 2026-05-23T01:27:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b9986c5feb9b777b336c9dfc518c43919f177412

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review documentation items are addressed: the API docs now describe the cohort_trends=True runtime restrictions, and the tracker no longer points the Stata parity follow-up at the wrong TODO row.
I did not find any new unmitigated P0/P1 issues in the changed Wooldridge estimator/results code. The new weights="cohort_share" and cohort_trends=True surfaces either implement the paper contract or fail closed where the methodology is deferred.
One P3 methodology-doc inconsistency remains in docs/methodology/papers/wooldridge-2025-review.md: the file says the cohort-share gap is closed, but earlier sections still describe the library as cell-count-only.
One P3 documentation metadata issue remains: CHANGELOG.md and METHODOLOGY_REVIEW.md undercount the current tests/test_methodology_wooldridge.py suite.
Static review only; I did not execute the test suite in this environment.

Methodology

No unmitigated P0/P1 findings. The estimator-side implementation is aligned with the paper/registry contract or explicitly fail-closed on deferred combinations: diff_diff/wooldridge.py:L454-L462, diff_diff/wooldridge.py:L591-L627, diff_diff/wooldridge_results.py:L224-L231, diff_diff/wooldridge_results.py:L400-L451, diff_diff/wooldridge_results.py:L545-L579, docs/methodology/REGISTRY.md:L1508-L1515, docs/methodology/REGISTRY.md:L1568-L1577.
Severity P3. Impact: the methodology bundle is internally inconsistent about aggregation weights. The PR-B closure notes correctly say the Eq. 7.4 / 7.6 cohort-share path is shipped, but the earlier aggregation table and the Stata reference note still describe WooldridgeDiD as cell-count-only. That can mislead future reviewers into treating the new surface as an unresolved deviation. Concrete fix: update those sections to say weights="cell" remains the default for jwdid_estat parity, while opt-in weights="cohort_share" implements paper Eq. 7.4 / 7.6 for type="simple" and type="event". References: docs/methodology/papers/wooldridge-2025-review.md:L542-L547, docs/methodology/papers/wooldridge-2025-review.md:L607-L618, docs/methodology/papers/wooldridge-2025-review.md:L714-L715.
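To make concrete why the two weighting schemes need distinct labels, here is a toy comparison. The definitions used are assumptions for illustration: "cell" weights each post-treatment (g, t) cell by its unit count, and "cohort share" weights each cohort's period-averaged effect by the cohort's share of treated units. The exact formulas in paper Eqs. 7.4 / 7.6 and in the library are not reproduced here, and the numbers are invented.

```python
# cohort -> (number of treated units, {post-treatment period: ATT(g, t)})
cohorts = {
    2: (10, {2: 1.0, 3: 1.0, 4: 1.0}),  # early cohort: three post periods
    4: (30, {4: 3.0}),                   # late cohort: one post period
}


def cell_weighted(cohorts):
    """Average over (g, t) cells, each weighted by its unit count."""
    numerator = sum(
        n * att for n, cells in cohorts.values() for att in cells.values()
    )
    denominator = sum(n * len(cells) for n, cells in cohorts.values())
    return numerator / denominator


def cohort_share_weighted(cohorts):
    """Average cohort-level mean effects, weighted by cohort share."""
    total_units = sum(n for n, _ in cohorts.values())
    return sum(
        (n / total_units) * (sum(cells.values()) / len(cells))
        for n, cells in cohorts.values()
    )


att_cell = cell_weighted(cohorts)
att_share = cohort_share_weighted(cohorts)
```

On this toy input the cell-weighted value is 2.0 and the cohort-share value is 2.5: early cohorts contribute more cells, so cell weighting tilts toward them, while cohort-share weighting counts each cohort once in proportion to its size.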

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The remaining Wooldridge limitations introduced here are fail-closed and explicitly tracked in TODO.md:L96-L100, so they are mitigated under the project policy.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the release-tracking docs still claim tests/test_methodology_wooldridge.py has “60 total” tests, but the file now contains 82 test_* methods across 10 classes, including the new Poisson/logit parity classes. That makes CHANGELOG.md and METHODOLOGY_REVIEW.md less reliable as audit surfaces. Concrete fix: update the hard-coded totals in CHANGELOG.md:L16 and METHODOLOGY_REVIEW.md:L603-L605, or replace exact totals with a class-based description that is less likely to drift. Reference: tests/test_methodology_wooldridge.py:L336-L2816.
Static review only: I did not run tests/test_methodology_wooldridge.py in this environment.
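One way to avoid hand-maintained totals is to derive them from the test file itself. The sketch below uses only the standard-library `ast` module; the function name `count_tests` and the sample source are illustrative, and nothing here is taken from the repository's tooling.

```python
import ast


def count_tests(source):
    """Return {class_name: number of test_* methods} for a module's source."""
    tree = ast.parse(source)
    counts = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            counts[node.name] = sum(
                isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and item.name.startswith("test_")
                for item in node.body
            )
    return counts


# Stand-in for the contents of a real test module.
sample = '''
class TestA:
    def test_one(self): pass
    def test_two(self): pass
    def helper(self): pass

class TestB:
    def test_three(self): pass
'''

counts = count_tests(sample)
total = sum(counts.values())
```

This counts test methods as written in the source, so parametrized tests are counted once each and the result can differ from the number of items pytest collects.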

github-actions · 2026-05-23T01:38:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5b9f289be75ea3bf085097ca890bcfbd717ee632

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review’s two documentation P3s are resolved. The Wooldridge paper review now reflects the shipped cohort-share surface instead of describing the library as cell-count-only, and the tracker/changelog now use generic coverage wording rather than the old hard-coded totals. References: docs/methodology/papers/wooldridge-2025-review.md:L540-L547, docs/methodology/papers/wooldridge-2025-review.md:L607-L618, docs/methodology/papers/wooldridge-2025-review.md:L714-L725, METHODOLOGY_REVIEW.md:L603-L606, CHANGELOG.md:L12-L16.
I did not find any new unmitigated P0/P1 issues in the changed Wooldridge estimator/results code. The new Section 7 and Section 8 surfaces either implement the paper contract or fail closed where the derivation is still deferred. References: diff_diff/wooldridge_results.py:L183-L230, diff_diff/wooldridge_results.py:L389-L579, diff_diff/wooldridge.py:L327-L350, diff_diff/wooldridge.py:L454-L461, diff_diff/wooldridge.py:L591-L727, diff_diff/wooldridge.py:L951-L1024, docs/methodology/REGISTRY.md:L1508-L1514, docs/methodology/REGISTRY.md:L1567-L1577.
Remaining gaps are explicitly tracked and fail closed instead of emitting silent numbers: nonlinear R parity remains surface-only, cohort-share survey composition is rejected, unconditional cohort-share inference is NaN-failed, and cohort_trends remains blocked for survey and never_treated. Under the project policy those are mitigated P3s, not blockers. Reference: TODO.md:L96-L100.
Static review only. I could not execute tests/test_methodology_wooldridge.py in this environment because the Python test tooling/dependencies are not installed.

Methodology

No findings. aggregate(weights="cohort_share") is restricted to the paper-supported simple and event paths, rebuilds Bell-McCaffrey contrast DOF under the active weights, rejects survey-weighted composition, and NaN-fails inference until unconditional share uncertainty is derived. cohort_trends=True is restricted to the documented OLS / not_yet_treated / non-survey surface and enforces the per-cohort pre-period identification guard. References: diff_diff/wooldridge_results.py:L183-L230, diff_diff/wooldridge_results.py:L260-L327, diff_diff/wooldridge_results.py:L389-L579, diff_diff/wooldridge.py:L327-L350, diff_diff/wooldridge.py:L454-L461, diff_diff/wooldridge.py:L591-L727, diff_diff/wooldridge.py:L951-L1024, docs/methodology/REGISTRY.md:L1508-L1514, docs/methodology/REGISTRY.md:L1567-L1577.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3. Impact: the remaining deferred work is appropriately fail-closed and tracked, so it does not create a silent correctness bug in this PR. Concrete fix: keep the follow-up items in TODO.md until the survey-weighted cohort-share totals, unconditional cohort-share inference, nonlinear response-scale parity bridge, and cohort_trends cross-products are actually derived and parity-validated. Reference: TODO.md:L96-L100.

Security

No findings.

Documentation/Tests

No findings. The documentation inconsistencies from the prior review are fixed in the changed paper-review/tracker/changelog surfaces. References: docs/methodology/papers/wooldridge-2025-review.md:L540-L547, docs/methodology/papers/wooldridge-2025-review.md:L607-L618, docs/methodology/papers/wooldridge-2025-review.md:L714-L725, METHODOLOGY_REVIEW.md:L603-L606, CHANGELOG.md:L12-L16.
Static review only; I could not run the Wooldridge methodology tests here because the environment is missing the Python test stack.

igerber and others added 2 commits May 22, 2026 16:05

wooldridge: CI R1 fixes — plot_event_study weights propagation + cohort_trends × group/calendar coverage + TODO row

379e065

igerber force-pushed the feature/wooldridge-2025-pr-b branch from 9939fa1 to 379e065 May 22, 2026 20:08

wooldridge: CI R2 P1 fix — plot_event_study always re-aggregates + cohort_share→cell regression

a884867

wooldridge: CI R3 P1 fix — cohort_share aggregation tests on logit + Poisson paths

426cb32

wooldridge: CI R4 P1 fix — all-treated last-cohort trend drop + REGISTRY note

0c5d766

wooldridge: CI R5 P3 fixes — narrow panel CSV columns + cohort_trend_coefs baseline-normalization docstring

200975d

wooldridge: CI R6 P1 fixes — plot cohort_share suppresses CI bands + Results metadata for cohort_trends/aggregation weights

f81b7b6

wooldridge: CI R7 P1 fix — per-surface aggregation_weights with atomic mutation

fa46df6

wooldridge: CI R8 P2/P3 — surface cohort_trends + per-surface weights in to_dataframe + repr + docstrings

d584060

wooldridge: CI R9 P1 fix — reject cohort_trends=True + control_group='never_treated' (unidentified)

ebb2555

wooldridge: CI R10 P3 fixes — API/docstring note cohort_trends never_treated rejection + remove stale TODO row reference

b9986c5

wooldridge: CI R11 P3 fixes — paper-review aggregation table reflects shipped opt-in surface + test-count genericization

5b9f289

igerber added the ready-for-ci label (Triggers CI test workflows) May 23, 2026

igerber merged commit 0848c32 into main May 23, 2026
33 of 34 checks passed

igerber deleted the feature/wooldridge-2025-pr-b branch May 23, 2026 10:27

igerber merged 12 commits into main from feature/wooldridge-2025-pr-b
