Skip to content

feat(policy): per-tenant password complexity + session timeouts#275

Open
iarunsaragadam wants to merge 1 commit into
mainfrom
feat/issue-263-tenant-password-session-policy
Open

feat(policy): per-tenant password complexity + session timeouts#275
iarunsaragadam wants to merge 1 commit into
mainfrom
feat/issue-263-tenant-password-session-policy

Conversation

@iarunsaragadam

Copy link
Copy Markdown
Contributor

Closes #263

What

Additively extends LoginPolicy so an enterprise tenant can tighten its members' authentication beyond the global defaults — addressing the SOC2/enterprise asks in #263 (per-org password complexity + session idle/absolute timeouts).

New LoginPolicy fields (all default 0/false = "use global behavior")

  • password_min_length, password_require_classes — per-tenant password complexity.
  • session_idle_timeout_seconds, session_absolute_timeout_seconds — per-tenant session timeouts.

Enforcement

  • Passwords: validatePasswordStrengthForEmail resolves the owning tenant's policy (domain → tenant → policy, fail-safe) and validates via a new passwords.StrengthPolicy. Wired into password signup, password reset, and accept-invitation. A tenant can only tighten (min-length below the global floor is clamped up); ungoverned users keep the global rules. A global baseline check still runs before any token lookup so behavior/info-leak is unchanged.
  • Sessions: enforceSessionTimeout runs on the RefreshToken hot path against the refresh row's LastUsedAt (idle) and CreatedAt (absolute). On breach it returns the new ErrSessionExpired and deletes the dead refresh row. Unset timeouts and ungoverned users impose no restriction.

Storage / docs

  • Postgres migration 0263 adds the four columns (additive, defaulted). LoginPolicyStore upsert/scan carry the fields; store smoke test extended to round-trip them and assert overwrite-on-upsert.
  • docs/redesign/schema.md login_policies table updated.
  • The domain→tenant→policy walk is factored into resolveLoginPolicy, reused by method, password, and session enforcement.

Backward compatibility

Purely additive. No proto change. No existing RPC/field touched. LoginPolicy is postgres-only governance state (memory/sqlite have no policy plane), so no conformance change is required. Existing ListAuditEvents is untouched.

Tests

  • pkg/passwords: ValidateStrengthWithPolicy tightening + clamp-to-floor.
  • internal/service: per-tenant min-length at signup, tenant-scoped (other-domain unaffected), idle + absolute timeout on the real RefreshToken path (incl. dead-row deletion), and fail-safe fallback.
  • Postgres login_policy_store smoke round-trips the new columns.

Deferred (noted for #263 follow-up)

  • Tenant-scoped ExportAuditEvents RPC and exposing LoginPolicy via proto/admin RPCs — depend on the RBAC / policy-API issues that must emit audit events.
  • ProfileService.ChangePassword stays on the global baseline until the governance plane is injected into that service (it has no governance dependency today).

🤖 Generated with Claude Code

Extend LoginPolicy with four additive fields so an org can tighten its
members' authentication beyond the global defaults:

- password_min_length / password_require_classes — enforced at password
  signup, reset, and accept-invitation via a per-tenant StrengthPolicy
  resolved from the governance plane (domain → tenant → policy). A tenant
  can only ever raise the floor, never loosen below the global baseline;
  ungoverned users keep the global rules.
- session_idle_timeout_seconds / session_absolute_timeout_seconds —
  enforced on the RefreshToken hot path against the refresh row's
  LastUsedAt / CreatedAt, returning the new ErrSessionExpired and deleting
  the dead session. Unset (0) preserves current behavior.

Add postgres migration 0263 (additive columns, default 0/false), wire the
columns through the LoginPolicyStore upsert/scan, extend the store smoke
test, and update the schema doc. The domain→tenant→policy walk is factored
into resolveLoginPolicy so password and session enforcement reuse the same
fail-safe lookup as method enforcement.

Tests cover the strength policy (pkg/passwords), per-tenant min-length at
signup, idle/absolute timeout on the refresh path, and fail-safe fallback
for ungoverned users.

Deferred: tenant-scoped ExportAuditEvents RPC and exposing LoginPolicy via
proto/admin RPCs (depends on the RBAC/policy-API issues); ProfileService
ChangePassword stays on the global baseline until governance is wired into
that service.

Closes #263

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30a99e354c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/service/auth.go
if timeoutUser == nil {
return nil, "", "", fmt.Errorf("%w: user not found", ErrNotFound)
}
if err := s.enforceSessionTimeout(ctx, timeoutUser.Email, s.nowMs(), record.CreatedAt, record.LastUsedAt); err != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve original session age for absolute timeout

When a tenant sets SessionAbsoluteTimeoutSeconds, this check compares against record.CreatedAt from the current refresh-token row. RefreshToken then rotates by calling issueTokens, which writes each replacement refresh token with CreatedAt: now, so a client that refreshes more often than the absolute window keeps resetting the baseline and never hits the absolute session timeout. Use a stable session start timestamp for the absolute check instead of the rotated token's creation time.

Useful? React with 👍 / 👎.

Comment thread internal/service/auth.go
// ErrSessionExpired is returned when a still-valid refresh token is
// rejected because the owning tenant's LoginPolicy idle or absolute
// session timeout has elapsed.
ErrSessionExpired = errors.New("session expired")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Map session expiry to a client auth error

When a tenant idle or absolute timeout is exceeded, RefreshToken returns the new ErrSessionExpired, but internal/connect/errors.go::toConnectError does not map this sentinel. In that scenario the public RefreshToken RPC falls through to CodeInternal instead of an authentication/expiry response, so clients see a server error and cannot reliably route the user back to sign-in.

Useful? React with 👍 / 👎.

@@ -0,0 +1,8 @@
-- Per-tenant password complexity + session timeout policy fields.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep migration versions contiguous

The repository currently has migrations only through 0016, so adding this as version 0263 advances golang-migrate's recorded schema version to 263 on upgraded databases. A later normal migration numbered 0017 through 0262 would be lower than the recorded version and would not be applied, leaving those deployments missing schema changes unless every future migration intentionally starts at 0264.

Useful? React with 👍 / 👎.

@iarunsaragadam iarunsaragadam left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR review gate: ❌ BLOCKED

Reviewer Effective verdict Surviving blockers
Product REQUEST_CHANGES 1
Security APPROVE 0
Performance APPROVE 0
Maintainability REQUEST_CHANGES 1
Correctness APPROVE 0
Contract & Migration APPROVE 0

Conditional reviewers: Contract & Migration — ran (proto/API/schema/migration touched); Accessibility & UX — skipped (no UI changes).

The gate is decided solely by surviving confirmed blockers. Security and Contract & Migration each raised a REQUEST_CHANGES finding that was refuted on adversarial re-verification, so their effective verdict is APPROVE and they do not block. Two confirmed blockers remain.

Confirmed blockers

Product — Absolute session timeout never fires for an active user (CreatedAt resets on every rotation)

  • Location: internal/service/login_policy_enforce.go (enforceSessionTimeout) + internal/service/auth.go:1058-1109 (issueTokens)
  • Detail: session_absolute_timeout_seconds is documented (login_policy.go and docs/redesign/schema.md) as "invalidate a session older than this regardless of activity." Enforcement compares now - record.CreatedAt against the absolute window, but refresh tokens rotate on every RefreshToken call and issueTokens stamps a fresh CreatedAt=now on each rotated row with no carry-forward of the original session start. For any user refreshing more often than the absolute window, CreatedAt is perpetually recent and the absolute timeout never triggers — the SOC2/enterprise "force re-auth" guarantee is not met. The idle path is fine (LastUsedAt tracks last refresh). Tests mask the gap: TestRefreshToken_TenantTimeout_FreshSessionPasses only asserts a fresh session passes, and TestEnforceSessionTimeout_Absolute calls the helper directly with a hand-set old CreatedAt — neither exercises absolute expiry across a real rotation.
  • Fix: Carry a session-origin timestamp forward across rotations (persist/propagate the original CreatedAt or a session_started_at on the rotated refresh row) and compare the absolute timeout against that origin, not the per-rotation CreatedAt. Add a RefreshToken-path test that rotates at least once and asserts ErrSessionExpired once the absolute window since session start has elapsed. If carrying the origin is out of scope for this PR, drop the session_absolute_timeout_seconds field + docs and note it under Deferred rather than shipping a control that silently does nothing.

Maintainability — password_require_classes / RequireClasses wired end-to-end but never affects validation (inert governance field)

  • Location: pkg/passwords/strength.go:55-62 (RequireClasses) and internal/service/login_policy_enforce.go:18-26 (passwordStrengthPolicyFor)
  • Detail: StrengthPolicy.RequireClasses is added with a long doc comment, persisted as the password_require_classes Postgres column (migration 0263, login_policy_store.go scan+upsert), exposed as LoginPolicy.PasswordRequireClasses, and mapped into StrengthPolicy. But ValidateStrengthWithPolicy never reads policy.RequireClasses — the upper/lower/digit/special checks are unconditional, so password_require_classes=true and =false produce byte-identical validation. This is an inert field/column threaded through the whole stack, which AGENTS.md §2 (no dead code) and the "no half-finished implementations" rule prohibit, yet the PR markets it as a working per-tenant complexity control. TestPasswordStrengthPolicyFor only asserts the field mapping because there is no behavioral difference to assert.
  • Fix: Either (a) drop RequireClasses, PasswordRequireClasses, and the password_require_classes column from this PR until per-class control is actually implemented (an additive migration is easy to add later), or (b) make it load-bearing now — gate the four class checks behind it so the zero policy still enforces them (preserving the global default) but the flag genuinely tightens. Do not ship a column whose value never changes behavior. If kept, the doc comment must describe present behavior (not aspirational future per-class relaxation) and a test must assert an actual behavioral effect.

Refuted (raised but verification cleared them)

  • Security — "Absolute session timeout resets on every refresh": Same underlying observation as the confirmed Product blocker; retained as the Product finding, so the duplicate Security blocker was dropped (not a separate defect).
  • Contract & Migration — "New ErrSessionExpired unmapped → leaks as CodeInternal (HTTP 500)": Re-verification cleared this; ErrSessionExpired does not fall through to CodeInternal on the RefreshToken wire path as claimed.

Non-blocking findings

  • Product (minor): PasswordRequireClasses is surfaced/documented but has no enforcement effect today; schema.md's "Demands all four char classes" overstates current behavior (classes are always enforced globally). Soften the wording or make the flag load-bearing. (Overlaps the Maintainability blocker.)
  • Product (nit): No CHANGELOG/release-note entry for the new tenant-facing governance controls, migration 0263, or the new session-expired outcome on RefreshToken.
  • Performance (minor): RefreshToken adds 3 uncached governance round-trips (domain→tenant→policy) per refresh for governed tenants; all index-backed point reads, but worth folding behind the existing short-TTL cache (project_resolver_cache.go) in a follow-up.
  • Performance (nit): No load/bench coverage for the added refresh-path governance lookups.
  • Maintainability (nit): RequireClasses doc comment explains future intent rather than present behavior (AGENTS.md §5).
  • Contract & Migration (minor): Migration numbering jumps 0016 → 0256 → 0263; harmless under golang-migrate's numeric ordering, but document whether versions are sequential or issue-derived to avoid future collisions.

Recommendation

Block merge until the two confirmed blockers are resolved. Both are "half-finished control" issues that the PR advertises as working: the absolute session timeout silently never fires for active users, and password_require_classes is inert across the entire stack. For each, either make the control load-bearing (carry a session-origin timestamp; gate class checks on the flag) with a test that proves the behavioral difference, or remove the field/column/docs and defer it explicitly. The rest of the PR — tighten-only password min-length, fail-safe policy resolution, the additive defaulted migration, and the idle-timeout path — is sound and well-tested. Address the non-blocking items (changelog, schema.md wording, governance-walk caching) opportunistically.

Generated by the PR review gate (AGENTS.md §11) — 5 always-on reviewers + conditional Contract/Accessibility, with adversarial verification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Per-tenant password + session timeout policy and tenant-scoped audit-log export

1 participant