Skip to content

feat(DDE-365): cert-auth (mTLS) — feat #1–#8#8

Open
nc-markovic wants to merge 11 commits into
fix/DDE-365-security-hardeningfrom
feat/DDE-365-cert-auth-mtls
Open

feat(DDE-365): cert-auth (mTLS) — feat #1–#8#8
nc-markovic wants to merge 11 commits into
fix/DDE-365-security-hardeningfrom
feat/DDE-365-cert-auth-mtls

Conversation

@nc-markovic

Copy link
Copy Markdown
Collaborator

Summary

Cert-auth (mTLS) support — server→AEM authentication via X.509 client cert
handshake. New CertAuthStrategy is selected automatically when --cert

Out of scope (handled at upstream layer): CRL/OCSP revocation, MCP
client→server auth (the /mcp endpoint stays open — only loopback bind
protects it; operators add their own auth in front of any non-loopback
exposure).

Related Issue

Fixes #

Stack note

This branch stacks on fix/DDE-365-security-hardening (PR #7). When #7
merges to main, this PR will auto-retarget to main and the diff will
narrow to the 9 cert-auth commits.

Commits shipped

# Title Commit
feat #1 AuthStrategy interface + Basic/OAuth strategies 3b05cae
feat #2 AEMFetch consumes AuthStrategy (BREAKING) f87796c
feat #3 CertAuthStrategy — singleton Agent, PEM guards, encrypted-key 8093018
feat #4 --cert/--key/--ca + Zod validation + env fallback ee6ffe6
feat #5 Factory selection + conflict warning + sanitize cert mode e8f3193
feat #6 destroyAllCertStrategies on SIGINT/SIGTERM drain 2d3ad1d
feat #7 SIGHUP-driven reload + optional mtime poll f8cc5e2
feat #8 README: auth modes, rotation contract, trust boundary 3504260
chore move phase1-smoke-audit out of public docs 4285531

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

feat #2 (f87796c) removes the AEMBasicAuth, AEMOAuthServerToServer,
and AEMAuth public type exports and changes AEMFetchConfig.auth to
AEMFetchConfig.authStrategy. The commit carries a BREAKING CHANGE:
footer; semantic-release will produce a major NPM version bump on
merge.

Highlights

  • mTLS client cert handshake — identity rides in the TLS handshake; no
    Authorization header sent to AEM. Singleton undici.Agent per
    process (one keep-alive socket pool per cert mode), built once in
    init(), attached as dispatcher to native fetch.
  • PEM defense-in-depthreadAndGuardPem rejects in this order:
    path traversal (..), oversized files (>1 MB), non-PEM content (first
    64 bytes must start with -----BEGIN ). tls.createSecureContext
    validates the keypair at boot before any HTTPS request — keypair
    mismatch surfaces a one-line sanitized error, no OpenSSL trace.
  • Encrypted keys via env-only — PKCS#8 -----BEGIN ENCRYPTED PRIVATE KEY----- requires AEM_KEY_PASSPHRASE. No --passphrase CLI flag —
    CLI args appear in ps aux, env vars do not.
  • Graceful shutdown — feat Code QL fails workflow #6 walks a module-level Set of live
    CertAuthStrategy instances on SIGINT/SIGTERM and calls
    Agent.destroy() on each, bounded by a 5s race timeout. Stderr line:
    [shutdown] destroyed N cert-auth agent pool(s).
  • Rotation without restart — feat fix(DDE-365): security hardening — leak #1–#17 #7 CertAuthStrategy.reload()
    re-reads PEMs, validates the keypair, builds a new Agent, atomically
    swaps it in, and schedules the old Agent for destruction after a 30s
    drain window. Triggered by SIGHUP or optional mtime poll
    (--cert-watch-interval-min N). Bad-PEM reloads keep the old Agent
    active — no broken state.
  • Log fidelity (feat Release merge #5 sanitize fix)redactCliParams learns
    'cert' authMode plus hasCert/hasKey/hasCa/hasPassphrase booleans.
    Passphrase value is never echoed.

Test plan / Evidence

Automated smoke harnesses (under src/test/, gitignored)

Harness Cases Result What it proves
smoke-cert-auth.mjs (feat #3) 15 ALL PASS PEM guards (a)–(f), Agent singleton, 200-request mTLS handshake against local stub, FD-leak proof
smoke-feat-4.mjs 12 ALL PASS Zod schema, CLI flags, env fallback, --passphrase NOT in --help, sanitized 1-line errors
smoke-feat-5.mjs 24 ALL PASS Factory chain selection, conflict warning (cert+OAuth), partial-OAuth warning, sanitize cert mode + hidden passphrase
smoke-feat-6.mjs 16 ALL PASS liveCertStrategies registry, double-destroy idempotent, SIGINT logs destroyed N, Basic mode logs nothing
smoke-feat-7.mjs 19 ALL PASS reload() Agent swap + fingerprint, bad-cert keeps old, partial-failure isolation, SIGHUP triggers fingerprint log, mtime watcher startup
smoke-single-flight.mjs (regression) ALL PASS feat #1/#2 single-flight + leak #14 + leak #9 still intact

Reproduce: npm run build && for f in src/test/smoke-*.mjs; do node "$f"; done

Manual mTLS verification (against local stub, MOCK certs only)

Three-terminal flow per plan/phase2-test-plan.md § feat #5 (e) and
§ feat #7 (a/b/c).

  • Cert handshake reaches stub — stub log lines like
    mTLS request: clientCN=test-client method=GET path=/... for each
    AEMFetch call. Confirms the client cert is in the TLS handshake; no
    Authorization header is sent.
  • Startup log fidelityConnecting to MCP server with CLI params:
    shows authMode: 'cert', hasCert: true, hasKey: true, hasCa: true, hasPassphrase: false. (Pre-feat-Release merge #5 it would have lied with
    authMode: 'basic'.)
  • SIGHUP rotation — fresh cert generated via openssl x509 -req -CA src/test/certs/ca.crt -CAkey src/test/certs/ca.key ..., swapped into
    the active path, kill -HUP \$PID triggers:
    [cert-reload] SIGHUP received — reloading cert-auth strategies
    [cert-reload] strategy reloaded: SHA256(old)=c6ce6b43446385f6 → SHA256(new)=a77ba2fe9c851817
    [cert-reload] reloaded 1 cert-auth strategy(ies)
    
    Confirmed different fingerprints — real content swap, not just mtime
    touch.
  • mtime watcher fires — with --cert-watch-interval-min 1, after a
    touch on the cert file the watcher logs the mtime delta and reload:
    [cert-watch] cert mtime changed (was 2026-06-17T11:53:33.718Z, now
      2026-06-17T11:56:25.137Z) — reloading
    
  • New cert on the wire — post-rotation tools/call (listChildren
    /content) produces a stub log line with the rotated CN:
    GET /content.1.json — clientCN=test-client-rotated
    
    Server never restarted; same PID and port throughout. End-to-end
    rotation proven on the wire.
  • Negative control — server started WITHOUT --cert/--key against
    the same stub returns TLS connection error (stub rejects bare TLS).
    Confirms the positive cases weren't passing through some other path.

Semantic-release major-bump verification

```
$ npx --yes -p conventional-recommended-bump -p conventional-changelog-angular \
conventional-recommended-bump -p angular
major
```

Driven by the `BREAKING CHANGE:` footer in feat #2 (`f87796c`).

What's NOT covered locally (needs real AEM)

  • Real-AEM mTLS handshake against a corporate mTLS gateway (test (g)
    in plan): handshake will work the same way it works against the stub
    (this branch contains no AEM-specific cert path), but cert→JCR user
    mapping is an AEM-admin concern outside this server's scope.
  • CRL/OCSP revocation flow — by design (see Out of scope above).

Checklist

  • My code follows the code style of this project.
  • I have read the CONTRIBUTING document.
  • All new and existing tests passed.

🤖 Generated with Claude Code

Add AuthStrategy interface plus BasicAuthStrategy and OAuthStrategy
classes in aem.auth.ts; AEMFetch now constructs a strategy in its
constructor and delegates getAuthToken/isOAuth/refreshAuthToken to it.
No behaviour change - AEMAuth union and public method signatures
preserved (removed in feat #2). Single-flight dedup, expires_in > 60
guard, Latin-1 base64, and OAuth-only 401-retry gating are lifted into
the strategy classes without altering observable behaviour.
AEMFetch now consumes AuthStrategy directly. Removed AEMAuth union,
getAuthToken(), isOAuth getter, and the AEMBasicAuth /
AEMOAuthServerToServer / AEMAuth type exports. getFetchInstance() calls
strategy.getHeaders() per request and attaches strategy.getAgent?.() as
the fetch dispatcher (cert-mode hook for feat #3). 401-retry is gated by
strategy refresh capability rather than the removed isOAuth flag.
AEMConnector constructs the strategy via createAuthStrategy().

BREAKING CHANGE: AEMBasicAuth, AEMOAuthServerToServer, and AEMAuth
types are removed from the public API. AEMFetchConfig now requires an
authStrategy: AuthStrategy field instead of auth: AEMAuth. Consumers
must construct the strategy via createAuthStrategy() from aem.auth.js
(or the strategy classes directly).
Add CertAuthStrategy with singleton undici.Agent for mTLS to AEM. PEM
guards (path-traversal / 1 MB size / BEGIN prefix) reject malformed
inputs before tls.createSecureContext can throw an OpenSSL trace.
Encrypted PKCS#8 keys require AEM_KEY_PASSPHRASE env var (never a CLI
flag - keeps secrets out of ps aux). World-readable key files emit a
warning. AuthStrategy interface gains init()/destroy() hooks; AEMFetch
now calls strategy.init?.() instead of refresh?.(). undici pinned to
7.27.2.
Add --cert/-C, --key/-k, --ca yargs options with env-var fallback
(AEM_CERT_PATH / AEM_KEY_PATH / AEM_CA_PATH). passphrase is sourced
ONLY from AEM_KEY_PASSPHRASE - no CLI flag, no ps aux exposure.
CertParamsSchema (Zod) catches partial config (cert without key) and
empty strings (--cert "") before they reach CertAuthStrategy.init().
safeParse with one-line sanitized stderr; never echoes received
values. zod pinned exact 3.25.76 (no caret).
Extend AuthFactoryInput with certPath/keyPath/caPath/passphrase.
createAuthStrategy now selects cert+key > OAuth > Basic. When both
cert-auth and OAuth credentials are supplied, log a warning that
cert-auth takes priority and OAuth params are ignored (cert vs Basic
is silent because Basic defaults to admin/admin and we can't
distinguish explicit from default). AEMConnector.loadConfig forwards
all credential candidates to the factory. End-to-end mTLS to AEM is
now reachable via --cert/--key from the CLI.

redactCliParams in sanitize.ts learns the 'cert' authMode plus
hasCert/hasKey/hasCa/hasPassphrase booleans so startup logs no
longer mislead operators by displaying authMode: 'basic' when cert
mode is active. Passphrase value is never echoed.
Extend the leak #17 drain sequence with a cert-mode hook. After
server.close drains in-flight requests, call destroyAllCertStrategies()
to release every cached undici.Agent keep-alive pool before
process.exit. The call is bounded by a 5s race timeout so a stuck
Agent can't hang shutdown past the drain deadline.

A module-level Set in aem.auth.ts tracks live CertAuthStrategy
instances (init() registers after the Agent is built, destroy()
unregisters). This handles the per-session reality where a process
holds N strategies (one /health connector + one per MCP session)
rather than the single-instance world the plan envisioned.

Drain logs "destroyed N cert-auth agent pool(s)" only when N > 0, so
Basic/OAuth runs stay quiet. The fatal-error path (uncaughtException,
unhandledRejection) is unchanged - sync-only as the original leak #17
specified; OS reclaims sockets on abnormal exit.
Add CertAuthStrategy.reload() that re-reads PEM material, validates
the keypair, builds a fresh undici.Agent, atomically swaps it in,
and schedules the old Agent for destruction after a 30s drain
window. Failed reloads (bad PEM, keypair mismatch on the refreshed
material) keep the old Agent untouched - never half-broken state.

Module-level reloadAllCertStrategies() walks the same
liveCertStrategies registry feat #6 introduced. SIGHUP handler in
app.server.ts invokes it; fingerprint transitions log to stderr
(always visible, no MCP_LOGGER required) so SREs see the rotation
in real time. No-op when no cert-auth strategy is live.

Optional --cert-watch-interval-min N (env AEM_CERT_WATCH_INTERVAL_MIN)
adds an mtime poll: every N minutes, statSync the cert path; on
mtime change, run the same code path SIGHUP triggers. Off by
default (0); setInterval is .unref()'d so it doesn't keep the
process alive on shutdown.
The Phase 1 smoke audit is internal work product (snapshot of
pre-feat #1 testing state), not user-facing documentation. Relocate
to plan/ where the analogous phase2-test-plan.md already lives —
both directories follow the same convention: gitignored, kept on
disk for local reference, not published with the npm package.
Document the three auth modes (Basic / OAuth / mTLS) and how the
factory chains them. Add an explicit trust-boundary callout: mTLS
authenticates the server→AEM leg only - the /mcp endpoint stays
open and operators are responsible for adding their own auth in
front of any non-loopback bind.

Document the cert-rotation contract: SIGHUP-driven reload (operator-
triggered) and optional --cert-watch-interval-min mtime poll (cron-
style). Both paths log fingerprint transitions to stderr so SREs
can correlate rotations.

Document the CRL/OCSP scope decision: revocation is handled at the
upstream layer (Dispatcher, reverse proxy, mTLS gateway), not in
this server. Document encrypted-key handling via AEM_KEY_PASSPHRASE
env (no CLI flag - keeps secrets out of ps aux).

Expand the environment-variables table with all new vars introduced
by feat #4-#7 (AEM_CERT_PATH, AEM_KEY_PATH, AEM_CA_PATH,
AEM_KEY_PASSPHRASE, AEM_CERT_WATCH_INTERVAL_MIN, MCP_BIND,
MCP_ALLOWED_ORIGINS, MCP_SHUTDOWN_DRAIN_SECONDS). Add per-mode
example commands.

Verified semantic-release major-bump trigger via
`conventional-recommended-bump -p angular` -> `major` (driven by
the BREAKING CHANGE: footer in feat #2, commit f87796c).
When --cert-watch-interval-min was set and fs.statSync failed at boot,
the early return exited startServer before uncaughtException and
unhandledRejection handlers were registered. Replace with a guard
variable so the watcher is silently disabled while execution continues
to the fatal-error handler registration.
…ict warning, watcher docs

- CertAuthStrategy: store drain timer on instance; clearTimeout before
  each reload() so rapid successive reloads don't fire the first timer
  on the now-live agent
- CertAuthStrategy.init(): throw on repeated calls to prevent silently
  orphaning the previous socket pool
- createAuthStrategy: cert+OAuth conflict warning moved from LOGGER.warn
  (no-op without MCP_LOGGER) to process.stderr.write — always visible
- app.server.ts: document that only cert mtime is polled; key/CA changes
  require SIGHUP
- smoke-feat-5: capture process.stderr.write instead of console.warn to
  match the new unconditional warning mechanism
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant