feat(DDE-365): cert-auth (mTLS) — feat #1–#8#8
Open
nc-markovic wants to merge 11 commits into
Open
Conversation
Add AuthStrategy interface plus BasicAuthStrategy and OAuthStrategy classes in aem.auth.ts; AEMFetch now constructs a strategy in its constructor and delegates getAuthToken/isOAuth/refreshAuthToken to it. No behaviour change - AEMAuth union and public method signatures preserved (removed in feat #2). Single-flight dedup, expires_in > 60 guard, Latin-1 base64, and OAuth-only 401-retry gating are lifted into the strategy classes without altering observable behaviour.
AEMFetch now consumes AuthStrategy directly. Removed AEMAuth union, getAuthToken(), isOAuth getter, and the AEMBasicAuth / AEMOAuthServerToServer / AEMAuth type exports. getFetchInstance() calls strategy.getHeaders() per request and attaches strategy.getAgent?.() as the fetch dispatcher (cert-mode hook for feat #3). 401-retry is gated by strategy refresh capability rather than the removed isOAuth flag. AEMConnector constructs the strategy via createAuthStrategy(). BREAKING CHANGE: AEMBasicAuth, AEMOAuthServerToServer, and AEMAuth types are removed from the public API. AEMFetchConfig now requires an authStrategy: AuthStrategy field instead of auth: AEMAuth. Consumers must construct the strategy via createAuthStrategy() from aem.auth.js (or the strategy classes directly).
Add CertAuthStrategy with singleton undici.Agent for mTLS to AEM. PEM guards (path-traversal / 1 MB size / BEGIN prefix) reject malformed inputs before tls.createSecureContext can throw an OpenSSL trace. Encrypted PKCS#8 keys require AEM_KEY_PASSPHRASE env var (never a CLI flag - keeps secrets out of ps aux). World-readable key files emit a warning. AuthStrategy interface gains init()/destroy() hooks; AEMFetch now calls strategy.init?.() instead of refresh?.(). undici pinned to 7.27.2.
Add --cert/-C, --key/-k, --ca yargs options with env-var fallback (AEM_CERT_PATH / AEM_KEY_PATH / AEM_CA_PATH). passphrase is sourced ONLY from AEM_KEY_PASSPHRASE - no CLI flag, no ps aux exposure. CertParamsSchema (Zod) catches partial config (cert without key) and empty strings (--cert "") before they reach CertAuthStrategy.init(). safeParse with one-line sanitized stderr; never echoes received values. zod pinned exact 3.25.76 (no caret).
Extend AuthFactoryInput with certPath/keyPath/caPath/passphrase. createAuthStrategy now selects cert+key > OAuth > Basic. When both cert-auth and OAuth credentials are supplied, log a warning that cert-auth takes priority and OAuth params are ignored (cert vs Basic is silent because Basic defaults to admin/admin and we can't distinguish explicit from default). AEMConnector.loadConfig forwards all credential candidates to the factory. End-to-end mTLS to AEM is now reachable via --cert/--key from the CLI. redactCliParams in sanitize.ts learns the 'cert' authMode plus hasCert/hasKey/hasCa/hasPassphrase booleans so startup logs no longer mislead operators by displaying authMode: 'basic' when cert mode is active. Passphrase value is never echoed.
Extend the leak #17 drain sequence with a cert-mode hook. After server.close drains in-flight requests, call destroyAllCertStrategies() to release every cached undici.Agent keep-alive pool before process.exit. The call is bounded by a 5s race timeout so a stuck Agent can't hang shutdown past the drain deadline. A module-level Set in aem.auth.ts tracks live CertAuthStrategy instances (init() registers after the Agent is built, destroy() unregisters). This handles the per-session reality where a process holds N strategies (one /health connector + one per MCP session) rather than the single-instance world the plan envisioned. Drain logs "destroyed N cert-auth agent pool(s)" only when N > 0, so Basic/OAuth runs stay quiet. The fatal-error path (uncaughtException, unhandledRejection) is unchanged - sync-only as the original leak #17 specified; OS reclaims sockets on abnormal exit.
Add CertAuthStrategy.reload() that re-reads PEM material, validates the keypair, builds a fresh undici.Agent, atomically swaps it in, and schedules the old Agent for destruction after a 30s drain window. Failed reloads (bad PEM, keypair mismatch on the refreshed material) keep the old Agent untouched - never half-broken state. Module-level reloadAllCertStrategies() walks the same liveCertStrategies registry feat #6 introduced. SIGHUP handler in app.server.ts invokes it; fingerprint transitions log to stderr (always visible, no MCP_LOGGER required) so SREs see the rotation in real time. No-op when no cert-auth strategy is live. Optional --cert-watch-interval-min N (env AEM_CERT_WATCH_INTERVAL_MIN) adds an mtime poll: every N minutes, statSync the cert path; on mtime change, run the same code path SIGHUP triggers. Off by default (0); setInterval is .unref()'d so it doesn't keep the process alive on shutdown.
The Phase 1 smoke audit is internal work product (snapshot of pre-feat #1 testing state), not user-facing documentation. Relocate to plan/ where the analogous phase2-test-plan.md already lives — both directories follow the same convention: gitignored, kept on disk for local reference, not published with the npm package.
Document the three auth modes (Basic / OAuth / mTLS) and how the factory chains them. Add an explicit trust-boundary callout: mTLS authenticates the server→AEM leg only - the /mcp endpoint stays open and operators are responsible for adding their own auth in front of any non-loopback bind. Document the cert-rotation contract: SIGHUP-driven reload (operator- triggered) and optional --cert-watch-interval-min mtime poll (cron- style). Both paths log fingerprint transitions to stderr so SREs can correlate rotations. Document the CRL/OCSP scope decision: revocation is handled at the upstream layer (Dispatcher, reverse proxy, mTLS gateway), not in this server. Document encrypted-key handling via AEM_KEY_PASSPHRASE env (no CLI flag - keeps secrets out of ps aux). Expand the environment-variables table with all new vars introduced by feat #4-#7 (AEM_CERT_PATH, AEM_KEY_PATH, AEM_CA_PATH, AEM_KEY_PASSPHRASE, AEM_CERT_WATCH_INTERVAL_MIN, MCP_BIND, MCP_ALLOWED_ORIGINS, MCP_SHUTDOWN_DRAIN_SECONDS). Add per-mode example commands. Verified semantic-release major-bump trigger via `conventional-recommended-bump -p angular` -> `major` (driven by the BREAKING CHANGE: footer in feat #2, commit f87796c).
When --cert-watch-interval-min was set and fs.statSync failed at boot, the early return exited startServer before uncaughtException and unhandledRejection handlers were registered. Replace with a guard variable so the watcher is silently disabled while execution continues to the fatal-error handler registration.
6 tasks
…ict warning, watcher docs - CertAuthStrategy: store drain timer on instance; clearTimeout before each reload() so rapid successive reloads don't fire the first timer on the now-live agent - CertAuthStrategy.init(): throw on repeated calls to prevent silently orphaning the previous socket pool - createAuthStrategy: cert+OAuth conflict warning moved from LOGGER.warn (no-op without MCP_LOGGER) to process.stderr.write — always visible - app.server.ts: document that only cert mtime is polled; key/CA changes require SIGHUP - smoke-feat-5: capture process.stderr.write instead of console.warn to match the new unconditional warning mechanism
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cert-auth (mTLS) support — server→AEM authentication via X.509 client cert
handshake. New
CertAuthStrategyis selected automatically when--cert--keyare supplied (highest priority over OAuth and Basic). Builds onthe AuthStrategy interface introduced by feat Review and fix/delete non-functional tools. Adjust documentation. #1/MCP is using to much resources #2 and the hardening
series shipped in fix(DDE-365): security hardening — leak #1–#17 #7.
Out of scope (handled at upstream layer): CRL/OCSP revocation, MCP
client→server auth (the
/mcpendpoint stays open — only loopback bindprotects it; operators add their own auth in front of any non-loopback
exposure).
Related Issue
Fixes #
Stack note
This branch stacks on
fix/DDE-365-security-hardening(PR #7). When #7merges to
main, this PR will auto-retarget tomainand the diff willnarrow to the 9 cert-auth commits.
Commits shipped
3b05caef87796c8093018ee6ffe6e8f31932d3ad1df8cc5e235042604285531Types of changes
feat #2(f87796c) removes theAEMBasicAuth,AEMOAuthServerToServer,and
AEMAuthpublic type exports and changesAEMFetchConfig.authtoAEMFetchConfig.authStrategy. The commit carries aBREAKING CHANGE:footer; semantic-release will produce a major NPM version bump on
merge.
Highlights
Authorizationheader sent to AEM. Singletonundici.Agentperprocess (one keep-alive socket pool per cert mode), built once in
init(), attached asdispatcherto nativefetch.readAndGuardPemrejects in this order:path traversal (
..), oversized files (>1 MB), non-PEM content (first64 bytes must start with
-----BEGIN).tls.createSecureContextvalidates the keypair at boot before any HTTPS request — keypair
mismatch surfaces a one-line sanitized error, no OpenSSL trace.
-----BEGIN ENCRYPTED PRIVATE KEY-----requiresAEM_KEY_PASSPHRASE. No--passphraseCLI flag —CLI args appear in
ps aux, env vars do not.Setof liveCertAuthStrategy instances on SIGINT/SIGTERM and calls
Agent.destroy()on each, bounded by a 5s race timeout. Stderr line:[shutdown] destroyed N cert-auth agent pool(s).CertAuthStrategy.reload()re-reads PEMs, validates the keypair, builds a new Agent, atomically
swaps it in, and schedules the old Agent for destruction after a 30s
drain window. Triggered by SIGHUP or optional mtime poll
(
--cert-watch-interval-min N). Bad-PEM reloads keep the old Agentactive — no broken state.
redactCliParamslearns'cert'authMode plushasCert/hasKey/hasCa/hasPassphrasebooleans.Passphrase value is never echoed.
Test plan / Evidence
Automated smoke harnesses (under
src/test/, gitignored)smoke-cert-auth.mjs(feat #3)smoke-feat-4.mjs--passphraseNOT in--help, sanitized 1-line errorssmoke-feat-5.mjssmoke-feat-6.mjsdestroyed N, Basic mode logs nothingsmoke-feat-7.mjssmoke-single-flight.mjs(regression)Reproduce:
npm run build && for f in src/test/smoke-*.mjs; do node "$f"; doneManual mTLS verification (against local stub, MOCK certs only)
Three-terminal flow per
plan/phase2-test-plan.md§ feat #5 (e) and§ feat #7 (a/b/c).
mTLS request: clientCN=test-client method=GET path=/...for eachAEMFetch call. Confirms the client cert is in the TLS handshake; no
Authorizationheader is sent.Connecting to MCP server with CLI params:shows
authMode: 'cert',hasCert: true,hasKey: true,hasCa: true,hasPassphrase: false. (Pre-feat-Release merge #5 it would have lied withauthMode: 'basic'.)openssl x509 -req -CA src/test/certs/ca.crt -CAkey src/test/certs/ca.key ..., swapped intothe active path,
kill -HUP \$PIDtriggers:touch.
--cert-watch-interval-min 1, after atouchon the cert file the watcher logs the mtime delta and reload:tools/call(listChildren/content) produces a stub log line with the rotated CN:
rotation proven on the wire.
--cert/--keyagainstthe same stub returns TLS connection error (stub rejects bare TLS).
Confirms the positive cases weren't passing through some other path.
Semantic-release major-bump verification
```
$ npx --yes -p conventional-recommended-bump -p conventional-changelog-angular \
conventional-recommended-bump -p angular
major
```
Driven by the `BREAKING CHANGE:` footer in feat #2 (`f87796c`).
What's NOT covered locally (needs real AEM)
in plan): handshake will work the same way it works against the stub
(this branch contains no AEM-specific cert path), but cert→JCR user
mapping is an AEM-admin concern outside this server's scope.
Checklist
🤖 Generated with Claude Code