… replay primitives
- squash __forge_v001..v011 into a single normalized v001_initial.sql carrying the final shape of every table, function, and trigger
- gate cluster metrics on the gateway feature; gate job_queue and realtime modules on their own features so api/worker/minimal slim builds compile cleanly
- bump astral-tokio-tar to clear RUSTSEC-2025-0146
- inline example Cargo.toml deps so `forge new` templates build standalone outside the workspace; drop dead workspace=true replacements from demo .forge-template.toml files
- drop deny.toml `version` key (cargo-deny no longer accepts it)
- prepare sqlx with --all-targets so test-only queries are cached
- fix slow realtime drain test that deadlocked on empty channel and update tls cert-path test to match current read_pem_certs error
- retire issues/ tracking notes and .agents/rewrite-progress.md; strip the running commentary from v001_initial.sql, keeping only comments that explain non-obvious intent
- codegen: handle HashMap and Vec end-to-end in Dioxus emitter; surface parser failures so silently-skipped files don't drop handlers from bindings
- pg leader: parse role string strictly, emit NOTIFY forge_leader_released on voluntary release so standbys fail over without waiting for the next check tick
- daemon runner: collect and abort spawned validate/refresh tasks on clean iteration exit to stop them leaking past handler return
- workflow scheduler: document is_leader() as advisory-only (correctness comes from atomic UPDATE-with-status-check)
- cluster registry: drop dead mark_dead_nodes path
- webhook handler: make dispatch transactional with idempotency release on failure; refactor context replay primitives
- signals: widen hash_ua to 64 bits and guard against misrouted default-partition rows
- runtime wiring: build PgNotifyBus before LeaderElection so leader-released wakeups flow through the shared bus
- migrations: collapse system schema to v001, add FK indexes on workflow_events.consumed_by and oauth_codes.client_id
- docs: clarify parser context detection is an allowlist of 8 framework context types
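The webhook bullet ("idempotency release on failure") can be illustrated with a small in-memory sketch. Everything here is hypothetical: `IdempotencyStore`, `Outcome`, and `handle_once` are illustration names, and a `HashSet` stands in for whatever table the real handler uses inside its database transaction. The point is only the shape of the rule: a failed dispatch gives the key back so a retry of the same delivery is not rejected as a duplicate.

```rust
use std::collections::HashSet;

/// In-memory stand-in for an idempotency table (hypothetical, not forge's type).
#[derive(Default)]
pub struct IdempotencyStore {
    claimed: HashSet<String>,
}

impl IdempotencyStore {
    /// Returns true if the key was newly claimed, false if it was already held.
    pub fn claim(&mut self, key: &str) -> bool {
        self.claimed.insert(key.to_owned())
    }

    /// Gives the key back so a later retry can claim it again.
    pub fn release(&mut self, key: &str) {
        self.claimed.remove(key);
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Processed,
    Duplicate,
}

/// Runs `handler` at most once per key; a failing handler releases the key.
pub fn handle_once<F>(
    store: &mut IdempotencyStore,
    key: &str,
    handler: F,
) -> Result<Outcome, String>
where
    F: FnOnce() -> Result<(), String>,
{
    if !store.claim(key) {
        return Ok(Outcome::Duplicate);
    }
    match handler() {
        Ok(()) => Ok(Outcome::Processed),
        Err(e) => {
            // Release on failure so the sender's retry is processed, not dropped.
            store.release(key);
            Err(e)
        }
    }
}
```

In a database-backed version, the claim and the dispatch would presumably share one transaction, so a rollback releases the key implicitly; the explicit `release` above only mimics that effect.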
- workflow: route suspend through ForgeError::WorkflowSuspended, propagate state persistence errors, claim-for-execution UPDATE drops the 'running' filter, dispatch_job/start_workflow go through trait builders.
- gateway: hash JWT cache key with SHA-256 (was raw token), stateless HMAC-signed OAuth CSRF tokens with 5-min TTL, JWKS singleflight + 30s negative cache for unknown kids, rate limiter clamps to -1.0 instead of underflow.
- realtime: ChangeListener snapshots max_seq before bus.subscribe and replays missed events on NOTIFY reconnect via watch::Sender generation counter; leader-lease refresh holds the mutex across pg_locks probe and UPDATE forge_leaders.
- worker: shutdown drain via JoinSet with grace period, release_claim helper for orphaned claims, job-status row mapper surfaces unknown variants instead of panicking.
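As a rough illustration of the last worker item (surfacing unknown job-status values instead of panicking), a mapper might look like the sketch below. `JobStatus`, its variants, `UnknownJobStatus`, and `parse_job_status` are all hypothetical names chosen for the example; the actual enum and column values in forge are not shown in this PR text.

```rust
use std::fmt;

/// Hypothetical job states; the real enum may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

/// Error that keeps the raw column value so callers can log or report it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnknownJobStatus(pub String);

impl fmt::Display for UnknownJobStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown job status: {:?}", self.0)
    }
}

impl std::error::Error for UnknownJobStatus {}

/// Maps a status string read from a row to the enum without panicking.
pub fn parse_job_status(raw: &str) -> Result<JobStatus, UnknownJobStatus> {
    match raw {
        "pending" => Ok(JobStatus::Pending),
        "running" => Ok(JobStatus::Running),
        "completed" => Ok(JobStatus::Completed),
        "failed" => Ok(JobStatus::Failed),
        // A value written by a newer or older binary becomes an error value.
        other => Err(UnknownJobStatus(other.to_owned())),
    }
}
```

Returning a `Result` lets the caller decide whether an unrecognized status is fatal for the request or merely worth a warning, which a panic inside the row mapper would not allow.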
- macros: workflow and mutation visitors track the ctx ident from the first fn argument and only collect step/wait/dispatch keys when the method-call receiver chain bottoms out on that ident, eliminating false positives from any same-named method. Query unscoped-error message now spells out the structural lint, the #[query(unscoped)] opt-out, and points users at Postgres RLS for real isolation.
- testing: TestMutationContext::into_mutation_context(pool) bridges to the production MutationContext wired with the test mocks, so handlers written against &MutationContext can be exercised through assertion macros. MockWorkflowDispatch picks up the WorkflowDispatch trait impl that the bridge needs.
- dioxus signals: now_iso falls back to the portable formatter instead of silently returning ""; the three serde_json::to_value(..).unwrap_or_default sites now early-return with console.warn so dropped analytics aren't silent; rand_u32 non-wasm path uses subsec_nanos XOR counter instead of memory-address hashing; localStorage queue persistence ported from the Svelte client (key forge_signals_queue_v1, restored on init, written on every enqueue/flush, cleared on success/beacon).
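The "subsec_nanos XOR counter" approach mentioned for the non-wasm `rand_u32` path could be sketched roughly as follows. This is an assumption about the shape of the change, not the crate's code: `mix` and the `COUNTER` static are hypothetical names, and the result is only a cheap, non-cryptographic source of varying IDs.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide counter so two calls within the same nanosecond still differ.
static COUNTER: AtomicU32 = AtomicU32::new(0);

/// Combines a clock reading with a counter value; pure, so it is easy to test.
fn mix(nanos: u32, counter: u32) -> u32 {
    nanos ^ counter
}

/// Non-cryptographic pseudo-random u32 from the wall clock and a counter.
pub fn rand_u32() -> u32 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        // A clock set before the epoch falls back to 0; the counter still varies.
        .unwrap_or(0);
    let count = COUNTER.fetch_add(1, Ordering::Relaxed);
    mix(nanos, count)
}
```

Unlike hashing a memory address, this does not depend on allocator or layout behavior, though it remains unsuitable for anything security-sensitive.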
The scheduler's consume-claim-and-resume / claim-and-resume transactions flip the run to 'running' before enqueueing the resume job, so by the time the worker reaches claim_for_execution the row is already there. Excluding 'running' from the IN list left every event- or timer-driven resume failing with InvalidState, which surfaced as the demo PR smoke test (with-svelte/demo, with-dioxus/demo) stalling at 3/6 completed steps once the user clicks Confirm Verification. Dedup is already enforced upstream: the job queue holds resume jobs under FOR UPDATE SKIP LOCKED, and the scheduler's claim UPDATE / event consume ensures only one resume job per wake event. Re-admitting 'running' makes the worker idempotent without adding a real race. The matching .sqlx cache file is renamed (new hash for the restored IN list).
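The status check described in this commit message can be modelled in memory to show why re-admitting 'running' matters. This is a sketch under stated assumptions: `RunStatus`, `InvalidState`, and the exact set of claimable statuses are hypothetical (the message only confirms that 'running' is back in the IN list), and the real check is a SQL UPDATE, not a Rust match.

```rust
/// Hypothetical workflow run states; the real set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Pending,
    Waiting,
    Running,
    Completed,
    Failed,
}

/// Stand-in for the InvalidState error named in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidState(pub RunStatus);

/// Models the worker's claim: succeed only from a claimable status.
/// Assumption: Pending and Waiting are claimable alongside Running.
pub fn claim_for_execution(current: RunStatus) -> Result<RunStatus, InvalidState> {
    match current {
        // The scheduler has already flipped the run to Running before the
        // resume job is enqueued, so Running must be accepted here.
        RunStatus::Pending | RunStatus::Waiting | RunStatus::Running => Ok(RunStatus::Running),
        other => Err(InvalidState(other)),
    }
}
```

Because claiming from `Running` yields `Running` again, repeating the claim is harmless, which is the idempotency property the message relies on; terminal states still fail with `InvalidState`.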
Full rearchitecture of forge internals. 26 commits, 454 files, +42k/−21k LOC. Pre-1.0 — no migration path from intermediate states.
What changed
Runtime foundation
- `crates/forge-runtime/src/pg/`: migration runner, advisory locks, NOTIFY bus, listener reconnect, identifier validation.
Reactivity
- `ChangeListener` -> `InvalidationEngine` (50ms debounce / 200ms max) -> `SubscriptionManager` (DashMap, 64 shards, dedup by hash) -> `Reactor` (bounded concurrency 64, hash compare before push) -> `SessionServer` (SSE fan-out).
- `forge_changes` NOTIFY.
- `PgNotifyBus`: single connection, multi-channel, exponential-backoff reconnect (500ms -> 30s) with full re-LISTEN.
Workflows
- `ParallelBuilder`.
Mutations and dispatch
- `dispatch_job`/`start_workflow` share the mutation's tx; the `forge_notify_job_available` trigger fires inside that tx, so workers only see jobs whose mutation actually committed. At-least-once preserved.
- `dispatch_job`/`start_workflow` outside a `#[mutation(transactional)]` errors at macro expansion.
Security
- `user_id`/`owner_id`/`tenant_id` in SQL. `#[query(unscoped)]` opts out explicitly.
- `is_private_ip` covers IPv4-mapped IPv6, link-local, ULA, broadcast, documentation. `SsrfSafeResolver` filters at DNS resolution to close rebinding. Literal-IP URLs caught pre-DNS.
Clustering and operations
- `pg_try_advisory_lock`, lock held by connection, holder-pid diagnostics).
- `UNIQUE (cron_name, scheduled_time)`, catch-up for missed runs.
Codegen
- `emit.rs` (`ts_type` + `dioxus_type`).
- `BindingSet` IR.
Build and supply chain
- `v001_initial.sql`.
- `gateway`, `jobs`, `cron`, `workflows`, `daemons`, `realtime`, `cluster` gated independently. (See open items below.)
- `astral-tokio-tar` bump clears RUSTSEC-2025-0146.
- `cargo-deny`, audit, supply-chain guardrails tightened.
Docs
- `docs/docs/` (Docusaurus) and `docs/skills/forge-idiomatic-engineer/references/` (`api.md`, `frontend.md`, `patterns.md`, `pitfalls.md`).
Must fix before merge
- `workflows`-only and `cron`-only configurations. `cargo check -p forge-runtime --no-default-features --features workflows` and `--features cron` both fail because `workflow/scheduler.rs:442` and `cron/scheduler.rs:449` unconditionally reference `crate::jobs::JobRecord`, which is `#[cfg(feature = "jobs")]`-gated. Fix: either add `jobs` to those features in `Cargo.toml` or `cfg`-gate the dispatch sites.
Non-blocking notes
- `forge_invalidations` table and `forge_purge_expired_invalidations()` defined but unreferenced from Rust. Drop or wire up.
- `forge_workflow_events.consumed_by` and `forge_oauth_codes.client_id`: both will seq-scan children on parent DELETE CASCADE.
- `forge_signals_events_default` partition silently swallows misrouted rows if `forge_signals_ensure_partition` fails. Retention drop logic explicitly skips it. Add a guardrail.
- `user_id`/`owner_id`/`tenant_id` passes). Doc comment is honest: not a security boundary until RLS lands.
- `hash_ua` truncates SHA-256 to 8 hex chars (32 bits). HMAC carries forgery resistance, so it's fine, but a wider truncation (64+ bits) is cheap insurance against UA-rotation bypass.
- `webhook/handler.rs` still TODO'd to migrate from `WebhookContext` to `MutationContext`; same atomicity story already fixed for mutations.
- `forge_change_log` sequence table exists, but no reader replays it on listener reconnect. If the doc claims "replay missed rows," verify the path.
Not reviewed (size budget)
Detailed audits skipped for `forge-codegen` internals, the cluster leader-election rewrite, webhook replay command, and the full `cargo test --workspace` run. Reviewed via `cargo check` against feature combos and targeted reads of architecture-sensitive files.
Test plan
- `cargo fmt --all --check`
- `SQLX_OFFLINE=true cargo clippy --all-targets --all-features --workspace -- -D warnings`
- `SQLX_OFFLINE=true cargo test --workspace`
- `.sqlx/` cache regenerated against the squashed schema and committed
- `cargo check --no-default-features --features workflows` (currently failing, see above)
- `cargo check --no-default-features --features cron` (currently failing, see above)
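Of the two fixes suggested under "Must fix before merge", the `cfg`-gating option might look roughly like the sketch below. It is illustrative only: the `jobs` module here is a stand-in for `crate::jobs`, and `dispatch_resume` is a hypothetical name for whatever the scheduler dispatch sites actually call. The idea is that every reference to `JobRecord` sits behind the same feature gate as the type itself, with a fallback for builds that omit it.

```rust
// Hypothetical stand-in for crate::jobs, compiled only with the `jobs` feature.
#[cfg(feature = "jobs")]
mod jobs {
    pub struct JobRecord {
        pub name: String,
    }
}

/// Dispatch site when the job queue is compiled in.
#[cfg(feature = "jobs")]
pub fn dispatch_resume(name: &str) -> Result<(), String> {
    let record = jobs::JobRecord { name: name.to_owned() };
    // A real implementation would enqueue the record here.
    let _ = record.name;
    Ok(())
}

/// Fallback for builds without `jobs`: no reference to JobRecord remains,
/// so the crate still compiles, and callers get an explicit error.
#[cfg(not(feature = "jobs"))]
pub fn dispatch_resume(name: &str) -> Result<(), String> {
    Err(format!("cannot dispatch {name}: built without the `jobs` feature"))
}
```

Whether a runtime error is acceptable depends on whether workflow or cron scheduling can do anything useful without the job queue; if it cannot, the other suggested fix (making those features depend on `jobs` in `Cargo.toml`) is likely the simpler choice.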