| name | D.I.T. (Do It Together) — MVP Plan (Internal Alpha) |
|---|---|
| overview | Internal-alpha Discord bot ("D.I.T.") built around a novel chat-AI architecture: a host harness that selects a per-turn "disposition" from a full N-dimensional matrix (reasoning, ~15 context strategies, voice, output shape, search posture, risk). Dispatches parallel research subagents, synthesizes through soft context, supports multi-message bursts and reactor-mediated self-followup turns. Guild-wide awareness across all channels. Backed by layered memory with conversation epochs, an event-and-state-driven reactor (never cron), walled-garden TS skills, and Discord steward capabilities. Discord is the proving ground for an eventual custom platform. |
| todos | |
| isProject | false |

Build an autonomous group-native AI that joins a Discord server as a participant, not a tool. It picks how to respond, when to speak unprompted, and how much to think — on a per-turn basis — by choosing a disposition from a rich behavioral matrix. It manages the server itself as a steward (channels, threads, archives). It remembers the group across all channels using a layered memory model with verbatim moments and time-gap conversation epochs. End users never see config, never touch an API key, never edit a markdown file. Skills live in a walled garden.
Discord is the proving ground for an eventual custom chat platform. The MVP target is internal alpha in our own server — to validate the architecture in real chat and iterate fast. SaaS onboarding, billing, multi-tenancy, and the sleep mechanic are deferred to a post-MVP roadmap.
- No cron. Every timing concern goes through the reactor — an event-and-state-driven, durable, retryable orchestrator. Triggers can be timers, events, state predicates, or composites. Never wall-clock grids.
- No static personality prose at runtime. SOUL.md exists as a constraint surface for what the persona is allowed to feel like, but the disposition router IS the runtime personality. Behavior emerges from choice-of-process, not from prose instructions.
- No workspace files for end users. All configuration lives in the database. Walled garden by default.
- No markdown-as-runtime-skill. Skills are TypeScript modules we author and ship. AgentSkills format may inspire the catalog shape, but skills are not dynamically discovered from filesystems.
- No forced citation between subagents and synthesizer. Subagents return material; the synthesizer is the taste that selects what to use. Forcing citation makes outputs feel like book reports.
- No compaction-as-replacement. Originals are never destroyed. Layered memory + conversation epochs replace summary-and-replace.
- No one-pipeline-per-turn. Every turn is routed through the disposition system, which picks coordinates in an N-dimensional behavior space. The matrix IS the architecture.
- No latency optimization at the expense of quality. Thinking takes time and that reads as deliberate to humans. Speculative turns can take minutes.
- No SaaS overhead at MVP. Internal alpha in our own server. Multi-tenancy, billing, onboarding deferred.
- No platform lock-in. Discord is one adapter. Host harness, router, memory, reactor, skills are platform-agnostic.
```mermaid
flowchart TB
  subgraph discord [Discord platform]
    DC[Bot application + guilds]
  end
  subgraph adapter [Discord adapter platform-agnostic boundary]
    GW[discord.js shard pool]
    OUT[Output dispatcher burst + interruption aware]
  end
  subgraph host [Host harness per-channel execution, guild-wide awareness]
    Ingest[Inbound ingest + pending-history buffer]
    Router[Disposition router]
    Fanout[Subagent fan-out]
    Synth[Synthesizer]
  end
  subgraph reactor [Reactor event-and-state-driven orchestrator]
    Bus[Event bus]
    Sched[Durable scheduler]
    Timers[Stateful timers]
    Predicates[State predicates]
    Budget[Budget envelope]
  end
  subgraph workers [Worker subagents]
    Research[Research personas: memory-archeologist, profiler, topic-researcher, comedian, contrarian]
    Tasks[Task workers full agent-core for delegated goals]
  end
  subgraph mem [Layered memory guild-wide]
    Live[Live transcript]
    Doss[User dossiers]
    Hi[Highlight reel verbatim]
    Lore[Group lore]
    Arc[Transcript archive]
    Ep[Conversation epochs]
  end
  subgraph llmgw [DIT model gateway purpose-routed]
    Tlow[Low reasoning tier]
    Tmid[Mid reasoning tier]
    Thigh[High reasoning tier]
    Emb[Embeddings]
    Policy[Quotas + policy]
  end
  subgraph services [Platform]
    PG[(Postgres + pgvector)]
    Redis[(Redis)]
  end
  DC <--> GW
  GW --> Ingest
  Ingest --> Bus
  Ingest --> Router
  Bus --> Sched
  Timers --> Bus
  Predicates --> Bus
  Bus -->|speculative trigger| Router
  Budget -->|envelope| Router
  Router --> Fanout
  Fanout --> Research
  Research --> mem
  Fanout --> Synth
  Synth --> OUT
  OUT --> DC
  OUT -->|human interrupt| Router
  Router & Fanout & Synth & Research --> llmgw
  llmgw --> Policy --> PG
  mem --> PG
  Sched --> Redis
  Bus --> Tasks
  Tasks --> mem
  Tasks --> llmgw
  Synth -->|moment extraction| mem
  Sched -->|epoch close| Ep
```

A "host" is the thing that decides what to do for a given turn. It is not a generic agent loop. Two things to clarify up front because they're orthogonal and easy to conflate:
- Execution serialization is per-channel. There's one in-flight bot reply per channel at any moment, so the bot doesn't produce overlapping output in the same place. Replies in different channels run in parallel. This is purely a concurrency primitive (BullMQ queue keyed on `guild:channel`).
- Awareness / context is guild-wide, always. The host has full view across every channel in the guild. A turn fired in `#general` sees what was just said in `#random` 30 seconds ago. Channel ID is metadata on every message and memory item, never a partition. This is a core distinguishing property — most chat bots (OpenClaw included) silo per-channel, which is why their "did you see what we were just talking about?" experience is broken.
Host responsibilities:
- Buffer inbound messages, maintain a guild-wide pending-history window (cross-channel, with channel tags).
- Receive routing inputs from two sources: (1) inbound messages, (2) reactor-initiated speculative or self-followup triggers.
- Call the disposition router to pick a disposition for the current turn.
- Dispatch subagent fan-out and synthesizer per the chosen disposition.
- Submit output to the dispatcher (one or many message beats).
- Handle interruptions during multi-message bursts and during pending self-followup turns.
The host does NOT do multi-step task execution. When something is goal-shaped ("research X and produce a writeup", "design a poll structure"), the host delegates to a task worker — a full agent-core harness with its own loop, skills, and budget — via the reactor. The host then either awaits the task result or proceeds without it and folds the result in later.
This two-tier split is the core architectural insight: chat turns are not coding tasks. Group chat is dispositional; coding is goal-directed. They need different loops.
A single primitive: "do work W when condition C becomes true, with retry, observability, and dead-letter semantics."
C can be:
- A timer (`after 3h of channel silence`)
- An event (`messageCreate`, `reactionAdd`, `member joined`, `bot reply was quoted`)
- A state predicate (`there are >=N pending dossier updates for guild`, `budget envelope under 30% with >2h until refresh`)
- A composite (`it's evening AND it's been quiet AND the channel-activity rolling avg suggests ambient mode is welcome`)
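The four trigger kinds map naturally onto a discriminated union. The sketch below is illustrative only: `ReactorState`, `Trigger`, `isSatisfied`, and every field name are assumptions made for the example, not the reactor's actual DSL, and the state snapshot is an in-memory stand-in for whatever the reactor would read from Redis and Postgres.

```typescript
// Hypothetical shapes; the real reactor DSL is not specified in this plan.
type ReactorState = {
  nowMs: number;
  lastMessageAtMs: number;       // last message seen in the channel
  pendingDossierUpdates: number;
  recentEvents: string[];        // event kinds observed since the last evaluation
};

type Trigger =
  | { kind: 'timer'; silenceMs: number }
  | { kind: 'event'; eventKind: string }
  | { kind: 'predicate'; test: (s: ReactorState) => boolean }
  | { kind: 'composite'; all: Trigger[] };

// Returns true when condition C holds for the given state snapshot.
function isSatisfied(t: Trigger, s: ReactorState): boolean {
  switch (t.kind) {
    case 'timer':
      return s.nowMs - s.lastMessageAtMs >= t.silenceMs;
    case 'event':
      return s.recentEvents.includes(t.eventKind);
    case 'predicate':
      return t.test(s);
    case 'composite':
      return t.all.every((child) => isSatisfied(child, s));
  }
}

const HOUR = 60 * 60 * 1000;

// Composite: the channel has been silent for 3h AND at least 5 dossier updates are pending.
const quietAndBacklogged: Trigger = {
  kind: 'composite',
  all: [
    { kind: 'timer', silenceMs: 3 * HOUR },
    { kind: 'predicate', test: (s) => s.pendingDossierUpdates >= 5 },
  ],
};
```

Because a composite is just a list of child triggers, new conditions compose without adding scheduler code, which is the property that lets the reactor replace cron expressions.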
Concrete things that go through the reactor instead of cron:
- Conversation epoch close — per-channel silence timer (configurable 2-4h); on fire, kick off the epoch-summarize job.
- Dossier consolidation — accumulator predicate: when N new highlights/messages tagged to a user, schedule a consolidation pass.
- Group lore detection — repeated-phrase detector emits events; reactor coalesces and batches.
- Speculative routing — composite trigger evaluating activity level, budget remaining, time since last bot turn, randomized seed; on fire, dispatch an `unprompted=true` turn to the router.
- Self-followup turns — see dedicated subsection below; this is how the "stream-of-consciousness" feel actually gets produced.
- Budget refresh — period-boundary event.
- Steward maintenance — periodic state-predicate evaluation (e.g., "archive channel has 200+ items, propose threading").
- Task worker dispatch — host delegates a goal, reactor schedules and tracks the worker run.
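The epoch-close item is a resettable silence timer rather than a wall-clock schedule. A minimal in-memory sketch of that idea, assuming a hypothetical `SilenceTracker` class and a clock value passed in by the caller; the real reactor would hold this state durably and fire through BullMQ jobs rather than polling.

```typescript
// Hypothetical in-memory model of the per-channel silence timer.
class SilenceTracker {
  private lastSeen = new Map<string, number>(); // "guild:channel" -> last message time (ms)
  private gapMs: number;

  constructor(gapMs: number) {
    this.gapMs = gapMs;
  }

  // Every inbound message pushes that channel's deadline forward.
  recordMessage(guildId: string, channelId: string, atMs: number): void {
    this.lastSeen.set(`${guildId}:${channelId}`, atMs);
  }

  // Returns channels whose silence gap has elapsed and forgets them,
  // so each quiet period closes exactly one epoch.
  dueForEpochClose(nowMs: number): string[] {
    const due: string[] = [];
    this.lastSeen.forEach((seenAt, key) => {
      if (nowMs - seenAt >= this.gapMs) due.push(key);
    });
    for (const key of due) this.lastSeen.delete(key);
    return due;
  }
}
```

The deadline is always relative to the last message, so a busy channel never closes an epoch mid-conversation, which a fixed wall-clock grid cannot guarantee.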
The output dispatcher never fakes pauses. Instead, when a turn completes, the reactor evaluates the chosen disposition's selfFollowupHint:
```typescript
type SelfFollowupHint = {
  probability: number; // 0-1 base chance of scheduling a follow-up
  bias: 'go-deeper' | 'self-correct' | 'contrarian-self' | 'add-nuance' | 'callback-later';
  windowMs: [min: number, max: number]; // when to fire, sampled uniformly
};
```

If the dice roll lands in favor, the reactor schedules a follow-up evaluation for some natural-feeling later moment (e.g., 30s-2min for go-deeper, 5-20min for callback-later). At evaluation time the reactor checks:
- Has the conversation moved on past the original topic? → likely abort (or downgrade to `callback-later` for much later)
- Did a human reply to the bot's earlier message? → abort the follow-up (that's now a normal reactive turn)
- Is the channel currently active with someone else's in-progress thought? → defer briefly or abort
Otherwise the reactor fires a new turn into the router with is_self_followup=true plus the previous turn's full context. The router picks a disposition appropriate for the bias: go-deeper favors high-reasoning dispositions with richer context; contrarian-self favors the contrarian voice; self-correct favors a deadpan low-reasoning quick correction.
This is the actual machinery behind the bot saying something quick and then coming back 90 seconds later with a real take, or quietly contradicting itself, or producing a "wait — actually" follow-up. The pattern emerges from the loop, not from baked-in timing.
Interruption awareness is also handled here: at any point during a pending self-followup, an inbound message or reaction can cancel or transform the scheduled follow-up.
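The two halves of this loop (roll and schedule at turn completion, then re-check at evaluation time) can be sketched as two small functions. `ChannelSnapshot`, `scheduleDelayMs`, and `evaluateFollowup` are hypothetical names, and "likely abort" is simplified to a hard abort; the random source is injected so a logged seed could replay the decision.

```typescript
type FollowupBias = 'go-deeper' | 'self-correct' | 'contrarian-self' | 'add-nuance' | 'callback-later';
type SelfFollowupHint = {
  probability: number;
  bias: FollowupBias;
  windowMs: [min: number, max: number];
};

// Hypothetical snapshot of the channel when the scheduled evaluation fires.
type ChannelSnapshot = {
  topicMovedOn: boolean;
  humanRepliedToBot: boolean;
  someoneMidThought: boolean;
};

type FollowupVerdict = 'fire' | 'abort' | 'defer';

// At turn completion: returns a delay in ms, or null when the dice roll fails.
// `rand` must return a number in [0, 1).
function scheduleDelayMs(hint: SelfFollowupHint, rand: () => number): number | null {
  if (rand() >= hint.probability) return null;
  const [min, max] = hint.windowMs;
  return Math.round(min + rand() * (max - min)); // uniform sample inside the window
}

// At evaluation time: decide whether the follow-up still earns its place.
function evaluateFollowup(s: ChannelSnapshot): FollowupVerdict {
  if (s.humanRepliedToBot) return 'abort'; // this is now a normal reactive turn
  if (s.topicMovedOn) return 'abort';      // simplification of "likely abort"
  if (s.someoneMidThought) return 'defer';
  return 'fire';                           // reactor fires a turn with is_self_followup=true
}
```

A `'fire'` verdict only re-enters the router; the router can still choose a silent disposition, so the follow-up is never guaranteed to produce output.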
Implementation: BullMQ over Redis as the queue substrate; a thin DSL layer on top that lets us express triggers declaratively and tracks job lineage. Every reactor decision is logged for debugging. We never write a cron expression in this codebase.
A disposition is a point in an N-dimensional behavior space describing how a turn should be handled. The router scores candidate dispositions against the current context and picks one (with controlled stochasticity).
- Reasoning depth — `low` / `medium` / `high`. Drives synthesizer model tier.
- Context strategy — one of ~15 strategies (see ContextAssembler section). Not depth-only; each is a different lens on memory. Choice of strategy is what produces output variety — same situation routed through different strategies produces meaningfully different responses.
- Voice — `playful` / `sharp` / `sincere` / `contrarian` / `nostalgic` / `deadpan` / `encouraging` / `indifferent` / `present-reaction` / `observational` / `fact-drop`. Drives synthesizer prompt and which research personas fire.
- Output shape — `silent` / `react-only` / `single-line` / `multi-line` / `burst`. Drives dispatcher behavior. `burst` is multiple messages sent in sequence with natural typing-indicator behavior between them (no programmed delays — see Output shapes section).
- Search posture — `none` / `broad-scan` / `person-deep` / `callback-hunt` / `contradiction-hunt` / `setup-payoff`. Drives which research personas fire and with what prompts.
- Risk appetite — `safe` / `spicy`. Biases voice + synthesizer prompt (enables roasts, hot takes).
- Self-followup hint — optional. Whether and how the reactor should consider scheduling a self-followup after this turn completes. Drives the bot's ability to come back later with a deeper, contrarian, or corrective take.
A Disposition type sketch:
```typescript
type Disposition = {
  id: string;
  name: string;
  reasoningDepth: 'low' | 'medium' | 'high';
  contextStrategy: ContextStrategyId; // see ContextAssembler for the full list
  voice: VoiceTag;
  outputShape: 'silent' | 'react-only' | 'single-line' | 'multi-line' | 'burst';
  searchPosture: SearchPostureTag;
  riskAppetite: 'safe' | 'spicy';
  speculativeOnly?: boolean;
  selfFollowupHint?: SelfFollowupHint; // see Reactor section
};
```

Stored in DB, configurable, scored at routing time. Starter set of ~10. Format: reasoning / contextStrategy / voice / outputShape / searchPosture / risk (+ optional flags):
- `silent-listen` — silent output, no model call. Conscious decision not to speak.
- `vibing` — low / `narrow-live` / present-reaction / react-only / none / safe; followup 5% go-deeper
- `quick-react` — low / `live` / playful / single-line / none / safe; followup 10% go-deeper
- `roast-back` — medium / `archive-person-deep` / sharp / single-line / person-deep / spicy
- `factual-recall` — low / `live+lore` / deadpan / single-line / none / safe
- `callback-from-memory` — high / `callback-primed` / nostalgic / single-line or multi-line / callback-hunt / safe
- `settle-debate` — high / `contrasting` / sincere / burst / contradiction-hunt / safe; followup 30% add-nuance
- `contrarian-take` — high / `cross-channel-similar` / contrarian / multi-line / contradiction-hunt / spicy
- `late-night-musing` — high / `serendipity` / observational / burst / broad-scan / safe; speculative-only; followup 40% add-nuance
- `observation-drop` — medium / `bot-history` / observational / single-line / broad-scan / safe; speculative-only
Note the variety in context strategies across these — that variety is the point. The router can also pick novel coordinate combinations off-axis (e.g., apply serendipity strategy to a quick-react for surprise variety).
- Inbound message (or null for speculative)
- Channel state — activity level, recent vibe, present users, current topic cluster
- Memory state — pending high-value material (e.g., a primed callback waiting)
- Budget envelope — remaining USD in period, recent spend pattern
- Orchestrator hints — `unprompted` flag, optional suggested-disposition bias
- Stochasticity seed (logged for replay/A-B)
Heuristic-first scoring (cheap, deterministic). For each candidate disposition, compute a score from explicit rules:
- Hard yes/no gates (e.g., explicit mention → not `silent-listen`; sleep state → `silent-listen`)
- Heuristic signals (unanswered question detector → boosts factual dispositions; recent callback opportunity → boosts `callback-from-memory`; lull → boosts speculative observational dispositions)
- Voice biasing from current persona constraints (SOUL.md → which voices are even eligible)
- Budget pressure (high-cost dispositions clamped when budget low)
When heuristics produce a tight top-cluster, optionally call a cheap-tier classifier through the gateway (purpose=router-classify) for a tie-breaker. Otherwise sample directly from the scored distribution with a stochasticity temperature.
Every routing decision is written to disposition_decisions with full inputs, scores, the choice, and the stochasticity seed — this is our primary tuning surface.
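A compact sketch of the gate, score, and sample sequence described above. The `Candidate` and `RoutingInput` shapes, the specific boosts, and the 30% budget threshold are placeholder assumptions; only the overall flow (hard gates, additive heuristics, temperature sampling from a logged seed) follows the plan. The optional classifier tie-breaker is left out.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
type Candidate = { id: string; cost: 'low' | 'medium' | 'high'; speculativeOnly?: boolean };
type RoutingInput = {
  explicitMention: boolean;
  unprompted: boolean;
  unansweredQuestion: boolean;
  budgetRemainingFraction: number; // 0-1
};

// Small seeded PRNG (mulberry32) so a logged seed can replay a decision.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Returns null when a hard gate excludes the candidate.
function score(c: Candidate, input: RoutingInput): number | null {
  if (input.explicitMention && c.id === 'silent-listen') return null; // mention forbids silence
  if (c.speculativeOnly && !input.unprompted) return null;            // speculative-only gate
  let s = 1;
  if (input.unansweredQuestion && c.id === 'factual-recall') s += 2;  // heuristic boost
  if (c.cost === 'high' && input.budgetRemainingFraction < 0.3) s -= 0.75; // budget pressure
  return s;
}

// Softmax-style sampling; a lower temperature moves the choice toward the top score.
function pick(cands: Candidate[], input: RoutingInput, temperature: number, seed: number) {
  const scored: { id: string; score: number; weight: number }[] = [];
  for (const c of cands) {
    const s = score(c, input);
    if (s !== null) scored.push({ id: c.id, score: s, weight: Math.exp(s / temperature) });
  }
  if (scored.length === 0) throw new Error('no eligible disposition');
  const total = scored.reduce((sum, x) => sum + x.weight, 0);
  let r = mulberry32(seed)() * total;
  let chosen = '';
  for (const x of scored) {
    chosen = x.id; // falls through to the last entry on rounding error
    r -= x.weight;
    if (r < 0) break;
  }
  return { chosen, scored, seed }; // roughly the payload for a disposition_decisions row
}
```

Returning the scores and seed alongside the choice is what makes the decision log usable as a tuning surface: the same inputs and seed reproduce the same pick.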
For dispositions that warrant research (anything above low/live), the host fans out parallel research subagent calls before the synthesizer runs.
Each is a small, focused prompt + tool set + cheap-to-mid model call. Returns a structured payload (~200-500 tokens of findings).
- memory-archeologist — scans epochs and highlights for callbacks, parallels, "you said this before" material
- participant-profiler — pulls dossiers for active speakers; surfaces preferences, sensitivities, recent vibes
- topic-researcher — vector-searches the transcript archive for substantive history on the current topic
- comedian — brainstorms 2-3 comedic angles (fires only when voice is humorous)
- contrarian — argues against the obvious response (fires only when voice is sharp/contrarian)
Which personas fire is a function of disposition voice + search posture. A roast-back fires comedian + participant-profiler. A callback-from-memory fires memory-archeologist + participant-profiler. A vibing fires nothing. The fan-out plan is part of the disposition definition (or computed deterministically from it).
A single model call per turn (high-reasoning tier for high dispositions, mid for medium, low for low). Inputs:
- System prompt assembled from base + SOUL.md (as constraint, not instructions) + voice-specific framing
- Context block assembled per the disposition's context strategy
- The inbound message (if reactive)
- A delimited `## research material` markdown block containing all subagent payloads as suggestions, not requirements. Instruction is "this is raw material; use what's useful, mostly let it inform without naming sources, occasionally a direct callback or quote will land — pick your moment, don't force it." Explicit references are allowed and even encouraged when one genuinely lands; they just shouldn't be the default.
- Output-shape instructions
The synthesizer IS the bot's taste. Subagents are the writers' room; the synthesizer is the performer. No forced citation. Models that get raw material and explicit license to ignore it produce dramatically better output than models forced to incorporate it.
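How those inputs might be stitched into one call is sketched below. `SynthInput`, `SubagentPayload`, `buildSynthPrompt`, and the abbreviated license sentence are assumptions for illustration; the real prompt wording and message layout are not specified here.

```typescript
// Hypothetical payload shape for one research subagent result.
type SubagentPayload = { persona: string; findings: string };

type SynthInput = {
  basePrompt: string;
  soulConstraints: string[]; // from SOUL.md, phrased as constraints rather than instructions
  voiceFraming: string;
  contextBlock: string;      // output of the chosen context strategy
  inbound: string | null;    // null for speculative turns
  research: SubagentPayload[];
  outputShape: string;
};

// Abbreviated stand-in for the "use what's useful" instruction.
const RESEARCH_LICENSE = "This is raw material; use what's useful, and don't force it.";

function buildSynthPrompt(input: SynthInput): { system: string; user: string } {
  const system = [
    input.basePrompt,
    'Constraints:\n' + input.soulConstraints.map((c) => `- ${c}`).join('\n'),
    input.voiceFraming,
  ].join('\n\n');

  const sections = [input.contextBlock];
  if (input.inbound !== null) sections.push(`Inbound message:\n${input.inbound}`);
  if (input.research.length > 0) {
    // Payloads are offered as suggestions; nothing downstream checks that they were used.
    const body = input.research.map((p) => `[${p.persona}] ${p.findings}`).join('\n');
    sections.push(`## research material\n${RESEARCH_LICENSE}\n${body}`);
  }
  sections.push(`Output shape: ${input.outputShape}`);
  return { system, user: sections.join('\n\n') };
}
```

Dispositions with no fan-out simply produce a prompt without the research block, so the same builder serves `vibing` and `settle-debate` alike.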
The output dispatcher is the bridge between the synthesizer and the Discord adapter. It interprets the disposition's outputShape and handles dispatch.
- `silent` — no output. Decision was to listen. Logged but no Discord call.
- `react-only` — bot adds an emoji reaction to a recent message. Single REST call.
- `single-line` — one message, no thread, no formatting flourishes.
- `multi-line` — one message, possibly with paragraph breaks.
- `burst` — multiple messages sent in sequence as one turn. See below.
When the synthesizer judges a thought is better in multiple messages than one block ("hmm" → "ok so the thing about that is..." → "actually wait" → "..."), it produces an array of strings. The dispatcher sends them in sequence, interleaved with the platform's natural typing-indicator behavior — no programmed pauses, no fake delays. Each message goes as quickly as Discord will accept it; the visual rhythm is whatever emerges from "type-and-send" pacing. This is the part where we deliberately avoid theater. Real humans don't pad their typing with setTimeout(2000) to seem thoughtful; they just type and send.
The synthesizer output schema for burst:
```typescript
type BurstOutput = {
  messages: Array<{
    text: string;
    isCorrection?: boolean; // marks self-correction beats for telemetry
  }>;
};
```

Interruption awareness during a burst. If a human message arrives mid-burst:
- Dispatcher pauses the remaining beats.
- The remaining beats + the interrupting message go back to the router as a follow-up event.
- Router decides: (a) abort the rest entirely (most common — the floor moved); (b) fold the interruption into a continuation ("oh — yeah, exactly what jen just said"); (c) continue as planned (rare).
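The pause-and-hand-back step can be sketched as a loop that checks for an interrupting message before each beat after the first. This is a synchronous simplification: `send` and `pollInterrupt` are hypothetical hooks standing in for the adapter call and the ingest buffer, and the router's abort/fold/continue decision is left to the caller.

```typescript
type Beat = { text: string; isCorrection?: boolean };
type BurstResult = { sent: Beat[]; remaining: Beat[]; interruptedBy: string | null };

function dispatchBurst(
  beats: Beat[],
  send: (beat: Beat) => void,
  pollInterrupt: () => string | null,
): BurstResult {
  const sent: Beat[] = [];
  let index = 0;
  for (const beat of beats) {
    // No artificial delay is inserted; the only gate is "did a human speak?".
    const interrupt = index > 0 ? pollInterrupt() : null;
    if (interrupt !== null) {
      // Remaining beats plus the interrupting message go back to the router.
      return { sent, remaining: beats.slice(index), interruptedBy: interrupt };
    }
    send(beat);
    sent.push(beat);
    index++;
  }
  return { sent, remaining: [], interruptedBy: null };
}
```

For example, with three beats and a human message arriving after the second, the result carries two sent beats, one remaining beat, and the interrupting text for the router to weigh.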
The "stream of consciousness" feel — the bot saying something quick, then coming back later with a deeper take, possibly contradicting itself — is not an output shape. It's an emergent pattern produced by combining:
- Burst output for the in-turn micro-thought-flow ("hmm, ok, the thing is...")
- Self-followup turns via the reactor for the macro-thought-flow ("I said something glib 2 minutes ago, here's the real take")
- Natural model behavior within those mechanisms
Architecturally cleaner than baking timing into a single output, and far more natural-feeling: every "follow-up" goes through the full router again and earns its place against the current context. If the conversation moved on, the bot doesn't barrel through with a stale thought; it adapts or shuts up.
Post-MVP: emoji-only, gif-only, react-burst, thread-spawn, etc.
The bot is aware across all channels in a guild. Memory is keyed by guildId. Channel ID is provenance, not partition.
- Live transcript (guild-wide) — last 30-50 verbatim messages across the entire guild, ordered by time, each tagged with its channel ID. NOT scoped to the current channel. The current channel is weighted heavier in assembly (more of its messages, more recently) but a hot conversation in `#random` is visible from `#general`. Always in context for any strategy that includes "live" material.
- User dossiers — one curated text blob per user the bot has interacted with in the guild. Stable facts, declared preferences, relationship signals.
- Highlight reel — preserved-verbatim moments (quotes, jokes that landed, hot takes, callbacks, opinions). Each row has embedding for similarity search.
- Group lore — relational patterns about the group as a unit (who fights with whom, recurring jokes, shared vocabulary).
- Transcript archive — full raw transcripts with embeddings, rolling retention. Searched on demand via `memory_search` tool.
- Conversation epochs — time-gap-bounded summaries. When a channel goes silent for the configured gap (default 3h), the reactor fires an epoch-close event. A summarization job (mid-tier, `purpose=epoch-summarize`) produces a summary, topic tags, participants list, and embedding for the closed window. Raw transcripts persist; the epoch is an index over them, not a replacement.
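One way to read "guild-wide, current channel weighted heavier" is to reserve a share of the live window for the current channel and fill the rest from the other channels. The sketch below assumes that interpretation; `assembleLive`, the `Msg` shape, and the 60% default share are hypothetical, and an in-memory array stands in for the transcript store.

```typescript
type Msg = { guildId: string; channelId: string; ts: number; text: string };

// Selects up to `limit` recent guild messages, reserving `currentShare` of the
// window for the current channel, and returns them in chronological order.
function assembleLive(
  all: Msg[],
  guildId: string,
  currentChannel: string,
  limit = 30,
  currentShare = 0.6,
): Msg[] {
  const newestFirst = all.filter((m) => m.guildId === guildId).sort((a, b) => b.ts - a.ts);
  const current = newestFirst.filter((m) => m.channelId === currentChannel);
  const others = newestFirst.filter((m) => m.channelId !== currentChannel);

  const reserved = Math.min(current.length, Math.ceil(limit * currentShare));
  const fromOthers = others.slice(0, limit - reserved);
  // Slots the other channels could not fill flow back to the current channel.
  const fromCurrent = current.slice(0, limit - fromOthers.length);

  return [...fromCurrent, ...fromOthers].sort((a, b) => a.ts - b.ts);
}
```

The filter is on `guildId` only; `channelId` affects weighting and is kept on every returned message as provenance, matching the "never a partition" rule.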
An important property of the highlight reel: highlights are stored verbatim with attribution, not paraphrased. "Andrew, didn't you say last week your boss thinks rust is overhyped?" lands very differently from "Andrew, you previously expressed a negative opinion about Rust." The texture matters.
- Verbatim: direct quotes, jokes, self-disclosure, memorable phrases, callbacks. The highlight reel and live transcript both preserve original wording.
- Paraphrase only for: epoch summaries, dossier blurbs, lore narratives, aggregate sentiment. Compressed by design, not trying to preserve voice.
Not the most important rule in the system, but important enough that the schema enforces it (highlights stores raw_text; epochs store summary_text; they're different fields by intent).
This is one of the most important sources of output variety. Same context → same response. Unique context → unique response. So we don't have 4 depth levels; we have ~15 distinct strategies (call them lenses), each assembling a meaningfully different context for the synthesizer.
```typescript
type ContextStrategyId =
  | 'none'                  // just system prompt; no memory at all
  | 'narrow-live'           // last 5 messages, current channel only (quick reactions)
  | 'live'                  // last 30 across guild, channel-tagged, current channel weighted
  | 'live+dossiers'         // + dossiers for present users
  | 'live+lore'             // + group lore items
  | 'live+lore+dossiers'    // both, no archive
  | 'archive-topic-deep'    // heavy vector-search on current topic; lighter live
  | 'archive-person-deep'   // one specific person's full historical statements on related topics
  | 'epochs-only'           // recent epoch summaries only; high-level vibe context, no raw messages
  | 'time-shifted'          // same channel(s) from a similar time-of-day in past weeks (captures the moment's vibe)
  | 'cross-channel-similar' // recent conversations from OTHER channels on similar topics, mixed in
  | 'contrasting'           // historical material where current speakers DISAGREED on similar topics
  | 'serendipity'           // random archive sample for surprise / pure variety
  | 'unanswered'            // context biased toward unresolved threads from recent memory
  | 'bot-history'           // recent bot turns + how each landed (reactions, replies, callback survival)
  | 'callback-primed'       // when memory has flagged a specific ripe callback, prioritize that material
  | 'full-rich';            // everything: live + dossiers + lore + top archive matches + recent epochs (expensive)
```

Each disposition picks a primary strategy. The router can also stochastically apply an unexpected strategy for variety, e.g., route a quick-react through serendipity once in a while to produce a surprising non-sequitur that lands because it pulls in something nobody expected.
memory_search remains available as a synthesizer tool for any strategy — the bot can always go reach for the archive on demand if its starting context doesn't have what it needs.
Strategies are pluggable: each is a TS module under packages/memory/strategies/ that takes (guildId, channelId, presentUsers, topicSignal, ...) and returns an assembled context block. Adding a new strategy later is a single-file change. Post-MVP we expect this catalog to grow significantly as we learn what produces interesting variety in practice.
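A minimal sketch of what "pluggable" could look like: each strategy is a function from a shared input to a context string, registered under its ID. The object-style `StrategyInput`, the in-memory `messages` array (assumed chronological), and the registry itself are assumptions; the plan only fixes the parameter list and the one-file-per-strategy layout.

```typescript
type Msg = { guildId: string; channelId: string; author: string; text: string };

// Hypothetical input bundle mirroring (guildId, channelId, presentUsers, topicSignal, ...).
type StrategyInput = {
  guildId: string;
  channelId: string;
  presentUsers: string[];
  topicSignal: string;
  messages: Msg[]; // stand-in for the memory layer
};

type ContextStrategy = (input: StrategyInput) => string;

// 'none': system prompt only, so the context block is empty.
const none: ContextStrategy = () => '';

// 'narrow-live': last 5 messages from the current channel only.
const narrowLive: ContextStrategy = ({ guildId, channelId, messages }) =>
  messages
    .filter((m) => m.guildId === guildId && m.channelId === channelId)
    .slice(-5)
    .map((m) => `${m.author}: ${m.text}`)
    .join('\n');

// Adding a lens means adding one entry here (one module in the real layout).
const strategies = new Map<string, ContextStrategy>([
  ['none', none],
  ['narrow-live', narrowLive],
]);

function assembleContext(id: string, input: StrategyInput): string {
  const strategy = strategies.get(id);
  if (!strategy) throw new Error(`unknown context strategy: ${id}`);
  return strategy(input);
}
```

Because the router only handles strategy IDs, routing a `quick-react` through a different lens is a change of ID, not of code path.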
Trigger inverted from compaction: "this just landed — does it deserve preserving?"
Signals (cheapest first): reaction counts, reply-graph engagement, bot's reply later quoted, self-disclosure patterns (regex + LLM verify), repeated phrases across the guild, time-gap after a message (rhetorical weight), sampled per-message cheap-tier scoring (purpose=extract).
Runs synchronously on synthesizer completion (scoped to the just-finished exchange) and async via reactor for batch consolidation (dossiers, lore, dedup).
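The cheapest-first ordering suggests a triage step: free signals decide the clear cases, and only ambiguous ones pay for a cheap-tier model check. The weights, thresholds, and names below (`MomentSignals`, `cheapScore`, `triage`) are placeholders chosen for the example, not tuned values from this plan.

```typescript
// Hypothetical signal bundle for one message.
type MomentSignals = {
  reactionCount: number;
  replyCount: number;
  quotedLater: boolean;
  selfDisclosureRegexHit: boolean;
};

type Verdict = 'keep' | 'drop' | 'needs-llm-check';

// Combines only signals that cost nothing to compute.
function cheapScore(s: MomentSignals): number {
  return (
    Math.min(s.reactionCount, 5) * 0.1 +
    Math.min(s.replyCount, 5) * 0.1 +
    (s.quotedLater ? 0.4 : 0) +
    (s.selfDisclosureRegexHit ? 0.3 : 0)
  );
}

// Clear winners are kept, clear noise is dropped, and the middle band escalates.
function triage(s: MomentSignals, keepAt = 0.6, dropBelow = 0.2): Verdict {
  const score = cheapScore(s);
  if (score >= keepAt) return 'keep';
  if (score < dropBelow) return 'drop';
  return 'needs-llm-check';
}
```

A lone regex hit lands in the middle band under these placeholder weights, which mirrors the "regex + LLM verify" pairing in the signal list.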
The bot doesn't just live in the server — it shapes it. A TS-implemented skill bundle the host or task workers can invoke:
- `create-channel` — propose + execute new channel creation when topic clusters warrant it
- `create-thread` — spawn threads for sub-conversations
- `archive-channel` — create read-only channels where the bot posts curated artifacts (decisions, plans, group canon)
- `pin-moment` — pin highlight-worthy messages
- `manage-roles` — assign/create roles based on observed participation patterns
- `propose-restructure` — suggest server reorganization to admins (never executes without confirmation)
Permissions: the bot requests full server-management scopes on install; destructive operations route through a confirmation flow that pings an admin in a dedicated #dit-mod channel auto-created on install.
The steward skills bridge the chat-AI architecture into actual environment-shaping. They're invoked by the host (for low-cost actions like thread-spawn during multi-line output) or dispatched as goals to task workers (for higher-cost or multi-step changes).
Skills are TypeScript modules in our repo. Each exports:
```typescript
type Skill = {
  id: string;
  description: string;
  catalogEntry: string;
  permissions: SkillPermission[];
  tools: ToolDescriptor[];
  invoke: (ctx: SkillContext) => Promise<SkillResult>;
};
```

A system-prompt catalog (JSON-formatted, conceptually similar to AgentSkills' XML catalog but expressed in JSON for cleaner authoring and prompt fit) advertises available skills to the synthesizer and task workers. Skills are loaded by ID at boot, not discovered from filesystems at runtime.
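Load-by-ID and the JSON catalog can be sketched with a static list and two helpers. `SkillEntry` is a trimmed version of `Skill` (it omits `tools` and `invoke`), and the permission values, descriptions, and catalog layout are hypothetical.

```typescript
type SkillPermission = 'read-memory' | 'manage-channels'; // hypothetical values
type SkillEntry = {
  id: string;
  description: string;
  catalogEntry: string;
  permissions: SkillPermission[];
};

// Static list compiled into the build: nothing is discovered from the filesystem.
const BUNDLED: SkillEntry[] = [
  { id: 'memory_search', description: 'Search the transcript archive', catalogEntry: 'Look up past messages by topic.', permissions: ['read-memory'] },
  { id: 'create-thread', description: 'Spawn a thread', catalogEntry: 'Start a thread for a sub-conversation.', permissions: ['manage-channels'] },
];

// Resolves configured IDs at boot and fails fast on anything unknown.
function loadSkills(enabledIds: string[]): Map<string, SkillEntry> {
  const byId = new Map<string, SkillEntry>();
  for (const s of BUNDLED) byId.set(s.id, s);
  const loaded = new Map<string, SkillEntry>();
  for (const id of enabledIds) {
    const skill = byId.get(id);
    if (!skill) throw new Error(`unknown skill id: ${id}`);
    loaded.set(id, skill);
  }
  return loaded;
}

// Builds the JSON catalog string that would be embedded in the system prompt.
function buildCatalog(skills: Map<string, SkillEntry>): string {
  const entries: { id: string; use: string }[] = [];
  skills.forEach((s) => entries.push({ id: s.id, use: s.catalogEntry }));
  return JSON.stringify({ skills: entries });
}
```

Failing at boot on an unknown ID keeps the walled garden honest: a typo in configuration cannot silently widen or shrink what the model is told it can do.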
MVP skill bundle (~10):
- Memory tools: `memory_search`, `recall_user`, `recall_group`
- Steward tools: the steward skills listed above
- Utility: `weather`, `define`, `poll`, `dice`, `summarize_window`
- Light gen: `gif` (Tenor via our gateway)
No marketplace, no user-uploaded skills, no MCP. Walled garden.
SOUL.md exists but plays a different role than in OpenClaw. It is NOT a runtime instruction inlined every turn. It is:
- A constraint surface for the router (which voices are eligible, which never)
- A constraint surface for the synthesizer (banned phrases, never-do list, tone bounds)
- The seed for canonical dispositions
The persona is the distribution over dispositions the router actually picks, modulated by context. A funny persona is one where the comedian-firing dispositions are weighted higher in lull/casual contexts. A wise persona is one where settle-debate and callback-from-memory get higher weight. We don't tell the model "be funny" — we configure the router to pick funny-shaped dispositions when funny is appropriate.
Stored as DB rows for the bot's active persona (single persona for MVP). Markdown templates exist in workspace/ only as authoring conveniences for us to seed the DB.
The Discord adapter (apps/adapter-discord/) is the only Discord-aware code. It exposes a generic interface:
```typescript
type PlatformAdapter = {
  onInboundMessage: (handler: (msg: InboundMessage) => void) => void;
  sendMessage: (target: ChannelRef, content: OutputBeat) => Promise<MessageRef>;
  addReaction: (msg: MessageRef, emoji: string) => Promise<void>;
  createChannel: (guild: GuildRef, spec: ChannelSpec) => Promise<ChannelRef>;
  createThread: (parent: ChannelRef, spec: ThreadSpec) => Promise<ChannelRef>;
  // ... etc
};
```

Host, router, memory, reactor, skills consume the generic interface. When we build our own platform later, we write a new adapter implementing this interface and the rest of the system doesn't change.
OpenAI-compatible HTTP service (apps/gateway-llm) in front of Anthropic + OpenAI + one cheap third option.
Required headers: x-dit-guild, x-dit-purpose. Purpose enum:
- `router-classify` — tie-breaker classifier (cheapest tier, single-token-ish)
- `subagent-research` — research subagent calls (cheap-to-mid)
- `synth-low` / `synth-mid` / `synth-high` — synthesizer tiers
- `extract` — moment extraction (cheap)
- `embed` — embeddings
- `epoch-summarize` — epoch summarization (mid)
- `steward` — task-worker invocations for steward operations
Per-purpose model mapping is policy-driven (DB). Per-request flow: authn (HMAC service token, never user-facing) → resolve guild policy → map purpose → tier → quota/budget check (Redis bucket) → upstream stream → emit usage event to Postgres. Failover taxonomy per tier (provider rotation on rate-limit / 5xx).
Budget envelope (budget_envelopes table) tracks spend per guild per period; reactor reads it for speculative routing decisions and clamps high-cost dispositions when low.
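The purpose mapping, budget check, and provider rotation can be sketched as one pure function. Authentication, the Redis quota bucket, streaming, and usage events are omitted; `Policy`, `Envelope`, `resolveModel`, and the `model-a`/`model-b` names are hypothetical placeholders rather than real model identifiers.

```typescript
type Purpose =
  | 'router-classify' | 'subagent-research'
  | 'synth-low' | 'synth-mid' | 'synth-high'
  | 'extract' | 'embed' | 'epoch-summarize' | 'steward';

type Policy = { primary: string; fallbacks: string[] };
type Envelope = { allocatedUsd: number; spentUsd: number };

type Resolution =
  | { ok: true; model: string }
  | { ok: false; reason: 'no-policy' | 'over-budget' | 'all-providers-failed' };

function resolveModel(
  policies: Partial<Record<Purpose, Policy>>,
  purpose: Purpose,
  envelope: Envelope,
  estimatedCostUsd: number,
  unavailable: string[], // models currently rate-limited or returning 5xx
): Resolution {
  const policy = policies[purpose];
  if (!policy) return { ok: false, reason: 'no-policy' };
  if (envelope.spentUsd + estimatedCostUsd > envelope.allocatedUsd) {
    return { ok: false, reason: 'over-budget' };
  }
  // Provider rotation: first model in policy order that is not marked unavailable.
  const model = [policy.primary, ...policy.fallbacks].find((m) => !unavailable.includes(m));
  return model ? { ok: true, model } : { ok: false, reason: 'all-providers-failed' };
}
```

Callers never name a model, only a purpose, so swapping providers or tiers is a policy-row change rather than a code change.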
Core:
- `guilds(id, discord_id, name, config_json, persona_id)`
- `personas(id, name, soul_text, banned_phrases_json, voice_eligibility_json, disposition_weights_json)`
- `users(id, discord_id, global_facts_json)`
- `transcripts(guild_id, channel_id, seq, role, author_id, content, tool_calls_json, usage_json, ts)` — raw, immutable
Memory:
- `dossiers(guild_id, user_id, body_text, signals_json, msg_count, last_seen_at, updated_at)`
- `highlights(id, guild_id, channel_id, user_id?, kind, raw_text, surrounding_context, score, embedding vector(1536), created_at, source_msg_id)`
- `group_lore(id, guild_id, kind, body_text, score, embedding vector(1536), updated_at)`
- `transcript_archive(id, guild_id, channel_id, author_id, content, embedding vector(1536), ts)`
- `epochs(id, guild_id, channel_id, start_ts, end_ts, summary_text, topics_json, participants_json, embedding vector(1536), score, closed_at)`
Disposition system:
- `dispositions(id, name, dimensions_json, fanout_plan_json, self_followup_hint_json?, speculative_only, created_at)` — seeded canonical set, mutable. `dimensions_json` carries `{reasoningDepth, contextStrategy, voice, outputShape, searchPosture, riskAppetite}`.
- `disposition_decisions(id, guild_id, channel_id, ts, inbound_msg_id?, trigger_kind, candidate_scores_json, chosen_disposition_id, stochasticity_seed, dimensions_used_json, context_strategy_used, subagents_fired_json, output_shape, beat_count, scheduled_followup_id?, latency_ms, cost_usd, model_used)` — `trigger_kind` is `reactive` / `speculative` / `self_followup`.
- `subagent_runs(id, decision_id, persona, input_payload_json, output_payload_json, latency_ms, cost_usd, model_used, error?)`
- `self_followup_schedules(id, parent_decision_id, scheduled_for, bias, status, fired_decision_id?, aborted_reason?)` — tracks the lifecycle of pending self-followup turns from the reactor.
Reactor:
- `reactor_jobs(id, kind, scheduled_for?, predicate_json?, payload_json, status, attempts, last_error, fired_at?, completed_at?)`
- `reactor_events(id, kind, payload_json, ts)` — append-only event log
- `budget_envelopes(guild_id, period_start, period_end, allocated_usd, spent_usd, refresh_at)`
Gateway:
- `usage_events(id, guild_id, purpose, model, input_tok, output_tok, cost_usd, latency_ms, ts)`
- `gateway_policies(guild_id?, purpose, model_primary, model_fallbacks_json, quotas_json)`
(SaaS tables — tenants, stripe_*, sleep_policies — deferred to post-MVP.)
```
dit/
  apps/
    bot/                 # host harness orchestration entry point
    adapter-discord/     # discord.js shards, REST, platform-adapter impl
    gateway-llm/         # OpenAI-compatible front of providers
    api/                 # admin/dashboard (minimal in MVP)
  packages/
    agent-core/          # vendored from OpenClaw; compaction disabled
    host/                # host harness loop, two-tier dispatch
    disposition/         # router, scoring, stochasticity, canonical set
    fanout/              # subagent fan-out runner
    synthesizer/         # synthesizer prompt building + dispatch
    subagent-personas/   # the 5 research personas as TS modules
    output-dispatcher/   # output shape interpretation, burst + interrupt handling
    reactor/             # event bus, scheduler, predicates, BullMQ wiring
    memory/              # layered memory, ContextAssembler, MomentExtractor, epochs
    skills/              # walled-garden TS skill registry + bundled skills
    steward/             # Discord server management skills
    platform/            # PlatformAdapter interface + types
    db/                  # Drizzle schema + migrations
    shared/              # types, logger, config
  workspace/             # authoring-time templates for seeding personas (not runtime)
```

- Node 20, TypeScript 5, ESM, pnpm workspaces
- discord.js v14 (in adapter only)
- Fastify (api + gateway-llm)
- Postgres 16 + pgvector, Redis 7
- Drizzle ORM
- BullMQ on Redis (reactor substrate)
- Hosting: Fly.io for bot + adapter (multi-region for Discord gateway latency); Render or Fly for api + gateway-llm; Neon or Supabase for Postgres; Upstash for Redis
- OpenTelemetry → Grafana Cloud
- Stripe — NOT in MVP
- Per-channel execution serialization (concurrency primitive only — memory/context remains guild-wide) — extensions/discord/src/monitor/message-handler.ts, extensions/discord/src/monitor/message-run-queue.ts
- Preflight drops before any LLM call (bot-self, allowlist, mention decision) — src/channels/mention-gating.ts
- Pending-history buffer pattern, generalized to guild-wide in our version — src/auto-reply/reply/history.ts
- discord.js's built-in REST bucket scheduler (we use the library version, not OpenClaw's port)
- Skill catalog concept in system prompt (we use JSON, OpenClaw uses XML) — src/agents/skills/skill-contract.ts
- Failover taxonomy + provider rotation for gateway — src/agents/model-fallback.ts
- `packages/agent-core` agent loop + AgentSkills loader vendored for task workers — packages/agent-core/src/agent-loop.ts, packages/agent-core/src/harness/skills.ts
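The per-channel execution serialization borrowed above is a concurrency primitive only: turns in one channel run one at a time, while different channels proceed independently. A minimal sketch of that idea as a promise chain keyed by channel ID. The `ChannelRunQueue` class is a hypothetical illustration and is not taken from OpenClaw's `message-run-queue.ts`.

```typescript
// Serializes async tasks per channel; tasks in different channels run concurrently.
class ChannelRunQueue {
  // Tail of each channel's chain; resolves when the last queued task settles.
  private tails = new Map<string, Promise<void>>();

  enqueue<T>(channelId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(channelId) ?? Promise.resolve();
    // Start this task only after the previous one in the same channel settles.
    const run = previous.then(task);
    // The tail swallows errors so one failed turn does not block the channel.
    const tail = run.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(channelId, tail);
    // Drop the map entry once the channel goes idle to avoid unbounded growth.
    void tail.then(() => {
      if (this.tails.get(channelId) === tail) this.tails.delete(channelId);
    });
    return run;
  }
}
```

The caller still receives the task's own result or rejection through the returned promise. Memory and context assembly stay guild-wide; only execution order is scoped to the channel.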
- Plugin manifest / discovery / loader / activation planning
- `ChannelPlugin` mega-adapter (20+ optional sub-adapters) — src/channels/plugins/types.plugin.ts
- WebSocket gateway server protocol — Discord is our platform; we don't reimplement a gateway
- 3500-line embedded runner — src/agents/embedded-agent-runner/run.ts. Its responsibilities are spread across host + reactor + gateway in our model.
- Custom Discord client (extensions/discord/src/internal/*) — discord.js wins
- Auth profile vault on disk — single tenant (us) holding pooled provider keys in the gateway
- Threshold-based compaction (packages/agent-core/src/harness/compaction/compaction.ts) — replaced by layered memory + epochs
- Markdown skills discovered at runtime — walled garden TS modules
- Cron / wall-clock-grid schedulers — reactor only
- Static SOUL.md injected as runtime prompt — SOUL is constraint surface, disposition router is runtime personality
- Workshop, ClawHub, ACP, voice, browser, doctor, setup wizard
- SaaS onboarding, billing, sleep mechanic (post-MVP)
MVP target: bot installed in our own Discord server, single guild, single persona, no multi-tenancy, no billing, no public OAuth flow. Goal is to validate the architecture in real chat:
- Does the disposition router pick well? (decision log analysis)
- Do the subagent personas produce useful material? (synthesizer's ignore-rate)
- Does the burst + self-followup combo produce a natural stream-of-consciousness feel, or does it read as choppy / theatrical?
- Do conversation epochs preserve enough texture?
- Does speculative routing get the bar right (memorable vs annoying)?
- Do steward skills shape the server usefully or destructively?
Decision-log dashboard is a first-class deliverable so we can iterate on routing in production.
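As a starting point for that dashboard, the decision log could be rolled up per disposition. The sketch below assumes rows shaped like a subset of `disposition_decisions`; the `summarizeDecisions` helper and the chosen metrics (count, speculative share, mean latency, total cost) are illustrative guesses at what would be useful, not a specified report.

```typescript
// Subset of disposition_decisions columns, camel-cased for TypeScript.
interface DecisionRow {
  chosenDispositionId: string;
  triggerKind: "reactive" | "speculative" | "self_followup";
  latencyMs: number;
  costUsd: number;
}

interface DispositionSummary {
  dispositionId: string;
  count: number;
  speculativeShare: number; // fraction of this disposition's turns that were unprompted
  meanLatencyMs: number;
  totalCostUsd: number;
}

function summarizeDecisions(rows: DecisionRow[]): DispositionSummary[] {
  const acc = new Map<
    string,
    { count: number; speculative: number; latency: number; cost: number }
  >();
  for (const row of rows) {
    const cur =
      acc.get(row.chosenDispositionId) ??
      { count: 0, speculative: 0, latency: 0, cost: 0 };
    cur.count += 1;
    if (row.triggerKind === "speculative") cur.speculative += 1;
    cur.latency += row.latencyMs;
    cur.cost += row.costUsd;
    acc.set(row.chosenDispositionId, cur);
  }
  const out: DispositionSummary[] = [];
  acc.forEach((v, dispositionId) => {
    out.push({
      dispositionId,
      count: v.count,
      speculativeShare: v.speculative / v.count,
      meanLatencyMs: v.latency / v.count,
      totalCostUsd: v.cost,
    });
  });
  // Most frequently chosen dispositions first.
  return out.sort((a, b) => b.count - a.count);
}
```

In practice this aggregation would more likely be a SQL `GROUP BY` over the table; the in-memory version only shows which columns feed which metric.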
- SaaS onboarding, multi-tenancy, OAuth flow for end users
- Billing, Stripe, sleep mechanic, quota tiers
- Public marketplace, user-uploaded skills
- MCP server support, BYO API key
- Voice, image generation, video
- Other platforms (Slack, Telegram)
- Geo discovery layer
- Learning loop (engagement feedback → router scoring)
- Mood drift (short-term router bias)
- Additional output shapes (gif-only, emoji-only, thread-spawn, react-burst)
- Per-guild persona variation
- Latency dimension (deliberate delayed callbacks)
- Web-search subagent persona
- Multi-persona per server
- Privacy / forget commands
In rough priority order, but explicitly NOT committed:
- Multi-tenancy + onboarding — public OAuth install, per-guild persona seeding, dashboard.
- Billing + sleep mechanic — Stripe, anthropomorphized free-tier sleep, quotas, per-tier model tier access.
- Learning loop — engagement signals (reactions on bot messages, reply rates, callback survival) feed back into router scoring weights. Per-guild weight drift.
- Mood drift — short-term router bias from recent outcomes (successful joke → playful voice boosted for ~1h; getting ignored → spicy risk cooled).
- More output shapes — emoji-only, gif-only, react-burst, thread-spawn, archive-channel-post.
- Latency dimension re-introduced — deliberate delayed callbacks ("earlier you said X..." 30 min later via reactor scheduling).
- Web-search subagent persona — for `topic-researcher` when archive insufficient.
- Multi-modal — image generation skill, voice replies.
- Marketplace / external skills — opening the walled garden carefully.
- Other platforms — Slack adapter, Telegram adapter.
- Custom platform — the long-term goal. Discord adapter becomes one of many.
- Geo discovery layer — find-your-people feature from the original brief.
- Phase 0 — Spike (~1w): repo bootstrap, discord.js connect via adapter, gateway proxies one call end-to-end, "@bot hi" → reply through the harness skeleton.
- Phase 1 — Host loop + disposition router skeleton (~2w): Postgres + Drizzle, two-tier loop, disposition router with full dimension set + 9 canonical dispositions seeded, heuristic scoring, simple synthesizer (no fanout yet — synthesizer gets full memory directly), TS skill registry with 2-3 utility skills, decision log writing. Proves the architecture end-to-end.
- Phase 2 — Reactor (~1.5w): event bus + durable scheduler + state-predicate triggers over BullMQ, declarative trigger DSL, job lineage logging. Migrate any "background" work scheduled in Phase 1 onto reactor primitives.
- Phase 3 — Layered memory + conversation epochs (~2.5w): the 5 memory layers end-to-end, `ContextAssembler` per-disposition assembly, `MomentExtractor` sync + reactor-batched, epoch close detection + summarization, `memory_search` tool. This is the memory moat.
- Phase 4 — Subagent fan-out + synthesizer pattern (~2w): the 5 research personas, parallel dispatch under disposition control, structured payload return, synthesizer-as-taste with soft context. Wire dispositions to fanout plans.
- Phase 5 — Speculative routing + budget envelope (~1w): reactor composite trigger for speculative turns, budget envelope as routing input, stochasticity temperature. Bot starts speaking unprompted (well-timed).
- Phase 6 — Burst output + self-followup turns (~1.5w): `burst` output shape end-to-end (multi-message in sequence, natural typing-indicator behavior, no programmed delays); reactor-mediated self-followup turns with all bias types (`go-deeper`, `self-correct`, `contrarian-self`, `add-nuance`, `callback-later`); interruption handling for both in-burst and pending self-followup.
- Phase 7 — Steward skills + curated bundle (~1.5w): the Discord server-management skill bundle, install-time `#dit-mod` channel auto-create, confirmation flow, ~5 additional utility skills.
- Phase 8 — Hardening + internal dogfood (~1.5w): shard the bot, observability, decision-log dashboard, real usage in our server, iteration on routing weights.
Total: ~14 weeks of focused work to internal alpha. Post-MVP roadmap starts when we're confident the architecture holds up in real chat.
- Hosting concretes (Fly regions, Postgres provider — Neon vs Supabase, Upstash vs Fly Redis)
- Initial budget envelope per guild for internal alpha (USD/day)
- Whether SOUL.md authoring lives in repo `workspace/` or in a small admin UI from day 1
- Exact gap thresholds for epoch close (default 3h; configurable per channel?)
- Stochasticity temperature defaults per context bucket
- Steward skills: which actions auto-execute vs require admin confirmation in `#dit-mod`
- Project / bot name finalization (D.I.T. internal, but the visible Discord persona name?)
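To make the epoch-close threshold question concrete, here is a minimal sketch of time-gap epoch splitting with the 3h default, where the gap is a parameter so a per-channel override stays possible. It assumes an epoch closes when the silence in a channel strictly exceeds the gap; the type names and `splitIntoEpochs` are hypothetical.

```typescript
interface ChannelMessage {
  id: string;
  channelId: string;
  ts: number; // epoch milliseconds
}

interface EpochSpan {
  channelId: string;
  startTs: number;
  endTs: number;
  messageIds: string[];
}

const DEFAULT_EPOCH_GAP_MS = 3 * 60 * 60 * 1000; // the 3h default from the plan

// Groups messages into per-channel epochs separated by silences longer than gapMs.
function splitIntoEpochs(
  messages: ChannelMessage[],
  gapMs: number = DEFAULT_EPOCH_GAP_MS,
): EpochSpan[] {
  const sorted = messages.slice().sort((a, b) => a.ts - b.ts);
  const latest = new Map<string, EpochSpan>(); // most recent epoch per channel
  const spans: EpochSpan[] = [];
  for (const msg of sorted) {
    const current = latest.get(msg.channelId);
    if (current !== undefined && msg.ts - current.endTs <= gapMs) {
      // Still within the gap: extend the channel's current epoch.
      current.endTs = msg.ts;
      current.messageIds.push(msg.id);
    } else {
      // First message in the channel, or the silence exceeded the gap.
      const next: EpochSpan = {
        channelId: msg.channelId,
        startTs: msg.ts,
        endTs: msg.ts,
        messageIds: [msg.id],
      };
      latest.set(msg.channelId, next);
      spans.push(next);
    }
  }
  return spans; // ordered by startTs
}
```

The last span per channel should be treated as still open until the gap has actually elapsed; in the reactor model that close would be a scheduled job that is re-armed whenever a new message arrives, and only then would `summary_text` be generated.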