D.I.T. (Do It Together) — MVP Plan (Internal Alpha)

name

overview

Internal-alpha Discord bot ("D.I.T.") built around a novel chat-AI architecture: a host harness that selects a per-turn "disposition" from a full N-dimensional matrix (reasoning, ~15 context strategies, voice, output shape, search posture, risk). The harness dispatches parallel research subagents, synthesizes through soft context, and supports multi-message bursts and reactor-mediated self-followup turns. It has guild-wide awareness across all channels and is backed by layered memory with conversation epochs, an event-and-state-driven reactor (never cron), walled-garden TS skills, and Discord steward capabilities. Discord is the proving ground for an eventual custom platform.

todos

id	content	status
phase0	Phase 0 spike: repo bootstrap + discord.js adapter connect + gateway-llm proxies one call + minimal harness skeleton end-to-end	pending
phase1	Phase 1 host loop + disposition router skeleton: Postgres+Drizzle, two-tier loop, full-dimension router with 10 canonical dispositions, heuristic scoring, simple synthesizer (no fanout), TS skill registry, decision log	pending
phase2	Phase 2 reactor: event bus + durable scheduler + state predicates over BullMQ, declarative trigger DSL, job lineage logging, migrate background work onto reactor primitives	pending
phase3	Phase 3 layered memory + conversation epochs: 5 memory layers, ContextAssembler per-disposition, MomentExtractor sync + reactor-batched, epoch close detection + summarization, memory_search tool	pending
phase4	Phase 4 subagent fan-out + synthesizer pattern: 5 research personas, parallel dispatch under disposition control, structured payload return, synthesizer-as-taste with soft context	pending
phase5	Phase 5 speculative routing + budget envelope: reactor composite trigger for unprompted turns, budget envelope as routing input, stochasticity temperature	pending
phase6	Phase 6 multi-message bursts + self-followup turns: burst output shape (no programmed delays, natural typing-indicator behavior), reactor-mediated self-followup with abort-on-interrupt, follow-up bias dispositions (go-deeper, self-correct, contrarian-self, add-nuance)	pending
phase7	Phase 7 steward skills + curated bundle: Discord server-management skill bundle, install-time #dit-mod channel, confirmation flow, additional utility skills	pending
phase8	Phase 8 hardening + internal dogfood: shard the bot, observability, decision-log dashboard, real usage in our server, iterate on routing weights	pending

isProject

false

D.I.T. (Do It Together) — MVP Plan (Internal Alpha)

Goal

Build an autonomous group-native AI that joins a Discord server as a participant, not a tool. It picks how to respond, when to speak unprompted, and how much to think — on a per-turn basis — by choosing a disposition from a rich behavioral matrix. It manages the server itself as a steward (channels, threads, archives). It remembers the group across all channels using a layered memory model with verbatim moments and time-gap conversation epochs. End users never see config, never touch an API key, never edit a markdown file. Skills live in a walled garden.

Discord is the proving ground for an eventual custom chat platform. The MVP target is internal alpha in our own server — to validate the architecture in real chat and iterate fast. SaaS onboarding, billing, multi-tenancy, and the sleep mechanic are deferred to a post-MVP roadmap.

Design principles (non-negotiables)

No cron. Every timing concern goes through the reactor — an event-and-state-driven, durable, retryable orchestrator. Triggers can be timers, events, state predicates, or composites. Never wall-clock grids.
No static personality prose at runtime. SOUL.md exists as a constraint surface for what the persona is allowed to feel like, but the disposition router IS the runtime personality. Behavior emerges from choice-of-process, not from prose instructions.
No workspace files for end users. All configuration lives in the database. Walled garden by default.
No markdown-as-runtime-skill. Skills are TypeScript modules we author and ship. AgentSkills format may inspire the catalog shape, but skills are not dynamically discovered from filesystems.
No forced citation between subagents and synthesizer. Subagents return material; the synthesizer is the taste that selects what to use. Forcing citation makes outputs feel like book reports.
No compaction-as-replacement. Originals are never destroyed. Layered memory + conversation epochs replace summary-and-replace.
No one-pipeline-per-turn. Every turn is routed through the disposition system, which picks coordinates in an N-dimensional behavior space. The matrix IS the architecture.
No latency optimization at the expense of quality. Thinking takes time and that reads as deliberate to humans. Speculative turns can take minutes.
No SaaS overhead at MVP. Internal alpha in our own server. Multi-tenancy, billing, onboarding deferred.
No platform lock-in. Discord is one adapter. Host harness, router, memory, reactor, skills are platform-agnostic.

Architecture

flowchart TB
  subgraph discord [Discord platform]
    DC[Bot application + guilds]
  end

  subgraph adapter [Discord adapter platform-agnostic boundary]
    GW[discord.js shard pool]
    OUT[Output dispatcher burst + interruption aware]
  end

  subgraph host [Host harness per-channel execution, guild-wide awareness]
    Ingest[Inbound ingest + pending-history buffer]
    Router[Disposition router]
    Fanout[Subagent fan-out]
    Synth[Synthesizer]
  end

  subgraph reactor [Reactor event-and-state-driven orchestrator]
    Bus[Event bus]
    Sched[Durable scheduler]
    Timers[Stateful timers]
    Predicates[State predicates]
    Budget[Budget envelope]
  end

  subgraph workers [Worker subagents]
    Research[Research personas: memory-archeologist, profiler, topic-researcher, comedian, contrarian]
    Tasks[Task workers full agent-core for delegated goals]
  end

  subgraph mem [Layered memory guild-wide]
    Live[Live transcript]
    Doss[User dossiers]
    Hi[Highlight reel verbatim]
    Lore[Group lore]
    Arc[Transcript archive]
    Ep[Conversation epochs]
  end

  subgraph llmgw [DIT model gateway purpose-routed]
    Tlow[Low reasoning tier]
    Tmid[Mid reasoning tier]
    Thigh[High reasoning tier]
    Emb[Embeddings]
    Policy[Quotas + policy]
  end

  subgraph services [Platform]
    PG[(Postgres + pgvector)]
    Redis[(Redis)]
  end

  DC <--> GW
  GW --> Ingest
  Ingest --> Bus
  Ingest --> Router
  Bus --> Sched
  Timers --> Bus
  Predicates --> Bus
  Bus -->|speculative trigger| Router
  Budget -->|envelope| Router
  Router --> Fanout
  Fanout --> Research
  Research --> mem
  Fanout --> Synth
  Synth --> OUT
  OUT --> DC
  OUT -->|human interrupt| Router
  Router & Fanout & Synth & Research --> llmgw
  llmgw --> Policy --> PG
  mem --> PG
  Sched --> Redis
  Bus --> Tasks
  Tasks --> mem
  Tasks --> llmgw
  Synth -->|moment extraction| mem
  Sched -->|epoch close| Ep

The host harness

A "host" is the thing that decides what to do for a given turn. It is not a generic agent loop. Two things to clarify up front because they're orthogonal and easy to conflate:

Execution serialization is per-channel. There's one in-flight bot reply per channel at any moment, so the bot doesn't produce overlapping output in the same place. Replies in different channels run in parallel. This is purely a concurrency primitive (BullMQ queue keyed on guild:channel).
Awareness / context is guild-wide, always. The host has full view across every channel in the guild. A turn fired in #general sees what was just said in #random 30 seconds ago. Channel ID is metadata on every message and memory item, never a partition. This is a core distinguishing property — most chat bots (OpenClaw included) silo per-channel, which is why their "did you see what we were just talking about?" experience is broken.
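The serialization half of this can be sketched without any queue infrastructure. The snippet below is a minimal in-memory stand-in, assuming a promise chain per guild:channel key plays the role the BullMQ queue would play in the real system; `ChannelSerializer` and `run` are hypothetical names, not part of the plan.

```typescript
// In-memory stand-in for a queue keyed on guild:channel (hypothetical helper, not BullMQ).
class ChannelSerializer {
  private tails = new Map<string, Promise<void>>();

  // Runs `job` only after every earlier job for the same guild:channel key has settled.
  run<T>(guildId: string, channelId: string, job: () => Promise<T>): Promise<T> {
    const key = `${guildId}:${channelId}`;
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(job);
    // The tail never rejects, so one failed job does not block later jobs on the channel.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Drop the entry once the channel is idle so the map does not grow without bound.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}
```

Two calls with the same guild and channel run strictly one after the other; calls for different channels start immediately and overlap. Nothing here touches context assembly, which stays guild-wide regardless of which key a turn runs under.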

Host responsibilities:

Buffer inbound messages, maintain a guild-wide pending-history window (cross-channel, with channel tags).
Receive routing inputs from two sources: (1) inbound messages, (2) reactor-initiated speculative or self-followup triggers.
Call the disposition router to pick a disposition for the current turn.
Dispatch subagent fan-out and synthesizer per the chosen disposition.
Submit output to the dispatcher (one or many message beats).
Handle interruptions during multi-message bursts and during pending self-followup turns.

The host does NOT do multi-step task execution. When something is goal-shaped ("research X and produce a writeup", "design a poll structure"), the host delegates to a task worker — a full agent-core harness with its own loop, skills, and budget — via the reactor. The host then either awaits the task result or proceeds without it and folds the result in later.

This two-tier split is the core architectural insight: chat turns are not coding tasks. Group chat is dispositional; coding is goal-directed. They need different loops.

The reactor (orchestrator — what replaces "cron")

A single primitive: "do work W when condition C becomes true, with retry, observability, and dead-letter semantics."

C can be:

A timer (after 3h of channel silence)
An event (messageCreate, reactionAdd, member joined, bot reply was quoted)
A state predicate (there are >=N pending dossier updates for guild, budget envelope under 30% with >2h until refresh)
A composite (it's evening AND it's been quiet AND the channel-activity rolling avg suggests ambient mode is welcome)
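One way to picture the four kinds of C is as a discriminated union with a single evaluator. This is a sketch under assumptions: the `ReactorState` fields, the `Condition` shape, and the `holds` function are all hypothetical, and composites are limited to AND for brevity.

```typescript
// Hypothetical snapshot of the state a condition is evaluated against.
type ReactorState = {
  nowMs: number;
  lastMessageAtMs: number;
  pendingDossierUpdates: number;
  recentEvents: string[];
};

// The four kinds of condition C; field names are illustrative only.
type Condition =
  | { kind: 'timer'; silenceMs: number }
  | { kind: 'event'; name: string }
  | { kind: 'predicate'; test: (state: ReactorState) => boolean }
  | { kind: 'composite'; all: Condition[] };

// Returns true when the condition currently holds for the given state.
function holds(condition: Condition, state: ReactorState): boolean {
  switch (condition.kind) {
    case 'timer':
      return state.nowMs - state.lastMessageAtMs >= condition.silenceMs;
    case 'event':
      return state.recentEvents.includes(condition.name);
    case 'predicate':
      return condition.test(state);
    case 'composite':
      return condition.all.every((inner) => holds(inner, state));
  }
}

// Example: "3h of silence AND at least 10 pending dossier updates".
const quietAndBacklogged: Condition = {
  kind: 'composite',
  all: [
    { kind: 'timer', silenceMs: 3 * 60 * 60 * 1000 },
    { kind: 'predicate', test: (s) => s.pendingDossierUpdates >= 10 },
  ],
};
```

The real reactor would evaluate these durably and on change rather than by polling a snapshot; the point of the sketch is only that timers, events, predicates, and composites can share one declarative shape.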

Concrete things that go through the reactor instead of cron:

Conversation epoch close — per-channel silence timer (configurable 2-4h); on fire, kick off the epoch-summarize job.
Dossier consolidation — accumulator predicate: when N new highlights/messages tagged to a user, schedule a consolidation pass.
Group lore detection — repeated-phrase detector emits events; reactor coalesces and batches.
Speculative routing — composite trigger evaluating activity level, budget remaining, time since last bot turn, randomized seed; on fire, dispatch an unprompted=true turn to the router.
Self-followup turns — see dedicated subsection below; this is how the "stream-of-consciousness" feel actually gets produced.
Budget refresh — period-boundary event.
Steward maintenance — periodic state-predicate evaluation (e.g., "archive channel has 200+ items, propose threading").
Task worker dispatch — host delegates a goal, reactor schedules and tracks the worker run.

Self-followup turns (the real stream-of-consciousness mechanism)

The output dispatcher never fakes pauses. Instead, when a turn completes, the reactor evaluates the chosen disposition's selfFollowupHint:

type SelfFollowupHint = {
  probability: number;  // 0-1 base chance of scheduling a follow-up
  bias: 'go-deeper' | 'self-correct' | 'contrarian-self' | 'add-nuance' | 'callback-later';
  windowMs: [min: number, max: number];  // when to fire, sampled uniformly
};

If the dice roll lands in favor, the reactor schedules a follow-up evaluation for some natural-feeling later moment (e.g., 30s-2min for go-deeper, 5-20min for callback-later). At evaluation time the reactor checks:

Has the conversation moved on past the original topic? → likely abort (or downgrade to callback-later for much later)
Did a human reply to the bot's earlier message? → abort the follow-up (that's now a normal reactive turn)
Is the channel currently active with someone else's in-progress thought? → defer briefly or abort

Otherwise the reactor fires a new turn into the router with is_self_followup=true plus the previous turn's full context. The router picks a disposition appropriate for the bias: go-deeper favors high-reasoning dispositions with richer context; contrarian-self favors the contrarian voice; self-correct favors a deadpan low-reasoning quick correction.

This is the actual machinery behind the bot saying something quick and then coming back 90 seconds later with a real take, or quietly contradicting itself, or producing a "wait — actually" follow-up. The pattern emerges from the loop, not from baked-in timing.

Interruption awareness is also handled here: at any point during a pending self-followup, an inbound message or reaction can cancel or transform the scheduled follow-up.
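The two decisions involved (whether and when to schedule, then whether to fire at evaluation time) can be sketched as pure functions. Everything beyond `SelfFollowupHint` is an assumption: `FollowupCheck`, `planFollowup`, and `evaluateFollowup` are hypothetical names, and the checks are simplified to hard outcomes where the plan allows softer ones (downgrade, defer-or-abort).

```typescript
type SelfFollowupHint = {
  probability: number;
  bias: 'go-deeper' | 'self-correct' | 'contrarian-self' | 'add-nuance' | 'callback-later';
  windowMs: [min: number, max: number];
};

// Hypothetical snapshot of channel state taken when the scheduled evaluation fires.
type FollowupCheck = {
  topicMovedOn: boolean;
  humanRepliedToBot: boolean;
  someoneMidThought: boolean;
};

type FollowupDecision = 'fire' | 'abort' | 'defer';

// Rolls the dice and, on success, samples a delay uniformly from the hint window.
// `rand` is injected so the roll can be seeded, logged, and replayed.
function planFollowup(hint: SelfFollowupHint, rand: () => number): { delayMs: number } | null {
  if (rand() >= hint.probability) return null;
  const [min, max] = hint.windowMs;
  return { delayMs: Math.round(min + rand() * (max - min)) };
}

// Applies the evaluation-time checks in order of severity.
function evaluateFollowup(check: FollowupCheck): FollowupDecision {
  if (check.humanRepliedToBot) return 'abort'; // this is now a normal reactive turn
  if (check.topicMovedOn) return 'abort'; // simplification: a downgrade to callback-later is also possible
  if (check.someoneMidThought) return 'defer';
  return 'fire';
}
```

A `'fire'` result would translate into a new router turn with is_self_followup=true; an inbound message or reaction arriving before the delay elapses would cancel or transform the pending job rather than wait for evaluation.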

Implementation: BullMQ over Redis as the queue substrate; a thin DSL layer on top that lets us express triggers declaratively and tracks job lineage. Every reactor decision is logged for debugging. We never write a cron expression in this codebase.

The disposition router (the matrix)

A disposition is a point in an N-dimensional behavior space describing how a turn should be handled. The router scores candidate dispositions against the current context and picks one (with controlled stochasticity).

Dimensions (MVP full set)

Reasoning depth — low / medium / high. Drives synthesizer model tier.
Context strategy — one of ~15 strategies (see ContextAssembler section). Not depth-only; each is a different lens on memory. Choice of strategy is what produces output variety — same situation routed through different strategies produces meaningfully different responses.
Voice — playful / sharp / sincere / contrarian / nostalgic / deadpan / encouraging / indifferent / present-reaction / observational / fact-drop. Drives synthesizer prompt and which research personas fire.
Output shape — silent / react-only / single-line / multi-line / burst. Drives dispatcher behavior. burst is multiple messages sent in sequence with natural typing-indicator behavior between them (no programmed delays — see Output shapes section).
Search posture — none / broad-scan / person-deep / callback-hunt / contradiction-hunt / setup-payoff. Drives which research personas fire and with what prompts.
Risk appetite — safe / spicy. Biases voice + synthesizer prompt (enables roasts, hot takes).
Self-followup hint — optional. Whether and how the reactor should consider scheduling a self-followup after this turn completes. Drives the bot's ability to come back later with a deeper, contrarian, or corrective take.

A Disposition type sketch:

type Disposition = {
  id: string;
  name: string;
  reasoningDepth: 'low' | 'medium' | 'high';
  contextStrategy: ContextStrategyId;  // see ContextAssembler for the full list
  voice: VoiceTag;
  outputShape: 'silent' | 'react-only' | 'single-line' | 'multi-line' | 'burst';
  searchPosture: SearchPostureTag;
  riskAppetite: 'safe' | 'spicy';
  speculativeOnly?: boolean;
  selfFollowupHint?: SelfFollowupHint;  // see Reactor section
};

Canonical seeded dispositions (MVP)

Stored in DB, configurable, scored at routing time. Starter set of ~10. Format: reasoning / contextStrategy / voice / outputShape / searchPosture / risk (+ optional flags):

silent-listen — silent output, no model call. Conscious decision not to speak.
vibing — low / narrow-live / present-reaction / react-only / none / safe; followup 5% go-deeper
quick-react — low / live / playful / single-line / none / safe; followup 10% go-deeper
roast-back — medium / archive-person-deep / sharp / single-line / person-deep / spicy
factual-recall — low / live+lore / deadpan / single-line / none / safe
callback-from-memory — high / callback-primed / nostalgic / single-line or multi-line / callback-hunt / safe
settle-debate — high / contrasting / sincere / burst / contradiction-hunt / safe; followup 30% add-nuance
contrarian-take — high / cross-channel-similar / contrarian / multi-line / contradiction-hunt / spicy
late-night-musing — high / serendipity / observational / burst / broad-scan / safe; speculative-only; followup 40% add-nuance
observation-drop — medium / bot-history / observational / single-line / broad-scan / safe; speculative-only

Note the variety in context strategies across these — that variety is the point. The router can also pick novel coordinate combinations off-axis (e.g., apply serendipity strategy to a quick-react for surprise variety).

Router inputs

Inbound message (or null for speculative)
Channel state — activity level, recent vibe, present users, current topic cluster
Memory state — pending high-value material (e.g., a primed callback waiting)
Budget envelope — remaining USD in period, recent spend pattern
Orchestrator hints — unprompted flag, optional suggested-disposition bias
Stochasticity seed (logged for replay/A-B)

Router implementation

Heuristic-first scoring (cheap, deterministic). For each candidate disposition, compute a score from explicit rules:

Hard yes/no gates (e.g., explicit mention → not silent-listen; sleep state → silent-listen)
Heuristic signals (unanswered question detector → boosts factual dispositions; recent callback opportunity → boosts callback-from-memory; lull → boosts speculative observational dispositions)
Voice biasing from current persona constraints (SOUL.md → which voices are even eligible)
Budget pressure (high-cost dispositions clamped when budget low)

When heuristics produce a tight top-cluster, optionally call a cheap-tier classifier through the gateway (purpose=router-classify) for a tie-breaker. Otherwise sample directly from the scored distribution with a stochasticity temperature.
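A rough sketch of the "gates, then scores, then temperature sampling" path follows. The assumptions are ours: the `Candidate` shape, the softmax weighting, and the mulberry32 PRNG are illustrative choices, and the plan does not specify how the stochasticity temperature is applied.

```typescript
// Hypothetical candidate: a heuristic score plus whether a hard gate excluded it.
type Candidate = { id: string; score: number; gated: boolean };

// Small seeded PRNG (mulberry32) so a logged seed replays the same choice.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Samples one disposition id from a softmax over the scores of non-gated candidates.
function sampleDisposition(candidates: Candidate[], temperature: number, seed: number): string {
  const eligible = candidates.filter((c) => !c.gated);
  if (eligible.length === 0) throw new Error('no eligible disposition');
  // A temperature near zero degenerates to picking the top score.
  if (temperature <= 1e-6) {
    return eligible.reduce((best, c) => (c.score > best.score ? c : best)).id;
  }
  const maxScore = Math.max(...eligible.map((c) => c.score));
  const weighted = eligible.map((c) => ({
    id: c.id,
    weight: Math.exp((c.score - maxScore) / temperature),
  }));
  const total = weighted.reduce((sum, w) => sum + w.weight, 0);
  let remaining = mulberry32(seed)() * total;
  let fallback = '';
  for (const item of weighted) {
    fallback = item.id;
    remaining -= item.weight;
    if (remaining < 0) return item.id;
  }
  return fallback; // only reached through floating-point rounding
}
```

Because the seed is an input, the same scores, temperature, and seed always produce the same pick, which is what makes replay and A-B comparison from the decision log possible.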

Every routing decision is written to disposition_decisions with full inputs, scores, the choice, and the stochasticity seed — this is our primary tuning surface.

Subagent fan-out + synthesizer pattern

For dispositions that warrant research (anything above low/live), the host fans out parallel research subagent calls before the synthesizer runs.

Research personas (MVP — 5 starters)

Each is a small, focused prompt + tool set + cheap-to-mid model call. Returns a structured payload (~200-500 tokens of findings).

memory-archeologist — scans epochs and highlights for callbacks, parallels, "you said this before" material
participant-profiler — pulls dossiers for active speakers; surfaces preferences, sensitivities, recent vibes
topic-researcher — vector-searches the transcript archive for substantive history on the current topic
comedian — brainstorms 2-3 comedic angles (fires only when voice is humorous)
contrarian — argues against the obvious response (fires only when voice is sharp/contrarian)

Which personas fire is a function of disposition voice + search posture. A roast-back fires comedian + participant-profiler. A callback-from-memory fires memory-archeologist + participant-profiler. A vibing fires nothing. The fan-out plan is part of the disposition definition (or computed deterministically from it).
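Given a fan-out plan, the dispatch itself might look like the sketch below. The `PersonaRunner` signature and the choice to drop a failed persona instead of failing the turn are assumptions; the plan only says the calls run in parallel and return structured payloads.

```typescript
type PersonaId =
  | 'memory-archeologist'
  | 'participant-profiler'
  | 'topic-researcher'
  | 'comedian'
  | 'contrarian';

type ResearchPayload = { persona: PersonaId; findings: string };

// Hypothetical runner; in the real system this would be a gateway call per persona.
type PersonaRunner = (persona: PersonaId) => Promise<string>;

// Runs every persona in the plan concurrently and keeps whatever comes back.
async function fanOut(plan: PersonaId[], run: PersonaRunner): Promise<ResearchPayload[]> {
  const results = await Promise.all(
    plan.map(async (persona): Promise<ResearchPayload | null> => {
      try {
        return { persona, findings: await run(persona) };
      } catch {
        // Assumption: research is optional material, so one failure should not sink the turn.
        return null;
      }
    }),
  );
  return results.filter((r): r is ResearchPayload => r !== null);
}
```

With this shape, a roast-back turn would call `fanOut(['comedian', 'participant-profiler'], run)` and a vibing turn would call it with an empty plan and get an empty array back.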

Synthesizer pattern (soft context)

A single model call per turn (high-reasoning tier for high dispositions, mid for medium, low for low). Inputs:

System prompt assembled from base + SOUL.md (as constraint, not instructions) + voice-specific framing
Context block assembled per the disposition's context strategy
The inbound message (if reactive)
A delimited ## research material markdown block containing all subagent payloads as suggestions, not requirements. Instruction is "this is raw material; use what's useful, mostly let it inform without naming sources, occasionally a direct callback or quote will land — pick your moment, don't force it." Explicit references are allowed and even encouraged when one genuinely lands; they just shouldn't be the default.
Output-shape instructions

The synthesizer IS the bot's taste. Subagents are the writers' room; the synthesizer is the performer. No forced citation. Models that get raw material and explicit license to ignore it produce dramatically better output than models forced to incorporate it.
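As an illustration of the soft-context idea, the research block could be rendered like this. The framing string paraphrases the instruction quoted above and its exact wording is an assumption, as are the `renderResearchBlock` name and the per-persona subheadings.

```typescript
type ResearchPayload = { persona: string; findings: string };

// Paraphrase of the instruction in the plan; the final wording would be tuned in practice.
const RESEARCH_FRAMING =
  'This is raw material; use what is useful. Mostly let it inform without naming sources. ' +
  'A direct callback or quote is welcome when it genuinely lands; do not force it.';

// Renders all subagent payloads as one delimited markdown block of suggestions.
function renderResearchBlock(payloads: ResearchPayload[]): string {
  if (payloads.length === 0) return ''; // nothing fired, so the block is omitted entirely
  const sections = payloads.map((p) => `### ${p.persona}\n${p.findings.trim()}`);
  return ['## research material', RESEARCH_FRAMING, ...sections].join('\n\n');
}
```

Note what is absent: no instruction to cite, no per-item "must use" flag. The block is appended after the context and inbound message, and the synthesizer is free to ignore all of it.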

Output shapes

The output dispatcher is the bridge between the synthesizer and the Discord adapter. It interprets the disposition's outputShape and handles dispatch.

Shapes (MVP)

silent — no output. Decision was to listen. Logged but no Discord call.
react-only — bot adds an emoji reaction to a recent message. Single REST call.
single-line — one message, no thread, no formatting flourishes.
multi-line — one message, possibly with paragraph breaks.
burst — multiple messages sent in sequence as one turn. See below.

Burst output

When the synthesizer judges a thought is better in multiple messages than one block ("hmm" → "ok so the thing about that is..." → "actually wait" → "..."), it produces an array of strings. The dispatcher sends them in sequence, interleaved with the platform's natural typing-indicator behavior — no programmed pauses, no fake delays. Each message goes as quickly as Discord will accept it; the visual rhythm is whatever emerges from "type-and-send" pacing. This is the part where we deliberately avoid theater. Real humans don't pad their typing with setTimeout(2000) to seem thoughtful; they just type and send.

The synthesizer output schema for burst:

type BurstOutput = {
  messages: Array<{
    text: string;
    isCorrection?: boolean;  // marks self-correction beats for telemetry
  }>;
};

Interruption awareness during a burst. If a human message arrives mid-burst:

Dispatcher pauses the remaining beats.
The remaining beats + the interrupting message go back to the router as a follow-up event.
Router decides: (a) abort the rest entirely (most common — the floor moved); (b) fold the interruption into a continuation ("oh — yeah, exactly what jen just said"); (c) continue as planned (rare).
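A minimal sketch of the dispatcher side of this, assuming two hypothetical hooks: `send` (provided by the adapter) and `wasInterrupted` (provided by ingest). The router decision in the last step is out of scope here; the function just hands back the unsent beats.

```typescript
type Beat = { text: string; isCorrection?: boolean };
type BurstResult = { sent: Beat[]; remaining: Beat[]; interrupted: boolean };

// Sends beats in order with no artificial delay, stopping if a human message arrives mid-burst.
async function dispatchBurst(
  beats: Beat[],
  send: (beat: Beat) => Promise<void>,
  wasInterrupted: () => boolean,
): Promise<BurstResult> {
  const queue = [...beats];
  const sent: Beat[] = [];
  while (queue.length > 0) {
    // Only check once the burst has started; before that it is an ordinary turn.
    if (sent.length > 0 && wasInterrupted()) {
      return { sent, remaining: queue, interrupted: true };
    }
    const beat = queue.shift() as Beat;
    await send(beat);
    sent.push(beat);
  }
  return { sent, remaining: [], interrupted: false };
}
```

When `interrupted` is true, the caller would forward `remaining` plus the interrupting message to the router as a follow-up event, and the router chooses between abort, fold-in, and continue.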

Stream-of-consciousness is emergent, not a shape

The "stream of consciousness" feel — the bot saying something quick, then coming back later with a deeper take, possibly contradicting itself — is not an output shape. It's an emergent pattern produced by combining:

Burst output for the in-turn micro-thought-flow ("hmm, ok, the thing is...")
Self-followup turns via the reactor for the macro-thought-flow ("I said something glib 2 minutes ago, here's the real take")
Natural model behavior within those mechanisms

This is architecturally cleaner than baking timing into a single output, and far more natural-feeling: every "follow-up" goes through the full router again and earns its place against the current context. If the conversation moved on, the bot doesn't barrel through with a stale thought; it adapts or shuts up.

Post-MVP: emoji-only, gif-only, react-burst, thread-spawn, etc.

Layered memory + conversation epochs

The bot is aware across all channels in a guild. Memory is keyed by guildId. Channel ID is provenance, not partition.

Five layers + epochs

Live transcript (guild-wide) — last 30-50 verbatim messages across the entire guild, ordered by time, each tagged with its channel ID. NOT scoped to the current channel. The current channel is weighted heavier in assembly (more of its messages, more recently) but a hot conversation in #random is visible from #general. Always in context for any strategy that includes "live" material.
User dossiers — one curated text blob per user the bot has interacted with in the guild. Stable facts, declared preferences, relationship signals.
Highlight reel — preserved-verbatim moments (quotes, jokes that landed, hot takes, callbacks, opinions). Each row has embedding for similarity search.
Group lore — relational patterns about the group as a unit (who fights with whom, recurring jokes, shared vocabulary).
Transcript archive — full raw transcripts with embeddings, rolling retention. Searched on demand via memory_search tool.
Conversation epochs — time-gap-bounded summaries. When a channel goes silent for the configured gap (default 3h), the reactor fires an epoch-close event. A summarization job (mid-tier, purpose=epoch-summarize) produces a summary, topic tags, participants list, and embedding for the closed window. Raw transcripts persist; the epoch is an index over them, not a replacement.
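The time-gap boundary rule is easy to state as a batch function, even though the live system gets the same result incrementally from the per-channel silence timer. In this sketch `Msg`, `EpochWindow`, and `splitIntoEpochs` are hypothetical names; only the 3h default comes from the plan.

```typescript
type Msg = { id: string; ts: number };
type EpochWindow = { startTs: number; endTs: number; messageIds: string[] };

const DEFAULT_GAP_MS = 3 * 60 * 60 * 1000; // 3h default gap

// Groups one channel's messages into epochs separated by at least `gapMs` of silence.
function splitIntoEpochs(messages: Msg[], gapMs: number = DEFAULT_GAP_MS): EpochWindow[] {
  const sorted = [...messages].sort((a, b) => a.ts - b.ts);
  const epochs: EpochWindow[] = [];
  let current: EpochWindow | null = null;
  for (const message of sorted) {
    if (current !== null && message.ts - current.endTs < gapMs) {
      // Still inside the running conversation: extend the current window.
      current.endTs = message.ts;
      current.messageIds.push(message.id);
    } else {
      // First message, or the silence gap was reached: start a new epoch.
      current = { startTs: message.ts, endTs: message.ts, messageIds: [message.id] };
      epochs.push(current);
    }
  }
  return epochs;
}
```

Each window only references message ids. That mirrors the rule that an epoch is an index over raw transcripts; the summary, topic tags, and embedding would be attached to the window by the summarization job.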

Verbatim vs paraphrase (property of the highlights layer)

An important property of the highlight reel: highlights are stored verbatim with attribution, not paraphrased. "Andrew, didn't you say last week your boss thinks rust is overhyped?" lands very differently from "Andrew, you previously expressed a negative opinion about Rust." The texture matters.

Verbatim: direct quotes, jokes, self-disclosure, memorable phrases, callbacks. The highlight reel and live transcript both preserve original wording.
Paraphrase only for: epoch summaries, dossier blurbs, lore narratives, aggregate sentiment. Compressed by design, not trying to preserve voice.

Not the most important rule in the system, but important enough that the schema enforces it (highlights stores raw_text; epochs store summary_text; they're different fields by intent).

Context assembly (`ContextAssembler`) — strategies, not depth levels

This is one of the most important sources of output variety. Same context → same response. Unique context → unique response. So we don't have 4 depth levels; we have ~15 distinct strategies (call them lenses), each assembling a meaningfully different context for the synthesizer.

type ContextStrategyId =
  | 'none'                       // just system prompt; no memory at all
  | 'narrow-live'                // last 5 messages, current channel only (quick reactions)
  | 'live'                       // last 30 across guild, channel-tagged, current channel weighted
  | 'live+dossiers'              // + dossiers for present users
  | 'live+lore'                  // + group lore items
  | 'live+lore+dossiers'         // both, no archive
  | 'archive-topic-deep'         // heavy vector-search on current topic; lighter live
  | 'archive-person-deep'        // one specific person's full historical statements on related topics
  | 'epochs-only'                // recent epoch summaries only; high-level vibe context, no raw messages
  | 'time-shifted'               // same channel(s) from a similar time-of-day in past weeks (captures the moment's vibe)
  | 'cross-channel-similar'      // recent conversations from OTHER channels on similar topics, mixed in
  | 'contrasting'                // historical material where current speakers DISAGREED on similar topics
  | 'serendipity'                // random archive sample for surprise / pure variety
  | 'unanswered'                 // context biased toward unresolved threads from recent memory
  | 'bot-history'                // recent bot turns + how each landed (reactions, replies, callback survival)
  | 'callback-primed'            // when memory has flagged a specific ripe callback, prioritize that material
  | 'full-rich';                 // everything: live + dossiers + lore + top archive matches + recent epochs (expensive)

Each disposition picks a primary strategy. The router can also stochastically apply an unexpected strategy for variety (e.g., routing a quick-react through serendipity once in a while), producing a surprising non-sequitur that lands because it pulls in something nobody expected.

memory_search remains available as a synthesizer tool for any strategy — the bot can always go reach for the archive on demand if its starting context doesn't have what it needs.

Strategies are pluggable: each is a TS module under packages/memory/strategies/ that takes (guildId, channelId, presentUsers, topicSignal, ...) and returns an assembled context block. Adding a new strategy later is a single-file change. Post-MVP we expect this catalog to grow significantly as we learn what produces interesting variety in practice.
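A pluggable strategy could be as small as the sketch below. The `ContextRequest` fields follow the parameter list above; the registry functions and the idea that a module self-registers are assumptions, not a description of the actual package layout.

```typescript
// Mirrors (guildId, channelId, presentUsers, topicSignal, ...); further fields omitted.
type ContextRequest = {
  guildId: string;
  channelId: string;
  presentUsers: string[];
  topicSignal?: string;
};

type ContextStrategy = {
  id: string;
  assemble: (req: ContextRequest) => Promise<string>;
};

const strategyRegistry = new Map<string, ContextStrategy>();

// Registers a strategy once; duplicate ids are treated as a programming error.
function registerStrategy(strategy: ContextStrategy): void {
  if (strategyRegistry.has(strategy.id)) throw new Error(`duplicate strategy: ${strategy.id}`);
  strategyRegistry.set(strategy.id, strategy);
}

// Looks up a strategy by id and returns its assembled context block.
async function assembleContext(id: string, req: ContextRequest): Promise<string> {
  const strategy = strategyRegistry.get(id);
  if (!strategy) throw new Error(`unknown strategy: ${id}`);
  return strategy.assemble(req);
}

// What a single-file strategy module might contain: 'none' contributes no memory at all.
registerStrategy({ id: 'none', assemble: async () => '' });
```

A richer strategy such as live or serendipity would query memory inside `assemble`, but would plug into the same two functions, which is what keeps adding one a single-file change.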

Moment extraction (`MomentExtractor`)

Trigger inverted from compaction: "this just landed — does it deserve preserving?"

Signals (cheapest first): reaction counts, reply-graph engagement, bot's reply later quoted, self-disclosure patterns (regex + LLM verify), repeated phrases across the guild, time-gap after a message (rhetorical weight), sampled per-message cheap-tier scoring (purpose=extract).

Runs synchronously on synthesizer completion (scoped to the just-finished exchange) and async via reactor for batch consolidation (dossiers, lore, dedup).

Steward skills (Discord server management)

The bot doesn't just live in the server — it shapes it. A TS-implemented skill bundle the host or task workers can invoke:

create-channel — propose + execute new channel creation when topic clusters warrant it
create-thread — spawn threads for sub-conversations
archive-channel — create read-only channels where the bot posts curated artifacts (decisions, plans, group canon)
pin-moment — pin highlight-worthy messages
manage-roles — assign/create roles based on observed participation patterns
propose-restructure — suggest server reorganization to admins (never executes without confirmation)

Permissions: the bot requests full server-management scopes on install; destructive operations route through a confirmation flow that pings an admin in a dedicated #dit-mod channel auto-created on install.

The steward skills bridge the chat-AI architecture into actual environment-shaping. They're invoked by the host (for low-cost actions like thread-spawn during multi-line output) or dispatched as goals to task workers (for higher-cost or multi-step changes).

Walled-garden TS skill system

Skills are TypeScript modules in our repo. Each exports:

type Skill = {
  id: string;
  description: string;
  catalogEntry: string;
  permissions: SkillPermission[];
  tools: ToolDescriptor[];
  invoke: (ctx: SkillContext) => Promise<SkillResult>;
};

A system-prompt catalog (JSON-formatted, conceptually similar to AgentSkills' XML catalog but expressed in JSON for cleaner authoring and prompt fit) advertises available skills to the synthesizer and task workers. Skills are loaded by ID at boot, not discovered from filesystems at runtime.
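To make "loaded by ID at boot" and the JSON catalog concrete, here is a sketch. The supporting types are placeholders (the plan names them without defining them), the dice skill body is illustrative, and `loadSkills` and `renderCatalog` are hypothetical function names.

```typescript
// Placeholder types: named in the plan but not defined there.
type SkillPermission = string;
type ToolDescriptor = { name: string; description: string };
type SkillContext = { guildId: string; args: Record<string, unknown> };
type SkillResult = { ok: boolean; output: string };

type Skill = {
  id: string;
  description: string;
  catalogEntry: string;
  permissions: SkillPermission[];
  tools: ToolDescriptor[];
  invoke: (ctx: SkillContext) => Promise<SkillResult>;
};

// Illustrative skill; the behavior of the real dice skill is not specified.
const dice: Skill = {
  id: 'dice',
  description: 'Roll an N-sided die.',
  catalogEntry: 'dice: roll a die with a given number of sides',
  permissions: [],
  tools: [{ name: 'roll', description: 'Roll once; args: { sides: number }' }],
  invoke: async (ctx) => {
    const raw = ctx.args['sides'];
    const sides = typeof raw === 'number' && raw >= 2 ? Math.floor(raw) : 6;
    return { ok: true, output: String(1 + Math.floor(Math.random() * sides)) };
  },
};

// Resolves a fixed list of ids against compiled-in skills; nothing is discovered at runtime.
function loadSkills(available: Skill[], enabledIds: string[]): Map<string, Skill> {
  const registry = new Map<string, Skill>();
  for (const id of enabledIds) {
    const skill = available.find((s) => s.id === id);
    if (!skill) throw new Error(`unknown skill id: ${id}`);
    registry.set(id, skill);
  }
  return registry;
}

// Renders the JSON catalog that would be placed in the system prompt.
function renderCatalog(registry: Map<string, Skill>): string {
  const entries = Array.from(registry.values()).map((s) => ({
    id: s.id,
    summary: s.catalogEntry,
    tools: s.tools.map((t) => t.name),
  }));
  return JSON.stringify(entries, null, 2);
}
```

An unknown id fails at boot rather than at first use, which fits the walled-garden stance: the set of skills is whatever we compiled in and enabled.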

MVP skill bundle (~15):

Memory tools: memory_search, recall_user, recall_group
Steward tools: the steward skills listed above
Utility: weather, define, poll, dice, summarize_window
Light gen: gif (Tenor via our gateway)

No marketplace, no user-uploaded skills, no MCP. Walled garden.

Personality (constraint surface, not runtime prompt)

SOUL.md exists but plays a different role than in OpenClaw. It is NOT a runtime instruction inlined every turn. It is:

A constraint surface for the router (which voices are eligible, which never)
A constraint surface for the synthesizer (banned phrases, never-do list, tone bounds)
The seed for canonical dispositions

The persona is the distribution over dispositions the router actually picks, modulated by context. A funny persona is one where the comedian-firing dispositions are weighted higher in lull/casual contexts. A wise persona is one where settle-debate and callback-from-memory get higher weight. We don't tell the model "be funny" — we configure the router to pick funny-shaped dispositions when funny is appropriate.

Stored as DB rows for the bot's active persona (single persona for MVP). Markdown templates exist in workspace/ only as authoring conveniences for us to seed the DB.

Platform abstraction (Discord adapter)

The Discord adapter (apps/adapter-discord/) is the only Discord-aware code. It exposes a generic interface:

type PlatformAdapter = {
  onInboundMessage: (handler: (msg: InboundMessage) => void) => void;
  sendMessage: (target: ChannelRef, content: OutputBeat) => Promise<MessageRef>;
  addReaction: (msg: MessageRef, emoji: string) => Promise<void>;
  createChannel: (guild: GuildRef, spec: ChannelSpec) => Promise<ChannelRef>;
  createThread: (parent: ChannelRef, spec: ThreadSpec) => Promise<ChannelRef>;
  // ... etc
};

Host, router, memory, reactor, skills consume the generic interface. When we build our own platform later, we write a new adapter implementing this interface and the rest of the system doesn't change.
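One practical consequence is that the host can be exercised against a fake platform. The sketch below implements only the five methods listed above with an in-memory adapter; all the reference types are placeholders with guessed fields, and `InMemoryAdapter` and `simulateInbound` are hypothetical.

```typescript
// Placeholder reference types; the plan names them without defining their fields.
type GuildRef = { guildId: string };
type ChannelRef = { guildId: string; channelId: string };
type MessageRef = { channel: ChannelRef; messageId: string };
type InboundMessage = { ref: MessageRef; authorId: string; text: string };
type OutputBeat = { text: string };
type ChannelSpec = { name: string };
type ThreadSpec = { name: string };

// Subset of the interface: only the methods spelled out in the plan.
type PlatformAdapter = {
  onInboundMessage: (handler: (msg: InboundMessage) => void) => void;
  sendMessage: (target: ChannelRef, content: OutputBeat) => Promise<MessageRef>;
  addReaction: (msg: MessageRef, emoji: string) => Promise<void>;
  createChannel: (guild: GuildRef, spec: ChannelSpec) => Promise<ChannelRef>;
  createThread: (parent: ChannelRef, spec: ThreadSpec) => Promise<ChannelRef>;
};

class InMemoryAdapter implements PlatformAdapter {
  readonly sent: Array<{ target: ChannelRef; content: OutputBeat }> = [];
  readonly reactions: Array<{ msg: MessageRef; emoji: string }> = [];
  private handlers: Array<(msg: InboundMessage) => void> = [];
  private nextId = 1;

  onInboundMessage(handler: (msg: InboundMessage) => void): void {
    this.handlers.push(handler);
  }
  async sendMessage(target: ChannelRef, content: OutputBeat): Promise<MessageRef> {
    this.sent.push({ target, content });
    return { channel: target, messageId: `m${this.nextId++}` };
  }
  async addReaction(msg: MessageRef, emoji: string): Promise<void> {
    this.reactions.push({ msg, emoji });
  }
  async createChannel(guild: GuildRef, spec: ChannelSpec): Promise<ChannelRef> {
    return { guildId: guild.guildId, channelId: `c-${spec.name}` };
  }
  async createThread(parent: ChannelRef, spec: ThreadSpec): Promise<ChannelRef> {
    return { guildId: parent.guildId, channelId: `${parent.channelId}/t-${spec.name}` };
  }
  // Test helper: simulates a message arriving from the platform.
  simulateInbound(msg: InboundMessage): void {
    for (const handler of this.handlers) handler(msg);
  }
}
```

Anything written against `PlatformAdapter` runs unchanged on this fake, on the Discord adapter, and on a future custom-platform adapter.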

DIT Model Gateway (purpose-based routing)

OpenAI-compatible HTTP service (apps/gateway-llm) in front of Anthropic + OpenAI + one cheap third option.

Required headers: x-dit-guild, x-dit-purpose. Purpose enum:

router-classify — tie-breaker classifier (cheapest tier, single-token-ish)
subagent-research — research subagent calls (cheap-to-mid)
synth-low / synth-mid / synth-high — synthesizer tiers
extract — moment extraction (cheap)
embed — embeddings
epoch-summarize — epoch summarization (mid)
steward — task-worker invocations for steward operations

Per-purpose model mapping is policy-driven (DB). Per-request flow: authn (HMAC service token, never user-facing) → resolve guild policy → map purpose → tier → quota/budget check (Redis bucket) → upstream stream → emit usage event to Postgres. Failover taxonomy per tier (provider rotation on rate-limit / 5xx).
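The header check, purpose-to-model mapping, and failover steps of that flow can be sketched as below. This is a synchronous illustration under several assumptions: the Policy shape only loosely mirrors gateway_policies, "rate-limit" is taken to mean HTTP 429, the model names are placeholders, and the `attempt` callback stands in for the real upstream call. Authn, quota checks, and usage events are omitted.

```typescript
// Purpose values as listed in the plan.
const PURPOSES = [
  "router-classify", "subagent-research", "synth-low", "synth-mid", "synth-high",
  "extract", "embed", "epoch-summarize", "steward",
] as const;
type Purpose = (typeof PURPOSES)[number];

// Hypothetical in-memory view of a gateway_policies row.
interface Policy {
  purpose: Purpose;
  modelPrimary: string;
  modelFallbacks: string[];
}

function isPurpose(v: string): v is Purpose {
  return (PURPOSES as readonly string[]).indexOf(v) !== -1;
}

// Validate the required headers and resolve the ordered list of models to try.
function resolveCandidates(
  headers: Record<string, string | undefined>,
  policies: Policy[],
): { guild: string; purpose: Purpose; models: string[] } {
  const guild = headers["x-dit-guild"];
  const purpose = headers["x-dit-purpose"];
  if (!guild) throw new Error("missing x-dit-guild");
  if (!purpose || !isPurpose(purpose)) throw new Error("missing or unknown x-dit-purpose");
  const policy = policies.filter((p) => p.purpose === purpose)[0];
  if (!policy) throw new Error(`no policy for purpose ${purpose}`);
  return { guild, purpose, models: [policy.modelPrimary, ...policy.modelFallbacks] };
}

// Rotate on rate-limit (assumed to be HTTP 429) or 5xx; surface everything else.
function shouldRotate(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Walk the candidates in order, rotating only on retryable statuses.
// `attempt` returns the upstream HTTP status for a given model.
function pickModel(models: string[], attempt: (model: string) => number): string {
  for (const model of models) {
    const status = attempt(model);
    if (status >= 200 && status < 300) return model;
    if (!shouldRotate(status)) throw new Error(`upstream ${model} failed with ${status}`);
  }
  throw new Error("all providers exhausted");
}
```

Keeping the rotate-or-surface decision in one small predicate means a client error such as a 400 fails fast instead of burning through every fallback provider.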

The budget envelope (budget_envelopes table) tracks spend per guild per period; the reactor reads it for speculative routing decisions and clamps high-cost dispositions when the remaining budget is low.
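One possible shape for the clamp, as a sketch only: when the remaining fraction of the envelope drops below a low-water mark, candidates above a cost cap are filtered out before scoring. The 20% low-water mark, the $0.01 cost cap, and the per-candidate estCostUsd field are all assumptions, not values from the plan.

```typescript
// Hypothetical in-memory view of a budget_envelopes row.
interface BudgetEnvelope {
  allocatedUsd: number;
  spentUsd: number;
}

// Hypothetical router candidate with an estimated cost.
interface Candidate {
  name: string;
  score: number;
  estCostUsd: number;
}

// Fraction of the envelope still unspent, clamped to [0, 1].
function remainingFraction(env: BudgetEnvelope): number {
  if (env.allocatedUsd <= 0) return 0;
  const left = Math.max(0, env.allocatedUsd - env.spentUsd);
  return Math.min(1, left / env.allocatedUsd);
}

// Below the low-water mark, drop candidates whose estimated cost exceeds the cap.
function clampByBudget(
  candidates: Candidate[],
  env: BudgetEnvelope,
  lowWater: number = 0.2, // assumed threshold
  costCapUsd: number = 0.01, // assumed cap
): Candidate[] {
  if (remainingFraction(env) >= lowWater) return candidates;
  return candidates.filter((c) => c.estCostUsd <= costCapUsd);
}
```

Treating the envelope as a filter on the candidate set, rather than as a score penalty, keeps the behavior easy to audit in the decision log: a disposition was either eligible or it was not.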

Data model (Postgres + pgvector)

Core:

guilds(id, discord_id, name, config_json, persona_id)
personas(id, name, soul_text, banned_phrases_json, voice_eligibility_json, disposition_weights_json)
users(id, discord_id, global_facts_json)
transcripts(guild_id, channel_id, seq, role, author_id, content, tool_calls_json, usage_json, ts) — raw, immutable

Memory:

dossiers(guild_id, user_id, body_text, signals_json, msg_count, last_seen_at, updated_at)
highlights(id, guild_id, channel_id, user_id?, kind, raw_text, surrounding_context, score, embedding vector(1536), created_at, source_msg_id)
group_lore(id, guild_id, kind, body_text, score, embedding vector(1536), updated_at)
transcript_archive(id, guild_id, channel_id, author_id, content, embedding vector(1536), ts)
epochs(id, guild_id, channel_id, start_ts, end_ts, summary_text, topics_json, participants_json, embedding vector(1536), score, closed_at)

Disposition system:

dispositions(id, name, dimensions_json, fanout_plan_json, self_followup_hint_json?, speculative_only, created_at) — seeded canonical set, mutable. dimensions_json carries {reasoningDepth, contextStrategy, voice, outputShape, searchPosture, riskAppetite}.
disposition_decisions(id, guild_id, channel_id, ts, inbound_msg_id?, trigger_kind, candidate_scores_json, chosen_disposition_id, stochasticity_seed, dimensions_used_json, context_strategy_used, subagents_fired_json, output_shape, beat_count, scheduled_followup_id?, latency_ms, cost_usd, model_used) — trigger_kind is reactive / speculative / self_followup.
subagent_runs(id, decision_id, persona, input_payload_json, output_payload_json, latency_ms, cost_usd, model_used, error?)
self_followup_schedules(id, parent_decision_id, scheduled_for, bias, status, fired_decision_id?, aborted_reason?) — tracks the lifecycle of pending self-followup turns from the reactor.
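Because dimensions_json is a free-form JSON column, the application layer would likely validate it on read. A minimal sketch, assuming each of the six dimensions is stored as a non-empty string (the plan names the keys but not their value types); parseDimensions is a hypothetical helper.

```typescript
// The six dimension keys named in the dispositions table description.
const DIMENSION_KEYS = [
  "reasoningDepth",
  "contextStrategy",
  "voice",
  "outputShape",
  "searchPosture",
  "riskAppetite",
] as const;
type DimensionKey = (typeof DIMENSION_KEYS)[number];

// Assumption: every dimension value is a plain string label.
type Dimensions = Record<DimensionKey, string>;

// Parse and validate a raw dimensions_json value, rejecting missing or non-string keys.
function parseDimensions(raw: string): Dimensions {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("dimensions_json must be a JSON object");
  }
  const obj = value as Record<string, unknown>;
  const out = {} as Dimensions;
  for (const key of DIMENSION_KEYS) {
    const v = obj[key];
    if (typeof v !== "string" || v.length === 0) {
      throw new Error(`dimensions_json.${key} must be a non-empty string`);
    }
    out[key] = v;
  }
  return out;
}
```

Unknown extra keys are silently dropped here; a stricter variant could reject them so that typos in seeded dispositions surface early.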

Reactor:

reactor_jobs(id, kind, scheduled_for?, predicate_json?, payload_json, status, attempts, last_error, fired_at?, completed_at?)
reactor_events(id, kind, payload_json, ts) — append-only event log
budget_envelopes(guild_id, period_start, period_end, allocated_usd, spent_usd, refresh_at)

Gateway:

usage_events(id, guild_id, purpose, model, input_tok, output_tok, cost_usd, latency_ms, ts)
gateway_policies(guild_id?, purpose, model_primary, model_fallbacks_json, quotas_json)

(SaaS tables — tenants, stripe_*, sleep_policies — deferred to post-MVP.)

Repo layout

dit/
  apps/
    bot/                  # host harness orchestration entry point
    adapter-discord/      # discord.js shards, REST, platform-adapter impl
    gateway-llm/          # OpenAI-compatible front of providers
    api/                  # admin/dashboard (minimal in MVP)
  packages/
    agent-core/           # vendored from OpenClaw; compaction disabled
    host/                 # host harness loop, two-tier dispatch
    disposition/          # router, scoring, stochasticity, canonical set
    fanout/               # subagent fan-out runner
    synthesizer/          # synthesizer prompt building + dispatch
    subagent-personas/    # the 5 research personas as TS modules
    output-dispatcher/    # output shape interpretation, burst + interrupt handling
    reactor/              # event bus, scheduler, predicates, BullMQ wiring
    memory/               # layered memory, ContextAssembler, MomentExtractor, epochs
    skills/               # walled-garden TS skill registry + bundled skills
    steward/              # Discord server management skills
    platform/             # PlatformAdapter interface + types
    db/                   # Drizzle schema + migrations
    shared/               # types, logger, config
  workspace/              # authoring-time templates for seeding personas (not runtime)

Tech stack

Node 20, TypeScript 5, ESM, pnpm workspaces
discord.js v14 (in adapter only)
Fastify (api + gateway-llm)
Postgres 16 + pgvector, Redis 7
Drizzle ORM
BullMQ on Redis (reactor substrate)
Hosting: Fly.io for bot + adapter (multi-region for Discord gateway latency); Render or Fly for api + gateway-llm; Neon or Supabase for Postgres; Upstash for Redis
OpenTelemetry → Grafana Cloud
Stripe — NOT in MVP

Patterns we copy from OpenClaw

Per-channel execution serialization (concurrency primitive only — memory/context remains guild-wide) — extensions/discord/src/monitor/message-handler.ts, extensions/discord/src/monitor/message-run-queue.ts
Preflight drops before any LLM call (bot-self, allowlist, mention decision) — src/channels/mention-gating.ts
Pending-history buffer pattern, generalized to guild-wide in our version — src/auto-reply/reply/history.ts
discord.js's built-in REST bucket scheduler (we use the library version, not OpenClaw's port)
Skill catalog concept in system prompt (we use JSON, OpenClaw uses XML) — src/agents/skills/skill-contract.ts
Failover taxonomy + provider rotation for gateway — src/agents/model-fallback.ts
packages/agent-core agent loop + AgentSkills loader vendored for task workers — packages/agent-core/src/agent-loop.ts, packages/agent-core/src/harness/skills.ts
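The first pattern in the list, per-channel execution serialization, can be sketched as a promise chain keyed by channel ID. This is not OpenClaw's implementation (which is not reproduced here); KeyedSerialQueue is a hypothetical name, and the sketch shows only the concurrency primitive, with no retries or backpressure.

```typescript
// Runs tasks for the same key one at a time, in arrival order.
// Tasks for different keys are independent and may interleave.
class KeyedSerialQueue {
  private tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    // The stored tail never rejects, so chaining on it is always safe.
    const prev = this.tails.get(key) ?? Promise.resolve();
    const result = prev.then(() => task());

    // The new tail settles when this task settles, whether it succeeded or failed.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);

    // Drop the entry once the channel is idle so the map does not grow unbounded.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });

    return result;
  }
}
```

A failed turn in one channel rejects only its own returned promise; the next queued turn for that channel still runs, and other channels are never blocked.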

Patterns we explicitly do NOT copy

Plugin manifest / discovery / loader / activation planning
ChannelPlugin mega-adapter (20+ optional sub-adapters) — src/channels/plugins/types.plugin.ts
WebSocket gateway server protocol — Discord is our platform; we don't reimplement a gateway
3500-line embedded runner — src/agents/embedded-agent-runner/run.ts. Its responsibilities are spread across host + reactor + gateway in our model.
Custom Discord client (extensions/discord/src/internal/*) — discord.js wins
Auth profile vault on disk — single tenant (us) holding pooled provider keys in the gateway
Threshold-based compaction (packages/agent-core/src/harness/compaction/compaction.ts) — replaced by layered memory + epochs
Markdown skills discovered at runtime — walled garden TS modules
Cron / wall-clock-grid schedulers — reactor only
Static SOUL.md injected as runtime prompt — SOUL is constraint surface, disposition router is runtime personality
Workshop, ClawHub, ACP, voice, browser, doctor, setup wizard
SaaS onboarding, billing, sleep mechanic (post-MVP)

Internal alpha framing

MVP target: bot installed in our own Discord server, single guild, single persona, no multi-tenancy, no billing, no public OAuth flow. Goal is to validate the architecture in real chat:

Does the disposition router pick well? (decision log analysis)
Do the subagent personas produce useful material? (synthesizer's ignore-rate)
Does the burst + self-followup combo produce a natural stream-of-consciousness feel, or does it read as choppy / theatrical?
Do conversation epochs preserve enough texture?
Does speculative routing get the bar right (memorable vs annoying)?
Do steward skills shape the server usefully or destructively?

Decision-log dashboard is a first-class deliverable so we can iterate on routing in production.

Out of scope for MVP

SaaS onboarding, multi-tenancy, OAuth flow for end users
Billing, Stripe, sleep mechanic, quota tiers
Public marketplace, user-uploaded skills
MCP server support, BYO API key
Voice, image generation, video
Other platforms (Slack, Telegram)
Geo discovery layer
Learning loop (engagement feedback → router scoring)
Mood drift (short-term router bias)
Additional output shapes (gif-only, emoji-only, thread-spawn, react-burst)
Per-guild persona variation
Latency dimension (deliberate delayed callbacks)
Web-search subagent persona
Multi-persona per server
Privacy / forget commands

Post-MVP roadmap

In rough priority order, but explicitly NOT committed:

Multi-tenancy + onboarding — public OAuth install, per-guild persona seeding, dashboard.
Billing + sleep mechanic — Stripe, anthropomorphized free-tier sleep, quotas, per-tier model tier access.
Learning loop — engagement signals (reactions on bot messages, reply rates, callback survival) feed back into router scoring weights. Per-guild weight drift.
Mood drift — short-term router bias from recent outcomes (successful joke → playful voice boosted for ~1h; getting ignored → spicy risk cooled).
More output shapes — emoji-only, gif-only, react-burst, thread-spawn, archive-channel-post.
Latency dimension re-introduced — deliberate delayed callbacks ("earlier you said X..." 30 min later via reactor scheduling).
Web-search subagent persona — for topic-researcher when archive insufficient.
Multi-modal — image generation skill, voice replies.
Marketplace / external skills — opening the walled garden carefully.
Other platforms — Slack adapter, Telegram adapter.
Custom platform — the long-term goal. Discord adapter becomes one of many.
Geo discovery layer — find-your-people feature from the original brief.

Phased delivery (rough sizing — internal alpha target)

Phase 0 — Spike (~1w): repo bootstrap, discord.js connect via adapter, gateway proxies one call end-to-end, "@bot hi" → reply through the harness skeleton.
Phase 1 — Host loop + disposition router skeleton (~2w): Postgres + Drizzle, two-tier loop, disposition router with full dimension set + 9 canonical dispositions seeded, heuristic scoring, simple synthesizer (no fanout yet — synthesizer gets full memory directly), TS skill registry with 2-3 utility skills, decision log writing. Proves the architecture end-to-end.
Phase 2 — Reactor (~1.5w): event bus + durable scheduler + state-predicate triggers over BullMQ, declarative trigger DSL, job lineage logging. Migrate any "background" work scheduled in Phase 1 onto reactor primitives.
Phase 3 — Layered memory + conversation epochs (~2.5w): the 5 memory layers end-to-end, ContextAssembler per-disposition assembly, MomentExtractor sync + reactor-batched, epoch close detection + summarization, memory_search tool. This is the memory moat.
Phase 4 — Subagent fan-out + synthesizer pattern (~2w): the 5 research personas, parallel dispatch under disposition control, structured payload return, synthesizer-as-taste with soft context. Wire dispositions to fanout plans.
Phase 5 — Speculative routing + budget envelope (~1w): reactor composite trigger for speculative turns, budget envelope as routing input, stochasticity temperature. Bot starts speaking unprompted (well-timed).
Phase 6 — Burst output + self-followup turns (~1.5w): burst output shape end-to-end (multi-message in sequence, natural typing-indicator behavior, no programmed delays); reactor-mediated self-followup turns with all bias types (go-deeper, self-correct, contrarian-self, add-nuance, callback-later); interruption handling for both in-burst and pending self-followup.
Phase 7 — Steward skills + curated bundle (~1.5w): the Discord server-management skill bundle, install-time #dit-mod channel auto-create, confirmation flow, ~5 additional utility skills.
Phase 8 — Hardening + internal dogfood (~1.5w): shard the bot, observability, decision-log dashboard, real usage in our server, iteration on routing weights.

Total: ~14.5 weeks of focused work to internal alpha (the sum of the phase estimates above). Post-MVP roadmap starts when we're confident the architecture holds up in real chat.

Open questions (resolve before / during Phase 1)

Hosting concretes (Fly regions, Postgres provider — Neon vs Supabase, Upstash vs Fly Redis)
Initial budget envelope per guild for internal alpha (USD/day)
Whether SOUL.md authoring lives in repo workspace/ or in a small admin UI from day 1
Exact gap thresholds for epoch close (default 3h; configurable per channel?)
Stochasticity temperature defaults per context bucket
Steward skills: which actions auto-execute vs require admin confirmation in #dit-mod
Project / bot name finalization (D.I.T. internal, but the visible Discord persona name?)
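For the epoch-close question above, one option is a global 3h default with optional per-channel overrides. The sketch below assumes epochs close purely on silence gaps and that message timestamps arrive sorted; the override map, the function names, and the Epoch shape are hypothetical.

```typescript
const DEFAULT_GAP_MS = 3 * 60 * 60 * 1000; // the 3h default named in the question

// Per-channel override if configured, otherwise the global default.
function gapThresholdMs(
  channelId: string,
  overrides: Record<string, number | undefined>,
): number {
  return overrides[channelId] ?? DEFAULT_GAP_MS;
}

interface Epoch {
  startTs: number;
  endTs: number;
  count: number;
}

// Split sorted message timestamps into epochs wherever the silence exceeds the gap.
function splitIntoEpochs(timestampsMs: number[], gapMs: number): Epoch[] {
  const epochs: Epoch[] = [];
  for (const ts of timestampsMs) {
    const current = epochs[epochs.length - 1];
    if (current && ts - current.endTs <= gapMs) {
      current.endTs = ts;
      current.count += 1;
    } else {
      epochs.push({ startTs: ts, endTs: ts, count: 1 });
    }
  }
  return epochs;
}

// The latest epoch is closable once the channel has been silent longer than the gap.
function isClosable(epoch: Epoch, nowMs: number, gapMs: number): boolean {
  return nowMs - epoch.endTs > gapMs;
}
```

Running the same message history through different thresholds is a cheap way to compare candidate defaults before committing to one.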
