feat: multi-agent team command (/team) by marioidival · Pull Request #10 · marioidival/limit

marioidival · 2026-03-16T20:13:35Z

Summary

Multi-agent team system with the /team command — specialized roles (PM, TL, Jr) coordinate complex tasks through a structured 6-phase workflow with actor-based orchestration, parallel task execution, TUI streaming, build verification, and SQLite persistence.

106 commits | 60 files changed | +9,353 / -129 lines

Architecture

User Request
    │
    ▼
┌─────────────────────────────────────────────────┐
│              OrchestratorActor                   │
│  (state machine, tokio task, oneshot I/O)       │
│                                                  │
│  Phase 1: PM Analysis                            │
│  Phase 2: TL Plan                                │
│  Phase 3: TL Breakdown  → parse_tasks()          │
│  Phase 4: Jr Execution   → buffer_unordered       │
│  Phase 5: TL Validation  → build + verify         │
│  Phase 6: PM Delivery                            │
└──────────┬──────────────┬───────────┬────────────┘
           │              │           │
     ┌─────▼───┐   ┌─────▼───┐  ┌───▼──────────┐
     │AgentActor│   │AgentActor│  │  AgentActor[]│
     │  (PM)   │   │  (TL)   │  │  (Jr x N)   │
     └────┬────┘   └────┬────┘  └─────┬───────┘
          │             │             │
     ┌────▼────┐   ┌───▼────┐   ┌───▼──────────┐
     │TeamAgent│   │TeamAgent│   │  TeamAgent   │
     │(stream) │   │(stream) │   │ (tools)      │
     └─────────┘   └─────────┘   └──────────────┘

Evolution

See review comments below for a chronological walkthrough of each major design decision.

Phase 1: Foundation (commits 1-10)

Core types (Role, Task, TeamHistory), TeamAgent with LLM streaming, 6-phase workflow pipeline, /team CLI command, ADR-001.

Phase 2: Configuration & Hardening (commits 11-18)

Config-driven setup with per-role tool whitelists, error recovery with exponential backoff, MockLlmProvider for testing, role defaults, JSON persistence, integration tests.

Phase 3: Real Parallelism & Performance (commits 19-28)

True parallel Jr execution via buffer_unordered, prompt_stream() for streaming, phase logging, TeamSnapshot/TeamStore persistence, comprehensive docs.

Phase 4: Security & Quality (commits 29-35)

Fixed critical parallelism bug (serialized despite buffer_unordered), Jr tool restriction, PR review fixes, path sanitization with canonicalization, Arc<Vec<Message>> for COW history, ToolCallAccumulator extraction, parking_lot::Mutex.

Phase 5: Actor System Rewrite (commits 36-42)

Replaced sequential workflow with actor model: ActorRef<TeamMessage>, OrchestratorActor, AgentActor, oneshot request-response pattern, topological task dependency ordering, TeamProgressEvent system for TUI.
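The request-response pattern named above can be sketched with the standard library alone. This is an illustration, not the PR's code: the PR runs actors as tokio tasks with oneshot channels, whereas this sketch substitutes OS threads and `std::sync::mpsc`, and the `TeamMessage` variants, `ask`, and `spawn_agent` shown here are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type; the real `TeamMessage` likely has more variants.
enum TeamMessage {
    /// A prompt plus a single-use reply channel (stand-in for a oneshot sender).
    Prompt { text: String, reply: Sender<String> },
    Shutdown,
}

/// Cloneable handle to an actor's bounded mailbox.
#[derive(Clone)]
struct ActorRef {
    mailbox: SyncSender<TeamMessage>,
}

impl ActorRef {
    /// Send a prompt and block until the actor replies.
    /// Returns `None` if the actor has already stopped.
    fn ask(&self, text: &str) -> Option<String> {
        let (reply, rx) = mpsc::channel();
        self.mailbox
            .send(TeamMessage::Prompt { text: text.to_string(), reply })
            .ok()?;
        rx.recv().ok()
    }

    /// Ask the actor to stop after the messages already queued.
    fn shutdown(&self) {
        let _ = self.mailbox.send(TeamMessage::Shutdown);
    }
}

/// Spawn an actor that owns its state and handles one message at a time.
fn spawn_agent(role: &'static str) -> (ActorRef, JoinHandle<()>) {
    let (tx, rx) = mpsc::sync_channel(32); // bounded mailbox, capacity 32
    let handle = thread::spawn(move || {
        for msg in rx {
            match msg {
                TeamMessage::Prompt { text, reply } => {
                    // A real agent would call the LLM here.
                    let _ = reply.send(format!("[{role}] {text}"));
                }
                TeamMessage::Shutdown => break,
            }
        }
    });
    (ActorRef { mailbox: tx }, handle)
}
```

Because each request carries its own reply channel, the orchestrator never has to correlate responses with requests, and a stopped actor shows up as a failed send rather than a hang.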

Phase 6: Observability & Telemetry (commits 43-48)

Detailed actor execution logging, file tracking at tool level (replacing regex extraction), token usage accumulation via Arc<AtomicU64>, TUI progress panel widget with spinner, phase bar, task list.

Phase 7: Per-Role Flexibility (commits 49-52)

Per-role model/provider selection with config inheritance, SQLite persistence replacing JSON, dead code removal.

Phase 8: Self-Referential Tools & Recursion (commits 53-58)

ToolRegistry interior mutability via RwLock, team_start tool for recursive team spawning with recursion depth guard.

Phase 9: Build Verification & Validation (commits 59-66)

Compilation check after Jr execution, structured TL validation (PASS/FAIL per task), file_edit retry on concurrent modification, TL-suggested build command replacing hardcoded detection.

Phase 10: Prompt Engineering & Role Separation (commits 67-72)

DoD per task, Jr max_tool_rounds 15→8, token-efficient prompts with CONTEXT blocks, structured handoff format (TASK/FILE_TARGETS/IMPLEMENTATION_HINTS/DoD), improved role separation.

Phase 11: P0 Fixes (commits 73-82)

Legacy workflow removal, per-phase timeout (120s), CAS loop for recursion depth, is_retryable hardening, validation failure surfacing in PM delivery, PM tool removal.

Phase 12: P1/P2 Improvements (commits 83-93)

TokenUpdate/StreamChunk progress events, PM/TL streaming to TUI, validation retry loop, token cost surfacing, robust task parsing (case-insensitive, CONTEXT-aware, markdown prefix stripping, numeric deps), multiple CONTEXT block handling in truncate_context.

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Actor model over sequential loop | True parallelism, bounded mailboxes, clean shutdown |
| Per-role provider override | Cost optimization (cheap Jr model) + quality (strong PM/TL model) |
| TL-suggested build command | Works for any language, not just heuristic file probes |
| Topological dependency ordering | Tasks with DEPENDS_ON wait for completion, deadlock detection |
| CONTEXT blocks in TL breakdown | Jr agents don't waste tool calls re-reading files |
| Structured handoff format | TL provides FILE_TARGETS + IMPLEMENTATION_HINTS + DoD |
| Validation retry loop | Failed tasks get one retry with "VALIDATION RETRY" context |
| `parse_tasks` state machine | Robust to LLM output variations (casing, nesting, prefixes) |

Config

[team]
default_juniors = 2
max_parallel_tasks = 4

[team.roles.pm]
model = "glm-5-turbo"

[team.roles.tl]
model = "glm-5-turbo"

[team.roles.jr]
model = "glm-4.7"

Test Plan

700+ unit tests pass across all crates
0 clippy warnings
Integration test with MockLlmProvider covers full pipeline
Integration test for validation retry loop
Manual testing with real LLM (nuveo team, 9 tasks, recursive)
E2E smoke tests with real LLM providers


- Add Role enum (PM, TL, Jr) with system_prompt() and label()
- Add system prompts for each role via include_str!
  - PM: requirement analysis and product vision
  - TL: architecture, task breakdown, validation
  - Jr: task execution with tools


- Team struct composing PM, TL, and Jr agents with shared history
- TeamConfig with num_juniors and max_parallel_tasks
- Public re-exports: Role, Team, TeamAgent, TeamConfig, TeamEvent, TeamHistory, TeamResult
- Register team module in lib.rs

- /team create --name <n> [--juniors N]: register team
- /team delete <n>: remove team
- /team list: show all teams
- /team status <n>: show team info
- /team start --team <n> --task <desc>: execute with real LLM provider
- /team history <n>: show execution events

/team start loads config, creates provider via ProviderFactory, builds a ToolRegistry with all 17 tools, and runs the full PM→TL→Jr→TL→PM workflow. Register TeamCommand in default registry.


…mmand trait

Root cause: tokio::runtime::Handle::current().block_on() panics when called from within a thread that has no active tokio runtime. The TUI runs commands synchronously in the main event loop thread, which uses std::thread, not tokio.

Changes:
- Replace Arc<RwLock<HashMap>> with std::sync::Mutex<HashMap> for teams storage (no async runtime needed for create/delete/list/status)
- Move /team start execution to a dedicated std::thread::spawn with its own tokio::runtime::Runtime::new() (same pattern as handle_enter)
- /team history uses a fresh Runtime::new() for the single .await
- Write results back to ChatView via Arc<Mutex<ChatView>> instead of CommandContext (which is not Send)

Phase 3 staged changes:
- Add `team_raw: Option<toml::Value>` to `limit-llm::Config` for opaque `[team]` section passthrough (avoids coupling limit-llm to limit-agent)
- Add `RoleConfig`, `TeamSection`, `TeamRolesSection` types with full TOML deserialization and sensible defaults (PM: no tools, TL: bash, Jr: all tools)
- Add `TeamAgent::with_allowed_tools()` to create filtered tool registries
- Add `ToolRegistry::register_arc()` for sharing pre-built Arc<dyn Tool>
- Update `TeamConfig` with `roles: TeamRolesSection` and `TeamConfig::from_section()` builder
- Update `Team::new()` to pass per-role tool whitelists to agents
- Update `/team start` to read config from `config.toml` `[team]` section
- Make `Role` enum `Copy` (no data variants)
- Refactor bridge acquisition pattern in app_impl.rs
- Update all test fixtures with `team_raw` field
- Add `toml = "0.8"` dependency to limit-agent
- Update ADR with implementation progress and design divergence appendix

Phase 3 & 4 remaining items:

Error recovery:
- Exponential backoff retry on transient LLM errors (rate limits, 429, 502, 503, timeout, overloaded). Max 3 retries with 1s/2s/4s delays.
- Jr task retry once on failure before marking as failed.
- is_retryable() heuristic for transient error detection.

Observability:
- EventLevel enum (Info/Warn/Error) on TeamEvent with Display impl.
- TeamHistory::add_warn(), add_error(), error_count() helpers.
- TeamResult now tracks failed_tasks, total_tasks, total_retries.

CLI improvements:
- /team start shows task success/failure counts and actionable error hints (rate limit, auth, timeout).
- /team history shows summary header with error/warning counts.
- /help includes team command references.
- TeamError and LlmError variants added to AgentError.

Tests: 76 agent + 296 cli + full workspace pass (0 failures).


…erialize
- Add `enable_streaming: bool` field (default: true)
- Derive Serialize + Deserialize on TeamConfig for persistence
- Wire enable_streaming through from_section()
- Re-export TeamSnapshot and TeamStore from lib.rs

Returns `impl Stream<Item = Result<String, AgentError>>` that yields content deltas as they arrive from the provider. Handles tool calls automatically with retry logic matching prompt(). Refactor: extract is_retryable_e() for reuse between prompt and prompt_stream, use is_some_and() for history guard.

- Replace sequential for-loop with buffer_unordered(max_parallel) for Jr task execution (rename _max_parallel → max_parallel)
- Add files_modified: Vec<String> to TeamResult
- Implement extract_modified_files() via regex on task outputs
- Add log_phase() to emit phase:XXX events for TUI progress
- Derive Default for EventLevel enum

…t/teams/
- TeamSnapshot: serializable team state (name, config, history, run_count)
- TeamStore: save/load/delete/list operations on JSON files
- Sanitizes team names with special characters
- Default directory: ~/.limit/teams/ (fallback to temp dir)
- Add dirs and tempfile deps to limit-agent

- Integrate TeamStore into TeamCommand (save on create, delete on delete, load on list)
- Report 🔄 phase transitions during /team start execution
- Show 📁 Files section when files_modified is non-empty
- Fix missing max_parallel_tasks in handle_create config

10 tests covering the full PM → TL → Jr pipeline:
- test_full_team_workflow: validates solution, tasks, files_modified
- test_workflow_phases_in_order: checks all 6 phases in sequence
- test_workflow_records_pm/tl/jr_events: per-role event verification
- test_workflow_empty_tasks_returns_early: early return path
- test_workflow_duration_is_positive
- test_team_create_and_events: team structure validation
- test_team_reset_clears_state
- test_team_persistence_save_load


Critical #1: Parallel execution was serialized
- Changed from single Mutex<&mut [TeamAgent]> to Vec<Arc<Mutex<TeamAgent>>>
- Each Jr agent now has its own mutex, enabling true parallel execution
- buffer_unordered now executes tasks in parallel across different agents

Critical #2: Jr agents had unrestricted tool access
- Changed default from tools: None (all tools) to Role::Jr.default_tools()
- Jr now only has safe tools: file_read, file_write, file_edit, bash
- Dangerous tools (git_push, web_fetch, etc.) require explicit opt-in

Both issues identified in PR #10 review were valid and have been fixed.

High Priority Fixes:
- Fix build_llm_tools() returning empty Vec - now builds tool definitions from registry
- Fix ToolCallDelta argument accumulation - handles both Object and String fragments

Medium Priority Fixes:
- Improve is_retryable() with word-boundary matching to avoid false positives
- Add trim_history() to prevent unbounded conversation history growth
- Enhance path sanitization in TeamStore with protection against .., null bytes, reserved names

Low Priority Fixes:
- Track total_retries using AtomicUsize for thread-safe parallel task tracking
- Use OnceLock for regex compilation to avoid repeated compilation

All 125 tests pass.

Replace generic "note any issues" TL prompt with structured per-task PASS/FAIL assessment. Include compilation output in prompt and require TL to explicitly FAIL tasks that caused build errors. Also adds progress_tx to AgentActor and JrExecute sub-status updates.

Add retry loop (max 3 attempts, 100ms backoff) to file_edit tool. On "old_text not found" or write failure, re-read the file and retry instead of failing immediately. Prevents race conditions when multiple Jr agents edit the same file concurrently.


Remove run_compilation_check() that detected build systems via file probes. Instead, ask TL to suggest a build command based on the project context (TlSuggestBuildCommand message). The orchestrator parses the response, runs it via `sh -c`, and passes results to TlValidate for assessment.

The serde per-field default (default_tools_empty) gave Some(vec![]) to ALL roles when `tools` was absent from TOML, including Jr which needs file_read/file_write/file_edit/bash. TeamRolesSection::default() was only used when the role section was completely absent, not when partially specified (e.g., provider+model without tools). Now absent `tools` deserializes as None, then normalize_tools() fills in role-specific defaults: PM/TL → empty, Jr → restricted tool set.
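A minimal sketch of the normalization step described above, assuming `normalize_tools` takes the role and the deserialized `Option` and returns the effective tool list (the actual signature is not shown in this PR page). The key point is that an absent `tools` key stays `None` until the role is known, while an explicit empty list is respected.

```rust
/// Simplified stand-in for the PR's `Role` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Pm,
    Tl,
    Jr,
}

impl Role {
    /// Role-specific defaults as stated in the commit message:
    /// PM/TL get no tools, Jr gets the restricted set.
    fn default_tools(self) -> Vec<String> {
        match self {
            Role::Pm | Role::Tl => Vec::new(),
            Role::Jr => ["file_read", "file_write", "file_edit", "bash"]
                .iter()
                .map(|s| s.to_string())
                .collect(),
        }
    }
}

/// `None` means the `tools` key was absent from TOML, so fall back to the
/// role default. `Some(list)` is an explicit choice, even when it is empty.
fn normalize_tools(role: Role, tools: Option<Vec<String>>) -> Vec<String> {
    tools.unwrap_or_else(|| role.default_tools())
}
```

With a per-field serde default of `Some(vec![])`, the first branch could never fire, which is how Jr ended up with no tools whenever a role section was only partially specified.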

…lidation
- TL breakdown now requires DEFINITION_OF_DONE (2-4 criteria per task)
- Jr prompt references DoD to scope work and avoid exploration bloat
- Jr max_tool_rounds: 15 → 8 (most tasks complete in 3-6 rounds)
- TL validation now fails tasks that caused build errors (was ignored)
- Trim build output to 3000 chars to reduce token bloat in TL context

- Parse **FAIL** lines from TL validation output instead of ignoring them
- Include validation failures in failed_tasks count (DB status follows)
- Fix hardcoded Finished(true) to reflect actual failure state
- Guard against empty/exploratory TL plans (< 200 chars or "explore")
- Strengthen TlPlan prompt to forbid exploration responses

- Rewrite TL system prompt: 4 distinct phases, NO tools for 1-3, tools (bash, file_read) for validation phase 4
- Refine Jr system prompt: explicit 5-step workflow, restrict bash to build/test only (no ls/find/cat exploration)
- Add structured output format to PM analysis (Core Problem, Key Requirements, Constraints, Success Criteria)
- Allow CONTEXT blocks in TlBreakdown so TL provides file contents to Jr, eliminating exploration waste
- TL validate now runs build command itself with tools instead of receiving truncated build_output from orchestrator
- Remove orchestrator's run_command() and 3000-char truncation
- TL default tools: ["bash", "file_read"], max_tool_rounds: 15

PM now has bash + file_read (5 tool rounds) to explore the project before analyzing, preventing wrong stack selection (e.g. Python in a Rust project). PmAnalyze explicitly asks PM to read Cargo.toml or equivalent and includes a Project Context section in output.

Also fix 2 clippy warnings in team_progress.rs:
- format_in_format_args: flatten nested format! call
- unnecessary_cast: remove redundant usize -> usize cast


The retry logic could never match failed tasks because TL validation responses use numeric indices ("## Task: 1") but the orchestrator compared against UUIDs. Fix by falling back to 1-based index matching when UUID matching fails, and enrich the results_summary with task index + description so the TL has context for its validation output.
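The fallback described above might look roughly like this. It is a sketch under stated assumptions: `Task` here is a reduced stand-in for the PR's type, `resolve_task` is a hypothetical helper name, and the caller is assumed to have already extracted the reference text (for example the `1` in "## Task: 1").

```rust
/// Reduced stand-in for the PR's `Task` type.
struct Task {
    id: String,
    description: String,
}

/// Resolve a task reference taken from TL validation output.
/// Tries an exact ID match first, then falls back to a 1-based index.
fn resolve_task<'a>(tasks: &'a [Task], reference: &str) -> Option<&'a Task> {
    let reference = reference.trim();
    if let Some(task) = tasks.iter().find(|t| t.id == reference) {
        return Some(task);
    }
    // Not a known ID: interpret as a position, where "1" is the first task.
    let index: usize = reference.parse().ok()?;
    tasks.get(index.checked_sub(1)?)
}
```

Trying the ID first keeps the behavior unchanged for validators that do echo IDs, and `checked_sub` makes a reference of `0` resolve to nothing instead of underflowing.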


marioidival

Phase 1: Foundation (`a913869`..`86d06f7`)

Core architecture decision: PM/TL/Jr role metaphor

The initial 10 commits establish the fundamental metaphor: a product manager, tech lead, and junior developers collaborate through a structured 6-phase workflow.

Key decisions:

Role enum with include_str! prompts — Each role gets an embedded system prompt compiled into the binary. Simple, no runtime file I/O, but prompt changes require recompilation. Chosen for reliability over flexibility.
TeamHistory as append-only event log — Every action is timestamped. Critical for debugging multi-agent flows.
parse_tasks() extracting TASK: lines — Simple line-splitting initially, later evolved into a robust state machine (commit 529bf41).
TeamAgent wrapping LlmProvider — One LLM abstraction, multiple specialized agents.

Pain point:

commit 309090c: tokio::sync::RwLock in the Command trait crashed — TUI runs commands in std::thread, not tokio. Fixed with std::sync::Mutex + dedicated thread.

Phase 2: Configuration & Hardening (`724483d`..`993b5a5`)

Key decisions:

team_raw: Option<toml::Value> in limit-llm::Config — Opaque TOML passthrough avoids coupling LLM crate to agent types.
Per-role tool whitelists — PM: no tools, TL: bash only, Jr: safe subset. Prevents PM "exploring code".
Exponential backoff retry — 1s/2s/4s delays for transient errors (429, 502, 503). is_retryable() heuristic.
MockLlmProvider — Returns pre-configured responses in order. Made the system testable without API keys.
Jr defaults to cheaper model — gpt-4o-mini for Jr, gpt-4 for PM/TL. Cross-model composition is a core feature.
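To make the retry policy above concrete, here is a minimal synchronous sketch. It assumes errors are plain strings and uses simple substring matching; the PR's `is_retryable()` was later tightened (word-boundary matching, permanent "retries exceeded" errors), which this sketch does not reproduce. `with_retry` and the injected `sleep` callback are hypothetical names chosen so the policy can be exercised without real delays.

```rust
use std::time::Duration;

/// Maximum number of retries after the first attempt.
const MAX_RETRIES: u32 = 3;

/// Delay before retry number `attempt` (0-based): 1s, 2s, 4s.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(1u64 << attempt)
}

/// Simplified transient-error heuristic based on substring matching.
fn is_retryable(message: &str) -> bool {
    let lower = message.to_lowercase();
    ["429", "502", "503", "rate limit", "timeout", "overloaded"]
        .iter()
        .any(|needle| lower.contains(needle))
}

/// Run `op`, retrying transient failures with exponential backoff.
/// `sleep` is injected so callers (and tests) control how waiting happens.
fn with_retry<T>(
    mut op: impl FnMut() -> Result<T, String>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < MAX_RETRIES && is_retryable(&e) => {
                sleep(backoff_delay(attempt));
                attempt += 1;
            }
            // Permanent error, or retries exhausted.
            Err(e) => return Err(e),
        }
    }
}
```

In production the `sleep` argument would be `std::thread::sleep` (or an async equivalent); passing a closure that records durations is enough to check the 1s/2s/4s schedule.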

Phase 3: Real Parallelism (`ebfabd8`..`ab9059f`)

Key decisions:

buffer_unordered(max_parallel) — Replaced sequential loop. Biggest throughput improvement.
prompt_stream() — Returns impl Stream<Item = Result<String, AgentError>> for streaming responses.
JSON persistence — Later replaced with SQLite, but established the pattern.

Phase 4: Security & Quality (`4d2d79a`..`9ce746d`)

Critical fixes from PR review:

commit 4d2d79a: Jr had ALL tools (including git_push)! Fixed with safe defaults.
commit 4d2d79a: Parallel execution was serialized by single Mutex! Fixed with per-agent mutexes.
Path canonicalization (c35896d) — Defense-in-depth against traversal attacks.
Arc<Vec<Message>> COW (4c53559) — History cloning went from O(n) to O(1) per request.
ToolCallAccumulator (58eeaf0) — Unified duplicated tool-call accumulation.
parking_lot::Mutex (9ce746d) — No poisoning overhead, better performance.
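The per-agent mutex fix listed above can be illustrated with threads standing in for the PR's async tasks. This is a sketch, not the PR's code: `Agent` is a hypothetical stand-in for a Jr `TeamAgent`, and round-robin assignment is an assumption made to keep the example deterministic.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a Jr agent; it only records finished task ids.
struct Agent {
    completed: Vec<usize>,
}

/// Run every task on its own thread, assigning tasks to agents round-robin.
/// Each agent has its own mutex, so a task only waits for other tasks
/// assigned to the same agent, never for the whole pool.
fn run_tasks(agents: &[Arc<Mutex<Agent>>], task_ids: &[usize]) {
    if agents.is_empty() {
        return;
    }
    let handles: Vec<_> = task_ids
        .iter()
        .map(|&task_id| {
            let agent = Arc::clone(&agents[task_id % agents.len()]);
            thread::spawn(move || {
                // Only this agent is locked; the others stay available.
                let mut agent = agent.lock().unwrap();
                agent.completed.push(task_id);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

With a single mutex around the whole agent slice, every task would contend for the same lock and the work would run one task at a time regardless of how many were spawned, which matches the serialization bug described in the review.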

Phase 5: Actor System Rewrite (`7ca96be`..`b9415dd`)

The big architectural shift.

Why rewrite?

Sequential execute_workflow() couldn't handle: true parallelism, cancellation, per-role streaming, topological ordering.

Key decisions:

ActorRef<T> + bounded mpsc (32) — Each agent in its own tokio task.
Oneshot channels for request-response pattern.
Topological dependency ordering for DEPENDS_ON directives.
PromptResult with usage, hit_tool_limit, files_modified.
TeamProgressEvent with nesting for recursive teams.
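One way to picture the topological ordering with deadlock detection mentioned above is to group tasks into waves, where each wave can run in parallel once the previous waves finish. The function below is a sketch under that assumption; `schedule_waves` is a hypothetical name, and dependencies are given as task indices rather than the PR's task IDs.

```rust
/// Group task indices into waves. Every task in a wave has all of its
/// dependencies in earlier waves. `deps[i]` lists the indices task `i`
/// depends on. Returns an error when no progress is possible, which
/// happens for dependency cycles or references to unknown tasks.
fn schedule_waves(deps: &[Vec<usize>]) -> Result<Vec<Vec<usize>>, String> {
    let n = deps.len();
    let mut done = vec![false; n];
    let mut waves = Vec::new();
    let mut remaining = n;

    while remaining > 0 {
        // A task is ready when it is not done and all its deps are done.
        let ready: Vec<usize> = (0..n)
            .filter(|&i| !done[i] && deps[i].iter().all(|&d| d < n && done[d]))
            .collect();

        if ready.is_empty() {
            return Err(format!(
                "deadlock: {remaining} task(s) blocked by a cycle or unknown dependency"
            ));
        }
        for &i in &ready {
            done[i] = true;
        }
        remaining -= ready.len();
        waves.push(ready);
    }
    Ok(waves)
}
```

For example, with task 3 depending on tasks 1 and 2, which both depend on task 0, the waves are `[0]`, `[1, 2]`, `[3]`; two tasks that depend on each other produce the deadlock error instead of waiting forever.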

Phase 6: Observability (`ad54458`..`9ef953d`)

Key decisions:

File tracking at tool level (ea9b062) — Replaced regex extraction. Direct tracking via tool calls.
Token accumulation via Arc<AtomicU64> (e1b516f) — Shared counters across all agents.
TL breakdown prompt simplification (e1b516f) — Removed file-reading from TL. CONTEXT blocks carry contents.
TUI progress panel — Spinner, phase bar, scrollable task list, finish summary.
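The shared token counters above can be sketched as follows. The struct name `TokenCounters` and its methods are assumptions for illustration; the source only states that usage is accumulated via `Arc<AtomicU64>` shared across agents.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Cloneable handle to shared input/output token counters.
/// Cloning copies the `Arc`s, so every clone updates the same totals.
#[derive(Clone, Default)]
struct TokenCounters {
    input: Arc<AtomicU64>,
    output: Arc<AtomicU64>,
}

impl TokenCounters {
    /// Add one response's usage. Lock-free, safe to call from any agent.
    fn record(&self, input: u64, output: u64) {
        self.input.fetch_add(input, Ordering::Relaxed);
        self.output.fetch_add(output, Ordering::Relaxed);
    }

    /// Current (input, output) totals.
    fn totals(&self) -> (u64, u64) {
        (
            self.input.load(Ordering::Relaxed),
            self.output.load(Ordering::Relaxed),
        )
    }
}
```

Relaxed ordering is sufficient here because the counters are independent tallies; readers that need a final figure read after the agents have finished.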

Phase 7: Per-Role Flexibility (`835045f`..`7156518`)

Key decisions:

Per-role provider override (2c6100f) — Cross-provider mixing: Anthropic PM/TL + OpenAI Jr.
SQLite persistence (0d05fa1) — WAL mode, atomic transactions, cascade deletes. Replaced JSON.
Config bug fix (835045f) — team_raw didn't match [team] TOML key. Configs silently ignored!

Phase 8: Recursion (`f5fc4dd`..`0887f30`)

Key decisions:

ToolRegistry with RwLock — Enables self-referential team_start tool.
team_start tool (606cb33) — Agents spawn child teams. Recursion depth guard (default 2).
TL default tools: ["bash", "file_read"] — Needed for validation. Evolved from "planning-only".
system_prompt_with_context() (0887f30) — Injects CWD into prompts. Prevents wrong language selection.

Phase 9: Build Verification (`4b403a1`..`b86092f`)

Closing the feedback loop.

Key decisions:

Compilation check after Jr (9ff3b49) — Build output passed to TL validation.
Structured PASS/FAIL (8806000) — TL explicitly assesses each task. Build errors auto-FAIL.
file_edit retry on concurrent mod (f167343) — Max 3 attempts, 100ms backoff.
TL-suggested build command (f15b88c) — Replaced hardcoded detection with TL's contextual knowledge.
DoD per task (8066437) — 2-4 criteria per task. Jr references DoD to scope work.
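The file_edit retry above (max 3 attempts, 100ms backoff, re-reading the file each time) could be sketched like this. `edit_with_retry` is a hypothetical helper, not the tool's real entry point, and it replaces only the first occurrence of `old_text`, which is an assumption. Note that a read-modify-write like this is still not atomic; the retry only narrows the window in which concurrent edits collide.

```rust
use std::{fs, io, path::Path, thread, time::Duration};

/// Replace the first occurrence of `old_text` with `new_text` in `path`.
/// On "old_text not found" or a write failure, wait briefly, re-read the
/// file, and try again, up to three attempts in total.
fn edit_with_retry(path: &Path, old_text: &str, new_text: &str) -> io::Result<()> {
    const MAX_ATTEMPTS: u32 = 3;
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts made");

    for attempt in 1..=MAX_ATTEMPTS {
        // Re-read on every attempt so a concurrent edit is picked up.
        let content = fs::read_to_string(path)?;
        if content.contains(old_text) {
            match fs::write(path, content.replacen(old_text, new_text, 1)) {
                Ok(()) => return Ok(()),
                Err(e) => last_err = e,
            }
        } else {
            last_err = io::Error::new(io::ErrorKind::NotFound, "old_text not found");
        }
        if attempt < MAX_ATTEMPTS {
            thread::sleep(Duration::from_millis(100));
        }
    }
    Err(last_err)
}
```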

Bug chain:

Shell redirect as literal arg (b139278), serde default override (b86092f) — Jr lost all tools!

Phase 10: Prompt Engineering (`3b08bcc`..`8288589`)

Token optimization via better prompts.

Key decisions:

Token-efficient TL prompt — 4 phases, tools only for validation. Eliminates wasted round-trips.
Jr max_tool_rounds: 15→8 — Most tasks done in 3-6 rounds.
CONTEXT blocks — TL embeds file contents in tasks. Biggest token optimization.
Structured handoff format — TASK/FILE_TARGETS/IMPLEMENTATION_HINTS/CONTEXT/DoD/DEPENDS_ON.
PM: no tools. Final decision after iterating: tools→no tools→tools→no tools.

Phase 11: P0 Fixes (`d39bd24`..`43a74b1`)

Hardening from CTPO report.

Legacy removal (d39bd24) — Deleted ~450 lines of deprecated workflow.
Per-phase timeout (122a2d2) — 120s default. PhaseTimeout error variant.
CAS for recursion depth (cbd1e03) — Fixed race in concurrent team_start.
is_retryable hardening (cb48186) — Permanent errors no longer retried.
Validation in PM delivery (b8f37a7) — PM gets explicit failure context.
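The CAS loop for the recursion depth guard listed above follows a standard pattern, sketched here with hypothetical function names `try_enter` and `leave`. A check-then-act version (load, compare, then increment) lets two concurrent `team_start` calls both pass the check; reserving the slot with `compare_exchange_weak` closes that gap.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Try to reserve one recursion level. Returns false when `max_depth`
/// levels are already in use. The check and the increment happen as a
/// single atomic step, so concurrent callers cannot overshoot the limit.
fn try_enter(depth: &AtomicUsize, max_depth: usize) -> bool {
    let mut current = depth.load(Ordering::Acquire);
    loop {
        if current >= max_depth {
            return false;
        }
        match depth.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            // Another thread changed the value (or the weak CAS failed
            // spuriously): retry with the freshly observed depth.
            Err(observed) => current = observed,
        }
    }
}

/// Release a level previously reserved with `try_enter`.
fn leave(depth: &AtomicUsize) {
    depth.fetch_sub(1, Ordering::AcqRel);
}
```

With a limit of 2 (the default depth mentioned for `team_start`), the first two calls succeed, the third is rejected, and a slot becomes available again after `leave`.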

Phase 12: P1/P2 Improvements (`4b0ba8f`..`9b85765`)

Streaming, retry, robustness.

Key decisions:

TokenUpdate/StreamChunk events — Minimal API surface change, existing channel reused.
PM/TL streaming (f5b9702) — Real-time TUI display. Jr stays on prompt() for rich result.
Validation retry loop (1933004) — Failed tasks re-executed with retry context. UUID + index fallback matching.
Token cost surfacing (81c8ab3) — "Xk in / Yk out tokens" in finish summary.
Robust parse_tasks (529bf41) — State machine: case-insensitive, markdown prefixes, CONTEXT-aware, numeric deps.
Multiple CONTEXT blocks (c6871a1) — Independent truncation per block.
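The parsing rules listed for `parse_tasks` above can be approximated by a small line-based state machine. This is a sketch under several assumptions that the PR page does not confirm: a CONTEXT block is taken to start at a `CONTEXT:` line and end at a hypothetical `END_CONTEXT` line, dependencies are numeric and comma-separated, and `ParsedTask` is a reduced stand-in for the real task type.

```rust
#[derive(Debug, PartialEq)]
struct ParsedTask {
    description: String,
    depends_on: Vec<usize>,
}

/// Strip list markers and bold/code decoration,
/// e.g. "- **TASK:** x" or "2. task: x" become "TASK: x" / "task: x".
fn strip_prefix_noise(line: &str) -> String {
    line.trim()
        .trim_start_matches(|c: char| {
            matches!(c, '-' | '*' | '`' | '#' | '.' | ')')
                || c.is_ascii_digit()
                || c.is_whitespace()
        })
        .replace("**", "")
        .replace('`', "")
}

/// If `line` starts with `key` (ASCII case-insensitive), return the rest.
fn value_after<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let head = line.get(..key.len())?;
    if head.eq_ignore_ascii_case(key) {
        Some(line[key.len()..].trim())
    } else {
        None
    }
}

fn parse_tasks(text: &str) -> Vec<ParsedTask> {
    let mut tasks = Vec::new();
    let mut in_context = false;

    for raw in text.lines() {
        let line = strip_prefix_noise(raw);
        if in_context {
            // Inside a CONTEXT block nothing is a directive except the end marker.
            if value_after(&line, "END_CONTEXT").is_some() {
                in_context = false;
            }
            continue;
        }
        if value_after(&line, "CONTEXT:").is_some() {
            in_context = true;
        } else if let Some(description) = value_after(&line, "TASK:") {
            tasks.push(ParsedTask {
                description: description.to_string(),
                depends_on: Vec::new(),
            });
        } else if let Some(refs) = value_after(&line, "DEPENDS_ON:") {
            // Attach dependencies to the most recent task, if any.
            if let Some(task) = tasks.last_mut() {
                task.depends_on = refs
                    .split(',')
                    .filter_map(|r| r.trim().parse::<usize>().ok())
                    .collect();
            }
        }
    }
    tasks
}
```

Tracking the `in_context` state is what keeps file contents that happen to contain `TASK:` or `DEPENDS_ON:` from being mistaken for directives.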

Known remaining issues:

Token counters show 0 for PM/TL phases (streaming discards Done(usage))
Finished(false) after successful retry (original count not adjusted)
PM delivery shows false "tasks FAILED" after retry succeeds

marioidival added 23 commits March 16, 2026 14:49

docs(cli): improve system prompt structure and decision framework

a913869

Restructure system prompt into clear sections (Core Behavior, Communication Style, Work Protocols, Decision Framework, Safety Boundaries, Constraints) with concrete examples and action rules.

build(agent): add chrono and tokio/sync dependencies for team module

9ba1f30

- Add chrono with serde feature for timestamped team events
- Enable tokio sync feature for RwLock used in shared team history

feat(agent): add TeamHistory and TeamEvent types

4786855

Append-only event log with timestamp, role, action, and content. Used to track the full lifecycle of team executions for debugging and the /team history subcommand.

feat(agent): add Task, TaskResult and task parser

4dab49c

- Task struct with id, description, and status (Pending/InProgress/Completed/Failed)
- TaskResult with output and success flag
- parse_tasks() to extract TASK: lines from TL responses

feat(agent): add TeamAgent with LLM streaming and tool execution

a247e1b

Specialized agent wrapping LlmProvider with role-specific system prompts. Handles streaming response collection, tool call accumulation, automatic tool execution with result feedback, and multi-turn tool loops.

feat(agent): add team workflow pipeline

700b3dc

6-phase execution: PM analysis → TL plan → TL task breakdown → Jr task execution → TL validation → PM delivery. Includes WorkflowPhase enum for tracking and graceful fallback when TL produces no parseable tasks.

docs: add ADR-001 multi-agent team system

86d06f7

Architecture decision record covering problem statement, design, component structure, core abstractions, workflow, configuration, testing strategy, and implementation roadmap.

feat(llm): add MockLlmProvider for testing

db300c0

Returns pre-configured responses sequentially, cycling when exhausted. Supports `with_response()`, `with_responses()`, `with_name()`, `with_model()`, `call_count()`, and `reset()`. Re-exported as `limit_llm::MockLlmProvider`.

feat(team): add Role::default_model() and Role::default_tools()

3ded14f

- default_model: PM/TL → "gpt-4", Jr → "gpt-4o-mini"
- default_tools: PM → [] (none), TL → ["bash"], Jr → ["file_read", "file_write", "file_edit", "bash"]
- Derive Default for RoleConfig instead of manual impl

chore: update Cargo.lock

cae4d41

docs: update team ADR to reflect completed implementation

48a9175

marioidival self-assigned this Mar 16, 2026

marioidival added 6 commits March 16, 2026 17:20

fix: resolve all clippy warnings

f3e923c

- Remove unused `WorkflowPhase` import in integration tests
- Convert runtime asserts on constants to const asserts
- Replace `std::sync::Mutex` with `tokio::sync::Mutex` to fix `await_holding_lock` warning in parallel task execution

docs: add comprehensive team system documentation

7e6d18c

Covers CLI usage, workflow pipeline, configuration, architecture, error handling, persistence, testing, performance, design decisions, and programmatic API reference.

docs: add team system to README

ab9059f

- Add 'Multi-Agent Teams' to features list
- Add /team subcommands to Available Commands table
- Add TEAM_SYSTEM.md to Documentation section
- Update limit-agent crate description

style: fmt

13011d6

marioidival added 29 commits March 18, 2026 16:05

fix(team): remove shell redirect from compilation check args

b139278

tokio::process::Command does not use a shell, so "2>&1" was passed as a literal argument causing "unexpected argument" errors. Stderr is already captured separately via Command::output().

refactor(team): improve role separation with structured handoffs

8288589

- PM: remove code exploration, focus on product/business requirements
- TL: add FILE_TARGETS and IMPLEMENTATION_HINTS to every task
- Jr: explicit instruction to follow hints, minimal exploration

refactor(team): remove deprecated execute_workflow legacy function

d39bd24

fix(team): prevent is_retryable from matching permanent retry errors

cb48186

Errors containing "retries exceeded" or "after retry" are permanent failures, not transient. Tighten the match to avoid infinite retries.

feat(team): add per-phase timeout (default 120s) to orchestrator

122a2d2

Ask PM/TL with tokio::time::timeout. Adds PhaseTimeout error variant.

fix(team): use CAS loop for recursion depth to prevent race condition

cbd1e03

Replace check-then-act with compare_exchange_weak to atomically reserve a depth slot before incrementing.

fix(team): surface validation failures explicitly in PM delivery

b8f37a7

Prepend warning context to validation text so PM knows which tasks failed and must be re-done.

fix(team): remove tools from PM default config

fb76836

PM works from provided text only — no need for bash/file_read.

chore(team): remove dead code left by legacy workflow removal

43a74b1

Remove DEFAULT_MAX_TOOL_ROUNDS constant and TeamAgent::placeholder() which were only used by the deleted execute_tasks_parallel.

style(team): apply rustfmt to recent P0/P1 fix commits

4215682

feat(team): add TokenUpdate progress event variant

4b0ba8f

feat(team): retry tasks that fail TL validation

1933004

feat(team): surface token costs in TUI progress panel

81c8ab3

feat(team): add StreamChunk progress event and stream_prompt to AgentActor

abd2245

feat(team): wire PM/TL agents to stream responses to TUI

f5b9702

style(team): apply rustfmt to P1/P2 improvement commits

8025a16

fix(team): make task parsing robust to LLM output variations

529bf41

Case-insensitive TASK/DEPENDS_ON matching, markdown list prefix stripping, CONTEXT-block-aware DEPENDS_ON parsing, bold/code prefix handling, numeric dependency refs, and parse-failure tracing.

fix(team): handle multiple CONTEXT blocks in truncate_context

c6871a1

fix(team): handle multiple CONTEXT blocks in truncate_context

989aeb9

--ammend

9b85765

marioidival commented Mar 19, 2026

marioidival wants to merge 94 commits into trunk from feat/teams
