
feat: multi-agent team command (/team)#10

Open
marioidival wants to merge 94 commits into trunk from feat/teams


@marioidival marioidival commented Mar 16, 2026

Owner

Summary

Multi-agent team system with the /team command — specialized roles (PM, TL, Jr) coordinate complex tasks through a structured 6-phase workflow with actor-based orchestration, parallel task execution, TUI streaming, build verification, and SQLite persistence.

106 commits | 60 files changed | +9,353 / -129 lines

Architecture

User Request
    │
    ▼
┌─────────────────────────────────────────────────┐
│              OrchestratorActor                   │
│  (state machine, tokio task, oneshot I/O)       │
│                                                  │
│  Phase 1: PM Analysis                            │
│  Phase 2: TL Plan                                │
│  Phase 3: TL Breakdown  → parse_tasks()          │
│  Phase 4: Jr Execution   → buffer_unordered       │
│  Phase 5: TL Validation  → build + verify         │
│  Phase 6: PM Delivery                            │
└──────────┬──────────────┬───────────┬────────────┘
           │              │           │
     ┌─────▼───┐   ┌─────▼───┐  ┌───▼──────────┐
     │AgentActor│   │AgentActor│  │  AgentActor[]│
     │  (PM)   │   │  (TL)   │  │  (Jr x N)   │
     └────┬────┘   └────┬────┘  └─────┬───────┘
          │             │             │
     ┌────▼────┐   ┌───▼────┐   ┌───▼──────────┐
     │TeamAgent│   │TeamAgent│   │  TeamAgent   │
     │(stream) │   │(stream) │   │ (tools)      │
     └─────────┘   └─────────┘   └──────────────┘

Evolution

See review comments below for a chronological walkthrough of each major design decision.

Phase 1: Foundation (commits 1-10)

Core types (Role, Task, TeamHistory), TeamAgent with LLM streaming, 6-phase workflow pipeline, /team CLI command, ADR-001.

Phase 2: Configuration & Hardening (commits 11-18)

Config-driven setup with per-role tool whitelists, error recovery with exponential backoff, MockLlmProvider for testing, role defaults, JSON persistence, integration tests.

Phase 3: Real Parallelism & Performance (commits 19-28)

True parallel Jr execution via buffer_unordered, prompt_stream() for streaming, phase logging, TeamSnapshot/TeamStore persistence, comprehensive docs.

Phase 4: Security & Quality (commits 29-35)

Fixed critical parallelism bug (serialized despite buffer_unordered), Jr tool restriction, PR review fixes, path sanitization with canonicalization, Arc<Vec<Message>> for COW history, ToolCallAccumulator extraction, parking_lot::Mutex.

Phase 5: Actor System Rewrite (commits 36-42)

Replaced sequential workflow with actor model: ActorRef<TeamMessage>, OrchestratorActor, AgentActor, oneshot request-response pattern, topological task dependency ordering, TeamProgressEvent system for TUI.

Phase 6: Observability & Telemetry (commits 43-48)

Detailed actor execution logging, file tracking at tool level (replacing regex extraction), token usage accumulation via Arc<AtomicU64>, TUI progress panel widget with spinner, phase bar, task list.

Phase 7: Per-Role Flexibility (commits 49-52)

Per-role model/provider selection with config inheritance, SQLite persistence replacing JSON, dead code removal.

Phase 8: Self-Referential Tools & Recursion (commits 53-58)

ToolRegistry interior mutability via RwLock, team_start tool for recursive team spawning with recursion depth guard.

Phase 9: Build Verification & Validation (commits 59-66)

Compilation check after Jr execution, structured TL validation (PASS/FAIL per task), file_edit retry on concurrent modification, TL-suggested build command replacing hardcoded detection.

Phase 10: Prompt Engineering & Role Separation (commits 67-72)

DoD per task, Jr max_tool_rounds 15→8, token-efficient prompts with CONTEXT blocks, structured handoff format (TASK/FILE_TARGETS/IMPLEMENTATION_HINTS/DoD), improved role separation.

Phase 11: P0 Fixes (commits 73-82)

Legacy workflow removal, per-phase timeout (120s), CAS loop for recursion depth, is_retryable hardening, validation failure surfacing in PM delivery, PM tool removal.

Phase 12: P1/P2 Improvements (commits 83-93)

TokenUpdate/StreamChunk progress events, PM/TL streaming to TUI, validation retry loop, token cost surfacing, robust task parsing (case-insensitive, CONTEXT-aware, markdown prefix stripping, numeric deps), multiple CONTEXT block handling in truncate_context.

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Actor model over sequential loop | True parallelism, bounded mailboxes, clean shutdown |
| Per-role provider override | Cost optimization (cheap Jr model) + quality (strong PM/TL model) |
| TL-suggested build command | Works for any language, not just heuristic file probes |
| Topological dependency ordering | Tasks with DEPENDS_ON wait for completion, deadlock detection |
| CONTEXT blocks in TL breakdown | Jr agents don't waste tool calls re-reading files |
| Structured handoff format | TL provides FILE_TARGETS + IMPLEMENTATION_HINTS + DoD |
| Validation retry loop | Failed tasks get one retry with "VALIDATION RETRY" context |
| parse_tasks state machine | Robust to LLM output variations (casing, nesting, prefixes) |
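
For reviewers unfamiliar with the dependency ordering decision, here is a minimal sketch of how DEPENDS_ON ordering with deadlock detection could work, using Kahn's algorithm. `TaskNode` and `topo_order` are hypothetical names for illustration and are not taken from this PR; the orchestrator's actual scheduling may differ.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical task shape: an id plus the ids it depends on.
pub struct TaskNode {
    pub id: String,
    pub depends_on: Vec<String>,
}

/// Kahn's algorithm. Returns ids in an order where every task follows its
/// dependencies, or Err with the ids that can never become ready
/// (a cycle, or a reference to an unknown task).
pub fn topo_order(tasks: &[TaskNode]) -> Result<Vec<String>, Vec<String>> {
    let mut pending: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for t in tasks {
        pending.insert(t.id.as_str(), t.depends_on.len());
        for d in &t.depends_on {
            dependents.entry(d.as_str()).or_default().push(t.id.as_str());
        }
    }
    // Seed the queue in input order so the result is deterministic.
    let mut ready: VecDeque<&str> = tasks
        .iter()
        .filter(|t| t.depends_on.is_empty())
        .map(|t| t.id.as_str())
        .collect();
    let mut order: Vec<String> = Vec::with_capacity(tasks.len());
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        if let Some(children) = dependents.get(id) {
            for &child in children {
                let n = pending.get_mut(child).expect("child was registered");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(child);
                }
            }
        }
    }
    if order.len() == tasks.len() {
        Ok(order)
    } else {
        // Anything not emitted is stuck waiting on a dependency.
        Err(tasks
            .iter()
            .filter(|t| !order.contains(&t.id))
            .map(|t| t.id.clone())
            .collect())
    }
}
```

Returning the stuck ids rather than a bare error makes it possible to tell the TL which tasks formed the cycle.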

Config

[team]
default_juniors = 2
max_parallel_tasks = 4

[team.roles.pm]
model = "glm-5-turbo"

[team.roles.tl]
model = "glm-5-turbo"

[team.roles.jr]
model = "glm-4.7"

Test Plan

  • 700+ unit tests pass across all crates
  • 0 clippy warnings
  • Integration test with MockLlmProvider covers full pipeline
  • Integration test for validation retry loop
  • Manual testing with real LLM (nuveo team, 9 tasks, recursive)
  • E2E smoke tests with real LLM providers

Restructure system prompt into clear sections (Core Behavior,
Communication Style, Work Protocols, Decision Framework, Safety
Boundaries, Constraints) with concrete examples and action rules.
- Add chrono with serde feature for timestamped team events
- Enable tokio sync feature for RwLock used in shared team history
- Add Role enum (PM, TL, Jr) with system_prompt() and label()
- Add system prompts for each role via include_str!
  - PM: requirement analysis and product vision
  - TL: architecture, task breakdown, validation
  - Jr: task execution with tools
Append-only event log with timestamp, role, action, and content.
Used to track the full lifecycle of team executions for debugging
and the /team history subcommand.
- Task struct with id, description, and status (Pending/InProgress/Completed/Failed)
- TaskResult with output and success flag
- parse_tasks() to extract TASK: lines from TL responses
Specialized agent wrapping LlmProvider with role-specific system prompts.
Handles streaming response collection, tool call accumulation, automatic
tool execution with result feedback, and multi-turn tool loops.
6-phase execution: PM analysis → TL plan → TL task breakdown →
Jr task execution → TL validation → PM delivery.

Includes WorkflowPhase enum for tracking and graceful fallback
when TL produces no parseable tasks.
- Team struct composing PM, TL, and Jr agents with shared history
- TeamConfig with num_juniors and max_parallel_tasks
- Public re-exports: Role, Team, TeamAgent, TeamConfig, TeamEvent,
  TeamHistory, TeamResult
- Register team module in lib.rs
- /team create --name <n> [--juniors N]: register team
- /team delete <n>: remove team
- /team list: show all teams
- /team status <n>: show team info
- /team start --team <n> --task <desc>: execute with real LLM provider
- /team history <n>: show execution events

/team start loads config, creates provider via ProviderFactory,
builds a ToolRegistry with all 17 tools, and runs the full
PM→TL→Jr→TL→PM workflow. Register TeamCommand in default registry.
Architecture decision record covering problem statement, design,
component structure, core abstractions, workflow, configuration,
testing strategy, and implementation roadmap.
…mmand trait

Root cause: tokio::runtime::Handle::current().block_on() panics when
called from within a thread that has no active tokio runtime. The TUI
runs commands synchronously in the main event loop thread, which uses
std::thread, not tokio.

Changes:
- Replace Arc<RwLock<HashMap>> with std::sync::Mutex<HashMap> for
  teams storage — no async runtime needed for create/delete/list/status
- Move /team start execution to a dedicated std::thread::spawn with
  its own tokio::runtime::Runtime::new() (same pattern as handle_enter)
- /team history uses a fresh Runtime::new() for the single .await
- Write results back to ChatView via Arc<Mutex<ChatView>> instead of
  CommandContext (which is not Send)
Phase 3 staged changes:

- Add `team_raw: Option<toml::Value>` to `limit-llm::Config` for opaque
  `[team]` section passthrough (avoids coupling limit-llm to limit-agent)
- Add `RoleConfig`, `TeamSection`, `TeamRolesSection` types with full
  TOML deserialization and sensible defaults (PM: no tools, TL: bash,
  Jr: all tools)
- Add `TeamAgent::with_allowed_tools()` to create filtered tool registries
- Add `ToolRegistry::register_arc()` for sharing pre-built Arc<dyn Tool>
- Update `TeamConfig` with `roles: TeamRolesSection` and
  `TeamConfig::from_section()` builder
- Update `Team::new()` to pass per-role tool whitelists to agents
- Update `/team start` to read config from `config.toml` `[team]` section
- Make `Role` enum `Copy` (no data variants)
- Refactor bridge acquisition pattern in app_impl.rs
- Update all test fixtures with `team_raw` field
- Add `toml = "0.8"` dependency to limit-agent
- Update ADR with implementation progress and design divergence appendix
Phase 3 & 4 remaining items:

Error recovery:
- Exponential backoff retry on transient LLM errors (rate limits, 429,
  502, 503, timeout, overloaded). Max 3 retries with 1s/2s/4s delays.
- Jr task retry once on failure before marking as failed.
- is_retryable() heuristic for transient error detection.
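
A minimal sketch of the retry policy described above (max 3 retries, 1s/2s/4s delays). The function names and the injected `sleep` callback are assumptions made for illustration; the real implementation is async and lives inside TeamAgent.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ...
pub fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(1u64 << attempt)
}

/// Runs `op` until it succeeds, the error is not retryable, or
/// `max_retries` retries have been spent. `sleep` is injected so the
/// caller decides how to wait (thread sleep, async timer, or a test spy).
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_retries && is_retryable(&e) => {
                sleep(backoff_delay(attempt));
                attempt += 1;
            }
            // Permanent error, or the retry budget is exhausted.
            Err(e) => return Err(e),
        }
    }
}
```

Injecting the sleep function keeps tests fast: a spy can record the requested delays without actually waiting.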

Observability:
- EventLevel enum (Info/Warn/Error) on TeamEvent with Display impl.
- TeamHistory::add_warn(), add_error(), error_count() helpers.
- TeamResult now tracks failed_tasks, total_tasks, total_retries.

CLI improvements:
- /team start shows task success/failure counts and actionable error
  hints (rate limit, auth, timeout).
- /team history shows summary header with error/warning counts.
- /help includes team command references.
- TeamError and LlmError variants added to AgentError.

Tests: 76 agent + 296 cli + full workspace pass (0 failures).
Returns pre-configured responses sequentially, cycling when exhausted.
Supports `with_response()`, `with_responses()`, `with_name()`,
`with_model()`, `call_count()`, and `reset()`.

Re-exported as `limit_llm::MockLlmProvider`.
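
The cycling behaviour can be illustrated with a simplified, synchronous stand-in. `CyclingMock` and `next_response` are hypothetical names; the real MockLlmProvider implements the LlmProvider trait, whose signature is not shown in this PR text.

```rust
/// Simplified stand-in for a mock that replays canned responses in order
/// and wraps around when the list is exhausted.
pub struct CyclingMock {
    responses: Vec<String>,
    calls: usize,
}

impl CyclingMock {
    pub fn with_responses(responses: Vec<&str>) -> Self {
        CyclingMock {
            responses: responses.into_iter().map(String::from).collect(),
            calls: 0,
        }
    }

    /// Returns the next canned response, or None if none were configured.
    pub fn next_response(&mut self) -> Option<&str> {
        if self.responses.is_empty() {
            return None;
        }
        let idx = self.calls % self.responses.len();
        self.calls += 1;
        Some(self.responses[idx].as_str())
    }

    pub fn call_count(&self) -> usize {
        self.calls
    }

    /// Rewinds to the first response and clears the call counter.
    pub fn reset(&mut self) {
        self.calls = 0;
    }
}
```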
- default_model: PM/TL → "gpt-4", Jr → "gpt-4o-mini"
- default_tools: PM → [] (none), TL → ["bash"], Jr → ["file_read", "file_write", "file_edit", "bash"]
- Derive Default for RoleConfig instead of manual impl
…erialize

- Add `enable_streaming: bool` field (default: true)
- Derive Serialize + Deserialize on TeamConfig for persistence
- Wire enable_streaming through from_section()
- Re-export TeamSnapshot and TeamStore from lib.rs
Returns `impl Stream<Item = Result<String, AgentError>>` that yields
content deltas as they arrive from the provider. Handles tool calls
automatically with retry logic matching prompt().

Refactor: extract is_retryable_e() for reuse between prompt and
prompt_stream, use is_some_and() for history guard.
- Replace sequential for-loop with buffer_unordered(max_parallel)
  for Jr task execution (rename _max_parallel → max_parallel)
- Add files_modified: Vec<String> to TeamResult
- Implement extract_modified_files() via regex on task outputs
- Add log_phase() to emit phase:XXX events for TUI progress
- Derive Default for EventLevel enum
…t/teams/

- TeamSnapshot: serializable team state (name, config, history, run_count)
- TeamStore: save/load/delete/list operations on JSON files
- Sanitizes team names with special characters
- Default directory: ~/.limit/teams/ (fallback to temp dir)
- Add dirs and tempfile deps to limit-agent
- Integrate TeamStore into TeamCommand (save on create, delete on
  delete, load on list)
- Report 🔄 phase transitions during /team start execution
- Show 📁 Files section when files_modified is non-empty
- Fix missing max_parallel_tasks in handle_create config
10 tests covering the full PM → TL → Jr pipeline:

- test_full_team_workflow: validates solution, tasks, files_modified
- test_workflow_phases_in_order: checks all 6 phases in sequence
- test_workflow_records_pm/tl/jr_events: per-role event verification
- test_workflow_empty_tasks_returns_early: early return path
- test_workflow_duration_is_positive
- test_team_create_and_events: team structure validation
- test_team_reset_clears_state
- test_team_persistence_save_load
@marioidival marioidival self-assigned this Mar 16, 2026
- Remove unused `WorkflowPhase` import in integration tests
- Convert runtime asserts on constants to const asserts
- Replace `std::sync::Mutex` with `tokio::sync::Mutex` to fix
  `await_holding_lock` warning in parallel task execution
Covers CLI usage, workflow pipeline, configuration, architecture,
error handling, persistence, testing, performance, design decisions,
and programmatic API reference.
- Add 'Multi-Agent Teams' to features list
- Add /team subcommands to Available Commands table
- Add TEAM_SYSTEM.md to Documentation section
- Update limit-agent crate description
Critical #1: Parallel execution was serialized
- Changed from single Mutex<&mut [TeamAgent]> to Vec<Arc<Mutex<TeamAgent>>>
- Each Jr agent now has its own mutex, enabling true parallel execution
- buffer_unordered now executes tasks in parallel across different agents
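
To see why one mutex per agent matters, consider this sketch using std threads in place of tokio tasks. `Agent` and `run_in_parallel` are hypothetical stand-ins, not code from this PR. Each worker waits at a barrier while holding its own lock, which only completes if all locks can be held at once.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical stand-in for TeamAgent.
pub struct Agent {
    pub completed: usize,
}

/// Runs one unit of work per agent, each on its own thread. Every worker
/// locks only its own agent, then waits at a barrier while still holding
/// that lock. The barrier releases only when all workers hold their locks
/// simultaneously, which a single shared mutex would make impossible
/// (the program would deadlock instead).
pub fn run_in_parallel(agents: &[Arc<Mutex<Agent>>]) {
    let barrier = Arc::new(Barrier::new(agents.len()));
    let handles: Vec<_> = agents
        .iter()
        .map(|agent| {
            let agent = Arc::clone(agent);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let mut guard = agent.lock().unwrap();
                barrier.wait();
                guard.completed += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```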

Critical #2: Jr agents had unrestricted tool access
- Changed default from tools: None (all tools) to Role::Jr.default_tools()
- Jr now only has safe tools: file_read, file_write, file_edit, bash
- Dangerous tools (git_push, web_fetch, etc.) require explicit opt-in

Both issues identified in PR #10 review were valid and have been fixed.
High Priority Fixes:
- Fix build_llm_tools() returning empty Vec - now builds tool definitions from registry
- Fix ToolCallDelta argument accumulation - handles both Object and String fragments

Medium Priority Fixes:
- Improve is_retryable() with word-boundary matching to avoid false positives
- Add trim_history() to prevent unbounded conversation history growth
- Enhance path sanitization in TeamStore with protection against .., null bytes, reserved names
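
A rough sketch of what name sanitization against `..`, null bytes, and reserved names might look like. The exact rules in TeamStore are not shown in this PR text, so the character whitelist and the reserved-name list below are assumptions, and `sanitize_team_name` is a hypothetical function name.

```rust
/// Returns a file-system-safe file stem for a team name, or None if the
/// name cannot be made safe. All rules here are illustrative assumptions.
pub fn sanitize_team_name(name: &str) -> Option<String> {
    // Null bytes are rejected outright rather than replaced.
    if name.contains('\0') {
        return None;
    }
    // Keep a conservative whitelist; everything else (including '.', '/'
    // and '\\') becomes '_', so ".." can never survive as a path component.
    let cleaned: String = name
        .trim()
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '-' || c == '_' {
                c
            } else {
                '_'
            }
        })
        .collect();
    // Reject names that were empty or consisted only of replaced characters.
    if cleaned.is_empty() || cleaned.chars().all(|c| c == '_') {
        return None;
    }
    // Windows device names (CON, PRN, AUX, NUL, COM1-9, LPT1-9).
    let upper = cleaned.to_ascii_uppercase();
    let bytes = upper.as_bytes();
    let is_device = matches!(upper.as_str(), "CON" | "PRN" | "AUX" | "NUL")
        || (bytes.len() == 4
            && (upper.starts_with("COM") || upper.starts_with("LPT"))
            && (b'1'..=b'9').contains(&bytes[3]));
    if is_device {
        None
    } else {
        Some(cleaned)
    }
}
```

Canonicalizing the final path and checking that it stays under the store directory, as a later commit in this PR does, remains a useful second layer on top of name filtering.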

Low Priority Fixes:
- Track total_retries using AtomicUsize for thread-safe parallel task tracking
- Use OnceLock for regex compilation to avoid repeated compilation

All 125 tests pass.
Replace generic "note any issues" TL prompt with structured per-task
PASS/FAIL assessment. Include compilation output in prompt and
require TL to explicitly FAIL tasks that caused build errors.
Also adds progress_tx to AgentActor and JrExecute sub-status updates.
Add retry loop (max 3 attempts, 100ms backoff) to file_edit tool.
On "old_text not found" or write failure, re-read the file and retry
instead of failing immediately. Prevents race conditions when multiple
Jr agents edit the same file concurrently.
tokio::process::Command does not use a shell, so "2>&1" was passed as
a literal argument causing "unexpected argument" errors. Stderr is
already captured separately via Command::output().
Remove run_compilation_check() that detected build systems via file
probes. Instead, ask TL to suggest a build command based on the
project context (TlSuggestBuildCommand message). The orchestrator
parses the response, runs it via `sh -c`, and passes results to
TlValidate for assessment.
The serde per-field default (default_tools_empty) gave Some(vec![])
to ALL roles when `tools` was absent from TOML, including Jr which
needs file_read/file_write/file_edit/bash. TeamRolesSection::default()
was only used when the role section was completely absent, not when
partially specified (e.g., provider+model without tools).

Now absent `tools` deserializes as None, then normalize_tools() fills
in role-specific defaults: PM/TL → empty, Jr → restricted tool set.
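
The fix hinges on distinguishing "key absent" from "explicitly empty". A minimal sketch, assuming a simplified `Role` enum and a `normalize_tools` signature that may differ from the one in this PR:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role {
    Pm,
    Tl,
    Jr,
}

impl Role {
    /// Role-specific defaults as described in this commit message:
    /// PM/TL get no tools, Jr gets the restricted set.
    pub fn default_tools(self) -> Vec<String> {
        match self {
            Role::Pm | Role::Tl => Vec::new(),
            Role::Jr => ["file_read", "file_write", "file_edit", "bash"]
                .iter()
                .map(|s| s.to_string())
                .collect(),
        }
    }
}

/// `None` means the `tools` key was absent from the TOML, so the role
/// default applies. `Some(vec![])` means the user explicitly asked for
/// no tools, and that choice is respected.
pub fn normalize_tools(role: Role, tools: Option<Vec<String>>) -> Vec<String> {
    tools.unwrap_or_else(|| role.default_tools())
}
```

With the old per-field serde default, both cases collapsed into `Some(vec![])`, which is how Jr ended up with no tools.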
…lidation

- TL breakdown now requires DEFINITION_OF_DONE (2-4 criteria per task)
- Jr prompt references DoD to scope work and avoid exploration bloat
- Jr max_tool_rounds: 15 → 8 (most tasks complete in 3-6 rounds)
- TL validation now fails tasks that caused build errors (was ignored)
- Trim build output to 3000 chars to reduce token bloat in TL context
- Parse **FAIL** lines from TL validation output instead of ignoring them
- Include validation failures in failed_tasks count (DB status follows)
- Fix hardcoded Finished(true) to reflect actual failure state
- Guard against empty/exploratory TL plans (< 200 chars or "explore")
- Strengthen TlPlan prompt to forbid exploration responses
- Rewrite TL system prompt: 4 distinct phases, NO tools for 1-3,
  tools (bash, file_read) for validation phase 4
- Refine Jr system prompt: explicit 5-step workflow, restrict bash
  to build/test only (no ls/find/cat exploration)
- Add structured output format to PM analysis (Core Problem,
  Key Requirements, Constraints, Success Criteria)
- Allow CONTEXT blocks in TlBreakdown so TL provides file contents
  to Jr, eliminating exploration waste
- TL validate now runs build command itself with tools instead of
  receiving truncated build_output from orchestrator
- Remove orchestrator's run_command() and 3000-char truncation
- TL default tools: ["bash", "file_read"], max_tool_rounds: 15
PM now has bash + file_read (5 tool rounds) to explore the project
before analyzing, preventing wrong stack selection (e.g. Python in a
Rust project). PmAnalyze explicitly asks PM to read Cargo.toml or
equivalent and includes a Project Context section in output.

Also fix 2 clippy warnings in team_progress.rs:
- format_in_format_args: flatten nested format! call
- unnecessary_cast: remove redundant usize -> usize cast
- PM: remove code exploration, focus on product/business requirements
- TL: add FILE_TARGETS and IMPLEMENTATION_HINTS to every task
- Jr: explicit instruction to follow hints, minimal exploration
Errors containing "retries exceeded" or "after retry" are permanent
failures, not transient. Tighten the match to avoid infinite retries.
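
A sketch of how the tightened heuristic could combine word-boundary matching with a permanent-failure check. The keyword lists are assembled from the commit messages in this PR, but the matching code itself is an assumption and not the actual `is_retryable` implementation.

```rust
const TRANSIENT: [&str; 6] = ["429", "502", "503", "rate limit", "timeout", "overloaded"];
const PERMANENT: [&str; 2] = ["retries exceeded", "after retry"];

/// True if `needle` occurs in `haystack` without an alphanumeric character
/// directly before or after it, so "429" does not match inside "14290".
fn contains_word(haystack: &str, needle: &str) -> bool {
    haystack.match_indices(needle).any(|(start, m)| {
        let end = start + m.len();
        let before = haystack[..start].chars().next_back();
        let after = haystack[end..].chars().next();
        !before.map_or(false, |c| c.is_alphanumeric())
            && !after.map_or(false, |c| c.is_alphanumeric())
    })
}

/// Permanent markers win over transient ones: an error that already
/// reports exhausted retries must not be retried again.
pub fn is_retryable(message: &str) -> bool {
    let msg = message.to_lowercase();
    if PERMANENT.iter().any(|p| msg.contains(*p)) {
        return false;
    }
    TRANSIENT.iter().any(|t| contains_word(&msg, t))
}
```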
Ask PM/TL with tokio::time::timeout. Adds PhaseTimeout error variant.
Replace check-then-act with compare_exchange_weak to atomically
reserve a depth slot before incrementing.
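
The CAS loop can be sketched as follows. `try_reserve_depth` and `release_depth` are hypothetical names, and the memory orderings are one reasonable choice rather than a statement of what the PR uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Atomically reserves one recursion slot. Returns false if the limit is
/// already reached. Unlike check-then-increment, two concurrent callers
/// can never both take the last slot.
pub fn try_reserve_depth(depth: &AtomicUsize, max_depth: usize) -> bool {
    let mut current = depth.load(Ordering::Acquire);
    loop {
        if current >= max_depth {
            return false;
        }
        match depth.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            // Lost the race or failed spuriously: retry with the value
            // that was actually observed.
            Err(observed) => current = observed,
        }
    }
}

/// Gives the slot back once the child team has finished.
pub fn release_depth(depth: &AtomicUsize) {
    depth.fetch_sub(1, Ordering::AcqRel);
}
```

`compare_exchange_weak` may fail spuriously, which is harmless here because the loop simply retries.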
Prepend warning context to validation text so PM knows which
tasks failed and must be re-done.
PM works from provided text only — no need for bash/file_read.
Remove DEFAULT_MAX_TOOL_ROUNDS constant and TeamAgent::placeholder()
which were only used by the deleted execute_tasks_parallel.
The retry logic could never match failed tasks because TL validation
responses use numeric indices ("## Task: 1") but the orchestrator
compared against UUIDs. Fix by falling back to 1-based index matching
when UUID matching fails, and enrich the results_summary with task
index + description so the TL has context for its validation output.
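
A minimal sketch of the fallback matching described above: try the reference as a task id first, then as a 1-based index. `TaskEntry` and `resolve_task_ref` are hypothetical names for illustration.

```rust
/// Hypothetical task record; only the id matters for matching.
pub struct TaskEntry {
    pub id: String,
    pub description: String,
}

/// Resolves a task reference taken from a TL validation response to a
/// position in `tasks`: exact id match first, then 1-based index.
pub fn resolve_task_ref(reference: &str, tasks: &[TaskEntry]) -> Option<usize> {
    let reference = reference.trim();
    if let Some(pos) = tasks.iter().position(|t| t.id == reference) {
        return Some(pos);
    }
    // Fall back to "## Task: 1" style numbering.
    let n: usize = reference.parse().ok()?;
    if n >= 1 && n <= tasks.len() {
        Some(n - 1)
    } else {
        None
    }
}
```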
Case-insensitive TASK/DEPENDS_ON matching, markdown list prefix
stripping, CONTEXT-block-aware DEPENDS_ON parsing, bold/code
prefix handling, numeric dependency refs, and parse-failure tracing.
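
The parsing rules listed above can be sketched as a small line-oriented state machine. This is not the PR's `parse_tasks`: `ParsedTask`, `field_value`, and in particular the `END_CONTEXT` terminator are assumptions, since the PR text does not say how a CONTEXT block ends.

```rust
#[derive(Debug, PartialEq)]
pub struct ParsedTask {
    pub description: String,
    pub depends_on: Vec<usize>,
}

/// If `line` starts with `key` (lowercase, e.g. "task:") after optional
/// list/bold/code markup, returns the value that follows the key.
fn field_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    // ASCII lowercasing preserves byte offsets, so `pos` is valid in `line`.
    let lower = line.to_ascii_lowercase();
    let pos = lower.find(key)?;
    let prefix_is_markup = line[..pos]
        .chars()
        .all(|c| c.is_whitespace() || c.is_ascii_digit() || "-*`#.".contains(c));
    if !prefix_is_markup {
        return None;
    }
    let rest = &line[pos + key.len()..];
    Some(
        rest.trim_start_matches(|c: char| c == '*' || c == '`' || c.is_whitespace())
            .trim_end(),
    )
}

pub fn parse_tasks(text: &str) -> Vec<ParsedTask> {
    let mut tasks: Vec<ParsedTask> = Vec::new();
    let mut in_context = false;
    for line in text.lines() {
        if in_context {
            // Inside a CONTEXT block everything is file content, so a
            // stray "TASK:" or "DEPENDS_ON:" there must not be parsed.
            if line.trim().eq_ignore_ascii_case("END_CONTEXT") {
                in_context = false;
            }
            continue;
        }
        if let Some(desc) = field_value(line, "task:") {
            tasks.push(ParsedTask {
                description: desc.to_string(),
                depends_on: Vec::new(),
            });
        } else if field_value(line, "context:").is_some() {
            in_context = true;
        } else if let Some(deps) = field_value(line, "depends_on:") {
            if let Some(task) = tasks.last_mut() {
                // Numeric references only; "none" or words yield no deps.
                task.depends_on = deps
                    .split(|c: char| !c.is_ascii_digit())
                    .filter_map(|tok| tok.parse::<usize>().ok())
                    .collect();
            }
        }
    }
    tasks
}
```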

@marioidival marioidival left a comment

Owner Author


Phase 1: Foundation (a913869..86d06f7)

Core architecture decision: PM/TL/Jr role metaphor

The initial 10 commits establish the fundamental metaphor: a product manager, tech lead, and junior developers collaborate through a structured 6-phase workflow.

Key decisions:

  • Role enum with include_str! prompts — Each role gets an embedded system prompt compiled into the binary. Simple, no runtime file I/O, but prompt changes require recompilation. Chosen for reliability over flexibility.
  • TeamHistory as append-only event log — Every action is timestamped. Critical for debugging multi-agent flows.
  • parse_tasks() extracting TASK: lines — Simple line-splitting initially, later evolved into a robust state machine (commit 529bf41).
  • TeamAgent wrapping LlmProvider — One LLM abstraction, multiple specialized agents.

Pain point:

  • commit 309090c: tokio::sync::RwLock in the Command trait crashed — TUI runs commands in std::thread, not tokio. Fixed with std::sync::Mutex + dedicated thread.

Phase 2: Configuration & Hardening (724483d..993b5a5)

Key decisions:

  • team_raw: Option<toml::Value> in limit-llm::Config — Opaque TOML passthrough avoids coupling LLM crate to agent types.
  • Per-role tool whitelists — PM: no tools, TL: bash only, Jr: safe subset. Prevents PM "exploring code".
  • Exponential backoff retry — 1s/2s/4s delays for transient errors (429, 502, 503). is_retryable() heuristic.
  • MockLlmProvider — Returns pre-configured responses in order. Made the system testable without API keys.
  • Jr defaults to cheaper model: gpt-4o-mini for Jr, gpt-4 for PM/TL. Cross-model composition is a core feature.

Phase 3: Real Parallelism (ebfabd8..ab9059f)

Key decisions:

  • buffer_unordered(max_parallel) — Replaced sequential loop. Biggest throughput improvement.
  • prompt_stream() — Returns impl Stream<Item = Result<String, AgentError>> for streaming responses.
  • JSON persistence — Later replaced with SQLite, but established the pattern.

Phase 4: Security & Quality (4d2d79a..9ce746d)

Critical fixes from PR review:

  • commit 4d2d79a: Jr had ALL tools (including git_push)! Fixed with safe defaults.
  • commit 4d2d79a: Parallel execution was serialized by single Mutex! Fixed with per-agent mutexes.
  • Path canonicalization (c35896d) — Defense-in-depth against traversal attacks.
  • Arc<Vec<Message>> COW (4c53559) — History cloning went from O(n) to O(1) per request.
  • ToolCallAccumulator (58eeaf0) — Unified duplicated tool-call accumulation.
  • parking_lot::Mutex (9ce746d) — No poisoning overhead, better performance.
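
As an illustration of the Arc<Vec<Message>> copy-on-write idea from the list above, here is a minimal sketch. `History`, `snapshot`, and `push` are hypothetical names, and `Message` is reduced to two string fields.

```rust
use std::sync::Arc;

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub struct History {
    messages: Arc<Vec<Message>>,
}

impl History {
    pub fn new() -> Self {
        History { messages: Arc::new(Vec::new()) }
    }

    /// O(1): bumps a reference count instead of copying the messages.
    /// The snapshot can be handed to a request while the agent keeps going.
    pub fn snapshot(&self) -> Arc<Vec<Message>> {
        Arc::clone(&self.messages)
    }

    /// Copy-on-write: `Arc::make_mut` clones the vector only if another
    /// snapshot is still alive; otherwise it mutates in place.
    pub fn push(&mut self, msg: Message) {
        Arc::make_mut(&mut self.messages).push(msg);
    }

    pub fn len(&self) -> usize {
        self.messages.len()
    }
}
```

The O(n) copy does not disappear entirely; it is deferred to the first write that happens while a snapshot is outstanding.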

Phase 5: Actor System Rewrite (7ca96be..b9415dd)

The big architectural shift.

Why rewrite?

Sequential execute_workflow() couldn't handle: true parallelism, cancellation, per-role streaming, topological ordering.

Key decisions:

  • ActorRef<T> + bounded mpsc (32) — Each agent in its own tokio task.
  • Oneshot channels for request-response pattern.
  • Topological dependency ordering for DEPENDS_ON directives.
  • PromptResult with usage, hit_tool_limit, files_modified.
  • TeamProgressEvent with nesting for recursive teams.
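
The request-response pattern can be sketched with std threads and channels as an analogue. The PR itself uses tokio tasks, a bounded tokio mpsc mailbox, and oneshot channels; here `sync_channel(32)` stands in for the bounded mailbox and a per-request `mpsc::channel` stands in for the oneshot. The message variants and the non-generic `ActorRef` are simplifications.

```rust
use std::sync::mpsc::{self, Sender, SyncSender};
use std::thread::{self, JoinHandle};

pub enum TeamMessage {
    /// A request plus the channel on which the actor must reply.
    Prompt { text: String, reply: Sender<String> },
    Shutdown,
}

pub struct ActorRef {
    mailbox: SyncSender<TeamMessage>,
}

impl ActorRef {
    /// Sends a request and blocks until the actor replies.
    /// Returns None if the actor has already stopped.
    pub fn ask(&self, text: &str) -> Option<String> {
        let (reply, response) = mpsc::channel();
        self.mailbox
            .send(TeamMessage::Prompt { text: text.to_string(), reply })
            .ok()?;
        response.recv().ok()
    }

    pub fn shutdown(&self) {
        let _ = self.mailbox.send(TeamMessage::Shutdown);
    }
}

/// Spawns an actor that owns its state and processes one message at a time.
pub fn spawn_agent(label: &'static str) -> (ActorRef, JoinHandle<()>) {
    let (mailbox, inbox) = mpsc::sync_channel(32);
    let handle = thread::spawn(move || {
        for msg in inbox {
            match msg {
                TeamMessage::Prompt { text, reply } => {
                    // A real agent would call the LLM here.
                    let _ = reply.send(format!("[{}] {}", label, text));
                }
                TeamMessage::Shutdown => break,
            }
        }
    });
    (ActorRef { mailbox }, handle)
}
```

Because the actor owns its state and handles one message at a time, no mutex around the agent is needed; the bounded mailbox provides backpressure when senders outpace it.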

Phase 6: Observability (ad54458..9ef953d)

Key decisions:

  • File tracking at tool level (ea9b062) — Replaced regex extraction. Direct tracking via tool calls.
  • Token accumulation via Arc<AtomicU64> (e1b516f) — Shared counters across all agents.
  • TL breakdown prompt simplification (e1b516f) — Removed file-reading from TL. CONTEXT blocks carry contents.
  • TUI progress panel — Spinner, phase bar, scrollable task list, finish summary.
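
Shared token counters via Arc<AtomicU64> might look like the following sketch. `TokenCounters` and its methods are hypothetical, and the one-decimal "k" formatting in `summary` is an assumption about how the finish summary rounds.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Cloning shares the same underlying counters, so every agent can hold
/// its own handle and add to a common total without a lock.
#[derive(Clone, Default)]
pub struct TokenCounters {
    input: Arc<AtomicU64>,
    output: Arc<AtomicU64>,
}

impl TokenCounters {
    /// Adds one response's usage. Relaxed ordering is enough because the
    /// counters are independent totals that guard no other data.
    pub fn record(&self, input: u64, output: u64) {
        self.input.fetch_add(input, Ordering::Relaxed);
        self.output.fetch_add(output, Ordering::Relaxed);
    }

    pub fn totals(&self) -> (u64, u64) {
        (
            self.input.load(Ordering::Relaxed),
            self.output.load(Ordering::Relaxed),
        )
    }

    /// Formats totals in the "Xk in / Yk out tokens" style.
    pub fn summary(&self) -> String {
        let (input, output) = self.totals();
        format!(
            "{:.1}k in / {:.1}k out tokens",
            input as f64 / 1000.0,
            output as f64 / 1000.0
        )
    }
}
```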

Phase 7: Per-Role Flexibility (835045f..7156518)

Key decisions:

  • Per-role provider override (2c6100f) — Cross-provider mixing: Anthropic PM/TL + OpenAI Jr.
  • SQLite persistence (0d05fa1) — WAL mode, atomic transactions, cascade deletes. Replaced JSON.
  • Config bug fix (835045f) — team_raw didn't match [team] TOML key. Configs silently ignored!

Phase 8: Recursion (f5fc4dd..0887f30)

Key decisions:

  • ToolRegistry with RwLock — Enables self-referential team_start tool.
  • team_start tool (606cb33) — Agents spawn child teams. Recursion depth guard (default 2).
  • TL default tools: ["bash", "file_read"] — Needed for validation. Evolved from "planning-only".
  • system_prompt_with_context() (0887f30) — Injects CWD into prompts. Prevents wrong language selection.

Phase 9: Build Verification (4b403a1..b86092f)

Closing the feedback loop.

Key decisions:

  • Compilation check after Jr (9ff3b49) — Build output passed to TL validation.
  • Structured PASS/FAIL (8806000) — TL explicitly assesses each task. Build errors auto-FAIL.
  • file_edit retry on concurrent mod (f167343) — Max 3 attempts, 100ms backoff.
  • TL-suggested build command (f15b88c) — Replaced hardcoded detection with TL's contextual knowledge.
  • DoD per task (8066437) — 2-4 criteria per task. Jr references DoD to scope work.

Bug chain:

  • Shell redirect as literal arg (b139278), serde default override (b86092f) — Jr lost all tools!

Phase 10: Prompt Engineering (3b08bcc..8288589)

Token optimization via better prompts.

Key decisions:

  • Token-efficient TL prompt — 4 phases, tools only for validation. Eliminates wasted round-trips.
  • Jr max_tool_rounds: 15→8 — Most tasks done in 3-6 rounds.
  • CONTEXT blocks — TL embeds file contents in tasks. Biggest token optimization.
  • Structured handoff format — TASK/FILE_TARGETS/IMPLEMENTATION_HINTS/CONTEXT/DoD/DEPENDS_ON.
  • PM: no tools — Final decision after iterating: tools→no tools→tools→no tools.

Phase 11: P0 Fixes (d39bd24..43a74b1)

Hardening from CTPO report.

  • Legacy removal (d39bd24) — Deleted ~450 lines of deprecated workflow.
  • Per-phase timeout (122a2d2) — 120s default. PhaseTimeout error variant.
  • CAS for recursion depth (cbd1e03) — Fixed race in concurrent team_start.
  • is_retryable hardening (cb48186) — Permanent errors no longer retried.
  • Validation in PM delivery (b8f37a7) — PM gets explicit failure context.

Phase 12: P1/P2 Improvements (4b0ba8f..9b85765)

Streaming, retry, robustness.

Key decisions:

  • TokenUpdate/StreamChunk events — Minimal API surface change, existing channel reused.
  • PM/TL streaming (f5b9702) — Real-time TUI display. Jr stays on prompt() for rich result.
  • Validation retry loop (1933004) — Failed tasks re-executed with retry context. UUID + index fallback matching.
  • Token cost surfacing (81c8ab3) — "Xk in / Yk out tokens" in finish summary.
  • Robust parse_tasks (529bf41) — State machine: case-insensitive, markdown prefixes, CONTEXT-aware, numeric deps.
  • Multiple CONTEXT blocks (c6871a1) — Independent truncation per block.

Known remaining issues:

  1. Token counters show 0 for PM/TL phases (streaming discards Done(usage))
  2. Finished(false) after successful retry (original count not adjusted)
  3. PM delivery shows false "tasks FAILED" after retry succeeds

