MoonshotAI · Liewzheng · May 26, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,14 @@ Only write entries that are worth mentioning to users.
 
 ## Unreleased
 
+- Core: Add foreground subagent concurrency limit capped at 80% of available API keys or background task slots — prevents concurrent subagents from exhausting a single key's rate-limit quota
+- Core: Change default foreground subagent timeout from unlimited to 300 seconds, with optional override via `KIMI_FOREGROUND_AGENT_TIMEOUT` environment variable
+- Tool: Agent tool now distributes distinct API keys to concurrent subagents via round-robin when multiple keys are configured (`KIMI_API_KEY`, `KIMI_API_KEY_1`, …)
+- LLM: Add `APIKeyPool` for collecting and rotating multiple API keys from environment variables
+- LLM: Add `KeyPoolKimi` provider wrapper that automatically swaps to the next key on retryable errors (429, 500, 503)
+- Shell: Show subagent step counter, elapsed time, and live text preview in the shell UI
+- Auth: Fix OAuth token refresh crash when the LLM provider is wrapped by `KeyPoolKimi`
+
 ## 1.45.0 (2026-05-26)
 
 - Shell: `/clear` is now an alias for `/new` — both commands start a new session; previously `/clear` only cleared context without creating a new session
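
Reviewer note: the `KeyPoolKimi` entry above describes swapping to the next key on retryable errors (429, 500, 503). The wrapper itself is not part of this section, so the following is only a minimal sketch of that behaviour under stated assumptions. `APIStatusError`, `call_with_key_rotation`, and `fake_request` are hypothetical names introduced for illustration, and "try each key once" is an assumed policy, not necessarily what the provider does.

```python
from collections.abc import Callable, Sequence

# Status codes the changelog entry lists as retryable.
RETRYABLE_STATUS = frozenset({429, 500, 503})


class APIStatusError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_key_rotation(keys: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each key once, moving to the next key on a retryable status."""
    if not keys:
        raise ValueError("at least one API key is required")
    last_error = None
    for key in keys:
        try:
            return request(key)
        except APIStatusError as exc:
            if exc.status_code not in RETRYABLE_STATUS:
                raise  # not in the retryable set: surface the error unchanged
            last_error = exc
    # Every key failed with a retryable status: re-raise the last failure.
    raise last_error


def fake_request(key: str) -> str:
    # Stand-in for a real API call: the first key is rate limited.
    if key == "sk-a":
        raise APIStatusError(429)
    return f"ok via {key}"


result = call_with_key_rotation(["sk-a", "sk-b"], fake_request)
```

Here the first key fails with 429 and the call succeeds on the second key. A status outside the retryable set is re-raised immediately instead of consuming the remaining keys.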

diff --git a/docs/en/configuration/env-vars.md b/docs/en/configuration/env-vars.md
@@ -12,6 +12,7 @@ The following environment variables take effect when using `kimi` type providers
 | --- | --- |
 | `KIMI_BASE_URL` | API base URL |
 | `KIMI_API_KEY` | API key |
+| `KIMI_API_KEY_1` … `KIMI_API_KEY_99` | Additional API keys for parallel subagent execution |
 | `KIMI_MODEL_NAME` | Model identifier |
 | `KIMI_MODEL_MAX_CONTEXT_SIZE` | Maximum context length (in tokens) |
 | `KIMI_MODEL_CAPABILITIES` | Model capabilities, comma-separated (e.g., `thinking,image_in`) |
@@ -36,6 +37,10 @@ Overrides the provider's `api_key` field in the configuration file. Used to inje
 export KIMI_API_KEY="sk-xxx"
 ```
 
+::: tip Multiple keys for parallel subagents
+When running multiple foreground subagents concurrently, you can configure additional API keys (`KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … up to `KIMI_API_KEY_99`). Kimi Code CLI creates a key pool and assigns a different key to each subagent in round-robin order, spreading rate-limit quota across keys instead of concentrating load on a single key.
+:::
+
 ### `KIMI_MODEL_NAME`
 
 Overrides the model's `model` field in the configuration file (the model identifier used in API calls).
@@ -132,6 +137,7 @@ export OPENAI_API_KEY="sk-xxx"
 | `KIMI_SHARE_DIR` | Customize the share directory path (default: `~/.kimi`) |
 | `KIMI_CLI_NO_AUTO_UPDATE` | Disable all update-related features |
 | `KIMI_CLI_PASTE_CHAR_THRESHOLD` | Character threshold for folding pasted text (default: `1000`) |
+| `KIMI_FOREGROUND_AGENT_TIMEOUT` | Foreground subagent timeout in seconds. `0` disables the timeout (default: `300`) |
 | `KIMI_CLI_PASTE_LINE_THRESHOLD` | Line threshold for folding pasted text (default: `15`) |
 
 ### `KIMI_SHARE_DIR`

diff --git a/docs/en/customization/agents.md b/docs/en/customization/agents.md
@@ -154,6 +154,14 @@ Subagents launched via the `Agent` tool run in an isolated context and return re
 - Subagents can have targeted system prompts
 - Persistent instances preserve context across multiple calls
 
+### Concurrency limits
+
+Foreground subagents share a concurrency cap to prevent overwhelming the API. The limit is `max(1, floor(key_pool_size × 0.8))` when multiple API keys are configured, otherwise `max(1, floor(max_running_tasks × 0.8))` from your config. If the cap is reached, the `Agent` tool returns a `ToolError` with a brief `Concurrency limit reached` message so the main agent can retry or queue work differently.
+
+### Multi-key parallel execution
+
+When `KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … are configured, Kimi Code CLI builds an API key pool and assigns a **distinct key** to each foreground subagent in round-robin order. If a subagent hits a rate limit (HTTP 429) or a retryable server error (HTTP 500 or 503), its provider automatically switches to the next key in the pool. This lets you run many subagents concurrently without concentrating load on a single API key. Each subagent request also carries an identifiable `User-Agent` header (e.g., `KimiCLI/1.44.0 (subagent: coder)`).
+
 ## Built-in tools list
 
 The following are all built-in tools in Kimi Code CLI.
@@ -171,7 +179,7 @@ The following are all built-in tools in Kimi Code CLI.
 | `model` | string | Optional model override |
 | `resume` | string | Optional agent instance ID to resume an existing instance |
 | `run_in_background` | bool | Whether to run in background, default false |
-| `timeout` | int | Timeout in seconds, range 30–3600. Foreground defaults to no timeout (runs until completion), background defaults to 15 minutes; the task is stopped if the limit is exceeded |
+| `timeout` | int | Timeout in seconds, range 30–3600, or `0` to disable the foreground timeout. Foreground defaults to 300 seconds (or the value of `KIMI_FOREGROUND_AGENT_TIMEOUT`), background defaults to 15 minutes; the task is stopped if the limit is exceeded |
 
 ### `AskUserQuestion`
 

diff --git a/docs/en/release-notes/changelog.md b/docs/en/release-notes/changelog.md
@@ -4,6 +4,14 @@ This page documents the changes in each Kimi Code CLI release.
 
 ## Unreleased
 
+- Core: Add foreground subagent concurrency limit capped at 80% of available API keys or background task slots — prevents concurrent subagents from exhausting a single key's rate-limit quota
+- Core: Change default foreground subagent timeout from unlimited to 300 seconds, with optional override via `KIMI_FOREGROUND_AGENT_TIMEOUT` environment variable
+- Tool: Agent tool now distributes distinct API keys to concurrent subagents via round-robin when multiple keys are configured (`KIMI_API_KEY`, `KIMI_API_KEY_1`, …)
+- LLM: Add `APIKeyPool` for collecting and rotating multiple API keys from environment variables
+- LLM: Add `KeyPoolKimi` provider wrapper that automatically swaps to the next key on retryable errors (429, 500, 503)
+- Shell: Show subagent step counter, elapsed time, and live text preview in the shell UI
+- Auth: Fix OAuth token refresh crash when the LLM provider is wrapped by `KeyPoolKimi`
+
 ## 1.45.0 (2026-05-26)
 
 - Shell: `/clear` is now an alias for `/new` — both commands start a new session; previously `/clear` only cleared context without creating a new session
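
Reviewer note: the first entry above caps foreground concurrency at 80% of the key count or task slots, and `agents.md` spells this out as `max(1, floor(n × 0.8))`. A short worked sketch of that arithmetic follows. `foreground_concurrency_limit` is a hypothetical helper, and reading "multiple API keys" as a pool of two or more keys is an assumption.

```python
def foreground_concurrency_limit(key_pool_size: int, max_running_tasks: int) -> int:
    """Compute max(1, floor(base * 0.8)) for the documented concurrency cap."""
    # Assumption: "multiple API keys" means a pool of two or more keys.
    base = key_pool_size if key_pool_size > 1 else max_running_tasks
    # (base * 4) // 5 equals floor(base * 0.8) without floating-point rounding.
    return max(1, (base * 4) // 5)


five_keys = foreground_concurrency_limit(key_pool_size=5, max_running_tasks=4)
two_keys = foreground_concurrency_limit(key_pool_size=2, max_running_tasks=4)
no_pool = foreground_concurrency_limit(key_pool_size=0, max_running_tasks=4)
single_slot = foreground_concurrency_limit(key_pool_size=0, max_running_tasks=1)
```

With five keys the cap is 4. With two keys `floor(1.6)` gives 1. Without a pool and `max_running_tasks = 4` the cap is 3, and the `max(1, …)` floor keeps a single-slot configuration at 1 rather than 0.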

diff --git a/docs/zh/configuration/env-vars.md b/docs/zh/configuration/env-vars.md
@@ -12,6 +12,7 @@ Kimi Code CLI 支持通过环境变量覆盖配置或控制运行行为。本页
 | --- | --- |
 | `KIMI_BASE_URL` | API 基础 URL |
 | `KIMI_API_KEY` | API 密钥 |
+| `KIMI_API_KEY_1` … `KIMI_API_KEY_99` | 并行子代理的额外 API 密钥 |
 | `KIMI_MODEL_NAME` | 模型标识符 |
 | `KIMI_MODEL_MAX_CONTEXT_SIZE` | 最大上下文长度（token 数） |
 | `KIMI_MODEL_CAPABILITIES` | 模型能力，逗号分隔（如 `thinking,image_in`） |
@@ -36,6 +37,10 @@ export KIMI_BASE_URL="https://api.moonshot.cn/v1"
 export KIMI_API_KEY="sk-xxx"
 ```
 
+::: tip 多密钥并行子代理
+在同时运行多个前台子代理时，可配置额外的 API 密钥（`KIMI_API_KEY_1`、`KIMI_API_KEY_2`、… 直到 `KIMI_API_KEY_99`）。Kimi CLI 会创建密钥池，并以轮询方式为每个子代理分配不同密钥，将速率限制配额分散到多个密钥上，避免单密钥负载过高。
+:::
+
 ### `KIMI_MODEL_NAME`
 
 覆盖配置文件中模型的 `model` 字段（API 调用时使用的模型标识符）。
@@ -132,6 +137,7 @@ export OPENAI_API_KEY="sk-xxx"
 | `KIMI_SHARE_DIR` | 自定义共享目录路径（默认 `~/.kimi`） |
 | `KIMI_CLI_NO_AUTO_UPDATE` | 禁用所有更新相关功能 |
 | `KIMI_CLI_PASTE_CHAR_THRESHOLD` | 粘贴文本折叠的字符数阈值（默认 `1000`） |
+| `KIMI_FOREGROUND_AGENT_TIMEOUT` | 前台子代理超时时间（秒），`0` 表示不限制（默认 `300`） |
 | `KIMI_CLI_PASTE_LINE_THRESHOLD` | 粘贴文本折叠的行数阈值（默认 `15`） |
 
 ### `KIMI_SHARE_DIR`

diff --git a/docs/zh/customization/agents.md b/docs/zh/customization/agents.md
@@ -154,6 +154,14 @@ agent:
 - 子 Agent 可以有针对性的系统提示词
 - 持久实例可跨多次调用保留上下文
 
+### 并发限制
+
+前台子 Agent 共享一个并发上限，以避免对 API 造成过大压力。当配置了多个 API 密钥时，上限为 `max(1, floor(密钥池大小 × 0.8))`；否则使用配置中的 `max(1, floor(max_running_tasks × 0.8))`。达到上限后，`Agent` 工具会返回 `ToolError`，简短提示为 `Concurrency limit reached`，主 Agent 可以稍后重试或以其他方式调度任务。
+
+### 多密钥并行执行
+
+当配置了 `KIMI_API_KEY_1`、`KIMI_API_KEY_2` … 时，Kimi CLI 会构建 API 密钥池，并以轮询方式为每个前台子 Agent 分配**不同的密钥**。如果某个子 Agent 触发速率限制或可重试的服务器错误，密钥会自动轮换到池中的下一个。这使你可以并发运行大量子 Agent，而不会将负载集中在单个 API 密钥上。每个子 Agent 请求还会携带可识别的 `User-Agent` 请求头（例如 `KimiCLI/1.44.0 (subagent: coder)`）。
+
 ## 内置工具列表
 
 以下是 Kimi Code CLI 内置的所有工具。
@@ -171,7 +179,7 @@ agent:
 | `model` | string | 可选的模型覆盖 |
 | `resume` | string | 可选的 Agent 实例 ID，用于恢复现有实例 |
 | `run_in_background` | bool | 是否在后台运行，默认 false |
-| `timeout` | int | 超时时间（秒），范围 30–3600。前台默认无超时（运行到完成），后台默认 15 分钟；超时后任务会被停止 |
+| `timeout` | int | 超时时间（秒），范围 30–3600。前台默认 300 秒（或由 `KIMI_FOREGROUND_AGENT_TIMEOUT` 指定），后台默认 15 分钟；超时后任务会被停止。设置为 `0` 可关闭前台超时限制 |
 
 ### `AskUserQuestion`
 

diff --git a/docs/zh/release-notes/changelog.md b/docs/zh/release-notes/changelog.md
@@ -4,6 +4,14 @@
 
 ## 未发布
 
+- Core：新增前台子代理并发限制，上限为可用 API key 数量或后台任务槽数量的 80%——避免并发子代理耗尽单一 key 的速率限制配额
+- Core：将前台子代理默认超时从"无限制"改为 300 秒，支持通过环境变量 `KIMI_FOREGROUND_AGENT_TIMEOUT` 覆盖
+- Tool：当配置多个 key（`KIMI_API_KEY`、`KIMI_API_KEY_1`…）时，Agent 工具会为并发子代理轮询分配不同的 API key
+- LLM：新增 `APIKeyPool`，用于从环境变量收集和轮换多个 API key
+- LLM：新增 `KeyPoolKimi` provider 包装器，在遇到可重试错误（429、500、503）时自动切换到下一个 key
+- Shell：在终端界面中显示子代理的步进计数器、已耗时和实时文本预览
+- Auth：修复 LLM provider 被 `KeyPoolKimi` 包装时 OAuth token 刷新导致的崩溃
+
 ## 1.45.0 (2026-05-26)
 
 - Shell：`/clear` 现在成为 `/new` 的别名——两者都会启动新会话；此前 `/clear` 仅清空上下文而不创建新会话

diff --git a/packages/kosong/src/kosong/chat_provider/openai_common.py b/packages/kosong/src/kosong/chat_provider/openai_common.py
@@ -27,7 +27,12 @@ def create_openai_client(
     base_url: str | None,
     client_kwargs: Mapping[str, Any],
 ) -> AsyncOpenAI:
-    return AsyncOpenAI(api_key=api_key, base_url=base_url, **dict(client_kwargs))
+    kwargs = dict(client_kwargs)
+    # Apply a sensible default HTTP timeout when the caller has not supplied one.
+    # Prevents indefinite hangs when the API server stops responding mid-request.
+    if "timeout" not in kwargs:
+        kwargs["timeout"] = httpx.Timeout(connect=10.0, read=300.0, write=30.0, pool=30.0)
+    return AsyncOpenAI(api_key=api_key, base_url=base_url, **kwargs)
 
 
 async def _drain_awaitable(awaitable: Awaitable[object]) -> None:

diff --git a/src/kimi_cli/app.py b/src/kimi_cli/app.py
@@ -21,6 +21,7 @@
 from kimi_cli.config import Config, LLMModel, LLMProvider, load_config
 from kimi_cli.constant import VERSION
 from kimi_cli.llm import augment_provider_with_env_vars, create_llm, model_display_name
+from kimi_cli.llm_key_pool import APIKeyPool
 from kimi_cli.session import Session
 from kimi_cli.share import get_share_dir
 from kimi_cli.soul import RunCancelled, run_soul
@@ -254,6 +255,16 @@ async def create(
         if startup_progress is not None:
             startup_progress("Scanning workspace...")
 
+        # Initialise an optional API-key pool for parallel subagent execution.
+        # When the user has configured multiple keys (KIMI_API_KEY, KIMI_API_KEY_1,
+        # …) each subagent will be assigned a different key in round-robin order.
+        key_pool = APIKeyPool.from_env("KIMI_API_KEY")
+        if key_pool is not None:
+            logger.info(
+                "Subagent key pool initialised with {n} key(s)",
+                n=key_pool.key_count,
+            )
+
         runtime = await Runtime.create(
             config,
             oauth,
@@ -264,6 +275,7 @@ async def create(
             runtime_afk=runtime_afk,
             skills_dirs=skills_dirs,
         )
+        runtime.key_pool = key_pool
         runtime.ui_mode = ui_mode
         runtime.resumed = resumed
         runtime.notifications.recover()
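
Reviewer note: `llm_key_pool.py` is not included in this section, so the behaviour of `APIKeyPool.from_env` can only be inferred from the call site above and from the docs. The following is a sketch of what such a pool might look like, not the actual implementation. `KeyPoolSketch` and `next_key` are hypothetical names. Returning `None` when no key is set, and skipping empty values, are assumptions. The variable naming (`KIMI_API_KEY`, then `KIMI_API_KEY_1` … `KIMI_API_KEY_99`) follows the documented convention.

```python
import itertools
from collections.abc import Mapping


class KeyPoolSketch:
    """Hypothetical stand-in for APIKeyPool; not the real implementation."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)

    @property
    def key_count(self) -> int:
        return len(self._keys)

    def next_key(self) -> str:
        """Hand out keys in round-robin order, wrapping around at the end."""
        return next(self._cycle)

    @classmethod
    def from_env(cls, prefix: str, environ: Mapping[str, str]) -> "KeyPoolSketch | None":
        # The base variable first, then the numbered variants _1 through _99.
        names = [prefix] + [f"{prefix}_{i}" for i in range(1, 100)]
        keys = [environ[name] for name in names if environ.get(name)]
        # Assumption: no configured key means no pool.
        return cls(keys) if keys else None


env = {"KIMI_API_KEY": "sk-a", "KIMI_API_KEY_1": "sk-b", "KIMI_API_KEY_2": "sk-c"}
pool = KeyPoolSketch.from_env("KIMI_API_KEY", env)
assigned = [pool.next_key() for _ in range(4)]  # the fourth subagent wraps around
```

In the sketch a fourth subagent reuses the first key. Under the documented cap of 80% of the pool size, concurrent foreground subagents would not exceed the number of keys, which is what makes the "distinct key" claim in the docs hold.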

diff --git a/src/kimi_cli/auth/oauth.py b/src/kimi_cli/auth/oauth.py
@@ -1080,10 +1080,24 @@ def _apply_access_token(self, runtime: Runtime | None, access_token: str) -> Non
             return
         from kosong.chat_provider.kimi import Kimi
 
-        assert isinstance(runtime.llm.chat_provider, Kimi), "Expected Kimi chat provider"
+        from kimi_cli.llm import KeyPoolKimi, unwrap_kimi_provider
+
+        chat_provider = unwrap_kimi_provider(runtime.llm.chat_provider)
+
+        assert isinstance(chat_provider, Kimi), "Expected Kimi chat provider"
+
+        # When the chat provider is wrapped by KeyPoolKimi, the pool (not OAuth)
+        # manages the API key. Overwriting the client key here would silently
+        # replace the pooled key with the OAuth token, breaking rotation.
+        # We gate on the wrapper type rather than runtime.key_pool existence,
+        # because a user may have a key pool configured for subagents while the
+        # root runtime still uses an unwrapped OAuth-based provider.
+        if isinstance(runtime.llm.chat_provider, KeyPoolKimi):
+            return
+
+
         provider = runtime.config.providers.get(provider_key)
         fallback_api_key = provider.api_key.get_secret_value() if provider else ""
-        runtime.llm.chat_provider.client.api_key = access_token or fallback_api_key
+        chat_provider.client.api_key = access_token or fallback_api_key
 
 
 if __name__ == "__main__":
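
Reviewer note: the `_apply_access_token` change above unwraps the provider before the type check and skips the key overwrite when the provider is pool-managed. A simplified, self-contained sketch of that control flow follows. `Kimi`, `KeyPoolWrapper`, `unwrap`, and `apply_access_token` are hypothetical stand-ins, the `inner` attribute is an assumed name, and the real code writes to `chat_provider.client.api_key` rather than a plain attribute.

```python
class Kimi:
    """Hypothetical stand-in for the kosong Kimi provider."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


class KeyPoolWrapper:
    """Hypothetical stand-in for KeyPoolKimi; `inner` is an assumed attribute."""

    def __init__(self, inner: Kimi) -> None:
        self.inner = inner


def unwrap(provider: object) -> object:
    """Return the wrapped provider, or the provider itself when unwrapped."""
    return provider.inner if isinstance(provider, KeyPoolWrapper) else provider


def apply_access_token(provider: object, access_token: str, fallback_api_key: str) -> bool:
    """Apply an OAuth token unless the key is owned by a pool wrapper."""
    base = unwrap(provider)
    if not isinstance(base, Kimi):
        raise TypeError("Expected Kimi chat provider")
    if isinstance(provider, KeyPoolWrapper):
        return False  # the pool owns the key, so leave it untouched
    base.api_key = access_token or fallback_api_key
    return True


plain = Kimi("old-key")
wrapped = KeyPoolWrapper(Kimi("pooled-key"))
plain_updated = apply_access_token(plain, "oauth-token", "fallback")
wrapped_updated = apply_access_token(wrapped, "oauth-token", "fallback")
```

The type check runs against the unwrapped provider, which is what avoids the crash described in the changelog, while the early return is keyed on the wrapper type so an unwrapped root provider still receives the refreshed token.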