23 commits
63eeaf1
feat(subagent): add API key pool for parallel subagent execution
Liewzheng May 26, 2026
f7ff734
docs: update env-vars and agents docs for subagent multi-key parallel…
Liewzheng May 26, 2026
dc9eb9c
docs: update CHANGELOG for subagent multi-key parallel execution
Liewzheng May 26, 2026
c1b2853
fix(subagent): only inject key-pool keys for kimi provider; fix TOCTO…
Liewzheng May 26, 2026
304190a
fix(subagent): unwrap KeyPoolKimi in isinstance checks; limit key-poo…
Liewzheng May 26, 2026
45f3c74
fix(subagent): remove API key from log; use resume model for concurre…
Liewzheng May 26, 2026
fcab6e1
fix(subagent): remove API key from builder log; keep timed-out agents…
Liewzheng May 27, 2026
fc60a89
fix(subagent): address Codex Review round 4 feedback
Liewzheng May 27, 2026
a7b00bb
fix(subagent): resolve default model and count matching-provider runs…
Liewzheng May 27, 2026
b438f87
fix(subagent): clear pre-run timeouts from running_foreground
Liewzheng May 27, 2026
7b63d29
fix(subagent): rotate keys for streaming status retries
Liewzheng May 27, 2026
66cf0ae
refactor(subagent): centralise KeyPoolKimi unwrapping with unwrap_kim…
Liewzheng May 27, 2026
cc1d643
feat(subagent): add key health tracking with exponential cooldown to …
Liewzheng May 27, 2026
026e43e
fix(auth): preserve pooled API keys during OAuth refresh
Liewzheng May 27, 2026
ee7452c
fix(key-pool): reset key failures when cooldown expires
Liewzheng May 27, 2026
b50ef8f
fix(key-pool): advance fallback slot when all keys cooling down
Liewzheng May 27, 2026
53efcfd
fix(auth): only skip OAuth refresh for wrapped pooled providers
Liewzheng May 27, 2026
4e4267f
fix(agent): allow timeout=0 to disable foreground timeout
Liewzheng May 27, 2026
100b9bc
fix(agent): reject zero timeout for background agents
Liewzheng May 27, 2026
60d70eb
fix(agent): respect resume model overrides when applying provider caps
Liewzheng May 27, 2026
cb47dfc
fix(agent): persist resumed model override to store for provider cap …
Liewzheng May 28, 2026
bcd5f0b
fix(agent): defer persisting resume model override until launch succeeds
Liewzheng May 28, 2026
628b42b
fix(agent): keep non-Kimi foreground cap global
Liewzheng May 28, 2026
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,14 @@ Only write entries that are worth mentioning to users.

## Unreleased

- Core: Add foreground subagent concurrency limit capped at 80% of available API keys or background task slots — prevents concurrent subagents from exhausting a single key's rate-limit quota
- Core: Change default foreground subagent timeout from unlimited to 300 seconds, with optional override via `KIMI_FOREGROUND_AGENT_TIMEOUT` environment variable
- Tool: Agent tool now distributes distinct API keys to concurrent subagents via round-robin when multiple keys are configured (`KIMI_API_KEY`, `KIMI_API_KEY_1`, …)
- LLM: Add `APIKeyPool` for collecting and rotating multiple API keys from environment variables
- LLM: Add `KeyPoolKimi` provider wrapper that automatically swaps to the next key on retryable errors (429, 500, 503)
- Shell: Show subagent step counter, elapsed time, and live text preview in the shell UI
- Auth: Fix OAuth token refresh crash when the LLM provider is wrapped by `KeyPoolKimi`
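The `KeyPoolKimi` entry above says the wrapper swaps to the next key on retryable errors (429, 500, 503). Below is a minimal synchronous sketch of that idea, assuming a callable that takes an API key and either returns a result or raises an error carrying an HTTP status. `RotatingCaller`, `APIStatusError`, and the "one attempt per key" policy are hypothetical illustrations, not the real `KeyPoolKimi` API; per the commit history, the real wrapper also tracks key health with an exponential cooldown, which this sketch omits.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T")

# Status codes the changelog lists as retryable.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


class APIStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class RotatingCaller:
    """Illustrative key-rotating wrapper; not the real KeyPoolKimi."""

    keys: list[str]
    index: int = 0

    def __post_init__(self) -> None:
        if not self.keys:
            raise ValueError("need at least one API key")

    @property
    def current_key(self) -> str:
        return self.keys[self.index]

    def call(self, send: Callable[[str], T]) -> T:
        last_error: APIStatusError | None = None
        # Give every key in the pool one attempt.
        for _ in range(len(self.keys)):
            try:
                return send(self.current_key)
            except APIStatusError as exc:
                if exc.status not in RETRYABLE_STATUSES:
                    raise  # e.g. 400/401: rotating keys will not help
                last_error = exc
                # Swap to the next key before retrying.
                self.index = (self.index + 1) % len(self.keys)
        assert last_error is not None
        raise last_error


def fake_send(key: str) -> str:
    # Simulates the first key being rate-limited.
    if key == "sk-a":
        raise APIStatusError(429)
    return f"ok via {key}"


if __name__ == "__main__":
    caller = RotatingCaller(["sk-a", "sk-b"])
    print(caller.call(fake_send))  # ok via sk-b
```

The caller keeps the rotated index, so later requests start on the key that last worked instead of retrying the rate-limited one first.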

## 1.45.0 (2026-05-26)

- Shell: `/clear` is now an alias for `/new` — both commands start a new session; previously `/clear` only cleared context without creating a new session
6 changes: 6 additions & 0 deletions docs/en/configuration/env-vars.md
@@ -12,6 +12,7 @@ The following environment variables take effect when using `kimi` type providers
| --- | --- |
| `KIMI_BASE_URL` | API base URL |
| `KIMI_API_KEY` | API key |
| `KIMI_API_KEY_1` … `KIMI_API_KEY_99` | Additional API keys for parallel subagent execution |
| `KIMI_MODEL_NAME` | Model identifier |
| `KIMI_MODEL_MAX_CONTEXT_SIZE` | Maximum context length (in tokens) |
| `KIMI_MODEL_CAPABILITIES` | Model capabilities, comma-separated (e.g., `thinking,image_in`) |
@@ -36,6 +37,10 @@ Overrides the provider's `api_key` field in the configuration file. Used to inje
export KIMI_API_KEY="sk-xxx"
```

::: tip Multiple keys for parallel subagents
When running multiple foreground subagents concurrently, you can configure additional API keys (`KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … up to `KIMI_API_KEY_99`). Kimi CLI will create a key pool and assign a different key to each subagent in round-robin order, distributing rate-limit quota across keys instead of concentrating load on a single key.
:::
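A minimal sketch of how keys named `KIMI_API_KEY`, `KIMI_API_KEY_1` … `KIMI_API_KEY_99` could be collected and handed out in round-robin order. The names `collect_keys` and `RoundRobinPool` are hypothetical, and skipping blank values, tolerating gaps in the numbering, and de-duplicating repeated keys are assumptions rather than documented behavior of `APIKeyPool.from_env`.

```python
from __future__ import annotations

import itertools
from collections.abc import Mapping


def collect_keys(
    env: Mapping[str, str], prefix: str = "KIMI_API_KEY", max_suffix: int = 99
) -> list[str]:
    """Gather PREFIX, PREFIX_1 ... PREFIX_<max_suffix>, skipping unset or blank values."""
    names = [prefix] + [f"{prefix}_{i}" for i in range(1, max_suffix + 1)]
    keys: list[str] = []
    for name in names:
        value = env.get(name, "").strip()
        # Assumption: blanks are ignored and a key listed twice is kept once.
        if value and value not in keys:
            keys.append(value)
    return keys


class RoundRobinPool:
    """Hands out keys in round-robin order (illustrative, not the real APIKeyPool)."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("key pool needs at least one key")
        self._cycle = itertools.cycle(keys)
        self.key_count = len(keys)

    def next_key(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    # In practice you would pass os.environ instead of a literal dict.
    env = {"KIMI_API_KEY": "sk-0", "KIMI_API_KEY_1": "sk-1", "KIMI_API_KEY_2": "sk-2"}
    pool = RoundRobinPool(collect_keys(env))
    print([pool.next_key() for _ in range(4)])  # ['sk-0', 'sk-1', 'sk-2', 'sk-0']
```

`itertools.cycle` is fine inside a single asyncio event loop; a pool shared across threads would need a lock around `next_key`.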

### `KIMI_MODEL_NAME`

Overrides the model's `model` field in the configuration file (the model identifier used in API calls).
@@ -132,6 +137,7 @@ export OPENAI_API_KEY="sk-xxx"
| `KIMI_SHARE_DIR` | Customize the share directory path (default: `~/.kimi`) |
| `KIMI_CLI_NO_AUTO_UPDATE` | Disable all update-related features |
| `KIMI_CLI_PASTE_CHAR_THRESHOLD` | Character threshold for folding pasted text (default: `1000`) |
| `KIMI_FOREGROUND_AGENT_TIMEOUT` | Foreground subagent timeout in seconds. `0` disables the timeout (default: `300`) |
| `KIMI_CLI_PASTE_LINE_THRESHOLD` | Line threshold for folding pasted text (default: `15`) |

### `KIMI_SHARE_DIR`
10 changes: 9 additions & 1 deletion docs/en/customization/agents.md
@@ -154,6 +154,14 @@ Subagents launched via the `Agent` tool run in an isolated context and return re
- Subagents can have targeted system prompts
- Persistent instances preserve context across multiple calls

### Concurrency limits

Foreground subagents share a concurrency cap to prevent overwhelming the API. The limit is `max(1, floor(key_pool_size × 0.8))` when multiple API keys are configured, otherwise `max(1, floor(max_running_tasks × 0.8))` from your config. If the cap is reached, the `Agent` tool returns a `ToolError` with a brief `Concurrency limit reached` message so the main agent can retry or queue work differently.
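The cap formula can be checked with a small sketch. `foreground_cap`, `ForegroundLimiter`, and `ConcurrencyLimitReached` are hypothetical names. The sketch assumes that "multiple API keys" means a pool with more than one key, and that a launch over the cap is rejected immediately rather than queued, matching the `Concurrency limit reached` error described above. It ignores refinements such as per-provider counting.

```python
from __future__ import annotations


class ConcurrencyLimitReached(Exception):
    """Stand-in for the ToolError the Agent tool returns when the cap is hit."""


def foreground_cap(key_pool_size: int | None, max_running_tasks: int) -> int:
    # Assumption: "multiple keys" means a pool with more than one key.
    base = key_pool_size if key_pool_size and key_pool_size > 1 else max_running_tasks
    # base * 4 // 5 == floor(base * 0.8) for non-negative ints, without float rounding.
    return max(1, base * 4 // 5)


class ForegroundLimiter:
    """Rejects (rather than queues) launches beyond the cap."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.running = 0

    def acquire(self) -> None:
        # No await between the check and the increment, so this is safe
        # inside a single asyncio event loop (threads would need a lock).
        if self.running >= self.cap:
            raise ConcurrencyLimitReached("Concurrency limit reached")
        self.running += 1

    def release(self) -> None:
        self.running = max(0, self.running - 1)


if __name__ == "__main__":
    print(foreground_cap(5, 10))     # 4: five keys -> floor(5 * 0.8)
    print(foreground_cap(None, 10))  # 8: no pool -> floor(10 * 0.8)
```

Failing fast instead of waiting lets the main agent decide whether to retry later or restructure the work, which is the behavior the paragraph describes.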

### Multi-key parallel execution

When `KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … are configured, Kimi CLI builds an API key pool and assigns a **distinct key** to each foreground subagent in round-robin order. If a subagent hits a rate limit or retryable server error, the key is automatically rotated to the next one in the pool. This lets you run many subagents concurrently without concentrating load on a single API key. Each subagent request also carries an identifiable `User-Agent` header (e.g., `KimiCLI/1.44.0 (subagent: coder)`).
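A sketch of what per-launch settings could look like: a key picked by launch index modulo pool size (round-robin) and a `User-Agent` string in the documented `KimiCLI/<version> (subagent: <name>)` shape. `SubagentClientSettings` and `settings_for_launch` are hypothetical; the actual code may attach the key and header to the client differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubagentClientSettings:
    """Hypothetical bundle of per-subagent client settings."""

    api_key: str
    headers: dict[str, str] = field(default_factory=dict)


def settings_for_launch(
    keys: list[str], launch_index: int, version: str, agent_name: str
) -> SubagentClientSettings:
    if not keys:
        raise ValueError("need at least one API key")
    # Round-robin: the n-th launch gets key n mod pool size.
    api_key = keys[launch_index % len(keys)]
    # Same shape as the documented example "KimiCLI/1.44.0 (subagent: coder)".
    user_agent = f"KimiCLI/{version} (subagent: {agent_name})"
    return SubagentClientSettings(api_key=api_key, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    for i in range(3):
        s = settings_for_launch(["sk-0", "sk-1"], i, "1.45.0", "coder")
        print(s.api_key, s.headers["User-Agent"])
```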

## Built-in tools list

The following are all built-in tools in Kimi Code CLI.
@@ -171,7 +179,7 @@ The following are all built-in tools in Kimi Code CLI.
| `model` | string | Optional model override |
| `resume` | string | Optional agent instance ID to resume an existing instance |
| `run_in_background` | bool | Whether to run in background, default false |
-| `timeout` | int | Timeout in seconds, range 30–3600. Foreground defaults to no timeout (runs until completion), background defaults to 15 minutes; the task is stopped if the limit is exceeded |
+| `timeout` | int | Timeout in seconds, range 30–3600. Foreground defaults to 300 seconds (or `KIMI_FOREGROUND_AGENT_TIMEOUT`), background defaults to 15 minutes; the task is stopped if the limit is exceeded. Set to `0` to disable the foreground timeout |
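One way to read the `timeout` defaults together with `KIMI_FOREGROUND_AGENT_TIMEOUT` is the resolution order sketched below: an explicit `timeout` wins, then the environment variable, then 300 seconds for foreground or 15 minutes for background, with `0` meaning "no timeout" for foreground only. `resolve_timeout` is a hypothetical helper, not the tool's own function; the precedence order and the rejection of `0` for background agents are assumptions drawn from the docs and commit titles, and the 30–3600 range check is omitted. The sketch also accepts `timeout=0` as a parameter, which the tool's parameter schema may reject.

```python
from __future__ import annotations

from collections.abc import Mapping

DEFAULT_FOREGROUND_TIMEOUT = 300      # seconds, per the docs
DEFAULT_BACKGROUND_TIMEOUT = 15 * 60  # 15 minutes, per the docs
ENV_VAR = "KIMI_FOREGROUND_AGENT_TIMEOUT"


def resolve_timeout(
    requested: int | None, background: bool, env: Mapping[str, str]
) -> int | None:
    """Return the timeout in seconds, or None when the timeout is disabled."""
    if background:
        # Assumption from the commit history: 0 is rejected for background agents.
        if requested == 0:
            raise ValueError("timeout=0 is not allowed for background agents")
        return requested if requested is not None else DEFAULT_BACKGROUND_TIMEOUT
    if requested is not None:
        # Explicit per-call value wins; 0 disables the foreground timeout.
        return None if requested == 0 else requested
    raw = env.get(ENV_VAR, "").strip()
    if raw:
        seconds = int(raw)  # raises ValueError on non-numeric input
        return None if seconds == 0 else seconds
    return DEFAULT_FOREGROUND_TIMEOUT
```

Returning `None` for "disabled" maps directly onto `asyncio.wait_for(coro, timeout=None)`, which waits without a deadline.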

### `AskUserQuestion`

8 changes: 8 additions & 0 deletions docs/en/release-notes/changelog.md
@@ -4,6 +4,14 @@ This page documents the changes in each Kimi Code CLI release.

## Unreleased

- Core: Add foreground subagent concurrency limit capped at 80% of available API keys or background task slots — prevents concurrent subagents from exhausting a single key's rate-limit quota
- Core: Change default foreground subagent timeout from unlimited to 300 seconds, with optional override via `KIMI_FOREGROUND_AGENT_TIMEOUT` environment variable
- Tool: Agent tool now distributes distinct API keys to concurrent subagents via round-robin when multiple keys are configured (`KIMI_API_KEY`, `KIMI_API_KEY_1`, …)
- LLM: Add `APIKeyPool` for collecting and rotating multiple API keys from environment variables
- LLM: Add `KeyPoolKimi` provider wrapper that automatically swaps to the next key on retryable errors (429, 500, 503)
- Shell: Show subagent step counter, elapsed time, and live text preview in the shell UI
- Auth: Fix OAuth token refresh crash when the LLM provider is wrapped by `KeyPoolKimi`

## 1.45.0 (2026-05-26)

- Shell: `/clear` is now an alias for `/new` — both commands start a new session; previously `/clear` only cleared context without creating a new session
6 changes: 6 additions & 0 deletions docs/zh/configuration/env-vars.md
@@ -12,6 +12,7 @@ Kimi Code CLI supports overriding configuration or controlling runtime behavior via environment variables. This page
| --- | --- |
| `KIMI_BASE_URL` | API base URL |
| `KIMI_API_KEY` | API key |
| `KIMI_API_KEY_1` … `KIMI_API_KEY_99` | Additional API keys for parallel subagents |
| `KIMI_MODEL_NAME` | Model identifier |
| `KIMI_MODEL_MAX_CONTEXT_SIZE` | Maximum context length (in tokens) |
| `KIMI_MODEL_CAPABILITIES` | Model capabilities, comma-separated (e.g., `thinking,image_in`) |
@@ -36,6 +37,10 @@ export KIMI_BASE_URL="https://api.moonshot.cn/v1"
export KIMI_API_KEY="sk-xxx"
```

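::: tip Multiple keys for parallel subagents
When running multiple foreground subagents at the same time, you can configure additional API keys (`KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … up to `KIMI_API_KEY_99`). Kimi CLI creates a key pool and assigns a different key to each subagent in round-robin order, spreading rate-limit quota across several keys and avoiding excessive load on a single key.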
:::

### `KIMI_MODEL_NAME`

Overrides the model's `model` field in the configuration file (the model identifier used in API calls).
@@ -132,6 +137,7 @@ export OPENAI_API_KEY="sk-xxx"
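| `KIMI_SHARE_DIR` | Customize the share directory path (default: `~/.kimi`) |
| `KIMI_CLI_NO_AUTO_UPDATE` | Disable all update-related features |
| `KIMI_CLI_PASTE_CHAR_THRESHOLD` | Character threshold for folding pasted text (default: `1000`) |
| `KIMI_FOREGROUND_AGENT_TIMEOUT` | Foreground subagent timeout in seconds; `0` means no limit (default: `300`) |
| `KIMI_CLI_PASTE_LINE_THRESHOLD` | Line threshold for folding pasted text (default: `15`) |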

### `KIMI_SHARE_DIR`
10 changes: 9 additions & 1 deletion docs/zh/customization/agents.md
@@ -154,6 +154,14 @@ agent:
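- Subagents can have targeted system prompts
- Persistent instances preserve context across multiple calls

### Concurrency limits

Foreground subagents share a concurrency cap to avoid putting excessive pressure on the API. When multiple API keys are configured, the cap is `max(1, floor(key_pool_size × 0.8))`; otherwise it is `max(1, floor(max_running_tasks × 0.8))`, using the value from your config. Once the cap is reached, the `Agent` tool returns a `ToolError` with the short message `Concurrency limit reached`, and the main agent can retry later or schedule the work differently.

### Multi-key parallel execution

When `KIMI_API_KEY_1`, `KIMI_API_KEY_2`, … are configured, Kimi CLI builds an API key pool and assigns a **distinct key** to each foreground subagent in round-robin order. If a subagent hits a rate limit or a retryable server error, the key is automatically rotated to the next one in the pool. This lets you run many subagents concurrently without concentrating load on a single API key. Each subagent request also carries an identifiable `User-Agent` header (e.g., `KimiCLI/1.44.0 (subagent: coder)`).

## Built-in tools list

The following are all built-in tools in Kimi Code CLI.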
@@ -171,7 +179,7 @@ agent:
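| `model` | string | Optional model override |
| `resume` | string | Optional agent instance ID, used to resume an existing instance |
| `run_in_background` | bool | Whether to run in the background, default false |
-| `timeout` | int | Timeout in seconds, range 30–3600. Foreground defaults to no timeout (runs until completion), background defaults to 15 minutes; the task is stopped on timeout |
+| `timeout` | int | Timeout in seconds, range 30–3600. Foreground defaults to 300 seconds (or the value of `KIMI_FOREGROUND_AGENT_TIMEOUT`), background defaults to 15 minutes; the task is stopped on timeout. Set to `0` to disable the foreground timeout |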

Review comment (P2): Align the documented timeout escape hatch with the schema

When users follow this new table and pass `timeout: 0` to `Agent` to disable the foreground timeout, the tool call is rejected before it reaches `_resolve_foreground_timeout`, because `Params.timeout` is constrained with `ge=30` in `src/kimi_cli/tools/agent/__init__.py`. The `0` escape hatch currently works only through `KIMI_FOREGROUND_AGENT_TIMEOUT`, so either allow `0` in the parameter schema or remove this instruction from the docs.


### `AskUserQuestion`

8 changes: 8 additions & 0 deletions docs/zh/release-notes/changelog.md
@@ -4,6 +4,14 @@

## Unreleased

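- Core: Add a foreground subagent concurrency limit, capped at 80% of the number of available API keys or background task slots, so that concurrent subagents do not exhaust a single key's rate-limit quota
- Core: Change the default foreground subagent timeout from "unlimited" to 300 seconds, overridable via the `KIMI_FOREGROUND_AGENT_TIMEOUT` environment variable
- Tool: When multiple keys are configured (`KIMI_API_KEY`, `KIMI_API_KEY_1`…), the Agent tool assigns different API keys to concurrent subagents in round-robin order
- LLM: Add `APIKeyPool` for collecting and rotating multiple API keys from environment variables
- LLM: Add the `KeyPoolKimi` provider wrapper, which automatically switches to the next key on retryable errors (429, 500, 503)
- Shell: Show the subagent step counter, elapsed time, and live text preview in the terminal UI
- Auth: Fix a crash during OAuth token refresh when the LLM provider is wrapped by `KeyPoolKimi`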

## 1.45.0 (2026-05-26)

- Shell: `/clear` is now an alias for `/new`; both commands start a new session. Previously `/clear` only cleared the context without creating a new session
7 changes: 6 additions & 1 deletion packages/kosong/src/kosong/chat_provider/openai_common.py
@@ -27,7 +27,12 @@ def create_openai_client(
     base_url: str | None,
     client_kwargs: Mapping[str, Any],
 ) -> AsyncOpenAI:
-    return AsyncOpenAI(api_key=api_key, base_url=base_url, **dict(client_kwargs))
+    kwargs = dict(client_kwargs)
+    # Apply a sensible default HTTP timeout when the caller has not supplied one.
+    # Prevents indefinite hangs when the API server stops responding mid-request.
+    if "timeout" not in kwargs:
+        kwargs["timeout"] = httpx.Timeout(connect=10.0, read=300.0, write=30.0, pool=30.0)
+    return AsyncOpenAI(api_key=api_key, base_url=base_url, **kwargs)


 async def _drain_awaitable(awaitable: Awaitable[object]) -> None:
12 changes: 12 additions & 0 deletions src/kimi_cli/app.py
@@ -21,6 +21,7 @@
from kimi_cli.config import Config, LLMModel, LLMProvider, load_config
from kimi_cli.constant import VERSION
from kimi_cli.llm import augment_provider_with_env_vars, create_llm, model_display_name
from kimi_cli.llm_key_pool import APIKeyPool
from kimi_cli.session import Session
from kimi_cli.share import get_share_dir
from kimi_cli.soul import RunCancelled, run_soul
@@ -254,6 +255,16 @@ async def create(
        if startup_progress is not None:
            startup_progress("Scanning workspace...")

        # Initialise an optional API-key pool for parallel subagent execution.
        # When the user has configured multiple keys (KIMI_API_KEY, KIMI_API_KEY_1,
        # …) each subagent will be assigned a different key in round-robin order.
        key_pool = APIKeyPool.from_env("KIMI_API_KEY")
        if key_pool is not None:
            logger.info(
                "Subagent key pool initialised with {n} key(s)",
                n=key_pool.key_count,
            )

        runtime = await Runtime.create(
            config,
            oauth,
@@ -264,6 +275,7 @@ async def create(
            runtime_afk=runtime_afk,
            skills_dirs=skills_dirs,
        )
        runtime.key_pool = key_pool
        runtime.ui_mode = ui_mode
        runtime.resumed = resumed
        runtime.notifications.recover()
20 changes: 18 additions & 2 deletions src/kimi_cli/auth/oauth.py
@@ -1080,10 +1080,26 @@ def _apply_access_token(self, runtime: Runtime | None, access_token: str) -> Non
             return
         from kosong.chat_provider.kimi import Kimi
 
-        assert isinstance(runtime.llm.chat_provider, Kimi), "Expected Kimi chat provider"
+        from kimi_cli.llm import unwrap_kimi_provider
+
+        chat_provider = unwrap_kimi_provider(runtime.llm.chat_provider)
+
+        assert isinstance(chat_provider, Kimi), "Expected Kimi chat provider"
+
+        # When the chat provider is wrapped by KeyPoolKimi, the pool (not OAuth)
+        # manages the API key. Overwriting the client key here would silently
+        # replace the pooled key with the OAuth token, breaking rotation.
+        # We gate on the wrapper type rather than runtime.key_pool existence,
+        # because a user may have a key pool configured for subagents while the
+        # root runtime still uses an unwrapped OAuth-based provider.
+        from kimi_cli.llm import KeyPoolKimi
+
+        if isinstance(runtime.llm.chat_provider, KeyPoolKimi):
+            return
+
         provider = runtime.config.providers.get(provider_key)
         fallback_api_key = provider.api_key.get_secret_value() if provider else ""
-        runtime.llm.chat_provider.client.api_key = access_token or fallback_api_key
+        chat_provider.client.api_key = access_token or fallback_api_key


 if __name__ == "__main__":
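The `_apply_access_token` change follows an unwrap-then-guard pattern: look through the pooling wrapper to reach the underlying provider, and skip the OAuth key overwrite entirely when the wrapper is present so the pool keeps control of the key. The sketch below models that with stand-in classes (`InnerProvider` for the Kimi provider, `PoolWrapper` for `KeyPoolKimi`); the `inner` attribute and the `unwrap` helper are hypothetical and are not the signatures of `unwrap_kimi_provider` or `KeyPoolKimi`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Client:
    api_key: str


@dataclass
class InnerProvider:
    """Stands in for the Kimi chat provider that owns an HTTP client."""

    client: Client


@dataclass
class PoolWrapper:
    """Stands in for KeyPoolKimi; `inner` is a hypothetical attribute name."""

    inner: InnerProvider


def unwrap(provider: InnerProvider | PoolWrapper) -> InnerProvider:
    # Look through the wrapper if present, otherwise return the provider itself.
    return provider.inner if isinstance(provider, PoolWrapper) else provider


def apply_access_token(
    provider: InnerProvider | PoolWrapper, access_token: str, fallback_api_key: str
) -> None:
    inner = unwrap(provider)
    if not isinstance(inner, InnerProvider):
        raise TypeError("expected an InnerProvider")
    if isinstance(provider, PoolWrapper):
        # The pool owns the key; leave it alone so rotation keeps working.
        return
    inner.client.api_key = access_token or fallback_api_key


if __name__ == "__main__":
    plain = InnerProvider(Client("old-key"))
    apply_access_token(plain, "oauth-token", "config-key")
    print(plain.client.api_key)  # oauth-token

    pooled = PoolWrapper(InnerProvider(Client("sk-pool-1")))
    apply_access_token(pooled, "oauth-token", "config-key")
    print(pooled.inner.client.api_key)  # sk-pool-1
```

Checking the wrapper type on the outer object, not the existence of a key pool, mirrors the reasoning in the diff comment: a pool may exist for subagents while the root provider is still a plain OAuth-backed one.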