Feb 26, 2026

Kimi CLI Deep Dive: Dissecting the Architecture of Moonshot AI's Open-Source Terminal AI Agent

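Kimi Code CLI is an open-source terminal AI agent from Moonshot AI (月之暗面). From the terminal it can read and write code, run shell commands, search the web, and plan multi-step tasks on its own. As of February 2026 it has passed 6,800 GitHub stars, and it went from version 1.0 to 1.14 in just four months, a very fast release cadence.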

GitHub: https://github.com/MoonshotAI/kimi-cli

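This article dissects the core architecture of Kimi CLI at the source-code level. It is aimed at developers who are interested in AI agent architecture or who are building a similar tool.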

Quick Install and Getting Started

# One-line install on Linux / macOS
curl -LsSf https://code.kimi.com/install.sh | bash

# Or install with uv
uv tool install --python 3.13 kimi-cli

# Run from source
git clone https://github.com/MoonshotAI/kimi-cli.git
cd kimi-cli
make prepare
uv run kimi

After installation, run kimi to enter the interactive shell. Press Ctrl-X to switch between AI agent mode and shell command mode, so you can run arbitrary shell commands without leaving Kimi CLI.

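Project Overview

Item	Details
Language	Python 3.12+ (targeting 3.14)
Build tool	uv (a modern Python package manager)
License	Apache 2.0
Current version	1.14.0
Binary packaging	PyInstaller
Stars	6,800+

The codebase is organized as a uv workspace containing four packages:

kimi-cli: the main application, with the CLI entry point and all agent logic
kosong: the LLM abstraction layer (unifies message formats and tool-calling interfaces across providers)
pykaos: the OS abstraction layer (file operations, cross-platform shell, remote execution over SSH), located in packages/kaos/
kimi-code: a package alias for Kimi Code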

Architecture Overview

kimi-cli/
├── src/kimi_cli/
│   ├── cli/           # Typer CLI entry point
│   ├── soul/          # Core agent loop (KimiSoul)
│   ├── tools/         # Built-in tools (file, shell, search, multi-agent)
│   ├── ui/            # UI frontends (Shell, Web, ACP, Wire)
│   ├── wire/          # Event transport protocol
│   ├── acp/           # Agent Client Protocol server
│   ├── web/           # Web UI (FastAPI + frontend)
│   ├── auth/          # OAuth authentication
│   ├── config.py      # Configuration management
│   ├── llm.py         # LLM provider abstraction
│   └── session.py     # Session management
├── packages/
│   ├── kosong/        # LLM abstraction layer
│   └── kaos/          # OS abstraction layer

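The architectural feature most worth studying in Kimi CLI is its use of the Wire Protocol to fully decouple agent logic from UI rendering. The agent's "soul" (KimiSoul) only emits structured events, and any UI frontend (terminal, Web, IDE) can receive and render them.

Core Mechanism 1: Wire Protocol

The Wire Protocol is the most important architectural decision in Kimi CLI. It defines a set of structured event messages so that the agent's execution logic (Soul) and the user interface (UI) communicate only through an event stream.

Event types

Defined in src/kimi_cli/wire/types.py: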

class TurnBegin(BaseModel):
    """The agent begins a new turn"""
    user_input: str | list[ContentPart]

class TurnEnd(BaseModel):
    """The agent ends the current turn"""
    pass

class StepBegin(BaseModel):
    """The agent begins a new step"""
    n: int  # step number

class CompactionBegin(BaseModel):
    """Context compaction begins"""
    pass

class CompactionEnd(BaseModel):
    """Context compaction ends"""
    pass

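The content parts are re-exported from kosong's message types:

TextPart: regular text output
ThinkPart: extended reasoning (thinking mode)
ToolCallPart: a tool call request
ToolResult: the result of a tool execution

How it works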

KimiSoul                    Wire                     UI Frontend
   │                         │                          │
   ├─ wire_send(TurnBegin) ──►│                          │
   ├─ wire_send(StepBegin) ──►│──── async receive() ────►│
   ├─ wire_send(TextPart)  ──►│                          ├─ render_text()
   ├─ wire_send(ToolCall)  ──►│                          ├─ show_tool_call()
   ├─ wire_send(ToolResult)──►│                          ├─ show_result()
   ├─ wire_send(Approval)  ──►│                          ├─ prompt_user()
   │◄── ApprovalResponse ─────│◄─────────────────────────┤
   ├─ wire_send(TurnEnd)   ──►│                          │

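The payoff of this design is that the same agent logic can serve the terminal shell UI, the Web UI (FastAPI + WebSocket), ACP (Agent Client Protocol, for VS Code / Zed / JetBrains integration), and even a raw Wire protocol mode. Switching UI frontends requires no changes to agent code.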
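To make the decoupling concrete, here is a minimal sketch of the send/receive pattern in the diagram. It is not Kimi CLI's implementation: the Wire class here is a hypothetical wrapper around asyncio.Queue, and the event classes are simplified dataclass stand-ins for the Pydantic models shown earlier.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepBegin:
    n: int


@dataclass
class TextPart:
    text: str


@dataclass
class TurnEnd:
    pass


class Wire:
    """Hypothetical in-process wire: a thin wrapper around asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def send(self, event: object) -> None:
        self._queue.put_nowait(event)

    async def receive(self) -> object:
        return await self._queue.get()


async def soul(wire: Wire) -> None:
    # The agent side only emits events; it never touches rendering.
    wire.send(StepBegin(n=1))
    wire.send(TextPart(text="hello"))
    wire.send(TurnEnd())


async def ui(wire: Wire) -> list[str]:
    # The UI side renders whatever arrives until the turn ends.
    rendered: list[str] = []
    while True:
        event = await wire.receive()
        if isinstance(event, TurnEnd):
            return rendered
        if isinstance(event, StepBegin):
            rendered.append(f"[step {event.n}]")
        elif isinstance(event, TextPart):
            rendered.append(event.text)


async def run_turn() -> list[str]:
    wire = Wire()
    _, rendered = await asyncio.gather(soul(wire), ui(wire))
    return rendered
```

Swapping the ui coroutine for one that writes to a WebSocket would leave soul untouched, which is the property the article attributes to the real protocol.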

Core Mechanism 2: Multi-LLM Provider Abstraction

Kimi CLI is not locked to Moonshot's own Kimi models. Instead, the kosong workspace package provides a unified ChatProvider abstraction.

Supported providers

# src/kimi_cli/llm.py
type ProviderType = Literal[
    "kimi",              # Moonshot AI Kimi models (default)
    "openai_legacy",     # OpenAI-compatible API
    "openai_responses",  # OpenAI Responses format
    "anthropic",         # Claude
    "google_genai",      # Google Gemini
    "gemini",            # Same as above
    "vertexai",          # Google Vertex AI
]

Provider creation flow

The create_llm() function dispatches with the match statement introduced in Python 3.10:

def create_llm(provider, model, *, thinking=None, session_id=None, oauth=None):
    # Excerpt: resolved_api_key and gen_kwargs are defined elsewhere in the
    # original function and are not shown here.
    match provider.type:
        case "kimi":
            chat_provider = Kimi(
                model=model.model,
                base_url=provider.base_url,
                api_key=resolved_api_key,
                default_headers=_kimi_default_headers(provider, oauth),
            )
            # Kimi supports prompt caching
            gen_kwargs["prompt_cache_key"] = session_id

        case "anthropic":
            chat_provider = Anthropic(
                model=model.model,
                base_url=provider.base_url,
                api_key=resolved_api_key,
                default_max_tokens=50000,
            )
        # ... other providers

Thinking Mode

Kimi CLI automatically detects a model's thinking capability:

def derive_model_capabilities(model: LLMModel) -> set[ModelCapability]:
    capabilities = set(model.capabilities or ())
    # Name contains "thinking" or "reason" → always_thinking
    if "thinking" in model.model.lower() or "reason" in model.model.lower():
        capabilities.update(("thinking", "always_thinking"))
    # kimi-for-coding / kimi-code → thinking can be toggled
    elif model.model in {"kimi-for-coding", "kimi-code"}:
        capabilities.update(("thinking", "image_in", "video_in"))
    return capabilities

When thinking is enabled, Kimi CLI calls chat_provider.with_thinking("high") so the model performs extended reasoning before generating its answer.

Core Mechanism 3: Tool System

Tool architecture

All tools inherit from kosong's CallableTool2[Params]:

class Shell(CallableTool2[Params]):
    name: str = "Shell"
    params: type[Params] = Params

    def __init__(self, approval: Approval, environment: Environment):
        # Dependency injection: approval and environment are passed in by the toolset loader
        ...

    async def __call__(self, params: Params) -> ToolReturnValue:
        # Ask the user for approval first
        if not await self._approval.request(...):
            return ToolRejectedError()
        # Run the command (result handling omitted in this excerpt)
        exitcode = await self._run_shell_command(params.command, ...)

Built-in tools

Tool	File	Function
`ReadFile`	`tools/file/read.py`	Read files (up to 1,000 lines, 2,000 characters per line, 100 KB per call)
`WriteFile`	`tools/file/write.py`	Write files
`StrReplaceFile`	`tools/file/replace.py`	Edit by string replacement
`Glob`	`tools/file/glob.py`	File path matching
`Grep`	`tools/file/grep_local.py`	Content search (via ripgrepy)
`ReadMediaFile`	`tools/file/read_media.py`	Convert images/videos to base64
`Shell`	`tools/shell/`	Run Bash/PowerShell
`SearchWeb`	`tools/web/search.py`	Web search via Moonshot Search
`FetchURL`	`tools/web/fetch.py`	Fetch web page content (trafilatura)
`Task`	`tools/multiagent/task.py`	Spawn a subagent
`Think`	`tools/think/`	Explicit reasoning step
`SetTodoList`	`tools/todo/`	Task management
`AskUserQuestion`	`tools/ask_user/`	Ask the user a question
`SendDMail`	`tools/dmail/`	Internal messaging (checkpoint)

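Dependency injection for tools

When KimiToolset loads a tool, it dynamically inspects the parameter types of __init__ and injects the matching dependencies automatically:

# src/kimi_cli/soul/toolset.py
def _load_builtin_tool(import_path: str, runtime: Runtime):
    # Simplified excerpt: module_name and class_name are parsed from import_path (not shown)
    module = importlib.import_module(module_name)
    tool_class = getattr(module, class_name)
    # Inspect which dependencies the constructor needs, then inject them
    # (the inspection logic is omitted in this excerpt)
    return tool_class(runtime)

This keeps tool development clean: each tool declares only the dependencies it needs, with no manual wiring.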
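The excerpt above leaves out the inspection step, so here is one way constructor-based injection could look. This is a sketch under assumptions, not the code in toolset.py: the Runtime fields, the stub Approval and Environment classes, and the instantiate helper are all hypothetical, and dependencies are matched purely by type annotation.

```python
import inspect
from dataclasses import dataclass


class Approval:
    """Stub dependency (hypothetical)."""


class Environment:
    """Stub dependency (hypothetical)."""


@dataclass
class Runtime:
    approval: Approval
    environment: Environment


class Shell:
    # The tool declares what it needs purely through type annotations.
    def __init__(self, approval: Approval, environment: Environment):
        self.approval = approval
        self.environment = environment


def instantiate(tool_class: type, runtime: Runtime):
    # Index the runtime's fields by their concrete type.
    available = {type(value): value for value in vars(runtime).values()}
    kwargs = {}
    # inspect.signature on a class reports __init__'s parameters without self.
    for name, param in inspect.signature(tool_class).parameters.items():
        # Match each constructor parameter to a dependency by its annotation.
        kwargs[name] = available[param.annotation]
    return tool_class(**kwargs)
```

Calling instantiate(Shell, runtime) builds the tool with the runtime's own Approval and Environment instances, so adding a new tool only requires annotating its constructor.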

Approval system

Operations with side effects, such as shell commands, must be approved by the user:

if not await self._approval.request(
    self.name,              # tool name
    "run command",          # action type
    f"Run command `{cmd}`", # message shown to the user
    display=[ShellDisplayBlock(language="bash", command=cmd)],
):
    return ToolRejectedError()

Approvals travel over the Wire Protocol as ApprovalRequest / ApprovalResponse. The user can choose to:

Approve once
Approve for the whole session (approve_for_session)
Reject

Use /yolo to turn on auto-approve mode.
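The three choices and the /yolo switch can be modeled in a few lines. The sketch below is illustrative only: the answer strings, the ask callback (a stand-in for the ApprovalRequest / ApprovalResponse round trip), and the decision to remember session approvals per action type are assumptions, not details confirmed by the source.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Possible user answers; the string values are assumptions for this sketch.
APPROVE, APPROVE_FOR_SESSION, REJECT = "approve", "approve_for_session", "reject"


class Approval:
    def __init__(self, ask: Callable[[str, str], Awaitable[str]], yolo: bool = False):
        self._ask = ask  # stand-in for the ApprovalRequest round trip
        self._yolo = yolo
        self._session_approved: set[str] = set()

    async def request(self, tool: str, action: str, description: str) -> bool:
        # Auto-approve in yolo mode or when this action was approved for the session.
        if self._yolo or action in self._session_approved:
            return True
        answer = await self._ask(tool, description)
        if answer == APPROVE_FOR_SESSION:
            self._session_approved.add(action)
            return True
        return answer == APPROVE


async def demo() -> list[bool]:
    answers = iter([APPROVE_FOR_SESSION, REJECT])

    async def ask(tool: str, description: str) -> str:
        return next(answers)

    approval = Approval(ask)
    first = await approval.request("Shell", "run command", "Run command `ls`")
    # Already approved for the session, so the user is not asked again.
    second = await approval.request("Shell", "run command", "Run command `pwd`")
    third = await approval.request("WriteFile", "edit file", "Write `a.txt`")
    return [first, second, third]
```

Constructing Approval(ask, yolo=True) would make every request return True without consulting the user.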

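Core Mechanism 4: Agent Loop (KimiSoul)

KimiSoul is the "soul" of the whole system, implemented in src/kimi_cli/soul/kimisoul.py.

Loop structure

User input
    │
    ▼
TurnBegin ──► Step 1 ──► Step 2 ──► ... ──► Step N ──► TurnEnd
               │
               ├─ Call the LLM
               ├─ Parse tool calls
               ├─ Execute tools
               ├─ Collect results
               └─ Decide whether to continue

Each turn runs at most max_steps_per_turn (default 100) steps. Each step is retried at most max_retries_per_step (default 3) times.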
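The step loop in the diagram can be reduced to the following sketch. It is synchronous and deliberately simplified: Reply, MaxStepsReached, and the llm / run_tool callables are hypothetical stand-ins, and the real KimiSoul is async and emits Wire events along the way.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Reply:
    text: str
    tool_calls: list[str] = field(default_factory=list)


class MaxStepsReached(Exception):
    pass


def run_turn(
    llm: Callable[[list[str]], Reply],
    run_tool: Callable[[str], str],
    user_input: str,
    max_steps_per_turn: int = 100,
) -> tuple[str, int]:
    history = [user_input]
    for step in range(1, max_steps_per_turn + 1):
        reply = llm(history)  # call the LLM
        history.append(reply.text)
        if not reply.tool_calls:  # no tool calls: the turn is finished
            return reply.text, step
        for call in reply.tool_calls:  # execute tools and collect results
            history.append(run_tool(call))
    raise MaxStepsReached(f"gave up after {max_steps_per_turn} steps")
```

The loop ends as soon as the model answers without requesting a tool; the step cap is what keeps a model that always asks for another tool from running forever.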

Error retries

Retries use tenacity with exponential backoff:

from tenacity import retry_if_exception, stop_after_attempt, wait_exponential_jitter

# Automatically retries on API connection errors, timeouts, and 429/500/503 responses
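For readers unfamiliar with the technique, the following is a standard-library-only sketch of exponential backoff with jitter. It does not use tenacity and is not how Kimi CLI wires its retries; RetryableError, call_with_retry, and the delay values are hypothetical, and max_retries_per_step is interpreted here as the total number of attempts.

```python
import asyncio
import random


class RetryableError(Exception):
    """Stand-in for connection errors, timeouts, and 429/500/503 responses."""


async def call_with_retry(fn, max_retries_per_step: int = 3,
                          initial: float = 0.01, maximum: float = 0.1):
    for attempt in range(1, max_retries_per_step + 1):
        try:
            return await fn()
        except RetryableError:
            if attempt == max_retries_per_step:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at `maximum`, plus random jitter.
            delay = min(initial * 2 ** (attempt - 1), maximum)
            await asyncio.sleep(delay + random.uniform(0, initial))


async def demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise RetryableError("503")
        return "ok"

    result = await call_with_retry(flaky)
    return result, calls
```

The jitter term spreads out retries from many clients so they do not all hit a recovering server at the same moment.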

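Skill system

There are two types:

Markdown Skills: loaded as a user prompt (e.g. /skill:gen-changelog)
Flow Skills: Python workflows with control flow (e.g. /flow:release)

Skills can live in the project's .claude/skills/ directory or in the global configuration.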

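Core Mechanism 5: Context Compaction

When the conversation history approaches the model's context limit, Kimi CLI triggers context compaction automatically.

Trigger condition

# Triggered when context_tokens + reserved_context_size >= max_context_size
# reserved_context_size defaults to 50,000 tokens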
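As a worked example of that inequality, the function below encodes the condition directly. The function name and the 200,000-token window in the usage note are hypothetical; only the comparison and the 50,000 default come from the text above.

```python
def should_compact(context_tokens: int, max_context_size: int,
                   reserved_context_size: int = 50_000) -> bool:
    # Compact once the reserved headroom no longer fits in the window.
    return context_tokens + reserved_context_size >= max_context_size
```

With a 200,000-token window, compaction would start at 150,000 tokens of history: should_compact(149_999, 200_000) is False, while should_compact(150_000, 200_000) is True.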

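Compaction algorithm

SimpleCompaction takes a straightforward approach:

class SimpleCompaction:
    def __init__(self, max_preserved_messages: int = 2):
        self.max_preserved_messages = max_preserved_messages

    async def compact(self, messages, llm):
        # 1. Keep the last N user/assistant messages
        # 2. Hand all earlier messages to the LLM for summarization
        # 3. Replace the old messages with the summary
        # Excerpt: to_compact holds the earlier messages from step 2 (selection not shown)
        compact_message = Message(role="user", content=[])
        for i, msg in enumerate(to_compact):
            compact_message.content.append(
                TextPart(text=f"## Message {i + 1}\nRole: {msg.role}\n")
            )
            compact_message.content.extend(
                part for part in msg.content
                if not isinstance(part, ThinkPart)  # drop thinking parts
            )

Compaction drops ThinkPart (extended reasoning content), because it typically consumes many tokens but is of little value to the rest of the conversation. The process emits CompactionBegin / CompactionEnd events so the UI can show compaction progress.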
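The three steps can be run end to end with a toy version. This is a sketch, not SimpleCompaction itself: Message is flattened to a role plus plain text, the summarize callable stands in for the LLM call, and the split keeps the last N messages regardless of role.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def compact(messages: list[Message], summarize: Callable[[str], str],
            max_preserved_messages: int = 2) -> list[Message]:
    split = len(messages) - max_preserved_messages
    if split <= 0:
        return messages  # nothing old enough to compact
    to_compact, preserved = messages[:split], messages[split:]
    # Render the old messages in the same "## Message N" layout as the excerpt.
    transcript = "\n".join(
        f"## Message {i + 1}\nRole: {m.role}\n{m.text}"
        for i, m in enumerate(to_compact)
    )
    # The summary replaces the old messages and is sent as a user message.
    summary = Message(role="user", text=summarize(transcript))
    return [summary, *preserved]
```

A four-message history compacted with the default setting becomes three messages: one summary followed by the two most recent messages, unchanged.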

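Core Mechanism 6: Multi-Agent (Subagents)

Through the Task tool, the agent can spawn subagents to handle subtasks.

Architecture

class Task(CallableTool2[Params]):
    # Excerpt: subagents, subagent_context_file, _ui_loop_fn and final_response
    # come from surrounding code that is not shown.
    async def __call__(self, params: Params) -> ToolReturnValue:
        agent = subagents[params.subagent_name]
        result = await self._run_subagent(agent, params.prompt)
        return result

    async def _run_subagent(self, agent, prompt):
        # Create an independent context and soul
        context = Context(file_backend=subagent_context_file)
        soul = KimiSoul(agent, context=context)

        # Run the subagent
        await run_soul(soul, prompt, _ui_loop_fn, asyncio.Event())

        # If the reply is too short (< 200 characters), follow up automatically
        if len(final_response) < 200:
            await run_soul(soul, CONTINUE_PROMPT, ...)

A subagent has its own context and session, but forwards its Wire events to the main agent's UI via SubagentEvent. Approval requests (ApprovalRequest) pass straight through to the main Wire level so the user approves everything in one place.

Subagents are managed by LaborMarket and fall into two kinds:

Fixed subagents: predefined in the agent spec
Dynamic subagents: created dynamically at runtime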

Core Mechanism 7: MCP Integration

Kimi CLI natively supports the Model Context Protocol (MCP), integrating external tools through the fastmcp package.

# Add an HTTP MCP server
kimi mcp add --transport http context7 https://mcp.context7.com/mcp \
  --header "CONTEXT7_API_KEY: ctx7sk-xxx"

# Add a stdio MCP server
kimi mcp add --transport stdio chrome-devtools -- npx chrome-devtools-mcp@latest

# Add an MCP server that uses OAuth authentication
kimi mcp add --transport http --auth oauth linear https://mcp.linear.app/mcp

MCP tools are loaded into KimiToolset automatically and are used exactly like built-in tools.

Configuration System

The configuration file lives at ~/.kimi/config.toml and is validated with Pydantic:

default_model = "kimi-code__kimi-k2-5"
default_thinking = true
default_yolo = false

[loop_control]
max_steps_per_turn = 100
max_retries_per_step = 3
reserved_context_size = 50000

[models.kimi-code__kimi-k2-5]
provider = "kimi-code"
model = "kimi-k2.5"
max_context_size = 1000000
capabilities = ["thinking", "image_in", "video_in"]

[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/v1"
api_key = ""

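Notably, the configuration system supports migrating from JSON to TOML: _migrate_json_config_to_toml() automatically detects the old format and converts it, backing up the old file as .bak.

Environment variables can override settings: KIMI_BASE_URL, KIMI_API_KEY, KIMI_MODEL_NAME, and others.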

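OAuth Authentication

Kimi CLI authenticates using the OAuth Device Flow:

The CLI requests a device code from the authorization server
The user opens the authorization page in a browser
The CLI polls until authorization completes
The CLI obtains an access token and a refresh token

Tokens are stored in ~/.kimi/credentials/{key}.json (file-based; the keyring approach has been deprecated), and a background task checks every 60 seconds whether the token needs refreshing.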

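IDE Integration

Beyond the terminal, Kimi CLI supports several kinds of IDE integration:

VS Code extension

Install the Kimi Code VS Code Extension directly.

ACP (Agent Client Protocol)

Kimi CLI natively supports ACP and can integrate with editors such as Zed and JetBrains: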

{
  "agent_servers": {
    "Kimi Code CLI": {
      "command": "kimi",
      "args": ["acp"],
      "env": {}
    }
  }
}

Zsh integration

git clone https://github.com/MoonshotAI/zsh-kimi-cli.git \
  ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/kimi-cli

In Zsh, press Ctrl-X to switch to agent mode.

Web UI

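Kimi CLI ships with a Web UI mode that uses FastAPI for the backend and WebSocket for real-time communication:

# Start the Web UI
kimi web

# Or switch with /web from interactive mode

The Web UI and the terminal UI share the same Wire Protocol; only the rendering layer differs.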

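Summary of Key Design Patterns

1. Wire Protocol decoupling

Agent logic and UI communicate through a structured event stream, so the same soul can drive three kinds of frontend: terminal, Web, and IDE. This is an uncommon architectural choice among AI agent tools, many of which couple the UI to the agent logic.

2. YAML agent spec inheritance

Agent definitions support inheritance via extend: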

agent:
  extend: default
  name: My Custom Agent
  tools: ["kimi_cli.tools.file.*", "kimi_cli.tools.shell.*"]
  exclude_tools: ["kimi_cli.tools.web.*"]
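One plausible reading of extend is a recursive merge in which the child's keys override the parent's. The sketch below works on already-parsed dictionaries and is an assumption about the semantics, not Kimi CLI's loader; in particular, whether list fields such as tools are replaced or merged is not stated in the source, and this version replaces them.

```python
def resolve(name: str, specs: dict[str, dict]) -> dict:
    """Resolve an agent spec by following its `extend` chain (hypothetical helper)."""
    spec = specs[name]
    parent = spec.get("extend")
    base = resolve(parent, specs) if parent else {}
    # Child keys override inherited ones; `extend` itself is not carried over.
    return {**base, **{k: v for k, v in spec.items() if k != "extend"}}


specs = {
    "default": {"name": "Default", "tools": ["a", "b"], "system_prompt": "base"},
    "custom": {"extend": "default", "name": "My Custom Agent", "tools": ["a"]},
}
resolved = resolve("custom", specs)
```

Here resolved keeps the parent's system_prompt while taking name and tools from the child.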

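3. Dependency injection

Tools receive their Runtime dependencies automatically through constructor inspection, with no DI framework required.

4. Prompt caching

Kimi's ChatProvider supports prompt_cache_key, using session_id as the key to avoid recomputing the repeated system prompt.

5. Versioning strategy

Only the minor version is ever bumped (1.14.0 -> 1.15.0), never the patch version. It is an interesting choice that reflects a fast-iterating development style.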

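Comparison with Claude Code

As terminal AI agents of the same kind, Kimi CLI and Claude Code have a lot in common:

Aspect	Kimi CLI	Claude Code
Language	Python	TypeScript
LLM	Multiple providers (Kimi by default)	Claude
Tool system	CallableTool2 + dependency injection	Built-in tools
UI decoupling	Wire Protocol	Similar but more tightly coupled
MCP	Native support	Native support
IDE	ACP + VS Code	VS Code / JetBrains
Context compaction	LLM summarization	LLM summarization
Open source	Apache 2.0	Partially open source

Kimi CLI's biggest differentiators are that it is fully open source, supports multiple LLM providers, and has a pluggable UI architecture thanks to the Wire Protocol.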

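Conclusion

The Kimi CLI source code is of solid quality: it is highly modular and makes mature use of async Python. If you are building your own AI agent tool, this codebase has several designs especially worth borrowing:

Wire Protocol for agent/UI decoupling
kosong's multi-provider abstraction pattern
Dependency injection that keeps tool development clean
The implementation strategy for Context Compaction
Wire event pass-through in the subagent architecture

The project is under very active development, with new commits every week. If you want to dig deeper, clone the source and start reading from the core KimiSoul class.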