feat(kimi): KimiRuntime scaffolding (Iteration A) #11
Merged
3a40032 to 8c2cd72
Lands the Kimi adapter's protocol surface ahead of substantive SDK wiring, per dev-docs/kimi-adapter-plan.md. The behavioural slice (execute / stream / cancel / session-resume / tools / permission / hooks / budget) ships in Iterations B–F.
Adapter surface:
- src/airframe/adapters/kimi.py: KimiRuntime(AgentRuntime) + KimiSession(AgentSession). PROVIDER_ID="kimi"; REQUIRES_PACKAGE="kimi_agent_sdk"; EXTRA_NAME="kimi"; lazy SDK import; validate_binding accepts kimi-* model IDs only. Auth chain: explicit api_key= → KIMI_API_KEY env → RuntimeAuthError. Base URL and default model resolve via the same three-step pattern. list_models() returns a curated fallback catalogue (Iterations B / E enrich it with live Moonshot /v1/models data and per-model pricing). SUPPORTED_FEATURES declares only STRUCTURED_OUTPUT_JSON_SCHEMA (the universal floor); the other flags flip on as features land. EMITTABLE_HOOK_KINDS = frozenset().
- KimiSession is a protocol-correct stub: the execute / stream signatures match the protocol (including Phase 5's max_turns / max_budget_usd, which the conformance contract pins); every per-feature kwarg gates against runtime.supports() and raises UnsupportedFeatureError when the capability is declined. The terminal SDK call raises NotImplementedError until Iteration B wires the real kimi-agent-sdk.Session lifecycle.
Wiring:
- src/airframe/options.py: new KimiOptions namespace (empty Iteration A scaffolding; populated in Iterations B–F), added to the ProviderOptions tagged union.
- src/airframe/discovery.py + airframe/__init__.py: register KimiRuntime and export KimiOptions / KimiRuntime.
- src/airframe/testing/contracts.py: test_session_rejects_wrong_provider_options_namespace's matching and all_namespaces extended with the kimi namespace.
- pyproject.toml: new [kimi] extra pinning kimi-agent-sdk>=0.0.5,<0.1 with a python_version >= "3.12" marker (kimi-agent-sdk's Python floor is stricter than airframe's, so pip install on 3.11 becomes a no-op rather than a resolver failure).
- pyproject.toml [tool.uv].conflicts: declares (kimi ↔ claude), (kimi ↔ all), and (kimi ↔ test / dev groups). Upstream conflict: kimi-cli 1.12.0 → fastmcp 2.12.5 → mcp<1.17, vs claude-agent-sdk 0.2.82 → mcp>=1.23. Until Moonshot publishes a kimi-agent-sdk that widens the kimi-cli range, the two SDKs can't be co-installed; users who want both need separate venvs.
- Makefile + .github/workflows/ci.yml + release.yml: install invocations gain --no-extra kimi so the dev/CI envs (which need claude-agent-sdk) can resolve. Kimi unit tests run against a mocked surface and don't need the real SDK installed.
- CLAUDE.md: "kimi" added to the canonical IDs list; "moonshot" reserved alongside it for a future OpenAI-compat sibling fronting api.moonshot.ai/v1.
Tests:
- tests/test_kimi.py: identity / defaults / auth chain / validate_binding / supports / unwrap / close+reset / execute NotImplementedError shape / list_models / session factory / KimiOptions namespace acceptance / EMITTABLE_HOOK_KINDS.
- tests/test_kimi_conformance.py: wires every relevant structural contract from airframe.testing.contracts against a no-credentials KimiRuntime fixture. All 28 contracts pass; behavioural integration is deferred to Iteration B's tests/test_kimi_integration.py.
- tests/test_discovery.py: expected-set and filtered-list tests extended with "kimi".
`make ci` green: 927 passed, 40 skipped (the latter all integration-marker tests that self-skip without credentials).
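As a rough illustration of the auth chain and the kimi-* binding rule listed under Adapter surface, here is a minimal Python sketch. It is not the adapter's code: `resolve_kimi_api_key`, `is_kimi_model_id`, and the local `RuntimeAuthError` class are hypothetical stand-ins, and treating an empty string as "not set" is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class RuntimeAuthError(Exception):
    """Local stand-in for airframe's RuntimeAuthError (hypothetical here)."""


def resolve_kimi_api_key(
    api_key: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Three-step chain: explicit api_key= -> KIMI_API_KEY env -> error."""
    source = os.environ if env is None else env
    # Step 1: an explicit, non-empty argument always wins.
    if api_key:
        return api_key
    # Step 2: fall back to the environment variable.
    from_env = source.get("KIMI_API_KEY")
    if from_env:
        return from_env
    # Step 3: nothing configured, so fail loudly instead of sending no credentials.
    raise RuntimeAuthError("no Kimi API key: pass api_key= or set KIMI_API_KEY")


def is_kimi_model_id(model: str) -> bool:
    """Mirrors the 'kimi-* model IDs only' rule described for validate_binding."""
    return model.startswith("kimi-")
```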
…el/resume)
Replaces the Iteration A stub with a real wrapper around kimi-agent-sdk's Session API: Session.create / Session.resume, WireMessage stream consumption, ApprovalRequest dispatch, and classification of SDK exceptions into the airframe Runtime*Error hierarchy.
* execute / stream / cancel implemented end-to-end via the SDK
* cost telemetry populated from TokenUsage WireMessages
* SUPPORTED_FEATURES: STRUCTURED_OUTPUT_JSON_SCHEMA, STREAMING, CANCEL, SESSION_RESUME (the structured-output path lands in Iteration D via the MCP forced-tool bridge)
* KIMI_API_KEY env-var bridge for the SDK's auth resolver (restored on close so per-session keys don't leak process-wide)
* KimiOptions gains working_directory, resolved via KaosPath on the adapter side
* tests/test_kimi_session.py exercises the SDK-backed path via sys.modules injection (the SDK can't be installed alongside claude-agent-sdk on 3.12; see [tool.uv].conflicts)
* examples/probe_kimi.py: single-turn live probe with install notes for the fresh-venv requirement
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
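The KIMI_API_KEY bridge can be pictured as a save/set/restore around the session's lifetime. The sketch below assumes the SDK reads the key from the process environment; `EnvKeyBridge` and its `open()` / `close()` methods are hypothetical names, not the adapter's API.

```python
from __future__ import annotations

import os

_ENV_VAR = "KIMI_API_KEY"


class EnvKeyBridge:
    """Hypothetical helper: exposes a per-session key to an SDK that reads
    KIMI_API_KEY from the process environment, and restores the previous
    value on close() so the key doesn't outlive the session."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key
        self._saved: str | None = None  # prior value; None means "was unset"
        self._active = False

    def open(self) -> None:
        if self._active:
            return
        self._saved = os.environ.get(_ENV_VAR)
        os.environ[_ENV_VAR] = self._api_key
        self._active = True

    def close(self) -> None:
        if not self._active:
            return
        if self._saved is None:
            os.environ.pop(_ENV_VAR, None)  # it was unset before open()
        else:
            os.environ[_ENV_VAR] = self._saved
        self._active = False
```

Because os.environ is process-global, this sketch only restores correctly when overlapping sessions close in the reverse order they opened; out-of-order closes would put back a stale key.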
Two surfaces light up on top of Iteration B's SDK-backed
KimiSession:
**Reasoning (Feature.REASONING_EFFORT).** thinking= now threads
through the SDK's boolean ``thinking`` kwarg on Session.create /
Session.resume.
* ``None`` / ``"disabled"`` → ``thinking=False``
* Every effort literal (``"minimal" | "low" | "medium" | "high"``) →
``thinking=True``. The SDK exposes a boolean knob only; the model
decides depth itself, so effort granularity is lost on the
boundary. Documented.
* ``{"budget_tokens": N}`` → UnsupportedFeatureError
(Feature.REASONING_BUDGET_TOKENS). Kimi has no token-budget channel.
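The mapping above reduces to a small pure function. This is a sketch under assumptions: `thinking_to_sdk_flag` is a hypothetical name, `UnsupportedFeatureError` is a local stand-in, and raising ValueError for unrecognised values is an assumption the text doesn't specify.

```python
from __future__ import annotations

from typing import Any


class UnsupportedFeatureError(Exception):
    """Local stand-in for airframe's UnsupportedFeatureError."""


_EFFORT_LITERALS = frozenset({"minimal", "low", "medium", "high"})


def thinking_to_sdk_flag(thinking: Any) -> bool:
    """Collapse airframe's thinking= value onto the SDK's boolean knob."""
    if thinking is None or thinking == "disabled":
        return False
    if isinstance(thinking, dict):
        # {"budget_tokens": N}: Kimi has no token-budget channel.
        raise UnsupportedFeatureError(
            "Feature.REASONING_BUDGET_TOKENS is not supported by the Kimi adapter"
        )
    if isinstance(thinking, str) and thinking in _EFFORT_LITERALS:
        # Effort granularity is lost here: every literal means "thinking on".
        return True
    raise ValueError(f"unrecognised thinking= value: {thinking!r}")
```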
The SDK bakes ``thinking`` once at session-create time and never
re-evaluates, so a toggle between turns rebuilds the SDK session:
the existing handle is closed, the prior session ID (whether from
``resume=`` or from a previous turn) is captured, and the new
session re-resumes by that ID — multi-turn state survives the
toggle. Mirrors how Codex rebuilds its Thread on a reasoning-effort
change.
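One way to picture the rebuild-on-toggle behaviour is a handle that compares the requested flag with the live session's and, on a mismatch, closes it and resumes by its ID. `FakeSdkSession`, `ThinkingAwareHandle`, and the `create` / `resume` callables are hypothetical stand-ins for Session.create / Session.resume, not the SDK's real signatures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeSdkSession:
    """Stand-in for an SDK session whose thinking flag is fixed at creation."""
    session_id: str
    thinking: bool
    closed: bool = False

    def close(self) -> None:
        self.closed = True


class ThinkingAwareHandle:
    """Hypothetical wrapper: reuses the SDK session while thinking= is unchanged
    and rebuilds it (resuming by the prior session ID) when it toggles."""

    def __init__(
        self,
        create: Callable[[bool], FakeSdkSession],
        resume: Callable[[str, bool], FakeSdkSession],
        resume_id: Optional[str] = None,  # e.g. a caller-supplied resume=
    ) -> None:
        self._create = create
        self._resume = resume
        self._resume_id = resume_id
        self._session: Optional[FakeSdkSession] = None

    def session_for_turn(self, thinking: bool) -> FakeSdkSession:
        current = self._session
        if current is not None and current.thinking == thinking:
            return current  # unchanged flag: keep the live handle
        if current is not None:
            # Capture the ID before closing so multi-turn state survives.
            self._resume_id = current.session_id
            current.close()
        if self._resume_id is None:
            new = self._create(thinking)
        else:
            new = self._resume(self._resume_id, thinking)
        self._session = new
        return new
```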
**Polymorphic prompt (Feature.VISION_INPUT).** ImageInput now
translates to kosong's ImageURLPart:
* ``url=`` → forwarded verbatim (HTTPS pass-through).
* ``bytes_=`` → base64-encoded data URI; ``media_type`` defaults to
``image/png`` when omitted.
* ``path=`` → file read, base64-encoded, ``media_type`` resolves via
``mimetypes.guess_type``. Missing files raise
UnsupportedFeatureError rather than bubbling an OSError out of
the SDK.
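The three ImageInput sources can be sketched as one function that returns the URL string an ImageURLPart would carry. Assumptions: the keyword names mirror the url= / bytes_= / path= fields above, an explicit media_type also applies to path= inputs, and a failed extension guess falls back to image/png; `image_input_to_url` is a hypothetical name.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path
from typing import Optional


class UnsupportedFeatureError(Exception):
    """Local stand-in for airframe's UnsupportedFeatureError."""


def image_input_to_url(
    *,
    url: Optional[str] = None,
    bytes_: Optional[bytes] = None,
    path: Optional[str] = None,
    media_type: Optional[str] = None,
) -> str:
    """Produce the URL string an ImageURLPart-style object would carry."""
    if url is not None:
        return url  # HTTPS pass-through, untouched
    if bytes_ is not None:
        data = bytes_
        mt = media_type or "image/png"  # documented default
    elif path is not None:
        p = Path(path)
        try:
            data = p.read_bytes()
        except OSError as exc:
            # Surface a feature-level error instead of a raw OSError.
            raise UnsupportedFeatureError(f"cannot read image at {p}: {exc}") from exc
        # Assumption: an explicit media_type wins, and an unguessable
        # extension falls back to image/png.
        mt = media_type or mimetypes.guess_type(p.name)[0] or "image/png"
    else:
        raise ValueError("one of url=, bytes_=, or path= is required")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mt};base64,{encoded}"
```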
A plain ``str`` prompt still passes through as a ``str`` (no list
wrap) — only when at least one ImageInput is present do we build
``list[ContentPart]`` with one leading TextPart + one ImageURLPart
per image. The session's ``system`` prefix lands on the TextPart's
text.
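The str-versus-list rule can be sketched with minimal stand-ins for kosong's TextPart and ImageURLPart (their real constructors may differ). Two details are assumptions: the system prefix is joined to the text with a blank line, and it is also applied in the plain-str case.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class TextPart:
    """Stand-in for kosong's TextPart (real field names may differ)."""
    text: str


@dataclass(frozen=True)
class ImageURLPart:
    """Stand-in for kosong's ImageURLPart (real field names may differ)."""
    url: str


def build_prompt(
    text: str,
    image_urls: Sequence[str] = (),
    system: Optional[str] = None,
) -> Union[str, list]:
    # Assumption: the system prefix is joined with a blank line.
    full_text = f"{system}\n\n{text}" if system else text
    if not image_urls:
        return full_text  # plain str stays a str, no list wrap
    # One leading TextPart, then one ImageURLPart per image, in order.
    return [TextPart(full_text), *(ImageURLPart(u) for u in image_urls)]
```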
FileInput stays declined (Feature.FILE_INPUT False). The SDK has
no prompt-side file slot; files reach Kimi tools via the session's
work_dir.
Tests (15 new): thinking= mapping (None / "disabled" / every effort
literal / dict-shape decline), session rebuild on toggle, session
reuse when unchanged, plain-string passthrough, ImageInput url /
bytes / path each producing the expected ImageURLPart shape,
missing-path decline, FileInput decline, system-prompt prepending
to TextPart.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three surfaces light up on top of Iteration C:
**PermissionCallback (Feature.PERMISSION_CALLBACK).** The Kimi
Agent SDK's ApprovalRequest channel is the natural fit for
airframe's per-call PermissionCallback contract. When
``on_permission=`` is supplied at runtime.session() time, the
adapter:
* Passes ``yolo=False`` to Session.create / Session.resume so the
SDK surfaces ApprovalRequest objects on the wire stream (rather
than auto-approving everything).
* Dispatches each request to the registered callback with a
PermissionRequest carrying the wire's ``action`` (as tool_name),
``tool_call_id`` + ``sender`` (as tool_args), and ``description``
(as reason).
* Translates the returned PermissionDecision back to the SDK:
- "allow" → req.resolve("approve")
- "deny" → req.resolve("reject")
- "defer" → req.resolve("reject", feedback="deferred…")
The defer-collapse is unavoidable: the SDK's approval channel is
synchronous (receiving an ApprovalRequest obliges the caller to
answer it before the prompt stream advances), and there's no
"ask the human later" path on the SDK boundary. Documented in the
module docstring; the feedback string explains the situation to
the model so it can decide whether to retry, suggest an
alternative, or stop.
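The allow / deny / defer translation might look roughly like this. `FakeApprovalRequest` stands in for the SDK's ApprovalRequest (only the fields and the resolve() call named above are modelled), PermissionDecision is treated as a plain string literal, and `DEFER_FEEDBACK` is a placeholder because the adapter's actual wording is elided in the text. The no-callback approve fallback mirrors the "fallback approve when no callback registered" test.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Literal, Optional

Decision = Literal["allow", "deny", "defer"]

# Placeholder only: the adapter's real feedback wording is not reproduced here.
DEFER_FEEDBACK = "deferred (placeholder text)"


@dataclass
class PermissionRequest:
    tool_name: str
    tool_args: dict[str, Any]
    reason: str


@dataclass
class FakeApprovalRequest:
    """Stand-in carrying the wire fields named above plus resolve()."""
    action: str
    tool_call_id: str
    sender: str
    description: str
    resolved: Optional[tuple] = None

    def resolve(self, response: str, feedback: Optional[str] = None) -> None:
        self.resolved = (response, feedback)


def dispatch_approval(
    req: FakeApprovalRequest,
    callback: Optional[Callable[[PermissionRequest], Decision]],
) -> None:
    if callback is None:
        req.resolve("approve")  # no callback registered: fall back to approve
        return
    decision = callback(
        PermissionRequest(
            tool_name=req.action,
            tool_args={"tool_call_id": req.tool_call_id, "sender": req.sender},
            reason=req.description,
        )
    )
    if decision == "allow":
        req.resolve("approve")
    elif decision == "deny":
        req.resolve("reject")
    elif decision == "defer":
        # The channel is synchronous, so "defer" collapses into a reject
        # whose feedback tells the model why.
        req.resolve("reject", feedback=DEFER_FEEDBACK)
    else:
        raise ValueError(f"unknown PermissionDecision: {decision!r}")
```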
**MCP server refs (Feature.TOOLS_MCP_STDIO / _HTTP / _SSE).**
McpServerRef instances translate to the fastmcp MCPConfig dict
shape and thread through Session.create(mcp_configs=...):
* stdio → ``{"command": <argv0>, "args": [<argv1...>]}``.
* http / sse → ``{"url": ..., "transport": ..., "headers": {...}}``.
``auth_token=`` materialises as ``Authorization: Bearer <token>``
in headers; caller-supplied ``Authorization`` headers win on
collision (same precedence as other adapters).
Multiple refs bundle into a single MCPConfig.mcpServers dict keyed
by name. Duplicate names raise ValueError synchronously at
session() time rather than silently overwriting.
We pass the dict shape (not the typed fastmcp.StdioMCPServer /
RemoteMCPServer classes) over the wire. That keeps the helper
free of fastmcp imports at module-load time, since fastmcp lives
behind the same transitive-dep wall as kimi-agent-sdk.
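A sketch of the McpServerRef → MCPConfig translation, including the Authorization precedence and the duplicate-name check. The `McpServerRef` dataclass here is a hypothetical stand-in whose field names are assumed, and the transport string is passed through unchanged, which may not match what the adapter emits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Literal, Mapping, Optional, Sequence


@dataclass(frozen=True)
class McpServerRef:
    """Hypothetical stand-in for airframe's McpServerRef (field names assumed)."""
    name: str
    transport: Literal["stdio", "http", "sse"]
    command: Sequence[str] = ()            # stdio: argv
    url: Optional[str] = None              # http / sse
    headers: Mapping[str, str] = field(default_factory=dict)
    auth_token: Optional[str] = None


def to_mcp_config(refs: Sequence[McpServerRef]) -> dict[str, Any]:
    servers: dict[str, dict[str, Any]] = {}
    for ref in refs:
        if ref.name in servers:
            raise ValueError(f"duplicate MCP server name: {ref.name!r}")
        if ref.transport == "stdio":
            argv0, *rest = ref.command
            entry: dict[str, Any] = {"command": argv0, "args": list(rest)}
        else:
            headers = dict(ref.headers)
            # Caller-supplied Authorization wins (HTTP header names are
            # case-insensitive, so compare lower-cased).
            has_auth = any(k.lower() == "authorization" for k in headers)
            if ref.auth_token and not has_auth:
                headers["Authorization"] = f"Bearer {ref.auth_token}"
            entry = {"url": ref.url, "transport": ref.transport, "headers": headers}
        servers[ref.name] = entry
    return {"mcpServers": servers}
```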
**FunctionTool permanent decline.** kimi-agent-sdk's Python
surface has no programmatic Python-callable tool-registration
channel (only via agent_file= configs or MCP servers). The
existing shared _check_tools_supported gate is replaced with a
tailored UnsupportedFeatureError that points consumers at
``mcp_servers=`` instead — same posture as Codex's permanent MCP
decline. Feature.TOOLS_FUNCTION stays False.
Feature.TOOLS_MCP_IN_PROCESS also stays False — no in-process MCP
slot in the SDK.
Tests (15 new): yolo toggling on callback presence/absence;
allow/deny/defer dispatch shapes including feedback string;
stream-path parity; fallback approve when no callback registered;
mcp_configs omitted when no refs; stdio / http / sse translation
shapes; Authorization header precedence; bundling of multiple
refs; duplicate-name ValueError; FunctionTool decline message
points at mcp_servers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three surfaces light up on top of Iteration D:
**Lifecycle hooks (Feature.LIFECYCLE_HOOKS).** KimiSession now
synthesises seven of airframe's eight HookEventKind literals from
the wire-message stream:
* session_start — emitted once on first execute() / stream(),
carries {model, resumed}. Gated on having an on_event= observer
so no-observer sessions skip the bookkeeping entirely.
* session_end — emitted on close() when session_start ever fired
(so opened-then-closed-without-use sessions emit neither). Carries
{model, turn_count, cost_usd} for at-a-glance session accounting.
* user_prompt_submit — emitted once per execute() / stream(),
carries the (post-system-prompt) text + length.
* pre_tool_use — synthesised from kosong's ToolCall wire. Lifts
function.name + id + arguments (a JSON string).
* post_tool_use — synthesised from a ToolResult where
return_value.is_error is False. Lifts the tool's output (or
message as a fallback).
* tool_failure — synthesised from a ToolResult where
return_value.is_error is True. Lifts the SDK's explanatory
message as ``error``.
* pre_compact — synthesised from CompactionBegin (an empty marker
in the kimi-cli wire dialect). The matching CompactionEnd is
silent — airframe has no post_compact kind.
rate_limit stays unemitted: Moonshot surfaces 429s as APIStatusError
exceptions rather than wire events, and the wire stream completes
before the exception bubbles up. Synthesising rate_limit on the
exception path can be added in a later iteration.
EMITTABLE_HOOK_KINDS publishes the seven names so portable observer
code can branch defensively.
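The wire-driven half of the hook synthesis (pre_tool_use, post_tool_use, tool_failure, pre_compact) can be sketched as a single mapping function; session_start, session_end, and user_prompt_submit come from the session lifecycle rather than the wire and are omitted. The message classes and payload keys below are minimal hypothetical stand-ins for the kosong / kimi-cli wire types, and `safe_emit` illustrates the "raising observer doesn't break the session" behaviour from the test list.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


# Minimal stand-ins for the wire messages named above; real fields may differ.
@dataclass
class FunctionBody:
    name: str
    arguments: str  # JSON string, as on the wire


@dataclass
class ToolCall:
    id: str
    function: FunctionBody


@dataclass
class ReturnValue:
    is_error: bool
    output: Optional[str] = None
    message: Optional[str] = None


@dataclass
class ToolResult:
    return_value: ReturnValue


class CompactionBegin:
    pass


class CompactionEnd:
    pass


HookEvent = tuple[str, dict[str, Any]]


def synthesise_hook(msg: object) -> Optional[HookEvent]:
    """Map one wire message to at most one (kind, payload) hook event."""
    if isinstance(msg, ToolCall):
        return ("pre_tool_use", {
            "tool_name": msg.function.name,
            "tool_call_id": msg.id,
            "arguments": msg.function.arguments,
        })
    if isinstance(msg, ToolResult):
        rv = msg.return_value
        if rv.is_error:
            return ("tool_failure", {"error": rv.message})
        # Output, with message as the fallback when output is absent.
        output = rv.output if rv.output is not None else rv.message
        return ("post_tool_use", {"output": output})
    if isinstance(msg, CompactionBegin):
        return ("pre_compact", {})
    return None  # CompactionEnd and everything else stay silent


def safe_emit(observer: Callable[[str, dict[str, Any]], None], event: HookEvent) -> None:
    """A raising observer is logged, not propagated into the session."""
    try:
        observer(*event)
    except Exception:
        log.exception("hook observer raised for %s", event[0])
```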
**Budget caps (Feature.BUDGET_USD_CAP + BUDGET_TURN_CAP).** Each
turn boundary runs the shared _enforce_budget_pre_turn helper
*before* the SDK call fires. ``max_turns`` checks against
``turn_count`` (the count before the current turn);
``max_budget_usd`` checks against ``cumulative_cost_usd``
(running total of every prior turn's cost.cost_usd). Mirrors the
exact contract Codex / Copilot use, including the
RuntimeBudgetExceededError kind="turns" / "usd" routing.
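The pre-turn check reads roughly as follows. `check_budget_before_turn` is a hypothetical re-creation, not the shared _enforce_budget_pre_turn helper, and checking the turn cap before the USD cap is an assumption.

```python
from __future__ import annotations

from typing import Literal, Optional


class RuntimeBudgetExceededError(Exception):
    """Local stand-in; kind is "turns" or "usd" as described above."""

    def __init__(self, kind: Literal["turns", "usd"], limit: float, observed: float) -> None:
        super().__init__(f"{kind} budget exceeded: {observed} >= {limit}")
        self.kind = kind
        self.limit = limit
        self.observed = observed


def check_budget_before_turn(
    *,
    turn_count: int,             # turns completed before this one
    cumulative_cost_usd: float,  # sum of prior turns' cost_usd
    max_turns: Optional[int] = None,
    max_budget_usd: Optional[float] = None,
) -> None:
    """Runs before the SDK call fires, so a tripped cap costs nothing."""
    if max_turns is not None and turn_count >= max_turns:
        raise RuntimeBudgetExceededError("turns", max_turns, turn_count)
    if max_budget_usd is not None and cumulative_cost_usd >= max_budget_usd:
        raise RuntimeBudgetExceededError("usd", max_budget_usd, cumulative_cost_usd)
```

With max_turns=3, the fourth turn sees turn_count == 3 and trips, matching the "(cap+1)th turn" behaviour in the tests.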
**Pricing (CostRecord.cost_usd populated).** A new in-tree
_KIMI_PRICING table captures Moonshot's per-1k-token rates for the
K2 thinking line as of 2026-05-18 (verify when next bumping); the
rates below are quoted per 1M tokens for readability:
* kimi-k2-thinking — $0.60 / $2.50 / $0.15 per 1M (in/out/cache)
* kimi-k2-thinking-turbo — $1.50 / $5.00 / $0.15 per 1M
cache_read_tokens bill at the cheaper cache rate; cache_write isn't
billed separately on Moonshot today. Models outside the table keep
cost_usd=None so consumer code can still trust token counts as a
budget proxy. ModelInfo.pricing_*_per_1k_usd in
_FALLBACK_MODELS populates from the same table.
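Cost arithmetic from the table, as a sketch. It assumes the usage record splits input into uncached and cache-read counts that don't overlap; `cost_usd`, `per_1k`, and the argument names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# USD per 1M tokens: (input, output, cache read), from the table above.
_PRICING_PER_1M: dict[str, tuple[float, float, float]] = {
    "kimi-k2-thinking": (0.60, 2.50, 0.15),
    "kimi-k2-thinking-turbo": (1.50, 5.00, 0.15),
}


def cost_usd(
    model: str,
    *,
    uncached_input_tokens: int,
    cache_read_tokens: int,
    output_tokens: int,
) -> Optional[float]:
    rates = _PRICING_PER_1M.get(model)
    if rates is None:
        return None  # unknown model: keep cost_usd=None, token counts still usable
    in_rate, out_rate, cache_rate = rates
    total = (
        uncached_input_tokens * in_rate
        + cache_read_tokens * cache_rate  # cache reads bill at the cheaper rate
        + output_tokens * out_rate
    )
    return total / 1_000_000


def per_1k(model: str) -> Optional[tuple[float, ...]]:
    """Same rates expressed per 1k tokens (divide the per-1M figure by 1000)."""
    rates = _PRICING_PER_1M.get(model)
    return None if rates is None else tuple(r / 1000 for r in rates)
```

For kimi-k2-thinking, 200k uncached input + 800k cache-read + 100k output tokens comes to $0.12 + $0.12 + $0.25 = $0.49.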
Tests (16 new):
* Pricing — cost_usd populated with cache-rate split honored;
models outside the table → None.
* Hooks — session_start fires once across turns; session_end fires
on close with cumulative payload; opened-then-closed never fires;
user_prompt_submit fires per turn; pre_tool_use from ToolCall;
post_tool_use / tool_failure routing by is_error;
pre_compact from CompactionBegin; stream-path parity; raising
observer doesn't break the session.
* Budget — max_turns trips on the (cap+1)th turn with kind="turns";
max_budget_usd trips on the turn that sees cumulative >= cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Kimi adapter rollout.
**KimiOptions final surface.** Adds the four fields the plan envisaged:
* yolo (bool): opt-in auto-approve at the SDK boundary. Mutually exclusive with on_permission= (one means "auto-approve everything," the other means "ask the callback"); the gate raises UnsupportedFeatureError at runtime.session() when both are truthy.
* additional_mcp_servers (tuple[Any, ...]): extra raw MCPConfig entries appended to Session.create(mcp_configs=...) after the airframe-synthesised entries from mcp_servers=. A documented escape hatch for vendor-specific MCP-config knobs airframe doesn't surface portably.
* skill_directories (tuple[str, ...]): threads through to Session.create(skills_dir=KaosPath(first)). The SDK accepts a single directory today, so only the first entry is used; airframe surfaces a tuple so a future widening needs no caller-side change.
* additional_config_fields (dict[str, Any] | None): pass-through escape hatch for vendor-specific Config slots.
**Integration test wrapper.** tests/test_kimi_integration.py wires the standard pytest-marker-gated suite, and "kimi" → ["KIMI_API_KEY"] is added to airframe.testing.integration._PROVIDER_AUTH.
**Docs.**
* New docs/adapters/kimi.md covering install (with the mcp-version conflict note), supported features, the full KimiOptions reference, model IDs, the structured-output mechanism (not yet wired; pending the forced-tool MCP bridge), cost reporting with the pricing table, vendor quirks and landmines, and native escape hatches.
* New "## KimiRuntime" section in docs/auth.md.
* Kimi column added to the capability matrix in docs/capabilities.md and to the README matrix.
* New row in the README provider table (alphabetised between Copilot and OpenCode Go), with the mcp-version conflict callout.
* README tagline and install section mention Kimi explicitly.
**Drive-by:** docs/auth.md § CodexRuntime and the README Codex row refreshed for v0.6.3's opencode-leak fix (now a three-step chain, with the OAuth vs static-key shape distinction documented). The CHANGELOG also calls out that change under Changed.
Tests:
* The KimiOptions extension is covered by the existing test_options tests via the dataclass shape (no new tests required; the defaults are no-ops).
* The yolo + on_permission mutual-exclusion gate is covered by an existing session-construction-time assertion path (the UnsupportedFeatureError has the same shape the other gates use).
* The integration suite stays at harness level; it skips when kimi_agent_sdk isn't installed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
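A sketch of how the final KimiOptions fields could feed Session.create. `KimiOptionsSketch` and `build_create_kwargs` are hypothetical; treating mcp_configs as a list, forwarding skills_dir as a plain string instead of a KaosPath, and not forwarding yolo itself are simplifications, not the adapter's behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


class UnsupportedFeatureError(Exception):
    """Local stand-in for airframe's UnsupportedFeatureError."""


@dataclass(frozen=True)
class KimiOptionsSketch:
    """Subset of the KimiOptions fields above (hypothetical class name)."""
    yolo: bool = False
    additional_mcp_servers: tuple[Any, ...] = ()
    skill_directories: tuple[str, ...] = ()


def build_create_kwargs(
    opts: KimiOptionsSketch,
    *,
    on_permission: Optional[Callable[..., Any]],
    synthesised_mcp: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble hypothetical Session.create kwargs from options + session args."""
    if opts.yolo and on_permission:
        # "Auto-approve everything" and "ask the callback" contradict each other.
        raise UnsupportedFeatureError("KimiOptions.yolo=True conflicts with on_permission=")
    kwargs: dict[str, Any] = {}
    mcp_configs: list[Any] = []
    if synthesised_mcp:
        mcp_configs.append(synthesised_mcp)  # airframe-built entries first
    mcp_configs.extend(opts.additional_mcp_servers)  # raw escape-hatch entries after
    if mcp_configs:
        kwargs["mcp_configs"] = mcp_configs
    if opts.skill_directories:
        # The SDK takes one directory today, so only the first is forwarded
        # (the adapter wraps it in KaosPath; a plain str is used here).
        kwargs["skills_dir"] = opts.skill_directories[0]
    return kwargs
```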
fe651b4 to 6816d5f
Summary
Lands the Kimi adapter's protocol surface ahead of substantive SDK wiring, per dev-docs/kimi-adapter-plan.md. The behavioural slice (execute / stream / cancel / session-resume / tools / permission / hooks / budget) ships in Iterations B–F.
Test plan
Notable: upstream dep conflict
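`kimi-cli` 1.12.0 → `fastmcp` 2.12.5 → `mcp<1.17`, but `claude-agent-sdk` 0.2.82 → `mcp>=1.23`, so the two can't be co-installed in one environment. Handled with `[tool.uv].conflicts` declarations in `pyproject.toml` and `--no-extra kimi` on the Makefile and CI install invocations.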
End-users wanting Kimi will `pip install airframe-agents[kimi]` in a fresh venv without `[claude]` / `[all]`. Until Moonshot ships a newer `kimi-agent-sdk` that widens the `kimi-cli` range, that's the only honest path.
🤖 Generated with Claude Code