eAI Model Routing

eAI is a tech-oriented, multi-model agent fleet with an explicit orchestrator/worker hierarchy. It is a derivative of agency-agents (MIT), re-cut for engineering work and wired for cost-aware model routing.

The core idea: expensive models plan and review; cheaper models execute. Every agent declares a model: tier in its frontmatter, and models.yaml is the single source of truth that maps tiers to concrete model IDs, pricing, and usage rules.

The hierarchy

                    ┌─────────────────────────────┐
   PLAN / DECIDE    │  👑 Prime    — Fable 5       │   architecture, irreversible calls,
                    │  🎯 Lead     — Opus 4.8      │   decomposition, dev↔QA loop, merges
                    └──────────────┬──────────────┘
                                   │ assigns + reviews
                    ┌──────────────▼──────────────┐
   EXECUTE          │  🧠 Senior   — Sonnet 4.6    │   correctness-critical, security, debugging
                    │  ⚙️  Bulk     — Kimi K2       │   high-volume codegen, CMS/UI, refactors
                    │  ⚡ Fast     — Haiku 4.5     │   lookups, formatting, classification
                    └─────────────────────────────┘

Tiers at a glance

Tier	Role	Model	ID	Input / Output ($/MTok)	Context	Use it for
👑 Prime	Orchestrator	Claude Fable 5	`claude-fable-5`	$10 / $50	1M	Mission decomposition, architecture, irreversible & consensus-critical calls, conflict resolution, final review
🎯 Lead	Orchestrator	Claude Opus 4.8	`claude-opus-4-8`	$5 / $25	1M	Default orchestrator: task assignment, dev↔QA loop, code-review gates, merges, escalation triage
🧠 Senior	Worker	Claude Sonnet 4.6	`claude-sonnet-4-6`	$3 / $15	1M	Correctness-critical implementation, security analysis, complex debugging, schema/SRE judgment
⚙️ Bulk	Worker	Kimi K2	`kimi-k2`	(Moonshot)	256K	High-volume code generation, framework/CMS implementation, UI from spec, long mechanical refactors
⚡ Fast	Worker	Claude Haiku 4.5	`claude-haiku-4-5`	$1 / $5	200K	Lookups, grep/glob sweeps, formatting, linting, classification, simple single-file fixes

Prices are the published Anthropic per-million-token rates as of 2026-06. Kimi K2 runs on Moonshot AI's API (or any OpenAI-compatible runtime); pin a dated snapshot in production and consult Moonshot for current pricing.

How routing works

Prime decomposes the mission and owns architecture. Budget: ≤5 activations per phase — it is the scalpel.
Lead runs the day-to-day pipeline: assigns each task to the cheapest worker tier that can do it correctly, manages the dev↔QA loop (max 3 retries), and gates merges.
Workers execute. Each worker file carries a model: tier so the orchestrator (and your CLI) knows which model to spin it up on.
Escalation flows up the chain fast → bulk → senior → lead → prime. See the escalation: block in models.yaml for exact triggers.

Why these models

Fable 5 for Prime — the highest-capability model; reserve it for decisions that are expensive to undo. Adaptive thinking, effort: high+ . On Fable 5, never send thinking: {type:"disabled"} (it 400s) — omit the param.
Opus 4.8 for Lead — state-of-the-art long-horizon agentic execution at half Prime's cost; ideal for managing loops and gates across a phase.
Sonnet 4.6 for Senior — best speed/intelligence balance; the right tier for security and correctness-critical work where you want strong reasoning without Opus pricing.
Kimi K2 for Bulk — strong, cost-efficient code generation for high-volume mechanical work and large-context refactors. The multi-vendor tier.
Haiku 4.5 for Fast — fastest and cheapest; perfect for read-only exploration, formatting, and classification.

Single-vendor fallback (Claude Code)

Claude Code runs one vendor. When you install eAI into Claude Code, the Bulk tier (Kimi K2) falls back to claude-sonnet-4-6 automatically (see fallback: in models.yaml). To use Kimi K2 for real, install the Bulk-tier workers into a multi-vendor runtime — the Kimi CLI integration or any OpenAI-compatible client pointed at Moonshot.

Cost & cache discipline (applies to every agent)

Don't invert the hierarchy. Orchestrators plan and review; workers execute. A worker never makes an architectural call.
Don't burn Prime on routine work. Bug fixes, formatting, and "which file?" go to Lead or a worker.
Never switch a thread's model mid-conversation — it invalidates the Anthropic prompt cache. To get a cheaper opinion mid-task, spawn a subagent on a lower tier instead of downgrading the current thread.
Adaptive thinking everywhere on the Anthropic tiers; tune effort per route (medium for most worker tasks, high/xhigh for security, architecture, and long agentic runs).
Effort is pre-tuned in frontmatter. Every agent carries an effort: field that Claude Code honors per subagent: orchestrators and security/blockchain-senior workers run high, routine senior workers run medium (Sonnet 4.6 defaults to high, so this is a deliberate token saving), bulk runs low, and fast (Haiku) omits it (unsupported). Override per agent if a route needs more depth.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eAI Model Routing

The hierarchy

Tiers at a glance

How routing works

Why these models

Single-vendor fallback (Claude Code)

Cost & cache discipline (applies to every agent)

Uh oh!

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

eAI Model Routing

The hierarchy

Tiers at a glance

How routing works

Why these models

Single-vendor fallback (Claude Code)

Cost & cache discipline (applies to every agent)