eAI is a tech-oriented, multi-model agent fleet with an explicit orchestrator/worker hierarchy. It is a derivative of agency-agents (MIT), re-cut for engineering work and wired for cost-aware model routing.
The core idea: expensive models plan and review; cheaper models execute. Every agent declares a model: tier in its frontmatter, and models.yaml is the single source of truth that maps tiers to concrete model IDs, pricing, and usage rules.
┌─────────────────────────────┐
PLAN / DECIDE │ 👑 Prime — Fable 5 │ architecture, irreversible calls,
│ 🎯 Lead — Opus 4.8 │ decomposition, dev↔QA loop, merges
└──────────────┬──────────────┘
│ assigns + reviews
┌──────────────▼──────────────┐
EXECUTE │ 🧠 Senior — Sonnet 4.6 │ correctness-critical, security, debugging
│ ⚙️ Bulk — Kimi K2 │ high-volume codegen, CMS/UI, refactors
│ ⚡ Fast — Haiku 4.5 │ lookups, formatting, classification
└─────────────────────────────┘
| Tier | Role | Model | ID | Input / Output ($/MTok) | Context | Use it for |
|---|---|---|---|---|---|---|
| 👑 Prime | Orchestrator | Claude Fable 5 | claude-fable-5 |
$10 / $50 | 1M | Mission decomposition, architecture, irreversible & consensus-critical calls, conflict resolution, final review |
| 🎯 Lead | Orchestrator | Claude Opus 4.8 | claude-opus-4-8 |
$5 / $25 | 1M | Default orchestrator: task assignment, dev↔QA loop, code-review gates, merges, escalation triage |
| 🧠 Senior | Worker | Claude Sonnet 4.6 | claude-sonnet-4-6 |
$3 / $15 | 1M | Correctness-critical implementation, security analysis, complex debugging, schema/SRE judgment |
| ⚙️ Bulk | Worker | Kimi K2 | kimi-k2 |
(Moonshot) | 256K | High-volume code generation, framework/CMS implementation, UI from spec, long mechanical refactors |
| ⚡ Fast | Worker | Claude Haiku 4.5 | claude-haiku-4-5 |
$1 / $5 | 200K | Lookups, grep/glob sweeps, formatting, linting, classification, simple single-file fixes |
Prices are the published Anthropic per-million-token rates as of 2026-06. Kimi K2 runs on Moonshot AI's API (or any OpenAI-compatible runtime); pin a dated snapshot in production and consult Moonshot for current pricing.
- Prime decomposes the mission and owns architecture. Budget: ≤5 activations per phase — it is the scalpel.
- Lead runs the day-to-day pipeline: assigns each task to the cheapest worker tier that can do it correctly, manages the dev↔QA loop (max 3 retries), and gates merges.
- Workers execute. Each worker file carries a
model:tier so the orchestrator (and your CLI) knows which model to spin it up on. - Escalation flows up the chain
fast → bulk → senior → lead → prime. See theescalation:block inmodels.yamlfor exact triggers.
- Fable 5 for Prime — the highest-capability model; reserve it for decisions that are expensive to undo. Adaptive thinking,
effort: high+ . On Fable 5, never sendthinking: {type:"disabled"}(it 400s) — omit the param. - Opus 4.8 for Lead — state-of-the-art long-horizon agentic execution at half Prime's cost; ideal for managing loops and gates across a phase.
- Sonnet 4.6 for Senior — best speed/intelligence balance; the right tier for security and correctness-critical work where you want strong reasoning without Opus pricing.
- Kimi K2 for Bulk — strong, cost-efficient code generation for high-volume mechanical work and large-context refactors. The multi-vendor tier.
- Haiku 4.5 for Fast — fastest and cheapest; perfect for read-only exploration, formatting, and classification.
Claude Code runs one vendor. When you install eAI into Claude Code, the Bulk tier (Kimi K2) falls back to claude-sonnet-4-6 automatically (see fallback: in models.yaml). To use Kimi K2 for real, install the Bulk-tier workers into a multi-vendor runtime — the Kimi CLI integration or any OpenAI-compatible client pointed at Moonshot.
- Don't invert the hierarchy. Orchestrators plan and review; workers execute. A worker never makes an architectural call.
- Don't burn Prime on routine work. Bug fixes, formatting, and "which file?" go to Lead or a worker.
- Never switch a thread's model mid-conversation — it invalidates the Anthropic prompt cache. To get a cheaper opinion mid-task, spawn a subagent on a lower tier instead of downgrading the current thread.
- Adaptive thinking everywhere on the Anthropic tiers; tune
effortper route (mediumfor most worker tasks,high/xhighfor security, architecture, and long agentic runs). - Effort is pre-tuned in frontmatter. Every agent carries an
effort:field that Claude Code honors per subagent: orchestrators and security/blockchain-senior workers runhigh, routine senior workers runmedium(Sonnet 4.6 defaults tohigh, so this is a deliberate token saving), bulk runslow, and fast (Haiku) omits it (unsupported). Override per agent if a route needs more depth.