eAI Model Routing

eAI is a tech-oriented, multi-model agent fleet with an explicit orchestrator/worker hierarchy. It is a derivative of agency-agents (MIT), re-cut for engineering work and wired for cost-aware model routing.

The core idea: expensive models plan and review; cheaper models execute. Every agent declares a model: tier in its frontmatter, and models.yaml is the single source of truth that maps tiers to concrete model IDs, pricing, and usage rules.
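The indirection from agent file to tier to model ID can be sketched in a few lines of Python. This is a minimal sketch, not the project's loader: the schema of models.yaml is not shown in this document, so the TIER_TO_MODEL_ID dictionary, the `---`-delimited frontmatter layout, and the read_tier and resolve_model helpers are all assumptions made for illustration. Only the tier names and model IDs are taken from this document.

```python
# Hypothetical stand-in for the tier -> model ID mapping that models.yaml holds.
# The real file's key layout is not shown in this document.
TIER_TO_MODEL_ID = {
    "prime": "claude-fable-5",
    "lead": "claude-opus-4-8",
    "senior": "claude-sonnet-4-6",
    "bulk": "kimi-k2",
    "fast": "claude-haiku-4-5",
}


def read_tier(agent_file_text: str) -> str:
    """Return the `model:` value from a '---'-delimited frontmatter block.

    The delimiter convention is an assumption about how agent files are laid out.
    """
    lines = agent_file_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file has no frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter reached without finding the key
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip()
    raise ValueError("frontmatter has no model: field")


def resolve_model(agent_file_text: str) -> str:
    """Map an agent file to a concrete model ID via its declared tier."""
    return TIER_TO_MODEL_ID[read_tier(agent_file_text)]


# Hypothetical agent file used only for this example.
example_agent = """---
name: example-worker
model: senior
---
You are a worker agent.
"""

print(resolve_model(example_agent))  # prints "claude-sonnet-4-6"
```

Because agents name a tier rather than a model, swapping the model behind a tier is a one-line change in the mapping and touches no agent file.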

The hierarchy

                    ┌─────────────────────────────┐
   PLAN / DECIDE    │  👑 Prime    — Fable 5       │   architecture, irreversible calls,
                    │  🎯 Lead     — Opus 4.8      │   decomposition, dev↔QA loop, merges
                    └──────────────┬──────────────┘
                                   │ assigns + reviews
                    ┌──────────────▼──────────────┐
   EXECUTE          │  🧠 Senior   — Sonnet 4.6    │   correctness-critical, security, debugging
                    │  ⚙️  Bulk     — Kimi K2       │   high-volume codegen, CMS/UI, refactors
                    │  ⚡ Fast     — Haiku 4.5     │   lookups, formatting, classification
                    └─────────────────────────────┘

Tiers at a glance

| Tier | Role | Model | ID | Input / Output ($/MTok) | Context | Use it for |
|---|---|---|---|---|---|---|
| 👑 Prime | Orchestrator | Claude Fable 5 | claude-fable-5 | $10 / $50 | 1M | Mission decomposition, architecture, irreversible & consensus-critical calls, conflict resolution, final review |
| 🎯 Lead | Orchestrator | Claude Opus 4.8 | claude-opus-4-8 | $5 / $25 | 1M | Default orchestrator: task assignment, dev↔QA loop, code-review gates, merges, escalation triage |
| 🧠 Senior | Worker | Claude Sonnet 4.6 | claude-sonnet-4-6 | $3 / $15 | 1M | Correctness-critical implementation, security analysis, complex debugging, schema/SRE judgment |
| ⚙️ Bulk | Worker | Kimi K2 | kimi-k2 | (Moonshot) | 256K | High-volume code generation, framework/CMS implementation, UI from spec, long mechanical refactors |
| ⚡ Fast | Worker | Claude Haiku 4.5 | claude-haiku-4-5 | $1 / $5 | 200K | Lookups, grep/glob sweeps, formatting, linting, classification, simple single-file fixes |

Prices are the published Anthropic per-million-token rates as of 2026-06. Kimi K2 runs on Moonshot AI's API (or any OpenAI-compatible runtime); pin a dated snapshot in production and consult Moonshot for current pricing.
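To see what the price gap means for a single task, the arithmetic can be sketched as follows. The rates are copied from the table above; the token counts are a made-up example, request_cost is a hypothetical helper, and the calculation ignores prompt-cache discounts and any other billing adjustments. Kimi K2 is left out because this document lists no price for it.

```python
# (input, output) prices in USD per million tokens, copied from the table above.
PRICES_PER_MTOK = {
    "prime": (10.0, 50.0),
    "lead": (5.0, 25.0),
    "senior": (3.0, 15.0),
    "fast": (1.0, 5.0),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD of one request on the given tier."""
    in_price, out_price = PRICES_PER_MTOK[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical task: 20,000 input tokens and 4,000 output tokens.
for tier in ("prime", "lead", "senior", "fast"):
    print(f"{tier}: ${request_cost(tier, 20_000, 4_000):.2f}")
# prime: $0.40
# lead: $0.20
# senior: $0.12
# fast: $0.04
```

Under these assumed token counts the same task costs ten times more on Prime than on Fast, which is the motivation for routing each task to the cheapest tier that can do it correctly.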

How routing works

  1. Prime decomposes the mission and owns architecture. Budget: ≤5 activations per phase — it is the scalpel.
  2. Lead runs the day-to-day pipeline: assigns each task to the cheapest worker tier that can do it correctly, manages the dev↔QA loop (max 3 retries), and gates merges.
  3. Workers execute. Each worker file carries a model: tier so the orchestrator (and your CLI) knows which model to spin it up on.
  4. Escalation flows up the chain fast → bulk → senior → lead → prime. See the escalation: block in models.yaml for exact triggers.
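The retry-then-escalate flow in the steps above can be sketched as a small loop. This is an illustration under stated assumptions: the real escalation triggers live in the escalation: block of models.yaml and are not reproduced here, so this sketch assumes the simplest possible trigger (QA still failing after the initial attempt plus 3 retries). The attempt callback and the run_with_escalation and next_tier helpers are hypothetical names.

```python
# Escalation order as stated in step 4.
ESCALATION_CHAIN = ["fast", "bulk", "senior", "lead", "prime"]
MAX_QA_RETRIES = 3  # from step 2: the dev/QA loop allows at most 3 retries


def next_tier(tier: str):
    """Return the tier one step up the chain, or None when already at prime."""
    i = ESCALATION_CHAIN.index(tier)
    return ESCALATION_CHAIN[i + 1] if i + 1 < len(ESCALATION_CHAIN) else None


def run_with_escalation(task, start_tier, attempt):
    """Run a task on start_tier and escalate when the QA retries are exhausted.

    `attempt(task, tier)` is a hypothetical callback that returns True when the
    work passes QA. Returns the tier that finished the task, or None if every
    tier up to and including prime failed.
    """
    tier = start_tier
    while tier is not None:
        # One initial attempt plus MAX_QA_RETRIES retries on the same tier.
        for _ in range(1 + MAX_QA_RETRIES):
            if attempt(task, tier):
                return tier
        tier = next_tier(tier)  # assumed trigger: retries exhausted
    return None


def fake_attempt(task, tier):
    """Stand-in for real work: pretend only senior and above can pass QA."""
    return ESCALATION_CHAIN.index(tier) >= ESCALATION_CHAIN.index("senior")


print(run_with_escalation("fix flaky test", "fast", fake_attempt))  # prints "senior"
```

Starting at the lowest plausible tier and climbing only on failure keeps the expensive tiers idle unless a task demonstrably needs them.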

Why these models

  • Fable 5 for Prime: the highest-capability model; reserve it for decisions that are expensive to undo. Use adaptive thinking with effort: high or above. On Fable 5, never send thinking: {type:"disabled"}, which returns an HTTP 400 error; omit the parameter instead.
  • Opus 4.8 for Lead: state-of-the-art long-horizon agentic execution at half Prime's cost; ideal for managing loops and gates across a phase.
  • Sonnet 4.6 for Senior: best speed/intelligence balance; the right tier for security and correctness-critical work where you want strong reasoning without Opus pricing.
  • Kimi K2 for Bulk: strong, cost-efficient code generation for high-volume mechanical work and large-context refactors. This is the fleet's multi-vendor tier.
  • Haiku 4.5 for Fast: fastest and cheapest; well suited to read-only exploration, formatting, and classification.
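The Fable 5 rule about the thinking parameter is easy to get wrong in a shared request builder, so a guard like the following may help. It is a sketch only: build_request_params is a hypothetical helper, the returned dictionary is a plain Python dict rather than a call into any SDK, and the only payload detail taken from this document is the thinking: {type:"disabled"} value that must not be sent to Fable 5.

```python
FABLE_MODEL_ID = "claude-fable-5"  # model ID from the tier table


def build_request_params(model_id: str, thinking=None) -> dict:
    """Assemble request parameters, dropping a disabled-thinking setting on Fable 5.

    `thinking` is an optional dict such as {"type": "disabled"}. Everything
    about the payload other than that key is an illustrative assumption.
    """
    params = {"model": model_id}
    if thinking is None:
        return params  # nothing requested, so the key is simply absent
    if model_id == FABLE_MODEL_ID and thinking.get("type") == "disabled":
        # Per the note above, sending this to Fable 5 is rejected; omit the key.
        return params
    params["thinking"] = thinking
    return params


print(build_request_params("claude-fable-5", {"type": "disabled"}))
# prints {'model': 'claude-fable-5'}
print(build_request_params("claude-haiku-4-5", {"type": "disabled"}))
# prints {'model': 'claude-haiku-4-5', 'thinking': {'type': 'disabled'}}
```

Centralizing the check in one builder means individual agents never need to know which models reject which settings.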

Single-vendor fallback (Claude Code)

Claude Code runs one vendor. When you install eAI into Claude Code, the Bulk tier (Kimi K2) falls back to claude-sonnet-4-6 automatically (see fallback: in models.yaml). To use Kimi K2 for real, install the Bulk-tier workers into a multi-vendor runtime — the Kimi CLI integration or any OpenAI-compatible client pointed at Moonshot.
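The fallback behaviour described above amounts to a lookup applied only in single-vendor runtimes. The sketch below assumes a flat FALLBACKS mapping and a multi_vendor flag; both are hypothetical, since the layout of the fallback: block in models.yaml is not shown here.

```python
# Hypothetical mirror of the fallback: block; the real key layout is not shown.
FALLBACKS = {"kimi-k2": "claude-sonnet-4-6"}


def model_for_runtime(model_id: str, multi_vendor: bool) -> str:
    """Return the model ID to launch, applying the fallback in single-vendor runtimes."""
    if not multi_vendor and model_id in FALLBACKS:
        return FALLBACKS[model_id]
    return model_id


print(model_for_runtime("kimi-k2", multi_vendor=False))  # prints "claude-sonnet-4-6"
print(model_for_runtime("kimi-k2", multi_vendor=True))   # prints "kimi-k2"
```

Note that the fallback moves Bulk work onto the Senior model, so Bulk-heavy workloads will be billed at Sonnet 4.6 rates inside Claude Code.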

Cost & cache discipline (applies to every agent)

  • Don't invert the hierarchy. Orchestrators plan and review; workers execute. A worker never makes an architectural call.
  • Don't burn Prime on routine work. Bug fixes, formatting, and "which file?" go to Lead or a worker.
  • Never switch a thread's model mid-conversation — it invalidates the Anthropic prompt cache. To get a cheaper opinion mid-task, spawn a subagent on a lower tier instead of downgrading the current thread.
  • Adaptive thinking everywhere on the Anthropic tiers; tune effort per route (medium for most worker tasks, high/xhigh for security, architecture, and long agentic runs).
  • Effort is pre-tuned in frontmatter. Every agent carries an effort: field that Claude Code honors per subagent: orchestrators and security/blockchain-senior workers run high, routine senior workers run medium (Sonnet 4.6 defaults to high, so this is a deliberate token saving), bulk runs low, and fast (Haiku) omits it (unsupported). Override per agent if a route needs more depth.
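The effort defaults in the last bullet can be written down as a small decision function. This is a sketch of the rule as described, not the project's code: default_effort and frontmatter_fields are hypothetical helpers, and the security_critical flag is an assumed way to mark the security and blockchain senior workers.

```python
def default_effort(tier: str, security_critical: bool = False):
    """Return the pre-tuned effort level for a tier, or None when it is omitted."""
    if tier in ("prime", "lead"):
        return "high"  # orchestrators run high
    if tier == "senior":
        # Security/blockchain seniors run high; routine seniors run medium.
        return "high" if security_critical else "medium"
    if tier == "bulk":
        return "low"
    if tier == "fast":
        return None  # Haiku omits the field because effort is unsupported
    raise ValueError(f"unknown tier: {tier}")


def frontmatter_fields(tier: str, security_critical: bool = False) -> dict:
    """Build the model/effort frontmatter fields, leaving effort out when unset."""
    fields = {"model": tier}
    effort = default_effort(tier, security_critical)
    if effort is not None:
        fields["effort"] = effort
    return fields


print(frontmatter_fields("senior"))                          # {'model': 'senior', 'effort': 'medium'}
print(frontmatter_fields("senior", security_critical=True))  # {'model': 'senior', 'effort': 'high'}
print(frontmatter_fields("fast"))                            # {'model': 'fast'}
```

Returning None for the Fast tier, rather than a placeholder string, makes it hard to emit an effort: field by accident where it is unsupported.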