Problem
GA currently has no built-in mechanism to route tasks to different LLMs based on task semantics. Users either hardcode a model or switch manually. As model options grow (cost tiers, capability tiers), routing becomes a real problem.
Proposed approach
A routing module that fits in a single file, with three main layers plus a challenger step (Layer 2.5):
Layer 1 - Reranker (primary path)
Each model has a set of exemplar task strings. A cross-encoder reranker scores the incoming task against all exemplars in one pass, aggregates per-model (top-2 mean), and picks the winner. Fast, no LLM call needed.
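As a rough illustration of the top-2 mean aggregation, here is a minimal sketch. It assumes the reranker has already returned one score per exemplar as (model_id, score) pairs. The function names, the "small" model id, and the margin definition (gap to the runner-up) are illustrative assumptions, not the fork's actual code.

```python
from collections import defaultdict
from statistics import mean

def aggregate_top_k(scored, k=2):
    """scored: iterable of (model_id, score) pairs, one pair per exemplar."""
    by_model = defaultdict(list)
    for model_id, score in scored:
        by_model[model_id].append(score)
    # Mean of the k highest exemplar scores for each model (top-2 mean by default).
    return {m: mean(sorted(s, reverse=True)[:k]) for m, s in by_model.items()}

def pick_winner(agg):
    """Return (best_model, margin); margin is the gap to the runner-up (assumed definition)."""
    ranked = sorted(agg.items(), key=lambda kv: kv[1], reverse=True)
    best_model, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else float("inf")
    return best_model, margin

# Hypothetical reranker output for one task.
scored = [("small", 0.9), ("small", 0.7), ("small", 0.1), ("opus", 0.6), ("opus", 0.5)]
winner, margin = pick_winner(aggregate_top_k(scored))
```

Taking the mean of the top two scores, rather than the max, keeps one unusually close exemplar from deciding the route alone.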
Layer 2 - Causal LLM fallback
When L1 fails (API down / low confidence margin), a small LLM classifies the task using model descriptions. Graceful degradation.
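The hand-off from L1 to L2 could look roughly like this. The sketch takes the two layers as plain callables so it stays independent of any provider. The min_margin threshold of 0.1 and the returned layer tag are assumptions for illustration only.

```python
from typing import Callable, Tuple

def route(
    task: str,
    rerank: Callable[[str], Tuple[str, float]],  # L1: returns (model_id, margin)
    classify: Callable[[str], str],              # L2: small-LLM classifier
    min_margin: float = 0.1,                     # assumed threshold, tune per deployment
) -> Tuple[str, str]:
    """Return (model_id, layer) where layer records which path decided."""
    try:
        model_id, margin = rerank(task)
    except Exception:
        # Reranker API down or misbehaving: degrade to the LLM classifier.
        return classify(task), "L2"
    if margin < min_margin:
        # L1 answered, but the winner is not clearly ahead of the runner-up.
        return classify(task), "L2"
    return model_id, "L1"
```

Returning the layer alongside the model id is optional, but it makes it easy to measure how often each path is taken.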
Layer 2.5 - Anti-bias challenger
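When L2 selects a high-tier model (sonnet/opus), a model from an independent family cross-checks the decision to prevent single-model bias. If the challenger proposes a higher tier, the upgrade is accepted; if it proposes a lower tier, the downgrade is rejected and L2's choice stands.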
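One possible reading of the accept/reject rule, as a sketch: the challenger's proposal only replaces L2's choice when it is a strictly higher tier. The tier ordering, the "small" model id, and the function name are assumptions made for this example.

```python
from typing import Callable

# Hypothetical tier ordering; only sonnet and opus are named in the proposal.
TIERS = {"small": 0, "sonnet": 1, "opus": 2}
HIGH_TIER = {"sonnet", "opus"}

def apply_challenger(task: str, l2_choice: str, challenger: Callable[[str], str]) -> str:
    """Cross-check a high-tier L2 decision with an independent model family."""
    if l2_choice not in HIGH_TIER:
        return l2_choice  # challenger only runs for high-tier selections
    proposal = challenger(task)
    if TIERS.get(proposal, -1) > TIERS[l2_choice]:
        return proposal   # upgrade: accepted
    return l2_choice      # downgrade, same tier, or unknown id: L2's choice stands
```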
Layer 3 - Feedback loop
feedback(routing_id, outcome) adjusts per-model weights at runtime. No retraining.
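A minimal sketch of how feedback(routing_id, outcome) might adjust weights at runtime. Everything beyond that signature is an assumption: outcome is treated as a boolean, the update is a fixed additive step clamped to a range, and the weights multiply the L1 aggregate scores.

```python
import uuid

class FeedbackWeights:
    def __init__(self, models, step=0.05, lo=0.5, hi=1.5):
        self.weights = {m: 1.0 for m in models}
        self._routes = {}  # routing_id -> model_id, filled at decision time
        self.step, self.lo, self.hi = step, lo, hi

    def record(self, model_id):
        """Register a routing decision and return its id for later feedback."""
        routing_id = uuid.uuid4().hex
        self._routes[routing_id] = model_id
        return routing_id

    def feedback(self, routing_id, outcome):
        """Nudge the routed model's weight up on success, down on failure."""
        model_id = self._routes.pop(routing_id, None)
        if model_id is None:
            return  # unknown or already-consumed routing id: ignore
        delta = self.step if outcome else -self.step
        new_weight = self.weights[model_id] + delta
        self.weights[model_id] = min(self.hi, max(self.lo, new_weight))

    def apply(self, agg):
        """Scale per-model aggregate scores by the current weights."""
        return {m: s * self.weights.get(m, 1.0) for m, s in agg.items()}
```

Clamping keeps a streak of bad outcomes from pushing a model's weight to zero, so it can still recover.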
Why it fits GA's design
Single file, ~400 lines in a clean implementation.
Zero required dependencies beyond requests (already in use).
Reranker provider is configurable; falls back gracefully if unavailable.
Self-evolving: exemplar pool grows from successful routing history.
Interface: decide(task: str) -> str, returns model id, nothing more.
Safe in multi-worker scenarios: routing is decided once per task at dispatch time. Once a worker starts, its model is fixed for the entire session, with no mid-session switching and no cross-worker interference. Context coherence is preserved by design.
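To make the dispatch-time guarantee concrete, here is a small sketch in which decide(task) is called once and the result is pinned to an immutable session object. WorkerSession and dispatch are hypothetical names for illustration; they are not part of GA.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class WorkerSession:
    """Immutable per-worker state: the model cannot change after dispatch."""
    task: str
    model_id: str

def dispatch(task: str, decide: Callable[[str], str]) -> WorkerSession:
    # The router is consulted exactly once per task, before the worker starts.
    return WorkerSession(task=task, model_id=decide(task))

session = dispatch("refactor the parser", lambda task: "sonnet")  # stub router
```

Because the dataclass is frozen, an attempt to reassign session.model_id mid-session raises an error instead of silently switching models.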
Prior art
I have been running a version of this in production on a fork for ~2 months. Measured against manual labels, L1 handles ~78% of cases and L2 + L2.5 cover the rest. Cost reduction vs. always using the best model: ~40%.
Question
Is this the kind of routing primitive that belongs in the core repo, or would this be better suited for the Skill Marketplace once it launches?