EveryInc · whw · Feb 12, 2026 · Feb 12, 2026
@@ -0,0 +1,255 @@
+---
+name: model-hierarchy
+description: >
+  Cost-optimize AI agent operations by routing tasks to appropriate models based on complexity.
+  Use this skill when: (1) deciding which model to use for a task, (2) spawning sub-agents,
+  (3) considering cost efficiency, (4) the current model feels like overkill for the task.
+  Triggers: "model routing", "cost optimization", "which model", "too expensive", "spawn agent".
+
+---
+
+# Model Hierarchy
+
+Route tasks to the cheapest model that can handle them. Most agent work is routine.
+
+## Core Principle
+
+**Roughly 80% of agent tasks are janitorial** (a rule of thumb, not a measured figure): file reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.
+
+## Model Tiers
+
+### Tier 1: Cheap (under $0.50/M input tokens)
+
+| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
+|-------|-------|--------|----------|
+| DeepSeek V3 | $0.14 | $0.28 | General routine work |
+| GPT-4o-mini | $0.15 | $0.60 | Quick responses |
+| Claude Haiku | $0.25 | $1.25 | Fast tool use |
+| Gemini Flash | $0.075 | $0.30 | High volume |
+
+### Tier 2: Mid ($1-5/M input tokens)
+
+| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
+|-------|-------|--------|----------|
+| Claude Sonnet | $3.00 | $15.00 | Balanced performance |
+| GPT-4o | $2.50 | $10.00 | Multimodal tasks |
+| Gemini Pro | $1.25 | $5.00 | Long context |
+
+### Tier 3: Premium ($10-75/M input tokens)
+
+| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
+|-------|-------|--------|----------|
+| Claude Opus | $15.00 | $75.00 | Complex reasoning |
+| GPT-4.5 | $75.00 | $150.00 | Frontier tasks |
+| o1 | $15.00 | $60.00 | Multi-step reasoning |
+| o3-mini | $1.10 | $4.40 | Reasoning on budget |
+
+*Prices are USD per million tokens as of Feb 2026. Check provider docs for current rates. o3-mini is priced like a Tier 2 model; it is listed here because it is a reasoning model.*
+
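To compare models per request rather than per million tokens, the table prices can be plugged into a small calculator. This is a minimal sketch: the `PRICES` dictionary copies three rows from the tables above, and `request_cost` and the model keys are hypothetical names chosen for illustration, not part of any provider SDK.

```python
# USD per million tokens as (input, output), copied from the tier tables above.
# Rates change often, so treat these values as placeholders.
PRICES = {
    "deepseek-v3": (0.14, 0.28),
    "claude-sonnet": (3.00, 15.00),
    "claude-opus": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply on each model.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.6f}")
```

Because output tokens cost several times more than input tokens in every row, the output length of a task matters more to its cost than the prompt length.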
+## Task Classification
+
+Before executing any task, classify it:
+
+### ROUTINE → Use Tier 1
+
+Characteristics:
+- Single-step operations
+- Clear, unambiguous instructions
+- No judgment required
+- Deterministic output expected
+
+Examples:
+- File read/write operations
+- Status checks and health monitoring
+- Simple lookups (time, weather, definitions)
+- Formatting and restructuring text
+- List operations (filter, sort, transform)
+- API calls with known parameters
+- Heartbeat and cron tasks
+- URL fetching and basic parsing
+
+### MODERATE → Use Tier 2
+
+Characteristics:
+- Multi-step but well-defined
+- Some synthesis required
+- Standard patterns apply
+- Quality matters but isn't critical
+
+Examples:
+- Code generation (standard patterns)
+- Summarization and synthesis
+- Draft writing (emails, docs, messages)
+- Data analysis and transformation
+- Multi-file operations
+- Tool orchestration
+- Code review (non-security)
+- Search and research tasks
+
+### COMPLEX → Use Tier 3
+
+Characteristics:
+- Novel problem solving required
+- Multiple valid approaches
+- Nuanced judgment calls
+- High stakes or irreversible
+- Previous attempts failed
+
+Examples:
+- Multi-step debugging
+- Architecture and design decisions
+- Security-sensitive code review
+- Tasks where a cheaper model already failed
+- Ambiguous requirements needing interpretation
+- Long-context reasoning (>50K tokens)
+- Creative work requiring originality
+- Adversarial or edge-case handling
+
+## Decision Algorithm
+
+```
+function selectModel(task):
+    # Rule 1: Escalation override
+    if task.previousAttemptFailed:
+        return nextTierUp(task.previousModel)
+
+    # Rule 2: Explicit complexity signals
+    if task.hasSignal("debug", "architect", "design", "security"):
+        return TIER_3
+
+    if task.hasSignal("write", "code", "summarize", "analyze"):
+        return TIER_2
+
+    # Rule 3: Default classification
+    complexity = classifyTask(task)
+
+    if complexity == ROUTINE:
+        return TIER_1
+    elif complexity == MODERATE:
+        return TIER_2
+    else:
+        return TIER_3
+```
+
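The pseudocode above could be made runnable along these lines. This is a sketch under stated assumptions: `Tier`, `Task`, `next_tier_up`, and `select_tier` are illustrative names, escalation is capped at the premium tier, and the undefined `classifyTask` step is replaced by a stand-in that treats any task without a keyword signal as routine.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    CHEAP = 1
    MID = 2
    PREMIUM = 3


@dataclass
class Task:
    description: str
    previous_tier: Optional[Tier] = None  # tier of the last attempt, if any
    previous_attempt_failed: bool = False


TIER_3_SIGNALS = ("debug", "architect", "design", "security")
TIER_2_SIGNALS = ("write", "code", "summarize", "analyze")


def next_tier_up(tier: Tier) -> Tier:
    # Cap at the premium tier; there is nothing above it to escalate to.
    return Tier(min(tier + 1, Tier.PREMIUM))


def select_tier(task: Task) -> Tier:
    text = task.description.lower()
    # Rule 1: a failed attempt overrides everything else.
    if task.previous_attempt_failed and task.previous_tier is not None:
        return next_tier_up(task.previous_tier)
    # Rule 2: explicit keyword signals, checked from most to least demanding.
    if any(signal in text for signal in TIER_3_SIGNALS):
        return Tier.PREMIUM
    if any(signal in text for signal in TIER_2_SIGNALS):
        return Tier.MID
    # Rule 3: stand-in for classifyTask (an assumption): no signal means routine.
    return Tier.CHEAP
```

Checking the premium signals first means a description such as "write a design doc" routes to Tier 3 rather than Tier 2, which matches the rule order in the pseudocode.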
+## Behavioral Rules
+
+### For Main Session
+
+1. **Default to Tier 2** for interactive work
+2. **Suggest downgrade** when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."
+3. **Request upgrade** when stuck: "This needs more reasoning power. Switching to [premium model]."
+
+### For Sub-Agents
+
+1. **Default to Tier 1** unless the task is clearly moderate or harder
+2. **Batch similar tasks** to amortize overhead
+3. **Report failures** back to parent for escalation
+
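The "report failures back to parent for escalation" rule can be sketched as a loop on the parent side. The function below is an illustration only: `run_with_escalation` and the `attempt` callback are hypothetical names, and how a sub-agent signals failure (here, returning `None`) is an assumption that will differ between agent frameworks.

```python
from typing import Callable, Optional, Sequence


def run_with_escalation(
    task: str,
    models: Sequence[str],
    attempt: Callable[[str, str], Optional[str]],
) -> tuple[str, str]:
    """Try each model from cheapest to most capable until one succeeds.

    `attempt(model, task)` is a caller-supplied function that returns the
    result text, or None to signal failure. Returns (model_used, result).
    """
    for model in models:
        result = attempt(model, task)
        if result is not None:
            return model, result
    # Every tier failed: surface the problem instead of retrying forever.
    raise RuntimeError(f"all models failed for task: {task!r}")
```

Passing the model list in cheapest-first order keeps the escalation policy in one place, so the sub-agent itself never needs to know which tier it runs on.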
+### For Automated Tasks
+
+1. **Heartbeats/monitoring** → Always Tier 1
+2. **Scheduled reports** → Tier 1 or 2 based on complexity
+3. **Alert responses** → Start Tier 2, escalate if needed
+
+## Communication Patterns
+
+When suggesting model changes, use clear language:
+
+**Downgrade suggestion:**
+> "This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."
+
+**Upgrade request:**
+> "I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."
+
+**Explaining hierarchy:**
+> "I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."
+
+## Cost Impact
+
+Assuming 100K tokens/day average usage (about 3M tokens/month), with every token billed at each model's output rate as an upper bound:
+
+| Strategy | Monthly Cost | Notes |
+|----------|--------------|-------|
+| Pure Opus | ~$225 | Maximum capability, maximum spend |
+| Pure Sonnet | ~$45 | Good default for most work |
+| Pure DeepSeek | ~$0.84 | Cheap but limited on hard problems |
+| **Hierarchy (80/15/5)** | **~$19** | Best of all worlds |
+
+The 80/15/5 split:
+- 80% routine tasks on Tier 1 (~$0.70)
+- 15% moderate tasks on Tier 2 (~$7)
+- 5% complex tasks on Tier 3 (~$11)
+
+**Result: roughly 12x cost reduction vs pure premium, with the premium model still handling the complex tasks.**
+
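The monthly figures can be recomputed from the tier tables. The sketch below assumes a 30-day month and bills every token at the output rate, which is the assumption that reproduces the Opus and Sonnet rows; `OUTPUT_RATE` and `monthly_cost` are illustrative names.

```python
TOKENS_PER_DAY = 100_000
DAYS_PER_MONTH = 30
MONTHLY_MTOK = TOKENS_PER_DAY * DAYS_PER_MONTH / 1_000_000  # 3.0 million tokens

# Output price in USD per million tokens, taken from the tier tables.
OUTPUT_RATE = {"deepseek": 0.28, "sonnet": 15.00, "opus": 75.00}


def monthly_cost(split: dict[str, float]) -> float:
    """Cost of one month of usage, where `split` maps model -> share of tokens."""
    return sum(MONTHLY_MTOK * share * OUTPUT_RATE[model] for model, share in split.items())


pure_opus = monthly_cost({"opus": 1.0})          # about 225
pure_sonnet = monthly_cost({"sonnet": 1.0})      # about 45
pure_deepseek = monthly_cost({"deepseek": 1.0})  # about 0.84
hierarchy = monthly_cost({"deepseek": 0.80, "sonnet": 0.15, "opus": 0.05})  # about 18.67

print(f"hierarchy: ${hierarchy:.2f}, ratio vs pure Opus: {pure_opus / hierarchy:.1f}x")
```

Most of the blended cost comes from the 5% of traffic sent to the premium tier, so the total is far more sensitive to the Tier 3 share than to the choice of Tier 1 model.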
+## Integration Examples
+
+### OpenClaw
+
+```yaml
+# config.yml - set default model
+model: anthropic/claude-sonnet-4
+
+# In session, switch models with slash commands (entered in the session, not in config.yml):
+#   /model opus      # upgrade for complex task
+#   /model deepseek  # downgrade for routine
+
+# Spawn sub-agent on cheap model
+sessions_spawn:
+  task: "Fetch and parse these 50 URLs"
+  model: deepseek
+```
+
+### Claude Code
+
+```
+# In CLAUDE.md or project instructions
+When spawning background agents, use claude-3-haiku for:
+- File operations
+- Simple searches
+- Status checks
+
+Reserve claude-sonnet-4 for:
+- Code generation
+- Analysis tasks
+```
+
+### General Agent Systems
+
+```python
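+import re
+
+# Model IDs below are placeholders; substitute the identifiers your provider uses.
+ROUTINE_SIGNALS = ("read", "fetch", "check", "list", "format", "status")
+COMPLEX_SIGNALS = ("debug", "architect", "design", "security", "why")
+
+
+def _has_signal(text: str, signals: tuple[str, ...]) -> bool:
+    # Match signals only at the start of a word, so "read" does not fire on "thread".
+    return re.search(r"\b(?:" + "|".join(signals) + ")", text) is not None
+
+
+def get_model_for_task(task_description: str) -> str:
+    desc_lower = task_description.lower()
+
+    # Complex signals win over routine ones: "check why the build fails" is complex.
+    if _has_signal(desc_lower, COMPLEX_SIGNALS):
+        return "claude-opus-4"
+    if _has_signal(desc_lower, ROUTINE_SIGNALS):
+        return "deepseek-v3"
+    # No signal either way: fall back to the mid tier.
+    return "claude-sonnet-4"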
+
+## Anti-Patterns
+
+**DON'T:**
+- Run heartbeats on Opus
+- Use premium models for file I/O
+- Keep an expensive model when the task is clearly routine
+- Spawn sub-agents on premium models by default
+
+**DO:**
+- Start mid-tier, adjust based on task
+- Spawn helpers on cheapest viable model
+- Escalate explicitly when stuck
+- Track cost per task type to optimize further
+
+## Extending This Skill
+
+To customize for your use case:
+
+1. **Adjust tier definitions** based on your provider/budget
+2. **Add domain-specific signals** to classification rules
+3. **Track actual complexity** vs predicted to improve heuristics
+4. **Set budget alerts** to catch runaway premium usage
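
Steps 3 and 4 above could start from a small in-memory tracker. This is a sketch, not a prescribed design: `CostTracker` and its fields are hypothetical, tiers are represented as plain integers, and a real deployment would persist the data and wire `over_budget` to whatever alerting it already uses.

```python
from collections import defaultdict


class CostTracker:
    """Accumulates spend per task type and flags budget overruns."""

    def __init__(self, monthly_budget_usd: float) -> None:
        self.monthly_budget_usd = monthly_budget_usd
        self.spend_by_type: dict[str, float] = defaultdict(float)
        self.misroutes = 0  # tasks whose predicted tier differed from the tier that succeeded
        self.total_tasks = 0

    def record(self, task_type: str, cost_usd: float, predicted_tier: int, actual_tier: int) -> None:
        self.spend_by_type[task_type] += cost_usd
        self.total_tasks += 1
        if predicted_tier != actual_tier:
            self.misroutes += 1

    @property
    def total_spend(self) -> float:
        return sum(self.spend_by_type.values())

    def over_budget(self) -> bool:
        return self.total_spend > self.monthly_budget_usd

    def misroute_rate(self) -> float:
        # Share of tasks where the heuristic picked the wrong tier.
        return self.misroutes / self.total_tasks if self.total_tasks else 0.0
```

A rising misroute rate for one task type suggests its signals belong in a different tier, which feeds back into step 2.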