💰 Intelligent 3-Tier Model Routing
Not every task needs the most powerful (and most expensive) model. Ruflo analyzes each request and automatically routes it to the cheapest handler that can do the job well. Simple code transforms skip the LLM entirely and run as WebAssembly. Medium-complexity tasks go to faster, cheaper models (Haiku or Sonnet). Only complex work, such as architecture, security design, and distributed-systems decisions, goes to Opus.
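Tier 1 transforms such as var→const and remove-console are mechanical rewrites, which is why they need no model call. The sketch below is a hypothetical, regex-based Python illustration of those two transforms. It is not Agent Booster's implementation (which this page describes as WebAssembly), the function names are invented for the example, and a production tool would rewrite a parsed syntax tree rather than match text.

```python
import re

# Hypothetical regex-based versions of two Tier 1 transforms. They show that
# such rewrites are deterministic and need no model call; a production tool
# would rewrite a parsed syntax tree instead of matching text.

# `var name =` declarations that have an initializer (const requires one).
DECL = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)\s*=")

# A line holding only a console.log/debug/info(...) call with no `;` inside it.
CONSOLE_LINE = re.compile(
    r"^[ \t]*console\.(?:log|debug|info)\([^;\n]*\);?[ \t]*(?:\n|$)", re.MULTILINE
)


def count_assignments(src: str, name: str) -> int:
    """Count `name = ...`, compound assignments, and ++/-- applied to `name`."""
    n = re.escape(name)
    pattern = rf"(?<![\w$.]){n}\s*(?:[-+*/%]?=(?!=)|\+\+|--)|(?:\+\+|--){n}(?![\w$])"
    return len(re.findall(pattern, src))


def var_to_const(src: str) -> str:
    """Turn `var x = ...` into `const x = ...` when x is never reassigned."""

    def rewrite(m: re.Match) -> str:
        if count_assignments(src, m.group(1)) > 1:  # the declaration itself counts once
            return m.group(0)
        return "const" + m.group(0)[len("var"):]

    return DECL.sub(rewrite, src)


def remove_console(src: str) -> str:
    """Delete lines that contain nothing but a console logging call."""
    return CONSOLE_LINE.sub("", src)


if __name__ == "__main__":
    src = "var total = 0;\nvar limit = 10;\ntotal += 5;\nconsole.log(total);\n"
    print(remove_console(var_to_const(src)))
```

Both functions err on the side of leaving code alone: a variable that is reassigned anywhere keeps `var`, and a console call sharing its line with other statements is not removed.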
Cost & Usage Benefits:
| Benefit | Impact |
|---|---|
| 💵 API Cost Reduction | 75% lower costs by using right-sized models |
| ⏱️ Claude Max Extension | More tasks fit within the usage quota via smart model selection |
| 🚀 Faster Simple Tasks | <1ms for transforms vs. 2-5s with an LLM |
| 🎯 Zero Wasted Tokens | Simple edits use 0 tokens (WASM handles them) |
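How much routing saves depends on how a workload splits across tiers. As a worked example, the snippet below computes the expected cost per task for a hypothetical mix (20% Tier 1, 60% Tier 2, 20% Tier 3) using the per-tier costs from the Routing Tiers table on this page, taking the upper end of Tier 2's range, and compares it with sending every task to Opus. The mix is an assumption for illustration, not measured Ruflo data; because the saving is a ratio, it holds in whatever unit the Cost column uses.

```python
# Cost of each tier from the Routing Tiers table, in that table's units.
# Tier 2 is listed as a range; this sketch uses its upper bound.
TIER_COST = {1: 0.0, 2: 0.003, 3: 0.015}


def blended_cost(mix: dict[int, float]) -> float:
    """Expected cost per task for a workload mix {tier: fraction of tasks}."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(TIER_COST[tier] * share for tier, share in mix.items())


def savings_vs_all_opus(mix: dict[int, float]) -> float:
    """Fractional saving compared with sending every task to Tier 3 (Opus)."""
    return 1.0 - blended_cost(mix) / TIER_COST[3]


# Hypothetical workload: 20% simple transforms, 60% routine coding, 20% design work.
mix = {1: 0.2, 2: 0.6, 3: 0.2}
print(f"blended cost: {blended_cost(mix):.4f}")  # 0.0048
print(f"saving vs. all-Opus: {savings_vs_all_opus(mix):.0%}")  # 68%
```

Shifting the mix toward Tier 1 raises the saving; shifting it toward Tier 3 lowers it.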
Routing Tiers:
| Tier | Handler | Latency | Cost | Use Cases |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | <1ms | $0 | Simple transforms: var→const, add-types, remove-console |
| 2 | Haiku/Sonnet | 500ms-2s | $0.0002-$0.003 | Bug fixes, refactoring, feature implementation |
| 3 | Opus | 2-5s | $0.015 | Architecture, security design, distributed systems |
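At its simplest, the table amounts to a lookup ordered from the cheapest handler to the most capable one. The sketch below encodes it that way. The task-kind labels (`var-to-const`, `bug-fix`, and so on), the `Tier` class, and the `route` function are hypothetical, and a fixed lookup like this is only a baseline, since this page describes Ruflo's router as learned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    handler: str
    task_kinds: tuple[str, ...]  # hypothetical labels for the "Use Cases" column


# Ordered cheapest first, mirroring the Routing Tiers table.
TIERS = (
    Tier(1, "Agent Booster (WASM)", ("var-to-const", "add-types", "remove-console")),
    Tier(2, "Haiku/Sonnet", ("bug-fix", "refactor", "feature")),
    Tier(3, "Opus", ("architecture", "security-design", "distributed-systems")),
)


def route(task_kind: str) -> Tier:
    """Return the cheapest tier that lists the task kind among its use cases."""
    for tier in TIERS:
        if task_kind in tier.task_kinds:
            return tier
    # Unrecognized work falls back to the most capable (and most expensive) tier.
    return TIERS[-1]


for kind in ("remove-console", "refactor", "threat-model"):
    tier = route(kind)
    print(f"{kind}: tier {tier.level} ({tier.handler})")
```

Falling back to Tier 3 for unrecognized work is one conservative design choice: it spends more on unfamiliar tasks in exchange for sending them to the most capable model.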
Routing policy: Q-learning with epsilon-greedy exploration; each routing decision takes under a millisecond.
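A minimal sketch of Q-learning with epsilon-greedy exploration applied to tier selection is shown below. It assumes that each routing decision is a one-step episode (so the Q-learning target reduces to the immediate reward), that the state is a hypothetical task category, and that the reward is 1 for a successful result minus a made-up penalty proportional to the tier's cost. The names `TierRouter`, `CAN_HANDLE`, and `COST_WEIGHT` and all parameter values are illustrative assumptions, not Ruflo's.

```python
import random
from collections import defaultdict

TIERS = (1, 2, 3)
# Cost per tier from the Routing Tiers table (Tier 2: upper end of its range).
COST = {1: 0.0, 2: 0.003, 3: 0.015}
# Hypothetical ground truth: which tiers can complete each task category.
CAN_HANDLE = {"simple-transform": {1, 2, 3}, "bug-fix": {2, 3}, "architecture": {3}}
COST_WEIGHT = 10.0  # hypothetical penalty per unit of cost


def reward(state: str, tier: int) -> float:
    """1 for a successful result, minus a penalty proportional to the tier's cost."""
    success = 1.0 if tier in CAN_HANDLE[state] else 0.0
    return success - COST_WEIGHT * COST[tier]


class TierRouter:
    """Epsilon-greedy Q-learning over tiers, one Q-value per (state, tier).

    Each routing decision is a one-step episode, so the Q-learning target
    r + gamma * max Q(s', .) reduces to the immediate reward r.
    """

    def __init__(self, epsilon: float = 0.1, alpha: float = 0.2, seed: int = 0):
        self.epsilon = epsilon       # chance of trying a random tier
        self.alpha = alpha           # learning rate
        self.q = defaultdict(float)  # (state, tier) -> estimated reward
        self.rng = random.Random(seed)

    def choose(self, state: str) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(TIERS)  # explore
        # Exploit: best estimate so far; ties go to the cheaper tier.
        return max(TIERS, key=lambda t: (self.q[(state, t)], -t))

    def update(self, state: str, tier: int, r: float) -> None:
        key = (state, tier)
        self.q[key] += self.alpha * (r - self.q[key])


router = TierRouter(epsilon=0.1, alpha=0.2, seed=0)
for _ in range(2000):
    for state in CAN_HANDLE:
        tier = router.choose(state)
        router.update(state, tier, reward(state, tier))

router.epsilon = 0.0  # switch to purely greedy routing
for state in CAN_HANDLE:
    print(state, "->", router.choose(state))
```

The exploration rate is the key trade-off: with ε = 0.1, roughly one decision in ten tries a random tier, which is how the router can discover that a cheaper tier is good enough, at the price of occasionally sending a task to the wrong handler.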