ruflo

💰 Intelligent 3-Tier Model Routing

Not every task needs the most powerful (and most expensive) model. Ruflo analyzes each request and automatically routes it to the cheapest handler that can do the job well. Simple code transforms run in WebAssembly and skip the LLM entirely. Medium-complexity tasks go to faster, cheaper models. Only complex work, such as architecture decisions, goes to Opus.

Cost & Usage Benefits:

| Benefit | Impact |
|---|---|
| 💵 API Cost Reduction | 75% lower costs by using right-sized models |
| ⏱️ Claude Max Extension | More tasks within quota via smart model selection |
| 🚀 Faster Simple Tasks | <1ms for transforms vs 2-5s with LLM |
| 🎯 Zero Wasted Tokens | Simple edits use 0 tokens (WASM handles them) |

Routing Tiers:

| Tier | Handler | Latency | Cost | Use Cases |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | <1ms | $0 | Simple transforms: var→const, add-types, remove-console |
| 2 | Haiku/Sonnet | 500ms-2s | $0.0002-$0.003 | Bug fixes, refactoring, feature implementation |
| 3 | Opus | 2-5s | $0.015 | Architecture, security design, distributed systems |
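
To make the tier table concrete, here is a minimal sketch of "pick the cheapest tier that can handle the task". It is not Ruflo's implementation: the `Tier` class, the `route` and `workload_cost` functions, and the task-kind labels (`"var-to-const"`, `"bug-fix"`, and so on) are hypothetical names invented for illustration. Only the handlers and cost figures come from the table, and the table does not state the unit the costs are measured in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    number: int
    handler: str
    max_cost_usd: float      # upper end of the table's Cost column (unit not stated in the source)
    use_cases: frozenset     # hypothetical task-kind labels derived from the Use Cases column


TIERS = (
    Tier(1, "Agent Booster (WASM)", 0.0,
         frozenset({"var-to-const", "add-types", "remove-console"})),
    Tier(2, "Haiku/Sonnet", 0.003,
         frozenset({"bug-fix", "refactoring", "feature-implementation"})),
    Tier(3, "Opus", 0.015,
         frozenset({"architecture", "security-design", "distributed-systems"})),
)


def route(task_kind: str) -> Tier:
    """Return the cheapest tier whose use cases include task_kind."""
    for tier in sorted(TIERS, key=lambda t: t.max_cost_usd):
        if task_kind in tier.use_cases:
            return tier
    # Assumption: unrecognized work falls through to the most capable tier.
    return TIERS[-1]


def workload_cost(task_kinds) -> float:
    """Sum the upper-bound cost of routing each task to its tier."""
    return sum(route(kind).max_cost_usd for kind in task_kinds)
```

Calling `route("add-types")` selects the tier 1 entry, so a workload made up only of simple transforms has a `workload_cost` of zero under these figures. A real router would have to classify free-form requests first; the fixed label lookup here stands in for that step.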

Routing: Q-learning with epsilon-greedy exploration, sub-millisecond decision latency
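
As an illustration of what Q-learning with epsilon-greedy exploration means for routing, the sketch below keeps a table of estimated values per (task state, tier) pair, usually picks the best-valued tier, and occasionally tries a random one. Everything here is an assumption made for the example: the `QRouter` class, the state labels `"simple"`, `"medium"`, and `"complex"`, the hyperparameters, and especially `toy_reward`, which is a made-up reward signal. The source does not describe Ruflo's state features or reward function.

```python
import random
from collections import defaultdict

ACTIONS = (1, 2, 3)  # tier numbers from the routing table


class QRouter:
    """Tabular Q-learning over (state, tier) pairs with epsilon-greedy selection."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.0, seed=None):
        self.epsilon = epsilon   # probability of exploring a random tier
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount; 0.0 treats each routing decision as one step
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)           # explore
        values = self.q[state]
        # Exploit: highest estimated value; ties resolve to the lowest tier.
        return max(ACTIONS, key=lambda a: values[a])

    def update(self, state, action, reward, next_state=None):
        # Standard Q-learning target: reward plus discounted best future value.
        future = max(self.q[next_state].values()) if next_state is not None else 0.0
        target = reward + self.gamma * future
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def toy_reward(state, tier):
    """Hypothetical reward: 1 if the tier is capable enough, minus a per-tier cost penalty."""
    needed = {"simple": 1, "medium": 2, "complex": 3}[state]
    return (1.0 if tier >= needed else 0.0) - 0.1 * tier


def train_toy(router, steps=2000):
    """Run the router against the toy reward for a number of routing decisions."""
    for _ in range(steps):
        state = router.rng.choice(["simple", "medium", "complex"])
        tier = router.choose(state)
        router.update(state, tier, toy_reward(state, tier))
    return router
```

Under this toy reward, training a `QRouter` and then setting `epsilon` to 0 leads it to send simple work to tier 1, medium work to tier 2, and complex work to tier 3. Each decision is a dictionary lookup plus an optional random draw, which is consistent with a decision step cheap enough to be sub-millisecond, although this sketch makes no claim about Ruflo's measured latency.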