ruflo

Thompson sampling model router (alpha.5)


The 3-tier model selector (Haiku / Sonnet / Opus) is now a cost-adjusted multi-armed bandit instead of a set of static thresholds. Each hooks_model-outcome call updates the Beta(α, β) distribution of the tier that handled the task. Each hooks_model-route call samples θ ~ Beta(α, β) for every tier and routes to the argmax. After roughly 50 recorded outcomes, the routing distribution corrects itself when a tier is overused, so no manual threshold tuning is needed. Overhead: about 45 µs per route call.
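A minimal Python sketch of the sample-then-argmax loop, assuming binary success/failure outcomes, uniform Beta(1, 1) starting distributions, and a linear cost penalty subtracted from each sample. `ThompsonRouter`, `record_outcome`, `route`, `TIER_COST`, and `cost_weight` are hypothetical names, and the cost values are illustrative rather than ruflo's; the two methods only mirror the roles of hooks_model-outcome and hooks_model-route.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical relative cost per tier (Opus = 1.0); illustrative values, not ruflo's.
TIER_COST = {"haiku": 0.1, "sonnet": 0.3, "opus": 1.0}


@dataclass
class ThompsonRouter:
    cost_weight: float = 0.2      # hypothetical penalty per unit of relative cost
    seed: Optional[int] = None    # fixed seed makes routing reproducible in tests
    # Beta(1, 1) (uniform) starting distribution for every tier.
    alpha: dict = field(default_factory=lambda: {t: 1.0 for t in TIER_COST})
    beta: dict = field(default_factory=lambda: {t: 1.0 for t in TIER_COST})

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def record_outcome(self, tier: str, success: bool) -> None:
        """Stand-in for hooks_model-outcome: a success bumps alpha, a failure bumps beta."""
        if success:
            self.alpha[tier] += 1
        else:
            self.beta[tier] += 1

    def route(self) -> str:
        """Stand-in for hooks_model-route: sample theta per tier, subtract a cost penalty, take the argmax."""
        scores = {
            tier: self._rng.betavariate(self.alpha[tier], self.beta[tier])
            - self.cost_weight * TIER_COST[tier]
            for tier in TIER_COST
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    router = ThompsonRouter(seed=7)
    for _ in range(50):
        router.record_outcome("haiku", success=True)
        router.record_outcome("opus", success=False)
    picks = [router.route() for _ in range(1000)]
    print({tier: picks.count(tier) for tier in TIER_COST})
```

Subtracting a fixed penalty is one simple way to make the bandit cost-adjusted; scaling each sample by its tier's cost is another. Because every route call draws fresh samples, a tier that keeps failing sees its samples shrink and loses traffic gradually instead of at a hard threshold.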