
How can we get the best out of large language models without breaking the budget? This episode dives into "Adaptive LLM Routing under Budget Constraints" by Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, and Vishal Sharma. The authors reimagine the problem of choosing the right LLM for each query as a contextual bandit task, learning from user feedback rather than costly full supervision. Their new method, PILOT, combines human preference data with online learning to route queries efficiently—achieving up to 93% of GPT-4’s performance at just 25% of its cost.
We also look at their budget-aware strategy, modeled as a multi-choice knapsack problem, which allocates expensive queries to the strongest models while keeping overall spending low.
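To give a flavor of the routing idea discussed in the episode, here is a deliberately simplified sketch: an epsilon-greedy bandit that picks among LLM "arms" by estimated reward per unit cost, subject to a spending budget. This is an illustrative toy, not the paper's PILOT algorithm — it drops the contextual (per-query) features and the knapsack formulation, and all model names and costs are made up.

```python
import random


class BudgetAwareRouter:
    """Toy epsilon-greedy bandit router (illustrative only, not PILOT).

    Each 'arm' is an LLM with a fixed per-query cost; rewards are
    user-feedback scores in [0, 1]. Context features are omitted
    for simplicity.
    """

    def __init__(self, models, budget, epsilon=0.1, seed=0):
        self.models = models            # {name: cost_per_query}, hypothetical values
        self.budget = budget            # total spend allowed
        self.epsilon = epsilon          # exploration rate
        self.rng = random.Random(seed)
        self.counts = {m: 0 for m in models}
        self.values = {m: 0.0 for m in models}  # running mean reward per arm

    def route(self):
        """Pick a model: explore with probability epsilon, otherwise
        choose the best reward-per-cost arm that is still affordable."""
        affordable = [m for m, c in self.models.items() if c <= self.budget]
        if not affordable:
            return None  # budget exhausted
        if self.rng.random() < self.epsilon:
            return self.rng.choice(affordable)
        return max(affordable, key=lambda m: self.values[m] / self.models[m])

    def update(self, model, reward):
        """Charge the budget and update the arm's running mean reward."""
        self.budget -= self.models[model]
        self.counts[model] += 1
        self.values[model] += (reward - self.values[model]) / self.counts[model]


# Usage sketch: route a query, observe feedback, update.
router = BudgetAwareRouter({"small-model": 1.0, "large-model": 25.0}, budget=100.0)
choice = router.route()
router.update(choice, reward=0.8)
```

The paper's actual method goes further: it learns from human preference data, conditions on the query (the "contextual" part), and solves a multi-choice knapsack to decide which queries deserve the expensive model.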
Original paper: https://arxiv.org/abs/2508.21141
This podcast description was generated with the help of Google’s NotebookLM.