Your LLM gave a great answer. But who decides what “great” means?
In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.
Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.