Your LLM gave a great answer. But who decides what “great” means?
In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.
Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.