How Good Is It, Really? - A Guide to LLM Evaluation

All Things LLM

Mr. Dew

15 episodes

1 month ago

In the grand finale of "All Things LLM," hosts Alex and Ben look ahead to the bleeding edge and reflect on the ultimate question for AI: can we ever truly understand how these models think?

Inside this episode:

The rise of reasoning models: Discover why the next leap for AI isn’t just bigger models, but smarter thinking. Explore how OpenAI’s o1 and DeepSeek-R1 represent a paradigm shift, moving from brute-force “pre-train and scale” to dynamic, inference-time reasoning. Learn how these new mo...

Technology

All content for All Things LLM is the property of Mr. Dew and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

How Good Is It, Really? - A Guide to LLM Evaluation

7 minutes

1 month ago

In the season finale of "All Things LLM," hosts Alex and Ben turn to one of the most important and challenging topics in AI: how do we objectively evaluate the quality and reliability of a language model? With so many models, benchmarks, and metrics, what actually counts as “good”?

In this episode, you’ll discover:

The evolution of LLM evaluation: From classic reference-based metrics like BLEU (translation) and ROUGE (summarization) to their limitations with today’s more sophisticated, nuance...
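
For listeners who want a feel for what "reference-based" means in practice, here is a minimal sketch of a ROUGE-1-style unigram overlap score. It is a simplified illustration, not the official ROUGE implementation: the function name `rouge1_scores` is hypothetical, and lowercasing plus whitespace splitting is an assumed tokenization. The example sentences are made up to show the limitation the episode description points to, where a faithful paraphrase shares no words with the reference and scores zero.

```python
from collections import Counter


def rouge1_scores(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1-style scores based on clipped unigram overlap.

    Assumption: tokens are lowercased, whitespace-separated words.
    Real ROUGE tooling adds stemming and other normalization options.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per word (clipping).
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "the cat sat on the mat"
verbatim = "the cat sat on the mat"          # word-for-word copy
paraphrase = "a feline rested upon a rug"    # same meaning, no shared words

print(rouge1_scores(reference, verbatim))
print(rouge1_scores(reference, paraphrase))
```

The verbatim copy gets a perfect score while the paraphrase gets nothing, even though a human reader would accept both. That gap is one reason word-overlap metrics struggle with open-ended model output.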