Gradient Descent - Podcast about AI and Data
Wisecube AI
6 episodes
6 days ago
"Gradient Descent" is a podcast that delves into the depths of artificial intelligence and data science. Hosted by Vishnu Vettrivel (Founder of Wisecube AI) and Alex Thomas (Principal Data Scientist), the show explores the latest trends, innovations, and practical applications in AI and data science. Join us to learn more about how these technologies are shaping our future.
Technology
All content for Gradient Descent - Podcast about AI and Data is the property of Wisecube AI and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
LLM as a Judge: Can AI Evaluate Itself?
31 minutes 59 seconds
7 months ago

In the second episode of Gradient Descent, Vishnu Vettrivel (CTO of Wisecube) and Alex Thomas (Principal Data Scientist) explore the innovative yet controversial idea of using LLMs to judge and evaluate other AI systems. They discuss the hidden human role in AI training, limitations of traditional benchmarks, automated evaluation strengths and weaknesses, and best practices for building reliable AI judgment systems.

Timestamps:

00:00 – Introduction & Context

01:00 – The Role of Humans in AI

03:58 – Why Is Evaluating LLMs So Difficult?

09:00 – Pros and Cons of LLM-as-a-Judge

14:30 – How to Make LLM-as-a-Judge More Reliable?

19:30 – Trust and Reliability Issues

25:00 – The Future of LLM-as-a-Judge

30:00 – Final Thoughts and Takeaways


Listen on:

• YouTube: https://youtube.com/@WisecubeAI/podcasts

• Apple Podcasts: https://apple.co/4kPMxZf

• Spotify: https://open.spotify.com/show/1nG58pwg2Dv6oAhCTzab55

• Amazon Music: https://bit.ly/4izpdO2


Follow us:

• Pythia Website: www.askpythia.ai

• Wisecube Website: www.wisecube.ai

• LinkedIn: www.linkedin.com/company/wisecube

• Facebook: www.facebook.com/wisecubeai

• Reddit: www.reddit.com/r/pythia/

Mentioned Materials:

- Best Practices for LLM-as-a-Judge: https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG

- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods: https://arxiv.org/pdf/2412.05579v2

- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena: https://arxiv.org/abs/2306.05685

- Guide to LLM-as-a-Judge: https://www.evidentlyai.com/llm-guide/llm-as-a-judge

- Preference Leakage: A Contamination Problem in LLM-as-a-Judge: https://arxiv.org/pdf/2502.01534

- Large Language Models Are Not Fair Evaluators: https://arxiv.org/pdf/2305.17926

- Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment: https://arxiv.org/pdf/2402.14016v2

- Optimization-based Prompt Injection Attack to LLM-as-a-Judge: https://arxiv.org/pdf/2403.17710v4

- AWS Bedrock: Model Evaluation: https://aws.amazon.com/blogs/machine-learning/llm-as-a-judge-on-amazon-bedrock-model-evaluation/

- Hugging Face: LLM Judge Cookbook: https://huggingface.co/learn/cookbook/en/llm_judge
