Gradient Descent - Podcast about AI and Data
Wisecube AI
6 episodes
6 days ago
"Gradient Descent" is a podcast that delves into the depths of artificial intelligence and data science. Hosted by Vishnu Vettrivel (Founder of Wisecube AI) and Alex Thomas (Principal Data Scientist), the show explores the latest trends, innovations, and practical applications in AI and data science. Join us to learn more about how these technologies are shaping our future.
Technology
All content for Gradient Descent - Podcast about AI and Data is the property of Wisecube AI and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
LLM Fine-Tuning: RLHF vs DPO and Beyond
Gradient Descent - Podcast about AI and Data
37 minutes 36 seconds
5 months ago

In this episode of Gradient Descent, we explore two competing approaches to fine-tuning LLMs: Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). We cover the mechanics of RLHF, its computational challenges, and how DPO simplifies the process by eliminating the need for a separate reward model. We also discuss supervised fine-tuning, emerging methods like Identity Preference Optimization (IPO) and Kahneman-Tversky Optimization (KTO), and their real-world applications in models like Llama 3 and Mistral. Learn practical LLM optimization strategies, including task modularization to boost performance without extensive fine-tuning.
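
To make the "no separate reward model" point concrete, here is a minimal sketch of the per-example DPO loss as described in reference [2]. It is an illustration only, not code from the episode: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions chosen for this example. The inputs are summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper, for illustration).

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin; no learned reward model is involved.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is zero.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy matches the reference, the loss equals log 2, and it falls as the policy raises the chosen response's likelihood relative to the rejected one. In practice the log-probabilities would come from a forward pass of each model over a preference dataset, and the loss would be averaged over a batch.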


Timestamps:

Intro - 00:00

Overview of LLM Fine-Tuning - 00:48

Deep Dive into RLHF - 02:46

Supervised Fine-Tuning vs. RLHF - 10:38

DPO and Other RLHF Alternatives - 14:43

Real-World Applications in Frontier Models - 22:23

Practical Tips for LLM Optimization - 25:18

Closing Thoughts - 36:05


References:

[1] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155

[2] Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/abs/2305.18290

[3] Hugging Face Blog on DPO: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) https://huggingface.co/blog/ariG23498/rlhf-to-dpo

[4] Comparative Analysis: RLHF and DPO Compared https://crowdworks.blog/en/rlhf-and-dpo-compared/

[5] YouTube Explanation: How to fine-tune LLMs directly without reinforcement learning https://www.youtube.com/watch?v=k2pD3k1485A


Listen on:

• Apple Podcasts:

https://podcasts.apple.com/us/podcast/gradient-descent-podcast-about-ai-and-data/id1801323847

• Spotify:

https://open.spotify.com/show/1nG58pwg2Dv6oAhCTzab55

• Amazon Music:

https://music.amazon.com/podcasts/79f6ed45-ef49-4919-bebc-e746e0afe94c/gradient-descent---podcast-about-ai-and-data


Our solutions:

- https://askpythia.ai/ - LLM Hallucination Detection Tool

- https://www.wisecube.ai - Wisecube AI platform for large-scale biomedical knowledge analysis


Follow us:

- Pythia Website: https://askpythia.ai/

- Wisecube Website: https://www.wisecube.ai

- LinkedIn: https://www.linkedin.com/company/wisecube/

- Facebook: https://www.facebook.com/wisecubeai

- Twitter: https://x.com/wisecubeai

- Reddit: https://www.reddit.com/r/pythia/

- GitHub: https://github.com/wisecubeai


#FineTuning #LLM #DeepLearning #RLHF #DPO #AI #MachineLearning #AIDevelopment
