Data Brew by Databricks
Databricks
44 episodes
3 months ago
What if building a custom AI model for your business was as simple as giving feedback—no massive labeled datasets required? In this episode, we sit down with Travis Addair, CTO and Co-Founder of Predibase, creators of the first reinforcement fine-tuning platform, to explore the future of specialized AI. Discover how reinforcement fine-tuning is revolutionizing model customization, enabling you to start fast, adapt to your unique data, and keep improving through human feedback. Whether you’re ...
Technology, Education, Leisure
Reward Models | Data Brew | Episode 40
Data Brew by Databricks
39 minutes
7 months ago
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving ...
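To make the reward-model idea from this episode concrete, below is a minimal sketch of training a scorer on preference pairs with the standard Bradley-Terry pairwise loss used in RLHF pipelines. The RewardModel class, dimensions, and toy tensors are illustrative assumptions for this sketch, not code from Databricks or the episode.

```python
import torch
import torch.nn as nn

# Sketch of a Bradley-Terry reward model trained on human preference
# pairs, as used in RLHF pipelines. All names and sizes here are
# illustrative assumptions.

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # Maps an encoded (prompt, response) representation to a scalar reward.
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-ins for encoded (prompt, response) pairs: `chosen` is the
# human-preferred response, `rejected` the dispreferred one.
chosen = torch.randn(8, 64)
rejected = torch.randn(8, 64)

# Bradley-Terry pairwise loss: push the reward of the preferred
# response above that of the rejected one.
loss = -torch.nn.functional.logsigmoid(
    model(chosen) - model(rejected)
).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In PPO-based RLHF the trained scorer then supplies the reward signal for policy optimization, while DPO folds the same pairwise preference objective directly into the policy's loss, skipping the separate reward model.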