Data Brew by Databricks

Databricks

44 episodes

3 months ago

What if building a custom AI model for your business was as simple as giving feedback—no massive labeled datasets required? In this episode, we sit down with Travis Addair, CTO and Co-Founder of Predibase, creators of the first reinforcement fine-tuning platform, to explore the future of specialized AI. Discover how reinforcement fine-tuning is revolutionizing model customization, enabling you to start fast, adapt to your unique data, and keep improving through human feedback. Whether you’re ...

All content for Data Brew by Databricks is the property of Databricks and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Episodes (20/44)

Reinforcement Fine-Tuning and the Future of Specialized AI Models

What if building a custom AI model for your business was as simple as giving feedback—no massive labeled datasets required? In this episode, we sit down with Travis Addair, CTO and Co-Founder of Predibase, creators of the first reinforcement fine-tuning platform, to explore the future of specialized AI. Discover how reinforcement fine-tuning is revolutionizing model customization, enabling you to start fast, adapt to your unique data, and keep improving through human feedback. Whether you’re ...

3 months ago

40 minutes

Benchmarking Domain Intelligence | Data Brew | Episode 45

In this episode, Pallavi Koppol, Research Scientist at Databricks, explores the importance of domain-specific intelligence in large language models (LLMs). She discusses how enterprises need models tailored to their unique jargon, data, and tasks rather than relying solely on general benchmarks. Highlights include: - Why benchmarking LLMs for domain-specific tasks is critical for enterprise AI. - An introduction to the Databricks Intelligence Benchmarking Suite (DIBS). - Evaluating models on...

6 months ago

31 minutes

SWE-bench & SWE-agent | Data Brew | Episode 44

In this episode, Kilian Lieret, Research Software Engineer, and Carlos Jimenez, Computer Science PhD Candidate at Princeton University, discuss SWE-bench and SWE-agent, two groundbreaking tools for evaluating and enhancing AI in software engineering. Highlights include: - SWE-bench: A benchmark for assessing AI models on real-world coding tasks. - Addressing data leakage concerns in GitHub-sourced benchmarks. - SWE-agent: An AI-driven system for navigating and solving coding challenges. - Ov...

6 months ago

36 minutes

Enterprise AI: Research to Product | Data Brew | Episode 43

In this episode, Dipendra Kumar, Staff Research Scientist, and Alnur Ali, Staff Software Engineer at Databricks, discuss the challenges of applying AI in enterprise environments and the tools being developed to bridge the gap between research and real-world deployment. Highlights include: - The challenges of real-world AI—messy data, security, and scalability. - Why enterprises need high-accuracy, fine-tuned models over generic AI APIs. - How QuickFix learns from user edits to improve AI-dri...

6 months ago

38 minutes

Multimodal AI | Data Brew | Episode 42

In this episode, Chang She, CEO and Co-founder of LanceDB, discusses the challenges of handling multimodal data and how LanceDB provides a cutting-edge solution. He shares his journey from contributing to Pandas to building a database optimized for images, video, vectors, and subtitles. Highlights include: - The limitations of traditional storage systems like Parquet for multimodal AI. - How LanceDB enables efficient querying and processing of diverse data types. - The growing importance of ...

7 months ago

42 minutes

Age of Agents | Data Brew | Episode 41

In this episode, Michele Catasta, President of Replit, explores how AI-driven agents are transforming software development by making coding more accessible and automating application creation. Highlights include: - The difference between AI agents and copilots in software development. - How AI is democratizing coding, enabling non-programmers to build applications. - Challenges in AI agent development, including error handling and software quality. - The growing role of AI in entrepreneurshi...

7 months ago

40 minutes

Reward Models | Data Brew | Episode 40

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include: - How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes. - Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality. - The role of reward models in improving ...

7 months ago

39 minutes

Retrieval, rerankers, and RAG tips and tricks | Data Brew | Episode 39

In this episode, Andrew Drozdov, Research Scientist at Databricks, explores how Retrieval Augmented Generation (RAG) enhances AI models by integrating retrieval capabilities for improved response accuracy and relevance. Highlights include: - Addressing LLM limitations by injecting relevant external information. - Optimizing document chunking, embedding, and query generation for RAG. - Improving retrieval systems with embeddings and fine-tuning techniques. - Enhancing search results using re-...

8 months ago

45 minutes

The Power of Synthetic Data | Data Brew | Episode 38

In this episode, Yev Meyer, Chief Scientist at Gretel AI, explores how synthetic data transforms AI and ML by improving data access, quality, privacy, and model training. Highlights include: - Leveraging synthetic data to overcome AI data limitations. - Enhancing model training while mitigating ethical and privacy risks. - Exploring the intersection of computational neuroscience and AI workflows. - Addressing licensing and legal considerations in synthetic data usage. - Unlocking private dat...

9 months ago

42 minutes

Secret to Production AI: Tools & Infrastructure | Data Brew | Episode 37

In this episode, Julia Neagu, CEO & co-founder of Quotient AI, explores the challenges of deploying Generative AI and LLMs, focusing on model evaluation, human-in-the-loop systems, and iterative development. Highlights include: - Merging reinforcement learning and unsupervised learning for real-time AI optimization. - Reducing bias in machine learning with fairness and ethical considerations. - Lessons from large-scale AI deployments on scalability and feedback loops. - Automating workflows wi...

9 months ago

37 minutes

Mixture of Memory Experts (MoME) | Data Brew | Episode 36

In this episode, Sharon Zhou, Co-Founder and CEO of Lamini AI, shares her expertise in the world of AI, focusing on fine-tuning models for improved performance and reliability. Highlights include: - The integration of determinism and probabilism for handling unstructured data and user queries effectively. - Proprietary techniques like memory tuning and robust evaluation frameworks to mitigate model inaccuracies and hallucinations. - Lessons learned from deploying AI applications, including insigh...

9 months ago

41 minutes

Mixed Attention & LLM Context | Data Brew | Episode 35

In this episode, Shashank Rajput, Research Scientist at Mosaic and Databricks, explores innovative approaches in large language models (LLMs), with a focus on Retrieval Augmented Generation (RAG) and its impact on improving efficiency and reducing operational costs. Highlights include: - How RAG enhances LLM accuracy by incorporating relevant external documents. - The evolution of attention mechanisms, including mixed attention strategies. - Practical applications of Mamba architectures and ...

11 months ago

39 minutes

Kumo AI & Relational Deep Learning | Data Brew | Episode 34

In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering. Highlights include: - How RDL enhances predictive modeling. - Applications in fraud detection and recommendation systems. - The use of graph neural networks to simplify complex data structures.

1 year ago

43 minutes

LLMs: Internals, Hallucinations, and Applications | Data Brew | Episode 33

Our fifth season dives into large language models (LLMs), from understanding the internals to the risks of using them and everything in between. While we're at it, we'll be enjoying our morning brew. In this session, we interviewed Chengyin Eng (Senior Data Scientist, Databricks), Sam Raymond (Senior Data Scientist, Databricks), and Joseph Bradley (Lead Production Specialist - ML, Databricks) on the best practices around LLM use cases, prompt engineering, and how to adapt MLOps for LLMs (i.e...

2 years ago

38 minutes

Demonstrate–Search–Predict Framework | Data Brew | Episode 32

We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed Omar Khattab - Computer Science Ph.D. Student at Stanford, creator of DSP (Demonstrate–Search–Predict Framework), to discuss DSP, common applications, and the future of NLP.

2 years ago

33 minutes

Generative AI Risks | Data Brew | Episode 31

We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed Yaron Singer, CEO of Robust Intelligence, Professor of Computer Science at Harvard University, and guest of Data Brew Season 3 (our first repeat guest!). In this session, we discuss generative AI, the trends toward embracing LLMs, and how the surface area for vulne...

2 years ago

34 minutes

John Snow Labs & SparkNLP | Data Brew | Episode 30

We are back and we will dive into LLMs from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed David Talby who is the CTO at John Snow Labs; they help healthcare & life science companies put AI to good use. David's interests include natural language processing, applied artificial intelligence in healthcare, and responsible AI.

2 years ago

43 minutes

Data Brew Season 4 Episode 6: Professional Athletes

For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Shayna Powless and Eli Ankou, professional cyclist for L39ion of Los Angeles and defensive tackle for the Buffalo Bills, respectively, provide valuable insight on how professional athletes leverage data to improve their performance and how they combine their passion for sports with the Dreamcatcher Foundation. See more at data...

3 years ago

35 minutes

Data Brew Season 4 Episode 5: Public Health: Education, Access, and Policy

For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Matt Willis, Marin County Public Health Officer, shares the three pillars of public health: education, access, and policy, and the critical role data plays in addressing the COVID-19 pandemic & opioid epidemic. See more at databricks.com/data-brew

3 years ago

34 minutes

Data Brew Season 4 Episode 4: 1283 Days of Running (and Counting)

For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Running the length of the US every year, Alexandra Matthiesen shares her motivational secrets for running 1,283 consecutive days (and counting!) and redefining physical and mental limits. See more at databricks.com/data-brew

3 years ago

35 minutes