The AI Research Deep Dive
36 episodes
6 days ago
From arXiv to insight: a daily tour of cutting-edge AI papers. The AI Research Deep Dive podcast covers a new groundbreaking research paper every day, combing through the most important details and results to give you a clear picture of what the paper accomplishes and how it gets there.
Science
All content for The AI Research Deep Dive is the property of The AI Research Deep Dive and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
FastVLM: Efficient Vision Encoding for Vision Language Models
The AI Research Deep Dive
16 minutes 42 seconds
1 month ago

Arxiv: https://www.arxiv.org/abs/2412.13303

This episode of "The AI Research Deep Dive" unpacks "FastVLM," a paper from Apple that tackles the frustrating lag, measured as time-to-first-token (TTFT), in high-resolution Vision Language Models. The host explains how the model achieves up to an 85x TTFT speedup over competing models by fundamentally re-engineering how the vision encoder processes an image. Listeners will learn about FastVLM's hybrid vision encoder, which aggressively downsamples the image to produce over 20 times fewer visual tokens for the language model to process. The episode details how the system avoids losing critical details through a "multi-scale feature fusion" technique, resulting in a model that is not only dramatically faster and smaller but also more accurate on key real-world benchmarks, paving the way for truly instant and powerful on-device visual intelligence.
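As a rough illustration of the two ideas in the description, the Python sketch below (not code from the paper or from Apple) shows why a larger total downsampling stride in the vision encoder yields fewer visual tokens, and how feature maps from several encoder stages could be fused per token. The strides, toy feature maps, and the function names num_visual_tokens, avg_pool and fuse_multiscale are hypothetical; average pooling plus channel concatenation is an assumed stand-in for whatever fusion operators FastVLM actually uses.

```python
"""Toy illustration of two ideas from the episode description (NOT Apple's code).

1. A vision encoder with a larger total downsampling stride emits fewer
   visual tokens for the same image.
2. Multi-scale feature fusion: pool feature maps from several encoder stages
   onto the final token grid and concatenate them per token.

All strides, shapes, and function names are hypothetical. Average pooling plus
channel concatenation is an assumed stand-in for the paper's fusion operators.
"""

from typing import List

# A feature map is a square grid [row][col] of channel vectors.
FeatureMap = List[List[List[float]]]


def num_visual_tokens(image_size: int, total_stride: int) -> int:
    """Token count for a square image when each token covers a
    total_stride x total_stride pixel patch."""
    if image_size % total_stride != 0:
        raise ValueError("image_size must be divisible by total_stride")
    side = image_size // total_stride
    return side * side


def avg_pool(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Average-pool a feature map over non-overlapping factor x factor windows."""
    rows, cols = len(fmap), len(fmap[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be divisible by the pooling factor")
    channels = len(fmap[0][0])
    pooled: FeatureMap = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            acc = [0.0] * channels
            for dr in range(factor):
                for dc in range(factor):
                    for k, value in enumerate(fmap[r + dr][c + dc]):
                        acc[k] += value
            out_row.append([a / (factor * factor) for a in acc])
        pooled.append(out_row)
    return pooled


def fuse_multiscale(stages: List[FeatureMap]) -> FeatureMap:
    """Pool every stage down to the coarsest stage's grid, then concatenate
    all stages' channel vectors at each token position."""
    target = min(len(s) for s in stages)  # side length of the coarsest grid
    pooled = [avg_pool(s, len(s) // target) for s in stages]
    return [
        [sum((p[r][c] for p in pooled), []) for c in range(target)]
        for r in range(target)
    ]


if __name__ == "__main__":
    # Hypothetical strides: 16 for a plain ViT-style patchifier, 64 for a
    # deeper hierarchical encoder. These are not figures from the paper.
    for stride in (16, 64):
        print(f"stride {stride}: {num_visual_tokens(1024, stride)} tokens")
    # prints "stride 16: 4096 tokens" then "stride 64: 256 tokens"

    # Toy stages: a fine 4x4 map with 1 channel, a coarse 2x2 map with 2 channels.
    fine = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
    coarse = [[[1.0, 2.0] for _ in range(2)] for _ in range(2)]
    fused = fuse_multiscale([fine, coarse])
    print(fused[0][0])  # [2.5, 1.0, 2.0]
```

With these assumed strides, the larger-stride encoder emits 16x fewer tokens for the same 1024x1024 image. Because the language model has to process every visual token before it can produce its first output token, shrinking the token count attacks one major component of TTFT; the fusion step is one way to keep finer-grained information from earlier stages in each of the remaining tokens.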

