Home
Categories
EXPLORE
True Crime
Comedy
Society & Culture
Business
Sports
Technology
Health & Fitness
About Us
Contact Us
Copyright
© 2024 PodJoint
Podjoint Logo
US
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/57/f0/bb/57f0bbb1-dfbb-f611-f221-c90175211817/mza_17695132971636498762.jpg/600x600bb.jpg
Large Language Model (LLM) Talk
AI-Talk
66 episodes
1 week ago
AI Explained breaks down the world of AI in just 10 minutes. Get quick, clear insights into AI concepts and innovations, without any complicated math or jargon. Perfect for your commute or spare time, this podcast makes understanding AI easy, engaging, and fun—whether you're a beginner or tech enthusiast.
Show more...
Technology
RSS
All content for Large Language Model (LLM) Talk is the property of AI-Talk and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
AI Explained breaks down the world of AI in just 10 minutes. Get quick, clear insights into AI concepts and innovations, without any complicated math or jargon. Perfect for your commute or spare time, this podcast makes understanding AI easy, engaging, and fun—whether you're a beginner or tech enthusiast.
Show more...
Technology
https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/42833626/42833626-1736915491470-26d279f5a19fe.jpg
vLLM
Large Language Model (LLM) Talk
13 minutes 6 seconds
6 months ago
vLLM

vLLM is a high-throughput serving system for large language models. It addresses inefficient KV cache memory management in existing systems caused by fragmentation and lack of sharing, which limits batch size. vLLM uses PagedAttention, inspired by OS paging, to manage KV cache in non-contiguous blocks. This minimizes memory waste and enables flexible sharing, allowing vLLM to batch significantly more requests. As a result, vLLM achieves 2-4x higher throughput compared to state-of-the-art systems like FasterTransformer and Orca.

Large Language Model (LLM) Talk
AI Explained breaks down the world of AI in just 10 minutes. Get quick, clear insights into AI concepts and innovations, without any complicated math or jargon. Perfect for your commute or spare time, this podcast makes understanding AI easy, engaging, and fun—whether you're a beginner or tech enthusiast.