
👋🏼 Hey AI heads 🎙️
Join us for the very first Tech Beats Live 🔴, hosted by Kosseila—aka @CloudDude from @CloudThrill.
🎯 This chill & laid-back livestream will unpack LLM quantization 🔥:
We’ll be joined by two incredible guest stars to talk Enterprise vs Consumer Quants 🗣️:
🔷 Eldar Kurtić – bringing the enterprise perspective with vLLM.
🔷 Colin Kealty – aka Bartowski, creator of the top-downloaded GGUF quantized LLMs on Hugging Face.
🫵🏼 Come learn and have some fun 😎.
𝐂𝐡𝐚𝐩𝐭𝐞𝐫𝐬:
(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What Is Quantization?
(16:19) Why Does LLM Quantization Matter?
(20:39) Training vs Inference – “The New Deal”
(27:46) Biggest Misconception About Quantization
(33:22) Enterprise Quantization in Production (vLLM)
(48:48) Consumer LLMs & Quantization (Ollama, llama.cpp, GGUF) – “LLMs for the People”
(01:06:45) BitNet 1-Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama-3 70B) – GGUF or llm-compressor
(01:34:23) What Is I-Matrix & Why Do People Confuse It with IQ Quantization?
(01:39:36) What’s LoRA & LoRA-Q?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits – a 70% Size Cut
(01:57:27) Will Future Models (Llama-5) Be Trained on FP4 Tensor Cores?
(02:02:15) The Future of LLMs on Edge Devices (Google AI Edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model
(02:26:09) Hugging Face’s Role in the World of LLM/Quantization
(02:36:41) LocalLlama Sub-Reddit Down (Moderator Goes Bananas)
(02:40:11) Guests’ Hope for the Future of LLMs & AI in General
📖 Check out the quantization blog: https://bit.ly/LLMQuant
#AI #LLM #Quantization #TechBeatsLive #LocalLlama #vLLM #Ollama