Join us as we discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV cache quantization, a technique for reducing memory and compute costs during inference, by introducing a method to identify and exclude the outlier tokens that hurt quantization accuracy, striking a better balance between efficiency and performance.
Paper: https://arxiv.org/abs/2505.10938
Slides: https://bit.ly/45wolpr
Join us for Ar...
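For listeners who want a concrete feel for the idea before pressing play: the sketch below shows one way to quantize a KV cache per token while keeping a handful of high-range "outlier" tokens in full precision. It is a minimal illustrative example under our own assumptions (the function names, the top-k dynamic-range heuristic, 4-bit asymmetric per-token quantization, and the n_outliers default are all ours), not the authors' implementation from the paper.

```python
# Minimal sketch of KV cache quantization with outlier-token tracing.
# Illustrative only: names, heuristics, and bit width are assumptions,
# not the method from arxiv.org/abs/2505.10938.
import torch


def quantize_kv_with_outliers(kv: torch.Tensor, n_outliers: int = 4):
    """Per-token asymmetric 4-bit quantization of a KV cache slice.

    kv: (seq_len, head_dim) keys or values for one head.
    Tokens with the largest dynamic range (the "outliers") are stored
    in full precision instead of being quantized, so they do not
    inflate the quantization step size for everyone else.
    """
    seq_len, _ = kv.shape
    # Score each token by its channel range; extreme tokens are the
    # ones that hurt quantization accuracy the most.
    token_range = kv.amax(dim=-1) - kv.amin(dim=-1)           # (seq_len,)
    outlier_idx = torch.topk(token_range, k=min(n_outliers, seq_len)).indices

    mask = torch.ones(seq_len, dtype=torch.bool)
    mask[outlier_idx] = False                                  # tokens to quantize

    inliers = kv[mask]
    # Per-token scale and zero point for the 4-bit range [0, 15].
    mn = inliers.amin(dim=-1, keepdim=True)
    mx = inliers.amax(dim=-1, keepdim=True)
    scale = (mx - mn).clamp(min=1e-6) / 15.0
    q = torch.clamp(((inliers - mn) / scale).round(), 0, 15).to(torch.uint8)

    # Store quantized inliers plus the traced full-precision outliers.
    return {"q": q, "scale": scale, "zero": mn,
            "outlier_idx": outlier_idx, "outliers": kv[outlier_idx]}


def dequantize_kv(packed, seq_len: int, head_dim: int) -> torch.Tensor:
    """Reassemble the cache: dequantize inliers, splice outliers back in."""
    out = torch.empty(seq_len, head_dim)
    mask = torch.ones(seq_len, dtype=torch.bool)
    mask[packed["outlier_idx"]] = False
    out[mask] = packed["q"].float() * packed["scale"] + packed["zero"]
    out[packed["outlier_idx"]] = packed["outliers"].float()
    return out


if __name__ == "__main__":
    kv = torch.randn(128, 64)
    packed = quantize_kv_with_outliers(kv)
    recon = dequantize_kv(packed, 128, 64)
    print("max abs error:", (recon - kv).abs().max().item())
```

The design point the sketch tries to capture is the one the episode highlights: rather than widening the quantization range to accommodate a few extreme tokens, trace those tokens separately and quantize the well-behaved majority tightly.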
AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs
Deep Papers
30 minutes
2 months ago
This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements, including:
DeepSeek's Big Launch Week: A look at FlashMLA (DeepSeek's efficient attention-decoding kernel for faster inference) and DeepEP (their expert-parallelism communication library for Mixture-of-Experts models).
Claude 3.7 & Claude Code: What's new with Anthropic's latest model, and what Claude Code brings to the AI coding assistant space.
Stay ahead of ...