AI Illuminated
The AI Illuminators
25 episodes
1 day ago
A new way to keep up with AI research. Delivered to your ears. Illuminated by AI. Part of the GenAI4Good initiative.
Courses
Education
All content for AI Illuminated is the property of The AI Illuminators and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
AI Illuminated
7 minutes 20 seconds
11 months ago
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

[00:00] SVD-Quant: 4-bit diffusion model quantization

[00:27] Challenge: Outlier sensitivity in 4-bit quantization

[00:59] Solution: Smoothing + SVD approach

[01:37] Technical: SVD's role in low-rank approximation (see the sketch after this chapter list)

[02:08] Nunchaku: New inference engine with kernel fusion

[02:35] Comparison: INT4 vs FP4 quantization methods

[03:00] Results: 3.5x memory reduction on FLUX.1

[03:44] Feature: Seamless LoRA compatibility

[04:06] Study: Validating combined approach effectiveness

[04:40] Future: Hardware compatibility and improvements

[06:12] Methods: Image quality assessment metrics

[06:53] Impact: Open-source deployment benefits
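The [01:37] chapter refers to SVD-based low-rank approximation. As a quick illustration only (not code from the episode or the paper; the 512x512 matrix and rank 32 are arbitrary choices), a truncated SVD keeps the top-r singular components of a weight matrix and leaves a small residual:

import numpy as np

# Toy stand-in for a linear-layer weight matrix (shape and values are arbitrary).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)

# Truncated SVD: keep only the top-r singular components.
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L = (U[:, :r] * S[:r]) @ Vt[:r, :]  # best rank-r approximation of W (Eckart-Young)

# The residual W - L is what a low-bit quantizer would still have to represent.
rel_err = np.linalg.norm(W - L) / np.linalg.norm(W)
print(f"rank-{r} approximation, relative Frobenius error: {rel_err:.3f}")

SVDQuant builds on this idea: the part of the weight that is hardest to quantize ends up concentrated in the low-rank branch, which is kept in higher precision.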


Authors: Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han


Affiliations: MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, Pika Labs


Abstract: Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. At such an aggressive level, both weights and activations are highly sensitive, where conventional post-training quantization methods for large language models like smoothing become insufficient. To overcome this limitation, we propose SVDQuant, a new 4-bit quantization paradigm. Different from smoothing which redistributes outliers between weights and activations, our approach absorbs these outliers using a low-rank branch. We first consolidate the outliers by shifting them from activations to weights, then employ a high-precision low-rank branch to take in the weight outliers with Singular Value Decomposition (SVD). This process eases the quantization on both sides. However, naïvely running the low-rank branch independently incurs significant overhead due to extra data movement of activations, negating the quantization speedup. To address this, we co-design an inference engine Nunchaku that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access. It can also seamlessly support off-the-shelf low-rank adapters (LoRAs) without the need for re-quantization. Extensive experiments on SDXL, PixArt-Σ, and FLUX.1 validate the effectiveness of SVDQuant in preserving image quality. We reduce the memory usage for the 12B FLUX.1 models by 3.5×, achieving 3.0× speedup over the 4-bit weight-only quantized baseline on the 16GB laptop 4090 GPU, paving the way for more interactive applications on PCs. Our quantization library and inference engine are open-sourced.
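The abstract describes splitting a (smoothed) weight matrix into a high-precision low-rank branch plus a 4-bit residual. The sketch below illustrates that idea under assumptions of my own (a toy matrix, synthetic outlier columns, and a naive symmetric per-channel INT4 round-trip); it is not the authors' SVDQuant implementation or Nunchaku's kernels.

import numpy as np

def quantize_int4(x, axis=0):
    # Naive symmetric per-channel 4-bit quantize/dequantize for error measurement only
    # (illustrative; not the paper's quantizer).
    scale = np.abs(x).max(axis=axis, keepdims=True) / 7.0  # symmetric INT4 codes in [-7, 7]
    q = np.clip(np.round(x / scale), -7, 7)
    return q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
W[:, :4] *= 20.0  # synthetic outlier channels, standing in for outliers shifted from activations into weights

# High-precision low-rank branch via SVD absorbs the dominant (outlier) directions.
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L1, L2 = U[:, :r] * S[:r], Vt[:r, :]
residual = W - L1 @ L2

# Quantize only the residual to 4 bits and compare with quantizing W directly.
err_direct = np.linalg.norm(W - quantize_int4(W)) / np.linalg.norm(W)
err_lowrank = np.linalg.norm(W - (L1 @ L2 + quantize_int4(residual))) / np.linalg.norm(W)
print(f"direct INT4 error: {err_direct:.4f}")
print(f"low-rank branch + INT4 residual error: {err_lowrank:.4f}")

On toy data like this, the low-rank-plus-4-bit-residual split typically shows a noticeably smaller reconstruction error than quantizing W directly, which is the effect SVDQuant exploits; per the abstract, the remaining challenge is fusing the extra low-rank computation into the low-bit kernels so it does not add redundant memory traffic.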


Link: https://hanlab.mit.edu/projects/svdquant
