Data Architecture Elevator
Agile Lab s.r.l.
17 episodes
7 months ago
Technology
All content for Data Architecture Elevator is the property of Agile Lab s.r.l. and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Advanced LLM Optimization techniques
Data Architecture Elevator
15 minutes 54 seconds
7 months ago
Welcome to another Data Architecture Elevator podcast! Today's discussion is hosted by Paolo Platter, supported by our experts Antonino Ingargiola and Irene Donato. In this episode, we explore effective strategies for optimizing large language models (LLMs) for inference on multimodal data such as audio, text, images, and video. We discuss the shift from online APIs to self-hosted models, choosing smaller, task-specific models, and leveraging fine-tuning, distillation, quantization, and tensor-fusion techniques. We also highlight the role of specialized inference servers such as Triton and Dynamo, and how Kubernetes enables horizontal scaling. Don't forget to follow us on LinkedIn! Enjoy!
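To give a flavor of one technique mentioned in the episode, here is a minimal sketch of symmetric int8 weight quantization: mapping float weights to 8-bit integers plus a single scale factor, which shrinks model memory roughly 4x versus float32 at the cost of small rounding error. This is an illustrative pure-Python example, not the hosts' implementation; production systems use libraries such as TensorRT or bitsandbytes.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to int8 [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

# Toy weight vector standing in for a model layer (hypothetical values)
weights = [0.12, -0.5, 0.33, -0.07]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each restored weight differs from the original by at most half a
# quantization step (scale / 2), while storage drops from 32 to 8 bits.
```

Distillation and fine-tuning then compensate for this kind of precision loss by adapting the smaller or quantized model on task-specific data.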