Optimizing LLM Performance and Cost for LLMs in Production

Pipeline Conversations

ZenML GmbH

34 episodes

8 months ago

Pipeline Conversations is a fortnightly podcast bringing you interviews and discussion with industry leaders, top technology professionals and others. We discuss the latest developments in machine learning, deep learning, artificial intelligence, with a particular focus on MLOps, or how trained models are used in production.

Technology

33 minutes 49 seconds

9 months ago

In this episode, we dive deep into LLM optimization and cost management, a critical challenge facing AI teams today. Join us as we explore real-world strategies from companies like Dropbox, Meta, and Replit, which are pushing the boundaries of what's possible with large language models. From model selection techniques and knowledge distillation to advanced inference optimization and cost-saving strategies, we unpack the tools and approaches that are helping organizations get maximum value from their LLM deployments. Whether you're dealing with runaway API costs, struggling with inference latency, or looking to optimize your model infrastructure, this episode provides practical insights that you can apply to your own AI initiatives. It is aimed at ML engineers, technical leads, and anyone responsible for maintaining LLM systems in production.

Please read the full blog post here and the associated LLMOps database entries here.