How AI Is Built
Nicolay Gerold
63 episodes
6 days ago
Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.
Technology
All content for How AI Is Built is the property of Nicolay Gerold and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
#052 Don't Build Models, Build Systems That Build Models
How AI Is Built
59 minutes 22 seconds
4 months ago


Nicolay here,


Today I have the chance to talk to Charles from Modal, who went from doing a PhD on neural network optimization in the 2010s - when ML engineers could build models with a soldering iron and some sticks - to architecting serverless infrastructure for AI models. Modal is about removing barriers so anyone can spin up a hundred GPUs in seconds.


The critical insight that stuck with me: "Don't build models, build systems that build models." Organizations often make the mistake of celebrating a one-time fine-tuned model that matches GPT-4 performance only to watch it become obsolete when the next foundation model arrives - typically three to six months down the road.

Charles's approach to infrastructure is particularly unconventional. He argues that serverless isn't just about convenience - it fundamentally changes how ambitious you can be with scale. "There's so much that gets in the way of trying to spin up a hundred GPUs or a thousand CPU containers that people just don't think to do something big."

The winning approach has four parts: automated data pipelines with feedback collection, continuous evaluation against new foundation models, A/B testing and canary deployments, and systematic error analysis and retraining.
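
To make the "continuous evaluation against new foundation models" piece concrete, here is a minimal sketch in Python. It is not code from the episode or from Modal. The function names, the toy models, the tiny evaluation set, and the `min_gain` threshold are all hypothetical and only illustrate the idea: keep a fixed evaluation set, score every candidate on it, and switch only when a challenger clearly beats what is in production.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable from input text to a predicted label.
Model = Callable[[str], str]


@dataclass
class EvalResult:
    name: str
    accuracy: float


def evaluate(name: str, model: Model, eval_set: List[Tuple[str, str]]) -> EvalResult:
    """Score one model on a fixed, labeled evaluation set."""
    correct = sum(1 for text, label in eval_set if model(text) == label)
    return EvalResult(name=name, accuracy=correct / len(eval_set))


def pick_model(
    production: str,
    candidates: Dict[str, Model],
    eval_set: List[Tuple[str, str]],
    min_gain: float = 0.02,
) -> str:
    """Return the name of the model that should be served.

    A challenger replaces the production model only if it is the best
    candidate and beats production by at least `min_gain` accuracy.
    """
    results = {name: evaluate(name, m, eval_set) for name, m in candidates.items()}
    baseline = results[production].accuracy
    best = max(results.values(), key=lambda r: r.accuracy)
    if best.name != production and best.accuracy - baseline >= min_gain:
        return best.name
    return production


# Hypothetical stand-ins for a one-time fine-tune and a newer foundation model.
def fine_tuned_v1(text: str) -> str:
    return "positive" if "good" in text else "negative"


def new_foundation_model(text: str) -> str:
    return "positive" if ("good" in text or "great" in text) else "negative"


# Hypothetical evaluation set; in practice this would come from collected feedback.
EVAL_SET = [
    ("good product", "positive"),
    ("great service", "positive"),
    ("bad experience", "negative"),
    ("terrible support", "negative"),
]

CANDIDATES = {
    "fine_tuned_v1": fine_tuned_v1,
    "new_foundation_model": new_foundation_model,
}

chosen = pick_model("fine_tuned_v1", CANDIDATES, EVAL_SET)
print(chosen)
```

In this toy run the fine-tuned model misses one of the four examples and the newer model gets all four, so the check selects `new_foundation_model`. The point of the sketch is that the comparison is a repeatable step you can rerun whenever a new foundation model ships, rather than a one-off benchmark.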


In the podcast, we also cover:

  • Why inference, not training, is where the money is made
  • How to rethink compute when moving from traditional cloud to serverless
  • The economics of automated resource management
  • Why task decomposition is the key ML engineering skill
  • When to earn the right to fine-tune versus using foundation models

*📶 Connect with Charles:*

  • Twitter - https://twitter.com/charlesirl 
  • Modal Labs - https://modal.com 
  • Modal Slack Community - https://modal.com/slack 

*📶 Connect with Nicolay:*

  • LinkedIn - https://linkedin.com/in/nicolay-gerold/ 
  • X / Twitter - https://x.com/nicolaygerold 
  • Bluesky - https://bsky.app/profile/nicolaygerold.com 
  • Website - https://nicolaygerold.com/ 
  • My Agency Aisbach - https://aisbach.com/ (for AI implementations / strategy)

*⏱️ Important Moments*

  • From CUDA to Serverless: [00:01:38] Charles's journey from PhD neural network optimization to building Modal's serverless infrastructure.
  • Rethinking Scale Ambition: [00:01:38] "There's so much that gets in the way of trying to spin up a hundred GPUs that people just don't think to do something big."
  • The Economics of Serverless: [00:04:09] How automated resource management changes the cattle vs pets paradigm for GPU workloads.
  • Lambda vs Modal Philosophy: [00:04:20] Why Modal was designed for tasks that take bytes and emit megabytes, unlike Lambda's middleware focus.
  • Inference Economics Reality: [00:10:16] "Almost nobody gets paid to make models - organizations get paid to make predictions."
  • The Open Source Commoditization: [00:14:55] How foundation models are becoming undifferentiated capabilities like databases.
  • Task Decomposition as Core Skill: [00:22:00] Why breaking down problems is equivalent to recognizing API boundaries in software engineering.
  • Systems That Build Models: [00:33:31] The critical difference between delivering static weights versus repeatable model production systems.
  • Earning the Right to Fine-Tune: [00:34:06] The infrastructure prerequisites needed before attempting model customization.
  • Multi-Node Training Challenges: [00:52:24] How serverless platforms handle the contradiction of high-performance computing with spiky demand.

*🛠️ Tools & Tech Mentioned*

  • Modal - https://modal.com  (serverless GPU infrastructure) 
  • AWS Lambda - https://aws.amazon.com/lambda/  (traditional serverless)
  • Kubernetes - https://kubernetes.io/  (container orchestration)
  • Temporal - https://temporal.io/ (workflow orchestration)
  • Weights & Biases - https://wandb.ai/ (experiment tracking)
  • Hugging Face - https://huggingface.co/  (model repository)
  • PyTorch Distributed - https://pytorch.org/tutorials/intermediate/ddp_tutorial.html  (multi-GPU training)
  • Redis - https://redis.io/ (caching and queues)


*📚 Recommended Resources*

  • Full Stack Deep Learning - https://fullstackdeeplearning.com/ (deployment best practices)
  • Modal Documentation - https://modal.com/docs (getting started guide)
  • DeepSeek Paper - https://arxiv.org/abs/2401.02954 (disaggregated inference patterns)
  • AI Engineer Summit - https://ai.engineer/ (community events)
  • MLOps Community - https://mlops.community/ (best practices)


*💬 Join The Conversation*

Follow How AI Is Built on YouTube - https://youtube.com/@howaiisbuilt, Bluesky - https://bsky.app/profile/howaiisbuilt.fm, or Spotify - https://open.spotify.com/show/3hhSTyHSgKPVC4sw3H0NUc

If you have any suggestions for future guests, feel free to leave them in the comments or write to me (Nicolay) directly on LinkedIn - https://linkedin.com/in/nicolay-gerold/, X - https://x.com/nicolaygerold, or Bluesky - https://bsky.app/profile/nicolaygerold.com, or by email at nicolay.gerold@gmail.com.

I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
