The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Astronomer
78 episodes
3 days ago
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/
Technology
Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille
24 minutes 17 seconds
2 months ago

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.


In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.


Key Takeaways:


00:00 Introduction.

02:13 Overview of the company’s operations and global presence.

04:00 The tech stack and structure of the data engineering team.

04:24 Running nearly 2,000 DAGs in production using Airflow.

05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.

07:05 Details on the Kubernetes-based Airflow setup using Helm charts.

09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.

14:11 Making every team member Airflow-literate through local installation.

17:56 Using custom libraries and plugins to extend Airflow functionality.


Resources Mentioned:


Sébastien Crocquevieille

https://www.linkedin.com/in/scroc/


Numberly | LinkedIn

https://www.linkedin.com/company/numberly/


Numberly | Website

https://numberly.com/


Apache Airflow

https://airflow.apache.org/


Grafana

https://grafana.com/


Apache Kafka

https://kafka.apache.org/


Helm Chart for Apache Airflow

https://airflow.apache.org/docs/helm-chart/stable/index.html


Kubernetes

https://kubernetes.io/


GitLab

https://about.gitlab.com/


KubernetesPodOperator – Airflow

https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html


Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast




Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.



#AI #Automation #Airflow #MachineLearning
