The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
Astronomer
78 episodes
18 hours ago
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/
Technology
All content for The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI is the property of Astronomer and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Episodes (20/78)
How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar

The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat.


In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen governance, reduce compliance risk and improve customer experience.


Key Takeaways:


00:00 Introduction.

01:53 A focused analytics platform reduces compliance risk in life sciences.

07:31 A centralized warehouse orchestrated by Airflow strengthens governance.

09:12 Managed orchestration keeps attention on analytics and outcomes.

10:32 A modern transformation stack enables scalable modeling and operations.

11:51 Event-driven pipelines improve data freshness and responsiveness.

14:13 Asset-oriented scheduling and versioning enhance reliability and change control.

16:53 Observability and SLAs build confidence in data quality and freshness.

21:04 Priorities include partitioned assets and streamlined developer tooling.


Resources Mentioned:


Shankar Mahindar

https://www.linkedin.com/in/shankar-mahindar-83a61b137/


Redica Systems | LinkedIn

https://www.linkedin.com/company/redicasystems/


Redica Systems | Website

https://redica.com


Apache Airflow

https://airflow.apache.org/


Astronomer

https://www.astronomer.io/


Snowflake

https://www.snowflake.com/


AWS

https://aws.amazon.com/




Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.



#AI #Automation #Airflow #MachineLearning

23 hours ago
23 minutes 48 seconds

How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov

The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data.


In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entity extraction and fuzzy matching, linking the UK Register of Members’ Financial Interests with Companies House, and how this work cuts weeks of manual analysis to minutes.


Key Takeaways:


00:00 Introduction.

02:12 What computational journalism means for day-to-day newsroom work.

05:22 Why a shared orchestration platform supports consistent, scalable workflows.

08:30 Tradeoffs of one centralized platform versus many separate instances.

11:52 Using pipelines to structure messy sources for faster analysis.

14:14 Turning recurring disclosures into usable data for investigations.

16:03 Applying lightweight ML and matching to reveal entities and links.

18:46 How automation reduces manual effort and shortens time to insight.

20:41 Practical improvements that make backfilling and reliability easier.


Resources Mentioned:


Zdravko Hvarlingov

https://www.linkedin.com/in/zdravko-hvarlingov-3aa36016b/


Financial Times | LinkedIn

https://www.linkedin.com/company/financial-times/


Financial Times | Website

https://www.ft.com/


Apache Airflow

https://airflow.apache.org/


UK Register of Members’ Financial Interests

https://www.parliament.uk/mps-lords-and-offices/standards-and-financial-interests/parliamentary-commissioner-for-standards/registers-of-interests/register-of-members-financial-interests/


UK Companies House

https://www.gov.uk/government/organisations/companies-house


Doppler

https://www.doppler.com/


Kubernetes

https://kubernetes.io/


Airflow Kubernetes Executor

https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html


GitHub

https://github.com/




1 week ago
24 minutes 28 seconds

Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo

The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines.


In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.


Key Takeaways:


00:00 Introduction.

05:28 Challenges of decentralization.

06:45 YAML-based generator standardizes pipelines and dependencies.

12:28 Declarative assets and sensors align cross-DAG dependencies.

17:29 Task-level callbacks enable auto-recovery and clear ownership.

21:39 Standardized building blocks simplify upgrades and maintenance.

24:52 Platform focus frees domain work.

26:49 Container-only standardization prevents sprawl.


Resources Mentioned:


Oscar Ligthart

https://www.linkedin.com/in/oscar-ligthart/


Rodrigo Loredo

https://www.linkedin.com/in/rodrigo-loredo-410a16134/


Vinted | LinkedIn

https://www.linkedin.com/company/vinted/


Vinted | Website

https://www.vinted.com/


Apache Airflow

https://airflow.apache.org/


Kubernetes

https://kubernetes.io/


dbt

https://www.getdbt.com/


Google Cloud Vertex AI

https://cloud.google.com/vertex-ai


Airflow Datasets & Assets (concepts)

https://www.astronomer.io/docs/learn/airflow-datasets


Airflow Summit

https://airflowsummit.org/




2 weeks ago
29 minutes 36 seconds

Transforming Data Pipelines at XENA Intelligence with Naseem Shah

The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities.


In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed Amazon review analysis with LLMs.


Key Takeaways:


00:00 Introduction.

03:28 The importance of building initial products that support growth and investment.

06:16 The process of adopting new tools to improve reliability and efficiency.

09:29 Approaches to learning complex technologies through practice and fundamentals.

13:57 Trade-offs small teams face when balancing performance and costs.

18:40 Using AI-driven approaches to generate insights from large datasets.

22:38 How unstructured data can be transformed into actionable information.

25:55 Moving from manual tasks to fully automated workflows.

28:05 Orchestration as a foundation for scaling advanced use cases.


Resources Mentioned:


Naseem Shah

https://www.linkedin.com/in/naseemshah/


Xena Intelligence | LinkedIn

https://www.linkedin.com/company/xena-intelligence/


Xena Intelligence | Website

https://xenaintelligence.com/


Apache Airflow

https://airflow.apache.org/


Google Cloud Composer

https://cloud.google.com/composer


Techstars

https://www.techstars.com/


Docker

https://www.docker.com/


AWS SQS

https://aws.amazon.com/sqs/


PostgreSQL

https://www.postgresql.org/




3 weeks ago
28 minutes 32 seconds

Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith

Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed.


In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Solutions Architect at Wherobots, join us to discuss leveraging Apache Airflow and Apache Sedona to process massive geospatial datasets, build reproducible pipelines and orchestrate complex workflows across platforms.


Key Takeaways:


00:00 Introduction.

03:22 How merging multiple data sources supports comprehensive datasets.

04:20 The value of flexible configurations for running pipelines on different platforms.

06:35 Why orchestration tools are essential for handling continuous data streams.

09:45 The importance of observability for monitoring progress and troubleshooting issues.

11:30 Strategies for processing large, complex datasets efficiently.

13:27 Expanding orchestration beyond core pipelines to automate frequent tasks.

17:02 Advantages of using open-source operators to simplify integration and deployment.

20:32 Desired improvements in orchestration tools for usability and workflow management.


Resources Mentioned:


Alex Iannicelli

https://www.linkedin.com/in/atiannicelli/


Overture Maps Foundation | LinkedIn

https://www.linkedin.com/company/overture-maps-foundation/


Overture Maps Foundation | Website

https://overturemaps.org


Daniel Smith

https://www.linkedin.com/in/daniel-smith-analyst/


Wherobots | LinkedIn

https://www.linkedin.com/company/wherobots


Wherobots | Website

https://www.wherobots.com


Apache Airflow

https://airflow.apache.org/


Apache Sedona

https://sedona.apache.org/


Wherobots Airflow Provider | GitHub

https://github.com/wherobots/airflow-providers-wherobots




4 weeks ago
24 minutes 3 seconds

Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya

PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible.


In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team manages Airflow at scale while ensuring security, performance and cost efficiency.


Key Takeaways:


00:00 Introduction.

02:31 Enabling developer delight by extending platform capabilities.

03:56 Role of Snowflake, dbt and Airflow in PepsiCo’s data stack.

06:10 Local developer environments built using official Airflow Helm charts.

07:13 Pre-staging and PR environments as testing playgrounds.

08:08 Automating labeling and resource allocation via DAG factories.

12:16 Cost optimization through pod labeling and Datadog insights.

14:01 Isolating dbt engines to improve performance across teams.

16:12 Wishlist for Airflow 3: Improved role-based grants and database modeling.


Resources Mentioned:


Kunal Bhattacharya

https://www.linkedin.com/in/kunaljubce/


PepsiCo | LinkedIn

https://www.linkedin.com/company/pepsico/


PepsiCo | Website

https://www.pepsico.com


Apache Airflow

https://airflow.apache.org/


Snowflake

https://www.snowflake.com


dbt

https://www.getdbt.com


Kubernetes

https://kubernetes.io


Great Expectations

https://greatexpectations.io


Monte Carlo

https://www.montecarlodata.com




1 month ago
19 minutes 4 seconds

Building a Unified Data Platform at Pattern with William Graham

The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines.


In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open-source tool Heimdall to streamline scheduling, orchestration and access management.


Key Takeaways:


00:00 Introduction.

02:44 Structure of Pattern’s data teams across acquisition, engineering and platform.

04:27 How Airflow became the central scheduler for batch jobs.

08:57 Credential management challenges that led to decoupling scheduling and orchestration.

12:21 Heimdall simplifies multi-application access through a unified interface.

13:15 Standardized operators in Airflow using Heimdall integration.

17:13 Open-source contributions and early adoption of Heimdall within Pattern.

21:01 Community support for Airflow and satisfaction with scheduling flexibility.


Resources Mentioned:


William Graham

https://www.linkedin.com/in/willgraham2/


Pattern | LinkedIn

https://www.linkedin.com/company/pattern-hq/


Pattern | Website

https://pattern.com


Apache Airflow

https://airflow.apache.org


Heimdall on GitHub

https://github.com/Rev4N1/Heimdall


Netflix Genie

https://netflix.github.io/genie/




1 month ago
24 minutes 9 seconds

How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty

The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations.


In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey into data engineering and the lessons learned from leading Astronomer’s Customer Reliability Engineering (CRE) team.


Key Takeaways:


00:00 Introduction.

03:07 Lessons learned in adapting to major platform transitions.

05:18 How proactive monitoring improves reliability and customer experience.

08:10 Using automation to enhance internal support processes.

12:09 Why keeping systems current helps avoid unnecessary issues.

15:14 Approaches that strengthen system reliability and efficiency.

18:46 Best practices for simplifying complex orchestration dependencies.

23:24 Anticipated innovations that expand orchestration capabilities.


Resources Mentioned:


Collin McNulty

https://www.linkedin.com/in/collin-mcnulty/


Astronomer | LinkedIn

https://www.linkedin.com/company/astronomer/


Astronomer | Website

https://www.astronomer.io


Apache Airflow

https://airflow.apache.org/


Prometheus

https://prometheus.io/


Splunk

https://www.splunk.com/




1 month ago
25 minutes 34 seconds

Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov

The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics.


In this episode, Evgenii Prusov, Senior Data Platform Engineer at Daiichi Sankyo Europe GmbH, joins us to discuss building and scaling a centralized data platform with Airflow and Astronomer.


Key Takeaways:


00:00 Introduction.

02:49 Building a centralized data platform for 15 European countries.

05:19 Adopting SaaS to manage Airflow from day one.

07:01 Leveraging Airflow for data orchestration across products.

08:16 Teaching non-Python users how to work with Airflow is challenging.

12:25 Creating a global data community across Europe, the US and Japan.

14:04 Monthly calls help share knowledge and align regional teams.

15:47 Contributing to the open-source Airflow project as a way to deepen expertise.

16:32 Desire for more guidelines, debugging tutorials and testing best practices in Airflow.


Resources Mentioned: 


Evgenii Prusov

https://www.linkedin.com/in/prusov/


Daiichi Sankyo Europe GmbH | LinkedIn

https://www.linkedin.com/company/daiichi-sankyo-europe-gmbh/


Daiichi Sankyo Europe GmbH | Website

https://www.daiichi-sankyo.eu


Apache Airflow

https://airflow.apache.org/


Astronomer

https://www.astronomer.io/


Snowflake

https://www.snowflake.com/




1 month ago
19 minutes 26 seconds

Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah

StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base.


In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the company leverages Airflow, dbt, and Cosmos to drive marketplace intelligence, improve client connections and deliver measurable growth for professionals.


Key Takeaways:


00:00 Introduction.

05:44 The role of the data engineering team in driving business success.

08:52 Leveraging technology for real-time business intelligence.

10:52 Data-driven strategies for improving marketing outcomes.

13:05 How adopting the right tools can increase revenue growth.

14:25 Advantages of simplifying and integrating technical workflows.

18:45 Benefits of multi-environment configurations for development and production.

20:17 Foundational skills and best practices for learning Airflow effectively.

22:33 Opportunities for deeper tool integration and improved data visualization.


Resources Mentioned:


Paschal Onuorah

https://www.linkedin.com/in/onuorah-paschal/


StyleSeat | LinkedIn

https://www.linkedin.com/company/styleseat/


StyleSeat | Website

https://www.styleseat.com


Apache Airflow

https://airflow.apache.org/


dbt

https://www.getdbt.com/


Astronomer Cosmos

https://www.astronomer.io/cosmos/




2 months ago
23 minutes 5 seconds

Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak

The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines.


In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak, Principal Product Manager at Astronomer, who share insights into the Astro Executor and remote execution.


Key Takeaways:


00:00 Introduction.

04:13 How product leadership drives scalability for enterprise needs.

08:23 Architectural changes that improve reliability and remove bottlenecks.

10:15 Metrics that enhance visibility into system performance.

12:54 The role of remote execution in addressing security requirements.

15:56 Differences between open-source solutions and managed offerings.

19:04 Broad industry adoption and applicability of remote execution.

20:39 Future advancements in language support and multi-tenancy.


Resources Mentioned:


Ian Buss

https://www.linkedin.com/in/ian-buss/


Piotr Chomiak

https://www.linkedin.com/in/piotr-chomiak-b1955624/


Astronomer | Website

https://www.astronomer.io


Apache Airflow

https://airflow.apache.org/


Airflow Slack Community

https://airflow.apache.org/community/


Beyond Analytics conference

https://astronomer.io/beyond/dataflowcast




2 months ago
22 minutes 25 seconds

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.


In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.


Key Takeaways:


00:00 Introduction.

02:13 Overview of the company’s operations and global presence.

04:00 The tech stack and structure of the data engineering team.

04:24 Running nearly 2,000 DAGs in production using Airflow.

05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.

07:05 Details on the Kubernetes-based Airflow setup using Helm charts.

09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.

14:11 Making every team member Airflow-literate through local installation.

17:56 Using custom libraries and plugins to extend Airflow functionality.


Resources Mentioned:


Sébastien Crocquevieille

https://www.linkedin.com/in/scroc/


Numberly | LinkedIn

https://www.linkedin.com/company/numberly/


Numberly | Website

https://numberly.com/


Apache Airflow

https://airflow.apache.org/


Grafana

https://grafana.com/


Apache Kafka

https://kafka.apache.org/


Helm Chart for Apache Airflow

https://airflow.apache.org/docs/helm-chart/stable/index.html


Kubernetes

https://kubernetes.io/


GitLab

https://about.gitlab.com/


KubernetesPodOperator – Airflow

https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html


Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast




2 months ago
24 minutes 17 seconds

How Moniepoint Group Uses Airflow for Exposure Monitoring with Adeolu Adegboye

Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency.


In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and support diverse stakeholders across the business.


Key Takeaways:


(00:00) Introduction. 

(02:48) The role of data engineering in supporting all business operations.

(04:17) Leveraging workflow orchestration to manage daily processes.

(05:20) Proactively monitoring for anomalies to prevent potential issues.

(08:12) Simplifying complex insights for non-technical teams.

(13:01) Improving efficiency through dynamic and parallel workflows.

(14:19) Optimizing system performance to handle large-scale operations.

(17:19) Exploring creative and innovative uses for workflow automation.


Resources Mentioned:


Adeolu Adegboye

https://www.linkedin.com/in/adeolu-adegboye/


Moniepoint Group | LinkedIn

https://www.linkedin.com/company/moniepoint-inc/


Moniepoint Group | Website

https://www.moniepoint.com


Apache Airflow

https://airflow.apache.org/


ClickHouse

https://clickhouse.com/


Grafana

https://grafana.com/


Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast



2 months ago
21 minutes 32 seconds

Inside Bosch’s Airflow 3 Revolution: Remote Execution with Jens Scheffler

The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments.


In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced the development of the Edge Executor and shaped this major release.


Key Takeaways:


(02:39) The role of remote execution in supporting large-scale testing needs.

(04:44) How community support contributed to the Edge Executor’s development.

(08:41) Navigating network and infrastructure limitations within secure environments.

(13:25) Transitioning from database-heavy processes to an API-driven model.

(14:16) How the new task SDK in Airflow 3 improves distributed task execution.

(16:54) What is required to set up and configure the Edge Executor.

(19:36) Managing multiple queues to optimize tasks across different environments.

(23:30) Examples of extreme distance use cases for edge execution.


Resources Mentioned:


Jens Scheffler

https://www.linkedin.com/in/jens-scheffler/


Bosch | LinkedIn

https://www.linkedin.com/company/bosch/


Bosch | Website

https://www.bosch.com/


Apache Airflow

https://airflow.apache.org/


Edge Executor (Edge3 Provider Package)

https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html


Astronomer’s Astro Executor

https://www.astronomer.io/docs/astro/astro-executor/


Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast





3 months ago
28 minutes 2 seconds

Inside Modern Data Infrastructure at Massdriver with Cory O’Daniel and Jake Ferriero

Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability.


In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver, and Jacob Ferriero, Senior Software Engineer at Astronomer, to unpack what it takes to make data platform engineering scalable, sustainable and secure. They share lessons from years of experience working with DevOps, ML teams and platform engineers and discuss how Airflow fits into the orchestration layer of today’s data stacks.


Key Takeaways:


(03:27) Making infrastructure accessible without deep ops knowledge.

(07:23) Distinct personas and responsibilities across data teams.

(09:53) Infrastructure hurdles specific to ML workloads.

(11:13) Compliance and governance shaping platform design.

(13:27) Tooling mismatches between teams cause friction.

(15:13) Airflow’s orchestration role within broader system architecture.

(22:10) Creating reusable infrastructure patterns for consistency.

(24:13) Enabling secure access without slowing down development.

(26:55) Opportunities to improve Airflow with event-driven and reliability tooling.


Resources Mentioned:


Cory O’Daniel

https://www.linkedin.com/in/coryodaniel/


Massdriver | LinkedIn

https://www.linkedin.com/company/massdriver/


Massdriver | Website

https://www.massdriver.cloud/


Jacob Ferriero

https://www.linkedin.com/in/jacob-ferriero/


Astronomer

https://www.linkedin.com/company/astronomer/


Apache Airflow

https://airflow.apache.org/


Prequel

https://www.prequel.co/





3 months ago
31 minutes 24 seconds

The Future of Airflow Telemetry with Bolke de Bruin

Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. 


In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.


Key Takeaways:


(03:20) The role of foundations in establishing credibility and sustainability.

(04:52) Why data collection is critical to open-source project direction.

(07:24) Lessons learned from previous approaches to user data collection.

(10:23) The current state of telemetry in the project.

(10:53) Community trust as a prerequisite for technical implementation.

(12:54) The importance of managing sensitive data within trusted ecosystems.

(16:37) Ethical considerations in balancing participation and access.

(18:45) Forward-looking ideas for improving workflow design and usability.


Resources Mentioned:


Bolke de Bruin

https://www.linkedin.com/in/bolke/


Metyis | LinkedIn

https://www.linkedin.com/company/metyis/


Metyis | Website

http://www.metyis.com


Apache Airflow

https://airflow.apache.org/


Airflow Summit

https://airflowsummit.org/


Airflow Dev List

https://lists.apache.org/list.html?dev@airflow.apache.org


https://www.astronomer.io/events/roadshow/london/


https://www.astronomer.io/events/roadshow/new-york/


https://www.astronomer.io/events/roadshow/sydney/


https://www.astronomer.io/events/roadshow/san-francisco/


https://www.astronomer.io/events/roadshow/chicago/





3 months ago
21 minutes 55 seconds

Transforming the Airflow UI for Cloudera’s Users with Shubham Raj

Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his contributions helped shape Airflow 3.0, including an intuitive drag-and-drop DAG editor and a new REST API endpoint for managing XComs.


Key Takeaways:


(02:30) Day-to-day responsibilities building platforms that simplify orchestration.

(05:27) Factors that make onboarding into large open-source projects accessible.

(07:35) The value of improved user interfaces for task state visibility and control.

(09:49) Enabling faster debugging by exposing internal data through APIs.

(13:00) Balancing frontend design goals with backend functionality.

(14:19) Creating workflow editors that lower the barrier to entry.

(16:54) Supporting a variety of task types within a visual DAG builder.

(19:32) Common infrastructure challenges faced by orchestration users.

(20:37) Addressing dependency management across distributed environments.


Resources Mentioned:


Shubham Raj

https://www.linkedin.com/in/shubhamrajofficial/


Cloudera | LinkedIn

https://www.linkedin.com/company/cloudera/


Cloudera | Website

https://www.cloudera.com/


Apache Airflow

https://airflow.apache.org/


2023 Airflow Summit

https://airflowsummit.org/


https://www.astronomer.io/events/roadshow/london/  


https://www.astronomer.io/events/roadshow/new-york/  


https://www.astronomer.io/events/roadshow/sydney/  


https://www.astronomer.io/events/roadshow/san-francisco/  


https://www.astronomer.io/events/roadshow/chicago/






4 months ago
22 minutes 28 seconds

Streamlining Thousands of Data Pipelines at Lyft with Yunhao Qing

Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows.


In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed data-engineering platform powered by Airflow that balances flexibility, standardization and scale.


Key Takeaways:


(03:17) Supporting internal teams with a centralized orchestration platform.

(04:54) Migrating to a managed service to reduce infrastructure overhead.

(06:04) Embedding platform-level governance into custom components.

(08:02) Consolidating and regulating the creation of custom code.

(09:48) Identifying and correcting inefficient workflow patterns.

(11:17) Replacing manual workarounds with native platform features.

(14:32) Preparing teams for major version upgrades.

(16:03) Leveraging asset-based scheduling for smarter triggers.

(18:13) Envisioning GenAI and semantic search for future productivity.


Resources Mentioned:


Yunhao Qing

https://www.linkedin.com/in/yunhao-qing


Lyft | LinkedIn

https://www.linkedin.com/company/lyft/


Lyft | Website

https://www.lyft.com/


Apache Airflow

https://airflow.apache.org/


Astronomer

https://www.astronomer.io/


Kubernetes

https://kubernetes.io/


https://www.astronomer.io/events/roadshow/london/


https://www.astronomer.io/events/roadshow/new-york/


https://www.astronomer.io/events/roadshow/sydney/  


https://www.astronomer.io/events/roadshow/san-francisco/  


https://www.astronomer.io/events/roadshow/chicago/





4 months ago
19 minutes 34 seconds

Transforming Customer Education in Data Engineering at Astronomer with Marc Lamberti

Understanding the complexities of Apache Airflow can be daunting for newcomers and seasoned data engineers alike. But with the right guidance, mastering the tool becomes an achievable milestone.


In this episode, Marc Lamberti, Head of Customer Education at Astronomer, joins us to share his journey from Udemy instructor to driving education at Astronomer, and how he's helping over 100,000 learners demystify Airflow.


Key Takeaways:


(02:36) Early exposure to Airflow while addressing inefficiencies in data workflows.

(04:10) Common barriers to implementing open source tools in enterprise settings.

(06:18) The shift from part-time teaching to a full-time focus on Airflow education.

(07:53) A modular, guided approach to structuring educational content.

(09:57) The value of highlighting underused Airflow features for broader adoption.

(12:35) Certifications as a method to assess readiness and uncover knowledge gaps.

(13:25) Coverage of essential Airflow concepts in the Fundamentals exam.

(16:07) The DAG Authoring exam’s emphasis on practical, advanced features.

(20:08) A call for more visible integration of Airflow with AI workflows.


Resources Mentioned:


Marc Lamberti

https://www.linkedin.com/in/marclamberti/


Astronomer | LinkedIn

https://www.linkedin.com/company/astronomer/


Astronomer Academy

https://academy.astronomer.io/


Airflow Fundamentals Certification

https://www.astronomer.io/certification/


DAG Authoring Certification

https://academy.astronomer.io/plan/astronomer-certification-dag-authoring-for-apache-airflow-exam


The Complete Hands-On Introduction to Airflow

https://www.udemy.com/course/the-complete-hands-on-course-to-master-apache-airflow/


https://www.astronomer.io/events/roadshow/london/  


https://www.astronomer.io/events/roadshow/new-york/  


https://www.astronomer.io/events/roadshow/sydney/  


https://www.astronomer.io/events/roadshow/san-francisco/  


https://www.astronomer.io/events/roadshow/chicago/






4 months ago
22 minutes 19 seconds

Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi

The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams.


In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal structures under a data mesh approach.


Key Takeaways:


(02:17) Defining responsibilities within data architecture teams.

(04:15) Consolidating multiple orchestrators into a single solution.

(07:00) Scaling Airflow environments with shared infrastructure and DevOps practices.

(10:59) Managing dependencies and readiness using SQL sensors.

(14:23) Enhancing visibility and response through Slack-integrated monitoring.

(19:28) Extending Airflow’s flexibility to run legacy systems.

(22:28) Integrating transformation tools into orchestrated pipelines.

(25:54) Enabling non-engineers to contribute to pipeline development.

(27:33) Fostering adoption through collaboration and communication.


Resources Mentioned:


Alberto Crespi

https://www.linkedin.com/in/crespialberto/


lastminute.com | Website

https://lastminute.com


Apache Airflow

https://airflow.apache.org/


dbt Labs

https://www.getdbt.com/


Astronomer Cosmos

https://github.com/astronomer/astronomer-cosmos


GitLab


Kubernetes

https://kubernetes.io/


Confluence

https://www.atlassian.com/software/confluence


Slack

https://slack.com/


https://www.astronomer.io/events/roadshow/london/


https://www.astronomer.io/events/roadshow/new-york/


https://www.astronomer.io/events/roadshow/sydney/


https://www.astronomer.io/events/roadshow/san-francisco/


https://www.astronomer.io/events/roadshow/chicago/






4 months ago
30 minutes 9 seconds
