Plumbers of Data Science

© 2024 PodJoint

Andreas Kretz

125 episodes

1 week ago

Data Engineering is the plumbing of data science: almost invisible, but super important, and a big mess when done wrong. We talk about interesting Data Engineering trends and topics. I also teach Data Engineering in my Data Engineering Academy at LearnDataEngineering.com.

All content for Plumbers of Data Science is the property of Andreas Kretz and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by PodJoint in any way.

Episodes (20/125)

#126 The Cloud, Java, and the Future of Serverless – with Vadym Kazulkin

In this podcast episode, I’m talking with Vadym Kazulkin, AWS Serverless Hero and Principal Cloud Architect.

He’s been part of the AWS community for years now, speaking at events, testing new features early, and helping the community grow.

We talked about:

How serverless is being used in the real world
Why Java still has a place in modern cloud setups
What stops big companies from going all in
How AWS is evolving — and what’s missing
And what engineers should know before going “cloud-native”

If you're building in the cloud or just trying to make smart decisions in a fast-moving space, this episode has a lot for you.

Connect with Vadym via LinkedIn: https://www.linkedin.com/in/vadymkazulkin/

3 months ago

47 minutes 26 seconds

#125 AI Hype vs. Reality – with Christina Stathopoulos

In this episode, I’m joined by Christina Stathopoulos, a former Googler who now works independently as a data & AI evangelist, trainer, and advisor.

We talked about her path from data engineering into data science, and eventually into teaching and strategic consulting. We also covered:

Why many companies still struggle with data & AI basics
The hype vs. reality of generative AI (and how to cut through the noise)
Why soft skills like communication, critical thinking, and storytelling are underrated in tech
How regulation in Europe might be slowing things down, and why AI won’t wait for borders
The difference between online and in-person learning (and why human connection still matters)

And much more!

All in all: a thoughtful, honest conversation full of insights on AI, data culture, and the human side of technology.

Get in touch with Christina via LinkedIn:

https://www.linkedin.com/in/christinastathopoulos/

3 months ago

49 minutes 52 seconds

#124 These Skills Get You a Data Consulting Job – with Tom Schamberger

In this episode, I’m talking with Tom Schamberger from the German consultancy msg. He leads their cloud data platform team and has a super interesting background: he started coding in Java at 12, co-founded startups, and now helps big companies design scalable data platforms.

We talk about:

What it really takes to be successful in data engineering consulting
Why soft skills matter just as much as tech skills
How consulting projects actually work — from Excel chaos to full platforms
The role of tools like Databricks, Snowflake, and Microsoft Fabric
And why being tool-agnostic might be your biggest advantage

If you're curious about consulting, data platforms, or just want to hear what a data engineer's job looks like behind the scenes, dive right into it!

4 months ago

44 minutes 19 seconds

#123 Building Fast and Fun Data Projects - with Mehdi Ouazza

In this episode, I sit down with Mehdi Ouazza - data tinkerer, indie hacker, and content creator - who's always up to something interesting in the world of data and AI.

We started with DuckDB but quickly veered off into much more exciting territory: side projects, voice-to-SQL with actual quacks, the power of local models, and why WebGPU might be one of the most underrated browser technologies today.

We also talked about how we teach and learn data engineering in 2025: the importance of fun, interactivity, and why we both dream of creating a data engineering game that’s part "Among Us" and part serious skills training.

Mehdi shares what tools he's using, where he sees GenAI actually helping—not replacing—engineers, and how he's building courses and meetups that inspire creativity in technical work.

Perfect for data folks who like to experiment, educators looking for inspiration, or anyone wondering how far a fun idea can go with the right mix of curiosity and tooling.

5 months ago

1 hour 16 minutes 31 seconds

#122 Why Writing Is Thinking, and What Data Engineers Can Learn from It - with Simon Späti

In this podcast episode, I’m joined by Simon Späti, long-time BI and data engineering expert turned full-time technical writer and author of the living book Data Engineering Design Patterns.

We talk about:

His 20-year journey from SQL-heavy BI to modern Data Engineering
Why switching from employee to full-time author wasn’t planned, but necessary
How he uses a “Second Brain” system to manage and publish his knowledge
Why writing is a tool for learning, not just sharing
The concept of convergent evolution in data tooling: when old and new solve the same problem
The underrated power of data modeling and pattern recognition in a hype-driven industry

Simon also shares practical advice for building your own public knowledge base, and why Markdown and simplicity still win in the long run.

Whether you're into tools, systems, or lifelong learning, this one’s a thoughtful deep dive.

***

About Simon Späti:

Simon is a Data Engineer and Technical Author with 20+ years of experience in the data field. He's the author of the Data Engineering Blog (ssp.sh), curator of the Data Engineering Vault (vault.ssp.sh), and is currently writing a book about Data Engineering Design Patterns (dedp.online). Simon keeps up with open-source data engineering technologies and enjoys sharing his knowledge with the community.

Socials: Bluesky, LinkedIn, Twitter/X, YouTube

5 months ago

1 hour 5 minutes 21 seconds

#121 From Application Dev to AWS Hero: A Journey in Tech & Impact - with Johannes Koch

In this episode, I’m joined by Johannes Koch, Principal Engineer and AWS DevTools Hero, to talk about the real DevOps mindset, the evolution of developer experience, and how community work changed his career.

Johannes shares how starting in QA and support gave him a unique edge in understanding users, why building a proper CI/CD pipeline should come before writing code, and how the AWS Community Builders program helped him grow into his current role.

We also dive into:

What DevOps culture actually means beyond automation
His take on GitOps and why CI/CD is still underrated
Behind the scenes of the AWS Heroes Program
Creating content, mentoring others, and avoiding the trap of tech hype

Whether you're starting out in DevOps or deep into cloud architecture, Johannes' insights are packed with value for data and AI professionals.

Links:

6 months ago

1 hour 3 minutes 19 seconds

#120 Teaching Data Engineering Like It’s Done on the Job - with Deepak Goyal

In this episode, I sit down with Deepak Goyal, the founder of AzureLib, to talk all things data engineering, cloud platforms, and how to teach the next generation of engineers.

We explore why Azure became his go-to, how real-world projects beat theory every time, and why tools like ChatGPT are great assistants, but no substitute for structured learning and solid fundamentals.

Follow Deepak on LinkedIn: https://www.linkedin.com/in/deepak-goyal-93805a17/

Check out AzureLib: azurelib.com

Learn Data Engineering with me: learndataengineering.com

6 months ago

48 minutes 33 seconds

#119 Recruiting is harder than I thought

In this episode of the Plumbers of Data Science podcast, I dive into the challenges of recruiting today, from overwhelming job application volumes to reaching out directly to recruiters.

I’m testing new strategies to make the process smoother for everyone involved, focusing on fresh job listings and fostering connections with hiring managers who need skilled engineers. My goal? To secure five job placements in Germany by year’s end!

Have thoughts on today’s job market, or tried the Easy Apply feature yourself? Drop a comment below—I’d love to hear your experience!

1 year ago

10 minutes 10 seconds

#118 Freelancing as a Data Engineer - Hero Talk with the "Seattle Data Guy" Ben Rogojan

In this Hero Talk episode, I had the pleasure of chatting with Ben Rogojan, better known as the "Seattle Data Guy." Ben is a data engineer, YouTuber, and freelancer with a background at Facebook. He's become a go-to expert on freelancing for engineers, particularly in the data space.

We dive into Ben's journey from being a full-time engineer to making the switch to freelancing, how he built his own business, and the unique challenges freelancers face in this space.

We also explore how to break into freelancing, the value of specializing in a specific skill, and practical tips on landing your first freelance clients.

1 year ago

54 minutes 32 seconds

#117 We Are Starting a Recruiting Service!

In this episode of the Plumbers of Data Science podcast, I'm sharing some exciting updates about the future of Learn Data Engineering and a big new service we’re launching—recruiting!

I explain how this new offering will help engineers find their next career move while connecting companies with top talent. Tune in to hear more about how the Academy, Coaching, and now recruiting fit together into one ecosystem designed to support your career growth.

Let me know your thoughts in the comments—are you excited about this new direction?

1 year ago

11 minutes 28 seconds

#116 Data Modeling is F***ing Easy!

In this episode of the Plumbers of Data Science podcast, I’m sharing my thoughts on why data modeling isn’t as complicated as people make it out to be. You hear about courses and tutorials that stretch for hours—but is it really that hard?

I’ll break down the two main things you need to focus on when modeling data and explain why, once you’ve got those down, the rest falls into place.

1 year ago

6 minutes 14 seconds

#115 His Career Started With a Bootcamp & Now He Helps Others Succeed - Hero Talk w/ Mezue Obi-Eyis

In this Hero Talk episode, I talk with Mezue, a seasoned Data Engineer specializing in Azure Databricks. We cover his journey from Electrical Engineering to Data Engineering and discuss the key skills, like Python, SQL, and Spark, that are essential in the field.

Mezue also shares his experience running an Azure Databricks bootcamp and offers advice on how to break into Data Engineering, especially in Cloud environments. We also touch on the challenges of finding junior roles and how to stand out by working on practical projects.

1 year ago

39 minutes 17 seconds

#114 Dirty Data & Data Cleaning - Hero Talk with "The Classification Guru" Susan Walsh

In this Hero Talk episode, I chat with Susan Walsh, the “Classification Guru,” known for her expertise in cleaning and classifying messy data.

We dive into her unexpected journey into the data world, starting with a spend analytics job, and how that led to her founding her own business focused on dirty data. Susan shares the unique challenges businesses face with poor data quality, explaining why 99.9% of data problems are actually people problems.

We also explore practical ways to deal with these issues, such as finding those "crappy" data cleaning jobs to gain experience, and the importance of consistent data maintenance to prevent future headaches. From addressing dirty CRM systems to battling fraud, Susan’s stories highlight how critical clean data is for business success.

1 year ago

48 minutes 30 seconds

#113 A Deep Dive Into APIs, IoT, and Data Storage - Hero Talk with Paolo Lulli

In this Hero Talk episode, I sit down with Paolo Lulli, an experienced Data Engineer, to explore some of the core challenges and decisions in API development and data management. We dive deep into the debate between serverless infrastructure and traditional servers, discussing the pros and cons of both approaches, particularly in terms of scalability, cost, and maintenance.

Paolo also shares his hands-on experience with time series databases, explaining their advantages in handling massive amounts of data from IoT devices. We delve into vendor lock-in issues, highlighting how relying too heavily on cloud providers like AWS or Azure can impact long-term flexibility.

1 year ago

34 minutes 26 seconds

#112 Why testing data pipelines can be so challenging - and how to tackle it

In this episode of the Plumbers of Data Science podcast, I’m diving into why testing can be so challenging for data engineers. The inspiration for this topic actually came from one of my recent Coaching sessions, where the question of test-driven development (TDD) came up during a Q&A. It stuck with me, so I thought it would be a great topic to dive deeper into.

I’ll explain the key benefits of TDD, like improved code quality and easier refactoring, and why, despite its advantages, it’s not always widely adopted—especially in fast-paced environments where time constraints dominate. We’ll also talk about the specific challenges data engineers face with TDD, such as handling large, unpredictable data, integrating with external systems, and adapting to ever-changing data.

1 year ago

18 minutes 40 seconds

#111 Is This the Synthetic Data Revolution?! Hero Talk with Mario Scriminaci from Mostly AI

In this Hero Talk episode, we dive deep into the fascinating world of synthetic data, a critical tool for development, testing, and training Machine Learning models. Joining me is Mario Scriminaci, Chief Product Officer at Mostly AI, who shares his expertise on how synthetic data can revolutionize the way we handle sensitive information, particularly in the context of privacy regulations like GDPR and CCPA.

We discuss the real-world applications of synthetic data, how it differs from traditional mock data, and its potential to drive innovation in AI and ML development. Mario also introduces Mostly AI's cutting-edge tools, highlighting how they make it easier than ever to generate realistic, privacy-safe datasets.

1 year ago

46 minutes 48 seconds

#110 Bootcamps vs Coaching

In this episode of the Plumbers of Data Science podcast, I’m diving into the debate between bootcamps and coaching programs, especially for those looking to advance in Data Engineering.

I’ll break down the pros and cons of each approach, from the structured, intensive nature of bootcamps to the personalized, flexible support of coaching, and share insights to help you choose the right path for your career. I’ll also discuss the experiences of my current coaching students and what I’m focusing on to help them achieve their goals.

1 year ago

25 minutes 54 seconds

#109 Why your data and goals matter more than tools!

In this episode of the Plumbers of Data Science podcast, I’m diving into what truly matters when building data platforms and pipelines.

As engineers, it’s easy to get caught up in the latest tools, but real success starts with understanding your data sources and defining clear goals. I’ll walk you through the key questions to ask, from data retention to processing speeds and user needs.

1 year ago

16 minutes 32 seconds

#108 Why Apache Spark Is Such An Essential Skill - Hero Talk with Philipp Brunenberg

In this episode, we explore the essentials of learning and mastering Apache Spark. Joining me is Philipp, an experienced Spark developer and educator, who shares his expert roadmap for becoming proficient in Spark. We discuss why Spark is a crucial tool for data engineers, how to set it up effectively, and the best ways to start your Spark journey.

Philipp also highlights the importance of understanding Spark's internals, deploying real-world applications, and optimizing performance. He walks us through his six-part roadmap, focusing on hands-on practice and building confidence through real-world projects. We also touch on key topics like the Scala vs. Python debate, Spark's role in machine learning, and how it stacks up against alternatives like Apache Beam.

1 year ago

40 minutes 24 seconds

#107 The Future of Data Observability - Hero Talk with Ryan Yackel

In this Hero Talk episode, we explore the crucial topic of data observability, a field that has become essential for Data Engineers dealing with complex data pipelines. I'm joined by my special guest, Ryan Yackel from Databand, who shares his insights and expertise on the subject.

Ryan explains the concept of data observability and its significance for Data Engineers, addressing common challenges in monitoring and maintaining data pipelines. He also describes how Databand helps monitor and improve data reliability, ensuring that data flows smoothly from source to destination.

1 year ago

53 minutes 56 seconds