Free the Data Podcast
Free the Data Academy
10 episodes
3 days ago
Join Ben Sullins as he talks Data Science and AI with people advancing the field in interesting ways.
Careers
Business
All content for Free the Data Podcast is the property of Free the Data Academy and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Streaming Data Using Spark with Dustin Vannoy
Free the Data Podcast
1 hour 16 minutes 50 seconds
1 year ago

Data engineering has historically involved extracting data from disparate sources, transforming it to a standard layout, and then loading it into a new database for analytics. These pipeline jobs usually ran on a schedule, such as nightly or weekly. In today's fast-paced, high-tech world, however, the need for data closer to real time, meaning close to when it was first generated, is higher than ever.

In today's episode we hear from Dustin Vannoy, a consultant and blogger in the streaming data space, about how to use Apache Spark, the most popular streaming analytics platform.

How to connect with Dustin:
- WEBSITE: https://dustinvannoy.com/
- TWITTER: / dustinvannoy
- LINKEDIN: / dustinvannoy
- YOUTUBE: / @dustinvannoy

Learn data skills at our academy and elevate your career. Start for free at https://ftdacademy.com/YT

Chapters:
0:00:00 Intro
0:01:01 Dustin's Background
0:09:51 Transitioning from legacy databases to Big Data and Streaming
0:13:29 Microbatching vs Streaming
0:18:17 What is Spark and why use it?
0:22:33 Apache Spark vs Databricks
0:26:24 Pay for a hosted Spark version or roll your own?
0:28:27 Databricks setup
0:30:25 How Databricks executes queries
0:32:41 Scaling approaches to Spark
0:35:14 Connecting to external databases in Databricks
0:37:51 Visualizing data in Databricks
0:39:40 Using Spark for ETL work
0:42:50 What is real-time processing?
0:44:25 How to build a streaming job in Spark using Kafka
0:46:18 Streaming architecture overview
0:49:15 Pulling data from Kafka into Spark streaming
0:51:09 Why apps use Kafka
0:54:33 Why use Spark versus alternatives
0:57:37 What is Confluent?
0:59:38 Ways to learn Spark
1:02:04 How hard is Spark to learn?
1:04:16 Troubleshooting errors in Spark
1:07:03 How hard is it to transition to Spark from traditional databases?
1:11:51 Interviewing for a Spark job
1:15:46 Outro
