Regular Programming
Lars Wikman, Andreas Ekeroot
65 episodes
3 months ago
Conversations about programming. By Andreas Ekeroot and Lars Wikman, funded by Underjord.io.
Technology
About Data Pipelines
43 minutes
1 year ago

Lars dove into data pipelines, and emerged bearing arrows and wishing for a lot fewer copies.

What is there to think about regarding data pipelines, what is interesting about them?

Which tools are out there, and why might you want to use them?

Why all this talk about making fewer copies of data?

What does Lars' current ideal pipeline look like, and where does Elixir fit in?

Links

  • Matt Topol
  • Apache Arrow
  • Large language models
  • Vector search
  • BigQuery
  • sed
  • AWK
  • jq
  • Replacing Hadoop with bash - "Command-line Tools can be 235x Faster than your Hadoop Cluster"
  • Hadoop
  • MapReduce
  • Unix pipes
  • Directed acyclic graph
  • tee - to "materialize in-between states"
  • Apache Beam
  • Apache Spark
  • Apache Flink
  • Apache Pulsar
  • Airbyte - shoves data between systems using connectors
  • Cron job
  • Fivetran - Airbyte competitor
  • Apache Airflow
  • ETL - Extract, transform, load
  • Designing Data-Intensive Applications
  • Stream processing
  • Ephemerality
  • Data lake
  • Data warehouse
  • The People's Front of Judea
  • dbt - SQL-SQL batch-work-thingy
  • SQL with Jinja templates
  • Snowflake - data warehouse thing
  • Scala
  • Broadway
  • Oban - "robust job processing for Elixir"
  • Dashbit
  • pandas - Python data library
  • APL
  • Arrow Flight
  • gRPC
  • DataFusion - query execution engine
  • Polars - "DataFrames in Rust"
  • Explorer - built on top of Polars
  • Voltron Data
  • The Composable Codex
  • PyArrow - Arrow bindings for Python

Quotes

  • I've been reading a lot about data pipelines
  • What's so special about data pipelines?
  • There's a lot of special tooling
  • There's a lot of bad, bad tooling
  • Less than optimal tooling
  • Converging on something bigger
  • He got me eventually
  • All of your steps in one bucket
  • What tools do you associate with data?
  • I inherited a data pipeline
  • BashReduce
  • Iterate on the L and the T
  • The modern data stack
  • And then you demand more work
  • No unnecessary copies
  • Barely a copy
  • Reconnecting with my Python roots