Don’t be a ducker
I’m Sven, and I’m writing this to help you build things with data. Whether you’re a data PM, working inside a data startup, an internal data lead, or investing in data companies, this is for you.
Dagster parent company Elementl raises $33m
Duckers… Big data vs. small data isn’t the right question to ask.
There’s a new orchestrator in town.
Shitload of money in data orchestration
$33m Series B for Elementl, congrats!
Prefect has a Series B and $46m raised; Astronomer, the company behind Apache Airflow, has a Series C and $283m in total funding; and now Elementl, the company behind Dagster, joins them.
That’s a shitload of money in the data orchestration space.
Key takeaways:
Airflow is oooold.
Dagster “was optimized for the world of cloud, DevOps and containers.”
Dagster really tries to focus on what comes out of the data pipeline, the “data asset,” as they call it.
Dagster has substantial growth numbers, claiming a tripling YoY.
We’re still not sure who is winning the orchestrator wars.
“Elementl rethought this with what it calls a data asset (a table in a data warehouse or a file sitting in a data lake) at its core. So instead of thinking about tasks as the core abstraction, Elementl (and Dagster) focus on the data assets.”
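To make the “assets, not tasks” idea a bit more concrete, here’s a minimal sketch under loudly stated assumptions. It is a hypothetical toy, loosely inspired by asset-style definitions like Dagster’s, and it is NOT Dagster’s actual API: `asset`, `materialize`, `raw_orders`, and `order_totals` are all made-up names. In a task-centric setup you’d declare “run step A, then step B.” Here you declare what each table is and which tables it reads, and the run order falls out of that.

```python
import inspect
from typing import Any, Callable, Dict, Optional

# Hypothetical toy registry (not Dagster's API): each "asset" is a function
# named after the table/file it produces; its parameters name upstream assets.
ASSETS: Dict[str, Callable[..., Any]] = {}


def asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as the recipe for the asset called fn.__name__."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders() -> list:
    # Stand-in for a table in a warehouse or a file sitting in a data lake.
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 12}]


@asset
def order_totals(raw_orders: list) -> int:
    # Depends on raw_orders simply by naming it as a parameter.
    return sum(row["amount"] for row in raw_orders)


def materialize(name: str, cache: Optional[Dict[str, Any]] = None) -> Any:
    """Build an asset by first building every asset it reads from."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        upstream = inspect.signature(fn).parameters
        inputs = {dep: materialize(dep, cache) for dep in upstream}
        cache[name] = fn(**inputs)
    return cache[name]


print(materialize("order_totals"))  # 42
```

Asking for `order_totals` is enough; the toy figures out that `raw_orders` has to exist first. That shift, from scheduling steps to describing the data you want to end up with, is the core of the asset-centric pitch.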
Ducks Go Nuts
MotherDuck, the company building a cloud service on top of DuckDB, loves to try to convince you that big data is dead.
=> Since no one needs big data, DuckDB is perfect for everyone.
…Or so the story goes. We don’t buy it.
The other side, presented by Aditya Parameswaran, tells you the ducks are full of crap.
TL;DR: Both sides are off, one just way more than the other. The question you should ask yourself is not what kinds of data to obsess over or whether to call it “smart” or “big” (or “dark”). It is what you DO, with or without the data.
The data will follow your actions.
We summed up the arguments for you, with our own spice mixed in:
The Duck argument:
Stop obsessing over “big” data; start thinking about making better decisions. (Yes, we agree; just swap “big” for “all.”)
Most companies don’t have a lot of data (that’s a fact)
Most data is rarely queried (that’s misleading: not queried != not used)
Lots of blabla… wait, where’s the argument against big data? We missed it…
The big data argument:
MotherDuck basically got it upside down.
Companies don’t have a lot of data because they don’t collect it.
Most other ducky claims are only partially true or untrue.
The whole point is: if you want to focus on taking smarter actions, you’re likely to start collecting more data and might end up with big data.
Our take? It’s a seatbelt situation right now with big data.
In the 70s, 90% of people did not buckle up. Now, was that smart? Was that helpful? Did it mean the way to get the most out of your life was to skip the seatbelt?
We don’t think so… but that’s the MotherDuck argument: no one wears a seatbelt, so you shouldn’t either.
Guess what? Most companies are going to disappear. Do you want to be one of them? Then follow the ducks. Or you could start using your data, buckle up, and see how much data you need to drive your business forward.
Yet Another Orchestrator
TL;DR: It’s called Orchestra, and it’s currently in beta.
The founder, Hugo Lu, wrote a confusing Medium post about his ideas on data orchestration. Luckily, we distilled the key concepts here:
It marries observability to orchestration (trend No. 1)
Orchestra jumps onto the “data asset” train (trend No. 2; hey, Elementl)
Oh, and it claims to work “without any code” (trend No. 3)
We’re still slightly confused about what Orchestra is really trying to achieve, but hey, that’s all we could get out of the material.