In Case You Missed It: May 2023 Recap
More data newsletters, dbt, OSS isn't going to free AI, ...
I’m Sven, and I write this to help you build things with data. Whether you’re a data PM, working inside a data startup, an internal data lead, or investing in data companies, this is for you.
This special in-between issue serves as a recap of my writing in May. It features a few quotes from each piece so you can get a feel for them and quickly jump around the Three Data Point Thursday universe.
Enjoy!
(1) We need more data newsletters
(2) ELT 101
Yes, ELT is always a superior choice to ETL. Of course, long term, I hope we finally get to something even more powerful: a division along business seams rather than technical ones. But that’s far off.
“Legacy ETL baggage still clouds all our vision. You need to understand ETL to understand why you’re better off with ELT and how to use it properly.”
(3) dbt & too many small fish
Apparently dbt, the tool empowering the small guy, has too many small customers and too few whales.
“What you need to know:
Tristan thinks the tech market will take longer to recover.
He offers a fantastic leave package for his employees (including removing option cliffs and giving away laptops).
Another horde of qualified people will hit the data job market.
dbt Labs handles laying off people smoothly. Kudos!”
…
(4) OSS is not the future of AI
“I love open source, but this isn’t going to happen—quite the opposite. The ones who see beyond “I want this to be true” will have a significant advantage over those who don’t.
Here are two facts you need to know to make wise business decisions regarding AI.
1) Training data, AI-model users, and computing power are all essential to create great AI models….”
(5) How to harm other people with data
“You should be aware of two facts: (1) you are using data, and a lot of the time this comes at the expense of others (unintentionally); (2) others are doing the same thing to you.”
(6) The vector database hype explained
(7) Data Orchestrator 101 - everything you need to know
“Data orchestrators do one thing exceptionally well: enabling you to manage (scheduled) directed acyclic graphs (DAGs) for your data pipelines.”