🐰 dbt; Matt Turck; Apache Airflow; ThDPTh #40 🐰
I’m experimenting with a few changes: More raw input & text for you! I’ll scrap the images. But there’s now also a feedback scale at the bottom which you can & should use! It’s easy, just click on it and I’ll know which newsletter sucks, and which you would like to get more of!
I’m Sven, I collect “Data Points” to help understand & shape the future, one powered by data, not electricity anymore.
This week we will look into the amazing landscape of the data space that Matt Turck provided, take a look at a dbt event logging package, and follow the Apache Airflow journey of the company HomeToGo.
Matt Turck's Data Landscape
🚀 This is something you simply have to click on and dig into. I really admire the effort they took to create this landscape. A few things stand out to me. First of all, there's the near-Cambrian explosion in the data space that apparently happened sometime in the past 2–3 years. The landscape is expanding both in the sheer number of companies and in the number of categories.
It’s a great time to be in this space. However, switching to the investing/VC perspective, I wish someone would dig deeper into the landscape and provide a value-based segmentation. The categorization into infrastructure is nice but tells us nothing about the core of the businesses. In my eyes, a good segmentation would answer the question “Which part of the data cycle does this company enhance the most?” to get to the drivers of the value the company helps to create. Unfortunately, I currently don’t have time for that ;)
But if you do, I’d love to see the results.
dbt's Event Logging Package
🎁 Did you ever hunt down a data problem and ask yourself “Is this model still running?” Or get asked “Are the revenue numbers already in for today?” Knowing which parts of a model run have started & finished is important for a lot of reasons. This package simply uses pre- & post-hooks to accomplish that. The result is a neat table (which you can customize) that you can put a nice dashboard on top of, with typical metrics like
is the model done?
average time to finish
the last run
Keep in mind, the package itself warns that it performs badly on BigQuery.
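To make the three metrics above a bit more tangible, here is a minimal Python sketch of how they could be derived from such a log table. Everything in it is an assumption for illustration: the row layout (run id, model name, "started"/"completed", timestamp), the sample data, and the helper `model_metrics` are made up and are not the package's actual schema or API.

```python
from datetime import datetime

# Hypothetical log rows: (run_id, model, event, timestamp).
# The real table produced by the package may use different columns and names.
events = [
    ("run-1", "revenue", "started", datetime(2021, 9, 1, 6, 0, 0)),
    ("run-1", "revenue", "completed", datetime(2021, 9, 1, 6, 4, 0)),
    ("run-2", "revenue", "started", datetime(2021, 9, 2, 6, 0, 0)),
    ("run-2", "revenue", "completed", datetime(2021, 9, 2, 6, 6, 0)),
    ("run-3", "revenue", "started", datetime(2021, 9, 3, 6, 0, 0)),
]


def model_metrics(events, model):
    """Compute 'is it done?', average run time, and last run for one model."""
    starts, ends = {}, {}
    for run_id, name, event, ts in events:
        if name != model:
            continue
        if event == "started":
            starts[run_id] = ts
        elif event == "completed":
            ends[run_id] = ts

    # Only runs with both a start and a completion have a duration.
    durations = [
        (ends[run_id] - starts[run_id]).total_seconds()
        for run_id in starts
        if run_id in ends
    ]
    # The last run is the one with the most recent start timestamp.
    last_run = max(starts, key=starts.get) if starts else None

    return {
        "is_done": last_run is not None and last_run in ends,
        "avg_seconds": sum(durations) / len(durations) if durations else None,
        "last_started": starts[last_run] if last_run is not None else None,
    }


metrics = model_metrics(events, "revenue")
print(metrics)
```

In practice you would express the same logic as a query in your BI tool on top of the log table; the point is only that one "started" and one "completed" row per model run are enough to answer all three questions.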
HomeToGo Airflow Journey
🔮 Even though I like to talk a lot about dagster and prefect, the new entrants into the data orchestration space, Airflow is still the winner in this category and might well stay that way.
HomeToGo has a huge data operation running, being the largest vacation rental engine in the world. This article explains the journey itself rather than the pretty end result, which I really enjoy.
I like the focus the team puts on developer experience by providing good tools for local development, in their case inside a decent Docker image.
“make it runnable super easily.”
Keep that in mind, and take a look at your local data development environment.
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
And of course, leave feedback if you have a strong opinion about the newsletter! So?
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.