Dbt; Matt Turck; Apache Airflow; ThDPTh #40
I'm experimenting with a few changes: more raw input & text for you! I'll scrap the images. But there's now also a feedback scale at the bottom, which you can & should use! It's easy: just click on it and I'll know which newsletter sucks, and which you would like to get more of!
I'm Sven. I collect "Data Points" to help understand & shape the future, one powered by data, not electricity anymore.
This week we will look into the amazing landscape of the data space that Matt Turck provided, take a look at a Dbt event logging module, and look into the Apache Airflow journey of the company HomeToGo.
Matt Turck's Data Landscape
This is something you simply have to click on and dig into. I really admire the effort they took to create this landscape. A few things stand out to me. First of all, there's the near-Cambrian explosion in the data space that apparently happened sometime in the past 2-3 years. The landscape is expanding both in the sheer number of companies and in the number of categories.
It's a great time to be in this space. However, if we switch to the investing/VC perspective, I wish someone would dig deeper into the landscape and provide a value-based segmentation. The categorization into infrastructure is nice but tells us nothing about the core of the businesses. In my eyes, a good segmentation would answer the question "Which part of the data cycle does this company enhance the most?" to get to the drivers of the value the company helps to create. Unfortunately, I currently don't have time for that ;)
But if you do, I'd love to see the results.
Resources:
- Matt Turck's Data Landscape 2021
Dbt's Event Logging Package
Did you ever hunt down a data problem and ask yourself "Is this model still running?" Or get asked "Are the revenue numbers already in for today?" Knowing which part of the model run has started & finished is important for a lot of reasons. This package simply uses pre- & post-hooks to accomplish that. The result is a neat table (which you can customize) that lets you put a nice dashboard on top of it, with typical metrics like
- is the model done?
- average time to finish
- the last run
- ...
Keep in mind that the package claims to perform poorly on BigQuery.
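To make the dashboard metrics above a bit more concrete, here is a minimal Python sketch of how they could be derived from a table of start & finish events. The `Event` fields, the event names "started" and "finished", and the `model_metrics` helper are all my own illustrative assumptions, not the package's actual schema or API; in practice you would probably write this as SQL on top of the table the hooks fill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Event:
    model: str    # hypothetical column: name of the model
    name: str     # hypothetical values: "started" or "finished"
    at: datetime  # hypothetical column: when the hook fired


def model_metrics(events: list[Event], model: str) -> dict:
    """Summarize one model's runs from its start/finish events."""
    rows = sorted((e for e in events if e.model == model), key=lambda e: e.at)
    durations: list[timedelta] = []
    open_start: Optional[datetime] = None
    last_finished: Optional[datetime] = None
    for e in rows:
        if e.name == "started":
            open_start = e.at
        elif e.name == "finished" and open_start is not None:
            # Pair each finish with the most recent unmatched start.
            durations.append(e.at - open_start)
            last_finished = e.at
            open_start = None
    average = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        # Done means: at least one finished run and no run still open.
        "is_done": last_finished is not None and open_start is None,
        "average_duration": average,
        "last_finished": last_finished,
    }


# Made-up sample data, only for illustration.
log = [
    Event("revenue", "started", datetime(2021, 10, 1, 6, 0)),
    Event("revenue", "finished", datetime(2021, 10, 1, 6, 10)),
    Event("revenue", "started", datetime(2021, 10, 2, 6, 0)),
    Event("revenue", "finished", datetime(2021, 10, 2, 6, 20)),
    Event("orders", "started", datetime(2021, 10, 2, 6, 0)),
]

print(model_metrics(log, "revenue"))
print(model_metrics(log, "orders"))
```

In this made-up log, "revenue" has two completed runs, while "orders" has started but not finished, so it is the one you would flag on a dashboard.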
Resources:
HomeToGo Airflow Journey
Even though I like to talk a lot about dagster and prefect, the new entrants into the data orchestration space, Airflow is still the winner in this category and might well stay that way.
HomeToGo has a huge data operation running, being the largest vacation rental engine in the world. This article explains the journey itself, not just the pretty end result, which I really enjoy.
I like the focus the team puts on developer experience by providing good tools for local development, in their case inside a decent Docker image.
"make it runnable super easily."
Keep that in mind, and take a look at your local data development environment.
Resources:
Thanks!
Thanks for reading this far! I'd also love it if you shared this newsletter with people whom you think might be interested in it.
Data will power every piece of our existence in the near future. I collect "Data Points" to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
And of course, leave feedback if you have a strong opinion about the newsletter! So?
It is terrible | It's pretty bad | average newsletter... | good content... | I love it!
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.