Three Data Point Thursday

🐰 Dbt; Matt Turck; Apache Airflow; ThDPTh #40 🐰

Sven Balnojan
Oct 7, 2021

I’m experimenting with a few changes: more raw input & text for you! I’ll scrap the images. There’s now also a feedback scale at the bottom, which you can & should use! It’s easy: just click on it and I’ll know which newsletters suck and which you would like to get more of!

I’m Sven. I collect “Data Points” to help understand & shape the future, one powered by data rather than electricity.

This week we will look into the amazing landscape of the data space that Matt Turck provided; we will take a look at a dbt event logging package; and we will look into the Apache Airflow journey of the company HomeToGo.

Matt Turck's Data Landscape

🚀 This is something you simply have to click on and dig into. I really admire the effort they took to create this landscape. A few things stand out to me. First of all, there is the near-Cambrian explosion in the data space that apparently happened sometime in the past 2–3 years. The landscape is expanding both in the sheer number of companies and in the number of categories.

It’s a great time to be in this space. However, switching to the investing/VC perspective, I wish someone would dig deeper into the landscape and provide a value-based segmentation. The categorization into areas like infrastructure is nice but tells us nothing about the core of the businesses. In my eyes, a good segmentation would answer the question “Which part of the data cycle does this company enhance the most?” to get to the drivers of the value the company helps to create. Unfortunately, I currently don’t have time for that ;)

But if you do, I’d love to see the results.

Resources:

- Matt Turck's Data Landscape 2021

dbt's Event Logging Package

šŸŽ Did you ever hunt down a data problem and ask yourself ā€œIs this model still running?ā€ Or got asked ā€œare the revenue numbers already in for today?ā€? Knowing which part of the model run has started & finished is important for a lot of reasons. This package uses simply pre & post-hooks to accomplish that. The result will be a neat table (which you can customize) that allows you to put up a nice dashboard on top of it with typical metrics like

  • is the model done?

  • average time to finish

  • the last run

  • …

Keep in mind that, by its own account, the package performs poorly on BigQuery.
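To make the idea concrete, here is a minimal sketch in Python of how the metrics above could be derived from such an event table. It is not the package's implementation: the `audit_log` table, its columns, and the `log_event` and `model_status` helpers are made-up names for illustration, and SQLite stands in for the warehouse. The hooks are simulated by plain function calls that write one "start" and one "end" row per model run.

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical audit table: one row per event. The table and column names
# are illustrative assumptions, not the package's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (model TEXT, event TEXT, event_ts TEXT)")


def log_event(model, event, ts):
    """Stand-in for what a pre-hook ('start') or post-hook ('end') records."""
    conn.execute("INSERT INTO audit_log VALUES (?, ?, ?)", (model, event, ts))


# Two finished runs of 'revenue' and one 'orders' run that is still going.
log_event("revenue", "start", "2021-10-06 06:00:00")
log_event("revenue", "end", "2021-10-06 06:10:00")
log_event("revenue", "start", "2021-10-07 06:00:00")
log_event("revenue", "end", "2021-10-07 06:20:00")
log_event("orders", "start", "2021-10-07 06:05:00")


def model_status(model):
    """Return (is_done, avg_minutes, last_finished) for one model."""
    rows = conn.execute(
        "SELECT event, event_ts FROM audit_log "
        "WHERE model = ? ORDER BY event_ts, rowid",
        (model,),
    ).fetchall()

    durations = []        # minutes per completed run
    open_start = None     # start time of a run that has not ended yet
    last_finished = None  # timestamp of the most recent 'end' event

    for event, ts in rows:
        when = datetime.strptime(ts, FMT)
        if event == "start":
            open_start = when
        elif event == "end" and open_start is not None:
            durations.append((when - open_start).total_seconds() / 60)
            last_finished = ts
            open_start = None

    # Done means: the model has run at least once and no run is still open.
    is_done = bool(rows) and open_start is None
    avg_minutes = sum(durations) / len(durations) if durations else None
    return is_done, avg_minutes, last_finished


print(model_status("revenue"))  # (True, 15.0, '2021-10-07 06:20:00')
print(model_status("orders"))   # (False, None, None)
```

In a real setup, the same three questions would more likely be answered by a SQL view on top of the logging table, which a dashboard can then query directly.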

Resources:

- dbt Event Logging Package

HomeToGo Airflow Journey

🔮 Even though I like to talk a lot about Dagster and Prefect, the new entrants into the data orchestration space, Airflow is still the winner in this category and might well remain so.

As the largest vacation rental engine in the world, HomeToGo has a huge data operation running. This article covers the journey itself, not just the pretty end result, which I really enjoy.

I like the focus the team puts on developer experience by providing good tools for local development, in their case inside a decent Docker image.

“make it runnable super easily.”

Keep that in mind, and take a look at your local data development environment.

Resources:

- Apache Airflow at HomeToGo

🎄 Thanks!

Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.

Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.

If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

And of course, leave feedback if you have a strong opinion about the newsletter! So?

It is terrible | It’s pretty bad | average newsletter… | good content… | I love it!

P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
