#25 TechStyle's modern data platform, ETL needs open source, is Airflow good enough?; ThDPTh #25

How TechStyle created their modern data platform, why ETL needs open source, and whether Airflow is good enough as a data orchestrator.

Notice how completely normal the data mesh concept appears?
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) TechStyle's modern data platform
The data world is in turmoil, so I love every piece of experience I can get my hands on. I really enjoyed this article by Prukalpa Sankar, featuring a modern data stack with Snowflake, Atlan, and Tableau.
I’ll just share two quotes and recommend that you read the whole article. It’s really well written.
“‘Things are moving so fast now…’ […] Instead, TechStyle opted for an ELT style of data engineering, where they load the data as-is from the source. Once the raw data is loaded, TechStyle uses a hybrid approach to model whatever needs to be modeled and happily leave the rest untouched.”
“‘We’re onboarding analysts, but they’re not as effective because they don’t understand the data.’”
So after modernizing the data warehouse, they noticed that they needed more: education, data cataloging, etc. It’s a great example of the journey data organizations are on today.
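To make the ELT idea from the first quote a bit more concrete, here is a minimal sketch in Python. This is not TechStyle's actual setup: the in-memory SQLite database merely stands in for a warehouse like Snowflake, and the sample rows as well as the names raw_orders, revenue_by_country, load_raw, model_revenue, and run_elt are all made up for illustration.

```python
import sqlite3

# Hypothetical source records, kept exactly as the source delivers them
# (everything is text, country codes are inconsistently cased).
RAW_ORDERS = [
    ("1001", "2021-03-01", "49.90", "us"),
    ("1002", "2021-03-01", "15.00", "DE"),
    ("1003", "2021-03-02", "20.10", "US"),
]


def load_raw(conn, rows):
    """Extract + Load: store the source rows as-is, without cleaning them."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, order_date TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def model_revenue(conn):
    """Transform: model only what is needed, inside the database itself."""
    conn.execute(
        """
        CREATE VIEW revenue_by_country AS
        SELECT UPPER(country) AS country,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        GROUP BY UPPER(country)
        """
    )


def run_elt():
    """Run load, then transform, and return the modeled rows."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    model_revenue(conn)
    return conn.execute(
        "SELECT country, revenue FROM revenue_by_country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    for country, revenue in run_elt():
        print(country, revenue)
```

The point of the split is that the raw table stays untouched: the view is just one model on top of it, and anything nobody needs yet can simply be left unmodeled.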
TechStyle’s approach to data warehousing and data analytics, metadata creation, democratizing tribal knowledge, championing data management, and more
towardsdatascience.com
(2) Why ETL needs Open Source
I’ve been saying again and again that I think the data space will be dominated by open source solutions because of the “snowflake problem”: every data setup inside a company is completely unique.
So it’s great to finally get an article on that topic from the guys behind Airbyte that backs this up with their experience. They focus very much on the ETL use case, whereas I think the conclusion applies to the data space as a whole. But I really like how they put their experience and 200 company interviews into this form and show the exact road ETL has to take in the future.
I also like the way they think about their CDK (Connector Development Kit), because the CDK is really an essential part of the incentive structure for their open source project. Great to see that even though they are at the very beginning, they have a good vision of where they need to go.
I do think, though, that in the future they’ll need to spend more time on the high-level structure of the data space & their open source side (because, after all, I think ETL is a system that is set up to be obsolete in 5–10 years). But I’m sure they will get there.
Why ETL Needs Open Source to Address the Long Tail of Integrations - DATAVERSITY
In our interviews, we found that many users’ ETL solutions didn’t support the connector they wanted, or supported it but not in the way they needed.
www.dataversity.net
(3) Is Airflow Good Enough?
Anna Geller wrote a good piece on Airflow and data orchestrators in general. Here’s a little summary of her points:
The strength of Airflow undoubtedly lies in its community and the support & extensibility that come with it. However, as Anna writes, Airflow also has a bunch of weaknesses.
There’s no native versioning of flows, it’s very unintuitive for new users, it suffers from configuration overload, and it’s hard to use locally. All of these make it hard to develop fast, and that’s exactly where some of the new tools shine. Prefect focuses on taking lots of this out of your hands. Dagster has a great testing concept and is much easier to handle when it comes to developing new flows.
The problems of setting up Airflow in production are, to my knowledge, mostly mirrored in both Prefect & Dagster, so I’m not sure one can consider that a weakness of Airflow rather than of the whole category of tools.
However, there are managed solutions available that take away quite a bit of the hassle. If you’re looking for a data orchestrator, take a look at Anna’s article.
The pros and cons of Apache Airflow as a workflow management platform for ETL & Data Science and deriving from that the use cases for which Airflow may be a good or a bad choice
towardsdatascience.com
Slides for Talk on Data Mesh & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
I finally got around to giving a talk on data meshes, focused on being as concise as possible while still giving a bit of my bigger view on things. You can find the slides here:
Mars missions & Data meshes - a crash course to data meshes
Data meshes are the latest data architecture trend. Really a paradigm shift. But what actually happens is just the natural evolution of technological decentral…
www.slideshare.net
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.