Nordic Data Architecture, Functional Data Engineering, 3 Trees; ThDPTh #69
I’m Sven, and this is Three Data Point Thursday, the email that helps you understand and shape the one thing that will power the future: data. I’m also writing a book about the data mesh part of that.
Time to Read: 8 minutes.
Another week of data thoughts:
- Functional Data Engineering saves you data engineering brainpower
- I just had to plant 3 trees for this newsletter
- Nordics are super-efficient data architects
Functional Data Engineering
What: I’ve written about Maxime’s content before. This time I’m sharing a video of a talk he gave on “functional data engineering”. The idea of functional data engineering is to take the ideas of functional programming and translate them into data engineering to reap the same benefits. One major benefit is reproducibility.
My perspective: I think everything about functional data engineering can be boiled down to one simple sentence:
“Compute is cheap and getting cheaper, storage is cheap and getting cheaper, data engineers' brains are expensive and are getting more expensive.”
It’s pretty clear how you want to distribute weight on this scale, right? The functional approach to data engineering is so appealing to me because, when the team I joined as a data engineer set out to build a brand new data warehouse, we ended up with a very similar approach.
But for a very different reason. We set out to build an immutable system, one that could be reproduced with the click of a button, because of ideas we got from the microservices world. In that world, immutability, statelessness, and reproducibility are also key.
Needless to say, I believe this functional approach to data will enable your company to go further with the resources you have, freeing them up to work on the serious and important stuff instead of chasing down weird ETL rabbit holes.
Oh yeah, FWIW, after watching that talk I got excited and created a bunch of easily accessible examples in Python for functional data engineering.
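To make the idea concrete, here is a minimal sketch of what a “functional” task could look like. It is not one of the examples mentioned above, and all names in it (Order, daily_revenue, run_partition) are hypothetical. The assumptions: a task is a pure function of its inputs, the inputs are immutable, and a run overwrites a whole partition instead of appending to it, so re-running the same day gives the same result.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: input records cannot be mutated
class Order:
    day: date
    customer: str
    amount: float


def daily_revenue(orders, day):
    """Pure task: the output depends only on the arguments."""
    return {
        "day": day.isoformat(),
        "revenue": sum(o.amount for o in orders if o.day == day),
    }


def run_partition(warehouse, orders, day):
    """Overwrite the partition for `day` and return a new warehouse dict.

    The dict is a stand-in (an assumption) for a partitioned table.
    The old warehouse is left untouched.
    """
    return {**warehouse, day.isoformat(): daily_revenue(orders, day)}


orders = (
    Order(date(2022, 1, 1), "a", 10.0),
    Order(date(2022, 1, 1), "b", 5.0),
    Order(date(2022, 1, 2), "a", 7.5),
)

warehouse = run_partition({}, orders, date(2022, 1, 1))
# Running the same task again for the same day changes nothing.
rerun = run_partition(warehouse, orders, date(2022, 1, 1))
print(rerun == warehouse)
```

Because the partition is overwritten rather than appended to, a failed or buggy run can simply be repeated, which is where the saved brainpower comes from.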
Nordic Data Architectures are Super-efficient!
What: Richard Wang from Validio, a data start-up building a data reliability platform, crunched some numbers and published a list of the toolkits used by Nordic companies. The focus is mostly on newer companies with higher growth rates. Nevertheless, the top tools are:
- Apache Airflow, the data orchestrator, used by 50%+ of the companies
- dbt, the data build tool, used by 50% of the companies
- BigQuery, the data storage system, used by 40% of the companies
- Looker, the BI tool, used by 30% of the companies
My perspective: It’s very interesting to see that Nordic data teams apparently prioritize community size above everything else. Of course, community size might only be a secondary factor, but the pattern definitely holds for Airflow, dbt, and BigQuery, all of which lead by far with the biggest communities around them.
Maybe it is the size of the ecosystems that makes these tools so appealing, or maybe it is something else. What is clear is that the Nordics are apparently converging on an efficient data stack that can be adapted to most stages of fast-growing companies.
3 Trees for This Newsletter
What: A simple tool that calculates the amount of CO2 “produced” by your website, based on the number of visits, the size of the website, and the electricity needed to deliver it.
My perspective: I just learned that I need to plant 3 trees to keep up my website. Other than that, I like this delightful way of educating through data.
🎄 Thanks => Feedback!
Thanks for reading! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
And of course, please provide me with feedback: