The Future of BI is OS, md5() in SQL, Kitagawa on Platforms; ThDPTh #12

What the future of BI looks like, how to generate proper unique keys in SQL, and a final look at how to build data platforms.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) The Future of BI is Open Source
Maxime Beauchemin, the creator of both Apache Airflow and Superset, just published a great piece about why the future of business intelligence is open source. I totally agree with him and still find it mind-boggling that open source is only now catching up in BI. In BI, and in fact in most data topics, the cost of implementing something is usually governed by two drivers:
1. the “source” of data, meaning the number of different sources and their intrinsic complexity,
2. the “target”, meaning the number of use cases and their complexity/quality requirements.
Because both sides vary so much from company to company, customers of BI tools, data integration tools, etc. have very heterogeneous needs. That is the perfect place to apply open source, because you can adapt the tool to your needs instead of working around it. Take a look at any data tool you bought for your pipeline. Does it fit all your use cases? I bet not. I bet that for at least 20% of your use cases you had to customize the hell out of it.
I really like that more tools are finally emerging in this space, like Airbyte and Meltano for data integration or Superset & Redash for visualization.
Read the piece. After reading it, I doubt you will go for anything other than an open-source solution for your BI stack.
The Future of Business Intelligence is Open Source | by Maxime Beauchemin | Mar, 2021 | Medium
While “software is [still actively] eating the world”, it’s also clear that open source is taking over software. Simply put, open source is a superior approach at building and distributing software…
maximebeauchemin.medium.com
(2) The Most Underutilized SQL Function
Tristan Handy published a short article about the md5() hash function in SQL. Simply put, md5() turns its input into an ID that is, for all practical purposes, unique. It’s a hash function, so the same input always yields the same output, but you can’t simply reverse it. Tristan Handy writes:
“[I believe] every single data model in your warehouse should have a rock-solid unique ID.”
And I agree. To me, the point of having a context-free md5 ID is:
A unique md5 ID is ONE join key, possibly replacing a combination of keys that makes a data set unique, which makes joins simpler AND speeds up development.
It abstracts away “domain knowledge”, which is a great thing, because it kills the hidden assumptions that come with “domain knowledge”.
It keeps you from exposing the underlying keys to end users, which hopefully pushes you to understand their true requirements.
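To make the single-join-key point concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. SQLite has no built-in md5(), so the sketch registers one backed by hashlib (many warehouses ship md5() natively). The tables order_items and shipments and their columns are hypothetical, and the sketch does not yet check that the key is actually unique.

```python
import hashlib
import sqlite3


def md5_hex(value: str) -> str:
    """Deterministic: the same input always yields the same 32-character hex digest."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex)  # SQLite has no built-in md5()

conn.executescript("""
    CREATE TABLE order_items (order_id TEXT, item_id TEXT, quantity INTEGER);
    CREATE TABLE shipments   (order_id TEXT, item_id TEXT, shipped_at TEXT);
    INSERT INTO order_items VALUES ('o1', 'i1', 2), ('o1', 'i2', 1), ('o2', 'i1', 5);
    INSERT INTO shipments   VALUES ('o1', 'i1', '2021-03-01'), ('o2', 'i1', '2021-03-02');

    -- Build the key ONCE per model from the columns that identify a row.
    CREATE VIEW order_items_keyed AS
        SELECT md5(order_id || '-' || item_id) AS order_item_key, order_id, item_id, quantity
        FROM order_items;
    CREATE VIEW shipments_keyed AS
        SELECT md5(order_id || '-' || item_id) AS order_item_key, shipped_at
        FROM shipments;
""")

# Downstream, every join uses ONE column instead of repeating the (order_id, item_id) pair.
rows = conn.execute("""
    SELECT o.order_id, o.item_id, o.quantity, s.shipped_at
    FROM order_items_keyed AS o
    JOIN shipments_keyed AS s ON s.order_item_key = o.order_item_key
    ORDER BY o.order_id, o.item_id
""").fetchall()
print(rows)  # [('o1', 'i1', 2, '2021-03-01'), ('o2', 'i1', 5, '2021-03-02')]
```

The composite logic lives in exactly one place per model; every consumer only ever sees order_item_key.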
I truly believe data teams should stay as close to the source data as possible. That includes not making ANY assumptions about the uniqueness of anything a “source” provides. Say you are handed a data set that contains:
An “item id”
An “order id”
You might assume that order id + item id is a unique combination, so you could join on “item id & order id”, but that makes your SQL statements more complex than they need to be. So a single join key should be used here. Why not add another column, “itemId-orderId”?
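If you use this column, you’ll indirectly assume that the combination itemId-orderId is unique, very likely without checking and certainly without making it visible to other developers. But what if an updated order actually adds a new row with the same item? Then every join on that column silently duplicates rows. The solution is to:
Use md5 over the concatenated IDs with a separator, e.g. md5(itemId || '-' || orderId), so that pairs like (“1”, “23”) and (“12”, “3”) don’t produce the same key.
Implement a uniqueness check on the md5 key.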
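A minimal sketch of such a uniqueness check, again using Python’s sqlite3 with a hashlib-backed md5() registered by hand (SQLite has none built in). The order_items table and its duplicate row are hypothetical, mimicking an updated order that appends a second row for the same item.

```python
import hashlib
import sqlite3
from typing import Optional


def md5_hex(value: Optional[str]) -> Optional[str]:
    """hashlib-backed stand-in for a warehouse md5(); NULL in, NULL out."""
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex)  # SQLite has no built-in md5()

conn.executescript("""
    CREATE TABLE order_items (order_id TEXT, item_id TEXT, quantity INTEGER);
    -- Hypothetical data: order o1 was updated and the source APPENDED a second row for item i1.
    INSERT INTO order_items VALUES ('o1', 'i1', 2), ('o1', 'i2', 1), ('o1', 'i1', 3);
""")

# The uniqueness check: every md5 key that occurs more than once is a violation.
dupes = conn.execute("""
    SELECT md5(order_id || '-' || item_id) AS order_item_key, COUNT(*) AS n
    FROM order_items
    GROUP BY order_item_key
    HAVING COUNT(*) > 1
""").fetchall()

if dupes:
    print(f"uniqueness check failed for {len(dupes)} key(s)")
    for key, n in dupes:
        print(f"  {key} appears {n} times")

# Why the '-' separator: plain concatenation lets different (order, item) pairs collide.
assert md5_hex("12" + "3") == md5_hex("1" + "23")
assert md5_hex("12-3") != md5_hex("1-23")
```

In a real pipeline this query would run as a test after every load and fail the build instead of printing. Pick a separator that cannot appear inside the IDs themselves, otherwise collisions are still possible.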
Take a look at some of your SQL statements and see where you apply uniqueness assumptions but really shouldn’t. Also, check where you could reduce complex joins to simple ones by using just one ID.
The Most Underutilized Function in SQL
There’s a single SQL function that I have come to use surprisingly often. What is it? md5()
blog.getdbt.com
(3) A final look at platforms by Justin Kitagawa
I talked a lot about platforms in my last newsletter; I have one last thought, and then I’ll leave you alone. Justin Kitagawa leads the developer platform efforts at Twilio. He describes some important shifts they made to turn their internal tooling into a real “platform”. In particular, many of his points focus on the “product side” of things. His four principles apply very well to data platforms or X-as-a-Service constructs:
API First: after all, it’s for developers, and interfaces are the foundational concept of platforms.
Self-Service Platform: you want people to do things themselves. In particular, they should be able to use it without talking to you, let alone needing an expert on their team.
Declarative over Imperative, to reduce cognitive load. Declarative constructs usually hide the “how” inside the platform and let users focus only on the “what”.
Design with empathy (for the developer/end user): after all, it is a product they should love to use!
I’d like to add a fifth one:
Build Best Practices In. If you want developers to adhere to best practices on your platform, make it worth their while! Either build them in, or at least explain and show why they benefit, i.e. how the practices make them faster and better.
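To illustrate “declarative over imperative” together with built-in best practices, here is a hypothetical sketch in Python: the user declares only what table they want, and the platform decides how to build it, adding a uniqueness test by default. TableSpec, plan(), and the step strings are invented for illustration; they are not Twilio’s API or any real tool’s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical declarative spec: the user states WHAT they want, nothing more."""
    name: str
    source: str
    key_columns: tuple[str, ...]
    schedule: str = "daily"


def plan(spec: TableSpec) -> list[str]:
    """Hypothetical platform logic: the HOW lives here, including built-in best practices."""
    key_expr = " || '-' || ".join(spec.key_columns)
    return [
        f"extract {spec.source}",
        f"build {spec.name} with key md5({key_expr})",
        f"test unique key on {spec.name}",  # best practice added by default, not opt-in
        f"schedule {spec.name} {spec.schedule}",
    ]


steps = plan(TableSpec(name="order_items", source="shop_db.order_items",
                       key_columns=("order_id", "item_id")))
for step in steps:
    print(step)
```

The user never writes the uniqueness test, yet every table built through the platform gets one.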
Ok, enough of data platforms.
Platforms at Twilio: Unlocking Developer Effectiveness
Justin Kitagawa talks about Twilio’s DevOps culture of “You build it, you run it”, and the evolution, tenets, and lessons learned of Twilio’s internal Platform.
www.infoq.com
In other news & Thanks
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.