The Future of BI is OS, md5() in SQL, Kitagawa on Platforms; ThDPTh #12
What the future of BI looks like, how to generate proper unique keys in SQL, and a final look at how to build data platforms.
Data will power every piece of our existence in the near future. I collect "Data Points" to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) The Future of BI is Open Source
Maxime Beauchemin, the creator of both Apache Airflow and Superset, just published a great piece about why the future of business intelligence is open source. I totally agree with him and still find it mind-boggling that open source is just now catching up to this. In BI, or in fact, in most data topics, the cost of implementing something is usually governed by two drivers:
1. the "source" of data, meaning the number of different sources and their intrinsic complexity,
2. the target, the number of use cases, and their complexity/ quality requirements.
Since this is the case, customers of BI tools, data integration tools, etc. have very heterogeneous needs. This is the perfect place to apply open source. Take a look at any purchased data tool in your pipeline. Does it fit all your use cases? I bet not. I bet that for at least 20% of your use cases you have to customize the hell out of it.
I really like that more tools are finally emerging in this space, like Airbyte and Meltano in the data integration sector or Superset & Redash in the visualization sector.
Read the piece. After reading it, I doubt you will go for anything other than an OS solution for your BI stack.
The Future of Business Intelligence is Open Source | by Maxime Beauchemin | Mar, 2021 | Medium
While "software is [still actively] eating the world", it's also clear that open source is taking over software. Simply put, open source is a superior approach at building and distributing software...
maximebeauchemin.medium.com
(2) The Most Underutilized SQL Function
Tristan Handy published a short article about the md5() hashing function in SQL. Simply put, the md5() function turns any input into a fixed-length id. It's a hashing function, so the same input always yields the same result, and the input cannot practically be recovered from the hash. Tristan Handy writes:
"[I believe] every single data model in your warehouse should have a rock-solid unique ID."
And I agree. Indeed, I see three benefits of a contextless md5 id:
A unique md5 id is ONE join key, possibly replacing a combination of keys that make a data set unique, and as such speeding up joining AND speeding up development.
It abstracts away "domain knowledge", which is a great thing, and thus kills the hidden assumptions that come with "domain knowledge".
It stops you from exposing these keys to end-users, which hopefully helps you understand the true requirements.
I truly believe data teams should stay as close to the source data as possible. That includes not making ANY assumptions about the uniqueness of anything a "source" provides. If you are handed a data set that contains:
An "item id"
An "order id"
You might assume that order id + item id is a unique combination, so you could join on both "item id" and "order id", but that makes your SQL statements more complex than they need to be. A single join key would be better. Why not add another column "itemId-orderId"?
If you use this column, you indirectly assume that the combination itemId-orderId is unique, very likely without checking and certainly without making it visible to other developers. But what if updated orders actually add a new row with the same items? The solution is to:
Use md5(itemId || '-' || orderId) as the key, or your dialect's equivalent string concatenation. In most SQL dialects a plain + adds numeric ids instead of joining them, and without a separator the pairs (1, 23) and (12, 3) would both be hashed as "123".
Implement a uniqueness check on the md5 key.
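Here is a minimal sketch of those two steps, run from Python against an in-memory SQLite database so that it is self-contained. The table and column names (order_items, item_id, order_id, order_item_key) are made up for illustration. SQLite has no built-in md5(), so the sketch registers one; most warehouses already ship the function. Both ids are cast to text and joined with a '-' separator before hashing so that different id pairs cannot collapse into the same string.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string, standing in for a warehouse's md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# SQLite ships without md5(), so register a one-argument version for this demo.
conn.create_function("md5", 1, md5_hex)

# Hypothetical source table: nothing here guarantees (item_id, order_id) is unique.
conn.execute("CREATE TABLE order_items (item_id INTEGER, order_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        (1, 23, 1),
        (12, 3, 2),
        (12, 3, 5),  # an updated order that arrived as a second row
    ],
)

# Step 1: build a single surrogate key from the two ids.
conn.execute(
    """
    CREATE TABLE order_items_keyed AS
    SELECT md5(CAST(item_id AS TEXT) || '-' || CAST(order_id AS TEXT)) AS order_item_key,
           item_id, order_id, quantity
    FROM order_items
    """
)

# Step 2: uniqueness check. Any key that occurs more than once breaks the assumption.
duplicates = conn.execute(
    """
    SELECT order_item_key, COUNT(*) AS n
    FROM order_items_keyed
    GROUP BY order_item_key
    HAVING COUNT(*) > 1
    """
).fetchall()

for key, n in duplicates:
    print(f"key {key} appears {n} times")
```

With this sample data the check reports one duplicated key, the one for the updated order. In a real pipeline the same query could run as a test that fails whenever it returns any rows.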
Take a look at some of your SQL statements and see where you apply uniqueness assumptions but really shouldnât. Also, check where you could reduce complex to simple joins by using just one id.
The Most Underutilized Function in SQL
There's a single SQL function that I have come to use surprisingly often. What is it? md5()
blog.getdbt.com
(3) A final look at platforms by Justin Kitagawa
I talked a lot about platforms in my last newsletter. I have one last thought to share, and then I'll leave you alone. Justin Kitagawa leads the dev platform efforts at Twilio. He describes some important shifts they made to get the "platform" feeling. In particular, many of his points focus on the "product side" of things. His four principles apply very well to data platforms or X-as-a-Service constructs:
API First: after all, it's for developers, and interfaces are the core concept of platforms.
Self-Service Platform: you'll want people to do things themselves. In particular, you want them to be able to use it without talking to you, let alone needing an expert on their team.
Declarative over Imperative, to reduce cognitive load: declarative constructs usually take away the "how", which should be hidden inside the platform, and focus only on the "what".
Design with empathy (for the developer/end-user): after all, it is a product they should love to use!
I'd like to add a fifth one:
Build Best Practices in: if you want developers to adhere to best practices on your platform, then make it worth their while! Either build them in, or at least explain and show why developers will benefit by becoming faster and better.
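To make the "Declarative over Imperative" principle concrete, here is a small sketch. It is not taken from the talk, and every name in it (PipelineSpec, plan, the example source and destination) is hypothetical. The user only declares what they want, and a platform-side function decides how to do it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineSpec:
    """The 'what': everything a user has to declare, nothing about execution."""
    source: str
    destination: str
    schedule: str


def plan(spec: PipelineSpec) -> List[str]:
    """The 'how' lives inside the platform: expand a spec into concrete steps."""
    return [
        f"extract from {spec.source}",
        f"load into {spec.destination}",
        f"register schedule '{spec.schedule}'",
    ]


# A user of the platform only writes this declaration.
spec = PipelineSpec(source="orders_db", destination="warehouse.orders", schedule="daily")
steps = plan(spec)

for step in steps:
    print(step)
```

Because the steps are derived from the spec, the platform team can change how extraction or scheduling works without asking any user to rewrite their declaration.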
Ok, enough of data platforms.
Platforms at Twilio: Unlocking Developer Effectiveness
Justin Kitagawa talks about Twilio's DevOps culture of "You build it, you run it", and the evolution, tenets, and lessons learned of Twilio's internal Platform.
www.infoq.com
In other news & Thanks
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.