DeDa, B/G DBs, chaos days for data teams; ThDPTh #72
I’m Sven, and this is the Three Data Point Thursday. The email that helps you understand and shape the one thing that will power the future: data. I’m also writing a book about the data mesh part of that.
Time to Read: 4 min
Another week of data thoughts:
- Data is getting decentralized (DeDa)
- Shoot down your BI systems for fun from time to time
- Doing B/G deployments in your database is useful and doable
🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰
Decentralized Data at DeSci
What: Sarah Hamburg, scientist and co-founder of LYNX, wrote an intriguing article on the a16z blog about decentralization in science using blockchain-based technologies. She describes DeSci as sitting at the intersection of two major trends: blockchain as a technology trend, and the science community’s movement to change how science is funded & conducted.
My perspective: The article is about DeSci, and especially funding. Sarah uses the word “funding” 14 times.
But do you know which word she uses more often?
Data. 18 times to be precise.
Because “data ownership”, “data in science” and “decentralized data” are all topics that come into play in this area.
It seems to me that something is happening at a slightly different intersection, that of blockchain and data.
I have no idea exactly how this will turn out, but I do see that this is a much broader thing than just the DeSci movement (not to make that a “little thing”). Sarah’s startup LYNX isn’t decentralizing science as much as it is decentralizing the ownership of data. I’m looking forward to seeing more happen at that intersection.
Resource: https://future.a16z.com/what-is-decentralized-science-aka-desci/
5 Minute Chaos Day & Playbook
What: The linked resource is a very short version of a playbook written by EqualExperts for organizing chaos days, a practice where you purposefully break things to make them more robust (or antifragile) in the future.
My perspective: What would happen if the disk of the server running your data orchestrator, or the one running your business intelligence tool, filled up?
No idea? For a lot of data teams, this would result in a *show. A full disk usually means you won’t be able to access the system via either the GUI or SSH. So the only solution is to kill it and bring it up again (or do some magic using backups and detaching/reattaching volumes), which usually results in lost data.
A short chaos day is there to make sure that next time, your team knows what to do when unexpected stuff happens. That next time, the team not only recovers from the fall but also makes the system stronger. And that next time, your team has monitoring in place that raises an alarm on the spot when something is wrong.
I suggest you think about running a chaos day: inject a fault into the system (without telling your team where the fault is), help them manage the crisis, and learn from it.
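To make the monitoring point a bit more concrete, here is a minimal sketch in Python, assuming a plain disk-usage check is enough to start with. The function names, the default path, and the 90% threshold are my own assumptions for illustration and are not taken from the playbook; the only real API used is `shutil.disk_usage` from the standard library.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the used share of the filesystem holding `path` (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(path="/", threshold=0.9):
    """Return an alert message if the disk is at or above `threshold`, else None.

    The 0.9 default is an assumed value; pick one that leaves your team
    enough time to react before the disk is actually full.
    """
    fraction = disk_usage_fraction(path)
    if fraction >= threshold:
        return f"ALERT: {path} is {fraction:.0%} full"
    return None


if __name__ == "__main__":
    message = check_disk("/")
    # In a real setup this would go to your alerting channel instead of stdout.
    print(message or "disk usage is below the threshold")
```

Run from a scheduler every few minutes, a check like this would probably fire well before the GUI and SSH stop responding. A chaos day is a good occasion to find out whether it really does.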
Resource: https://github.com/EqualExperts/chaos-day-playbook/blob/master/5-minute-guide.md
B/G Deployments with dbt & Snowflake
What: The guys from Montreal Analytics share how to do B/G (blue/green) deployments with Snowflake and dbt. In a B/G deployment, you first deploy a completely new artifact, like a newly “built” data set, to an empty instance that sits “next to the production instance”, and then do a “switch” to channel production traffic over to the new data set. This practice enables zero-downtime “upgrades”, allows for testing in a production environment, and gives you a 5-second rollback in case something fails.
My perspective: I love this practice, which is completely underused in the data world. The reason might be that it’s actually not that easy to do B/G deployments with databases, because “switching” isn’t as straightforward as redirecting a load balancer. But there are options out there: using Nginx as a load balancer (which has “SQL support”), using schemata to do B/G deployments, using Snowflake and its easy swap command, or using an abstraction layer like a Trino query engine. True, this is a pretty long list compared to “switching the load balancer”, but the benefits of B/G deployments are real.
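Here is a rough sketch of the build, test, switch, and rollback cycle in Python. It is not the Montreal Analytics setup: as a stand-in for Snowflake’s swap command, it uses an in-memory SQLite database where a view acts as the “pointer” that consumers query. All table, view, and function names are hypothetical, and the quality checks are deliberately simplistic.

```python
import sqlite3


def build(conn, table, rows):
    """Build a fresh copy of the data set next to the live one."""
    # Only ever rebuild the table that is NOT currently behind the view.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id INTEGER, amount REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def passes_checks(conn, table):
    """Assumed checks: the table is non-empty and has no NULL amounts."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    (nulls,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE amount IS NULL"
    ).fetchone()
    return count > 0 and nulls == 0


def switch(conn, view, table):
    """Repoint the view consumers query; both statements run in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def deploy(conn, view, new_table, rows):
    """Build the new copy, test it, and only switch if the checks pass."""
    build(conn, new_table, rows)
    if not passes_checks(conn, new_table):
        return False  # production keeps serving the old copy
    switch(conn, view, new_table)
    return True


if __name__ == "__main__":
    # isolation_level=None lets us control the transaction explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    deploy(conn, "orders", "orders_blue", [(1, 10.0)])
    deploy(conn, "orders", "orders_green", [(1, 10.0), (2, 25.5)])
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
    switch(conn, "orders", "orders_blue")  # rollback is just another switch
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

The point of the design is that consumers only ever query `orders`, so a failed build never reaches them, and rolling back is the same cheap operation as deploying.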
Resource: https://medium.com/montreal-analytics/blue-green-deployment-with-dbt-and-snowflake-922f1c658011
🎄 Thanks => Feedback!
Thanks for reading! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
And of course, please provide me with feedback:
It is terrible | It’s pretty bad | average newsletter… | good content… | I love it!