Crypto Collapse, Great dbt Models, Gartner D&A Summit; ThDPTh #80
What happens if dbt collapses? What are the key ingredients in well-written dbt models? What are my takeaways from the takeaways from the Gartner D&A Summit 2022?
Hi, I’m Sven, and this is the Three Data Point Thursday. We’re talking about how to build data companies, how to build great data-heavy products & tactics of high-performance data teams. I’ve also co-authored a book about the data mesh part of that.
Takeaways from Gartner D&A Summit 2022
What: Prukalpa Sankar, co-founder of Atlan, describes her key takeaways from the Gartner Data and Analytics Summit as:
data concierges over data plumbers: delivering the right data, rather than a lot of data, is becoming more important
big data is out, small and synthetic data is in
active metadata will become more important
augmented analytics will become a thing
governance isn’t about control
My perspective: Some of these are good points, some of them not, some of them need reshaping IMHO. Let’s take a look at the foundations to really figure out what will go on…
In 2020 I published an article highlighting three major forces of data and how they will shape the trends of the years 2020-2021. Turns out, nothing has changed. The three forces are:
Growing complexity in the world
Growing demand for data (use)
Growing amount of data
Demand for data use keeps growing, data use cases are typically decentralized across companies & departments, and not all of them are created equal. So it is only natural that the idea of “data concierges”, meaning the selection of the most important data & data use cases, becomes more important. I see this as one implication that will hold, although you might simply call it “product management for data”.
Decentralization of use cases and their growing demand also implies that moving data to these decentralized places will be more important in the future. Hence “active metadata” is a trend that makes sense in this more general context.
Having ever growing complexity in the world makes augmented analytics more valuable, a trend I completely agree with.
However, I am lost on the ideas of small data, throwing “big data” out and the rise of synthetic data. The amount of data available to businesses is growing exponentially. If companies don’t get good at dealing with vast amounts of data, they will lose to competitors who do and who will build up data network effects. Synthetic data is an interesting idea, but I don’t see it connected to the major forces shaping the future of data.
Components of well-written dbt-models
What: Madison Scott shares three components of well written [dbt] models. They are
Modularity - reusable and referenceable code
Readability - use comments, CTEs, descriptive names
She argues these three components make for models that “stand the test of time”.
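To make modularity and readability a bit more concrete, here is a minimal sketch of my own (it is not taken from Madison's article). It runs a CTE-structured query against an in-memory SQLite database so it can be executed anywhere. The table raw_orders, its columns and the function customer_revenue are hypothetical names I made up for illustration. In an actual dbt model the raw table would typically be referenced via {{ ref(...) }} or {{ source(...) }} instead of by its plain name.

```python
import sqlite3

# Hypothetical raw data; table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (
        order_id INTEGER,
        customer_id INTEGER,
        amount_cents INTEGER,
        status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, 10, 2500, 'completed'),
        (2, 10, 1500, 'completed'),
        (3, 11, 4000, 'cancelled'),
        (4, 12, 1000, 'completed');
    """
)

# Each CTE does one thing and has a descriptive name, so a reader can follow
# the model top to bottom. In dbt, each CTE could also live in its own model.
CUSTOMER_REVENUE_SQL = """
WITH completed_orders AS (
    -- Keep only orders that actually generated revenue.
    SELECT customer_id, amount_cents
    FROM raw_orders
    WHERE status = 'completed'
),
revenue_per_customer AS (
    -- One row per customer with the summed revenue in cents.
    SELECT customer_id, SUM(amount_cents) AS total_revenue_cents
    FROM completed_orders
    GROUP BY customer_id
)
SELECT customer_id, total_revenue_cents
FROM revenue_per_customer
ORDER BY customer_id
"""


def customer_revenue(connection):
    """Run the CTE-based query and return (customer_id, total_revenue_cents) rows."""
    return connection.execute(CUSTOMER_REVENUE_SQL).fetchall()


print(customer_revenue(conn))
```

The point of the sketch is the shape, not the numbers: small named steps with a short comment each are easy to pull apart into separate models later, or to merge back together when you refactor.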
My perspective: All three components make sense. But even well-written models will decay when left unattended, and IMHO no code “stands the test of time”. Code naturally rots, because context, circumstances, environments and data change with time.
The fix any data developer can apply is to refactor, refactor often.
In fact, refactor every time you touch a model. Refactoring, creating more “modularity” or merging existing modules into one model should happen every time you tackle a new challenge. It will help you understand the underlying models faster than anything else.
If you make refactoring for understanding a common practice, you will have well written models wherever you look.
End of Crypto?
What: The Economist (paid) has a great article on the recent crypto collapse, in which a cascade of effects is pushing crypto down, including the collapse of FTX, the third-largest exchange.
The author nevertheless concludes that the underlying technology, blockchain, remains valuable because it still offers a unique value proposition to the whole world.
My perspective: I’m bullish on crypto. But that’s not what I was thinking about when I read the article. Crashes and cascading collapses happen all the time, and they change industries.
What are the big cascades that could happen inside the data world?
What happens if dbt disappears?
What if SQL gets replaced by another, better language?
What happens if Snowflake goes out of business?
What does your niche depend on? What happens if that thing collapses?
Want to recommend this or have this post public?
This newsletter isn’t a secret society, it’s just in private mode… You may still recommend & forward it to others. Just send me their email, ping me on Twitter/LinkedIn and I’ll add them to the list.
If you really want to share one post in the open, again, just poke me and I’ll likely publish it on Medium as well so you can share it with the world.