The visible hand of data, lakeFS, dbt doc blocks; ThDPTh #79
I’m Sven, and this is the Three Data Point Thursday. We’re talking about how to build data companies, how to build great data-heavy products & tactics of high-performance data teams. I’ve also co-authored a book about the data mesh part of that.
Time to Read: 3 minutes.
Another week of data thoughts:
Rolling back a bad “data deployment” in seconds is possible.
Rolling back & testing data in isolation reduces data downtime by a lot. Are you doing it?
Documentation should be DRY and at source, that’s possible using dbt.
Stop avoiding the hard decisions in data: Deciding to manage your data teams like any other product team.
🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰
Rolling back bad data
What: This article is a case study from Epcor, a lakeFS customer. They describe how lakeFS, a data versioning tool that operates on metadata, helped solve their problems.
My perspective: I don’t find the article particularly well written, but I do find the key points important. Two capabilities should be standard and good practice in every data pipeline, and yet they are almost always missing:
The capability to deploy all the data into an isolated environment before making it available to production.
The option of rolling a bad data change back within seconds.
These two capabilities are hugely valuable and reduce downtime by a lot. And yet, I almost never see them implemented. I love the lakeFS concept because it operates only on metadata, which makes these kinds of operations not just possible but easy.
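To make this concrete, here is a rough sketch of what such a workflow could look like with the lakeFS CLI, lakectl. The repository name `my-repo`, the branch names, and the commit placeholder are made up for illustration; check the lakeFS docs for the exact commands and flags in your version:

```shell
# Create an isolated branch off main — a metadata-only operation,
# so no data is copied, no matter how large the lake is.
lakectl branch create lakefs://my-repo/etl-run-isolated \
  --source lakefs://my-repo/main

# ... run your pipeline and your data quality tests
#     against the isolated branch ...

# Only if the tests pass, commit and merge into production:
lakectl commit lakefs://my-repo/etl-run-isolated -m "nightly ETL run"
lakectl merge lakefs://my-repo/etl-run-isolated lakefs://my-repo/main

# If bad data still slips through, roll main back in seconds:
lakectl branch revert lakefs://my-repo/main <bad-commit-ref>
```

The point is that both the isolation and the rollback are branch operations on metadata, so they cost seconds regardless of data volume.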
Resource: https://lakefs.io/ci-cd-data-pipelines-with-lakefs/
Doc Blocks for dbt
What: The team at Montreal Analytics shares their best practices for creating “sustainable dbt project documentation”, where “sustainable” mostly means keeping it DRY. They do so by using the Jinja “doc blocks” functionality. To put it simply: with doc blocks, you write the description of a column you use across your project only once.
My perspective: I seldom share these small tricks. But this is one I stumbled over multiple times, and it simply is helpful. It helps to embrace two best practices of documentation:
to document at the source,
and keep your docs DRY.
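As a minimal sketch of how this works (the column, model, and file names are made up for illustration): you define the description once in a Markdown file inside a doc block, then reference it from any schema.yml that needs it via the `doc()` function:

```
# models/docs.md — define the description once in a doc block
{% docs customer_id %}
The unique identifier of a customer, as assigned by the source system.
{% enddocs %}

# models/schema.yml — reference it wherever the column appears
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        description: '{{ doc("customer_id") }}'
  - name: customers
    columns:
      - name: customer_id
        description: '{{ doc("customer_id") }}'
```

Updating the text in docs.md then updates every place the doc block is referenced the next time you run `dbt docs generate`.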
Resource: https://blog.montrealanalytics.com/building-sustainable-dbt-project-documentation-8def88ca67c3
Data’s invisible hand
What: In his recent newsletter, Benn Stancil plays around with a few economic concepts to imagine a good mechanism for handling data work inside a company. The idea is to give stakeholders “credits” they can bet on analytical work packages to be done by the data teams.
My perspective: In my experience, the question Benn asks “how do I prioritize the work the units inside the company hand to me” is not the best question to ask.
A better question is: “How do I figure out the right work to do? What’s the best analytical work to be done for this company?”. And yes, the answer could be “none of the things people tell me they ‘want’ and none of the things I can ‘prepackage into votable options’”.
Or to put it more directly: the solution is almost always simple and yet hard. Manage your data team like any other product team; go all in on product management for your data team.
And yet people struggle with this again and again, because it is really hard.
It is hard for data teams to adopt a “product” & “customer” focused mentality
It is hard for usually technically trained data PMs to push aside the technical training and focus on the product & the customer
It is hard for the whole team to have difficult conversations with the department & the company CXOs to get alignment in the right direction.
However, in my experience, this is the only thing you can do to truly make your data team’s work worth it. It’s also the best option for the company, so usually there is a way to get there; as I said, it’s just no fun at all.
P.S.: I have been involved in other options, especially these kinds of “betting” systems. The scaled scrum framework SAFe as well as the “networked company” models both implement a betting system, and I’ve experienced both first-hand. IMHO they add a lot of fluff that distracts from the true problem.
What did you think of this edition?
-🐰🐰🐰🐰🐰 I love it, will forward!
Want to recommend this or have this post public?
This newsletter isn’t a secret society, it’s just in private mode… You may still recommend & forward it to others. Just send me their email, ping me on Twitter/LinkedIn and I’ll add them to the list.
If you really want to share one post in the open, again, just poke me and I’ll likely publish it on medium as well so you can share it with the world.