Data Product Management, Data Error Cascades, One Piece Flow, Don’t Fix Data Bugs; ThDPTh #7
Three tips on data product management: how data errors cascade down, why one-piece workflows beat context switching, and why you shouldn’t fix all data bugs.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
As you might’ve noticed from the title, I got to think about data product management this week, and in particular about three topics I’d like to share my favorite resources on, because I think product management is so important for data teams.
1 Data Errors Cascade: Watch Your Data or Your Data Project Dies
I recently picked up a great paper from the Data Engineering Weekly newsletter, published by Google, about data quality and its impact on AI models. Since I think its qualitative implications are much broader, I’d like to highlight it here.
The general idea is very simple: “Smart products, products using (analytical) data, rely on a long pipeline of steps: capturing data, emission, ingestion, lots of transformations, possibly training & configuration of derived models, etc. It is a true pipe in the sense that quality carries over: errors cascade to later stages of the pipe. They might skip one or two stages, but they cascade down. As a result, not applying quality assurance at the level of software results in lots of quality issues in the last stages, the stages of actual value.”
I think this extends way beyond machine learning models, into all of your data products, and as such highlights why Data as Code is such an important concept on all the stages of your pipe.
It shows me, for instance, that you have to think really hard before introducing data-heavy products in an analytics or tech department that has a low standard for data. Because whatever you do, you have to count on any data product working at “60%–80% quality”. That’s completely fine if you want to build a PYMK (“people you may know”) feature. But it’s not if you want to presort e-mails for the customer support team; they’ll probably end up spending more time re-sorting than before.
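To make the cascade concrete, here’s a toy Python sketch of how per-stage quality compounds along a pipeline. The stage names and the 95% per-stage figure are my own illustrative assumptions, not numbers from the paper:

```python
# Toy model: each pipeline stage passes through only a fraction of
# records unharmed, so end-to-end quality is the product of the stages.
stages = ["capture", "emission", "ingestion", "transformation", "model training"]
per_stage_quality = 0.95  # assumed: 95% of records survive each stage intact

quality = 1.0
for stage in stages:
    quality *= per_stage_quality
    print(f"after {stage}: {quality:.0%} of records still correct")

# Five stages at 95% each already land you at ~77% end-to-end quality,
# squarely inside the "60%-80%" range mentioned above.
```

The point of the sketch: even seemingly high per-stage quality multiplies down fast, which is why quality assurance has to happen at every stage, not just the last one.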
2 Context Switching vs. Focusing
Data teams, in my experience, have been thrown into a place where they take care of a lot of different “products”: some data science things, maybe some actual apps, a data warehouse, some reporting GUI, some ad-hoc reports, etc.
This in turn often leads them to work on multiple different things within one work increment/Scrum sprint, which stretches each of them out longer. I absolutely love Henrik Kniberg’s visual explanation of why this is a really bad idea.
Watch the video, but I’m gonna repeat his argument because I find it crucial. Take your tasks: producing an ad-hoc analysis, updating it and incorporating changes, prototyping a machine learning model, and so on. All of these consist of steps 1–2–3 until real business value is delivered. Now we have two options for working on multiple things:
Option 1 (Context Switching): in parallel, do step 1 of all tasks, then all steps 2, then all steps 3.
Option 2 (One-Piece Flow): in series, finish the first task completely, then the second, then the third…
(picture by the author, after Henrik Kniberg’s presentation.)
What’s the difference?
Most data teams do option 1, multiple “themes” within one sprint/quarter.
Time to market for the first deliverable is 2–3 times longer with option 1.
Option 1 finishes every item later (except the third, which is done at the same time).
Now the bad news: in the real world, context switching itself costs time. Option 2 only has 2 context switches, whereas option 1 has a lot of them (each day, within the daily stand-ups, in refinements, …). So option 1 actually takes even longer than option 2.
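The scheduling argument can be sketched in a few lines of Python. The numbers here are assumed for illustration (three tasks, three one-week steps each), not Kniberg’s exact example:

```python
TASKS, STEPS = 3, 3  # assumed: 3 tasks, each needing 3 one-week steps

def completion_times(order):
    """Given a sequence of (task, step) work units, one per week,
    return the week in which each task delivers its final step."""
    done = {}
    for week, (task, step) in enumerate(order, start=1):
        if step == STEPS:
            done[task] = week
    return done

# Option 1 (context switching): step 1 of every task, then every step 2, ...
interleaved = [(t, s) for s in range(1, STEPS + 1) for t in range(1, TASKS + 1)]
# Option 2 (one-piece flow): finish task 1 completely, then task 2, ...
serial = [(t, s) for t in range(1, TASKS + 1) for s in range(1, STEPS + 1)]

print(completion_times(interleaved))  # {1: 7, 2: 8, 3: 9}
print(completion_times(serial))       # {1: 3, 2: 6, 3: 9}
```

With one-piece flow the first task ships in week 3 instead of week 7, and every task except the last ships earlier, even before counting the overhead of the context switches themselves.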
I highly recommend you take a look at your workflow and the number of context switches you do within each developer’s day, each team’s day/sprint/quarter,…
Henrik Kniberg on Vimeo on Context Switching.
3 Stop Fixing (Some) Data Bugs
(from Brandon Chu, “Ruthless Prioritization”, thanks for letting me use it.)
Data teams have a tendency to fix bugs right away. After all, something is broken, right? Some data ain’t correct, people cannot do their analyses, they need a fix.
Only, task switching plus actually fixing things (which always takes longer than estimated) takes time away from other, value-producing activities.
I like Brandon Chu’s framework pictured above because it’s simple and actually pretty applicable in most data warehouse, machine learning, and data science scenarios. The important part: if, for instance, some data source is incorrect and it affects 5% or fewer of your end-users, then you do nothing! Put it into your backlog, keep it there, and prioritize it like any other feature.
Only things that are essential AND affect a lot of people should be fixed immediately. Everything else should be able to wait for at least one prioritization & planning cycle. So I suggest you go through your past iterations and see what the implications of this framework would have been.
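Here is that decision rule as a Python sketch. The 5% threshold comes from the text above; the `critical` flag and the function name are my own shorthand for Chu’s severity axis:

```python
def triage(critical: bool, share_of_users_affected: float) -> str:
    """Return 'fix now' only for critical bugs with wide reach;
    everything else goes through normal backlog prioritization."""
    if critical and share_of_users_affected > 0.05:
        return "fix now"
    return "backlog"

print(triage(critical=True, share_of_users_affected=0.40))   # fix now
print(triage(critical=True, share_of_users_affected=0.03))   # backlog
print(triage(critical=False, share_of_users_affected=0.60))  # backlog
```

The design choice worth noting: the default branch is “backlog”, not “fix now”, so a bug has to earn an immediate fix on both axes at once.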
Brandon Chu, Blackbox of PM, Ruthless Prioritization.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things, but I tend to be opinionated. You can always hit the unsubscribe button!