Here’s why you need to change how your whole company works with data; Thoughtful Friday #28
The industry's efforts to change how we work with data, how you can choose your own way, and tons of transformations to choose from.
I’m Sven and I’m writing this to help you build things with data. Whether you’re a data PM, inside a data startup, internal data lead, or investing in data companies, this is for you.
Let’s dive in!
My favorite topic to bring up these days is the exponential growth of the data space.
“Data is growing at an exponential rate, with 90% of the world’s data being generated in the last two years alone. More data formats are appearing on more devices, applications, and cloud platforms.” (Redstor)
Data startups are growing like weeds; trends and company-wide transformations are everywhere. So are “best practices for the data space” (hey Snowflake). All of these are clear signs that we actually have no clue.
In an exponentially growing space like the data space, there are no best practices. The future of the data space is completely unpredictable (and yet here I keep on making predictions).
There are no “data” best practices. The future of the data space is completely unpredictable.
Subscribe (free!) or someone will steal your data business, data job, and (data) users.
In such a complex environment, there is only one way to advance. You probe, you sense, and you respond (by doubling down or changing the course of action). Then you repeat.
The Cynefin framework. Data sits squarely in the complex domain.
So in what direction should you probe?
“Data engineering has missed the boat on the “DevOps movement” and rarely benefits from the sanity and peace-of-mind it provides to modern engineers. They didn’t miss the boat because they didn’t show up, they missed the boat because the ticket was too expensive for their cargo.“ (Maxime in The Downfall of the Data Engineer)
The thing is, it’s not just data engineering, it is data in general that missed the boat. The article is 6 years old to date, and almost nothing has changed. Except for one thing:
The ticket isn’t too expensive anymore.
But it might feel like the boat has already sailed. Maxime followed up with the concept of Functional Data Engineering, and yet to date, adoption is almost where it was in 2018, when he published that piece.
IMHO that is because something is still “expensive”, and hard:
Adapting your thinking to a fast-changing environment. The data world is changing faster than the software world. IMHO it will eat up the software world. The pace is breathtaking. Necessarily, your thinking will lag behind the actual status quo.
The ticket isn’t expensive anymore for two reasons:
1. There is 10x more technology for the data space than there was in 2017, making things “cheaper”.
2. The cargo, the data, is becoming more and more valuable.
For this reason, I think all companies need to actively seek out opportunities to catch up with the boat. Find a new boat if you must, but you need to do something because it is clear that transformation needs to happen. It’s not about “operationalizing more AI”, it is about making it easy to do anything with data (including “operationalizing AI”).
The data world is still young, and there are no “best practices”. And yet, data creeps into every aspect of a company. This means we only have one choice: to take on large transformations that change the majority of our workflows and processes, the way a company works with data.
If you feel like you’re stuck with your data, consider a large-scale change, not just a new tool or another data team.
Our only choice: take on large transformations that change the majority of our workflows and processes, the way a company works with data.
The thing is, I was so close to finishing this article with a long list of cool and major trends I would consider. And yet, this left me feeling strange. Because in a complex world, it is not about the options per se; there is an infinite number of them.
It is about your process of making such decisions. Because only a good process will allow you to make decisions fast, and then react to the result. Only a good decision process will allow you to “probe-sense-respond”.
Only a good decision process will allow you to “probe-sense-respond”.
So let me give you one possible process of probing into how you work with data.
But how do you choose?
There is no right transformation. Indeed, all of them might be bogus and overthrown within a year or two. But there might be one that is valuable to your situation and your industry right now.
I often use a “bottleneck analysis” for making decisions in the data space.
What helps me analyze this choice is always a bottleneck-based approach, so I’m going to share this one model with you (I’ve already shared it a couple of times).
That’s the datacisions cycle. A click on a button (an action) is tracked by Google Analytics and thus becomes data. This data is joined to other data and manipulated in some way inside a data warehouse, and thus becomes information, displayed on a dashboard. The information helps a human have an insight (in their brain), e.g. that this button is clicked often. The human then turns this insight into a decision: produce more buttons. Once they do, this becomes an action, one that is turned into more data by the code versioning tool. The clicks on these new buttons also become more data and feed the same loop.
Now the question you have to ask yourself for every step is the following: If I dramatically increase the inflow to this step, is there equally more outflow on the decision & action steps?
If I pump in 100x more tracking data, will my company be able to turn this into 100x more good decisions? Some might. Some might not. Where you fail is where you need to step in.
Once you have identified your first bottleneck this way, ask yourself: What if I removed this bottleneck? How about the other steps?
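The bottleneck question above can be sketched as a toy model. All step names and capacity numbers here are made-up placeholders; the point is only that the smallest downstream capacity caps the whole loop, no matter how much data you pump in.

```python
# Toy model of the datacisions cycle: each step has a rough "capacity",
# i.e. how many units it can turn into output per week.
# All numbers are invented placeholders; plug in your own estimates.
CYCLE = {
    "data":        1000,  # events tracked per week
    "information": 500,   # events turned into dashboard-ready information
    "insight":     50,    # insights humans actually derive
    "decision":    10,    # decisions made from those insights
    "action":      8,     # decisions actually acted upon
}

def bottleneck(cycle: dict) -> str:
    """The step with the lowest capacity limits the whole loop."""
    return min(cycle, key=cycle.get)

def simulate_inflow(cycle: dict, factor: int) -> int:
    """Pump `factor`x more data in; output is still capped by the
    smallest downstream capacity."""
    flow = cycle["data"] * factor
    for capacity in cycle.values():
        flow = min(flow, capacity)
    return flow

print(bottleneck(CYCLE))            # -> action
print(simulate_inflow(CYCLE, 100))  # -> 8 (100x more data, same output)
```

If 100x more tracking data still yields only 8 actions a week, more tracking isn't where to invest; the "action" step is.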
For instance, if your bottleneck just keeps moving as you push it down step after step, the data mesh sounds like the solution you’re looking for. If your bottleneck sits only at the insight stage and beyond, then data democratization might be a trend worth looking into.
It all comes down to you, your company, and your industry. But whatever your context, chances are, you need to change. Hope this helps you think this change through.
The two key questions for you:
1. If I dramatically increase the inflow to this step, is there equally more outflow on the decision & action steps?
2. What if I removed this bottleneck? How about the other steps?
Transformations to consider
Here are my picks for larger-scale changes in how companies work with data. They tend to fall into three different “sizes”.
Data team internal: changes in how the team works, changes in process and flow. Changes one team can make, but others don’t have to.
Data department internal: changes that capture the whole data department, whatever that means at your company. It means changing the way all data teams work.
Company-wide: company-wide changes can be relatively small, capturing only a select few data teams as well as a select few software teams. They can capture data teams as well as data end-users, or they can capture everyone.
(1) Data Mesh
The data mesh transforms the way the whole company works with data. It changes the processes, the culture as well as the technology. It decentralizes the ownership of data, the transformation of data into information, and data serving.
That may sound super confusing. In its most basic form, it means you’re not going to let a single, bottlenecked data engineering team handle all your data needs. Instead, you’re going to involve everyone, from the sourcing of data to its consumption.
I believe the data mesh works for start-ups as well as for established companies.
(2) Data Product Management
Data product management means bringing the whole discipline of product management to the data world. It means having product managers inside data teams, and it means having product managers, in general, interact with data teams to get their work done.
This might be just a change within the data department, or it might carry on into the greater product management cycles. That’s up to you.
(3) Data as Code
“DaC is the approach of defining your data through a piece of source code that can be kept in version control and treated like any other software component. It allows for auditability and reproducibility of results and of data; it becomes subject to testing practices, doesn’t break production systems, and can become subject to the full continuous delivery machinery.” (Data as Code - Thoughtful Friday)
You might consider Functional Data Engineering, as outlined by Maxime, to be one implementation of parts of Data as Code.
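To make the Functional Data Engineering idea concrete, here is a minimal sketch under two assumptions from Maxime's writing: transformations are pure functions of their inputs, and each run replaces a whole partition rather than appending to it, so re-running is safe. Table and column names are invented for illustration.

```python
# Minimal sketch of Functional Data Engineering: pure transformations
# plus idempotent, partition-wise overwrites. Names are illustrative only.
from datetime import date

def transform_orders(raw_rows: list, day: date) -> list:
    """Pure function: the same inputs always yield the same output."""
    return [
        {"day": day.isoformat(), "order_id": r["id"], "amount": r["amount"]}
        for r in raw_rows
        if r["status"] == "completed"
    ]

def run_partition(store: dict, raw_rows: list, day: date) -> None:
    """Idempotent write: replace the day's partition wholesale,
    never append to it, so backfills and reruns are safe."""
    store[day.isoformat()] = transform_orders(raw_rows, day)

store = {}
raw = [
    {"id": 1, "status": "completed", "amount": 42},
    {"id": 2, "status": "cancelled", "amount": 13},
]
run_partition(store, raw, date(2023, 1, 1))
run_partition(store, raw, date(2023, 1, 1))  # rerun: no duplicates
print(store["2023-01-01"])  # -> [{'day': '2023-01-01', 'order_id': 1, 'amount': 42}]
```

Because the second run simply overwrites the partition, version-controlled code plus raw inputs fully determine the warehouse state, which is exactly what makes the data "as code".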
You can scope a Data as Code initiative just to one data team, to the whole department, or even extend it to other software engineering teams. However, most commonly, it will be centered around a set of data engineering teams.
Here’s the catch: Data as Code will increase the quality & productivity of whatever your data engineering teams are scoped to work on.
Data as Code will increase the quality & productivity of whatever your data engineering teams are scoped to work on.
If they just work on “data pipelines” without doing dashboarding or delivering machine learning solutions, then Data as Code alone will only tackle a small part of your cycle. And that’s ok if that’s your bottleneck.
(4) Data product teams
“There is a better way to build and run a data organization: run it as if you were building a Data Product and all of your colleagues are your customers. We believe this has the ability to transform your organization and help teams reach their true potential.” (Emilie Schario & Taylor Murphy)
This change isn’t mutually exclusive with data product management, it is just different. While data PM means investing in data PMs, training them, and hiring them, “Data product teams” are more about an internal shift.
It is what you do when you already have data leads or even PMs inside your data team. You still need to get them to switch from a service-oriented mindset towards a true product mindset.
This change usually starts in one data team and can radiate outwards.
(5) Data contracts
I like to think of data contracts as what people came up with when they realized the data mesh is a pretty big thing.
The idea of a data contract is simple: For a data engineering team, things changing or breaking upstream are events that usually cascade through the whole system. Data contracts mean stopping this from happening by formalizing the communication between the data team and the upstream data emitter (usually a software engineering team).
A data contract might cover things like:
- What data is being extracted
- Ingestion type and frequency
- Details of data ownership/ingestion, whether individual or team
- Levels of data access required
- Information relating to security and governance (e.g. anonymization)
- How it impacts any system(s) that ingestion might impact (Data contracts 101)
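A data contract like the list above can be as lightweight as a checked schema at the ingestion boundary. This is a hypothetical sketch; every field name and value is made up, and real implementations typically use JSON Schema, protobuf, or similar tooling instead of a hand-rolled dict.

```python
# Hypothetical data contract between a producer (software team) and a
# consumer (data team). All names and fields are invented for illustration.
CONTRACT = {
    "dataset": "checkout_events",   # what data is being extracted
    "owner": "team-checkout",       # data ownership
    "frequency": "hourly",          # ingestion frequency
    "fields": {                     # expected schema
        "event_id": str,
        "user_id": str,
        "amount_cents": int,
    },
}

def validate(record: dict, contract: dict) -> list:
    """Return a list of violations instead of letting breaking
    changes cascade silently through the downstream system."""
    errors = []
    for field, expected in contract["fields"].items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

good = {"event_id": "e1", "user_id": "u1", "amount_cents": 999}
bad = {"event_id": "e2", "amount_cents": "999"}
print(validate(good, CONTRACT))  # -> []
print(validate(bad, CONTRACT))   # -> ['missing field: user_id', 'amount_cents: expected int']
```

The point is less the validation code than the agreement itself: the upstream team commits to the contract, and a failing check becomes a conversation instead of a broken pipeline.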
While you can drive a data contract initiative just out of the data department, it is only useful when you consider it a cross-departmental effort from the beginning. So while this looks like a “mini data mesh”, it still is a big thing.
(6) Data democratization
The idea of “data democratization” has cooled off a lot. It got a bit of bad press, I think for no good reason.
Data democratization is a simple idea: give more people the ability to make decisions based on data.
The implementation a lot of companies chose was to simply “give access to data to everyone”. Needless to say, that didn’t result in anything.
The key to good data democratization efforts lies in education, as you can see at Airbnb, and in removing blockers. It lies in engaging in reverse ETL to bring the data to the people, not the other way around.
The good thing about this idea is that you can start it out of one data team. Start small, then slowly grow bigger.
(7) Data observability
Data observability is about having more data uptime (or less downtime).
“Data observability is an organization’s ability to fully understand the health of the data in its system. It works by applying DevOps Observability best practices to eliminate data downtime.” (Barr Moses, Data Observability 101)
So while data observability might sound like a data team or data department effort, it reaches the end users of data. You might consider it somewhere in between.
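One of the simplest observability checks, freshness, can be sketched in a few lines. The threshold and timestamps here are invented; real observability tooling also covers volume, schema, and distribution, but freshness is the check end users feel first.

```python
# Minimal sketch of a data freshness check: flag a table as "down" when
# its newest row is older than an agreed threshold. Values are invented.
from datetime import datetime, timedelta

def is_fresh(latest_row_at: datetime, now: datetime,
             max_lag: timedelta = timedelta(hours=24)) -> bool:
    """Data 'uptime' in its simplest form: did new data arrive in time?"""
    return now - latest_row_at <= max_lag

now = datetime(2023, 1, 2, 12, 0)
print(is_fresh(datetime(2023, 1, 2, 3, 0), now))    # -> True  (9 hours old)
print(is_fresh(datetime(2022, 12, 30, 0, 0), now))  # -> False (3.5 days old)
```

Run on a schedule and wired to alerting, even this tiny check moves the moment of discovery from "an end user complains about a stale dashboard" to "the data team knows first", which is the whole point of data uptime.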
(8) Analytics Engineering Model
I feel like this one still lacks a good description as a change within companies. But it’s there, it’s happening. With 15k+ companies using dbt (last time I checked), this is a serious movement.
To understand it, just read the words of Claire Carroll:
“But my role had been changing dramatically. Finance and marketing were able to run their own reports. So a normal day for me involved preparing data for analysis by writing transformation and testing code, and writing really good documentation. My tools were no longer Excel and Looker, they were iTerm, GitHub, and Atom.” (What is Analytics Engineering)
The true power of analytics engineering unfolds once you consider it a cultural change within the data department. Beware that there are very different forms of pulling this off, very different ways of organizing data teams and thus analytics engineers inside the data department.
Where to cut - four ways of integrating data teams into the company.
Depending on how you organize your data teams, a shift towards analytics engineering might be smaller or larger.
That’s it. You now know it’s time to change, time to act now. You can use the datacisions cycle and the bottleneck approach to probe, or you can use your own approach.
It doesn’t really matter how, it just matters that you start to act, and fast.
How was it?
New articles by me
Shameless plugs of things by me:
Check out Data Mesh in Action (co-author, book)
and Build a Small Dockerized Data Mesh (author, liveProject in Python).
And on Medium with more unique content.
I created the “20 Point Questionnaire To Assess The Strength Of Your Data Startup Idea”.
Just recommend the ThDPTh and respond to this email with “SHARE: [link to your recommendation]” and you’ll receive this cool giveaway.
I truly believe that you can take a lot of shortcuts by reading pieces from people with real experience that can condense their wisdom into words.
And that’s what I’m collecting here, little pieces of wisdom from other smart people.
You’re welcome to email me with questions or raise issues I should discuss. If you know a great topic, let me know about it.
If you feel like this might be worthwhile to someone else, go ahead and pass it along, finding good reads is always a hard challenge, and they will appreciate it.
Until next week,