Why People Are Lying To You About Big Data

Data contracts distilled; Why big data haters are lying and what you really need to know; Four ways to introduce explainability into your data-heavy products

Sven Balnojan PhD

Aug 31, 2023

This is Three Data Point Thursday, making your business smarter with data & AI.

Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.

Let’s dive in!

Data contracts distilled

TL;DR: Data contracts are a tool everyone should know about, but be careful about treating them as a “solution” to anything.

Key challenge: If you want to implement data contracts, the only way to make them successful is to make them beneficial to both sides: product teams and data teams.

Data Contracts distilled:

How a data engineer would explain data contracts: "So that software engineering teams stop breaking my stuff."

How software engineering teams explain it: “Like SLAs & an API schema for data, even in our warehouse. You know, we keep breaking the stuff because data isn’t as important as the product.”

There is a fundamental tension between software teams shipping their products fast and data engineers consuming the data those products produce as a byproduct.

So what? Data contracts are a simple tool to mitigate this fundamental divide!

It's a tool to get both sides talking and to make each aware of the other side's perspective.

Two good resources are the article by Monte Carlo and the template from PayPal.
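To make “SLAs & an API schema for data” concrete, here is a minimal sketch of a contract as a small, versioned object the producing team owns and the data team validates against. Everything in it is an assumption for illustration: `DataContract`, `FieldSpec`, the `orders` dataset, the `checkout-team` owner, and the one-hour freshness SLA are hypothetical, and it does not follow the PayPal template's format. It needs Python 3.9+.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One column in the contract: name, expected Python type, nullability."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract owned by the producing team: a schema plus a freshness SLA."""
    dataset: str
    owner: str
    version: str
    fields: tuple[FieldSpec, ...]
    max_staleness: timedelta  # SLA: the newest record may be at most this old

    def validate_record(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable violations; an empty list means the record conforms."""
        errors = []
        for spec in self.fields:
            if spec.name not in record:
                errors.append(f"missing field '{spec.name}'")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    errors.append(f"field '{spec.name}' must not be null")
            elif not isinstance(value, spec.dtype):
                errors.append(
                    f"field '{spec.name}' expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return errors

    def is_fresh(self, latest: datetime, now: datetime) -> bool:
        """True if the most recent record lands within the agreed staleness window."""
        return now - latest <= self.max_staleness


# Example contract for a hypothetical "orders" table produced by a checkout team.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    version="1.0.0",
    fields=(
        FieldSpec("order_id", str),
        FieldSpec("amount_cents", int),
        FieldSpec("coupon_code", str, nullable=True),
    ),
    max_staleness=timedelta(hours=1),
)

now = datetime(2023, 8, 31, 12, 0, tzinfo=timezone.utc)
print(orders_contract.validate_record(
    {"order_id": "A1", "amount_cents": 1299, "coupon_code": None}))  # []
print(orders_contract.validate_record({"order_id": "A2", "amount_cents": "12.99"}))
print(orders_contract.is_fresh(now - timedelta(minutes=30), now))  # True
```

The point is less the code than the conversation it forces: the product team commits to a version and an SLA, and the data team gets explicit, readable violations instead of silent breakage.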

Why big data haters are lying and what you really need to know

People have been lying to you. Big Data isn’t dead; it’s not bad or “hard to do.”

The simple truth: Big data is your only option to become a data business.

DIVO = Data in, value out. That’s how the world works today.

So what? Well, there are three ways to get more value from data. And big data is where it’s all headed.

The only three ways of getting more value out of data:

Getting more data (tap more sources)
Getting more value out of the data you already have (the stuff you have lying around)
Selecting more value-rich pieces of data

It’s like oil drilling. You can search for new fields (brute force), get the cool tech to extract more oil from your existing fields, or get smart about picking the richest new fields.

But you know what? No matter what you do, only one of those options scales in the long run: brute force. The others just make you better at brute force.

Data works the same way. If you want to become a data business, in the long run you need to get good at big data, at brute-forcing more and more data into your systems.

Bottom line: Whoever tells you big data is dead doesn’t know their data stuff or is trying to put you out of business.

Four ways to introduce explainability into your data-heavy products

ML & AI algorithms are hard to understand and sometimes scary to users! To make them more successful, we need to introduce transparency!

How? Here are four simple ways you can do so:

1) Explain the data basis to users. It doesn't have to be fancy; simple is better.

Netflix shows me recommendations (ML) with friendly text above them saying, "Because you watched MOANA."

Amazon: "OTHERS ALSO viewed"

2) Explain how the algorithm works to users in simple terms.

Netflix: "BECAUSE YOU WATCHED Moana"

Amazon: "Others also VIEWED."

It can be as simple as that.
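Both ideas come down to attaching a short, human-readable reason to each recommendation. Below is a minimal sketch, assuming a simple item-to-item co-viewing recommender; it is not how Netflix or Amazon actually build theirs, and the function names, toy `histories`, and reason strings are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Toy watch histories (illustrative only).
histories = {
    "ana": ["Moana", "Frozen", "Encanto"],
    "ben": ["Moana", "Frozen"],
    "cai": ["Moana", "Encanto", "Coco"],
    "me": ["Moana"],
}


def co_view_counts(histories):
    """For each item, count how many users also watched each other item."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend_with_reasons(user, histories, top_n=3):
    """Score unseen items by co-views with the user's history ("Because you watched ...")."""
    counts = co_view_counts(histories)
    seen = set(histories[user])
    scores = Counter()
    best_anchor = {}  # item -> (watched item that contributed most, its count)
    for anchor in sorted(seen):
        for item, c in counts[anchor].items():
            if item in seen:
                continue
            scores[item] += c
            if c > best_anchor.get(item, (None, 0))[1]:
                best_anchor[item] = (anchor, c)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(item, f"Because you watched {best_anchor[item][0]}") for item, _ in ranked]


def others_also_viewed(item, histories, top_n=3):
    """Items most often co-viewed with `item` ("Others also viewed")."""
    counts = co_view_counts(histories)
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(other, f"{n} of {item}'s viewers also viewed this") for other, n in ranked]


print(recommend_with_reasons("me", histories))
print(others_also_viewed("Moana", histories))
```

The reason string is a byproduct of the scoring itself (the strongest contributing item or the co-view count), so the explanation stays honest about what the algorithm actually used.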

3) Give in-depth explanations of your algorithms, like Facebook does for most of its large, customer-facing algorithms.

4) Give users control options. Even small inputs make users feel in control.

How about "turn personalization off"? Or simple things like "make it location-specific."
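Those two controls can be as small as two settings the ranking step respects. A minimal sketch, assuming a recommender that already has personalized scores and a global popularity fallback; `UserSettings`, `recommend`, the region codes, and the toy numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserSettings:
    """Hypothetical user-facing controls."""
    personalization: bool = True   # "turn personalization off" sets this to False
    region: Optional[str] = None   # "make it location-specific"; None = no filter


def recommend(catalog, global_views, personal_scores, settings, top_n=3):
    """Rank items, honoring the user's settings.

    catalog:         item -> region the item is relevant to
    global_views:    item -> popularity across all users (non-personalized fallback)
    personal_scores: item -> score from some personalized model
    """
    candidates = [
        item for item, region in catalog.items()
        if settings.region is None or region == settings.region
    ]
    scores = personal_scores if settings.personalization else global_views
    return sorted(candidates, key=lambda i: (-scores.get(i, 0), i))[:top_n]


# Toy inputs (illustrative only).
catalog = {"show_a": "DE", "show_b": "FR", "show_c": "US", "show_d": "DE"}
global_views = {"show_c": 900, "show_b": 700, "show_a": 650, "show_d": 300}
personal_scores = {"show_d": 0.9, "show_b": 0.4, "show_a": 0.2, "show_c": 0.1}

print(recommend(catalog, global_views, personal_scores, UserSettings()))
print(recommend(catalog, global_views, personal_scores, UserSettings(personalization=False)))
print(recommend(catalog, global_views, personal_scores, UserSettings(region="DE")))
```

Falling back to global popularity when personalization is off keeps the feature useful while honoring the user's choice; that fallback is one design option, not the only one.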
