Why You’re Missing The Point Of Analytics
The hermaphrodite word data product; Cold Data isn’t cold; Why You’re Missing The Point Of Analytics
This is the Three Data Point Thursday, making your business smarter with data & AI.
Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.
Let’s dive in!
Why You’re Missing The Point Of Analytics
Analytics is defined as “the systematic computational analysis of data”; Business Intelligence is “the set of corporate strategies to analyze business information” and turn it into actionable insights.
And yet, almost every company thinks analytics & BI are about the analysis of historical data.
There are two fallacies right here:
Thinking it’s about analysis only, when in fact analysis is only a means to an end.
Thinking it’s about historical data only, when in fact the more important data is the data not yet created.
Everyone falls for those fallacies; you too.
First task for the data team? Build up a data warehouse to store historical data.
First task for the data scientist? Analyze historical purchase behavior and predict something useful.
First task for the business analyst? Analyze historical order data and show us how we compare.
Who’s implementing tracking? Not the analyst; someone in marketing.
Who makes sure there is tracking in the first place? No one in analytics; some PM, maybe.
That’s like doing only half the job. And not the vital half. So here’s the critical half:
It’s making sure data “gets made.” Because that’s what you need to help guide actions and create the right data for new decisions.
Making sure the correct data gets made is the fuzzier, less sexy part of the job, because it involves getting people to hypothesize before they make a decision and then build the proper tracking into whatever they are shipping.
It means setting up Google Analytics and implementing “event tracking”; it means actually having to change course instead of just justifying decisions with data.
So what should you do? Think deep and hard about which side you’re currently on.
Cold data isn’t
… cold; it’s pretty much heating up your data bill.
In the data world, we separate data into hot and cold.
Hot data: The data that’s in active use, meaning it is queried daily, weekly, or at least once a month.
Cold data: The data that’s queried less often than that.
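A minimal sketch of that definition, assuming you can get a "last queried" date per table from your warehouse's query logs. The table names, the dates, and the 31-day window standing in for "at least once a month" are all made up for illustration.

```python
from datetime import date, timedelta

# Assumption: "at least once a month" is approximated as a 31-day window.
HOT_WINDOW = timedelta(days=31)


def classify(last_queried, today):
    """Return 'hot' if the table was queried within the window, else 'cold'."""
    if last_queried is None:  # never queried at all
        return "cold"
    return "hot" if today - last_queried <= HOT_WINDOW else "cold"


# Hypothetical tables with the date each one was last queried.
last_queried_by_table = {
    "orders_2024": date(2024, 6, 28),
    "orders_2019": date(2023, 1, 15),
    "legacy_clickstream": None,
}

today = date(2024, 7, 1)
labels = {name: classify(ts, today) for name, ts in last_queried_by_table.items()}
print(labels)
```

The point is only that "hot" and "cold" are labels you compute from usage, not from how old or how important the data feels.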
Example: In an e-commerce company, the order data from this week, this month, and this year is in active use. They probably also use last year’s data for comparisons. All of this data is stored in Snowflake.
Why should you care about this separation? Because the cost of storage and complexity (yes, there is such a thing as complexity cost) doesn’t depend on usage but rather on simple existence.
What you can do: Businesses can separate hot from cold data logically and in storage (cost tiers).
Example cont.: Our e-commerce company does two things.
It precomputes the values used for comparisons like order amounts, order net rev, etc., for the usual time horizons.
It pushes all data older than 12 months to AWS S3 on the 1st of every month.
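The monthly push could be sketched like this. It only shows the cutoff logic and the split into "stays in the warehouse" and "goes to object storage"; the actual export to S3 is left out on purpose. The function names, the row layout, and the reading of "12 months ago" as "before the first day of the month 12 months back" are assumptions, not the company's actual job.

```python
from datetime import date


def archive_cutoff(run_date, months=12):
    """First day of the month that lies `months` months before run_date."""
    total = run_date.year * 12 + (run_date.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


def split_hot_cold(rows, run_date):
    """Split rows into (hot, cold) around the archive cutoff."""
    cutoff = archive_cutoff(run_date)
    hot = [r for r in rows if r["order_date"] >= cutoff]
    cold = [r for r in rows if r["order_date"] < cutoff]
    return hot, cold


# Hypothetical order rows; the job runs on the 1st of the month.
orders = [
    {"order_id": 1, "order_date": date(2023, 6, 30)},
    {"order_id": 2, "order_date": date(2023, 7, 1)},
    {"order_id": 3, "order_date": date(2024, 6, 15)},
]

hot, cold = split_hot_cold(orders, run_date=date(2024, 7, 1))
print("keep in warehouse:", [r["order_id"] for r in hot])
print("push to object storage:", [r["order_id"] for r in cold])
```

Because the comparison values are precomputed, nothing downstream has to read the cold rows after they have moved.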
The effect? Object storage like AWS S3 is about five times cheaper than Snowflake, and AWS Glacier can save even more.
Less complexity in data: On top of this, the data team has to deal with way less data when a problem arises, as most of the data is cold and not stored in the same location as the new data.
The trade-off: However, while separating cold and hot data reduces the complexity of the data, it increases the overall complexity of the system. It means another small component to maintain.
So what? Maybe you’re OK with your data bill; if that’s the case, ignore this! There is no sense in putting in more effort.
However, if your data bill is becoming a problem, and you’ll know when this is the case, go and do a quick audit. Let someone look at query accesses and compile a rough sketch of your data cost separated by hot and cold data.
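A quick audit like that can start as a few lines of Python. This is a rough sketch under loud assumptions: the table names, sizes, query counts, and the price per GB are invented, and "hot" is simplified to "queried at least once in the last 30 days."

```python
def cost_by_temperature(tables, price_per_gb_month):
    """Sum the monthly storage cost separately for hot and cold tables."""
    totals = {"hot": 0.0, "cold": 0.0}
    for t in tables:
        # Simplification: any query in the last 30 days makes a table hot.
        label = "hot" if t["queries_last_30d"] > 0 else "cold"
        totals[label] += t["size_gb"] * price_per_gb_month
    return totals


# Hypothetical inventory; replace with numbers from your own query logs.
tables = [
    {"name": "orders_current", "size_gb": 200, "queries_last_30d": 450},
    {"name": "orders_archive", "size_gb": 1800, "queries_last_30d": 0},
]
PRICE_PER_GB_MONTH = 0.02  # made-up price, not a quote from any vendor

totals = cost_by_temperature(tables, PRICE_PER_GB_MONTH)
cold_share = totals["cold"] / (totals["hot"] + totals["cold"])
print(totals, f"cold share: {cold_share:.0%}")
```

If the cold share is small, you're done. If it dominates the bill, that's your signal to look at cheaper storage tiers.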
The hermaphrodite word data product
Watch out for the hermaphrodite words “data product management,” “data PM,” and “data product.”
They can mean either:
product management for a data-heavy product
managing "data products," where "data products" are basically data sets, analyses, reports,…
Data PM, as a PM specialized in data-heavy products, covers:
data science
machine learning
data engineering
internal and external-facing data teams
It can mean the customers are external, and it can also mean the customers are internal: the employees of the company the PM is working for.
It means building dashboards, recommendation engines, and possibly UI features,...
Remember: The expectations differ vastly for those roles depending on what scope is implied!
Data products as datasets/analyses stem from the #datamesh concept, where "data as a product" is one fundamental principle.
Treating data as a product levels data itself up to a product, and thus, of course, also reports, analyses, etc.
In “Data Jujitsu,” data products are defined as
“a product that facilitates an end goal through the use of data,”
very much in line with the data PM meaning. (That was in 2012!)
However, the hot #datamesh trend has completely messed up this perception.
The gist of it all:
Watch out for the word data product; it has different meanings to different people!
Datasets should be considered a subset of all data products (which are themselves a subset of all products).
Not all datasets are products!
So what?
Watch out whom you hire!
Watch out for how you market yourself.
Watch out for what the tools can do.
Goodie Time! Exclusive Gifts
Here are some special goodies for my readers:
👉 The Data-Heavy Product Idea Checklist - Got a new product idea for a data-heavy product? Then, use this 26-point checklist to see whether it’s good!
The Practical Alternative Data Brief - The executive summary of (our perspective on) alternative data, filled with exercises to play around with.
The Alt Data Inspiration List - Dozens of ideas on alternative data sources to get you inspired!