This is the Three Data Point Thursday, making your business smarter with data & AI.
Let’s dive in!
0%. That’s roughly the crime rate in Dubai, or at least that’s what I heard a couple of weeks ago from a lively tour guide. Checking online told me that’s not completely true, but it’s close.
Still, I had a chance to test it out. I left my wallet on the front seat of a cab 26 hours before I had to leave for my flight. Lo and behold, 5 hours before departure, in the middle of the night, I found myself in a parking lot, getting my wallet back from the cabby I’d lost it to.
Uncontrolled experiment: Check.
The thing is, I didn’t think I would get my wallet back. Some knowledge you gotta acquire by doing, rather than by hearing.
Best practices in the data space
David Snowden published the Cynefin framework in 1999, and I still feel like no one really knows about it—it truly changed my perspective on words like “best practices.” It’s a simple classification of how we acquire knowledge in different circumstances, most notably in three domains: the simple, the complicated, and the complex.
In simple domains, you know what will happen if you do something. If I follow the iPhone repair tutorial on YouTube, my iPhone will likely be fine. That’s a best practice: it is what you should do, and it is what we all understand by “best practices.”
In complicated domains, I need an expert. If I build a house, I get an architect; he can analyze the surroundings and the ground and, using good practices, come up with a plan for a house that will last 100 years.
The catch? In exponentially fast-moving fields like data, there is little room for simple and complicated domains. We’re inside the complex domain, and almost always, there is no best practice. There is only one way to find out what you should do: act first, then sense what happens, then respond.
FWIW, sorry for publishing all those articles titled “Best practices of X.” I’m aware I’m making the problem worse myself.
Unstructured mess
New cleaning strategy: Let’s leave the messed-up rooms and stay in the clean ones we have.
Doesn’t sound like a good strategy for living, right? And yet, it’s what most data teams are told to do. The catch: the mess will grow. In 5 years, almost all data will be unstructured. We keep ignoring it, but it will suffocate us.
The unaware will be caught off guard, unable to deal with it. Teams and companies working with unstructured data today will experience a significant boost, and the rest will watch a true gap open up.
Which side of that gap do you want to be on?
Big data doesn’t matter
When I started out a decade ago, big data was hot. Now it’s merely a word found in old texts; people hype small data, smart data, DuckDB, that kind of thing.
And yet, every single industry-dominating company in tech dominates because of big data, literally because it has more data than anybody else: Airbnb, Google, Netflix. That holds true for enterprises and for startups alike.
Modern warfare is simply based on having more data (Russia, China, and the US have spent billions on acquiring it). The facts are simple: more data is better, less data is worse.
ChatGPT/gen AI-based products & features suck
I can count the good generative-AI-based products and features I know and use on one hand. I’d need the hands of almost all humanity to count the ones that aren’t any good. And I think you’re with me on this. There’s no good explanation; we just suck at utilizing gen AI. To me, it’s like it’s the 1800s: we’ve been handed a chainsaw, and we just keep trying to cut grass with it.
The data infrastructure of tomorrow
It’s mind-boggling to me how many data people think more components, APIs, and tools in a data stack are of any use. Most data teams are drowning in components. The data engineer seems to be the only type of developer who thinks more components are better: another transformation tool here, a profiler there, a testing framework over here…
They are not better. No business wants a database; everyone wants real problems solved, regardless of the technical details (yes, infrastructure is a detail). All builders in the data space have a choice: go deeply vertical into tech, or go deeply horizontal, meaning focus on a specific end user and solve a complete problem.
I’m not saying one is better than the other, but I am saying every builder has a choice, and the data space seems to have a bias here.
In the data space, founders & PMs are biased toward deeply technical products, and they shouldn’t be. They should see both worlds, not just one.
Data teams take their jobs very seriously…
It’s a simple spiral: a company hires a CIO or head of data to hand off the job of turning data into value. Like a CTO, he’s great at efficiently getting the most out of his department. But unlike a CTO, his team isn’t building the actual product; it’s removed from the actual business. That, by definition, caps the maximum value he can get out of data, because most of the value inside a company sits in its product, literally.
Thus, management will get the impression that there isn’t that much value in their data after all, reducing the impact the data department can have and pushing it into a spiral of irrelevance.
Now, there are two lessons for me inside that spiral. First, companies that don’t integrate data departments into product & business work won’t be able to utilize data at all.
Second, the default head of data should be a facilitator, facilitating change towards a company that integrates data into products and businesses.
Data departments shouldn’t do the job they are hired to do; they should facilitate change instead.
Data strategy
Products without a good data strategy struggle to survive.
95% of the data strategies I’ve seen are bad data strategies. I don’t mean they are sloppy or poorly conceived. Most of the ones I read, or have explained to me, are well thought out, well written, and well illustrated; people put a lot of thought and research into them.
And still, as strategies, they suck! They miss leverage and/or a set of guiding actions that utilize that leverage. And that goes for internal projects as well as for startups and data-heavy products.
The problem is, I don’t think it’s enough to be a great product manager or data leader; I think you truly need to understand strategy in the data space. You don’t have time to interview hundreds of customers before creating a new product. You need deep understanding and a good strategy, and then you need to move fast.
It is time
We’re at the intersection of two trends: temporal data is becoming analyzable by everyone, and this kind of data is becoming ubiquitous. It’s easy to miss, but we’re already there.
A year ago, time series modeling experts had to create a bespoke model for every single series. In a year, it will be a simple line of Python.
Sure, it will be based on ML, and it will only be 80% as good. But you know what? It will scale to everyone, and it will scale to the hundreds of time series available to you by then - an amount no expert can model by hand.
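To make that concrete, here’s a minimal sketch of what that shift could look like, using Nixtla’s open-source statsforecast as one plausible stand-in; the file name is hypothetical, and the exact call shape may differ across library versions:

```python
# Sweep one auto-tuned model over hundreds of series at once,
# instead of hand-crafting a bespoke model per series.
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Long format: one row per observation, with columns
# unique_id (series name), ds (timestamp), y (value).
df = pd.read_parquet("all_my_series.parquet")  # hypothetical file

sf = StatsForecast(models=[AutoARIMA(season_length=7)], freq="D")
forecasts = sf.forecast(df=df, h=14)  # 14-step-ahead forecast for every series
```

An expert will still beat the auto-model on any single series; the point is that no expert beats it on hundreds of series at once.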
These combinations of trends are hard to spot, but when they happen, they change industries.
Time series data analysis will disrupt the analytics space.
Getting emotional
“If someone told you that engineering was a field where you could get away with not dealing with people or feelings, then I’m very sorry to tell you that you have been lied to.” - Yonatan Zunger as quoted in The Alignment Problem by Brian Christian.
Computers are becoming better than humans at displaying and recognizing emotions. Emotional AI as a field is starting to pick up. This is the next hot field, and it will boom: you can use it in your technology products, you can use products built on it, and you can found businesses in this sector, and you won’t be disappointed.
Cybersecurity
I can almost hear your thoughts: boohoo, boooooring! And yet, I couldn’t be more excited about cybersecurity - or more scared.
I’ve learned two things. First, cybercrime is here to stay and will likely boom in the near future with personalized attacks, making it the single most economically destructive force on the planet (way ahead of wars and pandemics).
Second, cybersecurity will be democratized thanks to the same technology that will propel it forward: AI.
Sadly, these kinds of trends rarely run in parallel; they’re more like two intertwined cords. So, let’s prepare for some ups and downs!
Getting real (-time)
I’m gonna give you a data point: assume your top customer will churn with a probability of 90%. When would you like to have that data point? If it were me, I’d say “yesterday!” Because things happen. Competitors move on your customers; they steal them away if they get that data point first.
Your customer might simply be unsatisfied and leave without an alternative lined up. Whatever happens, your data loses value with every other event that influences the same object; it loses value every second, in big, discontinuous hops. There is only one way to ensure you get significant value out of data: act ASAP, on real-time data.
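As a toy illustration of what acting ASAP can look like (the topic name, message shape, and threshold are all hypothetical, and kafka-python is just one of several clients that would do):

```python
# React to a churn score the moment it is produced,
# instead of at the end of a nightly batch job.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "churn-scores",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for msg in consumer:
    event = msg.value  # assumed shape: {"customer": str, "p_churn": float}
    if event["p_churn"] >= 0.9:
        # The alert fires seconds after the score exists.
        print(f"ALERT: reach out to {event['customer']} right now")
```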
But I think real-time data is like the iPhone: if you don’t make it super easy to use and act on, if you don’t deliver it well, no one will be able to tell you they want it.
Someone is going to build it. There are enough creators in this world who don’t want to live in Blackberryland.
Real-time data matters and will be disruptive; I just don’t know through whom, or when.
Open up to close down
Open source is NOT going to democratize the data space, period. I wish it were otherwise. I argued (three years ago) that it would, but it very likely won’t.
What made me change my mind? Long and painful experience, watching the same cycle happen over and over - and then learning that Chris Dixon, in a different field, had already connected the dots.
Open source projects are platforms, and it is this platform property that gives them so much power that some investors invest only in them. And yet, it is also what makes them tip, every single time (in the data space).
Software in the data space is complex: there’s no simple protocol layer, no standard data stack. As such, all long-term successful software development in the data space, open source in particular, requires money, vast amounts of it, both as incentives for the platform and its developers and for the core team pushing the project in a coherent direction.
Every single time, companies thus end up building a fat, core-heavy platform built mostly by the company itself. So once the money starts to flow, all profits flow into the company that backs the open-source project, not to the platform participants.
Chris Dixon describes this cycle for platforms in general as the attract-extract cycle: first, attract through a massively open platform; then, once it is big enough, extract all the profits into the core.
Sadly, there’s no happy ending this week. Feel free to challenge me on this; I would love to be wrong.
Here are some special goodies for my readers
👉 The Data-Heavy Product Idea Checklist - Got an idea for a data-heavy product? Then use this 26-point checklist to see whether it’s good!
👉 The Practical Alternative Data Brief - The executive summary of our perspective on alternative data, filled with exercises to play around with.
👉 The Alt Data Inspiration List - Dozens of ideas for alternative data sources to inspire you!