Dataset-centric Viz, Machine Learnings, Analytics is doing OK; Three Data Point Thursday #85
I’m Sven and I’m writing this to help you (1) build excellent data companies, (2) build great data-heavy products, (3) become a high-performance data team & (4) build great things with open source.
Every other Thursday, I share my opinion on three pieces of content about the data world.
Let’s dive in!
Dataset-centric visualisation is superior to other kinds of approaches
We have no idea how people will react to any new ML solution
Knowing this is central to building ML solutions
Analytics is moving in the right direction; it just takes some time.
🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰
(1) The Economist on machine learnings (paywall)
What: The Economist asks how employees and customers feel about artificial intelligence. Job security, understanding AI, and how people react to AI all come up as relevant questions.
My perspective: I’m sharing this article because I agree with the author, and I think the topic deserves a lot more attention. The author cites a lot of recent research into all of these fields, and makes the point that the picture emerging is messy.
In short: we have no clear answers on any of these topics, and we have no idea how AI makes people feel.
And yet the success of any AI, or machine learning solution if you prefer that name, hinges on its acceptance by people.
No people, no AI. So understanding how people feel about AI & machine learning should be the first concern of any implementer, not the last.
Resource: https://www.economist.com/business/2023/02/02/the-relationship-between-ai-and-humans
(2) There is no bad feedback loop in data
What: Robert Yi, CPO of hyperquery, a sharable data notebook start-up, explains why he thinks there is one negative feedback loop causing a lot of what’s wrong with analytics today. That feedback loop, he suggests, is driven by an over-emphasis on technical work. Indeed, he thinks the technical focus of a lot of data teams is received well, and thus data teams become even more technical.
He also argues that speed is overrated and that tools should be less technical.
My perspective: I think the analysis is off.
Yes, data teams currently have an over-emphasis on “technical work”. But it's not driven by a negative feedback loop at all. Quite the opposite: teams that feel “they do not quite contribute” are under the influence of positive feedback, nudging them in the direction of more outcome-driven work.
I don’t think there’s anything wrong with the state as it is, and we have a clear direction on where we’re going with it: data product management. Data work has historically been mostly technical. After all, that’s what a DBA (database admin) was hired for. But times have changed, data has become more important to the business, and now data product management is slowly emerging as a discipline to integrate data into the general business-oriented workflows of any company.
Robert argues that tools should be less technical. I disagree; in fact, my feeling is that data engineering tools should become more technical. Tools should be technical, and precisely for that reason should not carry too much context. A hammer is just that: a tool that leverages my force for certain tasks, a tool that I can use in many different ways, regardless of what I want to build. The context, the non-technical parts, come from me. A hammer should not tell you to “build a house for customer X”; it should drive nails into materials.
Robert also argues that speed is overrated. I think speed in data is underrated. Robert is right in saying we should provide knowledge instead of just raw data, but we should do so fast! And this should enable me to make faster decisions! This means we should always opt for providing less knowledge faster, rather than more knowledge slower.
So, let’s turn this around, what should an emerging data team focus on IMHO?
1. It should be led just like any other product team (see e.g. the data teams at Netflix for an example).
2. It should focus on delivering insights (not raw data) fast, favoring speed over comprehensiveness.
3. It should not be afraid to use tools as they are meant to be used, as solutions for simple tasks, not as solutions to business problems.
Resource:
(3) Dataset-centric Visualization
What: Maxime Beauchemin, CEO of Preset and creator of Apache Superset, discusses three different approaches to linking data and visualizations. He makes a good case for a “dataset-centric approach”.
My perspective:
A very simple explanation of dataset-centric visualization would be this: If I produce an “orders by customers per month” report in my BI tool, I could have
1. A query-based approach that does two SQL joins inside the BI tool.
2. A semantic-layer-based approach that joins orders to customers, keeps one time dimension, and does one join inside the BI tool.
And I could have a dataset-centric approach that basically does the work of (1), but puts the result into one big enriched table inside my database: data precomputed, nicely duplicated, etc.
I’ve always been a fan of this approach, for one thing above all: simplicity. To most data developers, it sounds counterintuitive if I say this is simple, but it is, in the best sense of the word: it decouples! It reduces the overall system complexity.
It enables you to have business logic inside the transformation tooling of your choice, and thus versioned, subject to testing and all your good practices.
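To make the dataset-centric idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the table and column names are hypothetical, an in-memory SQLite database stands in for the real data store, and the month is derived from the order date rather than joined from a separate time dimension. It is not how Superset or any particular BI tool does it; it only shows the join being done once, in the database, so that the report itself becomes a join-free aggregation.

```python
import sqlite3

# Hypothetical toy schema; SQLite stands in for the real data store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date  TEXT,  -- ISO date, e.g. '2023-01-15'
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES
    (10, 1, '2023-01-05', 100.0),
    (11, 1, '2023-01-20', 50.0),
    (12, 2, '2023-01-11', 75.0),
    (13, 1, '2023-02-02', 20.0);

-- The "dataset": one wide, denormalized table, precomputed in the database.
-- In practice this statement would live in your transformation tooling.
CREATE TABLE orders_enriched AS
SELECT
    o.order_id,
    o.order_date,
    strftime('%Y-%m', o.order_date) AS order_month,
    o.amount,
    c.customer_id,
    c.name AS customer_name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id;
""")


def orders_by_customer_per_month(connection):
    """The query a BI tool would run: an aggregation with no joins."""
    return connection.execute("""
        SELECT customer_name, order_month,
               COUNT(*) AS n_orders, SUM(amount) AS revenue
        FROM orders_enriched
        GROUP BY customer_name, order_month
        ORDER BY customer_name, order_month
    """).fetchall()


report = orders_by_customer_per_month(conn)
for row in report:
    print(row)
```

The customer name is duplicated on every order row, which is exactly the trade-off: a bit of redundancy in exchange for a BI layer that needs to know nothing about how the tables relate.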
I love how Maxime offers pros and cons for all approaches, but only pros for the dataset-centric one ;)
Note: Of course you could leverage the capabilities of a BI tool to produce the “dataset”, but that would kind of ruin the decoupling. So my advice: do this, and do it at the data store level.
Resource: https://preset.io/blog/dataset-centric-visualization/
Shameless plugs of things by me:
New things from this week!
“There ain’t no such thing as a free lunch — 8 Characteristics of Great Open Source Projects”
“If You Only Read A Few Data Articles In 2023, Read These” is the collection of the best articles in 2022 from this newsletter.
Books, courses and general writing:
Check out Data Mesh in Action (co-author, book)
and Build a Small Dockerized Data Mesh (author, liveProject in Python).
And find me on Medium for more unique content.
How was it?
I truly believe that you can take a lot of shortcuts by reading pieces from people with real experience that are able to condense their wisdom into words.
And that’s what I’m collecting here, little pieces of wisdom from other smart people.
You’re welcome to email me with questions or raise issues I should discuss. If you know a great topic, let me know about it.
If you feel like this might be worthwhile to someone else, go ahead and pass it along. Finding good reads is always a hard challenge, and they will appreciate it.
Until next week,
Sven
P.S.: I’m on vacation the next two weeks, so no editions there. Next one is a Thoughtful Friday on the 3rd of March!