Data and Predictions for 2023! ThDPTh #81

Dec 08, 2022

Barr Moses’ 7 predictions for data engineering in 2023 are spot on, and they support underlying trends that will last much longer. Tx Zhuo already described the major shifts happening now back in 2016. The State of IoT is still doing well, even though growth has slowed to 8%, and it supports the underlying forces of data.

Hi, I’m Sven, and this is the Three Data Point Thursday. We’re talking about how to build data companies, how to build great data-heavy products & tactics of high-performance data teams. I’ve also co-authored a book about the data mesh part of that.

🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰

(1) Barr Moses’ 7 Predictions for DE in 2023+

What: Barr Moses, Co-founder of Monte Carlo, shares her predictions for data engineering for 2023+. Her main theme is the focus on tools & practices that reduce the time on non-value producing activities.

Her seven predictions:

1. More time spent on “FinOps”, optimizing cost. (duh, not sure that qualifies as “value producing” ;))
2. Further specialization in data-engineering-like roles.
3. The data mesh + central team will be a thing.
4. ML models will make it to production more often and faster.
5. Data contracts will move to the early adoption phase.
6. Data warehouses and lakes will blur.
7. Teams will get quicker at resolving data quality issues.

My perspective: I wholeheartedly agree with Barr! It was a pleasure to read her predictions, and I recommend them to everyone. I particularly like that her advice is actionable and always well grounded.

Since I do have a different time horizon in mind when making predictions (10+ years), I’d like to share a few additional thoughts on three of her predictions.

#1 FinOps.

The cost of data is often opaque, especially if you still own “big old servers” sitting around in a data center. Barr thinks next year we will see both more transparency and a reduction of cloud costs for data.

Cost of ECS cluster running ingestion tooling X$$$
Cost of orchestrator Y$$$
Cost of data lake Z$$$
Cost of data warehouse D$$$

Usually this results in a breakdown like the one above, followed by an analysis and then possible reductions.

Examples would be: “start moving data to archives, use a different data warehouse, reduce idle time on compute”, etc.

While this is valuable, with data becoming more and more important to the business, I’d just like to remind people that what we would really love to have is a breakdown that looks more like this:

Complete cost of management dashboards
Complete cost of the recommendation system
Complete cost of datasets X,Y,Z

(The difference: the first breakdown runs along technical seams, the second along business seams.)
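To make the difference concrete, here is a minimal sketch of how a technical breakdown could be re-cut along business seams. Everything in it is an assumption for illustration: the component names, the cost figures, and especially the usage shares, which in practice you would have to estimate from query logs, job metadata or similar.

```python
# Hypothetical monthly costs per technical component (illustrative numbers only).
technical_costs = {
    "ingestion_cluster": 1000.0,
    "orchestrator": 500.0,
    "data_lake": 2000.0,
    "data_warehouse": 4000.0,
}

# Assumed share of each component's usage per business product.
# The shares of each component must sum to 1.0.
usage_shares = {
    "ingestion_cluster": {"management_dashboards": 0.25, "recommendation_system": 0.5, "datasets_xyz": 0.25},
    "orchestrator": {"management_dashboards": 0.5, "recommendation_system": 0.25, "datasets_xyz": 0.25},
    "data_lake": {"management_dashboards": 0.25, "recommendation_system": 0.5, "datasets_xyz": 0.25},
    "data_warehouse": {"management_dashboards": 0.5, "recommendation_system": 0.25, "datasets_xyz": 0.25},
}


def allocate_by_business_seam(costs, shares):
    """Distribute each technical cost across business products by usage share."""
    totals = {}
    for component, cost in costs.items():
        component_shares = shares[component]
        if abs(sum(component_shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"Shares for {component} do not sum to 1.0")
        for product, share in component_shares.items():
            totals[product] = totals.get(product, 0.0) + cost * share
    return totals


business_costs = allocate_by_business_seam(technical_costs, usage_shares)
for product, cost in business_costs.items():
    print(f"Complete cost of {product}: {cost:.2f}")
```

The total spend is the same in both views. Only the second one tells you what the recommendation system or the management dashboards actually cost you.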

With data becoming more and more important, this will increasingly become the case anyway, so I support this trend 100%. Just don’t forget where you want to end up.

There’s no good reason to “move all data to archive after 1 year”, but there is a good reason to do this for datasets that do not power the key decision maker dashboards.
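As a rough sketch of such a rule, assuming a hypothetical dataset catalog that records the age of each dataset and whether it powers a key dashboard (neither the fields nor the names come from Barr’s article):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    age_days: int
    powers_key_dashboard: bool  # assumed flag, maintained by the data team


def should_archive(dataset: Dataset, max_age_days: int = 365) -> bool:
    """Archive only data that is old AND not behind a key decision-maker dashboard."""
    is_old = dataset.age_days > max_age_days
    return is_old and not dataset.powers_key_dashboard


# Hypothetical catalog entries for illustration.
catalog = [
    Dataset("clickstream_raw_2020", age_days=800, powers_key_dashboard=False),
    Dataset("revenue_history", age_days=800, powers_key_dashboard=True),
    Dataset("orders_current", age_days=3, powers_key_dashboard=False),
]

to_archive = [d.name for d in catalog if should_archive(d)]
print(to_archive)
```

The business flag, not the age alone, decides what gets archived.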

#2 Special roles.

Barr thinks the next few years will call for more DataOps engineers, data reliability engineers, data product managers, and overall more specialization inside the data engineering job world.

I do want to call out the much larger underlying trend that I see powering this change:

The convergence of software engineering & data work.

Increasingly, the future will be shaped by the rise of the value of data. This has two implications:

Software engineers will need to know a lot about data, as all apps will be about moving and computing on large amounts of data.
Data engineers will end up building high-performance distributed apps all day long and, as such, will need to understand more about software engineering.

This will lead the engineering (both software and data!) world to look much more like:

A smaller fraction of software-only engineers
A large fraction of software-data engineers able to build large-scale data processing products (all products will be of this kind in the future)
A small fraction of data-only engineers

I think this will be a true convergence where both disciplines move closer together, not just a one-way move: data developers learn the chops of building high-scale, high-performance products, and software engineers learn the chops of data.

With that in mind, if you look at Barr’s predictions, all of the roles already exist in the software world. Barr’s predictions are basically one step in this convergence.

It kind of looks like DataOps was step 1,
step 2 was the (still ongoing) introduction of product management (fuelled by Zhamak’s landmark articles & book on the data mesh),
and this will become step 3 of this convergence of the engineering world.

#3 Meshy data with a (permanent) pit stop.

Barr outlines a permanent “pit stop” on the journey towards the data mesh, one that most companies will feel quite comfortable with.

I 100% agree with Barr on this one. Data meshes are the future of any company, because the business will drive their adoption. But this is going to take time, a lot of time.

Barr describes a “pit stop” with a platform team that works together with an embedded model. It’s the model we ran at Unite, and in my experience for the bulk of companies, this model will work out just fine for the foreseeable future.

The underlying trend here is the decentralization of data and use cases, which will drive the adoption of the data mesh. The inertia in the culture of most companies is what will land them in this kind of “pit stop” along the way.

Caveat: I do think startups have a chance to go all in on the data mesh from day 1 and get a leg up on incumbents. Don’t miss out on it if you can (because as a startup, you don’t have the cultural inertia mentioned above!).

Resource: https://towardsdatascience.com/whats-next-for-data-engineering-in-2023-7-predictions-b57e3c1bf2d3

(2) Big Data => Fast Data

What: Tx Zhuo, a VC, shares how moving from big to fast data can help companies. He shared this in 2016. His three pieces of advice are to empower all employees to use data, use multiple sources of data, and to use data proactively.

My perspective: The most astonishing thing about this article is that it’s from 2016, yet it reads as if it could’ve been written today, in 2022. And because data is growing exponentially, the numbers Tx mentions only become stronger. From a 2022 perspective, we have:

90% of all data on earth has been created roughly in the last 365 days.
50%+ of all data sources on earth went “live” in the last 720 days.

So what do you do with this knowledge?

Three things first and foremost:

1. You focus on real-time and recent data.
2. You focus on getting more data sources on board.
3. You focus on speed, which, as Tx explains, means the speed of turning data into value (not the speed of collecting data).

These numbers are only getting stronger. If you don’t follow, you will likely fall behind, until one day you notice that 90% of all data was created in the last month while you’re still analyzing data from a year ago.

Sadly, I don’t see many companies doing this. But the ones that do, thrive. (Yes, the Amazons and the Netflixes of the world incorporate all of these ideas at the core of their data strategies, and those in turn are simply their business strategies.)

Resource: https://www.entrepreneur.com/science-technology/big-data-is-no-longer-enough-its-now-all-about-fast/273561

(3) State of IoT

What: IoT Analytics GmbH shares its insights on growth, investment and further forecasts for the IoT. Due to the pandemic and the war in Ukraine, growth is lower than expected.

My perspective: I care deeply about the IoT because it is the major driver of data growth in the coming decade. Some insights I found interesting:

Growth is still 8% YoY
Investment into IoT went from 266m to 1.2b USD in Q1 YoY. More than a fourfold increase. That’s big.
Most of the investments center around analytics, AI and security.

I assume the decline in growth and the negative trends are more than offset by the huge shift to digital that also happened due to the pandemic.
So my internal forecast for data growth is still as bright as ever (exponential until further notice, doubling roughly every 3 years).
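For a feel of what “doubling roughly every 3 years” implies, here is a small back-of-the-envelope calculation. It assumes smooth exponential growth with a fixed doubling time, which is my own simplification and not a figure from the IoT report.

```python
def growth_factor(years: float, doubling_time_years: float = 3.0) -> float:
    """Factor by which data volume grows over `years`, given a fixed doubling time."""
    return 2 ** (years / doubling_time_years)


# Implied year-over-year growth rate under the doubling assumption.
annual_rate = growth_factor(1) - 1

# Growth over a 10-year horizon.
ten_year_factor = growth_factor(10)

print(f"Implied annual growth: {annual_rate:.0%}")
print(f"Growth over 10 years: {ten_year_factor:.1f}x")
```

Under that assumption, data grows by roughly a quarter every year and by about a factor of ten over a decade.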

Resource: https://www.iotforall.com/state-of-iot-2022


Want to recommend this or have this post public?

This newsletter isn’t a secret society, it’s just in private mode… You may still recommend & forward it to others. Just send me their email or ping me on Twitter/LinkedIn, and I’ll add them to the list.

If you really want to share one post in the open, again, just poke me and I’ll likely publish it on Medium as well so you can share it with the world.
