Fast Moving Waters, Whaly.io, Competing on Cost; ThDPTh #77

Sep 29, 2022

I’m Sven, and this is the Three Data Point Thursday. We’re talking about how to build data companies, how to build great data-heavy products, and tactics for high-performance data teams. I’ve also co-authored a book about the data mesh part of that.

Time to Read: 6 minutes

Another week of data thoughts:

The modern data stack’s most important feature might be its modularity.
If you’re a data company, get into fast-moving waters; not all parts of the data world are moving as fast as others.
If you’re competing on cost with an open-source-based business, you might doom yourself.

🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰

Whaly.io, another BI start-up

What: Whaly.io is a new data start-up, at the seed stage, offering a “modern data stack out-of-the-box that grows with you”.

My perspective: I haven’t used the tool, but one concept I find interesting is their idea of the modular modern data stack and the business opportunity hidden there:

Whaly promises to take care of your complete modern data stack. But since the modern data stack is primarily about modularity, Whaly lets you switch off the parts it takes care of as you grow.

So, if you’re a one-person data team, Whaly has EL (extract & load), dbt, and BI covered and hosted. If you grow, you can use your own dbt, and if you grow further, you can also replace the EL part.

I like the idea, and it reminds me that the modern data stack’s most important feature might be its modularity.

Resource: https://whaly.io/

The Fast-Moving Waters

What: James Currier from NFX explains the idea of building businesses in fast-moving waters. In his view, companies need to move into the fast-moving waters to really succeed. Until they get there, they should stay small and nimble so they can move into the fast-moving waters quickly instead of being pulled down by the company’s own internal gravity.

My perspective: I like the concept. It seems related to what I referred to as the importance of “systems thinking” in the data space, which was also based on an NFX content piece. The data space overall is growing, but not every part of it is growing equally. Some parts are growing exponentially, while others are not. Be sure to pick one that is.

To emphasize this: the difference between being in an exponentially growing part and being in one that is not is tiny on day 1, and even in year 1, but it is humongous in years 3-5 and beyond.
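To make the compounding argument concrete, here is a minimal Python sketch. The growth rates (3% vs 1% per month) and the shared starting size are hypothetical numbers picked purely for illustration, not market data; the point is only the shape of the gap over time.

```python
def segment_size(start: float, monthly_rate: float, months: int) -> float:
    """Size of a market segment after `months` of compound growth at `monthly_rate`."""
    return start * (1 + monthly_rate) ** months


def compare(fast_rate: float, slow_rate: float, horizons: list[int]) -> dict[int, float]:
    """Ratio of the fast segment's size to the slow segment's size at each horizon (in months).

    Both segments start at the same size, so the ratio is 1.0 at month 0.
    """
    return {
        m: segment_size(1.0, fast_rate, m) / segment_size(1.0, slow_rate, m)
        for m in horizons
    }


if __name__ == "__main__":
    # Hypothetical rates: 3% vs 1% growth per month.
    horizons = [0, 12, 36, 60]  # day 1, year 1, year 3, year 5
    for months, ratio in compare(0.03, 0.01, horizons).items():
        print(f"month {months:>2}: fast segment is {ratio:.2f}x the slow one")
```

The ratio is itself a compound term, (1.03 / 1.01) raised to the number of months, so it starts at exactly 1 and keeps widening every month rather than settling at a fixed gap.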

Resource: https://www.nfx.com/post/find-the-fast-moving-water

Databricks “alternative” iomete and its doomed business model

What: iomete promises to be a low-cost, open Databricks alternative built on Apache Iceberg.

My perspective:

(1) Lakehouses all around

Lately, I keep stumbling over companies offering “lakehouse setups”. The lakehouse is a pattern popularized by the company Databricks. A very simple version of it would be:

“You’ve got a data lake? Great, let’s give it all the powers & guarantees of the data warehouse so you can do everything with it.”

To raise a data lake to a lakehouse, we need technology that adds the missing warehouse functionality. Dremio, for instance, advertises itself as “the lakehouse platform for teams that know and love SQL”, even though Databricks also has excellent SQL support and keeps rolling out SQL-related features. Still, it is true that Databricks initially targeted Spark developers.

The company iomete provides a lakehouse built on the same table format as Dremio’s, namely Apache Iceberg, but advertises its setup as “low-cost”, promising to always match AWS compute prices.

(2) The doomed business model

While I like the idea of an open lakehouse and support every effort to build one, I’m having a hard time seeing iomete succeed with its current business model.

“We bundle data volumes from individual customers, that allows [us] to obtain volume discounts from AWS by reserving server capacity in advance.” - (iomete pricing page)

Let’s peek into the future 5 years from now:

Option 1: iomete doesn’t attract mass, isn’t able to procure at high volume, and is thus forced to go bankrupt or change its business strategy.
Option 2: iomete does attract mass and is able to profit the way described above. The open-source combination proves successful, and customers enjoy the combination iomete offers. As a rule of thumb for this point in time, take dbt Labs, which serves somewhere between 5,000 and 10,000 customers, as a benchmark.
Consequence 1: Because iomete is competing on price, it can’t fund significant additional features. That isn’t the strategy, after all.
Consequence 2: The world notices this “free market research”, which validates that there is indeed a large set of customers looking for a low-cost lakehouse solution built on the features of this open-source stack.
Consequence 3: Any hyper-cloud (AWS, GCP, Azure) can now harvest the fruits and provide its own low-cost version of this open-source stack. In fact, any company can. After all, it is open, and 5,000-10,000 customers aren’t nearly enough to create the kind of network effect that fends off competitors.
Consequence 4: Thanks to sheer mass, both mini-clouds and hyper-clouds will be able to offer dramatically lower costs than iomete, plus one additional benefit: integration into their existing services. It is open source, after all!
Consequence 5: iomete’s customers will switch, and quickly, because they are not locked in and, by design, care first and foremost about low cost.
Consequence 6: iomete cannot change its business model at that point, because its entire customer base consists of cost-first customers and iomete hasn’t built up a “value feature” development muscle.

This is the path of cOSS (commoditization of open source services).

(3) The problem

I built at least two assumptions into this path, which may or may not hold.

Assumption 1: Customers in the data space are heterogeneous. The ones who make cost their priority are not very “close” (in the economic sense of preferences) to the others.
Assumption 2: Customers of hyper-clouds mostly care about two things: (1) price (yes, I feel the iomete team is wrong on this assumption) and (2) consolidation, i.e., integration with the cloud provider’s other services.

If these two assumptions hold, going down the road of commoditization, i.e., competing on price, is going to attract a lot of competition. And in markets where mass counts, competition is deadly. It also means that once you go down that road, it’s going to be hard to turn back.

(4) So how can they make it work?

Can we provide an open data lakehouse at a low cost? Yes, but I think the focus needs to be a little different. I see two options for achieving that:

Focus on a slightly different set of customers: the ones for whom price is a constraint, not a priority. (FWIW, their blog posts actually read like this is what they want to focus on; they just don’t know it yet.)
Focus on the same set of customers, but add consolidation-based features to the open stack they chose. In doing so, focus on buyer-important features and “circle the wagons” kinds of features.

Resource: https://www.iomete.com/

What did you think of this edition?

-🐰🐰🐰🐰🐰 I love it, will forward!

-🐰🐰🐰 Average newsletter...

-🐰 It is terrible (= I only made it to this link because I was looking for the unsubscribe button)

Want to recommend this or have this post made public?

This newsletter isn’t a secret society; it’s just in private mode… You may still recommend & forward it to others. Just send me their e-mail or ping me on Twitter/LinkedIn, and I’ll add them to the list.

If you really want to share one post in the open, again, just poke me and I’ll likely publish it on Medium as well so you can share it with the world.
