TimescaleDB and the Quest for the Ultimate Time Series Database: Growth, Challenges, and Strategic Moves
This is the Three Data Point Thursday, making your business smarter with data & AI.
Let’s dive in!
Actionable Insights
If you only have a few minutes, here’s what you can learn from the time series database company Timescale:
The data of tomorrow is 90% real-time, unstructured, and time series-based. And we have surprisingly little technology to handle that at all. Data people shy away from it, but software engineers are starting to engage in it.
The time series space is moving into fast water. Both companies and customers will be plunged into it, so better watch out! Some killer applications will emerge that we can all use!
The big question in fast water is: Where is the true explosive demand coming from? What are the big things you can build as a company? Timescale is about to face that exact question. It is currently caught between two stools: Do they dig deeper into time series, or do they use their leverage to expand into other areas?
More often than not, the competition in the data space isn’t the competitor; it’s DIY! The pattern is similar across software and data engineers; both love to employ open source and often default to DIY solutions because the market cannot catch up.
Open-source companies often get stuck selling to experienced software/data/… engineers and then experience slowing growth. A vicious cycle sets in: open source projects attract people who can contribute and launch the tool independently. That means, by definition, those people are not a good fit for a managed solution. So, only a tiny percentage will become paying customers. This looks like an initial burst in customers but quickly slows down. It’s a dilemma Timescale faces right now.
The next wave of growth for data startups might happen with a customer base of software engineers, not data people. Software engineers work directly on the value stream, while data people are often removed from it. Software engineers are the most avid users of technologies like TimescaleDB in the time series space. There might be more openings for new products here!
Ajay Kulkarni is a name I hadn’t heard of before 2018 when I stumbled upon his insightful article on Commercial Open-Source Software (COSS for short) business strategies. As it turned out, Ajay had skin in the game. He was and still is the CEO and co-founder of Timescale, a company building the TimescaleDB open-source project.
Fast forward to today, and Ajay’s company, Timescale, has delivered on the promise of successfully navigating the COSS space with a huge open-source project and a successful business model, achieving unicorn status in early 2022 with its latest funding round. With that, Timescale is already one of just a handful of companies managing the open-source/business straddle position.
But what makes me return to Timescale today isn’t open-source; it is what’s about to come for that company: I believe Timescale is about to enter fast-moving water—the space in business where it will feel like businesses are getting swept away by currents so strong that you’re either growing extremely fast or going broke.
The one key question I want to dive into today is: Is Timescale able to handle the coming torrent? Will they be swept away, or will they catch the current and grow absurdly quickly?
Let’s take a look into the history of Timescale first.
A short history of Timescale
Early in 2017, Ajay Kulkarni and Mike Freedman, a Princeton professor focused on distributed systems, got together to launch something exciting: an open-source project called TimescaleDB. They secured an early seed round of $3.7m from NEA and created a company around the project in April of that year.
“We realized that this problem needed a new kind of database, and so we built TimescaleDB: the first open-source time-series database designed for scalability and complex analysis.” - Ajay in a Medium post back in 2018.
Nine months later, Timescale already had 100,000 downloads, $16m in total funding, and a vision that would become reality over six years. Fast forward to today: the GitHub project has 16k stars, the company has over 500 paying customers (numbers from 2022), and its community has over 10k Slack users. Oh, and the company is valued at over $1b.
The time has come
Image from the original post from Ajay, 2018.
I believe a confluence of three trends is creating a unique moment for Timescale:
the exponential growth of time series data,
the growing realization in the software engineering domain that time series-based applications are essential,
and the democratization of time series analysis.
Let’s dive into each in detail.
In 2020, I studied the future of the shape of data. By that, I mean the shapes and types of data that will matter tomorrow.
From that day on, one thing was clear: We must bet on different tech! The new data coming our way is 90% real-time, time series data, and unstructured (look at this article to understand what I mean by the word unstructured).
Yet there is surprisingly little (mainstream) tech to handle this data type! Indeed, data and analytics teams are miles away from even touching this kind of data. Software teams, the user base that Timescale targets, have realized the value of this data and are building serious products on top of it. Still, they are also stumbling over the incredible challenge that comes with it: Reliably handling huge amounts of data in almost real-time.
Software engineers demanding tech to build time series-based applications is already a strong sign of change, and the fact that time series data is growing exponentially strengthens that trend.
But these two trends alone don't make for a discontinuous jump. What caught my attention earlier this year was a fun technology called TimeGPT. While that piece of tech in itself isn’t relevant, the advancements it signals are: We’re about to arrive at a time when working with time series data won’t require heavily trained specialists anymore. Working with time series data will be democratized.
In other words, the cost of working with time series data is falling exponentially in discontinuous jumps, while the amount of time series data available is growing exponentially (smoothly).
This confluence will produce a series of discontinuous jumps inside the market. It will produce fast water.
A detailed look into Timescale
Amazingly, Timescale has relied on one single product for almost 6 years: a managed version of TimescaleDB.
To complement this product, Timescale has TimescaleDB, the open-source version thousands of people run in production, and a 30-day free trial version of the managed option on AWS.
Most of the features of Timescale have moved into the so-called “community edition,” which has a special license, the Timescale License, essentially stopping AWS from stealing and hosting TimescaleDB. It’s free for personal use. A few days ago, the company relaxed the license to make it even more convenient for self-hosters. That’s a strong move and a good sign the company is still aligned with its open-source community. Once COSS companies start selling a managed service, they often complicate self-hosting indirectly to drive more customers into their managed service.
Getting technical on the product: Timescale is “basically just a Postgres extension.” But that is a huge understatement, as Postgres is a “bare bones” kernel that thrives on extensions. The key feature is the addition of so-called “hypertables,” a new kind of table for time-series data. Hypertables auto-partition and auto-compress, making it easy to insert, update, and retrieve thousands of rows each second.
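To make the auto-partitioning idea a bit more tangible, here is a toy sketch in plain Python. It is not TimescaleDB's implementation or API; the class name `ToyHypertable`, the helper `chunk_start`, and the 7-day chunk width are all assumptions I made up for illustration. The point is only the principle: the caller writes to one logical table, rows get routed into time-based chunks behind the scenes, and a time-range query can skip every chunk that cannot contain matching rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical chunk width, chosen only for this illustration.
CHUNK_INTERVAL = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, interval=CHUNK_INTERVAL):
    """Return the start of the fixed-width time chunk that contains ts."""
    whole_intervals = (ts - EPOCH) // interval  # integer number of intervals
    return EPOCH + whole_intervals * interval


class ToyHypertable:
    """One logical table whose rows are stored in per-interval chunks."""

    def __init__(self, interval=CHUNK_INTERVAL):
        self.interval = interval
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, value)

    def insert(self, ts, value):
        # The caller never picks a chunk; routing happens automatically.
        self.chunks[chunk_start(ts, self.interval)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, skipping irrelevant chunks."""
        rows = []
        for c_start, c_rows in self.chunks.items():
            if c_start + self.interval <= start or c_start >= end:
                continue  # this chunk cannot hold rows in the range
            rows.extend(r for r in c_rows if start <= r[0] < end)
        return sorted(rows)
```

With a month of daily rows inserted, a two-day query only has to look inside the one or two chunks that overlap the requested range. That chunk skipping is the intuition behind why partitioning by time helps at scale.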
It looks like the managed version isn’t very different from the open-source community edition, which follows the GitLab model.
New stuff on the horizon: While the product stayed the same for 6 years, Timescale alluded to broadening its scope in its 2022 fund-raising announcement. Today, it has two additional products in early access: an ML/AI-focused vector embedding database and a general-purpose managed PostgreSQL.
Customers and competition
So, who’s using TimescaleDB? At an individual level, the answer seems clear: 90% of the users are software engineers who need to scale the backbone of their application.
“Messari used InfluxDB for a vast amount of ingestion and continuous aggregation in addition to around-the-clock queries for user requests. [...] Moving to Timescale has improved Messari’s performance. They saw an average of ~40 ms improvement in INSERT and SELECT queries.” - Messari customer story
These users develop applications that hold terabytes or petabytes of data, powering applications in finance, IoT, or analytics. These applications process up to a thousand queries per second.
On top of that, 10% of the users seem to be data engineers doing lots of time series work like predictions and forecasting.
Based on the cost estimator, I expect most decent applications built on top of Timescale to pay roughly $1k-$4k/month for Timescale.
That also means the buying cycle for Timescale is complex, likely involving:
Software engineers trying out Timescale either DIY with the open-source version or the 30-day trial
CTOs/PMs approving the acquisition costs following a multi-month sales cycle of prototyping and alternative comparison
The software engineer's need is pretty clear: They have likely already built an application using either PostgreSQL or another standard data store technology, but they cannot get to the performance levels they want. This usually means they work in real-time settings. The two most prominent ones I could make out are finance and IoT.
But this is also where the competition comes in. By far, the most common alternative to TimescaleDB is DIY! Timescale is open-source, so it targets developers who want to get their hands dirty. That means experienced developers who can launch an open-source project into the Cloud and maintain it for the duration of a prototype themselves.
Timescale has serious competition there. There are multiple other open-sourced time series databases, and PostgreSQL has multiple extensions that make working with time series easier.
On the other hand, commercial vendors like PingCAP, valued at $3B, offer a managed TiDB with essentially the same promise. There’s also InfluxData, valued at $1B, with essentially the same sales pitch.
However, in an exploding market, competitors aren’t the real problem; competition might be. It’s a growing pie, and every developer will make do somehow, so the question for Timescale and all other players in the market is essential: How do I grab the biggest piece of a quickly growing pie?
The potential options for Timescale’s future development
We’re already seeing the current strategy of Timescale: probing additional horizontal use cases such as:
General purpose managed PostgreSQL instance
AI/ML use cases that need vector embeddings
But what other options are there? Which makes the most sense, given the potential for a huge surge in demand for time series-based application development?
Let’s talk about five options that are at the top of my mind:
1) Expand/shift horizontally to data teams (that currently only make up 10% of the users)
2) Keep on working with the segment they target, DIY-friendly, OS-affine developers
3) Expand to non-DIY-friendly developers
4) Expand/shift horizontally to other use cases like AI, ML, or others!
5) Expand vertically to strengthen the time series use case, e.g., narrowing in on finance/IoT
1. Why data teams are no good
As a former data PM and an enthusiast for data teams, I often look to data teams as a first solution. But it is clear that they aren’t the ones building the future of data & AI applications; it’s software engineers who do. And I don’t see a reason to count on a change there.
Given that, I believe the future market demand will grow and continue to come from software engineers and their product teams looking to build data-heavy products, not data scientists or data engineers doing forecasting-like use cases.
2. Why the current direction won’t hold forever
Given that Timescale reported growth numbers in 2022 but not in 2023, I assume the growth of the paying customer base (500 in 2022) has slowed. My best guess is that it stands at around 1,000 paying customers. That’s a typical story in a market situation like this, where open source plays a significant role and targets DIY-friendly users.
Marketing to DIY-friendly, OS-affine developers has a few pros. First, time series databases are opaque, so it is hard to see how they perform on your specific use case without trying them out. Open source is a great way of “just trying out.”
Also, the open-source project is vast, with 16k GitHub stars, making it a great marketing vehicle.
But there are also serious downsides. The open-source tool is only used by very experienced software engineers (SEs) who can launch and maintain it themselves, even if it’s just for a test. The managed solution is likely primarily used by a small fraction of these teams that want to offload the maintenance. That makes the conversion rate from OS to managed Timescale pretty low.
For most other developers, a managed TimescaleDB simply is too opaque. It is not the market leader in any significant market segment, so most will not even try it out unless the process is super easy.
That’s to say, the initial market segment is tiny and likely saturated, which explains the slowing growth in that segment (if true, I don’t have internal insights).
To truly break into the segment of very experienced software engineers, Timescale would have to step up the game and really score on the performance metrics card throughout multiple use cases. That’s a hard task.
There’s, of course, a different adjacent option.
3. Expand to non-DIY savvy developers
Expanding to less experienced developers is likely a good choice, but it comes with tradeoffs. Less experienced developers also, on average, maintain less heavy applications. However, the entire reason for adopting TimescaleDB at the moment is performance at scale.
So, the Timescale team would have to double down on ease of use, thinking about decoupling a part of the feature set from the open-source version, which could potentially produce alignment issues with their community.
The question is also whether the market of less experienced development teams is big enough to make the pricing model work.
4. Expand/shift horizontally to other use cases like AI, ML, or others!
Given slowing growth, looking for other segments offering more growth opportunities is tempting. But adding a second segment also cuts the speed in one segment in half, and usually much more than that. Losing focus is one of the worst things a startup can do while it is looking to dominate its first market segment.
The more general question is: Where do we think the future demand for custom databases comes from?
AI & ML use cases are super hyped at the moment, but from my experience, it looks like vector embeddings are a niche case in a much larger picture. It is unclear that everyone needs to have their own vector embedding database; right now, I’d argue that’s only true for the most advanced ML developers - a tiny market.
But, returning to the confluence of the three trends, I believe time series is a good source of future demand!
5. Expand vertically to strengthen the time series use case
The problem with catching such trends is that your product is always built with assumptions, which change quickly in fast waters.
The only solution is to return to the core of your customers' problem: How do I build a reliable real-time application based on time series data?
Using a time series database might be part of the picture, but in general, the picture software engineers have in their heads is much bigger.
If you go through the list of use cases Timescale publishes on its website, you’ll see a plethora of examples. They cover a vast set of different industries with only a few commonalities. It looks like the only ones they share are “vast data” and “real-time applications.”
One option to drill down on the core problem is to pick 1-2 use cases and build vertically deeper technology. For instance, the customer stories contain 2-3 finance and crypto use cases. Finance makes for vast data and almost always needs real-time applications, so these market segments make for a good starting point.
For that use case, Timescale could provide a dashboarding solution or plain preaggregated dashboarding data with triggers to update dashboards on the fly. Timescale could also provide APIs built on top of its data.
Both would make it much simpler for software developers in the financial sector to go from 0 to 1 on a new application.
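To illustrate what "preaggregated dashboarding data" that updates on the fly could mean, here is a minimal sketch in plain Python. It is my own illustration, not a Timescale product or API; the class name `PriceRollup`, the one-minute bucket, and the chosen aggregates are all assumptions. Each incoming data point updates a small running summary for its time bucket, so a dashboard reads a handful of finished rows instead of scanning raw ticks.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class PriceRollup:
    """Keeps count/sum/min/max per time bucket, updated on every insert."""

    def __init__(self, bucket=timedelta(minutes=1)):
        self.bucket = bucket  # hypothetical bucket width for the dashboard
        self.buckets = {}     # bucket start -> running aggregates

    def add(self, ts, price):
        """Fold one data point into its bucket; return the bucket start."""
        start = EPOCH + ((ts - EPOCH) // self.bucket) * self.bucket
        agg = self.buckets.get(start)
        if agg is None:
            self.buckets[start] = {
                "count": 1, "sum": price, "min": price, "max": price,
            }
        else:
            agg["count"] += 1
            agg["sum"] += price
            agg["min"] = min(agg["min"], price)
            agg["max"] = max(agg["max"], price)
        return start

    def series(self):
        """Dashboard-ready rows: (bucket start, average, min, max)."""
        return [
            (
                start,
                self.buckets[start]["sum"] / self.buckets[start]["count"],
                self.buckets[start]["min"],
                self.buckets[start]["max"],
            )
            for start in sorted(self.buckets)
        ]
```

The trade-off in this sketch is the usual one: a little extra work on every write, in exchange for reads whose cost depends on the number of buckets shown rather than on the number of raw data points.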
No one said it’s easy
I’m pretty certain of two things: There is a torrent coming, and Timescale needs to learn to surf it. Their best bet is to dig deep into the question of how to build time-series-based applications fast.
I have no idea which of the five options works best, but I can say there’s going to be trouble with the existing open-source community down the line. Timescale isn’t extracting the maximum it could from them, and future paths will likely not align with open-source growth.
That said, I’m amazed at how well Timescale has done so far in staying aligned with its community, and I hope this will continue.