☀️ Data Mesh Slack Community, Reverse ETL, Data Product SLAs; ThDPTh #9

Mar 04, 2021

Enter into data mesh learning mode, a new data tool category, and why you need data SLAs.

💥 Data will power every piece of our existence in the near future. I collect “Data Points” to help understand this future.

🚀 If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

This week I stumbled across three things: the data mesh learning Slack community, a primer on Reverse ETL, and Data Product SLAs.

This Week’s Three Data Points

1 🎄 Data Mesh Learning Slack Community

I stumbled over the data mesh learning Slack community, which is an excellent place to learn about data meshes and to ask questions of people who are already deeply involved in building these things. Scott Hirleman et al. did a great job of setting it up and provide a good framework for learning. It includes events and blog posts on the topic. So if you are thinking about building a data mesh, I strongly suggest you join the group.

2 Reverse ETL or Just Another Data Pipeline

New tool categories in the data world are important. We’re still at day 1 with data, so there is a lot more to come. A new entry is the category of “Reverse ETL” tools. In the data engineering world, you’re mostly focused on getting data into your data pot, transforming it, and serving it to end-users through some GUI or an API.

But recently, a lot of the requirements for data engineering teams sit at the other end: they are about sending OUT the transformed & merged data to other tools like Google Analytics, CRM systems, marketing automation suites, etc. You could write a new interface for each of these, but the number of interfaces multiplies pretty quickly and maintenance becomes a nuisance.

That’s what “Reverse ETL” tools are here to solve. The two articles linked below make a great primer together, because I find some of the points raised by the RudderStack team really important, especially their perspective on “events”.
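To make the "one interface per tool" problem concrete, here is a minimal sketch of what such a tool does at its core: it takes rows from the warehouse and pushes them to many destinations through one shared interface. All names here (Destination, InMemoryDestination, reverse_etl_sync, the field mappings) are made up for illustration and are not the API of any real Reverse ETL product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class Destination(Protocol):
    """Hypothetical shared interface: anything that can receive records."""

    name: str

    def push(self, records: list[dict]) -> int: ...


@dataclass
class InMemoryDestination:
    """Stand-in for a CRM or marketing tool.

    A real destination would call the tool's API here; this one only
    renames fields and stores the result so the sketch stays runnable.
    """

    name: str
    field_map: dict[str, str]  # warehouse column -> field name in the tool
    received: list[dict] = field(default_factory=list)

    def push(self, records: list[dict]) -> int:
        mapped = [
            {target: row[source] for source, target in self.field_map.items()}
            for row in records
        ]
        self.received.extend(mapped)
        return len(mapped)


def reverse_etl_sync(rows: list[dict], destinations: list[Destination]) -> dict[str, int]:
    """Push the same transformed rows to every destination.

    Returns the number of records sent per destination name.
    """
    return {dest.name: dest.push(rows) for dest in destinations}


# Usage: one transformed table, two tools, no tool-specific pipeline code.
rows = [{"email": "a@example.com", "ltv": 120.0}]
crm = InMemoryDestination("crm", {"email": "Email", "ltv": "LifetimeValue"})
ads = InMemoryDestination("ads", {"email": "user_email"})
sent = reverse_etl_sync(rows, [crm, ads])
```

Adding another tool then means adding one more destination with its field mapping, not one more hand-written pipeline.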

So these tools are great. However, I still see one problem with approaching things this way. Reverse ETL lengthens a “chain” that’s already way too long: data ingestion, cleansing, transformation, and now “Reverse ETL”. The problem is that the chain is built “orthogonal to the axis of change” and is thus not very agile.

I’d rather see multiple small “chains” (possibly dropping some chain segments) that can still use “Reverse ETL” as a service for pushing out data. But I’d like to see these small chains cut along the axis of change, that is, by business units, contexts, domains, etc. Of course, data meshes provide a general framework for building these “small chains”, but doing that within your current context is also quite possible.

Resources

A primer on Reverse ETL, by Astasia Myers
RudderStack’s response to “Reverse ETL”

3 🎁 Data Product SLAs /SLOs

I read both the great SRE book by Google and the data-focused follow-up “Database Reliability Engineering”. Both provide great concepts for SLAs & SLOs. The article by Barr Moses is a good reminder of why SLAs & SLOs are so important for data teams. And to be honest, I have the feeling that most data teams don’t have either.

In my mind, the most important part of having a specific Service Level Agreement (agreed together with stakeholders) or a Service Level Objective (possibly self-set) is the transparency it brings to the work of a data team. Because honestly, a bunch of my SLOs from my time as a data engineer would have read “provide 80% data accuracy” or “provide correct data on 60% of the weekdays”.

If these service levels aren’t even out in the open to talk about, nobody will tell your data team why working on data quality is essential, and the team will probably never understand it either.

On the other hand, if you’re happy with your service levels, then I recommend taking the tool called “error budgeting” and converting it into a “data quality budgeting” tool. The basic idea is that you agree with your stakeholders on certain data quality service levels, like “on 95% of the days, the data should be no more than 1 hour out of date”. If the indicator falls below this level, say to 90%, your team stops developing new features and works on quality until the service level is satisfactory again. That way, you can handle the tension that exists between feature development and quality.

Resources
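Here is a minimal sketch of that budgeting rule, assuming you record one freshness delay (in hours) per day. The names FreshnessSLO, compliance, and should_pause_features are made up for illustration, and the numbers are the ones from the example above, not a recommendation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FreshnessSLO:
    """Hypothetical service level: share of days on which data is fresh enough."""

    target: float           # e.g. 0.95 means 95% of the days
    max_delay_hours: float  # e.g. 1.0 means at most 1 hour out of date


def compliance(daily_delays_hours: list[float], slo: FreshnessSLO) -> float:
    """Fraction of days on which the delay stayed within the allowed maximum."""
    if not daily_delays_hours:
        raise ValueError("need at least one day of measurements")
    fresh_days = sum(1 for d in daily_delays_hours if d <= slo.max_delay_hours)
    return fresh_days / len(daily_delays_hours)


def should_pause_features(daily_delays_hours: list[float], slo: FreshnessSLO) -> bool:
    """True when the indicator is below target, i.e. the budget is used up."""
    return compliance(daily_delays_hours, slo) < slo.target


# Usage: 20 days of measurements, 2 of them late by 3 hours.
slo = FreshnessSLO(target=0.95, max_delay_hours=1.0)
delays = [0.5] * 18 + [3.0] * 2
level = compliance(delays, slo)             # 18 of 20 days were fresh
pause = should_pause_features(delays, slo)  # below the 95% target
```

The point of the rule is that the decision to stop feature work is made in advance, together with the stakeholders, instead of being argued anew every time quality slips.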

Barr Moses’s article on Data Product SLAs
Site Reliability Engineering, Google
Database Reliability Engineering

In Other News, and Thanks for Reading!

I share things that matter, not the most recent ones. I share books, research papers, and tools, and I try to provide a simple way of understanding all these things. I tend to be opinionated, but you can always hit the unsubscribe button!

By Sven Balnojan

Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.


Three Data Point Thursday
