🐰 #24 Measuring data teams, dagster & airbyte & meltano again; ThDPTh #24 🐰
How to measure a data team’s success, why dagster is a kool tool, and how Airbyte compares to Meltano in the EL(T) open-source space.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1)🔮 Measuring Data Teams
Einat Orr, co-founder of lakeFS and long-time data hero, makes a good case for using meaningful metrics to evaluate data teams. The three metrics she suggests are:
Data quality
Data uptime
Data development velocity
She argues well for all three and gives some insights on how to treat each metric. I like that approach and find it feasible, but I still prefer to stick to the “good old stuff”, which simply treats data teams as development teams. As such, we should really focus on the four key (DORA) metrics first:
Deployment frequency
Lead time for changes
Mean time to restore
Change fail percentage
Looking at this list, I’d rather focus on the foundations: teams that are quick to fix mistakes and, as a result, have high data quality; teams that react quickly to changes in the environment and deploy frequently and, as a result, have high development velocity; and so on. But I’m sure both approaches work well.
Know your KPIs. Data engineering leaders should measure quality, uptime, and velocity to ensure their teams are operating effectively.
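The four key metrics are simple enough to compute from a log of deployments. Below is a minimal sketch, assuming a hypothetical `Deployment` record (commit time, deploy time, whether it failed, when service was restored); it uses the median for lead time and a simplified time-to-restore measured from the failing deploy. Neither choice comes from the article; real setups usually pull these timestamps from CI/CD and incident tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record schema)."""
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four key metrics for deployments within one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Simplification: time to restore is measured from the failing deploy.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times) if restore_times else None
        ),
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }


# Example: two deployments in a one-week period, one of which failed.
t0 = datetime(2021, 10, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True, restored_at=t0 + timedelta(hours=5)),
]
metrics = four_key_metrics(deploys, period_days=7)
```

Here the lead times are 2h and 4h (median 3h), the failed deploy is restored after 1h, and half of the changes failed, so the change fail percentage is 50.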
(2) Dagster
I’ve been playing around with dagster lately, comparing it to Prefect and Airflow, and I’ve come to like it. Two things make dagster so much fun. Of the data orchestrators currently in vogue, it’s the one that:
Has the most compelling “vision”: being a true orchestrator that abstracts away the stuff below it
Is the most fun to develop!
What’s the vision? To orchestrate: basically, to build an overarching “DAG” regardless of your tool choice. You can use a Jupyter notebook, Spark, SQL, whatever; dagster doesn’t care. That resonates very well with how the typical data team works today, mixing several of these tools, and how most teams will very likely keep working.
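To make the “tool-agnostic DAG” idea concrete, here is a minimal sketch in plain Python. It is not dagster’s API: the step names and functions are hypothetical stand-ins for a SQL job, a Spark job, or a notebook, and the orchestrator only knows names, callables, and dependencies. It uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to run each step after its upstream steps.

```python
from graphlib import TopologicalSorter
from typing import Callable


# Hypothetical steps: in a real setup each would call a different tool
# (a SQL engine, Spark, a Jupyter notebook); here they just return strings.
def extract() -> str:
    return "raw rows"


def transform_sql(raw: str) -> str:
    return f"modelled({raw})"


def train_notebook(modelled: str) -> str:
    return f"model trained on {modelled}"


# The orchestrator sees only names, callables, and dependencies, not the tools behind them.
steps: dict[str, Callable[..., str]] = {
    "extract": extract,
    "transform_sql": transform_sql,
    "train_notebook": train_notebook,
}
deps: dict[str, list[str]] = {
    "extract": [],
    "transform_sql": ["extract"],
    "train_notebook": ["transform_sql"],
}


def run(steps: dict[str, Callable[..., str]], deps: dict[str, list[str]]) -> dict[str, str]:
    results: dict[str, str] = {}
    # static_order() yields each step only after all of its upstream steps.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[upstream] for upstream in deps[name]]
        results[name] = steps[name](*inputs)
    return results


results = run(steps, deps)
```

Swapping the notebook step for, say, a Spark job would only change the function behind `"train_notebook"`; the graph and the runner stay the same, which is the point of keeping the orchestrator above the tools.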
Why is it fun to develop? First and foremost, because dagster makes it easy to write tests! Tests for the smallest units, tests for the whole flow. You can mock data and quickly run things on your laptop, and you can easily swap environments to run against either an integration or a production environment. That’s made possible by outputs & inputs and a stronger system around the “metadata of the flow”.
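The testing pattern described here (pure units tested with mocked data, plus a whole flow run against a swappable environment) can be sketched without any framework. This is again plain Python and not dagster’s API; `Warehouse`, `InMemoryWarehouse`, `clean_orders`, and `orders_flow` are hypothetical names chosen for illustration. The flow receives its warehouse as a parameter, so a test passes an in-memory fake while integration or production would pass a real client.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Warehouse(Protocol):
    """Anything that can read and write tables; the flow does not care which one."""

    def read(self, table: str) -> list[dict]: ...

    def write(self, table: str, rows: list[dict]) -> None: ...


@dataclass
class InMemoryWarehouse:
    """Hypothetical test double that lives entirely on your laptop."""
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def read(self, table: str) -> list[dict]:
        return self.tables.get(table, [])

    def write(self, table: str, rows: list[dict]) -> None:
        self.tables[table] = rows


def clean_orders(rows: list[dict]) -> list[dict]:
    """Smallest unit: a pure function, testable with mocked data."""
    return [r for r in rows if r.get("amount", 0) > 0]


def orders_flow(wh: Warehouse) -> None:
    """Whole flow: read, transform, and write through whichever warehouse is injected."""
    wh.write("clean_orders", clean_orders(wh.read("raw_orders")))


# Unit test: mocked rows, no infrastructure needed.
assert clean_orders([{"amount": 5}, {"amount": -1}]) == [{"amount": 5}]

# Flow test: swap in the in-memory "environment"; for integration or production
# you would pass a client for the real warehouse instead.
wh = InMemoryWarehouse({"raw_orders": [{"amount": 3}, {"amount": 0}]})
orders_flow(wh)
```

Injecting the environment instead of hard-coding it is what makes the same flow runnable on a laptop, in integration, and in production.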
For now, I recommend simply taking a look at Mapbox’s journey, until I get around to writing a “test-driven dagster” tutorial.
At Mapbox, we’ve adopted Dagster without breaking compatibility with our legacy Airflow systems – and with huge gains to developer productivity.
(3)🎄 Comparing Airbyte and Meltano
I sometimes feel like the unofficial Airbyte evangelist… Robert Stolz of Preset wrote a comparison of the two major up-and-coming open-source data integration solutions, Airbyte and Meltano. He evaluates them for a smaller project, so the comparison doesn’t address the question of scale, but he does give a good introduction to both tools.
If you’re on the hunt for a new data integration tool, go read the article.
Airbyte and Meltano compared
🎄 In Other News & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people who you think might be interested in it.
I did get some good writing in this week, and was able to produce a piece I really like:
The data space is booming, with companies like mongoDB (valued at 18 billion USD)…
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.