#24 Measuring data teams, dagster & airbyte & meltano again; ThDPTh #24

How to measure a data team's success, why dagster is a cool tool, and how Airbyte compares to Meltano in the EL(T) open-source space.
Data will power every piece of our existence in the near future. I collect "Data Points" to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) Measuring Data Teams
Einat Orr, co-founder of lakeFS and long-time data hero, makes a good case for using meaningful metrics to evaluate data teams. The three metrics she suggests are:
Data quality
Data development velocity
Data uptime
She gives some insights on how to treat each of these metrics. I like that approach and find it feasible as well, but I still like to keep to the "good old stuff", which simply assumes that good data teams are good development teams, and as such we should really focus on the four key metrics first:
Lead time
Deployment frequency
Mean time to restore
Change fail percentage
If you look at these, you can see that I would rather focus on the foundations: teams that are quick to fix mistakes end up with high data quality, teams that react quickly to changes in their environment and deploy frequently end up with high development velocity, and so on… But I'm sure both approaches work well.
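To make the four key metrics a bit more tangible, here is a minimal sketch of how they could be computed from a log of deployments. The `Deployment` record and its fields are hypothetical, made up for illustration; a real team would pull this data from its CI/CD and incident tooling, and might well prefer medians over means.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four key metrics over a period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours_between(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(
            hours_between(d.committed_at, d.deployed_at) for d in deployments
        ),
        "deployments_per_day": len(deployments) / period_days,
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else None,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }


# Sample history: four deployments over two days, one of which failed.
history = [
    Deployment(datetime(2021, 5, 3, 8), datetime(2021, 5, 3, 10)),
    Deployment(datetime(2021, 5, 3, 12), datetime(2021, 5, 3, 16),
               failed=True, restored_at=datetime(2021, 5, 3, 17)),
    Deployment(datetime(2021, 5, 4, 9), datetime(2021, 5, 4, 10)),
    Deployment(datetime(2021, 5, 4, 13), datetime(2021, 5, 4, 14)),
]
metrics = four_key_metrics(history, period_days=2)
print(metrics)
```

For this sample history, the sketch gives a lead time of 2 hours, 2 deployments per day, 1 hour to restore, and a change fail percentage of 25.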
How To Measure Data Engineering Teams
Know your KPIs. Data engineering leaders should measure quality, uptime, and velocity to ensure their teams are operating effectively.
(2) Dagster
I've been playing around with dagster lately, comparing it to Prefect and Airflow, and I came to like it. Two reasons make dagster so much fun. Of the data orchestrators currently in vogue, it's the one that:
Has the most compelling "vision", focusing on being a true orchestrator, abstracting away the stuff below it
Is the most fun to develop!
What's the vision? To orchestrate, basically to build an overarching "DAG" regardless of your tool choice. You can use a Jupyter notebook, Spark, SQL, whatever; dagster doesn't care. That resonates very well with what is currently happening in the typical data team and will very likely continue to happen in most teams.
Why is it fun to develop? First and foremost, because dagster makes it easy to write tests! Tests for the smallest units, tests for the whole flow. You can mock data and run things on your laptop quickly, and you can easily swap environments and run against either an integration or a production environment. That's made possible by outputs & inputs and a stronger system around the "metadata of the flow".
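To illustrate why explicit inputs & outputs make testing easy, here is a small sketch in plain Python. It deliberately does not use dagster's API; all function names are hypothetical. It only shows the underlying idea: each step is a function of its inputs, and the data source is passed in, so it can be swapped between a mock and a real environment.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def clean_orders(rows: List[Row]) -> List[Row]:
    """Smallest unit: drop rows without an amount. Depends only on its input."""
    return [r for r in rows if r.get("amount") is not None]


def total_revenue(rows: List[Row]) -> float:
    """Second unit: sum the amounts of already cleaned rows."""
    return float(sum(r["amount"] for r in rows))


def run_flow(load_orders: Callable[[], List[Row]]) -> float:
    """The whole flow. The loader is injected, so the environment can be swapped."""
    return total_revenue(clean_orders(load_orders()))


def fake_orders() -> List[Row]:
    """Mocked data source for running on a laptop (stands in for a real database)."""
    return [
        {"id": 1, "amount": 10.0},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 5.5},
    ]


# Test for the smallest unit: the row without an amount is dropped.
assert clean_orders(fake_orders()) == [
    {"id": 1, "amount": 10.0},
    {"id": 3, "amount": 5.5},
]

# Test for the whole flow, run against the mocked source.
assert run_flow(fake_orders) == 15.5
```

Pointing `run_flow` at an integration or production loader instead of `fake_orders` would leave the steps themselves untouched, which is roughly the property that makes an orchestrator with first-class inputs and outputs pleasant to test.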
As a resource, for now, I recommend simply taking a look at the journey of Mapbox, until I get around to writing a "test-driven dagster" tutorial.
Incrementally Adopting Dagster at Mapbox | Dagster Blog
At Mapbox, we've adopted Dagster without breaking compatibility with our legacy Airflow systems, and with huge gains to developer productivity.
dagster.io
(3) Comparing Airbyte and Meltano
I sometimes feel like the unofficial Airbyte evangelist… Robert Stolz of Preset wrote a comparison of the two major up-and-coming open-source data integration solutions, Airbyte and Meltano. He uses them for a smaller project, so the comparison does not involve the question of scale. But he does give a good introduction to both tools.
If you're on the hunt for a new data integration tool, go read the article.
Which Open-source Data Integration Tool Is Right for Your Project? - Blog | Preset
Airbyte and Meltano compared
In Other News & Thanks
Thanks for reading this far! I'd also love it if you shared this newsletter with people whom you think might be interested in it.
I did get some good writing in this week, and was able to produce a piece I really like:
The data space is booming, with companies like MongoDB (valued at 18 billion USD)…
towardsdatascience.com
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.