🐰 #17 Singer SDK, Data as a Product, Readmes for Data; ThDPTh #17 🐰
Finally, a Singer SDK, the data-as-a-product webinar, and readme-driven development for better data-related work products.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
🔥 Singer SDK + SingerHub + Spec Extension
I do believe that the future of data, be it BI or data integration and EL(T) workflows, is open source, simply due to the nature of the task. So it’s great to see the Meltano team at GitLab tackle the three major challenges of one of the current open-source options.
Let’s step back for a second. Currently, there really is only one open-source data “connector”, which is called Singer. But Singer has a bunch of problems. Still, the Meltano team decided to build its tool on top of Singer. The great news? Meltano is starting to tackle all of the major problems with Singer.
The first problem is how to find Singer taps. The team will work on the “SingerHub” in April, which will start to address this problem. Other tools like Airbyte already have something like this by integrating all “taps” into one repository. The second problem is developing a new tap, which just became a lot easier with their new advanced cookiecutter template. The third problem is the actual Singer spec. It is way too generic to make Singer a successful open-source project, simply because it leaves too much freedom. Airbyte solves this issue by wrapping all taps in Docker. The Meltano team will work on extending the Singer spec, so I’m looking forward to their approach to fixing these problems.
The Meltano team launches their Singer Tap SDK - the easiest way to build and maintain high quality data extractors compatible with the Singer ecosystem.
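To make the point about the spec's freedom concrete, here is a minimal sketch of what a Singer tap emits: JSON messages of type SCHEMA, RECORD, and STATE, one per line on stdout. It uses only the Python standard library, not the Meltano SDK. The stream name `users`, its fields, and the layout of the state value are hypothetical and chosen purely for illustration.

```python
import json
import sys
from typing import Iterable, Iterator


def tap_messages(rows: Iterable[dict], stream: str = "users") -> Iterator[dict]:
    """Yield Singer-style messages for a hypothetical 'users' stream."""
    # SCHEMA: describes the records of the stream as JSON Schema.
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "email": {"type": "string"},
            },
        },
    }

    last_id = None
    for row in rows:
        # RECORD: one message per extracted row.
        yield {"type": "RECORD", "stream": stream, "record": row}
        last_id = row["id"]

    # STATE: the spec leaves the shape of "value" up to each tap.
    # This layout is an assumption made for this sketch.
    yield {"type": "STATE", "value": {"last_id": last_id}}


def main() -> None:
    # Hypothetical sample rows standing in for a real source system.
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ]
    for message in tap_messages(rows):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```

Because every tap author decides what goes into the state value and how streams are described, two taps for similar sources can look quite different. That is the kind of freedom a tighter spec or an SDK can narrow down.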
🔮 Data as a Product Webinar
I recently watched Zhamak Dehghani’s webinar on “Data as a Product”, in which she really takes the time to focus on just one of the principles of the data mesh paradigm shift. I really like the focus because seeing data as a product that has to be managed with product management techniques, and not just as a by-product, is in my experience the single hardest thing about a data mesh. Everything else is just technical problems and things that derive from this principle.
Zhamak extends the already known “DATSIS” framework for creating good data products to include a few more items. One that I found important is the idea that a data product should be “valuable on its own” and “more valuable joined together”.
That’s an important idea, and it should tell you that if you think your analytical database tables with lots of IDs in them are data products, then you’re most likely mistaken. For the rest, watch the webinar.
Zhamak Dehghani explored during a recent webinar the principle of “data as a product” and described how this simple change in perspective has a deep …
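As a rough illustration of “valuable on its own” and “more valuable joined together”, here is a small Python sketch. Both data sets, their field names, and the numbers are made up for this example and do not come from the webinar. The orders data answers a question without any ID lookups, and joining it with a second data product answers a richer one.

```python
from collections import defaultdict

# Hypothetical "orders" data product: readable without resolving any IDs.
ORDERS = [
    {"order_id": 1, "country": "DE", "product": "espresso machine", "amount_eur": 249.0},
    {"order_id": 2, "country": "FR", "product": "grinder", "amount_eur": 89.0},
    {"order_id": 3, "country": "DE", "product": "grinder", "amount_eur": 89.0},
]

# Hypothetical "campaign spend" data product owned by another team.
CAMPAIGN_SPEND = [
    {"country": "DE", "spend_eur": 100.0},
    {"country": "FR", "spend_eur": 50.0},
]


def revenue_by_country(orders):
    """Valuable on its own: revenue per country from orders alone."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["country"]] += order["amount_eur"]
    return dict(totals)


def revenue_per_spend(orders, spend):
    """More valuable joined together: revenue per euro of campaign spend."""
    revenue = revenue_by_country(orders)
    return {
        row["country"]: revenue.get(row["country"], 0.0) / row["spend_eur"]
        for row in spend
    }


if __name__ == "__main__":
    print(revenue_by_country(ORDERS))
    print(revenue_per_spend(ORDERS, CAMPAIGN_SPEND))
```

A table that only held `customer_id`, `product_id`, and `status_id` columns could not support even the first function without access to several other tables, which is why such a table alone hardly qualifies as a data product.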
📣 Readme Driven Development for Data
Documenting things isn’t the most fun exercise for data analysts, analytics engineers, or data scientists. And yet it’s a crucial step to get your work product to be used, be it an API, a dashboard, or a dbt model.
GitHub co-founder Tom Preston-Werner describes a great approach that makes keeping documentation up to date much easier: “Readme Driven Development” (RDD). The benefits are best explained by the terratest library, which uses RDD:
“[RDD] ensures the documentation stays up to date and allows you to think through the problem at a high level before you get lost in the weeds of coding.” (terratest Contributing Guideline)
I use this approach myself and find it very useful no matter what you’re developing. If you’re building a new dashboard, write the description first. Make sure it’s short and concise, and suddenly your dashboard will become much less cluttered and much more focused.
…By the same principle a beautifully crafted library with no documentation is also damn near worthless. If your software solves the wrong problem or nobody can figure out how to use it, there’s something very bad going on.
🎄 In Other News & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.