🐰 #20 Airbyte’s CDK, Querybook & Data Discovery Platforms; ThDPTh #20 🐰
Airbyte follows up with a CDK, Querybook is open-sourced, and how to choose a data discovery platform.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) 🎁 Airbyte’s CDK
I think open-source data integration is the future of data integration. Both of the current newcomers in this space, meltano and Airbyte, are facing some hurdles. One of the biggest is how easy it is to contribute to their projects. Last month, meltano launched its SDK for building connectors, and only a month later Airbyte followed suit with a CDK.
Airbyte provides a speedrun through the CDK, which I really like. The easier it is to extend an existing data integration solution, the better its adoption will be, because every company, whatever it does, will always need some custom sources. A tool that makes developing custom sources easy would be right at the top of my list of tools to choose from.
I really recommend checking out the speedrun if you want to get a feel for the CDK.
Speedrun: Creating a Source with the CDK - Airbyte Documentation
We’re working with the Exchange Rates API, so we need to define our input schema to reflect that. Open the spec.json file here and replace it with:
(2) 📣 Choosing a Data Discovery Platform
Modern data architectures really need a data discovery platform. Otherwise, there’s not a chance self-service analytics will work. Since self-service analytics is usually the key to scaling any data architecture in terms of use cases, people, and source systems, this seems like an important missing & usually neglected piece of the puzzle.
So I really enjoyed Eugene Yan’s article about data discovery platforms, which gives both a specific tool recommendation and an evaluation framework for choosing or even building a system.
“Is your organization struggling with data discovery? If so, take a look at Amundsen, Atlas, and DataHub. Or if you’re trying to develop one in-house, consider how your features will help users answer their questions.”
Data Discovery Platforms and Their Open Source Solutions
What questions do they answer? How do they compare? What open-source solutions are available?
(3) ☀️ Querybook
I wrote an article about different BI “artifacts” a while back and included “Stories” as one artifact. Stories are basically graphs, tables, and text meshed together to go deeper into a data set and add context. I really like the idea of notebooks as an addition to a company’s BI stack, so I was happy to read that Pinterest just open-sourced Querybook, a great notebook/story engine for SQL.
Basically, in a querybook you can put text, graphs, and SQL queries together, document the book, and share it with others. It supports lots of data sources and is extensible in most dimensions. Great to see yet another move into the open-source future of business intelligence.
Charlie Gu | Tech Lead…
🎄 In Other News & Thanks
I managed to publish one fun article and one deep article this week. I’d really like it if you took a look at them:
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
P.P.S.: Yay! We made it to edition 20!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.