🐰 #22 AI Whisky, Data Business Models, Data Version Control; ThDPTh #22 🐰
How AI creates an award-winning whisky, how data companies make money, and which tools you can use to version data.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1)🔮 Data Open Source Business Models
I just stumbled across some weird data orchestrator business models, so I started researching…
I’m sharing this article because, as I said before, I believe the data space will be dominated by open source solutions pretty soon. As such, I think it’s interesting to understand how open source companies actually make money and make sure they survive, something we as end-users have a lot of interest in. I don’t like using tools that won’t be supported anymore in 2–3 years.
The authors run TimescaleDB, an open-source database, so they are very aware that a lot of data companies go the open-source route. They provide a great list of big open-source data companies that made it, like CockroachDB, Elastic, Databricks, MongoDB, and many more.
They also shine a good light on the true importance of community building and on why some business models are a better fit than others. I really enjoyed their take on it and will probably contribute something along the same lines soon.
5 ways open source software companies make money
A guide on how to evaluate the long-term sustainability of the business behind any open-source software you are using (or considering working on yourself).
(2)🔥 AI Whisky wins Gold
I sometimes like a sip of whisky. Now an AI-mixed whisky has won a bunch of awards, showcasing what the interaction of humans & machines might look like in the future.
“The work of a Master Blender is not at risk,” Angela states. “While the whisky recipe is created by AI, we still benefit from a person’s expertise and knowledge. We believe that the whisky is AI-generated, but human-curated. Ultimately, the decision is made by a person.”
It seems to be a good display of what I called “human aided machine engineering” and what will be the target model for AI-human interaction for the foreseeable future. I think it’s also a great model you should adopt when thinking about introducing AI into your products/processes. That of course has two sides.
The good: it means you can probably introduce AI into many more things than you previously thought. The bad: AI probably won’t take the whole process out of your hands and as such, the value of introducing AI is probably lower than you think. I’m not sure whether this helps or confuses you, but at least it always keeps me thinking.
Mackmyra | Fourkind, part of ThoughtWorks | ThoughtWorks
Together with Fourkind, part of ThoughtWorks, Mackmyra created the world’s first whisky developed completely by machine learning. In an industry synonymous with deep-rooted tradition, human expertise and craftsmanship, what happens when 1,000-year-old techniques meet advanced 21st Century technology?
(3)📣 Data Version Control
I just had a discussion with a friend about data version control and he pointed me to this comparison. The author, Guy Smoilovsky, provided a decent comparison of data version control tools in 2020 and, to my knowledge, not much has changed in the space (sadly). I do still think some major innovation has to take place in this space, but so far there are really only two options for actually versioning data as code.
GFS is built for a different purpose and as such doesn’t work the way we need it to work to e.g. version control your database or the input of your machine learning models. It might still do to manage a large ML model, but that’s about it.
Other solutions like Pachyderm all come packaged with lots of baggage, and so are not really useful as standalone data version control. That leaves us with two options, DVC and lakeFS. DVC’s pipeline functionality is nice, but not a must-have. lakeFS really shines with branching etc., and I’m still looking forward to them implementing distributed version control features someday.
I don’t agree with everything the author says. In particular, I really do think we need just one tool to do data versioning, and as said, DVC and lakeFS shine there. But still, the comparison is sound.
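To make "versioning data as code" a bit more concrete, here is a minimal sketch of the general idea: the data itself goes into a content-addressed store, and only a short hash, small enough to commit next to your code, identifies each version. This is an illustration under my own assumptions, not how DVC or lakeFS is actually implemented; `snapshot`, `restore`, `demo`, and the file names are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot(data_file: Path, store: Path) -> str:
    """Copy data_file into the store under its SHA-256 digest; return the digest."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(data_file, target)
    return digest


def restore(digest: str, store: Path, dest: Path) -> None:
    """Write the stored version identified by digest back to dest."""
    shutil.copyfile(store / digest, dest)


def demo() -> tuple:
    """Create two versions of a toy dataset, then roll back to the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data, store = root / "train.csv", root / "store"

        data.write_text("id,label\n1,cat\n")
        v1 = snapshot(data, store)  # the digest is what you would commit to Git

        data.write_text("id,label\n1,cat\n2,dog\n")
        v2 = snapshot(data, store)

        restore(v1, store, data)  # "check out" the older version of the data
        return v1, v2, data.read_text()


if __name__ == "__main__":
    v1, v2, restored = demo()
    print("version 1:", v1[:12])
    print("version 2:", v2[:12])
    print(restored)
```

Branching, remotes, and pipelines are deliberately left out here; they are exactly the parts where the real tools differ from each other.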
Comparing Data Version Control Tools — 2020 | by Guy Smoilovsky | Towards Data Science
An overview and comparison of tools for data version control in 2020
towardsdatascience.com
🎄 In Other News & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people who you think might be interested in it.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.