Levels, Hamel Husain on MLOps, Julia; ThDPTh #51
ThDPTh is an author- and reader-friendly newsletter. I believe that both you and I should be able to take time off to spend with friends and family instead of writing, managing Twitter, or reading work stuff. So don’t expect a holiday edition between the 16th and the 30th of December.
Enjoy your time off! (Inspired by Gruntwork, and my kids)
I had a short Twitter discussion with GitHub’s Hamel Husain this week about MLOps, which inspired me to take a deeper look into his thoughts on the topic.
And I found an interesting tool along the way!
Read about it below…
I’m Sven, I collect “Data Points” to help understand & shape the future, one powered by data.
Sven’s Thoughts
If you only have 30 seconds to spare, here is what I would consider actionable insights for investors, data leaders, and data company founders.
- The programming language Julia is becoming a real thing. 1–2 years ago, the language was already strong but didn’t have the ecosystem to compare to e.g. Python. In 2022, this might change. At least it seems to be worth a look.
- MLOps is a real thing. MLOps is a cry stemming from a simple problem: the machine learning infrastructure, the tooling needed to accomplish a task, is still light-years behind the tooling for software engineering. But MLOps tools are starting to grow like weeds in the data space.
- Open doesn’t mean open-source; openness is the key. Levels is a very interesting company in a space with hardware, where open-source isn’t as easy to implement as in the software space. Levels is still trying to be 100% open and to leverage that to its advantage, just like COSS companies do.
Why you should invest in Julia now, as a Data Scientist
What: Logan Kilpatrick, the Julia Language Community Manager (didn’t know there is such a thing) makes a case for learning Julia now as a Data Scientist. Julia is a high-level programming language built both for accessibility and speed comparable to C/C++. It’s been experiencing quite a bit of growth in its data science-related ecosystem so it might be worth a look.
My perspective: A colleague of mine is quite a fan of Julia, so I’ve been looking at Julia on and off for quite some time now. I still remember arguing 1–2 years ago that Python was far superior because of the size of its ecosystem. But now, I am not so sure anymore. Julia is still making strong progress in its data science-related ecosystem, so it might indeed be worth writing a few components in it.
Hamel Husain on Fast AI & nbdev
What: This podcast episode features Hamel Husain, an ML engineer focusing on ML infrastructure at GitHub with a long history in that field. He describes how the ML tooling field is far behind other fields. When he joined Airbnb, he thought Airbnb should be really advanced in its ML tooling… “It really blew my mind, this is Silicon Valley? There’s no ML tooling at all.” That seemed to kick him off on a journey to keep on building ML tooling. The discussion is focused on fast.ai and nbdev, both very interesting tools.
My perspective: I enjoyed getting a peek into Hamel’s perspective, a look into the day-to-day of machine learners around the world. The projects he discusses are pretty interesting as well. Especially the complete integration of fast.ai with the GitHub API makes it a pretty amazing tool for going quickly from dev to production in machine learning projects.
But the nbdev project in particular caught my attention. Basically, nbdev is a tool that nudges you to use SWE best practices by making a few things easier. Nbdev allows you to write your code, docs & tests in ONE Jupyter notebook and then breaks that down into actual documentation, tests, and Python module files.
The nbdev template also automatically creates a GitHub Action so your tests are run automatically. So basically, by using nbdev with GitHub you’ll get a small CI stub and online docs ready right from your first line of code.
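To make the “one notebook, three outputs” idea a bit more tangible, here is a minimal toy sketch in plain Python. It is not nbdev’s implementation or API: the notebook is modelled as a simple list of cells, and the `#export` and `#test` markers, as well as the `split_notebook` and `run_tests` helpers, are simplified stand-ins I made up for illustration.

```python
# Toy illustration only: this is NOT nbdev's implementation or API.
# A "notebook" is modelled as a list of cells; the "#export" and "#test"
# markers are hypothetical, simplified stand-ins chosen for this sketch.

NOTEBOOK = [
    {"type": "markdown", "source": "# Greeter\nA tiny module that says hello."},
    {"type": "code", "source": "#export\ndef greet(name):\n    return f'Hello, {name}!'"},
    {"type": "code", "source": "#test\nassert greet('Sven') == 'Hello, Sven!'"},
]


def split_notebook(cells):
    """Sort cells into library code, tests, and documentation."""
    module, tests, docs = [], [], []
    for cell in cells:
        source = cell["source"]
        if cell["type"] == "markdown":
            docs.append(source)
            continue
        # The first line of a code cell is the marker, the rest is the body.
        marker, _, body = source.partition("\n")
        if marker.strip() == "#export":
            module.append(body)
        elif marker.strip() == "#test":
            tests.append(body)
    return "\n\n".join(module), "\n\n".join(tests), "\n\n".join(docs)


def run_tests(module_src, test_src):
    """Execute the exported code, then the tests, in one shared namespace."""
    namespace = {}
    exec(module_src, namespace)
    exec(test_src, namespace)  # raises AssertionError if a test fails
    return True


module_src, test_src, docs_src = split_notebook(NOTEBOOK)
```

Calling `run_tests(module_src, test_src)` plays the role that the CI job plays in the real setup: if an assertion in a test cell fails, the run fails. The real tool does a lot more (and does it differently), so treat this purely as a mental model.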
What: The resource is the “secret master plan” from a start-up called Levels. They aim to build a product to reverse metabolic dysfunction. And they are doing it all in the open.
“Building in public has obvious advantages. Levels share almost all of its thinking publicly. That includes all-hands meetings and working sessions between company founders. Opening its doors has been a sharp move: despite being in beta, Levels has a groundswell of consumer support that’s unusual for a company at its stage.”
(taken from The Generalist, Levels: A cultural Anomaly)
My perspective: I made a connection here when reading this because of two points.
One, I was amazed to learn that Hex has been around since 2019. It turns out a lot of data companies work “behind closed doors”, staying in the dark for 1–2 years and then going for their “beta”, which is still mostly closed.
If you compare that to the amazing speed Airbyte is putting forth, it’s a totally different story. Airbyte took 2 months to get out in the open and since has been there with a usable open-source product, a running beta & now a public product offering as well. Totally different story, many times the traction.
Two, the key to leveraging a community is not open-source, it simply is about being open. Period. Levels are just open, certainly not “open-source” and yet they are able to leverage the same effects open-source projects have, albeit not the whole power of network effects.
🎁 Notes from the ThDPTh community
I am always stunned by how many amazing data leaders, VCs, and data companies read this newsletter. Here are some of the readers’ recent noteworthy pieces.
Nothing new this week! If you have something, just ping me!
Only for email-based subscribers & interesting data-related topics: To share your projects, just reply to the email and leave me a quick note with your name, company, and one sentence about your article/project. I select a small list of the most fitting pieces.
🎄 Thanks => Feedback!
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
And of course, leave feedback if you have a strong opinion about the newsletter! So?
It is terrible | It’s pretty bad | average newsletter… | good content… | I love it!
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.