Hiring Data People, Tracking OS Growth, Great CLIs; Three Data Point Thursday #84
I’m Sven and I’m writing this to help you (1) build excellent data companies, (2) build great data-heavy products, (3) become a high-performance data team & (4) build great things with open source.
Every other Thursday, I share my opinion on three pieces of content about the data world.
Shameless plugs: Check out Data Mesh in Action (co-author, book) and Build a Small Dockerized Data Mesh (author, liveProject in Python).
Let’s dive in!
Hiring data people is hard, especially today.
Onboarding & training are becoming more and more important, especially for emerging roles like “analytics engineer”.
Astronomer just shared their framework for tracking growth of their open source projects (Airflow and more).
CLIs should be built with great UX in mind. There’s a great framework out there to achieve that, the 12 Factor CLI.
🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰🐰
(1) Hiring, Training & Onboarding AEs
What: Jacob Frackson from Montreal Analytics, an analytics consultancy, writes about good practices for hiring, onboarding and training analytics engineers in an increasingly difficult job market.
Jacob argues that in an increasingly difficult job market, onboarding and training people well matters more than ever if you want to retain them.
My perspective: I like this short peek behind the scenes and the focus Frackson puts on continuous learning as well as group learning. Go check it out.
Resource: https://blog.montrealanalytics.com/hiring-onboarding-and-training-analytics-engineers-316fdd9db3fd
Tracking Growth of an Open Source Project
What: The community team at Astronomer, the data company behind Apache Airflow, explains how they track the growth of their open source projects, Apache Airflow and OpenLineage.
My perspective: I like to think there are mostly 13 good business reasons to publish open source. Growth isn't fundamental to all of them. But for most of them, it is crucial. However, growth of OS projects is elusive as it falls into two categories, not one:
Growth in users of the output of the open source project
Growth in breadth of contributions.
Most open source projects are written & maintained by a very small subset of their contributors. So growing the number of “external” contributors is usually not a worthy goal to pursue. Neither is going for contributions per se.
What is often important, however, is to increase the breadth of contributions, to widen the horizon of the project itself to a variety of applications & use cases. Indeed, this is one of the unique strengths of open source, not to be confused with any other kind of growth.
I particularly like how the community team at Astronomer goes about measuring both of these types of growth and realizes that breadth is indeed crucial to their success. I’m not sure why they bother to measure what they call “Development - the share of external contributions”, but the article is still a good read.
Resource: https://medium.com/@astronomer.io/how-we-track-the-growth-of-apache-airflow-ad4d5e7dc5f1
Building CLIs with great UX
What: Jeff Dickey from Heroku translates the 12 Factor App methodology to CLIs. These 12 factors will help you build CLIs with great UX, something a lot of CLIs are lacking despite their ubiquity.
Great help is essential (no matter how a user inputs the word “help”! Use auto-complete)
Prefer flags to args (don’t overcomplicate)
Output version in a number of ways (“version”, “-v”, “--version”,...)
Mind the streams (stdout, stderr, properly control error output)
Handle things going wrong (it’s just a CLI, much more stuff will go wrong than on a UI)
Be fancy! (Yes modern CLIs can do a lot of fun stuff! Use it.)
Prompt if you can. (But never require it)
Use tables. (But not table borders etc.!)
Be speedy. (under 500ms for a startup is a must!)
[…]
Be clear on subcommands vs commands.
[…]
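To make a few of these factors concrete, here is a minimal sketch in Python using the standard library's argparse. The tool name "datatool", its flags and the version string are all made up for illustration; it only shows flags over positional args, several ways to ask for the version and for help, and errors going to stderr while results go to stdout.

```python
import argparse
import sys

VERSION = "0.1.0"  # hypothetical version string


def build_parser() -> argparse.ArgumentParser:
    # "datatool" and its flags are invented for this sketch.
    parser = argparse.ArgumentParser(
        prog="datatool",
        description="Copy a dataset from a source to a target (illustrative only).",
    )
    # Output the version in a number of ways: -v, -V and --version.
    parser.add_argument(
        "-v", "-V", "--version", action="version", version=f"%(prog)s {VERSION}"
    )
    # Prefer flags to args: named flags instead of two easily confused positionals.
    parser.add_argument("--source", required=True, help="where to read from")
    parser.add_argument("--target", default="warehouse", help="where to write to")
    return parser


def main(argv=None) -> int:
    argv = list(sys.argv[1:] if argv is None else argv)
    # Great help is essential: also accept the bare words "help" and "version".
    if argv == ["help"]:
        argv = ["--help"]
    elif argv == ["version"]:
        argv = ["--version"]
    args = build_parser().parse_args(argv)

    # Mind the streams: errors go to stderr, with a non-zero exit code.
    if args.source == args.target:
        print("error: --source and --target must differ", file=sys.stderr)
        return 1

    # Results go to stdout as plain, borderless columns that pipe well.
    print(f"{args.source}\t{args.target}")
    return 0
```

Wired up as an entry point with sys.exit(main()), "datatool version", "datatool -v" and "datatool --version" would all print the version, and "datatool --source a --target a" would complain on stderr without polluting stdout.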
My perspective: Since data companies love to build CLIs, I found this article particularly interesting to read. I agree with everything he lays out, and would love for all data startups to take it to heart. Almost all data CLIs are no fun to use at all. Recommended reading!
Deep geek mode: If you have more specific problems, you can also check out clig.dev. They have, for example, good recommendations on subcommand structures (noun verb or verb noun, like docker container create, docker container list,...).
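As a rough illustration of the noun verb structure, nested subcommands can be sketched with argparse subparsers. The "datatool pipeline create" and "datatool pipeline list" commands below are hypothetical, chosen only to mirror the docker example.

```python
import argparse


def build_noun_verb_parser() -> argparse.ArgumentParser:
    # Hypothetical tool: "datatool <noun> <verb>", e.g. "datatool pipeline list".
    parser = argparse.ArgumentParser(prog="datatool")
    nouns = parser.add_subparsers(dest="noun", required=True)

    # First level: the noun.
    pipeline = nouns.add_parser("pipeline", help="manage pipelines")
    verbs = pipeline.add_subparsers(dest="verb", required=True)

    # Second level: the verbs that apply to that noun.
    create = verbs.add_parser("create", help="create a pipeline")
    create.add_argument("--name", required=True, help="name of the new pipeline")
    verbs.add_parser("list", help="list pipelines")
    return parser


def describe(argv) -> str:
    # Return the parsed "noun verb" pair so the structure is easy to inspect.
    args = build_noun_verb_parser().parse_args(argv)
    return f"{args.noun} {args.verb}"
```

Whichever order you pick, the point is to stay consistent: every new noun gets the same small set of verbs, so users can guess commands they have never typed.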
Resource: https://medium.com/@jdxcode/12-factor-cli-apps-dd3c227a0e46