Thoughtful Friday #20: Terragrunt & Where is DataGrunt?

Nov 04, 2022

I’m Sven, and this is Thoughtful Friday. We’re talking about how to build data companies, how to build great data-heavy products & tactics of high-performance data teams. I’ve also co-authored a book about the data mesh part of that.

Let’s dive in!

Time to Read:

There’s a plethora of data tools and it only keeps on growing
Handling individual tools is being made easy
But the combination of multiple tools is hard, and just gets harder
We’re missing wrappers that provide good defaults for chains of these tools
And wrappers that provide good chains of these tools

🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮🔮

The company Gruntwork is doing the infrastructure-as-code (IaC) world a great service, of a kind that is still underdeveloped in the data world.

In a world full of complicated infrastructure pieces, tools like Pulumi and Terraform try to provide a standard interface for managing their life cycle. And yet that doesn’t make working with them easy, at least not once you need more than one component. Gruntwork provides two additional pieces of the puzzle that make working with more than one component much easier:

Terragrunt, a “thin wrapper” that provides a good default way of working with infrastructure components, and
Reference architectures, complete stacks you can take off the shelf and deploy, but still modify as you wish.

These two pieces make working in the IaC world easier. Let us look at what is still missing in the data world.

The key attributes of these two are reasonable defaults and chains of commands.

Terragrunt and the Gruntwork reference architecture describe themselves as follows:

“Terragrunt is a thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state.” (https://terragrunt.gruntwork.io/)

“An opinionated, end-to-end tech stack built on top of the Infrastructure as Code Library that we deploy into your AWS accounts in about one day.” (https://gruntwork.io/reference-architecture/)

For me, both implement two similar ideas on different levels:

Defaults, preselected options for complicated things.
Chains of things, e.g. the reference architectures (which are chains of infrastructure components), possibly paired with good defaults.

We need more defaults & chains in the data world.

In the data world, we have a plethora of tools that integrate with each other. And yet we have nothing that is equivalent to “preselected options” or “chains of things”.

Yes, companies like Dremio & Databricks or startups like iomete effectively provide us with defaults & chains of tools, but these tend to be abstracted away rather than “wrapped”.

There is a crucial difference between wrapping & abstracting away.

Wrapping allows you to reach the underlying, wrapped thing and change it. Wrapping makes things easy for 80% of users and keeps all of the flexibility for the other 20%.
Abstracting away makes things easy for 100% of users, and reduces flexibility, period.
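
To make the difference concrete, here is a minimal Python sketch. Everything in it is hypothetical: WarehouseClient stands in for any underlying tool with lots of options, and the two other classes are made up for illustration, not taken from any real library. Both apply the same opinion (a longer timeout), but only the wrapper lets you override it and reach the wrapped object.

```python
class WarehouseClient:
    """Stand-in for an underlying tool with many options (hypothetical)."""

    def __init__(self, host, port=5439, timeout=30, ssl=True):
        self.host, self.port, self.timeout, self.ssl = host, port, timeout, ssl

    def run(self, sql):
        return f"[{self.host}:{self.port}] {sql}"


class WrappedWarehouse:
    """Wrapper: ships opinions, but every option can still be overridden."""

    OPINIONS = {"timeout": 60}  # assumed "good default", purely illustrative

    def __init__(self, host, **overrides):
        # User overrides win over the opinions; the client stays public.
        self.client = WarehouseClient(host, **{**self.OPINIONS, **overrides})

    def query(self, sql):
        return self.client.run(sql)


class AbstractedWarehouse:
    """Abstraction: same opinions, but no overrides and a hidden client."""

    def __init__(self, host):
        self._client = WarehouseClient(host, timeout=60)

    def query(self, sql):
        return self._client.run(sql)
```

With the wrapper, WrappedWarehouse("dwh.example", port=5440) just works and the 80% case stays a one-liner. With the abstraction, the same call raises a TypeError, because the port option simply no longer exists for the user.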

Key idea: What we need in the data world are true “wrappers” that keep the flexibility of the underlying tools, of the plethora of data apps, and yet make them easy to manage for the majority of use cases.

What kinds of defaults & chains we might need.

The short answer is, I don’t know. But there is so much that would make working more productive that it’s easy to come up with a few random examples:

dbt is quickly becoming the de facto standard for SQL-based transformations. There are great testing packages and monitoring packages for dbt available. And yet there is no “opinionated dbt install” that ships with a good set of packages by default, paired with a set of best practices for using them.
dbt + Redshift/BigQuery + Tableau/Looker/Superset is becoming one of the most used modern data stacks. And yet, I don’t see any easy way to set up this stack with a 1-click solution that still allows me to modify the underlying assets to fit my specific use case.
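
As a rough sketch of what such a “DataGrunt” could feel like, here are the two examples above combined in Python: a default chain of tools, each with preselected options, that you can override per component. This is not an existing tool. DEFAULT_STACK, build_stack and render_packages_yml are names I made up, the package choice is just my assumption of a reasonable default, and versions are deliberately left for the user to pin.

```python
# Hypothetical "opinionated stack": a chain of components with defaults.
DEFAULT_STACK = {
    "transform": {
        "tool": "dbt",
        # Assumed default package set, for illustration only.
        "packages": ["dbt-labs/dbt_utils", "calogica/dbt_expectations"],
    },
    "warehouse": {"tool": "bigquery"},
    "bi": {"tool": "superset"},
}


def build_stack(overrides=None):
    """Merge per-component overrides into the defaults (one level deep)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_STACK)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return {
        name: {**cfg, **overrides.get(name, {})}
        for name, cfg in DEFAULT_STACK.items()
    }


def render_packages_yml(stack, versions):
    """Render a dbt packages.yml; the caller supplies the versions to pin."""
    lines = ["packages:"]
    for pkg in stack["transform"]["packages"]:
        lines.append(f"  - package: {pkg}")
        lines.append(f'    version: "{versions[pkg]}"')
    return "\n".join(lines)
```

Calling build_stack() gives you the whole chain with zero decisions, while build_stack({"warehouse": {"tool": "redshift"}}) swaps one link and leaves the rest alone. The generated packages.yml is a plain file you can keep editing by hand, which is the point of wrapping instead of abstracting away.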

How about you, do you have more suggestions? Do you see these things emerging?

Want to recommend this or have this post public?

This newsletter isn’t a secret society, it’s just in private mode… You may still recommend & forward it to others. Just send me their email, ping me on Twitter/LinkedIn and I’ll add them to the list.

If you really want to share one post in the open, again, just poke me and I’ll likely publish it on Medium as well so you can share it with the world.