Why you need to ditch dbt for SQLMesh today
8 things you have to know before building anything with LLMs; Ditch dbt! PandasAI is actually useful
I’m Sven writing this to help you build things with data. Whether you’re a data PM, inside a data startup, internal data lead, or investing in data companies, this is for you.
8 things you have to know before building anything with LLMs
Ditch dbt!
PandasAI is actually useful
Read this before building anything with LLMs
Everyone nowadays seems to reveal the “biggest AI revolution inside the tool X ever”.
And yet every single one I’ve seen is just fluff. No product, just a big messy AI-canon attached to a small bicycle.
I’ve been impressed with only one single LLM-based product: the Google demo.
Think you can be like Google?
Phillip Carter from honeycomb.io has a few words to share on that.
TL;DR: Building products with LLMs is frigging hard! They are a new shiny tool you know nothing about.
Our take: If you don’t know how to use your shiny new chainsaw,... you get the idea.
Luckily, we read the instructions, so you don’t have to:
Context windows are a challenge with no complete solution
Solution: you’ll have to figure it out.
2. LLMs are slow, and chaining is a non-starter
Solution: Don’t chain, then do anything to cache/use embeddings.
3. Prompt engineering is weird and has few best practices
Solution: Test a lot!
4. Correctness and usefulness can be at odds
Solution: Go for usefulness first.
5. Prompt injection is an unsolved problem
Solution: Implement safeguards! And stay scared.
6. LLMs aren’t products
Solution: Integrate the LLM into your existing product.
7. LLMs force you to address legal and compliance stuff
Solution: “Do a full security and compliance audit of LLM providers. Spoiler alert: only OpenAI could meet our requirements for now. Props to them for building a robust service!”
8. Early Access Programs won’t save you
Solution: Get out as fast as possible, and test your stuff as broadly as possible.
PandasAI might actually make you more productive
The big fat panda just got an AI update, pandasAI, by individual contributors, with no association with the panda itself.
The shocking truth: pandasAI actually looks useful! Sadly the creators didn’t read the piece above.
PandasAI implements two groups of features:
query-like abilities for data analysis
AI-powered shortcuts.
The first category looks like this:
They call these “queries,” and it's, in our opinion, not the main thing here.
The main thing is this:
And this:
That’s how you integrate LLMs smartly into a product to make people more productive.
Sadly, it looks like the PandasAI creators are not aware of what they should focus on and instead keep pushing these queries and prompts.
So what? Well, maybe it’s time for someone else to develop this second set of features.
Why you should ditch Dbt for SQLMesh
What you need to know: SQLMesh is a dbt alternative, one that focuses more on the developer inside the analytics engineer, than the analyst (who dbt focuses on).
So, SQLMesh it’s perfect for data engineers! The people inside the data workflow, adding the most value to data.
Dbt has always been the hero of the analyst, slowly teaching them to use git, how to test thing etc.
That’s been all good and well, but dbt thus was and still is designed around this particular user set.
Dbt has always been the hero of the analyst, not the data engineer.
DbtLabs has been struggling to adapt dbt to other users like machine learners, data engineers, data scientists, and basically anyone not building dashboards.
We think it’s because DbtLabs doesn’t want to let go of the movement, of their current community.
Well turns out, the most important people inside a company using dbt are actually not analysts turned analytics engineers, but instead data engineers and data scientists turned part-time analytics engineers.
SQLMesh is designed for them. Providing a great development environment off the bat!
We think: If DbtLabs keeps up their stance on not letting anyone behind, basically anyone can snatch the whole pie out of their hands.
What you need to do:
If you’re using dbt: Give SQLMesh a try
If you’re someone else: give it a star/like/share,
If you’re at tobiko data: redo your product messaging; your idea is way better than your messaging...
If you’re looking for product ideas: “{existing product name} for {slightly different customer group} is always a good pattern to start with.”
FWIW, if you work in product at DbtLabs, or are in a similar situation: No, you don’t need to ditch your precious community.
But if your new targeted customer segments (MLers and data engineers) are pretty distinct, you need to develop focused products just for them.
So stop extending dbt to data engineers and launch the “dbt-powered data pipeline IDE.” Instead of adding Python to dbt, launch the “ML-modeller addon to dbt.”
You get the idea.