How to choose your LLM architecture - Yes, you should have one

Alt data can be used for any business problem; How to choose your LLM architecture; AI is frightening and awesome.

Jul 06, 2023

I’m Sven, and I’m writing this to help you build things with data. Whether you’re a data PM, inside a data startup, an internal data lead, or investing in data companies, this is for you.

Want to share anything with me? Hit me up on Twitter @sbalnojan or LinkedIn.

Alt data can be used for any business problem.
How to choose your LLM architecture
AI is frightening and awesome.

Let’s dive in!

Alt data can be used for all business problems!

Not just the hedge-fund kind of problems, where massive investments get made with little data.

It’s growing exponentially. https://alternativedata.org/alternative-data/

Don’t believe us? Let’s look at a business problem in the physical world.

If you can solve these types of problems with alt data, surely you can also solve any digital one!

…

Question: Does my store get enough foot traffic to keep it open?

Problem: Data = foot traffic counts; hard to come by. It’s either never collected or not available to the store owner.

Solution: Use alt data.

Here’s how Bernard Marr approached this problem:

Install a small cell phone signal counter next to the entry door
Use it to count foot traffic outside
And inside the store
And the number of people stopping to look
Using this data, the sales team could estimate the conversion rate (which was fine)
And the foot traffic outside (which was not acceptable).
Easy decision = close the store.
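The arithmetic behind that decision can be sketched in a few lines of Python. This is only an illustration: the counts, the field names, the purchase figure (assumed here to come from the till), and the 1,000-passersby threshold are all hypothetical and not taken from the original case.

```python
from dataclasses import dataclass


@dataclass
class FootTraffic:
    """One day of counts from a door-mounted counter (all numbers hypothetical)."""
    passersby: int   # people walking past outside
    stoppers: int    # people stopping to look
    visitors: int    # people who came inside
    purchases: int   # transactions, assumed to come from the till


def rates(t: FootTraffic) -> dict:
    """Turn raw counts into the ratios a sales team would look at."""
    return {
        "stop_rate": t.stoppers / t.passersby,
        "entry_rate": t.visitors / t.passersby,
        "conversion_rate": t.purchases / t.visitors,
    }


def enough_traffic(t: FootTraffic, min_passersby: int) -> bool:
    """Hypothetical decision rule: is outside traffic above a chosen threshold?"""
    return t.passersby >= min_passersby


day = FootTraffic(passersby=400, stoppers=80, visitors=40, purchases=10)
print(rates(day))
print(enough_traffic(day, min_passersby=1000))
```

With these made-up numbers the conversion rate looks healthy while the outside traffic falls short of the threshold, which mirrors the shape of the story above.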

TL;DR: Alt data is here to stay in every industry. Thanks to free software and accessible data, it should be in everyone's toolkit.

The essential lesson for you: “Alt data” isn’t anything new, but it is hot (providers are growing exponentially).

MartinChes reminded me that it is an excellent wrapper for hard-to-grasp concepts like orthogonal data sources, estimation exercises, or lateral thinking.

How to choose your LLM architecture

Building something with LLMs? If not, you should. Here’s how from a technical perspective.

TL;DR: There’s a clear emerging architecture for starting out.

Architecture:

Use in-context learning;
Use OpenAI’s API to create an embedding;
Use Pinecone to store your embedding;
Use an OpenAI model as the LLM.
And use either LangChain or LlamaIndex for the orchestration.
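To make the data flow concrete, here is a minimal sketch of the pattern in plain Python. It makes no network calls: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a toy stand-in for a vector DB like Pinecone, and `build_prompt` is a hypothetical helper doing the job an orchestration library would do. None of these names come from OpenAI, Pinecone, LangChain, or LlamaIndex.

```python
import math
import re
import zlib


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Toy stand-in for a vector DB: keeps vectors in a list, brute-force search."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def query(self, text: str, k: int = 2) -> list[tuple[float, str]]:
        """Return the k stored texts most similar to `text` (cosine similarity)."""
        q = embed(text)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in self._items]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def build_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Put the retrieved texts into the prompt as context for the LLM."""
    context = "\n".join(f"- {t}" for _, t in store.query(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for note in [  # hypothetical documents
    "Foot traffic outside the store was far below target.",
    "The conversion rate inside the store was fine.",
    "Alt data providers are growing quickly.",
]:
    store.add(note)

# In a real setup, this prompt would be sent to the LLM.
print(build_prompt("How is foot traffic at the store?", store))
```

In a real build, you would swap `embed` for a call to your embedding provider and the store for your vector DB; the shape of the flow (embed, store, retrieve, prompt) stays the same.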

LLMs are one kind of generative AI for asking (text) questions, and they are hot!

That doesn’t mean you should be afraid; in our opinion, it’s the perfect time to get started.

Here’s how you’d technically do it in more detail!

(1) Go for in-context learning. In-context learning/few-shot prompting = not fine-tuning the model, but putting examples and other context into the prompt.
For example, turn “make this tweet shorter” => “Here are ten examples of a shortened tweet; [...], now make this tweet shorter”. (The stuff you add is the context.)
(2) Use OpenAI’s text-embedding-ada-002 to create the embeddings.
You can, of course, use different embeddings for different use cases.
But then you need a way to switch between them! Pro tip: we highly encourage multiple embeddings; it’s more complicated but builds great moats!
(3) Store your embedding in any vector DB; Pinecone should be your default choice.
(4) Use Databricks/Airflow/your favorite data orchestrator to orchestrate the above process. Then use either LangChain or LlamaIndex to orchestrate the in-context process.
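The in-context idea from step (1) can be sketched as a small prompt builder. This is only an illustration: `few_shot_prompt`, the "Tweet:"/"Shorter:" labels, and the example tweets are all hypothetical, and two examples are used here instead of ten to keep it short.

```python
def few_shot_prompt(task: str, examples: list[tuple[str, str]], new_input: str) -> str:
    """Build an in-context prompt: instruction, worked examples, then the new input."""
    lines = [task, ""]
    for original, shortened in examples:
        lines += [f"Tweet: {original}", f"Shorter: {shortened}", ""]
    # End with an open slot so the model completes the shortened version.
    lines += [f"Tweet: {new_input}", "Shorter:"]
    return "\n".join(lines)


examples = [  # hypothetical examples; this is the "context" you add
    ("I am so incredibly happy that the weekend is finally here!", "Weekend's here!"),
    ("Just finished reading a really long report about alt data.", "Read a long alt data report."),
]

prompt = few_shot_prompt(
    "Make each tweet shorter.",
    examples,
    "We are thinking about maybe building something with LLMs soon.",
)
print(prompt)
```

The resulting string is what you would send to the LLM; no fine-tuning is involved, only a longer prompt.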

Companion reads: A must-read is the Honeycomb guide on all the problems that arise once you get your first prototype running…

And there’s the prompt engineering guide to help you with the in-context stuff.

Now go and build something extraordinary!

AI will create a bright and frightening future

Sam Altman traveled worldwide to convince people that AI is awesome and not a threat.

Now he’s sharing his biggest fears (bioterrorism and cyber threats) and his excitement (about erasing poverty).

Why are bioterrorism and cybersecurity threats (mine, too!) potential problems?

Because AI tech is growing exponentially.

It's why it "suddenly appeared" (it actually didn't). And humans are just terrible at predicting exponential growth. So we’re likely also terrible at predicting the threats here.

The current wave of AI is going to enable good stuff too, like ending global poverty by bringing quality education and medical care to everyone on the planet.

But Sam believes we do need global regulation on things that are existential threats and very little regulation around everything else.

Remember, looking back from the future, today will be the first tech revolution, not the last!

There is a bright and technologically advanced future in front of us!


Three Data Point Thursday
