Discover more from Three Data Point Thursday
Everything you need to know about geometric deep learning
You should be using alt data; You should not be using geometric deep learning; Unless you’re a data entrepreneur.
I’m Sven, and I write this to help you build things with data. Whether you’re a data PM, work inside a data startup, lead an internal data team, or invest in data companies, this is for you.
You should be using alt data
You should not be using geometric deep learning
Unless you’re a data entrepreneur
Let’s dive in!
Use Alt Data, seriously.
Want an easy way to find out what your local Best Buy’s sales figures look like?
Hire a drone (there’s a service for that), fly it over the parking lot at different times of the day, use image recognition to count the cars, and estimate the customer count and average basket size.
And that’s it.
That’s alternative data. Data that was collected unconventionally.
Most alternative data leverages a proxy, something that approximates what you really want to know.
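To make the proxy idea concrete, here is a minimal sketch of how parking-lot snapshots could turn into a customer estimate. It is not a real pipeline: the function name, the snapshot counts, the visit length, and the buyers-per-car figure are all made-up assumptions. It leans on Little's law (arrival rate = average occupancy / average time spent). Basket size would need a second proxy that is not shown here.

```python
def estimate_daily_customers(car_counts, open_hours, avg_visit_hours, buyers_per_car):
    """Estimate paying customers per day from parking-lot snapshots.

    Little's law: cars arriving per hour = average cars in the lot
    divided by the average length of a visit (in hours).
    All inputs are assumptions you would have to calibrate yourself.
    """
    if not car_counts:
        raise ValueError("need at least one snapshot")
    avg_cars_in_lot = sum(car_counts) / len(car_counts)
    cars_per_hour = avg_cars_in_lot / avg_visit_hours
    return cars_per_hour * open_hours * buyers_per_car


# Hypothetical drone snapshots taken at four different times of day.
snapshots = [12, 30, 45, 33]
estimate = estimate_daily_customers(
    snapshots,
    open_hours=10,        # assumed opening hours
    avg_visit_hours=0.5,  # assumed average time a car stays
    buyers_per_car=0.8,   # assumed share of cars that end in a purchase
)
print(round(estimate))
```

The estimate is only as good as the assumed visit length and conversion rate, which is exactly what makes it a proxy and not a measurement.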
Our take: Next to no one is using alternative data outside a few select industries (hedge funds!), but everyone should! You should.
Want to know more? I’m a fan of John Farrall's newsletter Alt Data Weekly.
Subscribe (free!) or someone will steal your data business & (data) users.
Don’t use geometric deep learning
TL;DR: Geometric deep learning is a new research area with vast business potential! That makes it too early to apply in most practical situations, and just the right time to invest in it or build a company around it.
Welcome to flat land!
You just stumbled into an “Abstract Thought” machine. Just like in the movie “Inside Out,” you just got turned into this …
One of many two-dimensional forms. That’s the result of the two-dimensionalization phase of abstraction.
I don’t know about you, but I don’t think that’s a fun way to live. I wouldn’t even know how to hold my coffee cup this way!
@hex: Yes, I still got it; it’s in active use!
And yet, believe it or not, our world is two-dimensionalized daily by almost every big data company and ML engineer.
They all do it, rely on it, and can’t stop.
The key point: This happens by collecting two-dimensional data, or by abstracting higher-dimensional data into two dimensions.
Side note: Going from anything to two dimensions is done via encodings and embeddings.
So what’s the big deal? You lose stuff by doing so, just as with my coffee cup! You lose an unidentifiable amount of information. And my morning brew!
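Here is a small sketch of what "losing stuff" can look like. It uses a deliberately crude encoding (one row per node, holding only its degree), which is an assumption for illustration and not what any real embedding framework does. Two clearly different graphs end up as the same table.

```python
def degree_table(edges, num_nodes):
    """Flatten a graph into a sorted list of node degrees (a one-column table)."""
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return sorted(degrees)


def count_components(edges, num_nodes):
    """Count connected components with a simple union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(num_nodes)})


# Hypothetical example graphs: one ring of six nodes vs. two separate triangles.
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

# The flattened tables are identical ...
print(degree_table(hexagon, 6) == degree_table(two_triangles, 6))
# ... even though the graphs are not: one is connected, the other is not.
print(count_components(hexagon, 6), count_components(two_triangles, 6))
```

Real encodings keep far more than node degrees, but the same kind of loss applies: whatever the flat format cannot express is gone.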
The solution: It’s called Geometric deep learning (GDL)!
But, there’s always a but; there are two big bad things about GDL.
Bad thing No. 1: For some reason, people inside GDL want to scare others away by using words like geodesics, manifolds, gauges, and groups (that word scares even me!).
Bad thing No. 2: The world of machine learning is all built up on top of tables. GDL needs to develop a new algorithm for everything, so unless you’re lucky, you won’t be able to use it for your use case.
What is it?
Geometric deep learning starts from the realization that we always choose a model when we collect data.
In doing so, we abstract (= systematically delete information).
Almost all machine learning algorithms accept just one input format:
some n-dimensional vector.
That sucks if a different model would contain more information.
Popular examples of things that are not vectors are …
Graphs (the reason why Facebook has PyTorch BigGraph)
And spheres, which are essential for global weather forecasts (yes, that’s an unsolved problem; I didn’t know that before).
But GDL has to develop a pretty extensive toolchain to solve these things.
Because, well, you do need a different algorithm to use different inputs.
There are, for instance, GraphCNNs that feed the entire graph into the algorithm at each iteration.
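For a feel of what "feeding the entire graph in" can mean, here is a minimal sketch of one graph convolution step in plain Python. It uses one common formulation (add self-loops, normalize the adjacency matrix symmetrically, mix neighbor features, apply ReLU). That choice is an assumption and not necessarily the variant the graph CNNs above use. The function name and the toy graph are made up.

```python
import math


def graph_conv_layer(adjacency, features, weights):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: n x n list of 0/1 entries (undirected graph)
    features:  n x f list, one feature row per node
    weights:   f x o list, the layer's (normally learned) weights
    """
    n = len(adjacency)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization keeps high-degree nodes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    num_in, num_out = len(weights), len(weights[0])
    # Every node averages over its neighborhood: the whole graph is used at once.
    mixed = [[sum(norm[i][k] * features[k][f] for k in range(n))
              for f in range(num_in)] for i in range(n)]
    return [[max(0.0, sum(mixed[i][f] * weights[f][o] for f in range(num_in)))
             for o in range(num_out)] for i in range(n)]


# Toy path graph 0 - 1 - 2, with a signal only on node 0.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]  # identity weight, just to watch the signal spread

output = graph_conv_layer(adjacency, features, weights)
print(output)  # node 1 picks up part of node 0's signal, node 2 does not yet
```

Stacking more of these steps lets the signal travel further along the edges, which is information a flat table of nodes would not carry.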
Or… you can take the easy way out and flatten your stuff again using the BigGraph thingy from Facebook or Euler from Alibaba, or any other embedding/encoding framework.
I like this introduction, although it is already pretty technical!
Key takeaways: We’ve got two lessons for you
If you’re building something already, don’t use GDL; flatten your stuff.
If you’re a data entrepreneur, go for it! It’s the stuff nerds do on nights and weekends.
Technical side note for all those telling me I’m misrepresenting things: The problem is not that everything is turned into a table, but rather that we turn it into ordinary real n-space and discard the proper geometry we already know about.
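A small sketch of that side note, using the sphere: treat latitude and longitude as ordinary coordinates in the plane, and two points right next to each other across the date line look very far apart. The helper names and the example points are made up for illustration. The great-circle distance uses the standard haversine formula with an assumed mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def flat_distance_deg(p, q):
    """Distance if (lat, lon) in degrees were plain points in real 2-space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def great_circle_km(p, q):
    """Haversine distance on the sphere, inputs as (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two pairs of points on the equator, each pair 2 degrees of longitude apart.
near_greenwich = ((0.0, 0.0), (0.0, 2.0))
across_date_line = ((0.0, 179.0), (0.0, -179.0))

# In flat n-space the second pair looks 179 times further apart ...
print(flat_distance_deg(*near_greenwich), flat_distance_deg(*across_date_line))
# ... while on the sphere both pairs are the same distance apart.
print(great_circle_km(*near_greenwich), great_circle_km(*across_date_line))
```

The table of coordinates is the same in both views. What changes is whether the algorithm knows the geometry the numbers live on.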