#26 Future of Data Engineering, Personalization & ML Observability; ThDPTh #26
How RudderStack sees the future of data engineering, different approaches to personalization in machine learning models, and what ML observability actually is.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
(1) RudderStack, Future of Data Engineering
RudderStack highlights a few interesting points in a recent article. One is the coming rise of C-level data executives, which is already happening.
The second is a shift towards data becoming important in every single development team, something that’s already being carried forward by the data mesh paradigm and, more generally, by platform teams as a concept.
They feel that moving data will become commoditized, and I agree, although I am still very worried about when we will get there. I feel like it’s still gonna take some time because the important parts of that problem are not yet addressed, neither by RudderStack nor, so far, by anyone else in a meaningful manner.
Finally, they mention real-time data, which I think will play a huge role in the future, and I’ve already written about that. RudderStack’s team already noticed that and integrated quite a bit of event data into their tool, but I’m not sure this is enough to move in the right direction.
So that’s the data engineering future according to RudderStack. I don’t think it’s very comprehensive, but I do feel the points they make are sound. But since it’s not very comprehensive, it makes me worry a little bit about RudderStack’s general direction ;)
The data engineering megatrend impacts companies across industries. Know the big changes to the field of data and for the role of data engineer.
rudderstack.medium.com
(2) Patterns for ML Personalization
I really like the depth Eugene Yan provides in this overview. But let’s back up a second:
“Personalization is the process of customizing each individual’s experience. It’s how an electronics geek gets different recommendations from a cooking hobbyist, and how they might get different results from the same search query (e.g., ‘Apple’)”
That is the problem, and Eugene provides a nice little summary at the end, which I have to share with you:
When to use which? Here’s a rough heuristic:
- Want to continuously explore while minimizing regret? Bandits
- Starting with neural recsys and want something simple? Embeddings+MLP
- Have long-term user histories and sequences? Sequential
- Have sparse behavior data but lots of item/user metadata? Graphs
- Want generic embeddings for multiple problems? User models
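To make the first heuristic a bit more tangible, here is a minimal epsilon-greedy bandit sketch in Python. It is not taken from Eugene’s article: the class name `EpsilonGreedy`, the `simulate` helper, and the click rates are all made up for illustration, and epsilon-greedy is only one simple bandit strategy among several.

```python
import random


class EpsilonGreedy:
    """Illustrative epsilon-greedy bandit (hypothetical helper, not from the article)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # how often each arm was shown
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore: with probability epsilon, pick a random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the arm with the best mean so far (ties go to the lowest index).
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update, so no reward history has to be stored.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate users who click on arm i with probability click_rates[i] (assumed values)."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(click_rates), epsilon, seed=seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


# Three hypothetical recommendation slots with different click rates.
result = simulate([0.02, 0.05, 0.10])
print(result.counts, [round(v, 3) for v in result.values])
```

The small `epsilon` is the “continuously explore” part: most traffic goes to the arm that currently looks best, while a fixed fraction keeps testing the others.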
Now if you want to know some of the details, just dive right into the article, which is written really well!
Btw., I also enjoyed Eugene’s welcome series to his newsletter, “How to be an effective data scientist”.
Patterns for Personalization in Recommendations and Search
A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.
eugeneyan.com
(3) What is ML Observability?
Aparna Dhinakaran wrote a decent article describing ML observability that I’d like to share. I’ll recap a short bit. If we productionize ML systems, we soon hit a few typical problems:
Training/serving skew (production data differs from the training data, and the system fails to work properly…)
Changing data distribution (“oh, it’s summer, people buy different stuff…”)
Messy data (“oh, these are all the same articles! Don’t you match duplicates?”)
So where does observability come in to cure these problems? Well, of course, it doesn’t cure any of them. ML observability means making these things transparent, which reduces the time to detection & to correction.
Then Aparna describes an approach to achieving observability that I’m not sure I completely agree with, but I find the general idea behind it very useful. After all, monitoring & observability really are just “statistical process control”, which is exactly what a good agile production system in the physical world would implement.
In my mind, the problem should be tackled more from the data-as-code perspective, with “statistical process control” layered on top, but again, I also find her approach useful.
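As a rough illustration of the “statistical process control” idea, here is a small Python sketch of a Shewhart-style control check on a single feature. It is not Aparna’s approach: the function names `control_limits` and `out_of_control` and all the numbers are hypothetical, and it assumes the training values form a reasonably stable baseline.

```python
from math import sqrt
from statistics import mean, stdev


def control_limits(baseline, batch_size, sigmas=3.0):
    """Control limits for batch means, derived from baseline (training) values.

    Hypothetical helper: center line is the baseline mean, and the limits sit
    `sigmas` standard errors of a batch mean away from it.
    """
    center = mean(baseline)
    standard_error = stdev(baseline) / sqrt(batch_size)
    return center - sigmas * standard_error, center + sigmas * standard_error


def out_of_control(batch, limits):
    """Return True if the mean of a production batch falls outside the limits."""
    lower, upper = limits
    return not (lower <= mean(batch) <= upper)


# Made-up feature values, e.g. basket size seen during training.
training_values = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
limits = control_limits(training_values, batch_size=5)

regular_batch = [10, 11, 10, 12, 9]   # looks like the training data
summer_batch = [14, 15, 13, 16, 14]   # "oh, it's summer" style shift

print(out_of_control(regular_batch, limits))
print(out_of_control(summer_batch, limits))
```

A check like this does not fix skew or drift. It only makes the shift visible early, which is the “time to detection” part of the argument.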
In Other News & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people who you think might be interested in it.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.