Masked self-attention: How LLMs learn relationships between tokens
Masked self-attention is the key building block that allows LLMs to learn rich relationships and patterns between the words of a sentence. Let’s build it together from scratch.
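As a quick preview of where we are headed, here is a minimal sketch of masked (causal) self-attention in PyTorch. The module name, shapes, and dimensions (e.g., `d_model`, batch size `B`, sequence length `T`) are illustrative assumptions for this sketch, not code taken from later in the article.

```python
# A minimal sketch of masked (causal) self-attention, assuming PyTorch.
# Names and shapes here are illustrative, not the article's exact implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # One linear layer produces queries, keys, and values in a single pass.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, T, d_model]
        B, T, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Scaled dot-product attention scores: [B, T, T]
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)

        # Causal mask: token i may only attend to tokens j <= i (no peeking ahead).
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))

        weights = F.softmax(scores, dim=-1)  # attention distribution per token
        return self.out(weights @ v)         # weighted sum of value vectors

# Usage: a batch of 2 sequences, each with 5 token embeddings of size 16.
attn = MaskedSelfAttention(d_model=16)
out = attn(torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
```

The causal mask is what makes this "masked" self-attention: each token can only aggregate information from itself and earlier tokens, which is what lets a decoder-only LLM generate text one token at a time.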