Course: CS25: Transformers United
Video Lecture: https://youtu.be/P127jhj-8-Y
Instructors: Div Garg, Chetanya Rastogi, Advay Pal
The course is about Transformers, which have revolutionized fields such as natural language processing (NLP) and computer vision. They are also making strides in other areas of machine learning, such as reinforcement learning, and in scientific fields such as physics and biology.
Before jumping into Transformers and self-attention, we start by discussing attention and its timeline. Before self-attention, which is one of the key ingredients of Transformers, we had classical models such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and simple attention mechanisms. Let's look at the timeline below in more detail.
Attention Timeline. Figure adapted from the Transformers United course by Stanford.
In this era, models such as Seq2Seq, LSTMs, and GRUs were used for NLP problems, as illustrated by the sketch below.
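As a rough illustration (not from the lecture), here is a minimal PyTorch sketch of the kind of pre-attention Seq2Seq model mentioned above: an LSTM encoder compresses the whole source sentence into a fixed-size state, and an LSTM decoder must generate the output from that state alone. The class name, dimensions, and vocabulary sizes are hypothetical choices for the example.

```python
# Minimal sketch of a pre-attention Seq2Seq model (illustrative, not from the lecture):
# an LSTM encoder summarizes the source sequence into a fixed-size state, and an
# LSTM decoder generates the target sequence conditioned only on that state.
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: the final (hidden, cell) state is the only summary of the source.
        _, (h, c) = self.encoder(self.src_embed(src_ids))
        # Decode conditioned on that fixed-size state (teacher forcing on tgt_ids).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), (h, c))
        return self.out(dec_out)  # logits over the target vocabulary


# Toy usage with random token ids (batch=2, source length=7, target length=5).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 1000])
```

The fixed-size encoder state is the well-known bottleneck of such models, which is what attention mechanisms were later introduced to relieve.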