Course: CS25: Transformers United
Video Lecture: https://youtu.be/P127jhj-8-Y
Instructors: Div Garg, Chetanya Rastogi, Advay Pal
The course is about Transformers, which have revolutionized fields such as natural language processing (NLP) and computer vision. They are also making strides in other areas of machine learning, such as reinforcement learning, and in scientific fields such as physics and biology.
Before jumping into Transformers and self-attention, we start by discussing attention and its timeline. Before self-attention, which is one of the key ingredients of Transformers, we had classical models such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and simple attention mechanisms. Let's look at the timeline below in more detail.
Attention Timeline. Figure adapted from the Transformers United course by Stanford.
In this era, models such as Seq2Seq, LSTMs, and GRUs were used for NLP problems, as illustrated by the sketch below.
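As a rough illustration (not from the lecture), here is a minimal PyTorch sketch of the kind of pre-attention Seq2Seq model mentioned above: an LSTM encoder compresses the whole source sentence into a fixed-size state, and an LSTM decoder must generate the output from that state alone. The class name, dimensions, and vocabulary sizes are hypothetical choices for the example.

```python
# Minimal sketch of a pre-attention Seq2Seq model (illustrative, not from the lecture):
# an LSTM encoder summarizes the source sequence into a fixed-size state, and an
# LSTM decoder generates the target sequence conditioned only on that state.
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: the final (hidden, cell) state is the only summary of the source.
        _, (h, c) = self.encoder(self.src_embed(src_ids))
        # Decode conditioned on that fixed-size state (teacher forcing on tgt_ids).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), (h, c))
        return self.out(dec_out)  # logits over the target vocabulary


# Toy usage with random token ids (batch=2, source length=7, target length=5).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 1000])
```

The fixed-size encoder state is the well-known bottleneck of such models, which is what attention mechanisms were later introduced to relieve.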