A transformer is a type of neural network architecture that uses a self-attention mechanism to learn long-range dependencies within input and output sequences.
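To make the self-attention mechanism concrete, here is a minimal sketch of scaled dot-product self-attention in numpy. For simplicity the query, key, and value projections are the identity; a real transformer layer uses learned weight matrices for each.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over x of shape (seq_len, d).

    Illustration only: queries, keys, and values are all x itself,
    where a real layer would apply learned linear projections.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # pairwise similarity between positions
    # Softmax over each row, so every position's weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output is a weighted mix of all positions

x = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim embeddings
out = self_attention(x)
print(out.shape)  # (4, 8)
```

Because every output position mixes information from every input position, attention can relate tokens that are far apart in the sequence, which is the source of the long-range dependency modeling mentioned above.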
There are three main types of transformers used in large language models (LLMs):
- Auto-regressive Transformer models: These models are trained to predict the next token in a sequence, given the previous tokens. They are often used for tasks such as text generation, translation, and summarization. Some examples of GPT-like models include GPT-3.5 and GPT-4.
- Auto-encoding Transformer models: These models are trained to predict masked words in a sequence. They are often used for tasks such as text classification, question answering, and natural language inference. Some examples of BERT-like models include BERT, RoBERTa, and DistilBERT.
- Sequence-to-sequence Transformer models: These models are trained to translate text from one language to another. They can also be used for other tasks that involve converting one sequence of text into another, such as summarization and question answering. Some examples of BART/T5-like models include BART, T5, and Marian.
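One concrete way to see the difference between the three families is the attention mask each uses during training. The mask shapes below are the standard ones; this is a sketch, and real implementations add details such as padding masks.

```python
import numpy as np

seq_len = 4

# Auto-regressive (GPT-style): causal mask — position i may attend
# only to positions <= i, so the model cannot see future tokens.
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Auto-encoding (BERT-style): fully bidirectional — every position
# attends to every other, including positions after it.
bidirectional = np.ones((seq_len, seq_len), dtype=bool)

# Sequence-to-sequence (BART/T5-style): a bidirectional encoder over
# the source, a causal decoder over the target, and cross-attention
# from every decoder position to every encoder position.
src_len, tgt_len = 4, 3
encoder_mask = np.ones((src_len, src_len), dtype=bool)
decoder_mask = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
cross_mask = np.ones((tgt_len, src_len), dtype=bool)

print(causal.astype(int))
```

The causal mask is what lets GPT-like models generate text left to right, while the bidirectional mask is why BERT-like models excel at understanding tasks but are not used for free-form generation.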