Understand the Transformer architecture by learning about decoders, with detailed explanations of the architecture and a mini-project
This repository is not meant to be used for its code alone, but as a summary of the various classes and papers you can find on the internet. It is a complete, detailed guide to the basics of how decoders within the Transformer architecture work and how they can be used as a standalone architecture for certain tasks.
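To give a feel for what the guide covers, here is a minimal sketch of a single decoder block in PyTorch: masked (causal) self-attention followed by a feed-forward network, each with a residual connection and layer normalization. This is an illustration only, not the repository's actual `model.py`; the `DecoderBlock` class name and its hyperparameters are placeholders.

```python
# Minimal decoder-only block sketch (illustrative; not the repository's model.py).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder layer: masked self-attention + feed-forward,
    each wrapped in a residual connection and layer normalization."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True above the diagonal blocks attention to future positions,
        # so position i may only attend to positions <= i.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x

# Example: a batch of 2 sequences of length 5, with model dimension 64.
block = DecoderBlock(d_model=64, n_heads=8, d_ff=256)
out = block(torch.randn(2, 5, 64))
print(out.shape)  # torch.Size([2, 5, 64])
```

The causal mask is what distinguishes a decoder from an encoder layer: it is what allows decoder-only models to be trained for next-token prediction. The notebook and `model.py` walk through this in full detail.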
You will find:
- An `explanations.ipynb` notebook containing all the information about decoders and their code implementation.
- A `model.py` file containing the whole implementation in a single file.
If you are using this repository on its own to learn about decoders, I recommend first checking my other repository [Encoders_Explained]
- Vaswani, A., et al. (2017). "Attention Is All You Need". arXiv:1706.03762. [Paper]
- Hugging Face. (2022). "Transformer: decodeur". [YouTube]
- Machine Learning Studio. "A Dive Into Multihead Attention, Self-Attention and Cross-Attention". [YouTube]
- Machine Learning Studio. "Self-Attention Using Scaled Dot-Product Approach". [YouTube]
If this repository helped you understand decoders, consider giving it a star!
