# Transformer Implementation

This repository holds the code for the training and architectural design of an auto-regressive decoder module.

**Attention trick:** A lower-triangular mask is applied to the attention scores, so each token can only attend to itself and the tokens generated before it.
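
Below is a minimal sketch of how such a lower-triangular (causal) mask can be applied to attention scores; the function name `causal_attention` and the tensor shapes are illustrative and not taken from this repository's actual code.

```python
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v: tensors of shape (batch, seq_len, d_k) -- illustrative shapes,
    not necessarily those used in this repository.
    """
    d_k = q.size(-1)
    seq_len = q.size(-2)

    # Raw attention scores: (batch, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)

    # Lower-triangular mask: position i may attend to positions 0..i only.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))

    # Positions above the diagonal get -inf so softmax assigns them zero weight.
    scores = scores.masked_fill(~mask, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return weights @ v
```

Setting the masked positions to negative infinity before the softmax is what enforces auto-regressive generation: future tokens contribute zero attention weight, so predictions at step `i` depend only on tokens `0..i`.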