This project implements the Transformer architecture from scratch, using machine translation as the use case. It is mainly intended as an educational resource and a functional implementation of the architecture and the training/inference logic.
```bash
pip install tfs-mt
```

Alternatively, clone the repository and set it up with uv:

```bash
git clone https://github.com/Giovo17/tfs-mt.git
cd tfs-mt
uv sync
cp .env.example .env
# Edit the .env file with your configuration
```

To start training the model with the default configuration:

```bash
uv run src/train.py
```

To run inference using the trained model from the HuggingFace repo:

```bash
uv run src/inference.py
```

All project parameters can be configured in `src/tfs_mt/configs/config.yml` (a hypothetical loading sketch follows the list below). Key configurations include:
- Model Architecture: model configuration preset (Nano/Small/Base/Original), dropout, GloVe embedding initialization, ...
- Training: Optimizer, Learning rate scheduler, number of epochs, ...
- Data: Dataset, Dataloader, Tokenizer, ...
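
As an illustration only, here is a minimal sketch of how such a YAML file could be loaded and inspected in Python. It assumes PyYAML is available, and the key names `model`, `training`, and `data` are hypothetical stand-ins for the three areas above, not the actual schema of `src/tfs_mt/configs/config.yml`:

```python
# Illustrative sketch only -- key names are assumptions, not the project's schema.
from pathlib import Path

import yaml  # requires PyYAML

config_path = Path("src/tfs_mt/configs/config.yml")
with config_path.open() as f:
    config = yaml.safe_load(f)

# Hypothetical top-level keys for the three configuration areas listed above.
print(config.get("model", {}))     # e.g. layers, d_model, dropout, GloVe init
print(config.get("training", {}))  # e.g. optimizer, LR scheduler, epochs
print(config.get("data", {}))      # e.g. dataset, dataloader, tokenizer
```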
For a detailed explanation of the architecture and design choices, please refer to the Architecture Documentation.
The project supports various model configurations to suit different computational resources:
| Parameter | Nano | Small | Base | Original |
|---|---|---|---|---|
| Encoder Layers | 4 | 6 | 8 | 6 |
| Decoder Layers | 4 | 6 | 8 | 6 |
| d_model | 50 | 100 | 300 | 512 |
| Num Heads | 4 | 6 | 8 | 8 |
| d_ff | 200 | 400 | 800 | 2048 |
| Norm Type | PostNorm | PostNorm | PostNorm | PostNorm |
| Dropout | 0.1 | 0.1 | 0.1 | 0.1 |
| GloVe Dim | 50d | 100d | 300d | - |
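
For intuition, the sketch below plugs the "Original" column's hyperparameters into PyTorch's stock `nn.Transformer`. This is not the project's own model code, just an illustration of what those numbers mean. Note that `nn.Transformer` requires `d_model` to be divisible by the number of heads, so the Nano/Small/Base presets (50/4, 100/6, 300/8) would rely on the project's own attention implementation rather than this stand-in:

```python
# Illustrative only: the "Original" column in PyTorch's built-in nn.Transformer.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,           # embedding size ("Original" column)
    nhead=8,               # attention heads
    num_encoder_layers=6,  # encoder layers
    num_decoder_layers=6,  # decoder layers
    dim_feedforward=2048,  # d_ff
    dropout=0.1,
    norm_first=False,      # PostNorm, as in the table
    batch_first=True,
)

src = torch.rand(2, 10, 512)  # (batch, src_len, d_model)
tgt = torch.rand(2, 12, 512)  # (batch, tgt_len, d_model)
out = model(src, tgt)
print(out.shape)  # torch.Size([2, 12, 512])
```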
Full documentation is available at https://giovo17.github.io/tfs-mt/.
If you use tfs-mt in your research or project, please cite:
```bibtex
@software{Spadaro_tfs-mt,
  author  = {Spadaro, Giovanni},
  license = {MIT},
  title   = {{tfs-mt}},
  url     = {https://github.com/Giovo17/tfs-mt}
}
```