Sequence-to-Sequence (Seq2Seq) for Neural Machine Translation

Overview

This project provides an implementation of the classic Sequence-to-Sequence (Seq2Seq) architecture, powered by Recurrent Neural Networks (RNN/LSTM) and built with PyTorch, to address the problem of Neural Machine Translation (NMT).

The model is trained to translate from German (Source) to English (Target), utilizing the Multi30k dataset.

The implementation is based on the paper "Sequence to Sequence Learning with Neural Networks" by Ilya Sutskever, Oriol Vinyals, and Quoc V. Le: https://arxiv.org/abs/1409.3215

🏛️ Model Architecture

The project implements the standard Encoder-Decoder approach:

  1. Encoder: A multi-layer LSTM that reads the input sequence (the German sentence, reversed to improve RNN performance, as in the original paper) and produces a context vector (its final hidden and cell states).
  2. Decoder: A multi-layer LSTM that is initialized with the context vector and generates the target sequence (the English sentence) one token at a time.
  3. Teacher Forcing: During training, the decoder is sometimes fed the ground-truth previous token instead of its own prediction, which accelerates convergence and stabilizes training (see the sketch after this list).
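A minimal sketch of how these three modules might fit together. The class layout mirrors the description above, but the constructor arguments and names are illustrative assumptions, not the exact contents of model.py:

import random
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):                        # src: [src_len, batch]
        embedded = self.dropout(self.embedding(src))
        _, (hidden, cell) = self.rnn(embedded)     # keep only the final states
        return hidden, cell                        # the "context vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.fc_out = nn.Linear(hid_dim, vocab_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token, hidden, cell):        # token: [batch]
        embedded = self.dropout(self.embedding(token.unsqueeze(0)))
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        return self.fc_out(output.squeeze(0)), hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder, self.decoder, self.device = encoder, decoder, device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        trg_len, batch_size = trg.shape
        vocab_size = self.decoder.fc_out.out_features
        outputs = torch.zeros(trg_len, batch_size, vocab_size, device=self.device)
        hidden, cell = self.encoder(src)
        token = trg[0]                             # decoding starts from <sos>
        for t in range(1, trg_len):
            logits, hidden, cell = self.decoder(token, hidden, cell)
            outputs[t] = logits
            # Teacher forcing: sometimes feed the ground-truth token back in
            teacher_force = random.random() < teacher_forcing_ratio
            token = trg[t] if teacher_force else logits.argmax(1)
        return outputs

The teacher_forcing_ratio argument controls how often the ground-truth token is fed back to the decoder; setting it to 0 makes the decoder rely entirely on its own predictions, as at inference time.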

🗂️ Project Structure

The project is organized into a modular structure so that individual components are easy to test and extend.

nmt_project/
├── config.py           # Core hyperparameters and constants (dimensions, dropout, learning rate, special tokens).
├── data.py             # Functions for loading the Multi30k dataset, tokenization (WordPunctTokenizer), vocabulary building, and PyTorch DataLoader creation.
├── model.py            # Neural network module classes: Encoder, Decoder, Seq2Seq.
├── utils.py            # Helper functions (setting the random seed, weight initialization).
├── train.py            # Main script for launching model training and evaluation. Saves the best model.
├── inference.py        # Script for testing the trained model on arbitrary sentences.
├── best-model.pt       # Saved weights of the best model (created after the first training run).
└── notebooks/
    └── seq2seq_for_nmt.ipynb # The original code in Jupyter Notebook format for experimentation and visualization.
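As a rough illustration of the preprocessing in data.py, the sketch below shows WordPunctTokenizer-based tokenization, vocabulary building, and numericalization. The special tokens, function names, and min_freq threshold are assumptions for the sketch, not the exact code:

from collections import Counter
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

# Hypothetical special tokens, of the kind typically defined in config.py
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"

def tokenize(text):
    # Lowercase and split into word/punctuation tokens
    return tokenizer.tokenize(text.lower())

def build_vocab(sentences, min_freq=2):
    # Map every sufficiently frequent token to an integer id
    counter = Counter(tok for sent in sentences for tok in tokenize(sent))
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for tok, freq in counter.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def numericalize(sentence, vocab, reverse=False):
    # Convert a sentence into ids; the German source is reversed (see Architecture)
    tokens = tokenize(sentence)
    if reverse:
        tokens = tokens[::-1]
    return [vocab[SOS]] + [vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[EOS]]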

⚙️ Getting Started

Prerequisites

Install the necessary libraries using pip:

pip install -r requirements.txt

(Note: A CUDA-capable GPU is recommended for fast training, but the code automatically falls back to the CPU if no GPU is available.)
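In PyTorch, this device selection typically comes down to a single line, for example:

import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")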

1. Training the Model

To start the training process, execute the main script:

python train.py

The script will perform the following steps:

  1. Load and prepare the data.
  2. Initialize the model with parameters defined in config.py.
  3. Run the training loop, evaluating the model on the validation set after each epoch.
  4. Save the best-performing model weights to best-model.pt (a condensed sketch of this loop follows the list).
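A condensed, hypothetical view of the training and evaluation loop; the helper names, clipping value, and epoch count are assumptions used only for illustration:

import torch

def train_one_epoch(model, loader, optimizer, criterion, clip=1.0):
    model.train()
    total_loss = 0.0
    for src, trg in loader:                        # batches shaped [seq_len, batch]
        optimizer.zero_grad()
        output = model(src, trg)                   # [trg_len, batch, vocab]
        # Skip the <sos> position and flatten for cross-entropy
        loss = criterion(output[1:].flatten(0, 1), trg[1:].flatten())
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)  # stabilize gradients
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

@torch.no_grad()
def evaluate(model, loader, criterion):
    model.eval()
    total_loss = 0.0
    for src, trg in loader:
        output = model(src, trg, teacher_forcing_ratio=0.0)  # no teacher forcing
        loss = criterion(output[1:].flatten(0, 1), trg[1:].flatten())
        total_loss += loss.item()
    return total_loss / len(loader)

def fit(model, train_loader, valid_loader, optimizer, criterion, n_epochs=10):
    best_valid_loss = float("inf")
    for epoch in range(n_epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer, criterion)
        valid_loss = evaluate(model, valid_loader, criterion)
        # Keep only the weights that do best on the validation set
        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            torch.save(model.state_dict(), "best-model.pt")
        print(f"Epoch {epoch + 1}: train {train_loss:.3f} | valid {valid_loss:.3f}")

Saving model.state_dict() rather than the whole model object keeps the checkpoint small and independent of how the class definitions are laid out on disk.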

2. Inference (Translation)

Once the best-model.pt file has been created, you can use the inference.py script to translate test sentences:

python inference.py

This script loads the saved model and translates a predefined test sentence (which can be modified within the if __name__ == "__main__": block).
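Greedy decoding with the trained model could look roughly like the following; the translate helper and the vocabulary dictionaries are placeholders for whatever inference.py actually defines, and the snippet reuses the numericalize function and Seq2Seq modules from the sketches above:

import torch

@torch.no_grad()
def translate(model, sentence, src_vocab, trg_vocab, device, max_len=50):
    model.eval()
    # Numericalize and reverse the German source, as during training
    src_ids = numericalize(sentence, src_vocab, reverse=True)
    src = torch.tensor(src_ids, device=device).unsqueeze(1)   # [src_len, 1]
    hidden, cell = model.encoder(src)

    token = torch.tensor([trg_vocab["<sos>"]], device=device)
    inv_trg_vocab = {i: t for t, i in trg_vocab.items()}
    result = []
    for _ in range(max_len):
        logits, hidden, cell = model.decoder(token, hidden, cell)
        token = logits.argmax(1)                   # greedy choice of the next token
        if token.item() == trg_vocab["<eos>"]:
            break
        result.append(inv_trg_vocab[token.item()])
    return " ".join(result)

# model.load_state_dict(torch.load("best-model.pt", map_location=device))
# print(translate(model, "ein mann fährt fahrrad .", src_vocab, trg_vocab, device))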

💻 Experiments

The notebooks/seq2seq_for_nmt.ipynb file contains the step-by-step version of the implementation used for initial experimentation. You can use it to:

  • Visualize sentence length distributions.
  • Examine tokenization steps.
  • Debug the model logic line by line.
