A research project exploring diffusion-based text generation using different transformer models as alternatives to traditional autoregressive language models.
```
mmDiffBERT/
├── pyproject.toml        # Project dependencies and configuration
├── README.md             # This file
├── wiki-mmBERT/          # mmBERT-based diffusion implementation
│   ├── finetune.py       # Training script for mmBERT
│   ├── inference.py      # mmBERT diffusion inference
│   ├── compare.py        # mmBERT vs. GPT-2 comparison
│   ├── gpt2_inference.py # GPT-2 baseline
│   ├── README.md         # mmBERT-specific documentation
│   └── model_weights/    # Trained mmBERT model
└── wiki-roberta/         # RoBERTa-based diffusion implementation (legacy)
    ├── finetune.py       # Training script for RoBERTa
    ├── inference.py      # RoBERTa diffusion inference
    ├── compare.py        # RoBERTa vs. GPT-2 comparison
    ├── gpt2_inference.py # GPT-2 baseline
    ├── README.md         # RoBERTa-specific documentation
    └── model_weights/    # Trained RoBERTa model
```
This project explores diffusion-based text generation as an alternative to traditional autoregressive language models like GPT-2. Instead of generating text left-to-right one token at a time, this approach uses:
- Fixed prefix (first 16 tokens)
- Mask tokens for remaining positions
- Iterative denoising over multiple steps
- Progressive unmasking until the sequence is fully denoised (see the sketch after this list)
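For a rough illustration, the sketch below shows what this prefix-conditioned denoising loop can look like with a Hugging Face masked-LM checkpoint. The model path, sequence length, step count, and the confidence-based unmasking schedule are assumptions for illustration only; `wiki-mmBERT/inference.py` is the authoritative implementation.

```python
# Minimal sketch of iterative mask denoising (illustrative, not the repo's exact code).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_DIR = "wiki-mmBERT/model_weights"  # assumed path to a fine-tuned checkpoint
PREFIX_LEN = 16                          # first 16 tokens stay fixed
SEQ_LEN = 64                             # assumed total sequence length
NUM_STEPS = 8                            # assumed number of denoising steps

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR).eval()

def generate(prompt: str) -> str:
    # Fixed prefix + mask tokens for all remaining positions.
    prefix_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"][:PREFIX_LEN]
    ids = torch.full((1, SEQ_LEN), tokenizer.mask_token_id, dtype=torch.long)
    ids[0, :len(prefix_ids)] = torch.tensor(prefix_ids)

    for step in range(NUM_STEPS):
        masked = ids[0] == tokenizer.mask_token_id
        if not masked.any():
            break
        with torch.no_grad():
            logits = model(input_ids=ids).logits
        probs, preds = logits.softmax(-1).max(-1)

        # Progressive unmasking: reveal only the most confident masked positions
        # this step; everything else stays masked for the next pass.
        k = max(1, int(masked.sum()) // (NUM_STEPS - step))
        candidates = torch.where(masked, probs[0], torch.tensor(-1.0))
        reveal = candidates.topk(k).indices
        ids[0, reveal] = preds[0, reveal]

    return tokenizer.decode(ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The history of the city begins"))
```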
- Bidirectional Generation: Unlike autoregressive models, can attend to full sequence context
- Iterative Denoising: Gradually reveals text over configurable number of steps
- Prefix Control: First 16 tokens remain fixed, providing stable context
- Visual Animations: Step-by-step matplotlib animations showing the generation process (illustrated after this list)
- Comparative Analysis: Side-by-side comparison with GPT-2 baseline
- Multi-Model Support: Support for different transformer models (mmBERT, RoBERTa)
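As a hypothetical illustration of the animation feature, each partially denoised sequence can be rendered as one matplotlib frame; the actual plotting code lives in `inference.py` and may differ.

```python
# Illustrative animation of progressive unmasking (dummy states, not real model output).
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

states = [
    "The city was [MASK] in [MASK] [MASK] [MASK]",
    "The city was founded in [MASK] twelfth [MASK]",
    "The city was founded in the twelfth century",
]

fig, ax = plt.subplots(figsize=(8, 2))
ax.axis("off")
label = ax.text(0.5, 0.5, "", ha="center", va="center", wrap=True)

def update(step):
    # Show the sequence as it looks after this denoising step.
    label.set_text(f"step {step}: {states[step]}")
    return (label,)

anim = FuncAnimation(fig, update, frames=len(states), interval=800)
anim.save("denoising_steps.gif", writer="pillow")  # requires Pillow
```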
This project uses uv for package management.
```bash
# Install dependencies
uv sync
```

Requirements:
- Python >= 3.11
- PyTorch 2.7.0+
- Transformers 4.52.4
- Datasets 3.6.0
- Matplotlib 3.10.3
- Accelerate 1.7.0
```bash
cd wiki-mmBERT

# Basic text generation
python inference.py "Your prompt here"

# Fine-tuning
python finetune.py

# Side-by-side comparison with GPT-2
python compare.py "Your prompt here"
```

```bash
cd wiki-roberta

# Basic text generation
python inference.py "Your prompt here"

# Fine-tuning
python finetune.py

# Side-by-side comparison with GPT-2
python compare.py "Your prompt here"
```

- mmBERT Implementation: See wiki-mmBERT/README.md for detailed mmBERT-specific documentation
- RoBERTa Implementation: See wiki-roberta/README.md for detailed RoBERTa-specific documentation
This project is designed for research into:
- Alternative text generation paradigms
- Bidirectional context utilization
- Iterative refinement approaches
- Model comparison and evaluation
This project is based on the original RoBERTaDiffusion implementation:
```bibtex
@misc{robertadiffusion2024,
  title={RoBERTa Diffusion Text Generation},
  author={Nathan Barry},
  year={2024},
  url={https://github.com/nathan-barry/RoBERTaDiffusion},
  note={A research project exploring fine-tuning BERT-style models for text generation}
}
```

Original Repository: nathan-barry/RoBERTaDiffusion
This mmDiffBERT project extends the original work by:
- Supporting multiple transformer models (mmBERT, RoBERTa)
- Adding Hindi language support with Wikipedia and Sangraha datasets (see the data-loading sketch below)
- Focusing primarily on verifying syntax and compatibility across these models and datasets
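As a hypothetical sketch, the Hindi corpora mentioned above might be loaded with the `datasets` library as follows; the dataset IDs, configurations, splits, and field names here are assumptions and are not taken from this repository's `finetune.py`.

```python
from datasets import load_dataset

# Hindi Wikipedia dump (dataset ID and config name are assumptions).
wiki_hi = load_dataset("wikimedia/wikipedia", "20231101.hi", split="train")

# Sangraha Hindi corpus (dataset ID and data_dir are assumptions).
sangraha_hi = load_dataset("ai4bharat/sangraha", data_dir="verified/hin", split="train")

print(wiki_hi[0]["text"][:200])  # peek at one article (field name assumed)
```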