F1RAG is a retrieval-augmented generation system focused on Formula 1 racing data. The system leverages historical Formula 1 data to provide insights, answer questions, and generate content related to Formula 1 racing.
The system uses Formula 1 World Championship data from 1950-2020, sourced from:
- Dataset: Formula 1 World Championship (1950-2020)
- Platform: Kaggle
- Credit: Rohan Rao (Dataset Creator)
- Clone the repository:
  git clone https://github.com/ibanrohazz/F1RAG.git
  cd F1RAG
- Quick setup (recommended):
  # On Windows
  setup_env.bat
  # On any platform with Python
  python install_deps.py
- Manual setup:
  # Create and activate a virtual environment
  python -m venv env
  # On Windows
  env\Scripts\activate
  # On macOS/Linux
  source env/bin/activate
  # Install required packages
  pip install torch transformers datasets scikit-learn pandas matplotlib seaborn
- Download the dataset:
  - Download the Formula 1 dataset from Kaggle
  - Extract the CSV files into the data/raw/archive directory
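Before moving on, it can help to confirm that the extracted CSV files landed in the right place. The following is a minimal sketch, not part of the repository: the helper `missing_csvs` is hypothetical, and the file names in `EXPECTED_CSVS` are an assumption about which CSVs the dataset ships, so adjust them to match the files the project scripts actually read.

```python
from pathlib import Path

# Assumed file names: a subset of the CSVs expected in the Kaggle archive.
# Adjust the tuple to match the files your scripts actually read.
EXPECTED_CSVS = ("circuits.csv", "drivers.csv", "races.csv", "results.csv")


def missing_csvs(directory, expected=EXPECTED_CSVS):
    """Return the expected file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_csvs("data/raw/archive")
    if missing:
        print("Missing from data/raw/archive:", ", ".join(missing))
    else:
        print("All expected CSV files found.")
```

Run it from the repository root so that the relative path data/raw/archive resolves correctly.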
Always activate the virtual environment first:
# On Windows
env\Scripts\activate
# On macOS/Linux
source env/bin/activate
Then follow these steps:
- Process the race data:
  python src/data_processing.py
- Train the model:
  python src/rag_model.py
- Generate race summaries:
  python src/rag_model.py --generate
- Visualize the race summaries (optional):
  python src/visualization.py
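If you run these steps often, they can be chained from one small script. The sketch below is a hypothetical wrapper, not something the repository provides: `build_commands`, `run_pipeline`, and the wrapper's own `--visualize` flag are illustrative names. It only invokes the script paths and the `--generate` flag listed in the steps above, and assumes it is started from the repository root with the virtual environment active.

```python
import subprocess
import sys


def build_commands(generate=True, visualize=False):
    """Return the pipeline commands in the order the steps are listed."""
    python = sys.executable  # interpreter of the currently active environment
    commands = [
        [python, "src/data_processing.py"],
        [python, "src/rag_model.py"],
    ]
    if generate:
        commands.append([python, "src/rag_model.py", "--generate"])
    if visualize:
        commands.append([python, "src/visualization.py"])
    return commands


def run_pipeline(generate=True, visualize=False):
    """Run each step in turn; check=True stops at the first failing step."""
    for command in build_commands(generate, visualize):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # "--visualize" is a flag of this hypothetical wrapper only.
    run_pipeline(visualize="--visualize" in sys.argv[1:])
```

Using `sys.executable` rather than a bare `python` keeps every step on the same interpreter that launched the wrapper, which avoids accidentally falling back to a system-wide Python.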
If you encounter a ModuleNotFoundError, ensure you've:
- Activated the virtual environment
- Installed all dependencies
- Used the correct Python interpreter
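The three checks above can also be scripted. This is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `find_missing` are hypothetical, and the module list is taken from the packages in the manual setup step (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the packages listed in the manual setup step.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = (
    "torch", "transformers", "datasets", "sklearn",
    "pandas", "matplotlib", "seaborn",
)


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def find_missing(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
    missing = find_missing()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the printed interpreter path does not point inside the env directory, the virtual environment is most likely not activated in the current shell.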
To verify your environment, run:
python -c "import transformers; print('Transformers version:', transformers.__version__)"
If that fails, try reinstalling the transformers package:
pip install --upgrade transformers
This project uses data under the terms provided by Kaggle and the dataset creator. The F1 data is provided for educational and research purposes.
- Thanks to Rohan Rao for compiling and sharing the comprehensive Formula 1 dataset on Kaggle.
- Formula 1 (F1) for the exciting sport that generated this data.