Machine Learning Sports Classification

A machine learning project that classifies images into 100 sports categories, comparing deep learning models (CNNs and a Vision Transformer) with traditional machine learning approaches.

Overview

This project implements and compares multiple approaches for sports image classification:

Deep Learning: ResNet and EfficientNet (CNNs), Vision Transformer
Traditional ML: KNN, Random Forest, SVM
Feature Extraction: Color Histograms, HOG, LBP
Dataset Analysis & Augmentation utilities

Dataset size: 14,899 images across 100 sports categories.

Dataset

Dataset Statistics

Training: 11,958 images
Validation: 1,405 images
Test: 1,536 images
Total: 14,899 images

Directory Structure

MachineLearning-SportsClassification/
├── train/
│   ├── air hockey/
│   ├── archery/
│   └── ... (100 classes)
├── valid/
│   └── ... (100 classes)
└── test/
    └── ... (100 classes)
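To check a local copy against the statistics above, the images in each split can be counted per class folder. This is a minimal sketch, not one of the repository's scripts: the function name `count_images` and the set of accepted file extensions are assumptions.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual dataset files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(split_dir):
    """Return {class_name: image_count} for one split directory (e.g. train/)."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


if __name__ == "__main__":
    # Report totals for whichever of the three splits exist in the current folder.
    for split in ("train", "valid", "test"):
        if Path(split).is_dir():
            counts = count_images(split)
            print(f"{split}: {sum(counts.values())} images in {len(counts)} classes")
```

Run from the project root, the per-split totals should match the figures listed under Dataset Statistics if the dataset is complete.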

Quick Start

1. Environment Setup

Important: Use Python 3.10–3.12. Some of the ML libraries in requirements.txt do not support Python 3.13.

git clone https://github.com/asyau/MachineLearning-SportsClassification.git
cd MachineLearning-SportsClassification

python -m venv venv
source venv/bin/activate  # macOS / Linux
# venv\Scripts\activate  # Windows

pip install --upgrade pip
pip install -r requirements.txt

2. Dataset Preparation

Ensure your dataset is placed in the project root:

train/
valid/
test/

Dataset Analysis & Preparation (Optional)

Analyze Class Distribution

python analyze_data_distribution.py

Outputs:

Class distribution plots
Imbalance analysis
CSV summaries (saved to class-dist-analysis/)
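As an illustration of what an imbalance analysis typically reports, the sketch below summarizes a dictionary of per-class image counts. It is not taken from analyze_data_distribution.py; the function name `imbalance_summary` and the choice of statistics (largest class, smallest class, mean size, max/min ratio) are assumptions.

```python
from statistics import mean


def imbalance_summary(class_counts):
    """Summarize class imbalance from a {class_name: image_count} mapping."""
    largest = max(class_counts, key=class_counts.get)
    smallest = min(class_counts, key=class_counts.get)
    return {
        "largest": largest,
        "smallest": smallest,
        "mean": mean(class_counts.values()),
        # Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
        "imbalance_ratio": class_counts[largest] / class_counts[smallest],
    }


if __name__ == "__main__":
    # Made-up counts for illustration only, not the real dataset statistics.
    print(imbalance_summary({"archery": 150, "air hockey": 100, "bmx": 50}))
```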

Get Class Sizes

python get_train_class_sizes.py

Outputs:

train_class_sizes.txt

Augment Small Classes

Augments classes with fewer than 110 images.

python augment_small_classes.py
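The following sketch shows one way the 110-image threshold could be turned into an augmentation plan. It assumes each small class is topped up to exactly the threshold and uses a horizontal flip as the example transform; the actual behavior of augment_small_classes.py may differ, and both function names are hypothetical.

```python
def augmentation_plan(class_counts, min_size=110):
    """Return {class_name: images_to_generate} for classes below min_size.

    Assumption: every small class is topped up to exactly min_size images.
    """
    return {
        name: min_size - count
        for name, count in class_counts.items()
        if count < min_size
    }


def horizontal_flip(image):
    """Mirror an image given as a list of pixel rows (one simple augmentation)."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    # Made-up class sizes for illustration only.
    print(augmentation_plan({"archery": 132, "bmx": 98, "luge": 110}))
```

Classes that already have at least 110 images are left out of the plan, matching the "fewer than 110" rule stated above.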

Create Train/Validation/Test Split

python create_train_val_test_split.py

Edit the script to configure:

Input directory
Output directory
Split ratios (default: 80 / 10 / 10)
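For reference, an 80 / 10 / 10 split is usually applied per class so that every class appears in all three subsets. The sketch below illustrates that idea for the file list of a single class; it is an assumption about how such a split can be done, not the code in create_train_val_test_split.py, and `split_class` and its `seed` parameter are hypothetical.

```python
import random


def split_class(files, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split one class's file list into (train, valid, test) lists.

    Files are sorted and then shuffled with a fixed seed so the split is
    reproducible; the test subset receives whatever remains after rounding.
    """
    files = sorted(files)
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_valid = int(len(files) * ratios[1])
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    # Hypothetical file names for one class.
    names = [f"img_{i:03d}.jpg" for i in range(100)]
    train, valid, test = split_class(names)
    print(len(train), len(valid), len(test))
```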

Deep Learning (CNN) – Recommended

Training

python train.py

or

cd cnn
python train.py

Configuration (`cnn/config.py`)

MODEL_NAME = 'efficientnet_b4'  # options: resnet50, efficientnet_b0 to efficientnet_b4, vit_b_16
BATCH_SIZE = 24
NUM_EPOCHS = 15
LEARNING_RATE = 0.0005

USE_CLASS_WEIGHTS = True
USE_MIXED_PRECISION = True
EARLY_STOPPING = True
EARLY_STOPPING_PATIENCE = 5
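To make the effect of EARLY_STOPPING_PATIENCE concrete, here is a minimal sketch of patience-based early stopping. It assumes validation loss is the monitored quantity; the training code in cnn/train.py may track a different metric, and the `EarlyStopping` class shown here is hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=5)
    # Made-up validation losses: the best value (0.8) is never beaten again.
    for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.81, 0.95, 0.82, 0.7]):
        if stopper.step(loss):
            print(f"stopping after epoch {epoch}")
            break
```

With a patience of 5 and NUM_EPOCHS = 15, training ends early only if five epochs in a row fail to beat the best result so far.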

Evaluation

python evaluate.py

Supported metrics:

Top-1 / Top-3 / Top-5 Accuracy
Precision, Recall, F1 (Macro & Weighted)
Confusion Matrix
Per-class metrics
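Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The sketch below shows the computation in plain Python; it is an illustration rather than the code in evaluate.py, and `top_k_accuracy` is a hypothetical name.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up scores for three samples and three classes.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(scores, labels, k))
```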

Inference

Single image:

python inference.py --image path/to/image.jpg

Directory:

python inference.py --dir path/to/images/

Top-K predictions:

python inference.py --image path/to/image.jpg --top_k 10

Custom checkpoint:

python inference.py --image path/to/image.jpg --checkpoint checkpoints/best_model.pt
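Conceptually, top-K prediction turns the model's raw outputs (logits) into probabilities with a softmax and keeps the K most likely classes. The following is a sketch of that step under stated assumptions: `top_k_predictions` is a hypothetical helper, and the logits and class names are placeholders rather than output from inference.py.

```python
import math


def top_k_predictions(logits, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder logits for three classes.
    for name, prob in top_k_predictions([2.0, 0.5, 1.0], ["archery", "air hockey", "bmx"], k=2):
        print(f"{name}: {prob:.3f}")
```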

TensorBoard

tensorboard --logdir logs/

Open: http://localhost:6006

Traditional Machine Learning Approaches

K-Nearest Neighbors (KNN)

python knn_color_histogram_only.py
python knn_hog_only.py
python knn_lbp_only.py
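To show the idea behind the color-histogram variant, the sketch below builds a per-channel histogram from RGB pixels and classifies a query with a single nearest neighbor. It is a simplified illustration, not the code in knn_color_histogram_only.py: the bin count, the Euclidean distance, k = 1, and both function names are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram for an iterable of (r, g, b) pixels (0-255).

    Returns a list of 3 * bins values: red bins, then green bins, then blue bins.
    """
    hist = [0] * (3 * bins)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def nearest_neighbor(query, train_features, train_labels):
    """Return the label of the training feature closest to the query (1-NN)."""
    distances = [
        sum((q - t) ** 2 for q, t in zip(query, feature))
        for feature in train_features
    ]
    return train_labels[distances.index(min(distances))]


if __name__ == "__main__":
    # Two toy "images" made of a single color each; labels are placeholders.
    train = [color_histogram([(255, 0, 0)] * 4), color_histogram([(0, 255, 0)] * 4)]
    labels = ["mostly red", "mostly green"]
    print(nearest_neighbor(color_histogram([(250, 10, 5)] * 4), train, labels))
```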

Random Forest (RF)

python rf_color_histogram_only.py
python rf_hog_only.py
python rf_lbp_only.py

Support Vector Machine (SVM)

Edit SVM_hog_color_lbp.py:

BASE_DIR = "."
SELECTED_FEATURE = 'color'  # 'color', 'hog', 'lbp'
USE_PCA = True

Run:

python SVM_hog_color_lbp.py

Outputs saved to svm_output/.

Combined Features

python concatinated.py

Uses the color histogram, HOG, and LBP features together as a single combined feature set.
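One common way to combine several descriptors is to normalize each one and concatenate them into a single vector, so that no feature type dominates purely because of its scale. The sketch below illustrates this; the per-block L2 normalization is an assumption and may not match what concatinated.py does, and both function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def combine_features(color_hist, hog, lbp):
    """Concatenate the three descriptors after normalizing each one separately."""
    combined = []
    for block in (color_hist, hog, lbp):
        combined.extend(l2_normalize(block))
    return combined


if __name__ == "__main__":
    # Tiny placeholder vectors standing in for real descriptors.
    print(combine_features([3, 4], [0, 0], [2]))
```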

Project Structure

MachineLearning-SportsClassification/
├── cnn/
│   ├── config.py
│   ├── dataset.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
│
├── train.py
├── evaluate.py
├── inference.py
│
├── analyze_data_distribution.py
├── augment_small_classes.py
├── create_train_val_test_split.py
├── get_train_class_sizes.py
│
├── knn_*.py
├── rf_*.py
├── SVM_hog_color_lbp.py
├── concatinated.py
│
├── train/
├── valid/
├── test/
├── checkpoints/
├── logs/
└── class-dist-analysis/

Requirements (Core)

Key dependencies (see requirements.txt for full list):

torch
torchvision
tensorflow
scikit-learn
opencv-python
numpy
pandas
matplotlib
seaborn
tensorboard

Troubleshooting

CUDA Out of Memory

Reduce BATCH_SIZE
Reduce IMAGE_SIZE
Enable mixed precision

Slow Training

Use a smaller model (e.g. efficientnet_b0)
Enable mixed precision
Increase NUM_WORKERS

Dataset Not Found

Ensure train/, valid/, and test/ exist in the project root
Check BASE_DIR in scripts

Notes

In this project, CNN models give significantly better accuracy than the traditional ML approaches
Traditional ML is useful for baseline comparisons
Class imbalance handling is strongly recommended
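One widely used way to handle class imbalance is to weight each class inversely to its frequency in the loss function. The sketch below computes such weights from per-class image counts; the formula (total / (number of classes × class count)) is a common convention and an assumption here, not necessarily what USE_CLASS_WEIGHTS enables in cnn/config.py, and `inverse_frequency_weights` is a hypothetical name.

```python
def inverse_frequency_weights(class_counts):
    """Return {class_name: weight} with weight = total / (num_classes * count).

    Rare classes get weights above 1 and frequent classes below 1; a perfectly
    balanced dataset gives every class a weight of exactly 1.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count)
        for name, count in class_counts.items()
    }


if __name__ == "__main__":
    # Made-up class sizes for illustration only.
    print(inverse_frequency_weights({"archery": 100, "bmx": 50, "luge": 50}))
```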
