
ResNet From Scratch — Rebuilding Modern CNN Training (CIFAR-10)

This repository is a from-scratch reimplementation of ResNet-style convolutional networks, inspired by the original "Deep Residual Learning for Image Recognition" (He et al., 2015), with modern training practices layered on top.

The intent is not to chase benchmarks or novelty, but to rebuild the full training stack deliberately—architecture, optimization, scheduling, and data handling—to develop first-principles intuition for why these systems work.

Think of this as re-deriving ResNet in 2025, not copy-pasting it.


What This Project Is (and Isn’t)

This is:

  • A clean, readable ResNet implementation without torchvision.models
  • A practical study of modern CNN training dynamics
  • A reproducible, inspectable training pipeline
  • A foundation for experimentation and extensions

This is not:

  • A benchmark leaderboard submission
  • A hyperparameter sweep or AutoML setup
  • A research paper claiming architectural novelty

Key Design Choices

Architecture

  • Residual blocks implemented directly from the ResNet paper
  • Configurable depth via a single parameter (n)
  • Explicit skip connections (no hidden abstractions)
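
In the original paper's CIFAR-10 experiments, a single parameter n controls the depth (6n + 2 weighted layers), which is the same knob exposed here. A basic residual block in this style looks roughly like the minimal sketch below; class and argument names are illustrative and not necessarily the ones used in model.py or layers.py.

import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    # Two 3x3 convolutions with an identity shortcut; a 1x1 projection is used
    # only when the spatial size or channel count changes.
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)   # explicit skip connection
        return F.relu(out)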

Dataset

  • CIFAR-10 (10-class image classification dataset)
  • 50,000 training images, 10,000 test images
  • 32×32 RGB images
  • Used deliberately for fast iteration while still exposing real training dynamics

Training Stack

  • Optimizer: AdamW (decoupled weight decay)
  • Learning Rate Schedule: OneCycleLR
  • Loss: Cross-Entropy
  • Precision: FP32 with high matmul precision
  • Compilation: torch.compile (when supported)
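
Wired together, this stack looks roughly like the sketch below; the learning rate, weight decay, and step counts are placeholders, and the real values (plus the actual model construction) live in train.py and model.py.

import torch
import torch.nn as nn

torch.set_float32_matmul_precision("high")   # FP32 training with the "high" matmul precision setting

# Placeholder model; the real ResNet is built in model.py
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)
if torch.cuda.is_available() and hasattr(torch, "compile"):
    model = torch.compile(model)             # graph compilation when the build supports it

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=50, steps_per_epoch=391,   # illustrative values
)

# Per-batch step (OneCycleLR is stepped every batch, not every epoch):
#   optimizer.zero_grad(set_to_none=True)
#   loss = criterion(model(x), y)
#   loss.backward()
#   optimizer.step()
#   scheduler.step()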

Data Handling

  • Explicit train/validation split
  • Dedicated data augmentation module
  • Dataset logic separated from training logic

Each choice is made to be visible and auditable, not hidden behind convenience APIs.
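
As one concrete illustration, a conventional CIFAR-10 split plus augmentation pipeline is sketched below using torchvision's dataset class; the repository's own logic lives in utils.py (which also handles the in-memory loading described under Results), so the exact transforms, statistics, and split sizes there may differ.

import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Augmentation for training; validation uses only normalization
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
eval_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# Two views of the same 50,000 training images so train and val get different transforms
train_view = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
val_view   = datasets.CIFAR10("./data", train=True, download=True, transform=eval_tf)

perm = torch.randperm(len(train_view), generator=torch.Generator().manual_seed(0)).tolist()
train_set = Subset(train_view, perm[:45_000])   # explicit train/validation split
val_set   = Subset(val_view,   perm[45_000:])

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader   = DataLoader(val_set,   batch_size=256, shuffle=False, num_workers=4)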


Project Structure

.
├── train.py              # Training loop + optimization logic
├── test.py               # Evaluation script
├── layers.py             # Layer / building-block implementations
├── model.py              # ResNet implementation (from scratch)
├── utils.py              # CIFAR-10 dataset + data augmentation
├── artifacts/            # Generated after training
│   ├── loss_curve.png
│   ├── val_accuracy.png
│   └── lr_schedule.png
├── best_resnet.pth       # Best model checkpoint (saved during training)
└── requirements.txt

Training Artifacts

Running training automatically generates a small set of high-signal artifacts:

  • Training Loss vs Epoch — convergence behavior
  • Validation Accuracy vs Epoch — generalization trend
  • OneCycleLR Schedule — learning rate dynamics

These plots are intentionally minimal and designed for fast iteration and sanity checking.
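
A minimal way to produce such plots is sketched below; the real plotting code lives in train.py, and the styling, filenames, and exact metrics may differ.

import os
import matplotlib
matplotlib.use("Agg")            # headless backend so plots save during training runs
import matplotlib.pyplot as plt

def save_curve(values, ylabel, path):
    # Save a simple per-epoch curve (loss, accuracy, or learning rate) as a PNG.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    plt.figure()
    plt.plot(range(1, len(values) + 1), values)
    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.savefig(path, dpi=150, bbox_inches="tight")
    plt.close()

# e.g. save_curve(train_losses, "Training loss", "artifacts/loss_curve.png")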

Results

  • Final Test Accuracy: 92.95% on CIFAR-10
  • Training Time: ~3 hours
  • Hardware: NVIDIA RTX 4090

The model was trained with the entire CIFAR-10 dataset preloaded into memory to minimize data-loading overhead and maximize GPU utilization.
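
As a sketch, in-memory batching can be as simple as indexing one preloaded tensor per step; this assumes the full training set fits comfortably in memory (roughly 150 MB as uint8 for CIFAR-10) and is illustrative rather than the exact code in utils.py or train.py.

import torch

def iterate_batches(images, labels, batch_size=128, device="cuda", shuffle=True):
    # images: (N, 3, 32, 32) uint8 tensor, labels: (N,) int64 tensor, both already loaded in memory.
    order = torch.randperm(images.size(0)) if shuffle else torch.arange(images.size(0))
    for start in range(0, images.size(0), batch_size):
        idx = order[start:start + batch_size]
        x = images[idx].to(device, non_blocking=True).float().div_(255.0)  # scale to [0, 1] on the fly
        y = labels[idx].to(device, non_blocking=True)
        yield x, y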

Data augmentation was carefully tuned and validated during training to improve generalization without introducing instability.


Running the Code

1. Install dependencies

pip install -r requirements.txt

2. Train the model

python train.py

Artifacts will be saved under artifacts/ after training completes.


Motivation

Many modern deep learning workflows hide critical decisions behind high-level abstractions. This project intentionally avoids that.

The focus is on:

  • understanding why residual connections stabilize deep networks
  • observing the relative impact of learning rate schedules versus architecture depth
  • building intuition for optimization-driven performance gains

CIFAR-10 is used deliberately: it is small enough for fast iteration, yet rich enough to expose real training dynamics.


Extensions & Experiments (Planned / Easy to Add)

  • Deeper or wider residual stacks
  • Ablations: constant LR vs OneCycleLR
  • Error analysis via sample predictions
  • Comparison against torchvision baselines

Acknowledgements

This implementation is based on the ideas introduced in:

He et al., 2015 — Deep Residual Learning for Image Recognition

All mistakes and design tradeoffs in this codebase are my own.


License

MIT
