This project implements a 3-layer Deep Neural Network (DNN) using only NumPy to classify handwritten digits from the MNIST dataset. By building all components from scratch—including forward/backward propagation, activation functions, and gradient descent—we demonstrate the foundational mechanics of deep learning without high-level frameworks like TensorFlow or PyTorch.
Key Features:
- Pure NumPy implementation (no DL frameworks)
- He initialization for efficient learning
- Mini-batch gradient descent optimization
- ReLU activation in hidden layers
- Softmax output layer with cross-entropy loss
- Achieves ~95% accuracy on the validation set
.
├── DNN_using_NumPy.ipynb # Main Jupyter notebook with implementation
├── Neural Networks and Deep Learning (Notes).pdf # Notes explaining the mathematics behind neural networks
├── train.csv # MNIST training data (CSV format)
├── README.md # This documentation
└── requirements.txt # Python dependencies
- Install Dependencies: run `pip install -r requirements.txt` (installs NumPy, pandas, and scikit-learn)
- Download Data: place `train.csv` in the project directory (available on Kaggle)
- Execute the Notebook: run `jupyter notebook DNN_using_NumPy.ipynb`
Training Process:
- 3-layer network (128 → 64 → 10 units)
- 20 epochs with batch size 64
- Learning rate: 0.01
- Loss decreases from about 1.13 to 0.14 over training
| Epoch | Loss | Validation Accuracy |
|---|---|---|
| 1 | 1.1287 | - |
| 10 | 0.2068 | - |
| 20 | 0.1366 | 95.07% |
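For reference, the validation accuracy reported above comes down to an argmax over the softmax outputs. Below is a minimal sketch of that check; it assumes the probabilities arrive as a (10, m) array with one column per sample and that labels are integer digits, which may differ from the notebook's exact layout.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose most probable class matches the true digit.

    probs  : (10, m) softmax outputs, one column per sample (assumed layout)
    labels : (m,)    integer digit labels 0-9
    """
    predictions = np.argmax(probs, axis=0)        # most likely digit for each column
    return float(np.mean(predictions == labels))
```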
Architecture:
Input (784) → Hidden 1 (128, ReLU) → Hidden 2 (64, ReLU) → Output (10, Softmax)

- Initialization: He weight initialization (`init_params()`)
- Forward Propagation: `forward_prop()`
- Activation Functions: ReLU and Softmax
- Loss Calculation: Cross-entropy (`cross_entropy()`)
- Backpropagation: `backward_prop()`
- Parameter Update: Gradient descent (`update_params()`)
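To make these components concrete, here is a minimal sketch of He initialization and the forward pass for the 784 → 128 → 64 → 10 layout above. The names mirror the notebook's `init_params()` and `forward_prop()`, but the signatures, column-per-sample data layout, and return values are assumptions made for illustration rather than the notebook's exact code.

```python
import numpy as np

def init_params():
    """He initialization: weights scaled by sqrt(2 / fan_in), biases start at zero."""
    layer_sizes = [784, 128, 64, 10]
    params = {}
    for l in range(1, len(layer_sizes)):
        fan_in = layer_sizes[l - 1]
        params[f"W{l}"] = np.random.randn(layer_sizes[l], fan_in) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    # Subtract the column-wise max before exponentiating for numerical stability
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(X, params):
    """X has shape (784, m): one flattened, normalized digit image per column (assumed layout)."""
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = relu(Z2)
    Z3 = params["W3"] @ A2 + params["b3"]
    A3 = softmax(Z3)                        # class probabilities, shape (10, m)
    return A3, (X, Z1, A1, Z2, A2, Z3)      # cache intermediates for backprop
```

Scaling weights by sqrt(2 / fan_in) keeps the variance of ReLU activations roughly constant across layers, which is what makes He initialization a good fit for this network.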
Mini-batch gradient descent with shuffling:
```python
for epoch in range(epochs):
    perm = np.random.permutation(n)
    for i in range(0, n, batch_size):
        # Forward pass, loss calculation, backprop, update
```
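Building on the forward-pass sketch above, the loop body might be filled in roughly as follows. The notebook defines `cross_entropy()`, `backward_prop()`, and `update_params()`; the versions here are standalone re-derivations using the hyperparameters quoted earlier (20 epochs, batch size 64, learning rate 0.01), so treat them as an illustrative sketch rather than the notebook's code.

```python
import numpy as np

def cross_entropy(probs, Y_onehot):
    """Mean cross-entropy; probs and Y_onehot are both (10, m), one column per sample."""
    m = Y_onehot.shape[1]
    return -np.sum(Y_onehot * np.log(probs + 1e-12)) / m

def backward_prop(probs, Y_onehot, cache, params):
    """Gradients for the 784-128-64-10 network; cache comes from the forward_prop sketch above."""
    X, Z1, A1, Z2, A2, _ = cache
    m = X.shape[1]
    dZ3 = probs - Y_onehot                                # gradient of softmax + cross-entropy
    grads = {"W3": dZ3 @ A2.T / m, "b3": dZ3.mean(axis=1, keepdims=True)}
    dZ2 = (params["W3"].T @ dZ3) * (Z2 > 0)               # ReLU derivative is 1 where Z > 0
    grads["W2"] = dZ2 @ A1.T / m
    grads["b2"] = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)
    grads["W1"] = dZ1 @ X.T / m
    grads["b1"] = dZ1.mean(axis=1, keepdims=True)
    return grads

def update_params(params, grads, learning_rate):
    """Plain gradient descent step on every weight matrix and bias vector."""
    for l in (1, 2, 3):
        params[f"W{l}"] -= learning_rate * grads[f"W{l}"]
        params[f"b{l}"] -= learning_rate * grads[f"b{l}"]
    return params

def train(X, Y_onehot, params, epochs=20, batch_size=64, learning_rate=0.01):
    """Mini-batch gradient descent with per-epoch shuffling."""
    n = X.shape[1]                                        # number of training samples
    for epoch in range(epochs):
        perm = np.random.permutation(n)                   # reshuffle samples every epoch
        for i in range(0, n, batch_size):
            batch = perm[i:i + batch_size]
            X_b, Y_b = X[:, batch], Y_onehot[:, batch]
            probs, cache = forward_prop(X_b, params)
            loss = cross_entropy(probs, Y_b)
            grads = backward_prop(probs, Y_b, cache, params)
            params = update_params(params, grads, learning_rate)
        print(f"Epoch {epoch + 1}: last mini-batch loss {loss:.4f}")
    return params
```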
Future Improvements:
Hyperparameter Tuning:
- Implement learning rate scheduling
- Experiment with Adam optimizer
Regularization Techniques:
- Add L2 regularization (a rough sketch follows this list)
- Implement dropout layers
Architecture Enhancements:
- Add batch normalization
- Extend to convolutional layers (CNN)
Deployment:
- Create web interface for real-time predictions
- Optimize for mobile devices using ONNX
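As an illustration of the L2 regularization item under Regularization Techniques above, one way to wire it into this from-scratch setup is to add a weight penalty to the loss and a matching term to each weight gradient. A rough sketch, assuming the parameter and gradient dictionaries from the earlier sketches and a hypothetical regularization strength `lambd`:

```python
import numpy as np

def l2_penalty(params, lambd, m):
    """Penalty (lambd / 2m) * sum of squared weights, added on top of the cross-entropy loss."""
    return (lambd / (2 * m)) * sum(np.sum(params[f"W{l}"] ** 2) for l in (1, 2, 3))

def add_l2_to_grads(grads, params, lambd, m):
    """Matching gradient term: each dW picks up an extra (lambd / m) * W; biases are left alone."""
    for l in (1, 2, 3):
        grads[f"W{l}"] += (lambd / m) * params[f"W{l}"]
    return grads
```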
Requirements:
- Python 3.7+
- NumPy 1.21+
- pandas 1.3+
- scikit-learn 1.0+
Author: Utkarsh Bhardwaj
Contact: ubhardwaj284@gmail.com
Publish Date: 8th June, 2025
Why This Project Stands Out: By implementing every neural network component from scratch, we demystify deep learning fundamentals while still reaching roughly 95% validation accuracy. This serves as both an educational resource and a foundation for more complex architectures.
