A comprehensive implementation and evaluation of three state-of-the-art object detection architectures (Faster R-CNN, YOLOv11n, and DETR) on the COCO 2017 and Pascal VOC 2012 datasets.
## Table of Contents

- Overview
- Features
- Project Structure
- Models Implemented
- Datasets
- Installation
- Usage
- Evaluation Metrics
- Results
- Visualizations
## Overview

This project provides a side-by-side comparison of three object detection paradigms:
- Two-stage detectors (Faster R-CNN with ResNet-50 backbone)
- One-stage detectors (YOLOv11n)
- Transformer-based detectors (DETR with ResNet-101 backbone)
Each model is evaluated on standard benchmarks using consistent metrics, with detailed analysis of their strengths, weaknesses, and use cases.
## Features

- Three Detection Architectures: Implementations of Faster R-CNN, YOLOv11n, and DETR
- Comprehensive Evaluation: mAP, IoU, inference time, and FPS metrics
- Rich Visualizations:
  - Feature map extraction from multiple layers
  - GradCAM interpretability analysis
  - Success and failure case studies
  - Comparative performance charts
- Dual Dataset Evaluation: COCO 2017 and Pascal VOC 2012
- Detailed Documentation: Complete LaTeX report with architecture analysis
- Production-Ready Code: Well-structured, modular implementation
## Project Structure

```
Object-Detection-Models/
│
├── Faster-RCNN-Resnet50/          # Faster R-CNN implementation
│   └── faster-rcnn.ipynb          # Main notebook
│
├── Yolov11n/                      # YOLOv11n implementation
│   ├── output_yolov11n_coco/      # COCO dataset results
│   │   ├── coco-yolo-v-11-n.ipynb
│   │   ├── success_case_*.png
│   │   ├── failure_case_*.png
│   │   └── feature_maps_*.png
│   └── output_yolov11n_voc/       # Pascal VOC results
│       ├── pascal-voc-yolo-v-11-n.ipynb
│       └── [similar output files]
│
├── Detr/                          # DETR implementation
│   ├── output_detr_coco/          # COCO dataset results
│   └── output_detr_voc/           # Pascal VOC results
│
├── images/                        # Architecture diagrams and figures
│   ├── faster-rcnn-arch.png
│   ├── yolo11-arch.png
│   ├── detr-arch.png
│   ├── iou.png
│   ├── gradcam.png
│   └── ...
│
├── outputs_fasterrcnn/            # Faster R-CNN output files
├── coco-dataset/                  # COCO 2017 dataset
├── pascal-voc-dataset/            # Pascal VOC 2012 dataset
│
├── assignment_report.tex          # LaTeX source
├── assignment_report.pdf          # Final report
└── README.md                      # This file
```
## Models Implemented

### Faster R-CNN (ResNet-50)

Type: Two-stage detector
Architecture:
- Backbone: ResNet-50 with Feature Pyramid Network (FPN)
- Region Proposal Network (RPN): Generates candidate object regions
- RoI Head: Classifies and refines bounding boxes
Key Characteristics:
- High accuracy for complex scenes
- Excellent localization precision
- Slower inference compared to one-stage detectors
- Best for applications prioritizing accuracy over speed
Use Cases: Medical imaging, autonomous vehicles (non-real-time), quality inspection
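A minimal sketch of how this pipeline can be loaded and run (assuming `torchvision` is installed; `image.jpg` is a placeholder path):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load the pretrained Faster R-CNN (ResNet-50 FPN) from torchvision
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Run inference on a single image; "image.jpg" is a placeholder path
image = to_tensor(Image.open("image.jpg").convert("RGB"))
with torch.no_grad():
    predictions = model([image])[0]

# Each prediction contains boxes (xyxy), class labels, and confidence scores
print(predictions["boxes"].shape, predictions["labels"], predictions["scores"])
```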
### YOLOv11n

Type: One-stage detector
Architecture:
- Backbone: CSPDarknet with Cross Stage Partial connections
- Neck: PANet (Path Aggregation Network)
- Head: Direct prediction of bounding boxes and classes
Key Characteristics:
- Real-time inference speed
- Good balance of accuracy and efficiency
- Excellent for detecting large objects
- Single forward pass architecture
Use Cases: Real-time surveillance, robotics, mobile applications, live video processing
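A minimal inference sketch with the Ultralytics API (the `yolo11n.pt` checkpoint name follows Ultralytics' naming for YOLO11; `image.jpg` is a placeholder):

```python
from ultralytics import YOLO

# Load pretrained YOLOv11n weights (downloaded automatically by Ultralytics)
model = YOLO("yolo11n.pt")

# Single forward pass on an image, keeping detections above 0.25 confidence
results = model("image.jpg", conf=0.25)

# Each result exposes boxes in xyxy format plus class IDs and confidences
for box in results[0].boxes:
    print(box.xyxy, box.cls, box.conf)
```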
### DETR (ResNet-101)

Type: Transformer-based detector
Architecture:
- Backbone: ResNet-101
- Transformer Encoder: Global context understanding
- Transformer Decoder: Direct set prediction with learned object queries
Key Characteristics:
- No anchor boxes or NMS required
- Global reasoning through self-attention
- Best performance on objects with unusual aspect ratios
- End-to-end trainable
Use Cases: Complex scene understanding, dense object detection, research applications
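A minimal loading sketch via Hugging Face `transformers` (post-processing the raw outputs into final boxes is shown in the Usage section below; `image.jpg` is a placeholder path):

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# Load DETR with a ResNet-101 backbone from the Hugging Face Hub
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101")
model.eval()

# Preprocess and run the transformer encoder-decoder
image = Image.open("image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # raw logits and box predictions per object query
```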
## Datasets

### COCO 2017

- Validation Set: 5,000 images
- Categories: 80 object classes
- Annotations: Bounding boxes, segmentation masks, keypoints
- Characteristics: Complex scenes, multiple objects, various scales
### Pascal VOC 2012

- Validation Set: 5,823 images
- Categories: 20 object classes
- Annotations: Bounding boxes, segmentation masks
- Characteristics: Single/few objects per image, clear backgrounds
## Installation

### Prerequisites

```
Python >= 3.10
CUDA >= 11.8 (for GPU acceleration)
```

### Clone the Repository

```bash
git clone https://github.com/ranimeshehata/Object-Detection-Models.git
cd Object-Detection-Models
```

### Set Up the Environment

```bash
# Using conda (recommended)
conda create -n cv python=3.10
conda activate cv
```

### Install Dependencies

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install ultralytics    # For YOLOv11
pip install pycocotools
pip install transformers   # For DETR
pip install numpy pandas matplotlib pillow
pip install tqdm opencv-python

# For visualization
pip install seaborn plotly
```

### Download Datasets

COCO 2017:

```bash
mkdir -p coco-dataset
cd coco-dataset
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip
cd ..
```

Pascal VOC 2012:

```bash
mkdir -p pascal-voc-dataset
cd pascal-voc-dataset
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
cd ..
```

## Usage

### Faster R-CNN

```bash
jupyter notebook Faster-RCNN-Resnet50/faster-rcnn.ipynb
```

Key steps in the notebook:
- Load pretrained model from torchvision
- Configure COCO/VOC dataset loaders (see the sketch after this list)
- Run inference on validation set
- Calculate mAP and IoU metrics
- Generate feature maps and GradCAM visualizations
- Analyze success/failure cases
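For the dataset-loader step, the validation splits can come straight from `torchvision.datasets`; a minimal sketch, assuming the download layout from the Installation section:

```python
from torchvision.datasets import CocoDetection, VOCDetection

# COCO 2017 validation split (images + instance annotations)
coco_val = CocoDetection(
    root="coco-dataset/val2017",
    annFile="coco-dataset/annotations/instances_val2017.json",
)

# Pascal VOC 2012 validation split (root contains the extracted VOCdevkit/)
voc_val = VOCDetection(
    root="pascal-voc-dataset",
    year="2012",
    image_set="val",
    download=False,
)

image, target = coco_val[0]  # PIL image and a list of annotation dicts
```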
### YOLOv11n

For COCO:

```bash
jupyter notebook Yolov11n/output_yolov11n_coco/coco-yolo-v-11-n.ipynb
```

For Pascal VOC:

```bash
jupyter notebook Yolov11n/output_yolov11n_voc/pascal-voc-yolo-v-11-n.ipynb
```

Key steps:
- Load YOLOv11n pretrained weights
- Run batch inference with confidence threshold 0.25
- Convert predictions to COCO format
- Evaluate using the COCO evaluation API (sketched after this list)
- Extract feature maps from backbone layers
- Visualize success and failure cases
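The evaluation step follows the standard `pycocotools` pattern; a minimal sketch, assuming predictions have already been exported to a COCO-format results file (`predictions.json` is a hypothetical filename):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth and detections, both in COCO format
coco_gt = COCO("coco-dataset/annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("predictions.json")  # hypothetical results file

# Run the standard bounding-box evaluation
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the AP/AR table; stats[1] is mAP@0.5
```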
### DETR

```bash
jupyter notebook Detr/detr_evaluation.ipynb
```

Key steps:
- Load DETR from Hugging Face transformers
- Process images through transformer encoder-decoder
- Apply Hungarian matching for evaluation
- Calculate metrics and visualize attention maps
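Converting DETR's raw query outputs into thresholded boxes uses the processor's post-processing helper; a minimal sketch continuing from the loading example in the Models section (reusing `processor`, `model`, `outputs`, and `image`):

```python
import torch

# target_sizes maps predictions back to the original image resolution
target_sizes = torch.tensor([image.size[::-1]])  # PIL size is (w, h); we need (h, w)
results = processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], score.item(), box.tolist())
```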
## Evaluation Metrics

### IoU (Intersection over Union)

```
IoU = Area of Intersection / Area of Union
```

Measures the overlap between predicted and ground-truth bounding boxes.
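For axis-aligned boxes in `(x1, y1, x2, y2)` format, this can be computed directly; a minimal sketch:

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```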
### mAP (mean Average Precision)

```
mAP = (1/|C|) × Σ AP_c
```

Where:
- `C` is the set of categories
- `AP_c` is the Average Precision for category `c`
- Reported at IoU threshold 0.5 (mAP@0.5)
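A toy numerical example with hypothetical per-class AP values:

```python
# Hypothetical per-class AP values at IoU 0.5
ap_per_class = {"person": 0.80, "car": 0.65, "dog": 0.50}

# mAP is the unweighted mean of the per-class APs
map_50 = sum(ap_per_class.values()) / len(ap_per_class)
print(map_50)  # 0.65
```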
### Speed Metrics

- Inference Time: Average time per image (milliseconds)
- FPS: Frames per second = 1000 / inference_time_ms
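A minimal sketch of how latency and FPS could be measured, assuming a torchvision-style detection `model` and a list of preprocessed `images` (on GPU, synchronization is needed for honest timings):

```python
import time
import torch

def measure_latency(model, images):
    """Average per-image latency (ms) and FPS for a detection model."""
    model.eval()
    start = time.perf_counter()
    with torch.no_grad():
        for img in images:
            _ = model([img])  # torchvision-style detection call
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed_ms = (time.perf_counter() - start) * 1000

    per_image_ms = elapsed_ms / len(images)
    return per_image_ms, 1000 / per_image_ms
```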
## Results

### COCO 2017

| Model | mAP@0.5 (%) | Avg IoU | Inference Time (ms) | FPS |
|---|---|---|---|---|
| Faster R-CNN | 46.1 | 0.5377 | 113.9 | 9.1 |
| YOLOv11n | 44.6 | 0.5051 | 16.4 | 56.0 |
| DETR | 60.5 | 0.6588 | 75.3 | 13.3 |
### Pascal VOC 2012

| Model | mAP@0.5 (%) | Avg IoU | Inference Time (ms) | FPS |
|---|---|---|---|---|
| Faster R-CNN | 48.7 | 0.3533 | 110.0 | 8.8 |
| YOLOv11n | 56.8 | 0.7176 | 14.5 | 70.0 |
| DETR | 74.2 | 0.8175 | 86.3 | 11.3 |
Key findings:

- DETR achieves the highest mAP and average IoU on both datasets, excelling at complex scenes
- YOLOv11n offers good accuracy at by far the fastest inference, making it the best fit for real-time needs
- Faster R-CNN delivers competitive accuracy with strong localization on COCO, but has the slowest inference of the three
## Visualizations

### Feature Maps

Feature maps are extracted from three layers of each model:
- Early layers: Low-level features (edges, textures, colors)
- Middle layers: Mid-level features (object parts, local structures)
- Late layers: High-level semantic features (complete objects)
Example outputs saved in:

- `Yolov11n/output_yolov11n_coco/feature_maps_8_channels.png`
- `Yolov11n/output_yolov11n_coco/feature_maps_average.png`
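Such maps are typically captured with forward hooks; a minimal sketch on a ResNet-50 backbone (the hooked layers are illustrative, not the exact ones used in the notebooks):

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="DEFAULT").eval()
feature_maps = {}

def save_output(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# Hook one early, one middle, and one late stage of the backbone
model.layer1.register_forward_hook(save_output("early"))
model.layer2.register_forward_hook(save_output("middle"))
model.layer4.register_forward_hook(save_output("late"))

with torch.no_grad():
    model(torch.randn(1, 3, 640, 640))  # dummy input in place of a real image

for name, fmap in feature_maps.items():
    print(name, fmap.shape)  # e.g. early: (1, 256, 160, 160)
```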
### GradCAM

Gradient-weighted Class Activation Mapping (Grad-CAM) shows which regions contribute most to detections:
- Highlights important image regions for specific object classes
- Helps understand model decision-making
- Validates that models focus on correct features
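A bare-bones Grad-CAM sketch, shown on a classification backbone for simplicity (the notebooks adapt the same idea to detection scores; the hooked layer and input are illustrative):

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="DEFAULT").eval()
activations, gradients = {}, {}

# Capture activations and their gradients at the last conv stage
def fwd_hook(module, inputs, output):
    activations["value"] = output

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
score = model(x)[0].max()        # score of the top predicted class
score.backward()                 # gradients flow back to layer4

# Weight each channel by its average gradient, then combine and rectify
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * activations["value"]).sum(dim=1)).squeeze()
cam = cam / cam.max()            # normalized heatmap over the feature grid
print(cam.shape)                 # (7, 7) for a 224x224 input
```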
### Success and Failure Cases

Each model's performance is analyzed through:
- Success cases: High-confidence, accurate detections
- Failure cases: Missed objects, false positives, localization errors
Examples saved in respective output directories.