wisecoder17/Delivery_ETA

Delivery Time Analysis & Prediction System


Overview

This project implements an end-to-end delivery time prediction pipeline using real-world logistics data.
It covers data preprocessing, feature engineering, model training, evaluation, model selection, and production-ready inference.

Objective: Predict delivery duration (in minutes) based on distance, traffic conditions, temporal factors, and weather.


Problem Statement

Accurate delivery time estimation is critical for:

  • Optimizing customer experience
  • Supporting operational planning
  • Improving logistics and fleet efficiency

Delivery time is a continuous target influenced by non-linear, interacting factors, so this is a regression problem that is well suited to ensemble learning.


Dataset

Primary dataset: Amazon Delivery Dataset (~43,000 rows)

Contains:

  • Store & drop-off coordinates
  • Order timestamps
  • Traffic conditions
  • Weather categories
  • Actual delivery time (target variable)

Data Provenance: Publicly available; preprocessed chronologically to prevent data leakage.
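
One common way to keep future information out of training is a time-ordered train/test split. The README does not spell out the exact procedure, so the following is only a minimal sketch under that assumption; the function name `chronological_split`, the `order_time` key, and the sample rows are all hypothetical.

```python
from datetime import datetime


def chronological_split(rows, timestamp_key="order_time", test_fraction=0.2):
    """Sort rows by timestamp and hold out the most recent ones for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[timestamp_key])
    # Always keep at least one row on each side of the cut.
    n_test = min(len(ordered) - 1, max(1, round(len(ordered) * test_fraction)))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Illustrative rows only; these are not taken from the dataset.
orders = [
    {"order_time": datetime(2022, 3, 5, 18, 0), "delivery_min": 42},
    {"order_time": datetime(2022, 3, 1, 9, 30), "delivery_min": 25},
    {"order_time": datetime(2022, 3, 9, 12, 15), "delivery_min": 33},
    {"order_time": datetime(2022, 3, 3, 20, 45), "delivery_min": 51},
    {"order_time": datetime(2022, 3, 7, 14, 0), "delivery_min": 38},
]

train, test = chronological_split(orders, test_fraction=0.4)
```

Every training row is older than every test row, so the model is never evaluated on orders that precede what it was trained on.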


Feature Engineering

Feature        Description
distance_km    Haversine distance between store and drop-off location
hour           Hour of day extracted from order timestamp
day_of_week    Integer-encoded weekday (0–6)
is_weekend     Binary weekend indicator
traffic_level  Ordinal traffic encoding (Low → Jam)
weather        One-hot encoded categorical feature

Consistency: All transformations applied during training are reused during inference.
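
As a rough illustration of the distance and temporal features listed above, the sketch below computes a Haversine distance and derives `hour`, `day_of_week`, and `is_weekend` from a timestamp. The helper names are hypothetical and not taken from `src/preprocessing.py`; the Monday=0 weekday convention is an assumption (it is what Python's `datetime.weekday()` returns).

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def temporal_features(order_time):
    """Derive hour, weekday, and weekend flag from an order timestamp."""
    day_of_week = order_time.weekday()  # assumption: Monday=0 ... Sunday=6
    return {
        "hour": order_time.hour,
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),  # Saturday or Sunday
    }


# Illustrative values only.
one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
features = temporal_features(datetime(2022, 3, 5, 18, 30))  # a Saturday
```

One degree of longitude at the equator comes out at roughly 111.2 km, which is a quick sanity check for the distance function.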


Modeling Strategy

Models Trained:

  • Linear Regression
  • K-Nearest Neighbors (KNN)
  • Random Forest
  • XGBoost (Selected Best Model)

Evaluation Metrics:

  • Mean Absolute Error (MAE) – primary
  • Root Mean Squared Error (RMSE)
  • R² Score

Selection Criterion: Lowest MAE on the test set, since MAE reads directly as the average error in minutes.
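
For reference, the three metrics and the lowest-MAE selection rule can be sketched in plain Python as follows. This is not the code in `src/evaluate_model.py`; the function names are hypothetical, and the numbers are toy values rather than project results.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


def select_best(results):
    """Pick the model name with the lowest MAE."""
    return min(results, key=lambda name: results[name]["mae"])


# Toy delivery times in minutes; not taken from the project's results.
y_test = [30, 45, 60, 25]
results = {
    "model_a": regression_metrics(y_test, [35, 40, 65, 30]),
    "model_b": regression_metrics(y_test, [31, 44, 62, 24]),
}
best = select_best(results)
```

Here `model_a` is off by 5 minutes on every order (MAE 5.0), while `model_b` averages 1.25 minutes, so `model_b` is selected.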


Results Summary

Model              MAE                   RMSE      R²
Linear Regression  High                  High      Low
KNN                Moderate              Moderate  Low
Random Forest      Overfitting observed  Unstable  Low
XGBoost            Best                  Best      0.33

Key Insight: XGBoost captures non-linear interactions between traffic, time, and distance and generalizes best.


Project Structure

Delivery_ETA/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── eda_amazon.ipynb
│   └── modeling_amazon.ipynb
├── src/
│   ├── preprocessing.py
│   ├── dl_model.py
│   ├── eda_summary.py
│   ├── train_model.py
│   ├── evaluate_model.py
│   └── predict.py
├── models/
│   ├── amazon_best_model.pkl
│   └── Weather_encoder.pkl
├── results/
│   ├── figures/
│   └── tables/
├── requirements.txt
└── README.md

Setup Instructions

# Clone repository
git clone <repo-link>
cd Delivery_ETA

# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

Usage

Command-Line Interface

python src/predict.py

Programmatic API

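from src.predict import predict_delivery_time

# Predict the delivery duration for a single order
predict_delivery_time(
    distance_km=12.5,      # store to drop-off distance
    hour=16,               # hour of day (0-23)
    day_of_week=4,         # integer weekday (0-6)
    weather="Sunny",
    traffic_level="High",
    is_weekend=0           # day 4 is a weekday, assuming Monday=0
)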

Features:

  • Computes distance if only coordinates are provided
  • Normalizes categorical inputs
  • Applies the same encoders used during training
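
The input handling described above might look roughly like the sketch below. It is not the code in `src/predict.py`: the helper names are hypothetical, the intermediate traffic levels between Low and Jam are assumed, and the weather category list is illustrative.

```python
# Assumption: only "Low" and "Jam" are named in this README; the middle
# levels are guessed for illustration.
TRAFFIC_ORDER = ["low", "medium", "high", "jam"]


def normalize_category(value):
    """Trim whitespace and lowercase so 'High', ' high ' and 'HIGH' match."""
    return value.strip().lower()


def encode_traffic(value):
    """Map a traffic label to its ordinal position (0 = lightest traffic)."""
    key = normalize_category(value)
    if key not in TRAFFIC_ORDER:
        raise ValueError(f"unknown traffic level: {value!r}")
    return TRAFFIC_ORDER.index(key)


def one_hot_weather(value, categories):
    """One-hot encode a weather label against a fixed, ordered category list."""
    key = normalize_category(value)
    known = [normalize_category(c) for c in categories]
    if key not in known:
        raise ValueError(f"unknown weather category: {value!r}")
    return [int(key == c) for c in known]


# Illustrative category list; the real one would come from the fitted encoder.
weather_categories = ["Cloudy", "Sunny", "Fog"]
traffic_code = encode_traffic("  High ")
weather_vector = one_hot_weather("sunny", weather_categories)
```

Rejecting unknown labels with an error, rather than silently mapping them to a default, keeps inference inputs within the categories the model saw during training.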

Key Insights

  • Traffic level is the strongest predictor
  • Temporal features (hour, weekday) influence delivery time
  • Distance alone is insufficient; contextual features are essential
  • Ensemble models outperform linear baselines
  • Model and encoders are serialized; inference logic decoupled from training
  • All plots and tables are stored under results/

Future Improvements

  • Residual and error distribution analysis
  • Integration of real-time traffic data
  • API deployment (FastAPI)
  • Continuous retraining with additional datasets
