Delhivery Data Analysis

About Delhivery

Delhivery is the largest and fastest-growing fully integrated logistics provider in India by revenue as of Fiscal 2021. The company aims to build the operating system for commerce through a blend of world-class infrastructure, high-quality logistics operations, and cutting-edge engineering and technology capabilities.

The data team at Delhivery leverages vast datasets to enhance business intelligence, drive operational efficiency, and maintain profitability, creating a significant competitive edge.

Objective

The goal of this project is to process and analyze data generated by Delhivery's logistics operations to:

Clean, sanitize, and manipulate raw data to derive actionable insights.
Create useful features for the data science team to develop forecasting models.

Dataset

The dataset consists of records from Delhivery's logistics and operational data pipeline.

Key Features:

data: Indicates if the record is training or testing data.
trip_creation_time: Timestamp of trip creation.
route_schedule_uuid: Unique identifier for a route schedule.
route_type: Type of transportation (FTL, Carting).
- FTL: Full Truck Load shipments, faster delivery as there are no intermediate pickups/drop-offs.
- Carting: Delivery system using smaller vehicles (carts).
trip_uuid: Unique identifier for a trip (a trip can involve multiple source and destination centers).
source_center: ID of the trip's origin center.
source_name: Name of the trip's origin center.
destination_center: ID of the destination center.
destination_name: Name of the destination center.
od_start_time: Trip start time.
od_end_time: Trip end time.
start_scan_to_end_scan: Total time taken for delivery from source to destination.
actual_distance_to_destination: Actual distance in kilometers between source and destination.
actual_time: Cumulative time taken to complete the delivery.
osrm_time: Travel time estimated by the Open Source Routing Machine (OSRM), a routing engine that computes the shortest path between points on a map and accounts for typical traffic conditions (cumulative).
osrm_distance: Distance calculated by OSRM (cumulative).
segment_actual_time: Time taken for a segment of the delivery.
segment_osrm_time: OSRM-calculated time for a delivery segment.
segment_osrm_distance: OSRM-calculated distance for a delivery segment.
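
Because a trip can cover several source and destination centers, and the segment_* columns hold per-segment values, the raw rows are naturally rolled up to one record per trip before analysis. The snippet below is a minimal sketch of that roll-up on a tiny made-up frame. The toy values, the output column names, and the choice to sum only the segment columns are assumptions for illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Toy rows standing in for the raw data; all values are invented.
raw = pd.DataFrame({
    "trip_uuid": ["trip-1", "trip-1", "trip-1", "trip-2"],
    "segment_actual_time": [14.0, 10.0, 16.0, 30.0],
    "segment_osrm_time": [11.0, 9.0, 12.0, 25.0],
    "segment_osrm_distance": [12.0, 9.5, 13.5, 28.0],
})

# One row per trip: per-segment values add up to trip totals.
trip_level = (
    raw.groupby("trip_uuid", as_index=False)
       .agg(
           total_actual_time=("segment_actual_time", "sum"),
           total_osrm_time=("segment_osrm_time", "sum"),
           total_osrm_distance=("segment_osrm_distance", "sum"),
           n_segments=("segment_actual_time", "count"),
       )
)
print(trip_level)
```

The cumulative columns (actual_time, osrm_time, osrm_distance) are left out of the sum on purpose: adding up running totals would double count earlier segments.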

Additional Fields:

A few fields whose meaning is currently unclear (is_cutoff, cutoff_factor, cutoff_timestamp, and factor) are kept for completeness and may be explored further.

Process Overview

1. Feature Engineering:

Derived new features, including:
- time_diff_hours: Time difference, in hours, between od_start_time and od_end_time.
- Timestamp components (e.g., month, year, day of the week).
- City, place code, and state, split out from the source and destination names and standardized.
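
The three feature-engineering steps above can be sketched with pandas as follows. This is an illustrative example on invented rows, not the notebook's code: the sample timestamps, the assumed "City_PlaceCode (State)" layout of the names, the regular expression, and the new column names other than time_diff_hours are all assumptions.

```python
import pandas as pd

# Toy rows; timestamps and the "City_PlaceCode (State)" name layout are assumed.
df = pd.DataFrame({
    "trip_creation_time": ["2018-09-20 02:35:36", "2018-10-01 23:10:00"],
    "od_start_time": ["2018-09-20 03:21:32", "2018-10-01 23:30:00"],
    "od_end_time": ["2018-09-20 04:47:45", "2018-10-02 05:00:00"],
    "source_name": ["Anand_VUNagar_DC (Gujarat)", "Gurgaon_Bilaspur_HB (Haryana)"],
})

# Parse the timestamp columns so datetime arithmetic is available.
for col in ["trip_creation_time", "od_start_time", "od_end_time"]:
    df[col] = pd.to_datetime(df[col])

# Duration between od_start_time and od_end_time, in hours.
df["time_diff_hours"] = (
    (df["od_end_time"] - df["od_start_time"]).dt.total_seconds() / 3600
)

# Calendar components of the trip creation timestamp (hypothetical column names).
df["creation_year"] = df["trip_creation_time"].dt.year
df["creation_month"] = df["trip_creation_time"].dt.month
df["creation_day_name"] = df["trip_creation_time"].dt.day_name()

# Split "City_PlaceCode (State)" into three columns; non-matching names become NaN.
pattern = r"^(?P<city>[^_]+)_(?P<place_code>.+?)\s*\((?P<state>[^)]+)\)$"
parts = df["source_name"].str.extract(pattern).add_prefix("source_")
df = df.join(parts)

print(df[["time_diff_hours", "creation_month", "source_city", "source_state"]])
```

The same extraction would apply to destination_name with a "destination_" prefix. Names that do not follow the assumed layout need a fallback rule, which is omitted here.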

2. Data Cleaning:

Handled missing values using appropriate imputation techniques.
Detected outliers with boxplots and treated them using the IQR (interquartile range) method.
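
As a rough illustration of these two cleaning steps, the sketch below fills missing values and then flags outliers with the usual 1.5 × IQR fences. The README does not say which imputation rule was used, so the median fill for numbers and the "Unknown" label for names are assumptions, as are the toy values and the decision to drop flagged rows rather than cap them.

```python
import pandas as pd

# Toy rows with gaps and one extreme value; all values are invented.
df = pd.DataFrame({
    "source_name": ["A_X (S1)", None, "C_Z (S3)", "D_W (S4)", "E_V (S5)", "F_U (S6)"],
    "actual_time": [60.0, 75.0, None, 90.0, 80.0, 900.0],
})

# Assumed imputation: placeholder label for names, median for numeric columns.
df["source_name"] = df["source_name"].fillna("Unknown")
df["actual_time"] = df["actual_time"].fillna(df["actual_time"].median())

# IQR method: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] count as outliers.
q1 = df["actual_time"].quantile(0.25)
q3 = df["actual_time"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = ~df["actual_time"].between(lower, upper)
cleaned = df[~is_outlier]

print(f"fences: [{lower}, {upper}], rows dropped: {int(is_outlier.sum())}")
```

Capping values at the fences with `Series.clip(lower, upper)` is an alternative that keeps every row, which can matter when the long trips are genuine rather than errors.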

3. Categorical Feature Handling:

Applied one-hot encoding to categorical variables such as route_type so that downstream models can consume them as numeric inputs.
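
A minimal sketch of one-hot encoding route_type with pandas is shown below. The sample rows are invented, and the use of `pd.get_dummies` is an assumption; the notebook may equally use scikit-learn's `OneHotEncoder`.

```python
import pandas as pd

# Toy rows; values are invented for illustration.
df = pd.DataFrame({
    "route_type": ["FTL", "Carting", "FTL"],
    "actual_time": [300.0, 60.0, 420.0],
})

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["route_type"], dtype=int)
print(encoded)
```

Since route_type has only two categories, passing `drop_first=True` would leave a single binary column carrying the same information.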

4. Normalization and Standardization:

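Applied MinMaxScaler (rescales each column to the [0, 1] range) and StandardScaler (centers each column to zero mean and unit variance) to numerical columns so that features share a comparable scale.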
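
The two scalers can be compared side by side as in the sketch below. The sample values and the choice of columns are assumptions for illustration; only `MinMaxScaler` and `StandardScaler` themselves come from the text above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy numeric columns; values are invented for illustration.
df = pd.DataFrame({
    "actual_time": [60.0, 90.0, 120.0],
    "osrm_distance": [10.0, 20.0, 40.0],
})
num_cols = ["actual_time", "osrm_distance"]

# Min-max scaling: each column is mapped linearly onto [0, 1].
minmax = pd.DataFrame(
    MinMaxScaler().fit_transform(df[num_cols]), columns=num_cols, index=df.index
)

# Standardization: each column gets zero mean and unit (population) variance.
standard = pd.DataFrame(
    StandardScaler().fit_transform(df[num_cols]), columns=num_cols, index=df.index
)

print(minmax)
print(standard)
```

When the scaled features feed a model, the scaler should be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.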

Key Insights

Route Type Insights:
- FTL routes are faster and more efficient for long distances compared to Carting.
Source and Destination Patterns:
- High-frequency routes indicate key operational hubs that could benefit from resource optimization.
Time Efficiency:
- Delivery times vary significantly by route type, season, and traffic conditions.
OSRM vs. Actual Metrics:
- Discrepancies between OSRM-calculated and actual times/distances highlight areas for improving routing algorithms.
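
One simple way to quantify the gap between OSRM estimates and actual times is to compute the difference and the ratio per trip, then average by route type. The sketch below does this on invented numbers; the column names time_gap and time_ratio are hypothetical, and the figures say nothing about the real dataset.

```python
import pandas as pd

# Toy trip-level rows; values are invented for illustration.
trips = pd.DataFrame({
    "route_type": ["FTL", "FTL", "Carting", "Carting"],
    "actual_time": [300.0, 420.0, 90.0, 60.0],
    "osrm_time": [200.0, 280.0, 45.0, 40.0],
})

# Absolute and relative deviation of actual time from the OSRM estimate.
trips["time_gap"] = trips["actual_time"] - trips["osrm_time"]
trips["time_ratio"] = trips["actual_time"] / trips["osrm_time"]

# Average deviation per route type.
summary = trips.groupby("route_type")[["time_gap", "time_ratio"]].mean()
print(summary)
```

A ratio consistently above 1 would mean trips take longer than OSRM predicts. A paired test such as `scipy.stats.ttest_rel` on the two time columns is one option for checking whether the difference is statistically significant.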

Tools and Libraries

This project utilized the following tools:

Python:
- Pandas for data manipulation.
- Matplotlib and Seaborn for visualization.
- scikit-learn for preprocessing and scaling.
Jupyter Notebook: For interactive analysis and documentation.

Repository Structure

data/: Contains the dataset used for analysis.
notebooks/: Jupyter Notebooks documenting the analysis process.
visualizations/: Saved plots and charts.
README.md: Overview of the project (this file).

Next Steps

Future directions for this project include:

Developing predictive models for delivery time and distance.
Investigating patterns in the unknown fields (is_cutoff, cutoff_factor, etc.).
Implementing clustering techniques to identify high-demand routes.

Acknowledgments

Dataset Source: Provided by Scaler for this analysis.
Python Libraries: Thanks to the open-source Python community for providing versatile data analysis tools.

License

This project is licensed for educational and non-commercial use only. If utilizing any part of this repository, please credit the author.

sayed-ashfaq/Delhivery-DataAnalysis
