Delhivery is the largest and fastest-growing fully integrated logistics provider in India as of Fiscal 2021. The company aims to build the operating system for commerce through a blend of world-class infrastructure, high-quality logistics operations, and cutting-edge engineering and technology capabilities.
The data team at Delhivery leverages vast datasets to enhance business intelligence, drive operational efficiency, and maintain profitability, creating a significant competitive edge.
The goal of this project is to process and analyze data generated by Delhivery's logistics operations to:
- Clean, sanitize, and manipulate raw data to derive actionable insights.
- Create useful features for the data science team to develop forecasting models.
The dataset consists of records from Delhivery's logistics and operational data pipeline.
- data: Indicates if the record is training or testing data.
- trip_creation_time: Timestamp of trip creation.
- route_schedule_uuid: Unique identifier for a route schedule.
- route_type: Type of transportation (FTL or Carting).
  - FTL: Full Truck Load shipments; faster delivery, as there are no intermediate pickups or drop-offs.
  - Carting: Delivery system using smaller vehicles (carts).
- trip_uuid: Unique identifier for a trip (a trip can involve multiple source and destination centers).
- source_center: ID of the trip's origin center.
- source_name: Name of the trip's origin center.
- destination_center: ID of the destination center.
- destination_name: Name of the destination center.
- od_start_time: Trip start time.
- od_end_time: Trip end time.
- start_scan_to_end_scan: Total time taken for delivery from source to destination.
- actual_distance_to_destination: Actual distance in kilometers between source and destination.
- actual_time: Cumulative time taken to complete the delivery.
- osrm_time: Time calculated by the Open Source Routing Machine (OSRM), considering shortest paths and typical traffic conditions (cumulative).
- osrm_distance: Distance calculated by OSRM (cumulative).
- segment_actual_time: Time taken for a segment of the delivery.
- segment_osrm_time: OSRM-calculated time for a delivery segment.
- segment_osrm_distance: OSRM-calculated distance for a delivery segment.
Some fields with currently unclear meanings, like is_cutoff, cutoff_factor, cutoff_timestamp, and factor, are included for completeness and may be explored further.
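Before any features can be derived, the timestamp fields have to be parsed from text. The following is a minimal sketch with pandas. It assumes that the raw export stores trip_creation_time, od_start_time, and od_end_time as strings and that the data column holds the literal values "training" and "test". Both of these are assumptions. The rows are invented, and prepare_types is a hypothetical helper, not code from the notebooks.

```python
import pandas as pd

# Hypothetical sample rows: column names follow the field list above,
# but every value is invented for illustration.
raw = pd.DataFrame({
    "data": ["training", "training", "test"],
    "trip_uuid": ["trip-A", "trip-A", "trip-B"],
    "route_type": ["FTL", "FTL", "Carting"],
    "trip_creation_time": ["2018-09-20 02:35:36", "2018-09-20 02:35:36", "2018-09-23 06:42:06"],
    "od_start_time": ["2018-09-20 03:21:32", "2018-09-20 04:47:45", "2018-09-23 06:42:06"],
    "od_end_time": ["2018-09-20 04:47:45", "2018-09-20 06:36:55", "2018-09-23 08:01:13"],
})

TIMESTAMP_COLUMNS = ["trip_creation_time", "od_start_time", "od_end_time"]


def prepare_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with timestamps parsed and route_type stored as a category."""
    out = df.copy()
    for col in TIMESTAMP_COLUMNS:
        # errors="coerce" turns unparseable strings into NaT instead of raising
        out[col] = pd.to_datetime(out[col], errors="coerce")
    out["route_type"] = out["route_type"].astype("category")
    return out


df = prepare_types(raw)

# Split on the `data` column (assumed to hold "training" / "test")
train = df[df["data"] == "training"]
test = df[df["data"] == "test"]
print(len(train), len(test))
```

Using errors="coerce" is a deliberate choice. Malformed timestamps become NaT and can be handled together with the other missing values, instead of stopping the load.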
The following preprocessing and feature-engineering steps were applied:
- Derived meaningful metrics such as:
  - time_diff_hours: Time difference between od_start_time and od_end_time, in hours.
- Extracted components from timestamps (e.g., month, year, day of the week).
- Split and standardized source and destination names into city, place code, and state.
- Handled missing values using appropriate imputation techniques.
- Addressed outliers with boxplots and the IQR (interquartile range) method.
- Applied one-hot encoding to categorical variables such as route_type so that downstream models can use them.
- Used MinMaxScaler and StandardScaler for numerical columns to align features to a uniform scale.
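The derived features listed above can be sketched with pandas' .dt accessor and string methods. This is a minimal sketch, not the notebook's code. The helper names add_time_features and split_center_name are hypothetical. The "City_Place_Code (State)" layout of source_name and destination_name is also an assumption, chosen only to show how a name could be split into city, place, code, and state.

```python
import pandas as pd

# Hypothetical rows; center names assume a "City_Place_Code (State)" layout.
trips = pd.DataFrame({
    "trip_creation_time": pd.to_datetime(["2018-09-20 02:35:36", "2018-09-23 06:42:06"]),
    "od_start_time": pd.to_datetime(["2018-09-20 03:21:32", "2018-09-23 06:42:06"]),
    "od_end_time": pd.to_datetime(["2018-09-20 04:47:45", "2018-09-23 08:01:13"]),
    "source_name": ["Kanpur_Central_H_6 (Uttar Pradesh)", "Bhopal_Trnsport_H (Madhya Pradesh)"],
})


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Duration between the two timestamps, converted from seconds to hours
    delta = out["od_end_time"] - out["od_start_time"]
    out["time_diff_hours"] = delta.dt.total_seconds() / 3600
    # Calendar components of the trip creation timestamp
    created = out["trip_creation_time"]
    out["trip_year"] = created.dt.year
    out["trip_month"] = created.dt.month
    out["trip_day_of_week"] = created.dt.day_name()
    return out


def split_center_name(df: pd.DataFrame, column: str, prefix: str) -> pd.DataFrame:
    out = df.copy()
    # Separate "Location (State)"; rows without parentheses yield NaN
    parts = out[column].str.extract(r"^(?P<location>.*?)\s*\((?P<state>[^)]*)\)\s*$")
    # Split the location into at most three pieces: city, place, rest as code
    tokens = parts["location"].str.split(pat="_", n=2, expand=True).reindex(columns=[0, 1, 2])
    out[f"{prefix}_city"] = tokens[0]
    out[f"{prefix}_place"] = tokens[1]
    out[f"{prefix}_code"] = tokens[2]
    out[f"{prefix}_state"] = parts["state"].str.strip()
    return out


features = split_center_name(add_time_features(trips), "source_name", "source")
print(features[["time_diff_hours", "trip_day_of_week", "source_city", "source_code", "source_state"]])
```

The same split_center_name call can be repeated with "destination_name" and prefix "destination". If the real names follow a different pattern, only the regular expression and the split rule need to change.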
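The IQR rule uses the same fences that boxplots draw. By default, Matplotlib's boxplot marks points beyond 1.5 × IQR from the quartiles as fliers. This README does not say whether outliers were dropped or capped, so the sketch below shows both options. The helper names and the sample values are hypothetical.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose values lie inside the fences for every listed column."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        lower, upper = iqr_bounds(df[col], k)
        keep &= df[col].between(lower, upper)
    return df[keep]


def cap_iqr_outliers(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.DataFrame:
    """Clip values outside the fences to the nearest fence, keeping every row."""
    out = df.copy()
    for col in columns:
        lower, upper = iqr_bounds(out[col], k)
        out[col] = out[col].clip(lower, upper)
    return out


# Hypothetical values: one extreme trip among otherwise similar ones
sample = pd.DataFrame({"actual_time": [10.0, 12.0, 11.0, 13.0, 12.0, 300.0]})
print(iqr_bounds(sample["actual_time"]))
print(drop_iqr_outliers(sample, ["actual_time"]))
print(cap_iqr_outliers(sample, ["actual_time"]))
```

Dropping rows shrinks the dataset. Capping keeps every trip but flattens the extremes. Which option suits the forecasting models depends on whether the extreme trips are data errors or genuine long deliveries.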
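Encoding and scaling can be sketched with pandas.get_dummies and the two scikit-learn scalers named above. The rows are invented, and the choice of numeric columns is illustrative. MinMaxScaler maps each column to [0, 1]. StandardScaler centers each column at zero mean with unit (population) standard deviation.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical trip-level rows
df = pd.DataFrame({
    "route_type": ["FTL", "Carting", "FTL", "Carting"],
    "actual_time": [120.0, 45.0, 300.0, 60.0],
    "osrm_time": [80.0, 30.0, 160.0, 35.0],
})

# One 0/1 indicator column per route type (route_type_Carting, route_type_FTL)
encoded = pd.get_dummies(df, columns=["route_type"], prefix="route_type", dtype=int)

numeric_cols = ["actual_time", "osrm_time"]

# Min-max normalization: rescales each column to [0, 1]
minmax_scaled = encoded.copy()
minmax_scaled[numeric_cols] = MinMaxScaler().fit_transform(encoded[numeric_cols])

# Standardization: zero mean, unit population standard deviation per column
standard_scaled = encoded.copy()
standard_scaled[numeric_cols] = StandardScaler().fit_transform(encoded[numeric_cols])

print(minmax_scaled)
print(standard_scaled)
```

In a real pipeline, the scalers should be fitted on the training rows only and then applied with transform to the test rows. This keeps test-set statistics from leaking into the features.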
Key insights from the analysis:
- Route Type Insights:
  - FTL routes are faster and more efficient than Carting over long distances.
- Source and Destination Patterns:
  - High-frequency routes indicate key operational hubs that could benefit from resource optimization.
- Time Efficiency:
  - Delivery times vary significantly by route type, season, and traffic conditions.
- OSRM vs. Actual Metrics:
  - Discrepancies between OSRM-calculated and actual times and distances show where routing estimates could be improved.
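One way to quantify the OSRM discrepancy is to compute the ratio and the gap between actual and OSRM-estimated time for each row, and then summarize them by route type. This sketch assumes that actual_time and osrm_time share a unit, which the field list does not state. All values are invented. The median is used because ratio distributions tend to be skewed by a few very slow trips.

```python
import pandas as pd

# Hypothetical rows; actual_time and osrm_time are assumed to share a unit
trips = pd.DataFrame({
    "route_type": ["FTL", "FTL", "Carting", "Carting"],
    "actual_time": [300.0, 500.0, 90.0, 60.0],
    "osrm_time": [150.0, 250.0, 45.0, 40.0],
})

# A ratio above 1 means the trip took longer than OSRM predicted
trips["time_ratio"] = trips["actual_time"] / trips["osrm_time"]
# The absolute gap shows the size of the delay in the original unit
trips["time_gap"] = trips["actual_time"] - trips["osrm_time"]

summary = trips.groupby("route_type")[["time_ratio", "time_gap"]].median()
print(summary)
```

The same comparison applies at segment level with segment_actual_time and segment_osrm_time.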
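High-frequency routes can be found by counting distinct trips per (source_center, destination_center) pair. This sketch counts unique trip_uuid values rather than raw rows, on the assumption that one trip can appear in several rows. The center IDs below are made-up placeholders.

```python
import pandas as pd

# Hypothetical segment-level rows: trip t1 appears twice on the same route
rows = pd.DataFrame({
    "trip_uuid": ["t1", "t1", "t2", "t3", "t4", "t5"],
    "source_center": ["center_A", "center_A", "center_A", "center_B", "center_A", "center_C"],
    "destination_center": ["center_B", "center_B", "center_B", "center_C", "center_B", "center_A"],
})

# Count distinct trips per (source, destination) pair, busiest first
route_counts = (
    rows.groupby(["source_center", "destination_center"])["trip_uuid"]
    .nunique()
    .sort_values(ascending=False)
    .rename("n_trips")
    .reset_index()
)
print(route_counts.head(10))
```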
This project utilized the following tools:
- Python:
  - Pandas for data manipulation.
  - Matplotlib and Seaborn for visualization.
  - Scikit-learn (sklearn) for preprocessing and scaling.
- Jupyter Notebook: For interactive analysis and documentation.
The repository is organized as follows:
- data/: Contains the dataset used for analysis.
- notebooks/: Jupyter Notebooks documenting the analysis process.
- visualizations/: Saved plots and charts.
- README.md: Overview of the project (this file).
Future directions for this project include:
- Developing predictive models for delivery time and distance.
- Investigating patterns in the unknown fields (is_cutoff, cutoff_factor, etc.).
- Implementing clustering techniques to identify high-demand routes.
- Dataset Source: Provided by Scaler for this analysis.
- Python Libraries: Thanks to the open-source Python community for providing versatile data analysis tools.
This project is licensed for educational and non-commercial use only. If you use any part of this repository, please credit the author.