
Piero24/Normal-Angle-Package-Picking




A computer vision pipeline to find optimal picking surfaces and normal vectors on packages for robotic manipulation, using a single RGB image.
View a Demo »

Report Bug · Request Feature




📔 Introduction

This project demonstrates a complete pipeline to identify optimal picking surfaces on packages and estimate their surface normals for robotic manipulation. It combines modern deep learning models for monocular depth estimation (Depth Anything V2) and instance segmentation (Segment Anything Model - SAM) to process a single RGB image and produce actionable 3D orientation data for a robotic gripper.


The core challenge in robotic picking is moving beyond a simple 2D bounding box to understand the 3D geometry of an object. This pipeline achieves this by fusing 2D segmentation with 3D depth data. A key innovation is performing segmentation on the depth map instead of the RGB image. This allows the model to separate objects based on geometric discontinuities, making it more robust against variations in color, texture, or lighting.



The pipeline is designed to find the most suitable, flat surfaces for picking. Instead of using an entire object mask, it converts the object's depth data into a point cloud and applies the RANSAC algorithm. This robustly isolates the largest flat region on a package's surface, which is ideal for a suction gripper, while ignoring noise from curves, wrinkles, and edges.

Note

The pipeline is demonstrated using static images of packages, but the methodology is designed for integration into a real-time robotic system. The depth estimation is relative; for real-world applications, a calibrated RGB-D camera would provide metric depth for precise positioning.


PIPELINE OVERVIEW: The program takes an RGB image as input and processes it through a multi-stage pipeline to generate actionable picking data. A minimal code sketch of these steps appears after the list below.

  1. Depth Estimation

    The input RGB image is processed by the Depth Anything V2 model to generate a high-quality relative depth map of the scene.

  2. Segmentation on Depth Map

    The generated depth map is fed into the Segment Anything Model (SAM), which identifies all potential object boundaries based on geometric information.

  3. Mask Filtering & Deduplication

    Raw masks from SAM are filtered by size and confidence score. A non-maximum suppression step then removes redundant, overlapping masks to ensure each package is processed only once.

  4. Optimal Surface & Normal Calculation

    For each unique package, its depth data is converted to a 3D point cloud. The RANSAC algorithm fits a plane to find the largest flat surface (the optimal picking area). The normal vector is derived from this plane and oriented to point outwards from the surface.

  5. Visualization

    The final output overlays the results on the original image, highlighting the optimal picking surface in green and drawing the normal vector as a magenta arrow, indicating the ideal approach for a robotic arm.
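The snippet below is a minimal, self-contained sketch of these steps. It approximates the notebook's logic rather than reproducing it: the model variant, checkpoint file, thresholds, and variable names (depth_rgb, pick_point, etc.) are illustrative assumptions, and the point cloud is built from raw pixel coordinates because the monocular depth is only relative.

import numpy as np
import torch
import open3d as o3d
from PIL import Image
from transformers import pipeline
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Relative depth estimation with Depth Anything V2 (Hugging Face pipeline).
depth_estimator = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
    device=0 if device == "cuda" else -1,
)
rgb = Image.open("path/to/your/image.jpeg").convert("RGB")
depth = np.array(depth_estimator(rgb)["depth"], dtype=np.float32)  # H x W, relative depth

# 2. Segmentation on the depth map: SAM expects a 3-channel uint8 image,
#    so the depth map is normalised and replicated across the channels.
depth_u8 = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255).astype(np.uint8)
depth_rgb = np.stack([depth_u8] * 3, axis=-1)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)  # weights downloaded beforehand
masks = SamAutomaticMaskGenerator(sam).generate(depth_rgb)

# 3. Filtering and deduplication of `masks` is omitted here (see the filtering
#    sketch in the "How to Start" section below).

# 4. For one package mask: lift the masked depth pixels to a point cloud and
#    fit the dominant plane with RANSAC to isolate the flat picking surface.
mask = masks[0]["segmentation"]                        # boolean H x W array
ys, xs = np.nonzero(mask)
points = np.stack([xs, ys, depth[ys, xs]], axis=-1).astype(np.float64)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
plane_model, inliers = pcd.segment_plane(
    distance_threshold=1.0,   # tune to the scale of the (relative) coordinates
    ransac_n=3,
    num_iterations=1000,
)
a, b, c, d = plane_model                               # plane: ax + by + cz + d = 0
normal = np.array([a, b, c]) / np.linalg.norm([a, b, c])
if normal[2] > 0:                                      # flip so the normal points out of
    normal = -normal                                   # the surface (sign depends on the
                                                       # depth-map convention)
pick_point = points[inliers].mean(axis=0)              # centroid of the flat region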
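A similarly minimal visualisation helper is sketched below; the function name, colours, and the 2-D projection of the normal are assumptions for illustration, not the notebook's exact code.

import cv2
import numpy as np

def draw_pick(image_bgr, surface_mask, pick_xy, normal_xy, arrow_len=80):
    """Highlight the picking surface in green and draw the surface normal as a
    magenta arrow (a 2-D projection of the 3-D normal). Inputs are hypothetical:
    a BGR image, a boolean surface mask, the picking point (x, y), and the
    normal's x/y components."""
    out = image_bgr.copy()
    green = np.array([0, 255, 0], dtype=np.float32)
    out[surface_mask] = (0.6 * out[surface_mask] + 0.4 * green).astype(np.uint8)
    x, y = int(pick_xy[0]), int(pick_xy[1])
    tip = (int(x + arrow_len * normal_xy[0]), int(y + arrow_len * normal_xy[1]))
    cv2.arrowedLine(out, (x, y), tip, (255, 0, 255), 3, tipLength=0.2)
    return out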



Try a demo on Google Colab


🛠 Built With

This project is written in Python and leverages several state-of-the-art deep learning models and libraries.

Python · PyTorch · Transformers · Segment Anything · OpenCV · Open3D



📚 Improvements & Future Work

While the current pipeline is effective, several key areas could be enhanced to improve its accuracy, robustness, and real-world applicability for robotics.

  • Metric Depth with RGB-D Cameras: The most significant improvement would be to replace monocular depth estimation with a true RGB-D camera (e.g., Intel RealSense, Azure Kinect). This would provide a metric depth map (in meters) and known camera intrinsics, dramatically increasing the accuracy of the 3D data.
  • Advanced Picking Point Selection: The current method uses the centroid of the flattest surface. A more advanced approach would be to calculate the "pole of inaccessibility" (the point furthest from any edge within the optimal surface), providing a more stable picking point (see the sketch after this list).
  • Semantic Filtering: The current filtering relies on heuristics like area and IoU. This could be improved by adding a simple classifier to distinguish between "package" and "floor" segments, making the filtering more robust.
  • Performance Optimization: For real-time applications, model inference speed is critical. The deep learning models could be optimized using tools like TensorRT (for NVIDIA GPUs) or ONNX Runtime to achieve faster performance on target hardware.
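For the "pole of inaccessibility" mentioned above, one possible implementation (sketched with a hypothetical helper, not part of the current pipeline) uses OpenCV's distance transform: the foreground pixel with the largest distance to the mask boundary is the most interior point of the surface.

import cv2
import numpy as np

def pole_of_inaccessibility(surface_mask):
    """Return the (x, y) pixel inside `surface_mask` that is furthest from any
    edge of the mask. `surface_mask` is a binary H x W array marking the
    optimal picking surface (illustrative input)."""
    mask_u8 = (surface_mask > 0).astype(np.uint8)
    # Distance from every foreground pixel to the nearest background pixel.
    dist = cv2.distanceTransform(mask_u8, cv2.DIST_L2, 5)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return int(x), int(y)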


🧰 Prerequisites

To run this project, you will need a Python environment and the required libraries. A CUDA-enabled GPU is recommended for faster model inference, but the code will automatically fall back to the CPU if one is not available.

Warning

The segment-anything library requires a specific Python version range. Please ensure your environment uses Python >= 3.10 and < 3.12.

To install all necessary dependencies, run the following command:

pip install -q transformers requests matplotlib numpy opencv-python torch torchvision open3d segment_anything
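The GPU/CPU fallback mentioned above follows the standard PyTorch pattern, sketched here (not necessarily the notebook's exact code):

import torch

# Use a CUDA GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running inference on: {device}")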


⚙️ How to Start

The project is structured as a Jupyter Notebook, which makes it easy to step through the pipeline and visualize the intermediate results.


  1. Clone the repo
    git clone https://github.com/Piero24/Normal-Angle-Package-Picking.git
  2. Navigate to the project directory
    cd Normal-Angle-Package-Picking
  3. Install the dependencies as described in the Prerequisites section.
  4. Open the Normal_Angle_Package_Picking_Pipeline.ipynb notebook in a compatible environment (like Jupyter Lab or Google Colab).
  5. Set the Image Path: In Block 4 of the notebook, change the image_path variable to point to your desired input image.
    # Block 4: Image Definition
    image_path = "path/to/your/image.jpeg"
  6. Run the Notebook: Execute the cells sequentially from top to bottom. The first run will download the model weights for SAM, which may take a few moments.

Note

The notebook is self-contained and includes detailed explanations for each code block, from model initialization to final visualization. You can modify parameters in Block 11 to tune the mask filtering for different types of scenes.
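As an illustration of the kind of filtering Block 11 performs, SAM masks can be kept or discarded by area and predicted IoU and then deduplicated with a simple overlap check. The function name and thresholds below are assumptions for this sketch, not the notebook's exact values.

import numpy as np

def filter_masks(masks, min_area=5_000, min_iou=0.88, overlap_thresh=0.8):
    """Keep large, confident SAM masks and drop near-duplicates.

    `masks` is the list of dicts produced by SamAutomaticMaskGenerator, each
    with a boolean 'segmentation' array, an 'area', and a 'predicted_iou'
    score. The thresholds are illustrative defaults only."""
    candidates = [m for m in masks
                  if m["area"] >= min_area and m["predicted_iou"] >= min_iou]
    candidates.sort(key=lambda m: m["predicted_iou"], reverse=True)

    kept = []
    for m in candidates:
        seg = m["segmentation"]
        is_duplicate = False
        for k in kept:
            inter = np.logical_and(seg, k["segmentation"]).sum()
            union = np.logical_or(seg, k["segmentation"]).sum()
            if union > 0 and inter / union > overlap_thresh:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(m)
    return kept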



📮 Responsible Disclosure

We assume no responsibility for improper use of this code or anything related to it, and no responsibility for damage caused to people and/or property through its use.

By using this code, even in part, you release the developers from any responsibility.

More information is available at the following links: Code of Conduct · License


🐛 Bug and Feature

To report a bug or to request a new feature, we strongly recommend using the GitHub Issues tool »


There you may already find an answer to your problem, in case others have already encountered it. Otherwise, you can report the bug you found.


ATTENTION: To speed up problem resolution, please answer all the questions in the issue template as thoroughly as possible.

(For feature requests as well, please explain the motivation behind the request and the final result you want to obtain.)




🔍 License

MIT LICENSE

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including...

License Documentation »


📌 Third Party Licenses

Where the software uses third-party components for its operation, the individual licenses are indicated below.

Software list:

Software            License owner    License type         Link
Depth Anything V2   LiheYoung        Apache-2.0 license   here
Segment Anything    Meta AI          Apache-2.0 license   here
PyTorch             PyTorch          BSD-style            here
Transformers        Hugging Face     Apache-2.0 license   here
OpenCV              OpenCV           Apache-2.0 license   here
Open3D              Open3D           MIT                  here


Copyright (C) by Pietrobon Andrea
Release date: 15-09-2024
