
Piero24/Normal-Angle-Package-Picking




A computer vision pipeline to find optimal picking surfaces and normal vectors on packages for robotic manipulation, using a single RGB image.
View a Demo »

Report Bug · Request Feature




📔 Introduction

This project demonstrates a complete pipeline to identify optimal picking surfaces on packages and estimate their surface normals for robotic manipulation. It combines modern deep learning models for monocular depth estimation (Depth Anything V2) and instance segmentation (Segment Anything Model - SAM) to process a single RGB image and produce actionable 3D orientation data for a robotic gripper.


The core challenge in robotic picking is moving beyond a simple 2D bounding box to understand the 3D geometry of an object. This pipeline achieves this by fusing 2D segmentation with 3D depth data. A key innovation is performing segmentation on the depth map instead of the RGB image. This allows the model to separate objects based on geometric discontinuities, making it more robust against variations in color, texture, or lighting.



The pipeline is designed to find the most suitable, flat surfaces for picking. Instead of using an entire object mask, it converts the object's depth data into a point cloud and applies the RANSAC algorithm. This robustly isolates the largest flat region on a package's surface, which is ideal for a suction gripper, while ignoring noise from curves, wrinkles, and edges.

Note

The pipeline is demonstrated using static images of packages, but the methodology is designed for integration into a real-time robotic system. The depth estimation is relative; for real-world applications, a calibrated RGB-D camera would provide metric depth for precise positioning.


PIPELINE OVERVIEW: The program takes an RGB image as input and processes it through a multi-stage pipeline to generate actionable picking data. A minimal code sketch of these steps appears after the list below.

  1. Depth Estimation

    The input RGB image is processed by the Depth Anything V2 model to generate a high-quality relative depth map of the scene.

  2. Segmentation on Depth Map

    The generated depth map is fed into the Segment Anything Model (SAM), which identifies all potential object boundaries based on geometric information.

  3. Mask Filtering & Deduplication

    Raw masks from SAM are filtered by size and confidence score. A non-maximum suppression step then removes redundant, overlapping masks to ensure each package is processed only once.

  4. Optimal Surface & Normal Calculation

    For each unique package, its depth data is converted to a 3D point cloud. The RANSAC algorithm fits a plane to find the largest flat surface (the optimal picking area). The normal vector is derived from this plane and oriented to point outwards from the surface.

  5. Visualization

    The final output overlays the results on the original image, highlighting the optimal picking surface in green and drawing the normal vector as a magenta arrow, indicating the ideal approach for a robotic arm.
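The snippet below is a minimal, self-contained sketch of these steps. It approximates the notebook's logic rather than reproducing it: the model variant, checkpoint file, thresholds, and variable names (depth_rgb, pick_point, etc.) are illustrative assumptions, and the point cloud is built from raw pixel coordinates because the monocular depth is only relative.

import numpy as np
import torch
import open3d as o3d
from PIL import Image
from transformers import pipeline
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Relative depth estimation with Depth Anything V2 (Hugging Face pipeline).
depth_estimator = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
    device=0 if device == "cuda" else -1,
)
rgb = Image.open("path/to/your/image.jpeg").convert("RGB")
depth = np.array(depth_estimator(rgb)["depth"], dtype=np.float32)  # H x W, relative depth

# 2. Segmentation on the depth map: SAM expects a 3-channel uint8 image,
#    so the depth map is normalised and replicated across the channels.
depth_u8 = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255).astype(np.uint8)
depth_rgb = np.stack([depth_u8] * 3, axis=-1)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)  # weights downloaded beforehand
masks = SamAutomaticMaskGenerator(sam).generate(depth_rgb)

# 3. Filtering and deduplication of `masks` is omitted here (see the filtering
#    sketch in the "How to Start" section below).

# 4. For one package mask: lift the masked depth pixels to a point cloud and
#    fit the dominant plane with RANSAC to isolate the flat picking surface.
mask = masks[0]["segmentation"]                        # boolean H x W array
ys, xs = np.nonzero(mask)
points = np.stack([xs, ys, depth[ys, xs]], axis=-1).astype(np.float64)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
plane_model, inliers = pcd.segment_plane(
    distance_threshold=1.0,   # tune to the scale of the (relative) coordinates
    ransac_n=3,
    num_iterations=1000,
)
a, b, c, d = plane_model                               # plane: ax + by + cz + d = 0
normal = np.array([a, b, c]) / np.linalg.norm([a, b, c])
if normal[2] > 0:                                      # flip so the normal points out of
    normal = -normal                                   # the surface (sign depends on the
                                                       # depth-map convention)
pick_point = points[inliers].mean(axis=0)              # centroid of the flat region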
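A similarly minimal visualisation helper is sketched below; the function name, colours, and the 2-D projection of the normal are assumptions for illustration, not the notebook's exact code.

import cv2
import numpy as np

def draw_pick(image_bgr, surface_mask, pick_xy, normal_xy, arrow_len=80):
    """Highlight the picking surface in green and draw the surface normal as a
    magenta arrow (a 2-D projection of the 3-D normal). Inputs are hypothetical:
    a BGR image, a boolean surface mask, the picking point (x, y), and the
    normal's x/y components."""
    out = image_bgr.copy()
    green = np.array([0, 255, 0], dtype=np.float32)
    out[surface_mask] = (0.6 * out[surface_mask] + 0.4 * green).astype(np.uint8)
    x, y = int(pick_xy[0]), int(pick_xy[1])
    tip = (int(x + arrow_len * normal_xy[0]), int(y + arrow_len * normal_xy[1]))
    cv2.arrowedLine(out, (x, y), tip, (255, 0, 255), 3, tipLength=0.2)
    return out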



Try a demo on Google Colab


🛠 Built With

This project is written in Python and leverages several state-of-the-art deep learning models and libraries.

Python · PyTorch · Transformers · Segment Anything · OpenCV · Open3D



📚 Improvements & Future Work

While the current pipeline is effective, several key areas could be enhanced to improve its accuracy, robustness, and real-world applicability for robotics.

  • Metric Depth with RGB-D Cameras: The most significant improvement would be to replace monocular depth estimation with a true RGB-D camera (e.g., Intel RealSense, Azure Kinect). This would provide a metric depth map (in meters) and known camera intrinsics, dramatically increasing the accuracy of the 3D data.
  • Advanced Picking Point Selection: The current method uses the centroid of the flattest surface. A more advanced approach would be to calculate the "pole of inaccessibility" (the point furthest from any edge within the optimal surface), providing a more stable picking point (see the sketch after this list).
  • Semantic Filtering: The current filtering relies on heuristics like area and IoU. This could be improved by adding a simple classifier to distinguish between "package" and "floor" segments, making the filtering more robust.
  • Performance Optimization: For real-time applications, model inference speed is critical. The deep learning models could be optimized using tools like TensorRT (for NVIDIA GPUs) or ONNX Runtime to achieve faster performance on target hardware.
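For the "pole of inaccessibility" mentioned above, one possible implementation (sketched with a hypothetical helper, not part of the current pipeline) uses OpenCV's distance transform: the foreground pixel with the largest distance to the mask boundary is the most interior point of the surface.

import cv2
import numpy as np

def pole_of_inaccessibility(surface_mask):
    """Return the (x, y) pixel inside `surface_mask` that is furthest from any
    edge of the mask. `surface_mask` is a binary H x W array marking the
    optimal picking surface (illustrative input)."""
    mask_u8 = (surface_mask > 0).astype(np.uint8)
    # Distance from every foreground pixel to the nearest background pixel.
    dist = cv2.distanceTransform(mask_u8, cv2.DIST_L2, 5)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return int(x), int(y)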


🧰 Prerequisites

To run this project, you will need a Python environment and the required libraries. A CUDA-enabled GPU is recommended for faster model inference, but the code will automatically fall back to the CPU if one is not available.

Warning

The segment-anything library requires a specific Python version range. Please ensure your environment uses Python >= 3.10 and < 3.12.

To install all necessary dependencies, run the following command:

pip install -q transformers requests matplotlib numpy opencv-python torch torchvision open3d segment_anything
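The GPU/CPU fallback mentioned above follows the standard PyTorch pattern, sketched here (not necessarily the notebook's exact code):

import torch

# Use a CUDA GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running inference on: {device}")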


⚙️ How to Start

The project is structured as a Jupyter Notebook, which makes it easy to step through the pipeline and visualize the intermediate results.


  1. Clone the repo
    git clone https://github.com/Piero24/Normal-Angle-Package-Picking.git
  2. Navigate to the project directory
    cd Normal-Angle-Package-Picking
  3. Install the dependencies as described in the Prerequisites section.
  4. Open the Normal_Angle_Package_Picking_Pipeline.ipynb notebook in a compatible environment (like Jupyter Lab or Google Colab).
  5. Set the Image Path: In Block 4 of the notebook, change the image_path variable to point to your desired input image.
    # Block 4: Image Definition
    image_path = "path/to/your/image.jpeg"
  6. Run the Notebook: Execute the cells sequentially from top to bottom. The first run will download the model weights for SAM, which may take a few moments.

Note

The notebook is self-contained and includes detailed explanations for each code block, from model initialization to final visualization. You can modify parameters in Block 11 to tune the mask filtering for different types of scenes.
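As an illustration of the kind of filtering Block 11 performs, SAM masks can be kept or discarded by area and predicted IoU and then deduplicated with a simple overlap check. The function name and thresholds below are assumptions for this sketch, not the notebook's exact values.

import numpy as np

def filter_masks(masks, min_area=5_000, min_iou=0.88, overlap_thresh=0.8):
    """Keep large, confident SAM masks and drop near-duplicates.

    `masks` is the list of dicts produced by SamAutomaticMaskGenerator, each
    with a boolean 'segmentation' array, an 'area', and a 'predicted_iou'
    score. The thresholds are illustrative defaults only."""
    candidates = [m for m in masks
                  if m["area"] >= min_area and m["predicted_iou"] >= min_iou]
    candidates.sort(key=lambda m: m["predicted_iou"], reverse=True)

    kept = []
    for m in candidates:
        seg = m["segmentation"]
        is_duplicate = False
        for k in kept:
            inter = np.logical_and(seg, k["segmentation"]).sum()
            union = np.logical_or(seg, k["segmentation"]).sum()
            if union > 0 and inter / union > overlap_thresh:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(m)
    return kept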



📮 Responsible Disclosure

We assume no responsibility for improper use of this code or anything related to it, and no responsibility for damage caused to people and/or property through its use.

By using this code, even in part, you release the developers from any responsibility.

More information is available at the following links: Code of Conduct · License


🐛 Bug and Feature

To report a bug or to request a new feature, we strongly recommend using the GitHub Issues tool »


There you may already find an answer to your problem, in case others have already encountered it. Otherwise, you can report the bug you found.


ATTENTION: To speed up problem resolution, please answer all the questions in the issue template as thoroughly as possible.

(For feature requests as well, please explain the motivation behind the request and the final result you want to obtain.)




🔍 License

MIT LICENSE

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including...

License Documentation »


📌 Third Party Licenses

Where the software uses third-party components for its operation, the individual licenses are indicated below.

Software list:

Software            License owner    License type         Link
Depth Anything V2   LiheYoung        Apache-2.0 license   here
Segment Anything    Meta AI          Apache-2.0 license   here
PyTorch             PyTorch          BSD-style            here
Transformers        Hugging Face     Apache-2.0 license   here
OpenCV              OpenCV           Apache-2.0 license   here
Open3D              Open3D           MIT                  here


Copyright (C) by Pietrobon Andrea
Release date: 15-09-2024
