This project implements a real-time semantic segmentation system using a deep learning model. The system captures a live video feed from a camera and performs pixel-wise classification to identify objects and their boundaries in the scene.
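As a rough illustration of what pixel-wise classification means: the model produces one score per class for every pixel, and each pixel is assigned the class with the highest score. The sketch below uses plain Python lists and a hypothetical `label_map` helper that is not part of this project; in practice this step is usually a single call such as `scores.argmax(axis=0)` in NumPy or `torch.argmax(scores, dim=0)` in PyTorch.

```python
def label_map(scores):
    """Pick the highest-scoring class for every pixel.

    `scores` is indexed as scores[class][row][col].
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy example: 2 classes on a 2x2 image.
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
print(label_map(scores))  # [[0, 1], [1, 1]]
```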
Dependencies:
- Python 3.x
- OpenCV
- PyTorch
- PIL
- torchvision
- scipy
- numpy
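Installing Required Libraries:
pip install opencv-python-headless torch torchvision Pillow numpy scipy
Note: the headless OpenCV build omits GUI support (for example cv2.imshow). If the display window is created through OpenCV, install opencv-python instead of opencv-python-headless.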
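After installation, a quick check can confirm that each dependency is visible to the interpreter. The sketch below is a minimal, hypothetical helper (the name `missing_modules` is not part of this project). It assumes the usual import names for the packages above, where OpenCV imports as `cv2` and Pillow as `PIL`.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (opencv-python -> cv2, Pillow -> PIL).
REQUIRED_MODULES = ["cv2", "torch", "torchvision", "PIL", "numpy", "scipy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks the modules up without importing them, so it stays fast even for large packages such as PyTorch.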
-
Model Weights: Download the model weights (`encoder_epoch.pth` and `decoder_epoch.pth`) from the provided link and place them in the project directory.
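Before starting the application it can help to confirm that both weight files are in place. The following is a small sketch, assuming the file names given above and that the weights sit in the current working directory; `find_missing_weights` is a hypothetical helper, not a function of this project.

```python
from pathlib import Path

# File names as given above; the default directory is an assumption.
WEIGHT_FILES = ("encoder_epoch.pth", "decoder_epoch.pth")


def find_missing_weights(directory=".", names=WEIGHT_FILES):
    """Return the weight file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("Both weight files found.")
```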
Running the Application:
- Execute the main script to start the real-time semantic segmentation:
python main_script.py
- The application opens a window displaying the live camera feed.
- Press `Space` to toggle real-time mode.
- Press `TAB` to toggle between different visualizations.
- Press `1-9` or `a-f` to choose specific classes for segmentation visualization.
- Press `Esc` to exit the application.
- Press `s` to save the current frame and segmentation result.
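The key bindings above could be dispatched along these lines. This is only a sketch: the `handle_key` function, the state dictionary, and the mapping of `1`-`9` and `a`-`f` to class slots 0 through 14 are assumptions for illustration, not the project's actual code. It assumes key codes arrive as ASCII values, as returned by OpenCV's `cv2.waitKey(1) & 0xFF`.

```python
KEY_TAB, KEY_ESC, KEY_SPACE = 9, 27, 32  # ASCII codes

CLASS_KEYS = "123456789abcdef"  # assumed: 15 selectable class slots


def handle_key(key, state):
    """Update `state` for one key press; return False when the app should exit."""
    if key == KEY_ESC:
        return False
    if key == KEY_SPACE:
        state["realtime"] = not state["realtime"]
    elif key == KEY_TAB:
        # Cycle through the available visualizations.
        state["view"] = (state["view"] + 1) % state["num_views"]
    elif key == ord("s"):
        state["save_requested"] = True
    elif 0 <= key < 128 and chr(key) in CLASS_KEYS:
        state["selected_class"] = CLASS_KEYS.index(chr(key))
    return True


# Hypothetical initial state for the main loop.
state = {
    "realtime": False,
    "view": 0,
    "num_views": 3,
    "selected_class": None,
    "save_requested": False,
}
```

Keeping the dispatch in a pure function like this separates input handling from the capture loop, so it can be tested without a camera or a display.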
The purpose of this project is to demonstrate the capabilities of semantic segmentation in real-time applications. It can be used for educational purposes or as a base for more complex computer vision projects.
- Model training and architecture are based on CSAILVision's Semantic Segmentation PyTorch.
- Data used: ADE20K MIT Scene Parsing Benchmark.