This Python project provides two scripts for capturing images, performing Optical Character Recognition (OCR) to extract text, and using Text-to-Speech (TTS) to read the text aloud. One script uses a standard USB webcam, and the other uses a mobile phone's camera stream. Both are controlled by a physical button connected to a Raspberry Pi's GPIO pin.
- `Image_To_Speech_Transcript_Webcam.py`: Uses a USB webcam.
- `Image_To_Speech_Transcript.py`: Uses a mobile phone camera stream (e.g., via an IP Webcam app).
- Dual Camera Input: Supports both USB webcams and mobile phone camera streams.
- Button-Controlled Operation:
  - First button press: Starts the live preview from the selected camera source.
  - Second button press: Captures an image.
- Image Preprocessing (Mobile Stream Version): The mobile stream script includes image resizing, grayscale conversion, and various filtering techniques to enhance OCR accuracy.
- OCR: Uses `pytesseract` to extract text from captured images.
- Text Cleaning (Mobile Stream Version): The mobile stream script includes a step to remove special characters from the extracted text.
- Text-to-Speech: Uses `espeak` to convert extracted text into audible speech.
- Raspberry Pi GPIO: Utilizes a button connected to a GPIO pin for control.
- Raspberry Pi (with Raspberry Pi OS or similar) or a Linux system for the webcam version (GPIO functionality is RPi-specific).
- Python 3
- `pytesseract`
- Tesseract OCR engine (and its language data, e.g., `tesseract-ocr-eng`)
- OpenCV (`cv2`)
- Pillow (`PIL`)
- `RPi.GPIO` (for Raspberry Pi button input)
- `espeak`
- For Webcam Version: A USB webcam connected to the system.
- For Mobile Stream Version: An IP Webcam application running on a mobile phone on the same network as the Raspberry Pi.
- Install Dependencies:

      sudo apt-get update
      sudo apt-get install tesseract-ocr espeak python3-opencv python3-pil python3-rpi.gpio  # For Raspberry Pi
      # For other Linux systems running the webcam script without GPIO, python3-rpi.gpio might not be needed.
      pip3 install pytesseract

- Hardware (Raspberry Pi for button input):
  - Connect a momentary push button to the GPIO pin specified by `BUTTON_PIN` (default is 17 in both scripts) and to a ground pin on the Raspberry Pi.
  - Webcam script (`Image_To_Speech_Transcript_Webcam.py`): The script checks for `GPIO.input(BUTTON_PIN) == GPIO.HIGH`. A press that reads high implies a pull-down arrangement in which the button connects the pin to 3.3 V, which does not match the button-to-ground wiring described above. If you use the common internal pull-up (`pull_up_down=GPIO.PUD_UP`, with the button connecting the pin to GND), change the check to `GPIO.LOW`.
  - Mobile stream script (`Image_To_Speech_Transcript.py`): The script uses `pull_up_down=GPIO.PUD_UP` and checks for `GPIO.input(BUTTON_PIN) == GPIO.LOW`, which is consistent with a button that connects the GPIO pin to ground when pressed.
- Configure Mobile Stream (for `Image_To_Speech_Transcript.py`):
  - Install and run an IP webcam app on your mobile phone.
  - Note the IP address and port of the stream.
  - Update the `ip` and `port` variables in `Image_To_Speech_Transcript.py`:

        ip = "YOUR_MOBILE_IP_ADDRESS"  # e.g., "192.168.1.5"
        port = "YOUR_MOBILE_STREAM_PORT"  # e.g., "8080"
        # Ensure the stream_url format (f"http://{ip}:{port}/video") matches your app.

- Webcam (for `Image_To_Speech_Transcript_Webcam.py`):
  - Connect your USB webcam to the system.
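As a minimal sketch of the mobile stream configuration step, the snippet below builds the stream URL from `ip` and `port` and rejects obviously malformed values (such as an unedited placeholder) before the script tries to connect. The helper name `build_stream_url` and its validation rules are illustrative assumptions, not part of the original scripts; the `/video` path comes from the `stream_url` format mentioned above and may differ for your app.

```python
import ipaddress


def build_stream_url(ip: str, port: str, path: str = "/video") -> str:
    """Return an HTTP stream URL, raising ValueError on malformed input.

    Hypothetical helper for illustration only; the README states that the
    script formats the URL as f"http://{ip}:{port}/video".
    """
    # ip_address() raises ValueError for anything that is not a valid
    # IPv4/IPv6 literal, including the unedited placeholder string.
    address = ipaddress.ip_address(ip)
    port_number = int(port)  # raises ValueError if the port is not numeric
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port out of range: {port_number}")
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"http://{host}:{port_number}{path}"


if __name__ == "__main__":
    print(build_stream_url("192.168.1.5", "8080"))
```

Checking the values up front turns a forgotten placeholder into an immediate, readable error instead of a failed stream connection later.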
- Navigate to the script's directory.
- Run the desired script:
  - For Webcam: `python3 Image_To_Speech_Transcript_Webcam.py`
  - For Mobile Stream: `python3 Image_To_Speech_Transcript.py`
- Operation:
  - The script will print "Waiting for button press(es)...".
  - First button press: Initializes the camera. A live preview window should appear for the mobile stream version. The webcam version initializes the camera but might not show a preview window by default in its current form.
  - Second button press: Captures an image, saves it as `captured_image.jpg`, performs OCR, and speaks the extracted text.
  - For the mobile stream version, the preview window can be closed by pressing the `ESC` key.
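The two-press workflow described above can be pictured as a small state machine. The sketch below models it in plain Python so it runs without a camera or GPIO hardware. The names `Workflow` and `on_button_press`, and the two injected callables, are hypothetical stand-ins for the scripts' `start_preview()`, `capture_image()`, `image_to_text()`, and `text_to_speech()` functions, whose signatures are not shown in this README. Returning to the idle state after the second press is also an assumption.

```python
from typing import Callable, List


class Workflow:
    """Two-state button workflow: idle -> previewing -> idle (assumed)."""

    def __init__(self, start_preview: Callable[[], None],
                 capture_and_read: Callable[[], str]) -> None:
        self._start_preview = start_preview
        self._capture_and_read = capture_and_read
        self.previewing = False

    def on_button_press(self) -> str:
        """Advance the workflow by one press and report what happened."""
        if not self.previewing:
            # First press: open the camera and begin the live preview.
            self._start_preview()
            self.previewing = True
            return "preview started"
        # Second press: capture, run OCR, speak, then go back to idle.
        text = self._capture_and_read()
        self.previewing = False
        return f"spoke: {text}"


if __name__ == "__main__":
    log: List[str] = []
    flow = Workflow(lambda: log.append("preview"), lambda: "hello world")
    print(flow.on_button_press())
    print(flow.on_button_press())
```

Keeping the press handling separate from the camera and OCR calls makes the sequence easy to exercise on a machine that has neither.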
Common Elements:
- `BUTTON_PIN`: Defines the GPIO pin for button input (default 17).
- `capture_image()`: Captures and saves an image. Releases camera resources.
- `text_to_speech()`: Uses `espeak` for TTS.
- Main loop (`if __name__ == "__main__"`): Handles button press logic and workflow.
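A minimal sketch of how a `text_to_speech()` function might invoke `espeak`, assuming it shells out to the command-line tool. The function signature, the default values, and the `build_espeak_command` helper are assumptions for illustration and may not match the scripts; `-s` (speed in words per minute) and `-v` (voice name) are standard `espeak` options.

```python
import shutil
import subprocess
from typing import List


def build_espeak_command(text: str, speed: int = 150, voice: str = "en") -> List[str]:
    """Build the argument list for espeak (hypothetical helper)."""
    # An argument list (rather than a shell string) means quotes and other
    # shell metacharacters in the OCR output need no escaping.
    return ["espeak", "-s", str(speed), "-v", voice, text]


def text_to_speech(text: str, speed: int = 150, voice: str = "en") -> bool:
    """Speak the text; return False if it is blank or espeak is not installed."""
    if not text.strip() or shutil.which("espeak") is None:
        return False
    subprocess.run(build_espeak_command(text, speed, voice), check=True)
    return True
```

Returning `False` for blank text avoids launching `espeak` when OCR finds nothing to read.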
`Image_To_Speech_Transcript.py` (Mobile Stream):
- `ip`, `port`, `stream_url`: Configure the mobile camera stream.
- `start_preview()`: Connects to the mobile stream URL using `cv2.VideoCapture(stream_url)` and handles live preview display.
- `image_to_text()`: Includes significant image preprocessing (resize, grayscale, various filters, adaptive thresholding) and text cleaning using regular expressions.
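The text-cleaning step can be sketched with the standard `re` module. The exact pattern used by the script is not shown in this README, so the allowed character set below (letters, digits, whitespace, and basic punctuation) is an assumption, as is the function name `clean_text`.

```python
import re

# Assumed policy: keep letters, digits, whitespace, and basic punctuation;
# everything else counts as a "special character".
_DISALLOWED = re.compile(r"[^A-Za-z0-9\s.,!?'-]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Replace special characters with spaces and collapse whitespace."""
    without_specials = _DISALLOWED.sub(" ", raw)
    return _WHITESPACE.sub(" ", without_specials).strip()


if __name__ == "__main__":
    print(clean_text("Hello, *world*!\n"))
```

Replacing special characters with a space rather than deleting them keeps neighbouring words from being fused together, at the cost of an occasional stray space before punctuation.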
`Image_To_Speech_Transcript_Webcam.py` (Webcam):
- `start_preview()`: Initializes the default webcam using `cv2.VideoCapture(0)`.
- `image_to_text()`: Simpler OCR process; opens the image with Pillow and passes it directly to `pytesseract`.
- Button Pin: Change `BUTTON_PIN` in the respective script if you use a different GPIO pin.
- Button Logic: Adjust the `if GPIO.input(BUTTON_PIN) == ...` condition in the main loop of `Image_To_Speech_Transcript_Webcam.py` if your button wiring for that script is different (e.g., to `GPIO.LOW` if using internal pull-up and button to ground).
- Webcam Index (for Webcam script): If you have multiple webcams, modify `cv2.VideoCapture(0)` in `start_preview` to the correct index (e.g., `1`).
- OCR Language: For `Image_To_Speech_Transcript.py`, modify `config = "--oem 3 --psm 6 -l eng"` in `image_to_text`. For `Image_To_Speech_Transcript_Webcam.py`, modify `pytesseract.image_to_string(img)` to `pytesseract.image_to_string(img, lang='your_lang_code')`. Ensure the corresponding Tesseract language data is installed.
- TTS Voice/Speed: Adjust the `speed` and `voice` parameters in `text_to_speech` calls.
- Image Preprocessing: To improve OCR for the webcam version, consider adding preprocessing steps from the mobile stream script's `image_to_text` function to the webcam script's `image_to_text`.
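The relationship between button wiring and the level to test for can be sketched as follows. This is an illustration under stated assumptions: the BCM numbering mode, the helper names `pressed_level`, `setup_button`, and `is_pressed`, and the `pull` argument are not taken from the original scripts. `RPi.GPIO` is imported lazily so the pure logic can be read and checked on a machine that is not a Raspberry Pi.

```python
BUTTON_PIN = 17  # default pin named in both scripts


def pressed_level(pull: str) -> int:
    """Return the level GPIO.input() reads while the button is held.

    pull="up":   button between the pin and GND   -> pressed reads 0 (LOW)
    pull="down": button between the pin and 3.3 V -> pressed reads 1 (HIGH)
    """
    if pull == "up":
        return 0
    if pull == "down":
        return 1
    raise ValueError(f"pull must be 'up' or 'down', got {pull!r}")


def setup_button(pull: str = "up"):
    """Configure BUTTON_PIN as an input with the matching internal resistor."""
    import RPi.GPIO as GPIO  # only importable on a Raspberry Pi

    GPIO.setmode(GPIO.BCM)  # assumption: the scripts use BCM numbering
    resistor = GPIO.PUD_UP if pull == "up" else GPIO.PUD_DOWN
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=resistor)
    return GPIO


def is_pressed(gpio, pull: str = "up") -> bool:
    """True while the button is held, for the given wiring."""
    return gpio.input(BUTTON_PIN) == pressed_level(pull)
```

On a Raspberry Pi, `gpio = setup_button("up")` followed by `is_pressed(gpio, "up")` corresponds to the button-to-ground wiring; passing `"down"` in both calls corresponds to a button wired to 3.3 V.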
- "Error: Could not access mobile stream." (Mobile Stream Script):
  - Verify that the mobile phone and Raspberry Pi are on the same Wi-Fi network.
  - Check that the IP address and port in the script match those shown in the webcam app.
  - Check that no firewall is blocking the stream and that the app is running.
- "Error: Could not access the webcam." (Webcam Script):
  - Confirm the webcam is connected and recognized (`lsusb`).
  - Try a different webcam index if multiple webcams are present.
  - Ensure no other app is using the webcam.
  - Check permissions (user in the `video` group, or run with `sudo`).
- Poor OCR Accuracy:
  - Ensure clear, well-lit images.
  - For webcam: Add image preprocessing.
  - For both: Ensure the correct Tesseract language model is installed.
- GPIO Issues (Raspberry Pi):
  - Double-check wiring and `GPIO.setup` configuration.
  - Ensure `RPi.GPIO` is installed.
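Several of the checks above come down to whether a tool or module is installed. The sketch below reports which of the listed dependencies can be found on the current system. The function name `check_environment` and the output format are hypothetical, and finding a module does not guarantee that the camera or GPIO hardware actually works.

```python
import importlib.util
import shutil
from typing import Dict

COMMANDS = ("tesseract", "espeak")            # command-line tools
MODULES = ("cv2", "PIL", "pytesseract", "RPi")  # top-level Python packages


def check_environment() -> Dict[str, bool]:
    """Map each dependency to True if it can be located, else False."""
    status: Dict[str, bool] = {}
    for command in COMMANDS:
        # shutil.which returns None when the executable is not on PATH.
        status[f"command:{command}"] = shutil.which(command) is not None
    for module in MODULES:
        # find_spec returns None for a top-level package that is not installed.
        status[f"module:{module}"] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

On a non-Pi Linux system running only the webcam script, a missing `RPi` package is expected.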
Raspberry Pi with button and USB webcam connected.
Live preview from mobile phone camera stream.
Extracted text from a captured image displayed in the terminal.
