omwanere/Image-to-Speech-Transcript

Image to Speech (Webcam & Mobile Stream)

This Python project provides two scripts for capturing images, performing Optical Character Recognition (OCR) to extract text, and using Text-to-Speech (TTS) to read the text aloud. One script uses a standard USB webcam, and the other uses a mobile phone's camera stream. Both are controlled by a physical button connected to a Raspberry Pi's GPIO pin.

Scripts

  1. Image_To_Speech_Transcript_Webcam.py: Uses a USB webcam.
  2. Image_To_Speech_Transcript.py: Uses a mobile phone camera stream (e.g., via an IP Webcam app).

Features

  • Dual Camera Input: Supports both USB webcams and mobile phone camera streams.
  • Button-Controlled Operation:
    • First button press: Starts the live preview from the selected camera source.
    • Second button press: Captures an image.
  • Image Preprocessing (Mobile Stream Version): The mobile stream script includes image resizing, grayscale conversion, and various filtering techniques to enhance OCR accuracy.
  • OCR: Uses pytesseract to extract text from captured images.
  • Text Cleaning (Mobile Stream Version): The mobile stream script includes a step to remove special characters from the extracted text.
  • Text-to-Speech: Uses espeak to convert extracted text into audible speech.
  • Raspberry Pi GPIO: Utilizes a button connected to a GPIO pin for control.
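The text-cleaning feature can be pictured with a short sketch. This is not the code from Image_To_Speech_Transcript.py: the function name clean_ocr_text and the set of characters it keeps are assumptions chosen for illustration.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Drop characters outside an assumed allow-list and collapse whitespace."""
    # Replace anything that is not a letter, digit, whitespace, or basic
    # punctuation with a space (the allow-list is an illustrative assumption).
    kept = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", raw)
    # Collapse runs of whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", kept).strip()
```

Cleaning before TTS keeps espeak from reading out stray symbols that OCR tends to produce on noisy images.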

Prerequisites

  • Raspberry Pi (with Raspberry Pi OS or similar) or a Linux system for the webcam version (GPIO functionality is RPi-specific).
  • Python 3
  • pytesseract
  • Tesseract OCR engine (and its language data, e.g., tesseract-ocr-eng)
  • OpenCV (cv2)
  • Pillow (PIL)
  • RPi.GPIO (for Raspberry Pi button input)
  • espeak
  • For Webcam Version: A USB webcam connected to the system.
  • For Mobile Stream Version: An IP Webcam application running on a mobile phone on the same network as the Raspberry Pi.

Setup

  1. Install Dependencies:
    sudo apt-get update
    sudo apt-get install tesseract-ocr espeak python3-opencv python3-pil python3-rpi.gpio # For Raspberry Pi
    # For other Linux systems running the webcam script without GPIO, python3-rpi.gpio might not be needed.
    pip3 install pytesseract
  2. Hardware (Raspberry Pi for button input):
    • Connect a momentary push button to the GPIO pin specified by BUTTON_PIN (default is 17 in both scripts) and to a ground pin on the Raspberry Pi.
    • Webcam script (Image_To_Speech_Transcript_Webcam.py): The script checks for GPIO.input(BUTTON_PIN) == GPIO.HIGH, which implies a pull-down configuration in which the button connects the pin to 3.3 V rather than to ground. If you use the common internal pull-up (pull_up_down=GPIO.PUD_UP, with the button wired to ground as described above), change the check to GPIO.LOW.
    • Mobile stream script (Image_To_Speech_Transcript.py): The script uses pull_up_down=GPIO.PUD_UP and checks for GPIO.input(BUTTON_PIN) == GPIO.LOW, which is consistent for a button connecting the GPIO pin to ground when pressed.
  3. Configure Mobile Stream (for Image_To_Speech_Transcript.py):
    • Install and run an IP webcam app on your mobile phone.
    • Note the IP address and port of the stream.
    • Update the ip and port variables in Image_To_Speech_Transcript.py:
      ip = "YOUR_MOBILE_IP_ADDRESS"  # e.g., "192.168.1.5"
      port = "YOUR_MOBILE_STREAM_PORT" # e.g., "8080"
      # Ensure the stream_url format (f"http://{ip}:{port}/video") matches your app.
  4. Webcam (for Image_To_Speech_Transcript_Webcam.py):
    • Connect your USB webcam to the system.
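The button polarity described in the hardware step can be summarized in a small helper. This is a sketch under stated assumptions: the names pressed_level and is_pressed are hypothetical, and plain integers stand in for GPIO.LOW and GPIO.HIGH so the logic can be checked without a Raspberry Pi.

```python
LOW, HIGH = 0, 1  # stand-ins for GPIO.LOW and GPIO.HIGH


def pressed_level(pull: str) -> int:
    """Return the pin level that means 'pressed' for a given pull resistor."""
    if pull == "up":    # pin idles HIGH; the button shorts it to ground
        return LOW
    if pull == "down":  # pin idles LOW; the button connects it to 3.3 V
        return HIGH
    raise ValueError(f"unknown pull configuration: {pull!r}")


def is_pressed(level: int, pull: str) -> bool:
    """Compare a raw pin reading against the expected 'pressed' level."""
    return level == pressed_level(pull)
```

On a Raspberry Pi, the reading would come from GPIO.input(BUTTON_PIN) after GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP), for example is_pressed(GPIO.input(BUTTON_PIN), "up").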

Running the Scripts

  1. Navigate to the script's directory.

  2. Run the desired script:

    • For Webcam: python3 Image_To_Speech_Transcript_Webcam.py
    • For Mobile Stream: python3 Image_To_Speech_Transcript.py
  3. Operation:

    • The script will print "Waiting for button press(es)...".
    • First button press: Initiates the camera. A live preview window should appear for the mobile stream version. The webcam version initializes the camera but might not show a preview window by default in its current form.
    • Second button press: Captures an image, saves it as captured_image.jpg, performs OCR, and speaks the extracted text.
    • For the mobile stream version, the preview window can be closed by pressing the ESC key.
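The two-press workflow above behaves like a small state machine. The sketch below is illustrative only: the State enum, the on_button_press function, and the action names are hypothetical, and it assumes the cycle returns to waiting after a capture, which the scripts may or may not do.

```python
from enum import Enum, auto
from typing import Tuple


class State(Enum):
    IDLE = auto()        # waiting for the first press
    PREVIEWING = auto()  # camera started, waiting for the second press


def on_button_press(state: State) -> Tuple[State, str]:
    """Return the next state and the action a button press should trigger."""
    if state is State.IDLE:
        return State.PREVIEWING, "start_preview"
    # Second press: capture, run OCR, speak, then wait again (assumed).
    return State.IDLE, "capture_and_speak"
```

Keeping the press handling in one pure function makes the workflow easy to test without a camera or GPIO hardware attached.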

Code Overview (Key Differences and Common Functions)

Common Elements:

  • BUTTON_PIN: Defines the GPIO pin for button input (default 17).
  • capture_image(): Captures and saves an image. Releases camera resources.
  • text_to_speech(): Uses espeak for TTS.
  • Main loop (if __name__ == "__main__"): Handles button press logic and workflow.
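As a rough picture of how text_to_speech() might hand text to espeak, here is a minimal sketch. The helper build_espeak_command, the default speed of 150, and the default voice "en" are assumptions, not values taken from the scripts. The -s (words per minute) and -v (voice) options are standard espeak flags.

```python
import subprocess
from typing import List


def build_espeak_command(text: str, speed: int = 150, voice: str = "en") -> List[str]:
    """Build an espeak argument list: -s sets words per minute, -v the voice."""
    return ["espeak", "-s", str(speed), "-v", voice, text]


def text_to_speech(text: str, speed: int = 150, voice: str = "en") -> None:
    """Speak text aloud with espeak; do nothing for empty input."""
    if not text.strip():
        return
    # Passing a list avoids shell quoting problems with OCR output.
    subprocess.run(build_espeak_command(text, speed, voice), check=False)
```

Passing the arguments as a list rather than a shell string means quotes or semicolons in the recognized text cannot be interpreted by a shell.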

Image_To_Speech_Transcript.py (Mobile Stream):

  • ip, port, stream_url: Configure the mobile camera stream.
  • start_preview(): Connects to the mobile stream URL using cv2.VideoCapture(stream_url) and handles live preview display.
  • image_to_text(): Includes significant image preprocessing (resize, grayscale, various filters, adaptive thresholding) and text cleaning using regular expressions.
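The stream configuration can be sketched as a small helper that composes the URL from ip and port. The function name build_stream_url and its validation rule are hypothetical; the /video path follows the stream_url format shown in the setup step and may differ for your app.

```python
def build_stream_url(ip: str, port: str, path: str = "/video") -> str:
    """Compose a stream URL of the form http://<ip>:<port>/video."""
    if not ip or not str(port).isdigit():
        raise ValueError("ip must be non-empty and port must be numeric")
    return f"http://{ip}:{port}{path}"
```

The result would be passed to cv2.VideoCapture(stream_url), and checking isOpened() on the returned capture object before reading frames is a quick way to detect an unreachable stream.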

Image_To_Speech_Transcript_Webcam.py (Webcam):

  • start_preview(): Initializes the default webcam using cv2.VideoCapture(0).
  • image_to_text(): Simpler OCR process, opens the image with Pillow and passes it directly to pytesseract.

Customization

  • Button Pin: Change BUTTON_PIN in the respective script if you use a different GPIO pin.
  • Button Logic: Adjust the if GPIO.input(BUTTON_PIN) == ... condition in the main loop of Image_To_Speech_Transcript_Webcam.py if your button wiring for that script is different (e.g., to GPIO.LOW if using internal pull-up and button to ground).
  • Webcam Index (for Webcam script): If you have multiple webcams, modify cv2.VideoCapture(0) in start_preview to the correct index (e.g., 1).
  • OCR Language: For Image_To_Speech_Transcript.py, modify config = "--oem 3 --psm 6 -l eng" in image_to_text. For Image_To_Speech_Transcript_Webcam.py, modify pytesseract.image_to_string(img) to pytesseract.image_to_string(img, lang='your_lang_code'). Ensure corresponding Tesseract language data is installed.
  • TTS Voice/Speed: Adjust speed and voice parameters in text_to_speech calls.
  • Image Preprocessing: To improve OCR for the webcam version, consider adding preprocessing steps from the mobile stream script's image_to_text function into the webcam script's image_to_text.
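For the OCR language setting, a helper like the one below could generate the config string instead of editing it by hand. This is a sketch: build_tesseract_config is a hypothetical name, and the defaults mirror the "--oem 3 --psm 6 -l eng" string quoted above.

```python
def build_tesseract_config(lang: str = "eng", oem: int = 3, psm: int = 6) -> str:
    """Build a Tesseract config string such as '--oem 3 --psm 6 -l eng'."""
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    # Several languages can be combined with '+', for example 'eng+fra'.
    return f"--oem {oem} --psm {psm} -l {lang}"
```

The string would then be passed as pytesseract.image_to_string(img, config=build_tesseract_config("fra")), provided the matching language data is installed.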

Troubleshooting

  • "Error: Could not access mobile stream." (Mobile Stream Script):
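    • Verify that the phone and the Raspberry Pi are on the same Wi-Fi network.
    • Check that the IP address and port in the script match those shown in the webcam app.
    • Confirm that the app is running and that no firewall is blocking the stream.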
  • "Error: Could not access the webcam." (Webcam Script):
    • Confirm the webcam is connected and recognized (lsusb).
    • Try a different webcam index if more than one camera is present.
    • Ensure no other application is using the webcam.
    • Check permissions (add the user to the video group, or run with sudo).
  • Poor OCR Accuracy:
    • Ensure clear, well-lit images.
    • For webcam: Add image preprocessing.
    • For both: Ensure the correct Tesseract language data is installed and selected.
  • GPIO Issues (Raspberry Pi):
    • Double-check wiring and GPIO.setup configuration.
    • Ensure RPi.GPIO is installed.
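Several of the problems above come down to a missing dependency, so a quick environment check can narrow things down. The script below is a hypothetical helper, not part of the repository. It only reports whether each command-line tool and Python module from the prerequisites can be found, without importing the modules.

```python
import importlib.util
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report which required binaries and Python modules can be located."""
    report: Dict[str, bool] = {}
    for binary in ("tesseract", "espeak"):
        report[binary] = shutil.which(binary) is not None
    for module in ("cv2", "PIL", "pytesseract", "RPi"):
        try:
            report[module] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            report[module] = False
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A MISSING entry for RPi is expected on non-Raspberry Pi systems and only matters when button input is used.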

Example Setup and Usage

Hardware Setup

[Image: Raspberry Pi with button and USB webcam connected.]

Mobile Stream Preview

[Image: Live preview from mobile phone camera stream.]

Webcam Preview

[Image: Live preview from USB webcam.]

Example OCR Result

[Image: Extracted text from a captured image displayed in the terminal.]
