This Python program extracts text and LaTeX code from screenshots, which is useful when dealing with poorly formatted PDFs or complex scientific documents. It combines Optical Character Recognition (OCR), temporary file management, and a user-friendly GUI to streamline text and LaTeX extraction.
- Screenshot Capture: Capture screenshots of selected screen areas and save them as temporary files.
- OCR Engine (Pytesseract): Extract text from the captured screenshots using the Pytesseract library.
- LaTeX Engine (Modified Pix2tex from Forked LatexOCR): Extract LaTeX code from the captured temporary screenshots using a modified pix2tex model from a fork of LaTeX-OCR.
- Clipboard Management: Append the recognized text or LaTeX code to the clipboard in an organized manner. Includes an easy-clear button for clearing the clipboard content.
- User Interface: A lightweight, easy-to-use interface built with tkinter.
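The capture and recognition features above can be sketched roughly as follows. This is a minimal illustration, not the code in capture_tools.py, ocr_engine.py, or latex_engine.py: the function names are hypothetical, the capture step assumes the macOS `screencapture` command-line tool, and the LaTeX step assumes the upstream pix2tex interface (`pix2tex.cli.LatexOCR`), which the modified fork may or may not keep unchanged.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def new_screenshot_path() -> Path:
    """Reserve an empty temporary .png file for one capture."""
    handle = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    handle.close()
    return Path(handle.name)


def capture_command(target: Path) -> List[str]:
    """Build the macOS screencapture call; -i lets the user select a region."""
    return ["screencapture", "-i", str(target)]


def capture_region() -> Optional[Path]:
    """Capture a user-selected region; return None if nothing was captured."""
    target = new_screenshot_path()
    subprocess.run(capture_command(target), check=False)
    if target.stat().st_size == 0:  # file still empty: the capture was cancelled
        target.unlink()
        return None
    return target


def image_to_text(image_path: Path) -> str:
    """Run Tesseract on the image through pytesseract."""
    # Imported lazily so this module loads without the OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image).strip()


def image_to_latex(image_path: Path) -> str:
    """Run pix2tex on the image (assumes the upstream LatexOCR interface)."""
    from PIL import Image
    from pix2tex.cli import LatexOCR

    model = LatexOCR()  # loads model weights; reuse the instance in real code
    with Image.open(image_path) as image:
        return model(image)
```

Keeping the third-party imports inside the functions means the capture helpers can be used and tested even when Tesseract or pix2tex is not installed.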
- Clone the repository and navigate to the project folder:
    git clone https://github.com/yourusername/mac_screenshot_ocr_latex.git
    cd mac_screenshot_ocr_latex
- Install the required libraries:
    pip install -r requirements.txt
- Clone and integrate the modified LatexOCR model (https://github.com/rawcsav/LaTeX-OCR) for use with the latex_engine.py file.
- Modify the capture_tools.py, ocr_engine.py, and latex_engine.py files to include your custom configurations, if necessary.
- Run the main_app.py script to start the application:
    python main_app.py
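Before launching the application, it can help to confirm that the installation steps took effect. The sketch below is a hypothetical helper, not part of the repository. It assumes the Python modules `pytesseract`, `PIL`, and `tkinter` are needed, and it also looks for the `tesseract` executable, because pytesseract is a wrapper that calls the separately installed Tesseract binary.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_dependencies(modules: Iterable[str], binaries: Iterable[str]) -> List[str]:
    """Return a description of every module or executable that cannot be found."""
    missing = [
        f"python module: {name}"
        for name in modules
        if importlib.util.find_spec(name) is None  # top-level module names only
    ]
    missing += [
        f"executable: {name}"
        for name in binaries
        if shutil.which(name) is None  # not on PATH
    ]
    return missing


if __name__ == "__main__":
    # The names below are assumptions about what the application needs.
    problems = missing_dependencies(["pytesseract", "PIL", "tkinter"], ["tesseract"])
    for problem in problems:
        print("missing", problem)
    if not problems:
        print("all dependencies found")
```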
- Open the application and choose your desired recognition mode (Text OCR or LaTeX OCR).
- Press "Enter" or click the corresponding button to capture a screenshot of the desired content.
- The recognized text or translated LaTeX code will be appended to your clipboard.
- Optional: Click "Clear Clipboard" to erase the clipboard content.
- Paste the extracted content into your preferred application.
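The usage flow above (pick a mode, capture, append to the clipboard, optionally clear) could be wired together along these lines. This is a sketch under assumptions: `ClipboardBuffer`, `tk_publisher`, and `recognize` are illustrative names rather than the application's actual classes, and the engines are passed in as plain callables so the logic does not depend on any particular OCR backend. The only tkinter calls used are `clipboard_clear` and `clipboard_append`.

```python
from typing import Callable, Dict, List


class ClipboardBuffer:
    """Accumulates recognized snippets and mirrors them to a clipboard sink."""

    def __init__(self, publish: Callable[[str], None], separator: str = "\n\n") -> None:
        self._publish = publish
        self._separator = separator
        self._entries: List[str] = []

    @property
    def contents(self) -> str:
        return self._separator.join(self._entries)

    def append(self, text: str) -> str:
        """Add one snippet (ignoring empty results) and republish everything."""
        text = text.strip()
        if text:
            self._entries.append(text)
            self._publish(self.contents)
        return self.contents

    def clear(self) -> None:
        """Forget all snippets and empty the clipboard."""
        self._entries.clear()
        self._publish("")


def tk_publisher(root) -> Callable[[str], None]:
    """Return a sink that replaces the system clipboard through a tkinter root."""
    def publish(text: str) -> None:
        root.clipboard_clear()
        root.clipboard_append(text)
    return publish


def recognize(
    mode: str,
    image_path: str,
    engines: Dict[str, Callable[[str], str]],
    buffer: ClipboardBuffer,
) -> str:
    """Run the engine selected by mode and append its output to the buffer."""
    if mode not in engines:
        raise ValueError(f"unknown mode: {mode!r}")
    return buffer.append(engines[mode](image_path))
```

In the GUI, the buffer would be created once with `ClipboardBuffer(tk_publisher(root))`, and the "Clear Clipboard" button would call `buffer.clear()`.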
Thanks to Lukas Blecher and his foundational LaTeX-OCR model (https://github.com/lukas-blecher/LaTeX-OCR)!
While this solution performs well in recognizing standard text, it may struggle with complex scientific symbols, mathematical notations, or technical expressions. In the future, specialized OCR models can be developed to handle such content more accurately. Additionally, improvements can be made to preprocessing and post-processing techniques to further enhance the quality and readability of extracted text.
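As one illustration of the kind of post-processing mentioned above, the following sketch tidies raw OCR output for plain text. It is an assumption about what such a step might look like, not something the program currently does, and `clean_ocr_text` is a hypothetical name.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Normalize line endings, rejoin split words, and collapse whitespace."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "recog-\nnition" -> "recognition".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines mark paragraph breaks; other whitespace runs become one space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)
```

The hyphen rule is a heuristic: it also merges genuine compounds that happen to break at the hyphen (for example "well-" followed by "known"), so it is only suitable for prose, not for LaTeX output.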