Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts

Accompanying code for the Findings of EMNLP 2021 paper "Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts".

Due to copyright issues, the book alignment data cannot be readily distributed.