eMOP GitHub Repo

A tool for identifying and transcribing paratext on a page image in TypeWright.
A robust image comparison environment that presents versions of texts alongside each other in filmstrip view and collates these images of different texts, while allowing users to adjust the collation.
The code that implements the entire eMOP workflow.
The online dashboard that powers the eMOP workflow.
A tool created for eMOP that allows users to create training data for Tesseract from their own typeface samples.
A tool created for eMOP post-processing that removes noise from Tesseract's hOCR output.
A command line version of Juxta that compares OCR output to groundtruth files.
A tool created for eMOP that uses dictionary files and a Google 3-gram database to correct Tesseract output.
A tool created for eMOP that evaluates OCR output to determine how correctable it is.
Printer, seller, and location information culled from the imprint lines of the entire eMOP dataset. These XML files (one set each for EEBO and ECCO) contain only those entries for which we have an ESTC number.
A tool created for eMOP that compares OCR output to groundtruth files.
A collection of training data created for Tesseract by eMOP using Franken+.