News

EEBO now in TypeWright

Submitted by egrumbac on Thu, 03/10/2016 - 15:11


We are pleased to announce that the Mellon-funded Early Modern OCR Project (eMOP) has finished running optical character recognition (OCR) software on the 138,538 documents in ProQuest’s Early English Books Online (EEBO), and we are now making almost all of them available for OCR correction at 18thConnect.org. Some document images were too poor in quality to run through the software, but we have loaded the resulting “dirty OCR” for 113,909 documents into the TypeWright tool at 18thConnect.org for crowd-sourced correction (http://www.18thconnect.org/typewright/documents).


eMOP Mellon Final Report

Submitted by mchristy on Mon, 11/23/2015 - 10:23


eMOP Releases its Full Set of Early Modern Typeface Training for Tesseract

Submitted by mchristy on Mon, 02/16/2015 - 12:23

In accordance with Andrew W. Mellon Foundation grant requirements and IDHMC guiding principles, the Early Modern OCR Project has released all of the Early Modern Typeface Training we created for use with the Tesseract OCR engine.


More Early Modern Word Lists Released by eMOP on GitHub

Submitted by mchristy on Thu, 10/02/2014 - 12:46

The eMOP team is happy to announce the release of more early modern word lists, which we have compiled, cleaned, and combined over the last two years. Our sources include Ted Underwood, Martin Mueller, Loretta Auvil, the VARD project, and the TCP transcriptions of EEBO and ECCO. Please see our GitHub page for more information.


SAA 2014 Pre-Conference Workshop - OCRing with Open-Source Tools

Submitted by mchristy on Thu, 08/07/2014 - 13:03

These are the slides for our one-day pre-conference workshop on OCR'ing with open-source tools, given at the Society of American Archivists (SAA) 2014 Annual Conference in Washington, DC, on August 12.

Or download the original PowerPoint slides.


Early Modern Word List with Variant Spellings

Submitted by mchristy on Thu, 07/24/2014 - 00:09

The eMOP team is happy to release the early modern word list we've compiled by parsing the 46,000 TCP transcriptions of EEBO and ECCO documents and combining the results with the alternate spelling list available via the VARD tool.


eMOP @ DH2014-Lausanne: eMOP and the Cobre Tool

Submitted by mchristy on Thu, 07/17/2014 - 20:14

A presentation from DH2014-Lausanne discussing distributed reading, crowdsourcing and the Cobre tool as used in eMOP.


eMOP @ DH2014-Lausanne: Navigating the Storm: eMOP, Big DH Projects, and Agile Steering Standards

Submitted by mchristy on Thu, 07/17/2014 - 20:01

A presentation at DH2014-Lausanne discussing some of the problems faced during the eMOP project, and by eMOP collaborators on other large projects. We discuss how large DH projects deal with change.


eMOP @ DH2014-Lausanne: eMOP Poster

Submitted by mchristy on Mon, 07/14/2014 - 12:29

The poster we presented on eMOP at DH2014-Lausanne. It was a huge success and we talked to many people interested in the work we are doing and the tools we've created.


eMOP @ DH2014-Lausanne: eMOP Post-OCR Triage

Submitted by mchristy on Mon, 07/14/2014 - 12:03

A presentation at DH2014-Lausanne by eMOP (at the IDHMC at Texas A&M) on our post-processing triage method, along with our expanded treatment and diagnosis queues for correcting and analysing Tesseract OCR results.


eMOP @ DH2014-Lausanne: eMOP Book History Tools

Submitted by mchristy on Mon, 07/14/2014 - 10:32

A presentation at DH2014-Lausanne of the Tesseract training methods and tools developed by eMOP (at the IDHMC at Texas A&M), and their potential application for other book history and typeface history research projects.


TCDL 24x7 Presentation on eMOP Workflows (April 28, 2014)

Submitted by mchristy on Wed, 04/30/2014 - 14:23

This is a 24x7 presentation given at the Texas Conference on Digital Libraries (TCDL) in Austin, TX: 24 slides in 7 minutes. It is a very fast overview of eMOP, organized around our workflows.


TxDHC Presentation of eMOP Workflows (April 11, 2014)

Submitted by mchristy on Fri, 04/04/2014 - 13:11

A text outline of a presentation given at the Texas Digital Humanities Consortium's (TxDHC) Conference at the University of Houston, April 10-12, 2014. This presentation provides an overview of the OCR training, OCRing, and post-processing analysis and correction processes being done by eMOP through a series of workflow diagrams created over the life of the project.


Submitted by Matthew Christy (not verified) on Fri, 04/18/2014 - 18:07

eMOP at TxDHC

Follow eMOP's tweets from the first annual Texas Digital Humanities Consortium Conference on Storify: https://storify.com/EMGrumbach/emop-at-txdhc

Historical Typemaking and its Artifacts

Submitted by toddsamuelson on Fri, 02/28/2014 - 11:11

In late 2013, Todd Samuelson traveled to Europe in search of typographical specimens for the eMOP initiative. In a series of dispatches, he will highlight his findings and discuss the significance of historical research in the development of the project.


eMOP Mellon Interim Report

Submitted by tayphil8992 on Wed, 01/15/2014 - 14:47

This post contains the Mellon Interim Report for the Early Modern OCR Project, prepared by PI and IDHMC Director Dr. Laura Mandell and by eMOP's Co-Project Managers for Year Two, Matthew Christy and Elizabeth Grumbach.


Special Characters, Unicode, and Early Modern English

Submitted by mchristy on Wed, 11/20/2013 - 13:19

With a dataset of 45 million page images, the eMOP team is dealing with a lot of text output, and that means dealing with Unicode. As an early modern English project, we're also working with ligatures and other special characters specific to the period, and that means considering MUFI (the Medieval Unicode Font Initiative).


October OCR Testing & Training

Submitted by mchristy on Tue, 10/15/2013 - 09:46

eMOP progress continues as our team experiments to find the best method for training Tesseract to recognize various early modern fonts. The new Franken+ tool, developed by eMOP graduate student Bryan Tarpley, has passed through the alpha testing phase and dramatically improves our ability to create a variety of training sets for Tesseract. Now we're hard at work investigating different methods of creating “training sets” for Tesseract to see which give us the best OCR results.


eMOP's Zotero Page of OCR Readings

Submitted by mchristy on Fri, 10/11/2013 - 12:19

An eMOP library exists under the IDHMC Group in Zotero. It contains a variety of readings related to OCR in general and Tesseract in particular. Come check it out (at eMOP Zotero Library) and peruse our collection of OCR-related readings. You'll never want to know more about OCR than this.


This Fall on eMOP: Post Processing

Submitted by egrumbac on Fri, 10/11/2013 - 08:53

In the near future, we intend to write up a post detailing our successes and goals for this fall, but we'd like to immediately share an interesting development at the beginning of Year Two. As our team and collaborators begin thinking towards the post-processing and triage stage of this project, we've been having a series of meetings here to rethink the granularity of our diagnostics and triage approach.


KB National Library of the Netherlands posts on eMOP

Submitted by heil on Fri, 03/22/2013 - 16:43

KB National Library of the Netherlands has recently given the Early Modern OCR Project some publicity on the other side of the Atlantic. Koninklijke Bibliotheek (KB) coordinates one of our international partner projects, IMPACT: Improving Access to Text.


eMOP Featured in Library Journal

Submitted by heil on Thu, 12/06/2012 - 09:08

Matt Enis, Associate Editor of Technology for Library Journal, asks "OCR [optical character recognition] works great for paperbacks—but what about 15th Century texts set by hand?"

Link:

Next Gen OCR Project Reaches Back into Early English History (and Databases)


ProQuest Joins Forces with TAMU Scholars to Make 15th Century Books Behave Like Born-Digital Text

Submitted by heil on Fri, 11/16/2012 - 18:07

ANN ARBOR, Mich., November 6, 2012 - Information powerhouse ProQuest is participating in a project that will vastly accelerate research of 15th through 17th Century cultural history. The company will provide access to page images from the venerable Early English Books Online and newcomer Early European Books to the Early Modern OCR Project (eMOP) at Texas A&M. eMOP will use the content to create a database of typefaces used in the early modern era, train OCR software to read them, and then apply crowd-sourcing for editing. The project will turn the rich corpus of works from this pivotal historical period into fully searchable digital documents.

Link:

ProQuest eMOP Press Release


eMOP Receives Funding from Andrew W. Mellon Foundation

Submitted by egrumbac on Fri, 11/16/2012 - 17:29

English Professor Laura Mandell, Director of the Initiative for Digital Humanities, Media, and Culture (IDHMC), and co-PIs Professor Ricardo Gutierrez-Osuna and Professor Richard Furuta are very pleased to announce that Texas A&M has received a two-year, $734,000 development grant from the Andrew W. Mellon Foundation for the Early Modern OCR Project (eMOP, http://emop.tamu.edu). The two other project leaders, Anton DuPlessis and Todd Samuelson, are book historians from Cushing Rare Books Library.
