Juxta-CL Text Comparison Tool is Available

Juxta-CL is a command-line text comparison tool based on the online JuxtaCommons tool. Performant Software Solutions created it for eMOP to do ground-truth comparison for testing the accuracy of our OCR processes. It is now available as open source through our GitHub page. Have fun.

Juxta-CL is a command-line text comparison tool based on the online JuxtaCommons tool. Juxta-CL was created by eMOP collaborators Performant Software Solutions to do ground-truth comparison for testing the accuracy of our OCR processes. It compares two separate pages of text and generates a score between 0 and 1 indicating their correlation to each other. Juxta-CL can use one of several different distance algorithms for this purpose:

  • Jaro-Winkler
  • Levenshtein
  • native Juxta compare
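
To make the idea of a 0-to-1 score concrete, here is a minimal Python sketch of a Levenshtein-based similarity. It is not Juxta-CL's code: the function names are hypothetical, and both the normalization (dividing by the longer text's length) and the direction of the scale (1.0 meaning identical) are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to [0, 1]; 1.0 means identical (assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Called on an OCR page and its ground-truth transcription, similarity(ocr_text, truth_text) would return a value near 1.0 for a clean page and lower values as recognition errors accumulate.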

Juxta-CL also has command-line options to:

  • ignore punctuation
  • ignore case
  • ignore end-of-line hyphenation
  • normalize file encoding to UTF-8
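
The options above amount to normalizing both texts before they are compared. The Python sketch below illustrates that kind of preprocessing; it is an assumption about the general technique, not a description of how Juxta-CL implements it, and the function and parameter names are hypothetical.

```python
import re
import string

# Translation table that deletes ASCII punctuation (an assumption: other
# Unicode punctuation marks are left untouched by this sketch).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str, *, ignore_punctuation: bool = False,
              ignore_case: bool = False,
              ignore_hyphenation: bool = False) -> str:
    """Apply the selected normalizations to text before comparison."""
    if ignore_hyphenation:
        # Rejoin a word split across lines, e.g. "exam-\nple" -> "example".
        # This must run before punctuation removal, which would delete the
        # hyphen but leave the line break in the middle of the word.
        text = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    if ignore_punctuation:
        text = text.translate(_PUNCT_TABLE)
    if ignore_case:
        text = text.lower()
    return text


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")
```

Applying the same normalization to both the OCR output and the ground truth keeps differences in case, punctuation, or line breaks from counting as recognition errors.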

Juxta-CL is now available as an open-source project from eMOP via our GitHub page under an Apache Software License, v2.0.

Installing Juxta-CL

Juxta-CL is a Java-based tool, so it can run on any platform that has the Java SE Development Kit (JDK) installed. To install Juxta-CL without building it yourself:

  1. Download and unzip the Juxta-CL.zip file.
  2. For Windows users, download/copy the juxta.bat file, and put it in your new Juxta-CL folder.
  3. Type sh juxta.sh (or juxta.bat on Windows) for Juxta-CL help information.