OCR Accuracy when there is superscript

asked on December 4, 2021

I have one large-volume vendor whose invoices I scan, and they indicate credit amounts with a small 'CR' superscript after the value.  OCR reads that marker very inconsistently, and I need it to determine whether the value is negative or positive.  My results include all kinds of character combinations, but far too rarely is the result 'CR' when the amount really is a credit.  Any ideas?


replied on December 14, 2021

I assume you're currently using full-page OCR?  Have you tried a separate Zone OCR process that targets just the places where the superscripts appear? (This may not be possible if they are in different places.)

You could also try some local image enhancement processes to improve the OCR engine's ability to read the characters (e.g., a Smooth process that uses Grow or Sand and Fill to make the superscript characters "bolder").  To be fair, this does require a bit of educated guessing about what would help, followed by testing.
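To give a feel for what "growing" thin characters means, here is a minimal Python sketch of a plain morphological dilation on a small black-and-white bitmap. It is only an illustration of the general idea and is not how the Smooth process is actually implemented; the `grow` function name, the 8-neighbour rule, and the `passes` parameter are all assumptions made for the example.

```python
def grow(bitmap, passes=1):
    """Thicken ink in a binary bitmap (1 = ink, 0 = background).

    Hypothetical illustration of dilation: after each pass, a pixel is ink
    if it, or any of its 8 neighbours, was ink before the pass.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    for _ in range(passes):
        grown = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Look at the pixel itself and its 8 neighbours.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and bitmap[rr][cc]:
                            grown[r][c] = 1
        bitmap = grown
    return bitmap


# A single ink pixel in the middle of a 5x5 bitmap.
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
thicker = grow(dot)
```

Every extra pass makes strokes thicker but also closes up the gaps inside letters like 'R', so this is the part that needs testing against real scans.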


You can also use a regular expression-based Substitution process to help replace commonly misidentified characters (e.g., the '+' as a 't' or the 'C' as an 'O' when immediately following numeric/currency characters).
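As a rough sketch of that substitution idea, written in Python rather than as an actual Substitution process configuration: the pattern below only rewrites a two-character marker when it directly follows a currency amount. The list of misreads (for example 'OR', '0R', 'cr') is an assumption for illustration, and `normalize_credit` and `signed_amount` are hypothetical helper names; you would build the real list from the character combinations you actually see in your OCR output.

```python
import re
from decimal import Decimal

# Assumed misreads of a superscript "CR" (hypothetical; tune to real output).
# The marker must directly follow an amount and must not run into a longer word.
CREDIT_MARKER = re.compile(
    r"(?P<amount>\$?\d[\d,]*\.\d{2})\s*(?P<marker>[CcOo0G(][RrKkAa])(?![A-Za-z0-9])"
)

# An amount with an optional, already normalized "CR" after it.
AMOUNT = re.compile(
    r"\$?(?P<number>\d[\d,]*\.\d{2})(?:\s*(?P<credit>CR))?(?![A-Za-z0-9])"
)


def normalize_credit(text):
    """Rewrite a suspected credit marker after an amount as ' CR'."""
    return CREDIT_MARKER.sub(lambda m: m.group("amount") + " CR", text)


def signed_amount(text):
    """Return the first amount in text as a Decimal, negative if marked CR."""
    match = AMOUNT.search(normalize_credit(text))
    if match is None:
        return None
    value = Decimal(match.group("number").replace(",", ""))
    return -value if match.group("credit") else value


print(normalize_credit("1,234.56 OR"))
print(signed_amount("$45.00cr"))
print(signed_amount("12.50 Order"))
```

Requiring the marker to sit right after the numeric characters is what keeps ordinary words elsewhere on the invoice from being rewritten.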

