Question

OCR mistakes

Updated August 11, 2016

asked on August 9, 2016

Hi!

I have an issue where we are getting a large number of spelling and formatting mistakes in the OCR text.

This has only been reported since the update to v10.1, but I'm not 100% sure it wasn't happening before. The users report 'it was better before', but I'm not sure exactly when it started.

The documents that are having the issue come directly from LF Forms (saved as .TIFF files), so there is no handwriting and everything is clean text. I've tried changing the font and size in the form, but it made no difference.

I have tried using DCC for the OCR and also letting the users generate it manually, but the results are about the same. I've set it to 'Accuracy' rather than 'Speed' and get the same result.

To give you an example, this is what I'm getting:

Starting phrase (pasted into a multi-line text box in the form):

The quick brown fox jumps over the lazy dog
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
01234567890

Result (copied out of the text window of the resulting document):

The quick brow fox jumps aver the lazy dog
AECDEFC+ IJKLKNCPCRSTUWVXYZ
abcdef ghijkl mnopgrstuvwcyz
01234567890
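To put a number on the damage, the two blocks above can be compared line by line. The following is a minimal sketch using Python's standard `difflib`; the helper names `similarity` and `line_report` are hypothetical, and the score is only a rough similarity ratio, not a formal OCR accuracy metric.

```python
import difflib

# Text entered into the form (copied from the post above).
expected = [
    "The quick brown fox jumps over the lazy dog",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "abcdefghijklmnopqrstuvwxyz",
    "01234567890",
]

# Text returned by OCR (copied from the post above).
ocr = [
    "The quick brow fox jumps aver the lazy dog",
    "AECDEFC+ IJKLKNCPCRSTUWVXYZ",
    "abcdef ghijkl mnopgrstuvwcyz",
    "01234567890",
]


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def line_report(expected_lines, ocr_lines):
    """Pair each expected line with its OCR line and a similarity score."""
    return [(e, o, similarity(e, o)) for e, o in zip(expected_lines, ocr_lines)]


report = line_report(expected, ocr)
for e, o, score in report:
    print(f"{score:.3f}  expected={e!r}  ocr={o!r}")
```

Running the same comparison on output from a v10.0 system and a v10.1 system would give a like-for-like figure to attach to a support case.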

Does anyone have any thoughts about what I can try next?

Thanks

Mark

Replies

replied on August 10, 2016

We see the same thing. What's interesting is that if you run those same TIFF images directly through Nuance's tools (Omnipage or PowerPDF), they get it right with nearly 100% accuracy.

As a Nuance customer I've had a lot of interaction with their support group around difficult OCR issues, and over the last 18 months they've provided and incorporated numerous hotfixes into their OCR engine that have greatly improved recognition accuracy and stability with challenging images.

I posted a question here about a year ago asking how often Laserfiche picks up new builds of the Omnipage engine they distribute with Laserfiche, but never got an answer. Could anyone at Laserfiche comment as to how often you are getting and redistributing the latest builds of Omnipage?

replied on August 10, 2016

Mark,

It's not really possible to guess what might be going on. It could be the font, the size or the resolution of the image. I would recommend opening a support case and attaching a few sample images.

Geoffrey,

We update the OCR engine once every couple of years, when major versions come out or when enough fixes and features have been added to warrant an upgrade. We're also monitoring our support cases and updating when issues related to OCR are patched. However, not many cases related to OCR quality have come through our Tech Support in the past 5 years or so.

replied on August 10, 2016

Hi!

The documents in question are Laserfiche forms, so they are generated electronically and are therefore all straight and clean.

Does anyone know if there is any way of changing the resolution of the image that Forms creates?

Also, there are a few fields that are of more interest than the rest of the form. I'm reluctant to enlarge the text of the whole form too much, as this would kill my layout. Is there any way to enlarge just the contents of those few fields? I've tried different fonts. I also tried CSS, but that only affects the display size while filling out the form.

I've also noticed that if you widen the form, it scales down to fit the page when in LF. This effectively shrinks the fonts further.

To test, I've created a new form: standard everything, with only one field, a multi-line. The OCR of this is mostly OK except for the lower case.

I widened the form from the default 800px to 1200px, and the resulting smaller text made the OCR a lot worse.

So I believe I need to increase the resolution of the Forms .TIFF (if possible) or enlarge the font.
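As a rough illustration of the shrinking effect, here is a small sketch of the arithmetic. It assumes the saved page width stays fixed and a wider form is scaled down proportionally to fit; that is an assumption based on what was observed here, not confirmed Forms behaviour. The function names and the 15px font size are hypothetical.

```python
def fit_to_page_scale(form_width_px: float, page_width_px: float = 800.0) -> float:
    """Scale factor applied when a form wider than the page is shrunk to fit.

    Assumes proportional scaling to a fixed page width (800px default here).
    Forms at or below the page width are left at scale 1.0.
    """
    return min(1.0, page_width_px / form_width_px)


def effective_font_px(font_px: float, form_width_px: float,
                      page_width_px: float = 800.0) -> float:
    """Font size as it would appear on the page after fit-to-page scaling."""
    return font_px * fit_to_page_scale(form_width_px, page_width_px)


# Widening the form from 800px to 1200px, as in the test described above.
scale = fit_to_page_scale(1200)
print(f"scale factor: {scale:.3f}")
print(f"15px text renders at about {effective_font_px(15, 1200):.1f}px")
```

Under that assumption, text on the 1200px form ends up at about two thirds of its original size, so the font would need to grow by roughly 1.5x just to get back to the size the default-width form produces.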

form OCR issue.jpg (72.69 KB)

replied on August 11, 2016

Update:

I've run the same test on two different sites and found that it might be a v10.1 issue.

The system with issues has been updated to v10.1 recently.

The other system I've tried is still at v10.0.

The v10.0 system OCRs perfectly, even if I shrink the font significantly.

replied on August 10, 2016

Hi Mark,

Have you tried changing the OCR from the default option of balanced to accuracy? This can be found in the client options (Tools>Options).

Hope this helps!

EDIT: Sorry, I just re-read your post and see that you have tried this. How are the scans being generated? OCR quality is only as good as the original.
