Question

Improving OmniPage Zone OCR Accuracy in Quick Fields

asked on April 29

I’m configuring a Quick Fields session that relies on OmniPage Zone OCR to extract key data from legislative documents. Here's the general setup:

  1. First Page Identification: Using Zone OCR to locate the words ORDINANCE or RESOLUTION.

  2. Document Type Extraction: Using pattern matching to determine the document type (e.g., Resolution or Ordinance).

  3. Document Number Extraction: Using pattern matching to capture numbers like 25-101.
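
For reference, the two pattern-matching steps above amount to roughly the following. This is a minimal sketch in Python, not Quick Fields' own pattern syntax; the patterns, the two-digit/three-digit number format (inferred from the 25-101 example), and the `extract_fields` name are illustrative assumptions.

```python
import re

# Hypothetical patterns mirroring steps 2 and 3; not taken from the actual session.
DOC_TYPE = re.compile(r"\b(ORDINANCE|RESOLUTION)\b", re.IGNORECASE)
DOC_NUMBER = re.compile(r"\b(\d{2}-\d{3})\b")  # assumes numbers shaped like 25-101


def extract_fields(ocr_text):
    """Return the document type and number found in OCR text, or None for each miss."""
    type_match = DOC_TYPE.search(ocr_text)
    number_match = DOC_NUMBER.search(ocr_text)
    return {
        "type": type_match.group(1).title() if type_match else None,
        "number": number_match.group(1) if number_match else None,
    }


clean = extract_fields("RESOLUTION NO. 25-101")
garbled = extract_fields("RES0LUTI0N N0. 2S-1O1")  # typical character confusions
```

With clean OCR text both fields are found; once letters and digits are confused (0 for O, S for 5), strict patterns like these match nothing, which is why the OCR quality at runtime matters so much here.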

However, the OCR output during the session is inaccurate or inconsistent. I've experimented with:

  • Character Preferences: Prioritizing letters or numbers.

  • Optimization Styles: Switching between Balanced and Accuracy.

While some settings are slightly better, the improvement is minimal.

Interestingly:

  • The text displays correctly in the repository viewer.

  • When I insert the same page as a sample and OCR it with Accuracy optimization, the result is very accurate.

Question:
Is there a way to improve OCR accuracy within the session runtime to match the quality seen when OCR'ing a sample file manually?

 

Below is the OCR behavior comparison:

Figure 1: Shows the document and how cleanly Laserfiche Client extracts the text.

 

Figure 2: Shows the OCR result when the same page is added as a sample document and OCR’ed in Quick Fields using Accuracy optimization — result is excellent.

 

Figure 3: Shows the Zone OCR output during actual session runs, which is significantly less accurate and sometimes unreadable.

Replies

replied on May 9

Hi Margaret,

The inaccuracy may be caused by image compression at runtime. There is a setting under Tools -> Options -> Quick Fields -> General -> Use JPEG compression. Is this option enabled, and is the JPEG quality level set lower than 100?
