I’m configuring a Quick Fields session that relies on OmniPage Zone OCR to extract key data from legislative documents. Here's the general setup:
- First Page Identification: Using Zone OCR to locate the words ORDINANCE or RESOLUTION.
- Document Type Extraction: Using pattern matching to determine the document type (e.g., Resolution or Ordinance).
- Document Number Extraction: Using pattern matching to capture numbers like 25-101.
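To make the intended extraction concrete, here is a rough Python sketch of the matching logic described in the setup. It is not Quick Fields configuration: the regular expressions, the `extract_fields` function, and the assumption that document numbers always have the form two digits, a hyphen, three digits are all hypothetical, chosen only to show what the session is expected to pull out of the OCR text.

```python
import re

# Hypothetical patterns; Quick Fields' own pattern-matching syntax may differ.
DOC_TYPE = re.compile(r"\b(ORDINANCE|RESOLUTION)\b", re.IGNORECASE)
# Assumes numbers look like 25-101 (two digits, hyphen, three digits).
DOC_NUMBER = re.compile(r"\b(\d{2}-\d{3})\b")


def extract_fields(ocr_text: str) -> dict:
    """Pull the first-page flag, document type, and document number from OCR text."""
    type_match = DOC_TYPE.search(ocr_text)
    number_match = DOC_NUMBER.search(ocr_text)
    return {
        # A page counts as a first page if either keyword appears in the zone.
        "is_first_page": type_match is not None,
        "doc_type": type_match.group(1).title() if type_match else None,
        "doc_number": number_match.group(1) if number_match else None,
    }


if __name__ == "__main__":
    print(extract_fields("RESOLUTION NO. 25-101"))
```

Running the same patterns against the text produced by the session and by the manually OCR'ed sample is one way to see exactly where the session output stops matching.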
However, the OCR output during the session is inaccurate or inconsistent. I've experimented with:
- Character Preferences: Prioritizing letters or numbers.
- Optimization Styles: Switching between Balanced and Accuracy.
While some settings are slightly better, the improvement is minimal.
Interestingly:
- The text displays correctly in the repository viewer.
- When I insert the same page as a sample and OCR it with Accuracy optimization, the result is very accurate.
Question:
Is there a way to improve OCR accuracy within the session runtime to match the quality seen when OCR'ing a sample file manually?
Below is the OCR behavior comparison:
Figure 1: Shows the document and how cleanly Laserfiche Client extracts the text.
Figure 2: Shows the OCR result when the same page is added as a sample document and OCR’ed in Quick Fields using Accuracy optimization; the result is excellent.
Figure 3: Shows the Zone OCR output during actual session runs, which is significantly less accurate and sometimes unreadable.