
Question

Distributed processing and Quick Fields

asked on August 6, 2013

Can the Quick Fields Zone OCR option "Use existing text" use text acquired from distributed processing? (Or from Import Agent?)

 

Does it have the proper location information for it to work effectively?

 


Answer

APPROVED ANSWER
replied on August 8, 2013

It sure can! Regardless of how the text for a document is generated (OCR through drag and drop, Import Agent, DPS, scanning, etc.), it's handled the same way. And if there's text associated with a document, you can take advantage of the 'Use Existing Text' option for Quick Fields Zone OCR.


Replies

replied on November 6, 2013

To reiterate what Brett has said, Quick Fields does not discriminate based on where searchable text came from. If there is text available, no matter how it was generated, Quick Fields will handle it the same way.

 

So if you are using "Use existing text", it will use that text when available, regardless of how it got into the document.

 

The great thing about this is that Laserfiche provides several different products that let you handle the same situation in multiple ways, and the other Laserfiche products you use will treat the results the same way, letting you tailor your solution to suit your needs to a tee.

replied on February 20, 2014

Wait a minute. How does Zone OCR know what text was found within its zone in a previous OCR?

replied on February 20, 2014

I am familiar with this because we use it with Import Agent quite a bit. (I just didn't know if DCC would work the same.)

 

When a page is OCR'd you'll see three files: the tif, a txt file, and a loc file. The txt file holds the recognized text, and the loc file holds the location of each word on the page.
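
Just to make that file triplet concrete, here's a tiny sketch (plain Python, not anything Quick Fields itself runs) that checks whether a page image already has companion text and location files. The volume path is made up for the example, and real volume layouts may differ.

```python
from pathlib import Path

def has_existing_text(tif_path):
    """True if the page image already has companion txt/loc files (the triplet described above)."""
    page = Path(tif_path)
    return page.with_suffix(".txt").exists() and page.with_suffix(".loc").exists()

# Hypothetical path, just for illustration.
print(has_existing_text(r"C:\Volumes\Invoices\00000001.tif"))
```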

 

So if you set your Zone OCR zone to "use existing text", it'll use the information in the loc and txt files, which is much, much faster. There's no actual OCR involved.

Of course, in my observations, a whole-page OCR will never be as accurate as an actual zone OCR if the image quality is less than perfect. But it's awesome if most of your incoming documents are first-generation scans, or if they are electronic documents that were snapshotted (such as snapshotted invoices).

 

I've also found they can be pretty good for identification even on more marginal documents. One trick I've used is to put in 5 different things to identify in certain areas, then use token identification to see if at least 3 of the 5 match. It leads to a rather long list of matching conditions, but it's easy to build: I put all 5 conditions in a "matches all" group, then right-click, copy that group, and paste it several times. In one copy I delete identifiers 1 and 2, then 1 and 3 in the next, then 1 and 4, then 1 and 5 (then 2/3, 2/4, 2/5, 3/4, 3/5, 4/5). That way, if any 3 of the 5 zones match, it'll work; the sketch below shows the same rule in miniature.
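
To put the "any 3 of 5" rule in code form (outside Quick Fields, purely for illustration), here's a short sketch. The zone values and expected strings are invented; in Quick Fields itself this is expressed as the ten copied "matches all" groups described above.

```python
from itertools import combinations

# Made-up example: what each of the five identification zones returned,
# and what we expect to see in each zone for this document class.
zone_text = ["ACME Corp", "Invoice", "Remit To", "Net 30", "Pge 1 of 1"]   # one OCR miss
expected  = ["ACME Corp", "Invoice", "Remit To", "Net 30", "Page 1 of 1"]

matches = [z == e for z, e in zip(zone_text, expected)]

# The simple way to state the rule: at least 3 of the 5 zones match.
identified = sum(matches) >= 3

# The equivalent of the copied "matches all" groups: one group per way of
# choosing 3 zones out of 5 (10 groups); the document matches if any group passes.
groups = list(combinations(range(5), 3))
identified_via_groups = any(all(matches[i] for i in g) for g in groups)

print(identified, identified_via_groups, len(groups))   # True True 10
```

Deleting two identifiers from each copied group, as described above, is exactly the same as enumerating those ten 3-out-of-5 combinations.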

 

This seems like a lot of processing, but it really isn't: it's all reading the loc file and using the text that's already there, and while Quick Fields is working on a page, all of that data is already available. And it gets even better with QF 9, since with conditionals you can do a rough identification first and only pass the page to a stronger OCR-based identification if it meets the first criteria (thus only firing up the actual, much slower OCR engine when needed).
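
That conditional flow can be sketched the same way: a cheap check against the text that's already there, with the OCR engine only fired up when the check passes. The function names and sample values below are invented for illustration; they aren't Quick Fields APIs.

```python
def rough_match(existing_text):
    """Cheap identification using text already stored with the page (no OCR)."""
    return "Invoice" in existing_text

def ocr_based_identification(image_path):
    """Stand-in for the much slower OCR-based identification step."""
    print("firing up the OCR engine for", image_path, "...")
    return "INV-004217"

def identify(existing_text, image_path):
    # Only pay for the slow step when the cheap check passes.
    if not rough_match(existing_text):
        return None
    return ocr_based_identification(image_path)

print(identify("ACME Corp  Invoice  Net 30", "page0001.tif"))  # OCR runs
print(identify("Purchase Order 7781", "page0002.tif"))         # OCR is skipped
```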

 

 

replied on February 21, 2014

Just a quick note on Chris's point above. The point of OCRed text is to make documents searchable and, when we return search hits, to show context by highlighting the word you searched for on both the image and the text. So all LF products (including DCC) that do OCR generate the same set of text and location files for each image. The locations are basically the coordinates of each word on the image, so QF can use them to match a region of the image with the text in it without having to re-OCR that region.
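
As a rough picture of what "using the coordinates" means, here's a simplified stand-in for that location data. The real location file format is Laserfiche's own; in this sketch each word just carries a bounding box in page coordinates, and the words and coordinates are made up.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    left: int
    top: int
    right: int
    bottom: int

def words_in_zone(words, zone):
    """Return the existing words whose boxes fall inside the zone rectangle."""
    zl, zt, zr, zb = zone
    return [w.text for w in words
            if w.left >= zl and w.right <= zr and w.top >= zt and w.bottom <= zb]

# Made-up page text with coordinates, as produced by an earlier full-page OCR.
page_words = [
    Word("Invoice", 100, 80, 220, 110),
    Word("Number:", 230, 80, 350, 110),
    Word("004217",  360, 80, 450, 110),
    Word("ACME",    100, 700, 200, 730),
]

# A zone drawn around the top of the page: the text comes back with no re-OCR.
print(" ".join(words_in_zone(page_words, (90, 70, 460, 120))))  # Invoice Number: 004217
```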

 

 

replied on February 20, 2014

A few points of clarification, in order to avoid confusion.

 

  1. If Quick Fields is "scanning" documents out of a Laserfiche repository, then it can pull in an existing text record so that the document does not need to be re-OCRed.
  2. From this text record, you can use Pattern Matching to look for specific pieces of text, based on the wording or format you expect (see the sketch below).
  3. If you actually need to use Zone OCR (because Pattern Matching can't get at what you want for some reason), then you'll run that process against the image document. This has nothing to do with an existing text record.
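
As an illustration of point 2, pattern matching against an existing text record is essentially a text search over the page text that came in with the document. The invoice-number pattern and sample text below are invented for the example, not anything built into Quick Fields.

```python
import re

# Text record that came in with the document (from Import Agent, DCC, etc.).
page_text = """ACME Corp
Invoice Number: 004217
Invoice Date: 02/20/2014
Amount Due: $1,250.00"""

match = re.search(r"Invoice Number:\s*(\d+)", page_text)
if match:
    print("Invoice Number field value:", match.group(1))   # 004217
```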
replied on February 20, 2014

I'm not in front of my work computer right now, but I know I've used "use existing text" in a Zone OCR so that the "zone" I select is not being re-OCR'd.

 

From the help files (http://www.laserfiche.com/support/webhelp/quickfields/9.0/en-US/quickfields.htm#Processes/ZoneOCRAdvancedOptions.htm?Highlight=use existing):

 

  • Use existing text: This option specifies whether the zone should use existing text or OCR text from the image. If True is selected, it will check and see if there is any text associated with the image (from OCR, Laserfiche Capture Engine retrieving text, or PDF generating text). If there is no text, it will be OCRed and returned. If False is selected the text within the zone bounds will be OCRed and used.
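
Read as pseudologic, the documented behavior boils down to something like the sketch below. The arguments are placeholders for illustration, not Quick Fields objects.

```python
def zone_text(existing_text, ocr_zone, use_existing_text=True):
    """Sketch of the 'Use existing text' option described in the help text above.

    existing_text -- text already associated with the image, or None
    ocr_zone      -- callable that OCRs the zone bounds when needed
    """
    if use_existing_text and existing_text:
        return existing_text      # True and text exists: reuse it
    return ocr_zone()             # no text yet, or option set to False: OCR the zone

print(zone_text("Invoice Number: 004217", lambda: "(fresh OCR result)"))         # reuses text
print(zone_text(None, lambda: "(fresh OCR result)"))                             # OCRs the zone
print(zone_text("Invoice Number: 004217", lambda: "(fresh OCR result)", False))  # always OCRs
```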

 

I find this option immensely useful because I can look for a specific set of words in a specific area of the page. Whole-page text matching is much less useful if the words you're looking for may show up in other locations as well.

 

Edit: Here's a screenshot of the setting in Laserfiche

 

replied on February 21, 2014

You're right, Chris. Thanks for the clarification!
