QF Performance between Multiple Zone OCRs v/s Many OCR Regions

replied on October 6, 2015

The biggest thing I've seen with Zone OCR is to make sure none of the Zone OCR regions in the same process overlap. I think it's faster to have multiple regions, but if they overlap even slightly it can cause issues.
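This is not a Quick Fields API; it's only a sketch of an overlap check you could run yourself, assuming you've copied each region's left/top/width/height (in page pixels) out of the template. The `Zone` class, the region names, and the coordinates are hypothetical. The check itself is the standard rectangle intersection test.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Zone:
    """A rectangular Zone OCR region in page pixel coordinates (hypothetical model)."""
    name: str
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def overlaps(a: Zone, b: Zone) -> bool:
    """True if the two rectangles share any area (edges that only touch do not count)."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def overlapping_pairs(zones: list[Zone]) -> list[tuple[str, str]]:
    """Names of every pair of zones that overlap, in input order."""
    return [(a.name, b.name) for a, b in combinations(zones, 2) if overlaps(a, b)]


if __name__ == "__main__":
    zones = [
        Zone("InvoiceNumber", 100, 50, 200, 40),
        Zone("InvoiceDate", 290, 50, 150, 40),  # starts 10 px inside InvoiceNumber
        Zone("Total", 100, 900, 200, 40),
    ]
    print(overlapping_pairs(zones))  # prints [('InvoiceNumber', 'InvoiceDate')]
```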

However, the big question is: "How can I minimize how many times my Zone OCR processes have to run?"

I have found that with extremely large projects like this it pays to have a pre-Quick Fields process that OCRs the documents separately, so that you can do the bulk of the identification work without using Zone OCR.

In this case I usually use Import Agent or Quick Fields Agent to pull in the documents without OCRing them, then set up a DCC workflow to have multiple machines do whole-page OCR. Then, when running Quick Fields, I can use pattern matching to create tokens that help identify the documents. I then use the Quick Fields conditionals introduced in version 9 to determine whether I need to run Zone OCR to extract data or to do additional identification.
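To illustrate the pattern-matching-then-conditional idea outside of Quick Fields, here is a minimal Python sketch that runs regular expressions over text already produced by whole-page OCR, builds tokens from the matches, and only flags a page for Zone OCR when a required token is missing. The patterns, the token names, and the `REQUIRED` list are hypothetical examples, not Quick Fields settings.

```python
import re

# Hypothetical patterns; each has exactly one capturing group holding the token value.
PATTERNS = {
    "DocType": re.compile(r"\b(INVOICE|PURCHASE ORDER|CREDIT MEMO)\b", re.IGNORECASE),
    "AccountNumber": re.compile(r"Account\s*(?:No\.?|Number)[:\s]*(\d{6,10})", re.IGNORECASE),
}

# Hypothetical: tokens we must have before a page counts as identified.
REQUIRED = ("DocType", "AccountNumber")


def extract_tokens(page_text: str) -> dict[str, str]:
    """Build tokens from existing OCR text using pattern matching only (no Zone OCR)."""
    tokens = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            tokens[name] = match.group(1)
    return tokens


def missing_tokens(tokens: dict[str, str]) -> list[str]:
    """The conditional: which required tokens pattern matching failed to find."""
    return [name for name in REQUIRED if name not in tokens]


if __name__ == "__main__":
    pages = [
        "INVOICE\nAccount No: 12345678\nTotal due: 410.00",
        "Purchase Order\nShip to: Main Warehouse",
    ]
    for text in pages:
        tokens = extract_tokens(text)
        gaps = missing_tokens(tokens)
        action = f"run Zone OCR for {gaps}" if gaps else "skip Zone OCR"
        print(tokens, "->", action)
```

Only the second page would go on to Zone OCR here, which is the whole point: the first page is fully identified from text that already exists.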

For example, I had one document that was 99% the same as another document. Since the document had already been OCR'd, I could search for text to do a rough identification that this was the page I needed, and then do a zone identification to see whether the text I was looking for was in position 1 or position 2. I could then run the additional steps for each identification if needed. Because I didn't have to run Zone OCR on every page, I sped up the whole process by 95%. Each document packet was 15-20 pages, and the vast majority of those pages never had any Zone OCR applied to them, so it went very fast.
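A rough way to see why skipping Zone OCR on most pages matters so much is a simple per-packet cost model: every page gets a cheap text check against the existing OCR text, and only a few pages also get a Zone OCR pass. The timings below are made-up placeholders, not measurements from this project, and whole-page OCR time is left out because it ran earlier on separate DCC machines.

```python
def packet_seconds(pages: int, zone_pages: int, text_check_s: float, zone_ocr_s: float) -> float:
    """Quick Fields time for one packet: a text check on every page plus Zone OCR on zone_pages pages."""
    return pages * text_check_s + zone_pages * zone_ocr_s


def baseline_seconds(pages: int, zone_ocr_s: float) -> float:
    """Quick Fields time if every page goes through Zone OCR."""
    return pages * zone_ocr_s


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    pages, zone_pages = 18, 2
    text_check_s, zone_ocr_s = 0.05, 2.0
    new = packet_seconds(pages, zone_pages, text_check_s, zone_ocr_s)
    old = baseline_seconds(pages, zone_ocr_s)
    print(f"{old:.1f}s -> {new:.1f}s, saving {1 - new / old:.0%}")  # prints "36.0s -> 4.9s, saving 86%"
```

Under this model the saving is 1 - text_check_s / zone_ocr_s - zone_pages / pages, so as the text check gets cheaper it approaches 1 - zone_pages / pages: the fewer pages that reach Zone OCR, the bigger the win.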

In addition, you might find that the whole-page OCR lets you change your Zone OCR sessions to use existing text. If you do, that activity speeds up by at least 75%.
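Conceptually, using existing text means reading the words the earlier OCR already produced instead of re-recognizing the image. The sketch below shows that idea, assuming the OCR output kept a bounding box for each word; the `Word` class and the `text_in_zone` function are hypothetical and are not Quick Fields' actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word from a previous whole-page OCR pass, with its box in page pixels (hypothetical)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def text_in_zone(words: list[Word], left: int, top: int, width: int, height: int) -> str:
    """Join the words whose box center falls inside the zone; no OCR is run."""
    inside = []
    for w in words:
        cx = w.left + w.width / 2
        cy = w.top + w.height / 2
        if left <= cx < left + width and top <= cy < top + height:
            inside.append(w)
    # Simple reading order: sort by top, then left. Lines with uneven tops would need line grouping.
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    words = [
        Word("Invoice", 100, 50, 80, 20),
        Word("No:", 185, 50, 40, 20),
        Word("A-1042", 230, 52, 70, 20),
        Word("Total", 100, 900, 60, 20),
    ]
    print(text_in_zone(words, 90, 40, 250, 40))  # prints "Invoice No: A-1042"
```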

The vast majority of the time when I'm bringing in documents, the customer wants the entire document OCR'd for text searching, so a whole-page OCR has to be done at some point anyway. Doing it before Quick Fields uses the documents helps speed up the whole process with this technique.
