Question

QF Performance: Multiple Zone OCR Processes vs. One Zone OCR Process with Many OCR Regions

asked on October 6, 2015

Hello,

I have a Quick Fields (QF) session in which I need to capture six index fields using Zone OCR.

Has anyone compared the performance of:

1. Many Zone OCR processes, each with one OCR region

2. One Zone OCR process with many OCR regions

I have 3.5 million documents to process, so even a slight per-document difference will have a considerable impact on the total.
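To put that in numbers, here is a back-of-the-envelope sketch. The per-document savings values are hypothetical placeholders, not measurements; substitute timings from your own pages.

```python
def total_hours_saved(doc_count: int, seconds_saved_per_doc: float) -> float:
    """Scale a per-document time difference up to a whole batch, in hours."""
    return doc_count * seconds_saved_per_doc / 3600


# Hypothetical per-document savings in seconds; replace with measured values.
for saved in (0.05, 0.1, 0.5):
    print(f"{saved:>4} s/doc -> {total_hours_saved(3_500_000, saved):,.1f} hours")
```

At 0.1 s per document, 3.5 million documents already add up to roughly 97 hours of processing time.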

Thank you and best regards,

Ignacio PdeA

BMB sal

Answer

SELECTED ANSWER
replied on October 6, 2015

One OCR process with multiple zones is more efficient than multiple OCR processes with single zones because the OCR engine only gets initialized once. There is a limit of 20 zones per process.
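The reasoning can be written as a simple cost model: each Zone OCR process pays an engine start-up cost every time it runs, plus some recognition work per zone. The sketch below also groups zones into processes of at most 20, matching the limit mentioned above. The timing constants are made up, and the model is a simplification for illustration, not a description of how Quick Fields schedules OCR internally.

```python
MAX_ZONES_PER_PROCESS = 20  # per-process limit stated in the answer


def group_zones(zones: list[str], max_per_process: int = MAX_ZONES_PER_PROCESS) -> list[list[str]]:
    """Split zones into the fewest Zone OCR processes that respect the per-process limit."""
    return [zones[i:i + max_per_process] for i in range(0, len(zones), max_per_process)]


def page_cost(process_count: int, zone_count: int,
              init_seconds: float, per_zone_seconds: float) -> float:
    """Hypothetical cost per page: one engine start-up per process, plus work per zone."""
    return process_count * init_seconds + zone_count * per_zone_seconds


zones = [f"Index{i}" for i in range(1, 7)]  # six index fields, as in the question
INIT, PER_ZONE = 0.8, 0.2                    # hypothetical timings in seconds

one_process = page_cost(len(group_zones(zones)), len(zones), INIT, PER_ZONE)
six_processes = page_cost(len(zones), len(zones), INIT, PER_ZONE)
print(f"1 process: {one_process:.1f} s/page, 6 processes: {six_processes:.1f} s/page")
```

With these made-up numbers the single process wins because the start-up cost is paid once instead of six times; the real gap depends on how large that start-up cost is relative to the recognition work.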

As Chris said, if your data can be retrieved from full-page OCR, or from Zone OCR on larger regions combined with Pattern Matching, that would be even more efficient.

Replies

replied on October 6, 2015

The most important thing I've found with Zone OCR is to make sure that none of the Zone OCR regions in the same process overlap. I think it's faster to have multiple regions in one process, but if they overlap even slightly, it can cause issues.
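If you want to double-check a zone layout before starting a large batch, a rectangle intersection test is enough. This sketch assumes you have noted each region's coordinates yourself (left, top, width, height in pixels); the region names and numbers are hypothetical, and this is not a format Quick Fields exports.

```python
from itertools import combinations
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    left: int
    top: int
    width: int
    height: int


def overlaps(a: Region, b: Region) -> bool:
    """True if the two rectangles share any area (edges that only touch do not count)."""
    return (a.left < b.left + b.width and b.left < a.left + a.width and
            a.top < b.top + b.height and b.top < a.top + a.height)


def overlapping_pairs(regions: list[Region]) -> list[tuple[str, str]]:
    """Every pair of regions in one process that overlap, even by a single pixel."""
    return [(a.name, b.name) for a, b in combinations(regions, 2) if overlaps(a, b)]


regions = [
    Region("InvoiceNo", 100, 50, 200, 40),
    Region("InvoiceDate", 310, 50, 150, 40),
    Region("Total", 290, 80, 120, 30),  # hypothetical box that clips both regions above it
]
print(overlapping_pairs(regions))
```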

However, the big question is: "How can I minimize how many times my Zone OCR processes have to run?"

I have found that with extremely large projects like this, it pays to have a process that runs before Quick Fields and OCRs the documents separately, so that you can do the bulk of the identification work without using Zone OCR.

In this case, I usually use Import Agent or Quick Fields Agent to pull in the documents without OCRing them, then set up a DCC workflow so that multiple machines do whole-page OCR. Then, when running Quick Fields, I can use Pattern Matching to create tokens that help identify the documents. I then use the Quick Fields conditionals introduced in version 9 to determine whether I need to run Zone OCR to extract data or to do additional identification.
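Conceptually, the Pattern Matching step runs regular expressions over the text that whole-page OCR already produced and turns the captures into tokens. A rough Python equivalent with made-up document types and patterns might look like this; in Quick Fields itself, Pattern Matching is configured in the session, not written as code.

```python
import re
from typing import Optional

# Hypothetical identification rules: document type -> pattern whose first group becomes a token.
RULES = {
    "Invoice": re.compile(r"Invoice\s+No\.?\s*:?\s*(\w+)", re.IGNORECASE),
    "PurchaseOrder": re.compile(r"P\.?O\.?\s+Number\s*:?\s*(\w+)", re.IGNORECASE),
}


def identify(page_text: str) -> tuple[Optional[str], dict[str, str]]:
    """Return the first matching document type and the token captured from existing text."""
    for doc_type, pattern in RULES.items():
        match = pattern.search(page_text)
        if match:
            return doc_type, {"Number": match.group(1)}
    return None, {}  # unidentified: a candidate for a follow-up Zone OCR step


print(identify("ACME Corp\nInvoice No: 12345\nTotal 10.00"))
```

Only pages that come back unidentified, or that need values the full-page text cannot supply, would go on to a Zone OCR step.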

For example, I had one document that was 99% the same as another document. Since the document had already been OCR'd, I could search for text to do a rough identification that this was the page I needed, and then do a zone identification to see whether the text I was looking for was in position 1 or position 2. I could then run the additional steps for each identification if needed. Because I didn't have to run Zone OCR on every page, I sped up the whole process by 95%. Each document packet was 15-20 pages, and the vast majority of those pages never had any Zone OCR applied to them, so it went very fast.
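The two-stage identification in this example can be sketched as a cheap text search over every page's existing OCR text, followed by a positional check only on the pages that pass it. The page structure (text plus word positions), the phrase, and the position split are all hypothetical; in Quick Fields the second stage would be the zone identification step.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str
    left: int
    top: int
    width: int
    height: int


class Page(NamedTuple):
    text: str          # full-page OCR text produced before Quick Fields runs
    words: list[Word]  # word positions from the same OCR pass (assumed available)


PHRASE = "Remittance Advice"  # hypothetical text shared by both near-identical layouts
POSITION_SPLIT_Y = 1500       # hypothetical: above this line is position 1, below is position 2


def classify(page: Page) -> Optional[str]:
    # Stage 1: search text that already exists; most pages stop here.
    if PHRASE.lower() not in page.text.lower():
        return None
    # Stage 2: positional check, only for the few candidate pages.
    first_word = PHRASE.split()[0].lower()
    for word in page.words:
        if word.text.lower() == first_word:
            return "Position1" if word.top < POSITION_SPLIT_Y else "Position2"
    return None
```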

In addition, you might find that the whole-page OCR lets you switch your Zone OCR sessions to use existing text. If you do that, that activity speeds up by at least 75%.
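Reading a zone from existing text means picking out the already-recognized words whose positions fall inside the region, rather than sending that part of the image through the OCR engine again. A minimal sketch of the idea, assuming word coordinates are available from the earlier whole-page OCR pass; the data layout is hypothetical and this is not how Quick Fields implements the option.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    left: int
    top: int
    width: int
    height: int


def text_in_zone(words: list[Word], left: int, top: int, width: int, height: int) -> str:
    """Join the words whose centre point lies inside the zone, sorted top-to-bottom, then left-to-right."""
    inside = [
        w for w in words
        if left <= w.left + w.width / 2 < left + width
        and top <= w.top + w.height / 2 < top + height
    ]
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


words = [
    Word("Total:", 100, 500, 60, 20),
    Word("1,250.00", 170, 500, 80, 20),
    Word("Page", 100, 900, 50, 20),
]
print(text_in_zone(words, 90, 490, 200, 40))
```

Using each word's centre point keeps a word that straddles the zone edge from being counted twice or cut in half; no OCR work is repeated, which is where the speed-up comes from.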

The vast majority of the time when I'm bringing in documents, the customer wants the entire document OCR'd for text searching, so whole-page OCR needs to be done at some point anyway. Doing it before Quick Fields runs helps speed up the whole process with this technique.

replied on October 7, 2015

Thanks a lot to both of you!

Have a nice day,

Ignacio PdeA

BMB sal
