Question

OCR on import

asked on November 27, 2019

I have Import Agent set up to OCR on import, because I want to run a workflow that uses the OCR text and pattern matching to fill in some template fields. The problem is that the OCR on import seems to run about 2 hours after the document is imported. However, I can run the OCR manually and it runs immediately.

I have removed OCR on import from folders where I don't need it (we run an OCR process every night for any documents that have no OCR'd pages), and I have now decreased the accuracy in the OCR settings on Import Agent. I need to wait until all the currently running or scheduled processes complete to see whether it improves the situation, but as I automate more and more templates to auto-fill, I will need to turn OCR back on for those Import Agent profiles. Is there something else I should be looking at that I'm missing? The OCR is running on our ADMIN server, as it seems to be the least-used server. I really want this to work for full automation, but right now, even with a 10-minute pause on the workflow (in the hopes it will OCR in time), the workflow still runs before the OCR process manages to run on the document.

Does manually running the OCR process give it priority?  Is that why it runs immediately?

Answer

SELECTED ANSWER
replied on November 28, 2019

Whether Document Text exists is not available as a rule option. It would need to be a condition inside the workflow, such as the example below.

Replies

replied on November 27, 2019

Hi Tracy,

When you run OCR manually in the Desktop Client, it is executed using the OCR software installed on the local PC, which is why it runs immediately instead of getting "scheduled".

OCR is very resource-intensive, so on a server, where simultaneous processes are more probable and would have a significant impact if not regulated, the process is handled a little differently.

replied on November 28, 2019

So is anyone else trying to do something similar to me?  It seems like pattern matching and automation are pretty hard to make use of in this situation.

In this case I can actually run the automation piece at night after the nightly OCR process runs, simply because the documents are not processed by the department until the next day. But there are a lot of documents that are processed on the same day they are received, and I was hoping to automate a lot more of them to reduce manual data entry (and the many errors that come with it). Are people running OCR on a standalone server? Or do I need to throw more resources at it to get it to do what I want it to do?

replied on November 28, 2019

Hi Tracy

One thing you might want to add to your workflow is a test for whether the Document Text exists before attempting the Pattern Matching. This way, you could put a Delay in a loop and only continue once the OCR has completed.
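
The check-and-delay loop described above can be sketched in plain Python. This is only an illustration of the logic, not Laserfiche's API: `wait_for_text`, `has_text`, `delay_seconds`, and `max_attempts` are hypothetical names, and `has_text` stands in for whatever condition in the workflow reports that the document has text. The attempt cap is an added assumption so the loop cannot wait forever if OCR never runs.

```python
import time
from typing import Callable


def wait_for_text(
    has_text: Callable[[], bool],
    delay_seconds: float = 60.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll has_text() until it returns True, pausing between checks.

    Returns True as soon as text is present, or False if every one of
    the max_attempts checks found no text.
    """
    for attempt in range(max_attempts):
        if has_text():
            return True
        # Pause before the next check, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Simulated document: no text on the first two checks, text on the third.
    checks = iter([False, False, True])
    ready = wait_for_text(lambda: next(checks), delay_seconds=0.1, max_attempts=5)
    print("run pattern matching" if ready else "give up and flag for review")
```

In a workflow the same shape would be a repeat-until loop containing a delay, with pattern matching placed after the loop and a fallback branch for documents that never receive text.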

replied on November 28, 2019

So would you put the starting condition on "created" and "change" (would the OCR completing indicate a change?) and then add another starting condition that the document text is not blank? Hmmm, that could work.

replied on November 28, 2019

You are the best, Steve!! I'm just testing it out now, but it all makes perfect sense. Now all that is outstanding is figuring out how to avoid bogging down the server running the OCR, so it isn't running at 100% if we do want to add more OCR on entry!
