Question

OCR Process Failing over Time

asked on August 5, 2019

Hello, I'm having an issue with OCR-ing large batches of existing documents in the repository. When I start the process, it runs smoothly, but after a certain amount of time (or quantity processed) the OCR process starts failing for every single page, reporting Error 404 and Error 6408. When I try to restart the OCR process, the same errors occur.

If I restart the client and then restart the OCR process, sometimes it starts working immediately and other times it takes a while. Either way, it only works until it reaches the threshold (either time or quantity processed) and then starts failing again until I reboot the client.

I'm thinking this has something to do with resources getting overloaded but can't confirm this as of yet.  I know the server performs indexing on text files in order to make them searchable in the client.  Could the indexing process be getting overwhelmed and preventing new text files from being OCR'd?

Replies

replied on August 22, 2019

Grant,

I'm having the same issue with multipage TIFF files as I'm importing them into the repository. Everything worked fine for a while and then I started receiving 6408 and 404 errors on everything I upload. Sometimes the errors appear on different pages when I try to reload the same document.

I added [Settings][OCR]PagesOption = 0 to my Everyone attributes and (whether or not that was the real fix) everything worked well for a few days, until the issue popped up again two days ago.

Should I try switching my import to a new volume? Does anyone have any suggestions?

replied on December 19, 2019

Hi,

We are getting the same errors.  Did anyone come up with an answer?

replied on December 19, 2019

Hi Sue,

It all depends on your scenario.  Are you getting these errors on the LF server machine or individual workstations?

Our scenario has us migrating nearly 300,000 documents (over 6 million pages) from an old DMS directly to Laserfiche. At first we attempted to perform OCR on import, but it caused our imports to fail at random points, which made it very difficult to proceed. So we disabled it and decided to run OCR in between data imports, and that's when we encountered the errors specified in the topic.

We were trying to OCR thousands of documents from the local LF machine and so resources were being split between OCR and the other server duties such as indexing.  From what we can tell, both OCR and Indexing are very resource heavy and can trip each other up.  
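As a rough illustration of easing that contention, here is a minimal Python sketch that works through a backlog in small batches, pauses between batches so other server work can catch up, and stops once failures start coming back to back (since, as described above, once the errors begin every page fails). Nothing in it is a Laserfiche API: `ocr_fn`, the batch size, the pause length, and the failure limit are all hypothetical placeholders you would replace with whatever actually triggers OCR in your environment.

```python
import time
from typing import Callable, Iterable, List, Tuple


def ocr_in_batches(
    doc_ids: Iterable[int],
    ocr_fn: Callable[[int], None],   # hypothetical: raises an exception on failure
    batch_size: int = 50,            # assumed value, tune for your server
    pause_seconds: float = 30.0,     # assumed idle time between batches
    max_consecutive_failures: int = 5,
) -> Tuple[List[int], List[int], bool]:
    """Run ocr_fn over doc_ids; return (succeeded, failed, halted_early)."""
    succeeded: List[int] = []
    failed: List[int] = []
    consecutive_failures = 0

    for position, doc_id in enumerate(doc_ids, start=1):
        try:
            ocr_fn(doc_id)
        except Exception:
            failed.append(doc_id)
            consecutive_failures += 1
            # Back-to-back failures suggest the engine is stuck; stop here
            # rather than failing every remaining document.
            if consecutive_failures >= max_consecutive_failures:
                return succeeded, failed, True
        else:
            succeeded.append(doc_id)
            consecutive_failures = 0

        # Pause after each full batch so OCR is not competing nonstop
        # with other work on the same machine.
        if position % batch_size == 0 and pause_seconds > 0:
            time.sleep(pause_seconds)

    return succeeded, failed, False
```

The failed list tells you which documents to retry after a restart, and the third return value tells you whether the run stopped early.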

We have designed two workaround solutions but have yet to test or implement either of them due to time constraints with completing this migration. We plan to complete the migration in February 2020 and will then evaluate these solutions as a team.

One solution involves the Workflow activity "Schedule OCR" and a string of Distributed Processing workstations.

The other solution involves both Workflow and QuickFields with QF Agent/Scheduler.

Sorry I can't be more help at this time.

replied on December 20, 2019

Hi Grant,  We were running the OCR on the server.  If we come up with a solution, I'll let you know.  Thanks, Sue  

replied on December 19, 2019

I looked back at my emails with my solution provider to see if I could remember how they resolved the issue. I believe that one (if not the primary) cause was that I had an older or corrupt version of the OCR engine installed and that we fixed it by uninstalling the old engine and installing the latest version. I'm not sure what the name of that file is or where it's installed though.

replied on December 20, 2019

Thank you John.
