Question

OCR with Import Agent and Quickfields

asked on October 17, 2022

When testing our Import Agent task, we noticed a large number of hung OCR jobs eating up resources on the server, and users’ Quick Fields sessions, as well as other use of Laserfiche, were slow to respond.  We had to reboot the server at that point to resolve the issue, and everything seemed to clean up.

Due to licensing, only two instances of the OCR process can be running at any time.  This is true even if you have both Quick Fields and Import Agent installed on the same server (as we do).  Currently, we allow two Quick Fields jobs to run simultaneously with the Quick Fields scheduler (so that one big Quick Fields session cannot prevent any other sessions from running).  Almost all of those jobs depend on OCR to work.  When Import Agent runs at the same time and also uses OCR, the two fight each other for the OCR process and end up in a deadlocked state where nobody can use OCR and all the jobs appear to hang unfinished.

Last week, this happened because Register of Deeds started bulk scanning Military Discharge papers from microfiche film.  Those images are imported through Import Agent and are of poor, grainy quality, so they are very slow to run through the OCR tool.  While they were being OCRed through Import Agent, Quick Fields hung to the point where people’s documents didn’t appear in Laserfiche until hours later (which was not acceptable, especially for AP documents).  I’ve resolved this issue with ROD by turning off OCR for the Military Discharge papers.  This was fine because that isn’t normally how they search those documents, and any OCR’d text would be pretty unreliable anyway.

Unfortunately, we need OCR enabled for the medical records being imported into Laserfiche.  We need to come up with a workaround for this issue if we plan on doing a bulk load of the Health images (or run it only over weekends where it will have the least impact).
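The weekend-only idea could be gated by a small check that a wrapper script runs before kicking off the bulk load. This is only a rough sketch: `in_bulk_window` is a made-up helper name, not something Import Agent or Quick Fields provides, and it assumes the window is simply all of Saturday and Sunday in server local time.

```python
from datetime import datetime


def in_bulk_window(now):
    """Return True when `now` falls on a Saturday or Sunday.

    Hypothetical helper: the weekend-only window is an assumption taken from
    the workaround described above, not a Laserfiche setting.
    """
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    return now.weekday() >= 5


if __name__ == "__main__":
    if in_bulk_window(datetime.now()):
        print("Inside the weekend window: OK to start the bulk OCR import.")
    else:
        print("Outside the weekend window: hold the bulk OCR import.")
```

A scheduled task could run this first and start the bulk import only when the check passes, leaving weekday OCR capacity to the regular Quick Fields jobs.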

Other people must be running those two pieces of software on the same server?


Replies

replied on October 17, 2022

I am not aware of any licensing restrictions on the number of concurrent OCR processes. Both Quick Fields Agent and Import Agent allow multiple sessions or import profiles to run in parallel, scaling up with the number of processors on the machine. Quick Fields and Import Agent each launch their own OCR processes; these are not shared between the 2 applications, or even between 2 Quick Fields sessions or 2 Import Agent profiles.

However, OCR is CPU-intensive, and running concurrent OCR processes may impact the performance of the machine as a whole. What you may be seeing is OCR taking longer than usual to process low-quality pages. By default, Quick Fields will time out OCR after 10 minutes if it has not completed a given page. The latest Import Agent patch adds a similar timeout.
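To illustrate the general idea of a per-page timeout (this is not how Quick Fields or Import Agent implement it internally), here is a minimal Python sketch that runs each page as an external process and gives up after a limit. The function name `run_with_timeout` and the stand-in commands are assumptions for illustration; a real job would call whatever OCR command line you actually use.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run one external job; return True only if it exits with code 0 in time."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising,
        # so a timed-out job does not linger as a hung process.
        return False
    return completed.returncode == 0


# Stand-in "pages": one finishes immediately, one sleeps far past the limit.
fast_page = [sys.executable, "-c", "pass"]
slow_page = [sys.executable, "-c", "import time; time.sleep(30)"]

if __name__ == "__main__":
    print(run_with_timeout(fast_page, 20))  # completes well inside the limit
    print(run_with_timeout(slow_page, 1))   # abandoned after about one second
```

With a 10-minute limit you would pass `timeout_seconds=600`; the point is that a slow page is abandoned rather than left holding CPU and memory indefinitely.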

If both Quick Fields and Import Agent are up to date on patches, I would recommend that you open a support case so we can take a closer look.

replied on October 19, 2022

As Miruna said, this isn't about licensing, it is about resources/bandwidth. We OCR over 20,000 documents each day and it takes a LOT of CPU bandwidth to do that effectively, but it can be done.

If you are doing this much OCR then I would highly recommend looking into the Distributed Computing Cluster (DCC) as a way of offloading the OCR tasks (at this point 95% of our OCR and PDF page generation is done through the DCC).

You can add workers to the DCC to divide the workload, which makes it much easier, and faster, to do a large amount of concurrent OCR tasks if you dedicate enough resources.

We have our DCC set up with multiple workers, each of which has 16GB of RAM and 12 virtual processors (we configured DCC to only use 11 processors at most so the server would never be maxed out and inaccessible).

These workers are not used for anything else because OCR eats up a lot of CPU resources and it makes sense to isolate these activities (note that the RAM isn't that high, and I don't know that I've ever really seen them use even 4GB).
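The "leave one processor free" rule is configured in DCC itself, but the same sizing logic can be sketched in a few lines of Python if you are scripting your own batch workers. `ocr_worker_count` is a hypothetical helper, not a DCC setting or API.

```python
import os


def ocr_worker_count(reserved_cores=1, total_cores=None):
    """Return how many workers to start, keeping some cores free for the OS.

    Hypothetical helper mirroring the "11 of 12 processors" setup above.
    """
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    total = total_cores if total_cores is not None else (os.cpu_count() or 1)
    # Always start at least one worker, even on a single-core machine.
    return max(1, total - reserved_cores)


if __name__ == "__main__":
    print(ocr_worker_count(total_cores=12))  # 11, matching the 12-vCPU workers
    print(ocr_worker_count())                # based on this machine's core count
```

The resulting number could be passed as `max_workers` to a `concurrent.futures.ProcessPoolExecutor` so the machine stays reachable while every other core is busy.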

replied on October 19, 2022

 

The reference to a licensing restriction came from this post:

https://answers.laserfiche.com/questions/65525/CPU-maxing-to-100-at-Quick-fields-session?sort=newest

This is an old incident, but the behavior is identical.  Perhaps newer versions of the software no longer have the licensing restriction with Omni (or the poster in that thread was mistaken).  Once the OCR threads are in a 'deadlock' type state, they will not close on their own and slowly eat more and more memory.  I have to stop the Import Agent service to clear them out.

From the initial response, it sounds like the 10 min timeout was only implemented in the most recent version of Import Agent, so that could also be what we are seeing.

 

Thanks for your help!

It seems we only have this problem when running those two pieces of software simultaneously.
