Question

DCC Scheduler at Capacity

Distributed Computing Cluster Workflow

Updated September 27, 2023

asked on September 11, 2014

I am triggering DCC with Workflow based on document creation, and it is working fine with normal day-to-day scanning. The issue I have is that we have a third-party company doing back-scanning (hundreds of thousands of documents) that we are importing in bulk. When the documents come in, they trigger the workflow that files them, which includes a step to OCR them if they have not already been OCR'd. So many documents are attempting to schedule OCR with the DCC Scheduler that the workflow activity is getting the message "The Scheduler is currently at capacity and cannot accept any new jobs at this time". While I would expect a message like this on a local station, the point of the DCC is server-level processing. I understand if it takes a long time to actually OCR these, depending on server resources, but it should be able to queue the documents in the same manner Workflow does. The limit appears to be 5000 in-process scheduled OCR jobs. Even if I ran a scheduled workflow after hours, it would still queue up all of these, since they all need to be OCR'd. How should I get around this limit, or is this fixed in the 9.2 version?

Replies

replied on September 11, 2014

At this time DCC queues jobs up in memory, so a limit had to be set. Once DCC has 5000 active jobs, it will stop accepting new ones until some of the current ones are completed and the number goes below the limit. You can have a scheduled workflow that sends batches of 200-500 jobs periodically so you end up with a continuous load below the 5000 job limit rather than OCRing on import and hitting the cap.
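To illustrate the batching idea, here is a minimal Python sketch of how a backlog could be cut into batches of at most 500 documents, one batch per scheduled run. It is not Workflow or DCC code: the `batches` helper, the `backlog` list, and the constants are hypothetical names used only for illustration, and the actual submission would be done by the scheduled workflow itself.

```python
from typing import Iterator, List, Sequence

DCC_JOB_LIMIT = 5000   # default cap described in this reply
BATCH_SIZE = 500       # upper end of the suggested 200-500 range


def batches(entry_ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` entry IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(entry_ids), size):
        yield list(entry_ids[start:start + size])


# Hypothetical backlog of entry IDs still waiting for OCR.
backlog = list(range(1, 1201))
for run, batch in enumerate(batches(backlog), start=1):
    # Each scheduled run would submit one batch; here we only report its size.
    print(f"run {run}: {len(batch)} documents")
```

As long as each run's batch is small relative to `DCC_JOB_LIMIT` and runs are spaced far enough apart for earlier jobs to finish, the number of active jobs should stay under the cap.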

replied on September 27, 2023

Good morning,

Do you know if there is a way to get the current queue count? Instead of the workflow just throwing an error when it hits the queue limit, we could use that information to send the appropriate number of jobs based on how many slots are free. For example, if there are 2500 active jobs, then I know I can send another 2500 jobs through before I hit the 5000 limit.
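The arithmetic I have in mind looks roughly like the Python sketch below. It assumes the active job count can be obtained somehow, which is exactly the open question here, so `active_jobs` is just a number passed in and `jobs_to_send` is a hypothetical helper, not a DCC or Workflow function.

```python
def jobs_to_send(active_jobs: int, pending: int, limit: int = 5000, headroom: int = 0) -> int:
    """Return how many pending jobs fit under the limit right now.

    `headroom` optionally reserves some slots for other submitters.
    """
    free_slots = max(limit - headroom - active_jobs, 0)
    return min(free_slots, pending)


# With 2500 active jobs and a large backlog, 2500 more jobs fit under the cap.
print(jobs_to_send(active_jobs=2500, pending=30000))
```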

replied on September 11, 2014

DCC has that limit in place to avoid running out of memory on the scheduler. It can queue many jobs at once like Workflow, but it still has limited memory. That said, if you have a machine with plenty of memory and want to increase the limit, you can do that. In C:\Program Files (x86)\Laserfiche\Distributed Computing Cluster\Config, there is a SchedulerConfig.xml file with a MaxNumberOfRunningJobs field. Edit that field to whatever number of jobs you would like DCC to queue at a time. You'll need to restart the LfDcc service after editing that file for changes to take effect.

A word of caution: setting that number too high can cause the scheduler to crash if it runs out of memory.
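If you would rather script the change than edit the file by hand, the following Python sketch shows one way it might be done. The layout of SchedulerConfig.xml is not reproduced in this thread, so the sample XML below is hypothetical: it assumes MaxNumberOfRunningJobs is a plain element whose text is the limit. Check the real file before adapting this, and back it up first.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents: the real SchedulerConfig.xml layout is an
# assumption here (MaxNumberOfRunningJobs as an element holding the number).
SAMPLE = """<SchedulerConfig>
  <MaxNumberOfRunningJobs>5000</MaxNumberOfRunningJobs>
</SchedulerConfig>"""


def set_max_jobs(xml_text: str, new_limit: int) -> str:
    """Return xml_text with every MaxNumberOfRunningJobs element set to new_limit."""
    if new_limit <= 0:
        raise ValueError("new_limit must be positive")
    root = ET.fromstring(xml_text)
    matches = list(root.iter("MaxNumberOfRunningJobs"))
    if not matches:
        raise LookupError("MaxNumberOfRunningJobs element not found")
    for node in matches:
        node.text = str(new_limit)
    return ET.tostring(root, encoding="unicode")


updated = set_max_jobs(SAMPLE, 10000)
print(updated)
```

The LfDcc service still needs to be restarted afterwards, as noted above.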

As an alternative to increasing the job limit, you may be able to change the way that you send jobs to DCC to avoid the issue altogether (and potentially improve performance). DCC is optimized to perform best when its jobs consist of many documents. If you can rework the way that you are submitting jobs to contain larger batches of recently imported documents (say, every document imported in the last minute), you will not have as many jobs running on DCC and won't run into the 5000 job limit.
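As a rough illustration of the batching suggestion, the Python sketch below groups imported documents by the minute in which they arrived, so that each minute becomes one multi-document job instead of one job per document. The `group_by_minute` helper and the sample entry IDs and timestamps are hypothetical; in practice the grouping would come from a search inside a scheduled workflow.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def group_by_minute(docs: Iterable[Tuple[int, datetime]]) -> Dict[datetime, List[int]]:
    """Group (entry_id, imported_at) pairs into one batch per import minute."""
    jobs: Dict[datetime, List[int]] = defaultdict(list)
    for entry_id, imported_at in docs:
        # Truncate the timestamp to the start of its minute.
        window = imported_at.replace(second=0, microsecond=0)
        jobs[window].append(entry_id)
    return dict(jobs)


# Hypothetical import log: three documents across two minutes.
imports = [
    (101, datetime(2014, 9, 11, 8, 0, 5)),
    (102, datetime(2014, 9, 11, 8, 0, 40)),
    (103, datetime(2014, 9, 11, 8, 1, 10)),
]
jobs = group_by_minute(imports)
for window, entry_ids in jobs.items():
    print(window, entry_ids)
```

Three single-document jobs become two jobs here; with bulk imports of thousands of documents per minute the reduction in job count is much larger.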

replied on September 12, 2014

Because of this limit in the DCC queueing, I now have 30,000 workflows stuck waiting for the DCC to open up so they can process. They attempt to contact the DCC, wait around an hour, and try again. I want to kill all of these workflows, but it is taking 30-45 seconds per workflow for them to terminate, so killing them all will take quite a while. I disabled the DCC, but now they keep looping looking for it instead of telling me it is full. I've disabled the OCR activity for the workflow and applied it to all active workflows, but since these had already started the OCR step, they don't seem to see that it is now disabled. Any idea how to kill 30,000 workflows fairly quickly?

replied on September 12, 2014

If you select them all in the search results in the Designer and terminate them, you don't have to wait. The dialog will send the message and then it can be closed.

How long WF waits before retrying to assign the task to DCC, and how many times it retries, is controlled by the Task Error Handlers in the WF Admin Console (under Server Configuration). Look for the "DPSTooBusyException" and switch it from "retry" to "critical". The next time an instance tries to schedule a task and hits that error, it will terminate the workflow. That might take care of your stuck instances without you having to do anything else. You can re-enable it later.

replied on September 12, 2014

I've tried the "terminate workflows" option, and that is what is taking 30-45 seconds apiece to terminate. I'm still letting that method work, but I've also changed "DPSTooBusyException" to critical per your suggestion. I'd looked at those but wasn't sure which one to try it on. Thanks a ton for the help, and I'll post back in here if it works.

replied on September 16, 2014

Slowly but surely I was able to kill all of the workflows. I am hoping the next version of Workflow will have the ability to bypass the OCR Scheduler if it fails to schedule, instead of trying over and over again. Thanks again for the help.

replied on September 30, 2014

Any word on having Workflow better handle OCR queue capacity issues? I'd love for it to skip past the activity if it gets a DCC capacity error, so I could then have a scheduled nightly workflow that picks up the slack as necessary.

replied on September 30, 2014

If you let the activity fail instead of retrying, you can wrap it in a Try Catch to ignore the failure and allow the workflow to continue.
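Conceptually, the Try Catch pattern behaves like the Python sketch below. This is only an analogy for the control flow, not Workflow code: `SchedulerAtCapacity`, `schedule_ocr`, and `file_document` are hypothetical names, and the capacity check is simulated with a number passed in. The `deferred` list stands in for whatever a later scheduled workflow would use to find documents that still need OCR.

```python
from typing import List


class SchedulerAtCapacity(Exception):
    """Hypothetical stand-in for the 'Scheduler is currently at capacity' error."""


def schedule_ocr(entry_id: int, active_jobs: int, limit: int = 5000) -> None:
    """Pretend to schedule OCR; fail when the simulated queue is full."""
    if active_jobs >= limit:
        raise SchedulerAtCapacity(f"cannot schedule entry {entry_id}")


def file_document(entry_id: int, active_jobs: int, deferred: List[int]) -> bool:
    """File one document, skipping OCR if the scheduler rejects the job."""
    try:
        schedule_ocr(entry_id, active_jobs)
        ocr_scheduled = True
    except SchedulerAtCapacity:
        # Swallow the failure and remember the entry for a later pass.
        deferred.append(entry_id)
        ocr_scheduled = False
    # The remaining filing steps would run here in either case.
    return ocr_scheduled


deferred: List[int] = []
print(file_document(1, active_jobs=4999, deferred=deferred))
print(file_document(2, active_jobs=5000, deferred=deferred))
print(deferred)
```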

replied on December 23, 2022

I have received the same warning message in LF 11. Does anyone know if the jobs that were not able to be sent to the DCC will be picked back up when there are openings, or are those OCR jobs skipped forever?

replied on January 3, 2023

By default, the error is configured as a retryable error, so Workflow will attempt to send the work again.

If you're seeing it as a warning, that should be a sign Workflow's task error handlers worked.
