I am triggering DCC from Workflow on document creation, and it works fine with normal day-to-day scanning. The issue is that we have a third-party company doing back-scanning (hundreds of thousands of documents) that we are importing in bulk. As the documents come in, they trigger the filing workflow, which includes a step to OCR them if they have not already been OCR'd. So many documents are trying to schedule OCR with the DCC Scheduler that the workflow activity is getting the message "The Scheduler is currently at capacity and cannot accept any new jobs at this time". I would expect a message like this on a local station, but the point of DCC is server-level processing. I understand if it takes a long time to actually OCR these, depending on server resources, but it should be able to queue the documents in the same manner Workflow does. The limit appears to be 5000 in-process scheduled OCR jobs. Even if I ran a scheduled workflow after hours, it would still queue up all of these, since they all need to be OCR'd. How should I get around this limit, or is this fixed in the 9.2 version?
Question
Replies
At this time DCC queues jobs up in memory, so a limit had to be set. Once DCC has 5000 active jobs, it will stop accepting new ones until some of the current ones are completed and the number goes below the limit. You can have a scheduled workflow that sends batches of 200-500 jobs periodically so you end up with a continuous load below the 5000 job limit rather than OCRing on import and hitting the cap.
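To make the batching idea concrete, here is a rough Python sketch of the arithmetic a scheduled run could use to decide how many documents to send. It is not Workflow or DCC code: the function names, the headroom value, and the idea that you can read the current active job count are all assumptions for illustration. Only the 5000 limit and the 200-500 batch size come from this thread.

```python
DCC_JOB_LIMIT = 5000  # limit mentioned in this thread


def next_batch_size(active_jobs, backlog, batch_max=500, headroom=500,
                    limit=DCC_JOB_LIMIT):
    """Return how many documents to submit on this run.

    active_jobs: jobs DCC is currently holding (assumed to be known somehow).
    backlog:     documents still waiting for OCR.
    headroom:    hypothetical safety margin kept free for day-to-day scanning.
    """
    free_slots = limit - headroom - active_jobs
    return max(0, min(batch_max, backlog, free_slots))


def plan_runs(backlog, completed_per_run, batch_max=500):
    """Simulate periodic runs and return the batch size sent on each run."""
    if completed_per_run <= 0:
        raise ValueError("completed_per_run must be positive")
    active = 0
    batches = []
    while backlog > 0:
        # Jobs that finished since the previous run free up slots.
        active = max(0, active - completed_per_run)
        size = next_batch_size(active, backlog, batch_max)
        if size == 0 and active == 0:
            break  # nothing can ever be sent with these settings
        batches.append(size)
        backlog -= size
        active += size
    return batches


if __name__ == "__main__":
    print(plan_runs(1200, completed_per_run=300))
```

With a backlog of 1200 and 300 jobs finishing between runs, the simulation sends batches of 500, 500, and 200, and the active count never gets near the cap.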
DCC has that limit in place to avoid running out of memory on the scheduler. It can queue many jobs at once like Workflow, but it still has limited memory. That said, if you have a machine with plenty of memory and want to increase the limit, you can do that. In C:\Program Files (x86)\Laserfiche\Distributed Computing Cluster\Config, there is a SchedulerConfig.xml file with a MaxNumberOfRunningJobs field. Edit that field to whatever number of jobs you would like DCC to queue at a time. You'll need to restart the LfDcc service after editing that file for changes to take effect.
A word of caution: setting that number too high can cause the scheduler to crash if it runs out of memory.
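If you would rather script the change than hand-edit the file, something like the Python sketch below might work. Please treat it as an assumption-heavy example: the thread only gives the file name and the MaxNumberOfRunningJobs field name, so the sample XML layout here is made up, and the function name is hypothetical. Check it against your real SchedulerConfig.xml and keep a backup before writing anything back, since this approach does not preserve comments or the XML declaration.

```python
import xml.etree.ElementTree as ET

FIELD = "MaxNumberOfRunningJobs"  # field name given in this thread

# Hypothetical layout for illustration only; the real file may differ.
SAMPLE = (
    "<SchedulerConfig>"
    "<MaxNumberOfRunningJobs>5000</MaxNumberOfRunningJobs>"
    "</SchedulerConfig>"
)


def set_job_limit(xml_text, new_limit):
    """Return xml_text with the job limit element set to new_limit.

    Assumes the limit is stored as the text of exactly one element
    named MaxNumberOfRunningJobs (with or without an XML namespace).
    """
    if new_limit < 1:
        raise ValueError("new_limit must be at least 1")
    root = ET.fromstring(xml_text)
    # Strip any "{namespace}" prefix before comparing tag names.
    matches = [el for el in root.iter() if el.tag.split("}")[-1] == FIELD]
    if len(matches) != 1:
        raise LookupError(f"expected one {FIELD} element, found {len(matches)}")
    matches[0].text = str(new_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_job_limit(SAMPLE, 10000))
```

The LfDcc service still needs a restart afterwards, as described above.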
As an alternative to increasing the job limit, you may be able to change the way that you send jobs to DCC to avoid the issue altogether (and potentially improve performance). DCC is optimized to perform best when its jobs consist of many documents. If you can rework the way that you are submitting jobs to contain larger batches of recently imported documents (say, every document imported in the last minute), you will not have as many jobs running on DCC and won't run into the 5000 job limit.
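As a rough illustration of that grouping, the Python sketch below buckets documents by the minute they were imported, so each bucket would become one multi-document job instead of one job per document. The entry IDs, timestamps, and function name are invented for the example, and how you would actually query recently imported documents and submit them to DCC is not shown.

```python
from collections import defaultdict
from datetime import datetime


def group_by_import_minute(docs):
    """Group (entry_id, imported_at) pairs into one bucket per minute.

    Returns a dict mapping the start of each minute to the list of
    entry IDs imported during that minute.
    """
    jobs = defaultdict(list)
    for entry_id, imported_at in docs:
        # Truncate the timestamp to the start of its minute.
        window = imported_at.replace(second=0, microsecond=0)
        jobs[window].append(entry_id)
    return dict(jobs)


if __name__ == "__main__":
    # Made-up sample data: three documents, two in the same minute.
    sample = [
        (101, datetime(2024, 1, 5, 9, 30, 4)),
        (102, datetime(2024, 1, 5, 9, 30, 51)),
        (103, datetime(2024, 1, 5, 9, 31, 12)),
    ]
    for window, ids in sorted(group_by_import_minute(sample).items()):
        print(window.isoformat(), ids)
```

Here three documents turn into two jobs. On a bulk import of hundreds of thousands of documents, the job count would track the number of minutes the import runs rather than the number of documents.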
Because of this limit in the DCC queueing, I now have 30,000 workflows stuck waiting for the DCC to open up so they can process. They attempt to contact the DCC, wait around an hour, and try again. I want to kill all of these workflows, but each one is taking 30-45 seconds to terminate, which works out to roughly 250 to 375 hours if they terminate one at a time. I disabled the DCC, but now they keep looping looking for it instead of telling me it is full. I've disabled the OCR activity for the workflow and applied it to all active workflows, but since these had already started the OCR step they don't seem to see that it is now disabled. Any idea how to kill 30,000 workflows fairly quickly?
If you select them all in the search results in the Designer and terminate them, you don't have to wait. The dialog will send the message and then it can be closed.
How long WF waits before retrying to assign the task to DCC and how many times is controlled by the Task Error Handlers in the WF Admin Console (under Server Configuration). Look for the "DPSTooBusyException" and switch it from "retry" to "critical". Next time an instance tries to schedule a task and hits that error, it will terminate the workflow. That might take care of your stuck instances without you having to do anything else. You can re-enable it later.
I've tried the "terminate workflows" option, and that is what is taking 30-45 seconds apiece to terminate. I'm still letting that method work, but I've also changed "DPSTooBusyException" to critical per your suggestion. I'd looked at those but wasn't sure which one to try it on. Thanks a ton for the help, and I'll post back in here if it works.
Slowly but surely I was able to kill all of the workflows. I am hoping the next version of Workflow will have the ability to bypass the OCR Scheduler if it fails to schedule, instead of trying over and over again. Thanks again for the help.
Any word on having Workflow better handle OCR queue capacity issues? I'd love for it to skip past the activity if it gets a DCC capacity error, and then let me have a scheduled nightly workflow that picks up the slack as necessary.
I have received the same warning message in LF 11. Does anyone know if the jobs that were not able to be sent to the DCC will be picked back up when there are openings, or are those OCR jobs skipped forever?
By default, the error is configured as a retryable error, so Workflow will attempt to send the work again.
If you're seeing it as a warning, that should be a sign Workflow's task error handlers worked.