This is an old thread, but we have the same issue with v10.0.0.46.
It appears that even though the scheduler reports the job as failed, the worker may still be processing it. The scheduler assumes the machine is free and assigns it another job. The worker ends up running many LFOmniOCR Engine (32bit) processes at once, which eventually locks up the machine.
For us, the failure is often caused by odd page sizes (building plans or maps). When you OCR the same document from the client, it can take over 20 minutes to complete. We have not found an ideal way to terminate the hung OCR processes, other than scheduling a restart of the worker.
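In case it helps anyone, here is a rough sketch of a gentler alternative to a full restart: a small Python script, run from Task Scheduler on the worker, that kills only the OCR processes that have been running longer than a cutoff. This is not something we have in production and it is not a Laserfiche feature. The image name `LFOmniOCR.exe` and the 30 minute cutoff are assumptions (check the Details tab in Task Manager for the real executable name), and `psutil` is a third-party package that would need to be installed. Killing a process mid-job may leave that job in a failed state, so test on a non-critical worker first.

```python
import subprocess
import time

# Assumption: placeholder image name for "LFOmniOCR Engine (32bit)".
# Confirm the real executable name in Task Manager before using this.
OCR_PROCESS_NAME = "LFOmniOCR.exe"
# Assumption: cutoff chosen to be longer than the ~20 minute slow documents.
MAX_AGE_SECONDS = 30 * 60


def find_stale(processes, name, max_age, now):
    """Return PIDs of processes called `name` that are older than `max_age` seconds."""
    stale = []
    for p in processes:
        proc_name = p.get("name") or ""      # name can be None if access is denied
        started = p.get("create_time")
        if started is None:
            continue
        if proc_name.lower() == name.lower() and now - started > max_age:
            stale.append(p["pid"])
    return stale


def list_processes():
    """Snapshot of running processes as dicts with pid, name, create_time."""
    import psutil  # third-party; imported here so the rest works without it
    return [p.info for p in psutil.process_iter(["pid", "name", "create_time"])]


def kill(pid):
    """Force-terminate a process and its children with the Windows taskkill tool."""
    subprocess.run(["taskkill", "/PID", str(pid), "/T", "/F"], check=False)


if __name__ == "__main__":
    for pid in find_stale(list_processes(), OCR_PROCESS_NAME, MAX_AGE_SECONDS, time.time()):
        kill(pid)
```

The selection logic is kept separate from the kill step so you can print the PIDs first and see what it would terminate before letting it loose.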
These threads have provided some help.
Related Thread: https://answers.laserfiche.com/questions/56453/distributed-computing-cluster-job-failed
DCC Timeout: https://answers.laserfiche.com/questions/74555/Would-like-to-decrease-the-default-settings-for-when-OCR-timesout-in-the-Client-