Hi all,
I have successfully set up my DCC with the main server acting as both scheduler and worker, and I have run some test workflows that find documents in the repository with no OCR text and perform scheduled OCR on them (tested on one folder so far). A picture of the workflow is attached.
The catch is that the server is set to perform maintenance and backups at night, and there is also the initial OCR of the backlog to do: the site has just completed a relocation and added about 1,500 documents, ranging from 1 or 2 pages up to 10, 200 and even 1,000 pages, which now need to be digested.
So rather than competing for capacity at night, I suggested we schedule the OCR to run between 6am and 6pm on Saturday and Sunday this first weekend and see how much gets done.
The workflow, scheduled to start at 6am on Saturday and Sunday, runs inside this conditional:
If %(Time) is less than 06:00:00 PM
then End Workflow
And it acts on the output of the search:
({LF:AssociatedPages="Y"} & {LF:OCR=none})
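For reference, the intended window (Saturday and Sunday, from 6am up to 6pm) can be written as a plain time check. This is only a generic Python sketch of the rule, not Laserfiche Workflow syntax; `in_ocr_window` and the constants are hypothetical names:

```python
from datetime import datetime, time

# Hypothetical constants describing the weekend OCR window.
WINDOW_START = time(6, 0)   # 06:00 inclusive
WINDOW_END = time(18, 0)    # 18:00 exclusive
WEEKEND = {5, 6}            # datetime.weekday(): Monday == 0, so Saturday == 5, Sunday == 6


def in_ocr_window(now: datetime) -> bool:
    """Return True if new OCR work may be started at `now`."""
    return now.weekday() in WEEKEND and WINDOW_START <= now.time() < WINDOW_END


if __name__ == "__main__":
    print(in_ocr_window(datetime(2024, 6, 1, 9, 30)))  # Saturday morning -> True
    print(in_ocr_window(datetime(2024, 6, 1, 18, 0)))  # Saturday, 6pm exactly -> False
    print(in_ocr_window(datetime(2024, 6, 3, 9, 30)))  # Monday morning -> False
```

Whatever the real conditional looks like, it only decides whether to start or continue; it says nothing on its own about work that has already been handed off elsewhere.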
Something that has occurred to a colleague and me: with the DCC configured to hand the OCR task to the worker process, is it not possible that the search returns 1,500 documents in milliseconds, the queue is handed over to the worker, and the workflow in fact ends more or less immediately while the OCR continues to run for days? (Presumably not, but you get my drift.)
Can anyone enlighten me about how these interactions are governed, and what the best practice would be for containing the load of a long OCR backlog like this alongside routine maintenance and backups that must run at predetermined times?
I considered running the workflow in batches of 10 or 100 documents, with loops that somehow take the average execution time into account, but it all got too complicated given that I don't know how the backend processes interact anyway.
Also, when 6pm ticks over, how does a conditional actually end a process like this? Is there a distinction between gracefully completing the currently queued OCR task and something like a kill -9, which might cause problems for the next OCR run when the workflow starts again?
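The batching idea can be sketched generically in Python (this is not Laserfiche Workflow activities or any DCC API). It assumes a hypothetical `ocr_batch` callable that blocks until a batch is really finished and returns the elapsed seconds; if the real OCR step only hands work to a worker queue and returns at once, as suspected above, any timing measured this way would be meaningless. `Doc` and `run_until_cutoff` are likewise hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Doc:
    doc_id: int
    pages: int


def run_until_cutoff(
    backlog: List[Doc],
    ocr_batch: Callable[[List[Doc]], float],  # hypothetical: blocks, returns seconds taken
    now: float,
    cutoff: float,
    batch_size: int = 10,
    initial_secs_per_page: float = 5.0,       # assumed starting guess
) -> Tuple[List[Doc], float]:
    """Process `backlog` in batches, starting a batch only if its estimated
    duration fits before `cutoff`. Returns (unprocessed docs, finish time)."""
    remaining = list(backlog)
    pages_done = 0
    secs_spent = 0.0
    while remaining:
        batch = remaining[:batch_size]
        batch_pages = sum(d.pages for d in batch)
        # Running average of seconds per page, seeded with a guess.
        rate = secs_spent / pages_done if pages_done else initial_secs_per_page
        if now + rate * batch_pages > cutoff:
            break  # next batch would overrun the window; leave it for the next run
        elapsed = ocr_batch(batch)
        now += elapsed
        pages_done += batch_pages
        secs_spent += elapsed
        remaining = remaining[batch_size:]
    return remaining, now


if __name__ == "__main__":
    # Fake OCR that takes 4 seconds per page, with a one-hour window.
    docs = [Doc(i, p) for i, p in enumerate([1, 2, 10, 200, 1000, 1, 2])]
    left, finished = run_until_cutoff(
        docs, lambda b: 4.0 * sum(d.pages for d in b), 0.0, 3600.0, batch_size=2
    )
    print([d.doc_id for d in left], finished)
```

In this toy run the batch containing the 1,000-page document is estimated at about 4,000 seconds and is held back, so documents 4, 5 and 6 are left for the next window. One limitation of the sketch: a single document whose estimate exceeds the whole window would never be started, so very large documents would need separate handling.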
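As a generic illustration of the difference being asked about (not a description of how the DCC or Workflow actually terminates tasks): a cooperative stop checks the cutoff only between documents, so the document in progress finishes and the rest stay queued for the next run, whereas a SIGKILL (kill -9) cannot be caught, so the process gets no chance to finish or clean up. A minimal Python sketch, where `drain_gracefully` and `ocr_one` are hypothetical names:

```python
import queue
import threading
from typing import Callable, List


def drain_gracefully(
    jobs: "queue.Queue[str]",
    ocr_one: Callable[[str], None],  # hypothetical per-document OCR call
    stop: threading.Event,
) -> List[str]:
    """Process jobs one at a time, checking `stop` only between documents.

    A document that has started is always allowed to finish; anything still
    in the queue when `stop` is set is left there for the next run.
    Returns the IDs completed in this run."""
    done: List[str] = []
    while not stop.is_set():
        try:
            doc_id = jobs.get_nowait()
        except queue.Empty:
            break  # backlog cleared
        ocr_one(doc_id)  # runs to completion; never interrupted mid-document
        done.append(doc_id)
        jobs.task_done()
    return done


if __name__ == "__main__":
    jobs: "queue.Queue[str]" = queue.Queue()
    for i in range(1, 8):
        jobs.put(f"doc-{i}")
    stop = threading.Event()
    # Pretend the clock reaches 6pm while doc-3 is being processed.
    finished = drain_gracefully(jobs, lambda d: stop.set() if d == "doc-3" else None, stop)
    print(finished, jobs.qsize())
```

Here doc-3 still completes after the stop signal arrives, and the four remaining documents are untouched, so the next run simply picks up where this one left off.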
Any suggestions would be very much appreciated,
Best regards,
Will