I have had a lot of trouble OCR'ing documents because of computer performance. I have customers who are being sold backlogs of documents, and these are thousands of documents that need to be OCR'd. What is the best process to handle a job like this, and what is the best hardware recommendation? I deal with a lot of people who have their servers set up on VMs, and they want to know the best setup for these machines (RAM, CPU speed, cores, and so on), not just the minimum requirements. Thank you.
Question
What's the best way to OCR a large number of files for Avante?
Replies
If they have Avante licensing, I would recommend using the Distributed Computing Cluster (DCC) and a workflow to set up auto OCR.
Those are good ideas, and I have looked at them. I'm looking more for the best way to make the actual OCR'ing of the documents go faster.
Essentially, building on Tommy's response: by using the DCC you can have up to 10 machines acting as workers. Based on the hardware requirements, a single OCR job/process takes a single core on the machine. However, I would not set the number of jobs equal to the total number of cores on each machine. If a job fails or hangs, you do not want CPU utilization pinned at 100% because every core is in use. I would therefore go over the recommended requirements section for the DCC, figure out how many documents you want OCR'd at a time, and then determine the specifications from there.
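To turn "how many documents at a time" into hardware numbers, a back-of-the-envelope calculation can help. The Python sketch below assumes, as described in the reply above, one OCR job per core, at most 10 DCC workers, and one core per worker held back as headroom. The average seconds per document is an assumption you would need to measure on a sample of your customers' documents, and the function names are hypothetical, not part of any Avante or DCC API.

```python
import math

# Per the reply above: the DCC supports up to 10 worker machines.
MAX_DCC_WORKERS = 10


def concurrent_jobs_per_worker(cores_per_worker: int, reserved_cores: int = 1) -> int:
    """OCR jobs one worker runs at once: one job per core, minus headroom cores."""
    jobs = cores_per_worker - reserved_cores
    if jobs < 1:
        raise ValueError("at least one core must remain free for OCR jobs")
    return jobs


def estimate_ocr_hours(num_documents: int, seconds_per_document: float,
                       workers: int, cores_per_worker: int,
                       reserved_cores: int = 1) -> float:
    """Rough wall-clock hours to OCR a backlog spread across all workers.

    Ignores queueing overhead, failed or hung jobs, and uneven document
    sizes, so treat the result as an optimistic estimate.
    """
    if not 1 <= workers <= MAX_DCC_WORKERS:
        raise ValueError(f"workers must be between 1 and {MAX_DCC_WORKERS}")
    total_jobs = workers * concurrent_jobs_per_worker(cores_per_worker, reserved_cores)
    return num_documents * seconds_per_document / total_jobs / 3600


def workers_needed(num_documents: int, seconds_per_document: float,
                   deadline_hours: float, cores_per_worker: int,
                   reserved_cores: int = 1) -> int:
    """Smallest worker count that finishes the backlog within the deadline.

    A result above MAX_DCC_WORKERS means you need more cores per worker
    or a longer processing window.
    """
    jobs = concurrent_jobs_per_worker(cores_per_worker, reserved_cores)
    # Documents a single worker can complete before the deadline.
    docs_per_worker = jobs * deadline_hours * 3600 / seconds_per_document
    return math.ceil(num_documents / docs_per_worker)


if __name__ == "__main__":
    # Hypothetical backlog: 50,000 documents at an assumed 30 s each,
    # on 4 workers with 8 cores each (7 OCR jobs per worker).
    hours = estimate_ocr_hours(50_000, 30, workers=4, cores_per_worker=8)
    print(f"Estimated time: {hours:.1f} hours")
    # 8-core workers needed to finish the same backlog in an 8-hour window.
    print("Workers needed:", workers_needed(50_000, 30, deadline_hours=8, cores_per_worker=8))
```

With these hypothetical inputs the estimate comes out to about 14.9 hours, and eight 8-core workers would be needed to finish within 8 hours. Because the per-document time drives every result, timing a representative sample of documents on one VM first gives the most useful input for sizing cores and worker count.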