
Question

What's the best way to OCR a large number of files for Avante?

asked on February 8, 2016

I have had a lot of trouble OCR'ing documents because of computer performance. Some of my customers are buying backlogs of documents, which means thousands of documents need to be OCR'ed. What is the best process for handling a job like this, and what hardware do you recommend? Many of the people I work with run their servers on VMs, and they want to know the best setup for these machines, not just the minimum requirements: RAM, CPU speed, number of cores, and so on. Thank you.

Replies

replied on February 8, 2016

If they have Avante licensing, I would recommend using the Distributed Computing Cluster (DCC) and using a workflow to set up automatic OCR.

replied on February 8, 2016

Those are good ideas, and I have looked at them. What I'm really after is the best way to make the actual OCR'ing of the documents go faster.

replied on February 9, 2016

Essentially, building on Tommy's response: by using the DCC, you can have up to 10 machines as workers. Based on the hardware requirements, a single job/process takes a single core on the machine. However, I would not set the number of concurrent jobs equal to the total number of cores on each machine. If a job fails or hangs while every core is in use, CPU utilization sits at 100%, and you do not want that. Therefore, I would go over the recommended requirements section for the DCC, work out how many documents you want OCR'ed at a time, and then determine the specifications from there.
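To make the sizing reasoning concrete, here is a minimal planning sketch in Python. It assumes, as the reply describes, that one OCR job occupies one core and that the DCC supports up to 10 workers. It leaves a configurable number of cores idle per worker as headroom for a hung job. The function names, the `reserved_cores` default of 1, and the `seconds_per_doc` input are hypothetical illustrations, not part of any Laserfiche API, and the time estimate is only as good as the per-document OCR time you measure on your own files.

```python
import math


def plan_ocr_jobs(cores_per_worker, num_workers, reserved_cores=1, max_workers=10):
    """Return (jobs per worker, total concurrent jobs across the cluster).

    Hypothetical helper. Assumes one OCR job uses one core and keeps
    `reserved_cores` idle on each worker so a hung job cannot pin the CPU
    at 100%. The 10-worker cap reflects the forum reply; verify it against
    current DCC documentation.
    """
    if cores_per_worker < 1 or num_workers < 1:
        raise ValueError("need at least one core and one worker")
    if num_workers > max_workers:
        raise ValueError(f"at most {max_workers} workers assumed")
    # Always allow at least one job, even on a single-core worker.
    jobs_per_worker = max(1, cores_per_worker - reserved_cores)
    return jobs_per_worker, jobs_per_worker * num_workers


def estimate_hours(total_docs, seconds_per_doc, concurrent_jobs):
    """Rough wall-clock estimate, treating the run as batches of
    `concurrent_jobs` documents that each take `seconds_per_doc`.
    `seconds_per_doc` is an assumed input you must measure yourself."""
    batches = math.ceil(total_docs / concurrent_jobs)
    return batches * seconds_per_doc / 3600


if __name__ == "__main__":
    # Example: 4 VMs with 8 vCPUs each, one core held back per VM.
    per_worker, total = plan_ocr_jobs(cores_per_worker=8, num_workers=4)
    print(per_worker, total)  # prints "7 28"
    print(round(estimate_hours(10_000, seconds_per_doc=30, concurrent_jobs=total), 2))
```

Keeping one core free per worker trades a little throughput for resilience. On the 8-core example above, that costs 4 of 32 possible job slots across the cluster in exchange for never saturating a worker.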
