Hi,
I am using DCC to OCR all documents created in a specific folder. The volume of pages that need to be OCR'd is quite large: a single document has a minimum of 60 pages.
OCR becomes extremely slow when the page being processed has a lot of printed data on it. The less data on a page, the quicker that page is OCR'd.
For example, a page scanned at 300 dpi in black and white with a file size of about 200 KB can take up to a minute to OCR, whereas a page of about 20 KB takes less than 5 seconds. These scanned documents contain a lot of handwritten and poor-quality pages, which result in longer OCR times.
My design only requires OCR of the small pages, not the large ones. The small pages are simply cover sheets that I use to identify document types; I'm not interested in OCR of any of the other pages.
Is there a way to pick out the pages of a document based on their file size and then OCR only those pages? If it must be done via a script, I will do so. If it can be done in DCC, I will do that too, but I don't see an option for this in DCC.
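For the script route, a minimal sketch of the selection step might look like the following. It assumes each page is saved as a separate TIFF file in the folder (a multi-page TIFF would have to be split first) and uses a hypothetical 50 KB cutoff, chosen to sit between the ~20 KB cover sheets and the ~200 KB content pages. The OCR call itself is left out because it depends on how DCC is invoked.

```python
import sys
from pathlib import Path

# Hypothetical cutoff (an assumption): cover sheets are ~20 KB, content pages ~200 KB.
SIZE_THRESHOLD_BYTES = 50 * 1024


def select_small_pages(folder, threshold=SIZE_THRESHOLD_BYTES, extensions=(".tif", ".tiff")):
    """Return page image files in `folder` smaller than `threshold` bytes, sorted by name.

    Assumes one page per file; only files with the given extensions are considered.
    """
    pages = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            if path.stat().st_size < threshold:
                pages.append(path)
    return pages


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for page in select_small_pages(folder):
        print(page)  # these are the candidate cover sheets to pass to the OCR step
```

File size is only a proxy for "little printed data": it depends on the compression used for the black-and-white images, so the cutoff would need checking against a sample of real cover sheets and content pages.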
Can anyone assist?
Thanks
Sheldon