Hi guys,
I'm struggling to get acceptable results from Quick Fields (version 10.1.0.168) Zone OCR on PDF documents. So far the document fail rate is 36% (21 of the customer's 58 test documents fail)! I've spent considerable time adjusting the Image Enhancement settings (in the correct order: colour removal for smoothing, despeckle and smoothing pixel adjustments made 1 pixel at a time, testing after each change), testing with different DPI scaling, setting Zone OCR to "Accuracy" and so on. I still get better OCR results if I OCR the file in the Laserfiche client, but that isn't an option because I need to extract a number from the PDF for both a metadata field and the file name of the document in Laserfiche. There are also several thousand documents to back scan.
I'm OCR'ing a zone to find a text string "DAILY JOB RECORD" and the number that follows that text. I verify the orientation of the page by checking that one of the three words is recognised. I then strip all letters and spaces from the result to leave me with a 7-digit number. The number is what I'm after. Hopefully the screenshots below help clarify things.
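To make the post-processing concrete, here is a rough Python sketch of the logic described above. It is not the actual Quick Fields configuration, only an illustration of the same steps; the function name `extract_job_number` and the sample values are made up for the example.

```python
import re

# The three words expected in the zone; seeing any one of them is taken
# as confirmation that the page is the right way up.
MARKER_WORDS = ("DAILY", "JOB", "RECORD")


def extract_job_number(zone_text):
    """Return the 7-digit number from the OCR'd zone text, or None.

    Hypothetical helper: mirrors the steps described in the post,
    not any Quick Fields API.
    """
    upper = zone_text.upper()

    # Orientation check: at least one marker word must be recognised.
    if not any(word in upper for word in MARKER_WORDS):
        return None

    # Strip all letters and whitespace from the OCR result.
    stripped = re.sub(r"[A-Za-z\s]", "", zone_text)

    # Accept the result only if exactly seven digits remain.
    if re.fullmatch(r"\d{7}", stripped):
        return stripped
    return None
```

With this logic, a clean read such as "DAILY JOB RECORD 1234567" (a made-up number) yields the number, whereas a failed read like "7" is rejected at the orientation check because none of the three words is present. The final seven-digit check is an addition for illustration: it also rejects reads where a letter has been misrecognised as a digit and leaves too many digits behind.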
Here are the Quick Fields PDF settings:
I've tested with and without converting to B&W and get the same results either way. I do, however, get better results at 600 DPI than at 300.
Here's a successful extraction:
Here's an unsuccessful one - the whole string in the zone (the OCR zone is a lot bigger than displayed, I'm just limiting what I'm putting online) has been recognised simply as "7":
If I then store the documents in Laserfiche, open the same file that failed in Quick Fields (where the OCR only returned "7") and run OCR in the Laserfiche client, it successfully recognises the whole string, including the number:
If I print the PDF via Snapshot I also get an accurate read:
I really need to get at least the same quality of result in Quick Fields. I've tested with all the different Image Enhancement settings except Invert and Line Removal, but nothing has helped, even with the three Smoothing options and Despeckle set as low as 1 pixel. I've also tested with both coloured and B&W original PDFs, but it doesn't seem to make any difference: I still get a much better result in the Laserfiche client than in Quick Fields.
Any tips or advice would be much appreciated.
Thanks,
Mike