I'm "finally" using some of the things I've used at the conference over the years: I'm working on a session to read an MN driver's license. There is a lot of "noise" on these cards, so I'm trying to add some local processes to improve the OCR. The card has no helpful labels like "Name:" (it goes right into the name), so it's hard to add cleanup processes. The text is also printed in different colors.
I've attached a sample file. The OCR isn't reading very cleanly. I have tried with an "original" DL as well.
These are the processes I've put in place, in this order:
Pre-Classification:
- OmniPage OCR
- Local Process: Color Smoothing (20%) and Color Removal (with adjustment for shaded background - high)
Classification:
- Zone OCR - find "Drivers" or "Driver's" in the top section. This is still failing, but I can likely figure out why. (I also tried form recognition, but didn't have quite enough experience to make it work!)
Page Processing:
- Page Zones (set to create a multi-value token and use the existing text from Pre-Classification)
- One additional zone set for the driver's license number, because it's located under the picture. This is the only zone that reads with great accuracy at the moment.
Questions:
- Are there any suggestions for getting this to read better? Did I just pick the wrong processes or the wrong order?
- I have this token pattern from one of the classes: (\w+)\n. Since I can't "guess" how a person's name is going to be laid out (Last Name, First Name Middle; on the license they appear as First Middle Last), can I make this work? Is it a regular expression or a format for the token editor? :-)
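For reference, the pattern reads as standard regular-expression syntax: \w+ is one or more word characters, \n is a line break, and the parentheses capture the word. Here is a minimal sketch of what it would match, written in Python only as an illustration. The sample text, the reorder_name helper, and the assumption that the last word on the name line is the surname are all hypothetical, and whether the token editor behaves the same way as Python's re module would need to be confirmed.

```python
import re

# Hypothetical OCR output; the text on a real card will differ.
ocr_text = "JANE ANN DOE\n123 MAIN ST\nSAINT PAUL MN 55101\n"

# (\w+)\n captures the run of word characters that sits directly
# before each line break, i.e. the last word on every line.
last_words = re.findall(r"(\w+)\n", ocr_text)
print(last_words)  # ['DOE', 'ST', '55101']


def reorder_name(line):
    """Turn 'First Middle Last' into 'Last, First Middle'.

    Assumption: the final whitespace-separated word is the surname.
    Multi-word surnames or suffixes (e.g. 'JR') would break this.
    """
    match = re.fullmatch(r"\s*(.+?)\s+(\S+)\s*", line)
    if match is None:
        # Only one word on the line: nothing to reorder.
        return line.strip()
    given_names, surname = match.groups()
    return f"{surname}, {given_names}"


# The name is assumed to be the first line of the zone text.
name_line = ocr_text.split("\n")[0]
print(reorder_name(name_line))  # DOE, JANE ANN
```

As the first print shows, the pattern alone picks up the last word of every line, not just the name line, so it would likely need to be limited to the zone or line that holds the name.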
Any assistance is greatly appreciated!
Thanks,
Toyia
Administrator edit: screenshot of driver's license removed as it was revealing personal information on a public site.