
Question

Quick Fields and Drivers Licenses (MN license...:-))

asked on July 28, 2014

I'm "finally" using some of the things I've used at conference over the years. I'm working on a session to read a MN driver license. We have a lot of "noise" on them, so I'm trying to add some local processes to better OCR the card. There are no helpful labels like "Name:"; the card goes right into the name, so it's hard to add cleanup processes. I've also got different colored text.

 

I've attached a sample file.  The OCR isn't reading very cleanly.  I have tried with an "original" DL as well.

 

These are the processes I've put in place and the order:

 

Pre-Classification:

  • OmniPage OCR
    • Local Process: Color Smoothing (20%) and Color Removal (with adjustment for shaded background - high)

 

Classification:

  • Zone OCR - find "Drivers or Driver's" in the top section (I also tried to do form recognition, but didn't have quite enough experience to make it work!) - this is still failing, but I can likely figure out why

 

Page Processing:

  • Page Zones (Set to creation of multi-value token and using existing text from preclassification)
  • One additional zone set for the Drivers License Number because it's located under the picture.  This is the only one I can have read with great accuracy at the moment.

 

Questions:

  • Are there some suggestions for getting this to read better?  Did I just pick the wrong processes/order?
  • I have this token from one of the classes: (\w+)\n and since I can't "guess" how a person's name is going to be (Last Name, First Name Middle - they are on the license as First Middle Last), can I make this work?  Is it a regular expression or format for token editor? :-)

 

Any assistance is greatly appreciated!

 

Thanks,

Toyia

 

 

Administrator edit: screenshot of driver's license removed as it was revealing personal information on a public site.


Answer

SELECTED ANSWER
replied on August 5, 2014

Andrew - these screenshots were a tremendous help and I found my error right away.  I didn't get to test much as we were having system issues.

 

Any thoughts on the OCR part of my issue - anyone??


Replies

replied on July 29, 2014

Well first of all you can try the regex:

 

Edit: Adjusted Regex

[A-Za-z '-]*[A-Za-z'-]

 

This will work for names that include hyphens and apostrophes, but it does not handle international characters; unfortunately, there are really no characters that can't appear in a legal name.  Still, I would try this regex and see if it gives you the results you are looking for.  You might have to adjust it so it captures only the name, since it looks as if the name comes after the line "Driver's License".
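To see what this pattern does and does not capture, here is a minimal sketch in Python. Python's `re` module is only a stand-in for the Quick Fields pattern engine, and the helper name `first_name_match` and the sample names are made up for illustration.

```python
import re
from typing import Optional

# Pattern from the reply above: letters, spaces, apostrophes and hyphens,
# ending on a non-space character so trailing spaces are not captured.
NAME_PATTERN = re.compile(r"[A-Za-z '-]*[A-Za-z'-]")


def first_name_match(line: str) -> Optional[str]:
    """Return the first run of name-like characters in an OCR line (hypothetical helper)."""
    match = NAME_PATTERN.search(line)
    return match.group(0) if match else None


print(first_name_match("MARY-ANN O'BRIEN SMITH"))  # hyphen and apostrophe are kept
print(first_name_match("JOHN Q PUBLIC  "))         # trailing spaces are dropped
print(first_name_match("JOSÉ GARCIA"))             # stops at the accented letter
```

The last call shows the international-character limitation: the match ends just before the "É", so the rest of the name is lost.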

replied on July 29, 2014

Yeah, I've come to the conclusion that I might not be able to do the name like I want:-)  I played a little more yesterday and am getting closer to consistent results with the OCR.  The name and address are more consistent because the ink is black.  The DOB is harder because they put that in red to make it easier to see if someone was 21 or not.  I've been experimenting with smooth to see if I can even out the characters after converting to b/w in a local process - it's not quite there yet.

 

Thanks for the tip - I will give it a try!

 

Toyia

replied on July 29, 2014

Hi Toyia,

 

To make OCR more accurate, try the invert and/or despeckle functions as local processes for OCR. Also make sure that your OCR is set to Accuracy, not Balance or Speed.

 

A regular expression that can be used for the name is "\w+\s+\w+\s+\w+". While this will work, it is not foolproof: it will not handle hyphenated names or cases where there is no middle name.
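A quick way to see both limitations is to run the pattern against a few sample names. This is a sketch in Python, whose `re` module may differ in small ways from the Quick Fields engine; the function name and the names tested are invented for the example.

```python
import re
from typing import Optional

# Three "words" separated by whitespace: First Middle Last.
THREE_WORDS = re.compile(r"\w+\s+\w+\s+\w+")


def match_three_part_name(text: str) -> Optional[str]:
    """Return the first three-word match, or None if there is none (hypothetical helper)."""
    match = THREE_WORDS.search(text)
    return match.group(0) if match else None


print(match_three_part_name("JOHN Q PUBLIC"))       # full name is captured
print(match_three_part_name("MARY-ANN LEE SMITH"))  # the part before the hyphen is lost
print(match_three_part_name("JOHN PUBLIC"))         # no middle name, so no match
```

The hyphen is not a `\w` character, so the match can only start after it, and a two-word name never satisfies the three-word pattern.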

 

Another suggestion: if you can OCR the driver's license number and use that identifier to grab all other necessary information from a database, you should be able to do away with collecting the name directly from the image, as well as with OmniPage OCR, greatly increasing your processing speed.

 

Hope that helps!

replied on July 29, 2014

Kathryn,

 

Thanks!  Check on the despeckle and accuracy!  I can't really invert based on what I've got.  I'll try both expressions.  They will always have a way to manually fix things, but if we can get "most" cases, they will appreciate that:-)  I did throw out collecting from the database, but we had some "very naughty people" in MN and I don't think they're ready to allow for that kind of data collection yet - even if it was a machine call getting it (rather than a person) via a web service.

 

Toyia

replied on July 30, 2014

I pulled the driver's license off the internet - it was a sample that I found to try and set up the session:-)  If you search for MN Driver's license, you will find this example.  Thanks for your caution though!

replied on July 30, 2014

That's good to know. It looked very realistic. As a matter of policy, we'll pull any WFX/WFI/QFX (Workflow/Quick Fields) files as well as Forms business process XML and any images that look like they might contain personal info.

replied on July 31, 2014

No problem Miruna:-)

 

I'm getting close everyone - I really appreciate all the input.  I have one final question for someone that has a little more regular expression experience than me.  I am getting the information to read with consistency now on the driver's license with the exception of the date of birth, which is in a different color.  But we have the new and fancy conditional in QF:-)  It is reading the numbers, just not always the dashes, so I want to do a substitution.  I've attached my stab at it.  Could someone please tell me where I went wrong!


Thanks!

replied on July 31, 2014

Toyia,

 

Rather than using a substitution, you could take the value you acquired from the Zone OCR (for the date) and feed it to a "Pattern Matching" token. In that token, under "Look for the pattern in:", use the token value of your date Zone OCR. Then, in the "Pattern" section, enter "\d" so that only numeric characters are returned, since reading the numbers does not seem to be the issue.

 

With the Pattern Matching token, in the case from your attachment, you will get the digits "1211974" rather than the incomplete value "1 21-1974". To make that look like an actual date, go back to your "Document Class". There, insert the "Pattern Matching" token we just made into your date field: press the ">" button and select "Token Dialog...", then choose the respective "Pattern Matching" token on the left side of that window under "Tokens". On the right you will see a few check boxes. Select "Apply Formatting" and enter the format "#/##/####"; this will generate the date as "1/21/1974". Note that if you want your date field to use the "1-21-1974" format, you will need to customize it in the Laserfiche Administration Console as guided here.

 

(http://www.laserfiche.com/support/webhelp/Laserfiche/9.1/en-US/AdminGuide/LFAdmin.htm#Custom_Date_Time_Field_Display.htm)

 

A quick summary of how to do so: right-click the date field and select "Properties". Then select "Edit Format" and choose "Custom". You can then enter whatever format you like, such as "MM-dd-yyyy"; this will turn "1/11/1234" into "01-11-1234". Hopefully this addresses your date of birth inquiry!
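As an illustration of what a custom display format like "MM-dd-yyyy" does to a stored date, here is a small Python sketch. It assumes that Python's `%m-%d-%Y` is a fair stand-in for the Laserfiche format string, which is an assumption rather than something the Administration Console uses; the function name is made up.

```python
from datetime import datetime


def redisplay_date(slash_date: str) -> str:
    """Parse a M/D/YYYY date and show it as MM-dd-yyyy (hypothetical helper)."""
    parsed = datetime.strptime(slash_date, "%m/%d/%Y")
    # %m and %d zero-pad, which mirrors the two-letter MM and dd parts.
    return parsed.strftime("%m-%d-%Y")


print(redisplay_date("1/21/1974"))
```

The stored value is still a real date; only the way it is displayed changes, which is why the field can stay a date field.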

 

Regards,

Andrew

replied on July 31, 2014

Andrew,

 

Thank you for your very thorough response.  I feel like a big dork because I can't get my mind to wrap around the first part.  Applying the formatting completely makes sense.

 

So!  I'm getting things like this: 12 21-1974 or 12-21 1974.  It will pick up the dash sometimes.  I'm wanting to get just the numbers in the pattern match.  I do know "where" I'm putting it - I'm just struggling with "how" to put it together. I seem to need to account for the dashes "sometimes" being there - right???  I tried typing straight \d\d\d.... values in and it doesn't bring back just numbers to parse into a date.  I did try some of the other tips people provided - those aren't "quite" right either.  I'll keep playing as I know this takes practice, but the help is appreciated.

 

Toyia

replied on July 31, 2014

In your substitution, I think you need to be using Match Groups. Match groups are a good tool for reformatting data, like inserting a dash between two groups of numbers. The following seems to work for me:

 

 

Where the pattern I'm using for "Replace" is (edited to make it more flexible):

(\d{2})[\s\-]+(\d{2})[\s\-]+(\d{4})

And the pattern for "With" is:

${1}-${2}-${3}

The ${1} syntax refers to a Match Group: I have grouped each set of numbers (month, day, year) using parentheses in the pattern match area, then referred to them with ${1}, ${2}, etc.

 

The term [\s\-] is a regular expression for a space or a hyphen.

 

 

Edited:

If you want to make the substitution more flexible, you can use [\s\-]+ to indicate one or more spaces and/or hyphens, just like you have \s+ in your original pattern. I have edited my pattern to show this.
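The same substitution can be tried outside Quick Fields. Below is a minimal Python sketch; note that Python's `re` module writes group references as `\1` (or `\g<1>`) rather than `${1}`, and the function name is invented for the example.

```python
import re

# Same pattern as the "Replace" box above: month, day, year separated by
# one or more spaces and/or hyphens.
DATE_PIECES = re.compile(r"(\d{2})[\s\-]+(\d{2})[\s\-]+(\d{4})")


def normalize_separators(ocr_text: str) -> str:
    """Rewrite the separators between the three groups as single hyphens."""
    return DATE_PIECES.sub(r"\1-\2-\3", ocr_text)


print(normalize_separators("12 21-1974"))
print(normalize_separators("12-21 1974"))
print(normalize_separators("12 - 21--1974"))
```

Because the pattern asks for exactly two digits in the month and day, a value where OCR dropped a digit (for example "1 21-1974") does not match and is returned unchanged.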

replied on July 31, 2014

Toyia,

 

No worries. If I'm understanding the situation correctly, with the approach I suggested you would not need to account for anything but numbers. The "Pattern Matching" token will only output numbers. Even if your Zone OCR came back with "12-21-1974" or "12-----21----1974" (just something super random), it will ignore all those "-" characters and output just "12211974". However, I believe I left out one piece of information. In the "Edit Token" options for the "Pattern Matching" token, the Pattern should be left as just "\d". In the drop-down box below that ("If multiple matches exist, return:"), select "All Matches (Combined with no spaces)". That will return all the digits rather than just the first one, which is what you get with the default "First Match only". I uploaded my example as the attachment. Of course the "Name:" and "Token value" will vary depending on what you called yours. Let me know if that helps!
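For anyone who wants to check the idea outside Quick Fields, here is a rough Python equivalent of the two drop-down choices. The function names are made up, and Python's `re` module is only standing in for the Pattern Matching token.

```python
import re


def all_matches_combined(ocr_text: str) -> str:
    """Like "All Matches (Combined with no spaces)": every \\d match, joined."""
    return "".join(re.findall(r"\d", ocr_text))


def first_match_only(ocr_text: str) -> str:
    """Like the default "First Match only": just the first \\d match."""
    match = re.search(r"\d", ocr_text)
    return match.group(0) if match else ""


print(all_matches_combined("12-----21----1974"))
print(all_matches_combined("12 21-1974"))
print(first_match_only("12-21-1974"))
```

With the pattern left as a single `\d`, each match is one digit, so the "first match" setting returns only the leading "1".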

 

Regards,

Andrew

Pattern Matching Edit Token option.png
replied on August 1, 2014

Thank you everyone!!!  I will do my testing and let you know the results:-)

replied on August 4, 2014

Hi Toyia,

 

If your question has been answered, please let us know by clicking the "This answered my question" button on the response.

If you still need assistance with this matter, just update this thread. Thanks!

replied on August 4, 2014

Nope, still working on this.  I am so appreciative of everyone's help!  Since I started with Andrew's approach, that's the one I'm working with - no worries for everyone else.  All those suggestions have been logged for future reference and will be used.

 

I've got my pattern match updated - that's working - yah:-)  I've got my pattern match used for my DOB, but my input seems a little off because it seems to "expect" the dashes to work properly.  Perhaps it's because the DOB Field is a date field???  And I need to change it to text??? :-)

 

The final issue is my OCR.  I've attached a sample.  In one spot - it's flawless!!  But when it fills out the template - it's terrible.  Can somebody tell me where it's grabbing the "perfect" copy so I can adjust my tokens?  I think I have the right screenshots attached for you.

 

Again my thanks for everyone's help!

 

replied on August 4, 2014

Toyia,

 

In regards to DOB token issue...

Actually, you can keep it as a "date" field; you do not need to make it "text". You technically could, but if you later want to use the value as a date, it will be read as "text" rather than a "date", so we will shy away from the "text" idea. Looking at your Word document, I believe you entered MM-dd-yyyy in the wrong section. I will do a quick recap with some screenshots to hopefully point you in the right direction.

 

We will assume the Zone OCR is covered and the Pattern Matching is done. The Pattern Matching token should only be generating numbers, regardless of any "\", "-", or letters in the OCR text. The key to confirming your tokens is to always use the "Test Value" box. The "Result value" shows exactly what the token will produce, so that is how to determine whether your token will work the way you want.

 

The "pattern Matching" should look something like the attachment called "pattern Matching edit token option".

 

Then afterwards when you go back to your "Document Class" in Quick Fields and input the value into the date field, it should look something like the attachment called "Document field formatting.png"

 

This will make your date come out as 12/31/2014. So an input of "12312014" will transform into "12/31/2014". As mentioned, it is always a good habit to test your token in the "test value" box to see what the result is. 
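To make the "digits in, formatted date out" step concrete, here is a small Python sketch of a mask filler. It is a hypothetical helper, not how Quick Fields implements "Apply Formatting", and the mask "##/##/####" for the eight-digit case is an assumption, since the setting is only visible in the screenshot.

```python
def apply_mask(digits: str, mask: str) -> str:
    """Fill each '#' in the mask with the next digit, left to right (hypothetical helper)."""
    slots = mask.count("#")
    if not digits.isdigit() or len(digits) != slots:
        # Refuse to guess when the digit count does not fit the mask.
        raise ValueError(f"expected {slots} digits, got {digits!r}")
    remaining = iter(digits)
    return "".join(next(remaining) if ch == "#" else ch for ch in mask)


print(apply_mask("12312014", "##/##/####"))
print(apply_mask("1211974", "#/##/####"))
```

In this sketch a digit-count mismatch raises an error instead of producing a shifted date, which is the kind of case the "Test Value" box would help catch.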

 

However, if you want your date to look like 12-31-2014, you will need to go into your Laserfiche Administration Console and create a new date field. Its properties should look like the attachment "date field properties". That will make your date look like 12-31-2014 rather than 12/31/2014. If you wish to use this as your date format, don't forget to add that field in your Quick Fields "Fields" tab!

 

Hope this addresses your issues regarding the DOB token.

 

Regards,

Andrew

Pattern Matching Edit Token option.png
Document field formatting.png
date field properties.png