Question

Pattern Matching Help

asked on October 29, 2014

I have the following data being captured via a Zonal OCR:

 

Information    1    Zone 2 : Observer: Date Started: Date Submitted:
J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM
Type:
Standard (Walkthrough)
Location: Evaluation:
Londonderry Elementary School These results count towards evaluation    Zone Data to Capture        0    

 

I would like to capture "Observer" = J. Michael Lausch, as well as "Date" = Oct 2, 2013 and "Submittal" = May 22, 2014. The name will not always have three parts (it may be just Daryl Foxhoven). I'm having a tough time figuring out how to capture specific pieces of information on line 2.

 

Any help would be appreciated.

 

Thanks!!


Replies

replied on October 30, 2014

\bDate Submitted\b\:\n(\w\.+\s\w*\s\w*)

The expression above will give you the name: J. Michael Lausch

 

Write a similar expression to capture the dates as well.
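
As a rough illustration of what a similar expression for the dates might look like, here is a minimal sketch. It assumes Python's `re` module, which may not behave identically to the engine behind Laserfiche pattern matching, and the names `ocr_text`, `name_pattern`, and `date_pattern` are made up for this example. The date expression is only one possible guess at the format (three-letter month, day, comma, four-digit year).

```python
import re

# First two lines of the OCR sample from the question.
ocr_text = (
    "Information    1    Zone 2 : Observer: Date Started: Date Submitted:\n"
    "J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM\n"
)

# The name expression from this reply, unchanged.
name_pattern = r"\bDate Submitted\b\:\n(\w\.+\s\w*\s\w*)"

# Hypothetical date expression: month abbreviation, day, comma, four-digit year.
date_pattern = r"([A-Z][a-z]{2}\s\d{1,2},\s\d{4})"

name = re.search(name_pattern, ocr_text).group(1)
dates = re.findall(date_pattern, ocr_text)

print(name)   # the observer name
print(dates)  # every date found, in order of appearance
```

The first entry in `dates` would be the start date and the second the submit date, assuming the line always lists them in that order.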

 

Check out this post for more ideas: 

https://answers.laserfiche.com/questions/64435/Quickfields-not-returning-right-values-for-Pattern-Matching

replied on October 29, 2014

Hi Daryl,

 

Does the data come in exactly the same layout every time?

 

Tony

replied on October 29, 2014

This is just a thought, assuming that line two is consistent.

 

You can use this regex to grab both dates and assign them to a multi-value token:

([A-Z]+[a-z]+\s[0-9]+,\s[0-9]+)

Then, once those are assigned to tokens, grab the date at index 1 and use it as a pattern to grab the name and assign that to a token.

Once this is done, the rest is pretty straightforward, again assuming the layout is consistent.
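
The two steps described here could be sketched roughly as follows. This uses Python's `re` module rather than Laserfiche tokens, so it is only an approximation of the workflow. The variable names are invented for the example, and it assumes the reply's "index 1" means the first date found.

```python
import re

# Line two of the OCR sample from the question.
line_two = "J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM"

# Step 1: the date expression from this reply, collecting every match
# (the Python stand-in for a multi-value token).
dates = re.findall(r"([A-Z]+[a-z]+\s[0-9]+,\s[0-9]+)", line_two)

# Step 2: treat the first date as literal text and capture whatever
# comes before it on the line as the name.
first_date = dates[0]
name_match = re.search(r"^(.+?)\s+" + re.escape(first_date), line_two)
observer = name_match.group(1)

print(dates)
print(observer)
```

Because the name is defined as "everything before the first date", this approach should not care whether the name has two parts or three.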

 

 

Tony

replied on October 29, 2014

This regex matches pretty much any name type, including names with spaces, hyphens, and apostrophes, but keep in mind that it will not work for international names. This is part of why regex doesn't really work well for verifying names.

 

Either way, for the most part this should do the trick for names:

 

^[A-Z][a-zA-Z '&-.]*[A-Za-z]$
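
A quick way to see what this pattern accepts is to run it against a few strings. The sketch below assumes Python's `re` module, and the candidate strings are just examples. Note that `^` and `$` tie the match to the start and end of the input, so the pattern checks a string that is already a name by itself rather than pulling a name out of a longer line.

```python
import re

# The name pattern from this reply, unchanged. Inside the brackets,
# "&-." is read as a range from "&" to ".", which covers the
# apostrophe, hyphen, and period (plus a few other punctuation marks).
name_pattern = re.compile(r"^[A-Z][a-zA-Z '&-.]*[A-Za-z]$")

candidates = [
    "J. Michael Lausch",
    "Daryl Foxhoven",
    "Mary-Anne O'Neil",
    # The full second line of the OCR sample, name plus dates and times.
    "J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM",
]

# True where the whole string looks like a name, False otherwise.
results = {text: bool(name_pattern.search(text)) for text in candidates}

for text, matched in results.items():
    print(matched, "|", text)
```

The three standalone names match. The full OCR line does not, because the digits and colons after the name are not in the allowed character set.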

replied on October 30, 2014

I see Uzair posted another regex. While it will capture the name J. Michael Lausch, it won't capture the many forms a name can come in. With the regex I posted above, you can match names that include the characters ['&-.], although I doubt anyone's name will have an & symbol or a period. If you test the regex I posted, you will see it should meet most of your needs.

replied on October 31, 2014

Hi Ramon,

 

Yes, you are right. The one I posted addresses this specific OCR extract only; I was just giving an idea of how this could be achieved.

replied on November 6, 2014

I'm using what you suggested above, specifically both of the following:

 

\bDate Submitted\b\:\n(^[A-Z][a-zA-Z '&-.]*[A-Za-z]$)

as well as

^[A-Z][a-zA-Z '&-.]*[A-Za-z]$ 

with the following data

Information    1    Zone 2 : Observer: Date Started: Date Submitted:
J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM
Type:
Standard (Walkthrough)
Location: Evaluation:
Londonderry Elementary School These results count towards evaluation    Zone Data to Capture        0    

 

However, neither seems to return the expected result, J. Michael Lausch.

 

Am I doing something wrong??

replied on November 6, 2014

Daryl,

 

Try this: 

Date Submitted:\s*(\w+.\s*\w+.\s*\w+.)

The above should return the name: J. Michael Lausch
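
The expression above expects three words, so a two-part name such as Daryl Foxhoven would need a different form. One possible alternative, sketched here in Python's `re` module (which may differ from the Laserfiche engine), is to let the first date mark where the name ends and capture all three values in one pass. The names `DATE`, `TIME`, `FIELDS`, and `extract_fields` are hypothetical, and the date and time formats are assumed from the single sample in the question.

```python
import re

# Assumed formats, based only on the sample: "Oct 2, 2013" and "4:14:39 PM".
DATE = r"[A-Z][a-z]{2}\s\d{1,2},\s\d{4}"
TIME = r"\d{1,2}:\d{2}:\d{2}\s[AP]M"

FIELDS = re.compile("".join([
    r"Date Submitted:\s*",            # end of the header line
    r"(?P<observer>.+?)\s+",          # name: everything up to the first date
    r"(?P<started>" + DATE + r")\s+", # start date
    TIME + r"\s+",                    # start time (not captured)
    r"(?P<submitted>" + DATE + r")",  # submit date
]))


def extract_fields(ocr_text):
    """Return a dict with observer, started, and submitted, or None if no match."""
    match = FIELDS.search(ocr_text)
    return match.groupdict() if match else None


header = "Information    1    Zone 2 : Observer: Date Started: Date Submitted:\n"
three_part = header + "J. Michael Lausch Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM\n"
two_part = header + "Daryl Foxhoven Oct 2, 2013 4:14:39 PM May 22, 2014 9:28:16 AM\n"

print(extract_fields(three_part))
print(extract_fields(two_part))
```

Since the name is matched lazily up to the first date, the number of words in it should not matter, provided the dates keep the assumed format.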

 

 

 

 
