pattern matching name field

replied on March 25, 2014

'Assign Token Values' is the way to go, but your pattern can be a bit simpler.

(\w+), \w+
\w+, (\w+)

The first pattern looks for one or more word characters (letters, digits, or underscores), a comma, a single space, and one or more word characters; it returns the word characters before the comma. The second pattern looks for the same thing, but returns the word characters after the space.
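
As a rough illustration of how the two patterns behave, here is a small Python sketch. Python's `re` module is only a stand-in here (an assumption; Workflow evaluates patterns with its own engine), and the sample value "Smith, John" is hypothetical.

```python
import re

# Hypothetical sample value, standing in for the OCR'd name field.
name = "Smith, John"

# Same two patterns as above; group 1 is the part each pattern returns.
last_pattern = re.compile(r"(\w+), \w+")   # text before the comma
first_pattern = re.compile(r"\w+, (\w+)")  # text after the space

last_match = last_pattern.search(name)
first_match = first_pattern.search(name)

# Fall back to None when the field does not fit the pattern.
last_name = last_match.group(1) if last_match else None
first_name = first_match.group(1) if first_match else None

print(last_name, first_name)
```

Only the parenthesized group is returned, which is why the two patterns differ only in where the parentheses sit.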

For more information about pattern matching, I suggest taking a look at the pattern matching article in the online Workflow help files. It covers the syntax in detail.

replied on March 25, 2014

I find that assuming a single space will sometimes trip up the RegEx, so I'll do something like \s? or even \s*? in order to take into account instances where the OCR didn't read the white space properly.

replied on March 25, 2014

One last thing: the patterns I included above are very specific; they look for letters and numbers, and will get tripped up by hyphens and other extra characters. A more robust pattern, such as the following...

([\w-]+)\s*,\s*[\w-]+

...will catch more. This pattern will get letters, numbers, and hyphens, and won't be fazed by spaces on either side of the comma. Your mileage may vary; finding the right pattern for your session generally takes a bit of review and a few iterations.
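
A quick way to review a pattern like this is to run it over a few sample values. The sketch below uses Python's `re` module as a stand-in (an assumption, not the Workflow engine). The first-name pattern is not given in the post; it is assumed here to be the mirror image of the last-name pattern, and the sample strings are hypothetical.

```python
import re

# Last-name pattern from the post.
LAST = re.compile(r"([\w-]+)\s*,\s*[\w-]+")
# Assumed mirror of the pattern above, capturing the first name instead.
FIRST = re.compile(r"[\w-]+\s*,\s*([\w-]+)")


def split_name(text):
    """Return (last, first), or None if the text does not fit the pattern."""
    last = LAST.search(text)
    first = FIRST.search(text)
    if last is None or first is None:
        return None
    return last.group(1), first.group(1)


# Hypothetical OCR results with hyphens and uneven spacing.
for sample in ["Doe-Smith, John", "Smith ,John", "Smith,  Mary-Ann"]:
    print(repr(sample), "->", split_name(sample))
```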

Kenneth's recommendation of the split function is another way to approach the problem, and it should work just as well. The Quick Fields 9 online help files include a description of the split and trim functions.

replied on March 25, 2014

Try this regex:

^[^,]+,[^,]+$

"Beginning of string, one or more non-commas, comma, one or more non-commas, end of string"

This will get you all the characters on either side of the single comma, or fail to match if there is more than one comma, or no commas at all, so that you can handle the improper format. Just trim the whitespace off the result, and you're about as close as you're going to get for a regex. If you are completely sure that some other characters won't be included in the names, add them to the negated character groups.
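
A minimal sketch of that approach, using Python's `re` module as a stand-in for the regex engine (an assumption). Capture groups are added around each side of the comma so the two halves can be pulled out separately; the `parse` helper and the sample values are hypothetical.

```python
import re

# The pattern above, with a capture group added around each side of the comma.
STRICT = re.compile(r"^([^,]+),([^,]+)$")


def parse(text):
    """Return (left, right) with whitespace trimmed, or None on a bad format."""
    match = STRICT.match(text)
    if match is None:
        return None  # zero commas, or more than one: handle as improper format
    return match.group(1).strip(), match.group(2).strip()


print(parse("Doe-Smith , John"))  # exactly one comma: matches
print(parse("Smith John"))        # no comma: no match
print(parse("Smith, John, Jr"))   # two commas: no match
```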

Also note: If you can use string splitting and notify an operator on a problem, do that instead of a regex. String split is much more legible, maintainable, and straightforward to write than a regular expression! Don't overlook simple tools just because a more sophisticated one is available.

replied on March 25, 2014

You first capture the area with the Zone OCR. From there you can use a variety of methods; the easiest and most common way I use is an "Assign Token Value" process. Just create a token and use the value from the Zone OCR. Then right-click the token for the Zone OCR and use the token editor to apply a regular expression like so:

First Name - , ?(\w{0,99})$
Last Name - ^(\w{0,99}),

I have not tested those regular expressions. You can also use functions instead: apply the "Split" function on the ',' and then add the "Trim" function to remove any whitespace. You can then use the index to set the first and last name tokens.

First Name - Index 2
Last Name - Index 1
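
The split-then-trim idea looks roughly like this in Python, which is used here only as an illustration (an assumption; it is not the token editor's own syntax). Note that Python lists are 0-based, so the post's Index 1 and Index 2 correspond to positions 0 and 1 below. The sample value is hypothetical.

```python
# Hypothetical Zone OCR value with stray whitespace.
raw = "  Smith ,  John "

# "Split" on the comma, then "Trim" each piece.
parts = [piece.strip() for piece in raw.split(",")]

last_name = parts[0]   # Index 1 in the post
first_name = parts[1]  # Index 2 in the post

print(first_name, last_name)
```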

replied on March 25, 2014

The answers so far have some issues: they all fail on anything but a very simple name like "Smith, John". Consider a hyphenated name, like "Doe-Smith, John". The regex "(\w+), (\w+)" would capture "Smith" and "John", which of course becomes an issue. This is further complicated by the fact that OCR is imperfect, and will add an unknown number of spaces, sometimes even within words. So I would suggest the following:

1.) If you are using Quick Fields, use a more permissive regex that still enforces your format, like this:

^([^,]+),([^,]+)$

It matches "beginning of string, one or more non-comma characters, a comma, one or more non-comma characters, end of string". This strictly enforces your format, so Quick Fields can be set to notify an operator if the format isn't met.

2.) If you are using Workflow, use a string split and trim the result. If you split on a ',' character, then count how many tokens result, you know whether the string was properly formatted. This is much easier than developing (and maintaining) a regular expression! Regexes are very powerful, but don't overlook simpler tools! Sometimes they will work fine.
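
To illustrate the split-and-count check from option 2, here is a short Python sketch. Python is an assumption used for illustration only, `parse_name` is a hypothetical helper, and treating an empty piece as a bad format is an extra assumption on top of the token count.

```python
def parse_name(raw):
    """Split on ',' and trim; return (last, first) or None on a bad format."""
    parts = [piece.strip() for piece in raw.split(",")]
    # Exactly two tokens means exactly one comma. Rejecting empty tokens
    # (e.g. "Smith,") is an added assumption, not something the post requires.
    if len(parts) != 2 or not all(parts):
        return None
    last, first = parts
    return last, first


print(parse_name("Doe-Smith, John"))
print(parse_name("Smith John"))       # no comma
print(parse_name("Smith, John, Jr"))  # too many commas
```

A `None` result is the point at which an operator could be notified about the improperly formatted value.
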

replied on July 23, 2014

If a response answered your question, please click "This answered my question" to let us know.
