How do I take a zone OCR area that contains 'Last name, First name' and convert it to two different tokens of 'last name' and 'first name'?
Question
Replies
'Assign Token Values' is the way to go, but your pattern can be a bit simpler.
(\w+), \w+
\w+, (\w+)
The first pattern looks for one or more word characters (letters, digits, or underscores), a comma, a space, and one or more word characters, and returns the word before the comma. The second pattern looks for the same thing but returns the word after the space.
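To see what each pattern hands back, here is a minimal sketch using Python's `re` module as a stand-in for the Quick Fields pattern matcher. It assumes the token editor keeps only the text captured by the single group in each pattern. The helper name `extract` is hypothetical, and edge-case behavior in Quick Fields' own engine may differ.

```python
import re
from typing import Optional

# The two patterns from the reply. Each has one capture group; we assume the
# token editor keeps only the text that group captures.
LAST_NAME_PATTERN = r"(\w+), \w+"
FIRST_NAME_PATTERN = r"\w+, (\w+)"


def extract(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


ocr_text = "Smith, John"
last_name = extract(LAST_NAME_PATTERN, ocr_text)
first_name = extract(FIRST_NAME_PATTERN, ocr_text)
print(last_name, first_name)  # Smith John
```

Both patterns require exactly one space after the comma, so OCR text such as "Smith,John" returns None for either token.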
For more information about pattern matching, I suggest taking a look at the pattern matching article in the online Workflow help files. It covers the syntax in detail.
I find that assuming a single space will sometimes trip up the regex, so I'll use something like \s? or even \s*? instead, to account for instances where the OCR didn't read the whitespace properly.
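A quick way to compare a strict single-space pattern with a whitespace-tolerant one is sketched below, again using Python's `re` as a stand-in. The sample strings and the helper `split_name` are made up for illustration. The sketch uses `\s*`; the lazy `\s*?` from the reply gives the same result here because `\w` can never match whitespace.

```python
import re
from typing import Optional, Tuple

# Exactly one literal space after the comma, as in the first patterns.
STRICT = r"(\w+), (\w+)"
# \s* accepts zero or more whitespace characters on each side of the comma.
TOLERANT = r"(\w+)\s*,\s*(\w+)"


def split_name(pattern: str, text: str) -> Optional[Tuple[str, str]]:
    """Return (last, first) from the two capture groups, or None if no match."""
    match = re.search(pattern, text)
    return (match.group(1), match.group(2)) if match else None


for sample in ["Smith, John", "Smith,John", "Smith ,  John"]:
    print(repr(sample), split_name(STRICT, sample), split_name(TOLERANT, sample))
```

Only the tolerant pattern splits all three samples; the strict one returns None for the last two.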
One last thing: the patterns I included above are very specific; they look for letters and numbers, and will get tripped up by hyphens and other extra characters. A more robust pattern, such as the following...
([\w-]+)\s*,\s*[\w-]+
...will catch more. This pattern will get letters, numbers, and hyphens, and won't be fazed by spaces on either side of the comma. Your mileage may vary; finding the right pattern for your session generally takes a bit of review and a few iterations.
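The hyphen problem is easy to miss because an unanchored search does not fail; it just matches part of the name. This sketch (Python `re` as a stand-in; the first-name pattern is our own mirror image of the reply's last-name pattern, and `first_group` is a hypothetical helper) shows the simple pattern silently dropping part of a hyphenated surname.

```python
import re
from typing import Optional

# Simple pattern from earlier in the thread.
SIMPLE_LAST = r"(\w+), \w+"
# Robust pattern from the reply: word characters or hyphens, optional
# whitespace around the comma.
ROBUST_LAST = r"([\w-]+)\s*,\s*[\w-]+"
# Mirror image for the first name (our own variant, not from the reply).
ROBUST_FIRST = r"[\w-]+\s*,\s*([\w-]+)"


def first_group(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


text = "Smith-Jones, Mary-Kate"
print(first_group(SIMPLE_LAST, text))   # Jones  (silently drops "Smith-")
print(first_group(ROBUST_LAST, text))   # Smith-Jones
print(first_group(ROBUST_FIRST, text))  # Mary-Kate
```

Names with other punctuation, such as an apostrophe in "O'Brien", would still be cut short by this pattern unless that character is added to the character class.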
Kenneth's recommendation of the split function is another way to approach the problem, and it should work just as well. The Quick Fields 9 online help files include a description of the split and trim functions.
Try this regex:
^([^,]+),([^,]+)$
"Beginning of string, one or more non-commas (captured), comma, one or more non-commas (captured), end of string"
This matches all the characters on either side of the single comma, and fails to match if there is more than one comma or no comma at all, so you can handle the improper format separately. Just trim the whitespace off each result, and you're about as close as you're going to get with a regex. If you are completely sure that certain other characters will never appear in the names, add them to the negated character groups.
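Below is a minimal sketch of the anchored one-comma approach plus trimming, written with Python's `re` as a stand-in for the Quick Fields engine. The function name `parse_name` and the extra check that rejects whitespace-only sides are our own additions for illustration. Returning None stands in for "handle the improper format".

```python
import re
from typing import Optional, Tuple

# Anchored: non-commas, exactly one comma, non-commas. The parentheses
# capture each side so it can be pulled out.
NAME_PATTERN = re.compile(r"^([^,]+),([^,]+)$")


def parse_name(text: str) -> Optional[Tuple[str, str]]:
    """Return (last, first) with surrounding whitespace trimmed, or None when the
    text does not have exactly one comma with non-blank text on each side."""
    match = NAME_PATTERN.match(text)
    if match is None:
        return None  # zero commas, two or more commas, or an empty side
    last, first = match.group(1).strip(), match.group(2).strip()
    if not last or not first:
        return None  # a side was whitespace only
    return last, first


print(parse_name("O'Brien-Smith,  Mary Ann "))
print(parse_name("Smith, John, Jr."))
```

Because the groups accept anything except a comma, apostrophes, hyphens, and multi-word first names come through intact; the trade-off is that OCR noise also passes through unless you narrow the negated groups.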
Also note: if you can use string splitting and notify an operator when there's a problem, do that instead of using a regex. String splitting is much more legible, maintainable, and straightforward to write than a regular expression! Don't overlook simple tools just because a more sophisticated one is available.
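The split-and-notify idea can be sketched in a few lines of Python. This is only an illustration of the logic: `NameFormatError` and the `needs_review` list are hypothetical stand-ins for however your session routes a document to an operator, not Quick Fields features.

```python
from typing import Dict, List, Tuple


class NameFormatError(ValueError):
    """Hypothetical signal that the text is not in 'Last, First' form."""


def split_last_first(text: str) -> Tuple[str, str]:
    """Split on ',' and trim; raise unless there are exactly two non-blank parts."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 2 or not all(parts):
        raise NameFormatError(f"expected 'Last, First', got {text!r}")
    return parts[0], parts[1]


results: Dict[str, Tuple[str, str]] = {}
needs_review: List[str] = []  # stand-in for notifying an operator
for ocr_text in ["Smith, John", "Smith John", "Smith, John, Jr."]:
    try:
        results[ocr_text] = split_last_first(ocr_text)
    except NameFormatError:
        needs_review.append(ocr_text)
```

Each step here (split, trim, count, flag) is readable at a glance, which is the maintainability argument made above.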
You first capture the area with Zone OCR. You can then use a variety of methods; the easiest and most common one I use is an "Assign Token Values" process. Just create a token and set it to the value from the Zone OCR. Then right-click the Zone OCR token and use the token editor to apply a regular expression like so:
- First Name - , ?(\w{0,99})$
- Last Name - ^(\w{0,99}),
I have not tested those regular expressions. You can also use the token editor functions instead: apply the "Split" function on ',' and then add the "Trim" function to remove any surrounding whitespace. You can then use the index to set the first and last name tokens:
- First Name - Index 2
- Last Name - Index 1
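Here is a rough Python model of Split, Trim, and index selection. It assumes the token editor's index is 1-based, as the Index 1 / Index 2 mapping above suggests, while Python lists are 0-based. Returning an empty string when an index does not exist is our assumption, not documented Quick Fields behavior. The function names are hypothetical.

```python
from typing import Dict, List


def split_and_trim(ocr_text: str) -> List[str]:
    """Split on ',' and trim whitespace from every piece."""
    return [piece.strip() for piece in ocr_text.split(",")]


def piece_at(pieces: List[str], index: int) -> str:
    """Return the piece at a 1-based index (as in the reply), or '' if absent.
    Returning '' for a missing index is our assumption, not documented behavior."""
    return pieces[index - 1] if 1 <= index <= len(pieces) else ""


def name_tokens(ocr_text: str) -> Dict[str, str]:
    """Map Index 1 to the last name and Index 2 to the first name."""
    pieces = split_and_trim(ocr_text)
    return {"Last Name": piece_at(pieces, 1), "First Name": piece_at(pieces, 2)}


print(name_tokens("Smith ,  John"))
```

If the OCR misses the comma, the whole string lands in the last-name token and the first name comes back empty, so it is worth checking for that case.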