Question

Help With Regular Expression to Extract Employee's Name

Laserfiche Workflow

Updated June 20, 2014

asked on June 20, 2014

I need to be able to extract just the employee's name from the following format:

"Last Name, First Name - 1234". The last name needs to be able to account for dashes.

I'm not very good at regular expressions, so any help would be greatly appreciated.

Answer

SELECTED ANSWER

replied on June 20, 2014

For me, this works: (.+)(?= - \d+)

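Here's an explanation: (.+) captures one or more characters, and the lookahead (?= - \d+) requires that the capture be followed by a space, a dash, a space, and one or more digits, without including them in the match.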
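To try the pattern outside of Workflow, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way the Workflow regex engine does; the function name extract_name and the sample value are made up for illustration.

```python
import re

# Pattern from this answer: capture everything that is followed by
# " - " and one or more digits (the digits are not part of the match).
NAME_BEFORE_NUMBER = re.compile(r"(.+)(?= - \d+)")

def extract_name(value):
    """Return the name portion, or None if the value does not fit the format."""
    match = NAME_BEFORE_NUMBER.search(value)
    return match.group(1) if match else None

# Hypothetical sample value in the format from the question.
print(extract_name("Smith-Jones, Mary - 1234"))
```

Because (.+) is greedy, the engine backtracks from the end of the string until the lookahead succeeds, so dashes inside the last name are kept in the capture.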

On the subject of learning regular expressions in general, I'd recommend RegexBuddy. It's a great tool for learning and testing. It also has excellent documentation on regular expressions.

replied on June 20, 2014

The pattern works as long as you have a space after the name portion, then a dash, then a space, then the number.
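The spacing requirement can be checked with a short Python sketch. STRICT is the pattern from the selected answer; RELAXED is a hypothetical variant that is not from this thread and is shown only as one way to tolerate missing spaces. The helper name name_or_none and the sample values are also made up for illustration.

```python
import re

# Pattern from the selected answer: needs exactly " - " before the digits.
STRICT = re.compile(r"(.+)(?= - \d+)")

# Hypothetical relaxed variant (an assumption, not from the thread):
# optional whitespace around the dash, and the digits must end the string.
RELAXED = re.compile(r"(.+?)\s*-\s*\d+$")

def name_or_none(pattern, value):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(value)
    return match.group(1) if match else None

for sample in ("Smith-Jones, Mary - 1234", "Smith-Jones, Mary-1234"):
    print(repr(sample), name_or_none(STRICT, sample), name_or_none(RELAXED, sample))
```

The strict pattern returns nothing for the second sample because there are no spaces around the dash, while the relaxed variant returns the same name for both.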

replied on June 20, 2014

Indeed. That's what was specified in the OP, and as Matthew points out, there's a lot more that can be done to make the actual character matching more resilient. This is especially true if the file was originally named with something that was OCR'd.

I normally try to avoid trusting names and numbers unless I have a way to verify them. For that reason, we warehouse a lot of that kind of data from various systems and use it for lookups and cross-checks on metadata fields.

Replies

replied on June 20, 2014

The important thing here is going to be recognizing which characters are legal in which portions of your pattern. I'll assume that you want to be able to extract the Last Name, First Name, and the number afterwards so those should be in capture groups. This should be a decent starting point:

([\w\-]+)\s*,\s*(\w+)\s*-\s*(\d{4})

Capture one or more word characters or dashes, then match zero or more whitespace characters, followed by a comma, followed by zero or more whitespace characters, then capture one or more word characters, then match zero or more whitespace characters, followed by a dash, followed by zero or more whitespace characters, then capture four digits.

Some quick notes on this regex:

It allows underscores and digits in the names (\w matches both)
It does not allow spaces in the names
It requires exactly four digits following the name
It allows for spaces around your comma and hyphen delimiters
It has capture groups equal to (Last Name), (First Name), (1234)
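The notes above can be checked with a small Python sketch. It assumes Python's re module behaves like the Workflow regex engine for this pattern, and it uses fullmatch so that the whole value must fit the format; that strictness is a choice made for this example, not something stated in the thread. The function name parse_employee and the sample values are made up for illustration.

```python
import re

# Pattern from this reply: (Last Name), (First Name), (four-digit number).
EMPLOYEE = re.compile(r"([\w\-]+)\s*,\s*(\w+)\s*-\s*(\d{4})")

def parse_employee(value):
    """Return (last_name, first_name, number), or None if the value does not fit."""
    # fullmatch is an assumption of this sketch: the entire value must match.
    match = EMPLOYEE.fullmatch(value)
    if match is None:
        return None
    last_name, first_name, number = match.groups()
    return last_name, first_name, number

# Hypothetical sample values.
print(parse_employee("Smith-Jones, Mary - 1234"))   # dashed last name
print(parse_employee("Smith,Mary-1234"))            # no spaces around delimiters
print(parse_employee("Van Buren, Martin - 1234"))   # space inside the last name
print(parse_employee("Smith, Mary - 123"))          # only three digits
```

With a partial-match search instead of fullmatch, the third sample would match starting at "Buren" and silently drop "Van", which is one reason to decide up front which characters are legal in each portion of the name.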
