
Question

Help With Regular Expression to Extract Employee's Name

asked on June 20, 2014

I need to be able to extract just the employee's name from the following format:

"Last Name, First Name - 1234". The pattern needs to allow for dashes in the last name.

 

I'm not very good at regular expressions, so any help would be greatly appreciated.


Answer

SELECTED ANSWER
replied on June 20, 2014

For me, this works: (.+)(?= - \d+)

 

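Here's an explanation: (.+) captures one or more characters, and the lookahead (?= - \d+) requires that they be followed by a space, a dash, a space, and one or more digits, without including those in the match.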

 

On the subject of learning regular expressions in general, I'd recommend RegexBuddy. It's a great tool for learning and testing. It also has excellent documentation on regular expressions.

replied on June 20, 2014

Thank you for your reply. Unfortunately, when I plug that into the Pattern Matching activity, it does not work.

replied on June 20, 2014

Try (.+)\s-\s\d\d\d\d

replied on June 20, 2014

The pattern works as long as you have a space after the name portion, then a dash, then a space, then four digits.
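For anyone who wants to try the pattern outside of the Pattern Matching activity, here is a minimal Python sketch. The function name extract_name and the sample strings are made up for illustration, and Python's regex engine may differ in small ways from the one the activity uses.

```python
import re

# Pattern from the reply above: capture everything before " - 1234".
PATTERN = re.compile(r"(.+)\s-\s\d\d\d\d")

def extract_name(text):
    """Return the name portion, or None if the text does not fit the format."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

# Hypothetical sample values, for illustration only.
print(extract_name("Smith-Jones, Mary - 1234"))  # Smith-Jones, Mary
print(extract_name("Smith-Jones, Mary-1234"))    # None (no spaces around the dash)
```

The dash inside the last name is not a problem because the pattern only treats a dash as the delimiter when it has whitespace on both sides and four digits after it.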

replied on June 20, 2014

That worked, thank you.

 

I have an additional question. All of this is part of a bigger project. We are working on a renaming workflow so that when an employee changes their name, it will update the employee's folder, documents, and the metadata assigned to them.

 

Now that I have the employee's old name (thanks to Devin), I need to be able to find that name in any documents found within that employee's folder and replace it with the employee's new name, while retaining the other parts of the document name. Any ideas how to do this? Below is what I currently have. I'm just not sure how to do the last part of pulling out the employee's name and inserting the new name.
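As a rough illustration of the replace step, here is a minimal Python sketch. It is not a workflow configuration; the function name rename_document and the sample document name are hypothetical, and the same idea would need to be adapted to whatever activities the workflow provides.

```python
import re

def rename_document(doc_name, old_name, new_name):
    """Replace old_name with new_name, keeping the rest of doc_name unchanged."""
    # re.escape treats the old name as literal text, so characters that are
    # special in a regex cannot change what gets matched.
    # The lambda keeps backslashes in new_name from being read as escapes.
    return re.sub(re.escape(old_name), lambda _m: new_name, doc_name)

# Hypothetical document name, for illustration only.
print(rename_document("Smith, Mary - 1234 - W4 Form", "Smith, Mary", "Smith-Jones, Mary"))
# Smith-Jones, Mary - 1234 - W4 Form
```

Because the old name is matched as literal text, everything else in the document name (the number and any trailing description) is left as it was.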

 

 

replied on June 20, 2014

Indeed. That's what was specified in the OP, and as Matthew points out, there's a lot more that can be done to make the actual character matching more resilient. This is especially true if the file was originally named with something that was OCR'd.

 

I normally try to avoid trusting names and numbers unless I have a way to verify them. For that reason, we keep a lot of that kind of data warehoused from various systems and use it for lookups and crosschecks on metadata fields.


Replies

replied on June 20, 2014

The important thing here is going to be recognizing which characters are legal in which portions of your pattern. I'll assume that you want to be able to extract the Last Name, First Name, and the number afterwards so those should be in capture groups. This should be a decent starting point:

([\w\-]+)\s*,\s*(\w+)\s*-\s*(\d{4})

Capture one or more word characters or dashes, then match zero or more whitespace characters, followed by a comma, followed by zero or more whitespace characters, then capture one or more word characters, then match zero or more whitespace characters, followed by a dash, followed by zero or more whitespace characters, then capture four digits.

Some quick notes on this regex:

  1. It allows underscores in the names
  2. It does not allow spaces in the names
  3. It requires exactly four digits following the name
  4. It allows for spaces around your comma and hyphen delimiters
  5. It has three capture groups: (Last Name), (First Name), (1234)
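
The notes above can be checked with a short Python sketch. The function name parse_employee and the sample strings are made up for illustration. One assumption to flag: the sketch uses fullmatch so the whole string has to fit the pattern; with an unanchored search, a five-digit number would still match on its first four digits.

```python
import re

# Pattern from this reply, with three capture groups.
PATTERN = re.compile(r"([\w\-]+)\s*,\s*(\w+)\s*-\s*(\d{4})")

def parse_employee(text):
    """Return (last_name, first_name, number), or None if the text does not fit."""
    # fullmatch is an assumption of this sketch: the entire string must match.
    match = PATTERN.fullmatch(text)
    return match.groups() if match else None

# Hypothetical sample values, for illustration only.
print(parse_employee("Smith-Jones, Mary - 1234"))  # ('Smith-Jones', 'Mary', '1234')
print(parse_employee("Smith_Jones,Mary-1234"))     # ('Smith_Jones', 'Mary', '1234')
print(parse_employee("Van Dyke, Mary - 1234"))     # None (space in the last name)
print(parse_employee("Smith, Mary - 12345"))       # None (five digits)
```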