Pattern Matching question about filtering out multiple parts of file name

SELECTED ANSWER

replied on November 9, 2017

There are a lot of different regular expressions that would work:

TEXT_TEXT_NUMBER

For the first TEXT, you could use ^([A-Za-z]+) or just [A-Za-z]+. The second pattern also matches the second text string, so you would need to specify that you want only the first match.

For the second TEXT, you could use: _([A-Za-z]+)_

For the NUMBER, you could use: _(\d+) or just \d+
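As a rough sketch of how these three patterns behave, here they are in Python's re module. The file name is made up for illustration, and Python is only a stand-in for whatever regex engine your pattern matching tool uses, so treat the details as an assumption.

```python
import re

# Hypothetical file name in the TEXT_TEXT_NUMBER layout
name = "Invoice_Smith_12345"

# First TEXT: letters anchored to the start of the name
first = re.search(r"^([A-Za-z]+)", name).group(1)

# Second TEXT: letters sitting between two underscores
second = re.search(r"_([A-Za-z]+)_", name).group(1)

# NUMBER: digits that directly follow an underscore
number = re.search(r"_(\d+)", name).group(1)

print(first, second, number)
```

In each case the parentheses mark the part that is returned; the underscores are matched but left out of the captured value.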

replied on November 9, 2017

Perfect! Thank you for your quick reply.

replied on November 9, 2017

One problem. I didn't notice that some files are text_text-text_number.

The files with the hyphen in the middle text are coming back with that field blank in the template. What should I add in the regex to accommodate the hyphen?

replied on November 9, 2017

Sure. One thing to note is that brackets "[....]" define the set of characters you want to match. To account for the hyphen, you just need to add it inside the brackets. For you that would look like: _([A-Za-z-]+)_

This matches any run of letters (A-Z, a-z) and hyphens between your underscores. Putting the hyphen last in the brackets keeps it from being read as part of a range.
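To show why the hyphenated names came back blank, here is a small sketch in Python's re module. The file name is a made-up example, and Python is only assumed as a stand-in for your tool's regex engine.

```python
import re

# Hypothetical file name in the text_text-text_number layout
name = "Report_North-East_2017"

letters_only = re.compile(r"_([A-Za-z]+)_")
with_hyphen = re.compile(r"_([A-Za-z-]+)_")

# The letters-only class stops at the hyphen, so no underscore follows
# the run of letters and the pattern finds nothing at all.
no_match = letters_only.search(name)

# With the hyphen in the class, the whole middle section is captured.
middle = with_hyphen.search(name).group(1)

print(no_match, middle)
```

The first pattern fails outright rather than returning a partial value, which is consistent with the blank field you saw.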

replied on November 9, 2017

Thank you! That helps a lot. I'll add that to my notes.

replied on November 9, 2017

Now I have another group of files that are named ABCD_12345_67890

The regex I was using no longer works because \d+ isolates the first set of numbers. How do I isolate the last set of numbers?

replied on November 9, 2017

\d+$

$ denotes the end of a string. Conversely, ^ denotes the start of a string.
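A quick sketch of the difference, using Python's re module as an assumed stand-in for your tool's regex engine. It also assumes the value being matched ends with the digits, with no file extension after them.

```python
import re

# File name layout taken from the question above
name = "ABCD_12345_67890"

# Without an anchor, the search stops at the first run of digits.
first_digits = re.search(r"\d+", name).group()

# With $, the run of digits has to end where the string ends.
last_digits = re.search(r"\d+$", name).group()

print(first_digits, last_digits)
```

If your names do carry an extension, the $ version would find nothing, and you would need to account for the extension in the pattern.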

replied on November 9, 2017

You've been a big help. Thanks for the helpful link. That will come in handy!

replied on November 10, 2017

You could also use a base expression like this:

[^_]+_[^_]+_[^_]+

The [^_] means any character except an underscore, and the + after it means to match as many consecutive occurrences as it finds (at least one).

Then you wrap the section that you want returned in parentheses, like this: ([^_]+), and you create 3 tokens with the expressions shown below:

First Token = ([^_]+)_[^_]+_[^_]+

Second Token = [^_]+_([^_]+)_[^_]+

Third Token = [^_]+_[^_]+_([^_]+)
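Here is a minimal sketch of the three token expressions, run with Python's re module. The file name is hypothetical and Python is only assumed as a stand-in for your tool's regex engine. Because [^_] accepts anything but an underscore, the same expressions handle letters, hyphens, and digits without changes.

```python
import re

# Hypothetical file name mixing the layouts discussed in this thread
name = "ABCD_North-East_67890"

# Same base expression each time; only the parentheses move.
first_token = re.search(r"([^_]+)_[^_]+_[^_]+", name).group(1)
second_token = re.search(r"[^_]+_([^_]+)_[^_]+", name).group(1)
third_token = re.search(r"[^_]+_[^_]+_([^_]+)", name).group(1)

print(first_token, second_token, third_token)
```

One trade-off with this approach is that it does not check what each part contains, so a name with letters where you expect the number would still match.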
