Question

Quick Fields - Pattern Matching - Need help writing regular expression

asked on July 29

Hello,

I've been staring at this for hours now and I can't figure out for the life of me how to write what feels like an easy regular expression for this. In the attached screenshot, you can see a column for an invoice number and a column for a date. The date is always the 2 digit month/2 digit day/2 digit year, and as you can see, sometimes the invoice number runs right up next to it. So what I'd like my pattern to do is look for that 2 digit month with the slash after it, and take everything that comes before it. So the first line would return 45NV08005, as an example. If I capture all those invoice numbers in my zone, what pattern would return each invoice number in a multi value token?

This has been driving me up the wall that I can't figure this out, so I'll be eternally grateful to whoever can offer some assistance. Thanks!

Replies

replied on July 29

([^\r\n]+)\d{2}\/\d{2}\/\d{2}  seems to work for me.

[^\r\n]  is any character except for newlines, so that keeps the matches from going past the end of each line.

\d{2}\/\d{2}\/\d{2}  matches the date (two digits, slash, two digits, slash, two digits). The slashes are escaped as a precaution; in most regex engines an unescaped / works as well.
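
To see how this pattern yields one value per line, here is a minimal sketch using Python's re module as a stand-in for the Quick Fields pattern matching engine (an assumption; the thread does not say which engine Quick Fields uses). The zone text is made up for illustration; only 45NV08005 comes from the question.

```python
import re

# Pattern from the reply above: capture everything on a line that
# comes before an NN/NN/NN date.
PATTERN = re.compile(r"([^\r\n]+)\d{2}\/\d{2}\/\d{2}")

# Hypothetical zone text: invoice numbers running straight into the date.
zone_text = "45NV0800507/29/24\n45NV0800607/30/24\n"

# findall returns the captured group for every match, one per line here,
# which plays the role of a multi-value token.
invoice_numbers = PATTERN.findall(zone_text)
print(invoice_numbers)
```

Because [^\r\n] cannot cross a line break, each match stays on its own line even though the whole zone is searched at once.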

replied on July 29

Hi Miruna,

That looks like it'll work. In some cases, the invoice number doesn't extend all the way to the date like it does in the screenshot, so there's some whitespace between the invoice number and the date. I tried adding a \s* in front of the first \d{2} to account for that, but it was still putting that whitespace into my token. How would I exclude the possible whitespace in front of the date?

SELECTED ANSWER
replied on July 29

That's because the "any char except..." part is greedy: it tries to match as much as possible. Since \s* means "zero or more whitespace characters", it is satisfied with matching nothing, so the whitespace ends up in the first part.

([^\r\n]+[^\r\n\s])\s*\d{2}\/\d{2}\/\d{2}   - this specifies that the last char in the matched group can't be a space (in addition to excluding newline chars). That way it stops before the space and now the space matches \s* instead.
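
The difference between the two patterns can be checked with a short sketch, again using Python's re module as a stand-in for the Quick Fields engine (an assumption). The second sample line, with spaces between the invoice number and the date, is hypothetical.

```python
import re

# Adding \s* alone: the greedy group still swallows the whitespace.
GREEDY = re.compile(r"([^\r\n]+)\s*\d{2}\/\d{2}\/\d{2}")

# Pattern from this answer: the group must end on a non-whitespace char.
TRIMMED = re.compile(r"([^\r\n]+[^\r\n\s])\s*\d{2}\/\d{2}\/\d{2}")

# Hypothetical zone text: one line with no gap, one with three spaces.
zone_text = "45NV0800507/29/24\n45NV08006   07/30/24\n"

greedy_result = GREEDY.findall(zone_text)
trimmed_result = TRIMMED.findall(zone_text)

print(greedy_result)   # second value keeps its trailing spaces
print(trimmed_result)  # both values are clean
```

One side effect of this pattern: the group now needs at least two characters (one for [^\r\n]+ and one for [^\r\n\s]), so a single-character invoice number would not match.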

replied on July 29

Thanks for the help!
