Hello,
I've been staring at this for hours and can't figure out how to write what feels like it should be an easy regular expression. In the attached screenshot, you can see a column for an invoice number and a column for a date. The date is always in two-digit mm/dd/yy format, and as you can see, sometimes the invoice number runs right up against it. What I'd like my pattern to do is find the two-digit month followed by a slash and take everything that comes before it. For example, the first line should return 45NV08005. If I capture all of those invoice numbers in my zone, what pattern would return each invoice number as a value in a multi-value token?
Not being able to figure this out has been driving me up the wall, so I'll be eternally grateful to whoever can offer some assistance. Thanks!
Question
Quick Fields - Pattern Matching - Need help writing regular expression
Replies
([^\r\n]+)\d{2}\/\d{2}\/\d{2} seems to work for me.
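[^\r\n] is any character except a carriage return or line feed, so the matches can't run past the end of each line.
\d{2}\/\d{2}\/\d{2} matches the mm/dd/yy date. The slashes are escaped because / is a special character (the pattern delimiter) in some regex flavors; escaping it is harmless where it isn't.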
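For anyone who wants to try this outside Quick Fields, here is a minimal Python sketch of the same pattern. It assumes Python's re module treats these constructs the same way the Quick Fields engine does, and the sample lines are hypothetical (only 45NV08005 comes from the question; the dates and the second invoice number are made up). findall returns one captured value per matching line, which is roughly what would end up in a multi-value token.

```python
import re

# Hypothetical zone text: invoice numbers running right into mm/dd/yy dates.
# Only 45NV08005 comes from the question; everything else is made up.
zone_text = "45NV0800503/15/21\nAB1234504/02/21\n"

# Capture everything on the line before a two-digit month, day, and year.
pattern = re.compile(r"([^\r\n]+)\d{2}\/\d{2}\/\d{2}")

# findall returns the captured group for each match (one per line here).
invoices = pattern.findall(zone_text)
print(invoices)  # ['45NV08005', 'AB12345']
```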
Hi Miruna,
That looks like it'll work. In some cases, the invoice number doesn't extend all the way to the date like it does in the screenshot, so there's some whitespace between the invoice number and the date. I tried adding a \s* in front of the first \d{2} to account for that, but it was still putting that whitespace into my token. How would I exclude the possible whitespace in front of the date?
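A minimal Python sketch reproduces this behavior, again assuming Python's re behaves like the Quick Fields engine for these constructs and using a hypothetical sample line. Even with \s* in front of the date, the captured value still ends in spaces.

```python
import re

# Hypothetical line with whitespace between the invoice number and the date.
line = "45NV08005   03/15/21"

# The attempted fix: \s* added in front of the date.
attempt = re.compile(r"([^\r\n]+)\s*\d{2}\/\d{2}\/\d{2}")

captured = attempt.findall(line)
print(repr(captured[0]))  # '45NV08005   ' (the spaces are still in the group)
```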
That's because [^\r\n]+ is greedy: it matches as much as it can, and \s* means "zero or more whitespace characters," so it is happy to match nothing and the spaces end up in the captured group.
([^\r\n]+[^\r\n\s])\s*\d{2}\/\d{2}\/\d{2} requires the last character of the captured group to be neither whitespace nor a line break. The group therefore has to stop before the spaces, and \s* matches them instead.
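Here is the revised pattern as a minimal Python sketch, again assuming Python's re behaves like the Quick Fields engine here and using hypothetical sample lines that mix both layouts. As an aside, \s already includes \r and \n, so [^\r\n\s] is equivalent to \S.

```python
import re

# Hypothetical zone text: one run-together line, one with spaces before the date.
zone_text = "45NV0800503/15/21\nAB12345   04/02/21\n"

# The group must end on a non-whitespace character, so any spaces
# before the date are matched by \s* instead of being captured.
pattern = re.compile(r"([^\r\n]+[^\r\n\s])\s*\d{2}\/\d{2}\/\d{2}")

invoices = pattern.findall(zone_text)
print(invoices)  # ['45NV08005', 'AB12345']
```

One caveat worth checking against real data: \s* can also match a line break, so if a line has no date and the next line begins with one, the earlier line could be captured. Using [ \t]* in place of \s* would keep each match on a single line.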
Thanks for the help!