Question

Regular Expression - Find text that comes after a certain string

asked on October 23, 2023

I'd like to use workflow regular expressions to grab whatever string comes after "DELIVERY CERTIFICATE" in the following example. Not sure if it's possible, but can regular expressions be smart enough to grab whatever is on that whole line? So, grab "J P Cullen & Sons Inc"?


 

Replies

replied on October 23, 2023

You can try

DELIVERY\s*CERTIFICATE\r\n(.*)\r\n
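A minimal Python sketch of what this pattern does, assuming the text layer uses Windows `\r\n` line endings. Python's `re` module stands in for the workflow regex engine here, and the sample text is a hypothetical reconstruction (only the customer name comes from the question).

```python
import re

# Hypothetical sample text with Windows (\r\n) line endings; only the
# customer name is taken from the question, the address line is made up.
text = "DELIVERY CERTIFICATE\r\nJ P Cullen & Sons Inc\r\n123 Example St\r\n"

# The heading, exactly one \r\n, then capture the rest of that line.
pattern = r"DELIVERY\s*CERTIFICATE\r\n(.*)\r\n"

match = re.search(pattern, text)
customer = match.group(1) if match else None
print(customer)  # J P Cullen & Sons Inc
```

Because the pattern demands a literal `\r\n` after the captured line, the greedy `(.*)` backs off until only the name itself is left in group 1.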

replied on October 23, 2023

Brilliant!!!!! Thank you so much!!

replied on October 23, 2023

Steve's suggestion is great, but I would recommend tweaking it slightly.

DELIVERY\s*CERTIFICATE[\r\n]+(.*)[\r\n]+

The original requires \r\n between lines, which is the standard line ending, especially on Windows. Depending on your data source, though, it may not always be \r\n (line breaks in text strings, for example, are often just \n). This rewrite accepts any run of one or more \r and \n characters, so \r\n, \n, \r, and \n\r all work.
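A small Python sketch comparing the two patterns on hypothetical text that uses bare `\n` line endings. Python's `re` module is a stand-in for the workflow engine, and the helper name `first_group` is made up for illustration.

```python
import re

ORIGINAL = r"DELIVERY\s*CERTIFICATE\r\n(.*)\r\n"
TWEAKED = r"DELIVERY\s*CERTIFICATE[\r\n]+(.*)[\r\n]+"

# Hypothetical sample text using bare \n line endings (common for text strings).
text_lf = "DELIVERY CERTIFICATE\nJ P Cullen & Sons Inc\n123 Example St\n"

def first_group(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

original_result = first_group(ORIGINAL, text_lf)  # None: the text has no \r
tweaked_result = first_group(TWEAKED, text_lf)    # 'J P Cullen & Sons Inc'
print(original_result, tweaked_result)
```

The original pattern fails outright because it insists on `\r` right after the heading, while the character class `[\r\n]+` is satisfied by the lone `\n`.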

regex101.com provides a detailed explanation of each component of the regex and what it does:

DELIVERY matches the characters DELIVERY literally (case sensitive)

\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

CERTIFICATE matches the characters CERTIFICATE literally (case sensitive)

Match a single character present in the list below [\r\n]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)

1st Capturing Group (.*)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

Match a single character present in the list below [\r\n]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)
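One caveat worth checking against your own engine: regex101's note that `.` excludes "line terminators" depends on the flavor. In Python's `re` module, and in .NET regular expressions (which I'm assuming workflow uses), `.` excludes only `\n`. So with `\r\n` line endings, the greedy `(.*)` in the tweaked pattern can keep a trailing `\r`. A sketch of the effect on hypothetical text, plus one possible variant (an assumption, not the pattern from this thread) that swaps `(.*)` for the negated class `([^\r\n]*)`:

```python
import re

TWEAKED = r"DELIVERY\s*CERTIFICATE[\r\n]+(.*)[\r\n]+"
# Hypothetical variant: [^\r\n]* can never swallow a \r or \n.
NO_CR = r"DELIVERY\s*CERTIFICATE[\r\n]+([^\r\n]*)[\r\n]+"

# Hypothetical sample text with Windows (\r\n) line endings.
text_crlf = "DELIVERY CERTIFICATE\r\nJ P Cullen & Sons Inc\r\n123 Example St\r\n"

# "." matches "\r" in Python, so greedy (.*) keeps it and the
# trailing [\r\n]+ only needs the "\n" to succeed.
with_cr = re.search(TWEAKED, text_crlf).group(1)
without_cr = re.search(NO_CR, text_crlf).group(1)
print(repr(with_cr))     # 'J P Cullen & Sons Inc\r'
print(repr(without_cr))  # 'J P Cullen & Sons Inc'
```

Trimming the captured value afterwards would work just as well; the negated class simply keeps the cleanup inside the pattern.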

 

replied on October 23, 2023

You guys are awesome. I have one more scenario to account for. Sometimes the text is going to come in like this, where I have to grab the customer name that comes after our zip code, so I'd need to grab "Bergstrom Appleton Chevrolet" here:

replied on October 23, 2023

How would you know when to grab one versus the other?

replied on October 23, 2023

If the text contains the string "GFC Leasing" anywhere, then I'll pattern match to find the customer name after the zip code.

replied on October 23, 2023

Assuming your zip code is always exactly 53711-4906, "GFC Leasing" is always exactly that (case sensitive), and "DELIVERY CERTIFICATE" is always exactly that (case sensitive), this should match either case:

DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+((?!GFC Leasing).*)[\r\n]+|DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+

From RegEx101.com:

[Screenshots: your first scenario; your second scenario]
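The `(?!GFC Leasing)` part of the combined pattern is a negative lookahead: it lets the second alternative match only when the line after the heading does not start with "GFC Leasing". A small Python sketch of that second alternative on its own, using a hypothetical first-scenario text and the first lines of the sample text posted later in this thread:

```python
import re

# Second alternative of the combined pattern, on its own.
SECOND = r"DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+"

# Line after the heading is a customer name (hypothetical first-scenario text).
plain = "DELIVERY CERTIFICATE\nJ P Cullen & Sons Inc\n123 Example St\n"
# Line after the heading is GFC Leasing (from the sample text in this thread).
gfc = "DELIVERY CERTIFICATE\nGFC Leasing - Equipment BIlling\n2675 Research Park Dr\n"

plain_match = re.search(SECOND, plain)
gfc_match = re.search(SECOND, gfc)
print(plain_match.group(1))  # J P Cullen & Sons Inc
print(gfc_match)             # None: the lookahead rejects the GFC line
```

Without the lookahead, this alternative would happily return "GFC Leasing - Equipment BIlling" for the second scenario, which is the kind of result reported in the next reply.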

replied on October 23, 2023

Looks like that just pulls in "GFC Leasing - Equipment Billing".

replied on October 23, 2023

I made an edit to that last post to fix a typo. I think you grabbed the version from before I changed it.

Can you try with this one? 

DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+((?!GFC Leasing).*)[\r\n]+|DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+

 

replied on October 23, 2023

Empty Match 
:/

replied on October 23, 2023

Hmmm...  Worked for me.  🤔

replied on October 23, 2023

Me too! I mean, what Matt provided worked for me.

Maybe provide the actual text from the text layer instead of a pic.

replied on October 23, 2023

This is what I'm using that is working, followed by the text I tested it against:

DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+((?!GFC Leasing).*)[\r\n]+|DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+


Madison
SO Date: 10/10/2023
DELIVERY CERTIFICATE
GFC Leasing - Equipment BIlling
2675 Research Park Dr
Fitchberg, WI  53711-4906
Bergstrom Appleton Chevrolet
2245 W College Ave

Note: use the  button for a cleaner view of the text I posted, and use the  button and select type as Text to post the same way that I did.
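A Python sketch that runs the combined pattern against the text posted above (joined with `\n` line endings, which is an assumption; the real text layer may use `\r\n`) and against a hypothetical first-scenario text. Since the pipe creates two alternatives, the name lands in group 1 or group 2 depending on which one matched; the helper name `customer_name` is made up for illustration.

```python
import re

# The combined pattern from this thread: GFC Leasing branch | plain branch.
COMBINED = (
    r"DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+"
    r"((?!GFC Leasing).*)[\r\n]+"
    r"|DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+"
)

# Text posted above, joined with \n line endings.
posted = "\n".join([
    "Madison",
    "SO Date: 10/10/2023",
    "DELIVERY CERTIFICATE",
    "GFC Leasing - Equipment BIlling",
    "2675 Research Park Dr",
    "Fitchberg, WI  53711-4906",
    "Bergstrom Appleton Chevrolet",
    "2245 W College Ave",
])

# Hypothetical first-scenario text.
plain = "DELIVERY CERTIFICATE\nJ P Cullen & Sons Inc\n123 Example St\n"

def customer_name(text):
    """Return the capture from whichever alternative matched, or None."""
    match = re.search(COMBINED, text)
    if match is None:
        return None
    # Group 1 belongs to the GFC branch, group 2 to the plain branch.
    return match.group(1) if match.group(1) is not None else match.group(2)

print(customer_name(posted))  # Bergstrom Appleton Chevrolet
print(customer_name(plain))   # J P Cullen & Sons Inc
```

If your activity only reads the first capture group, the plain-scenario result would appear empty, so it's worth checking which group it exposes.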

replied on October 23, 2023

Weird! It just does not work for me!


I'd be totally fine with two separate pattern matching activities; I'd just run them down a conditional branch based on whether or not "GFC Leasing" appears in the retrieved text. Would you have a pattern just for the zip code case?

replied on October 23, 2023

Your screenshot doesn't exactly match the pattern I posted. For example, it looks like you are using an opening square bracket ( [ ) where there should be a pipe ( | ). The pipe means "OR"; it separates the two alternative patterns we're checking for. That difference, along with anything else that differs, is why we are not getting the same result.

If you want to do them in separate activities, just take everything on either side of the pipe symbol as the two separate expressions:

Before:
DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+((?!GFC Leasing).*)[\r\n]+|DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+

After:
DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+((?!GFC Leasing).*)[\r\n]+
DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+
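A Python sketch of that two-activity approach, where the `if` stands in for the workflow conditional branch on "GFC Leasing" (the function and variable names are hypothetical, and the sample texts use `\n` line endings). A side benefit of splitting is that each expression has only one capturing group, so the customer name is always in group 1.

```python
import re

# The two halves of the combined pattern, split at the top-level pipe.
GFC_PATTERN = (
    r"DELIVERY\s*CERTIFICATE[\r\n]+GFC\s*Leasing.*[\s\S\r\n]+53711-4906[\r\n]+"
    r"((?!GFC Leasing).*)[\r\n]+"
)
PLAIN_PATTERN = r"DELIVERY\s*CERTIFICATE[\r\n]+((?!GFC Leasing).*)[\r\n]+"

def extract_customer(text):
    """Mimic the conditional branch: pick one pattern, then read group 1."""
    pattern = GFC_PATTERN if "GFC Leasing" in text else PLAIN_PATTERN
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Sample text from this thread, plus a hypothetical first-scenario text.
gfc_text = (
    "DELIVERY CERTIFICATE\nGFC Leasing - Equipment BIlling\n"
    "2675 Research Park Dr\nFitchberg, WI  53711-4906\n"
    "Bergstrom Appleton Chevrolet\n2245 W College Ave"
)
plain_text = "DELIVERY CERTIFICATE\nJ P Cullen & Sons Inc\n123 Example St\n"

print(extract_customer(gfc_text))    # Bergstrom Appleton Chevrolet
print(extract_customer(plain_text))  # J P Cullen & Sons Inc
```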

 
