Question

Need help pattern matching

asked on September 24, 2020

Hi all,

 

I need your help writing a regular expression.

This is my text:

Bonjour
le monde
Aurevoir

Hi
The world
Bye

I need to retrieve "le monde" and "The world".

Currently I'm using two pattern-matching expressions to get all the results:

1st : (?s)Bonjour(.*)Aurevoir

2nd : (?s)Hi(.*)Bye

 

Is it possible to merge the two expressions into one?

Something like this: (?s)Bonjour(.*)Aurevoir|(?s)Hi(.*)Bye (this one doesn't work).

 

Thanks in advance.

Regards
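A quick way to see what the two original patterns return is to run them in an ordinary regex engine. The sketch below uses Python's re module as a stand-in for the pattern-matching tool's engine (an assumption: the constructs used here behave the same in both) and writes the sample text with plain "\n" line breaks. A single "(?s)" at the very start applies to both alternatives, so the merge itself works; the catch is that each capture also holds the line breaks around the phrase.

```python
import re

# Sample text from the question, assuming "\n" line breaks.
text = "Bonjour\nle monde\nAurevoir\n\nHi\nThe world\nBye"

# One leading (?s) makes "." match line breaks in both alternatives.
merged = re.compile(r"(?s)Bonjour(.*)Aurevoir|Hi(.*)Bye")

# Each match sets only the group of the alternative that matched.
captures = [m.group(1) if m.group(1) is not None else m.group(2)
            for m in merged.finditer(text)]

print(captures)  # -> ['\nle monde\n', '\nThe world\n']
```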


Answer

SELECTED ANSWER
replied on September 24, 2020

Ok, that makes sense. Try this one instead:

 

(?:Bonjour\s*)(.*)(?:\s*Aurevoir)|(?:Hi\s*)(.*)(?:\s*Bye)
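A minimal sketch of how this expression behaves, again using Python's re module as a stand-in for the tool's engine (assumed equivalent for these constructs). The helper extract is hypothetical: because only one alternative can match at a time, exactly one of the two groups is set, and the helper returns that one. When both phrases are present, the first match found is the French one.

```python
import re

# The selected answer: one alternative per language.
pattern = re.compile(r"(?:Bonjour\s*)(.*)(?:\s*Aurevoir)|(?:Hi\s*)(.*)(?:\s*Bye)")

def extract(text):
    """Return whichever capture group matched, or None if neither did."""
    m = pattern.search(text)
    if m is None:
        return None
    # Only the group belonging to the matching alternative is set.
    return m.group(1) if m.group(1) is not None else m.group(2)

print(extract("Bonjour\nle monde\nAurevoir"))  # -> le monde
print(extract("Hi\nThe world\nBye"))           # -> The world
print(extract("Bonjour\nle monde\nAurevoir\n\nHi\nThe world\nBye"))  # -> le monde
```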


Replies

replied on September 24, 2020

You can capture them both in one expression like this:

 

(?:Bonjour\s*)(.*)(?:\s*Aurevoir\s*Hi\s*)(.*)(?:\s*Bye)

 

But it's going to combine everything into one token.

Result = "le monde The world"

 

If that's what you're looking for, that should work.
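To illustrate, here is the one-token version run with Python's re module as a stand-in engine, on the sample text with assumed "\n" line breaks. Both phrases land in two groups of a single match; joining them with a space mimics the combined result described above, which is an assumption about how the tool joins groups.

```python
import re

text = "Bonjour\nle monde\nAurevoir\n\nHi\nThe world\nBye"

# Both phrases are captured in a single match, in two groups.
m = re.search(r"(?:Bonjour\s*)(.*)(?:\s*Aurevoir\s*Hi\s*)(.*)(?:\s*Bye)", text)

# Joining the groups with a space mimics the combined result.
result = " ".join(m.groups())
print(result)  # -> le monde The world
```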

replied on September 24, 2020

Hi Dustin,

 

Thanks for your help.

This is not exactly what I want.

The result should be "le monde" OR "The world" (if the first group doesn't find anything).

replied on September 24, 2020

Ok, that makes sense. Then you just need to throw the OR operator ( | ) in the expression. Try this:

 

(?:Bonjour\s*)(.*)|(?:\s*Aurevoir\s*Hi\s*)(.*)(?:\s*Bye)

replied on September 24, 2020

Dustin,

 

Thanks for your help.

I don't understand how you got the result without the \r\n.

 

Can you explain your expression please?

replied on September 24, 2020

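Sure. The RegEx is run against all of the text at once, not line by line, so I just needed to isolate the pieces I wanted. By default "." doesn't match a line break but "\s" does, which is why the line breaks end up in the non-capture groups instead of in the result. (Single-line mode, "(?s)", only changes that one thing: it lets "." match line breaks too.)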

 

(?: ) - this is a non-capture group, so I put pieces in there that mark where my capture starts and ends but that I don't want in the result (such as "Bonjour" and "Aurevoir").

 

Let's break it down...

(?:Bonjour\s*)(.*)(?:\s*Aurevoir)

(?:Bonjour\s*) - This is a non-capture group that says to start the capture after the word "Bonjour" and any amount of whitespace (including line breaks)

(.*) - this is a capture group that captures everything up to the end of the line ("." doesn't match a line break)

(?:\s*Aurevoir) - this is a non-capture group that tells my "everything" capture where to end; it ends where any amount of whitespace is followed by the word "Aurevoir"

 

Then I threw in the OR operator ( | ) and did the same thing for the other language: start a capture after the word "Hi" and any whitespace following it, and end it where whitespace is followed by the word "Bye".
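One detail worth knowing, shown with Python's re module (assumed to treat "." and "\s" the same way as the tool's engine): "." excludes only "\n", so if the text uses Windows "\r\n" line endings, the greedy "(.*)" may keep a trailing "\r" that is invisible in most displays. A lazy "(.*?)" is one way to avoid that; it is a suggested variant, not part of the answer above.

```python
import re

# Same text with Windows-style "\r\n" line breaks (an assumption about the input).
crlf_text = "Bonjour\r\nle monde\r\nAurevoir"

greedy = re.search(r"(?:Bonjour\s*)(.*)(?:\s*Aurevoir)", crlf_text)
lazy = re.search(r"(?:Bonjour\s*)(.*?)(?:\s*Aurevoir)", crlf_text)

# "\s" matches both "\r" and "\n", which is why "Bonjour\s*" swallows the whole "\r\n".
# "." stops at "\n" but still matches "\r", so the greedy capture keeps it.
print(repr(greedy.group(1)))  # -> 'le monde\r'
# The lazy "(.*?)" stops as soon as "\s*Aurevoir" can match, leaving "\r\n" to "\s*".
print(repr(lazy.group(1)))    # -> 'le monde'
```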

replied on September 24, 2020

Thanks a lot!

replied on September 28, 2020

I am going to try hijacking this a bit. I am trying to get pattern matching to work for me, but line breaks seem to cause all sorts of problems. Is there a good way to get around them?

I am trying to find the word "Influenza", then find the next date after it and return that.

My sample text is

"

/
Patient Vaccine Administration Record
No of Immunizations 6
Vaccine Date Given Dose Location Lot No.NDC Code Manufacturer Exp. Date Given By
1.HEP B VACCINE 3 DOSE
ADULT IM 
08/06/2019 1 mL Right Deltoid 54A32 58160-0821-
52 
11/12/2021 
2.HEP B VACCINE 3 DOSE
ADULT IM 
09/06/2019 1 mL Left Deltoid 54A32 58160-0821-
52 
11/12/2021
3.HEP B VACCINE 3 DOSE
ADULT IM 
01/24/2020 1 mL Left Deltoid 5NY2K 58160-0821-
52 
GlaxoSmithKline 05/29/2022
4.Influenza Quad (Iiv4), pf,
0.5mL, 6m and up 
01/24/2020 0.5 mL Right Deltoid 5RS7Z 19515-0906-
41 
GlaxoSmithKline 06/30/2020
5.PPD 08/06/2019 0.1 mL Left Lower
Forearm 
C5587CA 49281-0752-
21 
05/24/2021

"

However, the pattern "influenza.*?(\d\d?/\d\d?/\d\d\d?\d?)" doesn't seem to work, apparently because of the line breaks. How would I get this to work?
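To see why the pattern fails, here is a sketch in Python's re module (a stand-in engine; the record below is a trimmed excerpt of the sample text with assumed "\n" line endings). Two things get in the way: matching is case-sensitive by default, so "influenza" does not match "Influenza", and "." does not match line breaks, so ".*?" cannot reach a date on a later line. Turning on case-insensitive and single-line mode with the inline flags "(?is)" is one possible fix, assuming the tool accepts inline flags the way the "(?s)" patterns earlier in the thread suggest.

```python
import re

# Trimmed excerpt of the sample record around the Influenza entry.
record = (
    "GlaxoSmithKline 05/29/2022\n"
    "4.Influenza Quad (Iiv4), pf,\n"
    "0.5mL, 6m and up \n"
    "01/24/2020 0.5 mL Right Deltoid 5RS7Z 19515-0906-\n"
)

original = r"influenza.*?(\d\d?/\d\d?/\d\d\d?\d?)"

# Case-sensitive: "influenza" never matches "Influenza".
print(re.search(original, record))          # -> None
# Ignoring case alone is not enough: ".*?" still cannot cross the line breaks.
print(re.search("(?i)" + original, record))  # -> None

# (?i) ignores case and (?s) lets "." match line breaks.
m = re.search("(?is)" + original, record)
print(m.group(1))  # -> 01/24/2020
```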

replied on September 28, 2020

Great stuff, guys. I had some downtime, so I was looking at this to keep my RegEx up to speed. I think I have a correction to the code.

Oliver said he needs to retrieve "le monde" and "The world". Those are specific strings, so I would modify the code like this:

(?:Bonjour\s*)(le monde)(?:\s*Aurevoir)|(?:Hi\s*)(The world)(?:\s*Bye)

Here is the thing: if you use the super awesome code that @████████ put together and then change the text, it returns the changed string without complaint.

In other words, in the image below I deleted the "e" from "le monde", and the result turned into "le mond" without raising any alarm that something was off.

So, if the exact text is important, enter that exact text into the RegEx; if you just need to capture whatever is on that line, use the code Dustin provided.
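The difference between the two approaches can be sketched with Python's re module (a stand-in engine; the input is the sample text with the "e" removed, as described in this post, and "\n" line breaks are assumed). The generic pattern quietly returns the altered text, while the exact-text pattern finds nothing, which makes the change visible.

```python
import re

generic = r"(?:Bonjour\s*)(.*)(?:\s*Aurevoir)|(?:Hi\s*)(.*)(?:\s*Bye)"
exact = r"(?:Bonjour\s*)(le monde)(?:\s*Aurevoir)|(?:Hi\s*)(The world)(?:\s*Bye)"

# The "e" deleted from "le monde".
typo_text = "Bonjour\nle mond\nAurevoir"

# The generic pattern still matches and returns the altered text.
print(re.search(generic, typo_text).group(1))  # -> le mond
# The exact-text pattern does not match at all.
print(re.search(exact, typo_text))  # -> None
```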

If you need the exact text, you can use tokens in the RegEx like this:

 

Hope this helps :)

replied on September 28, 2020

@████████ you can match line breaks with "\n".

replied on September 28, 2020

How would I incorporate that into the pattern? Sorry, I am super new to pattern matching and RegEx in general.

replied on September 28, 2020

@████████ I am going to mess around with it right now.

replied on September 28, 2020

Thanks!

replied on September 28, 2020

(?:Influenza.*\n.*\n)(\d\d\/\d\d\/\d\d\d\d)

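Compare mine to yours. I start from "Influenza" with a capital "I" (matching is case-sensitive unless case-insensitive matching is turned on), and I tightened the date to a two-digit month and day and a four-digit year. Also see above where Dustin explains how a non-capturing group works.

I used the word "Influenza" to tell it where to start looking (using a non-capturing group).

I then used ".*\n" twice to tell it to go down two rows. ".*" means any number of characters other than a line break, and "\n" is the line break itself.

The capture (\d\d\/\d\d\/\d\d\d\d) then grabs the date at the start of that row, so this relies on the date always sitting two rows below "Influenza".

See what I wrote above about using tokens in the RegEx. Then you can swap out the word Influenza for whatever you want.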

Hope that helps!
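A sketch of this pattern in Python's re module (a stand-in engine, run on a trimmed excerpt of the sample record with assumed "\n" line endings). Because each ".*\n" consumes exactly one line, the date is only found when it starts the line two rows below "Influenza"; the second search, on a hypothetical record with one line removed, shows what happens when a record's layout differs.

```python
import re

record = (
    "GlaxoSmithKline 05/29/2022\n"
    "4.Influenza Quad (Iiv4), pf,\n"
    "0.5mL, 6m and up \n"
    "01/24/2020 0.5 mL Right Deltoid 5RS7Z 19515-0906-\n"
)

# Skip the rest of the "Influenza" line and the whole next line, then
# capture a date that must start the third line.
pattern = r"(?:Influenza.*\n.*\n)(\d\d\/\d\d\/\d\d\d\d)"

m = re.search(pattern, record)
print(m.group(1))  # -> 01/24/2020

# If the date moves up a row, the pattern no longer matches.
shifted = record.replace("0.5mL, 6m and up \n", "")
print(re.search(pattern, shifted))  # -> None
```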

replied on September 28, 2020

That is amazing, thanks!

replied on September 28, 2020

Sorry Chris, I am going to bug you one more time.

So I need to find the employee first and last name to fill in their respective fields

Text looks like this:

Record generated by eClinicalWorks EMR/PM Software (www.eclinicalworks.com)
DOE, JANE H, F, 01/01/1975

I can grab the Last Name with (?:Record generated.*\n)(\w*) thanks to what I learned from you but I am having issues grabbing Jane for the first name.

I've tried:

(?:%(PatternMatching_Employee Last Name).*)(\w*)

and

%(PatternMatching_Employee Last Name).*?(\w*)

Neither seems to work. Is there a better way to grab "JANE"?
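Assuming the token %(PatternMatching_Employee Last Name) is replaced by the already-matched value "DOE" before the expression runs (an assumption about how the tool expands tokens), this Python sketch shows why both attempts come back empty. After "DOE", a greedy ".*" runs to the end of the line and a lazy ".*?" matches nothing, so in both cases "(\w*)" ends up matching an empty string. Skipping the comma and space with "\W*" is one way around it.

```python
import re

# Second line of the sample text; "DOE" stands in for the expanded token.
line = "DOE, JANE H, F, 01/01/1975"

# Attempt 1: greedy ".*" eats the rest of the line, so "(\w*)" matches "" at the end.
greedy = re.search(r"(?:DOE.*)(\w*)", line)
# Attempt 2: lazy ".*?" matches nothing, so "(\w*)" matches "" before the comma.
lazy = re.search(r"DOE.*?(\w*)", line)

print(repr(greedy.group(1)), repr(lazy.group(1)))  # -> '' ''

# "\W*" skips ", " so that "(\w*)" lands on the next word.
fixed = re.search(r"DOE\W*(\w*)", line)
print(fixed.group(1))  # -> JANE
```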

replied on September 28, 2020

Timothy,

 

You're on the right track. The easiest way is to build on your current expression. The (\w*) at the end of your expression grabs any number of "word" characters (letters, digits, and underscores) and stops at the first non-word character (symbols, spaces, etc.).

Essentially, your first expression grabs the first word on the second line (the one following the line break "\n"). What you want for the first name is the SECOND word on that line. To get it, add the first word "\w*" to the NON-capture group, along with any number of non-word characters "\W*" (lower-case "w" means word characters, upper-case "W" means non-word characters), then grab the next word in the capture group. It will look like this:

 

(?:Record generated.*\n\w*\W*)(\w*)
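Both expressions side by side, run with Python's re module as a stand-in engine against the sample text from the question (assuming a "\n" line break between the two lines):

```python
import re

text = (
    "Record generated by eClinicalWorks EMR/PM Software (www.eclinicalworks.com)\n"
    "DOE, JANE H, F, 01/01/1975"
)

# First word on the line after "Record generated": the last name.
last = re.search(r"(?:Record generated.*\n)(\w*)", text).group(1)
# Skip that word and the ", " after it, then capture the next word: the first name.
first = re.search(r"(?:Record generated.*\n\w*\W*)(\w*)", text).group(1)

print(last, first)  # -> DOE JANE
```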

replied on September 28, 2020

Beautiful! Thanks!

replied on September 29, 2020

Cool stuff! 

Here are a few alternatives to add into the mix:

(?:Record generated.*\n)(\w*,\s*\w*) = DOE, JANE (comma separated)

(?:Record generated.*\n)(\w*),(\s+\w*) = DOE JANE (space separated)

^Just a reminder that you can capture more than one string at a time and then use the token filter to parse out what you need. That could come in handy for future challenges.
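The two alternatives can be checked the same way with Python's re module as a stand-in engine. In this sketch the second pattern returns two groups, and the second group keeps the space after the comma, so how the result ends up "space separated" depends on how the tool joins or trims groups (not something the thread specifies).

```python
import re

text = (
    "Record generated by eClinicalWorks EMR/PM Software (www.eclinicalworks.com)\n"
    "DOE, JANE H, F, 01/01/1975"
)

# One group holding both names, comma included.
both = re.search(r"(?:Record generated.*\n)(\w*,\s*\w*)", text)
print(both.group(1))  # -> DOE, JANE

# Two groups: the comma sits outside them, but group 2 keeps the leading space.
split = re.search(r"(?:Record generated.*\n)(\w*),(\s+\w*)", text)
print(split.groups())  # -> ('DOE', ' JANE')
```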
