Question

Regex - Pattern matching

asked on December 9, 2020

I'm looking for some regex recommendations. I need to pull the name from the following address block; the challenge is that the first two lines are not always there or consistent.

 

First Example:

005874

0BBXV

John Smith

PO Bok 123

Fake place MB R1J 3k9

 

Second Example:

5641 P30

John Smith

123 Fake St 

Winnipeg MB

R4K 1Y7

 

Last Example

00587123565               123456

John Smith

123-1234 Fake St 

Winnipeg MB R1C 4K9

 

 

Replies

replied on December 9, 2020

As long as there are no missing examples here, the obvious pattern is the first line that begins with a non-numeric character.

([^0-9][^\r\n]+)

This says: look for a non-numeric character, then capture everything up to the next carriage return or line feed.
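To see what this pattern captures in practice, here is a minimal sketch using Python's `re` module. The choice of Python, the helper name `first_capture`, and the LF-only line endings are all assumptions for illustration; the engine used in the original product may behave differently.

```python
import re

# The pattern from the reply above, unchanged.
UNANCHORED = re.compile(r"([^0-9][^\r\n]+)")

# First and second examples from the question, assuming LF-only line endings.
example_1 = "005874\n0BBXV\nJohn Smith\nPO Bok 123\nFake place MB R1J 3k9"
example_2 = "5641 P30\nJohn Smith\n123 Fake St \nWinnipeg MB\nR4K 1Y7"


def first_capture(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# The match can start anywhere, including on a line break or mid-line.
print(repr(first_capture(UNANCHORED, example_1)))  # '\n0BBXV'
print(repr(first_capture(UNANCHORED, example_2)))  # ' P30'
```

In this sketch the pattern is not tied to the start of a line, so the first non-digit it finds may be a line break or a space inside one of the code lines rather than the first letter of the name.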

replied on December 9, 2020

The problem is that it doesn't always seem to register the line break and the carriage return. Plus, you'll see in the first example that there are non-numeric characters in the number, meaning that it pulls BBXV as the person's name.

replied on December 9, 2020

Oh yes, of course. You need to check for the LF/CR right before:

[\r\n]([^0-9][^\r\n]+)

I have not had any issues with the characters not being detected, except sometimes in the tester windows of LF.
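As a rough check of the anchored pattern against all three examples from the question, here is a small Python sketch. It assumes LF-only line endings, and `extract_name` is a hypothetical helper name, not part of any product.

```python
import re

# The anchored pattern from the reply above, unchanged.
ANCHORED = re.compile(r"[\r\n]([^0-9][^\r\n]+)")

# The three address blocks from the question, assuming LF-only line endings.
examples = [
    "005874\n0BBXV\nJohn Smith\nPO Bok 123\nFake place MB R1J 3k9",
    "5641 P30\nJohn Smith\n123 Fake St \nWinnipeg MB\nR4K 1Y7",
    "00587123565               123456\nJohn Smith\n123-1234 Fake St \nWinnipeg MB R1C 4K9",
]


def extract_name(text):
    """Return the first line (after the first) that does not start with a digit."""
    match = ANCHORED.search(text)
    return match.group(1) if match else None


for text in examples:
    print(extract_name(text))  # John Smith for each example
```

Because the pattern needs a line break before the captured line, it would not find a name sitting on the very first line of the block. None of the three examples has that shape, but it may be worth testing if such input is possible.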

replied on December 9, 2020

But how do I get it to ignore the characters in the example below?

 

First Example:

005874

0BBXV

John Smith

PO Bok 123

Fake place MB R1J 3k9

 

I tried a similar pattern and the client's name kept coming back as BBXV.

replied on December 9, 2020

0BBXV isn't a match for the pattern [\r\n]([^0-9][^\r\n]+) because we are specifying that the line must start with a non-numeric character, and that line starts with 0.
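One caveat worth checking: if the text uses CR+LF line endings, the line feed itself counts as a non-numeric character, so the pattern above can still land on the 0BBXV line. The Python sketch below illustrates this. The name `STRICTER` and its pattern are a suggested variant, not something from the thread, and other regex engines may differ.

```python
import re

# The pattern from the reply above, unchanged.
ORIGINAL = re.compile(r"[\r\n]([^0-9][^\r\n]+)")

# Hypothetical variant: the first captured character may not be a digit, CR, or LF.
STRICTER = re.compile(r"[\r\n]([^0-9\r\n][^\r\n]+)")

# First example from the question, assuming CR+LF line endings.
crlf_text = "005874\r\n0BBXV\r\nJohn Smith\r\nPO Bok 123\r\nFake place MB R1J 3k9"

# [\r\n] consumes the CR, then [^0-9] accepts the LF as a "non-digit".
print(repr(ORIGINAL.search(crlf_text).group(1)))  # '\n0BBXV'

# Excluding CR and LF from the first character forces a real line start.
print(repr(STRICTER.search(crlf_text).group(1)))  # 'John Smith'
```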

replied on December 9, 2020

Ah, OK, I see. Thanks, I will give that one a try!

replied on December 10, 2020

It worked, kinda. I had to modify it. I ended up with:

 

[\r\n]([^0-9][^\r\n]+) ([A-Z\W? ]*)\s*\r\n 

 

It's not perfect: it deleted the space between John and Smith. But it's the closest I have gotten, and it worked with all three examples above.
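One possible reason for the lost space is that the modified pattern splits the name across two capture groups with a literal space between them. That is only a guess, since the thread does not say how the tool joins groups. Here is a hedged Python sketch of a single-group alternative that keeps the inner space and trims trailing blanks. `NAME_LINE` and `extract_name` are hypothetical names, and the lookahead syntax is assumed to be available in the target engine.

```python
import re

# One capture group for the whole name line:
#   (?:^|[\r\n])      start of text or a preceding line break
#   [^0-9\s]          first character is neither a digit nor whitespace
#   [^\r\n]*?         rest of the line, as little as needed
#   [ \t]*(?=\r?\n|$) optional trailing blanks, left outside the group
NAME_LINE = re.compile(r"(?:^|[\r\n])([^0-9\s][^\r\n]*?)[ \t]*(?=\r?\n|$)")


def extract_name(text):
    """Return the first line that does not start with a digit, without trailing blanks."""
    match = NAME_LINE.search(text)
    return match.group(1) if match else None


# Second example from the question, with CR+LF endings and trailing spaces added.
sample = "5641 P30\r\nJohn Smith  \r\n123 Fake St \r\nWinnipeg MB\r\nR4K 1Y7"
print(repr(extract_name(sample)))  # 'John Smith'
```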
