Question

Regex - How to pattern match starting from the end of the string instead of the front?

asked on September 13, 2022

Hello,

I am pattern matching on an application we receive from outside folks. I was able to get the pattern to match if everything follows the first example below; however, we see that some folks put apartment information in with a comma like the second example causing issues. Is there a way to start from the end of the string instead of the front since the end is always the same? I'm new at regex, and any help is greatly appreciated. 

Regex I'm using to parse the State:

ADDRESS:\(Street, City, State/Province, Zip/Postal Code\)\r(?:[^,]+,){2}\s(\w+)

Example 1: 

ADDRESS:(Street, City, State/Province, Zip/Postal Code)
12345978 9th Ave NE, Everett, Washington 98204 

Example 2:

ADDRESS:(Street, City, State/Province, Zip/Postal Code)
12345978 9th Ave NE, Apt #403, Everett, Washington 98204 

 

Updating to show all data being parsed that is OCR'd when brought into Laserfiche.

 

EMPLOYMENT APPLICATION 
CITY OF MARYSVILLE
1049 State Avenue
Marysville, Washington 98270
http://marysvillewa.gov 
Smith, John Joe
00574 POLICE OFFICER - ENTRY LEVEL 
Received: 8/30/22 5:19
PM
For Official Use Only:
QUAL:_________
DNQ:__________
   Experience
   Training
   Other:______
PERSONAL INFORMATION
POSITION TITLE:
POLICE OFFICER - ENTRY LEVEL 
EXAM ID#:
00123
NAME:(Last, First, Middle)
Smith, John Joe
SOCIAL SECURITY NUMBER:
123-00-9789
ADDRESS:(Street, City, State/Province, Zip/Postal Code)
1234 56th Place Southwest, Apt# 123,  Lake Stevens, Washington 98258 
EMAIL ADDRESS:
john.smithtacos99@gmail.com
HOME PHONE:
13601234567
NOTIFICATION PREFERENCE:
Email
DRIVER'S LICENSE:
Yes No 
DRIVER'S LICENSE:
State:WA Number:WDLXXXXXXXXXB  
LEGAL RIGHT TO WORK IN THE UNITED STATES?
Yes No
What is your highest level of education?
Bachelor's Degree 
PREFERENCES
WHAT TYPE OF JOB ARE YOU LOOKING FOR?
Regular
TYPES OF WORK YOU WILL ACCEPT:
Full Time
SHIFTS YOU WILL ACCEPT:
Day,Evening,Night,Rotating,Weekends,On Call (as needed)
OBJECTIVE:
Find a new career.
EDUCATION
DATES:
From: 9/2019 To: 9/2021 
SCHOOL NAME:
Seattle University
LOCATION:(City, State/Province)
Seattle , Washington 
DID YOU GRADUATE?
Yes No 
DEGREE RECEIVED:
Bachelor's
MAJOR:
Taco Creation 
UNITS COMPLETED:
90 - Quarter
DATES:
From: 1/2019 To: 8/2019 
SCHOOL NAME:
Highline College
LOCATION:(City, State/Province)
Lake Stevens , Washington 
DID YOU GRADUATE?
Yes No 
DEGREE RECEIVED:
Associate's
MAJOR:
Taco Creation emphasis on burritos
UNITS COMPLETED:
45 - Quarter
DATES:
From: 4/2018 To: 12/2018 
SCHOOL NAME:
Renton Technical College
LOCATION:(City, State/Province)
Renton , Washington 
DID YOU GRADUATE?
Yes No 
DEGREE RECEIVED:
Associate's
MAJOR:
General 
UNITS COMPLETED:
55 - Quarter
DATES:
From: 4/2021 
SCHOOL NAME:
Seattle University
LOCATION:(City, State/Province)
Seattle , Washington 
DID YOU GRADUATE?
Yes No 
DEGREE RECEIVED:
Master's
MAJOR:
Criminal Justice
WORK EXPERIENCE
DATES:
From: 5/2017 To: 5/2018 
EMPLOYER:
Johns Best Tacos
POSITION TITLE:
Store Manager
ADDRESS:(Street, City, State/Province, Zip/Postal Code)
1234 Johns Taco Street, Suite #101,  Arlington, Washington, 98233
PHONE NUMBER:
360-123-4567 
SUPERVISOR:
Billy Joel Owner
MAY WE CONTACT THIS EMPLOYER?
Yes No
John Smith Person ID: 12345678 Received: 8/30/22 5:19 PM

 

 

Thank you,

 

Brandon

 


Replies

replied on September 13, 2022

You can use the Lookahead feature for this, looking ahead for the Washington state info and grabbing the city name just before.

For example:

([^,]+), (?=Washington 98204)

Lookahead is (?=Pattern that exists immediately following your expression)

https://www.rexegg.com/regex-lookarounds.html
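
The lookahead above hard-codes "Washington 98204". Here is a minimal sketch of the same idea, generalized so the lookahead expects any single-word state followed by a 5-digit ZIP at the end of the line. It uses Python's re module purely for illustration; the engine behind your tool may behave differently, and the name extract_city is hypothetical. It also assumes the value being matched is just the one address line.

```python
import re

# Capture the comma-delimited field that sits immediately before
# "<State> <5-digit ZIP>" at the end of the line. The lookahead checks
# for the state and ZIP without consuming them.
# Assumption: the state is a single word (e.g. "Washington").
CITY_BEFORE_STATE = re.compile(r"([^,]+),\s*(?=\w+\s+\d{5}\s*$)")


def extract_city(address_line):
    """Return the city from a one-line address, or None if nothing matches."""
    match = CITY_BEFORE_STATE.search(address_line)
    return match.group(1).strip() if match else None


print(extract_city("12345978 9th Ave NE, Everett, Washington 98204 "))
print(extract_city("12345978 9th Ave NE, Apt #403, Everett, Washington 98204 "))
```

Because the lookahead is tied to the end of the line, the extra "Apt #403" field earlier in the address does not shift the result.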

replied on September 14, 2022

Thank you Chad. I'll take a look, this might be exactly what I need. :) I'll take a look at the URL as well to learn more about this. 

 

Thank you,


Brandon 

replied on September 13, 2022

Not the exact answer you are looking for, but I use this site to test out a regex pattern:  https://regex101.com/  

replied on September 14, 2022

Thanks Steven. I'll take a look at this to try and learn more about regex. :) 

replied on September 13, 2022

I don't think we have quite enough information about your situation, but I'll ask some questions and take a shot at it based on what we know.

1) What application are you using this in? (this determines both which regex engine is being used and also whether there are other options to help you, e.g., workflow activities or QF processes)

2) What exactly are you trying to capture?  Based on your pattern, I'm inferring that you want to capture something like "Washington 98204" from the address?

3) You can't start from the end and work backwards, at least that I'm aware of, but I don't think you need to in this case.  You can anchor your expression to the end of the line, but I don't think that helps you very much in this case (it might, which is why I mention it, but I think it's unlikely).

4) One option is to enhance your regex to account for, and ignore, the apartment information.

5) Parsing addresses is really hard because there's a lot of variability in them.  If you can narrow the scope, that helps a lot (e.g., by using a Zone OCR in Quick Fields).

Here's the regular expression I came up with, though I'll warn you that it is not robust:

ADDRESS:[^\r\n]+\r?\n.*,\s?(\w+\s\d{5})

What it's doing: Match ADDRESS plus some other stuff (you probably don't need to be so exacting as in your post) until you reach a new line.  Skip the new line.  Skip a bunch of stuff until you find this specific pattern.  Capture this specific pattern, where this specific pattern is a comma + word + 5 digit number. 

If you need to handle ZIP+4 codes, replace \d{5} with \d{5}(?:\-\d{4})?
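
As a rough check of the expression above, here is a small sketch in Python's re module (not the engine Workflow uses, so treat it as an approximation). It includes the optional ZIP+4 suffix. The function name extract_state_zip and the trimmed sample text are made up for the example.

```python
import re

# Header line, a line break, then the last ", <word> <ZIP>" on the next line.
# The optional (?:-\d{4})? group also accepts a ZIP+4 suffix.
STATE_ZIP = re.compile(r"ADDRESS:[^\r\n]+\r?\n.*,\s?(\w+\s\d{5}(?:-\d{4})?)")


def extract_state_zip(text):
    """Return the 'State ZIP' text from the first ADDRESS block that matches."""
    match = STATE_ZIP.search(text)
    return match.group(1) if match else None


# Trimmed-down stand-in for the OCR text of an application.
sample = (
    "SOCIAL SECURITY NUMBER:\n"
    "123-00-9789\n"
    "ADDRESS:(Street, City, State/Province, Zip/Postal Code)\n"
    "1234 56th Place Southwest, Apt# 123,  Lake Stevens, Washington 98258 \n"
    "EMAIL ADDRESS:\n"
)
print(extract_state_zip(sample))
```

One way the pattern is not robust: it expects a single word between the last comma and the ZIP, so a two-word state such as "New York" would not match in this sketch.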

replied on September 14, 2022

Hello Jacob, 

 

1. This is for a job application that is coming through from NeoGov. We are using workflow to pattern match and make tokens that we can then use to populate a sql table and then we use it as a lookup table in forms to auto-fill. 

2. We are capturing several things. We actually capture the entire address line, but separate them out into different tokens to be saved into the sql table. 

3. I'm new to regex so I'm not sure how I would go about anchoring it.

4. We would need the apt information as all this information is used to do a background check for our police department folks that are applying for jobs.

5. I haven't used Quick Fields, but if that's an option I can look into that as well.

 

I appreciate the help, and I'll try the regex you came up with and see how that works. :) 


Thank you,

 

Brandon

replied on September 14, 2022

Yes, there is. $ marks the end of the input and you can specify a pattern before that. Something like this may work to get the state:

([a-z]+)\s+\d{5}\s?$

(i.e., "one word before the 5-digit zip code at the end", assuming the example above is the whole value we're working with). This was made in Workflow and it was marked case insensitive. The same regular expression will work in Quick Fields too.

You may need to tweak it to account for potentially having the optional 4 digits in the zip code.
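
To illustrate the end anchor together with that tweak, here is a minimal sketch in Python's re module (an approximation only, since Workflow and Quick Fields use their own engine). The re.IGNORECASE flag stands in for the case-insensitive setting, the optional group allows the extra 4 digits, and the name extract_state is hypothetical. It assumes the value being matched is only the address line.

```python
import re

# One word, whitespace, a 5-digit ZIP with an optional "-1234" suffix,
# an optional trailing space, and then the end of the input ($).
STATE_AT_END = re.compile(r"([a-z]+)\s+\d{5}(?:-\d{4})?\s?$", re.IGNORECASE)


def extract_state(address_line):
    """Return the word just before the trailing ZIP code, or None."""
    match = STATE_AT_END.search(address_line)
    return match.group(1) if match else None


print(extract_state("12345978 9th Ave NE, Apt #403, Everett, Washington 98204 "))
print(extract_state("12345978 9th Ave NE, Everett, Washington 98204-1234"))
```

In this sketch, an address with a comma between the state and the ZIP, like the employer address in the sample application, would not match and would need a further tweak.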

replied on September 14, 2022

Hello Miruna,

 

The data I'm parsing is an entire application out of NeoGov; this is just one piece of the application that I have been struggling with in Workflow. I'll have to take a look at Quick Fields, I saw that Jacob suggested it as well. If needed, I can get an empty application and post the data I'm searching against if that makes it easier? 

 

Thank you for your help.

 

Brandon

 

 
