Question

Sample RegEx for Pattern Matching OCR

asked on April 9, 2019

Hi All

I'm working on a new QF OCR job and want to use Pattern Matching against full-page OCR to pull strings of data from the OCR text.

We normally use Zone OCR, but the source files for this job have an inconsistent layout, so Zone OCR is too specific.

See attached snippet.

Want the values after:

Carrier Code: - 4 digits, I have this one working

PARS#: - wanting the full string

Port of Entry: want the first 3 digits, and then the multi-word string after the -

ETA: date string and a time string

Submitted By: just the string (could be multi word)

 

I'm hoping someone can give me a few pointers that I can build on.

Jeff

screenshotb.png
screenshota.png

Replies

replied on April 9, 2019

PARS#: 471347926923062R0

PARS\s*#:\s*([A-Za-z0-9]+)
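
As a quick sanity check outside the QF job, the sketch below runs a PARS pattern through Python's `re` module. Python is only a stand-in here, and the Laserfiche pattern-matching engine may differ in details. The surrounding OCR text and the variable names are made up for illustration; only the PARS value comes from the post. It also shows one thing worth knowing when writing a character class: a comma inside `[...]` is a literal comma, not a separator.

```python
import re

# Hypothetical OCR text; only the PARS value is taken from the post.
ocr_text = "Carrier Code: 1234\nPARS#: 471347926923062R0\nPort of Entry: 427 - NIAGRA FALLS"

# Letters and digits only; '+' requires at least one character.
pars_pattern = re.compile(r"PARS\s*#:\s*([A-Za-z0-9]+)")
pars_match = pars_pattern.search(ocr_text)
pars_value = pars_match.group(1) if pars_match else None
print(pars_value)

# A comma inside [...] is literal, so this class also accepts commas.
comma_class = re.compile(r"[A-Z,a-z,0-9]*")
comma_value = comma_class.match("47,13").group(0)
print(comma_value)
```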

Port of Entry: 427 - NIAGRA FALLS

Use two regular expressions to capture the 3 digits and then the word(s). 

Port of Entry:\s*(\d{3})
Port of Entry:\s*\d{3}\s*-\s*([\w\s]+)
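
One thing to watch with full-page OCR is that `\s` also matches line breaks, so `[\w\s]+` can run on into the next line. The Python sketch below illustrates this with a made-up two-line sample, assuming the engine treats `\s` the way Python's `re` does (the post does not confirm this for Laserfiche). It also captures both values in a single pattern with two groups; whether that suits the QF token setup is an assumption, which is why the two-pattern approach above may still be the practical one.

```python
import re

# Hypothetical OCR text with the ETA label on the following line.
ocr_text = "Port of Entry: 427 - NIAGRA FALLS\nETA: 04/08/2019 08:24 PM"

# \s matches the newline, so the capture continues into the next line.
greedy = re.search(r"Port of Entry:\s*\d{3}\s*-\s*([\w\s]+)", ocr_text)
greedy_name = greedy.group(1)
print(repr(greedy_name))

# A literal space and tab instead of \s keep the capture on one line.
# Group 1 is the 3-digit code, group 2 is the port name.
one_line = re.search(r"Port of Entry:\s*(\d{3})\s*-\s*([\w \t]+)", ocr_text)
port_code = one_line.group(1)
port_name = one_line.group(2).strip()
print(port_code, port_name)
```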

ETA: 04/08/2019 08:24 PM

Again, use two regular expressions to capture the date and time elements as tokens.

ETA:\s*(\d{1,2}/\d{1,2}/\d{4})
ETA:\s*\d{1,2}/\d{1,2}/\d{4}\s*(\d{1,2}:\d{2}\s*\w{2})
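
If the two ETA tokens need to be combined downstream, a minimal Python sketch could look like the following. It assumes month/day/year order and a 12-hour clock with AM/PM, which the sample line suggests but does not prove, and it narrows the AM/PM part to `[AP]M` as an illustration. The variable names are hypothetical.

```python
import re
from datetime import datetime

eta_line = "ETA: 04/08/2019 08:24 PM"

# First pattern captures the date, second skips the date and captures the time.
date_match = re.search(r"ETA:\s*(\d{1,2}/\d{1,2}/\d{4})", eta_line)
time_match = re.search(r"ETA:\s*\d{1,2}/\d{1,2}/\d{4}\s*(\d{1,2}:\d{2}\s*[AP]M)", eta_line)
eta_date = date_match.group(1)
eta_time = time_match.group(1)

# Assumption: MM/DD/YYYY and a 12-hour clock.
eta = datetime.strptime(f"{eta_date} {eta_time}", "%m/%d/%Y %I:%M %p")
print(eta.isoformat())
```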

Submitted by: FIS

Submitted by:\s*([\w\s]+)
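
Where the parentheses sit decides what comes back: a group around the whole pattern returns the label as well, while a group around the value returns just the name. The Python sketch below shows the difference and uses a case-insensitive match, since the label appears as both "Submitted By:" and "Submitted by:" in this thread. Python's `re.IGNORECASE` is used for illustration; how case sensitivity is configured in the QF job is not covered by the post.

```python
import re

# Label capitalised as in the question; the value is the sample from the reply.
ocr_text = "Submitted By: FIS"

# Group around everything: the label is part of the capture.
whole = re.search(r"(Submitted by:\s*[\w ]+)", ocr_text, re.IGNORECASE)
whole_text = whole.group(1)

# Group around the value only: just the name comes back.
value_only = re.search(r"Submitted by:\s*([\w ]+)", ocr_text, re.IGNORECASE)
submitter = value_only.group(1).strip()

print(whole_text)
print(submitter)
```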

 
