Question

Pattern Matching: Remove OCR "Artifacts" from Number String

asked on February 12, 2021

Hello, I was wondering if there's an easy RegEx pattern to remove random "artifacts" that sometimes get pulled in by OCR.

For example, I'm trying to extract a Sales Order number from a Bill of Lading document in QuickFields, but the image quality is not great, so I'm getting "014158'1" instead of "0141581". I know (\d*)[',.](\d*) would work to ignore one artifact in the string, but I'm concerned that in production the OCR may insert multiple random non-digit characters anywhere within the number string.

Is there any way to tell QuickFields to identify a string of numbers and remove anything within that string that is a non-number?

Here's the example text I'm extracting from if it helps:

SHIPPER NUMBER I
014158'1 
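
For anyone who wants to test the idea outside QuickFields, here is a minimal Python sketch of the general two-step approach: first match a token that starts and ends with a digit and may contain stray punctuation in between, then strip every non-digit from that token. The helper name `clean_number`, the artifact set `',.` plus backtick, and the `min_digits` threshold are all assumptions made for illustration; none of this is a QuickFields API, and whether QuickFields offers an equivalent replace step is not confirmed here.

```python
import re

# Assumption: artifacts are limited to apostrophe, comma, period and backtick.
# A token starts with a digit, may contain digits or artifacts, and ends with a digit.
TOKEN = re.compile(r"\d(?:[\d',.`]*\d)?")


def clean_number(text, min_digits=5):
    """Return the first digit run in text with artifacts removed, or None.

    min_digits is a hypothetical threshold that skips short stray numbers.
    """
    for match in TOKEN.finditer(text):
        # Step 2: drop everything in the matched token that is not a digit.
        digits = re.sub(r"\D", "", match.group())
        if len(digits) >= min_digits:
            return digits
    return None


sample = "SHIPPER NUMBER I\n014158'1 \n"
print(clean_number(sample))  # prints 0141581
```

Because the second step removes all non-digits from the token, it handles any number of artifacts in any position, which a fixed pattern with one punctuation slot cannot do.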

Replies

replied on February 15, 2021

Hi Grant,

Using the OmniPage OCR Zone, you can specify the character type and select "number" as the option.

replied on February 15, 2021

Hey Olivier, thanks for the reply.  I just tried that and it's still picking up a non-existent apostrophe.

 

I also tried lowering the optimization from accuracy to balanced and still got the same result.

replied on February 16, 2021

Hi Grant, maybe you should check the other settings (I saw somewhere that you can change the reliability).
