Pattern Matching: Remove OCR "Artifacts" from Number String

asked on February 12, 2021

Hello, I was wondering if there's an easy RegEx pattern to remove random "artifacts" that sometimes get pulled in by OCR.

For example, I'm trying to extract a Sales Order number from a Bill of Lading document in QuickFields, but the image quality is not great, so I'm getting " 014158'1 " instead of " 0141581 ". I know (\d*)[',.](\d*) would work to skip a single artifact in the string, but I'm concerned that in production the OCR may insert multiple random non-digit characters anywhere within the number string.

Is there any way to tell QuickFields to identify a string of numbers and remove anything within that string that is a non-number?

Here's the example text I'm extracting from if it helps:

SHIPPER NUMBER I
014158'1

0 0
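
For illustration, the general idea (find a digit run that may contain stray characters, then strip everything that isn't a digit) can be sketched in Python. This is not a QuickFields feature or setting; it assumes the extracted text can be post-processed somewhere that supports regex substitution. The function names and the set of artifact characters (apostrophe, comma, period) are assumptions taken from the single-artifact pattern above.

```python
import re
from typing import Optional

# Assumed artifact characters: apostrophe, comma, period (from the pattern in the question).
# A candidate starts and ends with a digit and may contain artifacts in between.
CANDIDATE = re.compile(r"\d[\d',.]*\d")

def clean_ocr_number(raw: str) -> str:
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)

def extract_order_number(text: str) -> Optional[str]:
    """Return the first digit run in text with artifacts removed, or None if there is none."""
    match = CANDIDATE.search(text)
    if match is None:
        return None
    return clean_ocr_number(match.group(0))

sample = "SHIPPER NUMBER I\n014158'1\n\n0 0"
print(extract_order_number(sample))
```

Because the cleanup step removes all non-digits rather than capturing around one of them, it handles any number of artifacts in the string, which a fixed two-group pattern cannot.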

replied on February 15, 2021

Hey Olivier, thanks for the reply. I just tried that and it's still picking up a non-existent apostrophe.

I also tried lowering the optimization from accuracy to balanced and still got the same result.

replied on February 16, 2021

Hi Grant, maybe you should check the other settings (I saw somewhere that you can change the reliability).
