Question

Quick Fields OCR inserts spaces between digits in numbers

asked on March 26, 2015

I'm currently trying to extract information from some printed and scanned documents. I'm noticing that the OCR process in Quick Fields isn't very reliable, even in high-accuracy mode.

One significant problem is that it keeps inserting a space between the digits of numbers.

Since some of our business logic relies on these numbers, what is the suggested way to handle this problem?

Answer

SELECTED ANSWER
replied on March 26, 2015

Run pattern matching on the OCR value to capture everything except the spaces: use ([^\s]+) with the match option set to all matches.
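To illustrate what this does outside Quick Fields, here is a minimal Python sketch. It assumes that the values captured by ([^\s]+) with all matches are joined back together with no separator; the function name join_non_space_matches is hypothetical and not part of any Laserfiche API.

```python
import re

def join_non_space_matches(ocr_text: str) -> str:
    # Hypothetical helper: find every run of non-whitespace characters,
    # assuming the "all matches" results are concatenated with no separator
    matches = re.findall(r"([^\s]+)", ocr_text)
    return "".join(matches)

print(join_non_space_matches("5 9 2 2 MAR1 2"))  # 5922MAR12
```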

Replies

replied on March 26, 2015

You can correct the inaccurate OCR text by applying a Regular Expression. For a token that has two digits with an extra space between them, a simple Regular Expression you could use is:

(\d)\s*(\d)

This Regular Expression eliminates any whitespace between two digits; for example, it turns '1 2' into '12'.
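The expression by itself only matches; to drop the space, each match has to be replaced with the two captured digits. The Python sketch below uses re.sub rather than Quick Fields syntax, and both function names are hypothetical. It also shows one caveat: because each match consumes both digits, a single pass over a longer run such as '5 9 2 2' leaves every other gap. A lookbehind/lookahead variant avoids consuming the digits, so one pass closes every gap.

```python
import re

def join_digit_pair(text: str) -> str:
    # The reply's regex: capture two digits with optional whitespace between,
    # then write the two captured digits back without the whitespace
    return re.sub(r"(\d)\s*(\d)", r"\1\2", text)

def join_digit_runs(text: str) -> str:
    # Remove whitespace only when it sits between two digits; the lookarounds
    # do not consume the digits, so adjacent gaps are all handled in one pass
    return re.sub(r"(?<=\d)\s+(?=\d)", "", text)

print(join_digit_pair("1 2"))             # 12
print(join_digit_pair("5 9 2 2"))         # 59 22
print(join_digit_runs("5 9 2 2 MAR1 2"))  # 5922 MAR12
```

The space before 'MAR' survives in the last call because it is not between two digits, so for a value like '5 9 2 2 MAR1 2' removing all whitespace is the simpler fix.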

replied on March 26, 2015

Thanks for the reply.

Can you be more specific?

Here is my use case.

I'm using pattern matching to match a string from the page, which is '5922MAR12'.

Ideally, the matching regular expression would be '\d{4}\w{3}\d+', but in my case the OCR returns the string '5 9 2 2 MAR1 2'.

If I want a token with the value '5922MAR12' to use in my file name, what should I do?
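One way to produce that token is to remove all whitespace first and then apply the pattern from the question. This is a minimal Python sketch, assuming the OCR zone contains mainly this identifier; extract_token is a hypothetical name, not a Quick Fields feature.

```python
import re
from typing import Optional

# Pattern from the question: four digits, three word characters, then one or more digits
TOKEN_PATTERN = re.compile(r"\d{4}\w{3}\d+")

def extract_token(ocr_text: str) -> Optional[str]:
    # Remove every whitespace character so OCR-inserted spaces cannot split the match
    compact = re.sub(r"\s+", "", ocr_text)
    match = TOKEN_PATTERN.search(compact)
    return match.group(0) if match else None

print(extract_token("5 9 2 2 MAR1 2"))  # 5922MAR12
```

Because \d+ is greedy, stripping whitespace from a zone that also contains other digits can glue them onto the token ('5 9 2 2 MAR1 2 34' yields '5922MAR1234'), so it helps to keep the OCR zone tight around the value.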

replied on March 27, 2015

Running pattern matching with ([^\s]+) set to all matches works.
