Question

Token Formatting Help

asked on May 24, 2019

I am having an issue with consistency.

I am OCRing a voucher amount. Usually the decimal point is picked up, but sometimes it is missed and sometimes it is not present at all. I need to apply some formatting so that the OCR value always matches the value coming from the database.

DB value = ######.##

OCR value standard ###,###.##

OCR value sometimes ###,### ## (with a space where the decimal point should be)

I have gone around in circles changing the formatting and field type, with no luck.

How can I handle this? Please assist.

Answer

SELECTED ANSWER
replied on May 28, 2019

Unfortunately, if OCR does not correctly identify all of the characters, you won't be able to repair that easily. However, you might be able to use the Substitution process to remove all of the non-digit characters, such as commas and spaces...

Then split the remaining digits and put them back together with the decimal point where it should be (before the last two digits, to match the DB format).

You could also do the same thing with scripting, if you don't want to use the regular processes.
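
As a rough illustration of the strip-and-reinsert idea, here is a minimal Python sketch. It is not the Substitution process or a script for the product itself; the function name `normalize_amount` is made up for this example, and it assumes the last two digits of the OCR value are always the cents. If the OCR engine drops digits, or the amount is a whole number with no cents printed, this assumption breaks and the result will be wrong.

```python
import re


def normalize_amount(raw: str) -> str:
    """Return the amount as ######.## (hypothetical helper).

    Assumption: the last two digits of the OCR text are always the cents.
    """
    # Drop everything that is not a digit: commas, spaces, the decimal point.
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits found in {raw!r}")
    # Pad short values so there is always a whole part and two cent digits.
    digits = digits.zfill(3)
    whole, cents = digits[:-2], digits[-2:]
    # int() removes any leading zeros from the whole part.
    return f"{int(whole)}.{cents}"


if __name__ == "__main__":
    for sample in ("123,456.78", "123,456 78", "12345678"):
        print(sample, "->", normalize_amount(sample))
```

All three sample strings come out in the same form, so the value can be compared directly against the database value.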

If the original string matches the format you've labeled "OCR value standard" then you can use the Fixed Point format specifier to strip out the commas.
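
For comparison, this is roughly what a fixed-point conversion does when the OCR text is already well formed. The Python sketch below only mirrors the idea; it is not the token format syntax itself, and the helper name `to_fixed_point` is an assumption for illustration. It only works when the decimal point was read correctly.

```python
from decimal import Decimal


def to_fixed_point(ocr_value: str, places: int = 2) -> str:
    """Parse a value like '123,456.78' and print it without thousands separators.

    Hypothetical helper: assumes the decimal point is present and correct.
    """
    # Remove the thousands separators, then parse as an exact decimal.
    value = Decimal(ocr_value.replace(",", "").strip())
    # Fixed-point formatting with a set number of decimal places.
    return f"{value:.{places}f}"


if __name__ == "__main__":
    print(to_fixed_point("123,456.78"))
    print(to_fixed_point("1,234.5"))
```

A value such as `###,### ##` (space instead of the decimal point) will not parse this way, which is why the strip-and-reinsert approach is needed for that case.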

replied on May 28, 2019

Just thinking out of the box here. Depending on your document, you might be able to use the Zone-OCR process on the area where the voucher amount appears. This would allow you to apply various local processes, or different OCR settings to that area, without affecting the primary OCR process.

replied on May 29, 2019

Thanks! Your initial answer got me through.
