
Question

Quick Fields pattern matching

asked on December 4, 2014

Hello,

 

Quick:

Is there a way I can build an expression like (\d\d\d\d\d\d\d\D\d\d$) that will pull any dollar amount up to 9,999,999.99?

Or is there a way to pull the largest value with a decimal that is closest to the bottom of the page?

 

Not Quick:

I need to see if there is a way to pull the total amount of an invoice. This account gets upwards of 200 different invoices a month, and I need to use pattern matching to find the total, since zone OCR is fairly useless for this volume of documents. I know I'm not going to get all 200 accurately, but if I can get close to 60% of them it would help.

I have tried several ways to pull the correct information but unfortunately I end up pulling too many results or not the correct results at all.

The customer wants to be able to run a stack of documents (25 or so) and pull the total off each page that has a total.

Here are the variables I'm dealing with:

1. Sometimes the invoice says "total"; other times it says "total due", "amount due", or "total amount".

2. The spacing changes a lot, so I can't say the value is x characters to the right of the word "total".

3. Some of the amounts needed are below the word "total" and not beside it.

4. Most of the time the total has a $ before it, and I've tried to focus on that, but unfortunately $ seems to be the "End Here" character for Quick Fields, so it doesn't recognize the $ as the actual character I want.

5. I tried to zone OCR just the bottom half of the page to limit the number of false results, and the results were far off. For some documents the zone OCR didn't recognize a single word; it came out as gibberish.

 

I tried many different things, but I ended up with something like this:    total....(\d\D\d\d$)

To get this to work I have to build a token for each place value: 1.00 10.00 100.00 1000.00 10000.00 100000.00

 

total....(\d\D\d\d$) This expression comes back with a value of 1.00, which is what I want when the value is in the single-dollar range. But the total on the next page is 108.50, and because this token is at the top of the list, it only pulls 8.50.

 

How do I tell Quick Fields to ignore the space between the trigger word and the desired value?

How do I tell Quick Fields to pull the whole total value and not just part of it?

How do I commit the result to a field once I successfully find one?

Is there a way I can get the software to just start at $ and end at a blank space?

Or is there a way to tell the software to pull the highest amount of all the results pulled off the page?

 

Is there any type of back-end performance improvement built into this software? If I OCR the same exact data over time, does it become more accurate?

 


Replies

replied on December 4, 2014

The $ is a reserved character in regular expressions (it anchors the match to the end of the line), so if you want it treated as a literal dollar sign, you need to use \$ (the backslash is the "escape" character that tells the regular expression engine that the next character is not reserved this time).
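
To see the difference, here is a minimal sketch in Python's `re` module rather than in Quick Fields itself (an assumption on my part: the two regex flavors may differ in details, but `$` and `\$` behave this way in most of them). The sample text is made up. It also reproduces the 8.50 result from the question.

```python
import re

# Made-up sample line, standing in for OCR text from an invoice.
text = "Total due: $108.50"

# Unescaped $ is an end-of-line anchor, and \d\D\d\d allows only one
# digit before the decimal point, so only the tail of the amount matches.
anchored = re.search(r"(\d\D\d\d$)", text)

# Escaped \$ matches a literal dollar sign; \d+ allows any number of
# digits before the decimal point.
literal = re.search(r"\$(\d+\.\d\d)", text)

print(anchored.group(1))  # 8.50
print(literal.group(1))   # 108.50
```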

I'd probably try something like ([\d\.\,]+) to match digits with periods and commas regardless of how many digits they have. Or ([\d\,]+\.\d\d) if you wanted to specify that there should always be 2 decimals.
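
As a rough sketch of how that pattern could be combined with the trigger words from the question, here is an example in Python's `re` module (not Quick Fields; the function name `find_total`, the sample pages, and the fallback to the largest amount are all assumptions for illustration). `[\s:]*` skips any run of spaces, colons, or line breaks after the trigger, which covers both variable spacing and an amount printed on the line below. The amount pattern is the same as above, minus the backslash before the comma, which is optional inside brackets.

```python
import re
from decimal import Decimal

# Trigger phrases from the question. \b keeps "Subtotal" from matching.
TRIGGER = r"\b(?:total\s+amount|total\s+due|amount\s+due|total)"
# Digits and commas, then exactly two decimals.
AMOUNT = r"([\d,]+\.\d\d)"

# Trigger, then optional whitespace/colons/newlines, an optional literal $,
# optional whitespace, then the amount.
LABELED = re.compile(TRIGGER + r"[\s:]*\$?\s*" + AMOUNT, re.IGNORECASE)
ANY_AMOUNT = re.compile(AMOUNT)


def to_decimal(raw):
    """Drop thousands separators and convert to an exact decimal."""
    return Decimal(raw.replace(",", ""))


def find_total(page_text):
    """Return the labeled total nearest the bottom of the page, else the
    largest amount found, else None. (Hypothetical helper.)"""
    labeled = LABELED.findall(page_text)
    if labeled:
        return to_decimal(labeled[-1])
    amounts = ANY_AMOUNT.findall(page_text)
    if amounts:
        return max(to_decimal(a) for a in amounts)
    return None


# Made-up sample pages.
beside = "Invoice 1001\nSubtotal   $100.00\nTotal due:      $108.50\n"
below = "AMOUNT DUE\n$9,999,999.99\n"
unlabeled = "Freight 12.00\nParts 250.75\n"

print(find_total(beside))     # 108.50
print(find_total(below))      # 9999999.99
print(find_total(unlabeled))  # 250.75
```

Falling back to the largest amount is only a heuristic: it will be wrong on any page where some other figure exceeds the total.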

replied on December 4, 2014

I'll give that a shot.

 

Is there any type of "performance increase" by scanning similar documents multiple times?

replied on December 4, 2014

No.
