Zonal OCR/Workflow

replied on August 1, 2014

My suggestion is to use Zone OCR down the sides of each page to capture the dates and corresponding dollar amounts of each transaction and save each set into a multi-valued token. You’ll probably need to use pattern matching to make sure that neither token ends up with blank or incorrect values. Also check that the correct number of dates and dollar amounts are being added to each token. If you’re using Quick Fields 9, you can start the Workflow outlined below, pass the date and dollar value tokens to it via parameters, and store the document into the repository. Otherwise you’ll need to add the tokens to the metadata of the document before you store it.
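
The "pattern matching plus sanity checks" step can be sketched outside Quick Fields. This is a minimal Python model, assuming two hypothetical zones (one over the date column, one over the amount column) whose OCR text is shown inline. Python's re module stands in for the product's regex engine. Each zone is reduced to a list of matches, which plays the role of a multi-valued token. The sketch then rejects the page if either list is empty or the counts differ.

```python
import re

# Hypothetical OCR text from two zones; real output will be noisier.
DATE_ZONE = """Posting Date
7/31/2014
7/31/2014
8/1/2014
"""

AMOUNT_ZONE = """Amount
1,250.00
75.10
980.00
"""

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
# 1-3 digits, optional comma-separated thousands groups, then cents.
AMOUNT_RE = re.compile(r"\b\d{1,3}(?:,\d{3})*\.\d{2}\b")

def extract_multi_value(zone_text, pattern):
    """Return every match in page order, like a multi-valued token."""
    return pattern.findall(zone_text)

def build_token_pairs(date_zone, amount_zone):
    dates = extract_multi_value(date_zone, DATE_RE)
    amounts = extract_multi_value(amount_zone, AMOUNT_RE)
    # Each transaction needs exactly one date and one amount; a mismatch
    # means OCR dropped or invented a value somewhere on the page.
    if not dates:
        raise ValueError("no transactions found")
    if len(dates) != len(amounts):
        raise ValueError(f"{len(dates)} dates but {len(amounts)} amounts")
    return dates, amounts

dates, amounts = build_token_pairs(DATE_ZONE, AMOUNT_ZONE)
print(dates)
print(amounts)
```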

Set up a workflow to begin when this document is stored. This workflow will create the duplicate entries of the document and apply the metadata. To do this, set up a For Each Value loop. Inside the loop, use a Create Entry and a Move Pages activity to create a duplicate document for each date in the multi-valued token that contains the dates. Apply the appropriate metadata from the date and dollar value multi-valued fields using the syntax: %(FieldName#[%(ForEachValue_Iteration)]#). This token returns the single value of the multi-valued token at the index given by the loop iteration number.
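
The indexing idea behind %(FieldName#[%(ForEachValue_Iteration)]#) can be modeled in a few lines. This is not Workflow code, only a Python sketch of the loop's logic, assuming both the token index and the iteration counter start at 1. The Entry class and the naming scheme are hypothetical stand-ins for the duplicate documents and their metadata.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical stand-in for one duplicate document and its metadata."""
    name: str
    date: str
    amount: str

def value_at(multi_value, iteration):
    """Model %(Field#[n]#), assuming n is 1-based like the iteration counter."""
    return multi_value[iteration - 1]

def split_into_entries(doc_name, dates, amounts):
    if len(dates) != len(amounts):
        raise ValueError("dates and amounts must have the same number of values")
    entries = []
    # One pass per value of the dates field, like a For Each Value loop.
    for iteration in range(1, len(dates) + 1):
        entries.append(Entry(
            name=f"{doc_name} - {iteration}",  # hypothetical naming scheme
            date=value_at(dates, iteration),
            amount=value_at(amounts, iteration),
        ))
    return entries

for entry in split_into_entries("Bank Statement", ["7/31/2014", "8/1/2014"], ["1,250.00", "75.10"]):
    print(entry)
```

Because the same iteration number indexes both fields, the date and amount stay paired only if both multi-valued fields were filled in the same order with the same number of values.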

replied on August 1, 2014

Going to give that a shot and see how it works...will post if I have further questions/need help!

replied on August 1, 2014

Daryl,

So I can see that the height of the transactions varies (2 to 4 lines) in your example, which means Zone OCR may pose some challenges.

Workflow's pattern matching activities, on the other hand, should work regardless of those heights.

replied on August 3, 2014

Zone OCR isn't so much a problem here; just capture an entire column and use regex to extract amounts or dates into multi-value fields.
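
Here is a sketch of why one tall zone sidesteps the varying row heights. It assumes a hypothetical column text in which transactions take 2 to 4 lines each, and uses Python's re module in place of the product's regex engine. Because findall scans the whole text, each match becomes one value in page order, however many lines sit between matches.

```python
import re

# Hypothetical text from one zone drawn over the whole transaction area.
# Transactions are 2 to 4 lines tall, so fixed-height zones would drift.
COLUMN = """7/31/2014 169 / WIRE IN
REF 0001
Credit Amount: 1,250.00
7/31/2014 169 / ACH CREDIT
Credit Amount: 75.10
8/1/2014 169 / DEPOSIT
REF 0002
MEMO CUSTOMER PAYMENT
Credit Amount: 980.00
"""

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
AMOUNT_RE = re.compile(r"Credit Amount:\s*(\d{1,3}(?:,?\d{3})*\.\d{2})")

# Every match becomes one value of a multi-value field, in page order.
dates = DATE_RE.findall(COLUMN)
amounts = AMOUNT_RE.findall(COLUMN)

for date, amount in zip(dates, amounts):
    print(date, amount)
```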

The challenge will be with the transactions. Will they all start with "169" or match "\d+\s/\s\w+"? Will the second line always start with "ref"? And of course use a replace function to remove "credit amount:" too.
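
The questions above can be turned into a quick experiment. This Python sketch uses a hypothetical column text. It assumes that transaction lines look like the "\d+\s/\s\w+" shape from this reply and that the following line starts with "ref" in some letter case. It also strips the "Credit Amount:" label, which is what a replace function would do.

```python
import re

# Hypothetical column text; real OCR output will differ.
COLUMN = """169 / WIRE IN
REF 0001
Credit Amount: 1,250.00
169 / ACH CREDIT
ref 0002
Credit Amount: 75.10
"""

# First line of a transaction: the "\d+\s/\s\w+" shape, anchored to the
# start of a line and extended to the end of that line.
TXN_RE = re.compile(r"^\d+\s/\s\w+.*$", re.MULTILINE)
# Assumed second line: starts with "ref" in any letter case.
REF_RE = re.compile(r"^ref\b.*$", re.MULTILINE | re.IGNORECASE)
# Label to strip, the equivalent of a replace function.
LABEL_RE = re.compile(r"credit amount:\s*", re.IGNORECASE)

transactions = TXN_RE.findall(COLUMN)
refs = REF_RE.findall(COLUMN)
amounts = [LABEL_RE.sub("", line) for line in COLUMN.splitlines()
           if LABEL_RE.match(line)]

# Every transaction should have exactly one amount; otherwise flag the page.
if len(transactions) != len(amounts):
    raise ValueError("transaction/amount count mismatch")

print(transactions, refs, amounts)
```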

replied on August 4, 2014

Ok, so I'm having a tough time capturing the dates from the above document. I set up my Zone OCR, and this is the value returned in the Zone Field:

CL
OF
C.-_
Co
Currency:USD
Bank: 1210(
Account: 4121E
I—
Balances
Closing Le(
Closing Co
Opening A,
One Day Fl
Two+ Day
MTD Aver
MTD Aver;
Total Credi
Total Debit
Total Numt
Total Numt
Summaries
Type of Credit
Total ACH Credi
Total Deposits
Total Lockbox C
Total Wire Trans
Credit Totals
Type of Debit
...............................................
Total ZBA Debit
Debit Totals
Credit Transact
7/31/2014
7/31/2014
7/31/2014
7/31/2014
7/31/2014 Dates/Amounts 0

So I know it's picking up the dates. This is the pattern matching (PM) expression I have set up, but when I test the process, I get nothing in the date field: \d*\d\S\d\d\S\d\d\d\d

Any suggestions? When I test the PM in the configurator, it works, but when I use "Test the current process", it returns no value.

Thanks!

replied on August 4, 2014

Replace the asterisk with a question mark; it should be good after that. Also, for a more generic approach, replace \S with [- /\.]
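
A side-by-side check shows what the suggested edit changes. This is a sketch in Python's re module, assuming it behaves like the product's engine for these simple constructs. The sample strings are hypothetical, and the last one mimics an account number like the "Account:" line in the OCR output. \S accepts any non-space character, including digits, and \d* allows any number of leading digits, so the original pattern can match inside a long run of digits. The bracketed separators and \d? limit matches to 1-2 digit months followed by a real separator.

```python
import re

ORIGINAL = re.compile(r"\d*\d\S\d\d\S\d\d\d\d")
SUGGESTED = re.compile(r"\d?\d[- /\.]\d\d[- /\.]\d\d\d\d")

# Hypothetical sample strings.
samples = ["7/31/2014", "07-31-2014", "7.31.2014", "Account: 4121501234567"]

def first_match(pattern, text):
    """Return the first match in text, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for s in samples:
    print(f"{s!r}: original={first_match(ORIGINAL, s)}, suggested={first_match(SUGGESTED, s)}")
```

Both patterns still assume a two-digit day, so a date such as 8/1/2014 would need \d?\d in the day position as well.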

replied on August 4, 2014

The last time I had a problem like that there were hidden carriage returns. Another time it just worked after deleting the session and creating a new one.
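
Hidden carriage returns are easy to reproduce. In many regex engines, including Python's, a multiline $ matches only before "\n". With Windows line endings, a stray "\r" therefore sits between the last digit and the anchor. This sketch uses hypothetical text and a line-anchored date pattern, then normalizes the line endings before matching.

```python
import re

# Hypothetical text arriving with Windows line endings (\r\n).
raw = "7/31/2014\r\n7/31/2014\r\n8/1/2014\r\n"

# Line-anchored pattern: the whole line must be a date.
LINE_DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$", re.MULTILINE)

# '$' matches before '\n', so the '\r' after each date blocks every match.
before = LINE_DATE.findall(raw)

# Normalize line endings, then match again.
cleaned = raw.replace("\r\n", "\n").replace("\r", "\n")
after = LINE_DATE.findall(cleaned)

print(before, after)
```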

replied on August 5, 2014

Ok, so I have created a Zone Field going down the right side of my page.

However, it seems to capture only the first 5 amounts. On page 1 (the page above) that's not an issue, but from page 2 on, it should be capturing more amounts (see below).

It only captures the first 5, and from page 3 on, it's as if the pattern matching isn't working at all. My PM for the amount is: Credit Amount:\s+(\d+\S\d\d\d.\d\d). Basically, I'm looking for the amount that follows the words "Credit Amount:". When I test the PM on page 3 and later, it doesn't work. Is my PM wrong?

I see what is wrong: sometimes the amount contains a comma and sometimes it doesn't. How do I compensate for that?
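
One way to make the comma optional is to match 1-3 leading digits followed by groups of three digits that may or may not be preceded by a comma. This Python sketch compares that alternative with the pattern from the question on hypothetical lines. It assumes Python's re module behaves like the product's engine here. The original needs at least five characters before the decimal point (for example "1,250"), so amounts under 1,000, and four-digit amounts written without a comma, never match. Its unescaped "." also matches any character, not only a decimal point.

```python
import re

# Pattern from the question.
ORIGINAL = re.compile(r"Credit Amount:\s+(\d+\S\d\d\d.\d\d)")
# One alternative: optional comma before each group of three digits.
OPTIONAL_COMMA = re.compile(r"Credit Amount:\s+(\d{1,3}(?:,?\d{3})*\.\d{2})")

# Hypothetical lines.
lines = [
    "Credit Amount: 1,250.00",
    "Credit Amount: 1250.00",
    "Credit Amount: 75.10",
    "Credit Amount: 12,500.00",
]

def capture(pattern, line):
    """Return the captured amount, or None if the pattern does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

for line in lines:
    print(f"{line!r}: original={capture(ORIGINAL, line)}, "
          f"optional comma={capture(OPTIONAL_COMMA, line)}")
```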

SELECTED ANSWER

replied on August 5, 2014

Your pattern specifies that all amounts are in the thousands. There aren't any like that on the second page.

You could just go with Credit Amount:\s+(\d[^\n]+)\n to capture everything from the first digit after "Credit Amount:" to the end of the line.
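
Here is a quick check of that catch-all pattern in Python's re module, on hypothetical page text. It also tests a variant that accepts end-of-text in place of the trailing "\n". The variant is an assumption worth considering if the zone's last line might not end with a newline. Since the captured values may or may not contain commas, the sketch strips them before converting to Decimal.

```python
import re
from decimal import Decimal

# Hypothetical page text; note the last line has no trailing newline.
PAGE = "Credit Amount: 1,250.00\nREF 0001\nCredit Amount: 75.10\nCredit Amount: 980.00"

# Pattern from the answer: a digit, then everything up to a newline.
ANSWER_RE = re.compile(r"Credit Amount:\s+(\d[^\n]+)\n")
# Variant that also accepts the end of the text instead of a newline.
RELAXED_RE = re.compile(r"Credit Amount:\s+(\d[^\n]+)(?:\n|$)")

strict = ANSWER_RE.findall(PAGE)
relaxed = RELAXED_RE.findall(PAGE)

# Commas are only thousands separators here, so drop them before converting.
totals = [Decimal(a.replace(",", "")) for a in relaxed]

print(strict, relaxed, sum(totals))
```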