Question

Zone OCR to split document

asked on March 24, 2014

I am trying to set up a Quick Fields session that will take in a LARGE document of invoices and split it into separate documents, one per invoice.

I am trying to use Zone OCR to grab a certain part of the page whose text changes when a new invoice starts and stays the same on each subsequent page of that invoice.

What I would like to do is create a new document every time this text changes, containing the pages up to the next change, and name the document from another Zone OCR zone on the first page where the change is found.

 

A couple of problems I am finding:

- The first four or so pages aren't structured the same as the rest of the document. (We can get around this by not scanning those pages into this process.)

- I am not even sure this is possible in the first place.

- The document has to be scanned double-sided. Sometimes the backs of certain pages are blank, and I am not sure whether I can get it to ignore these (as the Zone OCR result would be different) or whether it will create a bunch of blank documents.

- Invoices are not a set number of pages; they can be anywhere from one page upward.

I am sure I'm missing some information here, but I'm struggling to even get going with this.

Answer

SELECTED ANSWER
replied on March 26, 2014

Hurray, I finally found the option I was looking for.

"Last Page Identification" is where you can set the document length now. (I don't know why it's there, but it is.)

Setting the document length to 1 makes it attempt to create a new document for every page in the scanned document. Then with Zone OCR I grab a field off each page that is the same on every page of a given invoice and use it to name the document. When the new one-page documents are placed in the repository, anything with the same name is merged (appended).
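To illustrate the merge-by-name behavior described above, here is a minimal Python sketch. It is not Quick Fields or Laserfiche code: `merge_by_name` and the sample invoice names are hypothetical, and it only assumes that each one-page document arrives with the name read from its Zone OCR zone.

```python
def merge_by_name(one_page_documents):
    """Group one-page documents by name, appending pages in arrival order.

    one_page_documents: iterable of (name, page) tuples, where name stands in
    for the Zone OCR value used as the document name (an assumption of this
    sketch, not a Quick Fields API).
    """
    merged = {}
    for name, page in one_page_documents:
        # The first page with a given name creates the document;
        # later pages with the same name are appended to it.
        merged.setdefault(name, []).append(page)
    return merged


if __name__ == "__main__":
    scanned = [
        ("INV-1001", "page 1"),
        ("INV-1001", "page 2"),
        ("INV-1002", "page 3"),
    ]
    for name, pages in merge_by_name(scanned).items():
        print(name, pages)
```

Because every page carries its own name, the number of pages per invoice does not need to be known in advance.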

 

Thanks for all the help

Replies

replied on March 24, 2014

Quick Fields Zone OCR cannot identify documents based on a value changing. You could set that up with scripts, but it is not a built-in feature of Zone OCR.

replied on March 24, 2014

So I would need to Zone OCR the section, put that value into a token, and then use a custom script to evaluate whether that token has changed?

Then use that script to initiate the creation of a new document?

replied on March 24, 2014

You would have to write the value somewhere because the token goes out of scope when the next page comes in. But yes, those steps are essentially correct.
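As a rough illustration of "write the value somewhere", the Python sketch below keeps the previous page's zone value in a small text file and compares each new value against it. This is only the comparison logic, not a Quick Fields script: the function name `starts_new_document`, the state file, and the choice to treat an empty value (for example a blank back side) as a continuation of the current invoice are all assumptions of this sketch.

```python
import tempfile
from pathlib import Path


def starts_new_document(zone_value: str, state_file: Path) -> bool:
    """Return True when zone_value differs from the value saved for the last page.

    The saved value lives in state_file so it survives between pages
    (hypothetical storage; any persistent location would do).
    """
    previous = state_file.read_text(encoding="utf-8") if state_file.exists() else None
    current = zone_value.strip()
    if not current:
        # Assumption: an empty zone means a blank page, which stays with the
        # current invoice and does not overwrite the saved value.
        return False
    state_file.write_text(current, encoding="utf-8")
    return current != previous


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "last_zone_value.txt"
        for value in ["INV-1", "INV-1", "", "INV-2"]:
            print(repr(value), starts_new_document(value, state))
```

The very first page always reports a new document, because no value has been saved yet.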

replied on March 24, 2014

The more I think about it, the more steps get added to this: where to store all the information (past info, future info), how to compare it, keeping another copy for saving, how many pages the document will be...

I can tell I'm trying to achieve something that Quick Fields is not meant to do.

Taking a quick look at Workflow, I don't think the features I want are there either. (Is Zone OCR even in Workflow?)

Back to the drawing board!

replied on March 24, 2014

No, Workflow does not have image processing capabilities.

replied on March 24, 2014

Next question.

I have heard before, from reps and from random searching, that it would be possible to do a separation by putting blank pages into the document before scanning: it would automatically recognize the blank page and start a new document (and I could then Zone OCR the first page of each new document for the information).

A couple of questions:

1) Is this blank-page approach an option that would work?

2) How would it react to multiple blank pages? (Our scan is double-sided, so there could be two or three blank pages in a row.)

replied on March 24, 2014

You could create a new document for every page, with redundant metadata assigned to each, and then merge them together with Workflow. To use Zone OCR with Workflow, simply have your Quick Fields session write to a hidden field (one that only Workflow can see). Workflow can then compare the invoice number on every document and merge the matches.

The separator page is the standard way many of our customers do it. They use a page with a specific mark or text to identify it, rather than a blank page, for exactly the reason you mention.
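The separator-page approach can be sketched in a few lines of Python. This is an illustration of the splitting logic only, not how Quick Fields implements it; `split_on_separators`, the `is_separator` callback, and the `"SEP"` marker are hypothetical names. Consecutive separators do not produce empty documents.

```python
def split_on_separators(pages, is_separator):
    """Split a page sequence into documents at separator pages.

    pages: iterable of page objects in scan order.
    is_separator: callable returning True for a separator page
                  (for example, a page carrying a specific mark or text).
    Separator pages themselves are dropped from the output.
    """
    documents = []
    current = []
    for page in pages:
        if is_separator(page):
            # Close the document in progress, if any; repeated separators
            # in a row therefore never create an empty document.
            if current:
                documents.append(current)
                current = []
            continue
        current.append(page)
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    scan = ["SEP", "a1", "a2", "SEP", "SEP", "b1"]
    print(split_on_separators(scan, lambda page: page == "SEP"))
```

Note that in a double-sided scan the blank back of a separator sheet would still come through as a page; this sketch does not try to remove it.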

replied on March 25, 2014

I haven't done much with document creation yet (most of what I have done is moving and OCRing documents and changing some fields).

How would I go about setting it up to make each page a new document? I can see how I would then scan each page and Zone OCR the information I need to merge and name the documents correctly.

replied on March 25, 2014

It just requires an identification condition that is always true. It can be a zone that always has data or a token that always equals 1.

replied on March 25, 2014

I just don't see the function that I would use to then save each page separately. Getting a token from each page is fine.

replied on March 25, 2014

I don't see a token condition, actually. Maybe you could use Page Size Identification if the page sizes are all the same. As long as the identification condition is always true, each page will be treated as a new document.

replied on March 25, 2014

I can't figure this out. I know that once I get the document split, either combining the pieces or using Zone OCR for the information I want won't be hard.

Just splitting the document up into individual pages/invoices is besting me at the moment.

**EDIT

Now I'm trying to find the "Properties" this page is talking about so I can limit the pages and then do my own stuff after:

 

http://www.laserfiche.com/support/webhelp/quickfields/8.0/en-us/content/document%20length.htm

replied on March 25, 2014

A common trick I have seen is a very small Zone OCR zone in a location that is always white. Then set up the condition to be "is empty".

replied on March 26, 2014

Good to know about the last page option. In a way, you are passing the Zone OCR data to Workflow through the document name, which is likely the most elegant way to do it in your case.
