
Question

non uniform documents

asked on May 18, 2017

I want to scan a series of documents. Each document should contain two pages with information that I want to use to populate the file name and some template fields.

 

The problems are as follows:

- The documents have a varying number of pages (I don't believe this is an issue, but just in case).

- The first page and the last page are not always the same, so I can't use them to separate documents from one another. The first or last page of one document does not necessarily appear as the first or last page of every other document.

- The pages I want to run Zone OCR on are not always in the same place.

 


Replies

replied on May 18, 2017

One useful question to start with in situations like this is: How would a human figure out how to separate the documents from one another? And how would a human find the right information that's needed to populate the file name, etc?

replied on May 19, 2017

Can you use slip sheets to separate the documents?

Can you use the same pattern to find the information on all documents? With a Pattern Match activity, you can match on the whole text of the page rather than trying to limit it to a zone.

replied on May 19, 2017

The documents are already scanned and in TIFF format; there are about 1,300 of them. I could potentially edit the documents, but with 1,300 that seems a bit unreasonable now.

 

I'm unfamiliar with Pattern Match at the moment. It sounds good, but my problem seems to be separating one document from the next.

replied on May 19, 2017

Do you have one multi-page TIFF for each document? If that's the case, then you don't need identification processes. Universal Capture can just keep their current structure.

Your original description sounded like you were describing a stack of papers with no pattern to where one document ends and the next starts, so both Tessa and Bert were trying to figure out how you would tell when to separate them.

replied on May 19, 2017

Hi Miruna,

 

What I have is a set of TIFF images (about 1,300), each with a varying number of pages (one has 67, another has 24, etc.).

 

I want to scan for a permit # and then store each TIFF as one document, so the 67-page TIFF is one document and the 24-page TIFF is a separate document. The permit number should exist on a page that should be in every TIFF image/document, but not necessarily on the same page number in each instance.

 

My apologies if I wasn't being clear. Let me know if this makes sense. Thanks for your help.

replied on May 19, 2017

I think I am on the right track now. I've only used this once before, and the documents I was working with were only 2 pages every time, with a clear-cut starting page that could be easily identified.

replied on May 19, 2017

Ah, I see. Yeah, in your case, since your documents are already split up properly, you don't need to configure any identification processes. Just check this box instead:

And then for the permit number, you'll want to OCR all pages and use pattern matching to search for the permit number on all pages of the document.  
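
To illustrate the idea of searching the OCR text of every page rather than a fixed zone, here is a minimal Python sketch. It is a generic regular-expression example, not the product's own configuration. The thread never says what a permit number looks like, so `PERMIT_PATTERN`, the function name `find_permit_number`, and the sample page text are all assumptions made for illustration.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical format: a "Permit No." / "Permit Number" / "Permit #" label
# followed by an optional short letter prefix and hyphen-separated digits.
# Adjust this to whatever the real permit numbers look like.
PERMIT_PATTERN = re.compile(
    r"Permit\s*(?:No\.?|Number|#)\s*:?\s*([A-Z]{0,3}-?\d+(?:-\d+)*)",
    re.IGNORECASE,
)


def find_permit_number(page_texts: Sequence[str]) -> Optional[Tuple[int, str]]:
    """Search the OCR text of each page in order.

    Returns (1-based page number, permit number) for the first page that
    contains a match, or None if no page matches.
    """
    for page_number, text in enumerate(page_texts, start=1):
        match = PERMIT_PATTERN.search(text)
        if match:
            return page_number, match.group(1)
    return None


# Made-up OCR output for a three-page document.
pages = [
    "Cover sheet",
    "Site plan\nPermit #: B-2017-0042\nOwner: (illegible)",
    "Appendix",
]
result = find_permit_number(pages)
```

Because the whole text of every page is searched, it does not matter which page the permit number lands on in a given document. A document that returns `None` can be set aside for manual review.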

replied on May 19, 2017

How are the TIFF files that you already have named? Is the permit number already part of the document name for each TIFF? If so, you can grab that without OCR and Pattern Matching.
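
If the file names do carry the permit number, pulling it out is a plain string operation. The sketch below assumes a hypothetical naming scheme (four digits, a hyphen, then three or more digits somewhere in the name); the thread does not say how the files are actually named, so `FILENAME_PATTERN` and `permit_from_filename` are illustrative only.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme, e.g. "permit_2017-0042.tif".
# Replace with a pattern that fits the real file names.
FILENAME_PATTERN = re.compile(r"(\d{4}-\d{3,})")


def permit_from_filename(path: str) -> Optional[str]:
    """Return the permit number embedded in the file name, or None."""
    stem = Path(path).stem  # file name without directory or extension
    match = FILENAME_PATTERN.search(stem)
    return match.group(1) if match else None


example = permit_from_filename("scans/permit_2017-0042.tif")
```

Files whose names do not match would still need the OCR and pattern matching route.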
