
Question

Processing Single Document for multiple pages...

asked on August 18, 2017

This is my first QF project, so please bear with me. I have a large document that is being imported using Import Agent and then processed by Quick Fields Agent running two QF sessions. The large document contains multiple 2-page documents. I ran into issues trying to read both pages at the same time when a page was scanned backwards (page 2 first), so I decided to read page 1 from the file first, then run another session for page 2.

What we are doing is Zone OCRing an ID and matching it up with a Zone OCRed name. We do a database lookup on the ID and verify that the name returned matches the scanned name. This prevents documents from being filed under the wrong person.
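In effect the lookup-and-verify step reduces to something like the sketch below. This is only an illustration of the logic, assuming a hypothetical SQL table `people(id, name)`; none of these names come from Quick Fields itself:

```python
import sqlite3

def verify_identity(conn, ocr_id, ocr_name):
    """Look up ocr_id and return the canonical name only if it
    matches the Zone OCRed name; otherwise return None."""
    row = conn.execute(
        "SELECT name FROM people WHERE id = ?", (ocr_id,)
    ).fetchone()
    if row is None:
        return None  # unknown ID: the lookup returned no record
    db_name = row[0]
    # Compare case-insensitively to tolerate OCR casing noise
    if db_name.strip().lower() != ocr_name.strip().lower():
        return None  # name mismatch: likely a mis-scan
    return db_name
```

A `None` result is the "this page should fail identification" signal, which is exactly the behavior the asker wants from the lookup (and, as discussed below, does not happen by default).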

One issue I am having is that when the database lookup fails to get a match on name and ID, it returns an empty result set. I thought this would fail the page identification, but it does not: the page is still created if it meets the other criteria, and the filename, which uses the ID and name, is blank in those areas.

Secondly, I want to strip the identified pages from the original Laserfiche document. This will speed up processing for page 2, and it will leave only the failures behind after page 2 is processed, so they can easily be handled manually.

 

In review:

1.) I cannot get the lookup process to fail the page classification.

2.) I cannot get a page, once identified, to be removed from the original document.

 

Any help is greatly appreciated.


Answer

SELECTED ANSWER
replied on August 18, 2017

Would a work-around solution be to:

  1. Run it through Quick Fields and have it split every 2 pages, OCR the zones for the name (and anything else you want for metadata), apply the metadata, and OCR the whole document to create a text file.
  2. Store it in a folder in the repository.
  3. Set up a workflow that will grab those files and process them.
    1. It would read the text, and if a certain phrase is not on page one, you could rearrange the pages.
    2. Then grab the name from the metadata and do the database lookup.
    3. If no name is found in the database you could move the file to another folder for further review.

Replies

replied on August 18, 2017

Do you have the session set up to delete the original document from the repository? Do you have the session set up to create new documents or merge documents? If you are deleting the source document altogether and creating new documents for each document class, that should take care of issue #2.

DCSetup.jpg (74.2 KB)
QFSetUp.jpg (55.92 KB)
replied on August 18, 2017

I have Merge Documents set up so that when I process page 1 with a name using metadata [UserLast, User First (USERID)-Registration], page 2 with the same name will be appended back to its page 1, creating a complete 2-page record for an ID.

For post-processing, I currently have it set up to simply move the document. I DO NOT HAVE Quick Fields Complete, which includes Multiple Document Identification. This is why I have to process page 1 and page 2 in separate processes rather than creating a second document class. If I deleted the original document after retrieval, I wouldn't have the document left to process after page 1.

Additionally, the failed documents all go to one location with one name, as I can't use the %(path) token to separate them for the 60 individual contributing users.

2017-08-18_13-28-57.gif
replied on August 18, 2017

Also, if I created documents from the two classes and then deleted the original, how would I identify the failures that could not be processed?

replied on August 18, 2017

Can you share a couple of screenshots of how your OCR sessions are set up and how your document classes are set up?

replied on August 18, 2017

Done...

 

replied on August 18, 2017 Show version history

So I was able to get 1.) resolved. Even though I put the conditions in my lookup process and it failed to return a record, the lookup was not acting as an identification validation. So, just below the lookup, I added a Token Identification that runs the same validation (LookupID = OCRid), and now the pages fail as expected.
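The Token Identification added here effectively boils down to a comparison like the following sketch. The names are illustrative, not actual Quick Fields tokens:

```python
def page_identified(lookup_id, ocr_id):
    """True only when the lookup returned a value and it matches
    the Zone OCRed ID."""
    # A failed lookup yields an empty value (None or ""), which can
    # never equal the OCRed ID, so the page now fails identification
    # instead of being created with blank filename fields.
    return bool(lookup_id) and lookup_id == ocr_id
```

This is why comparing the lookup token against the OCR token works as a validation gate: an empty lookup result can never satisfy the equality.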

 

So, now for #2: how do I remove identified pages from the source document?

2017-08-18_11-28-54.gif
replied on August 18, 2017

What is your ultimate goal for the documents? To have the large file broken down into smaller files that contain both page 1 and page 2? I fear that if you run the documents through 2 sessions (one to identify page 1 and one to identify page 2), you will end up with the document broken into twice as many files, because page 1 and page 2 would be separate files.

Will the large document always contain only 2 pages per form? Instead of using first-page identification, could you use last-page identification and just set it to always split at 2 pages?

replied on August 18, 2017

Jennifer, good thinking. I have 64,000 pages, each of which will be either a page 1 or a page 2. These are scanned in remotely at stations and put into network shares.

The issue with 2 pages per form was that reversed documents caused all kinds of havoc. It works great if you can guarantee that all documents are in the correct order. That's why I am doing it this way.

So here is the ideal situation. A 200-page document is scanned and moved into the monitored repository folder. QF Agent runs page 1 identification on the document; 98 of the pages are identified as page 1 (two error out due to OCR issues). The 98 correctly identified page 1s become new documents, named using metadata retrieved from the lookup process, and are filed under their respective users by metadata (I already have that process down in Workflow).

Now the original 200-page document is 102 pages (200 minus the 98 identified pages). This document contains the page 2s and the errored page 1s. QF Agent now runs page 2 identification on it; 99 page 2s are identified, created as new documents, and stored based on metadata pulled from the lookup. The original document now has 102 - 99 = 3 pages, which are the errors.

A user can now come in and manually process the 3 errant pages remaining in the document that used to consist of 200 pages.
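The page arithmetic of that two-pass scheme can be sketched as follows; the counts are just the illustrative ones from the scenario above:

```python
def remaining_after_passes(total_pages, identified_per_pass):
    """Strip each pass's identified pages from the source document
    and return what is left over for manual review."""
    remaining = total_pages
    for identified in identified_per_pass:
        remaining -= identified
    return remaining

# 200-page source: 98 page 1s pulled out, then 99 page 2s
print(remaining_after_passes(200, [98, 99]))  # 3 pages left to review
```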

 

replied on August 18, 2017 Show version history

I get what you are trying to do but I don't think it is going to work the way you think it will.

 

When Quick Fields uses first-page identification, it keeps all pages together until it finds another page that fits the first-page identification.

 

In a small example let’s take a 10-page document.

 

Page 1 says A

Page 2 says B

Page 3 says A

Page 4 says B

Page 5 says A

Page 6 says B

Page 7 says A

Page 8 says B

Page 9 says A

Page 10 says B

 

In your document class, the first-page identification is that the page must say A. In the example above, you would end up with five 2-page documents (pages 1 & 2; pages 3 & 4; pages 5 & 6; pages 7 & 8; pages 9 & 10).

 

 

Example 2

 

Page 1 says A

Page 2 says B

Page 3 says B

Page 4 says A

Page 5 says A

Page 6 says B

Page 7 says A

Page 8 says B

Page 9 says A

Page 10 says B

 

If the pages are out of order like this, you will still end up with 5 documents, but the first will be 3 pages long (pages 1, 2 & 3) and one will be only 1 page (page 4).
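The first-page-identification behavior described in both examples can be simulated with a short sketch. This is a simplification of the splitting rule, not Quick Fields' actual logic:

```python
def split_by_first_page(pages, is_first_page):
    """Start a new document at every page matching first-page
    identification; every other page appends to the current document."""
    docs = []
    for page in pages:
        if is_first_page(page) or not docs:
            docs.append([page])
        else:
            docs[-1].append(page)
    return docs

# Example 1: strictly alternating pages -> five 2-page documents
print(split_by_first_page(list("ABABABABAB"), lambda p: p == "A"))

# Example 2: a reversed pair -> still 5 documents, but the first has
# 3 pages (A, B, B) and one has only 1 page (the stray A)
print(split_by_first_page(list("ABBAABABAB"), lambda p: p == "A"))
```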

replied on August 18, 2017

Thanks, I think that's what I'll have to do. Thank you.
