Question

Conditionally extract a single document from a Document Class and apply the same metadata.

asked on December 1, 2015

I am currently working with a client that has a need I have never tried to accomplish before. They are scanning stacks of Fixed Deposit Certificates as well as their supplementary documentation. These are standard typed form documents, and the process was very simple to set up. At the end of the day today, the CIO came to me to see if it was possible to extract one document, the Fixed Deposit Application, from the packet, only when it exists, and apply the already captured metadata to this new document so it can be saved separately. Normally I would say just put this form first when prepping the scans, but this is a completely handwritten document. I can identify it and assign it to a class, but I can't get any metadata from it.

I suggested scanning the applications separately, or saving the document class to a different folder for someone to manually enter a couple of pieces of metadata. His issue with that is they are just starting their imaging project, and this will easily add 10,000 documents that someone will have to manually key information into to make them usable. We thought about just manually extracting the page via "create new document" in the client, but again, that means too many manual interventions.

Here were the thoughts of what might be options after QF identifies the form and separates it out.

1. Is there a token collector, script, etc that may be run in QF to copy the metadata captured from the certificate to the application?

2. Is it possible to put an annotation on that page to tag it as it comes through the QF session, then workflow could look for pages with that annotation and extract the page to "create a new document".  

I know this is a long shot but after a few hours of trying different options and exploring QF and WF activities, I thought I would bounce it off the community.

Thanks for any help.

Answer

SELECTED ANSWER
replied on December 3, 2015

So, I found a way to make this work. I found something that I could OCR that would only be on the handwritten page (a section called "For Official Use Only"). I have a process set up to OCR that section of every page 2 and put the result into an unused field on the template. I then applied a regular expression to that token so that only the word "Official" goes into that field. BTW, before applying the regular expression directly to the token, I had tried to use pattern matching to do the same thing, but for some reason the OCR'd text would show up in the output yet never get passed to the pattern matching process below it. The OCR process that handles page 1 passes to pattern matching correctly.
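
As a rough illustration of that regular-expression step, here is a minimal Python sketch. It is not Quick Fields syntax; the function name `extract_marker` and the sample OCR strings are made up for this example, and real OCR output will be noisier.

```python
import re

# Match the standalone word "Official", ignoring case, so that OCR output
# such as "FOR OFFICIAL USE ONLY" still counts. The word boundaries keep
# words like "Unofficial" from matching.
OFFICIAL_PATTERN = re.compile(r"\bOfficial\b", re.IGNORECASE)


def extract_marker(ocr_text: str) -> str:
    """Return "Official" if the marker word is in the OCR text, else ""."""
    return "Official" if OFFICIAL_PATTERN.search(ocr_text) else ""


# Hypothetical OCR results for page 2 of two different packets.
print(extract_marker("FOR OFFICIAL USE ONLY"))
print(extract_marker("Terms and Conditions"))
```

Normalizing the result to a single fixed word means the downstream check only has to compare the field against one value.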

Now, once this field has the word "Official" in it, Workflow can use that as a key to perform its functions. There was already a workflow processing and doing quality control on these incoming documents, so I added a routing decision to handle this variation. If the document comes in with the word "Official" in its field, then Workflow will do the following activities.

  1. Create a new document entry (Blank)
  2. Assign the field values from the starting entry to the new blank entry
  3. Move page two from the starting entry to the newly created entry. 
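
The routing decision and the three activities above can be sketched in plain Python to make the logic concrete. This is only a model of the idea, not the Workflow designer or any Laserfiche API; the `Entry` class, the `split_application` function, and the field name `Marker` are assumptions invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Stand-in for a repository document: a name, fields, and pages."""
    name: str
    fields: Dict[str, str] = field(default_factory=dict)
    pages: List[str] = field(default_factory=list)


def split_application(source: Entry, marker_field: str = "Marker") -> Optional[Entry]:
    """If the marker field says "Official", move page 2 into a new entry."""
    # Routing decision: only act on documents flagged during capture.
    if source.fields.get(marker_field) != "Official" or len(source.pages) < 2:
        return None
    # 1. Create a new, blank document entry.
    new_entry = Entry(name=source.name + " - Application")
    # 2. Assign the field values from the starting entry to the new entry.
    new_entry.fields = dict(source.fields)
    # 3. Move page two (index 1) from the starting entry to the new entry.
    new_entry.pages.append(source.pages.pop(1))
    return new_entry


packet = Entry(
    "FD-001",
    {"Account": "12345", "Marker": "Official"},
    ["certificate", "application", "supporting"],
)
application = split_application(packet)
print(application.pages, packet.pages)
```

Documents without the marker fall through untouched, which matches the "only when it exists" requirement in the question.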

Now the handwritten application can be captured as part of a package of documents and will get pulled out when it is detected in the package. 

Replies

replied on December 2, 2015

Jason,

The Token Collector is already built into Quick Fields. If I understand you correctly, the Fixed Deposit Application is part of your "supplementary documentation" right now, and you want to break it out into its own document when you find it, but it's handwritten so you can't pull any metadata off the form. You should be able to create a document class for the Application, since you said you can recognize it. Then use the Token Collector process in the original Document Class to store the metadata that you want to pass on to the new Document Class, and use the Token Retriever process in the new Document Class to apply the stored values.

Will this work for you?

replied on December 2, 2015

I thought of that, but there are 3 different document classes and the application can come through in any of them. So, unless I misunderstand how the Token Collector / Retriever works, I don't know which class to pull info from once the application is split out.

What I am trying to do now is extract a unique word off that form and put it in an unused metadata field. Then, once the document is saved to LF with that word in a given metadata field, Workflow can extract page 2 (where the application will always be), create a separate document with it, and copy the metadata from the original document. The issue I'm running into is that I can't get the OCR to run on page 2 of the class. It doesn't seem to work, so no data is being passed to my pattern matching process. Thoughts?
