
Question

How to extract pages from a batch that contain a certain phrase

asked on October 27, 2014

I have a customer who is going through an audit and has scanned completed jobs into batches. Within these batches there are invoices that we would like to extract. So far we have extracted the invoices by searching in LF, refining the search, and then copying the search results, which has been painfully slow. Does anyone know of a more efficient way of doing this using Workflow or Quick Fields?
 


Answer

SELECTED ANSWER
replied on October 29, 2014

For this request, you need to make sure the documents have not only searchable text but pages as well. If a PDF has text but no pages, there is no page to copy or remove from the document, so this approach will not work for it.

 

1. Create a search that finds all documents that have pages and contain the phrase you need in their text.

2. In Workflow, create a search activity and enter that search into it.

3. Use a "For Each Entry" activity and set it to go through all the found items.

4. Use a "Retrieve Document Text" activity with the Token option set so that the text is returned as a multi-value token.

5. Use a "For Each Value" activity to iterate through each value of the text.

6. Use a Conditional Sequence to check whether the current value contains the phrase you need.

7. Inside the Conditional Sequence, use the iteration token of the current value as the page number, and copy that page to a new entry. Copy the page rather than moving it: if you move a page out, the page numbers for the next invoice inside the same document become inaccurate, because a page has now been removed.

8. Publish and start the workflow. Sit back and relax with some lemonade.
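The logic of steps 3 through 7 can be sketched outside of Workflow as plain Python. This is only an illustration of the loop structure, not Laserfiche code: it assumes each batch is available as a list of per-page text strings, and the names `pages_containing`, `extract_matches`, and the sample batch names are hypothetical.

```python
from typing import Dict, List


def pages_containing(page_texts: List[str], phrase: str) -> List[int]:
    """Return 1-based page numbers whose text contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]


def extract_matches(batches: Dict[str, List[str]], phrase: str) -> Dict[str, List[int]]:
    """Map each batch name to the page numbers that should be copied out of it.

    The batches are only read, never modified, so the page numbers stay valid
    for every later match in the same batch (the reason to copy, not move).
    """
    result = {}
    for name, page_texts in batches.items():        # "For Each Entry"
        hits = pages_containing(page_texts, phrase)  # "For Each Value" + condition
        if hits:
            result[name] = hits
    return result


# Hypothetical sample data: one list of page texts per scanned batch.
batches = {
    "Job 1001": ["Work order", "INVOICE No. 17", "Packing slip"],
    "Job 1002": ["Timesheet", "Photos"],
    "Job 1003": ["Invoice No. 18", "Invoice No. 18 (page 2)"],
}
print(extract_matches(batches, "invoice"))
```

Because the position in the list doubles as the page number, this mirrors using the iteration token of "For Each Value" as the page to copy.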


Replies

replied on October 28, 2014

Where are you looking to extract those documents to? Based on the information you've given, it definitely sounds possible with Workflow, but it's difficult to say for certain without more information. Basically, it would work like this: since you already have the OCR'ed text, you can pattern match against the pages in Workflow, then copy the matching pages into another folder so that you don't modify the original documents.
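As a rough idea of what the pattern match could look like, here is a small Python sketch. The regular expression is an assumption about how invoice pages might be labeled (the word "invoice" followed by an optional "No.", "Number", or "#" and then digits); the real pattern depends on what the OCR'ed invoices actually contain, and `find_invoice_number` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed layout: "Invoice", optional "No." / "Number" / "#", optional ":" then digits.
# Whitespace is matched loosely because OCR output often has irregular spacing.
INVOICE_PATTERN = re.compile(
    r"invoice\s*(?:no\.?|number|#)?\s*[:.]?\s*(\d+)",
    re.IGNORECASE,
)


def find_invoice_number(page_text: str) -> Optional[str]:
    """Return the first invoice number found in the page text, or None."""
    match = INVOICE_PATTERN.search(page_text)
    return match.group(1) if match else None


for sample in ["INVOICE  No. 4521", "Invoice #77", "Packing slip"]:
    print(repr(sample), "->", find_invoice_number(sample))
```

Matching on a pattern rather than a bare phrase helps skip pages that merely mention the word, such as a cover sheet that says "see attached invoice".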

replied on October 27, 2014

By the way, the batches have already been OCR'd.

replied on October 29, 2014

I will give it a try, thanks.

replied on October 31, 2014

Hi John, 

If your question has been answered, please let us know by clicking the "Mark this reply as the answer" button on the appropriate response.

If you still need assistance with this matter, just update this thread. Thanks!
