
Question

Using Workflow and Distributed Processing to Generate Laserfiche Pages for PDFs and OCR Documents

asked on June 26, 2014

We are looking to batch import documents.

 

My thought was that, to reduce import time, we could have users drag and drop documents and then use Workflow and DCC to OCR them.

 

I have tested drag and drop with a batch of TIFF images and DCC works fine to OCR those documents.

 

However, I ran the same test with non-searchable PDFs. The workflow runs, but the PDFs are not OCR'd, even though the workflow completes successfully and DCC reports a successful task.

 

Is it possible to use DCC OCR to OCR non-searchable PDFs? I even tried first generating the images for the PDF without searchable text, but DCC OCR still does not OCR the PDFs. Does DCC work with PDFs, or only with image files?

 

Alternatively, can you think of another way to batch process PDFs in a workflow to generate image files and OCR them?


Answer

APPROVED ANSWER
replied on June 26, 2014

DCC and Workflow do not currently have PDF page extraction capabilities. You can use either Quick Fields or Import Agent to import PDFs and generate image pages for them.


Replies

replied on June 26, 2014

You could always create an SDK script within Workflow to generate pages for PDF documents. One issue is building a queue to control the order in which the PDFs get pages generated. The other is that the script would have to be restricted to PDF documents, since it would hang if you ran it against any other type of document, e.g. Office documents.
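
To illustrate the two concerns above (processing order and restricting the work to PDFs), here is a rough sketch in Python. It is not Laserfiche SDK code: `generate_pages` is a hypothetical placeholder for whatever SDK call actually generates the pages, and the entry names are assumed to carry a file extension. It only shows the filtering and first-in, first-out ordering.

```python
from collections import deque
from pathlib import PurePath
from typing import Callable, Iterable, List


def is_pdf(name: str) -> bool:
    """Return True only when the final extension is .pdf (case-insensitive)."""
    return PurePath(name).suffix.lower() == ".pdf"


def process_pdf_queue(
    names: Iterable[str],
    generate_pages: Callable[[str], None],  # hypothetical stand-in for the SDK call
) -> List[str]:
    """Queue only the PDFs and hand them to generate_pages in arrival order."""
    queue = deque(name for name in names if is_pdf(name))
    processed = []
    while queue:
        name = queue.popleft()  # first in, first out
        generate_pages(name)
        processed.append(name)
    return processed
```

Non-PDF entries such as Office documents never enter the queue, so the page-generation step is never called on them.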

replied on July 8, 2014

I wanted to piggyback on this discussion and see if there's a way I can accomplish a similar task using the DCC Schedule OCR Workflow activity.

 

Our customer is using a PDF injection utility to send PDF reports directly to Laserfiche. They need the Laserfiche OCR text to be available so that these PDFs are text searchable in WebLink. They are able to generate the necessary OCR text manually using the Laserfiche Client, and doing so gives the desired result: PDF files stored in Laserfiche that are fully text searchable.

 

My question is: is there a way to use DCC/Workflow to OCR the PDF reports that are stored in Laserfiche? Is there any other way to generate the OCR text besides the current process of generating it manually through the Laserfiche Client?

 

 

replied on July 8, 2014

You don't need image pages and OCR to have searchable PDFs. If you install the Adobe IFilter on the Laserfiche Server, the search engine will extract the text from PDFs and index it. (There won't actually be any text pages, but searching for text will find the documents.)

replied on July 8, 2014

Miruna,

 

Thank you very much, that's working great!
