Question

Using Workflow and Distributed Processing to Generate Laserfiche Pages for PDFs and OCR Documents

Workflow Distributed Computing Cluster

Updated July 8, 2014

asked on June 26, 2014

We are looking to batch import documents.

My thought was that, to reduce import time, we could have users drag and drop documents and then use Workflow and DCC to OCR them.

I have tested drag and drop with a batch of TIFF images, and DCC OCRs those documents fine.

However, I ran the same test with non-searchable PDFs. The workflow runs, but the PDFs are not OCR'd, even though the workflow completes successfully and DCC reports the task as successful.

Is it possible to use DCC OCR to OCR non-searchable PDFs? I even tried first generating the images for the PDFs without searchable text, but DCC OCR still does not OCR them. Does DCC work with PDFs, or only with image files?

Alternatively, can you think of another way to batch process PDFs in a workflow to generate image files and OCR them?

Answer

APPROVED ANSWER

replied on June 26, 2014

DCC and Workflow do not currently have PDF page extraction capabilities. You can use either Quick Fields or Import Agent to import PDFs and generate image pages for them.

Replies

replied on July 8, 2014

I wanted to piggyback on this discussion and see whether there's a way to accomplish a similar task using the DCC Schedule OCR Workflow activity.

Our customer is using a PDF injection utility to send PDF reports directly to Laserfiche. They need the Laserfiche OCR text to be available so that these PDFs are text searchable in WebLink. They can generate the necessary OCR text manually in the Laserfiche Client, and doing so gives the desired result: PDF files stored in Laserfiche that are fully text searchable.

My question is: is there a way to use DCC/Workflow to OCR the PDF reports that are stored in Laserfiche? Is there another way to generate the OCR text, other than the current process of generating it manually through the Laserfiche Client?

replied on July 8, 2014

You don't need image pages and OCR to have searchable PDFs. If you install the Adobe IFilter on the Laserfiche Server, the search engine will extract the text from the PDFs and index it. There won't actually be any text pages, but searching for text will find the documents.

replied on July 8, 2014

Miruna,

Thank you very much, that's working great!
