I've set up the Schedule OCR process in Workflow using the Laserfiche Distributed Cluster (on the production server). The workflow runs each night and searches the repository for documents that have not been OCR'd. The problem is that it does not find (and therefore does not OCR) PDF files.
Question
Will the Schedule OCR process in Workflow process PDF files?
Replies
The OCR process only applies to image pages.
Hey Jeff,
You may want to consider generating images for your PDF documents so that Workflow OCRs those entries. Depending on the volume of PDFs, you could run a fairly simple search and generate the images en masse.
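To make the reasoning concrete: because OCR only applies to image pages, a PDF stored purely as an electronic file gives the nightly OCR pass nothing to work on until pages are generated for it. The sketch below models that selection logic with a hypothetical, simplified `Entry` record. It is not the Laserfiche SDK or the Workflow API; the class, its fields, and both helper functions are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified model of a repository entry (NOT the Laserfiche SDK).
@dataclass
class Entry:
    name: str
    electronic_ext: Optional[str]  # extension of the attached electronic file, e.g. "pdf"
    image_pages: int               # number of Laserfiche image pages
    has_text: bool                 # whether text has already been generated


def ocr_candidates(entries: List[Entry]) -> List[Entry]:
    """Entries an OCR pass could act on: they have image pages but no text yet."""
    return [e for e in entries if e.image_pages > 0 and not e.has_text]


def needs_page_generation(entry: Entry) -> bool:
    """A PDF with zero image pages is skipped by OCR until pages are generated."""
    return (entry.electronic_ext or "").lower() == "pdf" and entry.image_pages == 0


repo = [
    Entry("scan.tif", None, 3, False),
    Entry("invoice.pdf", "pdf", 0, False),
    Entry("report.pdf", "pdf", 2, False),
]
print([e.name for e in ocr_candidates(repo)])               # ['scan.tif', 'report.pdf']
print([e.name for e in repo if needs_page_generation(e)])   # ['invoice.pdf']
```

In this model, `invoice.pdf` only becomes an OCR candidate once pages are generated for it, which is the gap the replies below try to close.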
Hey Rob,
Thanks for the info; we were discussing that option. I know you can automate the search in Workflow, but is there a way to automate generating the Laserfiche pages, perhaps with a script? The documents are inserted directly into Laserfiche by a third-party application; otherwise we would just set the option in the client to generate the pages.
Maybe I'm missing something, but is there a reason we're not extracting text from the PDFs on import? Check out Tools>Options>Generate Text>Advanced Settings for PDFs. There you'll find radio buttons that let you generate text, using either text extraction or OCR, as soon as a PDF is imported into the repository.
I'll assume that you want to move forward with the original plan, but if you'd like more information regarding the previous paragraph, just let me know!
OK, so the original problem is automating page generation for PDFs. In the Tools>Options>New Documents>Settings menu you'll find an option to "Generate Laserfiche pages...when importing PDFs". Once your PDF/TIFF document is in the repository, your nightly OCR workflow should be able to find it and OCR it appropriately.