I’m currently importing thousands of files from our network drives into Laserfiche. The issue is that these files are all in folders; some of the folders contain only .jpg files and some contain .jpg files plus PDF files with text. If I select a large batch of folders to import, then generate Laserfiche pages and also generate searchable text, the import hangs whenever it comes to an actual photograph, like maybe a car bumper from a police report or similar. It does occasionally recover, but the amount of time lost is substantial. Is there a way to tell Laserfiche to skip OCR on .jpg images if I know the images contain no usable or necessary text? Thanks for any insight you may have.
Question
problem importing large amounts of documents
Answer
OCR has an internal timeout for trying to detect text on pages. The default is 10 minutes. See this KB article for instructions on how to set that value to a lower threshold.
Replies
Since you are doing this manually, I think your best bet would be to import them separately with different settings (you could create a second account with different import settings and even log into a second LF instance on the same machine if you wanted), or just do it on a more powerful machine, because OCR is especially resource-intensive.
The problem is that the OCR step applies to the generated pages, not the original source file, so by that point there is no more "jpeg" or "pdf" to exclude. You can tell LF not to generate pages for jpeg, but that's not really what you want either.
If you use Import Agent, you can create a filter on the Import Agent profile to target only specific file types; you could configure one profile for your PDF files, and another for your JPEG files with the OCR option disabled.
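If you stay with manual imports, the same split by file type can be done on the file system before importing. Here is a rough sketch in Python, not anything Laserfiche provides: it walks a source folder, sorts files into "needs OCR" and "skip OCR" lists by extension, and can copy each list into its own staging folder so the two batches can be imported with different settings. The function names, the extension list, and the staging-folder idea are all my own assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumption: only these extensions are photos with no useful text.
NO_OCR_EXTENSIONS = {".jpg", ".jpeg"}


def split_by_ocr_need(root):
    """Walk root and return (ocr_files, no_ocr_files) as sorted lists of Paths."""
    ocr_files, no_ocr_files = [], []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Compare case-insensitively so .JPG and .jpg are treated alike.
        if path.suffix.lower() in NO_OCR_EXTENSIONS:
            no_ocr_files.append(path)
        else:
            ocr_files.append(path)
    return sorted(ocr_files), sorted(no_ocr_files)


def stage(files, root, dest):
    """Copy files under dest, keeping each file's path relative to root."""
    for src in files:
        target = Path(dest) / src.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
```

You would call split_by_ocr_need on the network folder, then stage each list into a separate folder and import the image-only folder with text generation turned off. It copies rather than moves, so the originals stay untouched, but try it on a small folder first.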