I have a client with about 100,000 TIFF documents that they want to import into LaserFiche. They want them all full-text OCR'd when entered into the system. Is there an easier way to import the documents first without running OCR, and then let OCR run after the import? As it stands, the import takes forever and eventually force quits after a couple thousand documents. They are not indexing any documents, merely searching for documents using the OCR text.
Question
Is there an easy way to import a large number of documents?
Answer
Try Import Agent or Quick Fields for bulk import.
Replies
As others have said, Import Agent is great for bulk import, especially if the TIFF files you are importing are large. 10,000 pages is not too bad to churn through if they are all 1-page docs, but 500,000 pages is another level if they average out to 50 pages each.
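To put rough numbers on that, here is a quick back-of-the-envelope sketch. The 10,000-document count is only inferred from the figures above (10,000 pages at 1 page each, 500,000 at 50 each), and applying 50 pages per document to the original poster's 100,000 documents is purely hypothetical, since the real page counts were not given.

```python
def total_pages(documents: int, avg_pages_per_doc: float) -> int:
    """Estimate the total page count that OCR would have to process."""
    return round(documents * avg_pages_per_doc)


# 10,000 documents is inferred from the reply above, not stated outright.
print(total_pages(10_000, 1))    # 10000
print(total_pages(10_000, 50))   # 500000

# Hypothetical: the original poster's 100,000 documents at 50 pages each.
print(total_pages(100_000, 50))  # 5000000
```

The point is that OCR workload scales with pages, not documents, so it is worth sampling the average page count before planning the import.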
Personally, in situations like this I like to combine it with Distributed Computing Cluster (DCC). You can have Import Agent pull everything in and add a tag such as "OCR Needed". Then you can have a workflow that runs periodically, searches for the tag, and adds only the first X matching documents to the DCC queue.
This way, if it is a very large number of documents, you can add more machines to your cluster to churn through them faster. Plus, the documents are available as soon as they've been uploaded, even if they haven't been OCR'd yet. You just have to watch out that you don't have too much in the queue at once. It runs into issues if it has more than 4-5k in there at once, hence the workflow that runs periodically to feed documents in at a slower rate.
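The feeding logic described above can be sketched in generic Python. This is not Laserfiche code: in practice it would be built as a workflow, and `search_tagged`, `queue_depth`, and `enqueue` are hypothetical stand-ins for "search for the tag", "check how many jobs are queued", and "add the document to the DCC queue and clear the tag". The limits are assumptions too; 4,000 is simply a value under the 4-5k trouble point, and 500 is an arbitrary "first X".

```python
from typing import Callable, List

MAX_QUEUE_DEPTH = 4000  # assumed ceiling, kept below the 4-5k level mentioned above
BATCH_SIZE = 500        # the "first X" per run; arbitrary illustrative value


def feed_ocr_queue(
    search_tagged: Callable[[int], List[str]],  # hypothetical: first N docs tagged "OCR Needed"
    queue_depth: Callable[[], int],             # hypothetical: jobs currently waiting in the queue
    enqueue: Callable[[str], None],             # hypothetical: queue one doc and remove its tag
    batch_size: int = BATCH_SIZE,
    max_depth: int = MAX_QUEUE_DEPTH,
) -> int:
    """One periodic run: top up the queue without exceeding max_depth.

    Returns the number of documents queued in this run.
    """
    room = max_depth - queue_depth()
    if room <= 0:
        return 0  # queue is full enough; wait for the next scheduled run
    doc_ids = search_tagged(min(batch_size, room))
    for doc_id in doc_ids:
        enqueue(doc_id)
    return len(doc_ids)


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without any repository.
    tagged = [f"doc-{i}" for i in range(10_000)]
    queue: List[str] = []

    def search_tagged(limit: int) -> List[str]:
        return tagged[:limit]

    def enqueue(doc_id: str) -> None:
        tagged.remove(doc_id)  # equivalent of removing the "OCR Needed" tag
        queue.append(doc_id)

    added = feed_ocr_queue(search_tagged, lambda: len(queue), enqueue)
    print(f"queued {added}, still tagged {len(tagged)}")
```

Removing the tag at enqueue time is what keeps the next run from picking up the same documents again, and capping each run by the remaining room keeps the queue below the limit even if OCR falls behind.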
Import Agent is what we use. We recently did a large import of documents. I scheduled the import to process after hours only in order to not interfere with normal daily tasks. It worked perfectly.
We are manually importing thousands of PDFs from an external scanning company.
We need to generate pages, OCR these files, and then move them to a workflow.
Some are thousands of pages. We are on LF Cloud.
Any suggestions on how we can improve this process?
Thank you in advance