Question

The Laserfiche 8.3 client repository that I manage contains a large number of documents that were scanned in directly, without OCR. How can I systematically search for all non-OCRed documents and run OCR on all of them in bulk?

asked on February 19, 2014

Answer

SELECTED ANSWER
replied on February 19, 2014

One tip if you have thousands and thousands of documents: do it on multiple machines!

But you have to be careful that the machines are not working on the same documents.

One easy way to do it with two machines is to run the search Blake describes and click a column, such as the creation date, to sort by it. Then grab the first 1000 (if you shift-click the first one, scroll down a few pages, and shift-click again, the status bar at the bottom of the client shows how many you have selected) and start OCRing them before you leave at night.

Then go to machine B and run the same search, but this time click the same column twice to sort it in reverse, and grab the first 1000 from that end.

For the future, if you don't want to spend time OCRing as you scan (since it does slow things down somewhat) and you don't want to repeat this process manually, I'd consider talking with your VAR about purchasing either Import Agent or a Distributed Computing Cluster license (if you happen to have Rio, you already have one license free).
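
If it helps to see why the two-machine split stays safe, here is a rough sketch in Python. It is not Laserfiche code and does not call any Laserfiche API: the function name `split_batches` is made up for illustration, and plain integers stand in for the sorted search results. The only point it shows is that taking a batch from each end of the same sorted list cannot overlap as long as each batch is at most half of the list.

```python
def split_batches(sorted_ids, batch_size):
    """Return (batch_a, batch_b) taken from opposite ends of one sorted list.

    sorted_ids: search results in ascending sort order (stand-in values).
    batch_size: how many documents each machine should take (e.g. 1000).
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")

    # Cap each batch at half the list so the two ends can never meet.
    n = min(batch_size, len(sorted_ids) // 2)

    # Machine A: ascending sort, take the first n results.
    batch_a = sorted_ids[:n]

    # Machine B: same results sorted in reverse, take the first n results.
    batch_b = sorted_ids[::-1][:n]

    return batch_a, batch_b


if __name__ == "__main__":
    results = list(range(1, 2501))  # pretend the search returned 2500 documents
    a, b = split_batches(results, 1000)
    print(len(a), len(b), set(a).isdisjoint(b))
```

With fewer than 2000 results the sketch shrinks both batches instead of letting them collide; in the client you would do the same thing by hand by selecting fewer documents on each machine.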

Replies

replied on February 19, 2014

You can perform a Pages search for documents that contain text on no pages. After you receive your results you can select all and start the OCR.

 

replied on February 19, 2014

As Chris mentioned, this can bog down your server. One way around that is to run the search for non-OCRed documents before you leave for the evening and let the OCR run overnight.
