We had a client that uploaded about 40 GB of data. The full-text catalog wasn’t run on the storage. Is there a way to run full-text indexing on the entire repository? The documents were imported, and a workflow created the folder structure based on the metadata.
Question
Answer
I added a For Each Entry module and put the OCR inside it. It worked after I did that.
Replies
Are you referring to this?
Do you mean the documents were not OCRed or that they were not indexed by the Full Text Search Engine?
If they are just not OCRed (and therefore cannot be text searched), you can run a search to find them, select the results, and generate text. It will take a while to OCR that many, so it may be better to use DCC.
OCR is the process of extracting text from the image and creating the text layer. This text layer is then used to index the words in the Search Catalog. Only after the document has been indexed by the LFFTS is it text searchable.
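To make the order of operations concrete, here is a toy Python sketch of the idea described above. It is not Laserfiche code and does not use any Laserfiche API; the `Document` class and the `ocr`, `build_index`, and `search` functions are hypothetical names made up for illustration. It only shows why a document with no text layer yields nothing to index, even after a full reindex.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    image_words: list[str]  # toy stand-in for the words visible in the scanned image
    text: str = ""          # the text layer; empty until OCR has run


def ocr(doc: Document) -> None:
    """Pretend OCR: copy the words 'seen' in the image into the text layer."""
    doc.text = " ".join(doc.image_words)


def build_index(docs: list[Document]) -> dict[str, set[str]]:
    """Index only what is in each text layer; documents without text add nothing."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for word in doc.text.lower().split():
            index[word].add(doc.name)
    return index


def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the names of documents whose indexed text contains the word."""
    return sorted(index.get(word.lower(), set()))


docs = [
    Document("invoice", ["Invoice", "total", "due"]),
    Document("contract", ["Contract", "total", "terms"]),
]

ocr(docs[0])                    # only the invoice gets a text layer
index = build_index(docs)       # "reindex the whole repository"
print(search(index, "total"))   # ['invoice']

ocr(docs[1])                    # now OCR the contract as well
index = build_index(docs)
print(search(index, "total"))   # ['contract', 'invoice']
```

In the first search the contract is "indexed" in the sense that the indexer processed it, but it contributed no words because its text layer was empty. That matches the situation where reindexing completes and text searches still return nothing.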
There are several reasons a document may be OCRed but not indexed. The most common is that the Search Catalog sometimes gets corrupted and either goes offline or becomes read-only. When this happens, nothing new gets indexed and few, if any, text results are returned. To fix this, delete the old Search Catalog, create a new one, and reindex the whole repository.
Another common reason can be if the LFFTS service has stopped. Restart the service and then try to manually trigger the indexing of the document.
Sometimes it's as simple as just seeing if you can restart indexing. If that doesn't work, you can do the above.
Now you have really confused me. The documents ARE indexed, but we still can't search and find document text. Why would that be? I re-created the search catalog and reindexed the repository, and everything completed. Why would I not be able to search the text on the documents? Do I also have to use DCC and OCR each document?
So basically you are saying I have to OCR before it can index? If I don't OCR, I would assume the index would have nothing to catalog. If this is the case, wow.
The documents must have text before they can be indexed. If the documents have text in the text pane, then it should be possible to index them. If there is no text, then you'll have to OCR them. Are the documents native Laserfiche documents, with pages, or are they some other format?
Workflow does not perform OCR on its own. It needs DCC to do the work, so you'll have to set up DCC, and then it will work.
OK, so I am really confused here. What is the difference between full-text search and OCR? What is the point of full-text indexing a document when I can't search in the client by the information in the pages? So, if the user wants to search by words, it isn't full-text searching but OCR searching?