
Question


LF Indexing - How does it work when it comes across entries with 0 pages?

asked on February 3, 2020

Hello,

 

We have a large queue of entries to be indexed, but a large number of them have 0 pages.

What would be the best way to deal with these entries? The Laserfiche indexing seems to spend a long time processing them. I've thought about setting up a temporary LF account and PC and slowly generating TIFF pages with Snapshot for the 330,000+ entries with 0 pages, letting it run in the background over the coming days or weeks.

 

Questions

  • Is there a better way of dealing with these documents?
  • Why does indexing seem to run slowly on these entries with 0 pages?
  • How does the indexer handle electronic documents that cannot be indexed, e.g. .zip entries? Does it have an exclude list, or does it try 12 times on an entry that is a zipped folder before moving on to the next entry?
  • When the indexer comes across an electronic document with 0 pages, how does it treat that entry?

 

Our indexing server is separate from our repository server, and its spec exceeds the recommended guidelines.

 

Just trying to piece together everything we know about the indexing side of things.

 

Thanks in advance!

 

Rob

 


Replies

replied on February 3, 2020

There are multiple document components that could be subject to indexing: text pages, the electronic document, and fields. If the document has an electronic component, the search engine will attempt to extract text from it (using IFilters) and index that.

The document would be skipped if none of its components are indexable.

Generating image pages through Snapshot will not make a difference unless you also OCR them. The search engine relies on text, not images.
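To make the rule above concrete, here is a rough Python model of it. It is a hypothetical sketch, not Laserfiche's API or its actual indexing code: the `Entry` and `Page` classes, the `index_decision` function, and the set of extensions with an installed IFilter are all assumptions for illustration. The model only encodes what the reply states: an entry is skipped when none of its components yields text, and image pages only count once they carry text (for example, after OCR).

```python
from dataclasses import dataclass, field
from typing import Optional

# ASSUMPTION: extensions for which an IFilter is installed on the indexing
# server. This set is hypothetical; it is not read from Laserfiche.
ASSUMED_IFILTER_EXTENSIONS = {".doc", ".docx", ".pdf", ".txt"}


@dataclass
class Page:
    # Image-only page (e.g. generated by Snapshot); text stays empty until OCR.
    text: str = ""


@dataclass
class Entry:
    # Hypothetical stand-in for a repository document, not a Laserfiche type.
    name: str
    pages: list = field(default_factory=list)
    edoc_extension: Optional[str] = None  # None means no electronic document
    fields: dict = field(default_factory=dict)


def indexable_components(entry: Entry) -> list:
    """List the components that would yield text under this simplified model."""
    components = []
    if any(page.text.strip() for page in entry.pages):
        components.append("text pages")
    ext = (entry.edoc_extension or "").lower()
    if ext in ASSUMED_IFILTER_EXTENSIONS:
        components.append("electronic document")
    if any(str(value).strip() for value in entry.fields.values()):
        components.append("fields")
    return components


def index_decision(entry: Entry) -> str:
    """Skip the entry when no component is indexable; otherwise name the sources."""
    components = indexable_components(entry)
    return "index: " + ", ".join(components) if components else "skip"


if __name__ == "__main__":
    # 0 pages, no fields, and an electronic document type (.dwg here; a .zip
    # behaves the same) that is not in the assumed IFilter set.
    plan = Entry("Site plan", edoc_extension=".dwg")
    print(index_decision(plan))  # skip

    # Snapshot adds image-only pages: still nothing to extract text from.
    plan.pages = [Page(), Page()]
    print(index_decision(plan))  # skip

    # After OCR the pages carry text, so the entry becomes indexable.
    for page in plan.pages:
        page.text = "recognized text"
    print(index_decision(plan))  # index: text pages
```

Under these assumptions, generating pages alone does not move an entry from "skip" to "index"; the OCR step is what adds indexable text.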

replied on February 3, 2020

Thanks for the swift response, Miruna.

 

Regarding generating images: that was my thought, since we run a workflow at the end of each working day to catch any documents that haven't been OCR'd, so the ones I generate images for would join that queue.

 

Any ideas on why it's processing so slowly? We have the necessary IFilters in place, so we're curious whether the entries themselves are causing the issue. For example, I was trying to generate pages for some .doc entries and got the following error:

 

Error Code: 0
Error Message: Cannot obtain current entry ID.

------------ Technical Details: ------------

LFSO:
LF.exe (10.4.0.311):
    Call Stack: (Current)
        PrintToSnapshot
        EdocWorkerThread
    Call History:
           CAttachedRepository::GetProfileValue
          CLFApp::StartMonitor
          CLFProcessMonitor::PrintProcess
           CLFProcessMonitor::ShellExec
          GetOptionString ([Settings]SnapshotBatchMode)
           CAttachedRepository::GetProfileValue
          GetOptionString ([Settings]SnapshotBatchMode)
           CAttachedRepository::GetProfileValue

 

Thanks in advance!

replied on February 3, 2020

That's probably a different issue. It would be better if you take this up with your reseller and open a case with Tech Support so we can take a closer look.
