Question

how to determine indexed search fragmentation and fix very slow indexed searches

asked on November 7, 2014

Our indexed search is quite slow and I want to fix it.

In the PowerPoint for "EDM301: Improving Search Efficiencies," slide 33 shows a dramatic improvement in search speed after defragmenting.

How can I determine what the fragmentation percentage of my index is?

If the fragmentation percentage is close to 0, what is the next thing to look at?

Simple searches take minutes, and this is with no fuzzy, partial, or root word variation searching turned on (metadata searches are fast, usually a few seconds or less). I don't see any indication that we are running low on memory or other server resources, even when the index is optimizing.

Also, pictures in TIFFs are sometimes OCRed and generate garbage text files. Is this garbage text indexed, and does it slow searches?

The attached image shows the size of our index files.

Thanks in advance.

index.jpg (67.58 KB)

Replies

replied on November 13, 2014

Hi Robert,

It sounds like the Laserfiche Search Engine Configuration Utility will be of use to you. It allows you to monitor and manage the Laserfiche Full-Text Indexing and Search Service, including viewing the index's fragmentation percentage and memory usage.

If you find that the search index is fairly fragmented, take a look at the Laserfiche QuickReindex utility as another resource in addition to the optimization options available with the Search Engine Configuration Utility.

If you still experience search issues after looking at fragmentation and index optimization, I recommend trying out some of the other tips mentioned in the Empower 2014 EDM301 presentation, such as limiting the number of results returned and verifying that Bypass Filter Expressions is enabled.

As for "garbage text," additional text files will always provide more content against which a search term is checked. How much of an impact this has on search performance will depend on how many of these files you have.

replied on November 13, 2014

Thanks Kelsey, I did not know about that utility.

We had some improvements, but I'd like to do more if possible:

  1. The index has both body and non-body fragmentation of 0%, so that looks good. The optimizer runs periodically, so that’s good as well.
  2. Using Bypass Filter Expressions and Bypass Browse helped. Cutting the results to 500 makes searches fairly fast (<10 seconds), but I don’t know how it picks the “top 500” results, and sometimes someone really does want all the results. Does a full repository search taking ~2.5 minutes and returning ~20,000 entries sound like something we can improve on? Total text size using the SQL query in EDM301 is ~15 GB.
  3. When I run “showmem” in the utility I see many cases where “CurrentSize” is smaller than both “DesiredSize” and “RequiredSize”. Can this be addressed just by adding more memory? We are a little short on memory based on EDM301.
  4. If I can identify pages with garbage text, is there a way to exclude them from the index?

Thanks

replied on November 19, 2014

A search operation includes both the full-text search and a security check on the results. Usually, full-text search time is not affected by the result limit. In your case, the full-text search itself appears to be fast (as it is when results are limited to 500), while the security check is the bottleneck. So I suggest trying the other methods mentioned under 'Security and Performance' and 'Column Display' in the slides.
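
As a rough illustration of why the result count matters so much, here is a back-of-the-envelope calculation using the timings quoted earlier in this thread. The model and the numbers are only illustrative, not actual Laserfiche internals:

# Simple model: total_time ≈ full_text_time + results_returned * per_entry_check_time
# The two data points below are the timings reported earlier in this thread.
full_search_s = 150.0      # ~2.5 minutes for the unlimited search
full_results = 20000       # ~20,000 entries returned
limited_search_s = 10.0    # reported upper bound with a 500-result limit
limited_results = 500

# Solve the two data points for the two unknowns.
per_entry_s = (full_search_s - limited_search_s) / (full_results - limited_results)
full_text_s = limited_search_s - limited_results * per_entry_s

print(f"Estimated per-entry security/listing cost: {per_entry_s * 1000:.1f} ms")
print(f"Estimated fixed full-text search cost:     {full_text_s:.1f} s")
# Roughly 7 ms per entry means the 20,000-entry result set alone accounts for
# about 140 s, which is why limiting results or cheapening the per-entry check
# (security, column display) helps far more than further index optimization.

The exact figures are only estimates, but they show where the minutes are going.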

As for your questions:
1. Fragmentation should be 0% after optimization, so you don't need to optimize the index further in this case.
2. The results are ranked by relevance. Tuning the security settings should be able to improve search performance.
3. In most cases this is not an issue; CurrentSize can be small simply because your search requests haven't needed more memory.
4. For the garbage text, you can delete those text pages, though as mentioned before, I don't think this will improve your search performance much.
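
On item 4, if you want a quick way to find candidates, the sketch below flags text files whose character mix looks like OCR noise. It assumes you have exported the page text to plain .txt files in a folder (it is not a Laserfiche API), and the 30% threshold is just a starting point to tune:

import os
import re
import sys

def gibberish_score(text):
    """Fraction of characters that are not letters, digits, whitespace,
    or common punctuation; higher values suggest garbage OCR output."""
    if not text:
        return 0.0
    junk = re.findall(r"[^A-Za-z0-9\s.,;:'()\-]", text)
    return len(junk) / len(text)

def scan(folder, threshold=0.30):
    """Print the .txt files under 'folder' whose gibberish score exceeds the threshold."""
    flagged = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".txt"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    score = gibberish_score(f.read())
                if score > threshold:
                    flagged.append((score, path))
    for score, path in sorted(flagged, reverse=True):
        print(f"{score:.0%}  {path}")
    print(f"{len(flagged)} file(s) flagged")

if __name__ == "__main__":
    # Usage: python find_garbage_text.py <folder_of_exported_text>
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")

Whether deleting the flagged pages is worth the effort is a separate call, but at least you would know how many there are.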

replied on November 19, 2014

Thanks Cangfei.

The BFE and BB settings were a huge improvement. I will also turn on indexing for larger fields, and hopefully we will get even more speed.

Rob

replied on November 19, 2014

Please note that any users with Bypass Browse will be able to see the full folder structure of the repository, regardless of their entry access rights. Depending on the system, this could be a security issue.
