Question

Full-text Search Performance and Scalability

Search and Retrieval

Updated August 14, 2015

asked on October 8, 2014

Is there any way to configure the architecture of Laserfiche to accommodate sub-5-second full-text searching across 15-20 TB of imaged content (~200-300 million pages)?

In this particular instance there are not that many people searching, but they can't wait minutes for results.

Does anyone have any suggestions regarding how this might be accomplished? Money isn't an issue, but SharePoint and federated searching of Laserfiche repositories are not an option.

Replies

replied on October 13, 2014

Hey James,

It'll probably be easiest to consolidate our conversation into one medium. Since this is visible and may prove useful in the future, I'll leave my comments on your most recent email to Presales here:

The development teams are in the process of working on the searching capabilities. In particular, work has been done on adding clustering support to LFFTS and we hope to provide that in the next major release.

In terms of threads, a single search uses a single thread. Concurrent searches will each have their own dedicated thread. In general, once the number of threads equals the number of physical cores available, we'll start running into diminishing returns.
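To illustrate the one-thread-per-search point, here is a minimal Python sketch that caps concurrent searches at the core count. It is not LFFTS code: `run_search` and `run_concurrent_searches` are hypothetical names, and `os.cpu_count()` reports logical cores, which may be higher than the physical core count on hyper-threaded machines.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_search(query):
    # Hypothetical stand-in for one full-text search.
    # Each call runs on a single worker thread.
    return f"results for {query!r}"


def run_concurrent_searches(queries, max_threads=None):
    # Assumption: cap the pool at the core count, since extra threads
    # beyond that point are expected to give diminishing returns.
    # os.cpu_count() returns logical cores (or None if unknown).
    limit = max_threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # map() preserves the order of the input queries.
        return list(pool.map(run_search, queries))


if __name__ == "__main__":
    for result in run_concurrent_searches(["invoice", "555-0100"]):
        print(result)
```

If more searches arrive than there are worker threads, the extra ones simply wait in the pool's queue until a thread is free.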

I'm very curious to hear more about the solution that you proposed regarding the multiple queries across different repositories! I'm not aware of any development like this internally, but depending on the feasibility and performance gains, it sounds like it could be a promising idea!
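For anyone reading along, my rough understanding of the multiple-queries idea is a fan-out: send the same query to each repository in parallel and merge the hits. The Python sketch below only shows that shape. `search_repository` and `search_all` are hypothetical placeholders, not a Laserfiche API, and the repository names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def search_repository(repo_name, query):
    # Hypothetical placeholder for a full-text search against one repository.
    # A real version would call whatever client the repository exposes.
    return [f"{repo_name}: hit for {query}"]


def search_all(repo_names, query):
    # Fan the same query out to every repository, one thread per repository,
    # then flatten the per-repository hit lists into a single list.
    if not repo_names:
        return []
    with ThreadPoolExecutor(max_workers=len(repo_names)) as pool:
        per_repo = pool.map(lambda name: search_repository(name, query), repo_names)
        return [hit for hits in per_repo for hit in hits]


if __name__ == "__main__":
    for hit in search_all(["RepoA", "RepoB"], "555-0100"):
        print(hit)
```

In this shape the total wait is roughly that of the slowest repository rather than the sum of all of them, which is where any gain would come from.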

Unless you have any objections, let's migrate the discussion to this forum so we can have this as a reference to work with in the future. If you'd like to continue the discussion privately, so as to avoid the distribution of certain details, just respond with any comments or questions to the Presales inbox.

replied on October 14, 2014

Returning search results in under 5 seconds in a repository of this size is a challenging task. However, it is practicable if the query is always simple (ideally only one word per query) and the number of search results is limited (fewer than 10k matching entries for each word in the query). One possible scenario is searching for a phone number across the repository. In that case, put the index files on SSD, and a single high-end machine should do the job.

As Rob pointed out, clustering is the long-term solution for scalability in most cases. I will keep you updated when clustering is supported in LFFTS.

replied on March 3, 2015

In the 9.2.1 release, parallel I/O mode was introduced to improve indexing and search performance. We recommend using parallel I/O mode for large catalogs. Please visit https://support.laserfiche.com/ow.aspx?LFFTS9.2/ListOfChanges921 for details.

replied on August 14, 2015

If LFFTS is on a separate server, must the registry change be made on the LFFTS machine, the LFS machine, or both?

replied on August 14, 2015

LFFTS only, as parallel I/O mode is an internal feature of LFFTS and is transparent to LFS.

replied on August 11, 2015

What's the general threshold you'd consider a "large" catalog?

replied on August 13, 2015

Your repository doesn't need to be very large to see benefits. If it's at least 200k pages and you're encountering full-text search performance issues, parallel I/O mode could be beneficial.
