Question

Workflow full repository query

asked on January 26, 2024

I have a user with a workflow that scans the whole repository, and it is currently taking 15+ seconds to move a folder. When they ran a simple query on their test DB, it took about 6 seconds; after they built a really basic index on a field, the resulting searches took under 1 second. Is there a way to speed up the whole-repository search?

Replies

replied on January 26, 2024

Is the folder move part of the workflow? Or is the issue that the search slows everything else down? Showing a diagram of what the workflow is doing would be very useful to determine what is happening and what might be causing slowdowns. You may want to start your troubleshooting by looking at the list of active workflows.

This may or may not be relevant, but I've experienced similar issues in a workflow when I ran a search and then ran a "For Each Entry" loop that ran another workflow on each entry, but did not check the "Wait for invoked workflow" checkbox on the "Invoke Workflow" activity, so it quickly spun off a load of simultaneous workflows. The slowdown was fixed by checking the "Wait for invoked workflow" box, because it then only runs one at a time.
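To make the difference concrete, here is a rough Python sketch. It is not Laserfiche code; it only models the behavior described above with threads. The names `run_child` and `peak_concurrency` and the sleep time are made up for illustration.

```python
import threading
import time


def run_child(counter, lock, work_seconds=0.05):
    """Stand-in for one invoked workflow; records how many run at once."""
    with lock:
        counter["active"] += 1
        counter["peak"] = max(counter["peak"], counter["active"])
    time.sleep(work_seconds)  # pretend to do some work
    with lock:
        counter["active"] -= 1


def peak_concurrency(entry_count, wait_for_invoked):
    """Start one child per entry and return the highest number running together."""
    counter = {"active": 0, "peak": 0}
    lock = threading.Lock()
    pending = []
    for _ in range(entry_count):
        child = threading.Thread(target=run_child, args=(counter, lock))
        child.start()
        if wait_for_invoked:
            child.join()  # box checked: finish this child before starting the next
        else:
            pending.append(child)  # box unchecked: keep spinning off children
    for child in pending:
        child.join()
    return counter["peak"]
```

With `wait_for_invoked=True` the peak stays at one child at a time; with `False` the children overlap, which is the pile-up I ran into.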

replied on January 26, 2024

The folder move is not what is taking time. The move itself takes only a few hundred milliseconds. What takes the time is the query that searches the entire repository to see if that folder already exists. That is showing in the workflow at 15+ seconds. Thanks!

replied on January 26, 2024

Understood. Searches that return lots of results do tend to take quite a while. Knowing what the workflow is intended to do would help us suggest how it could be improved.

Here's an anecdotal example of how I fixed the slowness in a workflow that needed to search for, and process, over a million entries.

The goal of the primary workflow was to run a secondary processing workflow on each entry, within a given repository folder, that had an "Unprocessed" tag. The processing workflow would remove this "Unprocessed" tag, so no entry would ever be processed twice. As in your case, the issue was that searching for that many entries takes forever.

However, you can limit the number of results that a search returns. Because of this, I looked at approximately how many entries my workflow could process in an hour and set the search result limit to slightly above that amount. I put the "For Each Entry" activity within the primary branch of a Deadline activity. I also put an "End Workflow" activity within the Deadline branch and set the deadline to 1 hour. I then set the schedule of this primary workflow to run once a day and repeat every hour for 23 hours.

Using this method, the primary workflow's searches are considerably faster because the entries are being processed in batches rather than all at once, and the workflows don't pile up or conflict because they will always only run for an hour at a time. An additional benefit is that I can now view each workflow instance's data. This is important because, if there are too many entries in a workflow like yours, trying to view the workflow instance's data can cause it to time out and you will be unable to view the data at all.
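As a loose illustration of that batching pattern, here is a small Python sketch. It is not Laserfiche code: `search_unprocessed` and `run_once` are hypothetical stand-ins for the capped search and for one scheduled run of the primary workflow, and the repository is just a dictionary of entry names to tag sets.

```python
import time


def search_unprocessed(repository, limit):
    """Hypothetical stand-in for a search capped at `limit` results."""
    hits = [name for name, tags in repository.items() if "Unprocessed" in tags]
    return hits[:limit]


def run_once(repository, limit, deadline_seconds, clock=time.monotonic):
    """One scheduled run: process a capped batch, stopping at the deadline."""
    started = clock()
    processed = 0
    for name in search_unprocessed(repository, limit):  # the "For Each Entry" loop
        if clock() - started >= deadline_seconds:
            break  # the Deadline branch: end this run, leave the rest for the next one
        repository[name].discard("Unprocessed")  # processing removes the tag
        processed += 1
    return processed


if __name__ == "__main__":
    repo = {f"doc{i}": {"Unprocessed"} for i in range(10)}
    # Each call plays the role of one hourly run of the primary workflow.
    while run_once(repo, limit=4, deadline_seconds=3600) > 0:
        pass
```

Because each run only asks for a capped batch and removes the tag as it goes, the next run picks up where the previous one stopped, and no single run has to hold the whole result set.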

It sounds like you may be able to do something similar, or even break it down into shorter workflows if needed.

replied on January 26, 2024

Sure. The workflow is designed to take a folder from one area and move it to another. So what it does is:

Search the entire directory to see if a folder with the same name exists.

YES: Merge the data into the preexisting folder.

NO: Create a new folder in the target destination.

That is pretty much it.

Here is the search query

{[]:[Case Number]="*"} & {LF:Name="%(CaseNumber)", Type="F"} & ({LF:LOOKIN="digital-evidence\1. PROCESSED DIGITAL EVIDENCE"} | {LF:LOOKIN="digital-evidence\2. DISCOVERY OPEN - CA"} | {LF:LOOKIN="digital-evidence\3. ATN RETURNED"} | {LF:LOOKIN="digital-evidence\4. ARCHIVE"} | {LF:LOOKIN="digital-evidence\2. DISCOVERY OPEN - DA"} | {LF:LOOKIN="digital-evidence\5. PURGE"} | {LF:LOOKIN="digital-evidence\6. REMOVED FILES"})

 

Here is the activity from the Workflow (screenshot not included).

replied on January 26, 2024

I just ran a little test to see if I could find a more efficient search method, and I think I have.

If the %(CaseNumber) folder is always a child of one of those folders you are searching within, you should add "SUBFOLDERS=0" to your LOOKIN statement. That should shave some time off. For example, {LF:LOOKIN="digital-evidence\2. DISCOVERY OPEN - CA", SUBFOLDERS=0}

What is the purpose of having {[]:[Case Number]="*"} in the search? Is that to ensure the field is assigned and not blank? If you don't need to check that value, remove it from your search altogether and see how that impacts the time.
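If you want to time a few variants of the query against each other, a small script can generate them consistently. The Python sketch below is only a string builder that reuses the syntax already shown in this thread; the function name `build_folder_search` and its parameters are my own invention, and it does not talk to Laserfiche at all.

```python
def build_folder_search(folder_name, parent_paths, search_subfolders=True,
                        require_field=None):
    """Assemble a folder search string from the pieces used in this thread.

    folder_name       -- folder name or token, e.g. "%(CaseNumber)"
    parent_paths      -- folders to look in
    search_subfolders -- False appends SUBFOLDERS=0 to every LOOKIN clause
    require_field     -- optional field name that must be assigned, or None
    """
    clauses = []
    if require_field:
        clauses.append('{[]:[' + require_field + ']="*"}')
    clauses.append('{LF:Name="' + folder_name + '", Type="F"}')

    lookins = []
    for path in parent_paths:
        if search_subfolders:
            lookins.append('{LF:LOOKIN="' + path + '"}')
        else:
            lookins.append('{LF:LOOKIN="' + path + '", SUBFOLDERS=0}')

    # Only group the LOOKIN clauses in parentheses when there are several.
    joined = " | ".join(lookins)
    clauses.append("(" + joined + ")" if len(lookins) > 1 else joined)
    return " & ".join(clauses)


if __name__ == "__main__":
    parents = [
        r"digital-evidence\1. PROCESSED DIGITAL EVIDENCE",
        r"digital-evidence\2. DISCOVERY OPEN - CA",
    ]
    # Variant without the field check and without subfolder searching.
    print(build_folder_search("%(CaseNumber)", parents, search_subfolders=False))
```

Generating one variant with the field clause and one without, then running each as a test search, should show how much each change contributes.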

replied on January 26, 2024

I am not entirely sure what the {[]:[Case Number]="*"} is for. I wasn't the original builder. Those are some great thoughts, and it would almost work, except that when a folder goes into our '4. Archive' the folder structure changes. There I would need to limit it to SUBFOLDERS=2 or 3.

Can I specify different SUBFOLDERS values for each LF:LOOKIN section?

Also, do you know if this search query is a mask for a SQL query? I just cannot imagine it is working like a Windows file system, but I am not sure.

Thanks again!

replied on January 26, 2024

{[]:[Case Number]="*"} ensures that the metadata field "Case Number" is assigned and that the value is not blank. It may not be necessary at all, but you should do some test searches to be certain.

SUBFOLDERS cannot be set to 2 or 3, only Y/N or 1/0. 0 is the same thing as N: do not search in subfolders. 1 is the same thing as Y: search in subfolders. As you can imagine, searching within subfolders can dramatically increase the search time, because the search then also looks inside every case folder.
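If the query text is being generated by a script, a tiny guard can catch an invalid setting before the search is ever run. This Python sketch just encodes the rule above; `normalize_subfolders` is a hypothetical helper name, not part of any Laserfiche API.

```python
def normalize_subfolders(value):
    """Map a SUBFOLDERS setting to True (search subfolders) or False.

    Accepts only Y/N or 1/0, as described above; anything else,
    such as a depth of 2 or 3, is rejected.
    """
    text = str(value).strip()
    if text in ("Y", "1"):
        return True
    if text in ("N", "0"):
        return False
    raise ValueError("SUBFOLDERS must be Y/N or 1/0, got %r" % (value,))
```

So `normalize_subfolders(0)` and `normalize_subfolders("N")` both mean "do not search subfolders", while a value like 2 raises an error instead of silently producing a query that will not do what was intended.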

All I can say in response to your SQL question is that Laserfiche runs on a SQL database, so yes, it is most likely converting this syntax into a SQL query.

replied on January 26, 2024

Thanks for clearing those things up for me. I will see if I can find a way to limit the search a little further. I don't know why it would care about the metadata for CASE NUMBER, so I will try removing that.
