
Question

Workflow - Search Repository Efficiency

asked on December 27, 2018

Hi,

I am wondering what the most efficient way to use the Search Repository activity would be.

 

The workflow I am working on has one simple objective: migrate entries to another volume.

The folder structure is:

Repository\Archive:

...\Department A\(50 to 500 subfolders + thousands of documents)

...\Department B\(50 to 500 subfolders + thousands of documents)

...\Department C\(50 to 500 subfolders + thousands of documents)

...\Department D\(50 to 500 subfolders + thousands of documents)

and so forth, as there are 10 departmental folders.

 

My question is which approach would return the entries that need to be migrated faster.

Option 1: Search pretty much everywhere:

({LF:Volname<>"NEWVOLUME"}) & {LF:LOOKIN="VFCLF8\Archive"}

 

Option 2: Use Find Entries to get all the departmental folders, then use Search Repository to get the entries that need to be migrated for each department:

({LF:Volname<>"NEWVOLUME"}) & {LF:LOOKIN="VFCLF8\Archive\%(ForEachFolder_CurrentEntry_Name)"}

 

I have done some testing, and getting the entries for the first department (option 2) is faster than searching everything. But when only the 10th folder has entries to migrate, will it still be faster? Or would it be better to change the approach after the first 4 or 5 folders?

 

Thanks!


Answer

SELECTED ANSWER
replied on December 28, 2018

Given that use case, I'd actually have a workflow that triggers for documents created/moved/copied into the \Archive path when they are not in the right volume. That way you spread the load throughout the day and the documents get processed immediately.

To answer your question, though, yes, I would search for all documents under the \Archive tree, subfolders included, that are not in the correct volume. I would also limit the search to about 500 entries at a time. The reason both Devin and I tried to steer you away from a workflow for a one-time migration is that when working with very large data sets, Workflow has a lot of overhead because it keeps track of all the entries it processes and each activity it runs. So for thousands of documents at a time, it tends to get a bit slower with every iteration. The best performance is somewhere between 500 and 1000 documents at a time, depending on the number of activities in the workflow. But schedules allow repeating, so you can loop over the entire document set in chunks of 500 at a time.
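The chunk-and-repeat pattern described above is generic, so it can be sketched outside Laserfiche entirely. A minimal Python sketch (all names are hypothetical stand-ins, not Laserfiche SDK calls): each scheduled run processes at most 500 documents that still match the search and exits, and the repeating schedule picks up the next chunk.

```python
# Sketch of the chunk-and-repeat pattern: each scheduled run takes at
# most CHUNK documents that still match the search, processes them,
# and exits; the repeating schedule handles the next chunk.
# All names below are illustrative, not Laserfiche SDK calls.
CHUNK = 500

def run_once(find_matches, migrate):
    """Process one chunk; return how many documents were handled."""
    batch = find_matches(limit=CHUNK)  # e.g. "first 500 results"
    for doc in batch:
        migrate(doc)
    return len(batch)

# Simulated repository: 1,240 documents still in the old volume.
pending = list(range(1240))
migrated = []

def find_matches(limit):
    return pending[:limit]

def migrate(doc):
    pending.remove(doc)
    migrated.append(doc)

runs = 0
while run_once(find_matches, migrate):
    runs += 1

print(runs, len(migrated))  # → 3 1240
```

Because migrated documents no longer match the search, each run naturally sees only the remaining backlog (500 + 500 + 240 here), and the loop terminates when a run finds nothing.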

As far as speed goes, Workflow can execute multiple workflows at the same time. If you search, then run For Each Entry and migrate the document, each iteration has to wait for the document to finish migrating. So that's 3-4 seconds for each loop.

Now, if instead of migrating the document in the For Each loop, you only call a separate workflow to migrate it, the loop only has to invoke the other workflow, then it can move on to the next document. You are adding a bit of overhead with starting another workflow (usually milliseconds and a bit of data in the database), but at the same time you are now migrating multiple documents concurrently, so you'll save time overall.
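The effect of handing each document off instead of migrating inline can be illustrated with a small Python sketch (hypothetical names; plain threads stand in for separately invoked Workflow instances): the dispatcher only submits work, it never waits on an individual migration before moving to the next document.

```python
# Why handing each document off to a separate worker saves wall-clock
# time: the dispatching loop only submits work, it does not wait for
# each migration to finish. Names are illustrative only; threads stand
# in for separately invoked workflows.
import time
from concurrent.futures import ThreadPoolExecutor, wait

def migrate(doc):
    time.sleep(0.05)  # stand-in for a 3-4 s volume migration
    return doc

docs = list(range(8))

# Option A: For Each Entry migrates inline -- each iteration waits.
t0 = time.perf_counter()
for d in docs:
    migrate(d)
sequential = time.perf_counter() - t0

# Option B: the loop only invokes a worker (like Invoke Workflow);
# up to 4 migrations run concurrently.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(migrate, d) for d in docs]
    wait(futures)
concurrent = time.perf_counter() - t0

print(f"sequential {sequential:.2f}s, concurrent {concurrent:.2f}s")
```

With 8 simulated documents and 4 workers, the concurrent version finishes in roughly a quarter of the sequential time, which is the same shape of saving that Invoke Workflow buys.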

You do have a point about the possible performance impact of multiple documents migrating at the same time, since they need to be copied from one volume to another. But Workflow is limited to 4 actively running instances (of the same type) per CPU, so on, say, a 16-CPU WF server, you'd have at most 64 documents being migrated at the exact same time. Unless your SQL or disk are really underpowered, that shouldn't make a noticeable impact on performance. And even if the migrations do slow down, unless they go from 3 sec per document to 3 sec × (4 × number of CPUs) for the group of concurrently migrated documents, you're still coming out ahead overall.
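The numbers in that paragraph are simple back-of-the-envelope arithmetic; a sketch for the hypothetical 16-CPU server:

```python
# Back-of-the-envelope numbers from the answer: Workflow caps
# concurrently running instances of the same type at 4 per CPU.
per_cpu_cap = 4
cpus = 16  # hypothetical server size from the example
max_concurrent = per_cpu_cap * cpus
print(max_concurrent)  # → 64

# Break-even: concurrency only loses if per-document time inflates by
# more than the concurrency factor itself (3 s -> 3 s * 64 = 192 s
# for the whole concurrent group).
base_seconds = 3
worst_case_break_even = base_seconds * max_concurrent
print(worst_case_break_even)  # → 192
```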

 

Sorry for the novel, let me know if something doesn't make sense.


Replies

replied on December 27, 2018

Searches that return fewer documents will come back faster. Is this a regular thing, or a one-time load? It might be faster to do this in an SDK script.

replied on December 27, 2018

At the very basic level, a search is likely to be slower than directly retrieving the children of a given path. But you're still going to iterate over potentially 5000 subfolders, so you're not really saving much time there. And Find Entries doesn't let you exclude documents by volume, so you'd potentially be processing more documents that way.
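The trade-off in that paragraph can be made concrete with a toy cost model (all numbers invented purely for illustration): because any department folder may contain entries to migrate, Option 2 still has to run a scoped search per folder, so its total cost is at least the sum of those searches plus the Find Entries step.

```python
# Toy cost model of the two options; every number is invented for
# illustration only. Option 2 must still visit every department
# folder, so its total cost is the sum of the scoped searches plus
# the Find Entries step.
DEPARTMENTS = 10
SCOPED_SEARCH_COST = 100   # cost units per departmental search
FIND_ENTRIES_COST = 5      # listing the department folders
GLOBAL_SEARCH_COST = 950   # one search over the whole Archive tree

option1 = GLOBAL_SEARCH_COST
option2 = FIND_ENTRIES_COST + DEPARTMENTS * SCOPED_SEARCH_COST

print(option1, option2)  # → 950 1005
```

The first scoped search returning quickly says nothing about the total: the per-folder approach pays its full cost regardless of which folder the matches land in.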

If this is a one-time migration, I'm with Devin: it's not really worth the overhead of a workflow. Use a script or even the Client (search, select all, migrate, and let it run over the weekend).

If it's a regular thing, I'd go with search, narrow it down to 500 documents or so at a time and use Invoke Workflow to do the migration. That way, you'd be taking advantage of Workflow's parallel processing capabilities and you don't have to wait for a document to finish migrating before the next one starts. Then put the workflow on a repeating schedule.

replied on December 28, 2018

Thank you for your feedback. This is not exactly a one-time migration. Entries are not created in or moved to that Archive folder regularly, but new entries will end up there. For this reason, the idea is to have a scheduled workflow that checks daily whether any entries need to be migrated.

 

Miruna, by "I'd go with search," do you mean that you would search for all entries in that folder (and subfolders) that are not in the desired volume?

 

Wouldn't invoking another workflow increase the time it takes to migrate each entry? (E.g., it now takes 3 to 4 seconds to migrate one; with several instances migrating entries at once, won't that time be more like 10 seconds or more?)

 

Thank you again for all your help!

replied on December 28, 2018

Thank you Miruna! You helped a lot and answered my questions!

 

One last thing. To get chunks of just 500 documents at a time, I should set that in the "Results to Return" section of the Search Repository activity, right? (First 500 results)

 

replied on December 28, 2018

Correct.
