Question

Best practise for Workflow: 1 big search or many little searches?

asked on September 8, 2021

Hello,

I'm setting up a workflow to go through 15,000 entries and I'm stuck on a bit of a dilemma. Do I run 1 big search followed by a For Each Entry activity over all 15,000 entries, or do I run 1,500 small searches where the For Each Entry activity runs about 10 times per search?

Curious to see what you think.

Cheers!

Answer

SELECTED ANSWER
replied on September 8, 2021

Do it in batches with smaller searches. You don't need to go as small as 10; keeping it in the 100-250 range should give the best performance, depending on how many activities run in each iteration. 15,000 is a lot of results, but the bigger concern is the For Each loop.

When a workflow runs any kind of loop like that, each iteration is logged. As the loop runs, that activity data accumulates and severely slows the workflow down.

For example, when I first started working with workflow I had a process running on millions of documents. I found that in large batch loops, it would take milliseconds per document in the beginning, but by the end it would start taking minutes per document.

Breaking it into batches keeps the speed consistent all the way through, but it is important not just to break the work into batches but also to give each batch a fresh workflow instance.

What I usually do is have a parent workflow that will invoke the "batch" workflow multiple times. So you could have your workflow pull 250 results, then have the parent workflow invoke that in a loop, waiting for each instance to finish, until all of your documents have been addressed.
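Workflow itself is built in a graphical designer, so there is no code to paste here, but the parent/batch shape described above can be sketched in Python. This is only an illustration: `chunked`, `run_parent`, and `run_batch` are hypothetical names, and `run_batch` stands in for "invoke the batch workflow and wait for it to finish".

```python
from typing import Callable, Iterator, List, Sequence


def chunked(entry_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(entry_ids), batch_size):
        yield list(entry_ids[start:start + batch_size])


def run_parent(
    entry_ids: Sequence[int],
    batch_size: int,
    run_batch: Callable[[List[int]], None],
) -> int:
    """Call run_batch once per batch, one after another.

    Each call models a fresh "batch" workflow instance that the parent
    waits on before starting the next one (an assumption of this sketch).
    """
    batches_run = 0
    for batch in chunked(entry_ids, batch_size):
        run_batch(batch)
        batches_run += 1
    return batches_run


if __name__ == "__main__":
    processed: List[int] = []
    count = run_parent(list(range(15_000)), 250, processed.extend)
    print(count, len(processed))  # prints "60 15000"
```

With 15,000 entries and a batch size of 250, the parent loops only 60 times and no single batch instance ever runs more than 250 iterations.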

Basically, the search itself isn't the biggest issue; it's the loop iterations and the activity data that accumulates when you iterate over the results and take actions on the documents/entries.

replied on September 9, 2021

Thanks Jason for the speedy response, that's really useful to know!

I've done similar workflows in the past for large numbers of documents, where I would limit the search to 10,000 and have a condition at the end of the workflow saying "if result count equals 10000" then run the same workflow again (risky I suppose, but only if you don't set it up correctly).

But if you're saying 15,000 is a lot of results for a workflow to work on, depending on the number of activities, then I will certainly bear this in mind going forward. Thanks again!

replied on September 9, 2021

I'd echo Jason's response here.

Usually batches of 1,000 work best when processing large volumes of search results. Workflow seems to have a strange quirk whereby, if you process large batches, throughput drops from about 100 docs per minute to about 4 docs per minute. I never found a root cause for this or any real explanation as to why.

Simple answer: batches of around 1,000 seem to work best, from experience.

replied on September 15, 2021

@████████, it's because the instance keeps track of all the activities and tokens it has generated, so at each loop iteration it needs to load all that history. The more iterations you have, the bigger the load becomes for each iteration.
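To put rough numbers on that, here is a toy cost model in Python. It assumes, purely for illustration, that iteration k has to load k history records, which gives a total of n(n+1)/2 for a loop of n iterations. The real per-iteration cost in Workflow is not stated in this thread, and `history_load` and `batched_history_load` are hypothetical names.

```python
def history_load(iterations: int) -> int:
    """Total records loaded if iteration k reloads k records (assumed model)."""
    return iterations * (iterations + 1) // 2


def batched_history_load(total: int, batch_size: int) -> int:
    """Same model, but every batch starts in a fresh instance with no history."""
    full_batches, remainder = divmod(total, batch_size)
    return full_batches * history_load(batch_size) + history_load(remainder)


if __name__ == "__main__":
    print(history_load(15_000))               # prints 112507500
    print(batched_history_load(15_000, 500))  # prints 3757500
```

Under this assumed model, splitting 15,000 iterations into 30 fresh instances of 500 cuts the total history loaded by a factor of roughly 30, which is consistent with the slowdown people describe on long loops.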

I'm going to add another vote to keep loops under 1000 iterations, 500 is a good middle ground.

Depending on what you do inside the For Each loop, you might gain some speed by spinning those activities off into their own workflow and invoking it. That way, instead of executing everything serially, you take advantage of Workflow's multi-threaded architecture by dispatching each entry to its own instance.
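As a loose analogy in Python, dispatching each entry to its own instance looks like handing each item to a worker pool instead of processing the list in one serial loop. `process_entry` and `dispatch` are hypothetical names, and a thread pool is only a stand-in here, not how Workflow is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_entry(entry_id: int) -> str:
    """Hypothetical stand-in for the activities inside the For Each loop."""
    return f"processed {entry_id}"


def dispatch(
    entry_ids: Iterable[int],
    worker: Callable[[int], str] = process_entry,
    max_workers: int = 8,
) -> List[str]:
    """Hand each entry to its own worker call and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, entry_ids))


if __name__ == "__main__":
    print(dispatch([101, 102, 103]))
```

The trade-off is the same as in the thread: each entry gets a small, short-lived unit of work with no accumulated history, at the cost of the overhead of starting many instances.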

