
Question

Workflow iteration time increasing with large collections.

asked on November 19, 2014

We are updating an existing repository by adding folders, setting permissions on some of those folders, moving documents into those folders, and changing some templates. Because these documents already exist, we have no "Created" or "Move" event to start this workflow on a single document; instead I simply build a collection (find) and iterate over it with a For Each.

We are actually nesting For Each loops to account for the zero-to-many nature of this set, and the workflow runs at an acceptable rate on small sets. Unfortunately, we see the time required to perform the same work increase with each iteration. The numbers we are talking about aren't extreme; even over only 4,000 objects, the time per iteration climbs from about 10 seconds to approximately 5 minutes. The more actions performed per iteration (often actions inside nested loops), the faster the slowdown: a workflow that performs 100 actions per iteration degrades more quickly than one that performs only 10. The time per iteration still increases for both, but it's less dramatic in the workflow that does less work over its life. Restarting the workflow returns performance to normal; the phenomenon is scoped to the workflow instance itself, and even workflows running concurrently are unaffected.

I believe there is a direct relationship between the rate of slowdown and the number of actions performed (e.g., Move Entry). Since the work performed per iteration is essentially the same, the growing cost must be coming from somewhere else, e.g., some form of tracing.
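To make that hypothesis concrete, here is a minimal, self-contained C# sketch -- purely an illustration of the suspected pattern, not Laserfiche internals. The per-iteration work is constant, but because the instance's entire action history is touched every time state is persisted, the per-iteration cost grows linearly and the total cost quadratically:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical model of an engine that appends a tracking record for every
// action and touches the whole history when persisting instance state.
class TrackedInstance
{
    readonly List<string> _actionHistory = new List<string>();

    public long RunIteration(int iteration, int actionsPerIteration)
    {
        for (int a = 0; a < actionsPerIteration; a++)
            _actionHistory.Add($"iteration {iteration}, action {a}");

        // Stand-in for serializing instance state: proportional to
        // everything tracked so far, i.e. O(iterations * actions).
        long bytes = 0;
        foreach (var record in _actionHistory)
            bytes += record.Length;
        return bytes;
    }
}

class Demo
{
    static void Main()
    {
        var instance = new TrackedInstance();
        var sw = new Stopwatch();
        for (int i = 1; i <= 4000; i++)
        {
            sw.Restart();
            instance.RunIteration(i, 10); // constant "real" work per iteration
            sw.Stop();
            if (i % 1000 == 0)
                Console.WriteLine($"Iteration {i}: {sw.Elapsed.TotalMilliseconds:F3} ms");
        }
    }
}
```

Running something like this shows early iterations completing in microseconds and late ones taking visibly longer, which matches the curve we're seeing.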

Mostly I'm just curious whether this is expected behavior. Perhaps there is a way to turn off whatever process is adding work over the life of a workflow? Has anyone else run For Each on moderately large collections -- I don't really consider 10,000 elements large -- and noticed this as well?


Replies

replied on November 19, 2014

This is expected. A workflow instance keeps track of the previous iterations and all the data associated with them, so the more iterations there are, the more data is saved in the instance.

You can limit the number of hits returned by your searches and run the workflow on a schedule. You could also spin the activities inside the For Each loop off into their own workflow that is invoked once per hit. That minimizes the number of activities tracked in the main instance.
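A rough sketch of the batching idea (hypothetical C#, since Workflow itself is configured in the designer rather than written in code): cap each scheduled run at a fixed number of hits so no single instance accumulates a long history, and let successive runs drain the backlog.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of "limit hits + run on a schedule": each run handles
// at most BatchSize documents, so no single workflow instance accumulates a
// long action history. Processed documents drop out of the next search.
class ScheduledBatchRun
{
    const int BatchSize = 250;

    // Stand-in for the repository search; a real run would query for
    // documents that still lack the new folders/templates.
    static readonly Queue<int> Backlog = new Queue<int>(Enumerable.Range(1, 4000));

    static void Main()
    {
        int run = 0;
        while (Backlog.Count > 0)
        {
            run++;
            // Capped "search": take at most BatchSize unprocessed entries.
            var batch = new List<int>();
            while (batch.Count < BatchSize && Backlog.Count > 0)
                batch.Add(Backlog.Dequeue());

            foreach (var entryId in batch)
            {
                // Per-document work (move entry, set template, ...) goes here.
            }
            Console.WriteLine($"Run {run}: processed {batch.Count} entries");
        }
    }
}
```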

replied on November 19, 2014

We do what Miruna has suggested all the time: create a parent workflow that is extremely simple (search -> for each -> invoke workflow) and have a child workflow do all the processing for each hit. It's much faster, and I believe it takes better advantage of the multithreading capabilities to maximize the number of documents being processed in parallel.
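As a loose C# analogy of that structure (hypothetical names, not the Laserfiche SDK): the parent does nothing but search and dispatch, so its own tracked state stays tiny, while each child instance starts fresh and is discarded once its single document is done.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Loose analogy of the parent/child split. The parent only searches and
// dispatches; all heavy per-document work lives in short-lived children.
static class ParentWorkflow
{
    public static Task RunAsync(IEnumerable<int> searchHits)
    {
        // "Invoke Workflow": each hit gets its own fresh instance, so
        // tracking data never accumulates across documents.
        var children = searchHits
            .Select(entryId => Task.Run(() => ChildWorkflow.Process(entryId)))
            .ToList();
        return Task.WhenAll(children); // children may run in parallel
    }
}

static class ChildWorkflow
{
    public static void Process(int entryId)
    {
        // Create folders, set rights, move the entry, change the
        // template -- everything the original For Each body did.
    }
}

class Program
{
    static async Task Main() =>
        await ParentWorkflow.RunAsync(Enumerable.Range(1, 4000));
}
```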
