
Question

Workflow overhead options

asked on February 2, 2018

So we do a good amount of conversions from other systems into Laserfiche, and we use Workflow to do it.  What we find consistently is that the speed of the workflow activities, and the overall processing time, really slows down after a while.  I know the time depends on what you are doing (searching the repository, any SQL queries, etc.).

For example, a recent one does a query against a SQL table with over 100,000 rows and 200+ columns, then a For Each Row, followed by a Find Entry and the rest of the conversion workflow.  We even wrote the workflow to only go through 5,000 rows, then invoke itself and end the current workflow; in the invoked instance, the next SQL query uses OFFSET so it does not return the rows we already went through.  We did this because after 5,000 or so rows, the workflow is so bogged down that document processing is drastically slower.
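For reference, the batching described above boils down to a paginated read. Here is a minimal sketch in C#/ADO.NET, assuming SQL Server's OFFSET/FETCH syntax; the table name, key column, connection string, and batch size are illustrative placeholders, not the actual schema:

    using System.Data.SqlClient;

    class PagedReader
    {
        const int BatchSize = 5000; // rows handled per workflow invocation

        static void ProcessBatch(string connectionString, int offset)
        {
            // OFFSET/FETCH requires a deterministic ORDER BY (SQL Server 2012+).
            const string sql = @"SELECT *
                                 FROM dbo.SourceTable   -- placeholder table
                                 ORDER BY SourceKey     -- placeholder key column
                                 OFFSET @Offset ROWS
                                 FETCH NEXT @Batch ROWS ONLY;";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@Offset", offset);
                cmd.Parameters.AddWithValue("@Batch", BatchSize);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Find the matching entry and apply the conversion here.
                    }
                }
            }
            // The re-invoked workflow runs the same query with offset + BatchSize,
            // so rows already processed are never returned again.
        }
    }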

I assume this is because of all the tokens (not the Track Tokens activity, which I am not using).  In the Designer, all of the tabs (Details, Entries, Messages, Condition, Parameters) end up holding huge amounts of data after so many rows.

So, is there a way to run a workflow without all of this overhead?  I totally get all the messages and tokens; they are great for troubleshooting/testing/building, and I use them all the time.  But after all the testing, once we are ready to process hundreds of thousands of entries, is there any way to turn that stuff off for that workflow?

Thanks,

Chris


Answer

SELECTED ANSWER
replied on February 2, 2018

This is because Workflow keeps track of the iterations and all their tokens, so the more iterations you have, the bigger the instance gets. I would recommend breaking the activities inside For Each out into their own workflow and invoking that instead. That way the work runs in parallel instead of waiting for each iteration to finish. I would also limit the query to 500 rows instead of 5,000.

We do have plans for "system" workflows that will log minimal information, but I don't have a release date for it at this point.

replied on February 2, 2018

Thanks Miruna, those 'system' workflows will certainly be welcome.

replied on August 1, 2018

@████████

My VAR has confirmed the same performance slowdown that I am experiencing running workflows. My workflow runs a search and adds a field to the search results. After a few thousand documents, the time to add the field goes from 0-100 ms to 500 ms, and after about 10,000 docs it keeps climbing to 5 seconds just to add a field!

The whole server runs slower, especially retrieving/opening documents. 

If I reduce the search results from 2000 to 500, the performance degrades at a slower rate, but still gets worse.

I have tried stopping the workflow and running it again, but the performance doesn't return until I restart the LFS service. Restarting WF isn't enough.

If I run more than 1 of these workflows simultaneously, the performance degradation is immediate and gets worse from there. This seems like something that should be investigated.

Any tips for getting through 400,000 docs without having to babysit this thing? It stops after 32 iterations.

replied on August 1, 2018

Are you still running with an Oracle backend?

It's probably best that we investigate this through a support case so we can get the network configuration, workflow setup and whatnot documented.

Do you know if SQL query time gets longer or if the performance loss happens somewhere else down the chain?

replied on August 1, 2018

Yes, currently using an Oracle backend. Though, when I described and showed the issue to my VAR, they said they see the same thing when performing similar mass-update workflows. They see it most often when merging two credit union clients' repositories, where field updates in one of them are required. This makes me think it's not Oracle-specific (which is rare, I know).

I have monitored the disk, memory, and CPU activity of all the servers involved, and they barely register the activity, especially once it slows down to over 1 second per field update.

I don't have SQL queries in my workflow, and I have not monitored the response time of the WF SQL server (it's MSSQL, not Oracle).

Maybe you could recreate it on your end. The WF is simple: Search > add the entry ID to an independent field > if there are more than zero search results, call the WF again.

I made it a business process and it uses the folder selected as the path to perform the search, so I can limit the scope. If you run a couple at a time, you'll see the duration of the Assign Field Value task getting longer within a few minutes.


I tried it using a Find Entries activity instead of a search as well. Same issue, but Find Entries finds all docs, which can be a lot in some cases; the search can be limited in the results it returns.


Replies

replied on February 2, 2018

Chris,

Have you thought about moving away from Workflow for the conversions?  Writing a simple utility with the SDK to do conversions like that is not much more difficult than writing workflows.  As with workflows, once you have the basic utility, it takes even less time to make any site-specific modifications to it.  The primary advantage is that you can distribute the processing across multiple machines instead of running it on just one.  We have used this approach for many conversions and it seems to work well.
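For illustration, the core of such a utility can be quite small. This is a rough sketch, assuming the usual Repository Access pattern of logging in a Session and updating fields through DocumentInfo; the server, repository, entry ID, and field name below are placeholders:

    using Laserfiche.RepositoryAccess;

    class FieldUpdater
    {
        static void Main()
        {
            // Placeholder server/repository names.
            var repo = new RepositoryRegistration("LFServer", "Repository");
            var session = new Session();
            session.LogIn(repo); // Windows authentication

            try
            {
                int entryId = 12345; // illustrative entry to update
                using (DocumentInfo doc = Document.GetDocumentInfo(entryId, session))
                {
                    FieldValueCollection fields = doc.GetFieldValues();
                    fields["Converted"] = "Yes"; // placeholder field name/value
                    doc.SetFieldValues(fields);
                    doc.Save();
                }
            }
            finally
            {
                session.Close();
            }
        }
    }

Wrap that update in a loop over the source rows (giving each machine its own slice), and there is no per-iteration token history to bloat the way a Workflow instance does.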

replied on February 2, 2018

Yes, we have done it with the SDK a bit.  I don't have all that much experience coding or using the SDK, so I tend to work within the tool most of the time.
