
Question

Workflow Speed

asked on December 3, 2022

I am doing a conversion for a customer.  This is just one cabinet that has around 500,000 pages that need to be stitched together into documents.

Here is the base workflow, and it is working. Basically, I have a table put together with pages and their locations (I already imported the folders to Laserfiche). For each row I look for the page; if I find it, I move it to a folder, put some metadata on it, and rename it. If I don't, I write to a table so I can review.

Essentially, the workflow moves the pages to a folder named after the doc ID, and the images are renamed docID_Pagenumber. I then need to run another workflow to merge them.

Anyway, Workflow has been running for days and is only around 2,700 documents in. I think there are roughly 23,000 documents. Where is the heavy lifting being done? The initial query took a couple minutes to get rolling, but once into the loop I figured it would be faster than this.

SQL is on a different box and I don't have access to the desktop, so I cannot see resources. On the WF box we are sitting at around 25% CPU for Workflow.

I know it's a lot of data, but with a trial run of just this cabinet the time frame is not looking the best. Are there any resource updates we could do? The WF server has 4 cores; I'm not sure if more would help or if the heavy lifting is on the SQL side of things.

Thanks!


Replies

replied on December 5, 2022

The problem you're likely having is that workflow activity data builds up with each iteration of your loop, which bogs down performance and makes the workflow run progressively slower.

I ran into this with my very first Laserfiche project, in which I was routing tens of millions of documents, and I found that the more activities an instance ran, the more data it accumulated, and the slower it got over time.

For example, when I'd kick off the workflow it would take less than 10 seconds per document, but after thousands of iterations that would climb to minutes per document.

I'd be willing to bet that if you look at your instance details, you'll find that the recent activities are taking much longer than those at the start, and this is because of how much data has accumulated.

To maintain consistent performance, the best practice is to break this into chunks: change your workflow to process smaller batches, and call that workflow from a parent process until you're done.

Breaking it into batches like that ensures a fresh workflow instance for each batch, so the activity data never piles up enough to negatively impact performance.

It will depend on the type and number of activities, but as a general rule of thumb I try to keep my workflows to a maximum of 250-500 iterations per instance.

What you could do is look at your current instance, see at what point the activity durations started to degrade, and use that as your cutoff point.
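For example (a minimal sketch only — the table and column names are hypothetical, not from the poster's actual setup), the child workflow's query could pull one fixed-size batch at a time with OFFSET/FETCH, and the parent would pass the starting row forward on each invoke:

    -- A stable ORDER BY is required for OFFSET paging to return consistent batches.
    SELECT DocID, PageNumber, PagePath
    FROM dbo.PageMap
    ORDER BY DocID, PageNumber
    OFFSET 0 ROWS                  -- parent increments this per batch: 0, 500, 1000, ...
    FETCH NEXT 500 ROWS ONLY;

Each batch then runs in its own fresh instance, so no single instance ever accumulates more than a few hundred iterations' worth of activity data.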

replied on December 5, 2022

Thanks Jason. I guess I can do it a piece at a time: using OFFSET, I get only so many rows and then start back up again. However:

Does anyone know how to use a token as an offset? I'm just doing a simple test of this and getting an error.

Here is the setup:
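(The original screenshot isn't preserved; presumably the test query looked something like this, with hypothetical table and column names, and the Workflow token dropped straight into the OFFSET clause:)

    SELECT DocID, PageNumber, PagePath
    FROM dbo.PageMap
    ORDER BY DocID, PageNumber
    OFFSET %(Offset) ROWS          -- %(Offset) is the Workflow token; this is what errors
    FETCH NEXT 10000 ROWS ONLY;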

The goal would be to return the next 10,000 rows and then invoke the same workflow, passing forward the new offset. I know there is a limit on invokes, but otherwise I would have to constantly check it (I could also let it run overnight).

Any reason why I cannot use a variable in the offset?

replied on December 5, 2022

Is this a standard MS SQL Database? If so, you should be using a Direct connection rather than ODBC. Using an ODBC connection is just adding a layer of complexity because ODBC is acting as a middleman.

EDIT: To clarify, as Miruna said, ODBC shouldn't be related to the issue. However, it can create other issues. For example, we've had cases where failures with ODBC queries returned a false negative (i.e., 0 results) instead of an actual error.

replied on December 5, 2022

The type of connection shouldn't matter. Workflow uses parameterized queries to protect against SQL injection, so you cannot use parameters in the ORDER BY clause.

You would have to build the query outside the activity and supply the whole thing as the custom query via a single token if you want to modify the offset.

replied on December 5, 2022

So do you mean create a token value with the query statement and variables like below?

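(The screenshots aren't preserved; the idea is presumably something along these lines — a token, named "query" here, assigned the full SQL text with the offset token substituted into the string. Table and column names are hypothetical:)

    -- Text assigned to the %(query) token. Because the token substitution
    -- happens while the string is built, the literal offset value is baked
    -- into the query text before it is ever sent to SQL.
    SELECT DocID, PageNumber, PagePath
    FROM dbo.PageMap
    ORDER BY DocID, PageNumber
    OFFSET %(Offset) ROWS
    FETCH NEXT 10000 ROWS ONLY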

Then your custom query would just be the token "query"?

replied on December 5, 2022

Yes.
