
Question

Workflow Speed

asked on February 17, 2023

I am doing a conversion and really running into trouble with workflow speed.

I am on the second-to-last step of converting one cabinet.

The setup is that all the pages for a document sit in a folder named with their doc ID.  Each page is named docID_pagenumber, and I am merging them together.  I believe the workflow is efficient, as there are no searches and I am capping the number of loops.


So: find all the document ID folders; then, for each doc ID, find all the pages in it, create a new document, and merge the pages together.  The conditional branch stops the workflow after so many loops, because otherwise it becomes unusable quickly.
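The loop structure described above can be sketched in plain Python (this is only an illustration with ordinary dicts and lists standing in for the Find Entries and merge activities; `MAX_LOOPS` mirrors the conditional-branch cap):

```python
MAX_LOOPS = 500  # cap enforced by the conditional branch

def convert(folders):
    """folders: dict mapping doc ID -> list of page names like 'docID_pagenumber'."""
    merged = {}
    for doc_id, pages in folders.items():
        if len(merged) >= MAX_LOOPS:  # conditional branch: stop after the cap
            break
        # merge the pages into a new document, in page-number order
        merged[doc_id] = sorted(pages, key=lambda p: int(p.split("_")[1]))
    return merged
```

For example, `convert({"1001": ["1001_2", "1001_1"]})` yields one merged document with its pages in order.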

This is working fine, but there are 65,000 documents left.  I capped the loops at 500 and started again.

It's doing maybe 1-2 documents per minute?  That's something like 45 days....  That can't be how this is going to work - this is one of many, many cabinets and we would never finish.
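A quick sanity check of that estimate: at roughly 1 document per minute, 65,000 remaining documents really is about 45 days of continuous running.

```python
docs_remaining = 65_000
docs_per_minute = 1          # lower end of the observed 1-2 per minute

minutes = docs_remaining / docs_per_minute
days = minutes / (60 * 24)   # minutes in a day
print(round(days, 1))        # ≈ 45.1 days
```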

The server has 8 GB of RAM and 4 virtual cores.  Is there anything I am missing?  Are there settings that let Workflow use more cores or resources?  I understand that there are sometimes many pages per document, but this still seems incredibly slow.

Thanks,

Chris

 

Edit/Update - Alright, I split it into two workflows.  The first runs the find, then invokes a workflow for each document, passing forward the current name and path I need.  It will still take days/weeks, but that is certainly better than what we were looking at before.

Another bump up for the idea of a "lean" workflow designer or something that does not have the overhead of lots of additional items.


Replies

replied on February 17, 2023

This design is inefficient because the nested loops are causing the activity count for the instance to balloon into the hundreds of thousands, if not millions. This instance will have to keep track of those activities, so it will get progressively slower.
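A back-of-envelope count shows why the nested-loop instance balloons (the 10-page average here is an assumption purely for illustration; the real average is unknown):

```python
docs = 65_000
pages_per_doc = 10  # hypothetical average, for illustration only

# Single nested-loop design: the one instance tracks every per-page merge
# activity plus a couple of per-document setup activities.
nested_instance = docs * (pages_per_doc + 2)

# Split design: the parent instance only runs ~2 activities per document
# (yield a find result, invoke the child workflow).
parent_after_split = docs * 2

print(nested_instance)     # 780,000 activities in one instance
print(parent_after_split)  # 130,000 activities in the parent
```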

This design also makes document processing run serially, one document at a time. Since you already have the pages segregated by folder, that serialization shouldn't be needed to keep pages from ending up in the wrong document.

If you move the contents of For Each Entry 2 into their own workflow and invoke it, it will parallelize the processing of the document folders at a rate of up to 4 concurrent per CPU on the Workflow Server. This way, this instance acts as a trigger for each document processing. Given a 65000 document set, this instance still runs over 130,000 activities, but the invoked instances will run only a few hundred activities, so the overall load and run time should be a lot better.
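A rough Python analogy for that split, using a thread pool sized to mirror Workflow's "up to 4 concurrent per CPU" (the function names are placeholders, not a real Laserfiche API): the parent just dispatches one lightweight job per document folder instead of doing all the merging itself.

```python
from concurrent.futures import ThreadPoolExecutor

CPUS = 4
CONCURRENT_PER_CPU = 4  # Workflow's per-CPU concurrency, per the reply above

def merge_document(doc_id):
    # stands in for the invoked per-document workflow (a few hundred activities)
    return f"{doc_id}: merged"

def dispatch(doc_ids):
    # parent instance: acts only as a trigger, one invoke per document folder
    with ThreadPoolExecutor(max_workers=CPUS * CONCURRENT_PER_CPU) as pool:
        return list(pool.map(merge_document, doc_ids))
```

Calling `dispatch(["1001", "1002", "1003"])` runs the per-document work concurrently while still returning results in order.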

It should be noted that running more instances in parallel will increase the load on SQL and the Laserfiche Server. Though, unless you have a really large number of CPUs on the Workflow Server and really underpowered SQL, that's probably not a concern.

I'm not sure what the conditional decision is looking at, but if possible, replace Find Entries with a Search Repository activity that uses the condition as part of the search criteria. That may further narrow the initial result set you're iterating over.

replied on February 17, 2023

Chris, can you post what it shows in the activities tab when you view the workflow after it ran? That will show how long each activity is taking to run and can help narrow down any bottlenecks.

replied on February 17, 2023

It's still running, but:

I am currently going to test building the find entries and invoking the next find to see if that helps.

replied on February 17, 2023

Change the filter on that page to show all activities and then sort by the Duration column. By default it filters to show only running activities.

replied on February 17, 2023

If there is no deadline for completion, you might set it up to run, say, 100 documents overnight until the task is done.

replied on February 17, 2023

The customer is looking in two places, since their old system is read-only while the conversion runs.  At 100 per night, that's almost two years of babysitting a workflow, so that is certainly not going to work...
