
Question

Stop schedule for workflow?

asked on March 29, 2022

I'm working on a pretty big migration that will take many days to complete. My servers undergo daily backups, so I need to be able to pause the running workflow during these maintenance windows. I noticed that when I run my test imports, Workflow had 9,000 or more instances showing as "Active"/running. My average completion time for a simple workflow instance is 14 minutes. The workflow queries a database to get metadata, then applies that data and files away the doc. The query is the bottleneck, and I'm not sure there is much I can do.

My question is: how can I set up Workflow to stop processing during a certain time of day and then pick back up? I'm using Import Agent to bring in the docs, and I see its option to only import during scheduled hours; can I do that with Workflow as well?

Thanks.

Replies

replied on March 29, 2022

Couple of ideas:

1. You can pre-retrieve the database info and generate an XML file. The XML file points at the actual content to ingest but already has the template, metadata, and Laserfiche folder included, ultimately eliminating Workflow altogether. Set up Import Agent to only read XML files in that folder. I have a .NET C# utility that builds the XML file, if you're interested. (See the sketch after this list.)

2. If the XML file isn't an option, you can change the workflow so it doesn't start on the document-created event but instead runs as a business process. Import Agent gets the content into the repository, and then you run the business process manually on a group of documents of your choosing.

3. 14 minutes is an awfully long time. Of that time, how long does the query itself take? You may want to take that query and analyze it. If you're using SQL Server, the query analyzer will show you how to optimize it. You may need to create an index, because it sounds like you're doing full-table scans over and over.

4. I always use the Laserfiche API for migrations. This gives me full control to put things where I want them, and it also lets me write a reconciliation routine to ensure everything got in, without having to comb the server event logs for failures.
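To make the option 1 idea concrete, here's a rough PowerShell sketch of the pre-retrieval approach: run the metadata query once, up front, and write the results into a list file for Import Agent. The element and attribute names below (Document, Field, etc.), the MigrationList view, and the server/path names are made-up placeholders for illustration; Import Agent's actual XML schema is in its documentation.

```powershell
# Sketch only: run the metadata query once, then emit one XML entry per document.
# Element/attribute names are hypothetical; match them to Import Agent's real schema.
Import-Module SqlServer   # provides Invoke-Sqlcmd

$rows = Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'LegacyDocs' `
    -Query 'SELECT FilePath, DocName, CaseNumber, DocDate FROM dbo.MigrationList'

$entries = foreach ($r in $rows) {
    # Escape XML-significant characters in values pulled from the database
    $name = [System.Security.SecurityElement]::Escape($r.DocName)
    @"
  <Document path="$($r.FilePath)" folder="\Migration\$($r.CaseNumber)" template="Case File">
    <Field name="Case Number">$($r.CaseNumber)</Field>
    <Field name="Document Date">$($r.DocDate)</Field>
    <Field name="Document Name">$name</Field>
  </Document>
"@
}

# Drop the finished list file into the folder Import Agent watches for XML
Set-Content -Path '\\ia-server\watch\batch001.xml' `
    -Value ("<Documents>`n" + ($entries -join "`n") + "`n</Documents>")
```

The win is that SQL does one set-based pass instead of 20,000 separate lookups, and Import Agent's built-in scheduling option then solves the maintenance-window problem with no Workflow involvement at all.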

replied on March 29, 2022

Hello Rich, I'm interested in this XML idea. Honestly, I have researched this migration process so much, and I keep getting different ideas. For example, option 2 was an idea of mine, but a Laserfiche person told me that the most efficient way to run the migration was what I'm doing now: import them all and run a workflow on document creation. Something about the multithreading WF can do was supposed to maximize speed.

14 minutes is a long time. I have actually brought that down from 25 minutes by moving the DB I was querying onto the LF server's SQL Server. The query is what takes the most time, but if I run the workflow on one document myself, it's very fast. I think it has something to do with the fact that over 9,000 workflows are "running": the start happens, but the query is backed up. I'm not sure how many concurrent queries can happen at one time, but it's sure not a lot. My guess is the query doesn't actually take that long; most of that time is the gap between the start of the WF and when the query actually gets run.

A backup of the DB I need to query was 40 GB, so it's large. I'm expecting this migration to take just under a month, which is why I'm looking for the most efficient approach and also want to make sure it doesn't crash during the daily maintenance window.
replied on March 29, 2022

If you have reason to believe the bottleneck is Workflow rather than SQL, you could try going to Advanced Server Options: Activity Performance and clearing "Database Activities" from both the "Run as tasks" and "Run in external process" lists. This will cause Workflow to use more compute resources, and it may let your query-bottlenecked instances run faster and with higher throughput.

If it's an option, I'd also try temporarily giving the Workflow server VM more vCPUs/cores. Workflow can have four concurrently running instances of a given process per vCPU, so on a 4-vCPU VM that would be 16 of those 9,000 invoked workflows actually firing off their queries while the rest wait. If Workflow parallelization is the limiting factor, you can expect near-linear increases in throughput as you increase the vCPU count.

If you can't increase vCPU count but that's still the bottleneck, another approach is creating two or more identical versions of the workflow with different starting rules so they don't fire on the same document. For example, one that triggers on new documents starting with letters "A-M" and the other on everything else. Pick rules based on what makes sense for the content you're migrating.

I'll also second @████████'s third point on query efficiency. SQL can easily "hide" how computationally and disk-IO expensive a query is when only one instance is running, because it can throw all of its parallel processing resources at it and still return a result in a fast absolute time. As soon as you're running many instances of the query, especially against a large non-indexed table, the absolute times can shoot way up.
replied on March 30, 2022

Not a huge fan of changing server-based values based on one use case, because a mature organization should require this to go through a proper change management process. Then you get on management's radar and have to prove, prove, prove that the change doesn't affect anything adversely.

You know what I AM a huge fan of is @████████
replied on March 30, 2022

I got more CPUs/cores for the SQL Server that's connected to my Workflow server, and that did seem to speed things up. However, I noticed that the SQL Server I'm querying to get data for the documents was pretty maxed out during the process. So I tried putting that data on my main SQL Server, which is also the SQL Server for my repository; it has a ton more memory and 6 processors. I ran the same test with the same documents, and what was crazy is that it seemed to be going faster. The inbox that IA dumps the files into never got backed up, and the workflows were completing in an average of 7 seconds! I thought I had this figured out, until I looked at when the workflows started running and when they finished.

With an average workflow completion time of 7 seconds, the first workflow started at 1:36:26 and, 20,464 instances later, the last one finished at 2:38:25, for a total of 62 minutes.

With the underpowered but separate SQL Server, the stats were an average instance completion time of 14 minutes. The first WF started at 12:36:21 and the last one finished at 1:19:09, for a total of 43 minutes. Same total of 20,464 instances.

I'm completely confused about why 20K workflows with an average completion time of 7 seconds still took longer overall. Is it because I was querying the same SQL Server that runs my repository?
replied on March 30, 2022

Because of queuing for execution. As I was saying above, WF runs 4 queries per CPU concurrently (in order not to overwhelm SQL). The other instances wait their turn and only get a slot to run their query when one of the currently running ones finishes. On an underpowered SQL Server, the query itself also takes a bit longer, so everything queues a bit longer.
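As a rough sanity check on that (assuming arrivals kept pace with completions, so the steady-state relation known as Little's Law applies, with $L$ = average instances in the system, $\lambda$ = throughput, and $W$ = average wait-plus-run time per instance):

$$L = \lambda W$$

$$\text{14-min test:}\quad \lambda = \frac{20464}{43\ \text{min}} \approx 476/\text{min},\qquad L \approx 476 \times 14 \approx 6700$$

$$\text{7-s test:}\quad \lambda = \frac{20464}{62\ \text{min}} \approx 330/\text{min},\qquad L \approx 330 \times \tfrac{7}{60} \approx 39$$

So in the 14-minute run, roughly 6,700 instances were sitting in the system at any given moment (consistent with the 9,000+ "Active" you observed), meaning nearly all of that 14-minute average was queue wait rather than query time; in the 7-second run, only a few dozen were in flight. The per-instance average by itself says nothing about end-to-end time; total time is instance count divided by throughput.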

replied on March 30, 2022

@████████ Aww, thanks! I normally wouldn't have recommended changing the server-level settings. @████████ said he was using a dedicated Workflow server for this migration, though, so those changes wouldn't impact any other processes.

Lucas, now that you're querying a SQL Server instance that isn't maxed out, see if you can get more cores for Workflow itself to increase parallel executions and get through the queued instances faster. If that's the bottleneck, you should see near-linear increases in throughput.
replied on March 30, 2022

Haha, +1 on not changing the activities to run in-process rather than as tasks, for exactly the reason Rich stated: when interacting with other systems, you can't always rely on them being stable. And if the driver crashes, then with the new settings it would take out the entire WF Server process. That's not common with SQL ODBC drivers, but I have seen it happen. When running as a task, there's less of a chance that a query issue crashes the server.
replied on March 29, 2022

The number of concurrent queries is controlled by the number of "external tasks" specified in the advanced server options (by default, 4 per CPU). If you hit the server with 9,000 documents at roughly the same time, they'll queue up. So I'll second the requests to figure out whether the query itself takes 14 minutes or the instance sits in the queue for the better part of those 14 minutes.

You can increase the number of CPUs and/or the number of concurrent tasks per CPU, but that would transfer some of the load to SQL, which may not be desirable.

replied on March 29, 2022

I'm asking my server guys to give me more cores for the dedicated Workflow server I'm using for this migration. I gave the example of 9,000 documents because it does seem like IA is bringing in documents faster than Workflow can process them. When I turn on IA, the first workflow is fast, and as IA brings in more and more, the "Active Workflow" search just keeps getting bigger and bigger. If I only run the workflow on 100 docs, the average run time of my workflow is 1 or 2 seconds. Import Agent is just moving that fast.

I still haven't gotten an answer about a stop schedule for my workflow. I saw someone post about one, but I can't find any information about where that setting is. My server guys tell me they do backups nightly, and I need to make sure nothing is running during that window.
replied on March 29, 2022

You should not need to "make sure nothing is running" during the backup window. In eight years I have never identified that as a real requirement for any snapshot-based server backup solution. Make sure the Workflow database is set to the Full recovery model and that you're taking frequent transaction log backups for point-in-time recovery. To restore, you recover to your last WF server backup snapshot, note its exact timestamp, and then restore the WF database to that exact point in time.
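If it helps to show your server guys what that looks like, here's a minimal sketch using the SqlServer PowerShell module; the instance name, the database name WorkflowDB, and the paths are placeholders:

```powershell
Import-Module SqlServer

# One-time: switch the Workflow database to the Full recovery model
# ('SQL01' and 'WorkflowDB' are placeholder names)
Invoke-Sqlcmd -ServerInstance 'SQL01' `
    -Query 'ALTER DATABASE [WorkflowDB] SET RECOVERY FULL'

# Run on a frequent schedule (e.g., every 15 minutes): transaction log
# backups are what make point-in-time restore of the WF database possible
$stamp = Get-Date -Format 'yyyyMMdd_HHmmss'
Backup-SqlDatabase -ServerInstance 'SQL01' -Database 'WorkflowDB' `
    -BackupAction Log -BackupFile "E:\LogBackups\WorkflowDB_$stamp.trn"
```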

If they are insistent, however, the solution is to stop the Workflow service and/or shut down the VM (really, all Laserfiche services and/or VMs) on a schedule. When you bring services/VMs back up, start with LFDS, then LFS, then things that connect to LFS like Import Agent, Workflow, and Web Client.

You can schedule service stops/starts with PowerShell scripts and Task Scheduler.
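For example, something along these lines, registered as two Task Scheduler jobs at the edges of the backup window. The service names below are placeholders; run Get-Service on each box to find the real ones:

```powershell
# stop-lf.ps1 -- run just before the backup window.
# Stop the dependents first, then LFS, then LFDS (reverse of the start order).
# Service names are placeholders; confirm with Get-Service on your servers.
foreach ($svc in 'LfImportAgent', 'LfWorkflow', 'LfServer', 'LfDirectoryService') {
    Stop-Service -Name $svc -Force -ErrorAction Continue
}
```

```powershell
# start-lf.ps1 -- run just after the backup window, in the opposite order:
# LFDS first, then LFS, then the services that connect to LFS.
foreach ($svc in 'LfDirectoryService', 'LfServer', 'LfWorkflow', 'LfImportAgent') {
    Start-Service -Name $svc -ErrorAction Continue
    Start-Sleep -Seconds 30   # give each a moment before starting its dependents
}
```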

replied on March 29, 2022

Short of stopping services, there is no "pause" in the Workflow Server. But like Sam said, there's really no need to worry about leaving things running. You do want to make sure you're backing up the WF volume along with the database, though.
replied on March 30, 2022

@████████

@████████

If you change hardware, you may need to restart a service with a new .licx, correct? Not sure if Workflow is one of them.
replied on March 30, 2022

In setups where LFDS is used, WF doesn't have a license itself; it just checks whether the LF Server is licensed. So if you swap the license on LFS, WF won't really have a problem. (If LFS is down or unlicensed for a longer period of time, WF slows down its attempts to connect to the repository, so it may take a bit longer to notice when LFS comes back up. In that case, you can restart the WF Server to force it to check that LFS has been relicensed.)

You don't have to restart LFS to make it reload its license; you can run it from a command prompt with the -reloadlicense flag to force a reload. But most people find it easier to just restart it.