Question

Workflow subscriber events, difference between live and trace monitoring services

asked on October 15, 2020

Is the service that monitors Live Events different from the service that monitors for start and wait conditions in Workflow? I noticed that if I see an event in the live monitor but not in the Trace Log, my workflows with conditions for that event never trigger.

Here, this event doesn't show up in the trace, along with any other events from 10/15, because whatever handles the trace log is not working while the live log is.

This confuses me because when I troubleshoot whether the subscriber can see events in the repository, the next place I go after checking the trace is the live log. Everything looks good there, so there is a disconnect between the two.

I am thinking the Live is handled by the Laserfiche Server Service queue and the Trace is handled by the Subscriber Service.

Replies

replied on October 15, 2020

The Live Event viewer shows all the notifications the Laserfiche Server emits while the window is open. The Subscriber Trace only shows the events the Subscriber evaluates.

Your Subscriber seems to have stopped on the 14th at some point. Click on the "Laserfiche" tab over in the Subscriber Trace and see if it's still connected to the repository properly.

replied on October 15, 2020

OK, so the live log is data in the Server service itself, which explains why it doesn't have to match the trace log and why it works even when the Subscriber is offline.

Just restarting the Workflow Service can resolve the issue sometimes.

Often there are no recent errors when this happens. Restarting the Workflow service seems to be the solution in situations where, at some point in the past, Workflow lost the connection to its database.

Even if the database is working fine hours or days later, the subscriber may have been told to suspend and never set back to active, but I don't see a subscriber status anywhere that displays this.

I am not even sure whether Workflow is really telling the subscriber to suspend, but I can't think of anything else, since it's not a clear configuration disconnection and it is always a restart or re-install of Workflow that fixes it.

In this case there was a 5-minute outage in communications to the SQL server at 1am, but 12 hours later there were still no events and no additional errors since 1am.

replied on October 15, 2020

Workflow Server does not tell the Subscriber to suspend.

What does the repository tab show in the Subscriber Trace?

replied on October 15, 2020

Well, everything looks good now; I took these screenshots before restarting the service to resolve the issue, but next time I will look here. What would I look for, though? I expect the last event time will be out of date, but I can already see that from the last event in the Events tab.

Maybe here I would see the Queue building up?

replied on October 15, 2020

It depends. If there's a problem connecting to the repository, it will have a warning icon at the top in the "Processing loads" graph. Clicking it will tell you what the error was.

The rest of it might tell you if the problem is actually with the Subscriber or something else. You would see things like when it last got a list of updated rules from the Workflow Server, when it got its last event from the repository, and whether there's a backlog of messages to process.
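
As a rough illustration of those checks, the sketch below flags a Subscriber as possibly stalled when its last event or rules update is old, or when its backlog is large. The `SubscriberStatus` fields, the function name, and the thresholds are all hypothetical; the values would have to be copied by hand from the repository tab, and none of this is a Laserfiche API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SubscriberStatus:
    # Hypothetical values copied by hand from the Subscriber Trace repository tab.
    last_rules_update: datetime
    last_event: datetime
    backlog: int


def stall_warnings(status, now,
                   max_event_age=timedelta(hours=1),
                   max_rules_age=timedelta(hours=1),
                   max_backlog=1000):
    """Return human-readable warnings; an empty list means nothing looks off.

    All thresholds are assumptions and should be tuned to the repository's
    normal level of activity.
    """
    warnings = []
    if now - status.last_event > max_event_age:
        warnings.append("no events received recently")
    if now - status.last_rules_update > max_rules_age:
        warnings.append("rules not refreshed recently")
    if status.backlog > max_backlog:
        warnings.append("message backlog is building up")
    return warnings
```

An empty result does not prove the Subscriber is healthy; it only means none of these three signals crossed its assumed threshold.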

replied on November 17, 2020

We have determined the cause, and there is something concerning here. IT had started taking down the SQL server for some security updates. We found that if it is down for more than 30 seconds, all changes in the repository just get dumped into a queue until the Workflow Service is restarted. At that point those changes will start being processed.

The problem is that this queue is only so large. If the SQL server was down for 30 seconds at some point in the past, and too many changes have been made since then, only some of the changes will be processed.
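
To make the effect concrete, here is a minimal sketch of a fixed-size buffer that fills while processing is stalled. It is only a model: the capacity is made up, and the assumption that the oldest changes are the ones discarded is mine; I don't know which changes the real queue drops.

```python
from collections import deque


def replay_after_restart(changes, capacity):
    """Buffer changes while processing is stalled, then return what survives."""
    # Assumption: when the buffer is full, the oldest entry is discarded.
    buffer = deque(maxlen=capacity)
    for change in changes:
        buffer.append(change)
    return list(buffer)


changes = list(range(1, 11))          # ten repository changes made during the stall
processed = replay_after_restart(changes, capacity=4)
missing = [c for c in changes if c not in processed]
print(processed)  # [7, 8, 9, 10]
print(missing)    # [1, 2, 3, 4, 5, 6]
```

If fewer changes arrive than the buffer can hold, nothing is lost, which would match the cases where a restart recovers everything.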

They have been doing this security audit thing every week at 1am, so we have seen this happen many times. Each time it starts with a message in the communication log that the connection to the SQL server timed out or that the license was not able to be established with Directory Server because Directory Server could not see the database. Then there is nothing in the trace log until the Workflow Service is restarted.

replied on November 17, 2020

Workflow doesn't talk to Directory Server.

It sounds like you're running separate SQL servers for the Laserfiche Server and Workflow? If the Laserfiche Server can still talk to its SQL server, then yes, there would be notifications of activity coming in. Though 30 seconds doesn't sound like enough downtime to build a sizable queue in most repositories. I'd be curious to know what the repository tab in the Subscriber trace looks like during one of these periods.

Workflow services should be stopped during SQL maintenance (in general, any service should be stopped when extended downtime is expected on one of its down-level dependencies).

replied on November 17, 2020

Oh no, I should clarify a few things. The 30 seconds is the timeout period before services give up on ever talking to SQL again. They never try to reconnect. So a simple reboot of the SQL server operating system can leave many Laserfiche services in a sleeping state for days until the problem is discovered.

Workflow does appear to talk to DS in a roundabout way. The message it gives is that the license is invalid, which it gets from LFS; at the same time, LFS says it can't communicate with Directory Server for licensing, and Directory Server says it can't find its database. This doesn't always happen in this order, but it is one scenario of loss of connection to the SQL server.

But either way, workflow always goes into this sleep mode when the SQL server is restarted and there is always some chain to trace it back to a SQL timeout being reached some days ago.

I have made it clear that they should shut down the Laserfiche application servers during maintenance of SQL, but maybe their maintenance software is not this advanced? Does it only require that the Workflow services be shut down? They do seem to be the only services that are permanently affected after about 10 repeats of this scenario.

replied on January 4, 2021

Just another example that came up, as another way this can happen. Workflow said it could not communicate with the Message Queuing service a few days ago.

After checking that the Message Queuing service was running, I just needed to restart Workflow. Then events started queuing up from the days during which Workflow was in sleep mode.

I compared the time of the workflow error with the Windows Update log to confirm that Windows was indeed making updates during this time. The update service restarted itself about 15 minutes before the workflow error and then the logs are filled with many update notes for the next hour or so.
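
The comparison itself is simple enough to sketch. The function below returns the log entries that fall within a time window of the error; the function name, the 30-minute window, and the sample entries are all made up for illustration and are not taken from my actual logs.

```python
from datetime import datetime, timedelta


def entries_near(error_time, log_entries, window=timedelta(minutes=30)):
    """Return (timestamp, message) pairs within `window` of error_time."""
    return [(ts, msg) for ts, msg in log_entries
            if abs(ts - error_time) <= window]


# Made-up sample data, for illustration only.
error_time = datetime(2021, 1, 1, 2, 15)
update_log = [
    (datetime(2021, 1, 1, 0, 30), "unrelated entry"),
    (datetime(2021, 1, 1, 2, 0), "update service restarted"),
    (datetime(2021, 1, 1, 2, 40), "update installed"),
]
nearby = entries_near(error_time, update_log)
```

Anything returned is only a candidate cause; a match in time does not prove the update was responsible.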

Obviously this was a no-reboot update or it would not have been a problem.

But whatever the update did to communications with the Message Queuing service put Workflow to sleep until it was restarted.

The events are still queued, to an extent. Once you wake up Workflow, everything processes, unless you exceed the buffer size, in which case you will have a gap in your events.
