
Question

Packet loss on LFDS server.

asked on April 14, 2022

As part of troubleshooting a separate issue I had our network engineer run a packet capture on our LFDS server just to see if there were any issues with the network. I'm no expert when it comes to dissecting a packet capture but what I saw shocked me. It appears there's a retransmission rate of 44.5% (see attachment) which is something I've never seen before.

From what I understand a 3% retransmission rate is normal and a rate of 5% to 8% will have a noticeable impact on response times. With a rate of 44.5% it's a wonder the LFDS is functioning at all.

It is happening across all VLANs, all ports and all protocols, for both local and wide area networks. LFDS (5048 & 5049), LDAP, RDP, SQL Server, Splunk: high retransmission rates for all of them.

Oddly enough other than sluggish logons (which we've seen) and long AD synchronization times, I'm not sure how this is affecting our environment overall. If anyone wants to chime in with what they think the impact might be, I would appreciate it.

Now, I am neither the administrator of the server nor of the network, and I am not sure where to begin troubleshooting this anyway. One thing I did was have them do a packet capture on our SQL server, and its traffic seems quite clean overall. I might have them spot check a few other servers, but for now it appears the issue is limited to the LFDS server.

There do not appear to be any constraints on system resources. RAM and CPU on the server never max out. I could see it possibly being a VMware problem. Maybe there's some kind of firmware upgrade or driver update that might help resolve the issue. I will also note that we are still using Windows Server 2012 R2 (upgrading to 2022 is the next project on our list), so I have to wonder if using such an old OS might be part of the problem. We are running Laserfiche 11, for whatever that's worth. If the issue is on the networking side then I have no idea what it might be.

My role in this is to convince both Systems Administration and Network Engineering that this is a genuine problem that needs to be addressed and then convince one or the other or both to dig into it. Yes, we have opened a ticket with Laserfiche Support but I just wanted to throw this out here to see if anyone else has dealt with a similar issue and how they handled it.

Thanks!

Monday PCAP.png (50.72 KB)

Replies

replied on April 14, 2022

It is happening across all VLANs, all ports and all protocols, for both local and wide area networks. LFDS (5048 & 5049), LDAP, RDP, SQL Server, Splunk: high retransmission rates for all of them.

 

This makes sense, as packet loss is usually a physical or network problem, not an application problem. The services shouldn't even be aware of the packet loss and retransmissions. In the end they still receive their responses, because the transport layer (TCP) resends any packets lost to interference.

You should be able to troubleshoot this entirely without any Laserfiche software involved, by checking the packet loss between the server endpoints using any device. A small Linux-based test device should reproduce the same issue.

replied on April 14, 2022

Turns out it was an artifact of how the packet capture was done. Because it was capturing traffic across two VLANs it was somehow capturing packets twice and Wireshark didn't know how to parse that. That's why there were so many duplicates and retransmissions. Sorry for the bother!
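
To illustrate why a doubled capture can look like heavy retransmission, here is a minimal Python sketch. It is not how Wireshark actually analyzes TCP (its real logic also considers timing and ACKs); it assumes a simplified rule where any data segment with an already-seen flow, sequence number and length counts as a retransmission. The addresses, ports and segment counts are made up for illustration and are not taken from the capture in this thread.

```python
def retransmission_rate(segments):
    """Return the fraction of segments whose (src, dst, seq, length)
    tuple was already seen earlier in the capture.

    Simplified stand-in for a retransmission heuristic; assumption only.
    """
    seen = set()
    repeats = 0
    for seg in segments:
        if seg in seen:
            repeats += 1
        else:
            seen.add(seg)
    return repeats / len(segments) if segments else 0.0


# Hypothetical flow: ten data segments, each sent exactly once.
clean = [
    ("10.0.0.5:5048", "10.0.1.9:50000", seq, 100)
    for seq in range(0, 1000, 100)
]

# Hypothetical capture that sees both VLANs: eight of the ten segments
# are recorded a second time, although nothing was ever resent.
doubled = clean + clean[:8]

print(f"clean capture:   {retransmission_rate(clean):.1%}")
print(f"doubled capture: {retransmission_rate(doubled):.1%}")
```

In this sketch the traffic itself is loss-free in both cases. Only the capture method differs, yet the second capture reports a large share of "retransmissions". Capturing on a single VLAN or interface, or removing duplicate frames before analysis, avoids the false reading.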

replied on April 14, 2022

Chad,

 

Can you connect with me at mwilliams@slocoe.org?

replied on April 14, 2022

Hi Michael,

You mentioned that SQL Server has a high retransmission rate, but later you say that you think the issue is limited to LFDS. Is there something you are seeing that leads you to believe this problem is occurring only for the LFDS application rather than for the server itself? Additionally, would you mind sharing the support case ID? I can't seem to find it.

replied on April 14, 2022

Come to think of it, I might not have actually put in a ticket. Tom Lappas is working with us. It turns out we found the problem with the PCAP: it was an artifact of how the packet capture was done. Basically, it was capturing some of the packets twice because it was capturing traffic across two VLANs, and Wireshark kept seeing those as duplicates/retransmissions.

Sorry for the bother. I was just barking up the wrong tree but I can only work with the data I'm given, right? The issue that prompted all this hasn't happened again anyway.
