Hi Chris,
I've recently been thinking about approaches to performance issues. The main trouble is that there are different types of "slow" which always manifest as the same "slow" to a user, whether the root of it is at the front-end, the back-end, the network in between...or some combination of the three.
I guess if I had to summarize the approach, it boils down to starting from the symptoms and narrowing, in ever-smaller scopes, where the next round of investigation should take place. Start by identifying what "slow" means and when it manifests. Continue by determining whether the slowness comes from rendering the display, from the server, or from the network. Finally, by profiling resources with appropriate tools, we can begin to make concrete decisions such as whether the form needs to be tweaked, services need more resources, routing needs a closer look, etc.
Here are some more concrete thoughts on how to approach this, not really arranged in a strict order.
- Is your test form representative of the forms running in production? If so, it may actually help to also test a completely "empty" form --- no theme, no custom JS/CSS/HTML, in fact no fields at all. If that also takes 5-10 seconds to load, then we know the slowness has nothing to do with any customizations.
- Additionally, is the long loading time dependent on time of day or user activity? You mentioned Application Initialization --- that helps with the "cold start" problem immediately after an application pool starts or recycles, so if slow loading is still manifesting after several people have been using Forms (and not just for submissions), then it is unlikely to help. Conversely, if the problem gets worse when there is more activity, you may want to check resources upstream --- memory and CPU on the Forms web server, or even further upstream at SQL Server, including whether there are any alarming SQL load metrics. (A rough sketch of a load-time probe that can log this over the course of a day is at the end of this message.)
- You could also help rule out network latency by testing on the web server itself, using a local address. The local-address part is important: if the form is publicly accessible and you use a public URL, the request still resolves the public name and may take the public path --- DNS lookup, any public firewalls, DMZ routing, reverse proxies, load balancing, etc. (There is a second sketch at the end of this message for comparing the two side by side.)
- In the browser Developer Tools pane (F12 or Ctrl+Shift+I in most browsers) there is typically a tab called "Network". Clear it out and refresh a form page to reload all the content. You will then see a mess of requests that the browser is making, and in particular how long each request takes. Importantly, this measures the request to and from the web server, so it includes server-processing time and any network latency. If a particular request seems outlandishly long, that gives information on where to home in. If even the shortest requests are fairly long, that hints at something network- or server-side. In general, static JS and CSS resources should be fairly speedy. The "xhr" (XMLHttpRequest) request duration typically scales with the size of forms (number of fields), so it's hard a priori to say what to look for there. (The Network tab's contents can also be exported and sorted by duration; see the third sketch at the end of this message.)
- A tool someone recently clued me into is the Performance tab of the Chrome developer tools pane. When recording a profile (with screenshots enabled), it captures a screenshot every time the UI changes. (It also breaks down browser-side script execution, but since everything is minified, that's less meaningful in this context.) You can then see more clearly the order in which page elements load; blank frames at the start correspond to time spent waiting on the initial page request, before anything has rendered. Since I haven't used this extensively myself, I don't have much better commentary or recommendations, but perhaps someone else could weigh in.
- For profiling server-side resources, Task Manager is a good tool for seeing where memory or CPU is being used (the last sketch at the end of this message is a quick way to capture a snapshot you can compare later). If multiple services are hosted on the same machine and one appears to be a resource hog, it may well be worth splitting that one off to its own box. If there is no clear hog but one service needs better performance, you could lower the priority of other processes, if they can take the hit. If all of the main services on a machine need performance improvements and there is no clear hog, really the only answers are splitting services out to other boxes or beefing up the monolith machine. (Which one is better winds up being an economics decision.)
- Side remark on splitting up services. I feel that as time goes on and systems tend toward distributed architectures --- which have plenty of advantages --- there begins to be a sort of a priori stigma against monolithic ones. Whether having everything together poses an actual performance issue depends on activity and resource availability, so it really shouldn't be something an outsider can judge off the bat. However, from a diagnostics perspective, it definitely helps to split apart heavily used services just for the sake of "assigning blame" --- the major services will then never be in direct competition with each other for machine resources, and there can be greater flexibility in provisioning resources per service, especially if the machines are virtualized.
The circumstance where this becomes less helpful is when "everything" is "generally slow" "all the time", and things like network traces are inconclusive. I don't have a good answer (yet?) for what to do in that circumstance aside from relying on experience with the actual system to make educated assumptions that can then be tested.
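To make a few of the suggestions above a bit more concrete, here are some rough sketches in Python. Treat them as illustrations only: the URLs, file names, and process names in them are placeholders rather than anything from your actual environment, and a couple of them lean on third-party packages (requests, psutil).

First, the time-of-day question. A minimal probe like the one below requests a form page every few minutes and appends the elapsed time to a CSV, which makes it easy to see whether slowness correlates with time of day or activity. It only times the server's response for the initial page, not the full browser-side render, so it's a trend indicator rather than a measure of what a user actually experiences.

    # probe_load_time.py --- log how long a form page takes to respond, over time.
    # Requires the third-party "requests" package (pip install requests).
    # FORM_URL is a placeholder; substitute the form you are actually testing.
    import csv
    import time
    from datetime import datetime

    import requests

    FORM_URL = "https://forms.example.local/Forms/form/123"  # placeholder
    INTERVAL_SECONDS = 300  # probe every 5 minutes

    with open("form_load_times.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            start = time.perf_counter()
            try:
                resp = requests.get(FORM_URL, timeout=60)
                elapsed = time.perf_counter() - start
                writer.writerow([datetime.now().isoformat(), resp.status_code, f"{elapsed:.2f}"])
            except requests.RequestException as exc:
                writer.writerow([datetime.now().isoformat(), "error", str(exc)])
            f.flush()
            time.sleep(INTERVAL_SECONDS)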
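Second, the local-versus-public test. Run something like the sketch below directly on the web server, once against a localhost address and once against the public URL (both placeholders here), and compare. If localhost is fast but the public URL is slow even from the same box, the finger points at DNS and the edge infrastructure rather than the application itself.

    # compare_local_vs_public.py --- time the same form via a local and a public address.
    # Run directly on the web server. Both URLs are placeholders.
    import time

    import requests

    URLS = {
        "local":  "http://localhost/Forms/form/123",           # placeholder
        "public": "https://forms.example.com/Forms/form/123",  # placeholder
    }

    for label, url in URLS.items():
        samples = []
        for _ in range(5):  # a few samples to smooth out noise
            start = time.perf_counter()
            requests.get(url, timeout=60)
            samples.append(time.perf_counter() - start)
        print(f"{label:6s} min={min(samples):.2f}s  avg={sum(samples) / len(samples):.2f}s")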
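Third, the Network tab. Rather than eyeballing the waterfall, most browsers let you right-click in the Network tab and save everything as a HAR file; a few lines of Python can then sort the requests by duration. Nothing here is specific to Forms --- it just reads the standard HAR structure.

    # slowest_requests.py --- list the slowest requests from a HAR export of the Network tab.
    # Usage: python slowest_requests.py capture.har
    import json
    import sys

    with open(sys.argv[1], encoding="utf-8") as f:
        har = json.load(f)

    entries = har["log"]["entries"]
    entries.sort(key=lambda e: e["time"], reverse=True)  # "time" is the total duration in ms

    for e in entries[:15]:
        print(f'{e["time"]:8.0f} ms  {e["response"]["status"]}  {e["request"]["url"][:100]}')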
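Finally, server-side resources. Task Manager or Resource Monitor is usually all you need, but if you want a point-in-time snapshot that you can save and compare across days, something like the following (using the third-party psutil package) lists the heaviest processes by memory. Which process names actually matter (w3wp, sqlservr, and so on) depends on what is hosted on the box.

    # top_processes.py --- snapshot of the heaviest processes by memory use.
    # Requires the third-party "psutil" package (pip install psutil).
    import psutil

    procs = []
    for p in psutil.process_iter(["name", "memory_info"]):
        mem = p.info["memory_info"]
        if mem is None:  # access was denied for this process
            continue
        procs.append((mem.rss, p.info["name"] or "?"))

    for rss, name in sorted(procs, reverse=True)[:15]:
        print(f"{rss / (1024 ** 2):10.1f} MB  {name}")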
Hope this helps as a starting point for thinking about performance troubleshooting, and I hope that others with additional experience can weigh in as well for some of the "grey areas" I've left.