LSASS and High CPU Usage

Our high LSASS CPU utilization and slow response times were caused by a poor implementation of user impersonation.

We've been load testing one of our in-house APIs to find the cause of its slow response times. By slow, I mean 45+ seconds to complete a request.

After a lot of digging, logging, and optimizing of our code to trim everything we could, we were able to knock off about 1 second. While we were optimizing the code we had control over, we also put timers around the calls to code we don't have control over. Those timers pointed us in the direction of the problem.
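For illustration, a timer around calls to code you don't control can be as simple as the sketch below. This is Python rather than our actual code, and `timed`, `timings`, and `fetch_document` are hypothetical names; `fetch_document` just sleeps to stand in for the call into the document storage system.

```python
import time
from contextlib import contextmanager

# Collected (label, seconds) pairs; a real service would send these to its logger.
timings = []


@contextmanager
def timed(label):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((label, time.perf_counter() - start))


def fetch_document(doc_id):
    # Hypothetical stand-in for the call into the document storage system.
    time.sleep(0.05)
    return f"document {doc_id}"


with timed("document_store.fetch"):
    result = fetch_document(42)

for label, seconds in timings:
    print(f"{label}: {seconds:.3f}s")
```

Wrapping only the external call keeps its time separate from the time spent in your own code, which is what makes the slow component stand out.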

The problem was the calls out to our underlying document storage system. Under anything more than about 10 concurrent sessions, that system's performance would plummet, and any given request would take anywhere from 30 to 90+ seconds to complete.
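To show the kind of measurement involved, here is a rough sketch of a concurrency test that records per-request latency at different levels of load. It is not our actual test harness. `call_backend` is a hypothetical stand-in that uses a shared lock to force requests to queue, which only loosely imitates a backend contending on a single resource.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the slow backend: one shared lock makes
# concurrent requests wait their turn.
_backend_lock = threading.Lock()


def call_backend(request_id):
    with _backend_lock:
        time.sleep(0.01)  # simulated work while holding the shared resource
    return request_id


def timed_call(request_id):
    """Return how long a single request took, including time spent waiting."""
    start = time.perf_counter()
    call_backend(request_id)
    return time.perf_counter() - start


def run_load(concurrency):
    """Fire `concurrency` requests at once and return each request's latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(concurrency)))


for concurrency in (1, 5, 10):
    latencies = run_load(concurrency)
    mean = sum(latencies) / len(latencies)
    print(f"{concurrency:>3} concurrent: worst {max(latencies):.3f}s, mean {mean:.3f}s")
```

If the worst-case latency climbs steadily as concurrency goes up, the requests are probably queuing behind something shared, and that is a good place to start looking.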

While testing, we were monitoring that server and noticed that LSASS.exe was taking up almost 80% CPU. LSASS is the Local Security Authority Subsystem Service and is responsible, among other things, for authenticating user logons. We also knew that the document system was doing user impersonation to access network resources like its underlying storage array. This system is a bit of a black box: we can see little bits and pieces of it, but not much.

We were now confident that we were closing in on the problem: something to do with the way our document system handles user impersonation.

Thankfully, one of the developers who worked on the REST API had moved from the company that developed the software to the consulting company that we used to maintain it. When we discussed our findings with him, he confirmed them. He then showed us how to disable impersonation in their software, so we could just run the application pool as the correct user.

With the impersonation removed, we were handling 25 concurrent sessions in just under half a second. Just for kicks, we tried 100; each request completed in right around 3 seconds.

So, long story short: our high LSASS CPU utilization and slow response times were caused by a poor implementation of user impersonation.