We’ve been receiving a number of alerts for high Percentage Interrupt Time.
One possibility is that the virtual server is running the wrong HAL (Hardware Abstraction Layer). This usually happens only when a physical machine is converted to a virtual one and the number of processors changes. Our server was built virtual from the ground up, so this didn’t apply to us.
In our case, the server with the high % Interrupt Time had only one other process using significant CPU time: the HealthService process. After digging through numerous articles and watching perfmon tick away for hours, it dawned on me that on a multi-processor system I should be looking at the interrupt time on each individual processor rather than the total. Sure enough, only Processor 0 was having an issue.
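The per-processor check can also be done from a captured counter log instead of watching perfmon. The following is a minimal sketch, assuming the counters were exported in the CSV layout that a command such as typeperf "\Processor(*)\% Interrupt Time" -sc 10 prints. The host name VMHOST, the sample values, and the function names are invented for illustration and are not taken from our server.

```python
import csv
import io
import re

# Invented sample in typeperf's CSV layout; host name and values are made up.
SAMPLE = r'''"(PDH-CSV 4.0)","\\VMHOST\Processor(0)\% Interrupt Time","\\VMHOST\Processor(1)\% Interrupt Time","\\VMHOST\Processor(_Total)\% Interrupt Time"
"01/01/2024 10:00:00.000","41.5","0.5","21.0"
"01/01/2024 10:00:01.000","38.5","1.5","20.0"
'''


def average_interrupt_time(csv_text):
    """Return {instance: mean % Interrupt Time} for each Processor(...) column."""
    # Drop blank lines, which typeperf prints before and after the data.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, samples = rows[0], rows[1:]
    averages = {}
    for col, name in enumerate(header):
        match = re.search(r"Processor\(([^)]+)\)\\% Interrupt Time", name)
        if not match:
            continue  # e.g. the timestamp column
        # Skip short status lines and empty cells.
        values = [float(r[col]) for r in samples if len(r) > col and r[col].strip()]
        if values:
            averages[match.group(1)] = sum(values) / len(values)
    return averages


def busiest_processor(averages):
    """Return the instance name with the highest average, ignoring _Total."""
    per_cpu = {k: v for k, v in averages.items() if k != "_Total"}
    return max(per_cpu, key=per_cpu.get)


averages = average_interrupt_time(SAMPLE)
print(averages)
print("Busiest processor:", busiest_processor(averages))
```

With the invented sample, the _Total figure looks moderate while Processor 0 carries nearly all of the interrupt time, which is the same pattern described above.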
To resolve (or work around) the issue, I set the processor affinity for the HealthService process to Processor 1 only. That immediately cleared up the resource issues on the server and dropped our average CPU utilization to under 10%.
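Windows expresses processor affinity as a bitmask in which bit n stands for processor n, so "Processor 1 only" corresponds to a mask of 2. The sketch below only shows that arithmetic. The helper names are hypothetical, and it does not change any process or reproduce the exact steps used on our server.

```python
def affinity_mask(cpus):
    """Build a bitmask with one bit set per zero-based processor index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("processor index must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """Inverse helper: list the processor indexes whose bits are set in mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


print(affinity_mask([1]))      # Processor 1 only -> 2
print(affinity_mask([0, 1]))   # both processors -> 3
print(cpus_from_mask(2))       # -> [1]
```

The resulting value is what the ProcessorAffinity property of a process object in PowerShell expects, and Task Manager's "Set affinity" dialog sets the same mask interactively. An affinity set on a running process this way does not survive a restart of that process, so it would need to be reapplied.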
I don’t understand why Windows was not automatically spreading the load across the two processors, but the HealthService process was clearly contending for CPU time with the system interrupts on Processor 0. Your mileage may vary, but this issue isn’t all that uncommon on VMs, and there is more you can do to resolve the underlying causes if you’re willing to dig a bit deeper.