At NowOnline, we recently experienced major performance problems with two of our webservers (Windows 2008 Web Server with IIS7.5). Over the months, memory consumption and CPU usage had increased substantially as the number of hosting packages grew to 290 and 120 websites, respectively. As a result, websites running on these machines became progressively slower. Worse, the dreaded ‘Out of memory’ exception started showing up periodically as the servers tried to free and allocate RAM to new worker processes. Adding RAM to the virtual machines did remedy the problems for a while, but this was not a permanent solution.
Our hosting provider initially diagnosed the problem and attributed it to the source code and the volume of websites (and suggested adding more RAM). So I set out to figure out what was really going wrong. I first looked for memory leaks or bad code but was unable to find a clear pattern. Some of the larger sites used more memory, obviously, but I couldn’t find clear examples of the telltale sign of a memory leak: ever-increasing memory consumption by a single website. Next, we started moving websites to another webserver to test whether the problem was caused by the load of the growing pool of websites. Turning off some heavy websites did not improve performance and did not reduce memory load significantly. Any freed-up RAM was quickly used up by other websites.
A major part of the solution turned out to be in the way we used application pools. When configuring the webservers I had always adhered to the ‘best practice’ of creating a separate application pool per production site (see this article, for example). This is actually the default behavior when creating a new website in IIS. There are clear advantages to this approach:
- If a code problem causes the website to crash (e.g. an infinite loop or a memory leak), only the associated application pool crashes, so other websites are not affected;
- This approach makes it easier to tweak application pool settings (CPU and RAM allocation) for specific websites;
- It is easier to diagnose memory leaks by investigating RAM usage by a single application pool.
IIS is quite clever when it comes to managing application pools. If a website has not received any traffic for a while, IIS shuts down the associated application pool by default. This frees up resources (RAM and CPU) for other application pools. Should a new visitor arrive, IIS simply starts the application pool again and spawns worker processes. It soon dawned on me, however, that every application pool carries some resource overhead: each pool runs in its own worker process, and RAM allocated to one pool cannot be freely used by another.
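To make the overhead argument concrete, here is a rough back-of-the-envelope model in Python. The per-pool overhead and per-site footprint are made-up placeholders, not measurements from our servers, and the model assumes all pools are running at the same time. The only point is that the fixed cost grows with the number of active pools, not with the number of sites.

```python
def estimated_ram_mb(active_pools, sites, overhead_per_pool_mb, footprint_per_site_mb):
    """Fixed cost per running application pool plus a per-site cost."""
    return active_pools * overhead_per_pool_mb + sites * footprint_per_site_mb


# Hypothetical numbers, purely for illustration (NOT measured values).
OVERHEAD_PER_POOL_MB = 30
FOOTPRINT_PER_SITE_MB = 10
SITE_COUNT = 250

# One pool per site versus the same sites spread over 20 shared pools.
isolated_mb = estimated_ram_mb(250, SITE_COUNT, OVERHEAD_PER_POOL_MB, FOOTPRINT_PER_SITE_MB)
shared_mb = estimated_ram_mb(20, SITE_COUNT, OVERHEAD_PER_POOL_MB, FOOTPRINT_PER_SITE_MB)

print(f"isolated pools: {isolated_mb} MB")
print(f"shared pools:   {shared_mb} MB")
print(f"difference:     {isolated_mb - shared_mb} MB")
```

With these placeholder values, the difference comes entirely from the per-pool term; the per-site term is identical in both setups.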
To test if our configuration (one application pool per website) was indeed causing the problems, I started grouping the websites into several shared application pools based on their logical relatedness and .NET framework. For the more critical websites, I decided to stick to isolated application pools just to be sure. I ended up going from 250 pools to about 20. The results speak for themselves:
Picture 1: Memory consumption by alpha-web1 before and after the change (around 4PM)
Picture 2: Memory consumption by alpha-web2 before and after the change (between 7 and 8.30PM)
The results clearly show that RAM consumption dropped dramatically, from a steady 98% to around 45%. The virtual memory (‘Swap’ in the picture) that had to be made available to supplement physical RAM also dropped to zero, which had the nice side effect of lowering disk I/O. Not only had the servers become significantly more responsive, performance for individual websites also increased.
We initially worried about strange behavior, such as duplicate MVC route names, caching conflicts and such, but experienced none of these. IIS does isolate this kind of behavior nicely.
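For anyone wanting to plan a similar regrouping, the rule I applied (critical sites keep their own pool, the rest share a pool per group and .NET framework version) could be sketched in Python roughly like this. The site names, group names and the pool naming scheme are all hypothetical; this only plans the assignment and does not touch IIS itself.

```python
from collections import defaultdict

# Hypothetical inventory: (site name, .NET framework version, logical group, critical?)
SITES = [
    ("shop.example.com", "v4.0", "customer-a", True),
    ("blog.example.com", "v4.0", "customer-a", False),
    ("docs.example.com", "v4.0", "customer-a", False),
    ("legacy.example.org", "v2.0", "customer-a", False),
    ("intranet.example.net", "v2.0", "customer-b", False),
]


def assign_pools(sites):
    """Map each application pool name to the list of sites it will host."""
    pools = defaultdict(list)
    for name, framework, group, critical in sites:
        if critical:
            # Critical sites stay isolated in a pool of their own.
            pool_name = name
        else:
            # Other sites share a pool with related sites on the same framework.
            pool_name = f"{group}-{framework}"
        pools[pool_name].append(name)
    return dict(pools)


plan = assign_pools(SITES)
for pool_name, members in sorted(plan.items()):
    print(f"{pool_name}: {', '.join(members)}")
```

Keeping the framework version in the pool key matters because an application pool targets a single .NET framework version, so sites built against different versions cannot share one.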
The lesson that I learned from this is that it’s not a good idea to