The following blog post is part of a series on Sitecore optimization from the forthcoming NavigationArts whitepaper, “Advanced Sitecore Performance Optimization”. View more posts in this series at http://blog.navigationarts.com/tag/sitecore-optimization/.
Load testing is a necessary but often overlooked tool in the optimizer’s arsenal, frequently relegated to a step after optimization has taken place. Indeed, at first blush, load testing might seem less an analysis tool than pure validation. Ideally, if you have optimized well, then your Sitecore site will perform well under load, and vice versa. If you haven’t, then a load test might tell you which of your components is the bottleneck – hardware (CPU, memory), networking (bandwidth, location), code, etc. – but not necessarily why. Depending on the circumstances, however, a load test can provide valuable insight into why a site is performing poorly.
The presumption that a site’s performance under load scales in proportion to its performance under minimal usage is a general but deceptive guideline. A site that performs well with minimal load can generally be expected to perform better at high load than a site that is already slow with one user – that much is obvious. But reality is complex, and when studying the load curve of a site it is common to find many permutations: response times that bounce up and down with abandon; gentle curves with major fall-offs and huge spikes; and inflection points where a bottleneck jams or clears. The interplay of test parameters, code complexity, and resource availability can easily produce a load profile that does not fit a developer’s preconceived notions of how a site should scale.
Before starting a load test process, it is important to differentiate between stressing a system to the point of failure and analytic performance testing. A test that stresses to failure is meant to address what are hopefully extreme circumstances and determine what the bottleneck is under those circumstances. Nearing the point of failure, one or more resources might start to peg at 100% – Sitecore server CPU, SQL Server CPU, physical memory, etc. – and this tends to have a spiraling effect. Simple web server requests under those conditions can take as long as complex ones, as the system is unable to process anything in a reasonable amount of time. This is very different from tests that address the performance of the site under significant but non-critical load. Given that spiraling behavior, it is wrong to draw general performance conclusions from a stress to failure. Use such a test to determine whether additional RAM or an increase in CPU will offer the most benefit, not as a realistic assessment of performance under stable load.
In choosing a load testing tool, it is important to be able to control a wide range of parameters while still getting useful, readable data. One tool that does this well is WAPT, by SoftLogica. WAPT is a Windows-based application that delivers extremely detailed reports over a large number of configurable options. It also has a comprehensive integrated browser action recorder, which allows a developer to spend time tweaking configuration rather than creating entire test scripts from scratch. While certain versions of the product can distribute load across multiple servers, it can be difficult to properly simulate traffic over many nodes across the internet – in cases where that is paramount, a web-based tool such as Load Impact may be better suited to accurately simulating network traffic. WAPT and Load Impact are paid products; developers seeking a free and open-source alternative might be interested in JMeter, a robust if sometimes inscrutable Java application that is part of the Apache project.
Let’s take a look at a specific scenario: a website where the bulk of the traffic came from search engines, landing on any number of pages built from one specific template – a report abstract. In initial concept, this abstract might have appeared to be a simple HTML representation with a link to a PDF version, but the end result was a complex beast: a page that ran multiple queries with dynamically changing parameters and whose content varied by user and login state.
During development, it became clear that this page would be “heavy,” running a number of queries with dynamically changing parameters to produce its content. That made the page a good candidate for Sitecore caching, both to relieve the burden on the server and to reduce load times.
To verify this, initial heuristic tests were run on a development instance using a strong-arm approach. The “page” logic for the abstract had been built as several sublayouts placed in a content placeholder inside the overall layout. Using Firebug to display the delivery times of the Sitecore aspx page hits, several uncached report abstract pages were loaded. Their load times were recorded and averaged in a spreadsheet, with the first hit excluded as an outlier. Then the exact same tests were run with all the sublayouts in the content placeholder cached, using only a VaryByData parameter to indicate that the result needed to differ for each abstract page loaded [see “Caching via Sitecore’s Caching Parameters” for more details on how this functions]. This was an unrealistic scenario, since the page content changed based on user and login information, but it established a baseline for how fast the page could be if those factors were not in play.
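For reference, sublayout caching of this kind can be set either in the sublayout item’s Caching section in the Sitecore client or directly in layout markup. The snippet below is a minimal sketch of the markup form – the control ID and path are hypothetical, not taken from the project described here:

```aspx
<%-- Hypothetical report abstract sublayout, cached per data source item. --%>
<%-- Cacheable turns output caching on; VaryByData stores a separate copy --%>
<%-- of the rendered HTML for each abstract item bound to the sublayout.  --%>
<sc:Sublayout ID="ReportAbstract" runat="server"
              Path="/layouts/Sublayouts/ReportAbstract.ascx"
              Cacheable="true"
              VaryByData="true" />
```

With only VaryByData set, every visitor receives the same cached HTML for a given abstract – which is exactly why this configuration was a baseline rather than a shippable setup for a page with user-specific content.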
The results were surprising. On average, the page took roughly 250 to 350 milliseconds to load in the uncached version. The cached version showed virtually no improvement, with the majority of hits falling into the same range as the uncached page.
A reasonable assessment of these results might lead a programmer to wrongly conclude that caching will not provide a significant performance benefit. Only by replicating real-world traffic does the overall picture become clear.
Below is a performance diagram for a single report abstract with caching enabled, run through a load test in WAPT. The parameters were set to gradually ramp up the number of virtual users hitting this page, starting at 1 and ending at 125 over a roughly four-minute period. On the left is the response time in seconds, which corresponds to the dotted and squared response-time lines. On the right is the number of users, which corresponds to the rectangular user line.
The graph shows most hits still riding between the 250 and 350 millisecond marks, with some outlying higher values that still generally take less than a second to complete. This cached version of the page scales remarkably well as the number of simultaneous users increases. It is important to take that indicator with a grain of salt – this is a specific test that is not strictly representative of real-world performance, just a general trend line.
Now view a graph for a test run with identical parameters, except that in this instance the abstract page being hit does not have its logic cached:
It is immediately apparent that the entire scale of the left side, the response time, is blown out. A scenario that capped at 1 second in the first run now extends up to 16 seconds! The legend of the graph does not have sufficient resolution to tell the whole story, but additional report data from WAPT makes it clear. With the first few users hitting the page, the response time bounced around the comfortable 250 to 350 millisecond mark. Then, as more users piled on, response times started to average between one and two seconds, climbing to greater than 5 seconds as the 100-user mark was reached. The trend is unmistakable: a caching change that may seem irrelevant on a single page hit is actually a tremendous performance saver under load.
In this instance, the final outcome of the testing drove development to split up the controls so that caching could cover the bulk of the “heavy” parts of the page, while the truly dynamic pieces were separated into sublayouts with multiple caching parameters. Proper use of load testing tools demonstrated the need for this in a way that other methods of performance analysis could not have.
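The resulting split can be sketched in markup. This is an illustrative reconstruction under stated assumptions, not the project’s actual code – the sublayout names and paths are invented, and the exact VaryBy combination for the dynamic piece is one plausible choice given that the content varied by user and login:

```aspx
<%-- Heavy portion: expensive queries, safe to cache per abstract item. --%>
<sc:Sublayout ID="AbstractBody" runat="server"
              Path="/layouts/Sublayouts/AbstractBody.ascx"
              Cacheable="true" VaryByData="true" />

<%-- Dynamic portion: content differs per user and login state, so the --%>
<%-- cache varies on those axes too, at the cost of more cache entries. --%>
<sc:Sublayout ID="AbstractUserActions" runat="server"
              Path="/layouts/Sublayouts/AbstractUserActions.ascx"
              Cacheable="true" VaryByData="true"
              VaryByUser="true" VaryByLogin="true" />
```

The trade-off is deliberate: the heavy sublayout is served from cache for every visitor, while the small user-specific sublayout pays for its extra VaryBy parameters with a larger cache footprint but a far smaller rendering cost.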