Web Performance Not a Simple Number
Net Forecasts Peter J. Sevcik
BCR Volume 33, Number 1
January 2003
We are often asked to analyze the performance of an Internet application by supplying a single answer to a single question: "How fast is it?" Whenever we explain that there is no simple single number, a common response is: "Yeah, OK, but what is the number?"

Performance measurement vendors, on the other hand, make their living by supplying a single number. The result, however, can be misleading; decisions founded on simple data, even impartial data, often turn out badly.

The problem with the "simple answer" is that by definition it is an aggregation of all the data gathered on a site. That's why you often see results that repeatedly contain the word "average." For example, "This is the average of measurements from each agent by hour, then averaged across all agents, then averaged across the week." When you see that kind of sentence, beware: The interesting or meaningful information often has been ground out.

In the real world, users experience a range of performance that is often much worse than the average would indicate. The average or mean is influenced by the preponderance of time samples around a measurement peak. A more realistic view of performance is the histogram or statistical distribution of time samples. This method often shows what is known as a "long tail" to the distribution of time samples.

Getting Detailed Measurements

We were fortunate to get the cooperation of the three leading measurement services in order to study this issue. The companies are:
Matrix NetSystems (Austin, TX) has been measuring the Internet since 1990. It utilizes thousands of beacons around the Internet that can be controlled to measure latency, packet loss and reachability.
Keynote Systems (San Mateo, CA) has an extensive set of tools for benchmarking and testing of Web sites on the Internet. It operates agents on servers in key locations on the Internet. The data supplied shows load times of full Web pages and their components.
Gomez (Waltham, MA) also operates server agents at key Internet locations. In addition, it has a desktop service based upon "normal" users permitting their PCs to be agents under the control of Gomez. The company also provides an advisory service to understand and improve non-performance aspects of the Web site and the total quality of the user experience.

NetForecast used the measurement tools of each firm to gather data on the performance of many Web pages under a variety of conditions. However, in each case we asked the service to supply us with raw measurement data so we could investigate the distribution of time samples.

It is important to note that the goal of this analysis is to show the insights that can be learned from seeing all the data rather than the mean. It is not a comparative assessment of the measurement services.

The result of the analysis is shown in Figure 1. The curves represent the distributions of page load times within the continental United States based upon the following data sources:
Keynote KB40 is the Keynote Business 40 Index of 40 business sites across a week of measurements. The measurements use Keynote's Web Site Perspective Business Edition operating in about 21 cities where the servers are directly connected by 45-Mbps connections to major backbone Internet service providers (ISPs).
Gomez PN Financials is the Gomez Performance Network (GPN) service measurements of 40 major financial sites over a week. The GPN service operates in 25 cities using agents connected to multiple backbone ISPs by 10-Mbps connections.
Matrix of KB40 is the Matrix Internet Average service used to supply round-trip time (RTT) and packet-loss measurements using five beacons testing to 109 destinations that are connected to the Internet by typical access lines and access ISPs. We used this low-level Internet data as input to our performance model set with the application profiles of all the KB40 sites. This is therefore a model-generated result of page load times at the network edge based upon Internet measurements (see BCR, October 2001, pp. 28-36).
Gomez DM of KB40 is the Gomez Desktop Monitoring (GDM) service, which measured the KB40 sites from hundreds of user desktops over two weeks. This service was set to use only broadband-connected desktops (1.5 Mbps or faster). This is the most "edge-oriented" measurement since it uses desktops that are connected via access networks or corporate networks.

The curves in Figure 1 provide very interesting insights. First, it is clear that there really are two distinct sets of curves: the Keynote and Gomez backbone-measurement pair and the Matrix and Gomez edge-measurement pair. Most surprising is the fact that although the Keynote and Gomez backbone services measured different sites (only 2 of the 80 are in common) using different measurement agents, the two curves are almost identical.

Moreover, it is clear that where within the 'Net (i.e., backbone vs. edge) you measure is much more important than how you measure. What you measure is defined by the individual page profiles, which are diverse; in this case, we averaged results across all Web sites measured by Keynote and Gomez. The detailed view by site, while important, would have resulted in Figure 1 having 160 curves. The two "edge" curves of Matrix and Gomez DM are not as close a match, but are still similar enough to be called a pair.

Second, the edge view consistently shows much slower performance. The backbone performance is fast and consistent. There is a distinct peak of performance in the 2 to 3 second range, with a very fast drop-off towards longer times. However, the edge curves show a much sloppier performance zone across the 3 to 12 second range.
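
The distribution view used for the curves in this article can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `load_time_distribution` and the sample values are hypothetical and are not data from the study. The sketch bins page-load samples into 1-second increments and reports the share of samples in each bin, which is what a probability-by-load-time curve plots.

```python
import math
from collections import Counter


def load_time_distribution(samples, bin_width=1.0):
    """Return {bin_start_seconds: probability} for page-load samples.

    Hypothetical helper for illustration; not part of any vendor tool.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Assign each sample to the bin that starts at or below its value.
    counts = Counter(math.floor(t / bin_width) * bin_width for t in samples)
    total = len(samples)
    return {start: counts[start] / total for start in sorted(counts)}


# Hypothetical page-load times in seconds (assumed values, not study data).
samples = [1.8, 2.1, 2.4, 2.6, 2.9, 3.2, 3.5, 4.1, 9.7, 31.0]

dist = load_time_distribution(samples)
mean = sum(samples) / len(samples)
# Share of samples at or beyond 15 seconds: the "long tail".
tail = sum(p for start, p in dist.items() if start >= 15)

for start, p in dist.items():
    print(f"{start:>4.0f}-{start + 1:.0f} s: {p:.0%}")
print(f"mean = {mean:.2f} s, share beyond 15 s = {tail:.0%}")
```

With these assumed samples the peak sits in the 2 to 3 second bin, yet the mean is about 6.3 seconds because a single 31-second sample pulls it up. A single number hides both facts.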
Figure 1 Distribution of Page Load Times Within the US
[Chart. X-axis: Total Page Load Time (sec), 0 to 30. Y-axis: Probability of Page Loading in Time (by 1 sec increments), 0% to 45%. Curves: Keynote KB40, Gomez PN Financials, Matrix of KB40, Gomez DM of KB40.]
Investigating the Long Tail

The four curves in Figure 1 look different in the first 15 seconds, and very similar after 15 seconds with hardly any users occupying the region. But this doesn't mean that there is an almost zero probability of seeing times after 15 seconds. Indeed, while the probability of seeing each unique second after the 15 second mark looks slight, there are many of them stretching far out to 60 seconds and beyond. These small probabilities add up. To really understand the effect of the "long tail," you have to perform in-depth statistical analysis of user groups out on the tail as shown in Table 1.

Table 1 starts with the overall mean of the total sample set for each measurement service. This is the "one number" answer that most people like to hear. Not surprisingly, the means for the two backbone measurements are very close (2.2 and 2.7 seconds). But notice that even though the curves for the edge measurements look different in Figure 1, their means are actually the same at 8.4 seconds.

However, when the means are calculated for the subsets of the total population, the values tell a different story. Not surprisingly, when the samples are cut in half by speed, the faster half is much faster than the overall mean, while the slower half is understandably slower. The slowest 20 percent and 10 percent of the samples are roughly 3 and 4 times slower than the overall means. This is a consistent pattern for both the backbone and edge measurements. The long tail is confirmed and shows the same pattern in all measurement methods. This is an important finding, and the implication is that regardless of the measurement service or testing location, large numbers of the user population see significantly poorer performance than the one-number answer.

Keynote also supplied us with a week of raw measurements of the Keynote Consumer 40 (KC40) sites. It was expected that these sites, which are measured using dial-up modems, would show a more significant distribution tail.

Surprisingly, although the KC40 had a high overall mean of 22.5 seconds, the ratio of the slow group to the overall mean was a very modest 1.8 for both the slowest 20 percent and the slowest 10 percent. This shows that the consumer test results are much flatter, with no performance peak or tail. We think that the payload for the KC40 Web sites trying to pass over a constrained dial-up link governs this flat curve. Therefore, the effect of the distribution of latency across the Internet is removed from the picture.
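
The subset means discussed here can be computed from raw samples along the following lines. This is a minimal sketch, not the procedure NetForecast actually used: the function name `group_means`, the way group boundaries are rounded, and the sample values are all assumptions made for illustration.

```python
def group_means(samples):
    """Mean response time (seconds) for the overall set and speed-ranked subsets.

    Hypothetical helper; group boundaries are rounded to whole samples.
    """
    ordered = sorted(samples)  # fastest first, slowest last
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two samples")

    def mean(values):
        return sum(values) / len(values)

    def slowest(fraction):
        # Take at least one sample from the slow end of the ranking.
        k = max(1, round(n * fraction))
        return ordered[n - k:]

    half = n // 2
    return {
        "overall": mean(ordered),
        "fastest_half": mean(ordered[:half]),
        "slowest_half": mean(ordered[half:]),
        "slowest_fifth": mean(slowest(0.2)),
        "slowest_tenth": mean(slowest(0.1)),
    }


# Hypothetical page-load times in seconds (assumed values, not study data).
samples = [1.8, 2.1, 2.4, 2.6, 2.9, 3.2, 3.5, 4.1, 9.7, 31.0]

for group, value in group_means(samples).items():
    print(f"{group:>14}: {value:.2f} s")
```

Ranking the samples before averaging is the essential step. Averaging by agent, hour, or week first would mix fast and slow samples and flatten the tail before it could be examined.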
Table 1 Mean Response Times (sec) by User Percentile

                            Backbone Measurements      Edge Measurements
                            Keynote     Gomez PN       Matrix of   Gomez DM
                            KB40        Financials     KB40        of KB40
Overall Mean (0-100%)       2.2         2.7            8.4         8.4
Fastest Half (0-50%)        0.7         1.0            3.8         2.7
Slowest Half (51-100%)      3.4         4.3            12.5        13.2
Slowest Fifth (81-100%)     6.1         7.1            19.6        25.6
Slowest Tenth (91-100%)     9.2         9.8            26.8        41.9
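
The ratio of each slow group to the overall mean can be checked directly against the Table 1 values. The short sketch below does only that arithmetic; the dictionary layout and the name `tail_ratios` are illustrative assumptions, while the numbers are copied from Table 1.

```python
# Mean response times in seconds, copied from Table 1.
TABLE_1 = {
    "Keynote KB40": {"overall": 2.2, "slowest_fifth": 6.1, "slowest_tenth": 9.2},
    "Gomez PN Financials": {"overall": 2.7, "slowest_fifth": 7.1, "slowest_tenth": 9.8},
    "Matrix of KB40": {"overall": 8.4, "slowest_fifth": 19.6, "slowest_tenth": 26.8},
    "Gomez DM of KB40": {"overall": 8.4, "slowest_fifth": 25.6, "slowest_tenth": 41.9},
}


def tail_ratios(row):
    """Ratio of each slow-group mean to the overall mean for one data source."""
    return {
        group: row[group] / row["overall"]
        for group in ("slowest_fifth", "slowest_tenth")
    }


ratios = {name: tail_ratios(row) for name, row in TABLE_1.items()}

for name, r in ratios.items():
    print(f"{name}: slowest fifth {r['slowest_fifth']:.1f}x, "
          f"slowest tenth {r['slowest_tenth']:.1f}x the overall mean")
```

For these four data sources the slowest fifth comes out at roughly 2.3 to 3.0 times the overall mean and the slowest tenth at roughly 3.2 to 5.0 times, for backbone and edge measurements alike.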
Practical Example

Why should you care about all these complicated details? Simple: It matters to your users. For example, consider a Web site that has a 3-second target for page-load time, which is reasonable for a typical business-to-business site.

If the Web manager were to subscribe to one of the more popular backbone measurement services, he or she might believe that the goal had been achieved with a mean score of less than 3 seconds (see Table 1). And given the above lesson in the distribution tail, even the slowest 20 percent and 10 percent of users experience mean response times of about 7 and 9 seconds. Not good, but not too bad. In some on-line business situations, it may be worth having a discussion with business managers with the question, "How bad is 3 times the target for a tenth of our users?"

However, if the manager thought through the problem a bit more, he or she might realize that very few users are directly connected to major Internet backbone nodes at 10 or 45 Mbps. In fact, the broadband edge user profile may be much more typical of the real user population.

Furthermore, 4 times slower has a significant impact on the user's interaction with a computer (see BCR, July 2002, pp. 8-9). In this example, 12 seconds is 4 times the target, but Table 1 shows that the slower half of the users on the Internet edge are seeing mean response times slower than 12 seconds. Now how about a conversation with management that starts with, "How bad is 4 times the target for half of our users?"

A shift in measurement point and a deeper look into the data create a very different perspective on how the users feel.

Conclusion

Use these measurement services, but use them wisely. Most importantly, avoid the simple index or single-number answer. Demand insightful distribution analysis of performance from these vendors. One size does not fit all in clothing, so why would one number represent the performance seen by all your users?

Peter Sevcik is president of NetForecast in Andover, MA, and is a leading authority on Internet traffic, performance and technology. Peter has contributed to the design of more than 100 networks, including the Internet, and holds the patent on application response-time prediction. He can be reached at peter@netforecast.com.

NetForecast Inc. is a network technology consulting firm based in
Andover, Massachusetts. Our seasoned consultants draw on decades
of experience to help clients worldwide choose new technologies,
improve performance, and align infrastructure to business. We have
helped leading enterprises, service providers, and vendors navigate the
changing competitive landscape of the Internet economy. Please call
us to discuss how we can help your information network succeed. www.netforecast.com