                           Web Performance - Not a Simple Number
                                       Net Forecasts - Peter J. Sevcik
                                        BCR Volume 33, Number 1
                                               January 2003

We are often asked to analyze the performance of an Internet application by
supplying a single answer to a single question: "How fast is it?" Whenever we
explain that there is no simple single number, a common response is: "Yeah,
OK, but what is the number?"

Performance measurement vendors, on the other hand, make their living by
supplying a single number. The result, however, can be misleading; decisions
founded on simple data, even if it's impartial, often wind up badly.

The problem with the "simple answer" is that by definition it is an
aggregation of all the data gathered on a site. That's why you often see
results that repeatedly contain the word "average." For example, "This is the
average of measurements from each agent by hour, then averaged across all
agents, then averaged across the week." When you see that kind of sentence,
beware: The interesting or meaningful information often has been ground out.

In the real world, users experience a range of performance that is often much
worse than the average would indicate. The average or mean is influenced by
the preponderance of time samples around a measurement peak. A more realistic
view of performance is the histogram or statistical distribution of time
samples. This method often shows what is known as a "long tail" to the
distribution of time samples.

Getting Detailed Measurements

We were fortunate to get the cooperation of the three leading measurement
services in order to study this issue. The companies are:

     Matrix NetSystems (Austin, TX) has been measuring the Internet since
     1990. It utilizes thousands of beacons around the Internet that can be
     controlled to measure latency, packet loss and reachability.

     Keynote Systems (San Mateo, CA) has an extensive set of tools for
     benchmarking and testing of Web sites on the Internet. It operates
     agents on servers in key locations on the Internet. The data supplied
     shows load times of full Web pages and their components.

     Gomez (Waltham, MA) also operates server agents at key Internet
     locations. In addition, it has a desktop service based upon "normal"
     users permitting their PCs to be agents under the control of Gomez. The
     company also provides an advisory service to understand and improve
     non-performance aspects of the Web site and the total quality of the
     user experience.

NetForecast used the measurement tools of each firm to gather data on the
performance of many Web pages under a variety of conditions. However, in each
case we asked the service to supply us with raw measurement data so we could
investigate the distribution of time samples.

It is important to note that the goal of this analysis is to show the
insights that can be learned from seeing all the data rather than the mean.
It is not a comparative assessment of the measurement services.

The result of the analysis is shown in Figure 1. The curves represent the
distributions of page load times within the continental United States based
upon the following data sources:

     Keynote KB40 is the Keynote Business 40 Index of 40 business sites
     across a week of measurements. The measurements use Keynote's Web Site
     Perspective Business Edition operating in about 21 cities where the
     servers are directly connected by 45-Mbps connections to major backbone
     Internet service providers (ISPs).

     Gomez PN Financials is the Gomez Performance Network (GPN) service
     measurements of 40 major financial sites over a week. The GPN service
     operates in 25 cities using agents connected to multiple backbone ISPs
     by 10-Mbps connections.

     Matrix of KB40 is the Matrix Internet Average service used to supply
     round-trip time (RTT) and packet-loss measurements using five beacons
     testing to 109 destinations that are connected to the Internet by
     typical access lines and access ISPs. We used this low-level Internet
     data as input to our performance model set with the application
     profiles of all the KB40 sites. This is therefore a model-generated
     result of page load times at the network edge based upon Internet
     measurements (see BCR, October 2001, pp. 28-36).

     Gomez DM of KB40 is the Gomez Desktop Monitoring (GDM) service, which
     measured the KB40 sites from hundreds of user desktops over two weeks.
     This service was set to use only broadband-connected desktops (1.5 Mbps
     or faster). This is the most "edge-oriented" measurement since it uses
     desktops that are connected via access networks or corporate networks.

The curves in Figure 1 provide very interesting insights. First, it is clear
that there really are two distinct sets of curves: the Keynote and Gomez
backbone-measurement pair and the Matrix and Gomez edge-measurement pair.
Most surprising is the fact that although the Keynote and Gomez backbone
services measured different sites (only 2 of the 80 are in common) using
different measurement agents, the two curves are almost identical.

Moreover, it is clear that where within the 'Net (i.e., backbone vs. edge)
you measure is much more important than how you measure. What you measure is
defined by the individual page profiles, which are diverse; in this case, we
averaged results across all Web sites measured by Keynote and Gomez. The
detailed view by site, while important, would have resulted in Figure 1
having 160 curves. The two "edge" curves of Matrix and Gomez DM are not as
close a match, but are still similar enough to be called a pair.

Second, the edge view consistently shows much slower performance. The
backbone performance is fast and consistent. There is a distinct peak of
performance in the 2 to 3 second range, with a very fast drop-off towards
longer times. However, the edge curves show a much sloppier performance zone
across the 3 to 12 second range.

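A distribution "by 1 sec increments" of the kind plotted in Figure 1 can be built directly from raw time samples. The following is a minimal sketch, assuming the raw measurements are available as a plain list of load times in seconds; the function name and the sample values are hypothetical and are not data from the measurement services.

```python
from collections import Counter
from statistics import mean

BIN_WIDTH = 1.0  # seconds per bin, matching the 1-second increments of Figure 1


def load_time_distribution(samples, bin_width=BIN_WIDTH):
    """Return {bin_start_seconds: probability} for a list of load times."""
    if not samples:
        raise ValueError("need at least one sample")
    # Assign each sample to the bin that starts at floor(sample / bin_width).
    counts = Counter(int(s // bin_width) for s in samples)
    total = len(samples)
    return {b * bin_width: counts[b] / total for b in sorted(counts)}


# Hypothetical load times in seconds (illustrative only).
samples = [1.2, 2.1, 2.4, 2.6, 2.8, 3.1, 3.3, 4.9, 9.5, 27.0]

dist = load_time_distribution(samples)
print(f"mean = {mean(samples):.2f} s")
for start, p in dist.items():
    bar = "#" * round(p * 40)
    print(f"{start:4.0f}-{start + BIN_WIDTH:.0f} s  {p:6.1%}  {bar}")
```

In this made-up sample the peak sits in the 2 to 3 second bin, while the mean is pulled well above it by two slow samples; the single number describes neither the peak nor the tail.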

[Figure 1 - Distribution of Page Load Times Within the US.
 Curves: Keynote KB40, Gomez PN Financials, Matrix of KB40, Gomez DM of KB40.
 Vertical axis: Probability of Page Loading in Time (by 1 sec increments), 0% to 45%.
 Horizontal axis: Total Page Load Time (sec), 0 to 30.]



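A per-second curve can make the region to the far right look empty even when a meaningful share of samples lives there. The sketch below sums everything beyond a cutoff; it assumes raw samples in seconds, and the function name and the synthetic sample set are hypothetical, chosen only so that no single second past the cutoff holds more than 1% of the samples.

```python
def tail_fraction(samples, cutoff):
    """Fraction of samples strictly slower than `cutoff` seconds."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s > cutoff) / len(samples)


# Synthetic data (illustrative only): 85 fast samples plus 15 slow ones,
# spread one per second-bin between 16 and 58 seconds.
samples = [2.0] * 70 + [6.0] * 15 + [float(t) for t in range(16, 61, 3)]

share = tail_fraction(samples, cutoff=15)
print(f"{share:.0%} of samples are slower than 15 seconds")
```

Each slow sample here would plot as a barely visible 1% bar, yet together they account for 15% of the population.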

Investigating the Long Tail

The four curves in Figure 1 look different in the first 15 seconds, and very
similar after 15 seconds with hardly any users occupying the region. But this
doesn't mean that there is an almost zero probability of seeing times after
15 seconds. Indeed, while the probability of seeing each individual second
after the 15 second mark looks slight, there are many of them stretching far
out to 60 seconds and beyond. These small probabilities add up. To really
understand the effect of the "long tail," you have to perform in-depth
statistical analysis of user groups out on the tail, as shown in Table 1.

Table 1 starts with the overall mean of the total sample set for each
measurement service. This is the "one number" answer that most people like to
hear. Not surprisingly, the means for the two backbone measurements are very
close (2.2 and 2.7 seconds). But notice that even though the curves for the
edge measurements look different in Figure 1, their means are actually the
same at 8.4 seconds.

However, when the means are calculated for subsets of the total population,
the values tell a different story. Not surprisingly, when the samples are cut
in half by speed, the faster half is much faster than the overall mean, while
the slower half is understandably slower. The means of the slowest fifth and
slowest tenth of the samples are roughly 3 and 4 times the overall means.
This is a consistent pattern for both the backbone and edge measurements. The
long tail is confirmed and shows the same pattern in all measurement methods.
This is an important finding, and the implication is that regardless of the
measurement service or testing location, large numbers of the user population
see significantly poorer performance than the one-number answer.

Keynote also supplied us with a week of raw measurements of the Keynote
Consumer 40 (KC40) sites. It was expected that these sites, which are
measured using dial-up modems, would show a more significant distribution
tail.

Surprisingly, although the KC40 had a long overall mean of 22.5 seconds, the
ratio of the slow group to the overall mean was a very modest 1.8 for both
the slowest fifth and the slowest tenth. This shows that the consumer test
results are much flatter, with no performance peak or tail. We think that the
payload for the KC40 Web sites trying to pass over the constrained dial-up
link governs this flat curve. Therefore, the effect of the distribution of
latency across the Internet is removed from the picture.
faster than the overall mean, while the slower half

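The subset means reported in Table 1 can be reproduced from any set of raw samples by ranking them by speed and averaging slices of the ranking. This is a minimal sketch under that assumption; the article does not spell out the exact slicing rule used, so the rank-based split, the function name, and the sample values below are all hypothetical.

```python
from statistics import mean


def subset_means(samples):
    """Mean load time (seconds) for the speed-ranked groups used in Table 1."""
    ordered = sorted(samples)  # fastest first
    n = len(ordered)
    if n < 10:
        raise ValueError("need at least 10 samples to form a slowest tenth")

    def slowest(fraction):
        # Average the slowest `fraction` of the ranked samples.
        k = max(1, round(n * fraction))
        return mean(ordered[n - k:])

    half = n // 2
    return {
        "Overall Mean (0-100%)": mean(ordered),
        "Fastest Half (0-50%)": mean(ordered[:half]),
        "Slowest Half (51-100%)": mean(ordered[half:]),
        "Slowest Fifth (81-100%)": slowest(0.2),
        "Slowest Tenth (91-100%)": slowest(0.1),
    }


# Hypothetical load times in seconds (illustrative only).
samples = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 8.0, 12.0, 40.0]

for label, value in subset_means(samples).items():
    print(f"{label:<26}{value:6.1f}")
```

Averaging ranked slices keeps the familiar unit (a mean in seconds) while showing how far the slow groups sit from the one-number answer.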


             Table 1 - Mean Response Times (sec) by User Percentile

                               Backbone Measurements       Edge Measurements
                               Keynote    Gomez PN         Matrix of  Gomez DM
                               KB40       Financials       KB40       of KB40
   Overall Mean (0-100%)         2.2        2.7              8.4        8.4
   Fastest Half (0-50%)          0.7        1.0              3.8        2.7
   Slowest Half (51-100%)        3.4        4.3             12.5       13.2
   Slowest Fifth (81-100%)       6.1        7.1             19.6       25.6
   Slowest Tenth (91-100%)       9.2        9.8             26.8       41.9



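The "3 and 4 times" pattern can be checked by dividing the tail rows of Table 1 by the overall mean, and the same rows can be compared against a response-time target such as the 3-second goal used in the practical example. The values below are copied from Table 1; the dictionary layout and the function name are hypothetical conveniences for this sketch.

```python
# Mean response times in seconds, copied from Table 1.
TABLE_1 = {
    "Keynote KB40":        {"overall": 2.2, "half": 3.4,  "fifth": 6.1,  "tenth": 9.2},
    "Gomez PN Financials": {"overall": 2.7, "half": 4.3,  "fifth": 7.1,  "tenth": 9.8},
    "Matrix of KB40":      {"overall": 8.4, "half": 12.5, "fifth": 19.6, "tenth": 26.8},
    "Gomez DM of KB40":    {"overall": 8.4, "half": 13.2, "fifth": 25.6, "tenth": 41.9},
}

TARGET_SECONDS = 3.0  # page-load target from the practical example


def tail_ratios(row, target=TARGET_SECONDS):
    """Ratios of the slow-group means to the overall mean and to a target."""
    return {
        "fifth_vs_mean": row["fifth"] / row["overall"],
        "tenth_vs_mean": row["tenth"] / row["overall"],
        "half_vs_target": row["half"] / target,
        "tenth_vs_target": row["tenth"] / target,
    }


for service, row in TABLE_1.items():
    r = tail_ratios(row)
    print(
        f"{service:<20} fifth/mean {r['fifth_vs_mean']:.1f}x  "
        f"tenth/mean {r['tenth_vs_mean']:.1f}x  "
        f"slow-half/target {r['half_vs_target']:.1f}x"
    )
```

Across the four data sources the slowest-fifth ratio falls between about 2.3 and 3.0 and the slowest-tenth ratio between about 3.2 and 5.0, which is the "3 and 4 times" pattern in round numbers.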

Practical Example

Why should you care about all these complicated details? Simple: It matters
to your users. For example, consider a Web site that has a 3-second target
for page-load time, which is reasonable for a typical business-to-business
site.

If the Web manager were to subscribe to one of the more popular backbone
measurement services, he or she might believe the goal had been achieved,
with a mean score of less than 3 seconds (see Table 1). And given the above
lesson in the distribution tail, even the slowest fifth and slowest tenth of
users experience mean response times of about 7 and 9 seconds. Not good, but
not too bad. In some on-line business situations, it may be worth having a
discussion with business managers with the question, "How bad is 3-times the
target for a tenth of our users?"

However, if the manager thought through the problem a bit more, he or she
might realize that very few users are directly connected to major Internet
backbone nodes at 10 or 45 Mbps. In fact, the broadband edge user profile may
be much more typical of the real user population.

Furthermore, 4-times slower has a significant impact on the user's
interaction with a computer (see BCR, July 2002, pp. 8-9). In this example,
12 seconds is 4-times the target, but we can clearly see that the slower half
of the users on the Internet edge are seeing mean response times above 12
seconds (12.5 and 13.2 seconds in Table 1). Now how about a conversation with
management that starts with, "How bad is 4-times the target for half of our
users?"

A shift in measurement point and looking deeper into the data creates a very
different perspective on how the users feel.

Conclusion

Use these measurement services, but use them wisely. Most importantly, avoid
the simple index or single-number answer. Demand insightful distribution
analysis of performance from these vendors. One size does not fit all in
clothing, so why would one number represent the performance seen by all your
users?

Peter Sevcik is president of NetForecast in Andover, MA, and is a leading
authority on Internet traffic, performance and technology. Peter has
contributed to the design of more than 100 networks, including the Internet,
and holds the patent on application response-time prediction. He can be
reached at peter@netforecast.com.
that half of the users on the Internet edge are seeing




NetForecast Inc. is a network technology consulting firm based in
Andover, Massachusetts. Our seasoned consultants draw on decades
of experience to help clients worldwide choose new technologies,
improve performance, and align infrastructure to business. We have
helped leading enterprises, service providers, and vendors navigate the
changing competitive landscape of the Internet economy. Please call
us to discuss how we can help your information network succeed.                  www.netforecast.com



