INTERNET INFRASTRUCTURE
Understanding Web Performance
Peter Sevcik and John Bartlett
28 BUSINESS COMMUNICATIONS REVIEW/ OCT2001 Use BCR's Acronym Directory at www.bcr.com/bcrmag

Sure, speed matters, but it's not a one-dimensional problem. And, despite what you've heard, just adding more bandwidth doesn't always make things go faster.

Peter Sevcik is president of NetForecast in Andover, MA, and a BCR columnist. He can be reached at peter@netforecast.com. John Bartlett is a principal at NetForecast, and he can be reached at john@netforecast.com.

Is there an Internet performance problem? There are two views: The conventional wisdom held by 'Net insiders is that things are just fine. They point to the massive investment in bandwidth that has eliminated many congestion problems, and to the performance of the Keynote Business-40 Index, which has fallen from more than 12 seconds to less than 3 seconds in only four years.

The other view is held by the vast majority of Internet users, who complain that the Web is an awfully slow way to do anything useful. To be sure, most connections are slow; no matter whose numbers you use, anywhere from 78 percent to 93 percent of the users in the U.S. use a dial-up modem to connect to the Internet. And most connections actually run at speeds closer to 30 kbps.

Furthermore, most real users and network shoppers don't visit the Keynote Business-40 sites. Instead, they visit sites like MSN, AOL, Amazon, ICQ, ESPN and Disney, which are designed to be "interesting and cool" rather than optimized for performance; these sites are slow by design. In short, geography, demographics and interest all play important roles in determining what the "Internet experience" will be like.

The Gomez report, "Performance Metrics in Context" (www.gomez.com), describes these effects: Only 47 percent of ecommerce users surveyed are satisfied with the speed of the Web. Indeed, speed is always among the top five reasons for selecting an on-line service for a business transaction or for abandoning a shopping cart. But the need for speed is relative to the type of site being visited (brokerage, shopping, travel) and the type of function being performed (browsing, buying, getting a confirmation).

These findings aren't limited to the consumer market. Since 1997, analyst and author Rebecca Wetzel (rwetzel@rwetzel.com) has been surveying enterprises about what they think of their Internet service provider. Her latest survey shows that while performance ranks second among the 12 attributes enterprises desire from an access provider (after reliability), it ranks third from the bottom in service satisfaction.

There is a wide range of acceptable response times depending on the activity, its criticality, etc. But all points along the range of acceptable speeds are trending down. Yesterday's "fast" is considered "slow" today.

How Applications Work On A Network
The World Wide Web is a complex system of services that operate on top of the Internet, a separate, complex system of connectivity and transport. But despite all the complexity, the interaction and behavior of the transactions between a client (browser) and server (website) is very consistent. Page-load time--from the click on a URL to the point at which the page is completely displayed on the destination PC--is a process that can be boiled down to two functions: discovery and transfer.

• Discovery: A user starts the process by instructing his/her browser to open a connection to a destination known by a Uniform Resource Locator--a URL. URLs are convenient names or handles people give to piles of information or some specific data. The browser must first ask a local Domain Name System (DNS) server to resolve the host name in the URL into an IP address. After the DNS replies with a specific IP address, the browser opens a Transmission Control Protocol (TCP) connection to that address.

The process by which the connection is opened is called a three-way handshake: three packets are exchanged to set up the connection and synchronize sequence numbers. Once the connection is established, the browser sends an HTTP "Get" command, asking for the content of the URL. The server replies with a base page, which is a description of what the Web page will look like when loaded on the screen, along with a list of elements (more URLs) to fill the screen. Then the browser sends a "Get" for each element, one at a time.

So far, the browser has been "discovering" where to get the content and how to request it. Each of the
exchanges described above requires processing by the client (desktop PC) and one or more servers on the Internet. Each exchange also requires that at least one packet go from the client to a server, with at least one packet coming back in reply. Getting a Web page is truly a process of discovery, and that process has only just begun.

FIGURE 1 Wide Range Of Application Profiles

Conventions and specifications on how the Web operates force a few more discoveries to occur. Each element that must be retrieved from a website requires a separate TCP connection along with, potentially, a new DNS address resolution process. (The "persistent TCP" attribute in HTTP version 1.1 reduces the need for some of the TCP opens, but its effectiveness is limited and sometimes counterproductive.) It is very likely that the browser will be told to get some of the content, such as banner ads, from other servers that are not even associated with the base Web page. In a typical scenario, the website sends the browser to DoubleClick, which exchanges cookie data to figure out which ad this user should see at this time, and then a URL is sent directing the browser to yet another server to receive the ad.
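The discovery exchanges described above can be tallied for a given page. The sketch below is hypothetical: the function name, the example URLs and the accounting rules (one DNS lookup per distinct host name, one TCP open per element or one per host when connections persist, one Get per element) are our assumptions, and it ignores redirects such as the ad-server hop.

```python
from urllib.parse import urlsplit


def count_discovery_exchanges(element_urls, persistent=False):
    """Tally request/reply exchanges needed to fetch a page's elements.

    Hypothetical accounting, for illustration only:
      * one DNS lookup per distinct host name (repeat names assumed cached),
      * one TCP open per element, or one per host when connections persist,
      * one HTTP Get per element; the first URL is the base page.
    Redirects (an ad server pointing at yet another server) are ignored.
    """
    hosts_seen = set()
    dns = tcp_opens = gets = 0
    for url in element_urls:
        host = urlsplit(url).hostname
        if host not in hosts_seen:
            hosts_seen.add(host)
            dns += 1
            if persistent:
                tcp_opens += 1  # one reusable connection per host
        if not persistent:
            tcp_opens += 1      # a fresh connection for every element
        gets += 1
    return {"dns": dns, "tcp_open": tcp_opens, "get": gets,
            "total": dns + tcp_opens + gets}


page = [
    "http://www.example.com/",            # base page
    "http://www.example.com/logo.gif",
    "http://www.example.com/menu.gif",
    "http://ads.example.net/banner.gif",  # content from an unrelated ad server
]
print(count_discovery_exchanges(page))
# {'dns': 2, 'tcp_open': 4, 'get': 4, 'total': 10}
print(count_discovery_exchanges(page, persistent=True))
# {'dns': 2, 'tcp_open': 2, 'get': 4, 'total': 8}
```

In this toy accounting, persistent connections remove only TCP opens; the DNS lookups and Gets remain, and each one still costs at least a round trip.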
All the exchanges described above can be grouped into a number we call "turns." A turn is a non-content-carrying exchange of packets between client and server that requires a round trip over the network. More specifically, it is a count of each time communication changes direction among these discovery packets. TCP-level acknowledgements (ACKs) are not counted as turns. A turn is limited to the ping-pong packets that do not move any user-visible content.

Think of the number of times you have to swing your head back and forth if you are watching a tennis match from a seat near the net. Now think of the number of times a point was actually scored. A high ratio of head turns to points may make for an interesting match, but it is also a sure indication that the match will take a long time. Some tennis matches take hours to end! Some websites take a long time to load for the same reason.

Turns add up. Turns take time. They are a direct byproduct of the quest to make the Web simple to build and highly scalable.

• Transfer: Once the browser finishes the discovery process for each element, it starts the transfer process of moving the content (text, graphic, photograph, etc.) to the desktop. The transfer is performed by TCP using standard windowing and acknowledgement procedures coded into the client and server operating systems.

TCP is a transfer protocol that is controlled by the receiver. Since the overwhelming majority of content moves from server to client, it is the client that governs how fast things move. The client advertises a window size in bytes that it is prepared to receive from the server. Once some or all of the window is successfully received, the client acknowledges and updates the window with a new byte count. If all goes well, the content never stops arriving; the client acknowledges fast enough and the server keeps the connection or "pipe" full. In theory, the transfer should operate at the speed of the slowest link in the system, less the overhead of protocol headers.

But things hardly ever go that well. There are often delays in updating the window. The server often waits to get an acknowledgement. If a packet is lost, a retransmission has to occur. TCP also uses a mechanism called "slow-start" to help manage congestion. Since most Web elements are small enough to fit into one or two packets, the system is always operating in the start-up (slow) phase of the cycle.

The bottom line is that transfer takes time, and the throughput is not nearly that of the slowest link in the system.

Application Profiles
Any transactional application--the Web is transactional--can be characterized by payload size and turn count. This article does not cover non-transactional applications like voice and video. We call the payload and turn data an "application profile." Fundamental performance over a network can be derived from only these two numbers.

Figure 1 shows the wide range of payload and turns from our library of more than fifty applications. Each circle in the figure encompasses the profiles of the most common user tasks for each application. They are grouped by major application genre.
FIGURE 2 Application Profiles Of The KB40

Our first significant study of Web traffic and its application profile was performed in 1995, when we watched 20,000 users at a single large company attack the Web with gusto. In 1995, the typical business Web home page had a profile of 50,000 bytes payload and 20 turns. Using the Keynote Business-40 (KB40) as a representative sample of business sites, the current average KB40 profile is 115,000 bytes payload and 40 turns. Figure 2 shows the application profile of each site in the KB40 along with the average composite profiles for the three summers in which we gathered significant data on business sites.

We noticed that during this summer, 10 sites in the KB40 were consistently ranked by Keynote among the top 10 performing sites for that week. It is interesting to see that the average profile for those 10 sites is 24 turns and 65,000 bytes of payload--roughly half of the profile for the remaining 30 KB40 sites. Clearly a good application profile is an important step toward making a site perform fast.

Figure 2 also indicates the trends in business-oriented website profiles. Payload has been climbing steadily--the compound annual growth rate from 1995 to 1999 was 13 percent, and from 1999 to 2001 it accelerated to 19 percent.

More alarming growth occurred in turns from 1995 to 1999, where the annual growth rate was 22 percent, but the turn count appears to have peaked in 2000, and it has since fallen considerably. The overall change in turns from 1999 to 2001 was a decline of 4 percent. It appears that Web managers at these sites, who are under the Keynote microscope, have finally realized that simplifying the Web page, and thus reducing turn count, is to their benefit.

Predicting Performance
We have developed a useful formula for predicting the performance of an application across the Internet. This formula was developed by analyzing the behavior of the protocols, as well as by extensive comparison with real data from real networks. The formula predicts the time necessary to bring the payload across the network, including the overhead of the protocols involved (TCP open, DNS look-up, etc.).

Once the data has arrived, additional time is required to render the information in a useful format on the computer screen for the user. The formula does not account for this necessary rendering time, only the time needed to bring in the payload itself. The formula also assumes that the server already has the requested content and merely has to retrieve and send it. If the server must do an extensive database search before responding, this additional time needs to be added to the result.

There are two parts of the performance equation, representing the two components of delay in retrieving data through the Internet. The discovery component accounts for the client/server interactions required to set up the payload transfer. The transfer component accounts for the time it takes to move the payload bytes across the network.

• Part 1--Discovery (Accounting for Turns): This term accounts for the delay incurred as the client and server set up the payload transfer, driven by the number of application turns. These turns include DNS lookups, TCP opens, HTTP Gets and other protocol interactions that are necessary to find the server, open the connection and establish which piece of data is required. Since these application turns typically use very small packets, their network performance is limited by round-trip delay. This portion of the total time is represented by:

$$\text{Discovery Time} = 2(D+L+C) + \left(D+\frac{C}{2}\right)\frac{T-2}{M} + D\ln\left(\frac{T-2}{M}+1\right)$$

The multiplexing factor M represents the ability of some applications or browsers to multi-thread, or initiate more than one transfer simultaneously. While current browsers are set to a multiplexing factor of four, actual measurements show that such efficiency is rarely achieved; most browser/Web page combinations operate at three threads.

The 2(D+L+C) at the beginning of the equation represents two round-trip delays, one for the TCP Open and one for the HTTP Get. These two interactions must take place sequentially to get the base Web page and discover how many elements need to be fetched. Once these two interactions are complete, the remaining turns are subject to the multiplexing factor.

• Part 2--Transfer (Moving the Payload): Payload transfer is limited either by the connection speed or by the combination of window size and round-trip delay. Whichever limit produces the longer time determines the transfer time. The equation calculates both times, and then chooses the larger value for this portion of the equation. This portion of the overall calculation is:

$$\text{Payload Time} = \frac{\max\left(\frac{8P(1+\mathrm{OHD})}{B},\ \frac{DP}{W}\right)}{1-\sqrt{L}}$$

The max function chooses either line delay or window delay in the numerator of the equation.
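The two parts of the equation translate directly into code. The sketch below is our transcription, not NetForecast's spreadsheet: it reads the "Dln" term as D times the natural logarithm, keeps the 2(D+L+C) term exactly as printed, and uses hypothetical function names. Symbols follow the article's definitions. The example inputs (KB40 average profile, 55-msec RTT, 1 percent loss, M of 3, a 3-Kbyte window, OHD of 0.1) follow values given in the article, while the 50 msec of combined processing time C is an assumed value.

```python
import math


def discovery_time(D, L, C, T, M):
    """Part 1: 2(D+L+C) + (D + C/2)(T-2)/M + D*ln((T-2)/M + 1).

    The first term is kept exactly as printed; the last term is read as
    D times the natural logarithm (our interpretation of "Dln").
    """
    rounds = (T - 2) / M  # turns left after the TCP open and base-page Get
    return 2 * (D + L + C) + (D + C / 2) * rounds + D * math.log(rounds + 1)


def payload_time(P, B, D, W, L, OHD=0.1):
    """Part 2: max(8P(1+OHD)/B, DP/W) / (1 - sqrt(L))."""
    line_limited = 8 * P * (1 + OHD) / B  # bits divided by bits per second
    window_limited = D * P / W            # one window delivered per round trip
    return max(line_limited, window_limited) / (1 - math.sqrt(L))


def response_time(P, T, B, D, L, C, M, W, OHD=0.1):
    """Discovery plus transfer; rendering time is not included."""
    return discovery_time(D, L, C, T, M) + payload_time(P, B, D, W, L, OHD)


def window_crossover_bps(D, W, OHD=0.1):
    """Line speed at which 8P(1+OHD)/B equals DP/W, i.e. 8W(1+OHD)/D."""
    return 8 * W * (1 + OHD) / D


params = dict(P=115_000, T=40, D=0.055, L=0.01, C=0.05, M=3, W=3_000)
for bps in (56_000, 384_000, 1_500_000, 1_000_000_000):
    print(f"{bps / 1000:>9,.0f} kbps: R = {response_time(B=bps, **params):.2f} s")
print(f"window-limited above ~{window_crossover_bps(0.055, 3_000) / 1000:.0f} kbps")
```

With these assumed inputs the transfer becomes window-limited above roughly 480 kbps, so the 1.5-Mbps and 1-Gbps rows print the same response time while 56 and 384 kbps are slower. The shape matches the article's later observation that extra access bandwidth stops helping at around 512 kbps, though the exact figures depend on the assumed C and W.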
FIGURE 3 Comparing Company Measurement Services

Note that overhead is added to the payload. Overhead is a percentage that accounts for the HTTP, TCP and Layer 2 bytes that are added to the actual payload to move it through the network. On the Web today, about 10 percent additional bytes are required to move the payload, so OHD is set to 0.1.

The window size and round-trip delay affect the payload transfer because the server is only allowed to send a window's worth of data before receiving an acknowledgement from the client that the data was received. The acknowledgement time is limited by the round-trip delay of the connection. Although the window size is usually at least 8 Kbytes, TCP is required to send an acknowledgement after two full-size packets are received. Empirical evidence shows that setting the window size to match two full-size packets (3 Kbytes) works well.

One more factor comes into play: packet loss. Each lost packet causes an inefficiency in the TCP interaction, slowing the transfer. The denominator of the equation models this slowdown.

• Total Response Time: The total response time is the sum of the two sections above, discovery and transfer. (See "The Complete Formula.") An Excel spreadsheet with the full equation can be found on the NetForecast website, at www.netforecast.com.

The equation makes the simplifying assumption that the server payload is much larger than the client payload, as is the case for Web pages. Clearly, client payload must be accounted for in situations where it dominates, such as Web publishing or on-line backup.

The bandwidth value in these equations is the bandwidth of the slowest link in the network between server and client. This is typically the access line from the service provider to the enterprise or home.

Lastly, the mux factor must be set to match the behavior of the client or browser. As has been noted, current browsers achieve an effective mux of three. However, transaction-processing applications, ftp transfers and most older applications are single-threaded, and so require that the mux be set to one.

The Complete Formula

$$R = 2(D+L+C) + \left(D+\frac{C}{2}\right)\frac{T-2}{M} + D\ln\left(\frac{T-2}{M}+1\right) + \frac{\max\left(\frac{8P(1+\mathrm{OHD})}{B},\ \frac{DP}{W}\right)}{1-\sqrt{L}}$$

B = Minimum line speed (bits per second)
C = Cc + Cs
Cc = Client processing time (seconds)
Cs = Server processing time (seconds)
D = Round-trip delay (seconds)
L = Packet loss (fraction)
M = Multiplexing factor
OHD = Overhead (fraction)
P = Payload (bytes)
R = Response time (seconds)
T = Application turns (count)
W = Window size (bytes)
©NetForecast Inc.

Measuring Web Performance
We used detailed measurement data for the top 10 performing sites in the KB40 to verify the accuracy of the formula shown above. This also gave us
the opportunity to compare the techniques of two leading Web measurement services, Keynote and Porivo. (See "How Did We Get The Data?")

How Did We Get The Data?
Keynote Systems (www.keynote.com): Keynote publishes a weekly list of the Keynote Business 40 performance on their website. They publish the top 10 sites (best performance) for each week. Keynote tests these sites every 15 minutes throughout the business day from their test agents. These agents are located in 25 cities, where they are connected to the Internet at T3 speeds or greater. The KB40 websites are tested from each agent, and the performance measurement Keynote posts is an aggregate of those values.

Porivo Technologies (www.porivo.com): Porivo has thousands of clients that Internet users have downloaded into their desktop computers. Porivo is then able to schedule the clients to run specific performance tests throughout the day. Porivo ran performance tests against the top 10 members of the KB40 for four weeks, from late July through mid August, using agents on T1 and cable access lines. We imported this data into a Microsoft Access database, and then sorted and averaged the numbers to generate the Porivo results.

Note that because the Porivo client is running on a user desktop, it will be affected by local proxy and caching servers. It will not, however, take advantage of browser caches.

Keynote tests the Keynote Business-40 each day of the workweek, every 15 minutes, from each of its testing agents. These tests are then averaged together and sorted by page-load time. The top 10, those websites with the highest performance (fastest page-load times), are listed on Keynote's website each week, showing the site name and the average time to load that Web page. Keynote also indicates how many weeks each site has been in the top 10.

We asked Porivo to test the fastest 10 of the Keynote Business-40 over a four-week period in July and August of 2001. Porivo activated its agents, which are installed in user desktops across the nation. These agents then tested the download speed of the target websites every 15 minutes throughout the workweek. The average download speed of each site, per week, was then calculated from the results.

Keynote and Porivo have different testing approaches, and can be expected to deliver slightly different results. We have compiled data here for four weeks in August from each service, and Figure 3 (p. 31) shows the results. In looking at that figure, however, two quite different answers emerge to the same question: How fast does this page download? Is this the same Internet?

Let's work with the formula proposed above to recreate these numbers. The variables we have to play with are bandwidth, client-processing time, round-trip delay, packet loss and the multiplexing factor. The Web page profile and server parameters are constant across both measurement services. Table 1 shows the result of adjusting the formula variables until it properly recreates the results shown in Figure 3.

TABLE 1 Variable Changes
                  Keynote     Porivo
Line Rate         45 Mbps     1.5 Mbps
Loss              0.1%        5%
RTT Backbone      21 msec     21 msec
RTT Access        0 msec      26 msec
M                 6           3
Client Proc       12 msec     36 msec

Here is why we believe these parameter changes make sense.

First on the list is access line rate. We focused the data taken from Porivo on users with broadband access, but "broadband" often means a speed above 384 kbps, about one-fourth of a T1, whereas Keynote has high-speed access (45-Mbps T3) from its datacenter locations. For most websites, the delay caused by payload transiting a T1 link will not be a decisive factor, but it may make a difference for low-latency connections.

Second, because real user desktops are farther from the Internet core and on slower lines, they also exhibit higher packet loss percentages. A prime spot for packet loss is at the boundary between the core and the user's access ISP. This additional loss slows the transfer, as explained above.

Third, round-trip delay is lower for Keynote than for most end users. Keynote has its test agents set up at datacenters around the country, where they are directly tied to backbone providers. They are never far, in Internet delay, from the big carrier that will take them to the website being tested. End users, on the other hand, are on the other end of an access link or even an access ISP, which causes additional delay. In Table 1 we have broken the delay down into a backbone portion and an access-link portion to demonstrate this difference.

Multiplexing also comes into play when a testing service emulates browser behavior. If the test agent behaves exactly like a browser, it uses only three connections at a time and, typically, downloads only two objects at a time. The Keynote agents, once they parse the base page, fetch as many objects simultaneously as possible. This tests the performance of the Internet, but does not necessarily match the user experience of opening that page.

Lastly, client processing follows much the same argument as multiplexing. Because Keynote uses a dedicated, powerful server for its testing agent, it is likely to have much shorter client-processing times than a user desktop. The desktop is running a non-real-time operating system and may be doing other tasks concurrently. Increasing the
client-processing value for Porivo makes sense in this context.

Making the above parameter changes and feeding the equations with the profiles of the top 10 KB40 Web pages then yields numbers that closely match the empirical results shown in Figure 3.

What Contributes To Poor Performance
Given that the formula can match measured performance of 10 different sites using two different measurement techniques, we are confident of its predictive capabilities across a wider range of alternatives. It is also useful for testing the value of new technologies that are proposed to improve Web performance.

Here we use the overall KB40 average (115,000-byte payload and 40 turns) as a better indicator of a typical page, because the tests were clearly performed on a group of sites that have an unusually low payload and turn count. We also made a few changes to the Keynote parameter settings in order to make them a more realistic representation of a very well-connected user that we call the "Best Case," as shown in Table 2. The Porivo parameters are essentially unchanged, becoming the "Typical Case." In addition, we had to create a new RTT for dial-up users that accounts for the 100-msec latency penalty of a dial-up modem.

TABLE 2 Realistic Performance Parameters
                          Best Case    Typical Case
Line Rate                 1.5 Mbps     1.5 Mbps
Loss                      1%           5%
RTT (Backbone + Access)   55 msec      110 msec
RTT for Modem Users       155 msec     210 msec
M                         3            3
Response Time             3.9 sec      8.2 sec

The results indicate that a Best Case broadband user will see the typical Web page load in four seconds, while the more Typical Case broadband user will likely experience a load in eight seconds. This range is in line with our experience.

Now that the real performance model is understood, we can vary any parameter in order to see the effect. The most logical investigation is to study the effect bandwidth has on response time. Figure 4 shows the dramatic effect on response time when bandwidth is much slower than 1.5 Mbps. The majority of Internet users see a typical page load in more than 20 seconds, a vast difference from the Best Case broadband user.

FIGURE 4 Overall Delay Drivers

However, it is also interesting to make some of the elements of delay go away. Figure 4 shows the components of delay for users in each of the bandwidth classes. In each case, we recalculate the formula as if the element under investigation were perfect. We replace the access line rate (bandwidth) with a Gigabit Ethernet pipe (1,000 Mbps). In the case of the computers, we drove both the server and client computing times to zero. A perfect network is one that has no latency and no packet loss.

Each recalculation gave an equal or better result than the base case. An equal result indicates that the change to the parameter made no change to the total response time. We then apportion the improvements to the base case (Figure 4).

Note that access bandwidth improves things dramatically as you go from 56 kbps to 384 kbps, but then the effect goes away completely by 1.5 Mbps. The reason there is still some minor benefit for the 384-kbps "Best Case" user to buy more bandwidth is that his/her network performance is good enough to take advantage of the better speed. However, there is no advantage for either the Best Case or the Typical Case user to buy more than 1.5 Mbps. In fact, the point beyond which no more benefit occurs is about 512 kbps.

As broadband access grows, the focus will have to shift to making network latency and loss commensurately lower. Packet loss can be addressed with proper engineering. However, latency is limited by the speed of light and the circuitous routing that paths will always take in a network. The only sure way to improve performance below four seconds is either to move the server closer to the user or to reduce the number of turns in a Web page. There are many companies that are addressing these approaches with a variety of performance-boosting products and services.

Implications Of The Data
Clearly, there is a Web performance problem. Real world performance is 3 to 10 times slower than the
often-quoted Keynote Business-40 numbers; no users actually see the performance of the KB40.

During a meeting held here at NetForecast last year (May 2, 2000), we challenged the CEO of Keynote, Umang Gupta, with the observation that the KB40 index was shifting away from being a real measure of true performance. His reply was, "Our data is not intended for use as a measure of any specific user, and they should not be used for historic trend analysis since they change over time. The data is intended as a benchmark for comparison between websites at any given time."

While Keynote is providing a very useful and interesting benchmarking service, they should remove "seconds" from the charts and simply call it an "Index," very much like the Dow Jones Industrial Average: interesting but not relevant to daily life.

We can think of only one group of users that sees performance approaching the level published by Keynote. They are connected with lightly loaded T1 lines directly to a core ISP, have only the latest, fastest desktops and spend most of their day checking on company credit ratings at Dun & Bradstreet and chasing document shipments on FedEx; in other words, VCs.

These VCs who think the 'Net is just fine are thus making two very wrong bets: First, they over-invest in bandwidth plays (e.g., optical) while under-investing in companies that make the 'Net run better (edge services). Second, they are surprised when the mass market (millions of users) does not show up for their dot-com investments. Maybe, just maybe, the fact that the basic Web page took about 20 seconds to download had something to do with it.

Porivo stands out as a reliably accurate source for realistic measurements of the true user experience, largely because of two key factors: First, its agents are on real desktops that see the performance of the full path from the server. Second, it can tap the resources of thousands of agents distributed over demographic and geographic points that match the true Internet user population.

A lot still needs to be done to improve performance on the 'Net. We also need better methods of measuring performance, along with a better understanding of the impacts of poor performance. It will be interesting to watch the improvements emerge that successfully tackle the real long-term culprits: payload, turn count and network latency.

Companies Mentioned In This Article
Gomez (www.gomez.com)
Keynote Systems (www.keynote.com)
Porivo Technologies (www.porivo.com)