Information about http://www.cs.princeton.edu/sip/pub/webtiming.pdf

Timing Attacks on Web Privacy …

Tags: browser design, browser features, department of computer science, edward w felten, information gathering, internet programming, java javascript, laboratory department, malicious web, princeton nj, princeton university, programming laboratory, quire, sev, slowdown, time variations, university princeton, unrelated web, web access, web privacy,
Pages: 8
Language: english
Created: Fri Aug 25 11:58:51 2000
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
                                        Timing Attacks on Web Privacy
                                          Timing Attacks on Web Privacy
                                                Edward W. Felten and Michael A. Schneider
                                                 Secure Internet Programming Laboratory
                                                    Department of Computer Science
                                                           Princeton University
                                                        Princeton, NJ 08544 USA
ABSTRACT                                                                                       · Standard Web "anonymization" services do not prevent the
We describe a class of attacks that can compromise the privacy of                                attacks; in many cases they actually make the attacks worse.
users' Web-browsing histories. The attacks allow a malicious Web                               · Disabling browser features such as Java, JavaScript, and client-
site to determine whether or not the user has recently visited some                              side caching do not prevent the attacks.
other, unrelated Web page. The malicious page can determine this
information by measuring the time the user's browser requires to                               · The only effective ways we know to prevent the attacks re-
perform certain operations. Since browsers perform various forms                                 quire either an unacceptable slowdown in Web access, or a
of caching, the time required for operations depends on the user's                               modification to the design of the browser.
browsing history; this paper shows that the resulting time variations
convey enough information to compromise users' privacy. This at-                               · Even modifying the browser design allows only a partial rem-
tack method also allows other types of information gathering by                                  edy; several attacks remain possible.
Web sites, such as a more invasive form of Web "cookies". The
attacks we describe can be carried out without the victim's knowl-                        1.1 Why Web Privacy Matters
edge, and most "anonymous browsing" tools fail to prevent them.                              There is now widespread concern about the privacy of users' ac-
Other simple countermeasures also fail to prevent these attacks. We                       tivities on the World-Wide Web. The list of Web locations visited
describe a way of reengineering browsers to prevent most of them.                         by a user often conveys detailed information about the user's fam-
                                                                                          ily, financial or health situation. Consequently, users often consider
                                                                                          their Web-browsing history to be private information that they do
1.      Introduction                                                                      not want unknown parties to learn. Of course, visiting a Web site
   This paper describes a class of attacks that allow the privacy of                      necessarily leaks some information to that site; but users would like
users' activities on the Web to be compromised. The attacks al-                           some assurance that information about their visits to a site is not
low any Web site to determine whether or not each visitor to the                          available to arbitrary third parties.
site has recently visited some other site (or set of sites) on the Web.                      Thus far in the short history of the Web, two types of prob-
The attacker can do this without the knowledge or consent of ei-                          lems have led to compromise of users' Web-browsing histories, and
ther the user or the other site. For example, an insurance-company                        remedies are available for both types.
site could determine whether the user has recently visited Web sites                         First, some Web sites gather information and then reveal it to
relating to a particular medical condition; or an employer's Web                          third parties without the informed consent of users. (Alternatively,
site could determine whether an employee visiting it had recently                         some sites cause users' browsers to reveal information directly to a
visited the sites of various political organizations.                                     third-party site.) These problems have been dealt with by the use
   The attacks work by exploiting the fact that the time required by                      of privacy policies and third party audits of Web sites. While these
the user's browser to perform certain operations varies, depending                        remedies leave much to be desired, they do give users a chance to
on which Web sites the user has visited in the past. By measur-                           guess where information will go after it is revealed to a law-abiding
ing the time required by certain operations, a Web site can learn                         site.
information about the user's past activities. These attacks are par-                         Second, some implementation bugs in browsers have provided
ticularly worrisome, for several reasons:                                                 opportunities for unscrupulous third parties to gather information
                                                                                          without the user's consent. While these bugs are part of an unfor-
     · The attacks are possible because of basic properties of Web                        tunate pattern of security bugs in browsers, each bug by itself has
       browsers, not because of fixable "bugs" in a browser.                              been fixable.
     · The attacks can be carried out without the victim's knowl-                            The attacks we describe in this paper admit no such remedy. Be-
       edge.                                                                              cause information about visits to a site is not controlled by that site,
                                                                                          privacy policies, auditing, and trust in sites are not effective reme-
                                                                                          dies. Because the attacks are not caused by browser bugs, they
                                                                                          cannot easily be fixed.


Permission to make digital or hard copies of all or part of this work for                 2.      Exploiting Web Caching
personal or classroom use is granted without fee provided that copies are                    The first timing attack we will discuss exploits Web caching. We
not made or distributed for profit or commercial advantage and that copies                first review how Web caching works, and then discuss the attack.
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific        2.1 Web Caching
permission and/or a fee.
CCS'00, Athens, Greece.                                                                     Accessing Web documents often takes a long time, so Web browsers
Copyright 2000 ACM 1-58113-203-4/00/0011 ..$5.00                                          use caching -- they save copies of recently-accessed files, so that


                                                                                     25
future accesses to those files can be satisfied locally, rather than re-        -- along with the main measurement. Known misses can be gen-
quiring another time-consuming Web access. Caching is relatively                erated by accessing nonexistent (e.g. randomly generated) URLs
effective in reducing perceived access times of Web pages.                      on the same site as the file to be measured. Known hits can be
   The purpose of caching is to make accesses to recently-visited               generated by replicating the main measurement. The second and
files faster. The attack exploits this by measuring the amount of               subsequent measurements will be known cache hits. By measuring
time required to access a particular file. If that file is in the user's        several known hits and misses, the attacker can choose a threshold
cache, the access will be fast; if not, it will be slow. By measuring           for discriminating between hits and misses. We show below that
the access time, an attacker can determine whether or not a file is             the attacker can pick a good threshold and make determinations
in the user's cache, and hence whether or not the user has accessed             with high accuracy, even when the control cases are restricted to
the file recently.                                                              only two known hits and two known misses.
                                                                                   Another way for the adversary to improve the accuracy of his
2.2 An Example                                                                  measurement is to combine the results of several measurements.
   To illustrate what we are talking about, we will now give a simple           For example, there might be several static image files linked from
example. Readers should bear in mind that this is only one example;             Charlie's home page. If Bob measures the access time for each of
countermeasures that would prevent this simple example attack will              these images individually, this gives him several estimates of the
not necessarily prevent more sophisticated versions of the attack.              likelihood that Alice had visited Bob's site. By using standard sta-
   Suppose that Alice is surfing the Web, and she visits Bob's Web              tistical techniques, Bob can combine these estimates to obtain a
site at http://www.bob.com. Bob wants to find out whether                       more accurate estimate.
Alice has visited Charlie's Web site (http://www.charlie.
com).                                                                           2.4 Delivering an Attack
   First, Bob looks at Charlie's site, and picks a file that any visitor           The measurements described above are all performed by HTML
to the site will have seen. Let's say Bob picks the file contain-               pages that are viewed by the victim. In order to carry out the attack,
ing Charlie's corporate logo, at http://www.charlie.com/                        the attacker must cause the victim to view an HTML page written
logo.jpg. Bob is going to determine whether the logo file is in                 by the attacker. There are many ways to do this.
Alice's Web cache. If the file is in her cache, then she must have
visited Charlie's Web site recently.                                               1. An unscrupulous organization that runs a popular Web site
   Bob writes a Java applet that implements his attack, and em-                       could put measurement code into its site.
beds it in his home page. When Alice views the attack page, the
Java applet is automatically downloaded and run in Alice's browser.                2. A Web advertising agency could add measurement code to
The applet measures the time required to access http://www.                           the banner ads it distributes.
charlie.com/logo.jpg on Alice's machine, and reports this
time back to Bob. If the time is less than some threshold (say, 80                 3. The attacker could create a Web site that is engineered to
milliseconds), Bob concludes that Alice has been to Charlie's site                    appear near the top of the list returned by a popular search
recently; if it is greater than the threshold, he concludes that she has              engine when the user searches for a common search term.
not been to the site recently.
                                                                                   4. The attacker could send the victim an email message refer-
2.3 Measuring Access Time                                                             ring to some enticing content, such as a low-price sale, or an
   The main component of the attack is a measurement of the access                    award certificate, on the attacker's Web site.
time for a particular Web URL. There are several ways the attacker
could measure this time. Java and JavaScript both have facilities for              5. The attacker could send the victim an email message with an
measuring the current time and for loading an arbitrary URL; these                    HTML message body. If the victim uses an HTML-enabled
can be combined in the obvious way to measure the time required                       mail reader (and most popular mail readers are now HTML-
to load a URL.                                                                        enabled), the measurement would be performed when the
   Java and JavaScript provide the most accurate means of mea-                        victim read the email message. The message could be dis-
suring access time, but the attacker can get a sufficiently accurate                  guised as an unwanted "spam" message, so that the victim
measurement even if Java and JavaScript are disabled. This is done                    did not notice anything unusual.
by writing a Web page that loads three files in sequence:
   1. a dummy file on the attacker's site,
                                                                                2.5 Accuracy of the Measurements in Practice
                                                                                   We performed a series of experiments to determine how accu-
   2. the file whose access time is to be measured, and                         rately an attacker could distinguish cache hits from cache misses.
                                                                                In the experiments, we ran the Netscape Navigator 4.5 browser on
   3. another dummy file on the attacker's site.
                                                                                a Windows NT 4.0 (Service Pack 4) PC with a 350 MHz Pentium
The attacker's Web server can record the time at which it receives              II processor and 256 Megabytes of RAM. (Preliminary results indi-
requests 1 and 3; subtracting yields an approximation to the time               cate that our conclusions would have been unaffected had we used
required to perform step 2. (We omit the straightforward but tedious            Internet Explorer rather than Netscape Navigator.) This PC was
details of how to write HTML that causes popular browsers to make               connected to our department's network via a 10BaseT link.
serialized accesses to files.)                                                     The experiments used our department's Web server, which is
   Any of these methods can be used to perform a measurement                    on the same switched departmental network as the client browser.
invisibly to the victim. The Java and JavaScript programs need                  Because the client and server are so close together in our experi-
not display anything, and the three-step method can also be im-                 ments, cache miss times are much lower than they would be in prac-
plemented in a way that does not affect the appearance of the page              tice, thereby making it artificially difficult to distinguish hits from
doing the measuring.                                                            misses. In practice, we expect miss times to be longer, giving the
   To make the measurement as accurate as possible, the attacker                attacker even higher measurement accuracy than our experiments
can measure some control cases -- known hits and known misses                   show.


                                                                           26
2.5.1 Computing the Accuracy
   The ultimate goal of our experiments will be to determine how                                     500                             Hit in Browser Cache
accurately an attacker can distinguish cache hits from misses. We                                                                    Miss in Browser Cache
will characterize the accuracy as a percentage, which tells us what
percentage of unknown references the attacker will be able to cor-                                   400
rectly characterize. We now consider how to compute this percent-




                                                                                         Frequency
age.
   The attacker will choose a threshold value T , such that, given an                                300

unknown reference r with access time tr , he will decide r is a hit if
tr < T , and will decide r is a miss if tr > T . If tr = T , he will decide
                                                                                                     200
randomly, choosing "hit" with probability p. Given T and p, we
can compute the attacker's accuracy by seeing what percentage of
references are mischaracterized.                                                                     100
   To determine the accuracy, we first compute the accuracy sepa-
rately for hits and misses. In other words, we compute the fraction
of hits that are correctly characterized, and the fraction of misses
                                                                                                           -50   0    50       100          150       200
that are correctly characterized. We then take the minimum of these
two values, so that an accuracy of (say) 98% means that the attacker                                                 Load time (ms)
can get at least 98% of hits right, and at least 98% of misses right.
   But how would the attacker choose T and p? Different methods
                                                                                   Figure 1: Distribution of access times for known cache hits and
are called for, depending on whether or not the attacker knows the
                                                                                   known cache misses, as measured by a JavaScript program em-
performance characteristics of the user's system. We will consider
                                                                                   bedded in the attacker's Web page.
two cases. In the first, the attacker knows the distributions of access
times for hits and misses. In the second case, the attacker does not
know these distributions.                                                             2. Choose T as the mean of max(H1 , H2 ) and min(M1 , M2 ).
2.5.2 Computing Accuracy from Known Time Distributions
   To choose T and p optimally, the attacker needs to know two                        3. Choose p, the probability of concluding a hit for measure-
things:                                                                                  ments exactly equal to T , as 1/2.
   1. the distribution of access times for references that hit in the              Intuitively, the four-sample method will achieve high accuracy if
      cache, which we will represent with a function h such that                   there is not much overlap between the distributions for hits and for
      h(t) is the probability that a hit will have access time equal to            misses. If almost all misses take longer than almost all hits, high
      t; and                                                                       accuracy will result, even if the attacker does not know the distribu-
                                                                                   tions of hit and miss times.
   2. the distribution of access times for references that miss in the
      cache, which we will represent with a function m such that
                                                                                   2.5.4 Measuring with Client-Side JavaScript
                                                                                      Figure 1 shows the distribution of hit times and miss times when
      m(t) is the probability that a miss will have access time equal              the attacker measures access latency using a JavaScript program
      to t.                                                                        embedded in a Web page. The JavaScript program runs on the client
From these, we calculate two additional functions: H(t) is the prob-               browser and measures the time both before and after loading a file.
ability that a hit will have access time less than t, and M(t) is the              Subtracting the two measured times yields the access latency, which
probability that a miss will have access time greater than t.                      is shown here.
  Now, if we choose the parameters T and p, the accuracy is                           Figure 1 shows that hit times are almost always considerably
                                                                                   lower than miss times. An attacker who knows these distributions
       A(T, p) = min(H(T ) + ph(T ), M(T ) + (1 - p)m(T )).                        will set his threshold T to be 60 ms, his p to be 0.14, and will
To find the optimal values for the parameters, we loop over all val-               achieve 98.5% accuracy. An attacker who does not know the dis-
ues of T . For each value of T we find the value of p that maximizes               tribution, but uses the four-sample method, will achieve a mean
A(T, p), by first finding the value p such that                                    accuracy of 97.7%.
                                                                                   2.5.5 Measuring on the Server Side
              H(T ) + p h(T ) = M(T ) + (1 - p )m(T ),                                Using a Java or JavaScript program to measure access times is
                                                                                   the most effective strategy for the attacker, but a few victims will
and then choosing p as follows:
                                                                                   make this impossible by turning off JavaScript. Even if JavaScript is
   1. If p  0, choose p = 0.                                                       turned off, the attacker can still measure the access time by creating
                                                                                   an HTML page that hits a known URL on the attacker's site, then
   2. If p  1, choose p = 1.                                                       loads the file that the attacker wants to test, then hits another known
                                                                                   URL on the attacker's site. The attacker's Web server can measure
   3. If 0 < p < 1, choose p = p .
                                                                                   the times at which it receives the two hits. Subtracting these two
Finding the optimal T and p is relatively efficient.                               time measurements yields a measure of how long it took to load the
2.5.3 Computing Accuracy with Unknown Time Distributions                           file that the attacker wants to test.
   In practice, the attacker often does not know the distributions h(t)               This approach will typically lead to a higher variance in mea-
and m(t). To find a good value for T , the attacker can employ the                 sured times, since the time for the first and third hits to reach the
following "four-sample" method:                                                    attacker's Web server will vary. We performed an experiment to
                                                                                   determine how well the attacker could distinguish cache hits from
   1. Measure the time taken by the client to retrieve two known                   misses in this scenario.
      hits (H1 , H2 ), and two known misses (M1 , M2 ).


                                                                              27
                   200
                                                                                 Since accesses made through these systems are all subject to
                                                                              caching, none of these systems prevents the timing attacks dis-
                                                           Known Hit          cussed above. Indeed, these systems may well make the attack eas-
                                                           Known Miss         ier -- routing cache misses through an intermediary makes misses
                   150
                                                                              even slower, which makes it easier for the adversary to distinguish
                                                                              hits from misses.
                                                                                 Some of these systems attempt to block Java and JavaScript from
       Frequency




                   100
                                                                              reaching the browser, which would force the attacker to rely on the
                                                                              less accurate server-side timing method. However, this is unlikely
                                                                              to prevent the attack entirely, and in any case there is doubt as to
                                                                              whether it is possible for a firewall to block Java and JavaScript in
                   50
                                                                              all cases [4].
                                                                                 An anonymizing tool could also modify Web sessions to mark
                                                                              every incoming file as uncacheable. This would be equivalent to
                                                                              turning off caching at the Web browser, which is discussed in Sec-
                    0                                                         tion 7.
                         0    50         100         150           200
                             Server measurement time (ms)
                                                                              4. Exploiting DNS Caching
                                                                                 Attacks on Web caching can be prevented by turning off Web
Figure 2: Distribution of access times for known cache hits and               caching. This has a high performance penalty, since caching has a
known cache misses, as measured by server-side time measure-                  major effect on overall Web performance.
ment.                                                                            Even if Web caching is turned off, an attacker can exploit other
                                                                              types of caching, such as DNS caching, to learn about a user's ac-
                                                                              tivities.
   Figure 2 shows the distribution of times we measured for known
cache hits and known cache misses. The two distributions overlap a            4.1 DNS Caching
bit but are generally disjoint. To be precise, an attacker who knows             The Domain Naming System (DNS) [1] is the mechanism that
the distributions will choose T to be 60 ms and p = 1.0, giving an ac-        looks up human-readable "DNS names" like www.whitehouse.
curacy of 96.7%. An attacker who does not know the distributions,             gov so that they can be translated into the numeric IP addresses
but uses the four-sample method, will achieve a mean accuracy of              like 192.168.13.4 that the Internet protocols use. Every DNS
93.8%.                                                                        address must undergo this translation before it is used. This is done
   Summing up these two experiments, we can see that an attacker              with the help of a set of server machines on the Internet.
can determine with high accuracy whether or not a particular file is             Because a DNS lookup requires potentially expensive network
present in the victim's cache.                                                communication, and because machines typically look up the same
                                                                              DNS addresses repeatedly, most machines keep a DNS cache that
2.6 Uncacheable Files                                                         holds the results of recent DNS lookups. Subsequent lookups of the
   Not all files on the Web are cacheable. Some files are explic-             same address can be completed quickly by using the cached result.
itly marked as uncacheable by the Web server supplying them, and                 As with a Web cache, a DNS cache records information about
some files are inherently uncacheable, for example because they               a user's recent activities. In particular, if the user visits a Web
are dynamically generated. A study [9] by Wolman and colleagues               server at www.whitehouse.gov, this will cause a DNS lookup
at the University of Washington found that about 60% of Web ac-               for www.whitehouse.gov, so an entry for this address will ap-
cesses are to cacheable files1 . The Washington group's data also             pear in the DNS cache of the user's PC. Subsequent attempts to
shows that the fraction of files that are cacheable is not changing           look up www.whitehouse.gov will complete quickly because
appreciably over time [8].                                                    of the cache entry.
   Uncacheable files do not always prevent timing attacks. Pages                 There are several ways for an HTML page to cause a DNS lookup.
that are dynamically generated often include embedded images that             For example, a Java applet can directly ask to have a DNS lookup
are cacheable -- for example, a portal's news page might include              done; a JavaScript program can indirectly cause a DNS lookup to
the portal's corporate logo. To carry out a successful attack, the            occur. By measuring the time required to look up a particular DNS
attacker need only find one cacheable file that is referenced by the          address, an HTML page can determine whether or not a particular
page.                                                                         address appears in a user's DNS cache, and can therefore determine
                                                                              whether the user accessed that address recently.
3. Anonymization Tools Do Not Help                                               Such an attack can be delivered by any of the same methods as
  Several tools exist to provide "anonymous" Web browsing; ex-                the Web caching attacks described above, via either a Web page or
amples include anonymizer.com [2], Crowds [6], the Lucent Per-                an email message.
sonal Web Assistant [3], and Zero Knowledge Freedom [10]. These               4.2 Accuracy of the Measurements
tools all send Web requests through some kind of intermediary which
                                                                                 Figure 3 shows the distribution of DNS lookup time for known
shields the address of the original requester.
                                                                              hits and known misses to three different Internet sites, as deter-
                                                                              mined by a Java applet. The three sites cs.princeton.edu
                                                                              (our department), rutgers.edu (about thirty kilometers from
                                                                              us), and berkeley.edu (about 4000 kilometers from us) were
1 In
   their study, a file was considered cacheable if version 2 of the           chosen as examples of DNS servers with different network laten-
Squid proxy cache [7] would have been willing to cache it.                    cies for DNS queries. The test client experienced a small latency


                                                                         28
                 150                                           Hit in Resolver Cache         of multilevel caching are often used for Web content, backing a
                                                                                             client browser cache with a departmental or institutional Web proxy
     Frequency

                                                               Miss in Resolver Cache
                 100
                                                                                             which performs caching.
                  50                                                                            Multilevel caches are presumably effective in reducing average
                                                                                             access time; that is the main reason for their existence. In prin-
                                                                                             ciple, then, if an attacker determines that an access misses in the
                       0            50             100              150
                           [cs.princeton.edu] load time (ms)                                 first-level cache, he may be able to tell whether it hits or misses
                                                                                             in the second-level cache. This may allow attackers to gather even
                                                                                             more information than is available from measuring first-level cache
                 150                                           Hit in Resolver Cache         contents.
     Frequency




                                                               Miss in Resolver Cache
                 100
                                                                                                The typical difference between a first-level cache and a second-
                                                                                             level cache is that second-level caches are usually shared between
                  50                                                                         multiple client machines. For example, a departmental DNS cache
                                                                                             might be shared by all of the client machines in the department. In
                       0            50             100              150                      general, a shared departmental cache aggregates information about
                             [rutgers.edu] load time (ms)                                    the access patterns of individuals in the department: it records which
                                                                                             files have been accessed by members of the department, but does
                                                                                             not record which specific individual accessed which file. Depend-
                 150                                           Hit in Resolver Cache
                                                                                             ing on the circumstances, leaking aggregated information may be
     Frequency




                                                               Miss in Resolver Cache
                 100                                                                         either more or less dangerous than leaking individual information.
                  50
                                                                                             5.1 Testing for Cache Sharing
                                                                                                If an adversary can test for the presence of files in second-level
                       0            50             100              150                      caches, he can also determine whether two clients share a second-
                            [berkeley.edu] load time (ms)
                                                                                             level cache. This would be done by causing the first client to access
                                                                                             a unique name (in a part of the namespace controlled by the at-
                                                                                             tacker), and then determining whether that name is present in the
Figure 3: Distribution of DNS lookup time for known DNS hits
                                                                                             second-level cache of the other client. If it is present, then the two
and known DNS misses for three sites, as measured by a Java
                                                                                             clients must share a second-level cache.
applet embedded in the attacker's Web page.
                                                                                                An attacker can use this attack in several ways. He might use it
                                                                                             to probe the organizational structure of an enterprise, exploiting the
                                                                                             fact that employees who work in the same group, department, or
for queries to cs.princeton.edu, a slightly larger latency for                               location tend to share second-level caches. Alternatively, he might
queries to rutgers.edu, and a somewhat larger latency in queries                             use it to find the location of an unknown client, by checking whether
to berkeley.edu.                                                                             that client shares a second-level cache with any of a set of clients
   Computing the accuracy as described above in Section 2.5.2                                whose locations are known.
for each site, we find that, for the three servers, an attacker would
knows the distributions would choose (T, p) to be (10 ms, 0.63),                             5.2 Accuracy of Measurements
(20 ms, 0.15), and (60 ms, 0), respectively, giving optimal accura-                             We performed an experiment to determine how accurately an at-
cies of 53.5%, 90.4%, and 100.0%, respectively. Using the four-                              tacker could distinguish hits from misses in a shared departmental
sample method, an attacker who does not know the distributions                               HTTP proxy cache. To do this, we disabled caching in a client
will achieve mean accuracies of 53.5%, 87.7%, and 100.0% respec-                             browser and configured the client to use our department's caching
tively.                                                                                      HTTP proxy. We then measured the access time for known hits and
   The differences in these accuracy measurements demonstrate that                           known misses in the proxy cache.
these tests do not perform well when queries to the DNS server be-                              Figure 4 shows the results. The hit distribution and the miss dis-
ing tested experience a very small cache miss penalty. However,                              tribution are nearly disjoint. Following the methodology of Sec-
when the miss penalty is large enough to detect (as it is for almost                         tion 2.5.2, we find that an attacker who knows the distributions
all sites on the net) the tests lead to very high accuracies.                                will choose his threshold T to be 50 ms, and his p to be 0.58,
   Thus, when the attacker chooses a DNS server with a measurable                            achieving an accuracy of 96.7%. Using the four-sample method,
latency between the server and the victim, the attacker can effec-                           an attacker who does not know the distributions can attain an accu-
tively determine whether or not a particular entry is in the victim's                        racy of 95.7%. In other words, the attacker can effectively test for
DNS cache.                                                                                   the existence of a file in a shared second-level cache.

5.          Attacking Multi-Level Caches                                                     6.     Cache Cookies
   So far we have described how attackers can learn the contents                                Traditional web cookies are one way for sites to maintain persis-
of caches on client workstations. Often, client caches are backed                            tent state on the clients who visit their pages. The cookie itself is a
by additional caches that are shared by many clients. For example,                           relatively small amount of data, "written" by the server and stored
a client PC's DNS cache is typically backed by another cache that                            by the client as part of a normal HTTP request. Clients volunteer
is shared at a departmental level. If an address is not found in the                         the contents of the cookie back to the same server on subsequent
client's DNS cache, a request is sent to the departmental cache.                             HTTP accesses as part of the HTTP request. Clients can store cook-
If the address is in the departmental cache, the result is provided                          ies across multiple browsing sessions; a cookie may have an expiry
to the client; if not, the departmental cache does a DNS lookup,                             time associated with it beyond which it will be discarded by the
which may involve lookups in additional caches. Similar forms                                client. [5]


                                                                                        29
                                                                              of information (a unique identifier, for example), the number of
                 800                               Hit at Proxy               cache files required is small enough to be managable.
                                                   Miss at Proxy
                                                                              6.3 Cookie Reading Caveats
                 600
                                                                                 While the process of writing a cache cookie into the user's cache
                                                                              is straightforward, checking for the existence of a cache cookie is
     Frequency




                                                                              slightly more complex. Checking for the presence of a file requires
                                                                              reading the file; and reading the file normally causes the file to be
                 400                                                          loaded into the cache, so it might appear that checking for the ex-
                                                                              istence of a cache cookie has the side-effect of putting that cookie
                                                                              into the user's cache. An additional technique is necessary to pre-
                                                                              vent cookie checking from destroying the information contained in
                 200
                                                                              the cookies.
                                                                                 To make reads nondestructive, the server must know, when serv-
                                                                              ing a URL corresponding to a cache cookie, whether the client
                                                                              is reading or writing. One method of discrimination is to use the
                       0    50             100               150              Referer: HTTP header. For example, suppose we have a cookie
                              Load time (ms)                                  URL cookie.jpg. The cookie might be read by an html page
                                                                              reader.html, or written by another html page writer.html.
                                                                              Since the enclosing page is included in the Referer: HTTP header,
Figure 4: Distribution of access times, with client-side caching              the server can examine this header to determine whether to read or
disabled, for known hits and and known misses in a shared                     write.
HTTP proxy cache, as measured by a JavaScript program em-                        When writing, the server should return the file with a very long
bedded in the attacker's Web page.                                            expiry time. This will help to make sure that the cookie does not
                                                                              expire from the client's web cache. When reading, the server should
                                                                              return the file with a very short expiry time. After a string of cookies
                                                                              has been read, the ones that were present already (which will have
   Sites might use the persistent state stored on each client, for ex-        long expiry times) will persist in the cache. The cookies which
ample, to retain the site preferences of individual users. Cookies            were absent (and hit during the reading process) will expire quickly,
can also be used to tie multiple site visits to the same user. Most           returning them to the absent state. In this way, servers can read
web browsers provide a method for accepting (storing) or rejecting            client-side cache cookies nondestructively.
(ignoring) web cookies on the client. This method allows clients to
opt-out of the persistent state, perhaps for personal privacy reasons.        6.4 Using Cache Cookies
                                                                                 Cache cookies have the advantage (from the attacker's point of
6.1 Cache Cookies                                                             view) that they require no client-side support. In most cases the
   Using the methods described above, it is possible to develop a             client would be completely unaware that cache cookies are in use.
new, more intrusive, form of web cookies, which we call "cache                Worse yet, cache cookies can be used across sites (i.e. written by
cookies". Servers can store cache cookies on clients who visit their          one site and read by another independent site). These qualities
pages, without the client's knowledge or approval. Cache cook-                make cache cookies very dangerous to the privacy of web users.
ies are stored in the form of entries in the client's web cache. By
forcing a client to retrieve a specific URL (using an IMG tag, for
example), the server can effectively write entries into the client's          7. Countermeasures
cache, thus storing the cookie. To read the cookie, the server can              We now turn to the question of how to counter these attacks.
use any of the measurement techniques described above to measure              Several types of countermeasures exist, but all are unattractive.
the retrieval time for the same URL. A hit indicates that the cookie
is present, otherwise the cookie is not present.                              7.1 Turning Off Caching
   Although cache cookies can be used to emulate traditional cook-               The first countermeasure is simply to turn off caching. This
ies, they differ in two important respects. First, they require no            will certainly prevent the attacks, but it imposes a big performance
client side support. A server can use cache cookies to store infor-           penalty. Indeed, it seems likely that the Web and the DNS infras-
mation on a client without the user's knowledge or approval. We               tructure could not meet the bandwidth demands imposed by the re-
know of no way that a browser can unconditinally block these sorts            moval of caching.
of cookies without an unacceptable performance loss.                          7.1.1 Turning Off First-Level Caching
   Second, cache cookies can be writen and read by different servers.            Rather than turning off caching altogether, we could try to turn
The reading of traditional web cookies is restricted to the site which        off first-level caching and rely on department-scale second-level
originally wrote them by web browser security. Implementing this              caching. This has significant performance implications, and it still
type security for cache cookies would most likely require reengi-             allows attackers to gather aggregated information from the second-
neering of the operation of the client's web browser.                         level caches.
                                                                              7.1.2 Implications of Transparent Caching
6.2 Emulating "Traditional" Web Cookies                                          Recently, systems employing transparent Web caching have been
   A single cache file can be only present or absent, so it can store         devised. Transparent caching does not require clients to explictly
only a single bit of state. Traditional Web cookies store multi-bit           designate a cache. Rather, a transparent cache observes network
values, so several files (one per bit, plus a few extra for error cor-        traffic over some network segment, detects which traffic encodes
rection) must be put in the cache to emulate one traditional cookie.          Web requests, and automatically provides a Web cache for those re-
Since traditional cookies are used only to store very small amounts           quests, without the knowledge or consent of the requestors. Trans-


                                                                         30
parent caching can be embedded in network components such as                       it is in the user's cache. However, this attempt would come with
routers, and it offers improved performance for Web clients with-                  domain-tag adversary.com, which is different from the domain-
out requiring any reconfiguration at the client. Transparent caching               tag of the cached copy, so the result would be a cache miss. In short,
cannot be turned off at the client, so one client, acting alone, may               the adversary's access to the page would miss in the cache, regard-
be unable to avoid using caching.                                                  less of whether the user had visited Yahoo.
                                                                                       Domain tagging does transform some hits into misses, but this
7.2 Altering Hit or Miss Performance                                               only occurs when the same file is accessed while viewing multiple
   Another class of countermeasures relies on altering the perfor-                 domains. This situation is sufficiently rare that it should not pose
mance of cache hits, in an attempt to make them harder to distin-                  much of a performance problem in practice.
guish from misses.                                                                     Domain tagging would be effective in preventing some timing at-
   Simply slowing down hits is of limited value. As long as hits                   tacks on browser caches, but it fails to protect against other attacks.
are appreciably faster than misses, attackers will be able to extract              It will not prevent timing attacks that learn shared proxy cache con-
useful information about cache contents. Of course, if we make hits                tents (unless we augment the proxy-http protocol to carry domain-
as slow as misses, then attackers will be handcuffed; but in that case             tag information). It will not prevent attacks on DNS caches. It also
we might as well have turned off caching.                                          fails to prevent the use of cache cookies, although it does prevent a
   Another approach is to increase the variance in hit times in a                  cache cookie set by one site from being seen by another site. Given
randomized fashion. This also has only limited value. If a particular              the limited value provided by domain tagging, it is not clear whether
hit is noticeably faster than a miss, then the attacker will still be able         it is worth implementing in real browsers.
to categorize it as a hit. If it is not noticeably faster than a miss, then
it might as well have been a miss. In other words, this approach
essentially converts some randomly chosen hits into misses, with a
                                                                                   9.     Conclusion
consequent loss of performance. The more hits we wish to disguise,                    Cache-based timing attacks are possible in many situations where
the more performance will suffer.                                                  references to data are cached. By measuring the time required to
   Alternatively, we could try increasing the variance in miss times.              access a data element, a malicious program can determine whether
This fails utterly. We cannot make misses faster, since they are                   that element is in a nearby cache, thereby allowing the program to
already as fast as they can be. However, if we make a miss slower,                 learn whether that element has been accessed recently on the same
we just make it easier to classify as a miss. Thus increasing miss                 machine (for local caches) or in the same group of machines (for
variance can only make things worse.                                               shared caches). We expect that this type of attack will be possible
                                                                                   in many situations besides the ones we have described.
7.3 Turning Off Java and JavaScript                                                   Web technologies allow an attacker to control the sequence of
   Our final countermeasure is to turn off Java and JavaScript. This               data accesses on a remote machine, and hence to carry out cache-
takes away the attacker's most accurate measurement tools, and                     based timing attacks. An attack could be delivered by a Web page,
hence lowers the accuracy of the attacker's measurements.                          or in an email message if the victim uses an HTML-enabled mailer.
   There are two problems with this approach. First, the attacker can                 We have described attacks that probe the contents of Web browser
still do reasonably accurate measurements, as the data presented                   file caches, to learn a user's Web browsing history, and attacks that
above in Section 2.5.5 demonstrates. Second, Java and JavaScript                   probe DNS caches, to learn which network addresses a machine has
are now so widely used on Web sites that it seems unrealistic to ask               connected to recently.
ordinary users to turn them off.                                                      We are not aware of any practical countermeasures to these at-
                                                                                   tacks. There seems to be little hope that effective countermeasures
7.4 Countermeasures Summary                                                        will be developed and deployed any time soon.
  None of the countermeasures discussed here is attractive. At
present, we would have to turn off caching, paying a high perfor-                  Acknowledgments
mance price, to prevent these attacks.
                                                                                   Thanks to Drew Dean, Gary McGraw, Avi Rubin, Dan Wallach,
                                                                                   Randy Wang, and the anonymous referees for their comments and
8.     Domain Tagging                                                              suggestions.
   As discussed above, no practical countermeasure against our tim-
ing attacks is available today. We now describe a countermeasure,                  10. REFERENCES
based on changing the browser's cache implementation to imple-
                                                                                    [1] A LBITZ , P., AND L IU , C. DNS and BIND, third ed. O'Reilly
ment "domain tagging," that prevents some timing attacks.
                                                                                        and Associates, 1998.
   Currently, each item in the browser's cache is tagged with the
URL of the file of which it is a copy. We add another tag, called                   [2] A NONYMIZER. Web site at http://www.anonymizer.com.
the domain-tag, which gives the domain name of the page that the                    [3] G ABBER , E., G IBBONS , P., M ATIAS , Y., AND M AYER , A.
user is viewing. A cache access is treated as a hit only if both tags                   How to make personalized web browsing simple, secure, and
match.                                                                                  anonymous. In Proc. of Financial Cryptography '97 (1997).
   For example, suppose the user is currently viewing the page http:                [4] M ARTIN J R ., D. M., R AJAGOPALAN , S., AND RUBIN ,
//search.yahoo.com/search/options. All accesses to                                      A. D. Blocking Java applets at the firewall. In Proc. of
the cache while rendering this page (and any images and other                           Internet Society Symposium on Network and Distributed
material embedded in it) are given the domain-tag yahoo.com.                            System Security (1997).
For example, while fetching the image http://us.yimg.com/                           [5] N ETSCAPE C OMMUNICATIONS. Persistent client state:
images/yahoo.gif (which is embedded in the search options                               HTTP cookies. Web site at
page), the domain-tag yahoo.com is used.                                                http://home.netscape.com/newsref/std/cookie spec.html.
  If the user later visited www.adversary.com, this page could
embed the yahoo.gif image, in an attempt to determine whether


                                                                              31
 [6] R EITER , M. K., AND RUBIN , A. D. Crowds: Anonymity for
     web transactions. ACM Trans. on Information and System
     Security 1, 1 (June 1998).
 [7] The squid web proxy cache. Web site at http://squid.nlanr.net.
 [8] VOELKER , G. Personal communication.
 [9] W OLMAN , A., VOELKER , G., S HARMA , N., C ARDWELL ,
     N., B ROWN , M., L ANDRAY, T., P INNEL , D., K ARLIN , A.,
     AND L EVY, H. Organization-based analysis of web-object
     sharing and caching. In Proc. of Second USENIX Symp. on
     Internet Technologies and Systems (Oct. 1999), pp. 25­36.
[10] Z ERO K NOWLEDGE S YSTEMS. Freedom 1.0. Web site at
     http://www.freedom.net.




                                                               32