Tags: browser design, browser features, department of computer science, edward w felten, information gathering, internet programming, java javascript, laboratory department, malicious web, princeton nj, princeton university, programming laboratory, quire, sev, slowdown, time variations, university princeton, unrelated web, web access, web privacy,
Timing Attacks on Web Privacy
Timing Attacks on Web Privacy
Edward W. Felten and Michael A. Schneider
Secure Internet Programming Laboratory
Department of Computer Science
Princeton University
Princeton, NJ 08544 USA
ABSTRACT · Standard Web "anonymization" services do not prevent the
We describe a class of attacks that can compromise the privacy of attacks; in many cases they actually make the attacks worse.
users' Web-browsing histories. The attacks allow a malicious Web · Disabling browser features such as Java, JavaScript, and client-
site to determine whether or not the user has recently visited some side caching do not prevent the attacks.
other, unrelated Web page. The malicious page can determine this
information by measuring the time the user's browser requires to · The only effective ways we know to prevent the attacks re-
perform certain operations. Since browsers perform various forms quire either an unacceptable slowdown in Web access, or a
of caching, the time required for operations depends on the user's modification to the design of the browser.
browsing history; this paper shows that the resulting time variations
convey enough information to compromise users' privacy. This at- · Even modifying the browser design allows only a partial rem-
tack method also allows other types of information gathering by edy; several attacks remain possible.
Web sites, such as a more invasive form of Web "cookies". The
attacks we describe can be carried out without the victim's knowl- 1.1 Why Web Privacy Matters
edge, and most "anonymous browsing" tools fail to prevent them. There is now widespread concern about the privacy of users' ac-
Other simple countermeasures also fail to prevent these attacks. We tivities on the World-Wide Web. The list of Web locations visited
describe a way of reengineering browsers to prevent most of them. by a user often conveys detailed information about the user's fam-
ily, financial or health situation. Consequently, users often consider
their Web-browsing history to be private information that they do
1. Introduction not want unknown parties to learn. Of course, visiting a Web site
This paper describes a class of attacks that allow the privacy of necessarily leaks some information to that site; but users would like
users' activities on the Web to be compromised. The attacks al- some assurance that information about their visits to a site is not
low any Web site to determine whether or not each visitor to the available to arbitrary third parties.
site has recently visited some other site (or set of sites) on the Web. Thus far in the short history of the Web, two types of prob-
The attacker can do this without the knowledge or consent of ei- lems have led to compromise of users' Web-browsing histories, and
ther the user or the other site. For example, an insurance-company remedies are available for both types.
site could determine whether the user has recently visited Web sites First, some Web sites gather information and then reveal it to
relating to a particular medical condition; or an employer's Web third parties without the informed consent of users. (Alternatively,
site could determine whether an employee visiting it had recently some sites cause users' browsers to reveal information directly to a
visited the sites of various political organizations. third-party site.) These problems have been dealt with by the use
The attacks work by exploiting the fact that the time required by of privacy policies and third party audits of Web sites. While these
the user's browser to perform certain operations varies, depending remedies leave much to be desired, they do give users a chance to
on which Web sites the user has visited in the past. By measur- guess where information will go after it is revealed to a law-abiding
ing the time required by certain operations, a Web site can learn site.
information about the user's past activities. These attacks are par- Second, some implementation bugs in browsers have provided
ticularly worrisome, for several reasons: opportunities for unscrupulous third parties to gather information
without the user's consent. While these bugs are part of an unfor-
· The attacks are possible because of basic properties of Web tunate pattern of security bugs in browsers, each bug by itself has
browsers, not because of fixable "bugs" in a browser. been fixable.
· The attacks can be carried out without the victim's knowl- The attacks we describe in this paper admit no such remedy. Be-
edge. cause information about visits to a site is not controlled by that site,
privacy policies, auditing, and trust in sites are not effective reme-
dies. Because the attacks are not caused by browser bugs, they
cannot easily be fixed.
Permission to make digital or hard copies of all or part of this work for 2. Exploiting Web Caching
personal or classroom use is granted without fee provided that copies are The first timing attack we will discuss exploits Web caching. We
not made or distributed for profit or commercial advantage and that copies first review how Web caching works, and then discuss the attack.
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific 2.1 Web Caching
permission and/or a fee.
CCS'00, Athens, Greece. Accessing Web documents often takes a long time, so Web browsers
Copyright 2000 ACM 1-58113-203-4/00/0011 ..$5.00 use caching -- they save copies of recently-accessed files, so that
25
future accesses to those files can be satisfied locally, rather than re- -- along with the main measurement. Known misses can be gen-
quiring another time-consuming Web access. Caching is relatively erated by accessing nonexistent (e.g. randomly generated) URLs
effective in reducing perceived access times of Web pages. on the same site as the file to be measured. Known hits can be
The purpose of caching is to make accesses to recently-visited generated by replicating the main measurement. The second and
files faster. The attack exploits this by measuring the amount of subsequent measurements will be known cache hits. By measuring
time required to access a particular file. If that file is in the user's several known hits and misses, the attacker can choose a threshold
cache, the access will be fast; if not, it will be slow. By measuring for discriminating between hits and misses. We show below that
the access time, an attacker can determine whether or not a file is the attacker can pick a good threshold and make determinations
in the user's cache, and hence whether or not the user has accessed with high accuracy, even when the control cases are restricted to
the file recently. only two known hits and two known misses.
Another way for the adversary to improve the accuracy of his
2.2 An Example measurement is to combine the results of several measurements.
To illustrate what we are talking about, we will now give a simple For example, there might be several static image files linked from
example. Readers should bear in mind that this is only one example; Charlie's home page. If Bob measures the access time for each of
countermeasures that would prevent this simple example attack will these images individually, this gives him several estimates of the
not necessarily prevent more sophisticated versions of the attack. likelihood that Alice had visited Bob's site. By using standard sta-
Suppose that Alice is surfing the Web, and she visits Bob's Web tistical techniques, Bob can combine these estimates to obtain a
site at http://www.bob.com. Bob wants to find out whether more accurate estimate.
Alice has visited Charlie's Web site (http://www.charlie.
com). 2.4 Delivering an Attack
First, Bob looks at Charlie's site, and picks a file that any visitor The measurements described above are all performed by HTML
to the site will have seen. Let's say Bob picks the file contain- pages that are viewed by the victim. In order to carry out the attack,
ing Charlie's corporate logo, at http://www.charlie.com/ the attacker must cause the victim to view an HTML page written
logo.jpg. Bob is going to determine whether the logo file is in by the attacker. There are many ways to do this.
Alice's Web cache. If the file is in her cache, then she must have
visited Charlie's Web site recently. 1. An unscrupulous organization that runs a popular Web site
Bob writes a Java applet that implements his attack, and em- could put measurement code into its site.
beds it in his home page. When Alice views the attack page, the
Java applet is automatically downloaded and run in Alice's browser. 2. A Web advertising agency could add measurement code to
The applet measures the time required to access http://www. the banner ads it distributes.
charlie.com/logo.jpg on Alice's machine, and reports this
time back to Bob. If the time is less than some threshold (say, 80 3. The attacker could create a Web site that is engineered to
milliseconds), Bob concludes that Alice has been to Charlie's site appear near the top of the list returned by a popular search
recently; if it is greater than the threshold, he concludes that she has engine when the user searches for a common search term.
not been to the site recently.
4. The attacker could send the victim an email message refer-
2.3 Measuring Access Time ring to some enticing content, such as a low-price sale, or an
The main component of the attack is a measurement of the access award certificate, on the attacker's Web site.
time for a particular Web URL. There are several ways the attacker
could measure this time. Java and JavaScript both have facilities for 5. The attacker could send the victim an email message with an
measuring the current time and for loading an arbitrary URL; these HTML message body. If the victim uses an HTML-enabled
can be combined in the obvious way to measure the time required mail reader (and most popular mail readers are now HTML-
to load a URL. enabled), the measurement would be performed when the
Java and JavaScript provide the most accurate means of mea- victim read the email message. The message could be dis-
suring access time, but the attacker can get a sufficiently accurate guised as an unwanted "spam" message, so that the victim
measurement even if Java and JavaScript are disabled. This is done did not notice anything unusual.
by writing a Web page that loads three files in sequence:
1. a dummy file on the attacker's site,
2.5 Accuracy of the Measurements in Practice
We performed a series of experiments to determine how accu-
2. the file whose access time is to be measured, and rately an attacker could distinguish cache hits from cache misses.
In the experiments, we ran the Netscape Navigator 4.5 browser on
3. another dummy file on the attacker's site.
a Windows NT 4.0 (Service Pack 4) PC with a 350 MHz Pentium
The attacker's Web server can record the time at which it receives II processor and 256 Megabytes of RAM. (Preliminary results indi-
requests 1 and 3; subtracting yields an approximation to the time cate that our conclusions would have been unaffected had we used
required to perform step 2. (We omit the straightforward but tedious Internet Explorer rather than Netscape Navigator.) This PC was
details of how to write HTML that causes popular browsers to make connected to our department's network via a 10BaseT link.
serialized accesses to files.) The experiments used our department's Web server, which is
Any of these methods can be used to perform a measurement on the same switched departmental network as the client browser.
invisibly to the victim. The Java and JavaScript programs need Because the client and server are so close together in our experi-
not display anything, and the three-step method can also be im- ments, cache miss times are much lower than they would be in prac-
plemented in a way that does not affect the appearance of the page tice, thereby making it artificially difficult to distinguish hits from
doing the measuring. misses. In practice, we expect miss times to be longer, giving the
To make the measurement as accurate as possible, the attacker attacker even higher measurement accuracy than our experiments
can measure some control cases -- known hits and known misses show.
26
2.5.1 Computing the Accuracy
The ultimate goal of our experiments will be to determine how 500 Hit in Browser Cache
accurately an attacker can distinguish cache hits from misses. We Miss in Browser Cache
will characterize the accuracy as a percentage, which tells us what
percentage of unknown references the attacker will be able to cor- 400
rectly characterize. We now consider how to compute this percent-
Frequency
age.
The attacker will choose a threshold value T , such that, given an 300
unknown reference r with access time tr , he will decide r is a hit if
tr < T , and will decide r is a miss if tr > T . If tr = T , he will decide
200
randomly, choosing "hit" with probability p. Given T and p, we
can compute the attacker's accuracy by seeing what percentage of
references are mischaracterized. 100
To determine the accuracy, we first compute the accuracy sepa-
rately for hits and misses. In other words, we compute the fraction
of hits that are correctly characterized, and the fraction of misses
-50 0 50 100 150 200
that are correctly characterized. We then take the minimum of these
two values, so that an accuracy of (say) 98% means that the attacker Load time (ms)
can get at least 98% of hits right, and at least 98% of misses right.
But how would the attacker choose T and p? Different methods
Figure 1: Distribution of access times for known cache hits and
are called for, depending on whether or not the attacker knows the
known cache misses, as measured by a JavaScript program em-
performance characteristics of the user's system. We will consider
bedded in the attacker's Web page.
two cases. In the first, the attacker knows the distributions of access
times for hits and misses. In the second case, the attacker does not
know these distributions. 2. Choose T as the mean of max(H1 , H2 ) and min(M1 , M2 ).
2.5.2 Computing Accuracy from Known Time Distributions
To choose T and p optimally, the attacker needs to know two 3. Choose p, the probability of concluding a hit for measure-
things: ments exactly equal to T , as 1/2.
1. the distribution of access times for references that hit in the Intuitively, the four-sample method will achieve high accuracy if
cache, which we will represent with a function h such that there is not much overlap between the distributions for hits and for
h(t) is the probability that a hit will have access time equal to misses. If almost all misses take longer than almost all hits, high
t; and accuracy will result, even if the attacker does not know the distribu-
tions of hit and miss times.
2. the distribution of access times for references that miss in the
cache, which we will represent with a function m such that
2.5.4 Measuring with Client-Side JavaScript
Figure 1 shows the distribution of hit times and miss times when
m(t) is the probability that a miss will have access time equal the attacker measures access latency using a JavaScript program
to t. embedded in a Web page. The JavaScript program runs on the client
From these, we calculate two additional functions: H(t) is the prob- browser and measures the time both before and after loading a file.
ability that a hit will have access time less than t, and M(t) is the Subtracting the two measured times yields the access latency, which
probability that a miss will have access time greater than t. is shown here.
Now, if we choose the parameters T and p, the accuracy is Figure 1 shows that hit times are almost always considerably
lower than miss times. An attacker who knows these distributions
A(T, p) = min(H(T ) + ph(T ), M(T ) + (1 - p)m(T )). will set his threshold T to be 60 ms, his p to be 0.14, and will
To find the optimal values for the parameters, we loop over all val- achieve 98.5% accuracy. An attacker who does not know the dis-
ues of T . For each value of T we find the value of p that maximizes tribution, but uses the four-sample method, will achieve a mean
A(T, p), by first finding the value p such that accuracy of 97.7%.
2.5.5 Measuring on the Server Side
H(T ) + p h(T ) = M(T ) + (1 - p )m(T ), Using a Java or JavaScript program to measure access times is
the most effective strategy for the attacker, but a few victims will
and then choosing p as follows:
make this impossible by turning off JavaScript. Even if JavaScript is
1. If p 0, choose p = 0. turned off, the attacker can still measure the access time by creating
an HTML page that hits a known URL on the attacker's site, then
2. If p 1, choose p = 1. loads the file that the attacker wants to test, then hits another known
URL on the attacker's site. The attacker's Web server can measure
3. If 0 < p < 1, choose p = p .
the times at which it receives the two hits. Subtracting these two
Finding the optimal T and p is relatively efficient. time measurements yields a measure of how long it took to load the
2.5.3 Computing Accuracy with Unknown Time Distributions file that the attacker wants to test.
In practice, the attacker often does not know the distributions h(t) This approach will typically lead to a higher variance in mea-
and m(t). To find a good value for T , the attacker can employ the sured times, since the time for the first and third hits to reach the
following "four-sample" method: attacker's Web server will vary. We performed an experiment to
determine how well the attacker could distinguish cache hits from
1. Measure the time taken by the client to retrieve two known misses in this scenario.
hits (H1 , H2 ), and two known misses (M1 , M2 ).
27
200
Since accesses made through these systems are all subject to
caching, none of these systems prevents the timing attacks dis-
Known Hit cussed above. Indeed, these systems may well make the attack eas-
Known Miss ier -- routing cache misses through an intermediary makes misses
150
even slower, which makes it easier for the adversary to distinguish
hits from misses.
Some of these systems attempt to block Java and JavaScript from
Frequency
100
reaching the browser, which would force the attacker to rely on the
less accurate server-side timing method. However, this is unlikely
to prevent the attack entirely, and in any case there is doubt as to
whether it is possible for a firewall to block Java and JavaScript in
50
all cases [4].
An anonymizing tool could also modify Web sessions to mark
every incoming file as uncacheable. This would be equivalent to
turning off caching at the Web browser, which is discussed in Sec-
0 tion 7.
0 50 100 150 200
Server measurement time (ms)
4. Exploiting DNS Caching
Attacks on Web caching can be prevented by turning off Web
Figure 2: Distribution of access times for known cache hits and caching. This has a high performance penalty, since caching has a
known cache misses, as measured by server-side time measure- major effect on overall Web performance.
ment. Even if Web caching is turned off, an attacker can exploit other
types of caching, such as DNS caching, to learn about a user's ac-
tivities.
Figure 2 shows the distribution of times we measured for known
cache hits and known cache misses. The two distributions overlap a 4.1 DNS Caching
bit but are generally disjoint. To be precise, an attacker who knows The Domain Naming System (DNS) [1] is the mechanism that
the distributions will choose T to be 60 ms and p = 1.0, giving an ac- looks up human-readable "DNS names" like www.whitehouse.
curacy of 96.7%. An attacker who does not know the distributions, gov so that they can be translated into the numeric IP addresses
but uses the four-sample method, will achieve a mean accuracy of like 192.168.13.4 that the Internet protocols use. Every DNS
93.8%. address must undergo this translation before it is used. This is done
Summing up these two experiments, we can see that an attacker with the help of a set of server machines on the Internet.
can determine with high accuracy whether or not a particular file is Because a DNS lookup requires potentially expensive network
present in the victim's cache. communication, and because machines typically look up the same
DNS addresses repeatedly, most machines keep a DNS cache that
2.6 Uncacheable Files holds the results of recent DNS lookups. Subsequent lookups of the
Not all files on the Web are cacheable. Some files are explic- same address can be completed quickly by using the cached result.
itly marked as uncacheable by the Web server supplying them, and As with a Web cache, a DNS cache records information about
some files are inherently uncacheable, for example because they a user's recent activities. In particular, if the user visits a Web
are dynamically generated. A study [9] by Wolman and colleagues server at www.whitehouse.gov, this will cause a DNS lookup
at the University of Washington found that about 60% of Web ac- for www.whitehouse.gov, so an entry for this address will ap-
cesses are to cacheable files1 . The Washington group's data also pear in the DNS cache of the user's PC. Subsequent attempts to
shows that the fraction of files that are cacheable is not changing look up www.whitehouse.gov will complete quickly because
appreciably over time [8]. of the cache entry.
Uncacheable files do not always prevent timing attacks. Pages There are several ways for an HTML page to cause a DNS lookup.
that are dynamically generated often include embedded images that For example, a Java applet can directly ask to have a DNS lookup
are cacheable -- for example, a portal's news page might include done; a JavaScript program can indirectly cause a DNS lookup to
the portal's corporate logo. To carry out a successful attack, the occur. By measuring the time required to look up a particular DNS
attacker need only find one cacheable file that is referenced by the address, an HTML page can determine whether or not a particular
page. address appears in a user's DNS cache, and can therefore determine
whether the user accessed that address recently.
3. Anonymization Tools Do Not Help Such an attack can be delivered by any of the same methods as
Several tools exist to provide "anonymous" Web browsing; ex- the Web caching attacks described above, via either a Web page or
amples include anonymizer.com [2], Crowds [6], the Lucent Per- an email message.
sonal Web Assistant [3], and Zero Knowledge Freedom [10]. These 4.2 Accuracy of the Measurements
tools all send Web requests through some kind of intermediary which
Figure 3 shows the distribution of DNS lookup time for known
shields the address of the original requester.
hits and known misses to three different Internet sites, as deter-
mined by a Java applet. The three sites cs.princeton.edu
(our department), rutgers.edu (about thirty kilometers from
us), and berkeley.edu (about 4000 kilometers from us) were
1 In
their study, a file was considered cacheable if version 2 of the chosen as examples of DNS servers with different network laten-
Squid proxy cache [7] would have been willing to cache it. cies for DNS queries. The test client experienced a small latency
28
150 Hit in Resolver Cache of multilevel caching are often used for Web content, backing a
client browser cache with a departmental or institutional Web proxy
Frequency
Miss in Resolver Cache
100
which performs caching.
50 Multilevel caches are presumably effective in reducing average
access time; that is the main reason for their existence. In prin-
ciple, then, if an attacker determines that an access misses in the
0 50 100 150
[cs.princeton.edu] load time (ms) first-level cache, he may be able to tell whether it hits or misses
in the second-level cache. This may allow attackers to gather even
more information than is available from measuring first-level cache
150 Hit in Resolver Cache contents.
Frequency
Miss in Resolver Cache
100
The typical difference between a first-level cache and a second-
level cache is that second-level caches are usually shared between
50 multiple client machines. For example, a departmental DNS cache
might be shared by all of the client machines in the department. In
0 50 100 150 general, a shared departmental cache aggregates information about
[rutgers.edu] load time (ms) the access patterns of individuals in the department: it records which
files have been accessed by members of the department, but does
not record which specific individual accessed which file. Depend-
150 Hit in Resolver Cache
ing on the circumstances, leaking aggregated information may be
Frequency
Miss in Resolver Cache
100 either more or less dangerous than leaking individual information.
50
5.1 Testing for Cache Sharing
If an adversary can test for the presence of files in second-level
0 50 100 150 caches, he can also determine whether two clients share a second-
[berkeley.edu] load time (ms)
level cache. This would be done by causing the first client to access
a unique name (in a part of the namespace controlled by the at-
tacker), and then determining whether that name is present in the
Figure 3: Distribution of DNS lookup time for known DNS hits
second-level cache of the other client. If it is present, then the two
and known DNS misses for three sites, as measured by a Java
clients must share a second-level cache.
applet embedded in the attacker's Web page.
An attacker can use this attack in several ways. He might use it
to probe the organizational structure of an enterprise, exploiting the
fact that employees who work in the same group, department, or
for queries to cs.princeton.edu, a slightly larger latency for location tend to share second-level caches. Alternatively, he might
queries to rutgers.edu, and a somewhat larger latency in queries use it to find the location of an unknown client, by checking whether
to berkeley.edu. that client shares a second-level cache with any of a set of clients
Computing the accuracy as described above in Section 2.5.2 whose locations are known.
for each site, we find that, for the three servers, an attacker would
knows the distributions would choose (T, p) to be (10 ms, 0.63), 5.2 Accuracy of Measurements
(20 ms, 0.15), and (60 ms, 0), respectively, giving optimal accura- We performed an experiment to determine how accurately an at-
cies of 53.5%, 90.4%, and 100.0%, respectively. Using the four- tacker could distinguish hits from misses in a shared departmental
sample method, an attacker who does not know the distributions HTTP proxy cache. To do this, we disabled caching in a client
will achieve mean accuracies of 53.5%, 87.7%, and 100.0% respec- browser and configured the client to use our department's caching
tively. HTTP proxy. We then measured the access time for known hits and
The differences in these accuracy measurements demonstrate that known misses in the proxy cache.
these tests do not perform well when queries to the DNS server be- Figure 4 shows the results. The hit distribution and the miss dis-
ing tested experience a very small cache miss penalty. However, tribution are nearly disjoint. Following the methodology of Sec-
when the miss penalty is large enough to detect (as it is for almost tion 2.5.2, we find that an attacker who knows the distributions
all sites on the net) the tests lead to very high accuracies. will choose his threshold T to be 50 ms, and his p to be 0.58,
Thus, when the attacker chooses a DNS server with a measurable achieving an accuracy of 96.7%. Using the four-sample method,
latency between the server and the victim, the attacker can effec- an attacker who does not know the distributions can attain an accu-
tively determine whether or not a particular entry is in the victim's racy of 95.7%. In other words, the attacker can effectively test for
DNS cache. the existence of a file in a shared second-level cache.
5. Attacking Multi-Level Caches 6. Cache Cookies
So far we have described how attackers can learn the contents Traditional web cookies are one way for sites to maintain persis-
of caches on client workstations. Often, client caches are backed tent state on the clients who visit their pages. The cookie itself is a
by additional caches that are shared by many clients. For example, relatively small amount of data, "written" by the server and stored
a client PC's DNS cache is typically backed by another cache that by the client as part of a normal HTTP request. Clients volunteer
is shared at a departmental level. If an address is not found in the the contents of the cookie back to the same server on subsequent
client's DNS cache, a request is sent to the departmental cache. HTTP accesses as part of the HTTP request. Clients can store cook-
If the address is in the departmental cache, the result is provided ies across multiple browsing sessions; a cookie may have an expiry
to the client; if not, the departmental cache does a DNS lookup, time associated with it beyond which it will be discarded by the
which may involve lookups in additional caches. Similar forms client. [5]
29
of information (a unique identifier, for example), the number of
800 Hit at Proxy cache files required is small enough to be managable.
Miss at Proxy
6.3 Cookie Reading Caveats
600
While the process of writing a cache cookie into the user's cache
is straightforward, checking for the existence of a cache cookie is
Frequency
slightly more complex. Checking for the presence of a file requires
reading the file; and reading the file normally causes the file to be
400 loaded into the cache, so it might appear that checking for the ex-
istence of a cache cookie has the side-effect of putting that cookie
into the user's cache. An additional technique is necessary to pre-
vent cookie checking from destroying the information contained in
200
the cookies.
To make reads nondestructive, the server must know, when serv-
ing a URL corresponding to a cache cookie, whether the client
is reading or writing. One method of discrimination is to use the
0 50 100 150 Referer: HTTP header. For example, suppose we have a cookie
Load time (ms) URL cookie.jpg. The cookie might be read by an html page
reader.html, or written by another html page writer.html.
Since the enclosing page is included in the Referer: HTTP header,
Figure 4: Distribution of access times, with client-side caching the server can examine this header to determine whether to read or
disabled, for known hits and and known misses in a shared write.
HTTP proxy cache, as measured by a JavaScript program em- When writing, the server should return the file with a very long
bedded in the attacker's Web page. expiry time. This will help to make sure that the cookie does not
expire from the client's web cache. When reading, the server should
return the file with a very short expiry time. After a string of cookies
has been read, the ones that were present already (which will have
Sites might use the persistent state stored on each client, for ex- long expiry times) will persist in the cache. The cookies which
ample, to retain the site preferences of individual users. Cookies were absent (and hit during the reading process) will expire quickly,
can also be used to tie multiple site visits to the same user. Most returning them to the absent state. In this way, servers can read
web browsers provide a method for accepting (storing) or rejecting client-side cache cookies nondestructively.
(ignoring) web cookies on the client. This method allows clients to
opt-out of the persistent state, perhaps for personal privacy reasons. 6.4 Using Cache Cookies
Cache cookies have the advantage (from the attacker's point of
6.1 Cache Cookies view) that they require no client-side support. In most cases the
Using the methods described above, it is possible to develop a client would be completely unaware that cache cookies are in use.
new, more intrusive, form of web cookies, which we call "cache Worse yet, cache cookies can be used across sites (i.e. written by
cookies". Servers can store cache cookies on clients who visit their one site and read by another independent site). These qualities
pages, without the client's knowledge or approval. Cache cook- make cache cookies very dangerous to the privacy of web users.
ies are stored in the form of entries in the client's web cache. By
forcing a client to retrieve a specific URL (using an IMG tag, for
example), the server can effectively write entries into the client's 7. Countermeasures
cache, thus storing the cookie. To read the cookie, the server can We now turn to the question of how to counter these attacks.
use any of the measurement techniques described above to measure Several types of countermeasures exist, but all are unattractive.
the retrieval time for the same URL. A hit indicates that the cookie
is present, otherwise the cookie is not present. 7.1 Turning Off Caching
Although cache cookies can be used to emulate traditional cook- The first countermeasure is simply to turn off caching. This
ies, they differ in two important respects. First, they require no will certainly prevent the attacks, but it imposes a big performance
client side support. A server can use cache cookies to store infor- penalty. Indeed, it seems likely that the Web and the DNS infras-
mation on a client without the user's knowledge or approval. We tructure could not meet the bandwidth demands imposed by the re-
know of no way that a browser can unconditinally block these sorts moval of caching.
of cookies without an unacceptable performance loss. 7.1.1 Turning Off First-Level Caching
Second, cache cookies can be writen and read by different servers. Rather than turning off caching altogether, we could try to turn
The reading of traditional web cookies is restricted to the site which off first-level caching and rely on department-scale second-level
originally wrote them by web browser security. Implementing this caching. This has significant performance implications, and it still
type security for cache cookies would most likely require reengi- allows attackers to gather aggregated information from the second-
neering of the operation of the client's web browser. level caches.
7.1.2 Implications of Transparent Caching
6.2 Emulating "Traditional" Web Cookies Recently, systems employing transparent Web caching have been
A single cache file can be only present or absent, so it can store devised. Transparent caching does not require clients to explictly
only a single bit of state. Traditional Web cookies store multi-bit designate a cache. Rather, a transparent cache observes network
values, so several files (one per bit, plus a few extra for error cor- traffic over some network segment, detects which traffic encodes
rection) must be put in the cache to emulate one traditional cookie. Web requests, and automatically provides a Web cache for those re-
Since traditional cookies are used only to store very small amounts quests, without the knowledge or consent of the requestors. Trans-
30
parent caching can be embedded in network components such as it is in the user's cache. However, this attempt would come with
routers, and it offers improved performance for Web clients with- domain-tag adversary.com, which is different from the domain-
out requiring any reconfiguration at the client. Transparent caching tag of the cached copy, so the result would be a cache miss. In short,
cannot be turned off at the client, so one client, acting alone, may the adversary's access to the page would miss in the cache, regard-
be unable to avoid using caching. less of whether the user had visited Yahoo.
Domain tagging does transform some hits into misses, but this
7.2 Altering Hit or Miss Performance only occurs when the same file is accessed while viewing multiple
Another class of countermeasures relies on altering the perfor- domains. This situation is sufficiently rare that it should not pose
mance of cache hits, in an attempt to make them harder to distin- much of a performance problem in practice.
guish from misses. Domain tagging would be effective in preventing some timing at-
Simply slowing down hits is of limited value. As long as hits tacks on browser caches, but it fails to protect against other attacks.
are appreciably faster than misses, attackers will be able to extract It will not prevent timing attacks that learn shared proxy cache con-
useful information about cache contents. Of course, if we make hits tents (unless we augment the proxy-http protocol to carry domain-
as slow as misses, then attackers will be handcuffed; but in that case tag information). It will not prevent attacks on DNS caches. It also
we might as well have turned off caching. fails to prevent the use of cache cookies, although it does prevent a
Another approach is to increase the variance in hit times in a cache cookie set by one site from being seen by another site. Given
randomized fashion. This also has only limited value. If a particular the limited value provided by domain tagging, it is not clear whether
hit is noticeably faster than a miss, then the attacker will still be able it is worth implementing in real browsers.
to categorize it as a hit. If it is not noticeably faster than a miss, then
it might as well have been a miss. In other words, this approach
essentially converts some randomly chosen hits into misses, with a
9. Conclusion
consequent loss of performance. The more hits we wish to disguise, Cache-based timing attacks are possible in many situations where
the more performance will suffer. references to data are cached. By measuring the time required to
Alternatively, we could try increasing the variance in miss times. access a data element, a malicious program can determine whether
This fails utterly. We cannot make misses faster, since they are that element is in a nearby cache, thereby allowing the program to
already as fast as they can be. However, if we make a miss slower, learn whether that element has been accessed recently on the same
we just make it easier to classify as a miss. Thus increasing miss machine (for local caches) or in the same group of machines (for
variance can only make things worse. shared caches). We expect that this type of attack will be possible
in many situations besides the ones we have described.
7.3 Turning Off Java and JavaScript Web technologies allow an attacker to control the sequence of
Our final countermeasure is to turn off Java and JavaScript. This data accesses on a remote machine, and hence to carry out cache-
takes away the attacker's most accurate measurement tools, and based timing attacks. An attack could be delivered by a Web page,
hence lowers the accuracy of the attacker's measurements. or in an email message if the victim uses an HTML-enabled mailer.
There are two problems with this approach. First, the attacker can We have described attacks that probe the contents of Web browser
still do reasonably accurate measurements, as the data presented file caches, to learn a user's Web browsing history, and attacks that
above in Section 2.5.5 demonstrates. Second, Java and JavaScript probe DNS caches, to learn which network addresses a machine has
are now so widely used on Web sites that it seems unrealistic to ask connected to recently.
ordinary users to turn them off. We are not aware of any practical countermeasures to these at-
tacks. There seems to be little hope that effective countermeasures
7.4 Countermeasures Summary will be developed and deployed any time soon.
None of the countermeasures discussed here is attractive. At
present, we would have to turn off caching, paying a high perfor- Acknowledgments
mance price, to prevent these attacks.
Thanks to Drew Dean, Gary McGraw, Avi Rubin, Dan Wallach,
Randy Wang, and the anonymous referees for their comments and
8. Domain Tagging suggestions.
As discussed above, no practical countermeasure against our tim-
ing attacks is available today. We now describe a countermeasure, 10. REFERENCES
based on changing the browser's cache implementation to imple-
[1] A LBITZ , P., AND L IU , C. DNS and BIND, third ed. O'Reilly
ment "domain tagging," that prevents some timing attacks.
and Associates, 1998.
Currently, each item in the browser's cache is tagged with the
URL of the file of which it is a copy. We add another tag, called [2] A NONYMIZER. Web site at http://www.anonymizer.com.
the domain-tag, which gives the domain name of the page that the [3] G ABBER , E., G IBBONS , P., M ATIAS , Y., AND M AYER , A.
user is viewing. A cache access is treated as a hit only if both tags How to make personalized web browsing simple, secure, and
match. anonymous. In Proc. of Financial Cryptography '97 (1997).
For example, suppose the user is currently viewing the page http: [4] M ARTIN J R ., D. M., R AJAGOPALAN , S., AND RUBIN ,
//search.yahoo.com/search/options. All accesses to A. D. Blocking Java applets at the firewall. In Proc. of
the cache while rendering this page (and any images and other Internet Society Symposium on Network and Distributed
material embedded in it) are given the domain-tag yahoo.com. System Security (1997).
For example, while fetching the image http://us.yimg.com/ [5] N ETSCAPE C OMMUNICATIONS. Persistent client state:
images/yahoo.gif (which is embedded in the search options HTTP cookies. Web site at
page), the domain-tag yahoo.com is used. http://home.netscape.com/newsref/std/cookie spec.html.
If the user later visited www.adversary.com, this page could
embed the yahoo.gif image, in an attempt to determine whether
31
[6] R EITER , M. K., AND RUBIN , A. D. Crowds: Anonymity for
web transactions. ACM Trans. on Information and System
Security 1, 1 (June 1998).
[7] The squid web proxy cache. Web site at http://squid.nlanr.net.
[8] VOELKER , G. Personal communication.
[9] W OLMAN , A., VOELKER , G., S HARMA , N., C ARDWELL ,
N., B ROWN , M., L ANDRAY, T., P INNEL , D., K ARLIN , A.,
AND L EVY, H. Organization-based analysis of web-object
sharing and caching. In Proc. of Second USENIX Symp. on
Internet Technologies and Systems (Oct. 1999), pp. 2536.
[10] Z ERO K NOWLEDGE S YSTEMS. Freedom 1.0. Web site at
http://www.freedom.net.
32