Tags: anonymity on the internet, caller line identification, cambridge cb3, cambridge computer laboratory, clayton university, cli, conclusive evidence, innocent bystander, internet service provider, investigative tool, jj thompson, law enforcement agencies, law enforcement officers, london internet exchange, opponent, practice document, regulators, richard clayton, spoofing, university of cambridge,
The Limits of Traceability
Richard Clayton
University of Cambridge, Computer Laboratory, Gates Building, JJ Thompson Avenue,
Cambridge CB3 0FD, United Kingdom
richard.clayton@cl.cam.ac.uk
Abstract. Traditional "traceability" (the flipside idea to "anonymity") on the Internet attempts to
identify the IP address that caused an action to occur. This is sufficient for an Internet Service Provider
(ISP) to take action against the authorized user of that IP address. Law enforcement agencies usually
need to go beyond this in order to identify the individual concerned. However, shared accounts,
unavailable Caller Line Identification (CLI), and spoofing on Ethernets, mean that in the real world
there is poor traceability for the "last hop". This means that it is not possible to consider the information
gathered as "conclusive evidence" suitable for a Court, but just as "intelligence"; an investigative tool,
albeit a valuable one. Law enforcement officers should be especially wary that a sophisticated opponent
might be able to frame an innocent bystander.
1. Introduction
It is often the case that an action has occurred on the Internet and one wishes to answer the question,
"who did this?" The ability to track down the originator of an action is usually called "traceability" and it
can be seen as the flipside idea to "anonymity". It is of interest to regulators [1] and to law enforcement
agencies (LEAs) [2] as well as to those who operate the Internet itself. There are a number of techniques
available for establishing the originator of an action, many of which can be found described in the London
Internet Exchange (LINX) "Best Practice" document [3].
The LINX document describes how to determine which IP address was the source of the action of
interest and how to trace who was using that IP address at the relevant time. The document also considers
what Best Practice procedures will ensure that this tracing can be performed as simply as possible; or, to
put the same idea the other way around, which steps in the evidence chain are open to attack by a defense
lawyer.
LEAs have seized upon traceability as a magic wand for the detection of "Hi-Tech" crime. Data
retention (causing logs to be preserved for a known period) and data preservation (ensuring logs of special
interest are not destroyed) have become hot topics at the inter-governmental level within the G8 and
elsewhere. In the United Kingdom (UK) some LEA officials have gone so far as to propose data
warehouses that will hold seven years worth of logging information from telephone companies and Internet
Service Providers (ISPs) [4].
Traceability is usually talked about in the abstract without considering who is doing the tracing.
However, the reason for doing the tracing cannot be ignored. Traceability begins to become far more
problematic as one nears the actual source of the action and this poses very significant problems for a law
enforcement official whereas an ISP may be able to ignore these complexities altogether.
Real world experiences, such as those of the author in the UK ISP industry, show that significant
amounts of logging data are regularly collected and this can sometimes provide excellent levels of
traceability. However, as soon as one wishes to identify individuals then one is reduced to deduction and
inference rather than being able to be "sure".
Page 1 of 7
As an example, consider a dialup Internet access account that has been used for sending unsolicited bulk
email ("spam"). When the evidence of this abuse arrives at the ISP the dialup account will be identified
beyond question and it will then be disabled. No further investigation is needed. If, instead, the dialup
account had been used for posting illegal material (for example, sexual images of children are unlawful in
most jurisdictions) then law enforcement officers will be interested in exactly who was using the dialup
account to post the material. This can be, as will be explained in detail below, extremely hard to determine.
In fact, it turns out that almost every method of accessing the Internet poses significant traceability
problems over "the last hop". This means that LEAs should cease to consider traceability as an evidential
tool (proof that someone is guilty) and should only be considering it as an investigative tool (pointers to
people who are worth paying more attention to). In particular, where the quarry is known to be technically
sophisticated, careful attention must be paid to the possibility that an innocent person may be "framed" by
the guilty and heavy-handed intervention may serve only to "tip off" the actual miscreant.
Typical problems with various access methods will now be described in turn.
2. Dialup Access (Modems and ISDN)
Dialup access is one of the most common forms of access to the Internet and there are great many ISPs
offering this type of connection. A communications link is made over ordinary voice telephone lines
(analogue or digital) using a modem or ISDN terminal adapter. The session starts with the user
authenticating themselves to the ISP, usually by a simple password, and they are then given an IP address
and connected to the Internet.
There are two ways in which it is possible to tie a dialup connection to a particular individual. One can
examine the credentials that were given when the dialup account was opened and one can use CLI (Caller
Line Identification) to determine the source of any particular call within the telephone network.
Traditionally, one paid an ISP for a dialup account so the credentials given when accounts were opened
would include some means of payment. The user would have to continue making payments at regular
intervals or their account would lapse. It is most unusual for dialup accounts to be paid for in cash and the
financial community can, upon production of suitable authority, trace the owner of a checkbook or credit
card. If the credentials were forged then the ISP would discover this fairly promptly (usually when the
credit card company refused payment) and the account will be closed if indeed it was ever opened.
The recent invention of other business models based on telephone interconnect charges, advertising, or
mere hot air, have meant that it is no longer necessary for the ISP to collect money from their subscribers
directly. The ISP will still attempt to collect customer information (if only to profile their user base for the
benefit of advertisers) but it has no way of establishing whether it is accurate.
With credentials becoming increasingly dubious as an identification mechanism, this leaves just the CLI
information as a traceability tool. The CLI can be recorded by the ISP's equipment and in principle it
provides the phone number that made each individual call to the ISP. However, in the United Kingdom
(UK), and doubtless elsewhere, it has significant limitations:
· the user may have requested the phone company to withhold CLI
Even if CLI is not suppressed for all calls, dialing a special code (141 in the UK) will disable CLI
for an individual call. The CLI is still available to a telco in an "engineering" form, but not to an end
system. UK ISPs are to be given access to "engineering CLI" once a suitable Code of Practice has
been formulated, but this arrangement is not yet in place.
Page 2 of 7
· CLI may be lost at telco boundaries
It is unusual for trans-Atlantic calls to have any CLI, and in the past CLI has failed to travel between
various UK based networks with any reliability.
· the CLI provided may be generic
One of the side effects of using some discount phone schemes can be that the call appears to
originate at the discount provider rather than at the true source. Therefore traceability will depend
upon how much logging has been enabled within the third party system.
The CLI may belong to a company, with many users dialing out through a PABX. Traceability will
depend upon what types of records are kept of usage by particular telephone extensions.
In order to prevent serial abuse UK ISPs who are providing "free" services usually require that CLI be
presented before calls are accepted. When abuse has occurred no further accounts can be opened or
operated by calls that originate from the same CLI. The ISPs who provide "paid" services tend not to
require CLI since they would reckon to identify serial abusers from re-use, for payment, of the same credit
card number.
Even where traceability appears to be provided, either by credentials or by CLI, LEAs have considerable
further difficulties to overcome with both forms of identification. It may be straightforward to determine
which account was being used but if the details of access to the account are widely known then the
individual who used it may be impossible to trace. The password for a company account may be known to
dozens of people any of whom could pass it on to a friend. Passwords are often inadvertently posted to
support newsgroups when evidence of login problems are being discussed and of course the sysadmins at
the ISP could well be in a position to collect the passwords of any user they wish.
3. New Technologies
There are a number of relatively new technologies for providing access to the Internet.
High-speed access is becoming available over phone lines or cable networks. In general terms, the IP
traffic is transferred across a lower level transport system (such as ATM). Standard traceability techniques,
as described in the LINX document, will determine which subscriber's account has been used at the ISP
level. The equivalent of CLI in this type of scheme is that work will have been done at the transport level to
set up the path between customer and ISP. Provided that suitable records have been kept it should be
straightforward to determine the customer premises that were involved.
However, these records may not exist in a suitable form where the system automatically configures paths
to ISPs using its own authentication scheme. If this is the case, as it is today in the UK for ADSL access via
British Telecom's infrastructure, then there still remains the possibility that account and password details
have been leaked and an imposter must be located.
In passing, it should be noted that it is, in principle, possible to determine the origin of a dialup call even
without the presence of CLI. The telco companies involved will have billing records that include every
single call and with accurate time and duration information a particular call could be located. However,
because of the high volume of calls made this would be a non-trivial task and it is not a service that the
telcos currently make available. However, on lower volume systems, such as ADSL networks, it may be
that appropriate types of record exist and can be searched cost-effectively.
Internet access from mobile (cell) phones is also available. This may involve plugging a laptop into the
handset or, with the current WAP system and forthcoming 3GPP devices, the computer may be within the
phone itself.
Page 3 of 7
Dialup access to the Internet from a mobile handset has the same set of traceability issues as does dialup
from fixed lines. Once again, when faced with abuse the ISP can block the account or the CLI whereas
LEAs have the same set of problems in establishing the actual identity of the miscreant. There is, however,
extra complexity for law enforcement when a "prepaid mobile" is used. In many countries there are no
registration requirements for the ownership of "prepaid mobiles" where the call minutes are bought in
advance and both phones and time can be purchased for cash. Therefore, even if the phone number is
available, there will be no record of the identity of the owner. Technology does exist for determining the
physical location of the phone, but the accuracy of this is limited. In practice, most traceability for prepaid
mobiles is done by an analysis of billing records to try and establish the likely identity of the owner from
the record of who they have been in contact with.
4. Leased Lines
A leased line is, in general, a permanently configured pathway across the telco network from customer to
ISP. Although the term originally described a dedicated piece of copper, it would be rare for specific bits of
cable to be involved except for the very last hop to the customer premises. The rest of the pathway will be
virtual circuits running over shared hardware.
Because of the way in which the circuits are provided it might be assumed that the customer end is fixed
and cannot be hijacked or impersonated, except by physical attacks on the cabling or by corruption within
the telco itself. These possibilities cannot be entirely discounted as experience with burglar alarm circuits
has demonstrated [5].
However, in the case of leased lines, a much more significant traceability issue arises for LEAs. They
will need to determine which workstation at the customer end of a leased line is involved in the events that
they wish to investigate. Of course this is potentially a problem with any of the Internet access methods that
are described above as well. It would be most unusual to connect a large company network to the Internet
over a mobile phone, but many companies will use ISDN or ADSL to access the Internet and the same type
of issues will arise there as well.
The basic difficulty is that the logging records at the ISP will ensure that events can be traced to the
leased line, but the records will not usually be able to distinguish between different machines at the
customer's premises.
There are two general schemes for allocating IP addresses to customer networks. The first is to allocate a
subnet, a group of 2, 6, 14, 30, 62... addresses of which one is used for the customer "router" and the others
are available for individual machines. The second scheme is called NAT (Network Address Translation) [6]
where a single IP address will be visible to the Internet as a whole and "private" address space will be used
on the customer network. A specialist machine on the customer premises will keep track of all open
connections and will distribute incoming data to the appropriate customer workstation. It is usual, but not
ubiquitous, for NAT to be used for dialup and subnets for leased lines.
In principle, traceability can be extended from the open Internet onto the private IP network so as to
determine the workstation that is of interest. However, there are a number of practical difficulties with this
and the first one is that the customer may not be keeping sufficient logging. If NAT is being used then a
record would be needed of the connections it has handled which would be a substantial amount of data
and is very unlikely to have been recorded. In both the NAT case and the subnet case, there may not be any
records of which machines were using particular IP addresses. It is not unusual for addresses to be
dynamically allocated by the DHCP protocol [7] and the server may not keep particularly useful records. In
all cases where records are being kept there should be some caution in the minds of law enforcement
officers as to whether these records are in any sense trustworthy or indeed whether the mere attempt to
inspect them will "tip off" the miscreant about the investigation.
Page 4 of 7
It is not only the lack of historical records that makes traceability problematic on customer networks.
Even when records exist and even when one is tracking someone down in real-time rather than trying to
untangle a previously occurring event there is a further problem. A technically competent person (or one
with access to sophisticated tools) may be able to "hide" themselves on an Ethernet network in such a way
as to incriminate an innocent user. This poses a problem for law enforcement officers in that a high profile
"raid" on the innocent may serve to "tip off" someone nearby.
In order to understand how to "hide" on a Ethernet (which is by far the most common local area network
technology in use today) one must remember that it is fundamentally a broadcast medium. Traffic will
arrive from the Internet down the single wire from the ISP but is then broadcast to all the machines on the
Ethernet. In the Ethernet environment the addressing is done by "MAC address" and network interfaces
will only handle traffic for their particular address. Protocols such as ARP are used to associate IP address
and MAC address together.
MAC addresses are intended to be unique and fixed, serving as a machine identifier. The IP addresses
may be statically allocated by an administrator, in which case traceability is achieved by consulting the
administrator's records. Alternatively, it may be dynamically allocated by a protocol such as DHCP, in
which case inspection of the DHCP server records will provide a mapping to a MAC address and possibly
also an indication of information such as the Windows NT domain identifier being used.
Where historic records are unavailable (or cannot be trusted to be accurate) traceability might be
achieved by real-time monitoring of the network if the target is still using it. However, even if the target is
still using the network, they may be hard to physically locate. Time Domain Reflectometers are a classic
way of locating where devices are attached, and can be used to locate extra kit that should not be connected
to the network. However, in reality their output can be difficult to interpret, and switches and hubs will
segment the network so that the search will take a long time. They will of course be of no use at all where
you know that you are seeking an authorized attachment that is behaving in an unauthorized manner.
It is good practice to keep a record of the MAC addresses that belong to particular pieces of hardware to
assist in tracing network packets to their source. However, even though the addresses are usually still stored
in the hardware, modern network drivers tend to handle the MAC addresses within the software. This
means that they are trivially changed to any value required possibly "borrowing" other people's settings.
Difficulties arise if the machine the MAC address is borrowed from is still running on the network, but if it
is switched off (or subject to a disabling denial of service attack) then it can be readily impersonated.
It is also possible to use "spare" IP addresses or to hijack another machines' IP address. Again,
complications arise if the IP address is simultaneously in use by its real owner, because the ARP protocol
will be aware that unusual events are occurring. This can be overcome by careful forwarding of data
combined with gratuitous ARP traffic to undo anything that the real owner of the IP address may do. With
suitable tools such as monitoring the ARP table or network "sniffers" [8] it is possible to determine that
unusual things are occurring, but even if this is known, it will still be hard to identify the node that is
misbehaving.
5. Using Anonymous Machines
Since tracing can sometimes be effective, some people use a machine other than their own, so that it will
not matter that it is traced.
"Cybercafes" are often cited as the ultimate in anonymous access to the Internet. They have all the same
characteristics as a corporate network except that there is no formal record kept of the identity of those who
are using particular machines. There will of course be informal records in that by putting in a personal
appearance a risk is run of being remembered by the counter staff or recorded on CCTV in the street
outside.
Page 5 of 7
Cybercafes do have an interest in preventing the type of abuse that would cause their connectivity
provider to close them down (such as allowing the bulk sending of unsolicited email). As such, there must
be an expectation that some types of traffic (such as connections to the SMTP port 25) may be monitored in
real time. As such it may well not be a "safer" environment for the miscreant than a university or company
network.
Cybercafes are not unique in providing public access points. One can often find machines in libraries or
hotel lobbies, where it is likely that the technical expertise of the system administrators will be somewhat
less than in an organization whose main business is the provision of connectivity.
Some people (often called "hackers") may use a machine without permission, remotely operating it with
the intent of providing a "cut-off" so that tracing will have to repeated from that point to locate the actual
user. If the events being investigated took place in the past and the local records have been competently
removed (or altered) then the trail is effectively dead. If traffic is ongoing then, with co-operation from the
owner of the machine or their network administrator, it should be a relatively simple task to determine the
IP address of the controlling machine and this can be located (or not) using standard techniques.
However, real cases such as the ROME Labs intrusion [9] show that it may be very difficult to gather this
type of evidence in a form that can be confidently expected to survive a challenge in court.
6. Conclusions
Traditional traceability techniques allow one to determine the administrative owner of an IP address.
This is of great practical value to ISPs and other network operators because they can then use the threat of
disconnection from the Internet as a way of ensuring co-operation and future good behavior. Connectivity
providers have almost always seen removal of connectivity as sufficient sanction in itself and have seldom
seen value in identifying people so that they might be called to account in the "Real World". It is therefore
hardly surprising to find that the traceability that has been developed over the years works extremely well
in determining where to "pull the plug" but is poor at going beyond that.
LEAs have a need to identify people. The current emphasis by policy makers on the retention and
preservation of logging data by telcos and ISPs will not, in practice, deliver what law enforcement needs to
do this identification. The likely result of current initiatives is to demonstrate that in many common
circumstances traceability information is unavailable or incomplete. At present, there is little evidence that
improving the quality of the logging will be tackled in parallel to increasing the quantity.
Practical traceability often comes from imaginative use of the available records. This will usually
provide LEAs with information that will materially assist in determining who should be investigated.
However, the circumstantial nature of the evidence that traceability depends upon means that it will often
be unsuitable for use before a court. Where the information is ambiguous, it may lead the investigator along
false trails that are close enough to the real target to "tip them off". Where individuals are actively trying to
hide, they may succeed in making innocent bystanders appear responsible for actions they never performed.
Although logging is a "hot topic" for privacy campaigners, this is partly because it is ascribed magical
powers to track everyone, everywhere. The failure of traceability at the edges of the network has seldom
been tackled or described. ISPs have ignored the issue because their needs for traceability stop at the
account level and these needs are satisfied by the status quo. LEAs are still learning what is possible in
cyberspace and are currently concentrating on their perception that what is important is the failure of
traceability in the center of the network because of unsuitable log retention policies. The academic security
community shies away from topics that "everyone knows", even though everyone doesn't, and has mainly
concentrated on protecting systems from harm, rather than catching the bad guys after the event.
Meanwhile it is "hackers", using individual techniques in an ad hoc manner, that have provided the
majority of the documentation of viable methods of avoiding being traced.
Page 6 of 7
It is now time to pull these disparate approaches together and understand more clearly what traceability
can actually deliver and where its limits may lie. This paper has been a first attempt at performing this
important task.
References
1. Coffee, J.C.Jr.: Brave New World? The Impact(s) of the Internet on Modern Securities Regulation. The
Business Lawyer 52(4) (1997).
2. President's Working Group on Unlawful Conduct on the Internet: The Electronic Frontier: The
Challenge of Unlawful Conduct Involving the Use of the Internet. US Department of Justice (March
2000)
3. LINX Content Regulation Committee, Clayton et al: LINX Best Current Practice Traceability.
Version 1.0. (18 May 1999) http://www.linx.net/noncore/bcp/traceability-bcp.html
4. Gaspar, R.: NCIS submission on communications data retention law. http://www.cryptome.org/ncis-
carnivore.htm
5. Anderson, R.: Security Engineering: A Guide to Building Dependable Distributed Systems. Wiley
2001. pp 213-216.
6. Egevang, K.: The IP Network Address Translator (NAT). Request for Comments 1631 (May 1994)
http://www.ietf.org/rfc/rfc1631.txt
7. Droms, R.: Dynamic Host Configuration Protocol. Request for Comments 2131 (March 1997)
http://www.ietf.org/rfc/rfc2131.txt
8. Graham, R.: Sniffing (Network wiretap, sniffer) FAQ. v0.3.3 (September 14, 2000)
http://www.robertgraham.com/pubs/sniffing-faq.html
9. Sommer, P.: Intrusion Detection Systems as Evidence. RAID 98 (1998)
http://www.zurich.ibm.com/pub/Other/RAID/Prog_RAID98/Full_Papers/Sommer_text.pdf
Page 7 of 7