CIDR House-Rules: Use of BGP router data to identify and address
sources of Internet abuse
Karsten M. Self
kmself@ix.netcom.com
1st March 2006
Presented to the Messaging Anti-Abuse Working Group
6th General Meeting
San Francisco, CA
http://linuxmafia.com/~karsten/cidr-house-rules.pdf

Abstract

BGP router data may be used to identify contiguous regions of network space from which significant abuse is observed. Experience suggests a strong power-law relationship in ranking such sources. Applying this knowledge in abuse countermeasures may markedly reduce filtering overhead while minimizing inadvertent blocking and increasing total costs to abuse-tolerant networks.

1 I know where your spam comes from

For typical Internet sites, from a quarter to half or more of all spam and other forms of network abuse may originate from a very small number of sources.

The methods discussed here result from reporting and data analysis on nearly 200,000 spams received at a single ISP POP account since January, 2004. The interest isn't in specific sources, but in the tools used to aggregate information on spam-transmitting peers and the applicability of these methods to large-scale spam mitigation. Several application scenarios are suggested.

The principle is to note sources of spam by IP peer on an aggregated basis. Studying such data over time, it has become clear that:

· The bulk of spam originates from a very small subset of network sources.

· These networks are readily identifiable by commonly available tools and methods.

Though based on observations from a single end-user mailbox, trends noted should be similar in character to those seen at the server level. Comparing data from several sources shows similar trends. This is not an "ultimate solution"; however, it may be a useful tool, particularly on large sites, sites with large spam loads, or sites in which mitigation methods should incur minimal time, bandwidth, and processor overhead. It would also be helpful to have capabilities directly integrated with standard mail transfer agents.

The intended audience for this discussion includes postmasters, email abuse reporting and mitigation managers, webmail providers, email server developers, email plugin (server or client) developers, blog operators, VOIP vendors, and others dealing with network abuse.

This paper merely introduces the concepts. It is neither a complete solution nor an exhaustive technical analysis.

2 Technical concepts

For presentation calibration: some of the technical concepts covered in this presentation will include: email, SMTP, DNS, CIDR, ASN, BGP, DNSBL, network hygiene, greylisting, proportionate response, and denial of interest. Much of the following discussion assumes a moderate understanding of these terms.
Though initial applications have been for email, and principally based around spam, other abuse for which clear and not readily spoofable peer relationships exist may be appropriate.

3 Existing spam filtering methods

Methods such as whitelist/blacklist, DNSBL, content (rule-based) filters, Bayesian filters, greymilter, and tarpitting are more-or-less widely deployed. They do work and are often effective. Several are strongly endorsed by the author.

However, they share a number of disadvantages:

· Data-lossy, particularly filters, as regards spam source. Information gained regarding one IP isn't gainfully applicable to its neighbors, or even (often) itself in subsequent abuse attempts.

· Whack-a-mole, particularly DNSBLs, as regards point vs. aggregate source. Rinse, wash, repeat with IPv6: DNSBLs scale very, very poorly in this case.

· Reliance on third parties to reliably, accurately, equitably, and expeditiously collect and distribute assessment data.

· CPU and/or wall-clock intensive, particularly for large sites. Often extending to other resources including threads, filehandles, memory, etc.

· Generally fail to impose overhead on spam source.

· Are uniformly applied to mail from both trusted and untrusted sources, inducing unnecessary cost.

While not arguing that these methods be disposed of, a method is presented here of taking a large first cut at the spam problem before incurring the cost and uncertainty of other filtering methods.

What If You Could...

· Tie an IP address to the organization responsible for it.

· And a network address space (CIDR block)

· In a manner leveraging existing spam detection / filtering tools for single-point IPs

· Quickly, cheaply, accurately

· And could develop policies for email and network traffic management

· Oh, and could also identify your good / trusted network peers

The answer, of course, is, "You can".

4 BGP, Routeviews.org, and you

Border gateway protocol (BGP), to quote Cisco:

    is an interautonomous system routing protocol. An autonomous system is a network or group of networks under a common administration and with common routing policies. BGP is used to exchange routing information for the Internet and is the protocol used between Internet service providers (ISP).
    [http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/bgp.htm]

The key points to recognize are:

· BGP is fundamental to the nature of the Internet. It defines the relationships between autonomous systems: the networks the Internet internetworks between.

· It ties directly to an organization: the AS owner, identified by ASN.

· It ties directly to network data, the CIDRs which BGP peering rules are applied to.

· Though IP space is large, and will likely get vastly larger as IPv6 is widely adopted, pragmatic constraints suggest that ASN proliferation will not change as markedly. Currently there are some 39,500 assigned ASNs with a total namespace of 65,535.

In other words: you've found the folks in charge, where they are, and how they relate to you. Since SMTP deliveries are stateful TCP transactions with defined IP peer relationships (and spoofing is not practically significant), we have a known IP.

Now all you need is something which can return ASN and CIDR data for a given IP address.
The Routeviews project (http://www.routeviews.org/) provides just such a capability, though others exist. It is "a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet", and was first noted by Joe St. Sauver of the University of Oregon. Routeviews provides zonefiles, updated twice daily, and queryable at:
    host -t txt <reversed-IP>.asn.routeviews.org
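
The query name is the IPv4 address with its octets reversed, followed by the zone, in the same style as a DNSBL lookup. A minimal sketch in Python of building that name and reading the answer is given here. The function names are illustrative only, the parser assumes the three quoted strings (ASN, network, prefix length) seen in the sample output in this section, and `lookup` assumes the `host` utility and network access are available; whether the zone still answers in this form is not checked.

```python
import ipaddress
import re
import subprocess

ZONE = "asn.routeviews.org"  # zone named in the text


def query_name(ip: str) -> str:
    """Build the lookup name: IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + ZONE


def parse_host_output(line: str) -> tuple[int, str]:
    """Extract (ASN, CIDR) from one line of `host -t txt` output.

    Assumes exactly three quoted strings: ASN, network, prefix length.
    """
    fields = re.findall(r'"([^"]*)"', line)
    if len(fields) != 3:
        raise ValueError(f"unexpected TXT record: {line!r}")
    asn, network, prefix_len = fields
    return int(asn), f"{network}/{prefix_len}"


def lookup(ip: str) -> tuple[int, str]:
    """Run `host` for one address (needs the host utility and network access)."""
    result = subprocess.run(
        ["host", "-t", "txt", query_name(ip)],
        capture_output=True, text=True, check=True,
    )
    return parse_host_output(result.stdout.splitlines()[0])
```

A large site would more likely query a local nameserver through a resolver library than shell out per message; the subprocess call is used here only because it mirrors the command line shown in the text.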
To determine, for example, the ASN and CIDR for the AOL mailserver mailin-01.mx.aol.com at 64.12.137.249:

    $ host -t txt 249.137.12.64.asn.routeviews.org
    249.137.12.64.asn.routeviews.org descriptive text "8176" "64.12.0.0" "16"

This tells us that the server is in ASN 8176, CIDR 64.12.0.0/16.

For use in mitigating spam, you want to find which ASNs are principally associated with spam traffic, noting volumes of both spam and ham (non-spam) mail received from various sources. Ideally both total mail volume and spam proportion would be noted.

Routeviews.org makes the zonefiles available via rsync to allow large sites to run queries against a local nameserver for increased performance.

5 Pareto's law and spam sources

A power distribution is very evident in monthly data seen to date.

· Over a two year period, 3-5 ASNs contribute 25% of all monthly spam.

· 50% of all spam comes from 9 to 35 sources.

The plot shows the total percent of spam contributed (vertical axis) by ASNs (incremented along horizontal axis).

CIDR data show a similar, though less concentrated, power distribution. Specific ASNs involved vary, though gross abusers have been fairly stable over time. Typical among them are ASNs from China, Korea, large webmail providers (usually 419/Advanced-Fee fraud spam), large European or Middle-eastern ISPs (often quasi-governmental monopolies), blowback/backscatter sources (which would be specific to a given email address at any one time), and occasionally larger US commercial ISPs.

Specific trends are highly idiosyncratic. You are very strongly encouraged to note trends from your own experience, not other sites'. Sharing data is possible and may be useful but should not be principally relied on.

In conjunction with numerous spam reports sent to the organizations associated with domains, IPs, and/or ASNs, it's further noted that network organizations can be separated into two classes:

· Those which deal preemptively or reactively in a way which minimizes abuse problems

· Those which don't, can't, or won't.

This observation gives rise to the concept of network hygiene, namely that there are neighborhoods which are well policed and those which aren't. Methods for increasing the accountability of a network's own hygienic practices would be a net benefit.

Additional statistics, tables, and plots follow at the end of this paper.
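
The concentration described in this section can be measured on any site's own data by counting spams per ASN and accumulating shares from the largest source down. The following is a minimal sketch in Python, not the script behind the reports in this paper; the function names and the input format (one ASN per observed spam) are assumptions made for illustration.

```python
from collections import Counter


def rank_sources(asn_per_spam):
    """Return rows of (rank, cumulative %, %, count, ASN), largest source first."""
    counts = Counter(asn_per_spam)
    total = sum(counts.values())
    rows, running = [], 0
    for rank, (asn, n) in enumerate(counts.most_common(), start=1):
        running += n
        rows.append((rank, 100.0 * running / total, 100.0 * n / total, n, asn))
    return rows


def sources_needed(rows, share_pct):
    """Smallest number of top-ranked sources covering share_pct of all spam."""
    for rank, cumulative, *_ in rows:
        if cumulative >= share_pct:
            return rank
    return len(rows)
```

Calling `sources_needed(rows, 25)` and `sources_needed(rows, 50)` on a month of data gives the figures comparable to the two bullets above. The same functions work unchanged if CIDR strings are passed in place of ASNs.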
6 Application

ASNs by themselves don't tell you whether or not traffic is abusive, or if a given IP range is spammy. What's necessary is to identify sources of undesirable traffic, map these to an ASN and/or CIDR, and determine your house rules for handling traffic from that CIDR. Two steps are necessary: data acquisition, and policy enforcement.

6.1 Data acquisition

Acquire a list of IPs doing things you don't (or do) like: spam, viruses, open proxies, portscans, blog / comment spam, referrer spam, business partners, friends, vendors, bad breath, drinking white zin. Look up the associated ASN / CIDR. Note which are naughty or nice. This could be accomplished in the case of spam by dropbox accounts, honeypots, server logs, end-user submissions, or other means. Because of the power of aggregation allowed by ASN/CIDR lookups, a reasonably constructed spam provider sample may be very small: on the order of 1:1,000,000 or fewer mails for a very large provider.

6.2 Policy enforcement: CIDR house-rules

Implement a policy at the service (eg: email, web, messaging) or firewall (eg: iptables) level. These are your house rules for interacting with a given CIDR, ASN, IP, or other defined network block.

While blocklisting is one possible option, I'd very much like to see the discussion move beyond that point. A preferred approach is what I term "proportionate response". First: you'll likely want rules to expedite known-trusted mail, or high priority mail from remote organizational sites, peers, clients, vendors, or other established relationships. Secondly, many peers will either have small overall volumes, or not have a clearly identifiable nature. This leaves the set of networks which are both high-volume and overwhelmingly spammy in nature. Of course, any such implementation would have to be evaluated in a business and organizational context.

In proportionate response, a certain level of abuse would be met by a proportionate level of response. For example, for a network from which 90% of email was found to be spam, 90% of traffic originating from that network would be denied or dropped, either at the service (protocol) or IP level, at random. If done at the SMTP transaction level, either as a timeout (without 250 OK) or non-permanent rejection, this would mean legitimate mail still has a fighting chance to get through. A 90% reject rate would allow roughly half of mail through within 5 retries (1 - 0.9^6 is about 47%, counting the initial attempt), for a typical 2 hour delay. A spam server without retry rules would fail delivery of 90% of its mail; with retries it would suffer large mail spools and possible other resource starvation.

The site implementing such a policy will receive immediate benefit to itself. Widespread adoption is not necessary to be locally beneficial. As multiple and large sites adopt such measures, impacts on abuse-tolerant networks would be significant. The approach is to be both non-invasive and non-retaliatory. You are not taking any action which in any way directly changes or affects a remote system, but are subjecting it to a denial of interest.

As a proportionate response, reject rates could vary with total traffic volume, abusive traffic percentage, and severity of abuse, as suits specific needs. Fine levels of control are therefore possible; operators are not reduced to all-or-nothing responses to abuse.

7 Data and additional references

Some additional information and references on use of BGP and ASN data in spam mitigation.

7.1 Related third-party discussions of spam and ASN data:

· The Routeviews project:
  http://www.routeviews.org/

· Chris Siebenmann's blog describing spam combat at the University of Toronto, Canada, including use of BGP and ASN data at the server level:
  http://utcc.utoronto.ca/~cks/space/blog/spam/SpamByASN

· Michael Greb's blog on spam, including data on spam by ASN, collected from several spamtrap addresses:
  http://spam.thegrebs.com/

7.2 Summaries of spam by ASN & CIDR

Full online reports of my own data are frequently updated at:
http://linuxmafia.com/~karsten/monthly-asn-report
http://linuxmafia.com/~karsten/monthly-cidr-report
Historical data from January 2004 through present, with
some gaps, are saved by year and month in YYYYMM
form available at:
http://linuxmafia.com/~karsten/monthly-asn-report-YYYYMM.txt
http://linuxmafia.com/~karsten/monthly-cidr-report-YYYYMM.txt
The following tables list, from current data, the ASNs and CIDRs with the most reported spams. Note that report classification isn't entirely accurate, though trends are generally well represented.
Report date: Mon Feb 27 23:37:48 PST 2006
Total spams: 11249
Total ASNs: 955
Rank  Cumulative %      %  Spams    ASN  Description
   1          9.9%   9.9%   1113   8176  NETSCAPE-ASN
   2         18.5%   8.6%    968   4134  CHINANET-BACKBONE
   3         24.5%   6.0%    673   4814  CHINA169-BBN CNCGROUP
   4         28.3%   3.8%    432   8176  NETSCAPE-ASN
   5         31.6%   3.3%    373   4837  CHINA169-BACKBONE
   6         34.5%   2.9%    322   4766  KIXS-AS-KR Korea Telecom
   7         36.7%   2.2%    248   3269  ASN-IBSNAZ TELECOM ITALIA
   8         38.8%   2.0%    230  17858  KRNIC-ASBLOCK-AP KRNIC
   9         40.7%   1.9%    217   1668  AOL-ATDN
  10         42.1%   1.4%    161  17849  GINAMHANVIT-AS-KR
Report date: Mon Feb 27 23:39:24 PST 2006
Total spams: 11248
Total CIDRs: 2251
Rank  Cumulative %      %  Spams  CIDR                AS & Description
   1          9.9%   9.9%   1113  64.12.0.0/16        8176 NETSCAPE-ASN
   2         13.7%   3.8%    432  16/                 8176 NETSCAPE-ASN
   3         15.7%   1.9%    217  205.188.0.0/16      1668 AOL-ATDN
   4         17.5%   1.9%    211  212.216.128.0/17    3269 ASN-IBSNAZ TELECOM ITALIA
   5         19.2%   1.6%    183  220.163.0.0/17      4134 CHINANET-BACKBONE
   6         20.5%   1.4%    155  4755/61.17.128.0    4755 VSNL-AS
   7         21.9%   1.4%    152  221.220.128.0/18    4814 CHINA169-BBN
   8         23.1%   1.2%    139  4755/61.17.176.0    4755 VSNL-AS
   9         24.3%   1.2%    136  218.63.0.0/17       4134 CHINANET-BACKBONE
  10         25.4%   1.0%    115  61.148.128.0/18     4814 CHINA169-BBN
  10         26.4%   1.0%    115  222.129.64.0/18     4814 CHINA169-BBN
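
A CIDR report like the one above is the natural input to the house rules of section 6.2. The following is a minimal sketch in Python of a proportionate response, offered as one possible shape rather than a tested mail-server integration. The rule table, its reject rates, and the function names are all hypothetical, and the networks used are the reserved documentation ranges rather than any network from the report.

```python
import ipaddress
import random

# Hypothetical house rules: CIDR -> fraction of connections to reject.
# Networks are documentation ranges; the rates are invented for illustration.
HOUSE_RULES = {
    ipaddress.ip_network("192.0.2.0/24"): 0.90,
    ipaddress.ip_network("198.51.100.0/24"): 0.25,
}


def reject_rate(ip: str) -> float:
    """Rate for the most specific matching CIDR; 0.0 if none matches."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in HOUSE_RULES if addr in net]
    if not matches:
        return 0.0
    return HOUSE_RULES[max(matches, key=lambda net: net.prefixlen)]


def should_reject(ip: str, rng=random) -> bool:
    """Randomly reject this connection in proportion to the source's rate."""
    return rng.random() < reject_rate(ip)


def delivery_probability(rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries is accepted."""
    return 1.0 - rate ** attempts
```

With a 90% rate, `delivery_probability(0.9, 6)` is about 0.47: a sender making an initial attempt and five retries gets through a little under half the time, while a sender that never retries delivers 10% of its mail. In an SMTP setting a `True` from `should_reject` would map to a temporary failure, so that well-behaved senders retry.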