CIDR House-Rules: Use of BGP router data to identify and address
sources of Internet abuse

Karsten M. Self
kmself@ix.netcom.com

1st March 2006

Presented to the Messaging Anti-Abuse Working Group
6th General Meeting
San Francisco, CA
http://linuxmafia.com/~karsten/cidr-house-rules.pdf

Abstract

BGP router data may be used to identify contiguous regions of network
space from which significant abuse is observed. Experience suggests a
strong power-law relationship in ranking such sources. Applying this
knowledge in abuse countermeasures may markedly reduce filtering
overhead while minimizing inadvertent blocking and increasing total
costs to abuse-tolerant networks.

1 I know where your spam comes from

For typical Internet sites, from a quarter to half or more of all spam
and other forms of network abuse may originate from a very small number
of sources.
   The methods discussed here result from reporting and data analysis on
nearly 200,000 spams received at a single ISP POP account since January,
2004. The interest isn't in specific sources, but in the tools used to
aggregate information on spam-transmitting peers and the applicability
of these methods to large-scale spam mitigation. Several application
scenarios are suggested.
   The principle is to note sources of spam by IP peer on an aggregated
basis. Studying such data over time it has become clear that:

  · The bulk of spam originates from a very small subset of network
    sources.

  · These networks are readily identifiable by commonly available tools
    and methods.

Though based on observations from a single end-user mailbox, trends
noted should be similar in character to those seen at the server level.
Comparing data from several sources shows similar trends. This is not an
"ultimate solution"; however, it may be a useful tool, particularly on
large sites, sites with large spam loads, or sites in which mitigation
methods should incur minimal time, bandwidth, and processor overhead. It
would also be helpful to have capabilities directly integrated with
standard mail transfer agents.
   The intended audience for this discussion includes postmasters, email
abuse reporting and mitigation managers, webmail providers, email server
developers, email plugin (server or client) developers, blog operators,
VOIP vendors, and others dealing with network abuse.
   This paper merely introduces the concepts. It is neither a complete
solution nor an exhaustive technical analysis.

2 Technical concepts

For presentation calibration: some of the technical concepts covered in
this presentation will include: email, SMTP, DNS, CIDR, ASN, BGP, DNSBL,
network hygiene, greylisting, proportionate response, and denial of
interest. Much of the following discussion assumes a moderate
understanding of these terms.
   Though initial applications have been for email, and principally
based around spam, other abuse for which clear and not readily spoofable
peer relationships exist may be appropriate.

3 Existing spam filtering methods

Methods such as whitelist/blacklist, DNSBL, content (rule-based)
filters, Bayesian filters, greylisting, and tarpitting are more-or-less
widely deployed. They do work and are often effective. Several are
strongly endorsed by the author.
   However, they share a number of disadvantages:

  · Data-lossy, particularly filters, as regards spam source.
    Information gained regarding one IP isn't gainfully applicable to
    its neighbors, or even (often) itself in subsequent abuse attempts.

  · Whack-a-mole, particularly DNSBLs, as regards point vs. aggregate
    source. Rinse, wash, repeat with IPv6: DNSBLs scale very, very
    poorly in this case.

  · Reliance on third parties to reliably, accurately, equitably, and
    expeditiously collect and distribute assessment data.

  · CPU and/or wall-clock intensive, particularly for large sites. Often
    extending to other resources including threads, filehandles, memory,
    etc.

  · Generally fail to impose overhead on spam source.

  · Are uniformly applied to mail from both trusted and untrusted
    sources, inducing unnecessary cost.

While not arguing that these methods be disposed of, a method is
presented here of taking a large first cut at the spam problem before
incurring the cost and uncertainty of other filtering methods.

What If You Could...

  · Tie an IP address to the organization responsible for it.

  · And a network address space (CIDR block)

  · In a manner leveraging existing spam detection / filtering tools for
    single-point IPs

  · Quickly, cheaply, accurately

  · And could develop policies for email and network traffic management

  · Oh, and could also identify your good / trusted network peers

The answer, of course, is, "You can".

4 BGP, Routeviews.org, and you

Border gateway protocol (BGP), to quote Cisco:

      is an interautonomous system routing protocol. An autonomous
      system is a network or group of networks under a common
      administration and with common routing policies. BGP is used to
      exchange routing information for the Internet and is the protocol
      used between Internet service providers (ISP).
      [http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/bgp.htm]

The key points to recognize are:

  · BGP is fundamental to the nature of the Internet. It defines the
    relationships between autonomous systems, the networks the Internet
    internetworks between.

  · It ties directly to an organization: the AS owner, identified by
    ASN.

  · It ties directly to network data, the CIDRs which BGP peering rules
    are applied to.

  · Though IP space is large, and will likely get vastly larger as IPv6
    is widely adopted, pragmatic constraints suggest that ASN
    proliferation will not change as markedly. Currently there are some
    39,500 assigned ASNs with a total namespace of 65,535.

In other words: you've found the folks in charge, where they are, and
how they relate to you. Since SMTP deliveries are stateful TCP
transactions with defined IP peer relationships (and spoofing is not
practically significant), we have a known IP.
   Now all you need is something which can return ASN and CIDR data for
a given IP address.
   The Routeviews project (http://www.routeviews.org/) provides just
such a capability, though others exist. It is "a tool for Internet
operators to obtain real-time information about the global routing
system from the perspectives of several different backbones and
locations around the Internet", and was first noted by Joe St. Sauver of
the University of Oregon. Routeviews provides zonefiles, updated twice
daily, and queryable at:

      host -t txt <reversed-IP>.asn.routeviews.org
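
The query name is the IP address with its octets reversed, followed by the zone. A minimal Python sketch of that step, and of reading the answer, assuming the TXT record carries three quoted strings (ASN, network, prefix length) as in the AOL example below. The function names are illustrative only, and no DNS query is actually performed here:

```python
import ipaddress

ZONE = "asn.routeviews.org"  # zone named in the text


def query_name(ip, zone=ZONE):
    """Reverse the IPv4 octets and append the lookup zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def parse_txt(fields):
    """Turn the three TXT strings into an (asn, network) pair.

    The three-field layout is assumed from the example response."""
    asn, net, prefix_len = fields
    return int(asn), ipaddress.ip_network(f"{net}/{prefix_len}")


print(query_name("64.12.137.249"))
# prints 249.137.12.64.asn.routeviews.org

asn, cidr = parse_txt(["8176", "64.12.0.0", "16"])
print(asn, cidr)
# prints 8176 64.12.0.0/16
```

In practice the strings passed to parse_txt would come from whatever resolver a site already uses, queried with the name built by query_name.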


To determine, for example, the ASN and CIDR for the AOL mailserver
mailin-01.mx.aol.com at 64.12.137.249:

      $ host -t txt 249.137.12.64.asn.routeviews.org
      249.137.12.64.asn.routeviews.org descriptive text "8176"
         "64.12.0.0" "16"

This tells us that the server is in ASN 8176, CIDR 64.12.0.0/16.
   For use in mitigating spam, you want to find which ASNs are
principally associated with spam traffic, noting volumes of both spam
and ham (non-spam) mail received from various sources. Ideally both
total mail volume and spam proportion would be noted.
   Routeviews.org makes the zonefiles available via rsync to allow large
sites to run queries against a local nameserver for increased
performance.

5 Pareto's law and spam sources

A power-law distribution is very evident in monthly data seen to date.

  · Over a two year period, 3-5 ASNs contribute 25% of all monthly spam.

  · 50% of all spam comes from 9 to 35 sources.

   The plot shows the total percent of spam contributed (vertical axis)
by ASNs (incremented along horizontal axis).
   CIDR data show a similar, though less concentrated, power-law
distribution. Specific ASNs involved vary, though gross abusers have
been fairly stable over time. Typical among them are ASNs from China,
Korea, large webmail providers (usually 419/Advanced-Fee fraud spam),
large European or Middle-eastern ISPs (often quasi-governmental
monopolies), blowback/backscatter sources (which would be specific to a
given email address at any one time), and occasionally larger US
commercial ISPs. Specific trends are highly idiosyncratic. You are very
strongly encouraged to note trends from your own experience, not other
sites'. Sharing data is possible and may be useful but should not be
principally relied on.
   In conjunction with numerous spam reports sent to the organizations
associated with domains, IPs, and/or ASNs, it's further noted that
network organizations can be separated into two classes:

  · Those which deal preemptively or reactively in a way which minimizes
    abuse problems

  · Those which don't, can't, or won't.

This observation gives rise to the concept of network hygiene, namely
that there are neighborhoods which are well policed and those which
aren't. Methods for increasing the accountability of a network's own
hygienic practices would be a net benefit.
   Additional statistics, tables, and plots follow at the end of this
paper.
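
The concentration figures in section 5 come from ranking sources by spam count and accumulating their share of the total. A minimal Python sketch of that calculation follows. The counts are the ten rows of the February 2006 ASN report at the end of this paper; the function name and the thresholds tried are only illustrative:

```python
def sources_needed(counts, total, share):
    """Return how many top-ranked sources it takes to reach `share`
    of `total`, or None if the listed sources fall short."""
    running = 0
    for rank, n in enumerate(sorted(counts, reverse=True), start=1):
        running += n
        if running / total >= share:
            return rank
    return None


# Spam counts for the ten rows of the ASN report, and its total.
top10 = [1113, 968, 673, 432, 373, 322, 248, 230, 217, 161]
total = 11249

print(sources_needed(top10, total, 0.25))  # prints 4
print(sources_needed(top10, total, 0.40))  # prints 9
print(sources_needed(top10, total, 0.50))  # prints None (top ten reach about 42%)
```

Run against a site's own monthly counts, the same loop shows how short the list of networks worth writing house rules for is likely to be.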
6 Application

ASNs by themselves don't tell you whether or not traffic is abusive, or
if a given IP range is spammy. What's necessary is to identify sources
of undesirable traffic, map these to an ASN and/or CIDR, and determine
your house rules for handling traffic from that CIDR. Two steps are
necessary: data acquisition, and policy enforcement.

6.1 Data acquisition

Acquire a list of IPs doing things you don't (or do) like: spam,
viruses, open proxies, portscans, blog / comment spam, referrer spam,
business partners, friends, vendors, bad breath, drinking white zin.
Look up the associated ASN / CIDR. Note which are naughty or nice. This
could be accomplished in the case of spam by dropbox accounts,
honeypots, server logs, end-user submissions, or other means. Because of
the power of aggregation allowed by ASN/CIDR lookups, a reasonably
constructed spam provider sample may be very small: on the order of
1:1,000,000 or fewer mails for a very large provider.

6.2 Policy enforcement: CIDR house-rules

Implement a policy at the service (eg: email, web, messaging) or
firewall (eg: iptables) level. These are your house rules for
interacting with a given CIDR, ASN, IP, or other defined network block.
   While blocklisting is one possible option, I'd very much like to see
the discussion move beyond that point. A preferred approach is what I
term "proportionate response". First: you'll likely want rules to
expedite known-trusted mail, or high priority mail from remote
organizational sites, peers, clients, vendors, or other established
relationships. Secondly, many peers will either have small overall
volumes, or not have a clearly identifiable nature. This leaves the set
of networks which are both high-volume and overwhelmingly spammy in
nature. Of course, any such implementation would have to be evaluated in
a business and organizational context.
   In proportionate response, a certain level of abuse would be met by a
proportionate level of response. For example, for a network from which
90% of email was found to be spam, 90% of traffic originating from that
network would be denied or dropped, either at the service (protocol) or
IP level, at random. If done at the SMTP transaction level, either as a
timeout (without 250 OK) or non-permanent rejection, this would mean
legitimate mail still has a fighting chance to get through. A 90% reject
rate would allow about half of mail through on 5 retries, for a typical
2 hour delay. A spam server without retry rules would fail delivery of
90% of its mail; with retries it would suffer large mail spools and
possibly other resource starvation.
   The site implementing such a policy will receive immediate benefit to
itself. Widespread adoption is not necessary to be locally beneficial.
As multiple and large sites adopt such measures, impacts on
abuse-tolerant networks would be significant. The approach is to be both
non-invasive and non-retaliatory. You are not taking any action which in
any way directly changes or affects a remote system, but are subjecting
it to a denial of interest.
   As a proportionate response, reject rates could vary with total
traffic volume, abusive traffic percentage, and severity of abuse, as
suits specific needs. Fine levels of control are therefore possible;
operators are not reduced to all-or-nothing responses to abuse.

7 Data and additional references

Some additional information and references on use of BGP and ASN data in
spam mitigation.

7.1 Related third-party discussions of spam and ASN data:

  · The Routeviews project:
    http://www.routeviews.org/

  · Chris Siebenmann's blog describing spam combat at the University of
    Toronto, Canada, including use of BGP and ASN data at the server
    level:
    http://utcc.utoronto.ca/~cks/space/blog/spam/SpamByASN

  · Michael Greb's blog on spam, including data on spam by ASN,
    collected from several spamtrap addresses:
    http://spam.thegrebs.com/

7.2 Summaries of spam by ASN & CIDR

Full online reports of my own data are frequently updated at:

     http://linuxmafia.com/~karsten/monthly-asn-report
     http://linuxmafia.com/~karsten/monthly-cidr-report

Historical data from January 2004 through present, with
some gaps, are saved by year and month in YYYYMM
form available at:
     http://linuxmafia.com/~karsten/monthly-asn-report-YYYYMM.txt
     http://linuxmafia.com/~karsten/monthly-cidr-report-YYYYMM.txt

The tables below list, from current data, the ASNs and CIDRs with the
most reported spams. Note that report classification isn't entirely
accurate, though trends are generally well represented.

  Report date: Mon Feb 27 23:37:48 PST 2006
  Total spams: 11249
  Total ASNs: 955

  Rank  Cumulative %      %  Spams  ASN    Description
     1          9.9%   9.9%   1113  8176   NETSCAPE-ASN
     2         18.5%   8.6%    968  4134   CHINANET-BACKBONE
     3         24.5%   6.0%    673  4814   CHINA169-BBN CNCGROUP
     4         28.3%   3.8%    432  8176   NETSCAPE-ASN
     5         31.6%   3.3%    373  4837   CHINA169-BACKBONE
     6         34.5%   2.9%    322  4766   KIXS-AS-KR Korea Telecom
     7         36.7%   2.2%    248  3269   ASN-IBSNAZ TELECOM ITALIA
     8         38.8%   2.0%    230  17858  KRNIC-ASBLOCK-AP KRNIC
     9         40.7%   1.9%    217  1668   AOL-ATDN
    10         42.1%   1.4%    161  17849  GINAMHANVIT-AS-KR

  Report date: Mon Feb 27 23:39:24 PST 2006
  Total spams: 11248
  Total CIDRs: 2251

  Rank  Cumulative %      %  Spams  CIDR              AS & Description
     1          9.9%   9.9%   1113  64.12.0.0/16      8176 NETSCAPE-ASN
     2         13.7%   3.8%    432  16/               8176 NETSCAPE-ASN
     3         15.7%   1.9%    217  205.188.0.0/16    1668 AOL-ATDN
     4         17.5%   1.9%    211  212.216.128.0/17  3269 ASN-IBSNAZ TELECOM ITALIA
     5         19.2%   1.6%    183  220.163.0.0/17    4134 CHINANET-BACKBONE
     6         20.5%   1.4%    155  4755/61.17.128.0  4755 VSNL-AS
     7         21.9%   1.4%    152  221.220.128.0/18  4814 CHINA169-BBN
     8         23.1%   1.2%    139  4755/61.17.176.0  4755 VSNL-AS
     9         24.3%   1.2%    136  218.63.0.0/17     4134 CHINANET-BACKBONE
    10         25.4%   1.0%    115  61.148.128.0/18   4814 CHINA169-BBN
    10         26.4%   1.0%    115  222.129.64.0/18   4814 CHINA169-BBN
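
A per-CIDR report like the one above can be tallied from a list of source IPs once each IP is mapped to its covering block. The following Python sketch is only an illustration: it uses a small in-memory prefix list (three rows copied from the report) in place of Routeviews data, and all sample addresses other than 64.12.137.249 are made up.

```python
import ipaddress
from collections import Counter

# (CIDR, ASN) pairs copied from the report above; a real table would be
# built from Routeviews data rather than typed in by hand.
PREFIXES = [
    ("64.12.0.0/16", 8176),
    ("205.188.0.0/16", 1668),
    ("212.216.128.0/17", 3269),
]
NETS = [(ipaddress.ip_network(cidr), asn) for cidr, asn in PREFIXES]


def lookup(ip):
    """Return (network, asn) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, asn) for net, asn in NETS if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


def tally(ips):
    """Count source IPs per CIDR; unmatched IPs are counted under None."""
    counts = Counter()
    for ip in ips:
        hit = lookup(ip)
        counts[hit[0] if hit else None] += 1
    return counts


# 64.12.137.249 is the address from section 4; the others are invented.
sample = ["64.12.137.249", "64.12.1.1", "205.188.10.20", "192.0.2.7"]
for net, n in tally(sample).most_common():
    print(net, n)
```

The linear scan is fine for a handful of prefixes; a site with a full routing table would want a proper longest-prefix structure or the local nameserver mentioned in section 4.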




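
Per-network counts such as those in the tables above are the input to the proportionate response of section 6.2. As a rough Python sketch of how a house rule might be derived from them: the reject rate is taken to be the observed spam fraction, each connection is deferred at random at that rate, and the chance of a retrying sender getting through is computed assuming independent attempts. The min_mail threshold, the function names, and the sample counts are hypothetical, not part of any existing tool.

```python
import random


def reject_rate(spam, total, min_mail=100):
    """Reject rate for a network: its observed spam fraction, or 0.0
    when there is too little mail to judge (min_mail is an assumed
    cutoff for low-volume peers)."""
    if total < min_mail:
        return 0.0
    return spam / total


def should_tempfail(rate, rng=random):
    """Decide at random whether to defer this connection."""
    return rng.random() < rate


def delivery_chance(rate, attempts):
    """Probability that at least one of `attempts` independent tries
    gets through at the given reject rate."""
    return 1.0 - rate ** attempts


rate = reject_rate(spam=900, total=1000)  # hypothetical counts, gives 0.9
for attempts in (1, 6):                   # first try alone, then with 5 retries
    print(attempts, round(delivery_chance(rate, attempts), 3))
# prints:
# 1 0.1
# 6 0.469
```

At a 90% reject rate a sender that retries five times gets through a little under half the time, which matches the figure quoted in section 6.2, while a sender that never retries loses 90% of its mail.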