Information about http://ftp.isc.org/isc/inn/extra-docs/innusenix.pdf

InterNetNews: Usenet …

Tags: bulletin board system, checks, electronic mail messages, explosive growth, hosts, implementation, incoming connections, internet sites, logical network, network news transfer protocol, open software foundation, server architecture, traffic, transport layer, unix, unix server, usenet message,
Pages: 6
Language: english
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
                           InterNetNews: Usenet
                         transport for Internet sites
                                  Rich Salz ­ Open Software Foundation

                                               ABSTRACT
              NNTP, the Network News Transfer Protocol, has been labelled the most widely
        implemented elective protocol in the Internet. The growth of the Internet has meant more
        sites exchanging NNTP data. While the explosive growth in Usenet traffic places demands
        on all sites, the goal of fast network access puts particular demands on NNTP hosts.
              InterNetNews is an implementation of the Usenet transport layer designed to address
        this situation. It replaces the standard UNIX server architecture with a single long-running
        server that handles all incoming connections. It has proven to be quite successful, providing
        quick and efficient news transfer.

                   Introduction                                  The Path header is used to prevent message
                                                           loops. For example, an article written at A could get
      Usenet is a distributed bulletin board system,
                                                           sent to B, D, C, and then back to A. Before pro-
built as a logical network on top of other networks
                                                           pagating an article, a site prepends its own name to
and connections. By design, messages resemble
                                                           the Path header. Before propagating an article to a
standard Internet electronic mail messages as defined
                                                           site, the receiving host checks to make sure that the
in RFC822 [Crocker82]. The Usenet message for-
                                                           site that would receive the article does not appear in
mat is described in RFC1036 [Adams87]. This
                                                           the Path line. For example, when the article arrived
defines some additional headers. It also limits the
                                                           at site C, the Path would contain A!B!D, so site C
values of some of the standard headers as well as
                                                           would know not to send the article to A.
giving some of them special semantics.
                                                                 Sites also keep a record of the Message-ID's of
      Newsgroups are the classification system of
                                                           all articles they currently have. If D receives an
Usenet. The required Newsgroups header specifies
                                                           article from B, it will reject the article if C offers it
where a message, or article, should be filed upon
                                                           later. For self-protection, most sites keep a record
reception. Sites are free to carry whatever groups
                                                           of recent articles that they no longer have. This is
they want. Most sites carry the core set of so-called
                                                           very useful when another site dumps a (usually quite
``mainstream'' groups. There are currently about
                                                           large) batch of old news back out to Usenet.
730 of these groups, and one or two new ones is
created every week.                                              For the past few years, the amount of data gen-
                                                           erated by Usenet sites has been doubling every year.
      Messages generated at a site are sent to the
                                                           A site that receives all the mainstream groups is
site's ``neighbors'' who process them and relay them
                                                           receiving over 17 megabytes a day spread out over
to their neighbors, and so on. Sites can be intercon-
                                                           11,000 articles [Adams92]. About 20% of the data
nected ­ indeed, on the Internet, this is quite com-
                                                           is article headers, and while all of them must be
mon. See Figure 1.
                                                           scanned only half of it is must be processed by the
                                                           Usenet software.1

                   B                                             The number of sites participating in Usenet has
                                                           been growing almost as quickly. Based on articles
                                                           his site receives and survey data sent in by partici-
                                                           pating sites, Brian Reid estimates that there are
                                                           36,000 sites with 1.4 million participants [Reid91].
  A                                 D                      A ``sendsys'' message to the ``inet'' distribution in
                                                           June of 1989 received about 200 replies in the first
                                                           twenty-four hours. A year later, nearly 700 replies
                                                           were received. (Sendsys is a special article that asks
                   C                                       all receiving sites to send back an email message,
                                                           usually without human intervention; by convention,
                                                           inet is primarily the set of sites on the Internet.)
Figure 1: Small Usenet topology (all links are
    two-way).                                               1Yes, this means that, as far as the software is
                                                           concerned, Usenet is over 90% noise.



Summer '92 USENIX ­ June 8-June 12, 1992 ­ San Antonio, TX                                                       93
InterNetNews ...                                                                                               Salz


      The NNTP protocol is defined in Internet RFC           appropriate rnews or relaynews program processed it.
977 [Kantor86] published in February, 1986. This             In order to avoid processing an article the system
was accompanied by the general public release of a           already has, it first does a lookup on the history
reference implementation, also called ``nntp.'' This         database to see if the article exists. It soon became
has been the only NNTP implementation that is gen-           apparent that invoking relaynews for every article
erally available to UNIX sites.                              lost all of C News's speed gain, so the NNTP pack-
                                                             age was changed to write a set of articles into a
                   Usenet Software                           batch, and offer the batch to relaynews.
      In addition to InterNetNews, there are two                    When articles arrive faster then relaynews can
major Usenet packages available for UNIX sites. All          process them, they must be spooled. If two sites (B
three share several common implementation details.           and C in the previous examples) both offer a third
A newsgroup name such as comp.foo is mapped to a             site (D) the same article at the ``same time'' then an
directory comp/foo within a global spool directory.          extra copy will be spooled, only to be rejected when
An article posted to a group is assigned a unique            it is processed, wasting disk space; this problem
increasing number based on a file called the active          multiplies as the number of incoming sites
file. If an article is posted to multiple groups, links      increases.2 To alleviate this problem, most sites run
are used so that only one copy of the data is kept.          Paul Vixie's msgidd, a daemon that keeps a
A sys file contains patterns describing what news-           memory-resident list of article Message-ID's offered
groups the site wishes to receive, and how articles          within the last 24 hours. The NNTP server is
should be propagated. In most cases, this means that         modified so that it tells this daemon all of the arti-
a record of the article is written to a ``batchfile'' that   cles that it is handing to Usenet and queries the dae-
is processed off-line to do the actual sending.              mon before telling the remote site that it needs the
      The first Usenet package is called B News, also        article. This is not a perfect solution ­ if the first,
known as B2.11. The B news model is very simple:             spooled, copy of the article is lost or corrupted, the
the program rnews is run to process each incoming            site will likely never be offered the article after the
article. Locking is used to make sure that only one          msgidd cache entry has expired. Going further,
rnews process tries to update the active file and his-       msgidd is work-around for a problem inherent in the
tory database. At one site that received over 15,000         current software architecture.
articles per day, the locking would often fail so that              Other problems, while not as severe, lead to the
10 to 100 duplicates were not uncommon. Because              conclusion that a new implementation of Usenet is
each article is handled by a separate process, it is         needed for Internet sites. For example:
impossible to pre-calcuate or cache any useful data.            




                                                                   Since all articles are spooled, relaynews can-
      More importantly, file I/O had become a major                not tell the NNTP server the ultimate disposi-
bottleneck. A site that feeds 10 other sites does over             tion of the article, and the server cannot tell
150,000 open/append/close operations on its                        its peer at the other end of the wire. This
batchfiles. It is generally agreed that B news cannot              hides transmission problems. For example, a
keep up with current Usenet volume; it is no longer                site tracing the communication has no way of
being maintained, and its author has said more then                finding out an article was rejected because the
once that the software should be considered ``dead.''              remote site does not receive that particular set
                                                                   of newsgroups.
      C News gets much better performance then B
                                                                   The NNTP reference implementation is show-
                                                                




news by processing articles in batches [Collyer87].
                                                                   ing signs of age. Maintaining the server is
The relaynews program is run several times a day to
                                                                   becoming a maintenance nightmare; over
process all the articles that have been received since
                                                                   one-tenth of its 6,800 lines are #ifdef-related.
the last run. Since only one relaynews program is
                                                                   All articles are written to disk at extra time.
                                                                




running, it is not necessary to do fine-grain locking
                                                                   Disks are getting bigger, but not faster, while
of any of the supporting data files. More impor-
                                                                   CPU's, memory, and networks are.
tantly, it can keep the entire active and sys file in
memory. It can also use buffered I/O on its                               InterNetNews architecture
batchfiles, reducing the amount of system calls by
one or two orders of magnitude.                                   There are four key programs in the InterNet-
                                                             News package (see Figure 2):
      An alpha version of C News was released in             Innd is the principal news server for incoming
October, 1987. Within four years it surpassed B                  newsfeeds;
news in popularity, and there are now more sites
running C News then ever ran B news.                           2This is quite common for Internet sites, where
      From the beginning, the NNTP reference                 redundant fast newsfeeds are common and where many
implementation was layered on top of the existing            Usenet administrators seem to be avid players of the
Usenet software: an article received from a remote           ``exchange news with as many other people as possible''
NNTP peer was written to a temporary file and the            game.



94                                               Summer '92 USENIX ­ June 8-June 12, 1992 ­ San Antonio, TX
      Salz                                                                                             InterNetNews ...


      innxmit reads a file identifying articles and offers            Until recently, innxmit used writev to send its
           them to another site;                                 data to the remote host. At start-up it filled a three-
      ctlinnd sends control commands to innd;                    element struct iovec array with the following ele-
      nnrpd is an NNTP server oriented for newsreaders.          ments:
            Of these programs, the most important is innd.        [0]     { ".", 1 };
      We first mention enough of its architecture to give a       [1]     { placeholder };
      context for the other programs, and then discuss its        [2]     { "\r\n" }
      design and features in more detail at the end of this
                                                                 To write a line, the placeholder was filled in with a
      section.
                                                                 pointer to the buffer, and its length, and a single wri-
                                                                 tev was done, starting from either element zero or
                     innd daemon                                 one. While this implementation was clever, and
                                                                 simpler then what was done elsewhere, it was not
                                                                 very fast. Innxmit now uses a 16k buffer and only
Remote Feeds                                                     does a write when the next line would not fit. This
                                                                 is also consistent with ideas used throughout the rest
                                                                 of INN: use the read and write system calls,
                                                                 referencing the data out of large buffers while avoid-
                                        Local newsreaders        ing the copying commonly done by the standard I/O
                                                                 library.
                                                                       The ctlinnd program is used to tell the innd
                           ctlinnd                               server to perform special tasks. It does this by com-
                   Figure 2: Innd architecture                   municating over a UNIX-domain datagram socket.
                                                                 The socket is behind a firewall directory that is
            Innd is a single daemon that receives all            mode 770, so that only members of the news
      incoming articles, files them, and queues them up for      administration group can send messages to it. It is a
      propagation. It waits for connections on the NNTP          very small program that parses the first parameter in
      port. When a connection is received, a getpeer-            its command line and converts it to an internal com-
      name (2) is done. If the host is not in an access file,    mand identifier. This identifier and the remaining
      then an nnrpd process is spawned with the connec-          parameters are sent to innd which processes the
      tion tied to its standard input and output.3 It is worth   command, and sends back a formatted reply. For
      noting that nnrpd is only about 3,500 lines of code,       example, the following commands stops the server
      and 20% of them are for the ``POST'' command,              from accepting any new connections, adds a news-
      used to verify the headers in a user's article. Nnrpd      group, and then tells it to recompute the list of hosts
      provides all NNTP commands except for ``IHAVE''            that are authorized newsfeeds:
      and an incomplete version of ``NEWNEWS''. On                ctlinnd pause "Clearing out log files"
      the other hand, it does provide extensions for              ctlinnd newgroup comp.sources.unix m \
      pattern-matching on an article header and listing                       vixie@pa.dec.com
      exactly what articles are in a group. The NNTP pro-         ctlinnd reload newsfeeds "Added OSF feed"
      tocol seems to be a good example of the UNIX philo-         ctlinnd go ""
      sophy: it is small, general, and powerful and can be       The text arguments are sent to syslog (8) for audit
      implemented in a very small program.                       purposes.
            Articles are usually forwarded by having innd              The most commonly-used ctlinnd command is
      record the article in a ``batchfile'' which is pro-        ``flush.'' This directs the server to close the
      cessed by another program. For Internet sites, the         batchfile that is open for a site, and is typically used
      innxmit program is used to offer articles to the host      as follows:
      specified on its command line.4 The input to innxmit
      is a set of lines containing a pathname to the article      mv batchfile batchfile.work
      and its Message-ID. Since the Message-ID is in the          ctlinnd flush sitename
      batchfile, innxmit does not have to open the article        innxmit sitename batchfile.work
      and scan it before offering the article to the remote
                                                                       The flush command points out another differ-
      site. This can give significant savings if the remote
                                                                 ence between INN and other Usenet software. The
      site already has a percentage of the articles.
                                                                 B News inews program needed no external locking ­
                                                                 files were opened and closed for a very short win-
                                                                 dow, the time needed to process one article. The C
       3Unlike other implementations, no single INN program      News relaynews could be running for a longer period
      implements the entire NNTP protocol.                       of time. The only way to get access to a batchfile is
        4Other programs, like nntplink, are supported but not
                                                                 to either lock the entire news system, which is over-
      part of the INN distribution.                              kill for the desired task, or to rename the file and


      Summer '92 USENIX ­ June 8-June 12, 1992 ­ San Antonio, TX                                                      95
InterNetNews ...                                                                                               Salz


wait until the original name shows up again. The                     ng->Sites[i]->Sendit = TRUE;
INN approach is more efficient and conceptually                  }
cleaner.                                                     }
                                                             for (sp = Sites; sp < LastSite; sp++) {
                   Innd Structure                              if (!sp->Sendit)
     When innd starts up it reads the active file into           continue;
memory. An array of NEWSGROUP structures is                    for (p = sp->FileFlags; *p; p++)
                                                                 switch (*p) {
created, one for each newsgroup, that contains the
                                                                 case 'm':
following elements:                                                /* Write Message-ID */
 char   *Name;           /*   "comp.sources.unix */              case 'n':
 char   *Dir;            /*   "comp/sources/unix/" */              /* Write filename */
 long   Last;            /*   0211 */                            ...
 int    LastWidth;       /*   5 */
                                                                 }
 char   *LastString;     /*   "00211..." */
 char   *Rest;           /*   "m\n..." */                    }
 int    SiteCount;       /*   1 */                         The ARTDATA structure contains information about
 SITE   **Sites;         /*   defined below */
                                                           the current article such as its size, the host that sent
The C comments above show the data that would              it, and so on. The MeetsSiteCriteria function is an
generated for the following line in the active file:       abstraction for the in-line tests that are done to see if
 comp.sources.unix 00211 00202 m                           an article really should be propagated to a site (e.g.,
                                                           checking the Path header as described above).
The Last field specifies the name to be given to the       AssignName is described below.
next article in the group. The LastString element
                                                                 At its core, innd is an I/O scheduler that makes
points into the in-memory copy of the file. This
                                                           callbacks when select (2) has determined that there is
number is carefully formatted so that the file can be
                                                           activity on a descriptor. This is encapsulated in the
memory-mapped, or updated with a single write.
                                                           CHANNEL structure, which has the following ele-
       A hash table into the structure array is built,     ments:
using a function provided by Chris Torek [Torek91].
The hash calculation is very simple, yet empirically         enum TYPE         Type;
it gives near-uniform distribution. The secondary            enum STATE        State;
key is the highest article number, so groups with the        int               fd;
most traffic tend to be at the top of the bucket.            FUNCPTR           Reader;
                                                             FUNCPTR           WriteDone;
       The INN equivalent of the sys file read next.         BUFFER            In;
An array of SITE structures is created, one for each         BUFFER            Out;
site, that contains the following elements:
 BOOL     Sendit;                                             The Type field is used for internal consistency
 char     FileFlags[10];                                checks. There four different types of channels ­
                                                        local-accept, remote-accept, local-control (used by
The FileFlags array specifies what information          ctlinnd) and NNTP connection. Each type is imple-
should be written to the site's data stream when it     mented in anywhere from 100 to 1200 lines of code.
receives an article. The subscription list for the site The Reader and WriteDone function pointers, and
is then parsed, and for all the newsgroup that it       the State enumeration are used for protocol-specific
recives, the matching NEWSGROUP structure will          data. For example, State field is used by the NNTP
contain a pointer to the SITE structure.                channel code to determine whether the site is send-
      Using these two structures it is easy to step     ing an NNTP command or an article. The BUFFER
through how an article is propagated:                   datatype contains sized reusable I/O buffers that
  extern ARTDATA *art;                                  grow as needed.
  extern SITE *Sites, *LastSite;                              At start time innd calls getdtablesize (2) to
  extern int nSites;                                    create an array of channels that can be directly
  char **pp;                                            indexed by descriptor.
  SITE *sp;
  NEWSGROUP *ng;                                              The code to listen on the NNTP port is show in
  int i;                                                Figure 3. When a host connects to the NNTP port,
                                                        select (2) will report activity on the descriptor and
  while (*pp) {                                         call RemoteReader which will accept the connection
     ng = HashNewsgroup(*pp++);                         and possibly create fill in a new CHANNEL out of
     if (ng == NULL)
       continue;
                                                        the resultant descriptor.
     AssignName(ng);                                          It took a bit of effort to write the callback loop
     for (i = 0; i < ng->nSites; i++) {                 so that it was fair ­ i.e., so that the lowest descrip-
       if (MeetsSiteCritera(ng->Sites[i], art)) tors did not get priority treatment. The problem was


96                                           Summer '92 USENIX ­ June 8-June 12, 1992 ­ San Antonio, TX
Salz                                                                                              InterNetNews ...


complicated because other parts of the server can            This is a very fast way of writing the article to disk;
add and remove themselves from the select (2) read           it avoids extra memory copies, and is only possible
and write mask, as needed.                                   because the entire article is kept in memory until the
       Once the NNTP channel has been created for a          last moment.
site, the server is ready to accept articles from that                           Future work
site. The reader function for NNTP channels reads
                                                RFC 977 follows the SMTP protocol for send-
as much data as is available from the descriptor. If
                                          ing text: line are terminated with \r\n, a period is
it is in ``get a command'' state, it looks for a simple
                                          placed before all lines starting with a period, and
\r\n terminator; if it is in ``reading an article'' state,
                                          data is terminated with a line consisting of a single
it looks for a ``\r\n.\r\n'' terminator. If not found, it
                                          period [Postel82]. Innd must scan the text of all
just returns; the data will become available at some
                                          articles it receives and convert them to standard
point. If the terminator is found, it processes the
                                          UNIX format. On the transmission side, innxmit must
data in the buffer. For filing an article, this means
                                          read the articles a line at a time in order to add the
cleaning up the NNTP text escapes, and calling the
                                          extra data. If all newsreading is done via NNTP,
article abstraction to process the article.
                                          then articles could be stored directly in NNTP for-
       Processing the article is the largest single rou-
                                          mat, and innxmit could read and write the article in
tine in the server. The AssignName shown above
                                          two system calls. The innd gains would not be as
increments the high-water mark for the newsgroup.
                                          dramatic, but tests show it would still be somewhat
If the article has already been written to disk, a link
                                          measurable.
is made under the new name. (Symbolic links, if
                                                There is no NNTP ``TURN'' command, so that
available, can be used if the spool area spans parti-
                                          a single connection cannot be used for bidirectional
tions.) If the article has not been written, a
                                          article transfer. Turn-around is very successful on
struct iovec array is set up as shown below, where
vertical bars separate each iovec element:UUCP over conventional phone lines, but seems of
 First headers...                         limited use on higher-performance network links.
 Path: |LOCAL_PATH_PREFIX|rest of path... The SMTP protocol has had a ``TURN'' command
 Second headers...                        since its inception, but it has received no practical
 |XREF_HEADER|                            use. Several people find the idea of adding outgoing
                                          transfer to innd attractive, since it is already struc-
 |Article body.                           tured for multi-host I/O and the idea of caching
 int
 RemoteReader(cp)
     CHANNEL *cp;
 {
     int newfd;
     struct sockadr_in remote;
     int remsize;
        newfd = accept(cp->fd, &remote, &remsize);
        if (InAccessFile(remsocket))
            CHANcreate(newfd, TYPEnntp, STATEgetcmd, NCreader, NCwritedone);
        else {
            ForkAndExec("nnrpd", newfd);
            close(newfd);
        }
 }
 int
 RemoteSetup()
 {
     int fd;

        fd = GetSocketBoundToNNTPPort();
        CHANcreate(fd, TYPEremote, STATElisten, RemoteReader, NULL);
 }
                                        Figure 3: Listening on an NNTP port



Summer '92 USENIX ­ June 8-June 12, 1992 ­ San Antonio, TX                                                       97
InterNetNews ...                                                                                         Salz


recent articles in memory has its appeal. Adding           [Adams92] Rick Adams, Total traffic through uunet
outgoing transfer to innd would take a moderate                 for the last 2 weeks, Usenet message
effort.