Tags: bulletin board system, checks, electronic mail messages, explosive growth, hosts, implementation, incoming connections, internet sites, logical network, network news transfer protocol, open software foundation, server architecture, traffic, transport layer, unix, unix server, usenet message,
InterNetNews: Usenet
transport for Internet sites
Rich Salz Open Software Foundation
ABSTRACT
NNTP, the Network News Transfer Protocol, has been labelled the most widely
implemented elective protocol in the Internet. The growth of the Internet has meant more
sites exchanging NNTP data. While the explosive growth in Usenet traffic places demands
on all sites, the goal of fast network access puts particular demands on NNTP hosts.
InterNetNews is an implementation of the Usenet transport layer designed to address
this situation. It replaces the standard UNIX server architecture with a single long-running
server that handles all incoming connections. It has proven to be quite successful, providing
quick and efficient news transfer.
Introduction The Path header is used to prevent message
loops. For example, an article written at A could get
Usenet is a distributed bulletin board system,
sent to B, D, C, and then back to A. Before pro-
built as a logical network on top of other networks
pagating an article, a site prepends its own name to
and connections. By design, messages resemble
the Path header. Before propagating an article to a
standard Internet electronic mail messages as defined
site, the receiving host checks to make sure that the
in RFC822 [Crocker82]. The Usenet message for-
site that would receive the article does not appear in
mat is described in RFC1036 [Adams87]. This
the Path line. For example, when the article arrived
defines some additional headers. It also limits the
at site C, the Path would contain A!B!D, so site C
values of some of the standard headers as well as
would know not to send the article to A.
giving some of them special semantics.
Sites also keep a record of the Message-ID's of
Newsgroups are the classification system of
all articles they currently have. If D receives an
Usenet. The required Newsgroups header specifies
article from B, it will reject the article if C offers it
where a message, or article, should be filed upon
later. For self-protection, most sites keep a record
reception. Sites are free to carry whatever groups
of recent articles that they no longer have. This is
they want. Most sites carry the core set of so-called
very useful when another site dumps a (usually quite
``mainstream'' groups. There are currently about
large) batch of old news back out to Usenet.
730 of these groups, and one or two new ones is
created every week. For the past few years, the amount of data gen-
erated by Usenet sites has been doubling every year.
Messages generated at a site are sent to the
A site that receives all the mainstream groups is
site's ``neighbors'' who process them and relay them
receiving over 17 megabytes a day spread out over
to their neighbors, and so on. Sites can be intercon-
11,000 articles [Adams92]. About 20% of the data
nected indeed, on the Internet, this is quite com-
is article headers, and while all of them must be
mon. See Figure 1.
scanned only half of it is must be processed by the
Usenet software.1
B The number of sites participating in Usenet has
been growing almost as quickly. Based on articles
his site receives and survey data sent in by partici-
pating sites, Brian Reid estimates that there are
36,000 sites with 1.4 million participants [Reid91].
A D A ``sendsys'' message to the ``inet'' distribution in
June of 1989 received about 200 replies in the first
twenty-four hours. A year later, nearly 700 replies
were received. (Sendsys is a special article that asks
C all receiving sites to send back an email message,
usually without human intervention; by convention,
inet is primarily the set of sites on the Internet.)
Figure 1: Small Usenet topology (all links are
two-way). 1Yes, this means that, as far as the software is
concerned, Usenet is over 90% noise.
Summer '92 USENIX June 8-June 12, 1992 San Antonio, TX 93
InterNetNews ... Salz
The NNTP protocol is defined in Internet RFC appropriate rnews or relaynews program processed it.
977 [Kantor86] published in February, 1986. This In order to avoid processing an article the system
was accompanied by the general public release of a already has, it first does a lookup on the history
reference implementation, also called ``nntp.'' This database to see if the article exists. It soon became
has been the only NNTP implementation that is gen- apparent that invoking relaynews for every article
erally available to UNIX sites. lost all of C News's speed gain, so the NNTP pack-
age was changed to write a set of articles into a
Usenet Software batch, and offer the batch to relaynews.
In addition to InterNetNews, there are two When articles arrive faster then relaynews can
major Usenet packages available for UNIX sites. All process them, they must be spooled. If two sites (B
three share several common implementation details. and C in the previous examples) both offer a third
A newsgroup name such as comp.foo is mapped to a site (D) the same article at the ``same time'' then an
directory comp/foo within a global spool directory. extra copy will be spooled, only to be rejected when
An article posted to a group is assigned a unique it is processed, wasting disk space; this problem
increasing number based on a file called the active multiplies as the number of incoming sites
file. If an article is posted to multiple groups, links increases.2 To alleviate this problem, most sites run
are used so that only one copy of the data is kept. Paul Vixie's msgidd, a daemon that keeps a
A sys file contains patterns describing what news- memory-resident list of article Message-ID's offered
groups the site wishes to receive, and how articles within the last 24 hours. The NNTP server is
should be propagated. In most cases, this means that modified so that it tells this daemon all of the arti-
a record of the article is written to a ``batchfile'' that cles that it is handing to Usenet and queries the dae-
is processed off-line to do the actual sending. mon before telling the remote site that it needs the
The first Usenet package is called B News, also article. This is not a perfect solution if the first,
known as B2.11. The B news model is very simple: spooled, copy of the article is lost or corrupted, the
the program rnews is run to process each incoming site will likely never be offered the article after the
article. Locking is used to make sure that only one msgidd cache entry has expired. Going further,
rnews process tries to update the active file and his- msgidd is work-around for a problem inherent in the
tory database. At one site that received over 15,000 current software architecture.
articles per day, the locking would often fail so that Other problems, while not as severe, lead to the
10 to 100 duplicates were not uncommon. Because conclusion that a new implementation of Usenet is
each article is handled by a separate process, it is needed for Internet sites. For example:
impossible to pre-calcuate or cache any useful data.
Since all articles are spooled, relaynews can-
More importantly, file I/O had become a major not tell the NNTP server the ultimate disposi-
bottleneck. A site that feeds 10 other sites does over tion of the article, and the server cannot tell
150,000 open/append/close operations on its its peer at the other end of the wire. This
batchfiles. It is generally agreed that B news cannot hides transmission problems. For example, a
keep up with current Usenet volume; it is no longer site tracing the communication has no way of
being maintained, and its author has said more then finding out an article was rejected because the
once that the software should be considered ``dead.'' remote site does not receive that particular set
of newsgroups.
C News gets much better performance then B
The NNTP reference implementation is show-
news by processing articles in batches [Collyer87].
ing signs of age. Maintaining the server is
The relaynews program is run several times a day to
becoming a maintenance nightmare; over
process all the articles that have been received since
one-tenth of its 6,800 lines are #ifdef-related.
the last run. Since only one relaynews program is
All articles are written to disk at extra time.
running, it is not necessary to do fine-grain locking
Disks are getting bigger, but not faster, while
of any of the supporting data files. More impor-
CPU's, memory, and networks are.
tantly, it can keep the entire active and sys file in
memory. It can also use buffered I/O on its InterNetNews architecture
batchfiles, reducing the amount of system calls by
one or two orders of magnitude. There are four key programs in the InterNet-
News package (see Figure 2):
An alpha version of C News was released in Innd is the principal news server for incoming
October, 1987. Within four years it surpassed B newsfeeds;
news in popularity, and there are now more sites
running C News then ever ran B news. 2This is quite common for Internet sites, where
From the beginning, the NNTP reference redundant fast newsfeeds are common and where many
implementation was layered on top of the existing Usenet administrators seem to be avid players of the
Usenet software: an article received from a remote ``exchange news with as many other people as possible''
NNTP peer was written to a temporary file and the game.
94 Summer '92 USENIX June 8-June 12, 1992 San Antonio, TX
Salz InterNetNews ...
innxmit reads a file identifying articles and offers Until recently, innxmit used writev to send its
them to another site; data to the remote host. At start-up it filled a three-
ctlinnd sends control commands to innd; element struct iovec array with the following ele-
nnrpd is an NNTP server oriented for newsreaders. ments:
Of these programs, the most important is innd. [0] { ".", 1 };
We first mention enough of its architecture to give a [1] { placeholder };
context for the other programs, and then discuss its [2] { "\r\n" }
design and features in more detail at the end of this
To write a line, the placeholder was filled in with a
section.
pointer to the buffer, and its length, and a single wri-
tev was done, starting from either element zero or
innd daemon one. While this implementation was clever, and
simpler then what was done elsewhere, it was not
very fast. Innxmit now uses a 16k buffer and only
Remote Feeds does a write when the next line would not fit. This
is also consistent with ideas used throughout the rest
of INN: use the read and write system calls,
referencing the data out of large buffers while avoid-
Local newsreaders ing the copying commonly done by the standard I/O
library.
The ctlinnd program is used to tell the innd
ctlinnd server to perform special tasks. It does this by com-
Figure 2: Innd architecture municating over a UNIX-domain datagram socket.
The socket is behind a firewall directory that is
Innd is a single daemon that receives all mode 770, so that only members of the news
incoming articles, files them, and queues them up for administration group can send messages to it. It is a
propagation. It waits for connections on the NNTP very small program that parses the first parameter in
port. When a connection is received, a getpeer- its command line and converts it to an internal com-
name (2) is done. If the host is not in an access file, mand identifier. This identifier and the remaining
then an nnrpd process is spawned with the connec- parameters are sent to innd which processes the
tion tied to its standard input and output.3 It is worth command, and sends back a formatted reply. For
noting that nnrpd is only about 3,500 lines of code, example, the following commands stops the server
and 20% of them are for the ``POST'' command, from accepting any new connections, adds a news-
used to verify the headers in a user's article. Nnrpd group, and then tells it to recompute the list of hosts
provides all NNTP commands except for ``IHAVE'' that are authorized newsfeeds:
and an incomplete version of ``NEWNEWS''. On ctlinnd pause "Clearing out log files"
the other hand, it does provide extensions for ctlinnd newgroup comp.sources.unix m \
pattern-matching on an article header and listing vixie@pa.dec.com
exactly what articles are in a group. The NNTP pro- ctlinnd reload newsfeeds "Added OSF feed"
tocol seems to be a good example of the UNIX philo- ctlinnd go ""
sophy: it is small, general, and powerful and can be The text arguments are sent to syslog (8) for audit
implemented in a very small program. purposes.
Articles are usually forwarded by having innd The most commonly-used ctlinnd command is
record the article in a ``batchfile'' which is pro- ``flush.'' This directs the server to close the
cessed by another program. For Internet sites, the batchfile that is open for a site, and is typically used
innxmit program is used to offer articles to the host as follows:
specified on its command line.4 The input to innxmit
is a set of lines containing a pathname to the article mv batchfile batchfile.work
and its Message-ID. Since the Message-ID is in the ctlinnd flush sitename
batchfile, innxmit does not have to open the article innxmit sitename batchfile.work
and scan it before offering the article to the remote
The flush command points out another differ-
site. This can give significant savings if the remote
ence between INN and other Usenet software. The
site already has a percentage of the articles.
B News inews program needed no external locking
files were opened and closed for a very short win-
dow, the time needed to process one article. The C
3Unlike other implementations, no single INN program News relaynews could be running for a longer period
implements the entire NNTP protocol. of time. The only way to get access to a batchfile is
4Other programs, like nntplink, are supported but not
to either lock the entire news system, which is over-
part of the INN distribution. kill for the desired task, or to rename the file and
Summer '92 USENIX June 8-June 12, 1992 San Antonio, TX 95
InterNetNews ... Salz
wait until the original name shows up again. The ng->Sites[i]->Sendit = TRUE;
INN approach is more efficient and conceptually }
cleaner. }
for (sp = Sites; sp < LastSite; sp++) {
Innd Structure if (!sp->Sendit)
When innd starts up it reads the active file into continue;
memory. An array of NEWSGROUP structures is for (p = sp->FileFlags; *p; p++)
switch (*p) {
created, one for each newsgroup, that contains the
case 'm':
following elements: /* Write Message-ID */
char *Name; /* "comp.sources.unix */ case 'n':
char *Dir; /* "comp/sources/unix/" */ /* Write filename */
long Last; /* 0211 */ ...
int LastWidth; /* 5 */
}
char *LastString; /* "00211..." */
char *Rest; /* "m\n..." */ }
int SiteCount; /* 1 */ The ARTDATA structure contains information about
SITE **Sites; /* defined below */
the current article such as its size, the host that sent
The C comments above show the data that would it, and so on. The MeetsSiteCriteria function is an
generated for the following line in the active file: abstraction for the in-line tests that are done to see if
comp.sources.unix 00211 00202 m an article really should be propagated to a site (e.g.,
checking the Path header as described above).
The Last field specifies the name to be given to the AssignName is described below.
next article in the group. The LastString element
At its core, innd is an I/O scheduler that makes
points into the in-memory copy of the file. This
callbacks when select (2) has determined that there is
number is carefully formatted so that the file can be
activity on a descriptor. This is encapsulated in the
memory-mapped, or updated with a single write.
CHANNEL structure, which has the following ele-
A hash table into the structure array is built, ments:
using a function provided by Chris Torek [Torek91].
The hash calculation is very simple, yet empirically enum TYPE Type;
it gives near-uniform distribution. The secondary enum STATE State;
key is the highest article number, so groups with the int fd;
most traffic tend to be at the top of the bucket. FUNCPTR Reader;
FUNCPTR WriteDone;
The INN equivalent of the sys file read next. BUFFER In;
An array of SITE structures is created, one for each BUFFER Out;
site, that contains the following elements:
BOOL Sendit; The Type field is used for internal consistency
char FileFlags[10]; checks. There four different types of channels
local-accept, remote-accept, local-control (used by
The FileFlags array specifies what information ctlinnd) and NNTP connection. Each type is imple-
should be written to the site's data stream when it mented in anywhere from 100 to 1200 lines of code.
receives an article. The subscription list for the site The Reader and WriteDone function pointers, and
is then parsed, and for all the newsgroup that it the State enumeration are used for protocol-specific
recives, the matching NEWSGROUP structure will data. For example, State field is used by the NNTP
contain a pointer to the SITE structure. channel code to determine whether the site is send-
Using these two structures it is easy to step ing an NNTP command or an article. The BUFFER
through how an article is propagated: datatype contains sized reusable I/O buffers that
extern ARTDATA *art; grow as needed.
extern SITE *Sites, *LastSite; At start time innd calls getdtablesize (2) to
extern int nSites; create an array of channels that can be directly
char **pp; indexed by descriptor.
SITE *sp;
NEWSGROUP *ng; The code to listen on the NNTP port is show in
int i; Figure 3. When a host connects to the NNTP port,
select (2) will report activity on the descriptor and
while (*pp) { call RemoteReader which will accept the connection
ng = HashNewsgroup(*pp++); and possibly create fill in a new CHANNEL out of
if (ng == NULL)
continue;
the resultant descriptor.
AssignName(ng); It took a bit of effort to write the callback loop
for (i = 0; i < ng->nSites; i++) { so that it was fair i.e., so that the lowest descrip-
if (MeetsSiteCritera(ng->Sites[i], art)) tors did not get priority treatment. The problem was
96 Summer '92 USENIX June 8-June 12, 1992 San Antonio, TX
Salz InterNetNews ...
complicated because other parts of the server can This is a very fast way of writing the article to disk;
add and remove themselves from the select (2) read it avoids extra memory copies, and is only possible
and write mask, as needed. because the entire article is kept in memory until the
Once the NNTP channel has been created for a last moment.
site, the server is ready to accept articles from that Future work
site. The reader function for NNTP channels reads
RFC 977 follows the SMTP protocol for send-
as much data as is available from the descriptor. If
ing text: line are terminated with \r\n, a period is
it is in ``get a command'' state, it looks for a simple
placed before all lines starting with a period, and
\r\n terminator; if it is in ``reading an article'' state,
data is terminated with a line consisting of a single
it looks for a ``\r\n.\r\n'' terminator. If not found, it
period [Postel82]. Innd must scan the text of all
just returns; the data will become available at some
articles it receives and convert them to standard
point. If the terminator is found, it processes the
UNIX format. On the transmission side, innxmit must
data in the buffer. For filing an article, this means
read the articles a line at a time in order to add the
cleaning up the NNTP text escapes, and calling the
extra data. If all newsreading is done via NNTP,
article abstraction to process the article.
then articles could be stored directly in NNTP for-
Processing the article is the largest single rou-
mat, and innxmit could read and write the article in
tine in the server. The AssignName shown above
two system calls. The innd gains would not be as
increments the high-water mark for the newsgroup.
dramatic, but tests show it would still be somewhat
If the article has already been written to disk, a link
measurable.
is made under the new name. (Symbolic links, if
There is no NNTP ``TURN'' command, so that
available, can be used if the spool area spans parti-
a single connection cannot be used for bidirectional
tions.) If the article has not been written, a
article transfer. Turn-around is very successful on
struct iovec array is set up as shown below, where
vertical bars separate each iovec element:UUCP over conventional phone lines, but seems of
First headers... limited use on higher-performance network links.
Path: |LOCAL_PATH_PREFIX|rest of path... The SMTP protocol has had a ``TURN'' command
Second headers... since its inception, but it has received no practical
|XREF_HEADER| use. Several people find the idea of adding outgoing
transfer to innd attractive, since it is already struc-
|Article body. tured for multi-host I/O and the idea of caching
int
RemoteReader(cp)
CHANNEL *cp;
{
int newfd;
struct sockadr_in remote;
int remsize;
newfd = accept(cp->fd, &remote, &remsize);
if (InAccessFile(remsocket))
CHANcreate(newfd, TYPEnntp, STATEgetcmd, NCreader, NCwritedone);
else {
ForkAndExec("nnrpd", newfd);
close(newfd);
}
}
int
RemoteSetup()
{
int fd;
fd = GetSocketBoundToNNTPPort();
CHANcreate(fd, TYPEremote, STATElisten, RemoteReader, NULL);
}
Figure 3: Listening on an NNTP port
Summer '92 USENIX June 8-June 12, 1992 San Antonio, TX 97
InterNetNews ... Salz
recent articles in memory has its appeal. Adding [Adams92] Rick Adams, Total traffic through uunet
outgoing transfer to innd would take a moderate for the last 2 weeks, Usenet message
effort.