Tags: applicative language, avoidance, bind, complement, computations, concurrent language, execution, hinges, implementation, interpreter, main loop, murray hill new jersey, rob pike, scheduler, squint, storage, synchronous channels, t bell laboratories, tion, value semantics,
The Implementation of Newsqueak
Rob Pike
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
The implementation of the concurrent applicative language Newsqueak has several
unusual features. The interpreter, squint, uses a copy-on-write scheme to manage stor-
age honoring Newsqueak's strictly applicative (by-value) semantics for data. There is no
explicit scheduler. Instead, the execution of processes is interleaved very finely, but ran-
domly, by an efficient scheme that integrates process switching into the interpreter's main
loop. The implementation of select, the non-deterministic, multi-way communica-
tions operator, exploits details in the implementation of processes.
This paper describes much of the interpreter but explains only small aspects of the
language. Further detail about the language may be found in the references.
Storage
Since the management of storage is central to the implementation of any language, it is a good start-
ing point for the description of the Newsqueak interpreter. But it is especially pertinent for Newsqueak
because the design of the language hinges on its strictly by-value (applicative) management of data.
Concurrent and applicative programming complement each other. The ability to send messages on
channels provides I/O without side effects, while the avoidance of shared data helps keep concurrent pro-
cesses from colliding. Newsqueak is an applicative concurrent language based on the concurrent composi-
tion of processes communicating on synchronous channels [Pike 89, McIl 89]. Two computations share a
value only if they share a variable. All assignment of values to objects -- including function return, bind-
ing of parameters to functions and processes, and passing values on channels -- is done by making a copy
of the assigned value. Although Newsqueak permits global variables, programming without them is conve-
nient and implicitly encouraged. The language promotes independent functions and processes operating in
environments defined exclusively by their parameters. Since those parameters may include communication
channels, the language feels considerably different from traditional applicative languages.
Newsqueak does not require that its implementation copy values in an assignment, but the behavior
must be as though the value were copied. The Newsqueak interpreter, squint, combines reference-count
garbage collection with a lazy copying scheme to defer copying as long as possible. The result is similar to
copy-on-write page management schemes in operating systems.
-2-
Reference counting was chosen because it is easy to implement [Knuth 73]. It was (correctly)
expected that storage management would not dominate performance. Moreover, when the language was
being designed there were several candidates for the CPU it would eventually be run on, so it seemed pru-
dent to use simple, inherently portable techniques.
Each object in Newsqueak is represented using a single word containing the address of a data struc-
ture describing the object, except for integers, which are just stored in the word. (The language itself has no
pointer types, but the implementation uses pointers extensively.) The data structure begins with a header
that includes a reference count and a description of the size and layout of the object, which simplifies copy-
ing the structures. (The static type system requires no run-time checking of the type of objects, but the
interpreter uses a single routine to copy all objects.) The data contained in arrays of characters and integers
are stored immediately after the header. Arrays of non-integral objects are represented recursively by stor-
ing an array of pointers to the component objects after the initial header. Records, defined using the
struct keyword, are stored using a hybrid scheme; a bit array at the beginning of the data part of the
object indicates which elements of the structure are represented by pointers.
An object is freed (collected) when the number of references to it goes to zero, which can occur when
a link to the object is broken by assignment or when a variable pointing to the object is released at the end
of an executing block. When an object is freed, pointers contained within it are followed and their refer-
ence counts are decremented. When these counts go to zero, the algorithm continues recursively.
When an object is assigned to a variable, its reference count is increased. However, when a compo-
nent of an object is updated, the resulting assignment may be done in place only if the containing object has
a reference count of one. If not, the object must first be copied. Consider the isolated assignments
A = somearray
B = A
A[i] = p.
After the second assignment, the array pointed to by A and B has a reference count of at least two. To
assign p to A[i], A is copied, the old array's reference count is decremented (A no longer points to it, but
B still does), all its component objects' reference counts are incremented (the new array points to the same
components), and A is made to point to the new object. This new object has a reference count of one. The
original object pointed to by A[i] and B[i] has its reference count decremented (after assignment A[i]
will not point to it but B[i] will), and A[i] may finally be pointed at p, whose reference count incre-
ments. All these operations occur as a single atomic action. Care must be taken in the implementation to
guarantee that assignments such as
A[i] = A[i]
work properly, even when disguised, for example by communication on a channel. But the method is not
hard to implement, and is beneficial for many common cases such as passing an array to a function that
examines it but does not change it.
-3-
Processes and scheduling
The unusual handling of processes and scheduling in squint is most easily approached the same
way it was developed: by successive refinement of a basic interpreter. Many interpreters represent the tar-
get program as an array of function pointers, each of which represents pseudo-instructions in a simulated
CPU, and use a program counter to step through those functions. The end of the program is indicated by a
pseudo-instruction that returns zero (false); non-terminal pseudo-instructions return one (true). The code in
C looks like:
typedef int (*Inst)(void);
Inst *pc;
Inst program[];
compile();
pc=program;
while((**pc++)())
;
As well, there are often some global pseudo-registers, such as a stack pointer, and some associated data,
such as a stack. These global variables are manipulated by the pseudo-instructions to push variables on the
stack and perform other low-level operations.
Newsqueak needs processes. Since the implementation is a single (real) process in a C program, we
need to simulate processes by interleaving the execution of various Newsqueak programs in a single
instance of the basic loop. When a (simulated) process is not actively executing, its state can be held in a
data structure such as -- schematically at least --
typedef struct{
Inst *pc; /* program counter */
int *sp; /* stack pointer */
int stack[N]; /* stack storage */
}Proc;
A typical operating systems approach at this point would be to use the Proc structure to hold the pc and
sp of a suspended process and to swap them with the global variables when the process is enabled again.
(The stack storage in the Proc structure could be used as is, without copying.) The execution loop might
then become
while(a process can run){
while(randomtimer()!=0 && (**pc++)())
;
swap(schedule());
}
where schedule selects a process (i.e., a Proc pointer) to run and swap exchanges the globals with the
named process. Again, this is a CPU-like model; the timer represents some sort of clock that drives the
-4-
preemptive scheduling algorithm. It might be implemented by having a software interrupt set a flag, or just
by running a counter.
Of course, these processes should be communicating, so some scheduling can be tied to communica-
tion. For example, when a sending process, say A, detects (through a data structure representing a channel,
which will be described in the next section) that another process, say B, is ready to receive its message, A
can execute
swap(B)
thereby passing the flow of control to the receiving process. In general, though, this approach does not
obviate the need for preemptive scheduling. If a process does not communicate often, it may never get run.
Worse, tying the scheduling to communication makes the system very deterministic. Concurrent lan-
guages, particularly those originating with Hoare's Communicating Sequential Processes (CSP) [Hoare 78],
have a tradition of non-determinism derived from a combination of Dijkstra's guarded commands [Dijk 76]
and distributed computation. Non-determinism also has the advantage of avoiding certain classes of live-
lock that can occur when communicating with a chatty process. Newsqueak therefore should be non-
deterministic when scheduling and when choosing between multiple potential communicators. To provide
non-deterministic scheduling without interrupting timers, we need a different structure for the interpreter
loop.
Squint has no scheduler and no global program counter or stack pointer. Instead, the state of all
processes is described only by the Proc structures, using a model related to hardware microtasking [Thack
79] and the HUB miniature operating system [ODell 87] [Mas 76]. Rather than running the pseudo-
instructions by executing
(**pc++)()
squint executes
(**proc->pc++)(proc)
where proc points to the head of a queue of active processes. Each pseudo-instruction needs the Proc
pointer of the current process to access the appropriate stack and registers. These indirections may cost
some execution time, of course, but the hope is to gain some back by not having to save and restore process
state when scheduling. By not saving state when scheduling, all that is required is to change proc to run
the new process. That is inexpensive enough to do after every pseudo-instruction, if we can cycle through
the process queue cheaply. Given a process queue and a next pointer in each Proc the code becomes:
-5-
typedef int (*Inst)(Proc*);
Proc *proc; /* head of process queue */
Proc *ptail; /* tail of process queue */
while(proc){
while((**proc->pc++)(proc)){
ptail->next = proc;
ptail = proc; /* append proc to tail */
proc = proc->next; /* delete proc from head */
}
}
The queue manipulation succeeds even if the queue has only one process. Nonetheless, in the common
case that only a single process is running, we can improve the loop by leaving the queue alone if proc
equals ptail:
while(proc){
while((**proc->pc++)(proc)){
if(proc != ptail){
ptail->next = proc;
ptail = proc; /* append proc to tail */
proc = proc->next; /* delete proc from head */
}
}
}
If only one process is running, the scheduling overhead is one comparison per pseudo-instruction, plus the
cost of accessing the simulated registers through a pointer (which may be negligible; it depends on the
architecture of the real CPU). If several processes are active, though, proc is not equal to ptail and
each execution of the loop accesses several global variables. We can do better by amortizing the cost over
several instructions, by scheduling less often. By being statistical rather than absolute in deciding how
often, we have an opportunity to introduce non-determinism into the scheduler.
To randomize the interleaving of the processes, we need an extremely cheap way to decide how to
interleave. The first requirement is a cheap random number generator. It does not have to be good, it just
needs to be good enough that programs cannot exploit any correlations. The following generator, courtesy
of Jim Reeds, uses a 31-bit linear feedback shift register to derive a random enough number in a handful of
minor instructions on most 32-bit computers:
-6-
long x = 0xFFFFFFFFL;
x += x;
if(x < 0)
x ^= 0x88888EEFL;
n = x&MASK;
(The & operator is bitwise and; ^ is bitwise exclusive or.) The resulting n is a random number between 0
and MASK; MASK is 15 in squint. With the generator in the loop, the result is:
while(proc){
x += x;
if(x < 0)
x ^= 0X88888EEFL;
n = x&MASK;
while(--n>=0 && (**proc->pc++)(proc))
;
if(proc != ptail){
ptail->next = proc;
ptail = proc; /* append proc to tail */
proc = proc->next; /* delete proc from head */
}
}
With x and n in registers, this loop runs insignificantly slower than the non-random one on a VAX-11 with
a single process, and about 40% faster with several processes. It also offers non-determinism and requires
no timer. The only remaining problem is to work in the scheduling requirements of inter-process communi-
cation, the subject of the next section.
Newsqueak provides a process creation operator (begin) but no explicit process destruction opera-
tor. When a process is instantiated, it is wrapped in an envelope that converts what would be its top-level
function return into the sequence that releases the function's resources (by decrementing reference counts
of the top-level data structures) and removes the process from the run queue. Garbage collection does the
rest.
Communication
Consider a process S sending an integer i to a process R, using a channel c. S executes
c