Goals Guiding Design: PVM and MPI
William Gropp Ewing Lusk
gropp@mcs.anl.gov lusk@mcs.anl.gov
Mathematics and Computer Science Division
Argonne National Laboratory
Abstract

PVM and MPI, two systems for programming clusters, are often compared. The comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. For example, we show that the requirement for portability and performance across many platforms caused MPI to choose approaches different from those taken by PVM, which is able to exploit the similarities of network-connected systems.

1 Introduction

The emergence of the cluster as a viable parallel computing platform, even scaled into the supercomputer range, has been enabled by the simultaneous emergence of message-passing libraries that have made it possible to map parallel algorithms onto them in a portable way. PVM and MPI have been the most successful of such libraries.

PVM [10] and MPI [19, 20] are both specifications¹ for message-passing libraries that can be used for writing portable parallel programs. Recent books on building clusters, both for Linux [26] and for Windows [27], contain chapters on using both MPI and PVM. Since there are freely available versions of each, users have a choice, and beginning users in particular can be confused by their superficial similarities. Several comparisons of PVM and MPI have been carried out since the mid-1990s [18, 17, 12, 23, 16]. We consider it worthwhile to do so again for two reasons. The most obvious is that some convergence has recently taken place in the functionality offered by the two systems (e.g., dynamic processes in MPI, static groups and message contexts in PVM), and the very different approaches taken in these extensions merit comment. Equally important, however, is the fact that previous analyses have focused on local, feature-by-feature comparisons, describing similarities as well as differences. Such feature-by-feature comparisons can be misleading, particularly when the two systems use the same word for different concepts. For example, an MPI group and a PVM group are really quite different objects, although they have superficial similarities (e.g., in MPI, sources and destinations are relative to a group, while in PVM sources and destinations are always absolute in terms of the "task ids").

¹ We treat the Oak Ridge version of PVM as represented by [5, 11] as the PVM specification. MPI is represented by the MPI-2 specification.

We prefer to analyze the differences in PVM and MPI by looking first at sources of these differences. The structure of this paper is as follows. In Section 2 we review the explicit design goals of the MPI Forum. In Section 3, we review the similarities between PVM and MPI, leading us in Section 4 to discuss the consequences of separating implementation from design. In Sections 5, 6, 7, and 8, we show how these sources have influenced differences between PVM and MPI in the areas of dynamic processes, contexts, nonblocking operations, and portability, respectively. In Section 9 we focus on those aspects of MPI that go beyond the message-passing model. This paper expands an earlier version [16]. Among the additions here are discussions of parallel I/O, the safety of contexts, and a subtle performance issue in multiparty communications.

2 MPI's Goals

Rather than go through each specification feature by feature, we will discuss some of the explicit design goals that were established by the MPI Forum before it undertook to specify the details. In many cases these goals dictated details of the specification (such as the contents of individual function parameter lists). Where these details differ from the corresponding details in PVM, our goal-oriented approach will elucidate the sources of the differences. In addition to differences in explicit goals, we will note a few differences more attributable to the origin of the two systems. PVM was the effort of a single research group, allowing it great flexibility in design and also enabling it to respond incrementally to the experiences of a large user community. Moreover, the implementation team was the same as the design team, so design and implementation could interact quickly. In contrast, MPI was designed by the MPI Forum (a diverse collection of implementors, library writers, and end users) quite independently of any specific implementation but with the expectation that all of the participating vendors would implement it. Hence, all functionality had to be negotiated among the users and a wide range of implementors, each of whom had a quite different implementation environment in mind.

The first task of the MPI Forum was to define the goals that would guide its subsequent discussions. Some of these goals (and some of their implications) were the following:

· MPI would be a library for writing application programs, not a distributed operating system. This goal has implications for resource management issues, as discussed in Section 5.

· MPI would not mandate thread-safe implementations, but its specification would allow them. Thread safety implies that there can be no notion of a "current" buffer, message, error code, and so on. As the "nodes" in the network become symmetric multiprocessors, thread safety becomes increasingly important in a heterogeneous, networked environment.² Recent experiences from vendor implementations of a thread-safe MPI (in particular, the IBM implementation [30]) confirm that the MPI design is thread-safe.

· MPI would be capable of delivering high performance on high-performance systems. Hence, no memory copies would be mandated by the design. Scalability, combined with correctness, for collective operations required that groups be "static". An open research problem is finding semantic definitions and appropriate algorithms that allow dynamic groups to meet these same requirements.

· MPI would be modular, to accelerate the development of portable parallel libraries. Modularity has many implications. For example, all references must be relative to a module, not the entire program. Consider a module that solves a system of linear equations on an arbitrary subset of processes; the ability to restrict the module to a subset of processes is needed by domain decomposition methods and for multidisciplinary applications. Hence, process source/destination must be specified by rank in a group rather than by an absolute identifier, and context must not be a visible value (see Section 6). Some other implications of modularity are described below.

· MPI would be extensible to meet future needs and developments. This requirement led to an object-oriented approach without a commitment to an object-oriented language. This approach required functions to manipulate the objects, and was one minor reason for the relatively large number of functions in MPI (large here is relative to C and Fortran programs; C++ and Java programmers are used to large numbers of functions).

· MPI would support heterogeneous computing (the MPI_Datatype object allows implementations to be heterogeneous), although it would not require that all implementations be heterogeneous.

· MPI would require well-defined behavior (no race conditions or avoidable implementation-specific behavior).

² There is a project to join threads with PVM (TPVM [9]), but this is more a lightweight process model than a fully threaded model and, as such, does not offer as rich a programming model as a fully thread-safe model would.

For simplicity, the MPI Forum sought to make each approach serve as many of these goals as possible. For example, datatypes address both heterogeneity and noncontiguous data layouts, both for messages and for files. Similarly, communicators combine process groups with communication contexts.

The MPI standard has been widely implemented and is used nearly everywhere, attesting to the extent to which these goals were achieved. See [15] for a discussion of the importance of these goals to the success of MPI (or any method for parallel programming).

PVM had different goals, apart from its support for heterogeneous computing and a different approach to extensibility. In particular, PVM was aimed at providing a portable, heterogeneous environment for using clusters of machines using socket communications over TCP/IP as a parallel computer. Because of PVM's focus on socket-based communication between loosely-coupled systems, PVM places a greater emphasis on providing a distributed computing environment and on handling communication failures.

3 What is Not Different?

Despite their differences, PVM and MPI certainly have features in common. In this section, we review some of the similarities and, in the process, correct some common misconceptions about the MPI specification. In most cases
these misconceptions arise because of confusion between specification and implementation.

Both PVM and MPI are portable; the specification of each is machine independent, and implementations are available for a wide variety of machines, particularly those likely to appear in clusters.

Once a system is portable, the issue of homogeneity can be addressed. Can two processes on different machine architectures communicate with one another despite differences in byte ordering in memory or even word length? To this end PVM provides the pvm_pack/unpack functions and the datatype arguments to pvm_send/recv; MPI does the same with its more general MPI_Datatype argument to many routines. Of course, some implementations of MPI, particularly those from hardware vendors, may not be used in a heterogeneous environment, but the MPI specification is designed to encourage heterogeneous implementations, and both the MPICH [13] and LAM [2] implementations support heterogeneous environments.

Both MPI and PVM permit different processes of a parallel program to execute different executable binary files. (This would be required in a heterogeneous implementation, in any case.) That is, both PVM and MPI support MIMD programs as well as SPMD programs, although again some implementations may not do so, and launching MIMD programs may be less convenient than launching SPMD programs. Both MPICH and LAM support MIMD programming.

A final issue is that of interoperability. This term refers to the possibility of communicating among processes linked with two completely different implementations. We discuss this issue, and provide further comments on portability and heterogeneity, in Section 8.

In summary, both MPI and PVM are systems designed to provide users with libraries for writing portable, heterogeneous, MIMD programs. In comparing issues, one must not confuse the MPI specification with a particular implementation subcase, such as the ch_p4 device of MPICH, which is widely used on clusters but does not define MPI.

4 Implementation and Definition

One common confusion in comparing MPI with PVM comes from comparing the specification of MPI with the implementation of PVM. Standards specifications tend to specify the minimum level of compliance, while any implementation offers more functionality. In the MPI Forum, many such "added-value" features are listed as expected of a "high-quality implementation".

Error handling and recovery are a good example. Standards tend not to mandate specific behavior on errors, other than to list error indicator values. The expectation is that high-quality implementations will give users what they expect. Specific implementations can easily define their individual handling of errors. Thus, most MPI implementations do not simply abort when an error is detected; just as the PVM implementation does, they attempt to provide a useful error indication and allow the user to continue. Specifically, in any system, there are recoverable and nonrecoverable errors. An example of a recoverable error is an illegal argument to a routine, such as a null pointer or an out-of-range value. A nonrecoverable error is one where the program may not be able to continue. In many applications, accessing an invalid address or attempting to execute an invalid or privileged instruction is nonrecoverable. The MPI standard does not specify which errors are recoverable, though there has been some discussion in this direction. This is an example of the determination of the MPI Forum to maintain maximum portability: mandating any specific behavior would limit the portability of MPI. Note that even for PVM, some systems provide a less "recoverable" environment than others. For example, systems with proprietary interconnects may kill all processes when any one exits.

Another source of confusion involves features of a particular implementation that are exposed to the programmer. Consider the pvm_reg_tasker routine that allows a process to indicate to PVM that it, rather than fork/exec, should be used to start tasks. This is a powerful hook to allow extension of the PVM implementation by special applications, such as debugger servers and batch schedulers. MPI, as a standard, has no such object, but specific MPI implementations can and do provide similar services; for example, the MPICH implementation of MPI provides a process startup hook used by the TotalView [29] debugger. The MPI standard does not specify how implementations are to provide this service; as a standard, it should not. At the same time, the experience with TotalView has defined an interface that MPI implementations (not just MPICH) can use, allowing any debugger to access this information [4]. We note that some PVM implementations for massively parallel processors (MPPs) also do not provide the pvm_reg_tasker routine. This is an example of the freedom of PVM to provide features only in some environments. As a standard, MPI does not have that freedom. If the MPI standard had mandated such a routine, any MPI implementation would have to provide it. Instead, MPI's explicit goals mandated that it choose portability over certain kinds of functionality.

When we compare implementations rather than an implementation of PVM with the MPI standard, the gap in this type of functionality narrows. For example, MPICH [13], rather than MPI, does provide a way for debuggers like TotalView to access internal MPICH state on the message queues. Many users want this information, but it raises an interesting issue: How does one define a standard for the internal state of an implementation? For any implementation
this can be done, but different implementations may have different internal states. For example, one optimization for communication has the process issuing an MPI_RECV send a message to the expected source of the message, allowing the sender to deliver the message directly into the receiver's memory [21]. Should this information be presented to the user? Other implementation choices might eliminate some queues altogether or make it more difficult to find all pending communication operations; in fact, in the MPICH implementation, there is no send queue unless the system has been configured and built to support the message queue service. By not specifying a model of the internals of an MPI implementation, such as defining a "message queue" does, the MPI standard allows MPI implementations to make tradeoffs between the performance and functionality that the users want.

5 Dynamic Processes

One way to understand the differences between PVM and MPI is to look at the MPI features for creating and attaching to processes. While the two approaches may seem similar, they are actually quite different. Perhaps the greatest difference is in the handling of resource information that is used to determine where to create the new process. This reflects a difference in the approach to providing distributed operating system support by PVM and MPI. PVM, through its virtual machine (implemented as the PVM daemons), provides a simple yet useful distributed operating system. Special interfaces, such as the pvm_reg_tasker, allow the PVM system to interface with other resource management systems. MPI does not mandate or define a virtual machine, even in MPI-2. Rather, it provides a way, through a new MPI object (MPI_Info), to communicate with whatever mechanism is providing distributed operating system services. That mechanism may well be a parallel virtual machine; several implementations already use distributed daemons to start and manage MPI jobs. But we emphasize that daemons are not required by the MPI specification. This feature is important for extreme-scale architectures, where the very existence of local daemons may be impractical.

To understand the difference, consider the resources that an application may want to specify when creating a new process:

    Any system that can run an RS/6000, AIX 4.y (y 3) executable,
    with 4 memory banks and at least 512 MB of memory, 400 MB of
    /tmp, and a load of < 2, and is able to run for 48 hours, with
    access to /home/me and the runtime libraries for xlf version
    3.4.5 or 3.4.6 but not 3.4.7 or 3.4.4.

Such a specification is complicated, and probably beyond what would be expected from a parallel programming system. But it is well within the capabilities of advanced resource management systems. How should a parallel computing system interface with such a system? The choices are (a) pick a small subset that all systems can support, (b) define a general and generic, but fully expressive, system, or (c) provide an interface that allows information to be passed, in an implementation-specific manner, to the resource system.

PVM chose (a)³; this is the most convenient form for many users, particularly if the default choices are adequate. More demanding users want (b); this gives them the maximum portability without sacrificing too much expressivity. Unfortunately, (b) has two drawbacks: it isn't extensible, and it assumes that there is a well-defined interface that users agree on.⁴ These drawbacks led the MPI Forum, which spent a great deal of time trying to find a solution like (b), to choose (c). In MPI, this is the "info" argument to an MPI_Comm_spawn command:

    MPI_Comm_spawn(worker_program,
                   MPI_ARGV_NULL,
                   universe_size-1,
                   info_for_resource_manager, 0,
                   MPI_COMM_SELF, &everyone,
                   MPI_ERRCODES_IGNORE);

Just like filenames, the specific contents of "info" depend on the implementation. MPI specifies a few predefined items, such as working directory and architecture. Other information can be passed directly to the local resource manager. For example, an MPI implementation could provide a way to pass the resource specification above to the resource manager. MPI implementations are required to ignore unrecognized fields; this strategy encourages users to provide extra information when possible. Note that the MPI_Info object is also used in the file I/O section of MPI-2 to provide performance hints. This is another example of MPI using the same feature to serve multiple goals.

³ PVM-aware resource managers such as Condor and LoadLeveler can provide more complex services, but this is outside of the PVM program itself and is specific to the particular resource manager in use. Portable PVM programs cannot rely on such services.

⁴ Several systems are specific to particular resource managers such as LoadLeveler and LSF (Load Sharing Facility), but there is no consensus on which of these, or which combination of features, should be adopted.

Another difference between MPI and PVM shows up in the presence of pvm_config and the lack of an MPI equivalent. The pvm_config function provides information on the virtual machine. This information can be used by the programmer to attempt to manage resources directly, for example, by specifying particular hosts in pvm_spawn. Why doesn't MPI provide a similar function?

The problem is that the information that any command can provide on the environment is immediately out of date. For example, even in PVM, between the time pvm_config
is called and pvm_spawn is called, another PVM application may have executed pvm_delhosts, thus invalidating the information provided by pvm_config. As the number of items grows larger and more complex, the likelihood that some critical item will be out of date increases (consider space in /tmp or load average). In the PVM case, the impact of this problem is somewhat mitigated by the fact that each user has a personal parallel virtual machine. Of course, a single user may have multiple parallel jobs running at the same time (e.g., under the control of a system to explore a parameter space), so the problem is not eliminated by providing single user virtual machines.

The MPI Forum discussed this situation at great length but could find no workable solution. This is an example of a "race condition," a situation in which the user is in a race with other users and the system and where the "expected" behavior depends on the user's winning the race. It is also another example of the tradeoff in user convenience and precise system behavior. Naturally, one would like to perform the operations PVM provides. But one cannot guarantee that the resources described will exist when a process is created.

Hence, the MPI_Comm_spawn call combines process creation with information on the needed resources. Combining operations is a classic approach for solving race conditions, and this solution is used in many places in MPI. Eliminating race conditions makes many operations in MPI collective. Note that the PVM 3.4 pvm_newcontext [5] presents a race condition in the delivery of the new context value to other processes; MPI solves this problem by making context creation collective over all processes that will use the context. Note that the race is removed by this approach, not just moved into the MPI implementation.

Because of the presence of such race conditions, MPI also forms the MPI communicator (roughly similar to a PVM group and context) at the same time as creating the processes. For the same reason, MPI provides an MPI_Comm_spawn_multiple routine that allows MPI to create processes for a large collection of different executables in a single operation.

Another difference in the handling of process creation is in the use of MPI intercommunicators. An MPI intercommunicator represents two groups of processes that communicate with each other. It is a natural representation for created processes: one group represents the children and one group represents the parents (multiple parents are allowed in MPI to avoid race conditions). In PVM, created processes have only one parent; this reflects PVM's use of the fork/exec or system spawn model of process creation as separate from connecting processes for communication.

6 Contexts

Writing parallel programs is notoriously difficult. One solution is to accelerate the development of parallel libraries, with the expectation that end users will access parallelism through libraries rather than by invoking message-passing functions directly. Thus an original goal of the MPI design was to provide the functionality needed by libraries and missing in most message-passing systems of the time.

The single greatest impediment to the use of parallel libraries has been the lack of modularity. In its simplest form, this impediment manifests itself when a message sent by a library is received unexpectedly by either user code or another library. The solution lies in contexts [8]. (Readers not familiar with the notion of context should see the discussion of contexts in Section 2.3 of [14].)

The treatment of contexts illustrates how a combination of features can affect future enhancements. Following MPI, PVM 3.4 adds contexts; unlike MPI, these are user-visible integers that may be sent from process to process and otherwise manipulated by the user. They are also guaranteed to be globally unique; PVM can ensure uniqueness because there is a single virtual machine. MPI's contexts are opaque and defined only by their effect in MPI operations; while a simple implementation could make them globally unique, that is not required (and, for scalability reasons, may not be desirable).

Consider the case of two parallel programs that wish to connect to each other. Both MPI and PVM provide a way to do this. But the PVM approach requires that both programs belong to a single PVM virtual machine. The decision to make the PVM context a visible, explicit integer means that programs belonging to different PVMs cannot safely connect, because they may already have the same "unique" context id. It also means that different PVMs cannot be merged into a single PVM, since again previously unique context integers would no longer be unique. Using an external service (such as a context value server) to allocate contexts simply pushes the problem to a different level without solving it. In addition, there is the very real issue that users may choose to ignore the problems of distributing a visible message context and pick a fixed value. This can lead to subtle problems and was one reason that the MPI Forum made the context value opaque. The MPI approach sacrifices some flexibility (explicit, unique context values) for the extensibility offered by a more modular and encapsulated design. The PVM design is backward-compatible but not as safe.

7 Nonblocking Operations

Nonblocking operations (e.g., MPI_Isend) are often misunderstood as a "performance" optimization. In fact,
these are necessary when constructing any large, complex communication system. They should be distinguished from asynchronous operations. A nonblocking operation is simply one that does not block the calling process. An asynchronous operation usually implies that it continues to take place concurrently with other operations. (Note that the PVM documentation sometimes uses "asynchronous" where MPI would use "nonblocking" and sometimes uses nonblocking.)

MPI provides an extensive set of nonblocking operations (MPI_Isend, MPI_Irecv, MPI_Ibsend, etc.). PVM does not provide nonblocking operations in the MPI sense (pvm_nbrecv is really what MPI would call a "probe"). MPI provides such operations not only to allow for overlapping communication, but also to make it easier to write portable, correct programs.

Consider the program running on two processes shown in Figure 1, in the case where pvm_setopt( PvmRoute, PvmRouteDirect ) has been called. Does this program work? The answer depends on the size of the messages (size), the particular platforms (MPP, workstation networks, or symmetric multiprocessors), and even the environment (e.g., free swap space). For short messages, the program will almost always work. At some message size, on the other hand, it will fail, since the messages must be buffered somewhere outside the program itself; the programs will hang, each waiting for the other to execute the pvm_precv. This may seem unusual, but programs that process large amounts of data can easily exceed the amount of available buffering.

Again, a tradeoff exists between user convenience and precise behavior by the interface. MPI is careful to specify the kind of buffering behavior and to provide alternative solutions to the problem of writing reliable programs: a buffered send (MPI_Bsend) with a guaranteed amount of (user-controlled) buffering, or nonblocking operations. The degree to which users want such programs to work was shown by the public reaction to the MPI-1 draft that did not provide a buffered send; the MPI Forum added the buffered send to satisfy this need. See [14] and [25] for a more detailed introduction to MPI's handling of buffering.

The MPI Forum attempted to define the conditions when MPI_Send could be safely used (and in fact, most vendors currently document these and provide some control by way of environment variables). Defining such conditions, however, requires mandating a particular implementation model. The most obvious model is not scalable in its use of memory; more complex models are harder for users to work with and further constrain implementations.

We note that the Unix socket interface provides a solution much like the MPI nonblocking operations, though somewhat less convenient for the user. A socket can be set so that read or write returns rather than blocking, using the error code EAGAIN to indicate that the operation would block. This allows careful users to avoid deadlock in their applications. POSIX also defines a form of nonblocking operation even more like the MPI nonblocking operations: the aio_read, aio_write, aio_error, aio_return, and aio_cancel interface for asynchronous I/O. These routines have a test operation (aio_error returns 0 when an operation is complete and EINPROGRESS when not complete) and a cancel operation. Asynchronous I/O has been used for years in large-scale scientific computing; the MPI approach is not unusual.

A more subtle need for nonblocking operations comes from considering the performance of communication patterns involving more than two processes. Consider four processes communicating with the program

    MPI_Irecv( ..., nbr1,..., &request[0] );
    MPI_Irecv( ..., nbr2,..., &request[1] );
    MPI_Send( ..., nbr3, ... ); /* 1 */
    MPI_Send( ..., nbr4, ... ); /* 2 */
    MPI_Waitall( 2, request, statuses );

This code looks fine but has a subtle problem. If the sends labeled with the comment /* 1 */ on two processes target the same receiver, then they may suffer a performance degradation because of limits on how fast any process can receive data (for example, limited by network bandwidth). If instead the code was

    MPI_Irecv( ..., nbr1,..., &request[0] );
    MPI_Irecv( ..., nbr2,..., &request[1] );
    MPI_Isend( ..., nbr3, ..., &request[2] );
    MPI_Isend( ..., nbr4, ..., &request[3] );
    MPI_Waitall( 4, request, statuses );

the MPI implementation can send the data for the sends using request[2] and request[3] at the same time, maximizing the use of the available network bandwidth. Accomplishing the same efficient use of the network resources is possible with blocking operations but requires much more careful ordering of operations (and hence much more difficult programming) than in the nonblocking case.

8 Portability, Heterogeneity, and Interoperability

Portability refers to the ability of the same source code to be compiled and run on different parallel machines. Heterogeneity refers to portability to "virtual parallel machines" made up of networks of machines that are physically quite different. Interoperability refers to the ability of different implementations of the same specification to exchange messages. In this section we compare PVM and MPI with respect to these three properties.
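
Heterogeneity is easiest to appreciate at the level of bytes. The following is a minimal sketch, not PVM or MPI code, of one simple strategy for the conversion that pvm_pack/unpack or an MPI_Datatype argument performs on the user's behalf: a 32-bit integer is written in one fixed byte order, so that machines with different native byte orderings read back the same value. The names pack_u32 and unpack_u32 are hypothetical, and real libraries may use other strategies.

```c
#include <stdint.h>

/* Hypothetical helper: store v in buf in big-endian order. The shifts act on
   the value, not on memory, so the result does not depend on the byte order
   of the machine running this code. */
void pack_u32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Hypothetical helper: rebuild the value from its big-endian encoding. */
uint32_t unpack_u32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

A sender would call pack_u32 before transmitting the four bytes and a receiver would call unpack_u32 on arrival; neither side needs to know the other's architecture.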
    Process 1                        Process 2
    pvm_psend( ..., size, ... )      pvm_psend( ..., size, ... )
    pvm_precv( )                     pvm_precv( )

Figure 1. Head-to-head communication
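
The hang that can occur in Figure 1 comes from finite buffering: once the system can hold no more unreceived data, a blocking send cannot return. The limit can be observed with the Unix socket interface alone. In this minimal sketch, which uses neither PVM nor MPI, one end of a socket pair is made nonblocking and written to until write reports EAGAIN. The function name fill_until_would_block and the 4096-byte chunk size are choices made for this illustration, and the number of bytes accepted depends on the system.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write to one end of a socket pair that nobody reads
   until the kernel refuses more data. Returns the number of bytes that were
   buffered, or -1 on an unexpected error. */
long fill_until_would_block(void)
{
    int sv[2];
    char chunk[4096];
    long total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Make the sending end nonblocking, preserving its other flags. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(sv[0], chunk, sizeof chunk);
        if (n > 0) {
            total += n;                 /* data accepted into the buffer */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                      /* full: a blocking write would hang */
        } else if (n == -1 && errno == EINTR) {
            continue;                   /* interrupted by a signal, retry */
        } else {
            total = -1;                 /* unexpected error */
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

With a blocking socket the final write would never return, which is the position each process in Figure 1 is in once size exceeds the available buffering.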
Both PVM and MPI had portability as an original goal. As we have seen, MPI's very strict adherence to this principle prevented it from having some features desirable on workstation networks precisely because they could not be implemented in all environments. PVM, defined primarily by a single implementation for workstation networks, has more freedom to add features appropriate for that environment, but at the cost of making some PVM programs not portable to more restrictive environments.

Portability is an underappreciated issue. PVM is considered by many to be highly portable, and in fact the PVM group has done an excellent job in providing implementations across a wide range of platforms, covering most Unix systems and Windows [24]. But the designers of MPI had to consider running on systems that were neither; in fact, MPI has even been used in embedded systems (see http://www.mc.com). MPI could not assume that any particular operating system support was available; the design of MPI reflects this constraint. Some users have complained that MPI does not mandate support for certain Unix features, when in fact features such as standard input, process creation, and signals are absent in many important, non-Unix systems.

details so that implementations conforming to this standard can exchange messages. IMPI is now available [3] and several vendor implementations exist.

9 Beyond Message Passing

The evolution of parallel computing has taken us beyond simple message passing. One area that MPI-2 has developed is remote-memory operations. These operations support put, get, and accumulate operations in a "one-sided" manner. Maintaining MPI's commitment to heterogeneity, even these analogues of "store into array" are defined to operate in a heterogeneous environment. MPI uses MPI datatypes and a new MPI object, a "window" (MPI_Win), to provide this capability. Maintaining MPI's commitment to performance and scalability as well as adaptability to a wide range of environments, MPI-2 introduces a number of ways to synchronize access to the shared data areas, including support for the bulk synchronous programming (BSP) model. These functions have already been implemented by several vendors (HP, Fujitsu, and Cray). PVM provides no similar functionality.
Parallel I/O is another area where MPI-2 provides a rich
Support for heterogeneity is provided in both spec- set of performance-oriented operations. As with all MPI op-
ifications. PVM has separate functions to pack spe- erations, these support heterogeneous systems and allow the
cific data types into buffers; MPI uses basic and derived user to choose between forms optimized for a particular sys-
datatypes. The MPI specification does not mandate hetero- tem ("native") or for interoperation with other environments
geneous support, however; that is up to the implementation. and MPI implementations ("external32"). These facilities
LAM [2], CHimP [1], and MPICH [13] are implementa- are fully integrated with MPI's other functions. In PVM's
tions of MPI that can run on heterogeneous networks of case, while there are some projects such as PIOUS [22], no
workstations. integrated parallel I/O capability exists. This situation re-
Interoperability is outside the scope of the user program, flects the differences in the orientation of the two systems:
and entirely up to the implementation. Some vendor im- many of the parallel I/O functions are collective and are best
plementations of PVM are neither heterogeneous nor in- defined in terms of static groups, such as MPI defines. PVM
teroperable with the Oak Ridge version of PVM. The MPI eventually added static groups, but they are not as fully de-
standard does not mandate implementation details, and thus veloped as the groups in MPI, which has a comprehensive
MPI implementations, of which there are many, typically set of operations for manipulating and performing collective
are not interoperable. communication and computation using scalable algorithms.
Thus, "interoperability" of MPI matches that of PVM. MPI datatypes have also proved to be critical in obtaining
Versions of the same implementation (Oak Ridge PVM, high performance in I/O operations [28].
MPICH, or LAM) are interoperable. True interoperability PVM provides more support for fault tolerance and re-
is among completely different implementations, matched at covery by exposing to the programmer some of the prop-
the level of the wire protocol. erties of sockets. MPI does less, in the interest of greater
A separate effort (not part of the MPI Forum) has devel- portability. Fault tolerance in MPI is an important research
oped an "interoperability standard" called IMPI that pro- topic. The work on FT-MPI [6, 7] has shown what can be
vides sufficient standardization for some implementations done if one is willing to change some of the fundamental
semantics of the MPI specification.

10 Conclusion

In this paper we have focused on a few of the many differences between MPI and PVM. We have shown that the differences between MPI and PVM remain profound, despite some convergence. These differences can be accounted for if one bears in mind the two systems' quite different origins and goals.

Acknowledgments

This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. The authors also thank the referees for their valuable comments.

References

[1] R. Alasdair, A. Bruce, J. G. Mills, and A. G. Smith. CHIMP/MPI user guide. Technical Report EPCC-KTP-CHIMP-V2-USER 1.2, Edinburgh Parallel Computing Centre, June 1994.
[2] G. Burns, R. Daoud, and J. Vaigl. LAM: An open cluster environment for MPI. In J. W. Ross, editor, Proceedings of Supercomputing Symposium '94, pages 379-386. University of Toronto, 1994.
[3] I. S. Committee. IMPI - interoperable message-passing interface, 1998. http://impi.nist.gov/IMPI/.
[4] J. Cownie and W. Gropp. A standard interface for debugger access to message queue information in MPI. In J. Dongarra, E. Luque, and T. Margalef, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 1697 of Lecture Notes in Computer Science, pages 51-58. Springer Verlag, 1999.
[5] J. J. Dongarra, G. A. Geist, R. J. Manchek, and P. M. Papadopoulos. Adding context and static groups into PVM. http://www.epm.ornl.gov/pvm/context.ps, July 1995.
[6] G. Fagg and J. Dongarra. Fault-tolerant MPI: Supporting dynamic applications in a dynamic world. In J. Dongarra, P. Kacsuk, and N. Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, number 1908 in Springer Lecture Notes in Computer Science, pages 346-353, 2000. 7th European PVM/MPI Users' Group Meeting.
[7] G. E. Fagg, A. Bukovsky, and J. J. Dongarra. HARNESS and fault tolerant MPI. Parallel Computing, 27(11):1479-1495, Oct. 2001.
[8] R. D. Falgout, A. Skjellum, S. G. Smith, and C. H. Still. The multicomputer toolbox approach to concurrent BLAS and LACS. In J. Saltz, editor, Proceedings of the Scalable High Performance Computing Conference (SHPCC), pages 121-128. IEEE Press, April 1992. Also available as LLNL Technical Report UCRL-JC-109775.
[9] A. J. Ferrari and V. S. Sunderam. TPVM: Distributed concurrent computing with lightweight processes. In Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, August 2-4, 1995, Washington, DC, USA, pages 211-218. IEEE Computer Society Press, 1995.
[10] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, B. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine--A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge, Mass., 1994.
[11] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM 3 Users Guide and Reference manual. Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, May 1994.
[12] G. A. Geist, J. A. Kohl, and P. M. Papadopoulos. PVM and MPI: A comparison of features. Calculateurs Paralleles, 8(2), 1996.
[13] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI Message-Passing Interface standard. Parallel Computing, 22(6):789-828, 1996.
[14] W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd edition. MIT Press, Cambridge, MA, 1999.
[15] W. D. Gropp. Learning from the success of MPI. In B. Monien, V. K. Prasanna, and S. Vajapeyam, editors, High Performance Computing - HiPC 2001, number 2228 in Lecture Notes in Computer Science, pages 81-92. Springer, Dec. 2001. 8th International Conference.
[16] W. D. Gropp and E. Lusk. Why are PVM and MPI so different? In M. Bubak, J. Dongarra, and J. Waśniewski, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 1332 of Lecture Notes in Computer Science, pages 3-10. Springer Verlag, 1997. 4th European PVM/MPI Users' Group Meeting, Cracow, Poland, November 1997.
[17] J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In IEEE, editor, Proceedings of the 1994 Scalable Parallel Libraries Conference: October 12-14, 1994, Mississippi State University, Mississippi, pages 68-77, Silver Spring, Maryland, 1995. IEEE Computer Society Press.
[18] R. Hempel. The status of the MPI message-passing standard and its relation to PVM. In A. Bode, J. Dongarra, T. Ludwig, and V. Sunderam, editors, Parallel Virtual Machine, EuroPVM '96: Third European PVM Conference, Munich, Germany, October 7-9, 1996: proceedings, volume 1156 of Lecture Notes in Computer Science, pages 14-21. Springer-Verlag, 1996.
[19] Message Passing Interface Forum. MPI: A Message-Passing Interface standard. International Journal of Supercomputer Applications, 8(3/4):165-414, 1994.
[20] Message Passing Interface Forum. MPI2: A Message Passing Interface standard. International Journal of High Performance Computing Applications, 12(1-2):1-299, 1998.
[21] K. Morimoto, T. Matsumoto, and K. Hiraki. Implementing MPI with the memory-based communication facilities on the SSS-CORE operating system. In V. Alexandrov and J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 1497 of Lecture Notes in Computer Science, pages 223-230. Springer, 1998.
[22] S. A. Moyer and V. S. Sunderam. PIOUS: A scalable parallel I/O system for distributed computing environments. In Proceedings of the Scalable High-Performance Computing Conference, pages 71-78, 1994.
[23] W. Saphir. Devil's advocate: Reasons not to use PVM, May 1994. PVM User Group Meeting.
[24] S. L. Scott, M. Fischer, and A. Geist. PVM on windows and NT clusters. In V. Alexandrov and J. Dongarra, editors, Recent advances in Parallel Virtual Machine and Message Passing Interface, volume 1497 of Lecture Notes in Computer Science, pages 231-238. Springer, 1998.
[25] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI--The Complete Reference: Volume 1, The MPI Core, 2nd edition. MIT Press, Cambridge, MA, 1998.
[26] T. Sterling, editor. Beowulf Cluster Computing with Linux. MIT Press, 2002.
[27] T. Sterling, editor. Beowulf Cluster Computing with Windows. MIT Press, 2002.
[28] R. Thakur, E. Lusk, and W. Gropp. A case for using MPI's derived datatypes to improve I/O performance. In Proceedings of SC98: High Performance Networking and Computing, Nov. 1998.
[29] Web page: Introduction to the TotalView debugger. http://www.dolphinics.com/tw/tv/totalview.html.
[30] D. Treumann. Personal communication, 1998.