Michael Golm, Meik Felser,
Christian Wawersich, Jürgen Kleinöder
A Java Operating System
as the Foundation of a
Secure Network Operating System
Technical Report TR-I4-02-05 · August 2002
Department of Computer Sciences 4
Distributed Systems and Operating Systems
Friedrich-Alexander-Universität
Erlangen-Nürnberg
Univ. Erlangen-Nürnberg · Informatik 4 · Martensstr. 1 · 91058 Erlangen · Germany
Phone: +49.9131.85.27277 · Fax: +49.9131.85.28732
E-Mail: i4@informatik.uni-erlangen.de · URL: http://www4.informatik.uni-erlangen.de
TECHNISCHE FAKULTÄT (Faculty of Engineering Sciences)
University of Erlangen-Nuremberg
Dept. of Computer Science 4 (Distributed Systems and Operating Systems)
Martensstr. 1, 91058 Erlangen, Germany
{golm, felser, wawersich, kleinoeder}@informatik.uni-erlangen.de
Abstract

Errors in the design and implementation of operating system kernels and system programs lead to security problems that very often cause a complete breakdown of all security mechanisms of the system.

We present the architecture of the JX operating system, which avoids two categories of these errors. First, there are implementation errors, such as buffer overflows, dangling pointers, and memory leaks, caused by the use of unsafe languages. We eliminate these errors by using Java--a type-safe language with automatic memory management--for the implementation of the complete operating system. Second, there are architectural errors caused by complex system architectures, poorly understood interdependencies between system components, and minimal modularization. JX addresses these errors by following well-known principles, such as least-privilege and separation-of-privilege, and by using a minimal security kernel, which, for example, excludes the filesystem.

Java security problems, such as the huge trusted class library and reliance on stack inspection, are avoided. Code of different trustworthiness or code that belongs to different principals is separated into isolated domains. These domains represent independent virtual machines. Sharing of information or resources between domains can be completely controlled by the security kernel.

1 Introduction

There are two categories of errors that make current systems so vulnerable. The first are implementation errors, such as buffer overflows, dangling pointers, and memory leaks, which are caused by the prevalent use of unsafe languages in current systems. This becomes dangerous when an OS relies on a large number of trusted programs. Of the top ten CERT notes (as of January 2002) with the highest vulnerability potential, six are buffer overflows [4], [5], [6], [7], [8], [9], two relate to errors in checking user-supplied strings that contain commands, thus allowing the user to run arbitrary commands with root privilege [10], [11], one executes commands in emails [12], and one is an integer overflow [13]. The six buffer overflow vulnerabilities could have been avoided by using techniques described by Cowan et al. [15]. However, not all overflow attacks can be detected, and the authors recommend the use of a type-safe language.

An argument that is often raised against type-safe systems and software protection is that the compiler must be trusted. We think that this is not a very strong argument, for the following three reasons. (i) Traditional systems, such as Unix, also use compilers to compile trusted components, like the kernel and system programs. Security in such a system relies on the assumption that the C compiler contains no bugs or trojan horses [61]. (ii) Only the compiler backend that translates the type-safe instruction set into the instruction set of the processor and the verifier that guarantees type-safety must be trusted. (iii) The additional effort that must be put into the verification of two components--the compiler backend and the verifier--pays off with reduced verification effort for many trusted system programs. Most vulnerabilities in current systems are caused not by bugs in the kernel but by bugs in system programs.

The second category of errors--the architectural errors--is more difficult to tackle. The three CERT notes related to the execution of commands in strings and emails are critical because the vulnerable systems violate the principle of least-privilege [52]. Thus, in current mainstream systems the question is not whether the proper security policy is used, but whether security can be enforced at all [39]. Violations of the principle of least-privilege, an uncontrolled accumulation of functionality, many implementation errors, complex system architectures, and poorly understood interrelations between system components make current systems very vulnerable. This is a problem that affects all applications, because applications are built on top of an operating system and can be only as secure as its trusted programs and the underlying kernel.
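As a small illustration of the first error category named in this section (implementation errors such as buffer overflows), the following sketch shows how a type-safe language turns a write past the end of a buffer into a checked runtime error instead of silent memory corruption. It is only a minimal example under stated assumptions: the class BoundsCheckDemo and the method copyUnchecked are hypothetical names chosen for this illustration and are not part of JX.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the JX system.
final class BoundsCheckDemo {

    /**
     * Copies input into a fixed-size buffer without checking the length first,
     * mimicking the classic mistake behind buffer overflows in C programs.
     * Returns true if the whole copy completed, false if the runtime rejected
     * an out-of-bounds write.
     */
    static boolean copyUnchecked(byte[] buffer, byte[] input) {
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // every array access is bounds-checked by the JVM
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The write past the end of the buffer never took place,
            // so no other object on the heap was modified.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[4];
        byte[] other = {42, 42, 42, 42};            // an unrelated object on the same heap
        byte[] oversized = {1, 2, 3, 4, 5, 6, 7, 8}; // twice as long as the buffer

        boolean completed = copyUnchecked(buffer, oversized);

        System.out.println("copy completed: " + completed);
        System.out.println("buffer: " + Arrays.toString(buffer));
        System.out.println("other:  " + Arrays.toString(other));
    }
}
```

In this sketch the copy stops at the first out-of-bounds index: the buffer holds the first four input bytes and the unrelated array is unchanged. The program still contains a bug, but the bug surfaces as an exception rather than as a way to overwrite neighbouring data.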
As it will never be possible to develop software of moderate complexity that is free of errors, one must assume that every complex application contains security-critical errors. The realization that these errors cannot be avoided in current systems led to the proliferation of firewalls, which are responsible for shielding potentially vulnerable systems from potentially dangerous traffic. Application developers and deployers react to the restrictions of a firewall by tunneling traffic over open ports, for example the HTTP port 80. The security community reacts by building traffic analyzers that analyze the TCP stream and the protocols above TCP and HTTP. As it becomes more and more expensive to cure the symptoms, it becomes more attractive to fix the deeper underlying causes of the security problems.

It is well understood that the unsafe nature of the languages C and C++ is the reason for many of today's security problems. There are several projects that try to develop a safe dialect of C. One of these projects created a safe dialect of C called Cyclone-C [34]. Although Cyclone-C looks similar to C, it is not possible to recompile an existing non-trivial C program, such as an OS kernel, without changes. Using Java instead of Cyclone-C means that it is more difficult to port C programs, but it allows running the large number of existing Java programs without modifications. Furthermore, Cyclone-C programs have a performance overhead similar to that of Java programs.

There is still the problem that basing the protection on type-safety ties the system to a certain language and type system. In practice this does not seem to be a problem. Although the Java bytecode was not designed as the target instruction set for languages other than Java, there is a large number of languages that can be compiled to Java bytecode. Examples are Python [50], Eiffel [48], Tcl [35], Scheme [42], Prolog [20], Smalltalk [56], ADA95 [26], and Cobol [47].

Java allows developing applications using a modern object-oriented style, emphasizing abstraction and reusability. On the other hand, many security problems have been detected in Java systems in the past [18]. The main contribution of this paper is an architecture for a secure Java operating system that avoids these problems and a discussion of its implementation and performance.

We follow Rushby [51] in his reasoning that a secure system should be structured as if it were a distributed system. With such an architecture a security problem in one part of the system does not automatically lead to a collapse of the whole system's security. Microkernels are well suited as the foundation of such a system. Especially systems that adhere to the multi-server approach, such as SawMill [28], and mediate communication between the servers [33] are able to limit the effect of security violations.

The JX system combines the advantages of a multi-server structure with the advantages of type-safety. It uses type-safety to provide an efficient communication mechanism that completely isolates the servers with respect to data access and resource usage.

The paper is structured as follows. Section 2 gives an overview of Java security and analyzes some weaknesses of the Java security mechanism. Section 3 describes the architecture of JX with the focus on the security architecture. Section 4 describes the performance of the system as a web server. Section 4 discusses how the system meets the requirements of a security architecture. Section 5 describes related work and Section 6 concludes the paper.

2 Java Security

Java security is based on the concept of a sandbox, which relies on the type-safety of the executed code. Untrusted but verified code can run in the sandbox and cannot leave the sandbox to do any harm. Every sandbox must have a kind of exit or hole, otherwise the code running in the sandbox cannot communicate results or interact with the environment in a suitable way. These holes must be clearly defined and thoroughly controlled. The holes of the Java sandbox are the native methods. To control these holes, the Java runtime first controls which classes are allowed to load a library that contains native code. These classes must be trusted to guard access to their native methods. The native methods of these classes should be non-public and the public non-native methods are expected to invoke the SecurityManager before invoking a native method. The SecurityManager inspects the runtime call stack and checks whether the caller of the trusted method is trusted.

Java version 1 distinguishes between trusted system classes, which were loaded using the JVM internal class loading mechanism, and untrusted classes, which were loaded using a class loader external to the JVM. Implementations of the SecurityManager can check whether the classes on the call stack--the callers of the method--are trusted or untrusted classes. When the caller is a system class, the operation is usually allowed; otherwise the SecurityManager decides, depending on the kind of operation and its parameters, whether the untrusted class is allowed to invoke the operation (see footnote 1).

1. The real implementation uses the abstraction of classloader-depth, which is the number of stack frames between the current stack frame and the first stack frame connected to a class that was loaded using a classloader.

Java version 2 also relies on stack inspection but can define more flexible security policies by describing the permissions of classes of a certain origin in external files.

To sum up, Java security relies on the following requirements:
(1) Code is kept in a sandbox by using an intermediate instruction set. Programs are verified to be type-safe.
(2) The package-specific and/or class-specific access modifiers must be used to restrict access to the holes of this sandbox: the native methods of trusted classes. As long as the demarcation line between Java code and native code is not crossed, the Java code can do no harm.
(3) The publicly accessible methods of the trusted classes must invoke the SecurityManager to check whether an operation that would leave the sandbox is allowed.

The SecurityManager is similar to a reference monitor, but has a severe shortcoming: it is not automatically invoked. A trusted class must explicitly invoke the SecurityManager to protect itself. The sheer number of native methods makes it difficult to assure this. We counted 1312 native methods in Sun's JRE 1.3.1_02 for Linux, which are 2.9 percent of all methods. Of these native methods, 34 percent are public and as many as 16 percent are public static methods in a public class. This means that such a method can be invoked directly from everywhere without the SecurityManager having a chance to intercept the call. Two of these methods are java.lang.System.currentTimeMillis() and java.lang.Thread.sleep(), which provide an interesting opportunity to create a covert timing channel. The fact that covert channels are not exploited can be attributed to the existence of many overt channels. Public, non-final, static variables in public system classes are only one example (we counted 31 of these fields in Sun's JRE).

A further problem is that the stack inspection mechanism is concerned only with access control. It completely ignores the availability aspect of security. This shortcoming was addressed in JRes [17]. By rewriting bytecodes, JRes creates a layer on top of the JVM. In our opinion, this is the wrong layer for resource control, because resources that are only visible inside the JVM can only be accounted inside the JVM. Examples are CPU time and memory used for the garbage collector (GC) or just-in-time compiler, or memory used for stack frames. Furthermore, rewriting bytecodes is a performance overhead in itself and it creates slower programs. Often, Java is perceived as inherently insecure due to the complexity of its class libraries and runtime system [22]. As will be described in Section 3, JX avoids this problem by not trusting the JDK class library.

3 JX Security Architecture

This section describes the aspects of the JX architecture that are relevant to security.

3.1 JX architecture

JX is a single address space system. All code runs in one physical address space; an MMU is not used. Protection is based on the type-safety of the Java bytecode instruction set. A small microkernel contains low-level hardware initialization code and a minimal Java Virtual Machine (JVM).

The JX system is structured into domains (see Figure 1). Each domain represents the illusion of an independent JVM. A domain has a unique ID, its own heap including its own garbage collector, and its own threads. Thus domains are isolated with respect to CPU and memory consumption. They can be terminated independently from each other, and the memory that is reserved for the heap, the stack and domain control structures can be released immediately when the domain is terminated.

All domains execute 100% Java code. The microkernel also presents itself as a domain. Because this domain has the ID 0 it is called DomainZero. DomainZero contains all C and assembler code that is used in the system.

JX does not support native methods and there is no trusted Java code that must be loaded into a domain. There is no trust boundary within a domain, which eases administration and allows a domain complete freedom in what code it runs. Because the domain contains no trusted code, it is a sandbox that is completely closed. We create a new hole by introducing capabilities, called portals.

Portals are proxies [55] for a service that runs in another domain. Portals look like ordinary objects and are located on a domain's heap, but the invocation of a method synchronously transfers control to the service that runs in another domain. Parameters are copied from the client to the server domain.

Portals and services cannot be created explicitly by the programmer. They "magically" appear during portal communication. When a domain wants to provide a service it can define a portal interface, which must be a subinterface of jx.zero.Portal, and a class that implements this interface. When an instance of such a class is passed to another domain, the portal invocation mechanism creates a service in the source domain and a portal in the destination domain.

Figure 1: Structure of the JX system (labels: Domain A and Domain B with Components, Classes, Heap, Portals, Objects, Threads, Java-Stacks, Thread Control Blocks; DomainZero (Microkernel) with C Code, Assembler, Threads, Stacks, Thread Control Blocks)

This architecture has a bootstrap problem: A domain can
obtain new portals solely by using existing portals. Therefore each domain possesses an initial portal: a portal to a naming service. Using this portal the domain can obtain other portals to access more services. When a domain is created, the creating domain can pass the naming portal as a parameter of the domain creation call. When no naming portal is specified in the createDomain call (see footnote 2), the default naming portal of the creating domain is passed to the created domain. The naming service of the microkernel is used only by the initial domain (DomainInit), which implements a naming service in Java and passes this naming service to all domains it creates. Because DomainInit looks up all portals from the microkernel on startup, no interaction with the microkernel naming service by any domain is needed after DomainInit has completed its initialization.

2. createDomain is a method of the DomainManager service which runs in DomainZero.

The implementation of the portal mechanism had to fulfil the following requirements:
· It must not be possible to explicitly create a portal object.
· It must be possible to terminate a domain and release all its resources independent of its current communication relationships.
· As services are created by the microkernel they must also be automatically removed when they are no longer needed. The data structures necessary to control a service must be placed on the domain's heap and a garbage collector must be able to move them.

With the following implementation all these requirements are met. A service is represented by a service control block (SCB) that is stored on the server domain's heap. The SCB has a reference to the object that contains the implementation of the portal methods, a thread that is used to execute the methods, and a queue of waiting senders (Figure 2).

Figure 2: Portal data structures (labels: Client Domain with Portal holding DomainID and ServiceID; Server Domain with Service Table, Service Control Block, Sender Queue, Thread Control Block, Object)

A portal contains no direct pointer to the Service Control Block (SCB) because the SCB is stored on the heap and can be moved by the garbage collector. Using direct pointers would require updating all portals to a service during a GC cycle of the service domain. This would require a scan of the heaps of all domains, which does not scale well. Therefore a portal contains the index of the service in a domain-local service table, a pointer to the Domain Control Block (DCB) and a domain ID. DCBs are one of the few global data structures of JX. Because the DCB of a domain is reused when a domain terminates and portals can outlive the domain in which the service is located, the DCB pointer could point to a DCB that contains information not about the terminated service domain but about a newly created domain. Therefore the portal also contains a unique domain ID, which is checked against the ID in the DCB before the DCB is used.

Although the portal is located on the heap of the client domain, the Java code has no way to access its contents. The type of the portal reference is the jx.zero.Portal interface, which, as an interface, has no fields. Thus it is not possible to forge a portal to access an arbitrary service.

Services are removed automatically when no portal to the service exists. To detect this condition the SCB contains a reference counter that counts the number of portals to the service. When a portal is passed to another domain, a portal to the same service is created in the other domain and the reference counter is incremented. When a portal is garbage collected, the finalization cycle decrements the reference counter of the service. When a domain terminates, all portals can be considered garbage and a finalization cycle is performed before the heap memory is released.

3.2 JX as a capability system

Portals are capabilities [19]. A domain can only access other domains when it possesses a portal to a service of the other domain. The operations that can be performed with the portal are listed in the portal interface.

Although the capability concept is very flexible and solves many security problems, such as the confused deputy [30], in a very natural way, it has well-known limitations. The major concern is that a capability can be used to obtain other capabilities, which makes it difficult, if not impossible, to enforce confinement [62]. JX as described up to now cannot enforce confinement. Thus an additional mechanism is needed: a reference monitor that is able to check all portal invocations and the transfer of portals between domains.

3.3 The reference monitor

A reference monitor must be tamper-proof, mediate all accesses, and be small enough to be verified.

A reference monitor for JX must at least control incoming and outgoing portal calls. There are two alternatives for the implementation of such a reference monitor:

Proxy. Initially a domain has access only to the naming portal that is passed during domain creation. To obtain other portals the name service is used. The parent domain can
implement this name service so that it returns not the registered portal but a proxy portal which implements the same interface. This proxy can then invoke a central reference monitor before invoking the original portal.

Microkernel. The portal invocation mechanism inside the microkernel invokes a reference monitor on each portal call and passes sender principal, receiver principal, and call parameters to the reference monitor.

These two implementation alternatives have the following advantages and drawbacks. The proxy solution needs no modification of the microkernel and thus avoids the danger of introducing new bugs. As long as no reference monitoring is needed, the proxy solution does not cause any additional cost. The microkernel solution must check in every portal invocation sequence whether a reference monitor is attached to the domain. Because the domain control block, which contains this information, is already in the cache during the portal invocation, this check is nearly free. On the other hand, the proxy solution requires the name service to create a proxy for each registered portal. During a method invocation at such a portal the whole parameter graph must be traversed, and when a portal is found it must be replaced by a proxy portal.

We rejected the proxy approach, because it requires a rather complex implementation and it is difficult to assure that each portal is "encapsulated" in a proxy portal.

We modified the microkernel to invoke the reference monitor when a portal call invokes a service of the monitored domain (inbound) and when a service of another domain is invoked via a portal (outbound). The internal activity of a domain is not controlled. The same reference monitor must control inbound and outbound calls of a domain, but different domains can use different monitors. A monitor is attached to a domain when the domain is created. When a domain creates a new domain, the reference monitor of the creating domain is asked to attach a reference monitor to the created domain. Usually, it will attach itself to the new domain but, depending on the security policy, it can attach another reference monitor or no reference monitor at all.

It must be guaranteed that, while the access check is performed, the state to be checked can only be modified by the reference monitor. When this state only includes the parameters of the call, these parameters could be copied to a location that is only accessible by the reference monitor. When the state includes other properties of the involved domains, the activity of these domains must be suspended. For these reasons the access check is performed in a separate domain, not in the caller or callee domain.

The list of parameters is accessed using an array of VMObject portals. VMObject is a portal which allows access to an object of another domain. The reference monitor furthermore gets the Domain portal of the caller domain and the callee domain. To accelerate the operation of the reference monitor, the Domain portal is a portal which can be inlined by the translator. On an x86 it takes only two machine instructions to get the domain ID given the Domain portal.

The main problem is to obtain a consistent view of the system during the check. One way is to freeze the whole system by disabling interrupts during the check. This would work only on a uniprocessor, would interfere with scheduling, and would allow a denial-of-service attack. Therefore, our current implementation copies all parameters from the client domain to the server domain up to a certain per-call quota. These objects are not immediately available to the server domain, but are first checked by the security manager. When the security manager approves the call, the normal portal invocation sequence proceeds.

3.4 Making an access decision

Spencer et al. [58] argue that basing an access decision only on the intercepted IPC between servers forces the security server to duplicate part of the object server's state or functionality. We found two examples of this problem. In UNIX-like systems access to files in a file system is checked when the file is opened. The security manager must analyze the file name to make the access decision, which is difficult without knowing details of the file system implementation and without information that is only accessible to the file system implementation. The problem is even more obvious in a database system that is accessed using SQL statements. To make an access decision the reference monitor must parse the SQL statement. This is inefficient and duplicates functionality of the database server.

There are three solutions for these problems:
(1) The reference monitor lets the server proceed and only checks the returned portal (the file portal).
(2) The server explicitly communicates with the security manager when an access decision is needed.
(3) Design a different interface that simplifies the access decision.

Approach (1) may be too late, especially in cases where the call modified the state of the server.

Approach (2) is the most flexible solution. It is used in Flask with the intention of separating security policy and enforcement mechanism [58]. The main problem of this solution is that it pollutes the server implementation with calls to the security manager. The Flask security architecture was implemented in SELinux [40]. In SELinux, the list of permissions for file and directory objects has a nearly one-to-one correspondence to an interface one would use
for these objects. This makes approach (3) the most promising approach. Our two example problems would be solved by parsing the path in the client domain. In an analogous manner the SQL parser is located in the client domain and a parsed representation is passed to the server domain and intercepted by the security manager. This has the additional advantage of moving code to an untrusted client, eliminating the need to verify this code. Section 3.11 gives further details about the design of the file server interface.

3.5 Controlling portal propagation

In [36] Lampson envisioned a system in which the client can determine all communication channels that are available to the server before talking to the server. We can do this by enumerating all portals that are owned by a domain. As we cannot force a domain to be memoryless [36], we must also control the future communication behavior of a domain to guarantee the confinement of information passed to the domain.

Several alternative implementations can be used to enumerate the portals of a domain:
(1) A simple approach is to scan the complete heap of the domain for portal objects. Besides the expensive scanning operation, the security manager cannot be sure that the domain will not obtain portals in the future.
(2) An outbound interceptor can be installed to observe all outgoing communication of the domain. Thus a domain is allowed to possess a critical portal but the reference monitor can reject its use. The performance disadvantage is that the complete communication must be checked, even if the security policy allows unrestricted communication with a subset of all domains.
(3) The security manager checks all portals transferred to a domain. This can be achieved by installing an inbound interceptor which inspects all data given to a domain and traverses the parameter object graph to find portals. This could be an expensive operation if a parameter object is the root of a large object graph. During copying of the parameters to the destination domain, the microkernel already traverses the whole object graph. Therefore it is easy to find portals during this copying operation. The kernel can then inform the security manager that a portal is being passed to the domain (method createPortal()). The return value of createPortal() decides whether the portal can be created or not. The security manager must also be informed if the garbage collector destroys a portal (destroyPortal()). This way the reference monitor can keep track of what portals a domain actually possesses.

Confinement can now be guaranteed with two mechanisms that can be used separately or in combination: (i) the control of portal communication and (ii) the control of portal propagation.

Figure 3 shows the complete reference monitor interface. Figure 4 shows the information that is available to the reference monitor.

    public interface DomainBorder {
        boolean outBound(InterceptInfo info);
        boolean inBound(InterceptInfo info);
        boolean createPortal(PortalInfo info);
        void destroyPortal(PortalInfo info);
    }

Figure 3: Reference monitor interface

    public interface InterceptInfo extends Portal {
        Domain getSourceDomain();
        Domain getTargetDomain();
        VMMethod getMethod();
        VMObject getServiceObject();
        VMObject[] getParameters();
    }

    public interface PortalInfo extends Portal {
        Domain getTargetDomain();
        int getServiceID();
    }

Figure 4: Information interfaces

3.6 Principals

A security policy uses the concept of a principal [19] to name the subject that is responsible for an operation. The principal concept is not known to the JX microkernel. It is an abstraction that is implemented by the security system outside the microkernel, while the microkernel only operates with domains. Mapping a domain ID to a principal is the responsibility of the security manager. We implemented a security manager which uses a hash table to map the domain ID to the principal object. We first considered an implementation where the microkernel supports the attachment of a principal object to a domain. The biggest problem of such support would be the placement of the principal object. Should the object live in the domain it is attached to or in the security manager domain? Both approaches have severe problems. As the security manager must access the object, it should be placed in the security manager's heap. But this creates domain interdependencies, and the independence of heap management and garbage collection, which is an important property of the JX architecture, would be lost. Thus, a numerical principal ID seemed to be the only solution. But having a principal ID has no advantages over having a domain ID, so finally we concluded that the microkernel should not care about principals at all.

The security manager maps the unique domain ID to a principal object. Once the principal is known, the security manager can use several policies for the access decision, for example based on a simple identity or based on roles [24].

To service a portal call the server thread may itself invoke portals into other domains. To avoid several problems (trojan horse, confused deputy [30]) the server may want to downgrade the rights of these invocations to the rights of the original client. The most elegant solution to these problems is a pure capability architecture. In the JX architecture this would mean that the server uses only portals that were obtained from that particular client. This requirement is difficult to assure in a multi-threaded server domain that processes requests from different clients at the same time. Because the server threads use the same heap, a portal may leak from one server thread to another. A better solution is to allow the reference monitor to downgrade the rights of a call. To allow the reference monitor to enforce downgrading rights to the rights of the invoker, each service thread (a thread that processes a portal call) has the domain ID of the original client attached to it. This information is passed during each portal invocation. The reference monitor has access to this information and can base the access decision on the principal of the original domain, instead of the

outside the runtime system, the runtime system must know about their existence or even know part of their internal structure (fields and methods). These structural requirements are checked by the verifier.

The class Object is the base class of all classes and interfaces. It contains methods to use the object as a condition variable, etc. In JX Object is implemented by the runtime system. The class String is used for strings. Because String is used inside the runtime system, it is required that the String class exists in a domain and that the first field is a character array. The runtime system needs to throw several exceptions, such as ArrayIndexOutOfBoundsException, NullPointerException, OutOfMemoryError, StackOverflowError. It is required that these classes and their superclasses RuntimeException, Exception and Throwable exist in a domain. There are no structural requirements for these classes. Arrays are type-compatible with the interfaces Cloneable and Serializable. These interfaces also must exist in a domain.

Classes are represented by the portal jx.zero.VMClass. But because Object contains a method getClass(), it is required that java.lang.Class exists and contains a constructor which has one parameter of type VMClass.

3.9 Structure of the Trusted Computing Base
principal of the immediate client. Figure 5 shows the structure of the trusted computing
base (TCB). In the TCB we include all system components
that the user trusts to perform a certain operation correctly.
3.7 Revocation of memory objects
The central part of the system is the integrity kernel. Com-
There is a special kind of portals, called fast portals. Fast promising the integrity kernel allows an intruder to crash the
portals can only be created by DomainZero. They are exe- whole system. Built on the integrity kernel is the security
cuted in the context of the caller. The semantics of a fast kernel. The security kernel represents the minimal TCB. In
portal is known to the system and it's methods can be a typical system configuration the TCB will include the
inlined by the translator. An example for a fast portal is the window manager and the file system. Users will trust the file
Memory portal. We solved the confinement problems of system to store their data reliably. Compromising the secu-
capabilities by introducing a reference monitor that is rity kernel or the rest of the TCB leads to security breaches,
invoked when a portal is used. This is not practical with such as disclosure of confidential data or unauthorized mod-
memory portals for performance reasons, although it could ification of data, but not to an immediate system crash. It
be done. Therefore memory portals support revocation. may lead to a system crash when a compromised security
When the reference monitor detects that a portal is passed kernel allows access to the integrity kernel. This design is
between two domains (createPortal()) it could revoke the reminiscent of the protection rings of Multics.
access right to the memory object for the source domain or JX is a component-based system. A component consists
reject passing of the memory portal. of a number of classes and a file that describes the compo-
nent. This file also contains the information on what other
3.8 Minimizing the JDK class library components the component depends on. The modulariza-
tion and explicit dependencies allows to remove unneces-
The JVM and the class library of the Java Development sary functionality with a few configuration changes. For
Kit (JDK) can not easily be separated from each other. example in a server system the window manager may not be
In JX the JDK is not part of the trusted computing base part of the TCB, while in a thin client system the file system
(TCB). However, there are some classes, whose definition is may not be needed. A user may even decide not to trust the
very tightly integrated with the JVM specification [38][29]. file system and store the data in an own data base.
Although these classes (except Object) are implemented
7
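To illustrate how explicit component dependencies make the TCB configurable, the following minimal sketch computes the set of components that a configuration actually needs by following the declared dependencies transitively. The class name, the map-based representation, and the dependency edges in the example are assumptions made for illustration; the sketch does not reflect the actual format of the JX component description file.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of JX: computes which components a
// configuration needs, given explicitly declared dependencies.
final class ComponentClosure {
    // Returns the roots plus everything they depend on, directly or indirectly.
    static Set<String> required(Map<String, List<String>> dependsOn, List<String> roots) {
        Set<String> needed = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String component = work.pop();
            if (needed.add(component)) {
                // First visit: schedule the declared dependencies.
                work.addAll(dependsOn.getOrDefault(component, List.of()));
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // Illustrative dependency edges only (assumed, not taken from JX).
        Map<String, List<String>> deps = Map.of(
            "window_manager", List.of("zero"),
            "fs_javafs", List.of("fs", "bio"),
            "fs", List.of("zero"),
            "bio", List.of("zero"));
        // A server configuration that does not name the window manager as a root.
        System.out.println(required(deps, List.of("fs_javafs")));
    }
}
```

Any component that is not reachable from the chosen roots, such as the window manager in this example, can be left out of the configuration and therefore out of the TCB.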
It is important that the inner kernels do not depend on the outer ones. The security manager, for example, must not store its configuration in the file system but use its own simple file system.
Tamper-resistant auditing. The system must assure that all security-relevant events are persistently stored on disk and cannot be modified. To be certain that the audit trail is tamper-proof we use a separate disk and write to this disk in an append-only mode. We do not use a file system at all but write the messages unbuffered to consecutive disk sectors; the audit call only returns when the block has been written to disk. Writing to consecutive disk sectors avoids long-distance head movements and gives a rate of 630 audit messages per second (see footnote 3). Writing one audit message needs 1582 µs. Given that a file access that can be satisfied from the buffer cache takes on the order of tens of µs, auditing each file access adds considerable overhead. The size of a typical audit message is between 35 and 40 bytes. The disk is used as a ring buffer: when the last sector is reached we wrap to the first one and overwrite old logs. This avoids a problem often encountered when logging to a file system: when the file system is full, logs get lost. Usually, the most recent logs are the most valuable. With the above-mentioned message rate of 630 messages/second and a message size of 40 bytes we have a time window of 110 hours using a 10 GByte disk. Under normal operation the time window is much larger, because the message rate is well below its maximum.
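The time window quoted above follows from the disk size, the message size, and the maximum message rate. The sketch below reproduces that arithmetic and shows the wrap-around of a ring-buffer write position. The class and method names are hypothetical, and 10 GBytes is taken to mean 10^10 bytes, which is an assumption.

```java
// Hypothetical helper that reproduces the audit-window arithmetic.
final class AuditWindow {
    // Hours until the ring buffer wraps when messages arrive at the given rate.
    static double hoursUntilWrap(long diskBytes, int messageBytes, int messagesPerSecond) {
        double messagesOnDisk = (double) diskBytes / messageBytes;
        double seconds = messagesOnDisk / messagesPerSecond;
        return seconds / 3600.0;
    }

    // Next write position in a ring of sectorCount sectors; wraps to 0 at the end.
    static long nextSector(long current, long sectorCount) {
        return (current + 1) % sectorCount;
    }

    public static void main(String[] args) {
        // Assumption: 10 GBytes = 10^10 bytes; 40-byte messages; 630 messages/s.
        double hours = hoursUntilWrap(10_000_000_000L, 40, 630);
        System.out.printf("time window: %.0f hours%n", hours);
    }
}
```

With these inputs the disk holds 250 million messages, which at 630 messages per second corresponds to roughly 110 hours. The calculation counts 40 bytes of disk space per message, as the figures in the text do.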
Trusted path. According to the Orange Book [21] a trusted path is the path from the user to the TCB. Depending on the user interface the TCB must include the window manager or the console driver.
Recent literature generalizes the notion of a trusted path to any communication mechanism within the system. To trust a communication path it is essential to identify the communication partner and provide a communication channel that cannot be overheard or modified. Portal communication is such a mechanism.
Usually, the reference monitor limits communication according to a certain security policy. This mechanism works automatically and is transparent to domains. But it is even possible for a domain to explicitly consider portal communication as being performed on a trusted path, because the target domain of a portal can be obtained and this identity cannot be spoofed.
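As a minimal sketch of the explicit check described above, a domain could compare the target domain of a portal against a set of domains it trusts before treating the portal as a trusted path. The interface PortalRef, its accessor targetDomainId(), and the domain IDs are hypothetical stand-ins; the actual JX call for obtaining the target domain is not shown in this report.

```java
import java.util.Set;

// Hypothetical stand-in for a portal; not the JX portal API.
interface PortalRef {
    int targetDomainId(); // assumed accessor for the unforgeable target identity
}

final class TrustedPathCheck {
    // Treat the portal as a trusted path only if its target is a trusted domain.
    static boolean isTrustedPath(PortalRef portal, Set<Integer> trustedDomainIds) {
        return trustedDomainIds.contains(portal.targetDomainId());
    }

    public static void main(String[] args) {
        PortalRef toAuthService = () -> 3; // assumed domain ID of a trusted service
        System.out.println(isTrustedPath(toAuthService, Set.of(3, 4)));
    }
}
```

The check is only meaningful because the identity is supplied by the system and not by the communication partner.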
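[Figure 5: Typical TCB structure. Components shown: User Application, Window Manager, File Server, Keyboard and Mouse Driver, Authentication System, Access/Execute Decision, Principal Management, Audit, Central Security Manager, Domain Starter, Program Loader, Component Repository, Verifier & Translator, BlockIO Disk Driver, JX Microkernel, Hardware. Regions: TCB, Security Kernel, Integrity Kernel. Interaction labels: open window, read file, get permissions, ask user, trusted path, start domain, read/write sector. Legend: interception, domain, portal call.]
3. The following hardware was used for all measurements in this paper: Intel PIII 500 MHz, 512 KB cache, 640 MB RAM, 440BX chipset, 82371AB PIIX4 IDE, Maxtor 91303D6 disk.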
3.10 Maintaining security in a dynamic system
An operating system is a highly dynamic system. New users log in, services are started and terminate, rights of users change, etc. To maintain security in such a system, the initial system state must be secure and each state transition must transfer the system into a secure state.
There are two issues to be considered here: the system issue and the security policy issue.
Regarding the system issue, it must be guaranteed that trusted software is not tampered with and untrusted software runs in a restricted environment. The system starts with a secure boot process. Provided that no attacker has physical access to the hardware, booting from a tamper-proof device, such as a CD-ROM, is sufficient and we do not need a secure boot process as in AEGIS [2] that checks for hardware modifications. We trust the initial domain to correctly start the security services and to attach them to the created domains. Each domain is started with a strictly defined set of rights (portals) and no trusted code. The initial portals always include a naming portal with which other portals can be obtained. To avoid the expensive nameserver lookup it is possible to pass a set of additional portals to a newly created domain. The created domain is automatically associated with a principal. When a domain obtains new portals or communicates using existing portals the security system is involved.
The policy issue is concerned with secure changes of the access rights, additions of principals, etc. How this is done depends on the security policy in use and is outside the scope of this paper.
3.11 Securing servers
We use the file system server to illustrate how our security architecture works in a real system. As we discussed in Section 3.4 we use the server interface to make access decisions. For this to work servers must export securable interfaces. A securable interface must use simple parameters and provide fine-grained simple operations.
Many servers have a built-in notion of permissions, for example the user/group/other permissions in a UNIX file system. We call them native permissions. These permissions can be supplemented or replaced by a set of foreign permissions. These permissions could, for example, be access control lists. Because foreign permissions are not supported by the server, there must be a way to store them. The SELinux system [40] uses a file hierarchy in the normal file system to store foreign permissions.
There is some scepticism whether a capability-based system can be compatible with the JDK (see the discussion of capabilities in [63]). We proved that this is possib
classes in terms of our capability-based filesystem interface (Figure 6).
[Figure 6: Filesystem layers. Labels, top to bottom: Client Domain; Client; java.io.RandomAccessFile; jdk_fs; jx.fs.File; fs_user; Reference Monitor and Security Policy; Security Domain; jx.fs.File; fs_user; fs_user_impl; jx.fs.FileInode; fs; fs_javafs; jx.bio.BlockIO; bio; Fileserver Domain. Legend: Implementation Component, Interface, Interface Component.]
The implementation component jdk_fs contains implementations for the java.io.* classes and uses portal interfaces from the fs_user interface component to access the file system. These portals access service objects that are implemented in the fs_user_impl component.
Code that uses the java.io classes can run unmodified on top of our implementation of java.io. But the advantages of a capability-based system are lost: files must be referenced by name and problems similar to the Confused Deputy [30] are possible. An application can avoid the problems by using the (not JDK-compatible) capability-based file system interface.
In a multi-level security (MLS) system in which the file system is part of the TCB, the file system must be verified to work correctly, which may be a difficult task as file systems employ non-trivial algorithms. We used a configuration which eliminates the need for file system verification. Our system creates different instances of the file system for the different security levels, each file server being able to use a disjoint range of sectors of the disk. Assuring correct MLS operation can now be reduced to the problem of verifying that the disk driver works correctly; that is, it really writes the block to the correct position on the disk. The file system may run outside the TCB with a security label that is equivalent to the data it stores.
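The MLS configuration, in which each file system instance may use only its own disjoint range of disk sectors, can be illustrated with a small wrapper placed between a file system instance and the disk. This is a sketch under stated assumptions: SectorDevice, SectorRangeDevice, and MemoryDisk are hypothetical names, the actual signatures of jx.bio.BlockIO are not given in this report, and the report does not say where JX implements the range restriction.

```java
// Hypothetical block-device interface for this sketch only; the actual
// signatures of jx.bio.BlockIO are not given in this report.
interface SectorDevice {
    void readSector(long sector, byte[] buffer);
    void writeSector(long sector, byte[] buffer);
}

// Confines one file system instance to the sectors [first, first + count).
final class SectorRangeDevice implements SectorDevice {
    private final SectorDevice disk;
    private final long first;
    private final long count;

    SectorRangeDevice(SectorDevice disk, long first, long count) {
        if (first < 0 || count <= 0) {
            throw new IllegalArgumentException("invalid sector range");
        }
        this.disk = disk;
        this.first = first;
        this.count = count;
    }

    // Maps a window-relative sector number to an absolute one and rejects
    // every access that would leave the window.
    private long translate(long relative) {
        if (relative < 0 || relative >= count) {
            throw new SecurityException("sector " + relative + " outside assigned range");
        }
        return first + relative;
    }

    @Override
    public void readSector(long sector, byte[] buffer) {
        disk.readSector(translate(sector), buffer);
    }

    @Override
    public void writeSector(long sector, byte[] buffer) {
        disk.writeSector(translate(sector), buffer);
    }
}

// In-memory stand-in for a disk so that the sketch runs without hardware.
final class MemoryDisk implements SectorDevice {
    static final int SECTOR_SIZE = 512;
    private final byte[][] sectors;

    MemoryDisk(int sectorCount) {
        sectors = new byte[sectorCount][SECTOR_SIZE];
    }

    @Override
    public void readSector(long sector, byte[] buffer) {
        System.arraycopy(sectors[(int) sector], 0, buffer, 0, SECTOR_SIZE);
    }

    @Override
    public void writeSector(long sector, byte[] buffer) {
        System.arraycopy(buffer, 0, sectors[(int) sector], 0, SECTOR_SIZE);
    }
}

final class MlsDiskDemo {
    public static void main(String[] args) {
        MemoryDisk disk = new MemoryDisk(200);
        // Two illustrative security levels with disjoint windows.
        SectorDevice low = new SectorRangeDevice(disk, 0, 100);
        SectorDevice high = new SectorRangeDevice(disk, 100, 100);
        byte[] block = new byte[MemoryDisk.SECTOR_SIZE];
        high.writeSector(0, block);      // lands on absolute sector 100
        try {
            low.readSector(100, block);  // outside the low window
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Only the small translation step has to be trusted to keep the levels apart; the file system instances above it can then run with the label of the data they store.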