Information about http://www.openldap.org/pub/umich/xldbm.pdf

An X.500 and LDAP Database: Design and Implementation…

Tags: abstract searches, aliases, approximate searches, attributes, backend, concurrent operation, database design, database packages, execution, implementation, information references, knowledge service, ldap, performance evaluation, remainder, search space, section 6, server searches, substring, types of queries,
Pages: 9
Language: english
Created: Sat Dec 13 22:58:25 2003
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
                 An X.500 and LDAP Database: Design and Implementation
                             Timothy A Howes 


Abstract                                                     searches efficiently, including full substring searches,
                                                             several kinds of approximate searches, and, with an
   This paper describes the design and implementation        underlying ordered database (such as a btree), range
of xldbm, an X.500 and stand-alone LDAP backend              queries. The database is fully disk-based and makes use
database. The xldbm database supports efficient              of caching and threading for good performance and
execution of all queries, data modifications, and search     highly concurrent operation. It supports site-specific
pruning using centroids, all using simple underlying         index configuration and performance tailoring, and
technology freely available on the Internet. Our             centroids indexing for distributed search space pruning.
approach to resolving various kinds of queries is
described, along with a performance evaluation and               The remainder of this paper gives an overview of
comparison to other popular database packages.               the X.500/LDAP information and query models driving
                                                             our database design, followed by an overview of our
                                                             approach to the problem. Section 4 gives details on
1    Introduction                                            how we handle specific types of queries. Section 5
    X.500 [1] and LDAP [2,3] define similar directory        describes our handling of aliases and knowledge
service information and query models. The information        references, while section 6 discusses how we use
model is centered around entries, which are composed         centroids to provide efficient multi-server searches.
of attributes. The entries are organized into a tree         Section 7 shows how modifications affect the database
structure, usually corresponding to a geographical and       structure, and Section 8 reports some performance
organizational distribution. The query models allow          measurements and summaries various optimizations
searching of portions of the tree based on filter criteria   we have implemented to improve performance.
involving attributes (e.g., entries with a surname of        Finally, Section 9 discusses limitations of our current
"Jensen"), and returning requested attributes from each      design, and what might be done to eliminate them.
matching entry. The model also defines operations for
adding, removing, or changing entries in the directory.      2    Overview of X.500 and LDAP
    The X.500/LDAP model poses some interesting                  X.500 is the OSI directory service. It defines an
challenges in database design. Searches have broad           information model, determining the form and character
scope, spanning from one entry to the entire tree,           of information in the directory; a namespace, allowing
making efficient index construction difficult. Several       the information to be referred to and organized; and a
types of primitive searches are involved, including          functional model, determining what operations can be
equality, substring, and approximate matching, and           performed on the information.
range queries. Arbitrary boolean combinations of
search filters must be supported, requiring query                The information model is centered around entries,
optimization for efficient processing. Searches can          which are composed of attributes. Each attribute has a
span multipe servers, following aliases or not. Data         type and one or more values. The type determines the
can be arbitrarily distributed among servers by means        attribute's syntax, which defines what kind of
of knowledge references. Efficient alias and knowledge       information is allowed in the values. Entries are
handling during a search are key to good distributed         arranged in a tree structure and divided among servers
performance.                                                 by means of knowledge references in a geographical
                                                             and organizational distribution. Entries are named
    We imposed an additional constraint on the               according to their position in this hierarchy. Alias
underlying technology in our system. It had to be            entries are allowed, which point to other entries,
simple and freely-available Internet software. We did        circumventing the hierarchy. Figure 1 depicts the
not want to require a commercial DBMS package, and           relationship between entries, attributes, and values and
we wanted the system to be understandable by users           shows how entries are arranged into a tree.
without much effort. Also, our goal was to make
installation and administration of the system as
straightforward as possible.
   The database we have designed and implemented is
based on any of several freely available hash or btree
packages, such as GNU dbm [4], or the Berkeley db
package [5]. It handles all types of X.500/LDAP
                                                                to support the X.500 list and read operations).

                                                                3    Approach
                                                                    Our approach to the X.500/LDAP database design
                                                                problem is a simple one. We wanted high performance,
                                                                but not at the expense of complicated management,
 Alias                   Object                                 administration, or recovery procedures. We wanted
                                                                reliability, but not the complications introduced by a
 entry                   entry                                  two-phase commit , roll-back or other guaranteed-
                                  Attr, Attr, ...               reliable transaction protocol. We felt the system should
                                                                be understandable with little effort, and easy to
                                                                manage. In short, our goal was to develop a highly-
   Figure 1: The X.500 model is centered                        functional system with good performance that was easy
    around entries which arecomposed of                         to administer and understand, and had reasonable
 attributes. Entries are arranged into a tree                   reliability and recovery capabilities.
 structure. Alias entries can circumvent the
                   hierarchy.                                       We started with the simple two-part index structure,
                                                                depicted in Figure 2. First, we assign each entry a
    Functionally, X.500 defines operations for                  uniqne identifier by which it can be referred to
searching, reading, and writing directory information.          efficiently. All entries in the database are maintained in
The search operation can span a single entry, an entry's        a single index file, keyed by this ID. Entries are stored
children, or an entire subtree of the directory. Alias          in this index using a simple text format of the form
entries can be followed automatically during a search,          "attribute: value," as shown in Table 1. Non-ASCII
even if they cross server boundaries. There are also            values, or values that are too long to fit on a
operations to read a single entry or list the children of       reasonably sized line are represented using a base 64
an entry. Operations are provided to add entries, delete        encoding, making entries in the index human readable
entries, and modify existing entries.                           and easy to reconstruct if lost. Given an ID, the
    LDAP, the lightweight directory access protocol,            corresponding entry can be returned quickly and
was originally developed as a front-end to the X.500            efficiently, for the cost of a single hash table or btree
directory. Naturally, it assumes the same information           lookup, depending on the choice of underlying
model and namespace as X.500. LDAP is lightweight               technology. If other index files are corrupted or
for three main reasons. First, the functional model is          destroyed, they can be regenerated from this one file.
less complicated. The read and list operations are left
out; they are emulated using search. Some of the more            id2entry                attribute index
esoteric and less-often-used features of other operations
are also not included.                                           id1entry1               value1id1,...,idn
    Second, LDAP runs directly over TCP or other                 ...                     ...
reliable transport. It avoids the overhead of the OSI            idnentryN               vlaueNid1,...,idn
session and presentation layers, making connection
setup and packet handling faster and simpler.                      Figure 2: Xldbm index structure. The
    Third, LDAP uses simple string representations for           id2entry index stores entries in text form.
most syntaxes. While X.500 encodes data elements as             The attribute indexes map values to lists of
highly structured ASN.1 elements, the LDAP approach                              entry IDs.
encodes them as simple strings. This is a big                       Second, for each indexed attribute, we generate
performance win in encoding/decoding speed and                  another file containing a list of IDs of entries
complexity.                                                     containing each value of the attribute. (See the detailed
    We have made minor extensions to LDAP so that               discussion on each query type below for an explanation
it can be used as a stand-alone directory service, not          of what each value actually is.) This index is keyed by
just a frontend to X.500. The modifications involve             value, making the retrieval of a list of IDs of entries
the addition of referrals to the protocol, and the              containing a given value efficient. These indexes are
development of a backend database supporting the                not text-based, nor are they meant to be read or
LDAP information model and query semantics. The                 manipulated by users (except via a low-level database
similar X.500 and LDAP information and query                    administration program we provide, or indirectly
models make database development for both a similar             through an X.500 or LDAP query). Vaues are
task. The same database design can be used to backend           normalized before they are added to an index (e.g.,
both protocols, with only minor modifications (e.g.,            caseIgnoreString    values       are    case-normalized,



                                                            2
telephoneNumberSyntax values have spaces and dashes               straightforward: the commonName index is consulted
removed). The original values are retained in the                 for the list of entries corresponding to the value "Babs
id2entry index and will be returned in a search.                  Jensen." Next, this list of entries is read from the
Additional indexes of this type are constructed for               id2entry index. The entries returned are guaranteed to
Distinguished Names, aliases and knowledge                        match the filter.
references. These specialized indexes are discussed in
detail later.                                                     4 . 2 Approximate Matching
     Table 1: Text entry representation in the                        There are several approaches to approximate
    id2entry index. The first line contains the                   matching. Phonetic algorithms such as soundex and
      entry ID. The second line contains the                      metaphone [] are popular, and there is ongoing research
      Distinguished Name. Subsequent lines                        into spelling or other error-correcting algorithms such
                contain attributes.                               as the one used by glimpse []. Currently, we support
      12345                                                       both soundex and metaphone, and have plans to
      dn: cn=Babs Jensen, o=Babsco, c=US                          support glimpse. For both phonetic algorithms we
      cn: Babs Jensen                                             chose an approach that makes few assumptions about
      cn: Barbara J Jensen                                        the structure of the data (e.g., it does not assume name
      sn: Jensen                                                  data and attempt to extract a surname for matching
      ...                                                         purposes). This makes our algorithm appropriate for a
                                                                  wide variety of data, and reduces the risk and
                                                                  complexity involved in assuming a semantic structure.
    Given this index structure, answering a simple
                                                                  The disadvantage is that we are unable to take
query is straightforward: look up the requested value in
                                                                  advantage of any knowledge about the type of data to
the appropriate attribute index, returning a list of entry
                                                                  improve performance.
IDs; look up those entry IDs in the id2entry index,
returning the resulting entries to the user. For reasons              We treat the value being matched as a sequence of
discussed below, the entries read from the index files            words. When building the index, a phonetic code is
may be candidate entries, that is, there is no guarantee          generated for each word in the value. The code is then
that they actualy match the query. Therefore each entry           stored in the index, mapping to the ID of the entry
has the filter applied to it directly before it is returned       containing the original value. The value given in an
as a match. This also provides an opportunity to apply            approximate matching query is similarly broken into
access control, size and time limits, etc.                        words and then codes, each of which is looked up in
                                                                  the index.
    Using this simple index structure, we are able to
answer virtually any X.500/LDAP query efficiently.                    An entry is considered to match the query if it has a
The requirements on the underlying database are                   value containing words corresponding to all the given
minimal: index entries are read and written; entry IDs            codes in the proper order. If the query contains multiple
are inserted in and deleted from index entries. Only              words, the lists of associated IDs are intersected to
during range queries and, optionally, some forms of               produce the final list. The words must appear in the
approximate matching is some ordering on the indexes              same order in the value. Since ordering information is
required, implying the btree, rather than hash file,              lost in the index, the filter must be applied directly to
backend.                                                          each candidate entry to determine if it really matches
                                                                  the query. Table 2 shows some example approximate
                                                                  matching queries and values to match against
4      Specific Query Types                                       (including the corresponding phonetic codes generated
    This section discusses xldbm's index use in detail            by the metaphone algorithm), and a brief explanation
for various types of queries. For all queries, the general        of why the value does or does not match the query.
approach is to consult one or more index files to
generate a list of entries. Depending on the query type,
these entries may have the filter applied to them
directly to ensure a match. Individual search primitives
are described first, followed by boolean combinations
of queries.

4 . 1 Simple Equality
   An equality search tests for entries that have a
given value for a certain attribute. For example, a
commonName of "Babs Jensen." Satisfying such a
query using the xldbm index structure is


                                                              3
   Table 2: Example approximate matches                        interesting and challenging posed by X.500/LDAP.
                                                               Both models support arbitrary substring matching on
Query (codes)     Value (codes)      Match?                    text attributes. A query may specify a leading
Babs Jensen       Babs Johnson       Yes - match               substring, trailing substring, arbitrary internal
(BBS JNSN)        (BBS JNSN)                                   substring, or any combination of these to be matched.
                                                               We set out to design a scheme that was fast, efficient
Babs Jensen       Jensen Babs        No - match, but           and flexible. A glimpselike approach, though efficient
(BBS JNSN)        (JNSN BBS)         wrong order               in terms of index space used, was not fast enough and
Jensen            Smith              No - codes do             difficult to update incrementally (e.g., in response to
(JNSN)            (SMO)              not match                 modifications). Other approaches involving fast pattern
                                                               matching on values suffer from order N performance
Bob Smith         Bob A Smith        Yes - match               where N is the size of the data being searched.
(BB SMO)          (BB A SMO)
                                                                   Our solution is to generate all substring
Bob A Smith       Bob Smith          No - code for A           components of a fixed length for each value and index
(BB A SMO)        (BB SMO)           is missing                those. Additional anchors are added to each value,
    There are several options for code generation and          marking the beginning and end of the string. When a
matching in the index. The simplest is to generate             query is presented, similar substring components are
fixed-length codes of some maximum length. This                generated corresponding to it. These components are
makes generation and lookup simple. If the code length         looked up in the index and the resulting lists of entry
is too short, it can lead to unexpected matches because        IDs are intersected to form the list of candidate entries.
of code truncation (e.g., "Babs" matching                      These candidates then have the query applied to them
"Babsikowjskvik" - both produce a code with BBS as             directly, to ensure they match the query. This last step
the first three characters). If the code length is too         is necessary since, as for approximate matching, the
long, it can lead to missed matches that should be             ordering of substrings is not retained in the index.
returned (e.g., "Howe" not matching "Howes"). These            Figure 3 illustrates this process for the value "Babs."
two problems are in conflict.
                                                                Babs               BABS                  ^BABS$
     The solution is to adopt a configurable "prefix"
matching scheme in which a code is considered a match             ^BA
if it contains the key code as a prefix. This solves the                                     Substring
missed match probolem, but can still lead to                      BAB
unexpected matches as described above. To combat this                                        Attribute
problem, we add the constraint that the two codes must
                                                                  ABS
                                                                                               Index
differ in length by at most N characters, where N is an           BS$
administrator-defined constant. Setting N to zero
results in strict code matching. Setting N to a large            Figure 3: Substring index generation for
number results in strict prefix matching (e.g., "Babs"          components of length three. The value is
(BBS) will         match    "Babsik"     (BBSK)      and       normalized and leading and trailing anchors
"Babsikowjskvik" (BBSKJSKFK)). Setting                 N        are added. Then all possible substrings of
somewhere in between results in more reasonable                length three are generated and stored in the
behavior (e.g., "Babs" matches "Babsik", but not                      corresponding attribute index.
"Babsikowjskvik"). We have found two to be a pretty
good number for N.                                                 Our experience shows that a component length of
                                                               three is optimal for databases of around 100,000
    To support this variable prefix matching requires          entries. The optimal length depends on the size of the
the underlying database to support prefix retrieval of         data being indexed, the type of data, and a time-space
codes. With a btree or other ordered method, this is           trade-off in query performance versus index size.
straightforward. With a hash-based scheme, it is more          Longer components result in each substring mapping
difficult. For small values of N and a restricted              to fewer entries, but the number of distinct
phonetic code alphabet (e.g., in the soundex scheme),          components increases. Shorter components reduce this
it is feasible to generate all possible codes of greater       number, but increases the number of entries to which
length (up to N) and look them up. This method                 each component maps. The two extreme cases provide
clearly does not scale well, and our implementation            some insight: A component length of one means that
only implements variable prefix matching with an               each letter is a component. For many types of data,
ordered underlying database.                                   this means that close to every entry will be listed for
                                                               each component, making the list of candidate entries
4 . 3 Substring Matching                                       long. A very long component length degenerates into
                                                               the equality index case, and no advantage is gained.
   The substring matching problem is one of the most


                                                           4
Note also that the component length sets a lower                 which this approach works are limited.
bound on the length of substring queries that can be
supported (e.g., with a component length of three, a             4 . 5 Boolean Combinations
query for *A* cannot be answered).
                                                                     As with any database, one of the most challenging
    In practice, this scheme works surprisingly well.            problems is the support of arbitrary queries. If the
For most data sets and query types, we have found they           query set is restricted and can be predicted ahead of
tend to contain some "power components" that help to             time, design is simplified. In the case of X.500/LDAP
reduce the list of candidates quickly. Pathological cases        (as with the relational model), there is no limit to the
exist in which most entries are listed for each                  complexity of queries. The method by which these
component. In such cases the "power" aspect may be               queries are built is straightforward, though, making the
contained in the component ordering, rather than the             task easier. Boolean combinations of queries include
components themselves, in which case the algorithm               conjunction (AND), disjunction (OR), and negation
reduces to a substring search of the entire space, as            (NOT). If X and Y are queries, so are "X AND Y," "X
candidates are eliminated. We have found such cases to           OR Y," and "NOT X."
be rare in practice.
                                                                     Conjunctive queries are easy to handle. A list of
4 . 4 Ranges                                                     candidates is produced for each conjunction and then
                                                                 intersected. Note that it is not necessary to evaluate
    For attributes supporting some kind of ordering,             candidates before the intersection.
the X.500/LDAP models support inequality queries for
                                                                     Disjunctive queries are similarly straightforward. A
entries containing values greater-than-or-equal-to or
                                                                 list of candidates is produced for each disjunction and
less-than-or-equal-to a given value. Although often the
                                                                 then unioned. Again, candidates need not be evaluated
case in practice, there is no requirement that these two
                                                                 before the union takes place.
operators be used together to form a bounded range
query. With an underlying database supporting ordered               Negation queries are more difficult. A simple
retrieval, responding to such queries is easy. With a            approach is to produce a list of candidates matching the
hash-based scheme, it presents a problem we have not             query and then subtract it from the list of all
yet solved, except in some specific cases.                       candidates. Unfortunately, since the original list may
                                                                 be only a list of candidates not guaranteed to match the
    With ordered retrieval, a greater-than-or-equal-to
                                                                 query, blindly applying this approach can lead to lost
query is answered by retrieving the given value (or
                                                                 matches. The solution is to apply the query (before
"smallest" value greater than it), and then stepping
                                                                 negation) to the list of candidates before performing the
through subsequent values in the ordering. The
                                                                 subtraction. This produces correct results, but can be
resulting lists of entry IDs are unioned together to
                                                                 expensive.
form a single list of candidates. These candidates are
guaranteed to match the filter, so there is no need to               For example if the query is for entries not
apply the filter directly to them. A less-than-or-equal-to       containing an objectClass of person and the database
query is handled similarly. The first item in the                contains a million entries, only one of which is not a
ordering is retrieved, followed by subsequent items              person, the method degenerates into a linear search. In
until an item greater than or equal to the given key is          this case, it would be more efficient to step through
reached. The resulting entry IDs are unioned to form             the values of the objectClass attribute, building
the result. If the query involves range of values (i.e.,         candidates from the ones matching the query. By
greater than one vaue and less than another), obvious            building more knowledge into the database (e.g., how
optimizations can be made. The efficiency of this                many distinct values are in an index), NOT
method is proportional to the number of keys in the              performance can be improved.
range.
    With a hash-based scheme, ordered retrieval is not           5    Aliases and Knowledge References
possible. In some cases, where the ordering can be
approximated by a substring search, a hash-based                     Aliases and knowledge references provide similar
approach can still provide results. For example, an              challenges to database design. Both features create
attribute containing UTC time values has this                    situations where a search must be continued "outside"
property. A query requesting entries with a time greater         of the original search scope, perhaps even outside the
than or equal to 1994 and less than or equal to 1995             original server handling the query. Of the two, aliases
produces the same results as a substring query for a             are more problematic because they can point anywhere,
time with a leading substring of "1994" (plus the                there is no consistency requirement, and they are user-
simple case of a time equal to 1994). This works                 creatable.
because UTC time has a concrete string representation                A search that does not have the searchAliases flag
that is lexicographically increasing. The situations in          set in X.500 or the alias flag set to derefAlways or


                                                             5
derefSearching in LDAP is not affected by aliases. If             to the client. In LDAP, knowledge references are
one of these flags is set, indicating that aliases                returned as referrals (more on knowledge references and
themselves should not be searched, but rather what                their relationship to centroids in Section 6). Figure 5
they point to, a new phase is added to the search                 depicts the structure of the knowledge reference index.
procedure.
    The key to handling aliases is to identify those                                       1
aliases that point outside the scope of the search. If an
alias does not "escape" the scope of the search, the                                                   2
entry it points to will be searched automatically
(because it is contained within the scope, not because
an alias points to it - why it gets searched is
immaterial, as long as it does). Once such aliases are
identified, the search is continued with the entries to
which they point (either the entry itself for a one-level             One-level KR index Subtree KR index
search, or the entry and all its descendents for a subtree
search). Base object searches are easy to handle by                   12                 12
examining the entry directly, and do not require any
                                                                  Figure 5: Knowledge reference scope index.
special indexing.
                                                                  A search beginning at entry 1 is continued
    To efficiently identify aliases that need searching,           at the server identified by the knowledge
two new indexes are maintained, one for one-level                       reference contained in entry 2.
scopes, one for subtree scopes. For each non-leaf entry,
                                                                      Maintenance of the alias and knowledge reference
the one-level index contains an entry containing the
                                                                  indexes is non-trivial but straightforward. When a new
entry IDs of alias children of the entry that do not
                                                                  alias entry is added to the tree, the one-level alias entry
point to other children (i.e., aliases that escape the one-
                                                                  for its parent may need updating (if the alias does not
level scope). Similarly, the subtree index contains
                                                                  point to a sibling entry). The subtree alias entry for
entry IDs of alias descendents of the entry that do not
                                                                  each ancestor may also need updating. For knowledge
point to other descendents. During a search, the list of
                                                                  references, the same is true.
candidate entries is generated as before, and then the
appropriate alias-scope index is consulted to determine               This approach allows the efficient identification of
if there are entries outside the scope that should be             alias and knowledge references at which a search must
searched. Figure 4 illustrates this process for a sampe           be continued. Aliases can be particularly troublesome
tree.                                                             from a performance standpoint. If many aliases escape
                                                                  the scope of a search, each one must be searched
                                                                  individually, causing a significant performance penalty.
                                                                  It's hard to see a general solution to this problem that
                1                      2                          guarantees good performance, and we feel this is a
                                                                  design deficiency in the X.500 and LDAP models.

                             4                                    6     Centroids
     3                                             5
                                                                      A weakness of the X.500/LDAP model is its lack
 One-level alias index Subtree alias index                        of support for wide-area searches. The hierarchical
                                                                  scheme works well for searches whose scope can be
 13                    13                                         restricted using the namespace. For searches that do not
                                                                  have this property, the model degenerates to a search of
 Figure 4: Alias scope index. A subtree or                        the entire tree, contacting every server. Clearly, this
     one-level search starting at entry 1                         approach does not scale well.
    consultss the appropriate index and
 determines it needs to continue the search                           Several solutions have been proposed, including
   with the entry pointed to by entry 3.                          alternate hierarchies; special "yellow pages" portions of
                                                                  the tree where attributes are organized to facilitate
    Knowledge references are handled via a similar                alternate searching; "alias" trees that collect pointers to
approach. Indexes are constructed for one-level and               information in other servers; and out-of-band
subtree knowledge references. Given a search scope and            distributed indexing to help prune the search space. All
the entry ID of the base object, the list of knowledge            these schemes have their advantages, but we chose the
references within that scope can be quickly retrieved. In         latter approach for our system. It averts many of the
X.500, these knowledge references are either used to              maintenance and consistency probems of using aliases,
chain the search or returned as continuation references           and does not require the global cooperation necessary to


                                                              6
implement an alternate namespace. It also has the                    smaller, but value-based centroids allow
advantage of presenting a consistent model to clients;            more accurate and flexible query resolution.
they see the same tree as always, searches just happen                 In this example, the word centroid is
more efficiently.                                                 smaller by one "Babs" and one "Jensen," but
                                                                   gives a false positive match for a query for
   We chose to use centroids as our distributed                                  "Bjorn Johnson."
indexing framework. Adapted from work by Salten [],
and originally proposed for use on the Internet in the              Values              Word centroid     Value centroid
WHOIS++ system, centroids have the potential to
provide efficient wide-area searching in the                        Babs Jensen         Babs              Babs Jensen
X.500/LDAP model. We have adapted centroids to this                 Babs Johnson        Jensen            Babs Johnson
model, and included a few extensions that allow us to               Babs Jensen         Johnson           Bjorn Jensen
support the more flexible query language defined by                 Bjorn Jensen        Bjorn
X.500 and LDAP.                                                       Incorporating centroids into the X.500/LDAP
     The basic centroid model involves generating the             model is not difficult conceptually. A search is
list of distinct words in a database. This list is called a       initiated at some point in the hierarchy. Normally, a
centroid of the database. If centroids are generated for          server would search its own data and chain or refer the
many such databases and given to another server, the              search to all servers holding data below it in the tree. If
server can consult the list of centroids to determine             a server holds centroids for these servers, it can consult
which low-level databases might hold the answer to a              them and only chain or refer the search to those servers
query. An trivial example is shown in Figure 6.                   possibly able to satisfy the query. Figure 7 illustrates
                                                                  this process.
                  Babs             A,B                                                       fix
                  Jensen           A,B
                                                                      Figure 7: Search-space pruning using
                  Johnson          A,B                            centroids. A server holding centroid data for
                  Bjorn            A                               servers below it in the tree only chains or
                  Lars             B                               refers the search to those servers possibly
                                                                            able to answer the query.
                        server C
                                                                      In our database design, we introduce the concept of
                                                                  a centroid entry. Conceptually, the entry holds an
     Babs Jensen                       Babs Jensen                entire centroid, along with access information for the
     Babs Johnson                      Lars Johnson               server that generated the centroid. (In our
     Bjorn Jensen                                                 implementation, this information is contained inthe
                                                                  name of the centroid entry.) The values in the entry are
       server A                           server B                added to the indexes just like normal values. During a
                                                                  search, the indexes are consulted as usual, returning the
 Figure 6: Centroid example. A centroid is a                      ID of the centroid entry. Here the handling of centroid
list of distinct words in a database. A server                    entries differs from regular entries. Instead of returning
 that collects centroids from other servers is                    the entry to the client, the search is continued using
able to determine which servers are likely to                     the access information in the centroid entry, or the
   be able to answer a given query. A search                      access information is returned to the client so it may
  involving the word "Bjorn" can be directed                      continue the search.
                only to server A.                                     This implementation of centroids has several
    We modified the centroid model to include whole               advantages. First, it was very easy to implement. Once
values, rather than words. Simiar modifications were              we developed the index structure described in the
made by the desighers of SOLO [], who use centroids               previous sections, it took literally less than a dozen
for navigation and searching. The use of values in the            lines of code to support centroids. Second, the pruning
centroid enables a broader range of searches to be                happens through the normal indexing consultation
supported (e.g., substrings), and fewer "false positives"         process. Third, the centroid generation and addition
to be returned. There is a trade-off, of course. The              process is done using normal DAP or LDAP
centroids produced are larger, not having as attractive           operations. Centroids are added using the add operation
collapsing properties as           their    word-oriented         and deleted using the delete operation. Existing
counterparts. Table 3 provides a comparison between               centroids are modified using the modify operation (e.g.,
the word and value approaches to centroid generation.             in response to incremental changes to the centroid). A
                                                                  separate process is responsible for using these
   Tablle 3: Word versus value centroid                           operations to generate and apply centroid changes to
comparison. Word-based centroids tend to be                       and from the database. This makes it easy to add new


                                                              7
schemes later, or change the existing scheme.                          Second, our original design called for index entries
                                                                   as single blocks of entry IDs. In practice, these blocks
                                                                   become quite large. For example, in our database there
7    Modifications                                                 are close to 100,000 people entries, causing the single
    Both X.500 and LDAP support adding, deleting and               objectClass index entry for the value person to be
modifying entries in the directory. They assume that               about 400,000 bytes (100,000 entries * 4 bytes per
read requests of the directory are far more frequent than          entry ID). If an entry is added, this index entry
writes, but modify performance is still an issue. In               increases by one entry ID, causing the entire 400K
both models, every set of modifications must either                block to be rewritten. This caused a severe performance
succeed or fail as a group. Although we do not provide             penalty. To combat the problem, we introduced a
a transaction system with roll-back capability, we do              scheme in which index entries are broken up into file
minimize the time during which a serious failure can               system-sized block fragments. Small blocks are stored
occur.                                                             as before. Large blocks are accessed with a level of
                                                                   indirection through a "header" block which points to
    A modification is implemented as a three-step                  the fragments. A slight penalty is paid when the block
process. First, an in-memory copy of the entry is                  is read (several small reads take longer than than one
changed. If this fails, or the entry fails to satisfy the          large one), but the performance increase on a modify is
directory schema requirements after the modifications              substantial. Generally, only one fragment needs
are applied, the operation is aborted. Second, the                 updating.
affected indexes are updated. This involves inserting
and deleting entry IDs from index entries. If anything                 Third, we noticed that some index entries,
goes wrong here, the operation is aborted. Third, the              particularly in the substring index, were so large that
id2entry index containing the entry itself is updated. In          they contributed little to reducing the number of
case of a catastrophic failure, this is the only file that         candidates. Worse, such blocks contribute to poor
is crucial. The other indexes can be reconstructed from            performance, both on reads and writes. To reduce the
the id2entry index. So, the only critical section of the           effects of such large blocks, we introduced the concept
modify is the update of the id2entry index. Since each             of an allIDs block. The allIDs block is a simpe stand-
entry is represented in the index by a contiguous                  in for a block containing all entry IDs in the database.
sequence of characters, the update of this index can be            It contains a single flag-valued entry ID, making it
done with a single write operation, further reducing the           very small. The allIDs block acts like the universal set
risk.                                                              in index operations. The block size above which a
                                                                   block is replaced by an allIDs block is configurable.
    In practice, we find this level of reliability adequate,       The optimum size depends on the database size and
especially when combined with the ability to do                    content. In practice, we have found a size of between
replication, which we have implemented. A full                     5,000 and 10,000 entry IDs to work well for our
discussion of our replication strategy is beyond the               100K-sized environment. Pathological cases exist in
scope of this paper, but changes are logged to a file              which this approach leads to a linear search of the
which is then read by a separate replication process,              database, but we have found these cases to be rare.
responsible for distributed changes to any number of
replicas.                                                             Finally, our biggest performance enhancement
                                                                   comes from extensive use of caching. The most
                                                                   expensive part of the search operation is the reading of
8    Performance Enhancements                                      candidates from the id2entry index, and the subsequent
                                                                   application of the filter. We maintain a configurable
    As described, the system works surprisingly well,              in-memory cache of entries, keyed by DN and by entry
especially considering its simplicity. We have made                ID. Table 4 shows a comparison of some typical
several enhancements that have significantly increased             searches, with "cold" and "hot" entry caches. We also
performance, especially on modifies. First, our original           maintain a cache of open index files, and the
design called for a separate index file for each attribute         underlying hash or btree database keeps a cache of its
and index type (i.e., for each attribute files for the             own.
equality index, substring index and approximate index
are created). To reduce the number of indexes open at               Table 4: Effect of caching on performance
one time, we have combined these three files into one.                         for various queries.
Doing so required the addition of a prefix scheme for
                                                                      Query              Cold cache        Hot cache
storing the value keys in order to avoid conflicts.
Equality keys are prefixed by "=", substring keys are                 cn=Babs            -                 -
prefixed by "*", and approximate keys are prefixed by
"~". This reduces by a factor of three the number of                  cn=*Babs*          -                 -
open indexes necessary, with little impact on                         cn~=Babs           -                 -
performance.


                                                               8
                                                                    Directory Access Protocol. Request            For
                                                                    Comments (RFC) 1777, March, 1995.
9    Limitations                                                [3] T. Howes, S. Kille, W. Yeong, C. Robbins, The
                                                                    String Representation of Standard Attribute
    Our scheme performs well, but it does have                      Syntaxes. RFC 1778, March 1995.
limitations. First, the scheme will not scale to more
than a few hundred thousand to a million entries. The
algorithms are sound, but the use of simple file                Author Information
system-based hash and btree packages incurs a
                                                                    Tim Howes is a Senior Systems Research
performance penalty. Replacing these underlying
                                                                Programmer for the University of Michigan's
systems with a DBMS system would undoubtedly
                                                                Information Technology Division. He received a
improve performance and increase the number of
                                                                B.S.E. in Aerospace Engineering, a M.S.E. in
entries the scheme can handle.
                                                                Computer Engineering from U-M, and is completing a
    Second, we feel modify performance is still                 Ph.D. in Computer Science. He is currently project
unacceptably poor. The limiting factor here again is            director and principal investigator for the NSF-
the underlying database package. A file system-based            sponsored WINX project, and in charge of directory
approach, not optimized for modification, nor for our           service development and deployment at U-M. He is the
application in particular, simply cannot compete with           primary architect and implementor of the U-M LDAP
a commercial DBMS.                                              directory package, the DIXIE system, the GDA X.500
                                                                DSA, and a major developer of the QUIPU X.500
    Finally, the user-friendy text representation used to       implementation. He is author or co-author of several
store entries in the id2entry index slows down the              papers and RFCs, including RFC 1777 and RFC 1778
reading of entries. Especially in the X.500 case, there         defining the LDAP protocol. He is chair of the IETF
is significant time spent converting from the text              Access, Searching, and Indexing of Directories working
format to an internal representation. Storing entries in        group, and an active member of the ACM and IEEE.
BER-encoded ASN.1 (the representation used on the
wire) would greatly improve this performance. It would
require developing new BER-aware versions of many
utility routines, including things like strcmp and
friends.

1 0 Summary
    This paper describes the design and implementation
of xldbm, a backend database for X.500 and stand-alone
LDAP. The system uses simple freely-available
database technology and provides good performance,
reliability, and recovery. Xldbm handles virtually all
types of X.500/LDAP queries efficiently, including
full substring and approximate matches. Aliases and
knowledge references are handled through clever index
construction. Centroids have been adapted for use as a
search space-pruning device. Several performance
enhancements have been implemented, making the
system perform surprisingly well.

1 1 Acknowledgements
   This material is based upon work supported by the
National Science Foundation under Grant No. NCR-
9416667.

References
[1] The Directory: Overview of Concepts, Models and
    Service. CCITT Recommendation X.500, 1988.
[2] W. Yeong, T. Howes, S. Kille, Lightweight


                                                            9