Information about http://download-east.oracle.com/otndocs/tech/semantic_web/pdf/ind3p1.pdf

Supporting Ontology-based Semantic Matching in RDBMS …

Tags: clinical applications, flexibility, geographic information system, guide application, jagannathan srinivasan, knowledge acquisition system, nashua nh, ontologies, oracle, oracle corporation, oracle drive, rdbms, restaurant guide,
Pages: 12
Language: english
Created: Fri Jun 11 16:23:49 2004
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
Page 10
image
Page 11
image
Page 12
image
     Supporting Ontology-based Semantic Matching in RDBMS
              Souripriya Das, Eugene Inseok Chong, George Eadon, Jagannathan Srinivasan
                                          Oracle Corporation
                               One Oracle Drive, Nashua, NH 03062, USA

                                                                                 possible sharing of knowledge among multiple
                             Abstract
                                                                                 applications, and the flexibility of evolving the
      Ontologies are increasingly being used to build                            knowledge without requiring changes to the application.
      applications that utilize domain-specific                                     This approach has been used to build applications for
      knowledge. This paper addresses the problem of                             various domains (such as clinical applications [3],
      supporting ontology-based semantic matching in                             geographic information system [2], integrated knowledge
      RDBMS. Specifically, 1) A set of SQL                                       management [1], and knowledge acquisition system [7]).
      operators,       namely        ONT_RELATED,                                The same approach can be adopted to build database
      ONT_EXPAND,           ONT_DISTANCE,          and                           applications. This paper addresses the problem of
      ONT_PATH, are introduced to perform                                        supporting ontology-based semantic matching in
      ontology-based semantic matching, 2) A new                                 RDBMS.
      indexing scheme ONT_INDEXTYPE is                                              To motivate the need for ontology-based semantic
      introduced to speed up ontology-based semantic                             matching, consider a restaurant guide application, which
      matching operations, and 3) System-defined                                 recommends restaurants to a user based on her/his
      tables are provided for storing ontologies                                 preferences. Consider a table served_food that contains
      specified in OWL. Our approach enables users                               the types of cuisines served at restaurants.
      to reference ontology data directly from SQL
                                                                                                    Table 1: served_food
      using the semantic match operators, thereby
      opening up possibilities of combining with other                           R_id                          Cuisine
      operations such as joins as well as making the                             1                             American
      ontology-driven applications easy to develop                               2                             Mexican
      and efficient. In contrast, other approaches use                           2                             American
      RDBMS only for storage of ontologies and                                   14                            Portuguese
      querying of ontology data is typically done via
      APIs. This paper presents the ontology-related                                In the absence of semantic matching, the application
      functionality including inferencing, discusses                             would most likely resort to syntactic matching via the
      how it is implemented on top of Oracle                                     `=' operator as shown below:
      RDBMS, and illustrates the usage with several                                   SELECT * FROM served_food
                                                                                      WHERE cuisine = `Latin American';
      database applications.
                                                                                    This query generates no rows since none of Cuisine
                                                                                 values in the table will match `Latin American'.
1. Introduction                                                                     In contrast, the user can get more meaningful results
An ontology is a shared conceptualization of knowledge                           by performing semantic matching that consults an
in a particular domain. It facilitates building applications                     ontology (such as the cuisine ontology in Figure 1) for
by separating knowledge about the target domain from                             computing the results. Specifically, a user can issue the
the rest of the application code. The key benefits of this                       following query:
approach are: simplification of the application code,                                 SELECT * FROM served_food
                                                                                      WHERE ONT_RELATED(cuisine,
                                                                                                          `IS_A',
Permission to copy without fee all or part of this material is granted                                    `Latin American',
provided that the copies are not made or distributed for direct                                           `Cuisine_ontology')=1;
commercial advantage, the VLDB copyright notice and the title of the                  Here the ONT_RELATED operator determines if the two
publication and its date appear, and notice is given that copying is by
permission of the Very Large Data Base Endowment. To copy
                                                                                 input terms are related by the input relationship type
otherwise, or to republish, requires a fee and/or special permission             argument by consulting the specified ontology. If they are
from the Endowment                                                               related, then the operator will return 1, otherwise 0.
Proceedings of the 30th VLDB Conference,
Toronto, Canada, 2004



                                                                          1054
                                                                         Healthcare Applications) can take advantage of this
                                 Cuisine                                 capability. These capabilities can be exploited to support
                                                                         semantic web applications (such as web service discovery
                                                 Latin American          [8]) as well. A key requirement in these applications is to
             Asian               Western                                 provide semantic matching between syntactically
                                                                         different terms or sometimes between syntactically same,
                           American        European                      but semantically different terms [10].
                                                        Mexican
      South Asian       Far Eastern                                          Another category of the semantic matching application
                                                    Portuguese           is related to knowledge reuse. Neutral Authoring [9] is an
                                       Italian
                                                   German                application area, where information is represented in a
                    Chinese        Japanese
    Indian   Pakistani                                                   single language and then converted into different
                            Korean
                                                                         languages for multiple target systems. Corporate
               Figure 1: A Cuisine Ontology                              knowledge bases on which documentation and software
     (Each node represents an Individual and each edge                   development can be based is another big application area
    represents a transitive ObjectProperty `IS_A')                       of such a capability.
                                                                             Support for ontology-based semantic matching is
   The query identifies rows containing cuisines that are                achieved by introducing the following:
related to `Latin American' based on `IS_A'                                  · Two new SQL operators, ONT_RELATED and
relationship. The query will generate restaurants 2 and 14               ONT_EXPAND (as described above) are defined to model
since `Mexican' and `Portuguese' are related to                          ontology-based semantic matching operations. For
`Latin American' cuisine. Thus, one can incorporate                      queries involving ONT_RELATED operator, two
semantics of the particular knowledge domain in SQL                      ancillary SQL operators            ONT_DISTANCE         and
queries by introducing ontology-based semantic                           ONT_PATH, are defined that return distance and path
matching.                                                                respectively for the filtered rows. Additional operators
   Optionally, a user may want to get a measure for the                  may be introduced for querying purposes, e.g., for finding
rows filtered by ONT_RELATED operator. This can be                       datatype property values.
achieved by using ONT_DISTANCE ancillary operator.                           · A new indexing scheme ONT_INDEXTYPE is
The ONT_DISTANCE operator gives a measure of how                         defined to speed up ontology-based semantic matching
closely the terms are related by measuring the distance                  operations.
between the two terms. Continuing with the example, one                      · A schema has been designed to store information
can get the result sorted on distance measure as follows:                extracted from an ontology. This schema is not directly
     SELECT * FROM served_food
     WHERE ONT_RELATED (cuisine,                                         visible to the user.
                        `IS_A',                                              The proposed functionality can be implemented by
                        `Latin American',                                exploiting the database extensibility capabilities (namely,
                        `Cuisine_ontology',                              the ability to define user-defined operators, user-defined
                         123) = 1
                                1
     ORDER BY ONT_DISTANCE (123 );                                       indexing schemes, and table functions) typically available
                                                                         in an RDBMS.
   Similarly, another ancillary operator ONT_PATH would                      We support ontologies specified in Web Ontology
be useful, which computes path information between the                   Language (OWL [19], specifically, OWL Lite and OWL
two terms.                                                               DL) by extracting information from the OWL document
   In addition, a user may want to query an ontology                     and then storing this information in the schema.
independently (without involving user tables). The                           Oracle's Extensibility Framework [5] has been used to
ONT_EXPAND operator can be used for this purpose (see                    implement the operators and the new indexing scheme.
Section 2 for details).                                                  Specifically, ONT_RELATED, ONT_DISTANCE, and
   Providing     ontology-based     semantic    matching                 ONT_PATH operators are implemented as user-defined
capability as part of SQL greatly facilitates developing                 operators. ONT_EXPAND is implemented as a table
ontology-driven database applications. Applications that                 function. The operator implementation typically requires
can benefit include e-commerce (such as supply chain                     computing a transitive closure based upon explicit
management, application integration, personalization, and                relationships and inferred relationships. This is performed
auction). Also, applications that have to work with                      via queries with CONNECT BY clause. The
domain-specific knowledge repositories (such as                          ONT_INDEXTYPE is implemented as a user-defined
BioInformatics, Geographical Information Systems, and                    indexing scheme (See Section 3.3 for details).

1
    This argument identifies the filtering operator expression
(ONT_RELATED) that computes this ancillary value [11].


                                                                  1055
1.1 Related Work                                                             used to query the ontology independently,
                                                                             whereas ONT_RELATED operator can be used to
Ontologies have been around for sometime and in recent
                                                                             perform queries on a user table holding ontology
years they have received wider attention in the context of
                                                                             terms.
semantic web [4]. Several ontology building tools have
been developed (for example, OntoEdit [13], OntoBroker                   · Optionally, a user can use two (ancillary)
[14], OntologyBuilder and OntologyServer [16], KAON                          operators, ONT_DISTANCE and ONT_PATH, in
[15]). Most of these tools use a file system to store                        queries involving the ONT_RELATED operator to
ontologies. Among these, KAON and VerticalNet                                get additional measures (distance and path) for
products (OntologyBuilder and OntologyServer) as well                        the filtered rows.
as the Jena2 Semantic Web Framework [21] allow storing                  These are further elaborated below.
of ontology using RDBMS and they provide an API for
                                                                    2.2 RDBMS Schema for Storing Ontologies
access and manipulation of ontologies. However, the key
difference is that our approach makes ontology-based                An RDBMS schema has been created for storing
semantic matching available as part of SQL.                         ontologies specified in OWL. The tables in this schema
                                                                    include:
1.2 Organization of the Paper
                                                                    Ontologies (
Section 2 presents a feature overview of supporting                   OntologyID, OntologyName, Owner, ...)
ontology-based semantic matching operations. Section 3
discusses the implementation of the ontology-related                Terms (
                                                                      TermID, OntologyID, Term, Type, ...)
changes on top of Oracle RDBMS and its performance
results. Section 4 illustrates their usage with several             Properties (
database applications. Section 5 describes our experience             OntologyId, PropertyID,
                                                                      DomainClassID, RangeClassID,
and Section 6 concludes with a summary and outlines                   Characteristics, ...)
future work.
                                                                    Restrictions (
2. Supporting Ontology-based Semantic                                 OntologyID, NewClassID, PropertyID,
                                                                      MinCardinality, MaxCardinality,
   Matching on Oracle RDBMS                                           SomeValuesFrom, AllValuesFrom, ...)
Although the feature is described in the context of Oracle
RDBMS, it can be supported in any RDBMS that support                Relationships (
                                                                      OntologyID, TermID1, PropertyID,
a few basic extensibility capabilities as outlined in                 TermID2, ...)
Section 1.
                                                                    PropertyValues (
                                                                      OntologyID, TermID, PropertyID,
        ONT_EXPAND         ONT_RELATED Operator
                           (with ONT_PATH &                           Value, ...)
        Operator
                           ONT_DISTANCE ancillary
                           operators)
                                                                    ·     Ontologies: Contains basic information about
                                                                          various ontologies.
                                                                    ·     Terms: Represents classes, individuals, and
                                                                          properties in the ontologies. A term is a lexical
                                                                          representation of a concept within an ontology.
                                                                          TermID value is generated to be unique across all
        System Defined Tables      User Tables
        for storing Ontologies
                                                                          ontologies. This allows representation of references
                                                                          to a term in a different ontology than the one that
        Figure 2: Ontology-related Functionality                          defines the term. Also, even an OntologyID is
                                                                          handled as a TermID which facilitates storing values
2.1 Overview                                                              for various properties (e.g., Annotation Properties)
                                                                          and other information that applies to an ontology
The ontology-related functionality (see Figure 2) is as                   itself. Note that, as a convention, any column in the
follows:                                                                  above schema whose name is of the form "...ID..",
     · An RDBMS schema, consisting of several                             would actually contain TermID values (like a
         system-defined tables (see Section 2.2 for                       foreign key).
         details), is created for storing information               ·     Properties: Contains information about the
         extracted from the ontologies.                                   properties. Domain and range of a property are
     · Two operators are provided for querying                            represented with TermID values of the
         purposes. The ONT_EXPAND operator can be                         corresponding classes. Characteristics indicate


                                                             1056
        which of the following properties are true for the          extracted and then stored into the system-defined tables
        property: symmetry, transitivity, functional, inverse       in the RDBMS schema described above.
        functional.                                                     The Ontologies table stores some basic information
·       Restrictions: Contains information about property            about all the ontologies that are currently stored in the
        restrictions. Restrictions on a property results in          database. A portion (view) of this table is visible to the
        definition of a new class. This new class is not             user.
        necessarily named (i.e., `anonymous' class) in
        OWL. However, internally we create a new (system-           2.3 Modeling Ontology-based Semantic Matching
        defined) class for ease of representation.                  To support ontology-based semantic matching in
·       Relationships: Contains information about the               RDBMS several new operators are defined.
        relationship between two terms.
·       PropertyValues: Contains                   2.3.1 ONT_RELATED Operator. This operator models
        information associated with the terms. In order to          the basic semantic matching operation. It determines if
        handle values of different data types, some                 the two input terms are related with respect to the
        combinations of the following may be used: Define           specified RelType relationship argument within an
        separate tables (or separate columns in the same            ontology. If they are related it returns 1, otherwise it
        table) for each of the frequently encountered types         returns 0.
        and use a generic self-describing type (ANYDATA                ONT_RELATED (Term1, RelType, Term2,
        in Oracle RDBMS) to handle any remaining types.                                 OntologyName
                                                                       ) RETURNS INTEGER;
System-defined Classes for Anonymous Classes: We                       The RelType can specify a single ObjectProperty (for
create internal (i.e., not visible to the user) or system-          example, `IS_A', `EQV', etc.) or it can specify a
defined classes to handle OWL anonymous classes that                combination of such properties by using AND, NOT, and
arise in various situations such as Property Restrictions,          OR operators (for example, `IS_A OR EQV'). Note that
enumerated types (used in DataRange), class definitions             both Term1 and Term2 need to be simple terms. If
expressed as expression involving IntersectionOf,                   Term2 needs to be complex involving AND, OR, and
UnionOf, and ComplementOf.                                          NOT operators, user can issue query with individual
Bootstrap Ontology: The first things that are loaded into           terms and combine them with INTERSECT, UNION, and
the above schema are the basic concepts of OWL itself.              MINUS operators. See Section 2.3.4 for an example.
In some sense this is like the bootstrap ontology. For                 RelType specified as an expression involving OR and
example:                                                            NOT operators (e.g., FatherOf OR MotherOf) is treated
    ·      Thing and Nothing are stored as Classes.                 as a virtual relationship (in this case say AncestorOf)
                                                                    that is transitive by nature (also see Section 3.2.5).
    ·      subClassOf is stored as a transitive (meta)
           property that relates two classes.                       2.3.2 ONT_EXPAND Operator. This operator is
                                                                    introduced to query an ontology independently. Similar to
    ·      subPropertyOf is stored as a transitive (meta)           ONT_RELATED operator, the RelType can specify either
           property that relates two properties.                    a simple relationship or combination of them.
                                                                       CREATE TYPE ONT_TermRelType AS OBJECT (
    ·      disjointWith is stored as a symmetric (meta)
           property that relates two classes.                           Term1Name VARCHAR(32),
                                                                        PropertyName VARCHAR(32),
    ·      SameAs is stored as a transitive and symmetric               Term2Name VARCHAR(32),
           property that relates two individuals in Thing               TermDistance NUMBER,
           class.                                                       TermPath VARCHAR(2000)
                                                                       );
   Storing these OWL concepts as a bootstrap ontology
                                                                       CREATE TYPE ONT_TermRelTableType AS
facilitates inferencing. A simple example would be the                  TABLE OF ONT_TermRelType;
following: If C1 is a subclassOf C2 and C2 is a
subclassOf C3, then (by transitivity of subClassOf) C1 is              ONT_EXPAND (Term1, RelType, Term2,
                                                                                   OntologyName
a subclassOf C3. Note that the reflexive nature of                     ) RETURNS ONT_TermRelTableType;
subClassOf and SubPropertyOf is handled as a special
case.                                                                  Typically, non-NULL values for RelType and Term2
                                                                    are specified as input and then the operator computes all
Loading Ontologies: An ontology is loaded into the                  the appropriate  tuples in the
database by using an API that takes as input an OWL                 closure taking into account the characteristics (transitivity
document. Information from the OWL document is                      and symmetry) of the specified RelType. In addition, it



                                                             1057
also computes the relationship measures in terms of                               ONT_RELATED(sf.cuisine,
                                                                                              `IS_A OR EQV',
distance (TermDistance) and path (TermPath). For                                             `Latin American',
cases when a term is related to input term by multiple                                        `Cuisine_ontology')=1
paths, one row per path is returned. It is also possible that                     AND r.price_range = `$';
ONT_EXPAND invocation may specify input values for                     2.3.5 Discussion. Note that the queries in section 2.3.4
any one or more of the three parameters or even none of                can also be issued using the ONT_EXPAND operator. For
the three parameters. In each of these cases, the                      example, the first query in that section can alternatively
appropriate set of  tuples is                   be expressed using ONT_EXPAND as follows:
returned.                                                                  SELECT r.name, r.address
                                                                           FROM served_food sf, restaurant r
2.3.3 ONT_DISTANCE and ONT_PATH Ancillary                                  WHERE r.id = sf.r_id AND
Operators. These operators compute the distance and                              sf.cuisine IN
path measures respectively for the rows filtered using                           (SELECT Term1Name from TABLE(
ONT_RELATED operator.                                                             ONT_EXPAND(NULL, `IS_A OR EQV',
                                                                                              `Latin American',
   ONT_DISTANCE (NUMBER) RETURNS NUMBER;                                                     `Cuisine_ontology')));
   ONT_PATH (NUMBER) RETURNS VARCHAR;
   A single resulting row can be related in more than one                The ONT_RELATED operator is provided in addition to
                                                                       ONT_EXPAND operator for the following reasons:
way with the input term. For such cases, the above
operators return the optimal measure, namely smallest                  ·    The ONT_RELATED operator provides a more natural
distance or shortest path. For computing all the matches,                   way of expressing semantic matching operations on
the following two operators are provided:                                   column holding ontology terms.
   ONT_DISTANCE_ALL (NUMBER)                                           ·    It allows use of an index created on column holding
    RETURNS TABLE OF NUMBER;                                                ontology terms to speed up the query execution by
   ONT_PATH_ALL (NUMBER)                                                    taking column data into account.
    RETURNS TABLE OF VARCHAR;
2.3.4 A Restaurant Guide Example. Consider a                           2.4 Inferencing
restaurant guide application that maintains type of cuisine
served at various restaurants. It has two tables, 1)                   Inferencing rules employing the symmetry and transitivity
restaurants containing restaurant information, and 2)                  characteristics of properties are used to infer new
served_food containing the types of cuisine served at                  relationships. This kind of inferencing can be achieved
restaurants.                                                           through the use of the operators defined above (see
   The restaurant guide application takes as input a type              Section 3.2 for details). Note that our support for
of cuisine and returns the list of restaurants serving that            inferencing is restricted to OWL Lite and OWL DL, both
cuisine. Obviously, applications would like to take                    of which are decidable.
advantage of an available cuisine ontology to provide                     To support more complete inferencing using the above
better match for the user queries. The cuisine ontology                operators, an initial phase of inferencing is done after
describes the relationships between various types of                   ontology is loaded, and the results of this inferencing are
cuisines as shown earlier in Figure 1.                                 stored persistently and used in subsequent inferencing.
    Thus, if a user is interested in restaurants that serve            Several examples of the initial inferencing follow.
 cuisine of type `Latin American', the database                           All subPropertyOf relationships are derived and stored
 application can generate the following query:                         during this phase. The transitive aspects of sameAs (e.g.,
   SELECT r.name, r.address
                                                                       sameAs(x,y) AND sameAs(y,z) IMPLIES sameAs(x,z))
   FROM served_food sf, restaurant r                                   can be handled by the           operators defined above.
   WHERE r.id = sf.r_id AND                                            However, the more complex rules which imply sameAs
          ONT_RELATED(sf.cuisine,                                      (e.g., p is a functional property AND p(a,x) AND p(b,y)
                       `IS_A OR EQV',
                      `Latin American',                                AND sameAs(a,b) IMPLIES sameAs(x,y)) are best
                       `Cuisine_ontology')=1;                          handled outside of the operator implementation. We
   To query on `Latin American' AND `Western'                          expect these complex sameAs inferences to be done
the application program can obtain rows for each and use               during the initial inferencing phase. To provide the
the SQL INTERSECT operation to compute the result.                     semantics of sameAs during closure computation, the
                                                                       operators will treat an individual `I' as `I OR J' for all J
   Also, the application can exploit the full SQL
                                                                       where sameAs(I, J).
expressive power when using ONT_RELATED operator.
                                                                          Furthermore, we are introducing an internal
For example, it can easily combine the above query
                                                                       relationship that will be useful for inferencing over
results with those restaurants that have lower price range.
                                                                       complex subPropertyOf and inverseOf interactions:
   SELECT r.name                                                       SubPropertyOfInverseOf(f,g) iff f(x,y) IMPLIES g(y,x).
   FROM served_food sf, restaurant r
   WHERE r.id = sf.r_id AND                                            Again, we expect that the rules that introduce


                                                                1058
SubPropertyOfInverseOf (spiOf, for short) into an                    3.1 Operators
ontology (e.g., inverseOf(f,g) IMPLIES spiOf(f,g) AND
                                                                     The ONT_RELATED operator is defined as a primary
spiOf(g,f)) will be handled during the initial inferencing
                                                                     user-defined operator, with ONT_DISTANCE and
phase. However, the transitive aspects of spiOf can be
                                                                     ONT_PATH as its ancillary operators. The primary
handled as a special case within our operators, according
                                                                     operator computes the ancillary value as part of its
to the following rules:
                                                                     processing [11]. In this case, ONT_RELATED operator
· subPropertyOf(f,g) AND spiOf(g,h)                                  computes the relationship. If ancillary values (the
      IMPLIES spiOf(f,h)                                             distance and path measure) are required, it computes
     (Proof: f(x,y) IMPLIES g(x,y) IMPLIES h(y,x))                   them as well.
·    spiOf(f,g) AND subPropertyOf(g,h)                                  Note that the user-defined operator mechanism in
       IMPLIES spiOf(f,h)                                            Oracle allows for sharing state across multiple
     (Proof: f(x,y) IMPLIES g(y,x) IMPLIES h(y,x))                   invocations. Thus, the implementation of the
                                                                     ONT_RELATED operator involves building and compiling
·    spiOf(f,g) AND spiOf(g,h)
                                                                     an SQL query with CONNECT BY clause (as described
       IMPLIES SubPropertyOf(f,h)
                                                                     in Section 3.2) during its first invocation. Each
     (Proof: f(x,y) IMPLIES g(y,x) IMPLIES h(x,y))
                                                                     subsequent invocations of the operator simply uses the
                                                                     previously compiled SQL cursor, binds it with the new
         ParentOf                   SubPropertyOfInverseOf           input value, and executes it to obtain the result.
                                       (inferred relation)               The ONT_EXPAND operator is defined as a table
             subPropertyOf
                                                                     function as it returns a table of rows, which by default
         MotherOf                          hasMother                 includes the path and distance measures.
                        inverseOf
                                                                     3.2 Basic Algorithm
   Given the expansion of subPropertyOf and spiOf, we
can find all relationship tuples for a given property,               Basic processing for both ONT_RELATED and
including those that are implied through sub-properties              ONT_EXPAND involves computing transitive closure,
and inverse-properties. Consider the following example               namely, traversal of a tree structure by following
(see adjoining figure), where non-transitive properties are          relationship links given a starting node. Also, as part of
used for clarity, which expands ParentOf. In this case, if           transitive closure computation, we need to track the
we have subPropertyOf(MotherOf, ParentOf) and                        distance and path information for each pair formed by
inverseOf(MotherOf, hasMother), we will get                          starting node and target node reached via the relationship
spiOf(hasMother, ParentOf) based on result of the initial            links.
inferencing phase and the above rules. Then                             Oracle supports transitive closure queries with
hasMother(child, mother) will be sufficient to yield                 CONNECT BY clause as follows:
ParentOf(mother, child) in the result set:                              SELECT ... FROM ... START WITH 
                                                                        CONNECT BY ;
    SELECT r.termID1, 'ParentOf', r.termID2
    FROM relationships r, terms t                                        The starting node is selected based on the condition
    WHERE r.propertyID = t.termID                                    given in START WITH clause, and then nodes are
      AND t.term IN
     (select term1Name FROM                                          traversed based on the condition given in CONNECT BY
      TABLE(ONT_EXPAND(NULL,                                         clause. The parent node is referred to by the PRIOR
                       'subPropertyOf',                              operator. For computation of distance and path, the
                       'ParentOf',
                       'Family_ontology')))                          Oracle-provided       LEVEL       psuedo-column       and
                                                                     SYS_CONNECT_BY_PATH function are respectively
    UNION
                                                                     used in the select list of a query with CONNECT BY
    SELECT r.termID2, 'ParentOf', r.termID1
    FROM relationships r, terms t                                    clause.
    WHERE r.propertyID = t.termID                                        Note that in the system-defined Relationships table, a
      AND t.term IN                                                  row represents `TermID1 is related to TermID2 via
     (select term1Name FROM
      TABLE(ONT_EXPAND(NULL,                                         PropertyID relationship.' For example, if `A IS_A B',
                       'spiOf',                                      it is represented as the row  assuming
                       'ParentOf',                                   that the ontologyID is 1.
                       'Family_ontology')))                              Note that any cycles encountered during the closure
3. Implementation of Ontology Related                                computation will be handled by the CONNECT BY
   Functionality on Oracle RDBMS                                     NOCYCLE query implementation available in Oracle
                                                                     10g (not explicitly shown in the examples below). Also,
This section describes how the ontology-related                      the proposed index-based implementation (described in
functionality is implemented on top of Oracle RDBMS.


                                                              1059
Section 3.3) can handle this case even in Oracle 9i                              relation = `IS_A'
                                                                               CONNECT BY
Release 2.                                                                      PRIOR term1 = term2 AND
   For simplicity, we use a slightly different definition for                   relation = 'IS_A');
the relationships table where term names are stored                    The text in boldface above is the portion that has been
instead of termIDs as follows:                                         converted. Basically, the third argument is translated into
Relationships (
                                                                       START WITH clause and the second argument into
  OntologyName, Term1, Relation, Term2,                                CONNECT BY clause.
  ...)                                                                 The result for this query is as follows:
   To illustrate the processing, we use the restaurant                 NAME                        ADDRESS
guide example. The data for the two tables used are as
                                                                       Chilis                      ......
shown in Table 2 below.
                                                                       Maharaj                     ......
         Table 2: Example Restaurant Database
                                                                       Niva                        ......
   restaurant                      served_food
                                                                       3.2.2 Handling OR Operator. Consider a case where
Id     Name       price_r             R_id   cuisine                   `Brazilian' cuisine was not originally included in the
                            ......
                  ange                1      American                  ontology and is now inserted under the `South
1      Mac        $                   2      Mexican                   American' cuisine. Also, to put `South American'
2      Chilis     $$                  2      American                  cuisine in the same category as `Latin American'
                                                                       cuisine, the transitive and symmetric `EQV' relationship
3      Anthonys   $$$                 3      American
                                                                       is used as shown in the Figure 3 below:
                                      4      American
4      BK         $
                                      5      American                                        EQV
5      Uno        $$
                                      5      Italian                      Latin American                    South American
6      Wendys     $
                                      6      American
7      Dabin      $$                  7      Korean                                                         Brazilian
8      Cheers     $$                  7      Japanese                          Figure 3: Adding EQV Relationship
9      KFC        $                   8      American                  Now, to get `Latin American' cuisine, disjunctive
10     Sizzlers   $$                  9      American                  conditions should be used to traverse both relationship
                                      10     American                  links, that is, `IS_A' and `EQV'. Such disjunctive
11     Rio        $$                                                   conditions can be directly specified in the START WITH
                                      11     Brazilian
12     Maharaj    $$                                                   and CONNECT BY clauses.
                                      12     Mexican
13     Dragon     $$                                                   Original Query:
                                      12     Indian
                                                                         SELECT r.name, r.address
14     Niva       $$                  13     Chinese                      FROM served_food sf, restaurant r
                                                                           WHERE r.id = sf.r_id AND
                                      14     Portuguese
                                                                                 ONT_RELATED(sf.cuisine,
                                                                                            `IS_A OR EQV',
                                                                                            `Latin American',
3.2.1 Handling Simple Terms. Consider a query that has                                      `Cuisine_ontology')=1;
simple relation types, i.e., no AND, OR, NOT operators.                Transformed Query:
The first query given in Section 2.3.4 can be converted as                The only differences from the transformed query of
follows:                                                                  the previous example is that the relationships table:
Original Query:                                                               FROM relationships
       SELECT r.name, r.address                                           is replaced by a sub-query to introduce the implicit
       FROM served_food sf, restaurant r                                  symmetric edges into the query:
       WHERE r.id = sf.r_id AND
             ONT_RELATED(sf.cuisine, `IS_A',                                  FROM (SELECT term1, relation, term2
     `Latin American',                                                              FROM relationships
     `Cuisine_ontology')=1;                                                         UNION
                                                                                    SELECT term2, relation, term1
Transformed Query:                                                                  FROM relationships
     SELECT r.name, r.address                                                       WHERE relation = `EQV')
     FROM served_food sf,restaurant r                                     and the occurrence of the following predicate in
     WHERE r.id = sf.r_id AND
           sf.cuisine IN                                                  START WITH and CONNECT BY clauses
         (SELECT term1 FROM relationships                                     relation = `IS_A'
          START WITH                                                      is replaced with the following predicate:
           term2 = 'Latin American' AND



                                                                1060
      (relation = `IS_A' OR relation =                                   WHERE r.id = sf.r_id AND
      `EQV')                                                                   ONT_RELATED(sf.cuisine,
                                                                                          `NOT EQV',
3.2.3 Handling AND operator. Conjunctive conditions                                       `Latin American',
between transitive relationship types can be handled by                                   `Cuisine_ontology')=1;
independently computing the transitive closure for each               Transformed Query: Only difference from the
relationship type and then applying set INTERSECT on               transformed query of the example in Section 3.2.1 is that
the resulting sets. For each node in the intersection, a path      the occurrence of the following predicate in START
exists from the start node to this node for each                   WITH and CONNECT BY clauses
relationship type and hence this is sufficient.                             relation = `IS_A'
   Let us consider another relationship between cuisines,                is replaced with the following predicate:
which identifies the spiciest cuisine using the term                        relation != `EQV'
MOST_SPICY. The ontology can now contain                              Note that if a user wants to retrieve all cuisines except
information such as `South Asian cuisine is                        Latin American cuisine, then the query can be formulated
MOST_SPICY Asian cuisine' and `Indian cuisine is                   using the operator ONT_RELATED returning 0 as follows:
MOST_SPICY South Asian cuisine,' etc.                                           ......
   To find very spicy cuisine from the ontology, user can                          ONT_RELATED(sf.cuisine,
issue a query using conjunctive conditions in the                                             `IS_A',
relationships as follows:                                                                     `Latin American',
                                                                                              `Cuisine_ontology')=0;
Original Query: Find a restaurant that serves very spicy
                 Asian cuisine.                                    3.2.5 Handling Combination of OR, AND, and NOT.
   SELECT r.name FROM served_food sf,                              OR and NOT operators are directly specified in the
                       restaurant r                                CONNECT BY clause and AND operators are handled
   WHERE r.id = sf.r_id AND                                        by INTERSECT. All conditions are rewritten as
   ONT_RELATED(sf.cuisine,
                `IS_A AND MOST_SPICY'                              conjunctive conditions. For example, `A OR (B AND
               `Asian',                                            C)' will be converted into `(A OR B) AND (A OR
                `Cuisine_ontology') = 1;                           C).' Then, `(A OR B)' and `(A OR C)' are
Transformed query:                                                 specified in the CONNECT BY clause in separate queries
   SELECT r.name FROM served_food sf,                              that can be combined with INTERSECT operator.
                  restaurant r
    WHERE r.id = sf.r_id AND
         sf.cuisine IN                                             3.3 Speeding up ONT_RELATED and
     (                                                                 ONT_EXPAND Operations
      SELECT term1 FROM relationships
       START WITH                                                  Finding transitive closure from an ontology can be a
         term2 = 'Asian' AND                                       time-consuming process especially if the ontology has a
         relation = `IS_A'
       CONNECT BY                                                  large number of terms. In addition, different relationship
         PRIOR term1 = term2 AND                                   types can further increase the computation cost. To
         relation = 'IS_A'                                         address this problem, a transitive closure table is pre-
      INTERSECT
                                                                   computed. Note that as part of this computation both
                                                                   distance and path measures are computed as well. For the
       SELECT term1 FROM relationships                             example cuisine ontology, the transitive closure table will
        START WITH
           term2 = 'Asian' AND                                     be as shown in Table 3.
           relation = `MOST_SPICY'                                            Table 3: Transitive Closure Table
         CONNECT BY
            PRIOR term1 = term2 AND                                RootTerm        RelType    Term            Distance   Path
             relation =`MOST_SPICY');                              ...
                                                                   Latin           IS_A       Mexican         1          ...
                                                                   American
3.2.4 Handling NOT operator. A NOT operator
                                                                   Latin           IS_A       Portuguese      1          ...
specifies which relationships to exclude when finding              American
transitive closure. Therefore, given the start node all            ...
relationships except ones specified in NOT operator will
be traversed. NOT operators can be directly specified in              The data is stored in a key compressed index-
the START WITH and CONNECT BY clauses.                             organized table [18] (primary B+-tree) with  as the key. The commonly occurring
excluding `EQV' relationship types.                                 prefixes are compressed. The
   SELECT r.name FROM served_food sf,                              distance and path are stored as overflow-resident
                      restaurant r



                                                            1061
columns. This allows for basic index-structure to remain           and update operations also result in incremental
compact thereby providing efficient index-lookup.                  maintenance of the index.
   For a query involving ONT_EXPAND, say with                         The transitive closure table is meant for a stable
arguments `Latin American' and `IS_A' this pre-                    ontology. If the ontology changes, the table needs to be
computed transitive closure table is looked up instead of          updated. For inserts/deletes/updates into ontology, the
traversing the ontology to find the transitive closure, and        transitive closure table can be incrementally maintained.
the matching rows are returned. The rows returned                  The algorithm is omitted due to lack of space.
include the distance and path measures, which are also
available in the Transitive Closure table.                         3.4 Performance Study using Cancer Ontology
   To speed up queries involving ONT_RELATED, a new                To characterize the performance of ontology-based
indexing scheme ONT_INDEXTYPE is implemented using                 semantic matching, a part of the National Cancer
Oracle's Extensible Indexing Framework [5]. Users only             Institute's Thesaurus and ontology [17] was stored using
need to create an index on the column holding ontology             a prototype implementation built on Oracle RDBMS.
terms using ONT_INDEXTYPE as follows:                              Specifically, the following experiments were conducted
      CREATE INDEX                                     using Oracle9i Release 2 (9.2.0.3) on a SunOS 5.6, Ultra-
      ON  ()                              60 Sparc Workstation with one 450Mhz CPU and 512
      INDEXTYPE is ONT_INDEXTYPE
      PARAMETERS(`Ontology =                                       MB of main memory.
                   ');                                  The stored ontology consisted of 25,762 terms and
   The basic processing of indexing scheme works as                54,387 relationships among them. The resulting transitive
follows. Consider the following index creation statement:          closure took 6 minutes to build and contained 186,211
      CREATE INDEX idx1                                            rows.
      ON served_food (cuisine)                                         The following query was executed with the index
      INDEXTYPE is ONT_INDEXTYPE
      PARAMETERS(`Ontology=cuisine_ontology');                     (transitive closure table) for several different terms. This
                                                                   queries the ontology directly.
   The index creation results in creation of a key-
                                                                      SELECT count(*) FROM TABLE(
compressed index-organized table with two columns                      ONT_EXPAND(NULL, `IS_A',
 as shown in Table 4. The row_id                                  :term,
column contains the row identifier for the served_food                           `Cancer_Ontology'));
table.                                                                Figure 4 illustrates that query runtime increases
                  Table 4: Index Table                             linearly as the number of rows in the ONT_EXPAND
cuisine                       row_id                               result set increases.
...                                                                   Next a series of database tables, named patients,
Mexican                       ROWID7                               were created with a varying number of rows, such that
Portuguese                    ROWID8
                                                                   10% of all rows contained a diagnosis term that was one
                                                                   of the 961 subclasses of `Experimental_Organism_
...
                                                                   Diagnoses' (EOD).
   Now, a query involving ONT_RELATED operator say
with arguments (sf.cuisine, `IS_A', `Latin                                              4.00
American', ...), is executed by first searching the
                                                                                        3.50
transitive closure table using the key (`Latin
American', `IS_A') to find the terms, and then for                                      3.00
                                                                       Time (seconds)




each term the corresponding row identifier is obtained by                               2.50
doing a lookup into the Index Table.
                                                                                        2.00
   If a query with ONT_RELATED operator references
                                                                                        1.50
ONT_DISTANCE and/or ONT_PATH, then the indexed
implementation of ONT_RELATED operator retrieves                                        1.00
distance and/or path measures from the transitive closure                               0.50
table. These values are simply returned as part of
ONT_DISTANCE and ONT_PATH invocations.                                                  0.00
   The index idx1 created on served_food table                                                 0         2000         4000        6000
behaves likes a regular index, which can be incrementally                                      Num ber of Term s in ONT_EXPAND Result
maintained. That is, if a new row is added to served_
food table, the corresponding 
values are added to the index table. Similarly, the delete
                                                                                          Figure 4: Performance of ONT_EXPAND



                                                            1062
   Given these patients tables, the following query                                                to terrorism. From the example pattern given in [20],
was executed, using all three ONT_RELATED                                                          where a person rents a truck and another person buys
evaluation mechanisms (functional evaluation without an                                            fertilizer, and they reside in the same house, we can
index, functional evaluation with an index, and index                                              formulate an SQL query using ONT_RELATED operator
evaluation), to count the number of patients whose                                                 for a given activity table:
diagnosis is an EOD:
                                                                                                   Person_name   Address    Activity   Object
   SELECT count(*) FROM patients                                                                   John Buck     Addr1      Rent       Ford F-150
   WHERE ONT_RELATED(diagnosis,
                      `IS_A',                                                                      Jane Doe      Addr1      Buy        Ammonium Nitrate
        `Experimental_Organism_Diagnoses',                                                         ...           ...        ...        ...
                     `Cancer_Ontology') = 1;
  Figure 5 illustrates how query runtime varies as the
number of rows in the patients table changes.                                                      SELECT *
                                                                                                   FROM ACTIVITY x, ACTIVITY y
                                                                                                   WHERE
                                                                                                    x.Activity = `Rent' AND
                       400.00                                                                       y.Activity = `Buy' AND
                       350.00                                                                       ONT_RELATED(x.object,`IS_A',`Truck',
                                                                                                               `vehicle_ontology') = 1 AND
                       300.00                                                                       ONT_RELATED(y.object,`IS_A','Fertilizer',
      Time (seconds)




                       250.00                                                                                 `chemical_ontology')=1 AND
                                                                                                    x.Address = y.Address;
                       200.00
                       150.00                                                                      By referring to more than one ontology we can analyze
                       100.00                                                                      suspicious activities involving a combination of different
                                                                                                   actions.
                        50.00
                         0.00
                                                                                                   4.2      A Supply Chain Application
                                 0            10000     20000            30000      40000
                                             Number of Row s Queried                               A supply chain where thousands of products and services
                                                                                                   are exchanged has a major issue of standardizations of
                        Funct ional ( wit hout index)   Funct ional (wit h Index)   Index          purchase order, bill of material, catalogs, etc. There could
                                                                                                   be name, language, currency, and unit differences to
                                                                                                   resolve to name a few. Standardization efforts such as
                       Figure 5: Performance of ONT_RELATED                                        RosettaNet [22], UNSPSC [23] for product category and
                                                                                                   ebXML [24] to standardize business processes and
   Functional evaluation without an index computes                                                 products are not enough to meet individual
transitive closure for ONT_RELATED using CONNECT BY                                                vendor/customer needs to resolve semantic differences.
clause (without using a pre-computed transitive closure                                            Typically, these conflicts have been resolved using some
table). Functional evaluation with an index uses the                                               form of mapping mechanisms [6].
transitive closure table to check if two terms are
connected. Index evaluation uses the index table (Table                                               These conflicts can be resolved by issuing an SQL
4) and the transitive closure table to compute the result.                                         query with ONT_RELATED operator using ontologies.
Again the query time increases linearly as the result set                                          Even individually developed ontologies may need the
size increases. The results clearly show that finding                                              ontology mappings to communicate each other. We can
transitive closure by traversing the ontology during query                                         apply ONT_RELATED and ONT_EXPAND operators to
processing time is time-consuming and utilizing the pre-                                           the mapping ontology to resolve the conflicts as well.
computed transitive closure table provides significant
performance gains.                                                                                 4.3      A Life Science Application
                                                                                                   Life Science domain applications have been using
4. Ontology-driven Database Applications                                                           ontologies for representing knowledge as well as the basis
 This section illustrates the usage of the ontolo