Anastasia Analyti: Research Record

Reasoning on the Semantic Web

Rules constitute the next layer over the ontology languages of the Semantic Web, allowing arbitrary interaction of variables in the head and body of the rules. In this work, the Semantic Web language Resource Description Framework Schema (RDFS) is extended to accommodate the two negations of Partial Logic, namely weak negation (expressing negation-as-failure or non-truth) and strong negation (expressing explicit negative information or falsity), as well as derivation rules. The new language is called Extended RDF (ERDF). The stable model semantics of ERDF ontologies is developed, based on Partial Logic, extending the model-theoretic semantics of RDFS. Intuitively, an ERDF ontology is the combination of (i) an ERDF graph G containing (implicitly existentially quantified) positive and negative information, and (ii) an ERDF program P containing derivation rules, possibly with all of the connectives (weak negation, strong negation, material implication, conjunction, and disjunction) as well as existential and universal quantifiers in the body of a rule, and strong negation in the head of a rule. ERDF enables the combination of closed-world (non-monotonic) and open-world (monotonic) reasoning in the same framework, through the presence of weak negation (in the body of the rules) and the new metaclasses erdf:TotalProperty and erdf:TotalClass, respectively. We have shown that ERDF stable model entailment conservatively extends RDFS entailment from RDF graphs to ERDF ontologies.
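
The difference between the two negations can be illustrated with a minimal Python sketch. It is not the ERDF semantics itself: it only assumes a partial interpretation given as two sets of triples, those known to be true and those explicitly known to be false. The class name, method names, and the ex: triples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartialInterpretation:
    """Toy partial interpretation: a triple may be true, false, or unknown."""
    true_facts: set = field(default_factory=set)
    false_facts: set = field(default_factory=set)

    def holds(self, fact):
        return fact in self.true_facts

    def strongly_negated(self, fact):
        # Strong negation: the fact is explicitly recorded as false.
        return fact in self.false_facts

    def weakly_negated(self, fact):
        # Weak negation: the fact is merely not known to be true.
        return fact not in self.true_facts


# Hypothetical example data.
interp = PartialInterpretation(
    true_facts={("ex:Bob", "ex:likes", "ex:Tea")},
    false_facts={("ex:Bob", "ex:likes", "ex:Coffee")},
)
coffee = ("ex:Bob", "ex:likes", "ex:Coffee")
juice = ("ex:Bob", "ex:likes", "ex:Juice")  # neither true nor false
```

In this sketch the triple about ex:Juice is weakly negated but not strongly negated, which is the gap that lets closed-world reasoning (weak negation) coexist with explicit negative information (strong negation).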

Unfortunately, satisfiability and entailment under the ERDF stable model semantics are in general undecidable, and decidability cannot be achieved under this semantics unless ERDF ontologies of restricted syntax are considered. This is because the RDF vocabulary is infinite. Therefore, to achieve decidability of reasoning in the general case, we propose a modified semantics, called the ERDF #n-stable model semantics, that considers a finite vocabulary. The new semantics also extends RDFS entailment from RDF graphs to ERDF ontologies. Moreover, if O is a simple ERDF ontology (i.e., the bodies of the rules of O contain only weak negation, strong negation, and conjunction), then query answering under the ERDF #n-stable model semantics reduces to query answering under the answer set semantics. Complexity results and equivalence statements between the ERDF stable and #n-stable model semantics are provided. Additionally, we propose a framework of modular ERDF ontologies and define its semantics, extending the ERDF stable model semantics.

We present a principled framework for modular web rule bases, called MWeb (MWeb implementation site). According to this framework, each predicate defined in a rule base is characterized by its defining reasoning mode, scope, and exporting rule base list. Each predicate used in a rule base is characterized by its requesting reasoning mode and importing rule base list. For legal MWeb modular rule bases S, the MWebAS and MWebWFS semantics of each rule base s in S w.r.t. S are defined model-theoretically. These semantics extend the answer set semantics (AS) and the well-founded semantics with explicit negation (WFSX) on extended logic programs (ELPs), respectively, keeping all of their semantic and computational characteristics. Our framework supports: (i) local semantics and different points of view, (ii) local closed-world and open-world assumptions, (iii) scoped negation-as-failure, (iv) restricted propagation of local inconsistencies, and (v) monotonicity of reasoning, for "fully shared" predicates.

We specify the syntax of simple modular ERDF ontologies in the MWeb system, allowing for the integration of both systems. The transformation clearly identifies the subset of the MWeb language necessary to implement simple modular ERDF ontologies, which does not require all the features of the MWeb framework. Additionally, we specify the semantics of ERDF reasoning entirely in the MWeb framework, including alignment with RIF, support of RDF and RDFS reasoning, as well as extensions to the original ERDF semantics for dealing with closed classes and properties. Thus, reasoning on simple modular ERDF ontologies can be achieved through our MWeb implementation, which, in particular, supports modular reasoning over RDF(S) ontologies. Another contribution of this work is the specification of the semantics of simple modular ERDF ontologies via extended logic programming rules, which can be readily adapted by any other system under answer set semantics.

Biomedical Informatics

The ultimate goal of the biomedical informatics project PrognoChip is the identification of classification and prognosis molecular markers for breast cancer. This requires not only an understanding of the genetic basis of the disease, based on the patient's tumor gene expression profiles but also the correlation of this data with knowledge normally processed in the clinical setting. We have developed the Mediator component of the PrognoChip Integrated Clinico-Genomic Environment (ICGE), through which the integration of the clinical information subsystem and the genomic information subsystem is achieved. The biomedical investigator can form clinico-genomic queries through the web-based graphical user interface of the Mediator. This is split into several query forms, allowing cancerous sample selection (along with their associated gene expression profiles and patient characteristics), based on criteria of interest. After a query is formed, the Mediator translates it into an equivalent set of local subqueries, which are executed directly against the constituent databases. Then, results are combined for presentation to the user and/or transmission to the Data Mining tools for analysis.
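
The translate, execute, and combine steps described above can be sketched in Python as follows. This is only an illustration of the mediation pattern, not the PrognoChip Mediator: the two in-memory lists stand in for the clinical and genomic databases, and every field name, function name, and selection criterion is a hypothetical assumption.

```python
# Hypothetical stand-ins for the two constituent databases.
CLINICAL_DB = [
    {"patient_id": "P1", "tumor_grade": 3},
    {"patient_id": "P2", "tumor_grade": 1},
    {"patient_id": "P3", "tumor_grade": 3},
]
GENOMIC_DB = [
    {"patient_id": "P1", "sample_id": "S1", "expression": {"GENE_A": 2.4}},
    {"patient_id": "P2", "sample_id": "S2", "expression": {"GENE_A": 3.0}},
    {"patient_id": "P3", "sample_id": "S3", "expression": {"GENE_A": 0.5}},
]


def clinical_subquery(min_grade):
    """Local subquery executed against the clinical database only."""
    return [r for r in CLINICAL_DB if r["tumor_grade"] >= min_grade]


def genomic_subquery(gene, min_expression):
    """Local subquery executed against the genomic database only."""
    return [r for r in GENOMIC_DB
            if r["expression"].get(gene, 0.0) >= min_expression]


def mediate(min_grade, gene, min_expression):
    """Split a clinico-genomic query into local subqueries and join the results."""
    clinical = {r["patient_id"]: r for r in clinical_subquery(min_grade)}
    combined = []
    for sample in genomic_subquery(gene, min_expression):
        patient = clinical.get(sample["patient_id"])
        if patient is not None:
            # Merge patient characteristics with the sample's expression profile.
            combined.append({**patient, **sample})
    return combined


selected = mediate(min_grade=2, gene="GENE_A", min_expression=1.0)
```

Joining on a shared patient identifier is an assumption of this sketch; the combined records could then be shown to the user or handed to an analysis step.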

Faceted Metadata and Semantics

A faceted taxonomy is a set of taxonomies, each one describing the domain of interest from a different (preferably orthogonal) point of view. Given a faceted taxonomy, each domain object (e.g., a book or a Web page) can be indexed using a compound term, i.e., a set of terms from the different facets. Faceted taxonomies carry a number of well-known advantages over single taxonomies (clarity, compactness, scalability), but they also have a severe drawback: the high cost of avoiding invalid compound terms, i.e., compound terms that do not apply to any object in the domain. The interaction paradigm of faceted search and dynamic taxonomies can enable users to browse only nodes that correspond to valid compound terms. However, if the computation of such compound terms is based only on the objects that have already been indexed, then this interaction paradigm cannot be exploited when there are no indexed objects.
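
As a minimal sketch of these notions, the following Python code represents two hypothetical facets, treats a compound term as a set of terms, and checks validity purely from indexed objects. The facet contents, object names, and function names are assumptions made for illustration; the check shown is the object-based one whose limitation is described above, not an algebraic specification.

```python
# Hypothetical facets: each maps a term to its parent (None for a root).
LOCATION = {"Location": None, "Island": "Location", "Mainland": "Location"}
SPORTS = {"Sports": None, "SeaSports": "Sports", "WinterSports": "Sports"}
PARENT = {**LOCATION, **SPORTS}

# Hypothetical index: object -> terms it has been indexed under.
INDEX = {
    "hotel1": {"Island", "SeaSports"},
    "hotel2": {"Mainland", "WinterSports"},
}


def ancestors_or_self(term):
    """Return the term together with all broader terms in its facet."""
    result = set()
    while term is not None:
        result.add(term)
        term = PARENT[term]
    return result


def is_extensionally_valid(compound_term, index):
    """A compound term is valid here if some indexed object falls under all its terms."""
    for terms in index.values():
        covered = set()
        for t in terms:
            covered |= ancestors_or_self(t)
        if set(compound_term) <= covered:
            return True
    return False
```

With the index above, {"Island", "SeaSports"} is valid and {"Island", "WinterSports"} is not. With an empty index every compound term is reported invalid, which is exactly the situation where object-based computation gives the user nothing to browse.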

We propose an algebra, called Compound Term Composition Algebra (CTCA), based on which one can build an algebraic expression to specify the valid compound terms of a faceted taxonomy, in a flexible and easy manner. The availability of algebraic expressions describing the valid compound terms of a faceted taxonomy enables the dynamic generation of navigation trees whose nodes correspond only to valid compound terms. These navigation trees can be used for indexing (for avoiding errors) and do not present the problem of missing terms or missing relationships that characterizes single taxonomies. Additionally, we propose specific mining algorithms that can be used for expressing the extensionally valid compound terms of a materialized faceted taxonomy (i.e., a corpus of objects indexed through a faceted taxonomy), in the form of an algebraic expression. Such mined algebraic expressions enable the user to take advantage of the aforementioned interaction scheme, without having to resort to the (possibly numerous) instances of the materialized faceted taxonomy. Furthermore, algebraic expressions describing the valid compound terms of a faceted taxonomy can be exploited in other tasks, such as retrieval optimization, configuration management, consistency control, and compression.

Contexts in Information Bases

Our research in this area concerns the development of a general theory on Contexts in Information Bases, and includes (i) the introduction of a novel structure for the representation of contexts, (ii) the development of a naming scheme based on contexts, (iii) the definition of operations between contexts, such as union, intersection, and difference operation, and (iv) the definition of a high-level query and update language for contexts.

The notion of context appears in several disciplines, including computer science, under various forms. However, all these forms are very diverse and serve different purposes. We present a general framework for representing the notion of context in information modeling. First, we define a context as a set of objects, within which each object has a set of names and possibly a reference: the reference of the object is another context which "hides" detailed information about the object. Then, we introduce the possibility of structuring the contents of a context through the traditional abstraction mechanisms, i.e. classification, generalization, and attribution. We show that, depending on the application, our notion of context can be used as an independent abstraction mechanism, either in an alternative or a complementary capacity with respect to the traditional abstraction mechanisms. We also study the interactions between contextualization and the traditional abstraction mechanisms, as well as the constraints that govern such interactions. Finally, we present a theory for contextualized information bases. The theory includes a set of validity constraints, a model theory, as well as a set of sound and complete inference rules. We show that our core theory can be easily extended to support embedding of particular information models in our contextualization framework.
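
The basic structure described above (a context as a set of objects, each with a set of names and possibly a reference to another context) can be sketched in Python as follows. The class, its methods, and the path-style name resolution are hypothetical illustrations and do not reproduce the formal theory or its naming scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Context:
    """Toy context: objects with context-local names and optional references."""
    names: Dict[str, Set[str]] = field(default_factory=dict)        # object id -> names
    references: Dict[str, "Context"] = field(default_factory=dict)  # object id -> context

    def add(self, obj_id: str, names, reference: Optional["Context"] = None):
        self.names[obj_id] = set(names)
        if reference is not None:
            self.references[obj_id] = reference

    def lookup(self, name: str) -> Set[str]:
        """Objects of this context that carry the given name."""
        return {o for o, ns in self.names.items() if name in ns}

    def resolve(self, path) -> Set[str]:
        """Follow names through nested contexts; return the objects named last."""
        current, matches = self, set()
        for i, name in enumerate(path):
            matches = current.lookup(name)
            if i == len(path) - 1:
                break
            if len(matches) != 1:
                return set()  # ambiguous or unknown name: stop
            (obj,) = matches
            nxt = current.references.get(obj)
            if nxt is None:
                return set()  # object has no reference to descend into
            current = nxt
        return matches


# Hypothetical example: the object named "Greece" refers to a more detailed context.
greece = Context()
greece.add("o2", {"Crete", "Kriti"})
greece.add("o3", {"Athens"})
world = Context()
world.add("o1", {"Greece", "Hellas"}, reference=greece)
```

Here "Athens" is not visible directly in the outer context; it becomes reachable only through the reference of the object named "Greece", which is the sense in which a reference hides detail.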

Knowledge Representation and Reasoning

Our research in this area focuses on (i) the development of knowledge representation models that support the representation of complex, evolving, heterogeneous, and abstract concepts and processes, (ii) methodologies for conceptual model design, (iii) the formalization of knowledge representation models, and (iv) the development and formalization of semantic structures that support reasoning not only at the instance, but also at the schema level.

The goal of semantic data modelling is to enable the database designer to naturally and directly incorporate as much as possible of the meaning of an application environment into its data model. However, a semantic data model should not only be characterized in terms of its representational adequacy but also in terms of the inferences it supports. Yet, little effort has been devoted to mechanisms for schema derivations and schema verification. To satisfy this need, we propose (i) structures that carry expressive and useful information on the database schema, (ii) a set of inference rules for schema derivations, and (iii) a mechanism for discovering contradictory schema declarations.

Our research is focused on the development and strong formalization of a knowledge representation model that supports the description of large evolving varieties of highly interrelated data, concepts and complex relationships. In addition, we are concerned with the development of a methodology for constructing semantic data models.

Though specialization and inheritance are well-known concepts, certain aspects of these concepts lack formal foundations. In particular, when properties of different classes are semantically related, several different semantics are possible for the inherited properties, and a choice is necessary. Conventional systems impose an a priori solution that supports only one of the possible semantics of inheritance. We introduced Restriction Isa (RISA), a form of specialization that represents property value refinement. We demonstrated that RISA makes it possible to differentiate between the possible semantics of inheritance, in a formal and sound way.

In addition, the RISA relation makes it possible to express participation constraints on properties. Specifically, properties of a class are characterized as necessary, possible, or inapplicable on a given subclass. Whether explicitly declared or derived, this information is useful for several reasons:
(i) it helps the user better understand the semantics of the subclass,
(ii) it expresses a particular form of negative information,
(iii) it uncovers contradictory declarations or design errors, and
(iv) it characterizes property values that are missing from the database.

We introduced a new relationship among properties, called property covering. Property covering holds when a property restricted to a given class is the union of a collection of subproperties. In fact, property covering is a generalization of the RISA relation, mentioned earlier. We demonstrated that property covering, together with inheritance and disjointness, constitutes a powerful conceptual modelling mechanism.
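
One way to read the definition of property covering at the instance level is sketched below in Python. It assumes, purely for illustration, that a property is a set of (subject, value) pairs and a class is a set of instances; the function names and example data are hypothetical, and the formal, schema-level definition is not reproduced here.

```python
def restrict(prop, instances):
    """Keep only the pairs of a property whose subject belongs to the class."""
    return {(s, v) for (s, v) in prop if s in instances}


def is_covered(prop, instances, subproperties):
    """Check whether the restricted property equals the union of the subproperties."""
    union = set().union(*subproperties)
    return restrict(prop, instances) == union


# Hypothetical data: works_on restricted to Employee is covered by two subproperties.
employee = {"ann", "bob"}
works_on = {("ann", "t1"), ("bob", "t2"), ("carl", "t3")}  # carl is not an Employee
manages = {("ann", "t1")}
implements = {("bob", "t2")}
```

With these sets, is_covered(works_on, employee, [manages, implements]) holds, while dropping implements breaks the covering because the pair for bob is no longer accounted for.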

Logic Programming and Deductive Databases

A deductive database consists of two parts: a set of known facts, and a set of rules from which new facts can be derived. The goal of this research is to derive useful information from a set of contradictory rules. Consistency of derived facts is not a realistic assumption in many applications. In the presence of contradiction, classical logic fails to give any semantics to the deductive database. Thus, even a single erroneous datum could destroy all meaningful information. In the investigated framework, rules are equipped with a partial order expressing their relative reliability in case of conflict. This reliability order is used to choose between conflicting rules. When no choice is possible, the conflicting rules are considered unreliable and their conclusions are blocked. Conclusions from rules unrelated to the contradiction are considered reliable and they are used for the derivation of new information.
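
The blocking idea described above can be illustrated with a deliberately simplified Python sketch. It is not the semantics developed in this research: it assumes propositional literals written as strings (with a leading "-" for negation), a reliability order given as an explicit set of (more reliable, less reliable) rule-name pairs, and a conservative blocking test computed before derivation. Conflicts between facts and rule conclusions are ignored. All names are hypothetical.

```python
def complement(literal):
    """Return the complementary literal, e.g. 'flies' <-> '-flies'."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def closure(facts, rules, allowed):
    """Forward-chain using only the rules whose names are in `allowed`."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for name, body, head in rules:
            if name in allowed and body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


def reliable_conclusions(facts, rules, more_reliable):
    """Derive conclusions, blocking rules that lose or tie a conflict."""
    # Over-approximate what could be derived if conflicts were ignored.
    potential = closure(facts, rules, {name for name, _, _ in rules})
    applicable = [r for r in rules if r[1] <= potential]
    unblocked = set()
    for name, _, head in applicable:
        rivals = [n for n, _, h in applicable if h == complement(head)]
        # A rule survives only if it is more reliable than every rival.
        if all((name, rival) in more_reliable for rival in rivals):
            unblocked.add(name)
    return closure(facts, rules, unblocked)


# Hypothetical rules: (name, body, head).
facts = {"bird", "penguin"}
rules = [
    ("r1", {"bird"}, "flies"),
    ("r2", {"penguin"}, "-flies"),
    ("r3", {"bird"}, "has_wings"),
    ("r4", {"flies"}, "migrates"),
]
with_order = reliable_conclusions(facts, rules, {("r2", "r1")})
without_order = reliable_conclusions(facts, rules, set())
```

When r2 is declared more reliable than r1, the sketch concludes "-flies"; with no order between them, both conflicting conclusions are blocked. In either case "has_wings" is still derived, since r3 is unrelated to the contradiction.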

Multimedia Database Systems

Multimedia database systems deal with the storage, manipulation, and retrieval of multiple media types (pictures, voice, video, graphics, text). My interests in this area include: (i) the development of multimedia interpretation models describing the content of multimedia data for content-based retrieval, (ii) the development of multimedia description models supporting multimedia data presentation and synchronization, and (iii) query languages and navigation methods for multimedia data.

Data Structures and Files, Main-Memory Databases

We have proposed and analyzed the performance of multi-directory hashing techniques for fast search in main memory databases. Additionally, we have proposed and analyzed the performance of a multi-directory hashing technique for disk-based databases. The latter technique achieves improved bucket utilization and is suitable for parallel search.