Anastasia Analyti: Research Record
Reasoning on the Semantic Web |
Rules constitute the next layer over the ontology languages of the Semantic Web, allowing arbitrary
interaction of variables in the head and body of the rules. In this work, the Semantic Web language Resource
Description Framework Schema (RDFS) is extended to accommodate the two negations of Partial Logic,
namely weak negation (expressing negation-as-failure or non-truth) and strong negation
(expressing explicit negative information or falsity), as well as derivation rules.
The new language is called Extended RDF (ERDF). The stable model semantics of ERDF ontologies
is developed, based on Partial Logic, extending the model-theoretic semantics of RDFS. Intuitively,
an ERDF ontology is the combination of (i) an ERDF graph G containing (implicitly existentially quantified)
positive and negative information, and (ii) an ERDF program P containing derivation rules, possibly with all
the connectives (weak negation, strong negation, material implication, conjunction, and disjunction),
as well as existential and universal quantifiers, in the body of a rule, and strong negation
in the head of a rule. ERDF enables the combination of closed-world (non-monotonic) and
open-world (monotonic) reasoning, in the same framework, through the presence of weak
negation (in the body of the rules) and the new metaclasses erdf:TotalProperty and erdf:TotalClass,
respectively. We have shown that ERDF stable model entailment conservatively extends
RDFS entailment from RDF graphs to ERDF ontologies.
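As a rough illustration of the difference between the two negations, the following Python sketch models a partial interpretation as two sets of atoms: those known to be true and those known to be false. The class `PartialInterpretation`, its method names, and the example atoms are hypothetical and chosen only for this illustration; the sketch does not implement the ERDF stable model semantics.

```python
from dataclasses import dataclass, field


@dataclass
class PartialInterpretation:
    """Toy partial interpretation (hypothetical structure, not the ERDF model theory)."""
    true_atoms: set = field(default_factory=set)   # atoms known to be true
    false_atoms: set = field(default_factory=set)  # atoms known to be false

    def holds(self, atom):
        return atom in self.true_atoms

    def strong_neg(self, atom):
        # Strong negation: the atom is explicitly recorded as false.
        return atom in self.false_atoms

    def weak_neg(self, atom):
        # Weak negation: the atom is not known to be true (non-truth).
        return atom not in self.true_atoms

    def is_total_on(self, atoms):
        # Assumption of this sketch: "total" means every listed atom is
        # either known true or known false, so both negations coincide.
        return all(a in self.true_atoms or a in self.false_atoms for a in atoms)


# Hypothetical example data.
interp = PartialInterpretation(
    true_atoms={("likes", "ann", "wine")},
    false_atoms={("likes", "bob", "wine")},
)
unknown = ("likes", "carl", "wine")
```

For the atom `unknown`, weak negation holds (the atom is not known to be true) while strong negation does not (no explicit negative information is recorded). On a set of atoms for which `is_total_on` returns true, the two negations agree.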
Unfortunately, satisfiability and entailment under the ERDF stable model
semantics are in general undecidable and decidability cannot be achieved under
this semantics, unless ERDF ontologies of restricted syntax are considered.
This is due to the fact that the RDF vocabulary is infinite. Therefore, to achieve
decidability of reasoning in the general case, we propose a modified semantics,
called the ERDF #n-stable model semantics, which considers a finite vocabulary. The new semantics
also extends RDFS entailment from RDF graphs to ERDF ontologies. Moreover, if O is a simple ERDF
ontology (i.e., the bodies of the rules of O contain only weak negation, strong negation, and
conjunction), then query answering under the ERDF #n-stable model semantics reduces to query answering
under the answer set semantics. Complexity results and equivalence statements between the ERDF
stable and #n-stable model semantics are provided. Additionally, we propose a framework of modular ERDF ontologies
and define its semantics extending the ERDF stable model semantics.
We present a principled framework for modular web rule bases, called MWeb
(MWeb implementation site). According to this framework, each predicate
defined in a rule base is characterized by its defining reasoning mode, scope, and exporting rule base list.
Each predicate used in a rule base is characterized by its requesting reasoning mode and importing rule base list.
For legal MWeb modular rule bases S, the MWebAS and MWebWFS semantics of each rule base s in S with respect to S
are defined model-theoretically. These semantics extend the answer set semantics (AS) and the well-founded semantics
with explicit negation (WFSX) on extended logic programs (ELPs), respectively, keeping all of their semantic and computational characteristics.
Our framework supports: (i) local semantics and different points of view, (ii) local closed-world and open-world assumptions, (iii)
scoped negation-as-failure, (iv) restricted propagation of local inconsistencies, and (v) monotonicity of reasoning, for "fully shared" predicates.
We specify the syntax of simple modular ERDF ontologies in the MWeb system,
allowing for the integration of both systems. The transformation clearly identifies
the subset of the MWeb language necessary to implement simple modular ERDF ontologies,
which does not require all the features of the MWeb framework. Additionally, we specify the semantics
of ERDF reasoning entirely in the MWeb framework, including alignment with RIF,
support of RDF and RDFS reasoning, as well as extensions to the original ERDF semantics
for dealing with closed classes and properties.
Thus, reasoning on simple modular ERDF ontologies can be achieved through our
MWeb implementation, which, in particular, supports modular reasoning over RDF(S) ontologies.
Another contribution of this work is the specification of the semantics of simple modular
ERDF ontologies via extended logic programming rules, which can be readily adopted by any other system under answer set semantics.
The ultimate goal of the biomedical informatics project PrognoChip is the identification of
classification and prognosis molecular markers for breast cancer. This requires not only an
understanding of the genetic basis of the disease, based on the patient’s tumor gene expression
profiles but also the correlation of this data with knowledge normally processed in the clinical setting.
We have developed the Mediator component of the PrognoChip Integrated Clinico-Genomic
Environment (ICGE), through which the integration of the clinical information subsystem and the
genomic information subsystem
is achieved. The biomedical investigator can form clinico-genomic queries through the web-based
graphical user interface of the Mediator. The interface is split into several query forms,
allowing the selection of cancerous samples (along with their associated gene expression profiles and patient characteristics),
based on criteria of interest. After a query is formed, the Mediator translates it
into an equivalent set of local subqueries, which are executed directly against the constituent databases.
Then, results are combined for presentation to the user and/or transmission to the Data Mining tools for analysis.
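The query flow described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function `mediate`, the in-memory lists standing in for the constituent databases, the field names, and the choice to combine results by joining on a shared `sample_id` are all hypothetical and are not taken from the PrognoChip Mediator itself.

```python
def mediate(query, sources):
    """Split a query into per-source subqueries, run them, and combine the results.

    query:   dict mapping a source name to a predicate over one record
    sources: dict mapping a source name to a list of records (dicts),
             each carrying a 'sample_id' key (an assumption of this sketch)
    """
    partial_results = []
    for source_name, predicate in query.items():
        # Execute the local subquery directly against one constituent source.
        hits = {rec["sample_id"]: rec for rec in sources[source_name] if predicate(rec)}
        partial_results.append(hits)

    if not partial_results:
        return []

    # Keep only the samples that satisfy every subquery.
    common_ids = set.intersection(*(set(hits) for hits in partial_results))

    combined = []
    for sample_id in sorted(common_ids):
        row = {}
        for hits in partial_results:
            row.update(hits[sample_id])
        combined.append(row)
    return combined


# Hypothetical example data.
sources = {
    "clinical": [
        {"sample_id": 1, "age": 52, "grade": 3},
        {"sample_id": 2, "age": 61, "grade": 1},
    ],
    "genomic": [
        {"sample_id": 1, "gene_x_expr": 2.4},
        {"sample_id": 2, "gene_x_expr": 0.3},
    ],
}
query = {
    "clinical": lambda rec: rec["grade"] >= 2,
    "genomic": lambda rec: rec["gene_x_expr"] > 1.0,
}
result = mediate(query, sources)
```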
Faceted Metadata and Semantics |
A faceted taxonomy is a set of taxonomies, each one describing the domain of interest from a different
(preferably orthogonal) point of view. Having a faceted taxonomy, each domain object (e.g., a book or a
Web page) can be indexed using a compound term, i.e., a set of terms from the different facets. Faceted
taxonomies carry a number of well known advantages over single taxonomies (clarity, compactness, scalability),
but they also have a severe drawback: the high cost of avoiding invalid compound terms, i.e. compound terms
that do not apply to any object in the domain. The interaction paradigm of faceted search and dynamic
taxonomies can enable users to browse only nodes that correspond to valid compound terms. However, if
the computation of such compound terms is based only on the objects that have already been indexed, then
this interaction paradigm cannot be exploited when no objects have been indexed yet.
We propose an algebra, called Compound Term Composition Algebra (CTCA), based on which one
can build an algebraic expression to specify the valid compound terms of a faceted taxonomy, in
a flexible and easy manner. The availability of algebraic expressions describing the valid compound
terms of a faceted taxonomy enables the dynamic generation of navigation trees, whose nodes
correspond only to valid compound terms. These navigation trees can be used for indexing
(for avoiding errors) and do not present the problems of missing terms or missing relationships
that characterize single taxonomies. Additionally, we propose specific mining algorithms
that can be used for expressing the extensionally valid compound terms of a materialized faceted taxonomy
(i.e., a corpus of objects indexed through a faceted taxonomy),
in the form of an algebraic expression. Such mined algebraic expressions enable the
user to take advantage of the aforementioned interaction scheme, without having to resort to the
(possibly numerous) instances of the materialized faceted taxonomy. Furthermore, algebraic expressions describing the valid
compound terms of a faceted taxonomy can be exploited in other tasks, such as retrieval optimization,
configuration management, consistency control, and compression.
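To make the notion of extensional validity concrete, the following Python sketch enumerates the compound terms that apply to at least one indexed object. It is a simplified illustration, not the CTCA operations or the mining algorithms mentioned above: it ignores the subsumption relation within each facet, and the function name `extensionally_valid_compound_terms` and the example objects are hypothetical.

```python
from itertools import combinations


def extensionally_valid_compound_terms(index):
    """Return every non-empty set of terms that co-occurs on at least one object.

    index: dict mapping an object identifier to the set of terms (drawn from
           the different facets) under which the object is indexed.
    """
    valid = set()
    for terms in index.values():
        ordered = sorted(terms)
        for size in range(1, len(ordered) + 1):
            for combo in combinations(ordered, size):
                valid.add(frozenset(combo))
    return valid


# Hypothetical corpus with two facets: location and sports.
index = {
    "hotel1": {"Greece", "SeaSports"},
    "hotel2": {"Austria", "WinterSports"},
}
valid_terms = extensionally_valid_compound_terms(index)
```

In this example the compound term {Greece, SeaSports} is extensionally valid, while {Greece, WinterSports} is not, because no indexed object carries both terms. Enumerating subsets grows exponentially with the number of terms per object, which is one reason a compact algebraic expression is attractive.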
Contexts in Information Bases |
Our research in this area concerns the development of a general theory
on Contexts in Information Bases, and includes (i) the introduction of
a novel structure for the representation of contexts,
(ii) the development of a naming scheme based on contexts,
(iii) the definition of operations between contexts, such as union,
intersection, and difference operation, and
(iv) the definition of a high-level query and update language for contexts.
The notion of context appears in several disciplines, including computer
science, under various forms.
However, all these forms are very diverse and serve different purposes.
We present a general framework for representing the notion of
context in information modeling.
First, we define a context as
a set of objects, within which each object has a set of names and
possibly a reference: the reference of the object is another
context which "hides" detailed information about the object.
Then, we introduce the possibility of structuring the contents of
a context through the traditional abstraction mechanisms, i.e.
classification, generalization, and attribution. We show that,
depending on the application, our notion of context can be used as
an independent abstraction mechanism, either in an alternative or
a complementary capacity with respect to the traditional
abstraction mechanisms. We also
study the interactions between contextualization and the
traditional abstraction mechanisms, as well as the constraints
that govern such interactions. Finally, we present a theory for
contextualized information bases. The theory includes a set of
validity constraints, a model theory, as well as a set of sound
and complete inference rules. We show that our core theory can be
easily extended to support embedding of particular information
models in our contextualization framework.
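The definition of a context given above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions: the class names `Context` and `ContextObject`, the use of string object identifiers, and the path-based `resolve` helper are hypothetical and do not reproduce the naming scheme, operations, or query language of the theory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ContextObject:
    names: Set[str]                         # the names of the object within its context
    reference: Optional["Context"] = None   # context holding the detailed information


@dataclass
class Context:
    # Objects keyed by a hypothetical object identifier.
    objects: Dict[str, ContextObject] = field(default_factory=dict)

    def lookup(self, name: str) -> List[str]:
        """Return the identifiers of all objects carrying the given name here."""
        return sorted(oid for oid, obj in self.objects.items() if name in obj.names)


def resolve(context: Context, path: List[str]) -> List[str]:
    """Follow a path of names through references; return the final matches."""
    current = context
    for position, name in enumerate(path):
        matches = current.lookup(name)
        if position == len(path) - 1:
            return matches
        # To descend, the name must be unambiguous and the object must have a reference.
        if len(matches) != 1 or current.objects[matches[0]].reference is None:
            return []
        current = current.objects[matches[0]].reference
    return []


# Hypothetical example: the object named "Crete" refers to a more detailed context.
crete = Context({"o1": ContextObject({"Heraklion"})})
greece = Context({"o2": ContextObject({"Crete", "Kriti"}, reference=crete)})
```

Here `resolve(greece, ["Kriti", "Heraklion"])` reaches the object `o1` through the reference of `o2`, showing how the same object can be addressed by different names relative to a context.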
Knowledge Representation and Reasoning |
Our research in this area focuses on (i) the development of knowledge representation models
that support the representation of complex, evolving, heterogeneous, and abstract concepts
and processes, (ii) methodologies for conceptual model design, (iii) the formalization of
knowledge representation models, and (iv) the development and formalization of semantic
structures that support reasoning not only at the instance, but also at the schema level.
The goal of semantic data modelling is to enable the database designer to naturally and
directly incorporate as much as possible of the meaning of an application environment into its
data model. However, a semantic data model should not only be characterized in terms of its
representational adequacy but also in terms of the inferences it supports. Yet, little effort has
been devoted to mechanisms for schema derivations and schema verification. To satisfy this
need, we propose (i) structures that carry expressive and useful information on the database
schema, (ii) a set of inference rules for schema derivations, and (iii) a mechanism for
discovering contradictory schema declarations.
Our research is focused on the development and strong formalization of a knowledge
representation model that supports the description of large evolving varieties of highly
interrelated data, concepts and complex relationships. In addition, we are concerned with the
development of a methodology for constructing semantic data models.
Though specialization and inheritance are well-known concepts, certain aspects of these
concepts lack formal foundations. In particular, when properties of different classes are
semantically related, several different semantics are possible for the inherited properties, and
a choice is necessary. Conventional systems impose an a priori solution that supports only
one of the possible semantics of inheritance. We introduced Restriction Isa (RISA), a form of
specialization that represents property value refinement. We demonstrated that RISA makes it possible
to differentiate between the possible semantics of inheritance, in a formal and sound way.
In addition, the RISA relation makes it possible to express participation constraints on properties.
Specifically, properties of a class are characterized as necessary, possible, or inapplicable on
a given subclass. Whether explicitly declared or derived, this information is useful for several
reasons:
(i) it helps the user to better understand the semantics of the subclass,
(ii) it expresses a particular form of negative information,
(iii) it uncovers contradictory declarations or design errors, and
(iv) it characterizes property values that are missing from the database.
We introduced a new relationship among properties, called property covering. Property
covering holds when a property restricted to a given class is the union of a collection of sub-
properties. In fact, property covering is a generalization of the RISA relation, mentioned
earlier. We demonstrated that property covering, together with inheritance and disjointness,
constitutes a powerful conceptual modelling mechanism.
Logic Programming and Deductive Databases |
A deductive database consists of two parts: a set of known facts, and a set of rules from which
new facts can be derived. The goal of this research is to derive useful information from a set
of contradictory rules. Consistency of derived facts is not a realistic assumption in many applications.
In the presence of contradiction, classical logic fails to give any semantics to the deductive database.
Thus, even a single erroneous datum could destroy all meaningful information.
In the investigated framework, rules are equipped with a partial order expressing their relative
reliability in case of conflict. This reliability order is used to choose between conflicting rules.
When no choice is possible, the conflicting rules are considered unreliable and their conclusions are blocked.
Conclusions from rules unrelated to the contradiction are considered reliable and they are used
for the derivation of new information.
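The conflict-resolution idea described above can be sketched in Python for a single derivation step. This is a deliberately simplified illustration, not the semantics of the investigated framework: rule bodies are assumed to be already satisfied, chaining of rules is ignored, and the function `reliable_conclusions`, the rule names, and the representation of the reliability order as a set of pairs are hypothetical.

```python
def negate(literal):
    """Return the complementary literal; '-' marks negation in this sketch."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def reliable_conclusions(rules, more_reliable):
    """Select the conclusions that survive conflicts between rules.

    rules:         dict mapping a rule name to the literal it concludes
    more_reliable: set of pairs (a, b) meaning rule a is strictly more
                   reliable than rule b (assumed to be transitively closed)
    """
    accepted = set()
    for name, literal in rules.items():
        rivals = [other for other, lit in rules.items() if lit == negate(literal)]
        # A conclusion is kept only if its rule beats every conflicting rule.
        # Rules with no rivals are unrelated to any contradiction and pass.
        if all((name, rival) in more_reliable for rival in rivals):
            accepted.add(literal)
    return accepted


# Hypothetical example data.
rules = {
    "r1": "fly",
    "r2": "-fly",
    "r3": "has_wings",
    "r4": "swims",
    "r5": "-swims",
}
more_reliable = {("r2", "r1")}
conclusions = reliable_conclusions(rules, more_reliable)
```

In this example `r2` overrides `r1`, so `-fly` is derived; `r4` and `r5` are incomparable, so both of their conclusions are blocked; and `has_wings` is derived because `r3` is unrelated to any contradiction.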
Multimedia Database Systems |
Multimedia database systems deal with the storage, manipulation,
and retrieval of multiple media types (pictures, voice, video, graphics, text).
My interests in this area include:
(i) the development of multimedia interpretation models describing the content of multimedia data for
content-based retrieval, (ii) the development of multimedia description models
supporting multimedia data presentation and synchronization, and
(iii) query languages and navigation methods for multimedia data.
Data Structures and Files, Main-Memory Databases |
We have proposed and analyzed the performance of
multi-directory hashing techniques for fast search in main
memory databases.
Additionally, we have proposed and analyzed the performance of a multi-directory hashing technique
for disk-based databases. The latter technique achieves improved bucket utilization and is suitable for parallel search.