Position paper: Developing Hypermedia Over an Information Repository

Position paper for the 2nd Workshop on Open Hypermedia Systems at Hypertext '96,
Washington, DC, USA (16-20 March 1996)

Developing Hypermedia Over an Information Repository

Panos Constantopoulos, Manos Theodorakis and Yannis Tzitzikas

Department of Computer Science,
University of Crete
and
Institute of Computer Science (ICS),
Foundation for Research and Technology - Hellas (FORTH)
email: {panos|etheodor|tzitzik}@csi.forth.gr

Abstract

We propose developing hypermedia applications over an information repository. The main benefits of this approach include development efficiency, product quality, ease of tailoring and extensibility. In particular, we focus on design and implementation issues related to presenting, navigating and retrieving large amounts of highly interlinked, complex data. DOMENICUS, a prototype customizable hypermedia engine built on top of a repository system, the Semantic Index System, is briefly reviewed.

1 Introduction

Improving the quality of hypermedia design while reducing development cost is an important challenge for the information industry.

In addition, the integration and collaboration of existing and future tools within a general hypermedia environment would contribute to the efficient use of the human knowledge stored in computer systems, and would enhance productivity, collaboration, cognition and learning.

In this paper we focus on design and implementation issues for hypermedia systems that present large amounts of highly interrelated, structured, heterogeneous data, which require complex query and retrieval methods and are subject to updates (in contrast with static data). Such applications are very laborious to build, and this holds for every step of their design process and life-cycle: domain analysis, navigational design, interface design, implementation, data entry, testing, adaptation/tailoring, data evolution and system upgrade. Example applications include encyclopedias, scientific catalogs, on-line museum systems and geographical/political atlases.

We use an information repository to represent the logical structure of the data, as well as information concerning their usage and management (display, retrieval). A repository is defined [1] as a shared database of information about engineered artifacts. We use a repository manager, the Semantic Index System (SIS), which supports rich structuring mechanisms (classification, generalization and attribution) that are indispensable for dealing with the complexity of the data and of their usage. For unstructured data (images, video, audio, plain text), only their semantic description (if any) and pointers to their storage locations are stored in the repository.

On top of SIS we have developed a customizable hypermedia engine, DOMENICUS, which allows presentation, navigation and retrieval of data (including multimedia). It supports a set of core functionalities, needed in most application domains, which are customized according to a Presentation Model (PM) stored in the repository itself. Its cards have contents that are determined at run-time (through limited natural language generation), are easily customized through the PM, and offer flexible and consistent hyperlinking. As a result, the development cost (hyperlinking and data entry effort) is minimized, and we obtain fully connected and consistent presentations that are flexible and extensible, and therefore suitable for on-line applications. DOMENICUS also offers complex retrieval methods and incremental query formulation. Finally, representing the data in a repository permits efficient exploitation and reuse of the data.

In section 2 we discuss some crucial issues in hypermedia development. In section 3 we describe the Semantic Index System (SIS), our repository manager, and in section 4 we present DOMENICUS. In section 5 we outline future directions, and in section 6 we draw some conclusions.

2 Crucial issues

Crucial issues concerning hypermedia development are the following:

Information Representation
Information representation determines the degree to which data can be exploited. Hypermedia applications need rich structuring mechanisms to handle data complexity (e.g., scientific knowledge and engineering designs or constructs) and usage complexity (complex retrieval methods, use of the same data by different users for different purposes). Multi-faceted classification [8] must be supported.
Information Management
Advanced information management capabilities are needed to ensure presentation quality and effectiveness. Data integrity, consistency and privacy should be addressed. Further issues concern collaboration, distribution and interoperability.
Extensibility
It should be easy to evolve the domain data. In addition, data reusability, tool integration and tool collaboration are indispensable. We agree with [5] that a complete hypermedia environment is needed at the operating system level.
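Multi-faceted classification can be illustrated with a small Python sketch. The `FacetedIndex` class, its facet names and the sample items are hypothetical illustrations, not part of SIS: each item is classified independently under several facets, and a query returns the items that match every requested facet term.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class FacetedIndex:
    """Hypothetical multi-faceted classification: each item is classified
    independently along several facets; retrieval intersects facet terms."""

    def __init__(self) -> None:
        # (facet, term) -> items classified under that term
        self.index: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def classify(self, item: str, **facets: str) -> None:
        for facet, term in facets.items():
            self.index[(facet, term)].add(item)

    def find(self, **facets: str) -> Set[str]:
        # .get avoids creating empty entries for unknown terms
        matches = [self.index.get((facet, term), set()) for facet, term in facets.items()]
        return set.intersection(*matches) if matches else set()


# Made-up sample items, classified along three facets.
idx = FacetedIndex()
idx.classify("item-1", period="16th c.", technique="oil", subject="portrait")
idx.classify("item-2", period="16th c.", technique="tempera", subject="portrait")
idx.classify("item-3", period="19th c.", technique="oil", subject="landscape")
portraits = idx.find(subject="portrait")
oil_portraits = idx.find(technique="oil", subject="portrait")
```

Because facets are independent, the same item is reachable from any combination of facet terms, which is what lets different users approach the same data for different purposes.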

We therefore believe that it would be advantageous for hypermedia development to take place over information repositories integrated with other tools. The centerpiece of a repository is the repository manager. Bernstein [1] defines the repository manager as a database application that supports checkout/checkin (private workspaces), version and configuration management, notification, context management, workflow control, and rich structuring mechanisms for data sharing and interchange.

Moreover, integrating the repository with tools will free tool developers from implementing tool-specific databases and will enable tools to exploit the facilities of a repository manager.
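The checkout/checkin and versioning services listed above can be sketched, under simplifying assumptions, as a toy in-memory repository manager. Pessimistic locking, linear version histories and the names `Repository`, `checkout` and `checkin` are assumptions made for illustration; they do not describe the SIS interface.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Repository:
    """Hypothetical toy repository manager: checkout copies an object into a
    private workspace and locks it; checkin stores the edit as a new version."""
    versions: Dict[str, List[Any]] = field(default_factory=dict)
    locks: Dict[str, str] = field(default_factory=dict)  # object id -> user

    def create(self, obj_id: str, value: Any) -> None:
        self.versions[obj_id] = [value]

    def checkout(self, obj_id: str, user: str) -> Any:
        holder = self.locks.get(obj_id)
        if holder is not None and holder != user:
            raise PermissionError(f"{obj_id} is checked out by {holder}")
        self.locks[obj_id] = user
        # Deep copy: edits in the private workspace do not touch stored versions.
        return copy.deepcopy(self.versions[obj_id][-1])

    def checkin(self, obj_id: str, user: str, value: Any) -> int:
        if self.locks.get(obj_id) != user:
            raise PermissionError(f"{user} has not checked out {obj_id}")
        self.versions[obj_id].append(value)
        del self.locks[obj_id]
        return len(self.versions[obj_id]) - 1  # number of the new version


repo = Repository()
repo.create("card-7", {"title": "Untitled"})
draft = repo.checkout("card-7", "curator")
draft["title"] = "View of Toledo"
new_version = repo.checkin("card-7", "curator", draft)
```

While one user holds an object, a second user's checkout raises `PermissionError`; the earlier version remains available in the history.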

3 The Semantic Index System

The Semantic Index System (hereafter SIS) [4,3] is a system for the management of very large collections of highly interrelated information objects with evolving structures. This system is especially well suited for use as a repository system, providing metadata management and the kernel of an integrated environment of a dynamic collection of tools [1].

SIS uses Telos as the information representation framework. Telos [6] is an object-oriented knowledge representation language that supports a number of structuring mechanisms as well as an assertional and temporal reasoning sublanguage. In SIS we confine ourselves to a version of the structural part of the Telos language.

Objects in Telos are named and organized along three dimensions: attribution, classification and generalization.

A distinctive feature of Telos, and consequently of the SIS data model, is the uniform treatment of individuals and attributes. This allows attributes to be organized in classification and generalization hierarchies and to have attributes of their own, which provides great expressive power and flexibility.
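To illustrate the uniform treatment of individuals and attributes, the following Python sketch uses a single hypothetical type, `TObject`, for both. Because an attribute link is itself an object, it can be classified (here under the attribute class `paintedBy`) and can carry attributes of its own. The names and sample objects are made up; this is not the SIS storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TObject:
    """Hypothetical Telos-style object. Individuals and attributes share this
    one type, so an attribute can be classified and have attributes itself."""
    name: str
    classes: List["TObject"] = field(default_factory=list)
    attributes: List["TObject"] = field(default_factory=list)
    source: Optional["TObject"] = field(default=None, repr=False)  # attributes only
    target: Optional["TObject"] = field(default=None, repr=False)  # attributes only

    def is_attribute(self) -> bool:
        return self.source is not None

    def add_attribute(self, label: str, target: "TObject", classes=()) -> "TObject":
        attr = TObject(label, list(classes), source=self, target=target)
        self.attributes.append(attr)
        return attr


painter = TObject("Painter")
painting = TObject("Painting")
painted_by = painting.add_attribute("paintedBy", painter)      # an attribute class
artist = TObject("artist-1", [painter])
work = TObject("work-1", [painting])
link = work.add_attribute("by", artist, classes=[painted_by])  # a classified attribute
evidence = link.add_attribute("attributedIn", TObject("catalogue-1"))  # attribute of an attribute
```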

Multiple classification is allowed, supporting the separate representation of multiple modeling aspects. Classification is open-ended: individuals are instances of classes, classes are instances of metaclasses, and so on. Classes within a given instantiation level are also organized in terms of generalization (isA) relationships. These can be multiple and give rise to hierarchies that are directed acyclic graphs. They induce strict inheritance of attributes, in the sense that inherited attributes cannot be overridden but only restricted in the definition of the subclass.
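The generalization rules can be made concrete with a minimal sketch, assuming classes are identified by name and every attribute has a single target class. The hypothetical `Schema` class keeps isA links acyclic by requiring superclasses to exist before their subclasses, and enforces strict inheritance: a subclass may redefine an inherited attribute only with a more specific target class.

```python
from typing import Dict, Iterable, Optional, Set


class Schema:
    """Hypothetical isA hierarchy (a DAG) with strict inheritance: a subclass
    may only restrict an inherited attribute, never override it."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = {}
        self.own_attrs: Dict[str, Dict[str, str]] = {}  # class -> {label: target class}

    def add_class(self, name: str, isa: Iterable[str] = (),
                  attrs: Optional[Dict[str, str]] = None) -> None:
        supers = set(isa)
        # Superclasses must already exist, so no isA cycle can be created.
        if name in self.parents or not supers.issubset(self.parents):
            raise ValueError(f"bad class definition: {name}")
        self.parents[name] = supers
        self.own_attrs[name] = {}
        try:
            for label, target in (attrs or {}).items():
                self.define_attr(name, label, target)
        except ValueError:
            del self.parents[name], self.own_attrs[name]  # roll back the class
            raise

    def ancestors(self, name: str) -> Set[str]:
        seen: Set[str] = set()
        stack = list(self.parents[name])
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.parents[cls])
        return seen

    def is_subclass(self, sub: str, sup: str) -> bool:
        return sub == sup or sup in self.ancestors(sub)

    def define_attr(self, cls: str, label: str, target: str) -> None:
        if target not in self.parents:
            raise ValueError(f"unknown class {target}")
        for anc in self.ancestors(cls):
            inherited = self.own_attrs[anc].get(label)
            if inherited is not None and not self.is_subclass(target, inherited):
                raise ValueError(f"{cls}.{label}: {target} does not restrict {inherited}")
        self.own_attrs[cls][label] = target

    def all_attrs(self, name: str) -> Dict[str, str]:
        # Own plus inherited attributes; per label keep the most specific target.
        result: Dict[str, str] = {}
        for cls in [name, *self.ancestors(name)]:
            for label, target in self.own_attrs[cls].items():
                if label not in result or self.is_subclass(target, result[label]):
                    result[label] = target
        return result


s = Schema()
s.add_class("Person")
s.add_class("Painter", isa=["Person"])
s.add_class("Artwork", attrs={"creator": "Person"})
s.add_class("Portrait", isa=["Artwork"])
s.add_class("Painting", isa=["Artwork"], attrs={"creator": "Painter"})  # a restriction
s.add_class("PaintedPortrait", isa=["Painting", "Portrait"])            # multiple isA
```

Defining a subclass of `Painting` whose `creator` is an arbitrary `Person` is rejected, since that would widen rather than restrict the inherited attribute.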

SIS outperforms all relational systems on the market in lookup and traversal access times by a factor of about 25. It is therefore currently a uniquely pragmatic solution for efficiently handling very large sets of highly structured data.

The SIS user interface supports menu-guided and forms-based query formulation with graphical and textual presentation of the answer sets. It also supports graphical browsing and navigation in a hypertext-like manner. A hypertext annotation mechanism is also provided. Menu titles, menu layout and domain-specific queries are user-configurable. Thus the user interface can be customized to the application without changing the executable code.

A forms-based interactive data entry facility is provided. It allows for entering data and schema information in a uniform manner. By employing the schema information, it automatically adapts itself to the structure of the various classes and subclasses. Furthermore, it is customizable to application-specific tasks, such as classification of items, addition of descriptive elements, etc.
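How a schema-driven entry form adapts to classes and subclasses might look roughly like the following sketch. The dictionary schema format and the functions `form_fields` and `render_form` are hypothetical; the point is only that the form's fields are computed from the (inherited) attribute definitions rather than hard-coded.

```python
from typing import Dict

# Hypothetical schema: class -> superclasses and own attributes (label -> value class)
SCHEMA: Dict[str, dict] = {
    "Artwork": {"isa": [], "attrs": {"title": "Text", "creator": "Person"}},
    "Painting": {"isa": ["Artwork"], "attrs": {"technique": "Technique", "creator": "Painter"}},
}


def form_fields(cls: str, schema: Dict[str, dict] = SCHEMA) -> Dict[str, str]:
    """Fields of the entry form for `cls`: inherited attributes first, then the
    class's own; a redefinition in the subclass replaces the inherited value class."""
    fields: Dict[str, str] = {}
    for parent in schema[cls]["isa"]:
        fields.update(form_fields(parent, schema))
    fields.update(schema[cls]["attrs"])
    return fields


def render_form(cls: str, schema: Dict[str, dict] = SCHEMA) -> str:
    lines = [f"New {cls}"]
    lines += [f"  {label} ({value_class}): ____"
              for label, value_class in form_fields(cls, schema).items()]
    return "\n".join(lines)


print(render_form("Painting"))
```

Adding a new subclass to the schema changes the generated form without any change to the form code.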

An Application Programming Interface (API) for communication with other tools is provided.

So far, SIS has been used as the kernel for various applications, such as a Software Static Analysis and Class Management System [3], the CLIO Cultural Documentation System [2], and prototype systems for thesaurus management and for mechanical fault documentation and diagnosis.

In the framework of the AQUARELLE project (a project of the EC Telematics Programme aimed at the sharing of cultural heritage), SIS (together with CLIO) will be used for the storage and management of multimedia folders and will be integrated with SGML editors in order to create, store and make accessible documents (folders) with referential integrity to formal knowledge entities and other document parts.

4 DOMENICUS: A repository-based hypermedia engine

DOMENICUS is a hypermedia engine developed over SIS which supports a set of core functionalities appearing in most application domains: alphabetic lists, subject catalogs, guided tours, query cards, hyperlinks, image annotations, bookmarks and history. Since hypermedia applications are addressed to different kinds of users, it offers simple, friendly and uniform interaction.

The repository contains two types of data: (a) domain data and (b) usage data, which determine how the domain data are presented and managed. To represent the domain data, an appropriate semantic network is constructed in the Telos language. For multimedia data, only logical pointers to them are stored in the knowledge base. The granularity of structuring should be determined by (a) the queries that must be answerable and (b) the presentation requirements.

The usage data are used to customize the DOMENICUS functionalities, that is, the main data categories (subject catalog), the available guided tours, the contents and the interpretation of the query cards, the presentation card types and their contents (including their hyperlinking).
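As an illustration of query cards whose interpretation is given by usage data, and of incremental query formulation, here is a minimal Python sketch. The `QueryCard` class, the field-to-attribute mapping and the sample objects are assumptions; in DOMENICUS, query cards would be evaluated by the repository's query language rather than by Python predicates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Obj = Dict[str, object]


@dataclass
class QueryCard:
    """Hypothetical query card whose fields are mapped to attributes by usage
    data. Each filled-in field adds one constraint, so the query grows step by step."""
    field_map: Dict[str, str]  # card field label -> attribute name
    constraints: List[Callable[[Obj], bool]] = field(default_factory=list)

    def fill(self, card_field: str, value: object) -> "QueryCard":
        attr = self.field_map[card_field]
        # Default arguments bind the current attr/value into the predicate.
        self.constraints.append(lambda obj, a=attr, v=value: obj.get(a) == v)
        return self

    def undo(self) -> None:
        self.constraints.pop()

    def results(self, objects: List[Obj]) -> List[Obj]:
        return [o for o in objects if all(c(o) for c in self.constraints)]


objects = [
    {"id": "work-1", "painter": "painter-A", "century": 16},
    {"id": "work-2", "painter": "painter-A", "century": 17},
    {"id": "work-3", "painter": "painter-B", "century": 19},
]
card = QueryCard({"Artist": "painter", "Century": "century"})
step1 = [o["id"] for o in card.fill("Artist", "painter-A").results(objects)]
step2 = [o["id"] for o in card.fill("Century", 16).results(objects)]
```

The user can inspect the answer set after each step and retract the last constraint with `undo`.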

A Presentation Card Specification (PCS) defines a mapping from the knowledge concerning one object of the repository to a set of multimedia information: images, video, audio, and text in a form close to natural language (enriched with hyperlinks), which makes the card contents easy to read. Card specifications can contain static and dynamic elements (the latter expressed in the supported query language), and are used at run-time to produce the card contents. A PCS can be assigned to an object or to a class of objects. Exceptions to the presentation declared for a class are possible, permitting artistic interventions in the presentation of individual members of the class.
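A PCS with static and dynamic elements, class-level assignment and per-object exceptions can be sketched as follows. This is a hypothetical model, not the DOMENICUS specification language: static elements are strings, dynamic elements are Python functions standing in for queries, and `[[...]]` is an invented hyperlink markup.

```python
from typing import Callable, Dict, List, Union

KB = Dict[str, Dict[str, object]]  # object name -> attributes, incl. its "class"
# A card spec element is static text or a function evaluated at run time.
Element = Union[str, Callable[[str, KB], str]]
Spec = List[Element]


def link(name: object) -> str:
    return f"[[{name}]]"  # hypothetical hyperlink markup


def pick_spec(obj: str, kb: KB, class_specs: Dict[str, Spec],
              object_specs: Dict[str, Spec]) -> Spec:
    # An object-level spec is an exception to the spec declared for its class.
    if obj in object_specs:
        return object_specs[obj]
    return class_specs[str(kb[obj]["class"])]


def render_card(obj: str, kb: KB, class_specs: Dict[str, Spec],
                object_specs: Dict[str, Spec]) -> str:
    spec = pick_spec(obj, kb, class_specs, object_specs)
    return "".join(el if isinstance(el, str) else el(obj, kb) for el in spec)


kb: KB = {
    "work-1": {"class": "Painting", "painter": "painter-A", "year": 1600},
    "work-2": {"class": "Painting", "painter": "painter-B", "year": 1890},
}
class_specs: Dict[str, Spec] = {"Painting": [
    lambda o, kb: o, " was painted by ",
    lambda o, kb: link(kb[o]["painter"]),
    lambda o, kb: f" in {kb[o]['year']}.",
]}
object_specs: Dict[str, Spec] = {
    "work-2": ["A late work by ", lambda o, kb: link(kb[o]["painter"]), "."],
}
card1 = render_card("work-1", kb, class_specs, object_specs)
card2 = render_card("work-2", kb, class_specs, object_specs)
```

Because the text is produced at run time from the knowledge base, editing `year` or `painter` updates every card that mentions it.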

Hyperlinking is one of the distinctive features of our approach, since our presentation model permits the declaration of hyperlink classes (selected link classes of the knowledge base) and of hyperlink connections determined by queries. Hyperlinks are therefore produced dynamically at run-time, provided the information is appropriately structured in the knowledge base. This minimizes the cost of hyperlinking; in effect, the cost is shifted to structuring the information, which is done once with the aid of the friendly data entry tools offered by the repository manager. The approach also yields fully connected and consistent presentations and ease of tailoring, permits data/schema update and evolution (needed for on-line applications), and reduces the data entry/modification effort. This method of hyperlink construction prevents dangling and erroneous links and makes the presentation stable, since the hyperlinks, being links of the knowledge base, are subject to the integrity control of the repository. Moreover, hyperlinks are typed (they belong to classes, metaclasses, ...), and this can be exploited to adapt or filter the card contents easily, even at run-time.
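The following sketch shows, under simplifying assumptions, why hyperlinks derived from knowledge-base links cannot dangle and how typed links can be filtered per presentation. The `KnowledgeBase` class, the link classes `paintedBy` and `heldIn` and the sample objects are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str, str]  # (source, link class, target)


class KnowledgeBase:
    """Hypothetical stand-in for the repository: links are stored facts under
    integrity control, and card hyperlinks are derived from them at run time."""

    def __init__(self) -> None:
        self.objects: Set[str] = set()
        self.links: List[Link] = []

    def add_object(self, name: str) -> None:
        self.objects.add(name)

    def add_link(self, source: str, link_class: str, target: str) -> None:
        # Referential integrity: both ends must already exist.
        if source not in self.objects or target not in self.objects:
            raise ValueError(f"dangling link {source} -{link_class}-> {target}")
        self.links.append((source, link_class, target))

    def delete_object(self, name: str) -> None:
        # Deleting an object removes every link that touches it.
        self.objects.discard(name)
        self.links = [lk for lk in self.links if name not in (lk[0], lk[2])]

    def card_links(self, obj: str, shown_classes: Set[str]) -> List[Tuple[str, str]]:
        """Typed hyperlinks of obj's card, filtered by the link classes that a
        presentation model declares as hyperlink classes."""
        return [(c, t) for s, c, t in self.links if s == obj and c in shown_classes]


kb = KnowledgeBase()
for name in ("work-1", "painter-A", "museum-M"):
    kb.add_object(name)
kb.add_link("work-1", "paintedBy", "painter-A")
kb.add_link("work-1", "heldIn", "museum-M")
curator_view = kb.card_links("work-1", {"paintedBy", "heldIn"})
web_view = kb.card_links("work-1", {"paintedBy"})
```

Deleting `museum-M` also removes its `heldIn` link, so no card can show a link to it afterwards; the two `card_links` calls stand for two presentation descriptions over the same domain data.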

The relationship between domain and usage data is represented in the knowledge base itself in a clear, flexible and consistent manner that can be used efficiently. The same domain data can be related to more than one set of usage data (that is, presentation descriptions). For example, from a single cultural knowledge base we can provide on-line presentations dedicated to museum curators or to WWW users, or we can produce a stable presentation distributed on CD-ROM.

DOMENICUS can be used to develop hypermedia applications whose knowledge evolves over time, offering development efficiency, extensibility and easy customization. DOMENICUS has been used for a prototype electronic presentation of the painting exhibition "From El Greco to Cézanne", held at the National Gallery of Athens in 1992.

5 Future directions: Towards an advanced information repository

We continue to work on our repository manager (SIS) and its use for hypermedia development. Regarding SIS, we focus on context management [7] and tool integration. In particular, we are working on context-based object naming [9], which, among other things, improves the quality of the limited natural language generation used in DOMENICUS cards and facilitates data entry. We are also studying view/context updates [10], in order to address filtering, authority and collaboration issues.
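Context-based naming can be illustrated with a small sketch in which a name must be unique only within its context, and lookups fall back to enclosing contexts. The `Context` class, the `/` separator for qualified names and the object identifiers are hypothetical; they are meant only to show why short, context-dependent names can serve both card text generation and data entry, with a qualified name available when the short one is ambiguous.

```python
from typing import Dict, Optional


class Context:
    """Hypothetical context-based naming: a name need only be unique inside
    its context, and lookups fall back to enclosing contexts."""

    def __init__(self, name: str, parent: Optional["Context"] = None) -> None:
        self.name, self.parent = name, parent
        self.bindings: Dict[str, str] = {}  # local name -> object id

    def bind(self, local_name: str, obj_id: str) -> None:
        if local_name in self.bindings:
            raise ValueError(f"{local_name!r} already defined in {self.name}")
        self.bindings[local_name] = obj_id

    def resolve(self, local_name: str) -> Optional[str]:
        ctx: Optional[Context] = self
        while ctx is not None:
            if local_name in ctx.bindings:
                return ctx.bindings[local_name]
            ctx = ctx.parent
        return None

    def qualified(self, local_name: str) -> str:
        # Full path of the name, usable where the short name would be ambiguous.
        path, ctx = [local_name], self
        while ctx is not None:
            path.append(ctx.name)
            ctx = ctx.parent
        return "/".join(reversed(path))


root = Context("kb")
painter_a = Context("painter-A", root)
painter_b = Context("painter-B", root)
root.bind("Painting", "obj:1")
painter_a.bind("Self-portrait", "obj:17")
painter_b.bind("Self-portrait", "obj:42")  # same short name, different context
```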

Tool integration is crucial, since it will permit tools to share and interchange data without the need for special protocols. This requires translators that map a tool's data format to a canonical format stored in the repository; translators alone, however, are not enough for tools that access data interactively during their execution. This problem can be solved either by modifying the tools to communicate with the repository (a long-term solution that requires accepted standards), or by implicit methods, such as implementing a virtual repository interface that traps all of a tool's data accesses and translates them into repository accesses.
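A virtual repository interface could be approximated in Python by giving an unmodified tool an object that behaves like the dictionary it expects, while every read and write is trapped and translated into a repository access. `Repository`, `RepositoryBackedStore`, the `object.attribute` key convention and `legacy_tool` are all hypothetical; the sketch relies only on the standard `collections.abc.MutableMapping` base class.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, List, Optional, Tuple


class Repository:
    """Hypothetical stand-in repository holding canonical (object, attribute) facts."""

    def __init__(self) -> None:
        self.facts: Dict[Tuple[str, str], str] = {}
        self.log: List[Tuple[str, str]] = []  # every trapped access, for illustration


class RepositoryBackedStore(MutableMapping):
    """Looks like a dict to the tool; each access is translated from the tool's
    'object.attribute' keys into canonical repository facts."""

    def __init__(self, repo: Repository) -> None:
        self.repo = repo

    @staticmethod
    def _translate(key: str) -> Tuple[str, str]:
        obj, _, attr = key.partition(".")
        return obj, attr

    def __getitem__(self, key: str) -> str:
        self.repo.log.append(("get", key))
        return self.repo.facts[self._translate(key)]

    def __setitem__(self, key: str, value: object) -> None:
        self.repo.log.append(("set", key))
        self.repo.facts[self._translate(key)] = str(value)

    def __delitem__(self, key: str) -> None:
        self.repo.log.append(("del", key))
        del self.repo.facts[self._translate(key)]

    def __iter__(self) -> Iterator[str]:
        return (f"{obj}.{attr}" for obj, attr in list(self.repo.facts))

    def __len__(self) -> int:
        return len(self.repo.facts)


def legacy_tool(store) -> Optional[str]:
    # Written against a plain dict; it knows nothing about the repository.
    store["fig1.caption"] = "Harbour view"
    return store.get("fig1.caption")


repo = Repository()
store = RepositoryBackedStore(repo)
result = legacy_tool(store)
```

The same `legacy_tool` also runs unchanged on an ordinary `dict`, which is the property the trapping approach relies on.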

Regarding SIS and hypermedia, we are currently working on the mapping between SGML and Telos.

6 Conclusion

We believe that the requirements of an advanced hypermedia system can be satisfied if it is developed over an advanced information repository. Our experience from using SIS and DOMENICUS indicates that the following benefits can be obtained from this approach:

Development efficiency: Customization of ready-made functionalities, with reusability of the presentation specification. Automatic production of the card contents. Minimization of the data entry effort; use of the same base for more than one presentation (data reuse).
Product quality: Information consistency, ability to express complex queries, fully connected presentations.
Ease of tailoring/adaptation: This is achieved through a simple but powerful presentation model.
Extensibility: Schema and data evolution is possible at run-time.

References

1: Philip A. Bernstein and Umeshwar Dayal. "An Overview of Repository Technology". In Proceedings of the 20th VLDB Conference, pages 705-713, Santiago, Chile, 1994.
2: Panos Constantopoulos. "Cultural Documentation: The CLIO System". Technical Report 115, Institute of Computer Science, Foundation for Research and Technology - Hellas, January 1994.
3: Panos Constantopoulos and Martin Doerr. "Component Classification in the Software Information Base". In O. Nierstrasz and D. Tsichritzis, eds., Object-Oriented Software Composition, Prentice-Hall, 1995.
4: Panos Constantopoulos and Martin Doerr. "The Semantic Index System: A brief presentation". Institute of Computer Science, Foundation for Research and Technology - Hellas, May 1994. (http://www.ics.forth.gr/proj/isst/Systems/sis/).
5: Hugh Davis, Wendy Hall, Ian Heath, and Gary Hill. "Towards An Integrated Information Environment With Open Hypermedia Systems". In Proceedings of the European Conference on Hypertext - ECHT '92, Milano, Italy, November 30, 1992.
6: John Mylopoulos, Alex Borgida, Matthias Jarke, and Manolis Koubarakis. "Telos: Representing Knowledge about Information Systems". ACM Transactions on Information Systems, 8(4), October 1990.
7: John Mylopoulos and Renate Motschnig-Pitrik. "Partitioning Information Bases with Contexts". In Proceedings of the Conference on Cooperative Information Systems, CoopIS-95, pages 44-54, Vienna, Austria, May 1995.
8: Ruben Prieto-Diaz. "Implementing Faceted Classification for Software Reuse". Communications of the ACM, 34(5), 1991.
9: "Name Scope in Semantic Data Models". Master's thesis, Department of Computer Science, University of Crete, September 1995. (in Greek).
10: "View Updates in Knowledge Bases". Master's thesis, Department of Computer Science, University of Crete, October 1995. (in Greek).


Last update 5/9/96
