Scalable & incremental type discovery

The rapid proliferation of the Semantic Web has led to the emergence of many weakly structured and incomplete data sources, in which typing information may be partially or completely missing. Yet type information is essential for tasks such as query answering, integration, summarization, and partitioning. Existing approaches to type discovery either completely ignore the type declarations available in the dataset (implicit type discovery approaches) or rely only on existing types in order to complement them (explicit type enrichment approaches). In this demonstration, we present HInT, the first incremental and hybrid type discovery system for RDF datasets. HInT identifies the patterns of the various instances, then indexes and groups them to identify the types. Besides discovering new types, HInT exploits type information, when available, to improve the quality of the discovered types: it guides the classification of each new instance into the correct group and refines the groups already built.
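As an illustration of the pattern-identification step, the sketch below groups instances by the set of properties they use. It is a minimal sketch and not HInT's actual code: it assumes a pattern is simply the property set of an instance, takes triples as plain Python tuples, and uses a hypothetical function name (extract_patterns) and made-up ex: identifiers.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def extract_patterns(triples):
    """Group instances by pattern (assumed here: the set of properties they use).

    Returns (patterns, declared):
      patterns: frozenset of properties -> set of instances sharing it
      declared: instance -> set of explicitly declared types, if any
    """
    props = defaultdict(set)      # instance -> properties it uses
    declared = defaultdict(set)   # instance -> declared rdf:type values
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            declared[subject].add(obj)   # kept aside, not part of the pattern
        else:
            props[subject].add(predicate)

    patterns = defaultdict(set)
    for instance, property_set in props.items():
        patterns[frozenset(property_set)].add(instance)
    return dict(patterns), dict(declared)

# Hypothetical toy data: two person-like instances and one organisation.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", "ex:label", "ACME"),
]
patterns, declared = extract_patterns(triples)
```

Here ex:alice and ex:bob share one pattern although only ex:bob carries a declared type, which is the kind of situation where a hybrid approach can use the available declaration.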

Advantages

  • The first hybrid approach

    Combines explicit schema enrichment with implicit schema discovery

  • Pattern Compression

    Reduces instance comparison to pattern comparison

  • Incremental

    Natively supports incremental processing by using Locality-Sensitive Hashing (LSH) for clustering
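To make the link between LSH and incremental processing concrete, the following sketch indexes patterns one at a time with MinHash signatures split into bands. It is an assumption-laden illustration, not HInT's implementation: the choice of MinHash, the class name IncrementalLSH, the parameters num_perm and bands, and the rule "join the lowest-numbered colliding group" are all hypothetical.

```python
import hashlib
from collections import defaultdict

def minhash(pattern, num_perm=16):
    """MinHash signature of a non-empty property set (one minimum per seed)."""
    return tuple(
        min(
            int.from_bytes(hashlib.sha1(f"{seed}:{p}".encode()).digest()[:8], "big")
            for p in pattern
        )
        for seed in range(num_perm)
    )

class IncrementalLSH:
    """Banded LSH index that assigns each newly arriving pattern to a group."""

    def __init__(self, num_perm=16, bands=8):
        if num_perm % bands != 0:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm = num_perm
        self.bands = bands
        self.rows = num_perm // bands
        self.buckets = defaultdict(set)  # (band index, band slice) -> group ids
        self.groups = []                 # group id -> patterns in that group

    def _band_keys(self, signature):
        return [
            (b, signature[b * self.rows:(b + 1) * self.rows])
            for b in range(self.bands)
        ]

    def add(self, pattern):
        """Insert one pattern without re-clustering; return its group id."""
        keys = self._band_keys(minhash(pattern, self.num_perm))
        candidates = set()
        for key in keys:
            candidates |= self.buckets[key]
        if candidates:
            group_id = min(candidates)   # simplistic tie-break (assumption)
        else:
            group_id = len(self.groups)  # no collision: open a new group
            self.groups.append([])
        self.groups[group_id].append(pattern)
        for key in keys:
            self.buckets[key].add(group_id)
        return group_id

index = IncrementalLSH()
g1 = index.add(frozenset({"ex:name", "ex:worksFor"}))
g2 = index.add(frozenset({"ex:name", "ex:worksFor"}))   # identical pattern
g3 = index.add(frozenset({"ex:label", "ex:locatedIn"})) # disjoint pattern
```

Each call to add touches only the buckets of the incoming pattern, so previously indexed patterns are never compared again. Identical patterns always share a group, disjoint ones fall into separate groups, and partially overlapping patterns collide with a probability that grows with their Jaccard similarity.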

Architecture

The high-level architecture of HInT consists of a graphical user interface (GUI) and a back-end subsystem. The GUI is implemented in HTML and CSS using the Vue.js framework, whereas the back-end is implemented in Python.

  • 2 Competitors Outperformed on implicit type discovery
  • 1 Competitor Outperformed on explicit type enrichment

Demonstration Scenario

  • Configuration. Five datasets are available: BNF, Conference, DBpedia, HistMunic, and LUBM.
  • Schema Discovery Ignoring Known Types. The system ignores typing information and performs pattern discovery, LSH indexing, and type assignment.
  • Schema Discovery Exploiting Typing Information. The system is configured to exploit the available typing information, improving the discovered types.