Towards Semantically Enriched Data Spaces

Motivation

Large corporations often struggle to provide an enterprise-wide overview of available data and data demands.
The main challenges stem from complex processes, which are intensified by the exponential growth of data. As a result, data is often split up and locked in separate data silos maintained by different departments.

Leveraging these data silos for analytical and data science tasks is a hard problem.
Studies show that data scientists spend up to 80% of their time on discovering, accessing and transforming data instead of analyzing it.
Current solutions are heavily engineering-based: they do not allow ad-hoc, queryable access to the data, and they do not scale beyond a certain number of data sources and models.

Approach

We are currently researching new systems to enable next-generation data management using the following approaches:

Linked data management using graph databases and ontologies that enable:

enrichment of decentralized data with metadata, semantic, and provenance information
efficient incremental management of this information as annotated labeled graphs
adoption of emerging metadata standards and ontologies
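To make "annotated labeled graphs" and incremental enrichment more concrete, here is a minimal Python sketch. It assumes a simple in-memory store instead of a real graph database, and the class names, the `derived_from` edge label, and the annotation keys are hypothetical illustrations, not the data model actually used in MIDAS. Nodes and edges carry labels, nodes carry key/value annotations (metadata, semantic tags), and provenance is recovered by walking `derived_from` edges.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A labeled node, e.g. a data set or a department (hypothetical model)."""
    node_id: str
    label: str
    annotations: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A directed, labeled relationship such as 'derived_from'."""
    source: str
    label: str
    target: str


class AnnotatedGraph:
    """Minimal in-memory annotated labeled graph (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: set[Edge] = set()

    def upsert_node(self, node_id: str, label: str, **annotations: Any) -> Node:
        # Incremental update: create the node if it is new, otherwise merge
        # the new annotations into the existing ones instead of replacing them.
        node = self.nodes.get(node_id)
        if node is None:
            node = Node(node_id, label)
            self.nodes[node_id] = node
        node.annotations.update(annotations)
        return node

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Only link nodes that are already registered in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.add(Edge(source, label, target))

    def lineage(self, node_id: str) -> list[str]:
        # Walk 'derived_from' edges upstream to collect provenance.
        result: list[str] = []
        frontier, seen = [node_id], {node_id}
        while frontier:
            current = frontier.pop()
            for edge in self.edges:
                if (edge.source == current and edge.label == "derived_from"
                        and edge.target not in seen):
                    seen.add(edge.target)
                    result.append(edge.target)
                    frontier.append(edge.target)
        return result


# Usage with hypothetical data sets owned by two departments.
g = AnnotatedGraph()
g.upsert_node("raw_invoices", "Dataset", owner="Accounting", system="ERP")
g.upsert_node("clean_invoices", "Dataset", owner="Data Science")
g.upsert_node("revenue_report", "Dataset")
g.add_edge("clean_invoices", "derived_from", "raw_invoices")
g.add_edge("revenue_report", "derived_from", "clean_invoices")
# A later enrichment step adds a semantic tag; earlier annotations survive.
g.upsert_node("raw_invoices", "Dataset", semantic_type="Invoice")
print(g.lineage("revenue_report"))  # ['clean_invoices', 'raw_invoices']
print(g.nodes["raw_invoices"].annotations)
```

Because enrichment only merges annotations into existing nodes, new metadata or semantic tags can be attached as they become available without re-importing the underlying data.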

Query federation using industry-standard technologies (such as Apache Drill and GraphQL) that enable:

virtual data sets
avoidance of data duplication
decoupling of source and target schema
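The three benefits of query federation can be illustrated with a small Python sketch. It does not use Apache Drill or GraphQL; `VirtualDataset`, `add_source`, `query`, and the departmental source functions and column names are hypothetical. A virtual data set is defined only by a target schema plus one field mapping per source, so rows are fetched and translated at query time and never copied into a central store.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]
Source = Callable[[], Iterable[Row]]


class VirtualDataset:
    """A target schema plus per-source field mappings (illustrative sketch).

    Rows are pulled from the sources only when the data set is queried,
    so no data is duplicated into a central store.
    """

    def __init__(self, target_fields: list[str]) -> None:
        self.target_fields = target_fields
        self._sources: list[tuple[Source, dict[str, str]]] = []

    def add_source(self, fetch: Source, mapping: dict[str, str]) -> None:
        # mapping: target field name -> field name in this source's own schema.
        missing = set(self.target_fields) - set(mapping)
        if missing:
            raise ValueError(f"mapping lacks target fields: {sorted(missing)}")
        self._sources.append((fetch, mapping))

    def query(self, where: Callable[[Row], bool] = lambda row: True) -> Iterator[Row]:
        # Translate each source row into the target schema at read time,
        # then apply the filter to the unified representation.
        for fetch, mapping in self._sources:
            for src_row in fetch():
                row = {t: src_row[s] for t, s in mapping.items() if t in self.target_fields}
                if where(row):
                    yield row


# Two hypothetical departmental sources with different schemas.
def crm_customers() -> list[Row]:
    return [{"cust_id": 1, "cust_name": "ACME", "country": "DE"}]


def erp_customers() -> list[Row]:
    return [{"KUNNR": 2, "NAME1": "Globex", "LAND1": "US"}]


customers = VirtualDataset(["id", "name", "country"])
customers.add_source(crm_customers, {"id": "cust_id", "name": "cust_name", "country": "country"})
customers.add_source(erp_customers, {"id": "KUNNR", "name": "NAME1", "country": "LAND1"})

print(list(customers.query(lambda r: r["country"] == "US")))
# [{'id': 2, 'name': 'Globex', 'country': 'US'}]
```

Consumers only see the target schema, so a source can change its column names by updating its mapping without affecting downstream queries. For brevity this sketch filters rows after fetching them; production federation engines typically push filters down to the sources instead.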

Publications

Holl, P., & Gossling, K. MIDAS: An interactive data catalog for data science teams. KDD '19 Project Showcase.


Chair of Software Engineering for Business Information Systems

Prof. Dr. Florian Matthes

Project Attributes

Acronym: MIDAS
Research Area: Enterprise Architecture Management
Contact: Patrick Holl
Partners: DATEV
Sponsors: Software Campus
Status: Completed