Towards Semantically Enriched Data Spaces
Motivation
Large corporations often struggle to maintain an enterprise-wide overview of available data and data demands.
The main challenges are complex processes, which are intensified by the exponential growth of data. As a result, data is often fragmented and locked in separate silos maintained by different departments.
Leveraging these data silos for analytical and data science tasks is a hard problem.
Studies show that data scientists spend up to 80% of their time on discovering, accessing, and transforming data instead of analyzing it.
Current solutions rely heavily on custom engineering; they do not support ad-hoc querying and do not scale beyond a certain number of data sources and models.
Approach
We are currently researching new systems to enable next-generation data management using the following approaches:
Linked data management using graph databases and ontologies that enable:
- enrichment of decentralized data with metadata, semantic information, and provenance information
- efficient incremental management of this information as annotated labeled graphs
- adoption of emerging metadata standards and ontologies
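To make the idea of annotated labeled graphs more concrete, the following is a minimal sketch in plain Python, not the system described here. The class `MetadataGraph`, the data set names, the edge labels, and the concept `schema:Order` are all hypothetical and chosen only for illustration; a real deployment would use a graph database and an established ontology.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataGraph:
    """Tiny in-memory labeled graph; a stand-in for a real graph database."""

    # node id -> {"label": ..., "props": {...}}
    nodes: dict = field(default_factory=dict)
    # set of (source id, edge label, target id) triples
    edges: set = field(default_factory=set)

    def upsert_node(self, node_id, label, **props):
        """Create the node if missing, otherwise merge the new annotations."""
        node = self.nodes.setdefault(node_id, {"label": label, "props": {}})
        node["props"].update(props)

    def add_edge(self, source, label, target):
        """Add a labeled, directed edge between two node ids."""
        self.edges.add((source, label, target))

    def neighbors(self, source, label):
        """Return the targets reachable from `source` via edges named `label`."""
        return sorted(t for s, l, t in self.edges if s == source and l == label)


graph = MetadataGraph()

# Hypothetical data sets living in two different departmental silos.
graph.upsert_node("sales.orders", "Dataset", owner="sales", format="postgres")
graph.upsert_node("crm.customers", "Dataset", owner="marketing", format="csv")

# Semantic annotation: link a data set to an (assumed) ontology concept.
graph.upsert_node("schema:Order", "Concept")
graph.add_edge("sales.orders", "describes", "schema:Order")

# Provenance annotation: record which source a data set was derived from.
graph.add_edge("sales.orders", "derivedFrom", "crm.customers")

# Incremental update: only the changed annotation is merged into the node.
graph.upsert_node("sales.orders", "Dataset", row_count=10_000)
```

In this sketch the data itself stays in the silos; only the descriptions of the data sets and the relations between them are kept in the graph, and each new annotation is merged without rebuilding existing nodes.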
Query federation using industry-standard technologies (such as Apache Drill and GraphQL) that enable:
- virtual data sets
- avoidance of data duplication
- decoupling of source and target schemas
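The three properties above can be illustrated with a small sketch in plain Python. It does not use Apache Drill or GraphQL; the class `VirtualDataset`, the two source functions, and all field names are hypothetical and serve only to show the principle of querying sources in place through a schema mapping.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def sales_source() -> Iterable[Row]:
    # Hypothetical silo 1, with its own column names.
    return [{"cust_id": 1, "amt": 120.0}, {"cust_id": 2, "amt": 80.0}]


def shop_source() -> Iterable[Row]:
    # Hypothetical silo 2, same kind of data under a different schema.
    return [{"customer": 3, "total_eur": 45.5}]


class VirtualDataset:
    """A view over several sources; rows are fetched at query time, not stored."""

    def __init__(self) -> None:
        self._sources: List[Tuple[Callable[[], Iterable[Row]], Dict[str, str]]] = []

    def add_source(self, fetch: Callable[[], Iterable[Row]], mapping: Dict[str, str]) -> None:
        """Register a source with a mapping from source field to target field."""
        self._sources.append((fetch, mapping))

    def query(self, predicate: Callable[[Row], bool] = lambda row: True) -> Iterator[Row]:
        """Yield rows in the target schema that satisfy `predicate`."""
        for fetch, mapping in self._sources:
            for row in fetch():
                # Translate the source schema into the target schema.
                target = {mapping[k]: v for k, v in row.items() if k in mapping}
                if predicate(target):
                    yield target


orders = VirtualDataset()
orders.add_source(sales_source, {"cust_id": "customer_id", "amt": "amount"})
orders.add_source(shop_source, {"customer": "customer_id", "total_eur": "amount"})

# The consumer only sees the target schema and never the source column names.
large_orders = list(orders.query(lambda r: r["amount"] > 50))
```

Because the mapping is attached to each source, a source can rename its columns without affecting consumers as long as the mapping is updated, and no rows are copied into a central store.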
Publications
Holl, P., & Gossling, K. Midas: An interactive data catalog for data science teams. KDD '19 Project Showcase.