Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually.
We therefore propose to process these queries in a top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose, thereby resolving the uncertainty of the data sources and methods used. We present algorithms based on a greedy and a genetic optimization approach that solve the problem of consistent, multi-solution set covering. We showcase these algorithms using our Dresden Web Table Corpus, which consists of 125M tables.
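
To make the idea of multi-solution set covering concrete, the following is a minimal sketch of a greedy variant. It is not the algorithm of this work: the function names, the `reuse_penalty` parameter, and the scoring rule are assumptions made for illustration, and the consistency criterion between sources is not modeled. Each data source is reduced to the set of queried entities it can supply a value for. A cover is built greedily, and sources used in earlier covers are discounted so that later covers tend to draw on different sources.

```python
from typing import Dict, List, Set


def greedy_cover(entities: Set[str],
                 sources: Dict[str, Set[str]],
                 penalty: Dict[str, float]) -> List[str]:
    """Greedily pick sources until no further entity can be covered."""
    uncovered = set(entities)
    chosen: List[str] = []
    while uncovered:
        candidates = [s for s in sources if s not in chosen]
        if not candidates:
            break
        # Score: number of newly covered entities, discounted by the
        # penalty a source accumulated in earlier solutions.
        best = max(
            candidates,
            key=lambda s: len(sources[s] & uncovered) / (1.0 + penalty.get(s, 0.0)),
        )
        if not sources[best] & uncovered:
            break  # the remaining entities are not covered by any source
        chosen.append(best)
        uncovered -= sources[best]
    return chosen


def top_k_covers(entities: Set[str],
                 sources: Dict[str, Set[str]],
                 k: int,
                 reuse_penalty: float = 2.0) -> List[List[str]]:
    """Return up to k distinct covers, penalizing reuse of sources."""
    penalty: Dict[str, float] = {}
    solutions: List[List[str]] = []
    for _ in range(k):
        cover = greedy_cover(entities, sources, penalty)
        if not cover or cover in solutions:
            break  # no new alternative found under this heuristic
        solutions.append(cover)
        for s in cover:
            penalty[s] = penalty.get(s, 0.0) + reuse_penalty
    return solutions
```

A greedy cover of this kind is not guaranteed to be minimal, and the reuse penalty is only one of several ways to encourage diverse alternatives. The sketch is meant to show the shape of the problem, that is, several alternative covers of the same entity set, rather than a tuned method.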
Maik Thiele