Improving Data Quality by Leveraging Statistical Relational Learning

Larysa Visengeriyeva, Alan Akbik, Manohar Kaul, Tilmann Rabl, Volker Markl
Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate, and missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational learning (SRL). We ...