Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple.
Download count: 0
- Partial requirement for: M.S., Arizona State University, 2013Note typethesis
- Includes bibliographical references (p. 32-33)Note typebibliography
- Field of study: Computer science