Description
Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of them address fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum-cost repair of tuples that violate static constraints such as conditional functional dependencies (CFDs), which have to be provided by domain experts or learned from a clean sample of the database. In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned directly from the noisy database. This avoids the need for a domain expert or master data. I also show how to use this model to efficiently perform consistent query answering over a dirty database, for cases where write permissions to the database are unavailable. A MapReduce architecture for performing this computation in a distributed manner is also presented. I evaluate these methods over both synthetic and real data.
Contributors: De, Sushovan (Author) / Kambhampati, Subbarao (Thesis advisor) / Chen, Yi (Committee member) / Candan, K. Selcuk (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
This creative project is the first draft of a database of financial records documenting Arizona law enforcement's use of the state asset forfeiture program in fiscal years 2011-2015. Asset forfeiture is a program under which law enforcement can seize property suspected of having been used in a crime and can then use the property, the cash, or the proceeds from the property's auction for its own purposes, which raises questions about conflicts of interest. The paper explains the methodology and goals of the database; the database itself draws on more than 11,000 pages of financial records and contains more than 70,300 cells.
Contributors: Mahoney, Emily Livingston (Author) / Doig, Steve (Thesis director) / Petchel, Jacqueline (Committee member) / Walter Cronkite School of Journalism and Mass Communication (Contributor) / School of Music (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05