Matching Items (2,111)
Description
Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. First, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it.
This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), where the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
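The contrast between the two reconstructions can be illustrated with a minimal Python sketch. It is not the BayesWipe or Mystiq implementation: the sample rows, probabilities, and the function names `deterministic`, `query_det`, and `query_pdb` are all hypothetical, chosen only to show how keeping every candidate in a tuple-disjoint representation can return answers that a "most likely candidate only" reconstruction misses.

```python
# Hypothetical cleaner output: for each dirty tuple id, a list of
# (clean candidate, probability) pairs. The values are invented for illustration.
candidates = {
    "t1": [({"make": "Honda", "model": "Civic"}, 0.5),
           ({"make": "Honda", "model": "Accord"}, 0.3),
           ({"make": "Hyundai", "model": "Accent"}, 0.2)],
    "t2": [({"make": "Toyota", "model": "Camry"}, 0.9),
           ({"make": "Toyota", "model": "Corolla"}, 0.1)],
}

def deterministic(cands):
    """Deterministic reconstruction: keep only the most likely candidate per tuple."""
    return {tid: max(alts, key=lambda alt: alt[1])[0] for tid, alts in cands.items()}

def query_det(db, predicate):
    """Selection over the deterministic database: a tuple either matches or not."""
    return {tid for tid, row in db.items() if predicate(row)}

def query_pdb(cands, predicate):
    """Selection over the tuple-disjoint representation.

    Alternatives of the same tuple are mutually exclusive, so the probability
    that the tuple satisfies the predicate is the sum over matching alternatives.
    """
    result = {}
    for tid, alts in cands.items():
        p = sum(prob for row, prob in alts if predicate(row))
        if p > 0:
            result[tid] = p
    return result

def is_accord(row):
    return row["model"] == "Accord"

det_answers = query_det(deterministic(candidates), is_accord)
pdb_answers = query_pdb(candidates, is_accord)
print(det_answers)  # the deterministic database kept "Civic" for t1, so nothing matches
print(pdb_answers)  # t1 is still returned, with the probability of its "Accord" alternative
```

In this toy data the probabilistic query returns `t1` where the deterministic one returns nothing, which is the recall gain the abstract mentions; the low probability attached to that answer is where the precision question arises.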
Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Leonard Hayflick studied the processes by which cells age during the twentieth and twenty-first centuries in the United States. In 1961 at the Wistar Institute in the US, Hayflick researched a phenomenon later called the Hayflick Limit, or the claim that normal human cells can only divide forty to sixty times before they cannot divide any further. Researchers later found that the cause of the Hayflick Limit is the shortening of telomeres, or portions of DNA at the ends of chromosomes that slowly degrade as cells replicate. Hayflick used his research on normal embryonic cells to develop a vaccine for polio, and from Hayflick's published directions, scientists developed vaccines for rubella, rabies, adenovirus, measles, chickenpox and shingles.

Created: 2014-07-20
Description

Although best known for his work with the fruit fly, for which he earned a Nobel Prize and the title "The Father of Genetics," Thomas Hunt Morgan's contributions to biology reach far beyond genetics. His research explored questions in embryology, regeneration, evolution, and heredity, using a variety of approaches.

Created: 2007-09-25
173947-Thumbnail Image.jpg
Created: 1935