Unsupervised Bayesian data cleaning techniques for structured data

Description

Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus avoid the necessity for a domain expert or master data. I also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. A Map-Reduce architecture to perform this computation in a distributed manner is also shown. I evaluate these methods over both synthetic and real data.

Contributors: De, Sushovan (Author) / Kambhampati, Subbarao (Thesis advisor) / Chen, Yi (Committee member) / Candan, K. Selcuk (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2014

Time efficient and quality effective K nearest neighbor search in high dimension space

Description

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as databases and data mining, information retrieval, machine learning, pattern recognition, and plagiarism detection. Locality-sensitive hashing (LSH) is so far the most practical approximate KNN search algorithm for high-dimensional data. Algorithms such as Multi-Probe LSH and LSH-Forest improve upon the basic LSH algorithm by varying hash bucket size dynamically at query time, so these two algorithms can answer different KNN queries adaptively. However, these two algorithms need a data access post-processing step after candidate collection in order to get the final answer to the KNN query. In this thesis, the Multi-Probe LSH with data access post-processing (Multi-Probe LSH with DAPP) and LSH-Forest with data access post-processing (LSH-Forest with DAPP) algorithms are improved by replacing the costly data access post-processing (DAPP) step with a much faster histogram-based post-processing (HBPP) step. Two HBPP algorithms, LSH-Forest with HBPP and Multi-Probe LSH with HBPP, are presented in this thesis. Both achieve the three goals for KNN search in large-scale, high-dimensional data sets: high search quality, high time efficiency, and high space efficiency. None of the previous KNN algorithms can achieve all three goals. More specifically, it is shown that HBPP algorithms can always achieve high search quality (as good as LSH-Forest with DAPP and Multi-Probe LSH with DAPP) with much less time cost (one to several orders of magnitude speedup) and the same memory usage. It is also shown that, with almost the same time cost and memory usage, HBPP algorithms can always achieve better search quality than LSH-Forest with random pick (LSH-Forest with RP) and Multi-Probe LSH with random pick (Multi-Probe LSH with RP).
Moreover, to achieve very high search quality, Multi-Probe LSH with HBPP is always a better choice than LSH-Forest with HBPP, regardless of the distribution, size, and dimensionality of the data set.
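
For readers unfamiliar with the pipeline the abstract refers to, the following is a minimal sketch of basic LSH candidate collection followed by a data access post-processing step (fetching each candidate and ranking by true distance). It is not the thesis's implementation: the random-hyperplane hash family, the class name `LSHIndex`, and the parameters `n_tables` and `n_bits` are assumptions chosen for illustration, and the histogram-based post-processing (HBPP) proposed in the thesis is not reproduced here.

```python
import random
from collections import defaultdict


def make_hash(dim, n_bits, rng):
    """Build one random-hyperplane hash function (an assumed hash family)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 for p in planes)

    return h


class LSHIndex:
    """Hypothetical basic LSH index with several independent hash tables."""

    def __init__(self, dim, n_tables=4, n_bits=6, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, n_bits, rng) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for h, table in zip(self.hashes, self.tables):
            table[h(v)].append(idx)

    def query(self, q, k):
        # Step 1: candidate collection from the bucket q falls into in each table.
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(q), []))

        # Step 2: data access post-processing. Every candidate vector is read
        # and its true squared Euclidean distance to q is computed.
        def dist2(i):
            return sum((a - b) ** 2 for a, b in zip(self.points[i], q))

        return sorted(candidates, key=dist2)[:k]


# Usage: index 200 random 8-dimensional points and query with one of them.
data_rng = random.Random(1)
index = LSHIndex(dim=8)
for _ in range(200):
    index.add([data_rng.gauss(0.0, 1.0) for _ in range(8)])
nearest = index.query(index.points[17], k=1)
```

Step 2 is the part whose cost grows with the number of candidates, since each one requires reading the full vector; that is the step the abstract describes replacing.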

Contributors: Yu, Renwei (Author) / Candan, Kasim S (Thesis advisor) / Sapino, Maria L (Committee member) / Chen, Yi (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created: 2011

Using Games to Explore Collective Action on International Scales

Description

One of the salient challenges of sustainability is the Tragedy of the Commons, where individuals acting independently and rationally deplete a common resource despite their understanding that it is not in the group's long term best interest to do so. Hardin presents this dilemma as nearly intractable and solvable only by drastic, government-mandated social reforms, while Ostrom's empirical work demonstrates that community-scale collaboration can circumvent tragedy without any elaborate outside intervention. Though more optimistic, Ostrom's work provides scant insight into larger-scale dilemmas such as climate change. Consequently, it remains unclear if the sustainable management of global resources is possible without significant government mediation. To investigate, we conducted two game theoretic experiments that challenged students in different countries to collaborate digitally and manage a hypothetical common resource. One experiment involved students attending Arizona State University and the Rochester Institute of Technology in the US and Mountains of the Moon University in Uganda, while the other included students at Arizona State and the Management Development Institute in India. In both experiments, students were randomly assigned to one of three production roles: Luxury, Intermediate, and Subsistence. Students then made individual decisions about how many units of goods they wished to produce up to a set maximum per production class. Luxury players gain the most profit (i.e. grade points) per unit produced, but they also emit the most externalities, or social costs, which directly subtract from the profit of everybody else in the game; Intermediate players produce a medium amount of profit and externalities per unit, and Subsistence players produce a low amount of profit and externalities per unit. Variables influencing and/or inhibiting collaboration were studied using pre- and post-game surveys. 
This research sought to answer three questions: 1) Are international groups capable of self-organizing in a way that promotes sustainable resource management?, 2) What are the key factors that inhibit or foster collective action among international groups?, and 3) How well do Hardin's theories and Ostrom's empirical models predict the observed behavior of students in the game? The results of gameplay suggest that international cooperation is possible, though likely sub-optimal. Statistical analysis of survey data revealed that heterogeneity and levels of trust significantly influenced game behavior. Specific traits of heterogeneity among students found to be significant were income, education, assigned production role, number of people in one's household, college class, college major, and military service. Additionally, it was found that Ostrom's collective action framework was a better predictor of game outcome than Hardin's theories. Overall, this research lends credence to the plausibility of international cooperation in tragedy of the commons scenarios such as climate change, though much work remains to be done.

Contributors: Stanton, Albert Grayson (Author) / Clark, Susan Spierre (Thesis director) / Seager, Thomas (Committee member) / Civil, Environmental and Sustainable Engineering Programs (Contributor) / Barrett, The Honors College (Contributor)

Created: 2014-12

Random Simulations of Braess's Paradox

Description

This paper uses network theory to simulate Nash equilibria for selfish travel within a traffic network. Specifically, it examines the phenomenon of Braess's Paradox, the counterintuitive occurrence in which adding capacity to a traffic network increases the social costs paid by travelers in a new Nash equilibrium. It also employs the measure of the price of anarchy, the ratio of the social cost of the Nash equilibrium flow through a network to the socially optimal cost of travel. These concepts underlie the theory of undesirable selfish routing, which has been used to identify problematic links and roads in existing metropolitan traffic networks (Youn et al., 2008), suggesting practical applications for the theoretical questions this paper attempts to answer. New topologies of networks which generate Braess's Paradox are found. In addition, the relationship between the number of nodes in a network and the number of occurrences of Braess's Paradox, and the relationship between the number of nodes in a network and a network's price of anarchy distribution, are studied.
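
To make the two quantities in this abstract concrete, here is a small worked sketch on the standard four-node textbook Braess network, not on the random networks simulated in the paper. One unit of traffic travels from s to t; edges s->a and b->t cost their own flow x, edges a->t and s->b cost 1, and the added shortcut a->b costs 0. The route names and helper functions are assumptions made for illustration.

```python
from fractions import Fraction

# Routes: top = s->a->t, bottom = s->b->t, zig = s->a->b->t (uses the shortcut).
ROUTES = ("top", "bottom", "zig")


def route_costs(flows):
    """Per-traveler cost of each route, given the flow on every route."""
    flow_sa = flows["top"] + flows["zig"]      # traffic on edge s->a (cost x)
    flow_bt = flows["bottom"] + flows["zig"]   # traffic on edge b->t (cost x)
    return {
        "top": flow_sa + 1,        # s->a costs its flow, a->t costs 1
        "bottom": 1 + flow_bt,     # s->b costs 1, b->t costs its flow
        "zig": flow_sa + flow_bt,  # the shortcut a->b itself costs 0
    }


def social_cost(flows):
    """Total cost paid by all travelers."""
    costs = route_costs(flows)
    return sum(flows[r] * costs[r] for r in ROUTES)


def is_nash(flows, shortcut_open):
    """Nash condition: every route in use is a cheapest available route."""
    costs = route_costs(flows)
    available = [r for r in ROUTES if shortcut_open or r != "zig"]
    best = min(costs[r] for r in available)
    return all(flows[r] == 0 or costs[r] == best for r in available)


half = Fraction(1, 2)
before = {"top": half, "bottom": half, "zig": Fraction(0)}        # no shortcut
after = {"top": Fraction(0), "bottom": Fraction(0), "zig": Fraction(1)}

# The even split is also the socially optimal flow once the shortcut is open,
# so it serves as the denominator of the price of anarchy.
price_of_anarchy = social_cost(after) / social_cost(before)
```

In this network the even split is an equilibrium only while the shortcut is closed. Once it opens, the zig route looks cheaper to every individual traveler, everyone switches, and the equilibrium social cost rises from 3/2 to 2, giving a price of anarchy of 4/3.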

Contributors: Chotras, Peter Louis (Author) / Armbruster, Dieter (Thesis director) / Lanchier, Nicolas (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Economics Program in CLAS (Contributor)

Created: 2015-05

Evolutionary games as interacting particle systems

Description

This dissertation investigates the dynamics of evolutionary games based on the framework of interacting particle systems in which individuals are discrete, space is explicit, and dynamics are stochastic. Its focus is on 2-strategy games played on a d-dimensional integer lattice with a range of interaction M. An overview of related past work is given along with a summary of the dynamics in the mean-field model, which is described by the replicator equation. Then the dynamics of the interacting particle system is considered, first when individuals are updated according to the best-response update process and then the death-birth update process. Several interesting results are derived, and the differences between the interacting particle system model and the replicator dynamics are emphasized. The terms selfish and altruistic are defined according to a certain ordering of payoff parameters. In these terms, the replicator dynamics are simple: coexistence occurs if both strategies are altruistic; the selfish strategy wins if one strategy is selfish and the other is altruistic; and there is bistability if both strategies are selfish. Under the best-response update process, it is shown that there is no bistability region. Instead, in the presence of at least one selfish strategy, the most selfish strategy wins, while there is still coexistence if both strategies are altruistic. Under the death-birth update process, it is shown that regardless of the range of interactions and the dimension, regions of coexistence and bistability are both reduced. Additionally, coexistence occurs in some parameter region for large enough interaction ranges. Finally, in contrast with the replicator equation and the best-response update process, cooperators can win in the prisoner's dilemma for the death-birth process in one-dimensional nearest-neighbor interactions.
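
As a point of reference for the mean-field model mentioned above, the sketch below integrates the standard replicator equation for a two-strategy game, du/dt = u(1 - u)(f1 - f2), where u is the frequency of strategy 1 and f1, f2 are the expected payoffs of the two strategies. The payoff matrices, step size, and function names are hypothetical, and the code does not reproduce the dissertation's definitions of selfish and altruistic strategies or its spatial models; it only shows what coexistence and bistability look like in the replicator dynamics.

```python
def replicator_step(u, payoff, dt):
    """One forward-Euler step of the two-strategy replicator equation."""
    (a11, a12), (a21, a22) = payoff
    f1 = a11 * u + a12 * (1 - u)  # expected payoff of strategy 1
    f2 = a21 * u + a22 * (1 - u)  # expected payoff of strategy 2
    return u + dt * u * (1 - u) * (f1 - f2)


def simulate(u0, payoff, dt=0.01, steps=20000):
    """Integrate from initial frequency u0 and return the final frequency."""
    u = u0
    for _ in range(steps):
        u = replicator_step(u, payoff, dt)
    return u


# Hypothetical payoffs where each strategy does better when rare:
# the interior fixed point u* = 2/3 is stable (coexistence).
coexistence = ((1, 3), (2, 1))

# Hypothetical payoffs where each strategy does better when common:
# the interior fixed point u* = 1/3 is unstable (bistability).
bistable = ((3, 1), (1, 2))

u_coexist = simulate(0.1, coexistence)
u_low = simulate(0.2, bistable)   # starts below 1/3
u_high = simulate(0.5, bistable)  # starts above 1/3
```

With the first matrix the frequency settles near 2/3 from any interior starting point. With the second, the outcome depends on which side of 1/3 the population starts, which is the bistable behavior the abstract contrasts with the particle system results.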

Contributors: Evilsizor, Stephen (Author) / Lanchier, Nicolas (Thesis advisor) / Kang, Yun (Committee member) / Motsch, Sebastien (Committee member) / Smith, Hal (Committee member) / Thieme, Horst (Committee member) / Arizona State University (Publisher)

Created: 2016

Patient-centered and experience-aware mining for effective information discovery in health forums

Description

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast-growing content in health forums provides a large repository for people seeking valuable information. A forum user can issue a keyword query to search health forums about specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidence emerging in health forums.

This dissertation studies how to effectively discover information in health forums. Several challenges have been identified. First, the existing work relies on a syntactic information unit, such as a sentence, a post, or a thread, to bind different pieces of information in a forum. However, most information discovery tasks should be based on a semantic information unit, the patient. For instance, given a keyword query that involves the relationship between a treatment and side effects, it is expected that the matched keywords refer to the same patient. In this work, patient-centered mining is proposed to mine patient semantic information units. In a patient information unit, the health information, such as diseases, symptoms, treatments, and effects, is connected by the corresponding patient.

Second, the information published in health forums varies in quality. Some of it is patient-reported personal health experience, while some can be hearsay. In this work, a context-aware experience extraction framework is proposed to mine patient-reported personal health experience, which can be used for evidence-based knowledge discovery or for finding patients with similar experience.

Finally, the proposed patient-centered and experience-aware mining framework is used to build a patient health information database for effectively discovering adverse drug reactions (ADRs) from health forums. ADRs have become a serious health problem and even a leading cause of death in the United States. Health forums provide valuable evidence on a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. Empirical evaluation shows the effectiveness of the proposed approach.

Contributors: Liu, Yunzhong (Author) / Chen, Yi (Thesis advisor) / Liu, Huan (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2016

Towards a Game-Theoretic Analysis for the Study of Disability Microaggressions as a Communicative Phenomenon

Description

For fifty years, inquiry has attempted to capture how groups of people experience microaggression phenomena through multiple methodological and analytic applications grounded in psychology-influenced frameworks. Yet, despite theoretical advancements, the phenomenon has met criticisms trivializing its existence, falsifiability, and social significance. Unpacking possible interactive factors of a microaggressive moment invites a revisitation of the known and unknown pragmatic conditions that may produce and influence its discomforting situational “content.” This study employs an intentional, game-theoretic methodology based on brief, publicly-recorded, everyday conversation segments. Conversation segments of social interactions provide a means to conduct a mathematically-solid, computationally-tractable analysis of explaining what is happening during encounters where disability microaggressions are likely the result of partial (non)cooperation between communicators. Such analysis extends the microaggression research program (MRP) by: (1) proposing theoretical consequences for conversational repair phenomena, algorithmic programming, and experimental designs in negotiation research; and (2) outlining practical approaches for preventing microaggressions with new communication pedagogy, anti-oppression/de-escalation training programs, and calculable, focus-oriented psychotherapy. It concludes with an invitation for scholars to “be” in ambiguity so that they may speculate possible trajectories for the study of microaggressions as a communicative phenomenon.

Contributors: Reutlinger, Corey Jon (Author) / de la Garza, Sarah Amira (Thesis advisor) / Alberts, Janet (Committee member) / Lanchier, Nicolas (Committee member) / Cherney, James L. (Committee member) / Arizona State University (Publisher)

Created: 2021

Game-theoretic Empathetic Parameter Estimation in Two-Vehicle Interaction

Description

In recent years, there have been many attempts, using different approaches, to address human-robot interaction (HRI) problems. In this paper, the multi-agent interaction is formulated as a differential game with incomplete information. To tackle this problem, a parameter estimation method is used to obtain an approximate solution in real time. Previous studies of parameter estimation assumed that the human's parameters are known by the robot; however, this may not be the case, and there is uncertainty in the modeling of the human's rewards as well as in the human's modeling of the robot's rewards. The proposed method, empathetic estimation, is tested and compared with the "non-empathetic" estimation from existing work. The case studies are conducted at an uncontrolled intersection with two agents attempting to pass efficiently. Results show that when both agents hold inconsistent beliefs about the other agent's parameters, the empathetic agent estimates the parameters better and obtains higher reward values. This indicates the scenarios in which empathy is essential: when an agent's initial belief is mismatched with the true parameters or intent of the agents.

Contributors: Chen, Yi (Author) / Ren, Yi (Thesis advisor) / Zhang, Wenlong (Committee member) / Yong, Sze Zheng (Committee member) / Arizona State University (Publisher)

Created: 2021
