This collection includes both ASU Theses and Dissertations, submitted by graduate students, and the Barrett, The Honors College theses submitted by undergraduate students.

Displaying 1 - 10 of 87
Description
With the rapid development of mobile sensing technologies such as GPS, RFID, and smartphone sensors, capturing position data in the form of trajectories has become easy. Moving-object trajectory analysis is a growing area of interest owing to its applications in domains such as marketing, security, and traffic monitoring and management. To better understand movement behaviors from raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories over time. If the taxis moving in a city are viewed as sensors that provide real-time information about the city's traffic, a change in these trajectories over time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm for parameter estimation in HMMs, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and to detect changes in trajectory data over time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are, has the time-interval distribution in the pattern changed? Two different approaches are considered for change detection: a frequency-based approach and a distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and may have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.
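The change-detection idea above compares the likelihood of new trajectory data against a trained HMM. The m-BaumWelch estimator is specific to the dissertation, but the likelihood side can be sketched with the standard scaled forward algorithm (the model parameters and discretized observation window below are illustrative assumptions, not values from the dissertation):

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | pi, A, B).

    pi: initial state probabilities, shape (n_states,)
    A:  state-transition matrix, shape (n_states, n_states)
    B:  emission probabilities, shape (n_states, n_symbols)
    """
    alpha = pi * B[:, obs[0]]           # forward variables at t = 0
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c                   # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate one step, weight by emission
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik

# Illustrative 2-state model over discretized trajectory symbols.
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# A change would be flagged when the per-observation log-likelihood of a
# new window drops well below the level seen on historical data.
window = np.array([0, 0, 1, 0])
score = hmm_loglik(window, pi, A, B) / len(window)
```

In a monitoring setting, one would track `score` over successive windows and flag a change when it falls below a threshold calibrated on historical trajectories.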
Contributors: Kondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
With the increase in computing power and the availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within it. Thus, knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets, where variables are predictive for only a subset of the predictor classes. An Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. A Matched Random Forest (MRF) was proposed to find variables that can distinguish case from control without the restrictions that exist in linear models. MRF detects variables that distinguish case from control even in the presence of interactions and qualitative variables.
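The aSVM formulation itself is specific to the dissertation, but the core idea, an asymmetric loss that penalizes errors on the two classes differently so that one class can be predicted with higher precision, can be sketched with a class-weighted hinge loss trained by subgradient descent (the function name, cost weights, and data below are illustrative assumptions, not the dissertation's algorithm):

```python
import numpy as np

def fit_asymmetric_hinge(X, y, cost_pos=1.0, cost_neg=5.0,
                         lr=0.01, reg=0.01, epochs=200):
    """Linear classifier with class-dependent hinge-loss weights.

    y takes values in {-1, +1}. Setting cost_neg > cost_pos makes errors
    on the -1 class costlier, shifting the boundary away from it and
    raising precision on the +1 class.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    cost = np.where(y == 1, cost_pos, cost_neg)   # per-sample loss weight
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                       # samples inside the margin
        grad_w = reg * w - (cost[active] * y[active]) @ X[active] / n
        grad_b = -np.sum(cost[active] * y[active]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Illustrative separable data: class +1 around (2, 2), class -1 around (-2, -2).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w, b = fit_asymmetric_hinge(X, y)
preds = np.sign(X @ w + b)
```

A symmetric SVM corresponds to equal cost weights; increasing the weight on one class trades overall error for fewer mistakes of the costlier kind.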
Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Whole genome sequencing (WGS) and whole exome sequencing (WES) are two comprehensive genomic tests that use next-generation sequencing technology to sequence most of the 3.2 billion base pairs in a human genome (WGS) or many of the estimated 22,000 protein-coding genes in the genome (WES). The promises offered by WGS/WES are: to identify suspected yet unidentified genetic diseases; to characterize the genomic mutations in a tumor to identify targeted therapeutic agents; and to predict future diseases with the hope of promoting disease prevention strategies and/or offering early treatment. Promises notwithstanding, sequencing a human genome presents several interrelated challenges: how to adequately analyze, interpret, store, reanalyze, and apply an unprecedented amount of genomic data (with uncertain clinical utility) to patient care? In addition, genomic data has the potential to become integral to improving the medical care of an individual and their family years after a genome is sequenced. Current informed consent protocols do not adequately address the unique challenges and complexities inherent in the process of WGS/WES. This dissertation constructs a novel informed consent process for individuals considering WGS/WES, capable of fulfilling both the legal and ethical requirements of medical consent while addressing the intricacies of WGS/WES, ultimately resulting in a more effective consenting experience. To better understand the components of an effective consenting experience, the first part of this dissertation traces the historical origin of the informed consent process to identify the motivations, rationales, and institutional commitments that sustain our current consenting protocols for genetic testing. After examining the underlying commitments that shape our current informed consent protocols, I discuss the effectiveness of the informed consent process from an ethical and legal standpoint. I illustrate how WGS/WES introduces new complexities to the informed consent process and assess whether informed consent protocols proposed for WGS/WES address these complexities. The last section of this dissertation describes a novel informed consent process for WGS/WES, constructed from the original ethical intent of informed consent, analysis of existing informed consent protocols, and my own observations as a genetic counselor of what constitutes an effective consenting experience.
Contributors: Hunt, Katherine (Author) / Hurlbut, J. Benjamin (Thesis advisor) / Robert, Jason S. (Thesis advisor) / Maienschein, Jane (Committee member) / Northfelt, Donald W. (Committee member) / Marchant, Gary (Committee member) / Ellison, Karin (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Lung Cancer Alliance, a nonprofit organization, released the "No One Deserves to Die" advertising campaign in June 2012. The campaign visuals presented a clean, simple message to the public: the stigma associated with lung cancer drives the marginalization of lung cancer patients. Lung Cancer Alliance (LCA) asserts that negative public attitudes toward lung cancer stem from unacknowledged moral judgments that generate 'stigma.' The campaign materials are meant to expose and challenge the common public category-making processes that occur when subconsciously evaluating lung cancer patients. These processes involve comparison, perception of difference, and exclusion. The campaign implies that society sees the suffering of lung cancer patients as indicative of moral failure, and thus as not warranting assistance from society, which leads to marginalization of the diseased. Attributing to society a morally laden view of the disease, the campaign extends this view to its logical end and makes it explicit: lung cancer patients no longer deserve to live because they themselves caused the disease (by smoking). This judgment and the resulting marginalization are, according to LCA, evident in the ways lung cancer is marginalized relative to other diseases: minimal research funding, high mortality rates, and low awareness of the disease. Therefore, society commits an injustice against those with lung cancer. This research analyzes the relationship between disease, identity-making, and responsibilities within society as represented by this stigma framework. LCA asserts that society understands lung cancer in terms of stigma, and advocates that society's understanding of lung cancer be shifted from a stigma framework toward a medical framework. Analysis of the identity-making and responsibility encoded in both frameworks contributes to an evaluation of the significance of reframing this disease. One aim of this thesis is to explore the relationship between these frameworks in medical sociology. The results show a complex interaction that suggests trading one frame for another will not destigmatize the lung cancer patient. These interactions cause tangible harms, such as high mortality rates, and there are important implications for other communities that experience a stigmatized disease.
Contributors: Calvelage, Victoria (Author) / Hurlbut, J. Benjamin (Thesis advisor) / Maienschein, Jane (Committee member) / Ellison, Karin (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Teaching evolution has been shown to be a challenge for faculty in both K-12 and postsecondary education. Many of these challenges stem from perceived conflicts not only between religion and evolution, but also from faculty beliefs about religion, its compatibility with evolutionary theory, and its proper role in classroom curriculum. Studies suggest that if educators engage with students' religious beliefs and identity, this may help students develop positive attitudes towards evolution. The aim of this study was to reveal the attitudes and beliefs professors hold about addressing religion and providing religious scientist role models to students when teaching evolution. Fifteen semi-structured interviews of tenured biology professors were conducted at a large Midwestern university regarding their beliefs, experiences, and strategies for teaching evolution and, particularly, their willingness to address religion in a class section on evolution. A qualitative analysis of the transcripts showed that professors did not agree on whether it is their job to help students accept evolution (although the majority said it is not), nor did they agree on a definition of "acceptance of evolution." Professors were willing to engage with students' religious beliefs if doing so would help their students accept evolution. Finally, professors perceived many challenges to engaging students' religious beliefs in a science classroom, such as the appropriateness of the material for a science class, large class sizes, and time constraints. Given these results, the author concludes that instructors must come to a consensus about their goals as biology educators, as well as what "acceptance of evolution" means, before they can realistically employ engagement with students' religious beliefs and identity as an educational strategy.
Contributors: Barnes, Maryann Elizabeth (Author) / Brownell, Sara E (Thesis advisor) / Brem, Sarah K. (Thesis advisor) / Lynch, John M. (Committee member) / Ellison, Karin (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Blind and visually impaired individuals have historically demonstrated low participation in the fields of science, technology, engineering, and mathematics (STEM). This low participation is reflected in both their education and career choices. Despite the establishment of the Americans with Disabilities Act (ADA) and the Individuals with Disabilities Education Act (IDEA), blind and visually impaired (BVI) students continue to fall academically below the level of their sighted peers in science and math. Although this deficit has many causes, this study focuses on the lack of adequate accessible image-based materials. Traditional methods for creating accessible image materials for the vision impaired have included detailed verbal descriptions accompanying an image or conversion into a simplified tactile graphic. Very commonly, no substitute materials are provided to students in STEM courses because these are image-rich disciplines that often include a large number of images, diagrams, and charts. Additionally, images translated into text or simplified into basic line drawings are frequently inadequate because they rely on the interpretations of resource personnel who lack expertise in STEM. Within this study, a method to create a new type of tactile 3D image was developed using High Density Polyethylene (HDPE) and Computer Numeric Control (CNC) milling. These tactile image boards preserve high levels of detail when compared with the original print image. To determine the discernibility and effectiveness of tactile images, these customizable boards were tested in various university classrooms as well as in participation studies that included BVI and sighted students. Results from these studies indicate that tactile images are discernible and improved performance in lab exercises by as much as 60% for those with visual impairment. Incorporating tactile HDPE 3D images into a classroom setting was shown to increase the interest, participation, and performance of BVI students, suggesting that this type of 3D tactile image should be incorporated into STEM classes to increase the participation of these students and improve the level of training they receive in science and math.
Contributors: Gonzales, Ashleigh (Author) / Baluch, Debra P (Thesis advisor) / Maienschein, Jane (Committee member) / Ellison, Karin (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
No-confounding (NC) designs in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum-aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation of main effects and a few two-factor interactions without the need for follow-up experimentation. Analysis methods for non-regular designs are an area of ongoing research, because standard variable-selection techniques such as stepwise regression may not always be the best approach. The current work investigates the use of the Dantzig selector for analyzing no-confounding designs. Through a series of examples, it shows that this technique is very effective for identifying the set of active factors in no-confounding designs when there are three or four active main effects and up to two active two-factor interactions.

To evaluate the performance of the Dantzig selector, a simulation study was conducted and the results were analyzed based on the percentage of type II errors. Also, an alternative NC design in six factors, called the alternate no-confounding design in six factors, is introduced in this study. The performance of this alternate six-factor NC design is then evaluated using the Dantzig selector as the analysis method. Lastly, a section is dedicated to comparing the performance of the NC-6 and alternate NC-6 designs.
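The Dantzig selector chooses coefficients with minimum l1 norm subject to a bound on the maximum absolute correlation between the predictors and the residuals, which makes it a natural screen for active factors. It can be solved as a linear program; a minimal sketch follows (not the dissertation's implementation; the data and the bound `delta` are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, delta):
    """Solve  min ||beta||_1  s.t.  ||X.T @ (y - X @ beta)||_inf <= delta.

    Standard LP reformulation with beta = u - v, u >= 0, v >= 0.
    """
    n, p = X.shape
    XtX = X.T @ X
    Xty = X.T @ y
    c = np.ones(2 * p)                    # objective: sum(u) + sum(v) = ||beta||_1
    A_ub = np.vstack([
        np.hstack([XtX, -XtX]),           #  X'X beta - X'y <= delta
        np.hstack([-XtX, XtX]),           # -X'X beta + X'y <= delta
    ])
    b_ub = np.concatenate([delta + Xty, delta - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    u, v = res.x[:p], res.x[p:]
    return u - v

# Illustrative screening problem: 5 candidate effects, 2 truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta_true = np.array([3.0, 0.0, 0.0, 2.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = dantzig_selector(X, y, delta=5.0)
active = set(np.flatnonzero(np.abs(beta_hat) > 0.5))
```

The l1 objective drives inactive coefficients to (near) zero, so thresholding `beta_hat` yields a candidate set of active factors; in a designed experiment, the columns of `X` would be the model matrix of main effects and two-factor interactions.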
Contributors: Krishnamoorthy, Archana (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Technological advances have enabled the generation and collection of various data from complex systems, thus creating ample opportunity to integrate knowledge in many decision-making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and thereby improves decision making.

The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.
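As a rough illustration of how modeling inter-target relationships can help, one simple device is to feed fitted targets forward as predictors for later targets. The sketch below uses a least-squares chain, not the tree-based ensemble proposed in the dissertation; the function names and data are illustrative assumptions:

```python
import numpy as np

def fit_chain(X, Y):
    """Least-squares 'chain': each target is regressed on X plus the
    previously fitted targets, so inter-target structure is exploited.
    (A simple stand-in for the dissertation's tree-based ensemble.)
    """
    n, _ = X.shape
    coefs = []
    Z = np.hstack([X, np.ones((n, 1))])       # add intercept column
    for j in range(Y.shape[1]):
        w, *_ = np.linalg.lstsq(Z, Y[:, j], rcond=None)
        coefs.append(w)
        Z = np.hstack([Z, (Z @ w)[:, None]])  # feed fitted target forward
    return coefs

def predict_chain(X, coefs):
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    preds = []
    for w in coefs:
        p = Z @ w
        preds.append(p)
        Z = np.hstack([Z, p[:, None]])
    return np.column_stack(preds)

# Illustrative targets with shared structure: y2 depends on y1.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y1 = X @ np.array([1.0, -2.0, 0.5])
y2 = 3.0 * y1 + X[:, 0]
Y = np.column_stack([y1, y2])
coefs = fit_chain(X, Y)
Yhat = predict_chain(X, coefs)
```

The same chaining idea carries over to tree models: later targets see the earlier targets' predictions as extra inputs, which is one way relationships between targets can be embedded into a learner.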

The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, message topics, message times, etc. are collected. A restricted form of monitoring fails to take the relationships among multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task of detecting a change that might impact only a small subset of the network and occur only in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may be local to sub-regions (with broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is demonstrated on synthetic and real networks.
Contributors: Azarnoush, Bahareh (Author) / Runger, George C. (Thesis advisor) / Bekki, Jennifer (Thesis advisor) / Pan, Rong (Committee member) / Saghafian, Soroush (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental design. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research tackles the challenging problem of constructing exact designs for GLMs that are robust against parameter, link, and model uncertainties, by improving an existing algorithm and providing a new one based on continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accommodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs using the D- and A-optimality criteria. The second part of our research provides an algorithm that is a faster alternative to a recently proposed genetic algorithm (GA) for constructing optimal designs for fMRI studies. Our algorithm is built upon a discrete version of PSO.
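Both parts rely on particle swarm optimization, in which candidate solutions ("particles") move through the search space, pulled toward their own best position and the swarm's best. The dissertation's design-construction algorithms are not reproduced here, but the core continuous PSO update can be sketched as follows (the hyperparameters and toy objective are illustrative; a design algorithm would substitute a criterion such as D-optimality evaluated on the candidate design):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal continuous PSO: velocity = inertia + cognitive + social pull."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))             # particle velocities
    pbest = x.copy()                             # each particle's best position
    pbest_val = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()         # swarm's best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, f(g)

# Toy objective standing in for a design criterion.
best_x, best_val = pso_minimize(lambda z: np.sum(z ** 2), dim=4)
```

A discrete PSO variant, as used for fMRI designs, replaces the continuous position update with moves over a discrete candidate set (e.g., swapping stimulus orderings) while keeping the same personal-best/global-best attraction structure.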
Contributors: Temkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
This dissertation transforms a set of system-complexity-reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS prunes the rule conditions into a subset via feature selection; the subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods on the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem: one uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
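A time series forest builds trees on simple summary statistics computed over intervals of the series, which is what keeps it linear-time and interpretable. The interval-feature step can be sketched as follows (a generic sketch of interval features as commonly described for TSF; the interval choices and series below are illustrative):

```python
import numpy as np

def interval_features(series, intervals):
    """Mean, std, and least-squares slope over each interval of a 1-D series."""
    feats = []
    for start, end in intervals:
        seg = series[start:end]
        t = np.arange(len(seg))
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        feats.extend([seg.mean(), seg.std(), slope])
    return np.array(feats)

# Illustrative series: a flat segment followed by a rising ramp.
series = np.concatenate([np.zeros(10), np.arange(10, dtype=float)])
feats = interval_features(series, [(0, 10), (10, 20)])
```

A forest would draw many random `(start, end)` intervals per tree and split on the resulting mean/std/slope features, so each split reads as a statement like "the slope over this time window exceeds a threshold."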
Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2011