Search Content

Spatio-temporal data mining to detect changes and clusters in trajectories

Description

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic…

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.

ContributorsKondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created2012

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus…

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied variable selection of matched data sets and proposed a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exists in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.

ContributorsKoh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created2013

Policy-driven security management for gateway-oriented reconfigurable ecosystems

Description

With the increasing user demand for low latency, elastic provisioning of computing resources coupled with ubiquitous and on-demand access to real-time data, cloud computing has emerged as a popular computing paradigm to meet growing user demands.…

With the increasing user demand for low latency, elastic provisioning of computing resources coupled with ubiquitous and on-demand access to real-time data, cloud computing has emerged as a popular computing paradigm to meet growing user demands. However, with the introduction and rising use of wear- able technology and evolving uses of smart-phones, the concept of Internet of Things (IoT) has become a prevailing notion in the currently growing technology industry. Cisco Inc. has projected a data creation of approximately 403 Zetabytes (ZB) by 2018. The combination of bringing benign devices and connecting them to the web has resulted in exploding service and data aggregation requirements, thus requiring a new and innovative computing platform. This platform should have the capability to provide robust real-time data analytics and resource provisioning to clients, such as IoT users, on-demand. Such a computation model would need to function at the edge-of-the-network, forming a bridge between the large cloud data centers and the distributed connected devices.

This research expands on the notion of bringing computational power to the edge- of-the-network, and then integrating it with the cloud computing paradigm whilst providing services to diverse IoT-based applications. This expansion is achieved through the establishment of a new computing model that serves as a platform for IoT-based devices to communicate with services in real-time. We name this paradigm as Gateway-Oriented Reconfigurable Ecosystem (GORE) computing. Finally, this thesis proposes and discusses the development of a policy management framework for accommodating our proposed computational paradigm. The policy framework is designed to serve both the hosted applications and the GORE paradigm by enabling them to function more efficiently. The goal of the framework is to ensure uninterrupted communication and service delivery between users and their applications.

ContributorsDsouza, Clinton (Author) / Ahn, Gail-Joon (Thesis advisor) / Doupe, Adam (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2015

Privacy preserving controls for Android applications

Description

Android is currently the most widely used mobile operating system. The permission model in Android governs the resource access privileges of applications. The permission model however is amenable to various attacks, including re-delegation attacks, background snooping attacks and disclosure of private information. This thesis is aimed at understanding, analyzing and…

Android is currently the most widely used mobile operating system. The permission model in Android governs the resource access privileges of applications. The permission model however is amenable to various attacks, including re-delegation attacks, background snooping attacks and disclosure of private information. This thesis is aimed at understanding, analyzing and performing forensics on application behavior. This research sheds light on several security aspects, including the use of inter-process communications (IPC) to perform permission re-delegation attacks.

Android permission system is more of app-driven rather than user controlled, which means it is the applications that specify their permission requirement and the only thing which the user can do is choose not to install a particular application based on the requirements. Given the all or nothing choice, users succumb to pressures and needs to accept permissions requested. This thesis proposes a couple of ways for providing the users finer grained control of application privileges. The same methods can be used to evade the Permission Re-delegation attack.

This thesis also proposes and implements a novel methodology in Android that can be used to control the access privileges of an Android application, taking into consideration the context of the running application. This application-context based permission usage is further used to analyze a set of sample applications. We found the evidence of applications spoofing or divulging user sensitive information such as location information, contact information, phone id and numbers, in the background. Such activities can be used to track users for a variety of privacy-intrusive purposes. We have developed implementations that minimize several forms of privacy leaks that are routinely done by stock applications.

ContributorsGollapudi, Narasimha Aditya (Author) / Dasgupta, Partha (Thesis advisor) / Xue, Guoliang (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)

Created2014

Towards demographic information release in LBS k-Anonymization

Description

The increasing number of continually connected mobile persons has created an environment conducive to real time user data gathering for many uses both public and private in nature. Publicly, one can envision no longer requiring a census to determine the demographic composition of the country and its sub regions. The…

The increasing number of continually connected mobile persons has created an environment conducive to real time user data gathering for many uses both public and private in nature. Publicly, one can envision no longer requiring a census to determine the demographic composition of the country and its sub regions. The information provided is vastly more up to date than that of a census and allows civil authorities to be more agile and preemptive with planning. Privately, advertisers take advantage of a persons stated opinions, demographics, and contextual (where and when) information in order to formulate and present pertinent offers.

Regardless of its use this information can be sensitive in nature and should therefore be under the control of the user. Currently, a user has little say in the manner that their information is processed once it has been released. An ad-hoc approach is currently in use, where the location based service providers each maintain their own policy over personal information usage.

In order to allow more user control over their personal information while still providing for targeted advertising, a systematic approach to the release of the information is needed. It is for that reason we propose a User-Centric Context Aware Spatiotemporal Anonymization framework. At its core the framework will unify the current spatiotemporal anonymization with that of traditional anonymization so that user specified anonymization requirement is met or exceeded while allowing for more demographic information to be released.

ContributorsSanchez, Michael Andrew (Author) / Ahn, Gail-Joon (Thesis advisor) / Doupe, Adam (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2014

Flexible analyst defined viewpoint for malware relationship analysis

Description

The rate at which new malicious software (Malware) is created is consistently increasing each year. These new malwares are designed to bypass the current anti-virus countermeasures employed to protect computer systems. Security Analysts must understand the nature and intent of the malware sample in order to protect computer systems from…

The rate at which new malicious software (Malware) is created is consistently increasing each year. These new malwares are designed to bypass the current anti-virus countermeasures employed to protect computer systems. Security Analysts must understand the nature and intent of the malware sample in order to protect computer systems from these attacks. The large number of new malware samples received daily by computer security companies require Security Analysts to quickly determine the type, threat, and countermeasure for newly identied samples. Our approach provides for a visualization tool to assist the Security Analyst in these tasks that allows the Analyst to visually identify relationships between malware samples.

This approach consists of three steps. First, the received samples are processed by a sandbox environment to perform a dynamic behavior analysis. Second, the reports of the dynamic behavior analysis are parsed to extract identifying features which are matched against other known and analyzed samples. Lastly, those matches that are determined to express a relationship are visualized as an edge connected pair of nodes in an undirected graph.

ContributorsHolmes, James Edward (Author) / Ahn, Gail-Joon (Thesis advisor) / Dasgupta, Partha (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)

Created2014

Analysis of no-confounding designs using the dantzig selector

Description

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation…

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation of main effects and a few two-factor interactions without the need for follow-up experimentation. Analysis methods for non-regular designs is an area of ongoing research, because standard variable selection techniques such as stepwise regression may not always be the best approach. The current work investigates the use of the Dantzig selector for analyzing no-confounding designs. Through a series of examples it shows that this technique is very effective for identifying the set of active factors in no-confounding designs when there are three of four active main effects and up to two active two-factor interactions.

To evaluate the performance of Dantzig selector, a simulation study was conducted and the results based on the percentage of type II errors are analyzed. Also, another alternative for 6 factor NC design, called the Alternate No-confounding design in six factors is introduced in this study. The performance of this Alternate NC design in 6 factors is then evaluated by using Dantzig selector as an analysis method. Lastly, a section is dedicated to comparing the performance of NC-6 and Alternate NC-6 designs.

ContributorsKrishnamoorthy, Archana (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

Holistic learning for multi-target and network monitoring problems

Description

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view…

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making.

The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.

The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, topic of messages, time of messages, etc. are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task to detect a change that might only impact a small subset of the network and only occur in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may only be local to sub-regions (with a broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is elaborated on synthetic and real networks.

ContributorsAzarnoush, Bahareh (Author) / Runger, George C. (Thesis advisor) / Bekki, Jennifer (Thesis advisor) / Pan, Rong (Committee member) / Saghafian, Soroush (Committee member) / Arizona State University (Publisher)

Created2014

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact…

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact designs for GLMs, that are robust against parameter, link and model

uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs, using the D and A optimality criterion. The second part of our research is on providing an algorithm

that is a faster alternative to a recently proposed genetic algorithm (GA) to construct optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.

ContributorsTemkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve…

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.

ContributorsDeng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created2011

Theses and Dissertations