Search Content

Spatio-temporal data mining to detect changes and clusters in trajectories

Description

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic…

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.

ContributorsKondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created2012

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus…

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied variable selection of matched data sets and proposed a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exists in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.

ContributorsKoh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created2013

Automated testing for RBAC policies

Description

Access control is necessary for information assurance in many of today's applications such as banking and electronic health record. Access control breaches are critical security problems that can result from unintended and improper implementation of security policies. Security testing can help identify security vulnerabilities early and avoid unexpected expensive cost…

Access control is necessary for information assurance in many of today's applications such as banking and electronic health record. Access control breaches are critical security problems that can result from unintended and improper implementation of security policies. Security testing can help identify security vulnerabilities early and avoid unexpected expensive cost in handling breaches for security architects and security engineers. The process of security testing which involves creating tests that effectively examine vulnerabilities is a challenging task. Role-Based Access Control (RBAC) has been widely adopted to support fine-grained access control. However, in practice, due to its complexity including role management, role hierarchy with hundreds of roles, and their associated privileges and users, systematically testing RBAC systems is crucial to ensure the security in various domains ranging from cyber-infrastructure to mission-critical applications. In this thesis, we introduce i) a security testing technique for RBAC systems considering the principle of maximum privileges, the structure of the role hierarchy, and a new security test coverage criterion; ii) a MTBDD (Multi-Terminal Binary Decision Diagram) based representation of RBAC security policy including RHMTBDD (Role Hierarchy MTBDD) to efficiently generate effective positive and negative security test cases; and iii) a security testing framework which takes an XACML-based RBAC security policy as an input, parses it into a RHMTBDD representation and then generates positive and negative test cases. We also demonstrate the efficacy of our approach through case studies.

ContributorsGupta, Poonam (Author) / Ahn, Gail-Joon (Thesis advisor) / Collofello, James (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2014

Thermal aware scheduling in hadoop map reduce framework

Description

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to…

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to achieve greening of the data centers. This thesis deals with cooling energy management in data centers running data-processing frameworks. In particular, we propose ther- mal aware scheduling for MapReduce framework and its Hadoop implementation to reduce cooling energy in data centers. Data-processing frameworks run many low- priority batch processing jobs, such as background log analysis, that do not have strict completion time requirements; they can be delayed by a bounded amount of time. Cooling energy savings are possible by being able to temporally spread the workload, and assign it to the computing equipments which reduce the heat recirculation in data center room and therefore the load on the cooling systems. We implement our scheme in Hadoop and performs some experiments using both CPU-intensive and I/O-intensive workload benchmarks in order to evaluate the efficiency of our scheme. The evaluation results highlight that our thermal aware scheduling reduces hot-spots and makes uniform temperature distribution within the data center possible. Sum- marizing the contribution, we incorporated thermal awareness in Hadoop MapReduce framework by enhancing the native scheduler to make it thermally aware, compare the Thermal Aware Scheduler(TAS) with the Hadoop scheduler (FCFS) by running PageRank and TeraSort benchmarks in the BlueTool data center of Impact lab and show that there is reduction in peak temperature and decrease in cooling power using TAS over FCFS scheduler.

ContributorsKole, Sayan (Author) / Gupta, Sandeep (Thesis advisor) / Huang, Dijiang (Committee member) / Varsamopoulos, Georgios (Committee member) / Arizona State University (Publisher)

Created2013

Discovering and using patterns for countering security challenges

Description

Most existing security decisions for both defending and attacking are made based on some deterministic approaches that only give binary answers. Even though these approaches can achieve low false positive rate for decision making, they have high false negative rates due to the lack of accommodations to new attack methods…

Most existing security decisions for both defending and attacking are made based on some deterministic approaches that only give binary answers. Even though these approaches can achieve low false positive rate for decision making, they have high false negative rates due to the lack of accommodations to new attack methods and defense techniques. In this dissertation, I study how to discover and use patterns with uncertainty and randomness to counter security challenges. By extracting and modeling patterns in security events, I am able to handle previously unknown security events with quantified confidence, rather than simply making binary decisions. In particular, I cope with the following four real-world security challenges by modeling and analyzing with pattern-based approaches: 1) How to detect and attribute previously unknown shellcode? I propose instruction sequence abstraction that extracts coarse-grained patterns from an instruction sequence and use Markov chain-based model and support vector machines to detect and attribute shellcode; 2) How to safely mitigate routing attacks in mobile ad hoc networks? I identify routing table change patterns caused by attacks, propose an extended Dempster-Shafer theory to measure the risk of such changes, and use a risk-aware response mechanism to mitigate routing attacks; 3) How to model, understand, and guess human-chosen picture passwords? I analyze collected human-chosen picture passwords, propose selection function that models patterns in password selection, and design two algorithms to optimize password guessing paths; and 4) How to identify influential figures and events in underground social networks? I analyze collected underground social network data, identify user interaction patterns, and propose a suite of measures for systematically discovering and mining adversarial evidence. By solving these four problems, I demonstrate that discovering and using patterns could help deal with challenges in computer security, network security, human-computer interaction security, and social network security.

ContributorsZhao, Ziming (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Santanam, Raghu (Committee member) / Arizona State University (Publisher)

Created2014

Analysis of no-confounding designs using the dantzig selector

Description

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation…

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation of main effects and a few two-factor interactions without the need for follow-up experimentation. Analysis methods for non-regular designs is an area of ongoing research, because standard variable selection techniques such as stepwise regression may not always be the best approach. The current work investigates the use of the Dantzig selector for analyzing no-confounding designs. Through a series of examples it shows that this technique is very effective for identifying the set of active factors in no-confounding designs when there are three of four active main effects and up to two active two-factor interactions.

To evaluate the performance of Dantzig selector, a simulation study was conducted and the results based on the percentage of type II errors are analyzed. Also, another alternative for 6 factor NC design, called the Alternate No-confounding design in six factors is introduced in this study. The performance of this Alternate NC design in 6 factors is then evaluated by using Dantzig selector as an analysis method. Lastly, a section is dedicated to comparing the performance of NC-6 and Alternate NC-6 designs.

ContributorsKrishnamoorthy, Archana (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

Holistic learning for multi-target and network monitoring problems

Description

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view…

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making.

The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.

The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, topic of messages, time of messages, etc. are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task to detect a change that might only impact a small subset of the network and only occur in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may only be local to sub-regions (with a broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is elaborated on synthetic and real networks.

ContributorsAzarnoush, Bahareh (Author) / Runger, George C. (Thesis advisor) / Bekki, Jennifer (Thesis advisor) / Pan, Rong (Committee member) / Saghafian, Soroush (Committee member) / Arizona State University (Publisher)

Created2014

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact…

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact designs for GLMs, that are robust against parameter, link and model

uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs, using the D and A optimality criterion. The second part of our research is on providing an algorithm

that is a faster alternative to a recently proposed genetic algorithm (GA) to construct optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.

ContributorsTemkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

Determining the integrity of applications and operating systems using remote and local attesters

Description

This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute,…

This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute, and the code generates results which are sent to the external entity. These results provide the external entity an assurance as to whether the client application and the OS are in pristine condition. This work also presents a technique where it can be verified that the application which was attested, did not get replaced by a different application after completion of the attestation. The implementation of these three techniques was achieved entirely in software and is backward compatible with legacy machines on the Intel x86 architecture. This research also presents two approaches to incorporating software based "root of trust" using Virtual Machine Monitors (VMMs). The first approach determines the integrity of an executing Guest OS from the Host OS using Linux Kernel-based Virtual Machine (KVM) and qemu emulation software. The second approach implements a small VMM called MIvmm that can be utilized as a trusted codebase to build security applications such as those implemented in this research. MIvmm was conceptualized and implemented without using any existing codebase; its minimal size allows it to be trustworthy. Both the VMM approaches leverage processor support for virtualization in the Intel x86 architecture.

ContributorsSrinivasan, Raghunathan (Author) / Dasgupta, Partha (Thesis advisor) / Colbourn, Charles (Committee member) / Shrivastava, Aviral (Committee member) / Huang, Dijiang (Committee member) / Dewan, Prashant (Committee member) / Arizona State University (Publisher)

Created2011

Systematic policy analysis and management

Description

With the advent of technologies such as web services, service oriented architecture and cloud computing, modern organizations have to deal with policies such as Firewall policies to secure the networks, XACML (eXtensible Access Control Markup Language) policies for controlling the access to critical information as well as resources. Management of…

With the advent of technologies such as web services, service oriented architecture and cloud computing, modern organizations have to deal with policies such as Firewall policies to secure the networks, XACML (eXtensible Access Control Markup Language) policies for controlling the access to critical information as well as resources. Management of these policies is an extremely important task in order to avoid unintended security leakages via illegal accesses, while maintaining proper access to services for legitimate users. Managing and maintaining access control policies manually over long period of time is an error prone task due to their inherent complex nature. Existing tools and mechanisms for policy management use different approaches for different types of policies. This research thesis represents a generic framework to provide an unified approach for policy analysis and management of different types of policies. Generic approach captures the common semantics and structure of different access control policies with the notion of policy ontology. Policy ontology representation is then utilized for effectively analyzing and managing the policies. This thesis also discusses a proof-of-concept implementation of the proposed generic framework and demonstrates how efficiently this unified approach can be used for analysis and management of different types of access control policies.

ContributorsKulkarni, Ketan (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2011

ASU Electronic Theses and Dissertations