Search Content

Spatio-temporal data mining to detect changes and clusters in trajectories

Description

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic…

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.

ContributorsKondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created2012

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus…

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied variable selection of matched data sets and proposed a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exists in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.

ContributorsKoh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created2013

Hafnium oxide as an alternative barrier to aluminum oxide for thermally stable niobium tunnel junctions

Description

In this research, our goal was to fabricate Josephson junctions that can be stably processed at 300°C or higher. With the purpose of integrating Josephson junction fabrication with the current semiconductor circuit fabrication process, back-end process temperatures (>350 °C) will be a key for producing large scale junction circuits reliably,…

In this research, our goal was to fabricate Josephson junctions that can be stably processed at 300°C or higher. With the purpose of integrating Josephson junction fabrication with the current semiconductor circuit fabrication process, back-end process temperatures (>350 °C) will be a key for producing large scale junction circuits reliably, which requires the junctions to be more thermally stable than current Nb/Al-AlOx/Nb junctions. Based on thermodynamics, Hf was chosen to produce thermally stable Nb/Hf-HfOx/Nb superconductor tunnel Josephson junctions that can be grown or processed at elevated temperatures. Also elevated synthesis temperatures improve the structural and electrical properties of Nb electrode layers that could potentially improve junction device performance. The refractory nature of Hf, HfO2 and Nb allow for the formation of flat, abrupt and thermally-stable interfaces. But the current Al-based barrier will have problems when using with high-temperature grown and high-quality Nb. So our work is aimed at using Nb grown at elevated temperatures to fabricate thermally stable Josephson tunnel junctions. As a junction barrier metal, Hf was studied and compared with the traditional Al-barrier material. We have proved that Hf-HfOx is a good barrier candidate for high-temperature synthesized Josephson junction. Hf deposited at 500 °C on Nb forms flat and chemically abrupt interfaces. Nb/Hf-HfOx/Nb Josephson junctions were synthesized, fabricated and characterized with different oxidizing conditions. The results of materials characterization and junction electrical measurements are reported and analyzed. We have improved the annealing stability of Nb junctions and also used high-quality Nb grown at 500 °C as the bottom electrode successfully. Adding a buffer layer or multiple oxidation steps improves the annealing stability of Josephson junctions. We also have attempted to use the Atomic Layer Deposition (ALD) method for the growth of Hf oxide as the junction barrier and got tunneling results.

ContributorsHuang, Mengchu, 1987- (Author) / Newman, Nathan (Thesis advisor) / Rowell, John M. (Committee member) / Singh, Rakesh K. (Committee member) / Chamberlin, Ralph (Committee member) / Wang, Robert (Committee member) / Arizona State University (Publisher)

Created2013

Analysis of no-confounding designs using the dantzig selector

Description

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation…

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation of main effects and a few two-factor interactions without the need for follow-up experimentation. Analysis methods for non-regular designs is an area of ongoing research, because standard variable selection techniques such as stepwise regression may not always be the best approach. The current work investigates the use of the Dantzig selector for analyzing no-confounding designs. Through a series of examples it shows that this technique is very effective for identifying the set of active factors in no-confounding designs when there are three of four active main effects and up to two active two-factor interactions.

To evaluate the performance of Dantzig selector, a simulation study was conducted and the results based on the percentage of type II errors are analyzed. Also, another alternative for 6 factor NC design, called the Alternate No-confounding design in six factors is introduced in this study. The performance of this Alternate NC design in 6 factors is then evaluated by using Dantzig selector as an analysis method. Lastly, a section is dedicated to comparing the performance of NC-6 and Alternate NC-6 designs.

ContributorsKrishnamoorthy, Archana (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

Holistic learning for multi-target and network monitoring problems

Description

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view…

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making.

The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.

The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, topic of messages, time of messages, etc. are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task to detect a change that might only impact a small subset of the network and only occur in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may only be local to sub-regions (with a broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is elaborated on synthetic and real networks.

ContributorsAzarnoush, Bahareh (Author) / Runger, George C. (Thesis advisor) / Bekki, Jennifer (Thesis advisor) / Pan, Rong (Committee member) / Saghafian, Soroush (Committee member) / Arizona State University (Publisher)

Created2014

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact…

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact designs for GLMs, that are robust against parameter, link and model

uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs, using the D and A optimality criterion. The second part of our research is on providing an algorithm

that is a faster alternative to a recently proposed genetic algorithm (GA) to construct optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.

ContributorsTemkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve…

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.

ContributorsDeng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created2011

Production scheduling and system configuration for capacitated flow lines with application in the semiconductor backend process

Description

A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, the production scheduling of the back-end process…

A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, the production scheduling of the back-end process is still very difficult due to the wide product mix, large number of parallel machines, product family related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer-linear-programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product family related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering updated demand forecast for a medium term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real size problem instances. The optimizer generates a tentative production plan. Then the scheduler sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performances of the MILP model and the deterministic scheduling system. Then experimental design is applied to understand the behavior of the DSS and identify the best configuration of the DSS under different demand scenarios. Product-machine qualification decisions have long-term and significant impact on production scheduling. A robust product-machine qualification matrix is critical for meeting demand when demand quantity or mix varies. In the second part of this research, a stochastic mixed integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs with uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model. Computational results are provided to compare the performance of different solution methods.

ContributorsFu, Mengying (Author) / Askin, Ronald G. (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Fowler, John W (Committee member) / Pan, Rong (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2011

Analysis and modeling of services impacts on system workload and performance in service-based systems (SBS)

Description

In recent years, service oriented computing (SOC) has become a widely accepted paradigm for the development of distributed applications such as web services, grid computing and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements make services compete for system resources. IT service providers need…

In recent years, service oriented computing (SOC) has become a widely accepted paradigm for the development of distributed applications such as web services, grid computing and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements make services compete for system resources. IT service providers need to allocate resources to services so the performance requirements of customers can be satisfied. Workload and performance models are required for efficient resource management and service performance assurance in SBS. This dissertation develops two methods to understand and model the cause-effect relations of service-related activities with resources workload and service performance. Part one presents an empirical method that requires the collection of system dynamics data and the application of statistical analyses. The results show that the method is capable to: 1) uncover the impacts of services on resource workload and service performance, 2) identify interaction effects of multiple services running concurrently, 3) gain insights about resource and performance tradeoffs of services, and 4) build service workload and performance models. In part two, the empirical method is used to investigate the impacts of services, security mechanisms and cyber attacks on resources workload and service performance. The information obtained is used to: 1) uncover interaction effects of services, security mechanisms and cyber attacks, 2) identify tradeoffs within limits of system resources, and 3) develop general/specific strategies for system survivability. Finally, part three presents a framework based on the usage profiles of services competing for resources and the resource-sharing schemes. The framework is used to: 1) uncover the impacts of service parameters (e.g. arrival distribution, execution time distribution, priority, workload intensity, scheduling algorithm) on workload and performance, and 2) build service workload and performance models at individual resources. The estimates obtained from service workload and performance models at individual resources can be aggregated to obtain overall estimates of services through multiple system resources. The workload and performance models of services obtained through both methods can be used for the efficient resource management and service performance assurance in SBS.

ContributorsMartinez Aranda, Billibaldo (Author) / Ye, Nong (Thesis advisor) / Wu, Tong (Committee member) / Sarjoughian, Hessam S. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2012

Experimental demonstration of photovoltaic powered solar cooling with ice storage

Description

The ability to shift the photovoltaic (PV) power curve and make the energy accessible during peak hours can be accomplished through pairing solar PV with energy storage technologies. A prototype hybrid air conditioning system (HACS), built under supervision of project head Patrick Phelan, consists of PV modules running a DC…

The ability to shift the photovoltaic (PV) power curve and make the energy accessible during peak hours can be accomplished through pairing solar PV with energy storage technologies. A prototype hybrid air conditioning system (HACS), built under supervision of project head Patrick Phelan, consists of PV modules running a DC compressor that operates a conventional HVAC system paired with a second evaporator submerged within a thermal storage tank. The thermal storage is a 0.284m3 or 75 gallon freezer filled with Cryogel balls, submerged in a weak glycol solution. It is paired with its own separate air handler, circulating the glycol solution. The refrigerant flow is controlled by solenoid valves that are electrically connected to a high and low temperature thermostat. During daylight hours, the PV modules run the DC compressor. The refrigerant flow is directed to the conventional HVAC air handler when cooling is needed. Once the desired room temperature is met, refrigerant flow is diverted to the thermal storage, storing excess PV power. During peak energy demand hours, the system uses only small amounts of grid power to pump the glycol solution through the air handler (note the compressor is off), allowing for money and energy savings. The conventional HVAC unit can be scaled down, since during times of large cooling demands the glycol air handler can be operated in parallel with the conventional HVAC unit. Four major test scenarios were drawn up in order to fully comprehend the performance characteristics of the HACS. Upon initial running of the system, ice was produced and the thermal storage was charged. A simple test run consisting of discharging the thermal storage, initially ~¼ frozen, was performed. The glycol air handler ran for 6 hours and the initial cooling power was 4.5 kW. This initial test was significant, since greater than 3.5 kW of cooling power was produced for 3 hours, thus demonstrating the concept of energy storage and recovery.

ContributorsPeyton-Levine, Tobin (Author) / Phelan, Patrick (Thesis advisor) / Trimble, Steve (Committee member) / Wang, Robert (Committee member) / Arizona State University (Publisher)

Created2012

ASU Electronic Theses and Dissertations