ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Statistics
The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.
The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes concerning the users' gender, age, topic of messages, time of messages, etc. are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task to detect a change that might only impact a small subset of the network and only occur in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may only be local to sub-regions (with a broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is elaborated on synthetic and real networks.
exact designs for GLMs, that are robust against parameter, link and model
uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs, using the D and A optimality criterion. The second part of our research is on providing an algorithm
that is a faster alternative to a recently proposed genetic algorithm (GA) to construct optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.
The pseudo-Bayesian approach can be applied to the problem of optimal design construction under dependent observations. Often, correlation between observations exists due to restrictions on randomization. Several techniques for optimal design construction are proposed in the case of the conditional response distribution being a natural exponential family member but with a normally distributed block effect . The reviewed pseudo-Bayesian approach is compared to an approach based on substituting the marginal likelihood with the joint likelihood and an approach based on projections of the score function (often called quasi-likelihood). These approaches are compared for several models with normal, Poisson, and binomial conditional response distributions via the true determinant of the expected Fisher information matrix where the dispersion of the random blocks is considered a nuisance parameter. A case study using the developed methods is performed.
The joint and quasi-likelihood methods are then extended to address the case when the magnitude of random block dispersion is of concern. Again, a simulation study over several models is performed, followed by a case study when the conditional response distribution is a Poisson distribution.
The first topic of this dissertation focuses on the data clustering. Data clustering is often the first step for analyzing a dataset without the label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging, yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.
The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.
The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability.
The last topic centers around the dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization for time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Computer-generated optimal designs are popular design choices for less standard scenarios where classical designs are not ideal. This work presents a new approach to experimental designs for dual-response systems. The normal, binomial, and Poisson distributions are considered for the potential responses. Using the D-criterion for the linear model and the Bayesian D-criterion for the nonlinear models, a weighted criterion is implemented in a coordinate-exchange algorithm. The designs are evaluated and compared across different weights. The sensitivity of the designs to the priors supplied in the Bayesian D-criterion is explored in the third chapter of this work.
The final section of this work presents a method for a decision-making process involving multiple objectives. There are situations where a decision-maker is interested in several optimal solutions, not just one. These types of decision processes fall into one of two scenarios: 1) wanting to identify the best N solutions to accomplish a goal or specific task, or 2) evaluating a decision based on several primary quantitative objectives along with secondary qualitative priorities. Design of experiment selection often involves the second scenario where the goal is to identify several contending solutions using the primary quantitative objectives, and then use the secondary qualitative objectives to guide the final decision. Layered Pareto Fronts can help identify a richer class of contenders to examine more closely. The method is illustrated with a supersaturated screening design example.