ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Statistics
- Creators: Borror, Connie M.
- Creators: Davulcu, Hasan
The pseudo-Bayesian approach can be applied to the problem of optimal design construction under dependent observations. Often, correlation between observations exists due to restrictions on randomization. Several techniques for optimal design construction are proposed for the case where the conditional response distribution is a member of the natural exponential family but with a normally distributed block effect. The reviewed pseudo-Bayesian approach is compared to an approach based on substituting the marginal likelihood with the joint likelihood and an approach based on projections of the score function (often called quasi-likelihood). These approaches are compared for several models with normal, Poisson, and binomial conditional response distributions via the true determinant of the expected Fisher information matrix, where the dispersion of the random blocks is considered a nuisance parameter. A case study using the developed methods is performed.
The joint and quasi-likelihood methods are then extended to address the case when the magnitude of random block dispersion is of concern. Again, a simulation study over several models is performed, followed by a case study when the conditional response distribution is a Poisson distribution.
In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ performances. Multiple machine learning models and their accuracy were evaluated based on Shapley Additive Explanations (SHAP).
Cross-validation shows that the Gradient Boosting Decision Tree has the best precision, 85.93%, with an average of 82.90%. Features such as Component grade, Due Date, and Submission Times have a higher impact than others. The baseline model received lower precision due to its lack of non-linear fitting.
Computer-generated optimal designs are popular design choices for less standard scenarios where classical designs are not ideal. This work presents a new approach to experimental designs for dual-response systems. The normal, binomial, and Poisson distributions are considered for the potential responses. Using the D-criterion for the linear model and the Bayesian D-criterion for the nonlinear models, a weighted criterion is implemented in a coordinate-exchange algorithm. The designs are evaluated and compared across different weights. The sensitivity of the designs to the priors supplied in the Bayesian D-criterion is explored in the third chapter of this work.
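As a rough illustration of the coordinate-exchange idea mentioned above, the following sketch covers only the linear-model part: it maximizes the D-criterion det(X'X) for a first-order model with intercept by changing one coordinate of one run at a time. The function names, the three candidate levels, and the starting design are all assumptions made for illustration and are not taken from the dissertation. A weighted dual-response version would, under the same assumptions, replace `d_criterion` with a weighted combination of the linear and Bayesian criteria.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # singular information matrix
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def d_criterion(design):
    """det(X'X) for a first-order model with intercept (hypothetical helper)."""
    X = [[1.0] + list(run) for run in design]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return det(xtx)


def coordinate_exchange(design, levels=(-1.0, 0.0, 1.0)):
    """Greedy local search: keep any single-coordinate change that raises the criterion."""
    design = [list(run) for run in design]
    best = d_criterion(design)
    improved = True
    while improved:
        improved = False
        for run in design:
            for j in range(len(run)):
                current = run[j]
                for level in levels:
                    run[j] = level
                    value = d_criterion(design)
                    if value > best + 1e-9:
                        best, current, improved = value, level, True
                run[j] = current  # keep the best level found for this coordinate
    return design, best


# Assumed starting design: 4 runs, 2 factors coded on [-1, 1].
start = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
design, best = coordinate_exchange(start)
```

Because this is a local search, the result depends on the starting design; practical implementations typically use many random starts and keep the best design found.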
The final section of this work presents a method for a decision-making process involving multiple objectives. There are situations where a decision-maker is interested in several optimal solutions, not just one. These types of decision processes fall into one of two scenarios: 1) wanting to identify the best N solutions to accomplish a goal or specific task, or 2) evaluating a decision based on several primary quantitative objectives along with secondary qualitative priorities. Design of experiment selection often involves the second scenario where the goal is to identify several contending solutions using the primary quantitative objectives, and then use the secondary qualitative objectives to guide the final decision. Layered Pareto Fronts can help identify a richer class of contenders to examine more closely. The method is illustrated with a supersaturated screening design example.
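The layering idea can be sketched as repeatedly peeling off the non-dominated set: the first layer is the usual Pareto front, the second layer is the Pareto front of what remains, and so on. The sketch below is a minimal illustration, assuming every objective is to be maximized; the function names and the example scores are hypothetical and do not come from the dissertation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(points: List[Sequence[float]], n_layers: int) -> List[List[int]]:
    """Return indices of the first n_layers successive Pareto fronts."""
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining and len(layers) < n_layers:
        # A point is on the current front if no other remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(front)
        remaining = [i for i in remaining if i not in front]
    return layers


# Hypothetical scores for six candidate designs on two objectives (larger is better).
designs = [(0.90, 0.60), (0.80, 0.80), (0.60, 0.90),
           (0.70, 0.70), (0.50, 0.50), (0.85, 0.55)]
layers = pareto_layers(designs, n_layers=2)
# layers == [[0, 1, 2], [3, 5]]
```

Here the second layer adds designs 3 and 5 as contenders that a single Pareto front would discard, which is the kind of richer candidate set a decision-maker could then screen using qualitative priorities.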
This research explores the problem of why so few of the published algorithms enter production and why even fewer end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework to successfully build machine learning analytics so they can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. Deployment needs and apprehensions of the industry are identified and gaps addressed through a 4-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified to maximize the return on machine-learning deployment.
The framework and process are demonstrated in semiconductor manufacturing, a highly complex process involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical and software processes, which makes it a highly dynamic, non-stationary system. Due to the 24/7 uptime requirements in manufacturing, high reliability and fail-safe operation are a must. Moreover, the ever-growing volumes mean that the system must be highly scalable. Lastly, due to the high cost of change, a sustained value proposition is a must for any proposed changes. Hence the context is ideal to explore the issues involved. The enterprise use-cases are used to demonstrate the robustness of the framework in addressing challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios.
The second half of this research deals with the construction of exact D-optimal designs for binary and ordinal responses. For both types, the base models fall under the class of Generalized Linear Models (GLMs) with a logistic link. First, the properties of the exact D-optimal mixture designs for binary responses are investigated. It will be shown that standard mixture designs and designs proposed for normal-theory responses are poor surrogates for the true D-optimal designs. In contrast with the D-optimal designs for normal-theory responses, which locate support points at the boundaries of the mixture region, exact D-optimal designs for GLMs tend to locate support points in regions of uncertainty. Alternative D-optimal designs for binary responses with high D-efficiencies are proposed by utilizing information about these regions.
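For readers unfamiliar with the quantities involved, the standard textbook expressions for a binary logistic model are sketched below. The notation is an assumption made here and is not taken from the dissertation: f(x) is the model vector at design point x, β is the parameter vector of length q, ξ is an exact n-run design, and ξ* is the reference D-optimal design.

```latex
% Assumed notation: f(x) model vector, \beta parameters (length q), \xi an n-run design
\pi(x_i) = \frac{1}{1 + \exp\{-f(x_i)^\top \beta\}}, \qquad
M(\xi, \beta) = \sum_{i=1}^{n} \pi(x_i)\,\bigl(1 - \pi(x_i)\bigr)\, f(x_i) f(x_i)^\top

% D-efficiency of \xi relative to the reference design \xi^{*}
D_{\mathrm{eff}}(\xi) = \left( \frac{\lvert M(\xi, \beta) \rvert}{\lvert M(\xi^{*}, \beta) \rvert} \right)^{1/q}
```

The weight π(1 − π) is largest when π = 1/2, which is one plausible reading of why support points are drawn toward regions where the response is most uncertain; because M depends on β, such designs are local to the assumed parameter values.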
The Mixture Exchange Algorithm (MEA), a search heuristic tailored to the construction of efficient mixture designs with GLM-type responses, is proposed. MEA introduces a new and efficient updating formula that lessens the computational expense of calculating the D-criterion for multi-categorical response systems, such as ordinal response models. MEA computationally outperforms comparable search heuristics by several orders of magnitude. Further, its computational expense increases at a slower rate of growth with increasing problem size. Finally, local and robust D-optimal designs for ordinal-response mixture systems are constructed using MEA, investigated, and shown to have high D-efficiency performance.