ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Statistics
- Creators: Davulcu, Hasan
- Creators: Sen, Arunabha
In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore
and predict students’ performances. Multiple machine learning models and the model accuracy were evaluated based on the Shapley Additive Explanation.
The Cross-Validation shows the Gradient Boosting Decision Tree has the best precision 85.93% with average 82.90%. Features like Component grade, Due Date, Submission Times have higher impact than others. Baseline model received lower precision due to lack of non-linear fitting.
This research explores the problem of the why so few of the published algorithms enter production and furthermore, fewer end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework to successfully build machine learning analytics so they can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal-shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. Deployment needs and apprehensions of the industry are identified and gaps addressed through a 4-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified to maximize the return on machine-learning deployment.
The framework and process is demonstrated in semiconductor manufacturing – it is highly complex process involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical and software processes which makes it a highly dynamic non-stationary system. Due to the 24/7 uptime requirements in manufacturing, high-reliability and fail-safe are a must. Moreover, the ever growing volumes mean that the system must be highly scalable. Lastly, due to the high cost of change, sustained value proposition is a must for any proposed changes. Hence the context is ideal to explore the issues involved. The enterprise use-cases are used to demonstrate the robustness of the framework in addressing challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic read-world scenarios.
As a step in model development, statistically designed screening experiments may be used to identify the main effects and interactions most significant on a response of a system. However, traditional approaches for screening are ineffective for complex systems because of the size of the experimental design. Consequently, the factors considered are often restricted, but this automatically restricts the interactions that may be identified as well. Alternatively, the designs are restricted to only identify main effects, but this then fails to consider any possible interactions of the factors.
To address this problem, a specific combinatorial design termed a locating array is proposed as a screening design for complex systems. Locating arrays exhibit logarithmic growth in the number of factors because their focus is on identification rather than on measurement. This makes practical the consideration of an order of magnitude more factors in experimentation than traditional screening designs.
As a proof-of-concept, a locating array is applied to screen for main effects and low-order interactions on the response of average transport control protocol (TCP) throughput in a simulation model of a mobile ad hoc network (MANET). A MANET is a collection of mobile wireless nodes that self-organize without the aid of any centralized control or fixed infrastructure. The full-factorial design for the MANET considered is infeasible (with over 10^{43} design points) yet a locating array has only 421 design points.
In conjunction with the locating array, a ``heavy hitters'' algorithm is developed to identify the influential main effects and two-way interactions, correcting for the non-normal distribution of the average throughput, and uneven coverage of terms in the locating array. The significance of the identified main effects and interactions is validated independently using the statistical software JMP.
The statistical characteristics used to evaluate traditional screening designs are also applied to locating arrays.
These include the matrix of covariance, fraction of design space, and aliasing, among others. The results lend additional support to the use of locating arrays as screening designs.
The use of locating arrays as screening designs for complex engineered systems is promising as they yield useful models. This facilitates quantitative evaluation of architectures and protocols and contributes to our understanding of complex engineered networks.