Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Owing to the large volume of data generated by online educational applications, Educational Data Mining (EDM) has improved learning in several ways, including student visualization, recommendations for students, student modeling, and student grouping. Many programming-assignment platforms offer features such as automated submission and test cases that verify correctness, but few studies have compared different statistical techniques with the latest frameworks or interpreted the resulting models in a unified way.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ performance. Multiple machine learning models were trained, their accuracy was evaluated, and the models were interpreted using Shapley Additive Explanations (SHAP).
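
To make the interpretation step concrete, the sketch below computes exact Shapley values, the quantity SHAP attributes to each feature, by brute force over feature coalitions. It is a minimal illustration under stated assumptions, not the thesis's code: features outside a coalition are replaced by a single hypothetical reference row (the shap library instead averages over a background dataset and uses faster model-specific algorithms such as TreeSHAP for tree ensembles), and `toy_model` and its three feature names are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley value of each feature for the prediction predict(x).

    Assumption: a feature left out of a coalition takes its value from a
    single background (reference) row, so the coalition value is
    v(S) = predict(x with every feature outside S replaced by background).
    Cost is exponential in the number of features; fine for a handful.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring model over three hypothetical features:
# [component_grade, days_submitted_before_due, submission_count]
def toy_model(f):
    grade, days_early, submissions = f
    return 0.5 * grade + 0.2 * days_early + 0.1 * grade * submissions


student = [0.9, 3.0, 2.0]
reference = [0.5, 1.0, 4.0]
phi = shapley_values(toy_model, student, reference)
print(phi)
```

The values sum to `toy_model(student) - toy_model(reference)` (the efficiency property of Shapley values), and the `grade * submissions` interaction is split between the two features involved, which is what lets a per-feature ranking like the one below be read off a non-linear model.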

Cross-validation shows that the Gradient Boosting Decision Tree achieves the best precision, 85.93%, with an average of 82.90%. Features such as Component Grade, Due Date, and Submission Times have a greater impact than the others. The baseline model achieved lower precision because it cannot fit non-linear relationships.
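
A stdlib-only sketch of k-fold cross-validated precision follows, assuming that the "best" and "average" figures above are the maximum and the mean of the per-fold precisions; the thesis's exact protocol is not described here. A one-feature decision stump (hypothetical `fit_stump`) stands in for the Gradient Boosting Decision Tree so the block runs on its own; a real experiment would pass in a proper model's fit function, for example one wrapping scikit-learn's `GradientBoostingClassifier`.

```python
from statistics import mean


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for brevity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_stump(X, y):
    """Hypothetical stand-in model: a one-feature threshold rule (decision stump)."""
    best = None  # (training accuracy, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for thr in cuts:
            acc = sum((row[j] > thr) == (label == 1) for row, label in zip(X, y))
            if best is None or acc > best[0]:
                best = (acc, j, thr)
    _, j, thr = best
    return lambda row: 1 if row[j] > thr else 0


def cross_val_precision(X, y, k, fit=fit_stump):
    """Train on k-1 folds and score precision on the held-out fold, once per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in test_idx]
        scores.append(precision([y[i] for i in test_idx], preds))
    return scores


# Toy, illustrative data only (one feature, e.g. a normalized component grade).
X = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7], [0.4], [0.6]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
scores = cross_val_precision(X, y, k=4)
print(f"best fold: {max(scores):.2%}, mean: {mean(scores):.2%}")
```

Reporting both the best fold and the mean, as the abstract does, matters because a single lucky fold can overstate how well a model generalizes.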

Contributors: Tian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2019

Crossing the chasm: deploying machine learning analytics in dynamic real-world scenarios

Description

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovering new algorithms or fine-tuning the performance of existing ones. Little work exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic, non-stationary systems with requirements around feasibility, stability and maintainability. Little has been done to establish standards for the unique analytics demands of real-world scenarios.

This research explores why so few published algorithms enter production and why even fewer end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework for building machine learning analytics that can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important later steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. The industry’s deployment needs and apprehensions are identified, and the gaps are addressed through a 4-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified as a way to maximize the return on machine-learning deployment.
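
One common way to make evaluation reflect temporal shift is rolling-origin ("walk-forward") splitting, sketched below. This is an assumption about what such an evaluation can look like, not the dissertation's DFD evaluation framework; the function name and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, test_size, window=None):
    """Yield (train, test) index lists for time-ordered data.

    Assumes samples are sorted by timestamp. Every test block comes strictly
    after its training data, mimicking deployment, where a model only ever
    sees the past. With window=None the training set expands; with an integer
    window only the most recent `window` samples are used, which lets the
    model track a drifting, non-stationary process.
    """
    end = initial_train
    while end + test_size <= n_samples:
        start = 0 if window is None else max(0, end - window)
        yield list(range(start, end)), list(range(end, end + test_size))
        end += test_size


for train, test in walk_forward_splits(10, initial_train=4, test_size=2, window=3):
    print(f"train on {train}, test on {test}")
```

Unlike shuffled k-fold cross-validation, no test block ever precedes its training data, so a model that only works on the distribution it was trained on shows its degradation in the later blocks.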

The framework and process are demonstrated in semiconductor manufacturing, a highly complex process involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical and software processes, which makes it a highly dynamic, non-stationary system. Because of the 24/7 uptime requirements in manufacturing, high reliability and fail-safe operation are a must. Moreover, ever-growing volumes mean that the system must be highly scalable. Lastly, because of the high cost of change, any proposed change must offer a sustained value proposition. Hence the context is ideal for exploring the issues involved. Enterprise use cases demonstrate the robustness of the framework in addressing the challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios.

Contributors: Shahapurkar, Som (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Ameresh, Ashish (Committee member) / He, Jingrui (Committee member) / Tuv, Eugene (Committee member) / Arizona State University (Publisher)

Created: 2016

Basketball Shooting Prediction Using Machine Learning Models and Motion Capture System

Description

This project explores the potential for accurately predicting basketball shooting posture with machine learning (ML) prediction algorithms, using data collected by an Internet of Things (IoT) based motion capture system. Specifically, the research addresses the question "Can I develop an ML model to generalize a decent basketball shot pattern?" by introducing a supervised learning paradigm in which the ML method takes acceleration attributes as input to predict basketball shot efficiency. The solution presented in this study places motion capture devices on the right upper limb: a single motion sensor built from a BNO080 and an ESP32 is attached to each of the right wrist, right forearm, and right shoulder. By observing the rate of change of speed during the shooting movement and comparing the models' performance, ML models based on the K-Nearest Neighbor and Decision Tree algorithms determine the best range of acceleration for each position on the arm.
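
The K-Nearest Neighbor idea can be sketched in a few lines. The abstract does not spell out the feature set, so this sketch assumes each shot is summarized by one peak-acceleration value per sensor location (wrist, forearm, shoulder); those features, the labels, and all numbers are hypothetical toy values, not data from the study. In practice scikit-learn's `KNeighborsClassifier` and `DecisionTreeClassifier` are the usual off-the-shelf choices.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up feature vectors: peak acceleration (m/s^2) measured at the
# right wrist, right forearm, and right shoulder for each shot.
train_X = [
    (28.0, 18.0, 9.0), (30.0, 19.5, 10.0), (27.5, 17.0, 8.5),
    (40.0, 26.0, 15.0), (16.0, 10.0, 4.0), (42.0, 27.5, 16.0),
]
train_y = ["make", "make", "make", "miss", "miss", "miss"]

print(knn_predict(train_X, train_y, (29.0, 18.5, 9.5)))  # → make
```

Because k-NN votes among neighbors in acceleration space, the "make" labels cluster in a band of values per sensor, which is one way a model can point to a preferred acceleration range for each spot on the arm. An odd `k` avoids ties with two classes.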

Contributors: Liang, Chengxu (Author) / Ingalls, Todd (Thesis advisor) / Turaga, Pavan (Thesis advisor) / De Luca, Gennaro (Committee member) / Arizona State University (Publisher)

Created: 2023

Computational Challenges in Non-parametric Prediction of Bradycardia in Preterm Infants

Description

Infants born before 37 weeks of pregnancy are considered preterm. Preterm infants typically must be closely monitored because they are highly susceptible to health problems such as hypoxemia (low blood oxygen level), apnea, respiratory issues, cardiac problems and neurological problems, as well as an increased chance of long-term health issues such as cerebral palsy, asthma and sudden infant death syndrome. One of the leading health complications in preterm infants is bradycardia, defined as a slower than expected heart rate, generally below 60 beats per minute. Bradycardia is often accompanied by low oxygen levels and can cause additional long-term health problems in the premature infant.

The implementation of a non-parametric method to predict the onset of bradycardia is presented. This method assumes no prior knowledge of the data and uses kernel density estimation to predict the future onset of bradycardia events. The data is preprocessed and then analyzed to detect the peaks in the ECG signals, after which different kernels are implemented to estimate the shared underlying distribution of the data. The performance of the algorithm is evaluated using various metrics, and the computational challenges and methods to overcome them are also discussed.
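
The peak-detection step can be illustrated with a deliberately simple detector. The abstract does not say which detector or preprocessing the thesis uses, so this sketch assumes an already preprocessed signal and applies a plain threshold plus local-maximum rule, then converts R-R intervals to heart rate; the function names and the synthetic trace are hypothetical. Production ECG pipelines usually band-pass filter first and use an established detector such as Pan-Tompkins.

```python
def detect_r_peaks(signal, threshold, min_distance):
    """Indices of local maxima at or above `threshold`, at least `min_distance` samples apart.

    When two candidates fall closer than min_distance, the taller one is kept.
    """
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            if peaks and i - peaks[-1] < min_distance:
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i  # keep the taller of two nearby candidates
                continue
            peaks.append(i)
    return peaks


def heart_rates_bpm(peaks, fs):
    """Instantaneous heart rate (beats/min) from consecutive R-R intervals; fs in Hz."""
    return [60.0 * fs / (b - a) for a, b in zip(peaks, peaks[1:])]


def flag_bradycardia(rates, limit_bpm=60.0):
    """True for each beat whose rate falls below limit_bpm (60 bpm, per the text above)."""
    return [rate < limit_bpm for rate in rates]


# Synthetic, made-up trace sampled at 10 Hz with spikes standing in for R-peaks.
fs = 10
ecg = [0.0] * 40
for idx in (5, 15, 30):
    ecg[idx] = 1.0

peaks = detect_r_peaks(ecg, threshold=0.5, min_distance=3)
rates = heart_rates_bpm(peaks, fs)
print(peaks, rates, flag_bradycardia(rates))  # → [5, 15, 30] [60.0, 40.0] [False, True]
```

The resulting R-R intervals (or rates) are the kind of per-beat series whose distribution a kernel density estimator can then model.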
The performance of the algorithm with respect to the kernels used is observed to be consistent with the theoretical performance of each kernel as presented in previous work. This work also automates the theoretical approach and addresses the various implementation challenges.
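
Kernel density estimation itself is short to write down: the estimate at x is (1 / (n h)) times the sum over samples x_i of K((x - x_i) / h), where K is the kernel and h the bandwidth. The abstract does not list which kernels are compared, so this sketch assumes three standard ones (Gaussian, Epanechnikov, uniform) and uses hypothetical R-R interval values; it only estimates a density and does not show how the thesis turns that estimate into a prediction of bradycardia onset.

```python
from math import exp, pi, sqrt


def gaussian(u):
    return exp(-0.5 * u * u) / sqrt(2 * pi)


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0


def kde(x, samples, bandwidth, kernel=gaussian):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(kernel((x - xi) / bandwidth) for xi in samples) / (n * bandwidth)


# Hypothetical R-R intervals in seconds (illustrative values, not study data).
rr = [0.40, 0.42, 0.41, 0.45, 0.39, 0.60]
for name, kern in [("gaussian", gaussian), ("epanechnikov", epanechnikov), ("uniform", uniform)]:
    print(name, round(kde(0.42, rr, bandwidth=0.03, kernel=kern), 3))
```

Each kernel integrates to 1, so each estimate is a valid density. The kernels differ in smoothness and support (the Gaussian never reaches zero, while the other two are zero beyond one bandwidth), which is why their estimation quality and computational cost differ in practice.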

Contributors: Mitra, Sinjini (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Moraffah, Bahman (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2020
