Predictive Modeling of 4th Down Selection in Power 5 Conference: Data Analytics

Description

Predictive analytics have been used in a wide variety of settings, including healthcare, sports, banking, and other disciplines. We use predictive analytics and modeling to determine the impact of factors that increase the probability of a successful fourth-down conversion in the Power 5 conferences. Using 2015-2017 data obtained from ESPN’s college football API, the logistic regression models predict whether a team will go for it on fourth down with a probability of 64% or more. Offense type, though important, is not directly measurable and was therefore incorporated as a random effect. We found that distance to go, play type, field position, and week of the season were the leading covariates for prediction. On average, our model performed as much as 14% better than coaches in 2018.
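The abstract names the model family (logistic regression with offense type as a random effect) but not its exact specification or software. The sketch below is a minimal, dependency-free illustration under stated assumptions: two hypothetical covariates (yards to go and field position), synthetic data, and a random intercept per offense type approximated by an L2-penalized (MAP) estimate with a fixed variance. A dedicated mixed-model routine would also estimate that variance; this is not the authors' model.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_random_intercept_logit(X, groups, y, n_groups, lam=1.0, lr=1.0, epochs=800):
    """Fit logit P(go) = b0 + w.x + u[g] by full-batch gradient descent.

    u[g] is one intercept per offense type (hypothetical grouping). Penalizing
    it with lam/2 * sum(u**2) gives the MAP estimate under a Gaussian prior
    u ~ N(0, 1/lam): a random intercept whose variance is fixed, not estimated.
    """
    n, p = len(X), len(X[0])
    b0, w, u = 0.0, [0.0] * p, [0.0] * n_groups
    for _ in range(epochs):
        g_b0, g_w, g_u = 0.0, [0.0] * p, [0.0] * n_groups
        for xi, gi, yi in zip(X, groups, y):
            z = b0 + sum(wj * xj for wj, xj in zip(w, xi)) + u[gi]
            r = sigmoid(z) - yi  # derivative of the log-loss w.r.t. z
            g_b0 += r
            for j in range(p):
                g_w[j] += r * xi[j]
            g_u[gi] += r
        b0 -= lr * g_b0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g_w)]
        u = [uk - lr * (gk + lam * uk) / n for uk, gk in zip(u, g_u)]
    return b0, w, u


def predict_go(b0, w, u, x, g):
    """Predicted probability of going for it on fourth down."""
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)) + u[g])


# Synthetic, illustrative data: x = (yards_to_go / 10, yards_from_own_goal / 100)
rng = random.Random(0)
true_u = [-0.5, 0.0, 0.7]  # hypothetical offense-type effects
X, groups, y = [], [], []
for _ in range(400):
    x = [rng.randint(1, 15) / 10, rng.randint(1, 99) / 100]
    g = rng.randrange(3)
    p_go = sigmoid(0.5 - 2.0 * x[0] + 2.0 * x[1] + true_u[g])
    X.append(x)
    groups.append(g)
    y.append(1 if rng.random() < p_go else 0)

b0, w, u = fit_random_intercept_logit(X, groups, y, n_groups=3)
print("fixed effects:", [round(v, 2) for v in w], "offense intercepts:", [round(v, 2) for v in u])
```

On this synthetic data the fitted coefficient for yards to go should come out negative and the one for field position positive, mirroring the signs used to generate it.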

Contributors: Blinkoff, Joshua Ian (Co-author) / Voeller, Michael (Co-author) / Wilson, Jeffrey (Thesis director) / Graham, Scottie (Committee member) / Dean, W.P. Carey School of Business (Contributor) / Department of Information Systems (Contributor) / Department of Management and Entrepreneurship (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Predictive Modeling of 4th Down Selection in Power 5 Conference: Data Analytics

Description

Predictive analytics have been used in a wide variety of settings, including healthcare, sports, banking, and other disciplines. We use predictive analytics and modeling to determine the impact of factors that increase the probability of a successful fourth-down conversion in the Power 5 conferences. Using 2015-2017 data obtained from ESPN’s college football API, the logistic regression models predict whether a team will go for it on fourth down with a probability of 64% or more. Offense type, though important, is not directly measurable and was therefore incorporated as a random effect. We found that distance to go, play type, field position, and week of the season were the leading covariates for prediction. On average, our model performed as much as 14% better than coaches in 2018.

Contributors: Voeller, Michael Jeffrey (Co-author) / Blinkoff, Josh (Co-author) / Wilson, Jeffrey (Thesis director) / Graham, Scottie (Committee member) / Department of Information Systems (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Novel methods of biomarker discovery and predictive modeling using Random Forest

Description

Random forest (RF) is a popular and powerful technique. It can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals; however, they are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and is used as the basis for two novel methods, one for biomarker discovery and one for generating prediction intervals.

First, a biodosimetry model is developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to handle the day interaction effect, and a nested modeling technique was proposed. The nested models can fit these complex data, which show large variability and non-linear relationships.

Second, a panel of biomarkers was selected using a data-driven feature selection method as well as hand-picking that considered prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method incorporates domain knowledge as a penalty term that regulates the selection of candidate features in RF. It adds flexibility to data-driven feature selection and can improve model interpretability. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide biomarker selection. On simulated datasets, the method is also competitive with existing methods when intrinsic data characteristics are used in place of domain knowledge.
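The abstract describes Know-GRRF only as adding domain knowledge as a penalty term in guided regularized RF. As a hedged sketch, assume the regularized-RF split rule in which a feature not yet used by the forest has its split gain multiplied by a coefficient lambda_i, and assume (our illustration, not necessarily the thesis's formula) that lambda_i mixes a base penalty `lam0` with a normalized domain-knowledge score through a weight `gamma`. Both functions and their parameters are hypothetical.

```python
def penalty_coefficients(knowledge, lam0=1.0, gamma=0.5):
    """lambda_i = (1 - gamma) * lam0 + gamma * k_i, with k_i rescaled to [0, 1].

    `knowledge` holds non-negative domain-knowledge scores per feature
    (hypothetical, e.g. a cross-species correlation for each gene).
    """
    top = max(knowledge)
    scaled = [k / top if top > 0 else 0.0 for k in knowledge]
    return [(1 - gamma) * lam0 + gamma * s for s in scaled]


def choose_split_feature(gains, selected, lambdas):
    """Index of the feature with the largest regularized split gain.

    Features already used by the forest keep their raw gain; a new feature's
    gain is multiplied by lambda_i (<= 1 when lam0 <= 1), so weakly supported
    new features must clearly beat features that are already selected.
    """
    best, best_gain = None, float("-inf")
    for i, gain in enumerate(gains):
        penalized = gain if i in selected else lambdas[i] * gain
        if penalized > best_gain:
            best, best_gain = i, penalized
    return best


lambdas = penalty_coefficients([0.9, 0.1, 0.0])  # [1.0, ~0.556, 0.5]
print(choose_split_feature([0.30, 0.40, 0.25], {2}, lambdas))  # → 0
```

Feature 1 has the largest raw gain, but its weak knowledge score shrinks it below feature 0, which the knowledge score fully supports; this is the sense in which domain knowledge "regulates" candidate selection.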

Finally, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets.
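RFerr itself is not specified in the abstract. As a swapped-in baseline of the same general kind (a non-parametric interval built from held-out residuals around any regressor's point prediction), the split-conformal sketch below may help fix ideas. A one-variable least-squares fit stands in for RF regression to keep it dependency-free; the data are synthetic, and this is not RFerr.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for any fitted regressor (e.g. an RF)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def residual_interval(predict, cal_x, cal_y, alpha=0.1):
    """Return x -> (lo, hi) from absolute residuals on held-out calibration data.

    Split-conformal rank ceil((n + 1) * (1 - alpha)) gives marginal coverage
    >= 1 - alpha for exchangeable data. If that rank exceeds n the exact
    interval is unbounded; this sketch clamps to the largest residual instead.
    """
    res = sorted(abs(y - predict(x)) for x, y in zip(cal_x, cal_y))
    k = min(math.ceil((len(res) + 1) * (1 - alpha)), len(res))
    q = res[k - 1]
    return lambda x: (predict(x) - q, predict(x) + q)


# Synthetic data: fit, calibrate, and test on disjoint thirds.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(900)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
model = fit_line(xs[:300], ys[:300])
interval = residual_interval(model, xs[300:600], ys[300:600], alpha=0.1)
coverage = sum(interval(x)[0] <= y <= interval(x)[1] for x, y in zip(xs[600:], ys[600:])) / 300
print(f"empirical coverage: {coverage:.2f}")
```

Because the interval depends only on `predict`, the same wrapper works around any model, which is the property the abstract emphasizes for RFerr.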

Contributors: Guan, Xin (Author) / Liu, Li (Thesis advisor) / Runger, George C. (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2017

Predictive modeling for extremely scaled CMOS and post silicon devices

Description

To extend the lifetime of complementary metal-oxide-semiconductor (CMOS) technology, emerging process techniques are being proposed to overcome manufacturing difficulties. New structures and materials with electrical properties superior to those of traditional CMOS have been proposed, such as strain technology and the feedback field-effect transistor (FB-FET). To continue design success and make an impact on leading products, advanced circuit design exploration must begin concurrently with early silicon development. An accurate and scalable model is therefore needed that correctly captures these effects and can be extended to alternative process choices.

For example, strain technology has been successfully integrated into CMOS fabrication to improve transistor performance, but the stress is non-uniformly distributed in the channel, leading to systematic performance variations. In this dissertation, a new layout-dependent stress model is proposed as a function of layout, temperature, and other device parameters. Furthermore, a layout decomposition method is developed to partition the layout into a set of simple patterns for model extraction. These solutions significantly reduce the complexity of stress modeling and simulation.

Semiconductor devices with self-feedback mechanisms are also emerging as promising alternatives to CMOS. The Fe-FET was proposed to improve switching by integrating a ferroelectric material as the gate insulator in a MOSFET structure. Under particular circumstances, the ferroelectric capacitance is effectively negative because of the negative slope of its polarization versus electric field curve. This property makes the ferroelectric layer a voltage amplifier that boosts the surface potential, achieving a fast transition. A new threshold voltage model for the Fe-FET is developed, and it is further shown that the impact of random dopant fluctuation (RDF) can be suppressed.
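The voltage-amplification argument can be made concrete with the standard small-signal series-capacitor picture (a textbook simplification, not the dissertation's threshold-voltage model). Treat the ferroelectric layer as a capacitance C_fe in series with the underlying MOS capacitance C_mos, with V_g the gate voltage and psi_s the surface potential; both capacitors carry the same charge Q:

$$
Q = C_{fe}\,(V_g - \psi_s) = C_{mos}\,\psi_s
\quad\Longrightarrow\quad
A_v = \frac{\partial \psi_s}{\partial V_g} = \frac{C_{fe}}{C_{fe} + C_{mos}}
$$

For an ordinary (positive) C_fe, A_v < 1. If C_fe is negative with |C_fe| > C_mos, numerator and denominator are both negative and A_v > 1; for example, C_fe = -3 C_mos gives A_v = (-3)/(-3 + 1) = 1.5. The same condition keeps the series capacitance C_fe C_mos / (C_fe + C_mos) = 1.5 C_mos positive.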
Furthermore, the through-silicon via (TSV), a key technology that enables 3D integration of chips, is studied. A TSV is usually structured as a cylindrical metal-oxide-semiconductor (MOS) capacitor, and a piecewise capacitance model is proposed for 3D interconnect simulation. Because of the mismatch in coefficients of thermal expansion (CTE) among materials, thermal stress arises in the TSV process and affects neighboring devices. This stress impact is investigated to support interaction between silicon process and IC design at an early stage.
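As a textbook-level sketch (not the dissertation's piecewise model), a TSV can be treated as coaxial cylinders: a metal core, a cylindrical oxide liner, and, under depletion, a cylindrical depletion shell in the silicon. Each shell contributes 2*pi*eps*L / ln(r_out / r_in), and in depletion the two shells add in series. Supplying the depletion width directly is a hypothetical simplification; a full model would compute it from bias and substrate doping. The geometry values are illustrative.

```python
import math

EPS0 = 8.854e-12         # vacuum permittivity, F/m
EPS_OX = 3.9 * EPS0      # SiO2 liner
EPS_SI = 11.7 * EPS0     # silicon substrate


def coax_cap(eps, length, r_in, r_out):
    """Capacitance of a coaxial cylindrical shell: 2*pi*eps*L / ln(r_out / r_in)."""
    return 2 * math.pi * eps * length / math.log(r_out / r_in)


def tsv_capacitance(length, r_metal, t_ox, w_dep):
    """Two-region piecewise sketch.

    w_dep <= 0: accumulation, only the oxide liner capacitance is seen.
    w_dep > 0: oxide and cylindrical depletion capacitances in series.
    """
    c_ox = coax_cap(EPS_OX, length, r_metal, r_metal + t_ox)
    if w_dep <= 0:
        return c_ox
    r_ox = r_metal + t_ox
    c_dep = coax_cap(EPS_SI, length, r_ox, r_ox + w_dep)
    return c_ox * c_dep / (c_ox + c_dep)


LENGTH, R_METAL, T_OX = 50e-6, 2.5e-6, 0.1e-6  # hypothetical geometry, metres
c_acc = tsv_capacitance(LENGTH, R_METAL, T_OX, w_dep=0.0)
c_depl = tsv_capacitance(LENGTH, R_METAL, T_OX, w_dep=0.5e-6)
print(f"accumulation: {c_acc * 1e15:.0f} fF, depletion: {c_depl * 1e15:.0f} fF")
```

The series combination is always smaller than the oxide capacitance alone, which is why the capacitance drops as the TSV moves from accumulation into depletion.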

Contributors: Wang, Chi-Chao (Author) / Cao, Yu (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Clark, Lawrence (Committee member) / Schroder, Dieter (Committee member) / Arizona State University (Publisher)

Created: 2011

Preliminary Model of Alzheimer’s Disease Age of Onset From the All of Us Database

Description

The burden of dementia and its primary cause, Alzheimer’s disease, continues to devastate many people. There is no cure, although current research has delivered methods for risk calculation and models of disease development that promote preventive strategies. Alzheimer’s disease currently affects 1 in 9 people aged 65 and older, and in 2023 the total annual healthcare cost of Alzheimer’s disease and other dementias in the United States was $345 billion, making dementia one of the costliest conditions to society (“2023 Alzheimer’s Disease Facts and Figures,” 2023). Risk prediction models can help dramatically lower this cost and reduce the overall burden of dementia, but models are still needed that predict an individual’s time of onset, supplementing risk prediction in the hope of improving preventive care. The aim of this study is to develop a model that predicts the age of onset of all-cause dementia and Alzheimer’s disease using demographic, comorbidity, and genetic data from a cohort sample. The study builds multiple regression models using ordinary least squares (OLS) and least absolute shrinkage and selection operator (LASSO) regression to assess how well the predictor variables estimate age of onset for all-cause dementia and Alzheimer’s disease. The study is unique in using a diverse cohort of 346 participants drawn from the All of Us Research Program database, which seeks to represent an accurate sampling of the United States population. The resulting regression models had no predictive capacity for age of onset, but they outline a simplified approach for integrating public health data into a predictive model. These results suggest a need for continued research on risk factors that estimate time of onset.
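The abstract names OLS and LASSO but not the software used. As a dependency-free sketch, LASSO minimizes (1/(2n))·||y - b0 - Xb||² + lam·||b||₁, which cyclic coordinate descent solves with a soft-thresholding update per coefficient; setting `lam=0` reduces it to OLS when the columns are not collinear. The predictors and the age-of-onset outcome below are synthetic and hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, iters=100):
    """Minimize (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Columns and y are centered so the intercept b0 is not penalized.
    Returns (b0, b).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    resid = [v - ybar for v in y]          # residual of the centered fit with b = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (b_j's term added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
    b0 = ybar - sum(m * bj for m, bj in zip(means, b))
    return b0, b


# Synthetic example: only the first of three hypothetical predictors matters.
rng = random.Random(7)
X = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(200)]
y = [60.0 + 3.0 * row[0] + rng.gauss(0, 0.5) for row in X]
b0, b = lasso_cd(X, y, lam=0.5)
print(round(b0, 2), [round(v, 3) for v in b])
```

A very large `lam` drives every coefficient to zero and leaves only the mean, which is one way a LASSO fit can signal that the predictors carry little information about the outcome.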

Contributors: Goeringer, Cayden (Author) / Holechek, Susan (Thesis director) / Sellner, Erin (Committee member) / Barrett, The Honors College (Contributor) / School of Life Sciences (Contributor) / School of Music, Dance and Theatre (Contributor)

Created: 2023-05

Potentiomics: Observations of In Situ and In Vitro Microbial Metabolic Activity Using Type IV Non-Selective Biofilm Membrane Potentiometric Sensors in Real-Time Applications.

Description

Potentiometric instrumentation technologies are widely used across many disciplines of science and engineering, providing the ability to measure changes in specific environmental variables through various types of sensor electrodes and selective membranes. However, Type I, II, and III potentiometric sensor electrodes are limited by biofouling, membrane maintenance, grounding sensitivity, thermodynamic variables, and electromagnetic interference. Further, algorithms embedded in instrumentation hardware have impeded the usefulness of such measurements outside highly controlled environments. Reliable, accurate measurement with these types of sensor electrodes is limited to industrial and laboratory applications in chemistry and nominally active biological environments. Innovations using exotic materials have improved the usefulness of Type II (e.g., tantalum-rubidium-doped titanium) and Type III (e.g., Nafion™ membranes) sensor electrodes, but those sensors are still limited to measuring a single selective parameter. This work investigates using a novel non-selective membrane, a naturally occurring biofilm, as the active sensing surface of a graphite electrode, forming a new Type IV potentiometric sensor electrode (e.g., the MiProbE™) for biologically active environments. Using classical mathematical analysis methods, the analysis herein demonstrates the decomposition of these non-selective signals into real-time metabolic activity and the measurement of key biochemical processes and environmental parameters. This provides the basis of Potentiomics: the characterization and quantification of biochemical metabolic processes in highly dynamic, non-equilibrium states.

Contributors: Taylor, Evan (Author) / Weiss, Taylor L (Thesis advisor) / Brown, Albert F (Committee member) / Boyer, Treavor H (Committee member) / Arizona State University (Publisher)

Created: 2022

A Player-Based Approach to Predicting March Madness Tournament Outcomes

Description

In the U.S., the annual NCAA college basketball tournament, known as March Madness, draws millions of people trying to predict who will win. There is just one problem: no one has ever created a perfect bracket. Using a player-based rating system that updates throughout the season, a predictive model can be built to identify the teams with the best chance of winning the championship and even to show which players had the most impact on a single team in college basketball.
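The abstract does not describe the rating formula. As a hypothetical sketch, assume an Elo-style system in which a team's rating is the minutes-weighted average of its players' ratings, and each game's rating change is shared among players in proportion to minutes played. The constant `k=32`, the 400-point scale, and the roster values are illustrative assumptions.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of team A against team B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def team_rating(players):
    """Minutes-weighted mean of player ratings; players = [(rating, minutes), ...]."""
    total = sum(m for _, m in players)
    return sum(r * m for r, m in players) / total


def update_players(players, opp_rating, won, k=32.0):
    """Return updated (rating, minutes) pairs after one game.

    The team-level surprise k * (result - expected) is shared out in
    proportion to minutes played, so heavy-minute players move the most.
    """
    total = sum(m for _, m in players)
    surprise = k * ((1.0 if won else 0.0) - expected_score(team_rating(players), opp_rating))
    return [(r + surprise * m / total, m) for r, m in players]


team = [(1500.0, 35), (1480.0, 25), (1520.0, 10)]  # hypothetical roster
opp = 1550.0
print(round(expected_score(team_rating(team), opp), 3))
updated = update_players(team, opp, won=True)
```

Repeating this update game by game gives ratings that evolve through the season, and comparing each player's accumulated change is one simple way to ask which players drove a team's results.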

Contributors: Kearney, Matthew (Author) / Schneider, Laurence (Thesis director) / McIntosh, Daniel (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2023-05

Machine Learning Methods for Prediction of Physical System Behavior

Description

The growing use of computing devices in health care, for both large-scale and personal medical use, has transformed medicine and health care into a data-rich domain. This surge in available data has allowed domain experts to investigate, study, and discover inherent patterns in diseases from new perspectives and, in turn, to advance the field of medicine. Storing and analyzing these data in real time helps improve the response time and efficiency of doctors and health care specialists. However, because most life-threatening diseases are time-critical, there is a growing need to make informed decisions before any fatal outcome occurs. Alongside time sensitivity, analyzing disease-specific data and their effects on an individual basis leads to more efficient prognosis and rapid deployment of cures. The primary challenge in addressing both issues arises from the time-varying, time-sensitive nature of the data and from the difficulty of predicting anomalous events using only observed data.

This dissertation introduces adaptive machine learning algorithms that aid in predicting anomalous situations arising from abnormalities in patients diagnosed with certain types of diseases. Emphasis is placed on adapting and developing algorithms on an individual basis to improve prediction accuracy. The main objectives are to learn the underlying representation of the data using empirical methods and to enhance it using domain knowledge. The learned model is then used as a guide for statistical machine learning methods that predict the occurrence of anomalous events in the near future. The learned model is further enhanced by tuning the algorithm's objective function to incorporate domain knowledge.
Along with anomaly forecasting using multi-modal data, this dissertation also investigates the use of univariate time series data to predict disease onset using Bayesian nonparametrics.
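The dissertation's adaptive algorithms are not specified in the abstract. As a deliberately simple stand-in (not the author's method), the sketch below keeps one detector per patient, maintains an exponentially weighted mean and variance as that individual's adaptive baseline, and flags readings more than `k` standard deviations away. The class name, its parameters, and the heart-rate-like readings are hypothetical.

```python
import math


class AdaptiveAnomalyDetector:
    """Per-individual detector with an exponentially weighted baseline."""

    def __init__(self, alpha: float = 0.1, k: float = 3.0, warmup: int = 10):
        self.alpha = alpha    # adaptation rate of the baseline
        self.k = k            # threshold in baseline standard deviations
        self.warmup = warmup  # readings to observe before flagging anything
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, x: float) -> bool:
        """Score x against the current baseline, then adapt the baseline to x."""
        self.n += 1
        if self.n == 1:
            self.mean = x
            return False
        diff = x - self.mean
        sd = math.sqrt(self.var)
        anomalous = self.n > self.warmup and sd > 0 and abs(diff) > self.k * sd
        # Incremental exponentially weighted mean and variance updates.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


detector = AdaptiveAnomalyDetector()  # one instance per patient
readings = [70 + (i % 5) - 2 for i in range(50)] + [130]  # hypothetical heart rates
flags = [detector.update(r) for r in readings]
print([i for i, f in enumerate(flags) if f])
```

Because the baseline is learned from each patient's own stream, the same threshold adapts to individuals with different normal ranges, which is the motivation the abstract gives for per-individual models.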

Contributors: Das, Subhasish (Author) / Gupta, Sandeep K.S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Indic, Premananda (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2022

Predicting Student Dropout in Self-Paced MOOC Course

Description

One persistent problem in Massive Open Online Courses (MOOCs) is student dropout. Predicting student dropout can identify the factors responsible for it and can prompt intervention beforehand to increase student success. Different approaches and various features are available for predicting student dropout in MOOCs. This research considers data from the self-paced math course ‘College Algebra and Problem Solving’, offered by Arizona State University (ASU) on the Open edX MOOC platform from 2016 to 2020. It aims to predict whether a student drops out of the course, given a set of features engineered from each student's learning activity in a day. The machine learning (ML) model used is a Random Forest (RF), evaluated with accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic (ROC) curve (AUC). The average rate of student learning progress was found to have more impact than other features. The model can predict dropout or continuation of students on any given day of the course with an accuracy of 87.5%, an AUC of 94.5%, a precision of 88%, a recall of 87.5%, and an F1-score of 87.5%. The contributing features and their interactions were explained using Shapley values. The features engineered in this research are predictive of student dropout and could be used to predict dropout in similar courses. The model can also help in making interventions at a critical time to help students succeed in this MOOC.
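The evaluation metrics listed above have compact definitions. As a dependency-free sketch (labels assumed to be 1 = dropout, 0 = continuation; the example values are made up), the functions below compute positive-class accuracy, precision, recall, and F1 from the confusion counts, and AUC through its rank interpretation: the probability that a randomly chosen dropout is scored above a randomly chosen non-dropout. Note that class-support-weighted recall always equals accuracy, so reported figures where the two coincide are consistent with weighted averaging across classes.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus positive-class precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = dropout, scores are a model's predicted dropout probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # → 0.9375
```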

Contributors: Dominic Ravichandran, Sheran Dass (Author) / Gary, Kevin (Thesis advisor) / Bansal, Ajay (Committee member) / Cunningham, James (Committee member) / Sannier, Adrian (Committee member) / Arizona State University (Publisher)

Created: 2021

A Retrospective Investigation to Assess the Potential Application of Predictive Machine Learning Algorithms in Oncology Clinical Trials

Description

The purpose of this investigation is to apply a machine learning algorithm to de-identified, historical oncology clinical trial data to assess the theoretical understanding of predictive modeling and derive potential clinical practice recommendations. In this study, electronic medical records (EMR) from the HonorHealth Virginia G. Piper Institute undergo data visualization to identify potential correlations and trends critical for model creation and to identify potential expansions or limitations of the model's scope. The hypothesis pursued after data visualization was that a predictive model for 6-month survival could be developed. The current standard, physician estimation, is approximately 56.5% accurate at 6 months. This study created supervised learning models using decision trees, k-nearest neighbors (KNN), support vector machines (SVM), and ensemble methods, with combinations of LASSO logistic regression and Know-GRRF random forest used for feature selection. An SVM trained on the combined set of LASSO and Know-GRRF features produced the highest-performing model, with 75.5% accuracy and an AUC of 0.82. This study demonstrates the potential of applying predictive modeling to readily available EMR data to drive clinical practice recommendations. With further development, the models could be used as an ancillary tool for starting patient-physician conversations about survival and life expectancy.
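The pipeline shape described above (two feature selectors, a combined feature set, then a classifier) can be sketched without dependencies. This is a hypothetical illustration: the combined set is assumed to be the union of the two selectors' column indices, the rows and labels are made up, and k-NN is used as the classifier only because it fits in a few lines; the study's best model was an SVM, which this sketch does not implement.

```python
import math
from collections import Counter


def combine_features(selected_a, selected_b):
    """Union of two selectors' column indices (e.g. LASSO and Know-GRRF), ascending."""
    return sorted(set(selected_a) | set(selected_b))


def project(rows, cols):
    """Keep only the chosen columns of each row."""
    return [[row[c] for c in cols] for row in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical EMR-derived rows (4 numeric features) and 6-month survival labels.
rows = [[1.0, 9.0, 1.0, 0.5], [1.2, 3.0, 0.9, 0.4], [0.8, 7.0, 1.1, 0.6],
        [4.0, 2.0, 4.2, 3.9], [4.1, 8.0, 3.8, 4.2], [3.9, 5.0, 4.0, 4.0]]
labels = [1, 1, 1, 0, 0, 0]               # 1 = survived 6 months (hypothetical encoding)
cols = combine_features([0, 2], [2, 3])   # pretend outputs of the two selectors
train = project(rows, cols)
new_patient = project([[1.0, 6.0, 1.0, 0.5]], cols)[0]
print(knn_predict(train, labels, new_patient))
```

Projecting both training rows and new patients through the same `cols` list keeps the classifier's inputs aligned with whatever the selectors chose.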

Contributors: Li, Richard Longfei (Co-author) / Liu, Li (Co-author, Thesis director) / Gosselin, Kevin (Co-author, Committee member) / Harrington Bioengineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05