sports, banking, and other disciplines. We use predictive analytics and modeling to
determine the impact of certain factors that increase the probability of a successful
fourth down conversion in the Power 5 conferences. The logistic regression models
predict the likelihood of going for fourth down with a 64% or more probability based on
2015-17 data obtained from ESPN’s college football API. Offense type, though important,
is not directly measurable, so it was incorporated as a random effect. We found that
distance to go, play type, field position, and week of the season were the leading
covariates for prediction. On average, our model performed as much as 14% better than
coaches in 2018.
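To make the modeling idea concrete, the following is a minimal sketch of the fixed-effect part of a logistic regression for fourth-down conversion. It is not the authors' model: the plays are made up, the two features (yards to go and a pass/run flag) and the function names are illustrative assumptions, and the random effect for offense type is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # gradient of the log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability of a successful conversion for one play."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Made-up illustrative plays (NOT the ESPN data):
# (yards to go, 1 if pass else 0) -> 1 if converted else 0
X = [(1, 0), (1, 1), (2, 0), (2, 1), (3, 1),
     (4, 0), (5, 1), (6, 1), (8, 1), (10, 1)]
y = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

w = fit_logistic(X, y)
p_short = predict_proba(w, (1, 1))    # 4th-and-1, pass
p_long = predict_proba(w, (10, 1))    # 4th-and-10, pass
print(f"P(convert | 4th-and-1)  = {p_short:.2f}")
print(f"P(convert | 4th-and-10) = {p_long:.2f}")
```

In this toy data the fitted coefficient on yards to go is negative, so the predicted conversion probability falls as distance grows. A mixed-effects fit would add a group-level intercept per offense type on top of these fixed effects.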
First, a biodosimetry method was developed using random forest (RF) to determine the absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to handle the day interaction effect, and a nested modeling technique was proposed. The nested models can fit these complex data, which show large variability and non-linear relationships.
Secondly, a panel of biomarkers was selected using a data-driven feature selection method together with hand-picking that considered prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method incorporates domain knowledge as a penalty term that regulates the selection of candidate features in RF. It adds flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide the selection of biomarkers. On simulated datasets, the method is also competitive with existing methods when intrinsic data characteristics are used as an alternative to domain knowledge.
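The penalty mechanism can be illustrated with a small sketch of the gain regularization used in regularized random forests: a feature that has not yet been selected has its split gain scaled by a coefficient in (0, 1], while already-selected features compete at full gain. How Know-GRRF derives those coefficients from domain knowledge is not described here, so the sketch assumes hypothetical knowledge scores used directly as coefficients; the gene names and gain values are invented.

```python
def regularized_gain(gain, feature, selected, penalty):
    """Gain used to rank a candidate split feature.

    Already-selected features keep their raw gain; a new feature's gain is
    scaled by its penalty coefficient in (0, 1].
    """
    if feature in selected:
        return gain
    return penalty[feature] * gain

def select_features(node_gains, penalty):
    """Walk through tree nodes in order and collect the features chosen."""
    selected = []
    for gains in node_gains:  # one dict of raw split gains per node
        best = max(
            gains,
            key=lambda f: regularized_gain(gains[f], f, selected, penalty),
        )
        if best not in selected:
            selected.append(best)
    return selected

# Hypothetical raw gains at three successive nodes (illustrative values only).
node_gains = [
    {"geneA": 0.50, "geneB": 0.55, "geneC": 0.20},
    {"geneA": 0.30, "geneB": 0.32, "geneC": 0.31},
    {"geneA": 0.10, "geneB": 0.15, "geneC": 0.40},
]

# Assumed knowledge scores: geneA is strongly supported by prior knowledge.
knowledge_penalty = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.6}
no_penalty = {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0}

guided = select_features(node_gains, knowledge_penalty)
unguided = select_features(node_gains, no_penalty)
print("guided:  ", guided)
print("unguided:", unguided)
```

With these numbers the knowledge-guided run prefers geneA over the marginally stronger geneB, which shows how a prior can tip the choice between features of similar raw gain.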
Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. The method is applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset as well as on benchmark and simulated datasets.
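The details of RFerr are not given in this abstract, so the sketch below is not that method. It shows one generic non-parametric way to turn a point prediction into an interval, assuming a set of held-out (for example out-of-bag) errors is available: shift the prediction by empirical quantiles of those errors. The function names and the error values are illustrative.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a list, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def prediction_interval(point_pred, held_out_errors, alpha=0.1):
    """Return a (lower, upper) interval with nominal coverage 1 - alpha.

    held_out_errors are (observed - predicted) values from data the model
    did not train on, e.g. out-of-bag samples in a random forest.
    """
    lower = point_pred + empirical_quantile(held_out_errors, alpha / 2)
    upper = point_pred + empirical_quantile(held_out_errors, 1 - alpha / 2)
    return lower, upper

# Made-up held-out errors for a dose model (illustrative values only).
errors = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 0.8, 1.2]

low, high = prediction_interval(3.0, errors, alpha=0.2)
print(f"80% interval around a prediction of 3.0: ({low:.2f}, {high:.2f})")
```

Because the interval depends only on held-out errors and a point prediction, the same recipe can wrap any regression model, not only a random forest.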
The burden of dementia and its primary cause, Alzheimer’s disease, continues to devastate many, and no cure is available, although present research has delivered methods for risk calculation and models of disease development that promote preventative strategies. Alzheimer’s disease currently affects 1 in 9 people aged 65 and older, and in 2023 the total annual healthcare cost of Alzheimer’s disease and other dementias in the United States was $345 billion, making dementia one of the costliest conditions to society (“2023 Alzheimer’s Disease Facts and Figures,” 2023). Risk prediction models can help lower this substantial cost dramatically and reduce the overall burden of dementia, but models are still needed that supplement risk prediction with an individual’s predicted time of onset, in the hope of improving preventative care. The aim of this study is to develop a model that predicts the age of onset of all-cause dementia and Alzheimer’s disease using demographic, comorbidity, and genetic data from a cohort sample. The study fits multiple regression models using ordinary least squares (OLS) and least absolute shrinkage and selection operator (LASSO) regression to assess how well the predictor variables estimate age of onset for all-cause dementia and Alzheimer’s disease. It is unique in building its predictive model from a diverse cohort of 346 participants drawn from the All of Us Research Program database, which seeks to represent an accurate sampling of the United States population. The regression models generated had no predictive capacity for the age of onset, but they outline a simplified approach for integrating public health data into a predictive model. The results suggest a need for continued research linking risk factors to estimates of time of onset.
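The difference between the two regression methods can be seen in a minimal sketch, assuming two centered, made-up predictors and invented onset ages (not All of Us data). LASSO is fitted by coordinate descent on the objective (1/(2n))·||y − b0 − Xb||² + λ·||b||₁; setting λ = 0 recovers the OLS fit. The function names are illustrative.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - b0 - Xb||^2 + lam*||b||_1 (lam=0 gives OLS)."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # The intercept is not penalized: set it to the mean residual.
        b0 = sum(yi - sum(bj * xj for bj, xj in zip(b, xi))
                 for xi, yi in zip(X, y)) / n
        for j in range(p):
            # Correlation of feature j with the residual that excludes it.
            rho = sum(
                xi[j] * (yi - b0 - sum(b[k] * xi[k] for k in range(p) if k != j))
                for xi, yi in zip(X, y)
            ) / n
            z = sum(xi[j] ** 2 for xi in X) / n
            b[j] = soft_threshold(rho, lam) / z
    return b0, b

# Made-up data: x1 carries signal, x2 is nearly irrelevant (both centered).
X = [(-2, 1), (-1, -1), (0, 1), (1, -1), (2, 0)]
y = [64.2, 66.9, 70.1, 72.8, 76.0]  # invented ages of onset

ols_b0, ols_b = lasso_coordinate_descent(X, y, lam=0.0)
lasso_b0, lasso_b = lasso_coordinate_descent(X, y, lam=0.5)
print("OLS coefficients:  ", [round(v, 4) for v in ols_b])
print("LASSO coefficients:", [round(v, 4) for v in lasso_b])
```

OLS keeps a small nonzero weight on the weak predictor, while the LASSO penalty sets it exactly to zero and shrinks the remaining coefficient. That selection behavior is the usual reason for pairing the two methods.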
In the U.S., the annual NCAA college basketball tournament, known as March Madness, draws in millions of people trying to predict who will win. Just one problem: no one has ever created a perfect bracket. By using a player-based rating system that updates throughout the season, a “predictive model” can be created to identify the teams with the best shot at winning the championship, and even to show which players have the most impact on a given team in college basketball.