Matching Items (17)
Description
Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, new modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques combined with methods from industrial Design of Experiments will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
Contributors: Seto, Christian (Author) / Pavlic, Theodore (Thesis advisor) / Li, Jing (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created: 2018
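The abstract above contrasts stepwise variable selection with LASSO regularization. As a minimal sketch of why LASSO can drop predictors outright, the block below applies the soft-thresholding rule, which gives the LASSO solution in the special case of orthonormal predictors. The function names, the coefficient values, and the penalty `lam` are illustrative assumptions and are not taken from the thesis.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """LASSO coefficients under the ASSUMPTION of orthonormal predictors.

    In that special case, minimizing 0.5 * ||y - Xb||^2 + lam * ||b||_1
    reduces to soft-thresholding each ordinary least-squares coefficient.
    """
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical least-squares coefficients for three simulation parameters.
shrunk = lasso_orthonormal([3.0, -0.5, -2.0], lam=1.0)
print(shrunk)
```

Coefficients smaller in magnitude than the penalty are set exactly to zero, which is how the method performs variable selection without searching over subsets of predictors.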
Description
Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, and its corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes and the correlation due to time-dependency in longitudinal studies. The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children.
Contributors: Irimata, Kyle (Author) / Wilson, Jeffrey R (Thesis advisor) / Broatch, Jennifer (Committee member) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Predictive analytics have been used in a wide variety of settings, including healthcare,
sports, banking, and other disciplines. We use predictive analytics and modeling to
determine the impact of certain factors that increase the probability of a successful
fourth down conversion in the Power 5 conferences. The logistic regression models
predict the likelihood of going for fourth down with a 64% or more probability based on
2015-17 data obtained from ESPN’s college football API. Offense type, though important,
is not directly measurable, so it was incorporated as a random effect. We found that
distance to go, play type, field position, and week of the season were the leading
covariates for prediction. On average, our model performed as much as 14% better than
coaches in 2018.
Contributors: Blinkoff, Joshua Ian (Co-author) / Voeller, Michael (Co-author) / Wilson, Jeffrey (Thesis director) / Graham, Scottie (Committee member) / Dean, W.P. Carey School of Business (Contributor) / Department of Information Systems (Contributor) / Department of Management and Entrepreneurship (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Predictive analytics have been used in a wide variety of settings, including healthcare, sports, banking, and other disciplines. We use predictive analytics and modeling to determine the impact of certain factors that increase the probability of a successful fourth down conversion in the Power 5 conferences. The logistic regression models predict the likelihood of going for fourth down with a 64% or more probability based on 2015-17 data obtained from ESPN’s college football API. Offense type, though important, is not directly measurable, so it was incorporated as a random effect. We found that distance to go, play type, field position, and week of the season were the leading covariates for prediction. On average, our model performed as much as 14% better than coaches in 2018.
Contributors: Voeller, Michael Jeffrey (Co-author) / Blinkoff, Josh (Co-author) / Wilson, Jeffrey (Thesis director) / Graham, Scottie (Committee member) / Department of Information Systems (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
This paper explores the ability to predict yields of soybeans based on genetics and environmental factors. Based on the biology of soybeans, it has been shown that yields are best when soybeans grow within a certain temperature range. An event in which a soybean is exposed to temperatures outside its accepted range is labeled as an instance of stress. Currently, there are few models that use genetic information to predict how crops may respond to stress. Using data provided by an agricultural business, a model was developed that can categorically label soybean varieties by their yield response to stress using genetic data. The model clusters varieties based on their yield production in response to stress. The clustering criteria are based on variance distribution and correlation. A logistic regression is then fitted to identify significant gene markers in varieties with minimal yield variance. Such characteristics provide a probabilistic outlook of how certain varieties will perform when planted in different regions. Given changing global climate conditions, this model demonstrates the potential of using data to efficiently develop and grow crops adjusted to climate changes.
Contributors: Dean, Arlen (Co-author) / Ozcan, Ozkan (Co-author) / Travis, Daniel (Co-author) / Gel, Esma (Thesis director) / Armbruster, Dieter (Committee member) / Parry, Sam (Committee member) / Industrial, Systems and Operations Engineering Program (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Unmanned aerial vehicles have received increased attention in the last decade due to their versatility, as well as the availability of inexpensive sensors (e.g. GPS, IMU) for their navigation and control. Multirotor vehicles, specifically quadrotors, have formed a fast-growing field in robotics, with the range of applications spanning from surveillance and reconnaissance to agriculture and large area mapping. Although in most applications single quadrotors are used, there is an increasing interest in architectures controlling multiple quadrotors executing a collaborative task. This thesis introduces a new concept of control involving more than one quadrotor, according to which two quadrotors can be physically coupled in mid-flight. This concept equips the quadrotors with new capabilities, e.g. increased payload or pursuit and capturing of other quadrotors. A comprehensive simulation of the approach is built to simulate coupled quadrotors. The dynamics and modeling of the coupled system are presented together with a discussion regarding the coupling mechanism, impact modeling and additional considerations that have been investigated. Simulation results are presented for cases of static coupling as well as enemy quadrotor pursuit and capture, together with an analysis of control methodology and gain tuning. Practical implementations are introduced as results show the feasibility of this design.
Contributors: Larsson, Daniel (Author) / Artemiadis, Panagiotis (Thesis advisor) / Marvi, Hamidreza (Committee member) / Berman, Spring (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Banking institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach, in which individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability of customer information and interests. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine predictability of several client demographic and financial attributes and to find the most contributing factors in each. Data were preprocessed to ensure quality, and then data mining models were generated for the attributes with logistic regression, support vector machine (SVM) and random forest using Orange as the data mining tool. Results were analyzed using precision, recall and F1 score.
Contributors: Ejaz, Samira (Author) / Davulcu, Hasan (Thesis advisor) / Balasooriya, Janaka (Committee member) / Candan, Kasim (Committee member) / Arizona State University (Publisher)
Created: 2016
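The bank telemarketing abstract above evaluates its models with precision, recall and F1 score. A minimal sketch of how these three metrics follow from confusion-matrix counts is given below. The function name and the example counts are hypothetical and do not come from the thesis, which used Orange rather than hand-written code.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct positive predictions, 2 false alarms, 8 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter summary than accuracy when the positive class is rare.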
Description

We attempted to apply a novel approach to stock market predictions. The Logistic Regression machine learning algorithm (Joseph Berkson) was applied to analyze news article headlines as represented by a bag-of-words (tri-gram and single-gram) representation in an attempt to predict the trends of stock prices based on the Dow Jones Industrial Average. The results showed that a tri-gram bag led to a 49% trend accuracy, an increase of one percentage point over the single-gram representation’s accuracy of 48%.

Contributors: Barolli, Adeiron (Author) / Jimenez Arista, Laura (Thesis director) / Wilson, Jeffrey (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
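The stock-prediction abstract above represents headlines as single-gram and tri-gram bags of words. The sketch below shows one simple way to build such a bag, assuming lowercase whitespace tokenization. The function name and the sample headline are hypothetical, and the thesis's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def ngram_bag(headline, n):
    """Count the n-grams (runs of n consecutive tokens) in a headline.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption made for illustration only.
    """
    tokens = headline.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


headline = "Stocks rally as rates fall"  # made-up example headline
unigrams = ngram_bag(headline, 1)
trigrams = ngram_bag(headline, 3)
print(sorted(unigrams))
print(sorted(trigrams))
```

Each distinct n-gram becomes one feature whose count is fed to the classifier. Tri-grams preserve some word order but yield far sparser features than single words.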
Description

Suicide is a significant public health problem, with incidence rates and lethality continuing to increase yearly. Given the large human and financial cost of suicide worldwide alongside the lack of progress in suicide prediction, more research is needed to inform suicide prevention and intervention efforts. This study approaches suicide through the lens of suicide note-leaving behavior, which can provide important information on predictors of suicide. Specifically, this study adds to the existing literature on note-leaving by examining history of suicidality, mental health problems, and their interaction in predicting suicide note-leaving, in addition to demographic predictors of note-leaving examined in previous research using data from the National Violent Death Reporting System (NVDRS, n = 98,515). We fit a logistic regression model predicting whether or not a suicide note was left, the results of which indicated that those with mental health problems or a history of suicidality were more likely to leave a suicide note than those without such histories, and those with both mental health problems and a history of suicidality were most likely to leave a suicide note. These findings reinforce the need to tailor suicide prevention efforts toward identifying and targeting higher risk populations.

Contributors: Carnesi, Gregory (Author) / O'Rourke, Holly (Thesis director) / Brewer, Gene (Committee member) / Corbin, William (Committee member) / Chassin, Laurie (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / Watts College of Public Service & Community Solut (Contributor) / Historical, Philosophical & Religious Studies, Sch (Contributor)
Created: 2022-05
162989-Thumbnail Image.png
Contributors: Carnesi, Gregory (Author) / O'Rourke, Holly (Thesis director) / Brewer, Gene (Committee member) / Corbin, William (Committee member) / Chassin, Laurie (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor)
Created: 2022-05