Matching Items (7)
Filtering by

Clear all filters

156200-Thumbnail Image.png
Description
Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, new modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques combined with methods from industrial Design of Experiments will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
ContributorsSeto, Christian (Author) / Pavlic, Theodore (Thesis advisor) / Li, Jing (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created2018
156148-Thumbnail Image.png
Description
Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association

Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA based measure of intraclass correlation for three level hierarchical data with binary outcomes, and corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes, as well as the correlation due to time-dependency in longitudinal studies. The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children.
ContributorsIrimata, Kyle (Author) / Wilson, Jeffrey R (Thesis advisor) / Broatch, Jennifer (Committee member) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2018
154699-Thumbnail Image.png
Description
Unmanned aerial vehicles have received increased attention in the last decade due to their versatility, as well as the availability of inexpensive sensors (e.g. GPS, IMU) for their navigation and control. Multirotor vehicles, specifically quadrotors, have formed a fast growing field in robotics, with the range of applications spanning from

Unmanned aerial vehicles have received increased attention in the last decade due to their versatility, as well as the availability of inexpensive sensors (e.g. GPS, IMU) for their navigation and control. Multirotor vehicles, specifically quadrotors, have formed a fast growing field in robotics, with the range of applications spanning from surveil- lance and reconnaissance to agriculture and large area mapping. Although in most applications single quadrotors are used, there is an increasing interest in architectures controlling multiple quadrotors executing a collaborative task. This thesis introduces a new concept of control involving more than one quadrotors, according to which two quadrotors can be physically coupled in mid-flight. This concept equips the quadro- tors with new capabilities, e.g. increased payload or pursuit and capturing of other quadrotors. A comprehensive simulation of the approach is built to simulate coupled quadrotors. The dynamics and modeling of the coupled system is presented together with a discussion regarding the coupling mechanism, impact modeling and additional considerations that have been investigated. Simulation results are presented for cases of static coupling as well as enemy quadrotor pursuit and capture, together with an analysis of control methodology and gain tuning. Practical implementations are introduced as results show the feasibility of this design.
ContributorsLarsson, Daniel (Author) / Artemiadis, Panagiotis (Thesis advisor) / Marvi, Hamidreza (Committee member) / Berman, Spring (Committee member) / Arizona State University (Publisher)
Created2016
154589-Thumbnail Image.png
Description
Bank institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach taken where individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability

Bank institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach taken where individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability of customer information and interests. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine predictability of several client demographic and financial attributes and find most contributing factors in each. Data were preprocessed to ensure quality, and then data mining models were generated for the attributes with logistic regression, support vector machine (SVM) and random forest using Orange as the data mining tool. Results were analyzed using precision, recall and F1 score.
ContributorsEjaz, Samira (Author) / Davulcu, Hasan (Thesis advisor) / Balasooriya, Janaka (Committee member) / Candan, Kasim (Committee member) / Arizona State University (Publisher)
Created2016
154026-Thumbnail Image.png
Description
There has been a vast increase in applications of Unmanned Aerial Vehicles (UAVs) in civilian domains. To operate in the civilian airspace, a UAV must be able to sense and avoid both static and moving obstacles for flight safety. While indoor and low-altitude environments are mainly occupied by static obstacles,

There has been a vast increase in applications of Unmanned Aerial Vehicles (UAVs) in civilian domains. To operate in the civilian airspace, a UAV must be able to sense and avoid both static and moving obstacles for flight safety. While indoor and low-altitude environments are mainly occupied by static obstacles, risks in space of higher altitude primarily come from moving obstacles such as other aircraft or flying vehicles in the airspace. Therefore, the ability to avoid moving obstacles becomes a necessity

for Unmanned Aerial Vehicles.

Towards enabling a UAV to autonomously sense and avoid moving obstacles, this thesis makes the following contributions. Initially, an image-based reactive motion planner is developed for a quadrotor to avoid a fast approaching obstacle. Furthermore, A Dubin’s curve based geometry method is developed as a global path planner for a fixed-wing UAV to avoid collisions with aircraft. The image-based method is unable to produce an optimal path and the geometry method uses a simplified UAV model. To compensate

these two disadvantages, a series of algorithms built upon the Closed-Loop Rapid Exploratory Random Tree are developed as global path planners to generate collision avoidance paths in real time. The algorithms are validated in Software-In-the-Loop (SITL) and Hardware-In-the-Loop (HIL) simulations using a fixed-wing UAV model and in real flight experiments using quadrotors. It is observed that the algorithm enables a UAV to avoid moving obstacles approaching to it with different directions and speeds.
ContributorsLin, Yucong (Author) / Saripalli, Srikanth (Thesis advisor) / Scowen, Paul (Committee member) / Fainekos, Georgios (Committee member) / Thangavelautham, Jekanthan (Committee member) / Youngbull, Cody (Committee member) / Arizona State University (Publisher)
Created2015
157561-Thumbnail Image.png
Description
Optimal design theory provides a general framework for the construction of experimental designs for categorical responses. For a binary response, where the possible result is one of two outcomes, the logistic regression model is widely used to relate a set of experimental factors with the probability of a positive

Optimal design theory provides a general framework for the construction of experimental designs for categorical responses. For a binary response, where the possible result is one of two outcomes, the logistic regression model is widely used to relate a set of experimental factors with the probability of a positive (or negative) outcome. This research investigates and proposes alternative designs to alleviate the problem of separation in small-sample D-optimal designs for the logistic regression model. Separation causes the non-existence of maximum likelihood parameter estimates and presents a serious problem for model fitting purposes.

First, it is shown that exact, multi-factor D-optimal designs for the logistic regression model can be susceptible to separation. Several logistic regression models are specified, and exact D-optimal designs of fixed sizes are constructed for each model. Sets of simulated response data are generated to estimate the probability of separation in each design. This study proves through simulation that small-sample D-optimal designs are prone to separation and that separation risk is dependent on the specified model. Additionally, it is demonstrated that exact designs of equal size constructed for the same models may have significantly different chances of encountering separation.

The second portion of this research establishes an effective strategy for augmentation, where additional design runs are judiciously added to eliminate separation that has occurred in an initial design. A simulation study is used to demonstrate that augmenting runs in regions of maximum prediction variance (MPV), where the predicted probability of either response category is 50%, most reliably eliminates separation. However, it is also shown that MPV augmentation tends to yield augmented designs with lower D-efficiencies.

The final portion of this research proposes a novel compound optimality criterion, DMP, that is used to construct locally optimal and robust compromise designs. A two-phase coordinate exchange algorithm is implemented to construct exact locally DMP-optimal designs. To address design dependence issues, a maximin strategy is proposed for designating a robust DMP-optimal design. A case study demonstrates that the maximin DMP-optimal design maintains comparable D-efficiencies to a corresponding Bayesian D-optimal design while offering significantly improved separation performance.
ContributorsPark, Anson Robert (Author) / Montgomery, Douglas C. (Thesis advisor) / Mancenido, Michelle V (Thesis advisor) / Escobedo, Adolfo R. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2019
190842-Thumbnail Image.png
Description
College and university enrollment has decreased nationwide every year for more than a decade as educational consumers increasingly question the value of higher education and discover alternatives to the traditional university system. Enrollment professionals seeking growth are tasked to develop and implement innovative solutions to address increasing enrollment challenges by

College and university enrollment has decreased nationwide every year for more than a decade as educational consumers increasingly question the value of higher education and discover alternatives to the traditional university system. Enrollment professionals seeking growth are tasked to develop and implement innovative solutions to address increasing enrollment challenges by being responsive to consumer values, interests, and needs. This multi-phase mixed methods action research study explores educational data mining and machine learning to understand and predict the enrollment decisions of admitted applicants (n=3,843) at the online campus of a public research university (phase one). Then, this innovation is distributed to understand how university enrollment professionals (n=7) interpret and are affected by the factors that influence online student enrollment decisions (phase two). Logistic regression was used to evaluate 24 independent variables to classify each applicant into a dichotomous dependent outcome: will an applicant enroll or will they not. The model identified 10 significant predictors and accurately categorized 81% of the enrollment outcomes at its peak. The population was comprised of online adult learners and the findings were carefully compared to the findings of previous studies which differed in institutional settings (on campus) and student populations (first-year students). Additionally, the study aimed to extend the work of previous literature through a second application phase within the local context. The second phase was guided by distributed leadership theory and the four-stage theory of organizational change and introduced the model to enrollment professionals within the local context through participation in a workshop coupled with a pre-/post-workshop survey. This convergent parallel mixed methods design resulted in themes that demonstrated enrollment managers had a genuine desire to understand and apply the model to assist in solving complex enrollment challenges and were interested in using the model to inform their perspectives, decision-making, and strategy development. This study concludes that educational data mining and machine learning can be used to predict the enrollment decisions of online adult students and that enrollment managers can use the data to inform the many enrollment challenges they are tasked to overcome.
ContributorsSinger, Cody Gene (Author) / Ross, Lydia (Thesis advisor) / Dorn, Sherman (Thesis advisor) / Cillay, David (Committee member) / Arizona State University (Publisher)
Created2023