Data Analysis of Effects of Officer Briefing Synergy in Combat Flight Simulation Game Dreadnought (2017)

Description

Dreadnought is a free-to-play multiplayer flight simulation in which two teams of 8 players each compete against one another to complete an objective. Each player controls a large-scale spaceship, various aspects of which can be customized to improve a player’s performance in a game. One such aspect is Officer Briefings, which are passive abilities that grant ships additional capabilities. Two of these Briefings, known as Retaliator and Get My Good Side, have strong synergy when used together, which has led to the Dreadnought community’s claiming that the Briefings are too powerful and should be rebalanced to be more in line with the power levels of other Briefings. This study collected gameplay data with and without the use of these specific Officer Briefings to determine the precise impact on gameplay. Linear correlation matrices and inference on two means were used to determine performance impact. It was found that, although these Officer Briefings do improve an individual player’s performance in a game, they do not have a consistent impact on the player’s team performance, and that these Officer Briefings are therefore not in need of rebalancing.
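As a rough illustration of what "inference on two means" can look like, the sketch below computes a Welch two-sample t statistic and its approximate degrees of freedom. The abstract does not say which two-sample procedure the study used, so the choice of Welch's test is an assumption, and the scores are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical per-match scores, invented for illustration (not data from the study).
with_briefings = [12.0, 15.0, 11.0, 14.0, 13.0]
without_briefings = [10.0, 9.0, 12.0, 8.0, 11.0]
t_stat, df = welch_t(with_briefings, without_briefings)
```

The resulting statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value.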

Contributors: Jacobs, Max I. (Author) / Schneider, Laurence (Thesis director) / Tran, Samantha (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Chi-square orthogonal components for assessing goodness-of-fit of multidimensional multinomial data

Description

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack-of-fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects presented when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated with studies for detecting two- and three-way associations that are not accounted for by a categorical variable factor model with a single latent variable. 
In addition, the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Michigan State University Social Research (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set.
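For readers unfamiliar with the omnibus statistic discussed above, here is a minimal sketch of Pearson's chi-squared statistic for a two-way table under independence, returned together with its per-cell contributions. These cell contributions are not the orthogonal components developed in the dissertation; they only show how the overall statistic is a sum of pieces. The function name and the counts are hypothetical.

```python
def pearson_chi_squared(table):
    """Return Pearson's X^2 for a two-way table and the per-cell contributions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            contrib_row.append((observed - expected) ** 2 / expected)
        contributions.append(contrib_row)
    statistic = sum(sum(r) for r in contributions)
    return statistic, contributions


# Hypothetical 2x2 table of counts, invented for illustration.
counts = [[10, 20], [20, 10]]
statistic, contributions = pearson_chi_squared(counts)
```

For an r-by-c table, the statistic is referred to a chi-squared distribution with (r - 1)(c - 1) degrees of freedom.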

Contributors: Milovanovic, Jelena (Author) / Young, Dennis (Thesis advisor) / Reiser, Mark R. (Thesis advisor) / Wilson, Jeffrey (Committee member) / Eubank, Randall (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)

Created: 2011

A continuous latent factor model for non-ignorable missing data in longitudinal studies

Description

Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible in many cases. For example, suppose an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias in results if the missing mechanism is not taken into account. In other words, the missing mechanism is related to the unobserved responses. Data are said to be non-ignorable missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used for longitudinal data analysis because they do not require explicit specification of the missing mechanism, with the data stratified according to a variety of missing patterns and a model specified for each stratum. However, this usually results in under-identifiability, because of the need to estimate many stratum-specific parameters even though the eventual interest is usually in the marginal parameters. Pattern-mixture models also have the drawback that a large sample is usually required. In this thesis, two studies are presented. The first study is motivated by an open problem from pattern-mixture models. Simulation studies from this part show that information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted for by a simple latent factor.
Simulation findings obtained in the first study lead to a novel model, a continuous latent factor model (CLFM). The second study develops the CLFM, which is used to model the joint distribution of missing values and longitudinal outcomes. The proposed CLFM is feasible even for small sample size applications. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance and evaluation are studied through designed simulations and three applications. Simulation and application settings range from a correctly specified missing data mechanism to a mis-specified mechanism and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing data mechanism and are used for a sensitivity analysis; the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study, however, has complete data and is used to conduct a robustness analysis. The CLFM is shown to provide more precise estimators, specifically of intercept- and slope-related parameters, compared with Roy's latent class model and the classic linear mixed model. This advantage is more pronounced for small sample sizes, where Roy's model has difficulty with estimation convergence. The proposed CLFM is also robust when missing data are ignorable, as demonstrated through a study on Growth of Language and Early Literacy Skills in Preschoolers.

Contributors: Zhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)

Created: 2013

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++ and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.

Contributors: Ismay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2013

Multivariate generalization of reduced major axis regression

Description

A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables for fitting a plane to data is obtained by minimizing the sum of a function of the volumes obtained by drawing, from each data point, lines parallel to each coordinate axis to the fitted plane (Draper and Yang 1997; Goodman and Tofallis 2003). Generalized RMA results for the multivariate case obtained by Draper and Yang (1997) are reviewed and some investigations of multivariate RMA are given. A linear model is proposed that does not specify a dependent variable and allows for errors in the measurement of each variable. Coefficients in the model are estimated by minimization of the function of the volumes previously mentioned. Methods for obtaining coefficient estimates are discussed and simulations are used to investigate the distribution of coefficient estimates. The effects of sample size, sampling error and correlation among variables on the estimates are studied. Bootstrap methods are used to obtain confidence intervals for model coefficients. Residual analysis is considered for assessing model assumptions. Outlier and influential case diagnostics are developed and a forward selection method is proposed for subset selection of model variables. A real data example is provided that uses the methods developed. Topics for further research are discussed.
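For the bivariate case only, the RMA line has a well-known closed form: the slope is the ratio of the standard deviations of the two variables, carrying the sign of their covariance, and the line passes through the point of means. The sketch below implements that formula; it does not attempt the multivariate generalization studied in the thesis, and the function name and data are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean


def rma_fit(x, y):
    """Return (slope, intercept) of the bivariate reduced major axis line."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Slope magnitude is sd(y) / sd(x); its sign follows the covariance.
    # If the covariance is exactly zero, copysign returns a positive slope.
    slope = copysign(sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
slope, intercept = rma_fit(x, y)
```

On these data the OLS slope of y on x would be 0.8 (covariance sum 4 divided by 5), whereas the RMA slope is 1, which illustrates that RMA does not shrink the slope toward zero the way regressing one variable on the other does.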

Contributors: Li, Jingjin (Author) / Young, Dennis (Thesis advisor) / Eubank, Randall (Thesis advisor) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)

Created: 2012

Some topics concerning the singular value decomposition and generalized singular value decomposition

Description

This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data and (iii) formulation of a partial canonical correlation concept for continuous time stochastic processes. The finite dimensional SVD has an infinite dimensional generalization to compact operators. However, the form of the finite dimensional GSVD developed in, e.g., Van Loan does not extend directly to infinite dimensions as a result of a key step in the proof that is specific to the matrix case. Thus, the first problem of interest is to find an infinite dimensional version of the GSVD. One such GSVE for compact operators on separable Hilbert spaces is developed. The second problem concerns regularization parameter estimation. The chi-squared method for nonnormal data is considered. A form of the optimized regularization criterion that pertains to measured data or signals with nonnormal noise is derived. Large sample theory for phi-mixing processes is used to derive a central limit theorem for the chi-squared criterion that holds under certain conditions. Departures from normality are seen to manifest in the need for a possibly different scale factor in normalization rather than what would be used under the assumption of normality. The consequences of our large sample work are illustrated by empirical experiments. For the third problem, a new approach is examined for studying the relationships between a collection of functional random variables. The idea is based on the work of Sunder that provides mappings to connect the elements of algebraic and orthogonal direct sums of subspaces in a Hilbert space. 
When combined with a key isometry associated with a particular Hilbert space indexed stochastic process, this leads to a useful formulation for situations that involve the study of several second order processes. In particular, using our approach with two processes provides an independent derivation of the functional canonical correlation analysis (CCA) results of Eubank and Hsing. For more than two processes, a rigorous derivation of the functional partial canonical correlation analysis (PCCA) concept that applies to both finite and infinite dimensional settings is obtained.

Contributors: Huang, Qing (Author) / Eubank, Randall (Thesis advisor) / Renaut, Rosemary (Thesis advisor) / Cochran, Douglas (Committee member) / Gelb, Anne (Committee member) / Young, Dennis (Committee member) / Arizona State University (Publisher)

Created: 2012

Player Optimization in the National Football League: Creating a Winning Franchise

Description

The NFL is one of the largest and most influential industries in the world. In America, few companies have a stronger hold on the culture or create such a phenomenon from year to year. This project aimed to develop a strategy that helps an NFL team be as successful as possible by identifying which positions are most important to a team's success. Data from fifteen years of NFL games were collected, and information on every player in the league was analyzed. First, a benchmark describing an average team was needed, against which every player in the NFL could be compared. Using the properties of ordinary least squares linear regression, the project defines a model that shows each position's importance. Finally, once such a model had been established, the focus turned to the NFL draft, where the goal was to find a strategy for where each position should be drafted so that it is most likely to give the best payoff, based on the results of the regression in part one.

Contributors: Balzer, Kevin Ryan (Author) / Goegan, Brian (Thesis director) / Dassanayake, Maduranga (Committee member) / Barrett, The Honors College (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Balancing the Present and Future: Making Valuable Predictions for Continuous Improvement in Management

Description

In the words of W. Edwards Deming, "the central problem in management and in leadership is failure to understand the information in variation." While many quality management programs propose the institution of technical training in advanced statistical methods, this paper proposes that by understanding the fundamental information behind statistical theory, and by minimizing bias and variance while fully utilizing the available information about the system at hand, one can make valuable, accurate predictions about the future. Combining this knowledge with the work of quality gurus W. E. Deming, Eliyahu Goldratt, and Dean Kashiwagi, the paper develops a framework for making valuable predictions for continuous improvement. After this information is synthesized, it is concluded that the best way to make accurate, informative predictions about the future is to "balance the present and future," seeing the future through the lens of the present and thus minimizing bias, variance, and risk.

Contributors: Synodis, Nicholas Dahn (Author) / Kashiwagi, Dean (Thesis director, Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

It Takes Five: Basketball Teams Using Network Metrics

Description

Analytic research on basketball games is growing quickly, specifically in the National Basketball Association. This paper explored the development of this analytic research and discovered that there has been a focus on individual player metrics and a dearth of quantitative team characterizations and evaluations. Consequently, this paper continued the exploratory research of Fewell and Armbruster's "Basketball teams as strategic networks" (2012), which modeled basketball teams as networks and used metrics to characterize team strategy in the NBA's 2010 playoffs. Individual players and outcomes were nodes and passes and actions were the links. This paper used data that was recorded from playoff games of the two 2012 NBA finalists: the Miami Heat and the Oklahoma City Thunder. The same metrics that Fewell and Armbruster used were explained, then calculated using this data. The offensive networks of these two teams during the playoffs were analyzed and interpreted by using other data and qualitative characterization of the teams' strategies; the paper found that the calculated metrics largely matched with our qualitative characterizations of the teams. The validity of the metrics in this paper and Fewell and Armbruster's paper was then discussed, and modeling basketball teams as multiple-order Markov chains rather than as networks was explored.

Contributors: Mohanraj, Hariharan (Co-author) / Choi, David (Co-author) / Armbruster, Dieter (Thesis director) / Fewell, Jennifer (Committee member) / Brooks, Daniel (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2013-05

Statistical Analyses of Octopus bimaculoides Morphology and Physiology

Description

Chapter 1: Functional Specialization and Arm Length in Octopus bimaculoides. Although studies are limited, there is some evidence that octopuses use their arms for specialized functions. For example, in Octopus maya and O. vulgaris, the anterior arms are utilized more frequently for grasping and exploring (Lee, 1992; Byrne et al., 2006a), while posterior arms are more frequently utilized for crawling in O. vulgaris (Levy et al., 2015). In addition, O. vulgaris uses favored arms when retrieving food and making contact with a T-maze as dictated by their lateralized vision (Byrne, 2006b). O. vulgaris also demonstrates a preference for anterior arms when retrieving food from a Y-maze (Gutnick et al. 2020). In Octopus bimaculoides, bending and elongation were more frequent in anterior arms than posterior arms during reaching and grasping tasks, and right arms displayed deformation more frequently than left arms, with the exception of the hectocotylus (R3) in males (Kennedy et al. 2020). Given these observed functional differences, the goal of this study was to determine if morphological differences exist between different octopus arm identities, coded as L1-L4 and R1-R4. In particular, the relationship between arm length and arm identity was analyzed statistically. The dataset included 111 intact arms from 22 wild-caught specimens of O. bimaculoides (11 male and 11 female). Simple linear regressions and an analysis of covariance were performed to test the relationship between arm length and a number of factors, including body mass, sex, anterior versus posterior location, and left versus right side. Mass had a significant linear relationship with arm length, and a one-way ANOVA demonstrated that arm identity is significantly correlated with arm length. Moreover, an analysis of covariance demonstrated that, independent of mass, arm identity has a significant linear relationship with arm length.
Despite an overall appearance of bilateral symmetry, arms of different identities do not have statistically equivalent lengths in O. bimaculoides. Furthermore, differences in arm length do not appear to be related to sex, anterior versus posterior location, or left versus right side. These results call into question the existing practice of treating all arms as equivalent, by either using a single-arm measurement as representative of all eight or calculating an average length, and suggest that morphological analyses of specific arm identities may be more informative.

Chapter 2: Predicting and Analyzing Octopus bimaculoides Sensitivity to Global Anesthetic. Although global anesthetic is widely used in human and veterinary medicine, the mechanism and impact of global anesthetic are relatively poorly understood, even in well-studied mammalian models. Invertebrate anesthetic is even less understood. In order to evaluate factors that impact anesthetic effectiveness, analyses were conducted on 22 wild-caught specimens of Octopus bimaculoides during 72 anesthetic events. Three machine learning models (regression tree, random forest, and generalized additive model) were used to predict the concentration of anesthetic (percent ethanol by volume) from 11 features and to determine feature importance in making those predictions. The fit of each model was analyzed on three criteria: correlation coefficient, mean squared error, and relative error. Feature importance was determined in a model-specific manner. Predictions from the best performing model, random forest, have a 0.82 correlation coefficient with experimental values. Feature importance suggests that temperature on arrival and cohabitation factors strongly influence predictions for anesthesia concentration. This likely indicates that the transportation process was imposing stress on the animals and that cohabitation was also stressful for the typically solitary O. bimaculoides.
This long-term stress could lead to a decline in the animal’s well-being and a lower necessary ethanol concentration (Horvath et al., 2013). This analysis provides information to improve the care of octopuses in laboratory settings and furthers the understanding of the effects of global anesthetic in invertebrates, particularly one with a distributed nervous system.

Contributors: Sorge, Marieke Alexandria (Author) / Fisher, Rebecca (Thesis director) / Zhao, Yunpeng (Committee member) / Marvi, Hamid (Committee member) / School of Life Sciences (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05
