Matching Items (43)
Filtering by

Clear all filters

149960-Thumbnail Image.png
Description
By the von Neumann min-max theorem, a two person zero sum game with finitely many pure strategies has a unique value for each player (summing to zero) and each player has a non-empty set of optimal mixed strategies. If the payoffs are independent, identically distributed (iid) uniform (0,1) random

By the von Neumann min-max theorem, a two person zero sum game with finitely many pure strategies has a unique value for each player (summing to zero) and each player has a non-empty set of optimal mixed strategies. If the payoffs are independent, identically distributed (iid) uniform (0,1) random variables, then with probability one, both players have unique optimal mixed strategies utilizing the same number of pure strategies with positive probability (Jonasson 2004). The pure strategies with positive probability in the unique optimal mixed strategies are called saddle squares. In 1957, Goldman evaluated the probability of a saddle point (a 1 by 1 saddle square), which was rediscovered by many authors including Thorp (1979). Thorp gave two proofs of the probability of a saddle point, one using combinatorics and one using a beta integral. In 1965, Falk and Thrall investigated the integrals required for the probabilities of a 2 by 2 saddle square for 2 × n and m × 2 games with iid uniform (0,1) payoffs, but they were not able to evaluate the integrals. This dissertation generalizes Thorp's beta integral proof of Goldman's probability of a saddle point, establishing an integral formula for the probability that a m × n game with iid uniform (0,1) payoffs has a k by k saddle square (k ≤ m,n). Additionally, the probabilities of a 2 by 2 and a 3 by 3 saddle square for a 3 × 3 game with iid uniform(0,1) payoffs are found. For these, the 14 integrals observed by Falk and Thrall are dissected into 38 disjoint domains, and the integrals are evaluated using the basic properties of the dilogarithm function. The final results for the probabilities of a 2 by 2 and a 3 by 3 saddle square in a 3 × 3 game are linear combinations of 1, π2, and ln(2) with rational coefficients.
ContributorsManley, Michael (Author) / Kadell, Kevin W. J. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Lohr, Sharon (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2011
150135-Thumbnail Image.png
Description
It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack-of-fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects presented when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated with studies for detecting two- and three-way associations that are not accounted for by a categorical variable factor model with a single latent variable. In addition the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Michigan State University Social Research (2005) . The results illustrate the use of the proposed techniques in the context of a sparse data set.
ContributorsMilovanovic, Jelena (Author) / Young, Dennis (Thesis advisor) / Reiser, Mark R. (Thesis advisor) / Wilson, Jeffrey (Committee member) / Eubank, Randall (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created2011
152220-Thumbnail Image.png
Description
Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment

Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias in results if the missing mechanism is not taken into account. In other words, the missing mechanism is related to the unobserved responses. Data are said to be non-ignorable missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used for longitudinal data analysis because they do not require explicit specification of the missing mechanism, with the data stratified according to a variety of missing patterns and a model specified for each stratum. However, this usually results in under-identifiability, because of the need to estimate many stratum-specific parameters even though the eventual interest is usually on the marginal parameters. Pattern mixture models have the drawback that a large sample is usually required. In this thesis, two studies are presented. The first study is motivated by an open problem from pattern mixture models. Simulation studies from this part show that information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted by a simple latent factor. Simulation findings that are obtained in the first study lead to a novel model, a continuous latent factor model (CLFM). The second study develops CLFM which is utilized for modeling the joint distribution of missing values and longitudinal outcomes. The proposed CLFM model is feasible even for small sample size applications. The detailed estimation theory, including estimating techniques from both frequentist and Bayesian perspectives is presented. Model performance and evaluation are studied through designed simulations and three applications. Simulation and application settings change from correctly-specified missing data mechanism to mis-specified mechanism and include different sample sizes from longitudinal studies. Among three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data have no indication on missing data mechanism and it will be applied to a sensitivity analysis; the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study, however, has full complete data and will be used to conduct a robust analysis. The CLFM model is shown to provide more precise estimators, specifically on intercept and slope related parameters, compared with Roy's latent class model and the classic linear mixed model. This advantage will be more obvious when a small sample size is the case, where Roy's model experiences challenges on estimation convergence. The proposed CLFM model is also robust when missing data are ignorable as demonstrated through a study on Growth of Language and Early Literacy Skills in Preschoolers.
ContributorsZhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)
Created2013
151976-Thumbnail Image.png
Description
Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++ and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
ContributorsIsmay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2013
150618-Thumbnail Image.png
Description
Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.
ContributorsCoxe, Stefany (Author) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2012
Description
When analyzing longitudinal data it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. A generalized method of

When analyzing longitudinal data it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. A generalized method of moments (GMM) for estimating the coefficients in longitudinal data is presented. The appropriate and valid estimating equations associated with the time-dependent covariates are identified, thus providing substantial gains in efficiency over generalized estimating equations (GEE) with the independent working correlation. Identifying the estimating equations for computation is of utmost importance. This paper provides a technique for identifying the relevant estimating equations through a general method of moments. I develop an approach that makes use of all the valid estimating equations necessary with each time-dependent and time-independent covariate. Moreover, my approach does not assume that feedback is always present over time, or present at the same degree. I fit the GMM correlated logistic regression model in SAS with PROC IML. I examine two datasets for illustrative purposes. I look at rehospitalization in a Medicare database. I revisit data regarding the relationship between the body mass index and future morbidity among children in the Philippines. These datasets allow us to compare my results with some earlier methods of analyses.
ContributorsYin, Jianqiong (Author) / Wilson, Jeffrey Wilson (Thesis advisor) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)
Created2012
150996-Thumbnail Image.png
Description
A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis

A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables for fitting a plane to data is obtained by minimizing the sum of a function of the volumes obtained by drawing, from each data point, lines parallel to each coordinate axis to the fitted plane (Draper and Yang 1997; Goodman and Tofallis 2003). Generalized RMA results for the multivariate case obtained by Draper and Yang (1997) are reviewed and some investigations of multivariate RMA are given. A linear model is proposed that does not specify a dependent variable and allows for errors in the measurement of each variable. Coefficients in the model are estimated by minimization of the function of the volumes previously mentioned. Methods for obtaining coefficient estimates are discussed and simulations are used to investigate the distribution of coefficient estimates. The effects of sample size, sampling error and correlation among variables on the estimates are studied. Bootstrap methods are used to obtain confidence intervals for model coefficients. Residual analysis is considered for assessing model assumptions. Outlier and influential case diagnostics are developed and a forward selection method is proposed for subset selection of model variables. A real data example is provided that uses the methods developed. Topics for further research are discussed.
ContributorsLi, Jingjin (Author) / Young, Dennis (Thesis advisor) / Eubank, Randall (Thesis advisor) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created2012
133482-Thumbnail Image.png
Description
Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. The new column is created to measure price difference to create a more accurate analysis on the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. However, out-of-sample modeling is more difficult, but in some cases better than typical forms of financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, realizing future potential for system modeling for cryptocurrency within the financial world.
ContributorsBrowning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
134976-Thumbnail Image.png
Description
Problems related to alcohol consumption cause not only extra economic expenses, but are an expense to the health of both drinkers and non-drinkers due to the harm directly and indirectly caused by alcohol consumption. Investigating predictors and reasons for alcohol-related problems is of importance, as alcohol-related problems could be prevented

Problems related to alcohol consumption cause not only extra economic expenses, but are an expense to the health of both drinkers and non-drinkers due to the harm directly and indirectly caused by alcohol consumption. Investigating predictors and reasons for alcohol-related problems is of importance, as alcohol-related problems could be prevented by quitting or limiting consumption of alcohol. We were interested in predicting alcohol-related problems using multiple linear regression and regression trees, and then comparing the regressions to the tree. Impaired control, anxiety sensitivity, mother permissiveness, father permissiveness, gender, and age were included as predictors. The data used was comprised of participants (n=835) sampled from students at Arizona State University. A multiple linear regression without interactions, multiple linear regression with two-way interactions and squares, and a regression tree were used and compared. The regression and the tree had similar results. Multiple interactions of variables predicted alcohol-related problems. Overall, the tree was easier to interpret than the regressions, however, the regressions provided specific predicted alcohol-related problems scores, whereas the tree formed large groups and had a predicted alcohol-related problems score for each group. Nevertheless, the tree still predicted alcohol-related problems nearly as well, if not better than the regressions.
ContributorsVoorhies, Kirsten Reed (Author) / McCulloch, Robert (Thesis director) / Zheng, Yi (Committee member) / Patock-Peckham, Julie (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Psychology (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
134937-Thumbnail Image.png
Description
Several studies on cheerleading as a sport can be found in the literature; however, there is no research done on the value added to the experience at a university, to an athletic department or at a particular sport. It has been the feeling that collegiate and professional cheerleaders are not

Several studies on cheerleading as a sport can be found in the literature; however, there is no research done on the value added to the experience at a university, to an athletic department or at a particular sport. It has been the feeling that collegiate and professional cheerleaders are not given the appropriate recognition nor credit for the amount of work they do. This contribution is sometimes in question as it depends on the school and the sports teams. The benefits are believed to vary based on the university or professional teams. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference level and polled spectators at sporting events such as basketball and football. We found that the university administration and athletic personnel see the ASU Spirit Squad as value added but spectators had a totally different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and for the entertainment. They enjoy watching the ASU Spirit Squad perform but would continue to attend ASU sporting events even if cheerleaders and dancers were not there.
ContributorsThomas, Jessica Ann (Author) / Wilson, Jeffrey (Thesis director) / Garner, Deana (Committee member) / Department of Supply Chain Management (Contributor) / Department of Marketing (Contributor) / School of Community Resources and Development (Contributor) / Barrett, The Honors College (Contributor)
Created2017-05