Matching Items (10)

Description
Many longitudinal studies, especially clinical trials, suffer from missing data. Most estimation procedures assume that the missing values are ignorable, or missing at random (MAR), but this assumption is an unrealistic simplification that is implausible in many settings. For example, suppose an investigator is examining the effect of a treatment on depression. Subjects are scheduled to see doctors on a regular basis and are asked questions about recent emotional situations. Patients experiencing severe depression are more likely to miss an appointment, leaving the data missing for that visit. Here the missingness mechanism is related to the unobserved responses, and data that are not missing at random can produce biased results if the mechanism is not taken into account. Data are said to be non-ignorably missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used in longitudinal data analysis because they do not require explicit specification of the missingness mechanism: the data are stratified according to the missing-data patterns and a model is specified for each stratum. However, this usually results in under-identifiability, because many stratum-specific parameters must be estimated even though interest usually centers on the marginal parameters; pattern-mixture models therefore typically require a large sample. In this thesis, two studies are presented. The first study is motivated by an open problem in pattern-mixture models. Simulations in this part show that the information in the missing-data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing-data patterns may be accounted for by a simple latent factor. These findings lead to a novel model, the continuous latent factor model (CLFM). The second study develops the CLFM, which models the joint distribution of the missing values and the longitudinal outcomes; the proposed model is feasible even for small-sample applications. Detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance is evaluated through designed simulations and three applications. The simulation and application settings range from a correctly specified missing-data mechanism to a misspecified one and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing-data mechanism and are used for a sensitivity analysis; and the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has fully complete data and is used for a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, than Roy's latent class model and the classic linear mixed model. This advantage is more pronounced for small sample sizes, where Roy's model has difficulty with estimation convergence. The proposed CLFM is also robust when missing data are ignorable, as demonstrated in the Growth of Language and Early Literacy Skills in Preschoolers study.
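As a rough illustration of the joint structure the CLFM targets, the following Python sketch simulates longitudinal outcomes and non-ignorable missingness driven by a single shared continuous latent factor; all parameter values are made up for illustration, and this is not the dissertation's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_times = 200, 6
time = np.arange(n_times)

# Continuous latent factor shared by the outcome and missingness models.
latent = rng.normal(0, 1, n_subjects)

# Longitudinal outcome: linear growth shifted by the latent factor.
y = (5 + 0.5 * time[None, :]
     + 1.0 * latent[:, None]
     + rng.normal(0, 1, (n_subjects, n_times)))

# Missingness probability increases with the latent factor (non-ignorable).
p_miss = 1 / (1 + np.exp(-(-2.0 + 1.5 * latent)))
missing = rng.random((n_subjects, n_times)) < p_miss[:, None]
y_observed = np.where(missing, np.nan, y)

print(np.isnan(y_observed).mean())  # realized missingness rate
```

Because the same latent factor enters both the outcome and the missingness model, discarding incomplete rows and fitting a standard mixed model would bias the growth estimates, which is the situation a joint model of this kind is designed to handle.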
Contributors: Zhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
It is common in data analysis to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational, or psychological data where interest is directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well known in goodness-of-fit testing, but it is sometimes considered an omnibus test because it gives little guidance about the source of poor fit once the null hypothesis is rejected. Its components, however, can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that identify the source of lack of fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects that arise when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated in studies for detecting two- and three-way associations that are not accounted for by a categorical-variable factor model with a single latent variable. In addition, the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Social Research at Michigan State University (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set.
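The QR-based construction described above can be sketched numerically: for a multinomial table with null probabilities pi, the standardized cell residuals are orthogonal to sqrt(pi), so a QR decomposition of [sqrt(pi), H], where H maps joint cells to marginals, yields orthonormal directions whose squared projections behave asymptotically as independent chi-square components. The Python sketch below uses three binary items under a hypothetical independence null; it illustrates the idea, not the dissertation's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint cells of a 2x2x2 table for three binary items, in lexicographic order.
cells = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                 dtype=float)

pi = np.full(8, 1 / 8)           # hypothetical null: three independent fair coins
n = 500
counts = rng.multinomial(n, pi)  # one table simulated under the null

# Standardized cell residuals; Pearson's X^2 is their sum of squares.
z = (counts - n * pi) / np.sqrt(n * pi)
X2 = np.sum(z ** 2)

# QR of [sqrt(pi), H]: the first column spans sqrt(pi) (to which z is
# orthogonal); the remaining columns are orthonormal directions associated
# with the first-order marginals.
H = cells                        # each column picks out one item's marginal
Q, _ = np.linalg.qr(np.column_stack([np.sqrt(pi), H]))
gamma = Q[:, 1:].T @ z           # orthogonal components on the marginals

X2_first_order = np.sum(gamma ** 2)   # limited-information statistic
print(X2, X2_first_order)
```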
Contributors: Milovanovic, Jelena (Author) / Young, Dennis (Thesis advisor) / Reiser, Mark R. (Thesis advisor) / Wilson, Jeffrey (Committee member) / Eubank, Randall (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The Pearson and likelihood ratio statistics are well known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data come from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have lower power and an inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of the observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has a good Type I error rate and statistical power even when applied to a sparse table. In this dissertation, goodness-of-fit statistics using orthogonal components based on second-, third-, and fourth-order marginals are examined. If lack of fit is present in higher-order marginals, then a test that incorporates the higher-order marginals may have higher power than a test that incorporates only first- and/or second-order marginals. To this end, two new statistics based on the orthogonal components of Pearson's chi-square that incorporate third- and fourth-order marginals were developed, and the Type I error, empirical power, and asymptotic power under different sparseness conditions were investigated. Individual orthogonal components as test statistics to identify lack of fit were also studied, and their performance was compared to that of other popular lack-of-fit statistics. When the number of manifest variables grows beyond 20, most statistics based on marginal distributions run into limitations in computer resources and CPU time. For this setting, with 20 or more manifest variables, two approaches were investigated: a bootstrap-based method for obtaining p-values for the Pearson-Fisher statistic, applied to a confirmatory dichotomous-variable factor analysis model, and the statistic of Tollenaar and Mooijaart (2003).
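The bootstrap-based method for obtaining p-values can be sketched generically as a parametric bootstrap: simulate tables from the fitted model and compare the observed statistic to the simulated distribution. The sketch below uses stand-in fitted cell probabilities and, for brevity, skips the re-estimation step that a full implementation would perform for each bootstrap table.

```python
import numpy as np

rng = np.random.default_rng(1)

def pearson_fisher(counts, pi_hat, n):
    """Pearson statistic computed at the estimated cell probabilities."""
    expected = n * pi_hat
    return np.sum((counts - expected) ** 2 / expected)

# Hypothetical setup: pi_hat would come from fitting the factor model by ML;
# here a fixed vector stands in for the fitted cell probabilities.
pi_hat = np.full(8, 1 / 8)
n = 200
observed = rng.multinomial(n, pi_hat)
stat_obs = pearson_fisher(observed, pi_hat, n)

# Parametric bootstrap: simulate tables from the fitted model and recompute
# the statistic.  (A real implementation would re-estimate pi_hat for every
# bootstrap table before recomputing.)
B = 2000
boot = np.empty(B)
for b in range(B):
    table = rng.multinomial(n, pi_hat)
    boot[b] = pearson_fisher(table, pi_hat, n)

p_value = np.mean(boot >= stat_obs)
print(stat_obs, p_value)
```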
Contributors: Dassanayake, Mudiyanselage Maduranga Kasun (Author) / Reiser, Mark R. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St. Louis, Robert (Committee member) / Kamarianakis, Ioannis (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Several studies of cheerleading as a sport can be found in the literature; however, no research has examined the value cheerleaders add to the experience at a university, to an athletic department, or to a particular sport. There is a perception that collegiate and professional cheerleaders are not given appropriate recognition or credit for the amount of work they do, and their contribution is sometimes in question, with the perceived benefits varying by school and team. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference level and polled spectators at sporting events such as basketball and football games. We found that university administration and athletic personnel see the ASU Spirit Squad as adding value, but spectators had a different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and for entertainment; they enjoy watching the ASU Spirit Squad perform but would continue to attend ASU sporting events even if cheerleaders and dancers were not there.
Contributors: Thomas, Jessica Ann (Author) / Wilson, Jeffrey (Thesis director) / Garner, Deana (Committee member) / Department of Supply Chain Management (Contributor) / Department of Marketing (Contributor) / School of Community Resources and Development (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05
Description
Quadratic growth curves based on 2nd-degree polynomials are widely used in longitudinal studies. For a 2nd-degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region, as well as confidence intervals for the x- and y-coordinates of the vertex, using two methods: the gradient method and the delta method. Under some models an indirect test of the location of the curve can be based on the intercept and slope parameters, but in other models a direct test of the vertex is required. We present a quadratic-form statistic for a test of the null hypothesis that there is no shift in the location of the vertex in a linear mixed model; the statistic has an asymptotic chi-squared distribution. For 2nd-degree polynomials from two independent samples, we present an approximate confidence region for the difference of the vertices of the two quadratic growth curves using the modified gradient method and the delta method. Another chi-square test statistic is derived for a direct test of the vertex and is compared to an F statistic for the indirect test. Power functions are derived for both the indirect F test and the direct chi-square test. We calculate the theoretical power and present a simulation study to investigate the power of the tests, along with a simulation study to assess the influence of sample size, number of measurement occasions, and the nature of the random effects. The test statistics are applied to the TELL Efficacy longitudinal study, in which sound identification scores and language protocol scores for children are modeled as quadratic growth curves for two independent groups, the TELL and control curricula, and the interpretation of a shift in the location of the vertices is presented.
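For the vertex x* = -b1/(2*b2) of y = b0 + b1*x + b2*x^2, the delta method mentioned above gives an approximate standard error from the gradient of x* and the covariance matrix of the coefficient estimates. The Python sketch below uses illustrative values for the estimates and their covariance; in practice these would come from the fitted mixed model.

```python
import numpy as np
from scipy import stats

# Hypothetical estimates for y = b0 + b1*x + b2*x^2 and their covariance
# matrix; the values here are illustrative stand-ins.
b = np.array([2.0, 1.5, -0.25])        # (b0, b1, b2)
cov = np.diag([0.04, 0.02, 0.001])     # stand-in for the estimated covariance

b0, b1, b2 = b
x_vertex = -b1 / (2 * b2)
y_vertex = b0 - b1 ** 2 / (4 * b2)

# Delta method: gradient of x* = -b1/(2*b2) with respect to (b0, b1, b2).
grad_x = np.array([0.0, -1 / (2 * b2), b1 / (2 * b2 ** 2)])
se_x = np.sqrt(grad_x @ cov @ grad_x)

z = stats.norm.ppf(0.975)
ci_x = (x_vertex - z * se_x, x_vertex + z * se_x)
print(x_vertex, ci_x)
```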
Contributors: Yu, Wanchunzi (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Committee member) / Kao, Ming-Hung (Committee member) / St Louis, Robert D (Committee member) / Wilson, Jeffrey (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

We attempted to apply a novel approach to stock market prediction. Logistic regression, a machine learning algorithm attributed to Joseph Berkson, was applied to news article headlines represented as bag-of-words features (tri-gram and single-gram) in an attempt to predict trends in stock prices based on the Dow Jones Industrial Average. The results showed that the tri-gram bag led to a 49% trend accuracy, one percentage point higher than the single-gram representation's accuracy of 48%.
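A pipeline of the kind described, with bag-of-words features spanning single-grams through tri-grams feeding a logistic regression, might look like the following scikit-learn sketch; the headlines, labels, and parameter choices are stand-ins, not the thesis's actual data or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Fabricated headlines and labels; 1 means the Dow rose that day.
headlines = [
    "stocks rally on strong earnings",
    "markets slide amid rate fears",
    "tech shares surge after upbeat forecast",
    "indexes fall as inflation worries grow",
]
dji_up = [1, 0, 1, 0]

# ngram_range=(1, 3) builds a bag of single-grams, bi-grams, and tri-grams.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(headlines, dji_up)
print(model.predict(["earnings beat expectations as stocks climb"]))
```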

Contributors: Barolli, Adeiron (Author) / Jimenez Arista, Laura (Thesis director) / Wilson, Jeffrey (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description

College athletics are a multi-billion dollar industry featuring hard-working student-athletes competing at a high level for national championships across a variety of sports. Across the college sports landscape, coaches and players are always seeking an edge that can give them a competitive advantage over their opponents. While this may sound nefarious, the vast amounts of data about these games and student-athletes can be used to glean insights about the sports themselves and help student-athletes be more successful. Data analytics can make sense of the available data through models and other tools that predict how student-athletes and their teams will perform in the future based on how they have performed in the past. Colleges and universities across the country compete in a vast array of sports, and the sports with the largest amounts of data available are the more popular ones, such as football, men's and women's basketball, baseball, and softball. Arizona State University, as a member of the Pac-12 conference, has a storied athletic tradition and decades of history in all of these sports, providing a large amount of data that can be used to analyze student-athlete success and help predict future success. Data from numerous other college athletic programs could provide a much larger sample and help predict with greater accuracy why certain teams and student-athletes are more successful than others. The explosion of analytics across the sports world has brought a new focus on using statistical techniques to improve all aspects of different sports: sports science has influenced medical departments, and model-building has been used to determine optimal in-game strategy and to predict the outcomes of future games based on team strength. It is this latter approach that is the focus of this paper, with football used as the subject due to its vast popularity and massive supply of easily accessible data.
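As a toy illustration of the outcome-prediction approach this paper focuses on, the sketch below fits a logistic regression of game results on a fabricated team-strength rating difference; the ratings, results, and scale are invented for illustration and are not from the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Fabricated rating differences (home team minus away team).
rating_diff = rng.normal(0, 7, size=300).reshape(-1, 1)

# Fabricated generating process: stronger teams win more often.
win = (rng.logistic(0, 1, size=300) < rating_diff.ravel() / 5).astype(int)

model = LogisticRegression().fit(rating_diff, win)
print(model.predict_proba([[10.0]]))  # win probability for a 10-point edge
```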

Contributors: Lindstrom, Trent (Author) / Schneider, Laurence (Thesis director) / Wilson, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2022-05
Description

College athletics are a multi-billion dollar industry featuring hard-working student-athletes competing at a high level for national championships across a variety of sports. Across the college sports landscape, coaches and players are always seeking an edge that can give them a competitive advantage over their opponents. While this may sound nefarious, the vast amounts of data about these games and student-athletes can be used to glean insights about the sports themselves and help student-athletes be more successful. Data analytics can make sense of the available data through models and other tools that predict how student-athletes and their teams will perform in the future based on how they have performed in the past. Colleges and universities across the country compete in a vast array of sports, and the sports with the largest amounts of data available are the more popular ones, such as football, men's and women's basketball, baseball, and softball. Arizona State University, as a member of the Pac-12 conference, has a storied athletic tradition and decades of history in all of these sports, providing a large amount of data that can be used to analyze student-athlete success and help predict future success. Data from numerous other college athletic programs could provide a much larger sample and help predict with greater accuracy why certain teams and student-athletes are more successful than others. The explosion of analytics across the sports world has brought a new focus on using statistical techniques to improve all aspects of different sports: sports science has influenced medical departments, and model-building has been used to determine optimal in-game strategy and to predict the outcomes of future games based on team strength. It is this latter approach that is the focus of this paper, with football used as the subject due to its vast popularity and massive supply of easily accessible data.

Contributors: Lindstrom, Trent (Author) / Schneider, Laurence (Thesis director) / Wilson, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2022-05
Description
While linear mixed models offer a flexible approach to handling data with multiple sources of random variability, hypothesis testing for the fixed effects often encounters obstacles when the sample size is small and the underlying distribution of the test statistic is unknown. Consequently, five methods of approximating the denominator degrees of freedom (residual, containment, between-within, Satterthwaite, and Kenward-Roger) have been developed to overcome this problem. This study evaluates the performance of these five methods for a mixed model with a random intercept and a random slope. Specifically, simulations are conducted to provide insight into the F-statistics, denominator degrees of freedom, and p-values each method gives under different settings of the sample structure, the fixed-effect slopes, and the missing-data proportion. The simulation results show that the residual method performs the worst in terms of F-statistics and p-values, and that the Satterthwaite and Kenward-Roger methods tend to be more sensitive to changes in the design. The Kenward-Roger method performs the best in terms of F-statistics when the null hypothesis is true.
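The simulation design described above can be sketched as follows: generate longitudinal data with a random intercept and random slope per subject, impose a chosen missing-data proportion, and fit the linear mixed model. The sketch below uses Python's statsmodels, which reports large-sample z-tests for the fixed effects; the Satterthwaite and Kenward-Roger denominator-degrees-of-freedom corrections studied here are available in, for example, R's lmerTest and pbkrtest packages. All parameter settings are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_subjects, n_times, miss_prop = 30, 5, 0.2

subj = np.repeat(np.arange(n_subjects), n_times)
time = np.tile(np.arange(n_times), n_subjects)

u0 = rng.normal(0, 1.0, n_subjects)[subj]   # random intercepts
u1 = rng.normal(0, 0.5, n_subjects)[subj]   # random slopes
y = 10 + 0.8 * time + u0 + u1 * time + rng.normal(0, 1, len(subj))

df = pd.DataFrame({"subject": subj, "time": time, "y": y})
df = df.sample(frac=1 - miss_prop, random_state=0)  # impose missingness

# Random intercept and random slope for time within each subject.
fit = smf.mixedlm("y ~ time", df, groups=df["subject"],
                  re_formula="~time").fit()
print(fit.summary())
```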
Contributors: Huang, Ping-Chieh (Author) / Reiser, Mark R. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
A goodness-of-fit test is a hypothesis test of whether a given model fits the data well. It is extremely difficult to find a universal goodness-of-fit test that can handle all types of statistical models. Moreover, the traditional Pearson chi-square goodness-of-fit test is an omnibus rather than a directional test, so it is hard to find the source of poor fit once the null hypothesis is rejected, and the test loses validity and effectiveness under certain special conditions. Sparseness is one such condition, and an effective way to overcome its adverse effects is to use limited-information statistics. This dissertation includes two topics on constructing and using limited-information statistics to overcome sparseness for binary data. In the first topic, the theoretical framework of pairwise concordance is provided, along with the transformation matrix used to extract the corresponding marginals and their generalizations. A series of new chi-square test statistics and corresponding orthogonal components are then proposed for detecting model misspecification in longitudinal binary data. One important conclusion is that the test statistic $X^2_{2c}$ can be taken as an extension of $X^2_{[2]}$, the statistic based on the second-order marginals of the traditional Pearson chi-square statistic. In the second topic, the research interest is the effect of different intercept patterns when using the Lagrange multiplier (LM) test to find the source of misfit for two items in a 2-PL IRT model. Several other directional chi-square test statistics are included for comparison. The simulation results show that the intercept pattern does affect the performance of the goodness-of-fit test, especially the power to find the source of misfit when one exists; more specifically, the power is directly affected by the 'intercept distance' between the two misfitting items. Another finding is that the LM test statistic has the best balance between accurate Type I error rates and high empirical power, indicating that the LM test is robust.
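As a small illustration of the 2-PL setup in the second topic, the sketch below simulates binary item responses under an adjustable intercept pattern and computes an observed pairwise (second-order) marginal of the kind the limited-information statistics are built from; all parameter values are invented, not taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 1000, 5

theta = rng.normal(0, 1, n_persons)            # latent trait
a = np.full(n_items, 1.2)                      # slopes (discriminations)
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])      # intercept pattern to vary

# 2-PL response probabilities: P(y = 1 | theta) = logistic(a*theta + b).
logits = np.outer(theta, a) + b
prob = 1 / (1 + np.exp(-logits))
y = (rng.random((n_persons, n_items)) < prob).astype(int)

# Observed second-order marginal (pairwise 1-1 proportion) for items 0 and 1,
# the kind of quantity the pairwise-concordance statistics summarize.
print(np.mean(y[:, 0] * y[:, 1]))
```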
Contributors: Xu, Jinhui (Author) / Reiser, Mark (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / Zheng, Yi (Committee member) / Edwards, Michael (Committee member) / Arizona State University (Publisher)
Created: 2022