A test and confidence set for comparing the location of quadratic growth curves

Description

Quadratic growth curves, modeled as second-degree polynomials, are widely used in longitudinal studies. For a second-degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region, as well as confidence intervals for the x- and y-coordinates of the vertex, using two methods: the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models a direct test on the vertex is required. We present a quadratic-form statistic for testing the null hypothesis that there is no shift in the location of the vertex in a linear mixed model. The statistic has an asymptotic chi-squared distribution. For second-degree polynomials fitted to two independent samples, we present an approximate confidence region for the difference between the vertices of two quadratic growth curves using the modified gradient method and the delta method. Another chi-squared test statistic is derived for a direct test on the vertex and is compared with an F test statistic for the indirect test. Power functions are derived for both the indirect F test and the direct chi-squared test. We calculate the theoretical power and present a simulation study to investigate the power of the tests. We also present a simulation study to assess the influence of sample size, the number of measurement occasions, and the nature of the random effects. The test statistics are applied to the Tell Efficacy longitudinal study, in which sound identification scores and language protocol scores for children are modeled as quadratic growth curves for two independent groups, the TELL curriculum and a control curriculum. The interpretation of a shift in the location of the vertices is also presented.
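As a minimal sketch of the delta-method idea described in this abstract (not the thesis's own implementation), assume a single fitted curve y = b0 + b1*x + b2*x^2 with an estimated coefficient covariance matrix. The function names and all numeric values below are hypothetical, and the mixed-model estimation, the gradient method, and the chi-squared test are not reproduced.

```python
import math


def vertex(b0, b1, b2):
    """Return the vertex (x_v, y_v) of y = b0 + b1*x + b2*x**2 (requires b2 != 0)."""
    x_v = -b1 / (2.0 * b2)
    y_v = b0 - b1 ** 2 / (4.0 * b2)
    return x_v, y_v


def vertex_jacobian(b0, b1, b2):
    """Partial derivatives of (x_v, y_v) with respect to (b0, b1, b2)."""
    return [
        [0.0, -1.0 / (2.0 * b2), b1 / (2.0 * b2 ** 2)],
        [1.0, -b1 / (2.0 * b2), b1 ** 2 / (4.0 * b2 ** 2)],
    ]


def delta_method_cov(beta, cov_beta):
    """First-order (delta method) covariance of the vertex: J * cov_beta * J^T."""
    jac = vertex_jacobian(*beta)
    jc = [[sum(jac[i][k] * cov_beta[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    return [[sum(jc[i][k] * jac[j][k] for k in range(3)) for j in range(2)]
            for i in range(2)]


def wald_intervals(beta, cov_beta, z=1.96):
    """Approximate 95% Wald intervals for x_v and y_v (normal approximation)."""
    x_v, y_v = vertex(*beta)
    cov_v = delta_method_cov(beta, cov_beta)
    half_x = z * math.sqrt(cov_v[0][0])
    half_y = z * math.sqrt(cov_v[1][1])
    return (x_v - half_x, x_v + half_x), (y_v - half_y, y_v + half_y)


# Hypothetical coefficient estimates and covariance, for illustration only.
beta_hat = (1.0, 4.0, -2.0)
cov_hat = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
x_interval, y_interval = wald_intervals(beta_hat, cov_hat)
```

The intervals here are marginal ones for each coordinate; a joint confidence region for the vertex would use the full 2x2 covariance returned by `delta_method_cov`.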

Contributors: Yu, Wanchunzi (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Committee member) / Kao, Ming-Hung (Committee member) / St Louis, Robert D (Committee member) / Wilson, Jeffrey (Committee member) / Arizona State University (Publisher)

Created: 2015

THE IMPACT OF RACE AND OTHER LARGE-SCALE PREDICTORS ON THE INCIDENCE OF MELANOMA SKIN CANCER-A BIOSTATISTICAL ANALYSIS

Description

Melanoma is one of the most severe forms of skin cancer and can be life-threatening due to metastasis if not caught early in its development. Over the past decade, the U.S. Government added a Healthy People 2020 objective to reduce the melanoma skin cancer rate in the U.S. population. Now that the decade has come to a close, this research investigates possible large-scale risk factors for the incidence of melanoma in the population using logistic regression and propensity score matching. Logistic regression results showed that Caucasians are 14.765 times more likely to get melanoma than non-Caucasians; after adjustment using propensity scoring, this value fell to 11.605 times more likely. Cholesterol, chronic obstructive pulmonary disease, and hypertension predictors were also significant in the initial logistic regression. These results open the door to further analysis of larger-scale predictors and give public health programs the initial information needed to create successful skin safety advocacy plans.
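To make the scale of these figures concrete, the following sketch shows how a logistic regression coefficient maps to an odds ratio. It assumes, without confirmation from the abstract, that the reported values 14.765 and 11.605 are odds ratios; the function names and the baseline odds are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient (log-odds scale)."""
    return math.exp(coefficient)


def log_odds(ratio):
    """Inverse mapping: the coefficient that corresponds to a given odds ratio."""
    return math.log(ratio)


def probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# The two figures quoted in the abstract, treated here as odds ratios (an assumption).
unadjusted = log_odds(14.765)
adjusted = log_odds(11.605)

# Hypothetical baseline odds of 0.001, used only to illustrate the scale.
baseline_odds = 0.001
p_reference = probability(baseline_odds)
p_exposed = probability(baseline_odds * odds_ratio(unadjusted))
```

For a rare outcome the odds ratio is close to the ratio of probabilities, which is why an odds ratio is often read informally as "times more likely".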

Contributors: Falls, Nicole Elizabeth (Author) / Wilson, Jeffrey (Thesis director) / Dornelles, Adriana (Committee member) / School of International Letters and Cultures (Contributor) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Using Logistic Regression to Predict Stock Trends Based on Bag-of-Words Representations of News Article Headlines

Description

We attempted to apply a novel approach to stock market predictions. The Logistic Regression machine learning algorithm (Joseph Berkson) was applied to analyze news article headlines as represented by a bag-of-words (tri-gram and single-gram) representation in an attempt to predict the trends of stock prices based on the Dow Jones Industrial Average. The results showed that a tri-gram bag led to a 49% trend accuracy, a 1% increase when compared to the single-gram representation’s accuracy of 48%.
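The bag-of-words representations mentioned above can be sketched as follows. This is an illustrative feature extractor only, with a hypothetical tokenizer, function names, and headline; it does not reproduce the thesis's preprocessing or its logistic regression model.

```python
from collections import Counter


def tokenize(headline):
    """Lowercase a headline, split on whitespace, and strip edge punctuation."""
    words = (w.strip(".,:;!?\"'()") for w in headline.lower().split())
    return [w for w in words if w]


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(headline, n):
    """Count n-gram occurrences in one headline; position is otherwise discarded."""
    return Counter(ngrams(tokenize(headline), n))


# Hypothetical headline, not taken from the thesis data.
headline = "Stocks rally as stocks rebound"
unigram_bag = bag_of_ngrams(headline, 1)  # single-gram representation
trigram_bag = bag_of_ngrams(headline, 3)  # tri-gram representation
```

Each distinct n-gram becomes one feature column for the classifier. Tri-grams keep some local word order, at the cost of a much larger and sparser feature space than single words.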

Contributors: Barolli, Adeiron (Author) / Jimenez Arista, Laura (Thesis director) / Wilson, Jeffrey (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Quantitative Assessment of NCAA Football Athlete Worth: A Study of Brand Potential and NIL Legislation

Description

Until the Supreme Court’s landmark decision in National Collegiate Athletic Association (NCAA) v. Alston, student-athletes were not allowed to be compensated for the millions of dollars in revenue they generate for universities. While universities cannot directly pay student-athletes, student-athletes can now make money based on their name, image, and likeness (NIL). NIL legislation has the potential to change (and has already begun to change) college recruiting, along with the transfer portal and free agency landscape. Now, schools can bake NIL connections into their recruiting pitch, creating a recruiting renaissance. This research is an empirical study to determine the factors that contribute to an athlete’s NIL valuation and earnings. A hierarchical mixed-model analysis run in SAS is also used to analyze the data. The study provides schools and athletes with vital information about their fiscal valuation during the recruiting process. The findings can help families and student-athletes better estimate expected NIL earnings.

Contributors: Mercado, Erik (Author) / Wilson, Jeffrey (Thesis director) / McCreless, Tamuchin (Committee member) / Barrett, The Honors College (Contributor) / Department of Information Systems (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Department of Economics (Contributor)

Created: 2023-05

Advances in Directional Goodness-of-fit Testing of Binary Data under Model Misspecification in Case of Sparseness

Description

A goodness-of-fit test is a hypothesis test of whether a given model fits the data well. It is extremely difficult to find a universal goodness-of-fit test that can be applied to all types of statistical models. Moreover, the traditional Pearson’s chi-square goodness-of-fit test is usually regarded as an omnibus test rather than a directional test, so it is hard to find the source of poor fit when the null hypothesis is rejected, and the test loses its validity and effectiveness under some special conditions. Sparseness is one such condition. One effective way to overcome the adverse effects of sparseness is to use limited-information statistics. This dissertation covers two topics on constructing and using limited-information statistics to overcome sparseness for binary data. In the first topic, the theoretical framework of pairwise concordance is provided, together with the transformation matrix used to extract the corresponding marginals and their generalizations. A series of new chi-square test statistics and corresponding orthogonal components is then proposed for detecting model misspecification in longitudinal binary data. One important conclusion is that the test statistic $X^2_{2c}$ can be taken as an extension of $X^2_{[2]}$, the second-order marginals of the traditional Pearson’s chi-square statistic. In the second topic, the research interest is the effect of different intercept patterns when the Lagrange multiplier (LM) test is used to find the source of misfit for two items in the 2-PL IRT model. Several other directional chi-square test statistics are included for comparison. The simulation results showed that the intercept pattern does affect the performance of the goodness-of-fit test, especially the power to find the source of misfit when such a source exists. More specifically, the power is directly affected by the "intercept distance" between two misfit variables. Another finding is that the LM test statistic has the best balance between accurate Type I error rates and high empirical power, which indicates that the LM test is a robust test.
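As a rough illustration of why sparseness is a problem for the traditional statistic, the sketch below computes Pearson's chi-square from observed and expected cell counts. The counts are hypothetical, and the limited-information statistics proposed in the dissertation are not reproduced here.

```python
def cell_contributions(observed, expected):
    """Per-cell terms (O - E)^2 / E of Pearson's chi-square statistic."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def pearson_chi_square(observed, expected):
    """Pearson's X^2: the sum of the per-cell contributions."""
    return sum(cell_contributions(observed, expected))


# Hypothetical well-filled table: every expected count is comfortably large.
well_filled = pearson_chi_square([10, 20, 30], [20.0, 20.0, 20.0])

# Hypothetical sparse table: two cells have expected counts of only 0.1.
sparse_terms = cell_contributions([58, 1, 1], [59.8, 0.1, 0.1])
```

In the sparse table a single observation in a cell with a tiny expected count contributes far more to the statistic than the large cell does, which is one reason the chi-square reference distribution becomes unreliable under sparseness.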

Contributors: Xu, Jinhui (Author) / Reiser, Mark (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / Zheng, Yi (Committee member) / Edwards, Michael (Committee member) / Arizona State University (Publisher)

Created: 2022

Comparison of Denominator Degrees of Freedom Approximations for Linear Mixed Models in Small-Sample Simulations

Description

Whilst linear mixed models offer a flexible approach to handling data with multiple sources of random variability, hypothesis testing for the fixed effects often encounters obstacles when the sample size is small and the underlying distribution of the test statistic is unknown. Consequently, five methods for approximating the denominator degrees of freedom (residual, containment, between-within, Satterthwaite, and Kenward-Roger) have been developed to overcome this problem. This study evaluates the performance of these five methods with a mixed model consisting of a random intercept and a random slope. Specifically, simulations are conducted to provide insights into the F-statistics, denominator degrees of freedom, and p-values each method gives under different settings of the sample structure, the fixed-effect slopes, and the missing-data proportion. The simulation results show that the residual method performs the worst in terms of F-statistics and p-values. The Satterthwaite and Kenward-Roger methods also tend to be more sensitive to changes in the design. The Kenward-Roger method performs the best in terms of F-statistics when the null hypothesis is true.
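Two of the simpler ideas behind these approximations can be sketched briefly. The residual method uses the number of observations minus the rank of the fixed-effects design matrix. The second function is the classical two-sample Welch-Satterthwaite formula, shown only as the textbook special case that the mixed-model Satterthwaite method generalizes; it is not the mixed-model computation itself. The function names and inputs are hypothetical, and the design matrix is assumed to have full column rank.

```python
def residual_df(n_observations, n_fixed_effect_params):
    """Residual method: observations minus the rank of the fixed-effects design.

    Assumes a full-rank design, so the rank equals the number of parameters.
    """
    return n_observations - n_fixed_effect_params


def welch_satterthwaite_df(variances, sizes):
    """Approximate degrees of freedom for the sum of s_i^2 / n_i over groups."""
    terms = [s2 / n for s2, n in zip(variances, sizes)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / (n - 1) for t, n in zip(terms, sizes))
    return numerator / denominator


# Hypothetical inputs: 40 observations and 2 fixed-effect parameters.
df_residual = residual_df(40, 2)

# Hypothetical sample variances for two groups of 10 observations each.
df_equal = welch_satterthwaite_df([4.0, 4.0], [10, 10])
df_unequal = welch_satterthwaite_df([1.0, 9.0], [10, 10])
```

With equal variances and equal group sizes the Welch-Satterthwaite value equals the pooled degrees of freedom; with unequal variances it shrinks, which reflects the extra uncertainty that the residual method ignores.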

Contributors: Huang, Ping-Chieh (Author) / Reiser, Mark R. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / Arizona State University (Publisher)

Created: 2020

A Study of Sun Devil Athletics’ Men’s Basketball with Information Related to Travel Partnership

Description

We created a database that Sun Devil Athletics (SDA) can use for extensive analysis and as a foundation for further development. The design of the database revolves around the men’s basketball team and includes data for conferences, teams, players, and the historical schedule of teams’ past performances. This design can serve as a template for other sports to be added to the database in the future. The queries we ran to test the functionality of the database show the utility and accessibility of the data currently stored in it. The accompanying visuals show how the query results can be transformed into figures that may be more visually appealing than the raw data. We came up with example questions that the SDA might have regarding current and past performance statistics. We expect that, as a continuation of this project, the SDA will be able to use the database to analyze and improve the performance levels of other teams.

Contributors: Sundar, Mayuri (Co-author) / Adusei, Evans (Co-author) / Consalvo, Joshua (Co-author) / Saunders, Wyatt (Co-author) / Wilson, Zechariah (Co-author) / Moser, Kathleen (Thesis director) / Wilson, Jeffrey (Committee member) / School of Social Transformation (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Data Analytics in College Sports: How Statistics Can be Used to Predict Sun Devil Success

Description

College athletics are a multibillion-dollar industry featuring hard-working student-athletes competing at a high level for national championships across a variety of sports. Across the college sports landscape, coaches and players are always seeking an edge that gives them a competitive advantage over their opponents. While this may sound nefarious, the vast amounts of data about these games and student-athletes can be used to glean insights about the sports themselves and help student-athletes be more successful. Data analytics can make sense of the available data by creating models and using other tools that predict how student-athletes and their teams will do in the future based on how they have performed in the past. Colleges and universities across the country compete in a vast array of sports. As a result of these differences, the sports with the most data available are the more popular college sports, such as football, men’s and women’s basketball, baseball, and softball. Arizona State University, as a member of the Pac-12 conference, has a storied athletic tradition and decades of history in all of these sports, providing a large amount of data that can be used to analyze student-athlete success in these sports and help predict future success. However, data is also available from numerous other college athletic programs, which could provide a much larger sample to help predict with greater accuracy why certain teams and student-athletes are more successful than others. The explosion of analytics across the sports world has resulted in a new focus on using statistical techniques to improve all aspects of different sports. Sports science has influenced medical departments, and model-building has been used to determine optimal in-game strategy and predict the outcomes of future games based on team strength. It is this latter approach that is the focus of this paper, with football used as the subject due to its vast popularity and massive supply of easily accessible data.

Contributors: Lindstrom, Trent (Author) / Schneider, Laurence (Thesis director) / Wilson, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Historical, Philosophical & Religious Studies, Sch (Contributor) / School of Politics and Global Studies (Contributor)

Created: 2022-05

Lindstrom Thesis (Spring 2022)

Description

College athletics are a multibillion-dollar industry featuring hard-working student-athletes competing at a high level for national championships across a variety of sports. Across the college sports landscape, coaches and players are always seeking an edge that gives them a competitive advantage over their opponents. While this may sound nefarious, the vast amounts of data about these games and student-athletes can be used to glean insights about the sports themselves and help student-athletes be more successful. Data analytics can make sense of the available data by creating models and using other tools that predict how student-athletes and their teams will do in the future based on how they have performed in the past. Colleges and universities across the country compete in a vast array of sports. As a result of these differences, the sports with the most data available are the more popular college sports, such as football, men’s and women’s basketball, baseball, and softball. Arizona State University, as a member of the Pac-12 conference, has a storied athletic tradition and decades of history in all of these sports, providing a large amount of data that can be used to analyze student-athlete success in these sports and help predict future success. However, data is also available from numerous other college athletic programs, which could provide a much larger sample to help predict with greater accuracy why certain teams and student-athletes are more successful than others. The explosion of analytics across the sports world has resulted in a new focus on using statistical techniques to improve all aspects of different sports. Sports science has influenced medical departments, and model-building has been used to determine optimal in-game strategy and predict the outcomes of future games based on team strength. It is this latter approach that is the focus of this paper, with football used as the subject due to its vast popularity and massive supply of easily accessible data.

Contributors: Lindstrom, Trent (Author) / Schneider, Laurence (Thesis director) / Wilson, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05
