Matching Items (44)

The Value Added of the ASU Spirit Squad to Sun Devil Athletics

Description

Several studies on cheerleading as a sport can be found in the literature; however, no research has examined the value cheerleading adds to the university experience, to an athletic department, or to a particular sport. A common perception is that collegiate and professional cheerleaders do not receive appropriate recognition or credit for the amount of work they do. Their contribution is sometimes questioned, and its benefits are believed to vary with the school, the university, or the professional team involved. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference level and polled spectators at sporting events such as basketball and football. We found that the university administration and athletic personnel see the ASU Spirit Squad as adding value, but spectators had a different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and for the entertainment. They enjoy watching the ASU Spirit Squad perform but would continue to attend ASU sporting events even if cheerleaders and dancers were not there.

Date Created
2017-05

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++ and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
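
The two testing schemes described in this abstract can be illustrated with a toy sketch. It does not use TestU01 or its API: a simple mean test stands in for the serial tests, a Kolmogorov-Smirnov distance stands in for the uniformity check on P-values, and all function names are hypothetical. The cross-stream correlation at the end is only an assumed example of a check that treats the streams as multivariate data.

```python
import math
import random


def mean_test_pvalue(stream):
    """Two-sided p-value for H0: the draws have mean 0.5 (normal approximation)."""
    n = len(stream)
    z = (sum(stream) / n - 0.5) / math.sqrt(1.0 / (12.0 * n))
    return math.erfc(abs(z) / math.sqrt(2.0))


def ks_uniform_statistic(values):
    """Kolmogorov-Smirnov distance between a sample and the U(0,1) distribution."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def scheme_one(streams):
    """Scheme 1: test each stream on its own, then check the p-values for uniformity."""
    pvalues = [mean_test_pvalue(s) for s in streams]
    return ks_uniform_statistic(pvalues)


def scheme_two(streams):
    """Scheme 2: concatenate all streams and apply the serial test once."""
    combined = [x for s in streams for x in s]
    return mean_test_pvalue(combined)


def cross_correlation(a, b):
    """Sample Pearson correlation between two equal-length streams."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rng = random.Random(1)
    stream_a = [rng.random() for _ in range(2000)]
    # Assumed worst case: the second stream is a deterministic function of the first.
    stream_b = [1.0 - x for x in stream_a]
    print("scheme 1 KS distance:", scheme_one([stream_a, stream_b]))
    print("scheme 2 p-value:", scheme_two([stream_a, stream_b]))
    print("cross-stream correlation:", cross_correlation(stream_a, stream_b))
```

In this constructed case each stream is marginally uniform and the concatenated mean is almost exactly 0.5, so the stand-in serial test on the concatenated stream sees nothing unusual, while the cross-stream correlation is essentially -1.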

Date Created
2013

A continuous latent factor model for non-ignorable missing data in longitudinal studies

Description

Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible in many cases. For example, suppose an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. In other words, the missing mechanism is related to the unobserved responses. Data that are not missing at random may produce bias in results if the missing mechanism is not taken into account. Data are said to be non-ignorable missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used for longitudinal data analysis because they do not require explicit specification of the missing mechanism; the data are stratified according to a variety of missing patterns and a model is specified for each stratum. However, this usually results in under-identifiability, because many stratum-specific parameters must be estimated even though the eventual interest usually lies in the marginal parameters. Pattern-mixture models also have the drawback that a large sample is usually required. In this thesis, two studies are presented. The first study is motivated by an open problem from pattern-mixture models. Simulation studies from this part show that information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted for by a simple latent factor.
Simulation findings obtained in the first study lead to a novel model, a continuous latent factor model (CLFM). The second study develops the CLFM, which models the joint distribution of missing values and longitudinal outcomes. The proposed CLFM is feasible even for small-sample applications. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance and evaluation are studied through designed simulations and three applications. Simulation and application settings range from a correctly specified missing data mechanism to a mis-specified mechanism and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing data mechanism and are used for a sensitivity analysis; the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has complete data and is used for a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, compared with Roy's latent class model and the classic linear mixed model. This advantage is more pronounced in small samples, where Roy's model experiences challenges with estimation convergence. The proposed CLFM is also robust when missing data are ignorable, as demonstrated through the study on Growth of Language and Early Literacy Skills in Preschoolers.
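
The depression example in this abstract can be made concrete with a small simulation. This is a minimal sketch under assumed values, not the thesis's model: the score distribution, the logistic dropout rule, and the function name `simulate_dropout` are all hypothetical. It only shows why a complete-case average is biased when the chance of a missed visit depends on the unobserved score.

```python
import math
import random


def simulate_dropout(n=20000, seed=0):
    """Simulate one visit where severe cases are more likely to be missing.

    Returns (true_mean, observed_mean, missing_rate).
    """
    rng = random.Random(seed)
    # Hypothetical depression scores: higher means more severe.
    scores = [rng.gauss(50.0, 10.0) for _ in range(n)]

    observed = []
    for score in scores:
        # Assumed non-ignorable mechanism: the probability of missing the visit
        # rises with the (unobserved) score through a logistic curve.
        p_miss = 1.0 / (1.0 + math.exp(-(score - 55.0) / 5.0))
        if rng.random() >= p_miss:
            observed.append(score)

    true_mean = sum(scores) / n
    observed_mean = sum(observed) / len(observed)
    missing_rate = 1.0 - len(observed) / n
    return true_mean, observed_mean, missing_rate


if __name__ == "__main__":
    true_mean, observed_mean, missing_rate = simulate_dropout()
    print("mean over all subjects:      ", round(true_mean, 2))
    print("mean over observed subjects: ", round(observed_mean, 2))
    print("fraction of visits missing:  ", round(missing_rate, 2))
```

Because the most severe patients drop out most often, the average over the observed visits understates the average over all subjects, which is the bias that ignoring the missing mechanism would introduce.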

Date Created
2013

Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAMs teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. 
Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.

Date Created
2013

Three-level multiple imputation: a fully conditional specification approach

Description

Currently, there is a clear gap in the missing data literature for three-level models. To date, the literature has focused only on the theoretical and algorithmic work required to implement three-level imputation using the joint model (JM) method of imputation, leaving almost no work on the fully conditional specification (FCS) method. Moreover, the literature lacks any methodological evaluation of three-level imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm to implement FCS in the context of a three-level model and (2) to evaluate both imputation methods. The simulation investigated a random intercept model under both 20% and 40% missing data rates. The findings of this thesis suggest that the estimates for both JM and FCS were largely unbiased, gave good coverage, and produced similar results. The sole exception for both methods was the slope for the level-3 variable, which was modestly biased. The bias exhibited by the methods could be due to the small number of clusters used. This finding suggests that future research ought to investigate and establish clear recommendations for the number of clusters required by these imputation methods. To conclude, this thesis serves as a preliminary start in tackling a much larger issue and gap in the current missing data literature.

Date Created
2015

The impact of partial measurement invariance on between-group comparisons of latent means for a second-order factor

Description

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).

When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts was consistent with the direction of the latent mean difference, and vice versa. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups, even when they were unequal in the population, led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive for detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained.

Date Created
2016

A test and confidence set for comparing the location of quadratic growth curves

Description

Quadratic growth curves, modeled as second-degree polynomials, are widely used in longitudinal studies. For a second-degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region as well as confidence intervals for the x- and y-coordinates of the vertex using two methods, the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models, a direct test on the vertex is required. We present a quadratic-form statistic for a test of the null hypothesis that there is no shift in the location of the vertex in a linear mixed model. The statistic has an asymptotic chi-squared distribution. For second-degree polynomials of two independent samples, we present an approximate confidence region for the difference of vertices of two quadratic growth curves using the modified gradient method and the delta method. Another chi-square test statistic is derived for a direct test on the vertex and is compared to an F test statistic for the indirect test. Power functions are derived for both the indirect F test and the direct chi-square test. We calculate the theoretical power and present a simulation study to investigate the power of the tests. We also present a simulation study to assess the influence of sample size, measurement occasions, and the nature of the random effects. The test statistics are applied to the Tell Efficacy longitudinal study, in which sound identification scores and language protocol scores for children are modeled as quadratic growth curves for two independent groups, TELL and control curriculum. The interpretation of a shift in the location of the vertices is also presented.
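
As a rough illustration of the delta method mentioned in this abstract, the sketch below computes the vertex of y = b0 + b1*x + b2*x^2 and a first-order standard error for its x-coordinate. It is a textbook delta-method calculation under assumed inputs, not the thesis's derivation for the linear mixed model; the function names and the example coefficient values are hypothetical.

```python
import math


def vertex(b0, b1, b2):
    """Vertex (x_v, y_v) of the quadratic y = b0 + b1*x + b2*x**2, with b2 != 0."""
    x_v = -b1 / (2.0 * b2)
    y_v = b0 - b1 ** 2 / (4.0 * b2)
    return x_v, y_v


def vertex_x_delta_se(b1, b2, var_b1, var_b2, cov_b1_b2):
    """Delta-method standard error of x_v = -b1 / (2*b2).

    The inputs are the estimated variances and covariance of (b1, b2).
    """
    # Partial derivatives of x_v with respect to b1 and b2.
    g1 = -1.0 / (2.0 * b2)
    g2 = b1 / (2.0 * b2 ** 2)
    variance = g1 * g1 * var_b1 + g2 * g2 * var_b2 + 2.0 * g1 * g2 * cov_b1_b2
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical estimates, chosen only to make the arithmetic easy to follow.
    x_v, y_v = vertex(b0=5.0, b1=-4.0, b2=1.0)
    se = vertex_x_delta_se(b1=-4.0, b2=1.0, var_b1=0.04, var_b2=0.01, cov_b1_b2=0.0)
    print("vertex:", (x_v, y_v))
    print("approximate 95% interval for x_v:", (x_v - 1.96 * se, x_v + 1.96 * se))
```

The interval uses a normal approximation, which matches the asymptotic flavor of the abstract but would need the model-based covariance matrix in any real analysis.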

Date Created
2015

Threshold regression estimation via lasso, elastic-net, and lad-lasso: a simulation study with applications to urban traffic data

Description

Threshold regression is used to model regime-switching dynamics in which the effects of the explanatory variables in predicting the response variable depend on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods use an iterative search procedure that seeks to minimize the sum of squares criterion. However, when unnecessary variables are included in the model or certain variables drop out of the model depending on the regime, this method may have high variability. This paper proposes Lasso-type methods as an alternative to ordinary least squares. By incorporating an L_{1} penalty term, Lasso methods perform variable selection, thus potentially reducing some of the variance in estimating the threshold parameter. This paper discusses the results of a study in which two different underlying model structures were simulated. The first is a regression model with correlated predictors, whereas the second is a self-exciting threshold autoregressive model. Finally, the proposed Lasso-type methods are compared to conventional methods in an application to urban traffic data.
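
The conventional search procedure this abstract refers to can be sketched as a grid search over candidate thresholds. The code below is a minimal illustration assuming a single predictor and an ordinary least squares fit in each regime; the function names and the `min_per_regime` parameter are hypothetical, and the Lasso-type variants proposed in the paper are not implemented here.

```python
def ols_sse(xs, ys):
    """Residual sum of squares from a simple linear fit y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def estimate_threshold(q, xs, ys, min_per_regime=5):
    """Pick the threshold on q that minimizes the total two-regime sum of squares.

    q is the threshold variable; observations with q <= gamma form the lower
    regime and the rest form the upper regime.
    """
    best_gamma, best_sse = None, float("inf")
    for gamma in sorted(set(q)):
        low = [(x, y) for qi, x, y in zip(q, xs, ys) if qi <= gamma]
        high = [(x, y) for qi, x, y in zip(q, xs, ys) if qi > gamma]
        # Skip splits that leave too few points to fit a line in either regime.
        if len(low) < min_per_regime or len(high) < min_per_regime:
            continue
        sse = ols_sse(*zip(*low)) + ols_sse(*zip(*high))
        if sse < best_sse:
            best_gamma, best_sse = gamma, sse
    return best_gamma, best_sse


if __name__ == "__main__":
    # Noise-free toy data with a regime change at 2.0 (assumed for illustration).
    q = [i / 10 for i in range(50)]
    y = [1 + 2 * v if v <= 2.0 else 10 - v for v in q]
    print(estimate_threshold(q, q, y))
```

A Lasso-type version would replace the per-regime least squares fit with a penalized fit, so that irrelevant predictors are shrunk to zero before the sums of squares are compared.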

Date Created
2015

A comparison of DIMTEST and generalized dimensionality discrepancy approaches to assessing dimensionality in item response theory

Description

Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions. A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Similar to previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. 
Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items. The study was somewhat limited in that it did not include any complex factorial designs, nor were the strength of item discrimination parameters or correlation between factors manipulated. It is recommended that further research be conducted with the inclusion of these factors, as well as an increase in the number of replications when using the SGDDM procedure.

Date Created
2013

Regression analysis of grouped counts and frequencies using the generalized linear model

Description

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. 
Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.

Date Created
2012