Matching Items (18)

The Value Added of the ASU Spirit Squad to Sun Devil Athletics

Description

Several studies on cheerleading as a sport can be found in the literature; however, no research has examined the value that cheerleading adds to the experience at a university, to an athletic department, or to a particular sport. Collegiate and professional cheerleaders are widely felt to receive neither appropriate recognition nor credit for the amount of work they do, and their contribution is sometimes questioned because it is thought to vary across schools, sports teams, and professional organizations. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference levels and polled spectators at sporting events such as basketball and football games. We found that university administration and athletic personnel see the ASU Spirit Squad as adding value, but spectators had a very different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and to be entertained; they enjoy watching the ASU Spirit Squad perform, but they would continue to attend even if cheerleaders and dancers were not there.

Date Created
2017-05

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel-generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel-generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation-motivated multivariate tests and tests based on vector time series. These methods have been implemented in an extension to TestU01 built in C++, and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
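
To make the first scheme concrete, the sketch below applies a serial Kolmogorov-Smirnov test to each simulated stream and then tests the resulting P-values for uniformity. This is an illustrative Python rendering using NumPy and SciPy, not TestU01's own C implementation, and the stream count, stream length, and choice of serial test are assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    stream_count, stream_len = 64, 10_000

    # Stand-in for parallel generator output: one row per processor's stream.
    streams = rng.random((stream_count, stream_len))

    # Stage 1: serial test (Kolmogorov-Smirnov against U(0,1)) on each stream.
    p_values = np.array([stats.kstest(s, "uniform").pvalue for s in streams])

    # Stage 2: if the streams are independent and individually uniform,
    # the per-stream P-values should themselves look like a U(0,1) sample.
    second_level = stats.kstest(p_values, "uniform")
    print(f"second-level KS p-value: {second_level.pvalue:.3f}")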

Date Created
2013

Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constituent trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.
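
As a rough illustration of ranking effects with a forest-based importance measure rather than EBLUPs, the sketch below fits a random forest to simulated student scores with one dummy variable per teacher and ranks teachers by scikit-learn's built-in impurity importance. It uses an existing VIM, not the node-proportion or covariate-proportion measures proposed here, and all data-generating values are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_teachers, students_per_teacher = 20, 30
    true_effects = rng.normal(0.0, 1.0, n_teachers)          # hypothetical teacher effects

    teacher_id = np.repeat(np.arange(n_teachers), students_per_teacher)
    X = np.eye(n_teachers)[teacher_id]                       # one dummy column per teacher
    y = true_effects[teacher_id] + rng.normal(0.0, 1.0, len(teacher_id))

    forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
    vim_ranking = np.argsort(-forest.feature_importances_)   # teachers ranked by VIM
    true_ranking = np.argsort(-np.abs(true_effects))         # ranked by |true effect|
    print("VIM ranking :", vim_ranking[:5])
    print("true ranking:", true_ranking[:5])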

Date Created
2013

Three-level multiple imputation: a fully conditional specification approach

Description

Currently, there is a clear gap in the missing data literature for three-level models. To date, the literature has focused only on the theoretical and algorithmic work required to implement three-level imputation using the joint model (JM) method of imputation, leaving essentially no work on the fully conditional specification (FCS) method. Moreover, the literature lacks any methodological evaluation of three-level imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm to implement FCS in the context of a three-level model and (2) to evaluate both imputation methods. The simulation investigated a random intercept model under both 20% and 40% missing data rates. The findings of this thesis suggest that the estimates for both JM and FCS were largely unbiased, gave good coverage, and produced similar results. The sole exception for both methods was the slope for the level-3 variable, which was modestly biased. The bias exhibited by the methods could be due to the small number of clusters used. This finding suggests that future research ought to investigate and establish clear recommendations for the number of clusters required by these imputation methods. To conclude, this thesis serves as a preliminary start in tackling a much larger issue and gap in the current missing data literature.
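
The core of an FCS algorithm is a loop that cycles through incomplete variables, models each conditionally on the others, and replaces missing entries with draws from the fitted conditional model. The single-level linear sketch below illustrates only that idea; the three-level algorithm developed in the thesis additionally carries level-2 and level-3 random effects, which are omitted here, and the iteration count is an arbitrary choice.

    import numpy as np

    def fcs_impute(X, n_iter=20, rng=None):
        """Single-level FCS sketch: X is a 2-D float array with np.nan for missing."""
        rng = rng if rng is not None else np.random.default_rng(0)
        miss = np.isnan(X)
        filled = np.where(miss, np.nanmean(X, axis=0), X)    # start from column means
        for _ in range(n_iter):
            for j in range(X.shape[1]):
                if not miss[:, j].any():
                    continue
                obs = ~miss[:, j]
                Z = np.delete(filled, j, axis=1)
                Z = np.column_stack([np.ones(len(Z)), Z])    # add an intercept
                beta, *_ = np.linalg.lstsq(Z[obs], filled[obs, j], rcond=None)
                resid_sd = np.std(filled[obs, j] - Z[obs] @ beta)
                draws = Z[miss[:, j]] @ beta + rng.normal(0.0, resid_sd, miss[:, j].sum())
                filled[miss[:, j], j] = draws                # impute with a stochastic draw
        return filled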

Date Created
2015

The impact of partial measurement invariance on between-group comparisons of latent means for a second-order factor

Description

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).

When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts was consistent with the direction of the latent mean difference, and vice versa. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups, even when they were unequal in the population, led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive in detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained.
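
For reference, the sketch below shows two of the quantities used above: the chi-square difference test between nested invariance models and the RMSEA index computed from a model chi-square. The numbers passed in are hypothetical, and the RMSEA formula shown is the common single-group N - 1 version; multiple-group software may apply an additional group correction.

    from math import sqrt
    from scipy.stats import chi2

    def chisq_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
        """Compare a model with equality constraints to the freely estimated model."""
        delta_chi2 = chi2_constrained - chi2_free
        delta_df = df_constrained - df_free
        return delta_chi2, delta_df, chi2.sf(delta_chi2, delta_df)

    def rmsea(chi2_value, df, n):
        """Root mean square error of approximation (single-group, N - 1 form)."""
        return sqrt(max(chi2_value - df, 0.0) / (df * (n - 1)))

    # Hypothetical fit statistics: constraining second-order loadings adds 4 df.
    print(chisq_difference_test(112.4, 52, 101.9, 48))
    print(round(rmsea(112.4, 52, 500), 4))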

Date Created
2016

Some topics concerning the singular value decomposition and generalized singular value decomposition

Description

This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data and (iii) formulation of a partial canonical correlation concept for continuous time stochastic processes. The finite dimensional SVD has an infinite dimensional generalization to compact operators. However, the form of the finite dimensional GSVD developed in, e.g., Van Loan does not extend directly to infinite dimensions as a result of a key step in the proof that is specific to the matrix case. Thus, the first problem of interest is to find an infinite dimensional version of the GSVD. One such GSVE for compact operators on separable Hilbert spaces is developed. The second problem concerns regularization parameter estimation. The chi-squared method for nonnormal data is considered. A form of the optimized regularization criterion that pertains to measured data or signals with nonnormal noise is derived. Large sample theory for phi-mixing processes is used to derive a central limit theorem for the chi-squared criterion that holds under certain conditions. Departures from normality are seen to manifest in the need for a possibly different scale factor in normalization rather than what would be used under the assumption of normality. The consequences of our large sample work are illustrated by empirical experiments. For the third problem, a new approach is examined for studying the relationships between a collection of functional random variables. The idea is based on the work of Sunder that provides mappings to connect the elements of algebraic and orthogonal direct sums of subspaces in a Hilbert space. When combined with a key isometry associated with a particular Hilbert space indexed stochastic process, this leads to a useful formulation for situations that involve the study of several second order processes. In particular, using our approach with two processes provides an independent derivation of the functional canonical correlation analysis (CCA) results of Eubank and Hsing. For more than two processes, a rigorous derivation of the functional partial canonical correlation analysis (PCCA) concept that applies to both finite and infinite dimensional settings is obtained.
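
As background for the expansion viewpoint, the short NumPy check below rebuilds a matrix from its singular triples, the finite-dimensional singular value expansion whose operator analogue motivates the GSVE; the GSVE construction itself, which lives in the compact-operator setting, is not shown. The matrix and its dimensions are arbitrary.

    import numpy as np

    rng = np.random.default_rng(42)
    A = rng.standard_normal((6, 4))                       # arbitrary example matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rebuild A from its singular triples: A = sum_i s_i * u_i v_i^T.
    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
    print(np.allclose(A, A_rebuilt))                      # True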

Date Created
2012

A comparison of DIMTEST and generalized dimensionality discrepancy approaches to assessing dimensionality in item response theory

Description

Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions. A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Similar to previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items. The study was somewhat limited in that it did not include any complex factorial designs, nor were the strength of item discrimination parameters or correlation between factors manipulated. It is recommended that further research be conducted with the inclusion of these factors, as well as an increase in the number of replications when using the SGDDM procedure.
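
The sketch below simulates the kind of data to which these procedures are applied: dichotomous responses from a unidimensional 2PL IRT model under one of the small-scale conditions mentioned above (N = 250 examinees, J = 12 items). The item-parameter ranges are assumptions, and neither DIMTEST nor the SGDDM statistic is reproduced here.

    import numpy as np

    rng = np.random.default_rng(7)
    N, J = 250, 12
    theta = rng.standard_normal(N)                # unidimensional latent trait
    a = rng.uniform(0.8, 2.0, J)                  # discriminations (assumed range)
    b = rng.normal(0.0, 1.0, J)                   # difficulties

    # 2PL response probabilities and simulated 0/1 responses (N x J).
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    responses = (rng.random((N, J)) < p).astype(int)
    print(responses.shape, responses.mean(axis=0).round(2))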

Date Created
2013

Multivariate generalization of reduced major axis regression

Description

A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables for fitting a plane to data is obtained by minimizing the sum of a function of the volumes obtained by drawing, from each data point, lines parallel to each coordinate axis to the fitted plane (Draper and Yang 1997; Goodman and Tofallis 2003). Generalized RMA results for the multivariate case obtained by Draper and Yang (1997) are reviewed and some investigations of multivariate RMA are given. A linear model is proposed that does not specify a dependent variable and allows for errors in the measurement of each variable. Coefficients in the model are estimated by minimization of the function of the volumes previously mentioned. Methods for obtaining coefficient estimates are discussed and simulations are used to investigate the distribution of coefficient estimates. The effects of sample size, sampling error and correlation among variables on the estimates are studied. Bootstrap methods are used to obtain confidence intervals for model coefficients. Residual analysis is considered for assessing model assumptions. Outlier and influential case diagnostics are developed and a forward selection method is proposed for subset selection of model variables. A real data example is provided that uses the methods developed. Topics for further research are discussed.
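
For the bivariate case reviewed above, the RMA slope is simply sign(r) times the ratio of standard deviations, with the fitted line passing through the sample means. The sketch below implements that textbook formula on simulated data in which both variables carry measurement error; the multivariate volume-minimizing generalization is not shown, and the simulation settings are arbitrary.

    import numpy as np

    def rma_fit(x, y):
        """Bivariate reduced major axis fit: returns (intercept, slope)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        r = np.corrcoef(x, y)[0, 1]
        slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
        return y.mean() - slope * x.mean(), slope

    rng = np.random.default_rng(3)
    truth = rng.normal(size=200)
    x = truth + rng.normal(scale=0.3, size=200)   # x measured with error
    y = 2.0 * truth + rng.normal(scale=0.3, size=200)
    print(rma_fit(x, y))                          # slope should land near 2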

Date Created
2012

Public health surveillance in high-dimensions with supervised learning

Description

Public health surveillance is a special case of the general problem where counts (or rates) of events are monitored for changes. Modern data complements event counts with many additional measurements (geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge: detecting a change that occurs only within a region, initially unspecified, defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information and often fail to use all the information available in modern data, which can be paramount in unveiling these subtle changes. Additional complexities associated with modern health data that are often not accounted for by traditional methods include covariates of mixed type, missing values, and high-order interactions among covariates. This work proposes a transform of public health surveillance to supervised learning, so that an appropriate learner can inherently address all the complexities described previously. At the same time, quantitative measures from the learner can be used to define signal criteria to detect changes in rates of events. A Feature Selection (FS) method is used to identify covariates that contribute to a model and to generate a signal. A measure of statistical significance is included to control false alarms. An alternative Percentile method identifies the specific cases that lead to changes using class probability estimates from tree-based ensembles. This second method is intended to be less computationally intensive and significantly simpler to implement. Finally, a third method labeled Rule-Based Feature Value Selection (RBFVS) is proposed for identifying the specific regions in high-dimensional space where the changes are occurring. Results on simulated examples are used to compare the FS method and the Percentile method. Note that this work emphasizes the application of the proposed methods to public health surveillance. Nonetheless, these methods can easily be extended to a variety of applications where counts (or rates) of events are monitored for changes. Such problems commonly occur in domains such as manufacturing, economics, environmental systems, and engineering, as well as in public health.
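
The sketch below illustrates the basic transform to supervised learning: cases from a baseline window and a current window are labeled 0 and 1, a tree ensemble is fit to the event records, and the covariates that best separate the windows point to where the change occurred. The covariate names, the injected change, and the use of scikit-learn's impurity importances are illustrative assumptions, not the FS, Percentile, or RBFVS procedures themselves.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(11)
    n = 4000
    age_group = rng.integers(0, 5, n)
    region = rng.integers(0, 10, n)
    sex = rng.integers(0, 2, n)
    window = rng.integers(0, 2, n)                # 0 = baseline, 1 = current

    # Inject a localized rate increase: region 3, oldest age group, current window.
    rate = 0.05 + 0.10 * ((window == 1) & (region == 3) & (age_group == 4))
    event = rng.random(n) < rate

    # Supervised reformulation: among observed events, which covariates
    # distinguish the two windows?  High importance flags the region of change.
    X = np.column_stack([age_group, region, sex])[event]
    y = window[event]
    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    print(dict(zip(["age_group", "region", "sex"], clf.feature_importances_.round(3))))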

Date Created
2010

Chi-square orthogonal components for assessing goodness-of-fit of multidimensional multinomial data

Description

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack-of-fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects presented when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated with studies for detecting two- and three-way associations that are not accounted for by a categorical variable factor model with a single latent variable. In addition, the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Social Research at Michigan State University (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set.
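
For reference, the sketch below computes the omnibus statistic that is being partitioned: Pearson's X^2 for a two-way table under independence, written as a sum of squared Pearson residuals. The table is hypothetical, and the QR-based construction of orthogonal components on the marginals is specific to this dissertation and is not reproduced here.

    import numpy as np
    from scipy.stats import chi2

    observed = np.array([[30, 20, 10],
                         [20, 25, 15]])            # hypothetical 2 x 3 table
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    pearson_residuals = (observed - expected) / np.sqrt(expected)
    x2 = (pearson_residuals ** 2).sum()            # Pearson's chi-squared statistic
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    print(round(x2, 3), round(chi2.sf(x2, df), 4))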

Date Created
2011