Matching Items (22)
Description
Many longitudinal studies, especially clinical trials, suffer from missing data. Most estimation procedures assume that the missing values are ignorable, or missing at random (MAR). However, this assumption is an unrealistic simplification and is implausible in many cases. For example, suppose an investigator is examining the effect of a treatment on depression. Subjects are scheduled to see doctors on a regular basis and are asked questions about recent emotional situations. Patients experiencing severe depression are more likely to miss an appointment, leaving the data missing for that particular visit. Here the missingness mechanism is related to the unobserved responses, and data that are not missing at random may produce biased results if the mechanism is not taken into account. Data are said to be non-ignorably missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used in longitudinal data analysis because they do not require explicit specification of the missingness mechanism: the data are stratified according to the various missing-data patterns and a model is specified for each stratum. However, this usually results in under-identifiability, because many stratum-specific parameters must be estimated even though interest ultimately centers on the marginal parameters; as a result, pattern-mixture models typically require a large sample. In this thesis, two studies are presented. The first study is motivated by an open problem arising from pattern-mixture models. Simulation studies in this part show that the information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted for by a simple latent factor.
The simulation findings from the first study lead to a novel model, the continuous latent factor model (CLFM). The second study develops the CLFM for modeling the joint distribution of missing values and longitudinal outcomes. The proposed model is feasible even for small-sample applications. Detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance is evaluated through designed simulations and three applications. The simulation and application settings range from a correctly specified missing data mechanism to a mis-specified one, and include longitudinal studies of different sample sizes. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data carry no indication of the missing data mechanism and are used for a sensitivity analysis; and the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has fully complete data and is used for a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, than Roy's latent class model and the classic linear mixed model. This advantage is more pronounced for small sample sizes, where Roy's model has difficulty with estimation convergence. The CLFM is also shown to be robust when missing data are ignorable, as demonstrated in the Growth of Language and Early Literacy Skills in Preschoolers study.
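The first study's finding, that many missing-data patterns collapse onto a simple continuous latent structure, can be illustrated with a small simulation. The sketch below is not the thesis's code: it generates monotone dropout driven by a single latent severity score and checks, via principal components, how much of the variation in the missingness indicators a single component captures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a monotone-dropout longitudinal study: each subject drops out
# after a time governed by a single continuous latent severity score.
n, t = 200, 6
severity = rng.normal(size=n)                      # latent factor
dropout_time = np.clip((3 + 1.5 * severity).round().astype(int), 1, t)
R = (np.arange(t) < dropout_time[:, None]).astype(float)  # 1 = observed

# Principal components of the missing-data indicators: if one latent
# factor drives missingness, the first component dominates.
Rc = R - R.mean(axis=0)
_, s, _ = np.linalg.svd(Rc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(f"variance explained by first component: {explained[0]:.2f}")
```

In this setup a large share of the variance in the missingness indicators loads on the first component, mirroring the simulation finding that motivates a one-factor latent structure.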
ContributorsZhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)
Created2013
Description
It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational, or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered an omnibus test, as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack-of-fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects that arise when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large, complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated with studies for detecting two- and three-way associations that are not accounted for by a categorical variable factor model with a single latent variable.
In addition, the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Social Research at Michigan State University (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set.
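The core construction, orthogonal components of Pearson's statistic obtained through a QR decomposition, can be sketched in a few lines. The counts and cell probabilities below are made up, and the basis here spans all k-1 components rather than the marginal-based subsets the dissertation develops, but it shows how projecting standardized residuals onto an orthonormal basis orthogonal to the square-root probabilities decomposes X^2 exactly.

```python
import numpy as np

# Observed counts from a cross-classification, flattened to one vector,
# and expected cell probabilities under the fitted model (made up here).
obs = np.array([30, 22, 18, 14, 10, 6])
p = np.array([0.25, 0.20, 0.18, 0.15, 0.12, 0.10])
N = obs.sum()

# Standardized residuals: Pearson's X^2 is their sum of squares.
r = (obs - N * p) / np.sqrt(N * p)
X2 = float(r @ r)

# QR decomposition of [sqrt(p) | I] yields an orthonormal basis whose
# columns beyond the first are orthogonal to sqrt(p); projecting the
# residuals onto them gives k-1 orthogonal components of X^2.
k = len(p)
Q, _ = np.linalg.qr(np.column_stack([np.sqrt(p), np.eye(k)[:, :-1]]))
z = Q[:, 1:].T @ r            # orthogonal components
print(X2, float(np.sum(z**2)))  # the components reproduce X^2
```

A limited-information test keeps only the components tied to selected marginals instead of summing all of them, which is where the power gains for sparse tables come from.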
ContributorsMilovanovic, Jelena (Author) / Young, Dennis (Thesis advisor) / Reiser, Mark R. (Thesis advisor) / Wilson, Jeffrey (Committee member) / Eubank, Randall (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created2011
Description
The Pearson and likelihood ratio statistics are well-known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data are from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have low power and an inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has good performance for Type I error rate and statistical power even when applied to a sparse table. In this dissertation, goodness-of-fit statistics using orthogonal components based on second-, third-, and fourth-order marginals were examined. If lack-of-fit is present in higher-order marginals, then a test that incorporates the higher-order marginals may have higher power than a test that incorporates only first- and/or second-order marginals. To this end, two new statistics based on the orthogonal components of Pearson's chi-square that incorporate third- and fourth-order marginals were developed, and the Type I error, empirical power, and asymptotic power under different sparseness conditions were investigated. The use of individual orthogonal components as test statistics to identify the source of lack-of-fit was also studied, and their performance was compared with that of other popular lack-of-fit statistics. When the number of manifest variables becomes larger than 20, most of the statistics based on marginal distributions have limitations in terms of computer resources and CPU time.
Given this limitation, when the number of manifest variables is 20 or more, the performance of a bootstrap-based method for obtaining p-values for the Pearson-Fisher statistic, fit to a confirmatory dichotomous-variable factor analysis model, and the performance of the Tollenaar and Mooijaart (2003) statistic were investigated.
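A minimal sketch of the bootstrap idea for sparse tables, assuming a simple multinomial null in place of the fitted factor analysis model (and omitting the refitting step), might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def pearson_x2(obs, p):
    """Pearson's X^2 for observed counts against cell probabilities p."""
    e = obs.sum() * p
    return float(np.sum((obs - e) ** 2 / e))

# Null cell probabilities (a stand-in for the fitted model) and data.
p0 = np.array([0.4, 0.3, 0.2, 0.1])
obs = np.array([50, 25, 15, 10])
x2_obs = pearson_x2(obs, p0)

# Parametric bootstrap: resample tables from the null, recompute the
# statistic, and compare its resampled distribution to the observed value
# instead of relying on the asymptotic chi-square reference distribution.
B, N = 2000, obs.sum()
boot = np.array([pearson_x2(rng.multinomial(N, p0), p0) for _ in range(B)])
p_value = float(np.mean(boot >= x2_obs))
print(f"bootstrap p-value: {p_value:.3f}")
```

In the full procedure each bootstrap table would be refit before recomputing the statistic, which is exactly what makes the method expensive for many manifest variables.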
ContributorsDassanayake, Mudiyanselage Maduranga Kasun (Author) / Reiser, Mark R. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St. Louis, Robert (Committee member) / Kamarianakis, Ioannis (Committee member) / Arizona State University (Publisher)
Created2018
Description
Over the course of six months, we worked in partnership with Arizona State University and a leading producer of semiconductor chips in the United States market (referred to as the "Company"), lending our skills in finance, statistics, model building, and external insight. We attempt to design models that help predict how much time it takes to implement a cost-saving project. These projects had previously been evaluated only on the merit of their cost savings; by adding the dimension of time, we hope to forecast implementation time from a number of variables. With such a forecast, we can then feed an expense project prioritization model that relates time and cost savings, compares many different projects simultaneously, and returns a series of present value calculations over different ranges of time. The goal is twofold: assist with an accurate prediction of a project's time to implementation, and provide a basis for comparing different projects by their present values, ultimately helping to reduce the Company's manufacturing costs and improve gross margins. We believe this approach, and the research conducted toward this goal, is most valuable for the Company. Two coaches from the Company provided assistance and clarified our questions when necessary throughout our research. In this paper, we begin by defining the problem, setting an objective, and establishing a checklist to monitor our progress. Next, our attention shifts to the data: making observations, trimming the dataset, and framing and scoping the variables to be used in the analysis portion of the paper. Before forming a hypothesis, we perform a preliminary statistical analysis of certain individual variables to enrich our variable selection process. After stating the hypothesis, we run multiple linear regressions with project duration as the dependent variable. After regression analysis and a test for robustness, we shift our focus to an intuitive model based on rules of thumb.
We relate these models to an expense project prioritization tool developed using Microsoft Excel software. Our deliverables to the Company come in the form of (1) a rules of thumb intuitive model and (2) an expense project prioritization tool.
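A hedged sketch of the two-stage workflow described above, with all predictor names and coefficients invented for illustration (they are not the Company's variables): a multiple linear regression predicts project duration, and the predicted duration delays the savings stream inside a present value calculation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical project data: an intercept, a scope-size score, and the
# number of departments involved, with duration in months.
n = 60
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n), rng.integers(1, 5, n)])
duration = X @ np.array([2.0, 0.8, 1.5]) + rng.normal(0, 1, n)

# Multiple linear regression with project duration as the dependent variable.
beta, *_ = np.linalg.lstsq(X, duration, rcond=None)

def present_value(annual_savings, months_to_implement, rate=0.10, years=5):
    """Discount annual savings that only begin after implementation."""
    delay = months_to_implement / 12.0
    return sum(annual_savings / (1 + rate) ** (delay + y)
               for y in range(1, years + 1))

# Prioritize a candidate project using its predicted implementation time.
pv = present_value(100_000, months_to_implement=float(X[0] @ beta))
print(beta, round(pv, 2))
```

Ranking many candidate projects by this present value is what lets cost savings and implementation time be compared on a single scale.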
ContributorsAl-Assi, Hashim (Co-author) / Chiang, Robert (Co-author) / Liu, Andrew (Co-author) / Ludwick, David (Co-author) / Simonson, Mark (Thesis director) / Hertzel, Michael (Committee member) / Barrett, The Honors College (Contributor) / Department of Information Systems (Contributor) / Department of Finance (Contributor) / Department of Economics (Contributor) / Department of Supply Chain Management (Contributor) / School of Accountancy (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / WPC Graduate Programs (Contributor)
Created2015-05
Description
Coherent vortices are ubiquitous structures in natural flows that affect the mixing and transport of substances, momentum, and energy. Being able to detect these coherent structures is important for pollutant mitigation, ecological conservation, and many other applications. In recent years, mathematical criteria and algorithms have been developed to extract these coherent structures in turbulent flows. In this study, we apply these tools to extract important coherent structures and analyze their statistical properties as well as their implications for the kinematics and dynamics of the flow. Such information will aid the representation of small-scale nonlinear processes that large-scale models of natural processes may not be able to resolve.
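As one concrete example of such mathematical criteria (the abstract does not name the thesis's specific tools), the Okubo-Weiss criterion flags regions where rotation dominates strain as vortex cores. The sketch below applies it to an analytic Gaussian vortex:

```python
import numpy as np

# Okubo-Weiss: W = s_n^2 + s_s^2 - omega^2; regions with W < 0 are
# rotation-dominated and flagged as coherent vortex cores.
x = np.linspace(-2, 2, 101)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2
u = -Y * np.exp(-r2)           # swirling velocity field decaying outward
v = X * np.exp(-r2)

dx = x[1] - x[0]
ux, uy = np.gradient(u, dx, dx, axis=(1, 0))
vx, vy = np.gradient(v, dx, dx, axis=(1, 0))

sn = ux - vy                   # normal strain
ss = vx + uy                   # shear strain
omega = vx - uy                # vorticity
W = sn**2 + ss**2 - omega**2

vortex_core = W < 0
print("vortex cells:", int(vortex_core.sum()))
```

The core of the analytic vortex (the grid center) satisfies W < 0, while the strain-dominated outer ring does not, which is the kind of partition such criteria provide before statistics are gathered over the detected structures.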
ContributorsCass, Brentlee Jerry (Author) / Tang, Wenbo (Thesis director) / Kostelich, Eric (Committee member) / Department of Information Systems (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
Description
Exchange traded funds (ETFs) are in many ways similar to more traditional closed-end mutual funds, although they differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality, and this mechanism is designed to minimize the deviations that occur between the ETF's listed price and the net asset value of the ETF's underlying assets. However, while this does cause ETF deviations to be generally lower than those of their mutual fund counterparts, as our paper explores, this process does not eliminate these deviations completely. This article builds off an earlier paper by Engle and Sarkar (2006) that investigates these properties of premiums (discounts) of ETFs from their fair market value, and looks to see whether these premia have changed in the last 10 years. Our paper then diverges from the original and takes a deeper look at the standard deviations of these premia specifically.

Our findings show that over 70% of an ETF's standard deviation of premia can be explained through a linear combination of two variables: a categorical variable (Domestic [US], Developed, Emerging) and a discrete variable (time difference from the US). This paper also finds that more traditional metrics such as market cap and ETF price volatility, and even third-party market indicators such as the economic freedom index and investment freedom index, are insignificant predictors of an ETF's standard deviation of premia when combined with the categorical variable. These findings differ somewhat from the existing literature, which indicates that these factors should be significant predictors of an ETF's standard deviation of premia.
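The model structure described, a categorical region variable plus a time-difference variable explaining the standard deviation of premia, can be sketched with dummy-variable regression. The data below are synthetic stand-ins generated so that those two variables drive the response; only the design, not the numbers, mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic ETFs: a region category and a time-zone difference from the
# US jointly drive the standard deviation of premia in this illustration.
n = 120
region = rng.integers(0, 3, n)          # 0=US, 1=Developed, 2=Emerging
time_diff = rng.integers(0, 13, n)      # hours offset from US trading
sd_premia = 0.2 + 0.3 * region + 0.02 * time_diff + rng.normal(0, 0.05, n)

# Design matrix: intercept, region dummies (US as baseline), time diff.
X = np.column_stack([
    np.ones(n),
    (region == 1).astype(float),
    (region == 2).astype(float),
    time_diff.astype(float),
])
beta, *_ = np.linalg.lstsq(X, sd_premia, rcond=None)

fitted = X @ beta
ss_res = np.sum((sd_premia - fitted) ** 2)
ss_tot = np.sum((sd_premia - sd_premia.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2: {r_squared:.2f}")
```

With the response generated this way the two variables alone explain most of the variance, analogous to the paper's over-70% finding for the real data.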
ContributorsZhang, Jingbo (Co-author) / Henning, Thomas (Co-author) / Simonson, Mark (Thesis director) / Licon, L. Wendell (Committee member) / Department of Finance (Contributor) / Department of Information Systems (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2019-05
Description
The object of the present study is to examine methods by which the company can optimize its costs for third-party suppliers who oversee other third-party trade labor. The third parties in the scope of this study are suspected to overstaff their workforce, thus overcharging the company. We introduce a complex spreadsheet model that proposes a proper project staffing level based on key qualitative variables and statistics. Using the model outputs, the Thesis team proposes a headcount solution for the company and identifies problem areas to focus on going forward. All sources of information come from company proprietary and confidential documents.
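Since the actual model and its inputs are proprietary, the sketch below is purely hypothetical: it shows one way a staffing benchmark might combine workload drivers into a proposed headcount that can be compared against current staffing. All variable names and weights are invented.

```python
import math

def proposed_headcount(weekly_task_hours, site_count, complexity_factor,
                       hours_per_worker=40.0):
    """Benchmark staffing level from workload drivers (hypothetical)."""
    # Scale task hours by job complexity and add fixed overhead per site.
    workload = weekly_task_hours * complexity_factor + 4.0 * site_count
    return math.ceil(workload / hours_per_worker)

# Compare a supplier's current headcount against the benchmark.
current = 12
proposed = proposed_headcount(weekly_task_hours=300, site_count=3,
                              complexity_factor=1.2)
overstaffing = current - proposed
print(proposed, overstaffing)
```

A positive gap between current and proposed headcount is the kind of signal that would flag a supplier as potentially overstaffed.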
ContributorsLoo, Andrew (Co-author) / Brennan, Michael (Co-author) / Sheiner, Alexander (Co-author) / Hertzel, Michael (Thesis director) / Simonson, Mark (Committee member) / Barrett, The Honors College (Contributor) / Department of Information Systems (Contributor) / Department of Finance (Contributor) / Department of Supply Chain Management (Contributor) / WPC Graduate Programs (Contributor) / School of Accountancy (Contributor)
Created2014-05
Description
Our research encompassed the prospect draft in baseball and looked at what type of player teams drafted to maximize value. We wanted to know which position returned the best value to the team that drafted them, and which level is safer to draft players from: college or high school. We decided to look at draft data from 2006-2010 for the first ten rounds of players selected. Because there is only a monetary cap on players drafted in the first ten rounds, we restricted our data to these players. Once we set up the parameters, we compiled a spreadsheet of these players with both their signing bonuses and their wins above replacement (WAR). This allowed us to see how much a team was spending per win at the major league level. After the data were compiled, we made pivot tables and graphs to visually represent the data and better understand the numbers. We found that the worst position MLB teams could draft is high school second baseman: they returned the lowest WAR of any group we examined. In general, high school players were more costly to sign and posted lower WARs than their college counterparts, making them, on average, a worse pick value-wise. The best position to pick was college shortstop. College shortstops had the best signability of all players along with some of the highest WARs and lowest signing bonuses, ranking near the top in all three of the main factors you want in a draft pick. This research can help provide guidelines to Major League teams as they select players in the draft. While there will always be exceptions to trends, by following the enclosed research teams can minimize risk in the draft.
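The bonus-per-WAR comparison underlying these conclusions can be sketched with grouped aggregation. The draft records below are invented, not the 2006-2010 data, so only the computation, cost per win grouped by position and level, reflects the study.

```python
from collections import defaultdict

# (position, level, signing bonus in $, career WAR) -- made-up records
picks = [
    ("SS", "college", 1_200_000, 6.0),
    ("SS", "college", 900_000, 4.5),
    ("2B", "high school", 800_000, 0.5),
    ("2B", "high school", 700_000, -0.2),
    ("OF", "college", 1_000_000, 3.0),
]

# Aggregate total bonus and total WAR per (position, level) group.
totals = defaultdict(lambda: [0.0, 0.0])
for pos, level, bonus, war in picks:
    totals[(pos, level)][0] += bonus
    totals[(pos, level)][1] += war

# Cost per win: lower is better; non-positive WAR groups get infinity.
cost_per_war = {g: (b / w if w > 0 else float("inf"))
                for g, (b, w) in totals.items()}
best = min(cost_per_war, key=cost_per_war.get)
print(best, cost_per_war[best])
```

Ranking groups by dollars spent per WAR is the pivot-table comparison the study describes, here surfacing college shortstops as the cheapest wins in the toy data.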
ContributorsValentine, Robert (Co-author) / Johnson, Ben (Co-author) / Eaton, John (Thesis director) / Goegan, Brian (Committee member) / Department of Finance (Contributor) / Department of Economics (Contributor) / Department of Information Systems (Contributor) / School of Accountancy (Contributor) / Barrett, The Honors College (Contributor)
Created2017-05
Description
Over the past several decades, analytics have become more and more prevalent in the game of baseball. Statistics are used in nearly every facet of the game, and each team develops its own processes hoping to gain a competitive advantage over the rest of the league. One area of the game that has struggled to produce definitive analytics is amateur scouting. This project seeks to address that problem through the creation of a new statistic, the Valued Plate Appearance Index (VPI). The problem is identified through analysis performed to determine whether any correlation exists between performances in the country's top amateur baseball league, the Cape Cod League, and performances in Major League Baseball. After several statistics were analyzed, almost no correlation was found between the two. This essentially means that teams have no statistical way to analyze Cape Cod League performance and project future statistics; an inherent contextual error in these amateur statistics prevents them from correlating. The project seeks to close that contextual gap and create concrete, encompassing values to illustrate a player's offensive performance in the Cape League. To solve this problem, data were collected from the 2017 CCBL season. In addition to VPI, the Valued Plate Appearance Approach (VPA) and Valued Plate Appearance Result (VPR) statistics were created to better depict a player's all-around performance in each plate appearance. VPA values the quality of a player's approach in each plate appearance. VPR values the quality of the contact result, excluding factors out of the hitter's control; this isolates player performance and eliminates luck that cannot normally be taken into account. This paper concludes by segmenting players from the 2017 CCBL into four different groups, which project how they will perform as they transition into professional baseball.
These groups and the creation of these statistics could be essential tools in the evaluation and projection of amateur players by Major League clubs for years to come.
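The abstract does not give the VPI formula, so the sketch below is a hypothetical scheme in the same spirit: score each plate appearance's approach (VPA) and contact result (VPR), combine them into a per-PA index, and average over a sample of plate appearances. All weights and categories are invented.

```python
# Hypothetical scoring tables: approach quality and contact quality.
APPROACH_WEIGHTS = {"walk": 1.0, "deep_count": 0.6, "early_swing_out": 0.1}
RESULT_WEIGHTS = {"barrel": 1.0, "solid": 0.7, "weak": 0.2, "none": 0.0}

def plate_appearance_index(approach, result, w_approach=0.4, w_result=0.6):
    """Combine approach and contact-quality scores for one PA (hypothetical)."""
    vpa = APPROACH_WEIGHTS[approach]    # VPA: quality of the approach
    vpr = RESULT_WEIGHTS[result]        # VPR: quality of the contact result
    return w_approach * vpa + w_result * vpr

# Average the per-PA index over a (tiny, invented) season sample.
season = [("walk", "none"), ("deep_count", "barrel"), ("early_swing_out", "weak")]
vpi = sum(plate_appearance_index(a, r) for a, r in season) / len(season)
print(round(vpi, 3))
```

Scoring contact quality separately from outcomes is what lets such an index strip out luck, the role the abstract assigns to VPR.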
ContributorsLothrop, Joseph Kent (Author) / Eaton, John (Thesis director) / McIntosh, Daniel (Committee member) / Department of Information Systems (Contributor) / Department of Marketing (Contributor) / Barrett, The Honors College (Contributor)
Created2017-12
Description
Several studies on cheerleading as a sport can be found in the literature; however, no research has examined the value cheerleading adds to the experience at a university, to an athletic department, or to a particular sport. There is a perception that collegiate and professional cheerleaders are not given appropriate recognition or credit for the amount of work they do, and their contribution is sometimes in question, with the perceived benefits varying by school and team. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference levels and polled spectators at sporting events such as basketball and football games. We found that the university administration and athletic personnel see the ASU Spirit Squad as adding value, but spectators had a totally different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and for the entertainment; they enjoy watching the ASU Spirit Squad perform but would continue to attend even if cheerleaders and dancers were not there.
ContributorsThomas, Jessica Ann (Author) / Wilson, Jeffrey (Thesis director) / Garner, Deana (Committee member) / Department of Supply Chain Management (Contributor) / Department of Marketing (Contributor) / School of Community Resources and Development (Contributor) / Barrett, The Honors College (Contributor)
Created2017-05