Matching Items (293)
Description

Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available when fitting hierarchical binary data models. The correlated observations negate the opportunity to form a joint likelihood when fitting hierarchical logistic regression models. Through the conditional likelihood, inferences for the regression and covariance parameters, as well as the intraclass correlation coefficients, are usually obtained. In those cases, I have resorted to the Laplace approximation and large-sample theory for point and interval estimates such as Wald-type confidence intervals and profile likelihood confidence intervals. These methods rely on distributional assumptions and large-sample theory. However, when dealing with small hierarchical datasets they often result in severe bias or non-convergence. I present a generalized quasi-likelihood approach and a generalized method of moments approach; neither relies on distributional assumptions, only on the moments of the response. As an alternative to the typical large-sample theory approach, I present bootstrapping for hierarchical logistic regression models, which provides more accurate interval estimates for small binary hierarchical data. These models substitute computation for the traditional Wald-type and profile likelihood confidence intervals. I use a latent variable approach with a new split bootstrap method for estimating intraclass correlation coefficients when analyzing binary data obtained from a three-level hierarchical structure. It is especially useful with small sample sizes and is easily extended to more levels. Comparisons are made to existing approaches through both theoretical justification and simulation studies.
Further, I demonstrate my findings through an analysis of three numerical examples: one based on cancer-in-remission data, one related to a study of antibiotic abuse in China, and a third related to teacher effectiveness in schools in a state in the southwestern US.
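The bootstrap idea summarized in this abstract can be illustrated with a generic cluster (case-resampling) bootstrap, which resamples whole clusters so that within-cluster correlation is preserved in each replicate. This is a minimal Python sketch on simulated data, not the dissertation's split bootstrap; the function names and toy statistic are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_bootstrap_ci(clusters, stat, n_boot=2000, alpha=0.05):
    """Percentile CI for `stat` by resampling whole clusters with
    replacement. Resampling clusters rather than individual observations
    preserves the within-cluster correlation of hierarchical binary data."""
    reps = np.empty(n_boot)
    idx = np.arange(len(clusters))
    for b in range(n_boot):
        resample = [clusters[i] for i in rng.choice(idx, size=len(idx))]
        reps[b] = stat(resample)
    return tuple(np.quantile(reps, [alpha / 2, 1 - alpha / 2]))

# toy hierarchical binary data: 20 clusters of 8 correlated Bernoulli draws
clusters = [rng.binomial(1, p, size=8) for p in rng.beta(2, 2, size=20)]
overall_rate = lambda cl: np.concatenate(cl).mean()
print(cluster_bootstrap_ci(clusters, overall_rate))
```

The same resampling loop works for any statistic of the clustered data, which is what makes the approach attractive when distributional assumptions are suspect.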
Contributors: Wang, Bei (Author) / Wilson, Jeffrey R (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Reiser, Mark R. (Committee member) / St Louis, Robert (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)
Created: 2017
Description

Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly account for different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, and its corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. It is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. This approach thus accounts both for the correlation between the multivariate outcomes and for the correlation due to time-dependency in longitudinal studies.
The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children.
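The ANOVA-based intraclass correlation mentioned in the first paper can be sketched for the simpler two-level case (the dissertation develops a three-level version). The estimator below is the standard one-way random-effects form applied to binary responses; names and data are illustrative.

```python
import numpy as np

def anova_icc(clusters):
    """One-way random-effects (ANOVA) estimator of the intraclass
    correlation for clustered binary responses: contrast between-cluster
    and within-cluster mean squares. Two-level sketch of the idea."""
    k = len(clusters)
    sizes = np.array([len(c) for c in clusters])
    N = sizes.sum()
    grand = np.concatenate(clusters).mean()
    means = np.array([c.mean() for c in clusters])
    msb = np.sum(sizes * (means - grand) ** 2) / (k - 1)
    msw = sum(((c - m) ** 2).sum() for c, m in zip(clusters, means)) / (N - k)
    n0 = (N - (sizes ** 2).sum() / N) / (k - 1)  # effective cluster size
    return (msb - msw) / (msb + (n0 - 1) * msw)

# perfectly clustered data: every response matches its cluster -> ICC = 1
perfect = [np.ones(6) for _ in range(5)] + [np.zeros(6) for _ in range(5)]
print(anova_icc(perfect))
```

A value near zero suggests clustering can be ignored; a large value signals that a multilevel model is warranted, which is exactly how the abstract describes using the measure.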
Contributors: Irimata, Kyle (Author) / Wilson, Jeffrey R (Thesis advisor) / Broatch, Jennifer (Committee member) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design can greatly facilitate the work of collecting and analyzing data. Many theoretical results have been obtained on a case-by-case basis, while in other situations researchers rely heavily on computational tools for design selection.

Three topics are investigated in this dissertation, each focusing on one type of GLM. Topic I considers GLMs with factorial effects and one continuous covariate. Factors may interact with one another, and there is no restriction on the possible values of the continuous covariate. The locally D-optimal design structures for such models are identified, and results for obtaining smaller optimal designs using orthogonal arrays (OAs) are presented. Topic II considers GLMs with multiple covariates under the assumptions that all but one covariate are bounded within specified intervals and that interaction effects among the bounded covariates may exist. An explicit formula for D-optimal designs is derived, and OA-based smaller D-optimal designs for models with one or two two-factor interactions are also constructed. Topic III considers multiple-covariate logistic models in which all covariates are nonnegative and there is no interaction among them. Two types of D-optimal design structures are identified, and their global D-optimality is proved using the celebrated equivalence theorem.
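For the logistic models discussed above, local D-optimality amounts to maximizing the log-determinant of the Fisher information X'WX over candidate designs, where the GLM weights depend on the assumed parameter values. A small Python sketch (candidate designs and parameter values are hypothetical, chosen for the classic two-parameter logistic model):

```python
import numpy as np

def logistic_info(points, weights, beta):
    """Fisher information X'WX for the logistic model eta = b0 + b1*x.
    The GLM weight p*(1-p) depends on beta, hence "locally" optimal."""
    X = np.column_stack([np.ones(len(points)), points])
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ np.diag(weights * p * (1 - p)) @ X

def d_criterion(points, weights, beta):
    """log-determinant of the information matrix; larger is better."""
    return np.linalg.slogdet(logistic_info(points, weights, beta))[1]

beta = np.array([0.0, 1.0])
w = np.array([0.5, 0.5])
# classic theory places the two support points near logit = +/-1.5434
for pts in ([-1.0, 1.0], [-1.5434, 1.5434], [-3.0, 3.0]):
    print(pts, round(d_criterion(np.array(pts), w, beta), 3))
```

Evaluating the criterion over a grid of candidate designs like this is the computational route; the dissertation's contribution is identifying the optimal structures analytically and certifying them via the equivalence theorem.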
Contributors: Wang, Zhongsheng (Author) / Stufken, John (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

The Pearson and likelihood ratio statistics are well known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data are from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have low power and an inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has good performance for Type I error rate and statistical power even when applied to a sparse table. In this dissertation, goodness-of-fit statistics using orthogonal components based on second-, third-, and fourth-order marginals were examined. If lack of fit is present in higher-order marginals, then a test that incorporates the higher-order marginals may have higher power than a test that incorporates only first- and/or second-order marginals. To this end, two new statistics based on the orthogonal components of Pearson's chi-square that incorporate third- and fourth-order marginals were developed, and the Type I error, empirical power, and asymptotic power under different sparseness conditions were investigated. Individual orthogonal components as test statistics to identify lack of fit were also studied, and their performance was compared to that of other popular lack-of-fit statistics. When the number of manifest variables becomes larger than 20, most of the statistics based on marginal distributions have limitations in terms of computer resources and CPU time.
To address this problem, for 20 or more manifest variables, the performance of a bootstrap-based method for obtaining p-values for the Pearson-Fisher statistic, fit to a confirmatory dichotomous-variable factor analysis model, and the performance of the Tollenaar and Mooijaart (2003) statistic were investigated.
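The sparseness problem motivating the marginal-based statistics can be seen directly: in a 2^q cross-classified table the full-table cells are nearly empty at realistic sample sizes, while pairwise margins remain well filled. A hedged Python illustration (not the dissertation's orthogonal-component statistics; the uniform null model and sample size are arbitrary):

```python
import numpy as np

def pearson_x2(observed, expected):
    """Pearson's chi-square summed over every cell of the table."""
    return float(((observed - expected) ** 2 / expected).sum())

def margin(table, i, j):
    """Collapse a 2**q table (shape (2,)*q) onto the (i, j) pairwise margin."""
    axes = tuple(k for k in range(table.ndim) if k not in (i, j))
    return table.sum(axis=axes)

q, n = 6, 200
rng = np.random.default_rng(1)
probs = np.full((2,) * q, 1.0 / 2 ** q)           # uniform null model
counts = rng.multinomial(n, probs.ravel()).reshape((2,) * q)
print("full-table X^2:", round(pearson_x2(counts, n * probs), 2))
print("smallest full-table cell:", counts.min())  # typically near 0: sparse
print("smallest pairwise-margin cell:", margin(counts, 0, 1).min())
```

With 64 cells and n = 200, expected counts are about 3 per cell, below the usual rule of thumb for the chi-square approximation, whereas each 2x2 pairwise margin expects about 50 per cell.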
Contributors: Dassanayake, Mudiyanselage Maduranga Kasun (Author) / Reiser, Mark R. (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St. Louis, Robert (Committee member) / Kamarianakis, Ioannis (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

In the presence of correlation, standard generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extra variation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using the generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchically structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels to address deviations in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis. These additional models identify covariates influencing the third and fourth moments. A cutoff for trimming the data is provided, which improves parameter estimation and model fit. For each topic, findings are demonstrated through comprehensive simulation studies and numerical examples. Examples evaluated include data on children's morbidity in the Philippines, adolescent health from the National Longitudinal Study of Adolescent to Adult Health, as well as proteomic assays for breast cancer screening.
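The idea of estimating a mean-variance relationship from moments alone, with no distributional assumption, can be sketched with a simple power-law form Var(y) = phi * mu^theta fitted by regressing log sample variances on log sample means. This is a deliberately simplified illustration, not the dissertation's GMM estimator; the function name and simulated data are hypothetical.

```python
import numpy as np

def fit_power_variance(groups):
    """Moment-based fit of Var(y) = phi * mu**theta across groups by
    regressing log sample variances on log sample means. Distribution-free:
    only the first two moments of the response are used."""
    mu = np.array([g.mean() for g in groups])
    v = np.array([g.var(ddof=1) for g in groups])
    theta, log_phi = np.polyfit(np.log(mu), np.log(v), 1)
    return np.exp(log_phi), theta

rng = np.random.default_rng(2)
# Poisson-like data has Var = mean, so theta should come out near 1
groups = [rng.poisson(m, size=500) for m in (2.0, 5.0, 10.0, 20.0, 40.0)]
phi, theta = fit_power_variance(groups)
print(round(phi, 2), round(theta, 2))
```

An estimated theta well away from the value implied by a candidate distribution is a signal of over- or under-dispersion, which is the situation the adjusted quasi-likelihood approach is built to handle.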
Contributors: Irimata, Katherine E (Author) / Wilson, Jeffrey R (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Engineering is a multidisciplinary field with a variety of applications. However, since there are so many disciplines of engineering, it is often challenging to find the discipline that best suits an individual interested in engineering. Not knowing which area of engineering most aligns with one's interests makes deciding on a major and a career difficult. The Engineering Interest Quiz (EIQ) was developed to help individuals find the field of engineering most similar to their interests. Initially, an Engineering Faculty Survey (EFS) was created to gather information from engineering faculty at Arizona State University (ASU) and to determine keywords that describe each field of engineering. With this list of keywords, the EIQ was developed. Data from the EIQ were used to compare engineering students' top three suggested disciplines with their current engineering major. The analysis showed that 70% of respondents had their major listed as one of their top three results and 30% did not. Of that 70%, 64% had their current major listed as the highest (or tied for the highest) percentage and 36% had their major listed as the second- or third-highest percentage. Furthermore, the EIQ data were compared between genders. Only 33% of the male students had their current major listed as their highest percentage, but 55% had their major as one of their top three results. Women had higher percentages: 63% listed their current major as their highest percentage and 81% listed it in the top three of their final results.
Contributors: Wagner, Avery Rose (Co-author) / Lucca, Claudia (Co-author) / Taylor, David (Thesis director) / Miller, Cindy (Committee member) / Chemical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description

The impact of the physical/chemical properties of gray water on microbial inactivation by chlorine was investigated by creating artificial gray water in the lab, varying specific components, and then measuring microbial inactivation. Gray water was made by taking autoclaved nanopure water, increasing the concentration of surfactants, the turbidity, and the concentration of organic content, and spiking E. coli grown in tryptic soy broth (TSB); chlorine was introduced using Clorox Disinfecting Bleach2. Bacteria were detected using tryptic soy agar (TSA), and E. coli was specifically detected using the selective medium Brilliance. The log inactivation of bacteria detected using TSA was shown to be inversely related to the turbidity of the solution. Complete inactivation of E. coli concentrations between 10^4 and 10^5 CFU/100 mL in gray water with turbidities between 10-100 NTU, 0.1-0.5 mg/L of humic acid, and 0.1 mL of Dawn Ultra was shown to occur, as detected by Brilliance, at chlorine concentrations of 1-2 mg/L within 30 seconds. These results correspond to concentration-time (CT) values between 0.5 and 1 (mg/L)·min. Under the same gray water conditions, with an E. coli concentration of 10^4 CFU/100 mL and a chlorine concentration of 0.01 mg/L, complete inactivation was shown to occur in all trials within two minutes, corresponding to CT values ranging from 0.005 to 0.02 (mg/L)·min. The turbidity and humic acid concentration were shown to be inversely related to the log inactivation and directly related to the CT value. This study shows that chlorination is a valid method of treating gray water for certain irrigation reuses.
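The CT figures quoted above are simply the disinfectant concentration multiplied by the contact time; a one-line check using the values stated in the abstract:

```python
def ct_value(chlorine_mg_per_l, contact_minutes):
    """Concentration-time product CT = C * t used to compare disinfection
    conditions: same CT, roughly same expected inactivation."""
    return chlorine_mg_per_l * contact_minutes

# 1-2 mg/L chlorine, complete inactivation within 30 s (0.5 min)
print(ct_value(1.0, 0.5), ct_value(2.0, 0.5))    # 0.5 1.0
# 0.01 mg/L chlorine, inactivation between 30 s and 2 min
print(ct_value(0.01, 0.5), ct_value(0.01, 2.0))  # 0.005 0.02
```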
Contributors: Greenberg, Samuel Gabe (Author) / Abbaszadegan, Morteza (Thesis director) / Schoepf, Jared (Committee member) / Alum, Absar (Committee member) / Chemical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description

Increasing energy and environmental problems underscore the need to develop renewable chemicals and fuels. Research worldwide has targeted using microbial systems on a commercial scale to synthesize valuable compounds. The goal of this project was to refactor and overexpress b6-f complex proteins in cyanobacteria to improve photosynthesis under dynamic light conditions. Improvements in the photosynthetic system can translate directly into higher yields of valuable compounds such as carotenoids, and into higher yields of biomass that can be used as an energy source. Four engineered strains of cyanobacteria were successfully constructed, overexpressing the corresponding four large subunits of the cytochrome b6-f complex. No significant changes were found in cell growth or pigment titer in the modified strains compared to the wild type. The growth assay will be performed at higher and/or dynamic light intensities, including natural light conditions, for further analysis.
Contributors: Nauroth, Benjamin (Author) / Varman, Arul (Thesis director) / Singharoy, Abhishek (Committee member) / Li, Han (Committee member) / Chemical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description

This research investigated the reliability of deionized water contact angle measurements on alumina powder using the Washburn method. This method relates the capillary rise of a liquid through a column of packed powder to the contact angle of the system. A reference liquid assumed to be perfectly wetting, such as hexane owing to its low surface energy, must be used for comparison with the tested liquid. It was hypothesized that consistency would be achieved with more structured packing of the powder and consistent packing between reference and test trials. The three types of packing structures explored in this study were unstructured, visually structured (user tapped), and machine-structured tapping. It was also hypothesized that similar contact angle results would be found for different packing methods of the same powder and liquid. However, the average contact angle for unstructured packing was found to be 32.9°, while the angle for the tapped structure was only 11.7°. This large deviation between types of packing shows that there are more inconsistencies in this method than just the regulation of the packing structure. Two similar glass chromatography columns were used, but the second column experienced an unknown interference that delayed the hexane uptake into the powder, which led to invalid contact angle calculations. There was no discernible relationship between the packing structure and the standard deviation between trials, so more structured packing does not appear to affect the consistency of results. It is recommended to perform more experiments on a single packing type with different apparatuses and a narrower particle size range.
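The Washburn calculation compares the slope of the mass-squared vs. time line for the test liquid against a perfectly wetting reference, so the unknown packing constant cancels. A Python sketch; the fluid properties are approximate textbook values for water and hexane, and the two slopes are hypothetical illustrative numbers, not data from this study.

```python
import math

def washburn_contact_angle(k_test, mu_test, rho_test, gamma_test,
                           k_ref, mu_ref, rho_ref, gamma_ref):
    """Washburn method: m^2 = (C * rho^2 * gamma * cos(theta) / mu) * t.
    Dividing the test-liquid slope k of m^2 vs t by the slope for a
    perfectly wetting reference (cos(theta_ref) = 1) cancels the packing
    constant C and isolates cos(theta). Units must be consistent."""
    cos_theta = (k_test * mu_test * rho_ref ** 2 * gamma_ref) / (
        k_ref * mu_ref * rho_test ** 2 * gamma_test)
    return math.degrees(math.acos(cos_theta))

# water (test) vs. hexane (reference), SI properties; the slopes k are
# hypothetical illustrative values, not measurements from this study
theta = washburn_contact_angle(
    k_test=2.68e-6, mu_test=0.89e-3, rho_test=997.0, gamma_test=72.8e-3,
    k_ref=1.00e-6, mu_ref=0.31e-3, rho_ref=655.0, gamma_ref=18.4e-3)
print(round(theta, 1))
```

Because C cancels only if the packing is identical between runs, any packing difference between the reference and test columns feeds straight into the computed angle, which is why packing consistency was the focus of the study.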
Contributors: Convery, Brittany Alexis (Author) / Emady, Heather (Thesis director) / Vajrala, Spandana (Committee member) / Chemical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-12
Description

The Experimental Data Processing (EDP) software is a C++ GUI-based application to streamline the process of creating a model for structural systems based on experimental data. EDP is designed to process raw data, filter the data for noise and outliers, create a fitted model to describe that data, complete a probabilistic analysis to describe the variation between replicates of the experimental process, and analyze the reliability of a structural system based on that model. In order to help design the EDP software to perform the full analysis, the probabilistic and regression modeling aspects of this analysis have been explored. The focus has been on creating and analyzing probabilistic models for the data, adding multivariate and nonparametric fits to raw data, and developing computational techniques that allow these methods to be properly implemented within EDP. For creating a probabilistic model of replicate data, the normal, lognormal, gamma, Weibull, and generalized exponential distributions have been explored. Goodness-of-fit tests, including the chi-squared, Anderson-Darling, and Kolmogorov-Smirnov tests, have been used to analyze the effectiveness of these probabilistic models in describing the variation of parameters between replicates of an experimental test. An example using Young's modulus data for a Kevlar-49 Swath stress-strain test demonstrates how this analysis is performed within EDP. In order to implement the distributions, numerical solutions for the gamma, beta, and hypergeometric functions were implemented, along with an arbitrary-precision library to store numbers that exceed the maximum size of double-precision floating-point digits. To create a multivariate fit, the multilinear solution was created as the simplest solution to the multivariate regression problem. This solution was then extended to solve nonlinear problems that can be linearized into multiple separable terms.
These problems were solved analytically with the closed-form solution for the multilinear regression, and then by using a QR decomposition to solve numerically while avoiding numerical instabilities associated with matrix inversion. For nonparametric regression, or smoothing, the loess method was developed as a robust technique for filtering noise while maintaining the general structure of the data points. The loess solution was created by addressing concerns associated with simpler smoothing methods, including the running mean, running line, and kernel smoothing techniques, and combining the ability of each of these methods to resolve those issues. The loess smoothing method involves weighting each point in a partition of the data set, and then adding either a line or a polynomial fit within that partition. Both linear and quadratic methods were applied to a carbon fiber compression test, showing that the quadratic model was more accurate but the linear model had a shape that was more effective for analyzing the experimental data. Finally, the EDP program itself was explored to consider its current functionalities for processing data, as described by shear tests on carbon fiber data, and the future functionalities to be developed. The probabilistic and raw data processing capabilities were demonstrated within EDP, and the multivariate and loess analysis was demonstrated using R. As the functionality and relevant considerations for these methods have been developed, the immediate goal is to finish implementing and integrating these additional features into a version of EDP that performs a full streamlined structural analysis on experimental data.
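The QR route to the multilinear least-squares solution described above can be sketched in a few lines (a generic Python illustration rather than EDP's C++ implementation):

```python
import numpy as np

def qr_least_squares(X, y):
    """Solve the multilinear regression min ||X b - y||_2 via QR:
    X = QR gives the triangular system R b = Q^T y. This avoids forming
    (X^T X)^{-1}, whose condition number is the square of X's and is the
    usual source of numerical instability in the closed-form solution."""
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.T @ y)

# recover known coefficients from noiseless simulated data
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
b_true = np.array([1.0, -2.0, 0.5])
b_hat = qr_least_squares(X, X @ b_true)
print(np.allclose(b_hat, b_true))   # True
```

Nonlinear models that linearize into separable terms fit the same mold: transform the data, build the design matrix X from the transformed terms, and solve the same triangular system.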
Contributors: Markov, Elan Richard (Author) / Rajan, Subramaniam (Thesis director) / Khaled, Bilal (Committee member) / Chemical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Ira A. Fulton School of Engineering (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05