Using antibodies to characterize healthy, disease, and age states

Description

The advent of new high-throughput technology allows for increasingly detailed characterization of the immune system in healthy, disease, and age states. The immune system is composed of two main branches, the innate and the adaptive immune systems, though the border between these two branches is appearing less distinct. The adaptive immune system is further split into two main categories: humoral and cellular immunity. The humoral immune response produces antibodies against specific targets, and these antibodies can be used to learn about disease and normal states. In this document, I use antibodies to characterize the immune system in two ways: (1) I determine the Antibody Status (AbStat) from data collected by applying sera to an array of non-natural sequence peptides, and demonstrate that this measure yields a single AbStat number for each sample and can distinguish between disease, normal, and aged samples; (2) I search for antigens for use in a cancer vaccine, and this search results in several candidates as well as a new hypothesis. Antibodies provide a powerful tool for characterizing the immune system, and this natural tool combined with emerging technologies allows us to learn more about healthy and disease states.

Contributors: Whittemore, Kurt (Author) / Sykes, Kathryn (Thesis advisor) / Johnston, Stephen A. (Committee member) / Jacobs, Bertram (Committee member) / Stafford, Phillip (Committee member) / Stout, Valerie (Committee member) / Arizona State University (Publisher)

Created: 2014

Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), the lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves. These curves were characterized with good empirical fit using nonlinear models with simple structures, which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types, visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach, which showed promising results towards less complex approximations. The benefits of external information were also explored to improve the image representation.

Contributors: Torres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)

Created: 2011

Computational approaches for addressing complexity in biomedicine

Description

The living world we inhabit and observe is extraordinarily complex. From the perspective of a person analyzing data about the living world, complexity is most commonly encountered in two forms: (1) in the sheer size of the datasets that must be analyzed and the number of mathematical computations necessary to obtain an answer, and (2) in the underlying structure of the data, which does not conform to classical normal theory statistical assumptions and includes clustering and unobserved latent constructs. Until recently, the methods and tools necessary to effectively address the complexity of biomedical data were not ordinarily available. This work presents the utility of four methods designed to help make sense of complex biomedical data: High Performance Computing, Monte Carlo Simulations, Multi-Level Modeling, and Structural Equation Modeling.

Contributors: Brown, Justin Reed (Author) / Dinu, Valentin (Thesis advisor) / Johnson, William (Committee member) / Petitti, Diana (Committee member) / Arizona State University (Publisher)

Created: 2012

Analysis of immunosignaturing case studies

Description

Immunosignaturing is a technology that allows the humoral immune response to be observed through the binding of antibodies to random sequence peptides. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides in a multiplexed fashion. There are computational and statistical challenges to the analysis of immunosignaturing data. The overall aim of my dissertation is to develop novel computational and statistical methods for immunosignaturing data to assess its potential for diagnostics and drug discovery. Firstly, I discovered that the Naive Bayes classification algorithm, which leverages the biological independence of the probes on our array in such a way as to gather more information, outperforms other classification algorithms in speed and accuracy. Secondly, using this classifier, I tested the specificity and sensitivity of the immunosignaturing platform for its ability to resolve four different diseases (pancreatic cancer, pancreatitis, type 2 diabetes, and PanIN) that target the same organ (the pancreas). These diseases were separated with >90% specificity from controls and from each other. Thirdly, I observed that the immunosignatures of type 2 diabetes and cardiovascular complications are unique, consistent, and reproducible and can be separated with 100% accuracy from controls. But when these two complications arise in the same person, the resultant immunosignature is quite different from that of individuals with only one disease. I developed a method to trace back from informative random peptides in disease signatures to the potential antigen(s). Hence, I built a decipher system to trace random peptides in the type 1 diabetes immunosignature to known antigens. Immunosignaturing, unlike ELISA, has the ability to detect not only the presence of a response but also the absence of a response during a disease. I observed that not only higher but also lower peptide intensities can be mapped to antigens in type 1 diabetes. To study the potential of immunosignaturing for population diagnostics, I studied the effect of age, gender, and geographical location on immunosignaturing data. To assess its potential as a health monitoring technology, I proposed a single metric, the coefficient of variation, that has shown potential to change significantly when a person enters a disease state.

Contributors: Kukreja, Muskan (Author) / Johnston, Stephen Albert (Thesis advisor) / Stafford, Phillip (Committee member) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2012

Supervised and ensemble classification of multivariate functional data: applications to lupus diagnosis

Description

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide-ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop (a) the family of ESFuNC segment-wise curve classification algorithms and (b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain that are critical for population discrimination. The analyses suggest that derivative-based information contributes significantly to improved classification performance relative to recently published studies on SLE plasma thermograms.

Contributors: Buscaglia, Robert, Ph.D (Author) / Kamarianakis, Yiannis (Thesis advisor) / Armbruster, Dieter (Committee member) / Lanchier, Nicholas (Committee member) / McCulloch, Robert (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2018

Family, "foreigners [untitled]: a bioarchaeological approach to social organization at late classic Copan

Description

In anthropological models of social organization, kinship is perceived to be fundamental to social structure. This project aimed to understand how individuals buried in neighborhoods or patio groups were affiliated, by considering multiple possibilities of fictive and biological kinship, short- or long-term co-residence, and long-distance kin affiliation. The social organization of the ancient Maya urban center of Copan, Honduras during the Late Classic (AD 600-822) period was evaluated through analysis of the human skeletal remains drawn from the largest collection yet recovered in Mesoamerica (n=1200). The research question was: What are the roles that kinship (biological or fictive) and co-residence play in the internal social organization of a lineage-based and/or house society? Biodistance and radiogenic strontium isotope analysis were combined to identify the degree to which individuals buried within 22 patio groups and eight neighborhoods were (1) related to one another and (2) of local or non-local origin. Copan was an ideal place to evaluate the nuances of migration and kinship, as the site is situated at the frontier of the Maya region and the edge of culturally diverse Honduras.

The results highlight the complexity of Copan’s social structure within the lineage and house models proposed for ancient Maya social organization. The radiogenic strontium data are diverse; the percentage of potential non-local individuals varied by neighborhood, some with only 10% in-migration while others approached 40%. The biodistance results show statistically significant differences between neighborhoods, between patios, and even between patios within one neighborhood. The high level of in-migration and biological heterogeneity are unique to Copan. Overall, these results highlight that the Copan community was created within a complex system that was influenced by multiple factors, where neither a lineage nor a house model is appropriate. It was a dynamic urban environment where genealogy, affiliation, and migration all affected the social structure.

Contributors: Miller, Katherine Anne (Author) / Buikstra, Jane E. (Thesis advisor) / Bell, Ellen E. (Committee member) / Stojanowski, Christopher M (Committee member) / Knudson, Kelly J. (Committee member) / Arizona State University (Publisher)

Created: 2015

Novel methods of biomarker discovery and predictive modeling using Random Forest

Description

Random forest (RF) is a popular and powerful technique. It can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals; however, these are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and is used as the basis for two novel methods, one for biomarker discovery and one for generating prediction intervals.

Firstly, a biodosimetry model is developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to deal with the day interaction effect, and a nested modeling technique was proposed. The nested models can fit these complex data, which show large variability and non-linear relationships.

Secondly, a panel of biomarkers was selected using a data-driven feature selection method as well as hand-picking, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method can incorporate domain knowledge as a penalty term to regulate the selection of candidate features in RF. It adds more flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide the selection of biomarkers. The method can also compete with existing methods when intrinsic data characteristics are used as an alternative to domain knowledge in simulated datasets.

Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets.

Contributors: Guan, Xin (Author) / Liu, Li (Thesis advisor) / Runger, George C. (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2017

Identifying Effective Strategies for Combatting COVID Misinformation in the Digital Age

Description

The unprecedented amount and sources of information during the COVID-19 pandemic resulted in an indiscriminate level of misinformation that was confusing and compromised healthcare access and delivery. The World Health Organization (WHO) called this an ‘infodemic’, and conspiracy theories and fake news about COVID-19 plagued public health efforts to contain the pandemic. National and international public health priorities expanded to counter misinformation. As a multi-disciplinary study encompassing expertise from public health, informatics, and communication, this research focused on eliciting strategies to better understand and combat misinformation on COVID-19. The study hypotheses are that (1) factors influencing vaccine acceptance, such as socio-demographic factors, COVID-19 knowledge, trust in institutions, and media-related factors, could be leveraged for public health education and intervention; and (2) individuals with a high level of knowledge regarding COVID-19 prevention and control have unique behaviors and practices, such as nuanced media literacy and validation skills, that could be promoted to improve vaccine acceptance and preventative health behaviors. In this biphasic study, an initial survey of 1,498 individuals sampled from Amazon Mechanical Turk (MTurk) assessed socio-demographic factors, an 18-item test of COVID-19 knowledge, trust in healthcare stakeholders, and measures of media literacy and consumption. Subsequently, using the Positive Deviance Framework, a diverse subset of 25 individuals with high COVID-19 knowledge scores were interviewed to identify the information and media practices that helped these deviants avoid COVID-19 misinformation. Access to primary care, higher educational attainment, and living in urban communities were positive socio-demographic predictors of COVID-19 vaccine acceptance, emphasizing the need to invest in education and rural health. High COVID-19 knowledge and trust in government and health providers were also critical factors and were associated with a higher level of trust in science and credible information sources such as the Centers for Disease Control and Prevention (CDC) and health experts. Positive deviants practiced media literacy skills that emphasized checking sources for scientific basis as well as hidden bias, cross-checking information across multiple sources, and verifying health information with scientific experts. These identified information validation and confirmation practices may be useful in educating the public and designing strategies to better protect communities against harmful health misinformation.

Contributors: Sivanandam, Shalini (Author) / Doebbeling, Bradley (Thesis advisor) / Koskan, Alexis (Committee member) / Roschke, Kristy (Committee member) / Chung, Yunro (Committee member) / Arizona State University (Publisher)

Created: 2023

Quantitative modeling methods for analyzing clinical to public health problems

Description

Statistical methods have been widely used to understand factors in clinical and public health data. Statistical hypothesis tests are procedures for testing pre-stated hypotheses. The development and properties of these procedures, as well as their performance, are based upon certain assumptions. Desirable properties of statistical tests are to maintain validity and to perform well even if these assumptions are not met. A statistical test that maintains such desirable properties is called robust. Mathematical models are typically mechanistic frameworks, used to study dynamic interactions between components (mechanisms) of a system and how these interactions give rise to changes in the behavior (patterns) of the system as a whole over time.

In this thesis, I have developed a study that uses novel techniques to link robust statistical tests and mathematical modeling methods, guided by limited data from developed and developing regions, in order to address pressing clinical and epidemiological questions of interest. The procedure in this study consists of three primary steps, namely, data collection, uncertainty quantification in data, and linking a dynamic model to the collected data.

The first part of the study focuses on designing, collecting, and summarizing empirical data from the only national survey of hospitals ever conducted regarding patient controlled analgesia (PCA) practices among 168 hospitals across 40 states, in order to assess risks before putting patients on PCA. I used statistical relational models and exploratory data analysis to address the question. The risk factors assessed indicate a great concern for the safety of patients from one healthcare institution to another.

In the second part, I quantify uncertainty associated with data obtained from James A Lovell Federal Healthcare Center, primarily to study the effect of Benign Prostatic Hypertrophy (BPH) on sleep architecture in patients with Obstructive Sleep Apnea (OSA). Patients with OSA and BPH demonstrated a significant difference in their sleep architecture in comparison to patients without BPH. One way to validate these differences in sleep architecture between the two groups may be to carry out a similar study that evaluates the effect of some other chronic disease on sleep architecture in patients with OSA.

Additionally, I address theoretical statistical questions such as (1) how to estimate the distribution of a variable in order to retest the null hypothesis when the sample size is limited, and (2) how changes in assumptions (like monotonicity and nonlinearity) translate into the effect of the independent variable on the outcome variable. To address these questions, I use multiple techniques such as Partial Rank Correlation Coefficients (PRCC) based sensitivity analysis, Fractional Polynomials, and statistical relational models.

In the third part, my goal was to identify socio-economic and environment-related risk factors for Visceral Leishmaniasis (VL) and use the identified critical factors to develop a mathematical model to understand VL transmission dynamics when data is highly underreported. I primarily studied the role of age-specific susceptibility and epidemiological quantities on the dynamics of VL in the Indian state of Bihar. Statistical results provided ideas on the choice of the modeling framework and estimates of model parameters.

In conclusion, this study addressed three primary theoretical modeling-related questions: (1) how to analyze collected data when the sample size is limited, and how modeling assumptions change the results of data analysis; (2) whether it is possible to identify hidden associations, and the nonlinearity of these associations, using such underpowered data; and (3) how statistical models provide a more reasonable structure for a mathematical modeling framework that can in turn be used to understand the dynamics of the system.

Contributors: Gonzalez, Beverly, 1980- (Author) / Castillo-Chavez, Carlos (Thesis advisor) / Mubayi, Anuj (Thesis advisor) / Nuno, Miriam (Committee member) / Arizona State University (Publisher)

Created: 2015

Essays on the Modeling of Binary Longitudinal Data with Time-dependent Covariates

Description

Longitudinal studies contain correlated data due to the repeated measurements on the same subject. The changing values of the time-dependent covariates and their association with the outcomes present another source of correlation. Most methods used to analyze longitudinal data average the effects of time-dependent covariates on outcomes over time and provide a single regression coefficient per time-dependent covariate. This denies researchers the opportunity to follow the changing impact of time-dependent covariates on the outcomes. This dissertation addresses this issue through the use of partitioned regression coefficients in three different papers.

In the first paper, an alternative approach to the partitioned Generalized Method of Moments logistic regression model for longitudinal binary outcomes is presented. This method relies on Bayes estimators and is utilized when the partitioned Generalized Method of Moments model provides numerically unstable estimates of the regression coefficients. It is used to model obesity status in the Add Health study and cognitive impairment diagnosis in the National Alzheimer’s Coordinating Center database.

The second paper develops a model that allows the joint modeling of two or more binary outcomes that provide an overall measure of a subject’s trait over time. The simultaneous modeling of all outcomes provides a complete picture of the overall measure of interest. This approach accounts for the correlation among and between the outcomes across time and the changing effects of time-dependent covariates on the outcomes. The model is used to analyze four outcomes measuring overall quality of life in the Chinese Longitudinal Healthy Longevity Study.

The third paper presents an approach that allows for estimation of cross-sectional and lagged effects of the covariates on the outcome as well as the feedback of the response on future covariates. This is done in two parts: in part 1, the effects of time-dependent covariates on the outcomes are estimated; in part 2, the influences of the outcome on future values of the covariates are measured. These model parameters are obtained through a Generalized Method of Moments procedure that uses valid moment conditions between the outcome and the covariates. Child morbidity in the Philippines and obesity status in the Add Health data are analyzed.

Contributors: Vazquez Arreola, Elsa Aimara (Author) / Wilson, Jeffrey R (Thesis advisor) / Hahn, Paul R (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2020
