The Impact of Race and Other Large-Scale Predictors on the Incidence of Melanoma Skin Cancer: A Biostatistical Analysis

Description

Melanoma is one of the most severe forms of skin cancer and can be life-threatening due to metastasis if not caught early in its development. Over the past decade, the U.S. Government added a Healthy People 2020 objective to reduce the melanoma skin cancer rate in the U.S. population. Now that the decade has come to a close, this research investigates possible large-scale risk factors that could influence the incidence of melanoma in the population, using logistic regression and propensity score matching. Logistic regression results showed that Caucasians are 14.765 times more likely to develop melanoma than non-Caucasians; after propensity-score adjustment, this estimate fell to 11.605. The cholesterol, chronic obstructive pulmonary disease, and hypertension predictors were also significant in the initial logistic regression. These results open the door to further analysis of large-scale predictors and give public health programs the initial information needed to create successful skin-safety advocacy plans.
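The odds ratios reported above come from logistic regression, where the coefficient on a binary predictor is a log odds ratio. A minimal sketch of the unadjusted calculation from a 2×2 exposure-by-outcome table; the counts below are hypothetical and chosen only to illustrate the arithmetic, not taken from the study:

```python
import math

def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Unadjusted odds ratio from a 2x2 exposure-by-outcome table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)

# Hypothetical counts, for illustration only (not the study's data):
# 90 of 1000 exposed subjects and 10 of 1000 unexposed subjects are cases.
or_value = odds_ratio(90, 910, 10, 990)

# In a logistic regression, the coefficient on a binary predictor is the
# log odds ratio, so exponentiating the coefficient recovers the same quantity.
beta = math.log(or_value)
recovered_or = math.exp(beta)
```

Propensity-score matching then re-estimates this quantity on a sample rebalanced on the confounders, which is why the adjusted value (11.605) differs from the crude one (14.765).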

Date Created
  • 2021-05

Family, "foreigners" [untitled]: a bioarchaeological approach to social organization at Late Classic Copan

Description

In anthropological models of social organization, kinship is perceived to be fundamental to social structure. This project aimed to understand how individuals buried in neighborhoods or patio groups were affiliated, by considering multiple possibilities: fictive and biological kinship, short- or long-term co-residence, and long-distance kin affiliation. The social organization of the ancient Maya urban center of Copan, Honduras during the Late Classic period (AD 600-822) was evaluated through analysis of human skeletal remains drawn from the largest collection yet recovered in Mesoamerica (n=1200). The research question was: what roles do kinship (biological or fictive) and co-residence play in the internal social organization of a lineage-based and/or house society? Biodistance and radiogenic strontium isotope analyses were combined to identify the degree to which individuals buried within 22 patio groups and eight neighborhoods were (1) related to one another and (2) of local or non-local origin. Copan was an ideal place to evaluate the nuances of migration and kinship, as the site is situated at the frontier of the Maya region and the edge of culturally diverse Honduras.

The results highlight the complexity of Copan's social structure within the lineage and house models proposed for ancient Maya social organization. The radiogenic strontium data are diverse; the percentage of potential non-local individuals varied by neighborhood, some with only 10% in-migration while others approached 40%. The biodistance results show statistically significant differences between neighborhoods, between patios, and even between patios within a single neighborhood. The high level of in-migration and biological heterogeneity are unique to Copan. Overall, these results indicate that the Copan community was created within a complex system influenced by multiple factors, where neither a lineage nor a house model is appropriate. It was a dynamic urban environment where genealogy, affiliation, and migration all affected the social structure.

Date Created
  • 2015

Combining thickness information with surface tensor-based morphometry for the 3D statistical analysis of the corpus callosum

Description

In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set of 3D morphological differences in the corpus callosum between two groups of subjects. The CCs are segmented from whole-brain T1-weighted MRI and modeled as 3D tetrahedral meshes. The callosal surface is divided into superior and inferior patches, on which we compute a volumetric harmonic field by solving Laplace's equation with Dirichlet boundary conditions. We adopt a refined tetrahedral mesh to compute the Laplacian operator, so our computation can achieve sub-voxel accuracy. Thickness is estimated by tracing streamlines in the harmonic field. We combine the areal changes found using surface tensor-based morphometry with the thickness information into a vector at each vertex, which serves as the metric for the statistical analysis. Group differences are assessed on this combined measure through Hotelling's T2 test. The method is applied to statistically compare three groups: congenitally blind (CB), late blind (LB; onset > 8 years old), and sighted (SC) subjects. Our results reveal significant differences in several regions of the CC between both blind groups and the sighted group, and to a lesser extent between the LB and CB groups. These results demonstrate the crucial role of visual deprivation during the developmental period in reshaping the structural architecture of the CC.
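The harmonic field underlying the thickness estimate can be illustrated in one dimension. The paper's computation runs on refined 3D tetrahedral meshes, so the Jacobi-iteration sketch below is only a minimal 1D analogue under that simplification:

```python
def harmonic_field_1d(n, lower=0.0, upper=1.0, iters=20000):
    """Solve Laplace's equation u'' = 0 on an n-point grid with Dirichlet
    boundary conditions (u[0] = lower, u[-1] = upper) by Jacobi iteration.
    Streamlines of the resulting field connect the two boundaries, the 1D
    analogue of tracing thickness between the superior and inferior patches."""
    u = [lower] + [0.0] * (n - 2) + [upper]
    for _ in range(iters):
        # Each interior value becomes the average of its neighbors.
        u = [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, n - 1)] + [u[-1]]
    return u

field = harmonic_field_1d(11)
```

In 1D the harmonic solution is linear, so the interior values converge to evenly spaced levels between the two boundary values; on a mesh, the same averaging structure yields a smooth field whose level sets interpolate the two boundary patches.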

Date Created
  • 2013

Essays on the Modeling of Binary Longitudinal Data with Time-dependent Covariates

Description

Longitudinal studies contain correlated data due to the repeated measurements on the same subject. The changing values of the time-dependent covariates and their association with the outcomes present another source of correlation. Most methods used to analyze longitudinal data average the effects of time-dependent covariates on outcomes over time and provide a single regression coefficient per time-dependent covariate. This denies researchers the opportunity to follow the changing impact of time-dependent covariates on the outcomes. This dissertation addresses this issue through the use of partitioned regression coefficients in three different papers.

In the first paper, an alternative approach to the partitioned Generalized Method of Moments logistic regression model for longitudinal binary outcomes is presented. This method relies on Bayes estimators and is utilized when the partitioned Generalized Method of Moments model provides numerically unstable estimates of the regression coefficients. It is used to model obesity status in the Add Health study and cognitive impairment diagnosis in the National Alzheimer’s Coordination Center database.

The second paper develops a model that allows the joint modeling of two or more binary outcomes that together provide an overall measure of a subject's trait over time. The simultaneous modeling of all outcomes provides a complete picture of the overall measure of interest. This approach accounts for the correlation among and between the outcomes across time, as well as the changing effects of time-dependent covariates on the outcomes. The model is used to analyze four outcomes measuring overall quality of life in the Chinese Longitudinal Healthy Longevity Study.

The third paper presents an approach that allows for estimation of cross-sectional and lagged effects of the covariates on the outcome, as well as the feedback of the response on future covariates. This is done in two parts: in part 1, the effects of time-dependent covariates on the outcomes are estimated; in part 2, the outcome's influence on future values of the covariates is measured. The model parameters are obtained through a Generalized Method of Moments procedure that uses valid moment conditions between the outcome and the covariates. Child morbidity in the Philippines and obesity status in the Add Health data are analyzed.
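The distinction between cross-sectional and lagged effects can be sketched with a toy least-squares regression of y_t on x_t and x_{t-1}. This illustrates only the partitioned-coefficient idea, not the dissertation's actual GMM moment conditions, and the series below is synthetic:

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_cross_and_lag(y, x):
    """Regress y_t on an intercept, x_t (cross-sectional effect), and
    x_{t-1} (lagged effect) via the normal equations (X'X) b = X'y."""
    rows = [(1.0, x[t], x[t - 1]) for t in range(1, len(y))]
    ys = y[1:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)

# Synthetic series with a known cross-sectional effect (2.0) and lag (-1.0).
x = [0.0, 1.0, 2.0, 1.0, 3.0, 0.0, 2.0, 4.0, 1.0, 3.0]
y = [0.0] + [0.5 + 2.0 * x[t] - 1.0 * x[t - 1] for t in range(1, len(x))]
b0, b_now, b_lag = fit_cross_and_lag(y, x)
```

Averaging the two coefficients into one, as conventional methods effectively do, would hide the fact that the contemporaneous and lagged effects here have opposite signs.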

Date Created
  • 2020

An exploration of statistical modelling methods on simulation data case study: biomechanical predator-prey simulations

Description

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate which biomechanical parameters relating to the agility of animals in predator–prey pairs best predict pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, newer modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques, combined with methods from industrial Design of Experiments, will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
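LASSO's advantage over stepwise selection, dropping weak predictors exactly to zero, comes from the soft-thresholding update inside coordinate descent. A minimal sketch on a toy two-predictor problem (synthetic data, not the thesis's simulation output):

```python
def soft_threshold(z, lam):
    """Shrink z toward zero; return exactly zero when |z| <= lam.
    This is the mechanism by which LASSO drops weak predictors."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """LASSO by cyclic coordinate descent (squared loss, L1 penalty lam)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Partial residual excluding feature j.
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / norm
    return beta

# Feature 0 drives y; feature 1 is orthogonal noise and gets zeroed out.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_cd(X, y, lam=0.5)
```

Note the penalty also shrinks the surviving coefficient (1.5 here versus the least-squares value of 2.0); that bias is the price paid for automatic variable selection over many parameters and interactions.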

Date Created
  • 2018

Novel methods of biomarker discovery and predictive modeling using Random Forest

Description

Random forest (RF) is a popular and powerful technique. It can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals, but these are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and used as the basis for two novel methods: one for biomarker discovery and one for generating prediction intervals.

First, a biodosimetry model is developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve the prediction accuracy of the biodosimetry, day-specific models were built to handle day-interaction effects, and a nested-modeling technique was proposed. The nested models can fit these complex data, which exhibit large variability and non-linear relationships.

Second, a panel of biomarkers was selected using a data-driven feature selection method as well as hand-picking, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method incorporates domain knowledge as a penalty term to regulate the selection of candidate features in RF. It adds flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide the selection of biomarkers. In simulated datasets, the method is also competitive with existing methods when intrinsic data characteristics are used as an alternative to domain knowledge.

Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets.
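The general residual-quantile idea behind non-parametric prediction intervals can be sketched in a few lines. This is only an illustration of the principle; the actual RFerr procedure may collect and weight the prediction errors differently, and the numbers below are synthetic:

```python
def empirical_interval(point_pred, residuals, alpha=0.1):
    """Attach empirical residual quantiles (e.g. from held-out or
    out-of-bag errors) to a point prediction to form an approximate
    (1 - alpha) prediction interval, with no distributional assumptions."""
    r = sorted(residuals)
    lo = r[int(round(len(r) * alpha / 2))]
    hi = r[min(len(r) - 1, int(round(len(r) * (1 - alpha / 2))))]
    return point_pred + lo, point_pred + hi

# Synthetic residuals spread evenly from -50 to 49: the 90% interval
# spans roughly the 5th to 95th percentile of errors around the prediction.
interval = empirical_interval(10.0, [float(v) for v in range(-50, 50)])
```

Because the interval is built from observed errors rather than a Gaussian assumption, it adapts to skewed or heavy-tailed residual distributions, which is what drives the coverage gains reported for such methods.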

Date Created
  • 2017

Single-Focus Confocal Data Analysis with Bayesian Nonparametrics

Description

The cell is a dense environment composed of proteins, nucleic acids, and other small molecules, which constantly collide and interact. These interactions and the diffusive motions are driven by internal thermal fluctuations. Upon collision, molecules can interact and form complexes. It is of interest to learn kinetic parameters, such as the rate at which one molecule converts to a different species or two molecules collide and form a new species, as well as diffusion coefficients.

Several experimental measurements can probe diffusion coefficients at the single-molecule and bulk levels. This thesis focuses on single-molecule methods, which can assess diffusion coefficients at the level of individual molecules. For instance, super-resolution methods like stochastic optical reconstruction microscopy (STORM) and photoactivated localization microscopy (PALM) have high spatial resolution at the cost of lower temporal resolution. Another group of methods, such as MINFLUX and multi-detector tracking, can track a single molecule with high spatio-temporal resolution. The problem with these methods is that they are only applicable to very dilute samples, since they must ensure the existence of a single molecule in the region of interest (ROI).

In this thesis, the goal is to have the best of both worlds by achieving high spatio-temporal resolution without being limited to a few molecules. To do so, one needs to refocus on fluorescence correlation spectroscopy (FCS), a method that applies to both in vivo and in vitro systems with high temporal resolution and relies on multiple molecules traversing a confocal volume over an extended period of time. The difficulty is that interpreting the signal leads to different estimates of kinetic parameters, such as diffusion coefficients, depending on the number of molecules assumed in the model. It is for this reason that this thesis focuses on Bayesian nonparametrics (BNPs) as a way to solve this model selection problem and extract kinetic parameters, such as diffusion coefficients, at the single-molecule level from just a few photons, and thus with the highest temporal resolution possible.
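A useful anchor for the FCS quantities discussed above is the standard single-component relation between the correlation decay time and the diffusion coefficient, tau_D = w²/(4D) for lateral diffusion through a confocal spot of waist w. The numbers below are illustrative, not taken from this thesis:

```python
def diffusion_coefficient(beam_waist_um, tau_d_s):
    """Diffusion coefficient (um^2/s) from the FCS correlation time,
    using the standard relation tau_D = w^2 / (4 D)  =>  D = w^2 / (4 tau_D)."""
    return beam_waist_um ** 2 / (4.0 * tau_d_s)

# Illustrative values: a 0.3 um beam waist and a 225 us decay time give
# D = 100 um^2/s, a magnitude typical of a small protein in water.
d = diffusion_coefficient(0.3, 225e-6)
```

The thesis's point is that extracting tau_D itself depends on how many diffusing species and molecules the model assumes, which is exactly the model-selection problem the Bayesian nonparametric machinery is brought in to resolve.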

Date Created
  • 2020

Quantitative modeling methods for analyzing clinical to public health problems

Description

Statistical methods have been widely used to understand factors in clinical and public health data. Statistical hypothesis tests are procedures for testing pre-stated hypotheses. The development and properties of these procedures, as well as their performance, are based upon certain assumptions. Desirable properties of statistical tests are to maintain validity and to perform well even if these assumptions are not met; a statistical test that maintains such properties is called robust. Mathematical models are typically mechanistic frameworks used to study dynamic interactions between components (mechanisms) of a system, and how these interactions give rise to changes in the behavior (patterns) of the system as a whole over time.

In this thesis, I have developed a study that uses novel techniques to link robust statistical tests and mathematical modeling methods, guided by limited data from developed and developing regions, in order to address pressing clinical and epidemiological questions of interest. The procedure in this study consists of three primary steps: data collection, uncertainty quantification in the data, and linking dynamic models to the collected data.

The first part of the study focuses on designing, collecting, and summarizing empirical data from the only national survey of hospitals ever conducted regarding patient-controlled analgesia (PCA) practices, covering 168 hospitals across 40 states, in order to assess risks before putting patients on PCA. I used statistical relational models and exploratory data analysis to address the question. The risk factors assessed indicate great concern for patient safety from one healthcare institution to another.

In the second part, I quantify uncertainty associated with data obtained from the James A. Lovell Federal Health Care Center, primarily to study the effect of Benign Prostatic Hypertrophy (BPH) on sleep architecture in patients with Obstructive Sleep Apnea (OSA). Patients with OSA and BPH demonstrated significant differences in their sleep architecture compared to patients without BPH. One way to validate these differences between the two groups may be to carry out a similar study that evaluates the effect of some other chronic disease on sleep architecture in patients with OSA.

Additionally, I address theoretical statistical questions such as (1) how to estimate the distribution of a variable in order to test a null hypothesis when the sample size is limited, and (2) how changes in assumptions (such as monotonicity and nonlinearity) translate into the effect of the independent variable on the outcome variable. To address these questions, we use multiple techniques, such as sensitivity analysis based on Partial Rank Correlation Coefficients (PRCC), fractional polynomials, and statistical relational models.
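A PRCC can be sketched in a few lines: rank-transform the variables, regress the controlled parameter out of both ranks, and correlate the residuals. A minimal one-confounder version, assuming distinct values so no tie handling is needed (the data below are synthetic):

```python
def ranks(v):
    """Ranks of v (0-based); assumes distinct values, so no tie averaging."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def prcc(x, y, z):
    """Partial rank correlation of x and y, controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)

    def residuals(a, b):
        # Residuals of the simple linear regression of a on b.
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        slope = (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
                 / sum((bi - mb) ** 2 for bi in b))
        return [ai - ma - slope * (bi - mb) for ai, bi in zip(a, b)]

    ex, ey = residuals(rx, rz), residuals(ry, rz)
    num = sum(a * b for a, b in zip(ex, ey))
    den = (sum(a * a for a in ex) * sum(b * b for b in ey)) ** 0.5
    return num / den

# y is monotone in x while z is unrelated, so the PRCC stays at 1
# even after controlling for z.
rho = prcc([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 1, 4, 2, 3])
```

Because everything operates on ranks, PRCC captures monotone but nonlinear relationships that an ordinary partial correlation would understate, which is why it suits the sensitivity analyses described above.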

In the third part, my goal was to identify socio-economic and environmental risk factors for Visceral Leishmaniasis (VL) and use the identified critical factors to develop a mathematical model to understand VL transmission dynamics when the data are highly underreported. I primarily studied the role of age-specific susceptibility and epidemiological quantities in the dynamics of VL in the Indian state of Bihar. The statistical results guided the choice of modeling framework and the estimates of model parameters.

In conclusion, this study addressed three primary theoretical modeling-related questions: (1) how should collected data be analyzed when the sample size is limited, and how do modeling assumptions change the results of the analysis? (2) is it possible to identify hidden associations, and the nonlinearity of those associations, using such underpowered data? and (3) how can statistical models provide a more reasonable structure for a mathematical modeling framework, which can in turn be used to understand the dynamics of the system?

Date Created
  • 2015

Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade of genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics, and other modern omics catalogs. New methods to analyze, integrate, and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic, and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance and transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach, and its improved predictability, are highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), the lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins in Desulfovibrio vulgaris and Shewanella oneidensis for further laboratory exploration of functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves.

These curves were characterized with good empirical fit using nonlinear models with simple structures, which allowed extraction of a large number of features. Application of a supervised classification model to these features, together with the integration of experimental factors, allowed for the identification of subtle patterns among different cell types, visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach, which showed promising results towards less complex approximations. The benefits of external information were also explored to improve the image representation.
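The gradient boosted tree approach mentioned above can be caricatured with squared-loss boosting over depth-1 stumps on a single feature. The real method uses full trees, subsampling, and many features, so this is only a structural sketch on synthetic data:

```python
def fit_stump(x, r):
    """Depth-1 regression stump: best threshold s and leaf means (<= s, > s)."""
    best = None
    for s in sorted(set(x))[:-1]:  # the max value would leave an empty right leaf
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    return best[1:]

def boost(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals and adds a shrunken copy to the ensemble prediction."""
    pred = [0.0] * len(y)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s, ml, mr = fit_stump(x, resid)
        pred = [pi + lr * (ml if xi <= s else mr) for pi, xi in zip(pred, x)]
    return pred

# After enough shrunken stumps, the ensemble recovers a step function.
pred = boost([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
```

Each round fits only the part of the signal the ensemble has not yet explained, which is what lets boosted trees capture the nonlinear relationships this work integrates across heterogeneous data sources.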

Date Created
  • 2011

Using antibodies to characterize healthy, disease, and age states

Description

The advent of new high-throughput technology allows for increasingly detailed characterization of the immune system in healthy, disease, and aged states. The immune system is composed of two main branches, the innate and the adaptive immune system, though the border between the two appears increasingly indistinct. The adaptive immune system is further split into two main categories: humoral and cellular immunity. The humoral immune response produces antibodies against specific targets, and these antibodies can be used to learn about disease and normal states. In this document, I use antibodies to characterize the immune system in two ways: (1) I determine the Antibody Status (AbStat) from data collected by applying sera to an array of non-natural-sequence peptides, and demonstrate that this AbStat measure can distinguish between disease, normal, and aged samples, as well as produce a single AbStat number for each sample; (2) I search for antigens for use in a cancer vaccine, a search that yields several candidates as well as a new hypothesis. Antibodies provide a powerful tool for characterizing the immune system, and this natural tool, combined with emerging technologies, allows us to learn more about healthy and disease states.

Date Created
  • 2014