Search Content

Evolutionary Diagnosis of Non-Synonymous Variants Involved in Differential Drug Response

Description

Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their…

Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response.

Results:
We found that the available data on the link between drug response and nsSNV is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly.

Conclusions:
The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.

ContributorsGerek, Nevin Z. (Author) / Liu, Li (Author) / Gerold, Kristyn (Author) / Biparva, Pegah (Author) / Thomas, Eric D. (Author) / Kumar, Sudhir (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor)

Created2015-01-15

A Retrospective Investigation to Assess the Potential Application of Predictive Machine Learning Algorithms in Oncology Clinical Trials

Description

The purpose of this investigation is to apply a machine learning algorithm with de-identified, historic oncology clinical trial data to assess the theoretical understanding of predictive modeling to derive potential clinical practice recommendations. Within this study, electronic medical records from the HonorHealth Virginia G. Piper Institute will undergo data visualization…

The purpose of this investigation is to apply a machine learning algorithm with de-identified, historic oncology clinical trial data to assess the theoretical understanding of predictive modeling to derive potential clinical practice recommendations. Within this study, electronic medical records from the HonorHealth Virginia G. Piper Institute will undergo data visualization to identify potential correlations and trends critical for model creation as well as further identify potential expansions or limitations of scope regarding model purpose. Hypothesis pursued post data visualization was the development of a predictive model for 6-month survival. Current standard is estimated physician accuracy at 56.5% accuracy at 6 months out. This study created supervised learning models using decision trees, KNN, SVM and Ensemble methods using combinations of LASSO Logistic Regression and Know-GRFF Random Forest for feature selection. SVM trained on a combined set of LASSO and Know-GRRF featured produced the highest performing model at 75.5% with an AUC of 0.82. This study demonstrates the potential for applying predictive modeling on readily available EMR records to drive clinical practice recommendations. The models developed could potentially, with further development, be used as an ancillary tool for jumpstarting patient-physician conversations on survival and life expectancy.

ContributorsLi, Richard Longfei (Co-author) / Liu, Li (Co-author, Thesis director) / Gosselin, Kevin (Co-author, Committee member) / Harrington Bioengineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develo…

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using a Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. In an extensive literature review and cross-analysis with Kegg and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of the Type II Diabetes, serving to help untangle the regulatory complexity of the disease and aid in the advancement of diagnosis and treatment. Keywords: Type II Diabetes mellitus, Gene Set Enrichment Analysis, genetic variants, KEGG Insulin Pathway, gene-regulatory pathway

ContributorsBucklin, Lindsay (Co-author) / Davis, Vanessa (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / Nyarige, Verah (Committee member) / School of Human Evolution & Social Change (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develo…

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using a Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. In an extensive literature review and cross-analysis with Kegg and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of the Type II Diabetes, serving to help untangle the regulatory complexity of the disease and aid in the advancement of diagnosis and treatment.

ContributorsDavis, Vanessa Brooke (Co-author) / Bucklin, Lindsay (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Expansion and Application of Pathways of Topological Rank Analysis (PoTRA) to Various Cancers

Description

Cancer is the second leading cause of death in the United States. Cancer is a serious, complex disease which causes cells to grow uncontrollably, causing millions of deaths per year [1]. Cancer is usually caused by a combination of environmental variables and biological pathways. The pathways have a very robust…

Cancer is the second leading cause of death in the United States. Cancer is a serious, complex disease which causes cells to grow uncontrollably, causing millions of deaths per year [1]. Cancer is usually caused by a combination of environmental variables and biological pathways. The pathways have a very robust structure normally, but are altered because of cancer, resulting in a loss of connectivity between pathways. In order detect these pathways, a PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) was created, which measures the relative rankings of the genes in each pathway. Applying this algorithm will allow us to figure out what pathways differed significantly in areas with cancer and areas without cancer. This would allow scientists to focus on specific pathways in order to learn more about the cancer and find more effective ways to treat it. So far, analysis using PoTRA has been successfully conducted on hepatocellular carcinoma (HCC) and its subtypes, resulting in all significant pathways found being cancer-associated. Now, using the TCGA data stored in Google Cloud's BigQuery, we created a pipeline to apply PoTRA to other cancer data sets and see how well it cross-applies to other cancers. The results show that even though some modification may need to be made to adapt to other datasets, many significant pathways were found for both HCC and breast cancer.

ContributorsMahesh, Sunny Nishant (Author) / Valentin, Dinu (Thesis director) / Liu, Li (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Analysis of HIV Risk Groups Using Bayesian Analysis

Description

Phylogenetic analyses that were conducted in the past didn't have the ability or functionality to inform and implement useful public health decisions while using clustering. Models can be constructed to conduct any further analyses for the result of meaningful data to be used in the future of public health informatics.…

Phylogenetic analyses that were conducted in the past didn't have the ability or functionality to inform and implement useful public health decisions while using clustering. Models can be constructed to conduct any further analyses for the result of meaningful data to be used in the future of public health informatics. A phylogenetic tree is considered one of the best ways for researchers to visualize and analyze the evolutionary history of a certain virus. The focus of this study was to research HIV phylodynamic and phylogenetic methods. This involved identifying the fast growing HIV transmission clusters and rates for certain risk groups in the US. In order to achieve these results an HIV database was required to retrieve real-time data for implementation, alignment software for multiple sequence alignment, Bayesian analysis software for the development and manipulation of models, and graphical tools for visualizing the output from the models created. This study began by conducting a literature review on HIV phylogeographies and phylodynamics. Sequence data was then obtained from a sequence database to be run in a multiple alignment software. The sequence that was obtained was unaligned which is why the alignment was required. Once the alignment was performed, the same file was loaded into a Bayesian analysis software for model creation of a phylogenetic tree. When the model was created, the tree was edited in a tree visualization software for the user to easily interpret. From this study the output of the tree resulted the way it did, due to a distant homology or the mixing of certain parameters. For a further continuation of this study, it would be interesting to use the same aligned sequence and use different model parameter selections for the initial creation of the model to see how the output changes. This is because one small change for the model parameter could greatly affect the output of the phylogenetic tree.

ContributorsNandan, Meghana (Author) / Scotch, Matthew (Thesis director) / Liu, Li (Committee member) / Biomedical Informatics Program (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High throughput transcriptome data analysis like Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data have made significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is core in identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular…

High throughput transcriptome data analysis like Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data have made significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is core in identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular transition (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drive biological changes in different cell types and the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type specific differentially co-expressed pairs (DCEPs). In part one of Chapter 3, I presented a pipeline that used a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample and correlations among multiple samples from the same patient. In Part 2 of Chapter 3, I presented a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work that focused on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.

ContributorsNyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created2022

Mining Associations between MRI Morphometry Measurements and Beta-Amyloid/tau Burden

Description

Beta-Amyloid(Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or quite costly and not…

Beta-Amyloid(Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (positron emission tomography (PET)). And one of the particular neurodegenerative regions is the hippocampus to which the influence of Aβ/tau on has been one of the research projects focuses in the AD pathophysiological progress. In this dissertation, I proposed three novel machine learning and statistical models to examine subtle aspects of the hippocampal morphometry from MRI that are associated with Aβ /tau burden in the brain, measured using PET images. The first model is a novel unsupervised feature reduction model to generate a low-dimensional representation of hippocampal morphometry for each individual subject, which has superior performance in predicting Aβ/tau burden in the brain. The second one is an efficient federated group lasso model to identify the hippocampal subregions where atrophy is strongly associated with abnormal Aβ/Tau. The last one is a federated model for imaging genetics, which can identify genetic and transcriptomic influences on hippocampal morphometry. Finally, I stated the results of these three models that have been published or submitted to peer-reviewed conferences and journals.

ContributorsWu, Jianfeng (Author) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Liang, Jianming (Committee member) / Wang, Junwen (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created2022

Unveiling Cellular Heterogeneity, Genetic Regulation, and Protein Trafficking Dynamics Via Novel Integrative Multi-Omic Approaches

Description

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, to capture the multifaceted pathological…

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, to capture the multifaceted pathological mechanisms, integrative multi-omics analysis is needed that can provide a comprehensive picture of the disease. Here, I present three novel approaches to multi-omics integrative analysis. I introduce a single-cell integrative clustering method, which leverages multi-omics to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes (CITE-Seq) dataset from human Acute Myeloid Lymphoma (AML) and control samples, this approach unveiled nuanced cell populations that otherwise remain elusive. I then shift the focus to a computational framework to discover transcriptional regulatory trios in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data of multiple myeloma samples, this approach discovered synergetic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The next part of this work introduces a novel methodology that leverages the transcriptome and surface protein data at the single-cell level produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.

ContributorsMudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)

Created2023

Examining the Significance of Economic Connectedness as an Indicator of Disparities in COVID-19 Infection Risk in Arizona ZCTAs

Description

Bridging social capital describes the diffusion of information across networks built between individuals of different social identities. This project aims to understand if the bridging ties of economic connectedness (EC), measured by data from Facebook friends and calculated as the average share of high socioeconomic status friends that an individual…

Bridging social capital describes the diffusion of information across networks built between individuals of different social identities. This project aims to understand if the bridging ties of economic connectedness (EC), measured by data from Facebook friends and calculated as the average share of high socioeconomic status friends that an individual from a low socioeconomic status has, can be a predictor of variations in COVID-19 infection risk across Arizona ZIP code tabulation areas (ZCTAs). Economic connectedness values across Arizona ZCTAs was examined in addition to the correlation of EC to various social and demographic factors such as age, sex, race and ethnicity, educational background, income, and health insurance coverage. A multiple linear regression model was conducted to examine the association of EC to biweekly COVID-19 growth rate from October 2020 to November 2021, and to examine the longitudinal trends in the association between these two factors. The study found that the bridging ties of economic connectedness has a significant effect size comparable to that of other demographic features, and has implications in being used to identify vulnerabilities and health disparities in communities during the pandemic.

ContributorsBoby, Maria (Author) / Oh, Hyunsung (Thesis director) / Marsiglia, Flavio (Committee member) / Liu, Li (Committee member) / Barrett, The Honors College (Contributor) / School of Life Sciences (Contributor) / School of Human Evolution & Social Change (Contributor) / School of Social Work (Contributor)

Created2023-05