Linnorm: Improved Statistical Analysis for Single Cell RNA-seq Expression Data

Description

Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm was developed to remove technical noise while preserving biological variation in scRNA-seq data, so that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy.

Contributors: Yip, Shun H. (Author) / Wang, Panwen (Author) / Kocher, Jean-Pierre A. (Author) / Sham, Pak Chung (Author) / Wang, Junwen (Author) / College of Health Solutions (Contributor)

Created: 2017-09-18

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome data analysis, such as the analysis of single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) data, has produced significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying the time point(s) where drastic changes in gene transcription are associated with the transition from a homeostatic to a non-homeostatic cellular state (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, along with the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample or correlations among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

Mining Associations between MRI Morphometry Measurements and Beta-Amyloid/tau Burden

Description

Beta-amyloid (Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (positron emission tomography (PET)). The hippocampus is one region particularly affected by neurodegeneration, and the influence of Aβ/tau on it has been a focus of research into the pathophysiological progression of AD. In this dissertation, I proposed three novel machine learning and statistical models to examine subtle aspects of hippocampal morphometry from MRI that are associated with Aβ/tau burden in the brain, measured using PET images. The first model is a novel unsupervised feature reduction model that generates a low-dimensional representation of hippocampal morphometry for each individual subject and has superior performance in predicting Aβ/tau burden in the brain. The second is an efficient federated group lasso model that identifies the hippocampal subregions where atrophy is strongly associated with abnormal Aβ/tau. The last is a federated model for imaging genetics, which can identify genetic and transcriptomic influences on hippocampal morphometry. Finally, I reported the results of these three models, which have been published in or submitted to peer-reviewed conferences and journals.

Contributors: Wu, Jianfeng (Author) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Liang, Jianming (Committee member) / Wang, Junwen (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2022

Statistical Methods for Analysis of Genomic Data with Applications in Oncology

Description

This dissertation presents three novel algorithms with real-world applications to genomic oncology. While the methodologies presented here were all developed to overcome various challenges associated with the adoption of high-throughput genomic data in clinical oncology, they can be used in other domains as well. First, a network-informed feature ranking algorithm is presented, which shows a significant increase in the ability to select true predictive features from simulated data sets when compared to other state-of-the-art graphical feature ranking methods. The methodology also shows an increased ability to predict pathological complete response to preoperative chemotherapy from genomic sequencing data of breast cancer patients, utilizing domain knowledge from protein-protein interaction networks. Second, an algorithm is presented to classify microsatellite instability (MSI) status from next-generation sequencing (NGS) data; it overcomes population biases inherent in the use of a human reference genome developed primarily from European populations. The methodology significantly increases the accuracy of MSI status prediction in African and African American ancestries. Finally, a single-variable model is presented to capture the bimodality inherent in genomic data stemming from heterogeneous diseases. This model shows improvements over other parametric models in the estimation of receiver operating characteristic (ROC) curves for bimodal data. The model is used to estimate ROC curves for heterogeneous biomarkers in a dataset containing breast cancer and cancer-free specimens.

Contributors: Saul, Michelle (Author) / Dinu, Valentin (Thesis advisor) / Liu, Li (Committee member) / Wang, Junwen (Committee member) / Arizona State University (Publisher)

Created: 2021

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. Through an extensive literature review and cross-analysis with the KEGG and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of Type II Diabetes, helping to untangle the regulatory complexity of the disease and aiding the advancement of diagnosis and treatment. Keywords: Type II Diabetes mellitus, Gene Set Enrichment Analysis, genetic variants, KEGG Insulin Pathway, gene-regulatory pathway

Contributors: Bucklin, Lindsay (Co-author) / Davis, Vanessa (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / Nyarige, Verah (Committee member) / School of Human Evolution & Social Change (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. Through an extensive literature review and cross-analysis with the KEGG and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of Type II Diabetes, helping to untangle the regulatory complexity of the disease and aiding the advancement of diagnosis and treatment.

Contributors: Davis, Vanessa Brooke (Co-author) / Bucklin, Lindsay (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Long Noncoding RNA LINC00305 Promotes Inflammation by Activating the AHRR-NF-κB Pathway in Human Monocytes

Description

Accumulating data from genome-wide association studies (GWAS) have provided a collection of novel candidate genes associated with complex diseases, such as atherosclerosis. We identified an atherosclerosis-associated single-nucleotide polymorphism (SNP) located in the intron of the long noncoding RNA (lncRNA) LINC00305 by searching the GWAS database. Although the function of LINC00305 is unknown, we found that LINC00305 expression is enriched in atherosclerotic plaques and monocytes. Overexpression of LINC00305 promoted the expression of inflammation-associated genes in THP-1 cells and reduced the expression of contractile markers in co-cultured human aortic smooth muscle cells (HASMCs). We showed that overexpression of LINC00305 activated nuclear factor-kappa beta (NF-κB) and that inhibition of NF-κB abolished LINC00305-mediated activation of cytokine expression. Mechanistically, LINC00305 interacted with lipocalin-1 interacting membrane receptor (LIMR), enhanced the interaction of LIMR and aryl-hydrocarbon receptor repressor (AHRR), and promoted protein expression as well as nuclear localization of AHRR. Moreover, LINC00305 activated NF-κB exclusively in the presence of LIMR and AHRR. In light of these findings, we propose that LINC00305 promotes monocyte inflammation by facilitating LIMR and AHRR cooperation and AHRR activation, which eventually activates NF-κB, thereby inducing HASMC phenotype switching.

Contributors: Zhang, Dan-Dan (Author) / Wang, Wen-Tian (Author) / Xiong, Jian (Author) / Xie, Xue-Min (Author) / Cui, Shen-Shen (Author) / Zhao, Zhi-Guo (Author) / Li, Mulin Jun (Author) / Zhang, Zhu-Qin (Author) / Hao, De-Long (Author) / Zhao, Xiang (Author) / Li, Yong-Jun (Author) / Wang, Junwen (Author) / Chen, Hou-Zao (Author) / Lv, Xiang (Author) / Liu, De-Pei (Author) / College of Health Solutions (Contributor)

Created: 2017-04-10

Cepip: Context-Dependent Epigenomic Weighting for Prioritization of Regulatory Variants and Disease-Associated Genes

Description

It remains challenging to predict regulatory variants in particular tissues or cell types due to highly context-specific gene regulation. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues/cell types, we identify critical chromatin features that predict variant regulatory potential. We present cepip, a joint likelihood framework, for estimating a variant’s regulatory probability in a context-dependent manner. Our method exhibits significant GWAS signal enrichment and is superior to existing cell type-specific methods. Furthermore, using phenotypically relevant epigenomes to weight the GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test.

Contributors: Li, Mulin Jun (Author) / Li, Miaoxin (Author) / Liu, Zipeng (Author) / Yan, Bin (Author) / Pan, Zhicheng (Author) / Huang, Dandan (Author) / Liang, Qian (Author) / Ying, Dingge (Author) / Xu, Feng (Author) / Yao, Hongcheng (Author) / Wang, Panwen (Author) / Kocher, Jean-Pierre A. (Author) / Xia, Zhengyuan (Author) / Sham, Pak Chung (Author) / Liu, Jun S. (Author) / Wang, Junwen (Author) / College of Health Solutions (Contributor)

Created: 2017-03-16

Activation of E-Prostanoid 3 Receptor in Macrophages Facilitates Cardiac Healing After Myocardial Infarction

Description

Two distinct monocyte (Mo)/macrophage (Mp) subsets (Ly6C^low and Ly6C^hi) orchestrate the cardiac recovery process following myocardial infarction (MI). Prostaglandin (PG) E₂ is involved in the Mo/Mp-mediated inflammatory response; however, the role of its receptors in Mos/Mps in cardiac healing remains to be determined. Here we show that pharmacological inhibition or gene ablation of the Ep3 receptor in mice suppresses accumulation of Ly6C^low Mos/Mps in infarcted hearts. Ep3 deletion in Mos/Mps markedly attenuates healing after MI by reducing neovascularization in peri-infarct zones. Ep3 deficiency diminishes CX3C chemokine receptor 1 (CX3CR1) expression and vascular endothelial growth factor (VEGF) secretion in Mos/Mps by suppressing TGFβ1 signaling and subsequently inhibits Ly6C^low Mos/Mps migration and angiogenesis. Targeted overexpression of Ep3 receptors in Mos/Mps improves wound healing by enhancing angiogenesis. Thus, the PGE₂/Ep3 axis promotes cardiac healing after MI by activating reparative Ly6C^low Mos/Mps, indicating that Ep3 receptor activation may be a promising therapeutic target for acute MI.

Contributors: Tang, Juan (Author) / Shen, Yujun (Author) / Chen, Guilin (Author) / Wan, Qiangyou (Author) / Wang, Kai (Author) / Zhang, Jian (Author) / Qin, Jing (Author) / Liu, Guizhu (Author) / Zuo, Shengkai (Author) / Tao, Bo (Author) / Yu, Yu (Author) / Wang, Junwen (Author) / Lazarus, Michael (Author) / Yu, Ying (Author) / College of Health Solutions (Contributor)

Created: 2017-03-03

Evolution of Drug-Resistant Acinetobacter Baumannii After DCD Renal Transplantation

Description

Infection after renal transplantation remains a major cause of morbidity and death, especially infection with the extensively drug-resistant bacterium A. baumannii. A total of fourteen A. baumannii isolates were obtained from the donors’ preserved fluid from DCD (donation after cardiac death) renal transplantation and four isolates from the recipients’ draining liquid at the Kidney Disease Center, The First Affiliated Hospital, College of Medicine, Zhejiang University, from March 2013 to November 2014. An outbreak of A. baumannii emerging after DCD renal transplantation was tracked to understand the transmission of the pathogen. PFGE displayed similar DNA patterns between isolates from the same hospital. Susceptibility to thirteen antimicrobial agents was determined using the K-B diffusion method and eTest. Whole-genome sequencing was applied to investigate the genetic relationship of the isolates. From the clinical data and research results, we concluded that the A. baumannii isolates 3R1 and 3R2 were probably transmitted from the donor, who acquired the bacteria during his stay in the ICU, while isolate 4R1 was transmitted from 3R1 and 3R2 via medical manipulation. This study demonstrated the value of integrating clinical profiles with molecular methods in outbreak investigation and their importance in controlling infection and preventing serious complications after DCD transplantation.

Contributors: Jiang, Hong (Author) / Cao, Luxi (Author) / Qu, Lihui (Author) / Qu, Tingting (Author) / Liu, Guangjun (Author) / Wang, Rending (Author) / Li, Bingjue (Author) / Wang, Yuchen (Author) / Ying, Chaoqun (Author) / Chen, Miao (Author) / Lu, Yingying (Author) / Feng, Shi (Author) / Xiao, Yonghong (Author) / Wang, Junwen (Author) / Wu, Jianyong (Author) / Chen, Jianghua (Author) / College of Health Solutions (Contributor)

Created: 2017-05-16