Wang, Junwen

Mining Associations between MRI Morphometry Measurements and Beta-Amyloid/tau Burden

Description

Beta-amyloid (Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or costly and not widely available (positron emission tomography, PET). One region particularly affected by neurodegeneration is the hippocampus, and the influence of Aβ/tau on the hippocampus has been a major research focus in studying the pathophysiological progression of AD. In this dissertation, I propose three novel machine learning and statistical models to examine subtle aspects of hippocampal morphometry from MRI that are associated with Aβ/tau burden in the brain, as measured using PET images. The first is a novel unsupervised feature reduction model that generates a low-dimensional representation of hippocampal morphometry for each individual subject and shows superior performance in predicting Aβ/tau burden in the brain. The second is an efficient federated group lasso model that identifies the hippocampal subregions where atrophy is strongly associated with abnormal Aβ/tau. The last is a federated model for imaging genetics that can identify genetic and transcriptomic influences on hippocampal morphometry. Finally, I present the results of these three models, which have been published in or submitted to peer-reviewed conferences and journals.

Date Created

2022

Agent

Author (aut): Wu, Jianfeng
Thesis advisor (ths): Wang, Yalin
Committee member: Li, Baoxin
Committee member: Liang, Jianming
Committee member: Wang, Junwen
Committee member: Wu, Teresa
Publisher (pbl): Arizona State University

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome data, such as single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) data, have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying the time point(s) at which drastic changes in gene transcription are associated with the cellular transition from a homeostatic to a non-homeostatic state (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type-specific, co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, as well as the time point at which these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced, cell-type-specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data from patients with non-small cell lung cancer (NSCLC) sequenced at multiple times during cancer treatment. However, this pipeline does not account for correlations among multiple single cells from the same sample or among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effects model. In Chapter 4, I summarize my work on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulatory mechanisms of cardiomyocyte proliferation and myocardial regeneration that are conserved between mouse and pig at different time points.

Date Created

2022

Agent

Author (aut): Nyarige, Verah Mocheche
Thesis advisor (ths): Liu, Li
Thesis advisor (ths): Wang, Junwen
Committee member: Dinu, Valentin
Publisher (pbl): Arizona State University

Statistical Methods for Analysis of Genomic Data with Applications in Oncology

Description

This dissertation presents three novel algorithms with real-world applications to genomic oncology. Although the methodologies presented here were all developed to overcome various challenges associated with the adoption of high-throughput genomic data in clinical oncology, they can be used in other domains as well. First, a network-informed feature ranking algorithm is presented, which shows a significant increase in the ability to select true predictive features from simulated data sets compared with other state-of-the-art graphical feature ranking methods. The methodology also shows an increased ability to predict pathological complete response to preoperative chemotherapy from genomic sequencing data of breast cancer patients by utilizing domain knowledge from protein-protein interaction networks. Second, an algorithm is presented to classify microsatellite instability (MSI) status from next-generation sequencing (NGS) data that overcomes the population biases inherent in using a human reference genome developed primarily from European populations. The methodology significantly increases the accuracy of MSI status prediction in individuals of African and African American ancestry. Finally, a single-variable model is presented to capture the bimodality inherent in genomic data stemming from heterogeneous diseases. This model shows improvements over other parametric models in estimating receiver operating characteristic (ROC) curves for bimodal data. The model is used to estimate ROC curves for heterogeneous biomarkers in a dataset containing breast cancer and cancer-free specimens.

Date Created

2021

Agent

Author (aut): Saul, Michelle
Thesis advisor (ths): Dinu, Valentin
Committee member: Liu, Li
Committee member: Wang, Junwen
Publisher (pbl): Arizona State University

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus, with the aim of improving diagnosis and treatment methods. The goals of this investigation were to 1) identify the genetic variants and SNPs in Type II diabetes in order to develop a gene regulatory pathway, and 2) use this pathway to determine suitable drug therapeutics for prevention and treatment. Using Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. Through an extensive literature review and cross-analysis with the KEGG and Reactome pathway databases, novel SNPs located in these genes were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II diabetes pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of Type II diabetes, helping to untangle the regulatory complexity of the disease and to advance its diagnosis and treatment.

Keywords: Type II diabetes mellitus, Gene Set Enrichment Analysis, genetic variants, KEGG Insulin Pathway, gene-regulatory pathway

Date Created

2019-05

Agent

Co-author: Bucklin, Lindsay
Co-author: Davis, Vanessa
Thesis director: Holechek, Susan
Committee member: Wang, Junwen
Committee member: Nyarige, Verah
Contributor (ctb): School of Human Evolution & Social Change
Contributor (ctb): School of Life Sciences
Contributor (ctb): Barrett, The Honors College

Differential Gene Expression in Type II Diabetes

Description

Date Created

2019-05

Agent

Co-author: Davis, Vanessa Brooke
Co-author: Bucklin, Lindsay
Thesis director: Holechek, Susan
Committee member: Wang, Junwen
Contributor (ctb): School of Molecular Sciences
Contributor (ctb): Barrett, The Honors College

Linnorm: Improved Statistical Analysis for Single Cell RNA-seq Expression Data

Description

Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is designed to remove technical noise while preserving biological variation in scRNA-seq data, so that the performance of existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods for the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing differentially expressed gene (DEG) analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy.

Date Created

2017-09-18

Agent

Author (aut): Yip, Shun H.
Author (aut): Wang, Panwen
Author (aut): Kocher, Jean-Pierre A.
Author (aut): Sham, Pak Chung
Author (aut): Wang, Junwen
Contributor (ctb): College of Health Solutions

An Integrative Method to Decode Regulatory Logics in Gene Transcription

Description

Modeling of transcriptional regulatory networks (TRNs) has been increasingly used to dissect the nature of gene regulation. Inference of regulatory relationships among transcription factors (TFs) and genes, especially among multiple TFs, remains challenging. In this study, we introduced an integrative method, LogicTRN, to decode TF–TF interactions that form TF logics in regulating target genes. By combining cis-regulatory logics and transcriptional kinetics into a single model framework, LogicTRN can naturally integrate dynamic gene expression data and TF-DNA-binding signals to identify the TF logics and reconstruct the underlying TRNs. We evaluated the newly developed methodology using simulation, comparison and application studies, and the results are not only consistent with existing knowledge but also demonstrate its ability to accurately reconstruct TRNs in complex biological systems.

Date Created

2017-10-19

Agent

Author (aut): Yan, Bin
Author (aut): Guan, Daogang
Author (aut): Wang, Chao
Author (aut): Wang, Junwen
Author (aut): He, Bing
Author (aut): Qin, Jing
Author (aut): Boheler, Kenneth R.
Author (aut): Lu, Aiping
Author (aut): Zhang, Ge
Author (aut): Zhu, Hailong
Contributor (ctb): College of Health Solutions

Robust and Rapid Algorithms Facilitate Large-Scale Whole Genome Sequencing Downstream Analysis in an Integrative Framework

Description

Whole genome sequencing (WGS) is a promising strategy for unraveling variants or genes responsible for human diseases and traits. However, robust platforms for comprehensive downstream analysis are lacking. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. These three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, which integrates downstream analysis functions for whole genome sequencing data. KGGSeq is equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access the genotypes of ∼60 million variants in 2504 subjects, whereas a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles, and it calculated genotypic correlation over 1000 times faster.

Date Created

2017-01-23

Agent

Author (aut): Li, Miaoxin
Author (aut): Li, Jiang
Author (aut): Li, Mulin Jun
Author (aut): Pan, Zhicheng
Author (aut): Hsu, Jacob Shujui
Author (aut): Liu, Dajiang J.
Author (aut): Zhan, Xiaowei
Author (aut): Wang, Junwen
Author (aut): Song, Youqiang
Author (aut): Sham, Pak Chung
Contributor (ctb): College of Health Solutions

Integrative Approach for the Analysis of the Proteome-Wide Response to Bismuth Drugs in Helicobacter pylori

Description

Bismuth drugs, despite having been used clinically for decades, surprisingly remain in use and effective for the treatment of Helicobacter pylori infection, even against resistant strains when co-administered with antibiotics. However, the molecular mechanisms underlying the sustained clinical susceptibility of H. pylori to bismuth drugs remain elusive. Herein, we report that the integration of in-house metalloproteomics and quantitative proteomics allows comprehensive identification of the bismuth-associated proteomes, including 63 bismuth-binding and 119 bismuth-regulated proteins from Helicobacter pylori, over 60% of which are annotated with catalytic functions. Through bioinformatics analysis in combination with bioassays, we demonstrated that bismuth drugs disrupted multiple essential pathways in the pathogen, including ROS defence and pH buffering, by binding to and functionally perturbing a number of key enzymes. Moreover, we discovered that HpDnaK may serve as a new target of bismuth drugs to inhibit bacterium-host cell adhesion. The integrative approach reported herein provides a novel strategy for unveiling the molecular mechanisms of antimicrobial metals against pathogens in general. This study sheds light on the design of new types of antimicrobial agents with multiple targets to tackle the current crisis of antimicrobial resistance.

Date Created

2017-04-19

Agent

Author (aut): Wang, Yuchuan
Author (aut): Hu, Ligang
Author (aut): Xu, Feng
Author (aut): Quan, Quan
Author (aut): Lai, Yau-Tsz
Author (aut): Xia, Wei
Author (aut): Yang, Ya
Author (aut): Chang, Yuen-Yan
Author (aut): Yang, Xinming
Author (aut): Chai, Zhifang
Author (aut): Wang, Junwen
Author (aut): Chu, Ivan K.
Author (aut): Li, Hongyan
Author (aut): Sun, Hongzhe
Contributor (ctb): College of Health Solutions

Exploring Genetic Associations With ceRNA Regulation in the Human Genome

Description

Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs), thereby affecting the expression of other targets of those miRNAs. Whether genetic variants in a ceRNA can affect its biological function and disease development is still an open question. Here we identified a large number of genetic variants associated with ceRNA function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci, or ‘cerQTL’, and found that many of them had not been explored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3′ UTRs at miRNA seed binding regions can simultaneously regulate gene expression in both cis and trans through the ceRNA mechanism. We also discovered that cerQTLs in miRNA binding sites are significantly enriched for trait/disease-associated variants reported by genome-wide association studies, suggesting that disease susceptibility could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that the cerQTL rs11540855 can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.

Date Created

2017-05-02

Agent

Author (aut): Li, Mulin Jun
Author (aut): Zhang, Jian
Author (aut): Liang, Qian
Author (aut): Xuan, Chenghao
Author (aut): Wu, Jiexing
Author (aut): Jiang, Peng
Author (aut): Li, Wei
Author (aut): Zhu, Yun
Author (aut): Wang, Panwen
Author (aut): Fernandez, Daniel
Author (aut): Shen, Yujun
Author (aut): Chen, Yiwen
Author (aut): Kocher, Jean-Pierre A.
Author (aut): Yu, Ying
Author (aut): Sham, Pak Chung
Author (aut): Wang, Junwen
Author (aut): Liu, Jun S.
Author (aut): Liu, X. Shirley
Contributor (ctb): College of Health Solutions
