Topological analysis of biological pathways : genes, microRNAs and pathways involved in hepatocellular carcinoma

Description

Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might also influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effects of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. The dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways, and how they are associated with disease.

This dissertation proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA detects pathways in which the number of hub genes differs between two phenotypes. The basis for PoTRA is that loss of connectivity is a common topological trait of cancer networks, together with the prior knowledge that a normal biological network is a scale-free network whose degree distribution follows a power law, with a small number of hub nodes and a large number of non-hub nodes. In the transition from normal to cancer, the network's loss of connectivity may disrupt this scale-free structure, so that the number of hub genes in cancer differs from that in normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer, the pathway might be involved in cancer. MiR2Pathway quantifies the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and ranks the disease risk of rewired miRNA-mediated biological pathways. This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by their disease risk. The two methods proposed here can complement existing genomics analysis methods to facilitate the study of the biological mechanisms behind disease at the systems level.
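
The hub-counting idea behind PoTRA can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the toy edge lists, the undirected-network treatment, and the rule that a hub is a gene whose PageRank exceeds 1.5 times the pathway mean are all assumptions made here for illustration, and `pagerank` and `count_hubs` are hypothetical helper names.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected gene network."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    rank = {gene: 1.0 / n for gene in neighbors}
    for _ in range(iterations):
        # Each gene receives an equal share of the rank of every neighbor.
        rank = {
            gene: (1 - damping) / n
            + damping * sum(rank[m] / len(neighbors[m]) for m in neighbors[gene])
            for gene in neighbors
        }
    return rank


def count_hubs(edges, factor=1.5):
    """Count genes whose PageRank exceeds `factor` times the mean score.

    The threshold rule is an assumption for this sketch only.
    """
    rank = pagerank(edges)
    mean = sum(rank.values()) / len(rank)
    return sum(1 for score in rank.values() if score > factor * mean)


# Toy pathway (made-up data): gene A is a hub in the "normal" network,
# while the "cancer" network has been rewired into a chain with no hub.
normal_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
cancer_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

normal_hubs = count_hubs(normal_edges)
cancer_hubs = count_hubs(cancer_edges)
print(normal_hubs, cancer_hubs)
```

In this toy case the hub count differs between the two networks, which is the kind of signal the abstract describes; a real analysis would also need a statistical test across samples, which this sketch omits.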

Contributors: Li, Chaoxing (Author) / Dinu, Valentin (Thesis advisor) / Kuang, Yang (Thesis advisor) / Liu, Li (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2017

Genetic variations and associated electrophysiological and behavioral traits in children with childhood apraxia of speech

Description

Childhood Apraxia of Speech (CAS) is a severe motor speech disorder that is difficult to diagnose because there is currently no gold-standard measurement to differentiate CAS from other speech disorders. In the present study, we investigate underlying biomarkers associated with CAS and carry out enhanced phenotyping through behavioral testing. Cortical electrophysiological measures were used to investigate differences in neural activation in response to native and non-native vowel contrasts between children with CAS and typically developing peers. Genetic analysis included full exome sequencing of a child with CAS and his unaffected parents in order to uncover underlying genetic variation that may be causal for the child's severely impaired speech and language. Enhanced phenotyping was completed through extensive behavioral testing, including speech, language, reading, spelling, phonological awareness, gross/fine motor, and oral and hand motor tasks. Results from cortical electrophysiological measures are consistent with previous evidence of a heightened neural response to non-native sounds in CAS, potentially indicating overspecified phonological representations in this population. Results of exome sequencing suggest that multiple genetic variations contribute to the severely affected phenotype in the child and provide further evidence of heterogeneous genomic pathways associated with CAS. Finally, results of behavioral testing demonstrate significant impairments across tasks in CAS, suggesting underlying sequential processing deficits in multiple domains. Overall, these results have the potential to delineate functional pathways from genetic variations to the brain to observable behavioral phenotypes, and to motivate the development of preventative and targeted treatment approaches.

Contributors: Vose, Caitlin (Author) / Peter, Beate (Thesis advisor) / Liu, Li (Committee member) / Brewer, Gene (Committee member) / Arizona State University (Publisher)

Created: 2018

Circular RNA characterization and regulatory network prediction in human tissue

Description

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that are formed when exons back-splice to each other, and they represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate micro-RNAs (miRNAs) as well as parental gene transcription and may thus have key roles in transcriptional regulation. Although circRNAs have continued to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, computational algorithms produce varied results in terms of which circRNAs are detected. This thesis aims to advance current knowledge of circRNA expression in a region-specific context, focusing on the human brain, and to address computational challenges.

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full-length circRNAs as well as in silico validation of selected circRNA junctions using assembly; and (iii) applying these assembly-based detection and validation workflows, and integrating existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA-selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of circRNA expression profiles and regulatory networks in a brain region-specific manner. This resource, along with existing databases such as circBase, will be invaluable in advancing circRNA research and in improving our understanding of the role of circRNAs in transcriptional regulation and various neurological conditions.

Contributors: Sekar, Shobana (Author) / Liang, Winnie S (Thesis advisor) / Dinu, Valentin (Thesis advisor) / Craig, David (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)

Created: 2018

Analysis of HIV Risk Groups Using Bayesian Analysis

Description

Past phylogenetic analyses that relied on clustering lacked the functionality needed to inform and implement useful public health decisions. Models can be constructed to support further analyses and to produce meaningful data for future use in public health informatics. A phylogenetic tree is considered one of the best ways for researchers to visualize and analyze the evolutionary history of a virus. The focus of this study was to research HIV phylodynamic and phylogenetic methods, which involved identifying fast-growing HIV transmission clusters and rates for certain risk groups in the US. Achieving these results required an HIV database from which to retrieve real-time data, alignment software for multiple sequence alignment, Bayesian analysis software for developing and manipulating models, and graphical tools for visualizing the model output. The study began with a literature review on HIV phylogeography and phylodynamics. Sequence data were then obtained from a sequence database and, because the sequences were unaligned, run through multiple sequence alignment software. The aligned file was then loaded into Bayesian analysis software to build a phylogenetic tree model. Once the model was created, the tree was edited in tree visualization software so that it could be interpreted easily. The shape of the resulting tree was attributed to distant homology or to the mixing of certain parameters. As a continuation of this study, it would be interesting to use the same aligned sequences with different model parameter selections for the initial creation of the model and see how the output changes, because one small change to a model parameter could greatly affect the resulting phylogenetic tree.

Contributors: Nandan, Meghana (Author) / Scotch, Matthew (Thesis director) / Liu, Li (Committee member) / Biomedical Informatics Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Expansion and Application of Pathways of Topological Rank Analysis (PoTRA) to Various Cancers

Description

Cancer is the second leading cause of death in the United States. It is a serious, complex disease in which cells grow uncontrollably, and it causes millions of deaths per year [1]. Cancer is usually caused by a combination of environmental variables and biological pathways. These pathways normally have a very robust structure but are altered by cancer, resulting in a loss of connectivity between pathways. In order to detect these pathways, a PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) was created, which measures the relative rankings of the genes in each pathway. Applying this algorithm allows us to determine which pathways differ significantly between areas with cancer and areas without cancer. This would allow scientists to focus on specific pathways in order to learn more about the cancer and find more effective ways to treat it. So far, analysis using PoTRA has been successfully conducted on hepatocellular carcinoma (HCC) and its subtypes, and all of the significant pathways found were cancer-associated. Using the TCGA data stored in Google Cloud's BigQuery, we created a pipeline to apply PoTRA to other cancer data sets and to see how well it carries over to other cancers. The results show that, although some modifications may be needed to adapt to other datasets, many significant pathways were found for both HCC and breast cancer.

Contributors: Mahesh, Sunny Nishant (Author) / Dinu, Valentin (Thesis director) / Liu, Li (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Novel methods of biomarker discovery and predictive modeling using Random Forest

Description

Random forest (RF) is a popular and powerful technique. It can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals. However, they are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and is used as the basis for two novel methods, one for biomarker discovery and one for generating prediction intervals.

Firstly, a biodosimetry model is developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve the prediction accuracy of the biodosimetry model, day-specific models were built to deal with the day interaction effect, and a nested modeling technique was proposed. The nested models can fit this complex data, which has large variability and non-linear relationships.

Secondly, a panel of biomarkers was selected using a data-driven feature selection method as well as hand-picking, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method can incorporate domain knowledge as a penalty term to regulate the selection of candidate features in RF. It adds flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide the selection of biomarkers. In simulated datasets, the method can also compete with existing methods when intrinsic data characteristics are used as an alternative to domain knowledge.

Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets.

Contributors: Guan, Xin (Author) / Liu, Li (Thesis advisor) / Runger, George C. (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2017

Examining the Significance of Economic Connectedness as an Indicator of Disparities in COVID-19 Infection Risk in Arizona ZCTAs

Description

Bridging social capital describes the diffusion of information across networks built between individuals of different social identities. This project aims to understand whether the bridging ties of economic connectedness (EC), measured using data on Facebook friendships and calculated as the average share of high socioeconomic status friends that an individual of low socioeconomic status has, can predict variations in COVID-19 infection risk across Arizona ZIP code tabulation areas (ZCTAs). Economic connectedness values across Arizona ZCTAs were examined, along with the correlation of EC with various social and demographic factors such as age, sex, race and ethnicity, educational background, income, and health insurance coverage. A multiple linear regression model was fitted to examine the association of EC with the biweekly COVID-19 growth rate from October 2020 to November 2021, and to examine longitudinal trends in the association between these two factors. The study found that the bridging ties of economic connectedness have a significant effect size comparable to that of other demographic features, and that EC could be used to identify vulnerabilities and health disparities in communities during the pandemic.
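
The EC calculation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of "the average share of high socioeconomic status friends" among low socioeconomic status individuals, as stated above; the friendship data, the binary `"low"`/`"high"` labels, and the function name `economic_connectedness` are hypothetical, and any further normalization used in the underlying dataset is not reproduced here.

```python
def economic_connectedness(friends, ses):
    """Average, over low-SES individuals, of the share of their friends
    who are high-SES. Individuals with no listed friends are skipped."""
    shares = []
    for person, friend_list in friends.items():
        if ses[person] != "low" or not friend_list:
            continue
        high = sum(1 for friend in friend_list if ses[friend] == "high")
        shares.append(high / len(friend_list))
    return sum(shares) / len(shares) if shares else float("nan")


# Made-up example for one area: two low-SES and two high-SES residents.
ses = {"ann": "low", "bo": "low", "cy": "high", "di": "high"}
friends = {
    "ann": ["bo", "cy"],       # 1 of 2 friends is high-SES
    "bo": ["cy", "di"],        # 2 of 2 friends are high-SES
    "cy": ["ann", "bo", "di"], # high-SES, so not counted
}

ec = economic_connectedness(friends, ses)
print(ec)
```

Computed per ZCTA, a value like `ec` would then serve as one predictor alongside the demographic covariates in the regression.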

Contributors: Boby, Maria (Author) / Oh, Hyunsung (Thesis director) / Marsiglia, Flavio (Committee member) / Liu, Li (Committee member) / Barrett, The Honors College (Contributor) / School of Life Sciences (Contributor) / School of Human Evolution & Social Change (Contributor) / School of Social Work (Contributor)

Created: 2023-05

Developing a Stochastic Modeling App for Biophysics Education

Description

Computational and systems biology are rapidly growing fields of academic study, but researchers unfamiliar with them are impeded by a lack of accessible, programming-optional modelling tools. To address this gap, I developed BioSSA, a web framework built on JavaScript and D3.js that allows users to explore a small library of curated biophysical models as well as create and simulate their own reaction networks. The mathematical foundation of BioSSA is the Stochastic Gillespie Algorithm (SGA), which is widely used in mathematical modeling and biology to represent chemical reaction systems. SGA is particularly well suited as an introductory modelling tool because of its flexibility, its broad applicability, and its ability to numerically approximate systems when analytical solutions are not available. BioSSA is freely available to the community, and further improvements are planned.
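
For readers unfamiliar with the Gillespie algorithm, the following Python sketch shows its direct-method form on a simple birth-death system. It is not BioSSA's code (BioSSA is written in JavaScript); the function name `gillespie`, the reaction encoding, and the rate constants are assumptions chosen for illustration.

```python
import math
import random


def gillespie(state, reactions, t_end, seed=0):
    """Direct-method stochastic simulation.

    `reactions` is a list of (propensity_function, state_change) pairs,
    where state_change maps species names to integer increments.
    Returns a list of (time, state) snapshots.
    """
    rng = random.Random(seed)
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_, change) in zip(propensities, reactions):
            cumulative += a
            if threshold < cumulative:
                for species, delta in change.items():
                    state[species] += delta
                break
        trajectory.append((t, dict(state)))
    return trajectory


# Birth-death example with made-up rates: 0 -> X at rate 10, X -> 0 at rate 1 * X.
reactions = [
    (lambda s: 10.0, {"X": +1}),
    (lambda s: 1.0 * s["X"], {"X": -1}),
]
trajectory = gillespie({"X": 0}, reactions, t_end=5.0)
print(len(trajectory), trajectory[-1])
```

Each run produces one random trajectory; repeating the simulation with different seeds and averaging gives a numerical approximation of the system's behavior when no analytical solution is available.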

Contributors: Ramirez, Daniel (Author) / Ghasemzadeh, Hassan (Thesis director) / Liu, Li (Committee member) / Lu, Mingyang (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)

Created: 2023-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome analyses such as single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) sequencing have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying the time point(s) at which drastic changes in gene transcription are associated with the transition from a homeostatic to a non-homeostatic cellular state (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type-specific, co-expression-based tipping point detection method that identifies target gene (TG) and transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, along with the time point at which these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type-specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data from patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample or among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effect model. In Chapter 4, I summarize my work on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration that are conserved between mouse and pig at different time points.

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

A Study of the Price Differences between A-Shares and B-Shares of Dual-Listed Chinese Companies

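Description

The A-share and B-share markets of mainland China's securities market form a segmented market that is unique in the world. The A and B shares of dual-listed companies (hereafter AB shares) carry the same rights per share, yet B shares have long traded at a discount to A shares. This is known as the "B Share Puzzle", a much-discussed issue in international capital markets on which research continues. This paper attempts to study the relationship between the B-share discount and the policies issued by the Chinese government to regulate the long-term development of the stock market. By reviewing the history of AB shares, it identifies two policies that intervened in and regulated their long-term development: the Chinese government's decision in February 2001 to allow mainland Chinese residents to invest in B shares (Policy 1), and the split-share structure reform of China's securities market that began on April 29, 2005 (Policy 2). On this basis, an empirical analysis using econometric and statistical methods finds that Policy 1 and Policy 2 are significantly correlated with the B-share discount rate. The policy interventions were each targeted, so that changes in the B-share discount rate under their influence were realized through significant changes in either the A-share price or the B-share price. The study also finds that the average B-share discount rate exhibits volatility clustering, with small fluctuations and mean reversion, and is therefore predictable.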

Contributors: Liu, Li (Author) / Li, Hongmin (Thesis advisor) / Zhang, Jie (Thesis advisor) / Chen, Hui (Committee member) / Arizona State University (Publisher)

Created: 2023