A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise

Description

The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. CS can aid MB in, for example, the analysis of sequences to find binding sites and the prediction of protein folding patterns. Stem-like cells can be maintained and replicated over long periods, and they can differentiate into various tissue types. These behaviors are made possible by controlling the expression of specific genes, which then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single-cell RNA sequencing (scRNA-seq). A significant portion of the noise in scRNA-seq data results from extrinsic factors and can only be removed by a customized scRNA-seq analysis pipeline. scRNA-seq experiments use next-generation sequencing to measure genome-scale gene expression levels at single-cell resolution.

Almost every step of analysis and quantification requires a threshold that is often determined empirically, which makes quantification of noise less accurate. In addition, each research group often develops its own data analysis pipeline, making it impossible to compare data from different groups. To remedy this problem, a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments, we identified the pipeline stages and tools needed. Our pipeline can handle data with adapters and barcodes, which was not the case for the pipelines used in some experiments. It can be used to analyze scRNA-seq data from a single experiment and also to compare scRNA-seq data across experiments. Processes such as data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize single-cell RNA-seq data to minimize technical noise introduced by disparate platforms.
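As a rough illustration of the normalize-then-measure-noise idea described above, the sketch below scales each cell's raw counts to counts per million and then computes a per-gene squared coefficient of variation across cells. The abstract does not name the normalization method or the noise metric the pipeline uses, so both choices here, along with the function names and the toy data, are assumptions made only for illustration.

```python
from statistics import mean, pvariance


def counts_per_million(counts):
    """Scale one cell's raw counts so they sum to one million.

    Assumption: library-size (CPM) scaling stands in for whatever
    normalization the thesis pipeline actually applies.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def cv_squared(values):
    """Squared coefficient of variation (variance / mean^2) across cells.

    Assumption: CV^2 is used here as one common noise measure; the
    abstract does not say which metric the author used.
    """
    m = mean(values)
    if m == 0:
        return float("nan")
    return pvariance(values) / (m * m)


# Hypothetical toy data: raw counts for two genes in three cells.
cells = [
    {"geneA": 10, "geneB": 90},
    {"geneA": 40, "geneB": 160},
    {"geneA": 30, "geneB": 270},
]

normalized = [counts_per_million(cell) for cell in cells]
noise = {
    gene: cv_squared([cell[gene] for cell in normalized])
    for gene in ("geneA", "geneB")
}
print(noise)
```

Normalizing before computing the noise measure removes differences in sequencing depth between cells, so the remaining variation is closer to variation in expression itself.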

Contributors: Balachandran, Parithi (Author) / Wang, Xiao (Thesis advisor) / Brafman, David (Committee member) / Lockhart, Thurmon (Committee member) / Arizona State University (Publisher)

Created: 2017

Landscape of Gene Regulatory Network Motifs

Description

The human transcriptional regulatory machine utilizes hundreds of transcription factors that bind to specific genic sites, resulting in either activation or repression of targeted genes. Networks composed of nodes and edges can be constructed to model the relationships between regulators and their targets. Within these biological networks, small enriched structural patterns containing at least three nodes can be identified as potential building blocks from which a network is organized. A first-iteration computational pipeline was designed to generate a disease-specific gene regulatory network for motif detection using established computational tools. The first goal was to identify motifs that can express themselves in a state that results in differential patient survival in one of the 32 different cancer types studied. This study identified issues in detecting strongly correlated motifs that also affect patient survival, yielding preliminary results on possible drivers of cancer etiology. Second, the topology of network motifs was compared across multiple data types to identify possible divergence from a conserved enrichment pattern in network-perturbing diseases. The topology of enriched motifs across all the datasets converged upon a single conserved pattern reported in a previous study, and it did not appear to diverge depending on the type of disease. This report highlights possible methods to improve detection of disease-driving motifs that can aid in identifying possible treatment targets in cancer. Finally, networks were only minimally perturbed, suggesting that regulatory programs were run from evolved circuits into a cancer context.
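To make the notion of a three-node motif concrete, the sketch below enumerates feed-forward loops (A regulates B, B regulates C, and A also regulates C) in a small directed regulator-to-target network. The feed-forward loop is only one well-known three-node pattern, chosen here as an assumption; the abstract does not name the motifs or tools used, and the function name and toy edges are hypothetical. Deciding whether a motif is enriched would also require comparing its count against randomized networks, which is not shown.

```python
from itertools import permutations


def find_feed_forward_loops(edges):
    """Return every (a, b, c) with edges a->b, b->c, and a->c.

    `edges` is an iterable of (regulator, target) pairs.
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    # Collect every node that appears as a regulator or as a target.
    nodes = set(targets)
    for target_set in targets.values():
        nodes |= target_set

    loops = []
    # Brute-force check of all ordered node triples; fine for small networks.
    for a, b, c in permutations(sorted(nodes), 3):
        if (
            b in targets.get(a, ())
            and c in targets.get(b, ())
            and c in targets.get(a, ())
        ):
            loops.append((a, b, c))
    return loops


# Hypothetical toy network: TF1 and TF2 are regulators, geneX and geneY targets.
edges = [
    ("TF1", "TF2"),
    ("TF2", "geneX"),
    ("TF1", "geneX"),
    ("TF2", "geneY"),
]
loops = find_feed_forward_loops(edges)
print(loops)
```

The brute-force search over node triples grows cubically with network size, so a genome-scale network would need a dedicated motif-detection tool rather than this approach.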

Contributors: Striker, Shawn Scott (Author) / Plaisier, Christopher (Thesis advisor) / Brafman, David (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2020