Matching Items (3)

Filtering by All Subjects: Bioinformatics

Dense non-natural sequence peptide microarrays for epitope mapping and diagnostics

Description

The healthcare system in this country is currently unacceptable. New technologies may contribute to reducing cost and improving outcomes. Early diagnosis and treatment represent the least risky option for addressing this issue. Such a technology needs to be inexpensive, highly sensitive, highly specific, and amenable to adoption in a clinic. This thesis explores an immunodiagnostic technology based on highly scalable, non-natural sequence peptide microarrays designed to profile the humoral immune response and address the healthcare problem. The primary aim of this thesis is to explore the ability of these arrays to map continuous (linear) epitopes. I discovered that, using a technique termed subsequence analysis, epitopes could be decisively mapped to an eliciting protein with a high success rate. This led to the discovery of novel linear epitopes from Plasmodium falciparum (malaria) and Treponema pallidum (syphilis), as well as validation of previously discovered epitopes in Dengue and monoclonal antibodies. Next, I developed and tested a classification scheme based on Support Vector Machines for development of a Dengue Fever diagnostic, achieving higher sensitivity and specificity than current FDA-approved techniques. The software underlying this method is available for download under the BSD license. Following this, I developed a kinetic model for immunosignatures and tested it against existing data driven by previously unexplained phenomena. This model provides a framework and informs ways to optimize the platform for maximum stability and efficiency. I also explored the role of sequence composition in explaining an immunosignature binding profile, determining a strong role for charged residues that seems to have some predictive ability for disease. Finally, I developed a database, software, and indexing strategy based on Apache Lucene for searching motif patterns (regular expressions) in large biological databases.
These projects as a whole have advanced knowledge of how to approach high throughput immunodiagnostics and provide an example of how technology can be fused with biology in order to affect scientific and health outcomes.
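
As a rough illustration of what searching motif patterns as regular expressions over protein sequences can look like, here is a minimal sketch. It uses Python's standard `re` module rather than Apache Lucene, and it is not the thesis software: the function name `find_motif`, the toy sequences, and the simplified glycosylation-style pattern are all assumptions made for the example.

```python
import re

def find_motif(pattern, sequences):
    """Return {name: [(start, matched_text), ...]} for each sequence with a hit.

    Matches are non-overlapping, as produced by re.finditer.
    """
    compiled = re.compile(pattern)
    hits = {}
    for name, seq in sequences.items():
        found = [(m.start(), m.group()) for m in compiled.finditer(seq)]
        if found:
            hits[name] = found
    return hits

# Hypothetical toy sequences, invented for illustration only.
SEQUENCES = {
    "protA": "MKRGDSLLNRTA",
    "protB": "GGAVLLPPQ",
    "protC": "ANGSWRGDT",
}

# Simplified motif: N, then any residue except P, then S or T.
NGLYC = r"N[^P][ST]"

if __name__ == "__main__":
    for name, found in find_motif(NGLYC, SEQUENCES).items():
        print(name, found)
```

A linear scan like this is fine for small inputs; an index-based approach such as the one the abstract describes would be the natural choice once the database is large.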

Contributors: Richer, Joshua Amos (Author) / Johnston, Stephen A. (Thesis advisor) / Woodbury, Neal (Committee member) / Stafford, Phillip (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2014

A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise

Description

The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. Places where CS could aid MB include the analysis of sequences to find binding sites and the prediction of protein folding patterns. Stem-like cells can be maintained and replicated over long periods, and they can differentiate into various tissue types. These behaviors are made possible by controlling the expression of specific genes. These genes then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single-cell RNA sequencing (scRNA-seq). A significant portion of the noise in scRNA-seq data results from extrinsic factors and can be removed only by a customized scRNA-seq analysis pipeline. scRNA-seq experiments utilize next-generation sequencing to measure genome-scale gene expression levels with single-cell resolution.

Almost every step during analysis and quantification requires the use of an often empirically determined threshold, which makes quantification of noise less accurate. In addition, each research group often develops its own data analysis pipeline, making it impossible to compare data from different groups. To remedy this problem, a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments, we identified the pipeline stages and tools needed. Our pipeline is capable of handling data with adapters and barcodes, which was not the case with pipelines from some experiments. Our pipeline can be used to analyze scRNA-seq data from a single experiment and also to compare scRNA-seq data across experiments. Various processes such as data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize single-cell RNA-seq data to minimize technical noise introduced by disparate platforms.
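
To make the idea of normalizing counts so that cells and experiments become comparable more concrete, here is a minimal sketch of one common approach: scaling each cell to counts per million (CPM) and applying a log2 transform. This is an assumption chosen for illustration, not the normalization protocol developed in the thesis; the function name `cpm_log_normalize` and the toy counts are hypothetical.

```python
import math

def cpm_log_normalize(counts):
    """Library-size normalize raw counts per cell.

    counts: {cell: {gene: raw_count}}
    returns: {cell: {gene: log2(CPM + 1)}}
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            # An empty cell has no expression signal; report zeros.
            normalized[cell] = {gene: 0.0 for gene in genes}
            continue
        normalized[cell] = {
            gene: math.log2(count / total * 1e6 + 1)
            for gene, count in genes.items()
        }
    return normalized

if __name__ == "__main__":
    # Hypothetical toy data: cell2 was sequenced 10x deeper than cell1.
    raw = {
        "cell1": {"geneA": 1, "geneB": 3},
        "cell2": {"geneA": 10, "geneB": 30},
    }
    for cell, genes in cpm_log_normalize(raw).items():
        print(cell, genes)
```

In this toy input the two cells differ only in sequencing depth, so they end up with the same normalized values. Depth is just one source of technical noise, and a full pipeline would address others as well.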

Contributors: Balachandran, Parithi (Author) / Wang, Xiao (Thesis advisor) / Brafman, David (Committee member) / Lockhart, Thurmon (Committee member) / Arizona State University (Publisher)

Created: 2017

Landscape of Gene Regulatory Network Motifs

Description

The human transcriptional regulatory machine utilizes hundreds of transcription factors which bind to specific genic sites, resulting in either activation or repression of targeted genes. Networks composed of nodes and edges can be constructed to model the relationships of regulators and their targets. Within these biological networks, small enriched structural patterns containing at least three nodes can be identified as potential building blocks from which a network is organized. A first-iteration computational pipeline was designed to generate a disease-specific gene regulatory network for motif detection using established computational tools. The first goal was to identify motifs that can express themselves in a state that results in differential patient survival in one of the 32 different cancer types studied. This study identified issues for detecting strongly correlated motifs that also affect patient survival, yielding preliminary results on possible drivers of cancer etiology. Second, a comparison was performed for the topology of network motifs across multiple different data types to identify possible divergence from a conserved enrichment pattern in network-perturbing diseases. The topology of enriched motifs across all the datasets converged upon a single conserved pattern reported in a previous study, which did not appear to diverge depending on the type of disease. This report highlights possible methods to improve detection of disease-driving motifs that can aid in identifying possible treatment targets in cancer. Finally, networks were only minimally perturbed, suggesting that regulatory programs were run from evolved circuits into a cancer context.
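
To illustrate what a three-node structural pattern in a regulatory network looks like, here is a minimal sketch that finds feed-forward loops (A regulates B, B regulates C, and A also regulates C) in a directed edge list. The feed-forward loop is chosen only as a familiar example; the abstract does not name the motif types studied, and the function name `feed_forward_loops` and the toy network are hypothetical. The sketch only counts occurrences. Deciding whether a pattern is enriched would additionally require comparing the count against randomized networks, which is not shown.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return sorted (a, b, c) triples with edges a->b, b->c and a->c.

    edges: iterable of (regulator, target) pairs in a directed network.
    Other edges among the three nodes are ignored (non-induced match).
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    loops = []
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            loops.append((a, b, c))
    return sorted(loops)

if __name__ == "__main__":
    # Hypothetical toy network: TF1 and TF2 are regulators, G and H targets.
    toy_edges = [
        ("TF1", "TF2"),
        ("TF2", "G"),
        ("TF1", "G"),
        ("TF2", "H"),
    ]
    print(feed_forward_loops(toy_edges))
```

Enumerating all ordered triples is cubic in the number of nodes, which is acceptable for a toy example but would call for a dedicated motif-detection tool on a genome-scale network.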

Contributors: Striker, Shawn Scott (Author) / Plaisier, Christopher (Thesis advisor) / Brafman, David (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2020