Antibody based strategies for multiplexed diagnostics

Description

Peptide microarrays are to proteomics as sequencing is to genomics. As microarrays become more content-rich, higher-resolution proteomic studies will parallel deep sequencing of nucleic acids. Antigen-antibody interactions can be studied at a much higher resolution using microarrays than was possible only a decade ago. My dissertation focuses on testing the feasibility of using either the Immunosignature platform, based on non-natural peptide sequences, or a pathogen peptide microarray, which uses bioinformatically selected peptides from pathogens, for creating sensitive diagnostics. Both diagnostic applications use relatively little serum from infected individuals, but each approaches diagnosis of disease differently. The first project compares pathogen epitope peptide (life-space) and non-natural (random-space) peptide microarrays while using them for the early detection of Coccidioidomycosis (Valley Fever). The second project uses NIAID category A, B and C priority pathogen epitope peptides in a multiplexed microarray platform to assess the feasibility of using epitope peptides to simultaneously diagnose multiple exposures using a single assay. Cross-reactivity is a consistent feature of several antigen-antibody-based immunodiagnostics. This work utilizes microarray optimization and bioinformatic approaches to distill the underlying disease-specific antibody signature pattern. Circumventing the inherent cross-reactivity observed in antibody binding to peptides was crucial to achieving the goal of this work: accurately distinguishing multiple exposures simultaneously.

Contributors: Navalkar, Krupa Arun (Author) / Johnston, Stephen A. (Thesis advisor) / Stafford, Phillip (Thesis advisor) / Sykes, Kathryn (Committee member) / Jacobs, Bertram (Committee member) / Arizona State University (Publisher)

Created: 2014

Dense non-natural sequence peptide microarrays for epitope mapping and diagnostics

Description

The healthcare system in this country is currently unacceptable. New technologies may contribute to reducing cost and improving outcomes. Early diagnosis and treatment represent the least risky option for addressing this issue. Such a technology needs to be inexpensive, highly sensitive, highly specific, and amenable to adoption in a clinic. This thesis explores an immunodiagnostic technology based on highly scalable, non-natural sequence peptide microarrays designed to profile the humoral immune response and address the healthcare problem. The primary aim of this thesis is to explore the ability of these arrays to map continuous (linear) epitopes. I discovered that, using a technique termed subsequence analysis, epitopes could be decisively mapped to an eliciting protein with a high success rate. This led to the discovery of novel linear epitopes from Plasmodium falciparum (Malaria) and Treponema pallidum (Syphilis), as well as validation of previously discovered epitopes in Dengue and monoclonal antibodies. Next, I developed and tested a classification scheme based on Support Vector Machines for development of a Dengue Fever diagnostic, achieving higher sensitivity and specificity than current FDA-approved techniques. The software underlying this method is available for download under the BSD license. Following this, I developed a kinetic model for immunosignatures and tested it against existing data driven by previously unexplained phenomena. This model provides a framework and informs ways to optimize the platform for maximum stability and efficiency. I also explored the role of sequence composition in explaining an immunosignature binding profile, determining a strong role for charged residues that seems to have some predictive ability for disease. Finally, I developed a database, software and indexing strategy based on Apache Lucene for searching motif patterns (regular expressions) in large biological databases.
These projects as a whole have advanced knowledge of how to approach high throughput immunodiagnostics and provide an example of how technology can be fused with biology in order to affect scientific and health outcomes.
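
As an illustration only of what searching a motif pattern (a regular expression) across protein sequences involves, the following minimal Python sketch scans a small in-memory collection. It is not the thesis software and does not use Apache Lucene; the protein identifiers, sequences, motif, and the helper name find_motif are all hypothetical, and a plain linear scan like this would likely be too slow for a large database, which is presumably why an indexing strategy was developed.

```python
import re

# Hypothetical toy "database": identifiers and sequences are invented for illustration.
PROTEINS = {
    "prot_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "prot_B": "MSDNGTLPKKRAYIEKQ",
    "prot_C": "MGSSHHHHHHSSGLVPRGSH",
}


def find_motif(pattern, proteins):
    """Return (protein_id, start, matched_text) for every non-overlapping match."""
    regex = re.compile(pattern)
    hits = []
    # Sort by identifier so the output order is deterministic.
    for protein_id, sequence in sorted(proteins.items()):
        for match in regex.finditer(sequence):
            hits.append((protein_id, match.start(), match.group()))
    return hits


# Motif: "AYI", then either A or E, then "KQ" (an assumed example pattern).
hits = find_motif(r"AYI[AE]KQ", PROTEINS)
for protein_id, start, text in hits:
    print(protein_id, start, text)
```

Positions are zero-based offsets into each sequence; an index-backed search would need to return the same hits while avoiding a scan of every sequence.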

Contributors: Richer, Joshua Amos (Author) / Johnston, Stephen A. (Thesis advisor) / Woodbury, Neal (Committee member) / Stafford, Phillip (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2014

Biology-based matched signal processing and physics-based modeling for improved detection

Description

Peptide microarrays have been used in molecular biology to profile immune responses and develop diagnostic tools. When the microarrays are printed with random peptide sequences, they can be used to identify antigen-antibody binding patterns, or immunosignatures. In this thesis, an advanced signal processing method is proposed to estimate epitope antigen subsequences as well as identify mimotope antigen subsequences that mimic the structure of epitopes from random-sequence peptide microarrays. The method first maps peptide sequences to linear expansions of highly localized one-dimensional (1-D) time-varying signals and uses a time-frequency processing technique to detect recurring patterns in subsequences. This technique is matched to the aforementioned mapping scheme, and it allows for an inherent analysis of how substitutions in the subsequences can affect antibody binding strength. The performance of the proposed method is demonstrated by estimating epitopes and identifying potential mimotopes for eight monoclonal antibody samples.

The proposed mapping is generalized to map information on a protein's sequence location, structure and function onto a highly localized three-dimensional (3-D) Gaussian waveform. In particular, as analysis of protein homology has shown that incorporating different kinds of information into an alignment process can yield more robust alignment results, a pairwise protein structure alignment method is proposed based on a joint similarity measure of multiple mapped protein attributes. The 3-D mapping allocates protein properties into distinct regions in the time-frequency plane in order to simplify the alignment process by including all relevant information in a single, highly customizable waveform. Simulations demonstrate the improved performance of the joint alignment approach in inferring relationships between proteins, and they provide information on mutations that cause changes to both the sequence and structure of a protein.

In addition to the biology-based signal processing methods, a statistical method is considered that uses a physics-based model to improve processing performance. In particular, an externally developed physics-based model for sea clutter is examined when detecting a low radar cross-section target in heavy sea clutter. This novel model includes a process that generates random dynamic sea clutter based on the governing physics of water gravity and capillary waves and a finite-difference time-domain electromagnetics simulation process based on Maxwell's equations propagating the radar signal. A subspace clutter suppression detector is applied to remove dominant clutter eigenmodes, and its improved performance over matched filtering is demonstrated using simulations.

Contributors: O'Donnell, Brian (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Johnston, Stephen A. (Committee member) / Kovvali, Narayan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created: 2014

Characterization and analysis of a novel platform for profiling the antibody response

Description

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and determine how much information may be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could even be detected in the presence of a 100-fold excess of naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random-sequence peptides compared to protein-sequence-derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in-depth look at the peptide physicochemical properties and sequence complexity revealed that there were some correlations with properties, but they were generally small and varied greatly between antibodies. However, on a limited-diversity but larger peptide library, I found that sequence complexity was important for antibody binding. The redundancy in that library did enable the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods that had modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database.
In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions.

Contributors: Halperin, Rebecca (Author) / Johnston, Stephen A. (Thesis advisor) / Bordner, Andrew (Committee member) / Taylor, Thomas (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse etiology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of association-based gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy.
We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our network-based model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets is very positive.
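
The general idea of ranking candidate genes by their association with a set of known disease genes can be sketched as follows. This is a deliberately simplified illustration, not the method described in the abstract: it assumes a single weighted network and scores each candidate by the summed weights of its direct links to the known genes. The gene names, edge weights, and the function name rank_candidates are invented for the example.

```python
# Hypothetical weighted association network; gene names and weights are invented.
# Each key is an undirected edge, each value an assumed association strength.
EDGES = {
    ("G1", "G2"): 0.9,
    ("G1", "G3"): 0.2,
    ("G2", "G4"): 0.5,
    ("G3", "G4"): 0.4,
    ("G4", "G5"): 0.1,
}


def rank_candidates(edges, seeds):
    """Score each non-seed gene by the summed weights of its edges to seed genes."""
    scores = {}
    for (a, b), weight in edges.items():
        # Treat the edge as undirected: check it from both endpoints.
        for gene, neighbor in ((a, b), (b, a)):
            if neighbor in seeds and gene not in seeds:
                scores[gene] = scores.get(gene, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# G1 and G4 play the role of genes already known to be related to the disease.
ranking = rank_candidates(EDGES, seeds={"G1", "G4"})
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this toy setting, edge weights could stand in for the differing reliability and relevance of data sources, although how the thesis actually models those levels is not specified here.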

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Identification of neo-antigens for a cancer vaccine by transcriptome analysis

Description

We propose a novel solution to prevent cancer by developing a prophylactic cancer vaccine. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, arising from mistakes in either RNA processing or genomic DNA, may lead to the generation of neo-peptides that are foreign to the immune system. Viral peptides presumably would originate from exogenous but integrated viral nucleic acid sequences. Both are non-self and therefore lessen concerns about development of autoimmunity. I have developed a bioinformatic approach to identify these aberrant transcripts in the cancer transcriptome. Their suitability for use in a vaccine is evaluated by establishing their frequencies and predicting possible epitopes along with their population coverage according to the prevalence of major histocompatibility complex (MHC) types. Viral transcripts and transcripts with FS mutations from gene fusion, insertion/deletion at coding microsatellite DNA, and alternative splicing were identified in the NCBI Expressed Sequence Tag (EST) database. A total of 48 FS chimeric transcripts were validated in 50 breast cell lines and 68 primary breast tumor samples, with frequencies ranging from 4% to 98%, by RT-PCR and sequencing confirmation. These 48 FS peptides, if translated and presented, could be used to protect more than 90% of the population in Northern America based on the prediction of epitopes derived from them. Furthermore, we synthesized 150 peptides that correspond to FS and viral peptides that we predicted would exist in tumor patients, and we tested sera from over 200 different cancer patients. We found a number of serologically reactive peptide sequences in cancer patients that had little to no reactivity in healthy controls, which strongly supports our bioinformatic approach.
This study describes a process used to identify aberrant transcripts that lead to a new source of antigens that can be tested and used in a prophylactic cancer vaccine. The vast amount of transcriptome data of various cancers from the Cancer Genome Atlas (TCGA) project will enhance our ability to further select better cancer antigen candidates.

Contributors: Lee, HoJoon (Author) / Johnston, Stephen A. (Thesis advisor) / Kumar, Sudhir (Committee member) / Miller, Laurence (Committee member) / Stafford, Phillip (Committee member) / Sykes, Kathryn (Committee member) / Arizona State University (Publisher)

Created: 2012

A timeline extraction approach to derive drug usage patterns in pregnant women using social media

Description

Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance has concentrated on Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as containing an ADR or not. Although this method works efficiently for ADR classification, if ADR evidence is present in users' posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an opportunity to perform an in-depth analysis of lifestyle habits and possible reasons for any medical problems.

Pre-market clinical trials for drugs generally do not include pregnant women, and so their effects on pregnancy outcomes are not discovered early. This thesis presents a thorough, alternative strategy for assessing the safety profiles of drugs during pregnancy by utilizing user timelines from social media. I explore the use of a variety of state-of-the-art social media mining techniques, including rule-based and machine learning techniques, to identify pregnant women, monitor their drug usage patterns, categorize their birth outcomes, and attempt to discover associations between drugs and bad birth outcomes.

The technique used models user timelines as longitudinal patient networks, which provide us with a variety of key information about pregnancy, drug usage, and post-birth reactions. I evaluate the distinct parts of the pipeline separately, validating the usefulness of each step. The approach of using user timelines in this fashion has produced very encouraging results, and can be employed for a range of other important tasks where users/patients are required to be followed over time to derive population-based measures.

Contributors: Chandrashekar, Pramod Bharadwaj (Author) / Davulcu, Hasan (Thesis advisor) / Gonzalez, Graciela (Thesis advisor) / Hsiao, Sharon (Committee member) / Arizona State University (Publisher)

Created: 2016

Comparative genomics and novel bioinformatics methodology applied to the green anole reveal unique sex chromosome evolution

Description

In species with highly heteromorphic sex chromosomes, the degradation of one of the sex chromosomes can result in unequal gene expression between the sexes (e.g., between XX females and XY males) and between the sex chromosomes and the autosomes. Dosage compensation is a process whereby genes on the sex chromosomes achieve equal gene expression, which prevents deleterious side effects from having too much or too little expression of genes on sex chromosomes. The green anole is part of a group of species that recently underwent an adaptive radiation. The green anole has XX/XY sex determination, but the content of the X chromosome and its evolution have not been described. Given its status as a model species, better understanding the green anole genome could reveal insights into other species. Genomic analyses are crucial for a comprehensive picture of sex chromosome differentiation and dosage compensation, in addition to understanding speciation.

In order to address this, multiple comparative genomics and bioinformatics analyses were conducted to elucidate patterns of evolution in the green anole and across multiple anole species. Comparative genomics analyses were used to infer additional X-linked loci in the green anole, RNAseq data from male and female samples were analyzed to quantify patterns of sex-biased gene expression across the genome, and the extent of dosage compensation on the anole X chromosome was characterized, providing evidence that the sex chromosomes in the green anole are dosage compensated.

In addition, X-linked genes have a lower ratio of nonsynonymous to synonymous substitution rates than the autosomes when compared to other Anolis species, and pairwise rates of evolution in genes across the anole genome were analyzed. To conduct this analysis, a new pipeline was created for filtering alignments and performing batch calculations for whole-genome coding sequences. This pipeline has been made publicly available.

Contributors: Rupp, Shawn Michael (Author) / Wilson Sayres, Melissa A (Thesis advisor) / Kusumi, Kenro (Committee member) / DeNardo, Dale (Committee member) / Arizona State University (Publisher)

Created: 2016

Health information extraction from social media

Description

Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where the mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, and particularly deep learning, to address the challenges associated with extraction of health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. However, thus far, deep learning techniques are relatively unexplored for biomedical text mining and, in particular, this is the first attempt at applying deep learning to health information extraction from social media.

This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger for extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between the tokens. ADRMine significantly improved the medical NER performance compared to the baseline systems.

This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including the consumer expressions that were not observed in the training data or in the standard medical lexicons, outperforming the state-of-the-art baseline techniques.

Contributors: Nikfarjam, Azadeh (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)

Created: 2016

Exploring peptide space for enzyme modulators

Description

Enzymes, which regulate the metabolic reactions that sustain all living things, are the engines of life. The discovery of molecules that are able to control enzyme activity is of great interest for therapeutics and the biocatalysis industry. Peptides are promising enzyme modulators due to their large chemical diversity and the existence of well-established methods for library synthesis. Microarrays represent a powerful tool for screening thousands of molecules, on a small chip, for candidates that interact with enzymes and modulate their functions. In this work, a method is presented for screening high-density arrays to discover peptides that bind and modulate enzyme activity. A viscous polyvinyl alcohol (PVA) solution was applied to array surfaces to limit the diffusion of product molecules released from enzymatic reactions, allowing the simultaneous measurement of enzyme activity and binding at each peptide feature. As a proof of concept, by comparing the peptide-enzyme binding levels and bound enzyme activity on microarrays, it was possible to identify peptides that bound to horseradish peroxidase (HRP), alkaline phosphatase (APase) and β-galactosidase (β-Gal) and substantially altered their activities. Several peptides, selected from microarrays, were able to inhibit β-Gal in solution, which demonstrates that behaviors selected from surfaces often transfer to solution. A mechanistic study of inhibition revealed that some of the selected peptides inhibited enzyme activity by binding to enzymes and inducing aggregation. PVA-coated peptide slides can be rapidly analyzed, given an appropriate enzyme assay, and they may also be assayed under various conditions (such as temperature, pH and solvent). I have developed a general method to discover molecules that modulate enzyme activity at desired conditions.
As a demonstration, peptides selected by performing the microarray-based enzyme assay at high temperature were able to promote the thermal stability of bound enzyme. For broad applications, selected peptide ligands were used to immobilize enzymes on solid surfaces. Compared to conventional methods, enzymes immobilized on peptide-modified surfaces exhibited higher specific activities and stabilities. Peptide-modified surfaces may prove useful for immobilizing enzymes on surfaces with optimized orientation, location and performance, which are of great interest to the biocatalysis industry.

Contributors: Fu, Jinglin (Author) / Woodbury, Neal W (Thesis advisor) / Johnston, Stephen A. (Committee member) / Ghirlanda, Giovanna (Committee member) / Arizona State University (Publisher)

Created: 2010
