Search Content

Antibody based strategies for multiplexed diagnostics

Description

Peptide microarrays are to proteomics as sequencing is to genomics. As microarrays become more content-rich, higher resolution proteomic studies will parallel deep sequencing of nucleic acids. Antigen-antibody interactions can be studied at a much higher resolution using microarrays than was possible only a decade ago. My dissertation focuses on testing…

Peptide microarrays are to proteomics as sequencing is to genomics. As microarrays become more content-rich, higher resolution proteomic studies will parallel deep sequencing of nucleic acids. Antigen-antibody interactions can be studied at a much higher resolution using microarrays than was possible only a decade ago. My dissertation focuses on testing the feasibility of using either the Immunosignature platform, based on non-natural peptide sequences, or a pathogen peptide microarray, which uses bioinformatically-selected peptides from pathogens for creating sensitive diagnostics. Both diagnostic applications use relatively little serum from infected individuals, but each approaches diagnosis of disease differently. The first project compares pathogen epitope peptide (life-space) and non-natural (random-space) peptide microarrays while using them for the early detection of Coccidioidomycosis (Valley Fever). The second project uses NIAID category A, B and C priority pathogen epitope peptides in a multiplexed microarray platform to assess the feasibility of using epitope peptides to simultaneously diagnose multiple exposures using a single assay. Cross-reactivity is a consistent feature of several antigen-antibody based immunodiagnostics. This work utilizes microarray optimization and bioinformatic approaches to distill the underlying disease specific antibody signature pattern. Circumventing inherent cross-reactivity observed in antibody binding to peptides was crucial to achieve the goal of this work to accurately distinguishing multiple exposures simultaneously.

ContributorsNavalkar, Krupa Arun (Author) / Johnston, Stephen A. (Thesis advisor) / Stafford, Phillip (Thesis advisor) / Sykes, Kathryn (Committee member) / Jacobs, Bertram (Committee member) / Arizona State University (Publisher)

Created2014

Dense non-natural sequence peptide microarrays for epitope mapping and diagnostics

Description

The healthcare system in this country is currently unacceptable. New technologies may contribute to reducing cost and improving outcomes. Early diagnosis and treatment represents the least risky option for addressing this issue. Such a technology needs to be inexpensive, highly sensitive, highly specific, and amenable to adoption in a clinic.…

The healthcare system in this country is currently unacceptable. New technologies may contribute to reducing cost and improving outcomes. Early diagnosis and treatment represents the least risky option for addressing this issue. Such a technology needs to be inexpensive, highly sensitive, highly specific, and amenable to adoption in a clinic. This thesis explores an immunodiagnostic technology based on highly scalable, non-natural sequence peptide microarrays designed to profile the humoral immune response and address the healthcare problem. The primary aim of this thesis is to explore the ability of these arrays to map continuous (linear) epitopes. I discovered that using a technique termed subsequence analysis where epitopes could be decisively mapped to an eliciting protein with high success rate. This led to the discovery of novel linear epitopes from Plasmodium falciparum (Malaria) and Treponema palladium (Syphilis), as well as validation of previously discovered epitopes in Dengue and monoclonal antibodies. Next, I developed and tested a classification scheme based on Support Vector Machines for development of a Dengue Fever diagnostic, achieving higher sensitivity and specificity than current FDA approved techniques. The software underlying this method is available for download under the BSD license. Following this, I developed a kinetic model for immunosignatures and tested it against existing data driven by previously unexplained phenomena. This model provides a framework and informs ways to optimize the platform for maximum stability and efficiency. I also explored the role of sequence composition in explaining an immunosignature binding profile, determining a strong role for charged residues that seems to have some predictive ability for disease. Finally, I developed a database, software and indexing strategy based on Apache Lucene for searching motif patterns (regular expressions) in large biological databases. These projects as a whole have advanced knowledge of how to approach high throughput immunodiagnostics and provide an example of how technology can be fused with biology in order to affect scientific and health outcomes.

ContributorsRicher, Joshua Amos (Author) / Johnston, Stephen A. (Thesis advisor) / Woodbury, Neal (Committee member) / Stafford, Phillip (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2014

Biology-based matched signal processing and physics-based modeling for improved detection

Description

Peptide microarrays have been used in molecular biology to profile immune responses and develop diagnostic tools. When the microarrays are printed with random peptide sequences, they can be used to identify antigen antibody binding patterns or immunosignatures. In this thesis, an advanced signal processing method is proposed to estimate…

Peptide microarrays have been used in molecular biology to profile immune responses and develop diagnostic tools. When the microarrays are printed with random peptide sequences, they can be used to identify antigen antibody binding patterns or immunosignatures. In this thesis, an advanced signal processing method is proposed to estimate epitope antigen subsequences as well as identify mimotope antigen subsequences that mimic the structure of epitopes from random-sequence peptide microarrays. The method first maps peptide sequences to linear expansions of highly-localized one-dimensional (1-D) time-varying signals and uses a time-frequency processing technique to detect recurring patterns in subsequences. This technique is matched to the aforementioned mapping scheme, and it allows for an inherent analysis on how substitutions in the subsequences can affect antibody binding strength. The performance of the proposed method is demonstrated by estimating epitopes and identifying potential mimotopes for eight monoclonal antibody samples.

The proposed mapping is generalized to express information on a protein's sequence location, structure and function onto a highly localized three-dimensional (3-D) Gaussian waveform. In particular, as analysis of protein homology has shown that incorporating different kinds of information into an alignment process can yield more robust alignment results, a pairwise protein structure alignment method is proposed based on a joint similarity measure of multiple mapped protein attributes. The 3-D mapping allocates protein properties into distinct regions in the time-frequency plane in order to simplify the alignment process by including all relevant information into a single, highly customizable waveform. Simulations demonstrate the improved performance of the joint alignment approach to infer relationships between proteins, and they provide information on mutations that cause changes to both the sequence and structure of a protein.

In addition to the biology-based signal processing methods, a statistical method is considered that uses a physics-based model to improve processing performance. In particular, an externally developed physics-based model for sea clutter is examined when detecting a low radar cross-section target in heavy sea clutter. This novel model includes a process that generates random dynamic sea clutter based on the governing physics of water gravity and capillary waves and a finite-difference time-domain electromagnetics simulation process based on Maxwell's equations propagating the radar signal. A subspace clutter suppression detector is applied to remove dominant clutter eigenmodes, and its improved performance over matched filtering is demonstrated using simulations.

ContributorsO'Donnell, Brian (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Johnston, Stephen A. (Committee member) / Kovvali, Narayan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created2014

Characterization and analysis of a novel platform for profiling the antibody response

Description

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze…

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and determine how much information may be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could even be detected in the presences of 100 fold excess naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random sequence peptides compared to protein sequence derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in depth look at how the peptide physicochemical properties and sequence complexity revealed that there were some correlations with properties, but they were generally small and varied greatly between antibodies. However, on a limited diversity but larger peptide library, I found that sequence complexity was important for antibody binding. The redundancy on that library did enable the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods that had modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database. In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions.

ContributorsHalperin, Rebecca (Author) / Johnston, Stephen A. (Thesis advisor) / Bordner, Andrew (Committee member) / Taylor, Thomas (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)

Created2011

Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards…

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at the Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves. These curves were characterized with good empirical fit using nonlinear models with simple structures which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach which showcased promising results towards less complex approximations. Also, the benefits of external information were explored to improve the image representation.

ContributorsTorres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)

Created2011

Identification of neo-antigens for a cancer vaccine by transcriptome analysis

Description

We propose a novel solution to prevent cancer by developing a prophylactic cancer. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, from either mistake in RNA…

We propose a novel solution to prevent cancer by developing a prophylactic cancer. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, from either mistake in RNA processing or in genomic DNA, may lead to generation of neo-peptides that are foreign to the immune system. Viral peptides presumably would originate from exogenous but integrated viral nucleic acid sequences. Both are non-self, therefore lessen concerns about development of autoimmunity. I have developed a bioinformatical approach to identify these aberrant transcripts in the cancer transcriptome. Their suitability for use in a vaccine is evaluated by establishing their frequencies and predicting possible epitopes along with their population coverage according to the prevalence of major histocompatibility complex (MHC) types. Viral transcripts and transcripts with FS mutations from gene fusion, insertion/deletion at coding microsatellite DNA, and alternative splicing were identified in NCBI Expressed Sequence Tag (EST) database. 48 FS chimeric transcripts were validated in 50 breast cell lines and 68 primary breast tumor samples with their frequencies from 4% to 98% by RT-PCR and sequencing confirmation. These 48 FS peptides, if translated and presented, could be used to protect more than 90% of the population in Northern America based on the prediction of epitopes derived from them. Furthermore, we synthesized 150 peptides that correspond to FS and viral peptides that we predicted would exist in tumor patients and we tested over 200 different cancer patient sera. We found a number of serological reactive peptide sequences in cancer patients that had little to no reactivity in healthy controls; strong support for the strength of our bioinformatic approach. This study describes a process used to identify aberrant transcripts that lead to a new source of antigens that can be tested and used in a prophylactic cancer vaccine. The vast amount of transcriptome data of various cancers from the Cancer Genome Atlas (TCGA) project will enhance our ability to further select better cancer antigen candidates.

ContributorsLee, HoJoon (Author) / Johnston, Stephen A. (Thesis advisor) / Kumar, Sudhir (Committee member) / Miller, Laurence (Committee member) / Stafford, Phillip (Committee member) / Sykes, Kathryn (Committee member) / Arizona State University (Publisher)

Created2012

Novel statistical models for complex data structures

Description

Rapid advance in sensor and information technology has resulted in both spatially and temporally data-rich environment, which creates a pressing need for us to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges for addressing these…

Rapid advance in sensor and information technology has resulted in both spatially and temporally data-rich environment, which creates a pressing need for us to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges for addressing these massive datasets lay in their complex structures, such as high-dimensionality, hierarchy, multi-modality, heterogeneity and data uncertainty. Besides the statistical challenges, the associated computational approaches are also considered essential in achieving efficiency, effectiveness, as well as the numerical stability in practice. On the other hand, some recent developments in statistics and machine learning, such as sparse learning, transfer learning, and some traditional methodologies which still hold potential, such as multi-level models, all shed lights on addressing these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four kinds of general complex datasets, including "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties", which are ubiquitous in many domains, such as biology, medicine, neuroscience, health care delivery, manufacturing, etc. We depict the development of novel statistical models to analyze complex datasets which fall under these four categories, and we show how these models can be applied to some real-world applications, such as Alzheimer's disease research, nursing care process, and manufacturing.

ContributorsHuang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)

Created2012

Machine Learning Methods for Diagnosis, Prognosis and Prediction of Long-term Treatment Outcome of Major Depression

Description

Major Depression, clinically called Major Depressive Disorder, is a mood disorder that affects about one eighth of population in US and is projected to be the second leading cause of disability in the world by the year 2020. Recent advances in biotechnology have enabled us to…

Major Depression, clinically called Major Depressive Disorder, is a mood disorder that affects about one eighth of population in US and is projected to be the second leading cause of disability in the world by the year 2020. Recent advances in biotechnology have enabled us to collect a great variety of data which could potentially offer us a deeper understanding of the disorder as well as advancing personalized medicine.

This dissertation focuses on developing methods for three different aspects of predictive analytics related to the disorder: automatic diagnosis, prognosis, and prediction of long-term treatment outcome. The data used for each task have their specific characteristics and demonstrate unique problems. Automatic diagnosis of melancholic depression is made on the basis of metabolic profiles and micro-array gene expression profiles where the presence of missing values and strong empirical correlation between the variables is not unusual. To deal with these problems, a method of generating a representative set of features is proposed. Prognosis is made on data collected from rating scales and questionnaires which consist mainly of categorical and ordinal variables and thus favor decision tree based predictive models. Decision tree models are known for the notorious problem of overfitting. A decision tree pruning method that overcomes the shortcomings of a greedy nature and reliance on heuristics inherent in traditional decision tree pruning approaches is proposed. The method is further extended to prune Gradient Boosting Decision Tree and tested on the task of prognosis of treatment outcome. Follow-up studies evaluating the long-term effect of the treatments on patients usually measure patients' depressive symptom severity monthly, resulting in the actual time of relapse upper bounded by the observed time of relapse. To resolve such uncertainty in response, a general loss function where the hypothesis could take different forms is proposed to predict the risk of relapse in situations where only an interval for time of relapse can be derived from the observed data.

ContributorsNie, Zhi (Author) / Ye, Jieping (Thesis advisor) / He, Jingrui (Thesis advisor) / Li, Baoxin (Committee member) / Xue, Guoliang (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created2017

Exploring peptide space for enzyme modulators

Description

Enzymes which regulate the metabolic reactions for sustaining all living things, are the engines of life. The discovery of molecules that are able to control enzyme activity is of great interest for therapeutics and the biocatalysis industry. Peptides are promising enzyme modulators due to their large chemical diversity and the…

Enzymes which regulate the metabolic reactions for sustaining all living things, are the engines of life. The discovery of molecules that are able to control enzyme activity is of great interest for therapeutics and the biocatalysis industry. Peptides are promising enzyme modulators due to their large chemical diversity and the existence of well-established methods for library synthesis. Microarrays represent a powerful tool for screening thousands of molecules, on a small chip, for candidates that interact with enzymes and modulate their functions. In this work, a method is presented for screening high-density arrays to discover peptides that bind and modulate enzyme activity. A viscous polyvinyl alcohol (PVA) solution was applied to array surfaces to limit the diffusion of product molecules released from enzymatic reactions, allowing the simultaneous measurement of enzyme activity and binding at each peptide feature. For proof of concept, it was possible to identify peptides that bound to horseradish peroxidase (HRP), alkaline phosphatase (APase) and â-galactosidase (â-Gal) and substantially alter their activities by comparing the peptide-enzyme binding levels and bound enzyme activity on microarrays. Several peptides, selected from microarrays, were able to inhibit â-Gal in solution, which demonstrates that behaviors selected from surfaces often transfer to solution. A mechanistic study of inhibition revealed that some of the selected peptides inhibited enzyme activity by binding to enzymes and inducing aggregation. PVA-coated peptide slides can be rapidly analyzed, given an appropriate enzyme assay, and they may also be assayed under various conditions (such as temperature, pH and solvent). I have developed a general method to discover molecules that modulate enzyme activity at desired conditions. As demonstrations, some peptides were able to promote the thermal stability of bound enzyme, which were selected by performing the microarray-based enzyme assay at high temperature. For broad applications, selected peptide ligands were used to immobilize enzymes on solid surfaces. Compared to conventional methods, enzymes immobilized on peptide-modified surfaces exhibited higher specific activities and stabilities. Peptide-modified surfaces may prove useful for immobilizing enzymes on surfaces with optimized orientation, location and performance, which are of great interest to the biocatalysis industry.

ContributorsFu, Jinglin (Author) / Woodbury, Neal W (Thesis advisor) / Johnston, Stephen A. (Committee member) / Ghirlanda, Giovanna (Committee member) / Arizona State University (Publisher)

Created2010

Novel Semi-Supervised Learning Models to Balance Data Inclusivity and Usability in Healthcare Applications

Description

Semi-supervised learning (SSL) is sub-field of statistical machine learning that is useful for problems that involve having only a few labeled instances with predictor (X) and target (Y) information, and abundance of unlabeled instances that only have predictor (X) information. SSL harnesses the target information available in the limited…

Semi-supervised learning (SSL) is sub-field of statistical machine learning that is useful for problems that involve having only a few labeled instances with predictor (X) and target (Y) information, and abundance of unlabeled instances that only have predictor (X) information. SSL harnesses the target information available in the limited labeled data, as well as the information in the abundant unlabeled data to build strong predictive models. However, not all the included information is useful. For example, some features may correspond to noise and including them will hurt the predictive model performance. Additionally, some instances may not be as relevant to model building and their inclusion will increase training time and potentially hurt the model performance. The objective of this research is to develop novel SSL models to balance data inclusivity and usability. My dissertation research focuses on applications of SSL in healthcare, driven by problems in brain cancer radiomics, migraine imaging, and Parkinson’s Disease telemonitoring.

The first topic introduces an integration of machine learning (ML) and a mechanistic model (PI) to develop an SSL model applied to predicting cell density of glioblastoma brain cancer using multi-parametric medical images. The proposed ML-PI hybrid model integrates imaging information from unbiopsied regions of the brain as well as underlying biological knowledge from the mechanistic model to predict spatial tumor density in the brain.

The second topic develops a multi-modality imaging-based diagnostic decision support system (MMI-DDS). MMI-DDS consists of modality-wise principal components analysis to incorporate imaging features at different aggregation levels (e.g., voxel-wise, connectivity-based, etc.), a constrained particle swarm optimization (cPSO) feature selection algorithm, and a clinical utility engine that utilizes inverse operators on chosen principal components for white-box classification models.

The final topic develops a new SSL regression model with integrated feature and instance selection called s2SSL (with “s2” referring to selection in two different ways: feature and instance). s2SSL integrates cPSO feature selection and graph-based instance selection to simultaneously choose the optimal features and instances and build accurate models for continuous prediction. s2SSL was applied to smartphone-based telemonitoring of Parkinson’s Disease patients.

ContributorsGaw, Nathan (Author) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Hu, Leland (Committee member) / Arizona State University (Publisher)

Created2019

Filtering by