Targeted proteomics studies: design, development and translation of mass spectrometric immunoassays for diabetes and kidney disease

Description

In an effort to begin validating the large number of discovered candidate biomarkers, proteomics is beginning to shift from shotgun proteomic experiments towards targeted proteomic approaches that provide solutions to automation and economic concerns. Such approaches to validating biomarkers necessitate the mass spectrometric analysis of hundreds to thousands of human samples. As this takes place, a serendipitous opportunity has become evident. As one narrows the focus to "single" protein targets (instead of entire proteomes) using pan-antibody-based enrichment techniques, a discovery science of its own has emerged. This is due to the largely unknown context in which "single" proteins exist in blood (i.e., polymorphisms, transcript variants, and posttranslational modifications); hence, targeted proteomics also has applications for established biomarkers. Furthermore, besides protein heterogeneity accounting for interferences with conventional immunometric platforms, it is becoming evident that this formerly hidden dimension of structural information also contains rich pathobiological information. Consequently, targeted proteomics studies that aim to ascertain a protein's genuine presentation within disease-stratified populations and serve as a stepping-stone within a biomarker translational pipeline are of clinical interest. Roughly 128 million Americans are pre-diabetic, diabetic, and/or have kidney disease, and public and private spending for treating these diseases is in the hundreds of billions of dollars. In an effort to create new solutions for the early detection and management of these conditions, described herein is the design, development, and translation of mass spectrometric immunoassays targeted towards diabetes and kidney disease. Population proteomics experiments were performed for the following clinically relevant proteins: insulin, C-peptide, RANTES, and parathyroid hormone. At least thirty-eight protein isoforms were detected. Besides the numerous disease correlations encountered within the disease-stratified cohorts, certain isoforms also appeared to be causally related to the underlying pathophysiology and/or have therapeutic implications. Technical advancements include multiplexed isoform quantification as well as a "dual-extraction" methodology for eliminating non-specific proteins while simultaneously validating isoforms. Industrial efforts towards widespread clinical adoption are also described. Consequently, this work lays a foundation for the translation of mass spectrometric immunoassays into the clinical arena and simultaneously presents the most recent advancements concerning the mass spectrometric immunoassay approach.

Contributors: Oran, Paul (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Ros, Alexandra (Committee member) / Williams, Peter (Committee member) / Arizona State University (Publisher)

Created: 2011

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description

Signal processing techniques have been used extensively in many engineering problems, and in recent years their application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA), which has been one of the primary techniques for biomarker discovery. MSIA analyzes protein molecules as potential biomarkers using time-of-flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS-related applications. Though many peak detection algorithms exist, most of them are based on heuristic models. One way of detecting signal peaks is to deploy stochastic models of the signal and noise observations. The likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is a uniformly most powerful test for decision making posed as a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model by employing first-principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments in the ESI mass spectrometer. A non-flat baseline in MS data is common. The reasons behind the formation of this baseline have not been fully understood. A new signal model explaining the presence of the baseline is proposed, though detailed experiments are needed to further substantiate the model assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating the signal peak amplitudes. The performance of the detection methods and ML estimation is evaluated with Monte Carlo simulation, which shows promising results. An application of these methods is proposed for fractional abundance calculation for biomarker analysis, which is mathematically robust and fundamentally different from the current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular diseases and is shown to perform better than a linear discriminant analysis based classifier.
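The likelihood ratio test and ML amplitude estimate named in this abstract can be illustrated with a minimal sketch. It assumes a known peak template and independent zero-mean Gaussian noise of known standard deviation; these are simplifying assumptions for illustration, not the ESI-TOF-MS signal and noise models developed in the dissertation. The function names and the Gaussian template are hypothetical.

```python
import math

def gaussian_template(n, center, width):
    """Hypothetical peak shape: a unit-height Gaussian sampled at n points."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

def log_likelihood_ratio(x, s, sigma):
    """Log-likelihood ratio for H1: x = s + noise versus H0: x = noise,
    assuming i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    cross = sum(xi * si for xi, si in zip(x, s))
    energy = sum(si * si for si in s)
    return (cross - 0.5 * energy) / sigma ** 2

def detect(x, s, sigma, threshold=0.0):
    """Declare a peak when the statistic exceeds the threshold.
    In a Neyman-Pearson design the threshold would be set from a target
    false-alarm rate; 0.0 here is only a placeholder."""
    return log_likelihood_ratio(x, s, sigma) > threshold

def ml_amplitude(x, s):
    """ML estimate of A in x = A*s + noise under the same Gaussian assumption
    (the least-squares projection of x onto the template)."""
    return sum(xi * si for xi, si in zip(x, s)) / sum(si * si for si in s)

template = gaussian_template(11, center=5, width=1.5)
observed = [2.0 * v for v in template]      # noise-free peak of amplitude 2
print(detect(observed, template, sigma=1.0))
print(ml_amplitude(observed, template))
```

Under these assumptions the detector reduces to correlating the data with the template (a matched filter), which is why the statistic depends on the data only through the cross term.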

Contributors: Buddi, Sai (Author) / Taylor, Thomas (Thesis advisor) / Cochran, Douglas (Thesis advisor) / Nelson, Randall (Committee member) / Duman, Tolga (Committee member) / Arizona State University (Publisher)

Created: 2012

Structural characterization of potential cancer biomarker proteins

Description

Cancer claims hundreds of thousands of lives every year in the US alone. Finding ways to detect cancer onset early is crucial for better management and treatment of cancer. Thus, biomarkers, especially protein biomarkers, which as functional units reflect dynamic physiological changes, need to be discovered. Though important, there are only a few approved protein cancer biomarkers to date. To accelerate this process, fast, comprehensive, and affordable assays are required that can be applied to large population studies. To do so, these assays should be able to comprehensively characterize and explore the molecular diversity of nominally "single" proteins across populations. This information is usually unavailable with commonly used immunoassays such as ELISA (enzyme-linked immunosorbent assay), which either ignore protein microheterogeneity or are confounded by it. To this end, mass spectrometric immunoassays (MSIA) for three different human plasma proteins have been developed. These proteins, namely IGF-1, hemopexin, and tetranectin, have been reported in the literature to correlate with many diseases, including several carcinomas. The developed assays were used to extract entire proteins from plasma samples, which were subsequently analyzed on mass spectrometric platforms. Matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) mass spectrometric techniques were used due to their availability and suitability for the analysis. This revealed different structural forms of these proteins, showing a structural microheterogeneity that is invisible to commonly used immunoassays. These assays are fast and comprehensive, and can be applied in large sample studies to analyze proteins for biomarker discovery.

Contributors: Rai, Samita (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Borges, Chad (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2012

Identification of Tumor Associated Antigens using Nucleic Acid Programmable Protein Arrays

Description

Identifying disease biomarkers may aid in the early detection of breast cancer and improve patient outcomes. Recent evidence suggests that tumors are immunogenic and therefore patients may launch an autoantibody response to tumor-associated antigens. Single-chain variable fragments of autoantibodies derived from regional lymph node B cells of breast cancer patients were used to discover these tumor-associated biomarkers on protein microarrays. Six candidate biomarkers were discovered from 22 heavy chain-only variable region antibody fragments screened. Validation tests are necessary to confirm the tumorigenicity of these antigens. However, the use of single-chain variable autoantibody fragments presents a novel platform for diagnostics and cancer therapeutics.

Contributors: Sharman, M. Camila (Author) / Magee, Dewey (Mitch) (Thesis director) / Wallstrom, Garrick (Committee member) / Petritis, Brianne (Committee member) / Barrett, The Honors College (Contributor) / College of Liberal Arts and Sciences (Contributor) / Virginia G. Piper Center for Personalized Diagnostics (Contributor) / Biodesign Institute (Contributor)

Created: 2012-12

Gut Bacteria in Children With Autism Spectrum Disorders: Challenges and Promise of Studying How a Complex Community Influences a Complex Disease

Description

Recent studies suggest a role for the microbiota in autism spectrum disorders (ASD), potentially arising from their role in modulating the immune system and gastrointestinal (GI) function or from gut–brain interactions dependent on or independent of the immune system. GI problems such as chronic constipation and/or diarrhea are common in children with ASD, and significantly worsen their behavior and their quality of life. Here we first summarize previously published data supporting that GI dysfunction is common in individuals with ASD and the role of the microbiota in ASD. Second, by comparing with other publicly available microbiome datasets, we provide some evidence that the shifted microbiota can be a result of westernization and that this shift could also be framing an altered immune system. Third, we explore the possibility that gut–brain interactions could also be a direct result of microbially produced metabolites.

Contributors: Krajmalnik-Brown, Rosa (Author) / Lozupone, Catherine (Author) / Kang, Dae Wook (Author) / Adams, James (Author) / Biodesign Institute (Contributor)

Created: 2015-03-12

Image-level and group-level models for Drosophila gene expression pattern annotation

Description

Background
Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will seriously impede the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images, in comparison with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling, and Sqrt (square root of mean squared statistics) pooling, are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
Conclusion
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and the image-level scheme leads to consistent performance improvement in keyword annotation.
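The three pooling functions named in this abstract can be sketched as follows. This is a minimal illustration, assuming each image is represented by a list of per-patch sparse code vectors of equal length; the function name `pool` and the toy codes are hypothetical and not taken from the paper.

```python
import math

def pool(codes, method):
    """Reduce per-patch sparse codes (one row per patch) to a single
    feature vector with one entry per dictionary atom."""
    columns = list(zip(*codes))  # regroup values by dictionary atom
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "sqrt":
        # square root of the mean of squared coefficients
        return [math.sqrt(sum(v * v for v in col) / len(col)) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Two patches, two dictionary atoms (toy values for illustration only).
codes = [[0.0, 3.0],
         [4.0, 0.0]]
for name in ("max", "average", "sqrt"):
    print(name, pool(codes, name))
```

Whatever the number of patches, each pooling function returns a vector whose length equals the dictionary size, which is what makes the pooled features comparable across images.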

Contributors: Sun, Qian (Author) / Muckatira, Sherin (Author) / Yuan, Lei (Author) / Ji, Shuiwang (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2013-12-03

GRASP [Genomic Resource Access for Stoichioproteomics]: comparative explorations of the atomic content of 12 Drosophila proteomes

Description

Background
“Stoichioproteomics” relates the elemental composition of proteins and proteomes to variation in the physiological and ecological environment. To help harness and explore the wealth of hypotheses made possible under this framework, we introduce GRASP (http://www.graspdb.net), a public bioinformatic knowledgebase containing information on the frequencies of 20 amino acids and atomic composition of their side chains. GRASP integrates comparative protein composition data with annotation data from multiple public databases. Currently, GRASP includes information on proteins of 12 sequenced Drosophila (fruit fly) proteomes, which will be expanded to include increasingly diverse organisms over time. In this paper we illustrate the potential of GRASP for testing stoichioproteomic hypotheses by conducting an exploratory investigation into the composition of 12 Drosophila proteomes, testing the prediction that protein atomic content is associated with species ecology and with protein expression levels.
Results
Elements varied predictably along multivariate axes. Species were broadly similar, with the D. willistoni proteome a clear outlier. As expected, individual protein atomic content within proteomes was influenced by protein function and amino acid biochemistry. Evolution in elemental composition across the phylogeny followed less predictable patterns, but was associated with broad ecological variation in diet. Using expression data available for D. melanogaster, we found evidence consistent with selection for efficient usage of elements within the proteome: as expected, nitrogen content was reduced in highly expressed proteins in most tissues, most strongly in the gut, where nutrients are assimilated, and least strongly in the germline.
Conclusions
The patterns identified here using GRASP provide a foundation on which to base future research into the evolution of atomic composition in Drosophila and other taxa.
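The kind of per-protein quantity discussed in this abstract, amino acid frequencies and side-chain nitrogen content, can be sketched in a few lines. This is a minimal illustration and not code from GRASP; the function names are hypothetical, and the table lists only the standard amino acids whose side chains contain nitrogen (all others count as zero).

```python
from collections import Counter

# Nitrogen atoms in the side chain of each standard amino acid
# (one-letter codes); residues not listed have none.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

def amino_acid_frequencies(seq):
    """Fraction of the sequence made up by each residue type present."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

def nitrogen_per_residue(seq):
    """Mean number of side-chain nitrogen atoms per residue."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in seq) / len(seq)

example = "MKRHGA"  # toy peptide, for illustration only
print(amino_acid_frequencies(example))
print(nitrogen_per_residue(example))
```

Comparing `nitrogen_per_residue` between highly and weakly expressed proteins is one simple way to frame the nitrogen-usage comparison the abstract describes.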

Contributors: Gilbert, James D. J. (Author) / Acquisti, Claudia (Author) / Martinson, Holly M. (Author) / Elser, James (Author) / Kumar, Sudhir (Author) / Fagan, William F. (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2013-09-04

A composite genome approach to identify phylogenetically informative data from next-generation sequencing

Description

Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.

Contributors: Schwartz, Rachel (Author) / Harkins, Kelly (Author) / Stone, Anne (Author) / Cartwright, Reed (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / School of Life Sciences (Contributor)

Created: 2015-06-11

A Bag-of-Words Approach for Drosophila Gene Expression Pattern Annotation

Description

Background:
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.

Results:
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.

Conclusion:
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.

Contributors: Ji, Shuiwang (Author) / Li, Ying-Xin (Author) / Zhou, Zhi-Hua (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Ira A. Fulton Schools of Engineering (Contributor) / School of Electrical, Computer and Energy Engineering (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2009-04-21

Evolutionary Diagnosis of Non-Synonymous Variants Involved in Differential Drug Response

Description

Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response.

Results:
We found that the available data on the link between drug response and nsSNVs are rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulty correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was no better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly.

Conclusions:
The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
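The balanced accuracy quoted in the Results follows from the reported rates by the standard definition, the mean of the true positive rate and the true negative rate. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the inputs are the rounded percentages from the abstract, so the result matches the reported figure only to within rounding.

```python
def balanced_accuracy(tpr, tnr):
    """Balanced accuracy: the average of sensitivity (TPR) and specificity (TNR)."""
    return (tpr + tnr) / 2

# Rounded rates as reported in the abstract.
value = balanced_accuracy(0.84, 0.53)
print(f"balanced accuracy is about {value:.3f}")
```

With these inputs the value is about 0.685, which is consistent with the reported 69%. Unlike overall accuracy, this measure weights the DR-altering and DR-neutral classes equally even though the dataset contains 31 of one and 43 of the other.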

Contributors: Gerek, Nevin Z. (Author) / Liu, Li (Author) / Gerold, Kristyn (Author) / Biparva, Pegah (Author) / Thomas, Eric D. (Author) / Kumar, Sudhir (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor)

Created: 2015-01-15