Search Content

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay…

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA) which has been one of the primary methods of biomarker discovery techniques. MSIA analyzes protein molecules as potential biomarkers using time of flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS related application. Though many peak detection algorithms exist, most of them are based on heuristics models. One of the ways of detecting signal peaks is by deploying stochastic models of the signal and noise observations. Likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is an uniformly most powerful test to decision making in the form of a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for the electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model by employing first principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments in the ESI mass spectrometer. A non-flat baseline in MS data is common. The reasons behind the formation of this baseline has not been fully comprehended. A new signal model explaining the presence of baseline is proposed, though detailed experiments are needed to further substantiate the model assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating the signal peak amplitudes. The performance of the detection methods and ML estimation are evaluated with Monte Carlo simulation which shows promising results. An application of these methods is proposed for fractional abundance calculation for biomarker analysis, which is mathematically robust and fundamentally different than the current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular diseases and is shown to perform better than a linear discriminant analysis based classifier.

ContributorsBuddi, Sai (Author) / Taylor, Thomas (Thesis advisor) / Cochran, Douglas (Thesis advisor) / Nelson, Randall (Committee member) / Duman, Tolga (Committee member) / Arizona State University (Publisher)

Created2012

Targeted proteomics studies: design, development and translation of mass spectrometric immunoassays for diabetes and kidney disease

Description

In an effort to begin validating the large number of discovered candidate biomarkers, proteomics is beginning to shift from shotgun proteomic experiments towards targeted proteomic approaches that provide solutions to automation and economic concerns. Such approaches to validate biomarkers necessitate the mass spectrometric analysis of hundreds to thousands of human…

In an effort to begin validating the large number of discovered candidate biomarkers, proteomics is beginning to shift from shotgun proteomic experiments towards targeted proteomic approaches that provide solutions to automation and economic concerns. Such approaches to validate biomarkers necessitate the mass spectrometric analysis of hundreds to thousands of human samples. As this takes place, a serendipitous opportunity has become evident. By the virtue that as one narrows the focus towards "single" protein targets (instead of entire proteomes) using pan-antibody-based enrichment techniques, a discovery science has emerged, so to speak. This is due to the largely unknown context in which "single" proteins exist in blood (i.e. polymorphisms, transcript variants, and posttranslational modifications) and hence, targeted proteomics has applications for established biomarkers. Furthermore, besides protein heterogeneity accounting for interferences with conventional immunometric platforms, it is becoming evident that this formerly hidden dimension of structural information also contains rich-pathobiological information. Consequently, targeted proteomics studies that aim to ascertain a protein's genuine presentation within disease- stratified populations and serve as a stepping-stone within a biomarker translational pipeline are of clinical interest. Roughly 128 million Americans are pre-diabetic, diabetic, and/or have kidney disease and public and private spending for treating these diseases is in the hundreds of billions of dollars. In an effort to create new solutions for the early detection and management of these conditions, described herein is the design, development, and translation of mass spectrometric immunoassays targeted towards diabetes and kidney disease. Population proteomics experiments were performed for the following clinically relevant proteins: insulin, C-peptide, RANTES, and parathyroid hormone. At least thirty-eight protein isoforms were detected. Besides the numerous disease correlations confronted within the disease-stratified cohorts, certain isoforms also appeared to be causally related to the underlying pathophysiology and/or have therapeutic implications. Technical advancements include multiplexed isoform quantification as well a "dual- extraction" methodology for eliminating non-specific proteins while simultaneously validating isoforms. Industrial efforts towards widespread clinical adoption are also described. Consequently, this work lays a foundation for the translation of mass spectrometric immunoassays into the clinical arena and simultaneously presents the most recent advancements concerning the mass spectrometric immunoassay approach.

ContributorsOran, Paul (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Ros, Alexandra (Committee member) / Williams, Peter (Committee member) / Arizona State University (Publisher)

Created2011

Characterization and analysis of a novel platform for profiling the antibody response

Description

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze…

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and determine how much information may be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could even be detected in the presences of 100 fold excess naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random sequence peptides compared to protein sequence derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in depth look at how the peptide physicochemical properties and sequence complexity revealed that there were some correlations with properties, but they were generally small and varied greatly between antibodies. However, on a limited diversity but larger peptide library, I found that sequence complexity was important for antibody binding. The redundancy on that library did enable the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods that had modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database. In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions.

ContributorsHalperin, Rebecca (Author) / Johnston, Stephen A. (Thesis advisor) / Bordner, Andrew (Committee member) / Taylor, Thomas (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)

Created2011

Micro-particle streak velocimetry: theory, simulation methods and applications

Description

This dissertation describes a novel, low cost strategy of using particle streak (track) images for accurate micro-channel velocity field mapping. It is shown that 2-dimensional, 2-component fields can be efficiently obtained using the spatial variation of particle track lengths in micro-channels. The velocity field is a critical performance feature of…

This dissertation describes a novel, low cost strategy of using particle streak (track) images for accurate micro-channel velocity field mapping. It is shown that 2-dimensional, 2-component fields can be efficiently obtained using the spatial variation of particle track lengths in micro-channels. The velocity field is a critical performance feature of many microfluidic devices. Since it is often the case that un-modeled micro-scale physics frustrates principled design methodologies, particle based velocity field estimation is an essential design and validation tool. Current technologies that achieve this goal use particle constellation correlation strategies and rely heavily on costly, high-speed imaging hardware. The proposed image/ video processing based method achieves comparable accuracy for fraction of the cost. In the context of micro-channel velocimetry, the usability of particle streaks has been poorly studied so far. Their use has remained restricted mostly to bulk flow measurements and occasional ad-hoc uses in microfluidics. A second look at the usability of particle streak lengths in this work reveals that they can be efficiently used, after approximately 15 years from their first use for micro-channel velocimetry. Particle tracks in steady, smooth microfluidic flows is mathematically modeled and a framework for using experimentally observed particle track lengths for local velocity field estimation is introduced here, followed by algorithm implementation and quantitative verification. Further, experimental considerations and image processing techniques that can facilitate the proposed methods are also discussed in this dissertation. Unavailability of benchmarked particle track image data motivated the implementation of a simulation framework with the capability to generate exposure time controlled particle track image sequence for velocity vector fields. This dissertation also describes this work and shows that arbitrary velocity fields designed in computational fluid dynamics software tools can be used to obtain such images. Apart from aiding gold-standard data generation, such images would find use for quick microfluidic flow field visualization and help improve device designs.

ContributorsMahanti, Prasun (Author) / Cochran, Douglas (Thesis advisor) / Taylor, Thomas (Thesis advisor) / Hayes, Mark (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created2011

Bayesian networks and gaussian mixture models in multi-dimensional data analysis with application to religion-conflict data

Description

This thesis examines the application of statistical signal processing approaches to data arising from surveys intended to measure psychological and sociological phenomena underpinning human social dynamics. The use of signal processing methods for analysis of signals arising from measurement of social, biological, and other non-traditional phenomena has been an important…

This thesis examines the application of statistical signal processing approaches to data arising from surveys intended to measure psychological and sociological phenomena underpinning human social dynamics. The use of signal processing methods for analysis of signals arising from measurement of social, biological, and other non-traditional phenomena has been an important and growing area of signal processing research over the past decade. Here, we explore the application of statistical modeling and signal processing concepts to data obtained from the Global Group Relations Project, specifically to understand and quantify the effects and interactions of social psychological factors related to intergroup conflicts. We use Bayesian networks to specify prospective models of conditional dependence. Bayesian networks are determined between social psychological factors and conflict variables, and modeled by directed acyclic graphs, while the significant interactions are modeled as conditional probabilities. Since the data are sparse and multi-dimensional, we regress Gaussian mixture models (GMMs) against the data to estimate the conditional probabilities of interest. The parameters of GMMs are estimated using the expectation-maximization (EM) algorithm. However, the EM algorithm may suffer from over-fitting problem due to the high dimensionality and limited observations entailed in this data set. Therefore, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used for GMM order estimation. To assist intuitive understanding of the interactions of social variables and the intergroup conflicts, we introduce a color-based visualization scheme. In this scheme, the intensities of colors are proportional to the conditional probabilities observed.

ContributorsLiu, Hui (Author) / Taylor, Thomas (Thesis advisor) / Cochran, Douglas (Thesis advisor) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created2012

Structural characterization of potential cancer biomarker proteins

Description

Cancer claims hundreds of thousands of lives every year in US alone. Finding ways for early detection of cancer onset is crucial for better management and treatment of cancer. Thus, biomarkers especially protein biomarkers, being the functional units which reflect dynamic physiological changes, need to be discovered. Though important, there…

Cancer claims hundreds of thousands of lives every year in US alone. Finding ways for early detection of cancer onset is crucial for better management and treatment of cancer. Thus, biomarkers especially protein biomarkers, being the functional units which reflect dynamic physiological changes, need to be discovered. Though important, there are only a few approved protein cancer biomarkers till date. To accelerate this process, fast, comprehensive and affordable assays are required which can be applied to large population studies. For this, these assays should be able to comprehensively characterize and explore the molecular diversity of nominally "single" proteins across populations. This information is usually unavailable with commonly used immunoassays such as ELISA (enzyme linked immunosorbent assay) which either ignore protein microheterogeneity, or are confounded by it. To this end, mass spectrometric immuno assays (MSIA) for three different human plasma proteins have been developed. These proteins viz. IGF-1, hemopexin and tetranectin have been found in reported literature to show correlations with many diseases along with several carcinomas. Developed assays were used to extract entire proteins from plasma samples and subsequently analyzed on mass spectrometric platforms. Matrix assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) mass spectrometric techniques where used due to their availability and suitability for the analysis. This resulted in visibility of different structural forms of these proteins showing their structural micro-heterogeneity which is invisible to commonly used immunoassays. These assays are fast, comprehensive and can be applied in large sample studies to analyze proteins for biomarker discovery.

ContributorsRai, Samita (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Borges, Chad (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)

Created2012

Evaluating the utility of blood glycan levels as predictors of stage I adenocarcinoma using support vector machines.

Description

Aberrant glycosylation has been shown to be linked to specific cancers, and using this idea, it was proposed that the levels of glycans in the blood could predict stage I adenocarcinoma. To track this glycosylation, glycan were broken down into glycan nodes via methylation analysis. This analysis utilized information from…

Aberrant glycosylation has been shown to be linked to specific cancers, and using this idea, it was proposed that the levels of glycans in the blood could predict stage I adenocarcinoma. To track this glycosylation, glycan were broken down into glycan nodes via methylation analysis. This analysis utilized information from N-, O-, and lipid linked glycans detected from gas chromatography-mass spectrometry. The resulting glycan node-ratios represent the initial quantitative data that were used in this experiment.
For this experiment, two Sets of 50 µl blood plasma samples were provided by NYU Medical School. These samples were then analyzed by Dr. Borges’s lab so that they contained normalized biomarker levels from patients with stage 1 adenocarcinoma and control patients with matched age, smoking status, and gender were examined. An ROC curve was constructed under individual and paired conditions and AUC calculated in Wolfram Mathematica 10.2. Methods such as increasing size of training set, using hard vs. soft margins, and processing biomarkers together and individually were used in order to increase the AUC. Using a soft margin for this particular data set was proved to be most useful compared to the initial set hard margin, raising the AUC from 0.6013 to 0.6585. In regards to which biomarkers yielded the better value, 6-Glc/6-Man and 3,6-Gal glycan node ratios had the best with 0.7687 AUC and a sensitivity of .7684 and specificity of .6051. While this is not enough accuracy to become a primary diagnostic tool for diagnosing stage I adenocarcinoma, the methods examined in the paper should be evaluated further. . By comparison, the current clinical standard blood test for prostate cancer that has an AUC of only 0.67.

ContributorsDe Jesus, Celine Spicer (Author) / Taylor, Thomas (Thesis director) / Borges, Chad (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-05

Hyphenation of a microfluidic platform with MALDI-TOF mass spectrometry for single cell analysis

Description

Cell heterogeneity is widely present in the biological world and exists even in an isogenic population. Resolving the protein heterogeneity at the single cell level is of enormous biological and clinical relevance. However, single cell protein analysis has proven to be challenging due to extremely low amount of protein in…

Cell heterogeneity is widely present in the biological world and exists even in an isogenic population. Resolving the protein heterogeneity at the single cell level is of enormous biological and clinical relevance. However, single cell protein analysis has proven to be challenging due to extremely low amount of protein in a single cell and the huge complexity of proteome. This requires appropriate sampling and sensitive detection techniques. Here, a new approach, microfluidics combined with MALDI-TOF mass spectrometry was brought forward, for the analysis of proteins in single cells. The detection sensitivity of peptides as low as 300 molecules and of proteins as low as 10^6 molecules has been demonstrated. Furthermore, an immunoassay was successfully integrated in the microfluidic device for capturing the proteins of interest and further identifying them by subsequent enzymatic digestion. Moreover, an improved microfluidic platform was designed with separate chambers and valves, allowing the absolute quantification by employing iTRAQ tags or an isotopically labeled peptide. The study was further extended to analyze a protein in MCF-7 cell lysate. The approach capable of identifying and quantifying protein molecules in MCF-7 cells is promising for future proteomic studies at the single cell level.

ContributorsYang, Mian (Author) / Ros, Alexandra (Thesis advisor) / Hayes, Mark (Committee member) / Nelson, Randall (Committee member) / Arizona State University (Publisher)

Created2016

Development and analysis of stochastic boundary coverage strategies for multi-robot systems

Description

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions…

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is Sharp-P Hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-fi communication.

ContributorsPeruvemba Kumar, Ganesh (Author) / Berman, Spring M (Thesis advisor) / Fainekos, Georgios (Thesis advisor) / Bazzi, Rida (Committee member) / Syrotiuk, Violet (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)

Created2016

Statistical and dynamical modeling of Riemannian trajectories with application to human movement analysis

Description

The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon…

The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However in many cases this assumption is violated, when the data lies on non Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to \emph{represent, compare, and manipulate} such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the \emph{true} space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization to the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.

ContributorsAnirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)

Created2016