Matching Items (3)

151436-Thumbnail Image.png

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA) which has been one of the primary methods of biomarker discovery techniques. MSIA analyzes protein molecules as potential biomarkers using time of flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS related application. Though many peak detection algorithms exist, most of them are based on heuristics models. One of the ways of detecting signal peaks is by deploying stochastic models of the signal and noise observations. Likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is an uniformly most powerful test to decision making in the form of a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for the electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model by employing first principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments in the ESI mass spectrometer. A non-flat baseline in MS data is common. The reasons behind the formation of this baseline has not been fully comprehended. A new signal model explaining the presence of baseline is proposed, though detailed experiments are needed to further substantiate the model assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating the signal peak amplitudes. The performance of the detection methods and ML estimation are evaluated with Monte Carlo simulation which shows promising results. An application of these methods is proposed for fractional abundance calculation for biomarker analysis, which is mathematically robust and fundamentally different than the current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular diseases and is shown to perform better than a linear discriminant analysis based classifier.

Contributors

Agent

Created

Date Created
  • 2012

154587-Thumbnail Image.png

Graph-based estimation of information divergence functions

Description

Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however estimating them can be challenge. Most

Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however estimating them can be challenge. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follows the model, however this is rarely the case in real-word scenarios. Non-parametric density estimators are characterized by a very large number of parameters that have to be tuned with costly cross-validation. In this dissertation we focus on a specific family of non-parametric estimators, called direct estimators, that bypass density estimation completely and directly estimate the quantity of interest from the data. We introduce a new divergence measure, the $D_p$-divergence, that can be estimated directly from samples without parametric assumptions on the distribution. We show that the $D_p$-divergence bounds the binary, cross-domain, and multi-class Bayes error rates and, in certain cases, provides provably tighter bounds than the Hellinger divergence. In addition, we also propose a new methodology that allows the experimenter to construct direct estimators for existing divergence measures or to construct new divergence measures with custom properties that are tailored to the application. To examine the practical efficacy of these new methods, we evaluate them in a statistical learning framework on a series of real-world data science problems involving speech-based monitoring of neuro-motor disorders.

Contributors

Agent

Created

Date Created
  • 2017

153287-Thumbnail Image.png

Estimation of subspace occupancy

Description

The ability to identify unoccupied resources in the radio spectrum is a key capability for opportunistic users in a cognitive radio environment. This paper draws upon and extends geometrically based

The ability to identify unoccupied resources in the radio spectrum is a key capability for opportunistic users in a cognitive radio environment. This paper draws upon and extends geometrically based ideas in statistical signal processing to develop estimators for the rank and the occupied subspace in a multi-user environment from multiple temporal samples of the signal received at a single antenna. These estimators enable identification of resources, such as the orthogonal complement of the occupied subspace, that may be exploitable by an opportunistic user. This concept is supported by simulations showing the estimation of the number of users in a simple CDMA system using a maximum a posteriori (MAP) estimate for the rank. It was found that with suitable parameters, such as high SNR, sufficient number of time epochs and codes of appropriate length, the number of users could be correctly estimated using the MAP estimator even when the noise variance is unknown. Additionally, the process of identifying the maximum likelihood estimate of the orthogonal projector onto the unoccupied subspace is discussed.

Contributors

Agent

Created

Date Created
  • 2014