Matching Items (10)

Evaluating the utility of blood glycan levels as predictors of stage I adenocarcinoma using support vector machines.

Description

Aberrant glycosylation has been linked to specific cancers, and on this basis it was proposed that blood glycan levels could predict stage I adenocarcinoma. To track glycosylation, glycans were broken down into glycan nodes via methylation analysis. This analysis used information from N-linked, O-linked, and lipid-linked glycans detected by gas chromatography-mass spectrometry. The resulting glycan node ratios are the initial quantitative data used in this experiment.
For this experiment, NYU Medical School provided two sets of 50 µL blood plasma samples from patients with stage I adenocarcinoma and from control patients matched for age, smoking status, and gender. Dr. Borges’s lab analyzed these samples to obtain normalized biomarker levels. ROC curves were constructed for biomarkers taken individually and in pairs, and the AUC was calculated in Wolfram Mathematica 10.2. Several methods were tried to increase the AUC: enlarging the training set, using hard vs. soft margins, and processing biomarkers together and individually. For this data set, a soft margin proved more effective than the initial hard margin, raising the AUC from 0.6013 to 0.6585. The 6-Glc/6-Man and 3,6-Gal glycan node ratios gave the best result, with an AUC of 0.7687, a sensitivity of 0.7684, and a specificity of 0.6051. While this accuracy is not sufficient for a primary diagnostic tool for stage I adenocarcinoma, the methods examined here merit further evaluation. By comparison, the current clinical standard blood test for prostate cancer has an AUC of only 0.67.
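The AUC, sensitivity, and specificity figures above can all be computed from classifier scores. The following is a minimal Python sketch, not the Mathematica workflow used in the study: the score lists are hypothetical SVM decision values, and `roc_auc` and `sensitivity_specificity` are illustrative names. It uses the rank (Mann-Whitney) form of the AUC, which equals the area under the empirical ROC curve when ties count as one half.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores higher than a random control.

    Ties count as one half; this equals the Mann-Whitney U statistic divided by
    (number of cases * number of controls).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample a 'case' when its score >= threshold; return (sensitivity, specificity)."""
    true_pos = sum(s >= threshold for s in case_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Hypothetical SVM decision values, for illustration only (not data from this study).
cases = [0.9, 0.4, 0.7, 0.2]
controls = [0.3, 0.1, 0.5, 0.6]
print(roc_auc(cases, controls))                        # 11 of 16 pairs ranked correctly
print(sensitivity_specificity(cases, controls, 0.45))
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity traces the ROC curve whose area `roc_auc` returns.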

Date Created
  • 2016-05

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description

Signal processing techniques have been used extensively in many engineering problems, and in recent years their application has extended to non-traditional research fields such as biological systems. Many of these applications require extracting a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA), which has been one of the primary biomarker discovery techniques. MSIA analyzes protein molecules as potential biomarkers using time-of-flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS-related applications. Though many peak detection algorithms exist, most of them are based on heuristic models. An alternative is to detect signal peaks using stochastic models of the signal and noise observations. The likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is a uniformly most powerful test for decision making posed as a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model using first-principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments on the ESI mass spectrometer. A non-flat baseline is common in MS data, and the reasons behind its formation are not yet fully understood. A new signal model explaining the presence of the baseline is proposed, though detailed experiments are needed to further substantiate its assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating signal peak amplitudes. The performance of the detection methods and the ML estimation is evaluated with Monte Carlo simulations, which show promising results. 
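To make the LRT idea concrete, here is a minimal sketch under simplifying assumptions that are not the dissertation's signal and noise models: a known peak template (a hypothetical Gaussian profile), additive white Gaussian noise with known standard deviation, and an unknown positive amplitude. Under these assumptions the LRT reduces to a matched filter whose threshold is fixed by the Neyman-Pearson false-alarm constraint, and the ML amplitude estimate is a least-squares projection onto the template. The function names are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_peak(n, center, width):
    """Hypothetical peak template: a Gaussian profile sampled at n points."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ml_amplitude(x, s):
    """ML amplitude for x = a * s + white Gaussian noise (least-squares projection)."""
    return dot(x, s) / dot(s, s)


def np_detector(x, s, sigma, p_fa):
    """Matched-filter LRT for H0: x = noise vs. H1: x = a * s + noise with a > 0.

    Under H0, T = <x, s> / (sigma * ||s||) is standard normal, so the threshold that
    gives false-alarm probability p_fa is the (1 - p_fa) quantile of N(0, 1).
    Returns (decide_h1, statistic, threshold).
    """
    t = dot(x, s) / (sigma * math.sqrt(dot(s, s)))
    gamma = NormalDist().inv_cdf(1.0 - p_fa)
    return t > gamma, t, gamma


template = gaussian_peak(41, center=20.0, width=3.0)
noise_free = [5.0 * v for v in template]
print(ml_amplitude(noise_free, template))                # recovers the amplitude (about 5.0)
print(np_detector(noise_free, template, sigma=1.0, p_fa=1e-3))
```

Because the threshold depends only on `p_fa` and not on the unknown amplitude, the same test is optimal for every positive amplitude, which is the sense in which it is uniformly most powerful in this simplified setting.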
An application of these methods is proposed for fractional abundance calculation in biomarker analysis, which is mathematically robust and fundamentally different from current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular disease, and it is shown to perform better than a linear discriminant analysis based classifier.

Date Created
  • 2012

Statistical and dynamical modeling of Riemannian trajectories with application to human movement analysis

Description

The data explosion of the past decade is due in part to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, and so on. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data are Euclidean, i.e., that distances are given by the standard Euclidean distance induced by the L2 norm. However, this assumption is violated in many cases where the data lie on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, since it evolves smoothly in time. A common theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large feature space, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where very similar kinds of movement must be differentiated, dynamical properties can be much more effective. 
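The gap between Euclidean and Riemannian distances is already visible on the simplest curved manifold, the unit sphere S^2 (for example, a normalized orientation direction). This is a minimal illustrative sketch, not the representation used in the dissertation: the straight-line distance cuts through the sphere's interior, whereas the geodesic (great-circle) distance stays on the manifold. `trajectory_distance` is a hypothetical helper that assumes the two trajectories are already time-aligned and equally sampled.

```python
import math


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def geodesic_sphere(p, q):
    """Great-circle distance between unit vectors p and q on S^2.

    atan2(|p x q|, p . q) is numerically better behaved than acos(p . q)
    for nearly identical or nearly antipodal points.
    """
    sin_part = math.sqrt(sum(c * c for c in cross(p, q)))
    cos_part = sum(a * b for a, b in zip(p, q))
    return math.atan2(sin_part, cos_part)


def trajectory_distance(traj_a, traj_b):
    """Mean pointwise geodesic distance between two time-aligned trajectories on S^2."""
    return sum(geodesic_sphere(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)


north, south = [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]
print(math.dist(north, south))        # chord through the interior: 2.0
print(geodesic_sphere(north, south))  # shortest path on the sphere: pi
```

Real trajectories usually also need temporal alignment (e.g., warping) before a pointwise comparison is meaningful; that step is omitted here.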
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
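As a rough illustration of what a Lyapunov-type exponent measures on a manifold, the sketch below estimates the finite-time exponential rate at which two nearby trajectories on the unit sphere separate, using geodesic rather than Euclidean distance. It is only a simple two-trajectory estimate under stated assumptions (a uniform sampling step `dt`, trajectories already on S^2), not the generalization proposed in the dissertation; `rotate_z` exists only to build example trajectories.

```python
import math


def geodesic_sphere(p, q):
    """Great-circle distance between unit vectors p and q on S^2."""
    cx = p[1] * q[2] - p[2] * q[1]
    cy = p[2] * q[0] - p[0] * q[2]
    cz = p[0] * q[1] - p[1] * q[0]
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz),
                      p[0] * q[0] + p[1] * q[1] + p[2] * q[2])


def rotate_z(p, angle):
    """Rotate a 3-vector about the z-axis (used here only to build example trajectories)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


def finite_time_exponent(traj_a, traj_b, dt):
    """Estimate lambda from (1 / T) * ln(d(T) / d(0)), with d the geodesic distance.

    The per-step log ratios ln(d_{k+1} / d_k) telescope to this endpoint form,
    so this is their average rate over T = (len(traj) - 1) * dt.
    Positive values mean the trajectories diverge; zero means separation is preserved.
    """
    d_start = geodesic_sphere(traj_a[0], traj_b[0])
    d_end = geodesic_sphere(traj_a[-1], traj_b[-1])
    total_time = (len(traj_a) - 1) * dt
    return math.log(d_end / d_start) / total_time


# Rigid rotation preserves geodesic distance, so the estimate is ~0.
a0 = [1.0, 0.0, 0.0]
b0 = [math.cos(0.1), 0.0, math.sin(0.1)]
traj_a = [rotate_z(a0, 0.2 * k) for k in range(11)]
traj_b = [rotate_z(b0, 0.2 * k) for k in range(11)]
print(finite_time_exponent(traj_a, traj_b, dt=0.1))

# Separation growing like 0.01 * exp(t) along the equator gives an estimate of ~1.
fixed = [[1.0, 0.0, 0.0]] * 11
spreading = [rotate_z([1.0, 0.0, 0.0], 0.01 * math.exp(0.1 * k)) for k in range(11)]
print(finite_time_exponent(fixed, spreading, dt=0.1))
```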

Date Created
  • 2016

Micro-particle streak velocimetry: theory, simulation methods and applications

Description

This dissertation describes a novel, low-cost strategy that uses particle streak (track) images for accurate micro-channel velocity field mapping. It is shown that 2-dimensional, 2-component fields can be efficiently obtained from the spatial variation of particle track lengths in micro-channels. The velocity field is a critical performance feature of many microfluidic devices. Because un-modeled micro-scale physics often frustrates principled design methodologies, particle-based velocity field estimation is an essential design and validation tool. Current technologies that achieve this goal use particle constellation correlation strategies and rely heavily on costly, high-speed imaging hardware. The proposed image/video processing based method achieves comparable accuracy for a fraction of the cost. In the context of micro-channel velocimetry, the usability of particle streaks has been poorly studied so far; their use has remained restricted mostly to bulk flow measurements and occasional ad-hoc uses in microfluidics. Revisiting particle streak lengths approximately 15 years after their first use in micro-channel velocimetry, this work shows that they can be used efficiently. Particle tracks in steady, smooth microfluidic flows are mathematically modeled, and a framework for using experimentally observed particle track lengths for local velocity field estimation is introduced, followed by algorithm implementation and quantitative verification. Experimental considerations and image processing techniques that can facilitate the proposed methods are also discussed. The unavailability of benchmarked particle track image data motivated the implementation of a simulation framework that can generate exposure-time-controlled particle track image sequences for given velocity vector fields. 
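The basic relation behind streak-length velocimetry fits in one line: during an exposure of duration t_exp, a particle moving at speed |v| traces a streak whose length is roughly |v| t_exp plus the particle image diameter. The sketch below applies that relation to a single streak. It is a simplified illustration, not the dissertation's estimation framework; the endpoint convention, the pixel-size calibration `um_per_px`, and the function name are assumptions.

```python
import math


def streak_velocity(x0, y0, x1, y1, particle_diameter_px, um_per_px, exposure_s):
    """Estimate a 2-D velocity (um/s) from one particle streak.

    Assumes (x0, y0) and (x1, y1) are the streak's outer ends in pixels, so the
    streak length equals the displacement during the exposure plus one particle
    image diameter. A single streak does not reveal the sign of the flow direction;
    here it is taken to point from (x0, y0) toward (x1, y1).
    """
    length_px = math.hypot(x1 - x0, y1 - y0)
    if length_px == 0.0:
        return 0.0, 0.0
    displacement_um = max(length_px - particle_diameter_px, 0.0) * um_per_px
    speed = displacement_um / exposure_s
    return speed * (x1 - x0) / length_px, speed * (y1 - y0) / length_px


# A 13-px streak from a 3-px particle image, 0.5 um/px, 10 ms exposure:
# displacement = 10 px = 5 um, so the speed is about 500 um/s along +x.
print(streak_velocity(0, 0, 13, 0, particle_diameter_px=3, um_per_px=0.5, exposure_s=0.01))
```

Repeating this over many streaks across the channel gives the spatially varying length field from which a 2-D velocity map can be interpolated.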
This dissertation also describes this framework and shows that arbitrary velocity fields designed in computational fluid dynamics software tools can be used to obtain such images. Besides aiding the generation of gold-standard data, such images could be used for quick microfluidic flow field visualization and could help improve device designs.
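The simulation idea can be sketched as follows: integrate each particle through a prescribed velocity field while the shutter is open, then mark the pixels it passes through. This is a minimal illustration under assumptions, not the dissertation's simulation framework: the plane Poiseuille profile stands in for a field exported from CFD software, the integrator is a simple midpoint (RK2) scheme, and the rasterizer marks single pixels without modeling particle image size or blur.

```python
def poiseuille_2d(x, y, u_max=1000.0, height=100.0):
    """Hypothetical test field: plane Poiseuille flow (um/s) across a channel 0 <= y <= height (um)."""
    return 4.0 * u_max * y * (height - y) / height ** 2, 0.0


def simulate_track(velocity, x0, y0, exposure_s, n_steps=100):
    """Positions visited by one particle while the shutter is open (midpoint/RK2 steps)."""
    dt = exposure_s / n_steps
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(n_steps):
        u1, v1 = velocity(x, y)
        u_mid, v_mid = velocity(x + 0.5 * dt * u1, y + 0.5 * dt * v1)
        x, y = x + dt * u_mid, y + dt * v_mid
        path.append((x, y))
    return path


def rasterize(paths, width_px, height_px, um_per_px):
    """Binary image (list of rows) with every pixel touched by any path set to 1."""
    image = [[0] * width_px for _ in range(height_px)]
    for path in paths:
        for x, y in path:
            col, row = int(x // um_per_px), int(y // um_per_px)
            if 0 <= row < height_px and 0 <= col < width_px:
                image[row][col] = 1
    return image


# Longer exposure -> longer streaks; centerline particles streak 1000 um/s * 10 ms = 10 um.
tracks = [simulate_track(poiseuille_2d, 0.0, y0, exposure_s=0.01) for y0 in (10.0, 50.0, 90.0)]
frame = rasterize(tracks, width_px=20, height_px=100, um_per_px=1.0)
print([round(t[-1][0] - t[0][0], 3) for t in tracks])
```

Calling `simulate_track` for successive start positions and exposure times would produce an image sequence with known ground-truth velocities.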

Date Created
  • 2011

Characterization and analysis of a novel platform for profiling the antibody response

Description

Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies can be detected, and the binding pattern can be correlated with disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and to determine how much information can be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could be detected even in the presence of a 100-fold excess of naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random-sequence peptides compared with peptides derived from protein sequences, and found that it bound to many peptides from both sets, but with low apparent affinity. An in-depth look at how peptide physicochemical properties and sequence complexity affect binding revealed some correlations, but they were generally small and varied greatly between antibodies. However, on a larger peptide library of limited diversity, I found that sequence complexity was important for antibody binding. The redundancy of that library enabled the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods with modest prediction accuracy, and I developed a software application called GuiTope to facilitate epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database. 
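One possible way to quantify the sequence complexity discussed above is the Shannon entropy of a peptide's amino-acid composition, and one simple way to relate binding peptides back to a candidate antigen is to look for shared sub-sequences (k-mers). Both are shown below as minimal sketches. The abstract does not say which complexity measure was used, neither function reflects how GuiTope works, and the peptide and antigen strings are made up.

```python
import math
from collections import Counter


def shannon_entropy(peptide):
    """Shannon entropy (bits per residue) of a peptide's amino-acid composition.

    0 for a homopolymer; log2(20) ~ 4.32 is the maximum for 20 amino acids.
    """
    n = len(peptide)
    counts = Counter(peptide)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(peptide, antigen, k=4):
    """Length-k sub-sequences that occur in both a binding peptide and a candidate antigen."""
    return kmers(peptide, k) & kmers(antigen, k)


print(shannon_entropy("GGGGGGGG"), shannon_entropy("ACDEFGHI"))  # low vs. high complexity
print(sorted(shared_kmers("GSAWKLY", "MKAWKLYQ")))               # ['AWKL', 'WKLY']
```

Exact k-mer matching is the crudest possible epitope inference; it illustrates why a library with little sub-sequence repetition makes mapping a signature back to its antigen difficult.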
In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions.

Date Created
  • 2011

Bayesian networks and Gaussian mixture models in multi-dimensional data analysis with application to religion-conflict data

Description

This thesis examines the application of statistical signal processing approaches to data arising from surveys intended to measure psychological and sociological phenomena underpinning human social dynamics. The use of signal processing methods to analyze signals arising from the measurement of social, biological, and other non-traditional phenomena has been an important and growing area of signal processing research over the past decade. Here, we explore the application of statistical modeling and signal processing concepts to data obtained from the Global Group Relations Project, specifically to understand and quantify the effects and interactions of social psychological factors related to intergroup conflicts. We use Bayesian networks to specify prospective models of conditional dependence. Bayesian networks are determined between social psychological factors and conflict variables and represented by directed acyclic graphs, while the significant interactions are modeled as conditional probabilities. Since the data are sparse and multi-dimensional, we fit Gaussian mixture models (GMMs) to the data to estimate the conditional probabilities of interest. The parameters of the GMMs are estimated using the expectation-maximization (EM) algorithm. However, the EM algorithm may over-fit because of the high dimensionality and limited number of observations in this data set. Therefore, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used for GMM order estimation. To assist intuitive understanding of the interactions between social variables and intergroup conflicts, we introduce a color-based visualization scheme in which color intensities are proportional to the observed conditional probabilities.
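Order selection with AIC and BIC compares maximized log-likelihoods after penalizing model size. The sketch below assumes full-covariance GMMs and takes the per-order log-likelihoods as given (for example, from separate EM runs); the numbers in the usage example are hypothetical. It shows why the two criteria can disagree: BIC charges ln(n) per parameter instead of AIC's 2, so for more than e^2 ≈ 7.4 samples it favors smaller models.

```python
import math


def gmm_num_params(k, d):
    """Free parameters of a k-component, d-dimensional, full-covariance GMM."""
    mixing_weights = k - 1              # weights sum to 1
    means = k * d
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrices
    return mixing_weights + means + covariances


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    return n_params * math.log(n_samples) - 2 * log_likelihood


def select_order(log_likelihoods, d, n_samples, criterion="bic"):
    """Return the component count k with the lowest AIC or BIC.

    log_likelihoods maps k -> maximized log-likelihood of a k-component GMM.
    """
    def score(k):
        p = gmm_num_params(k, d)
        ll = log_likelihoods[k]
        return bic(ll, p, n_samples) if criterion == "bic" else aic(ll, p)
    return min(log_likelihoods, key=score)


# Hypothetical EM results for 2-D data with n = 100 observations.
fits = {1: -400.0, 2: -350.0, 3: -340.0}
print(select_order(fits, d=2, n_samples=100, criterion="aic"))  # 3
print(select_order(fits, d=2, n_samples=100, criterion="bic"))  # 2
```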

Date Created
  • 2012

Development and analysis of stochastic boundary coverage strategies for multi-robot systems

Description

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.
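The chemical-kinetics analogy can be illustrated with a generic mean-field rate model: free robots attach to boundary i at rate k_i and detach at rate d_i, much like reversible binding reactions. This is a minimal sketch under those assumptions, not the control-policy synthesis described above, and the rate values are hypothetical. It computes the steady-state allocation in closed form and checks it by integrating the rate equations with forward Euler.

```python
def steady_state(n_robots, attach_rates, detach_rates):
    """Closed-form equilibrium (free, [attached to boundary i]) of the rate equations
    dF/dt = -sum_i k_i F + sum_i d_i A_i,   dA_i/dt = k_i F - d_i A_i.
    """
    ratios = [k / d for k, d in zip(attach_rates, detach_rates)]
    free = n_robots / (1.0 + sum(ratios))
    return free, [r * free for r in ratios]


def integrate(n_robots, attach_rates, detach_rates, t_end, dt=0.01):
    """Forward-Euler integration starting with every robot free."""
    free = float(n_robots)
    attached = [0.0] * len(attach_rates)
    for _ in range(int(round(t_end / dt))):
        flows = [k * free - d * a for k, d, a in zip(attach_rates, detach_rates, attached)]
        free -= dt * sum(flows)
        attached = [a + dt * f for a, f in zip(attached, flows)]
    return free, attached


k, d = [1.0, 2.0], [1.0, 1.0]           # hypothetical attach/detach rates (1/s)
print(steady_state(100, k, d))           # (25.0, [25.0, 50.0])
print(integrate(100, k, d, t_end=50.0))  # converges to the same allocation
```

Inverting the closed form, i.e. choosing the ratios k_i/d_i to hit a desired allocation, is one way such rates can serve as a robot control policy.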

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.
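For the homogeneous case, a simple version of the connectivity question has a closed form. Assuming, as a simplification, that the boundary is a straight segment of length L, that N robots attach at independent uniform positions, and that two robots communicate when they are within a common radius r, the robots form a connected chain exactly when every gap between neighbors is at most r. Inclusion-exclusion over the N - 1 gaps gives the probability computed below, which is checked against a Monte Carlo estimate. This is an illustration, not the dissertation's derivation.

```python
import math
import random


def prob_connected_uniform(n_robots, radius, length=1.0):
    """P(every neighbor gap <= radius) for n i.i.d. uniform positions on [0, length].

    Any particular j of the n - 1 interior gaps all exceed r with probability
    (1 - j r / L)_+ ** n; inclusion-exclusion over j gives the connectivity probability.
    """
    r = radius / length
    total = 0.0
    for j in range(n_robots):
        total += (-1) ** j * math.comb(n_robots - 1, j) * max(1.0 - j * r, 0.0) ** n_robots
    return total


def mc_prob_connected(n_robots, radius, length=1.0, trials=20000, seed=0):
    """Monte Carlo check: sort sampled positions and test every neighbor gap."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.uniform(0.0, length) for _ in range(n_robots))
        if all(b - a <= radius for a, b in zip(xs, xs[1:])):
            hits += 1
    return hits / trials


print(prob_connected_uniform(5, 0.3), mc_prob_connected(5, 0.3))
```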

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is #P-hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-Fi communication.
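Because exact connectivity probabilities are #P-hard to compute in the heterogeneous setting, a Monte Carlo estimate is a natural fallback. The sketch below assumes robots on a one-dimensional boundary coordinate, a per-robot position sampler standing in for the non-identical pdfs, and a link between robots i and j only when both can reach each other, i.e. |x_i - x_j| <= min(r_i, r_j). The samplers and radii are hypothetical, and this is not a method taken from the dissertation.

```python
import random


def is_connected(positions, radii):
    """Graph search over links where |x_i - x_j| <= min(r_i, r_j)."""
    n = len(positions)
    if n == 0:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and abs(positions[i] - positions[j]) <= min(radii[i], radii[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == n


def mc_connectivity(samplers, radii, trials=10000, seed=0):
    """Estimate P(connected) when robot i's position is drawn by samplers[i](rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [sample(rng) for sample in samplers]
        hits += is_connected(xs, radii)
    return hits / trials


# Hypothetical non-identical attachment pdfs on a boundary coordinate in [0, 1].
samplers = [
    lambda rng: rng.uniform(0.0, 1.0),
    lambda rng: rng.betavariate(2.0, 5.0),  # tends to attach near the start
    lambda rng: rng.betavariate(5.0, 2.0),  # tends to attach near the end
]
print(mc_connectivity(samplers, radii=[0.4, 0.2, 0.3]))
```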

Date Created
  • 2016

Faceted search and browsing of Indonesian text collection using shallow parsing techniques

Description

Text search is a very useful way of retrieving document information from a particular website. The public generally uses internet search engines rather than local enterprise search engines, because enterprise content is not cross-linked and does not follow a page rank algorithm. On the other hand, an enterprise search engine uses metadata information, which allows the user to specify conditions that any retrieved document should meet, so searching with metadata is also very useful. My thesis aims to develop an enterprise search engine that uses metadata information to provide advanced features such as faceted navigation. The search engine data were extracted from various Indonesian web sources. Metadata such as person, organization, location, and sentiment-analytic keyword entities must be tagged in each document to provide faceted search capability. A shallow parsing technique, named entity recognition, is used for this purpose; more than 1,500 entities were tagged in this process. The documents were successfully converted into XML format and indexed with Apache Solr, an open-source enterprise search engine with full-text search and faceted search capabilities. The entities help users specify conditions and search faster through the large collection of documents. A user who clicks on a metadata condition is assured of getting results. Since the sentiment-analytic keywords are tagged with positive and negative values, social scientists can use these results to check for overlapping or conflicting organizations and ideologies. In addition, this tool is the first of its kind for the Indonesian language. The results are fetched much faster and with better accuracy.
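Faceted navigation can be illustrated with a small in-memory sketch: filter documents by a text query and by selected metadata conditions, then count the metadata values among the remaining hits. Because counts are computed only over the current result set, every facet value this sketch shows leads to at least one document; in Apache Solr the corresponding roles are played by the `facet=true`, `facet.field`, `facet.mincount`, and `fq` request parameters over an indexed collection. The Python below is only an illustration, with hypothetical field names and sample documents.

```python
from collections import Counter


def faceted_search(docs, query_terms, filters, facet_fields):
    """Return (matching doc ids, {field: Counter of values among the matches}).

    docs:    {doc_id: {"text": str, "<facet field>": [values, ...], ...}}
    filters: {field: value} conditions every hit must satisfy (the role of Solr's fq).
    """
    hits = []
    for doc_id, doc in docs.items():
        text = doc["text"].lower()
        if not all(term.lower() in text for term in query_terms):
            continue  # naive substring match; a real engine uses an inverted index
        if not all(value in doc.get(field, []) for field, value in filters.items()):
            continue
        hits.append(doc_id)
    facets = {field: Counter() for field in facet_fields}
    for doc_id in hits:
        for field in facet_fields:
            facets[field].update(set(docs[doc_id].get(field, [])))
    return hits, facets


# Hypothetical tagged documents ("pemilu" = election, "banjir" = flood).
docs = {
    "d1": {"text": "Pemilu di Jakarta", "location": ["Jakarta"], "sentiment": ["positive"]},
    "d2": {"text": "Banjir di Jakarta", "location": ["Jakarta"], "sentiment": ["negative"]},
    "d3": {"text": "Pemilu di Surabaya", "location": ["Surabaya"], "sentiment": ["negative"]},
}
hits, facets = faceted_search(docs, ["pemilu"], {}, ["location", "sentiment"])
print(hits, dict(facets["location"]))   # ['d1', 'd3'] {'Jakarta': 1, 'Surabaya': 1}
print(faceted_search(docs, ["pemilu"], {"location": "Jakarta"}, ["sentiment"])[0])  # ['d1']
```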

Date Created
  • 2010

Computational modeling of peptide-protein binding

Description

Peptides offer great promise as targeted affinity ligands, but the space of possible peptide sequences is vast, making experimental identification of lead candidates expensive, difficult, and uncertain. Computational modeling can narrow the search by estimating the affinity and specificity of a given peptide in relation to a predetermined protein target. The predictive performance of computational models of interactions of intermediate-length peptides with proteins can be improved by taking into account the stochastic nature of the encounter and binding dynamics. A theoretical case is made for the hypothesis that, because of the flexibility of the peptide and the structural complexity of the target protein, interactions are best characterized by an ensemble of possible bound configurations rather than a single “lock and key” fit. A model incorporating these factors is proposed and evaluated. A comprehensive dataset of 3,924 peptide-protein interface structures was extracted from the Protein Data Bank (PDB), and descriptors characterizing the geometry and energetics of each interface were computed. The characteristics of these interfaces are shown to be generally consistent with the proposed model, and heuristics for design and selection of peptide ligands are derived. The curated and energy-minimized interface structure dataset and a relational database containing the detailed results of analysis and energy modeling are made publicly available via a web repository. A novel analytical technique based on the proposed theoretical model, Virtual Scanning Probe Mapping (VSPM), is implemented in software to analyze the interaction between a target protein of known structure and a peptide of specified sequence, producing a spatial map indicating the most likely peptide binding regions on the protein target. The resulting predictions are shown to be superior to those of two other published methods and support the validity of the stochastic binding model.
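The contrast between an ensemble of bound configurations and a single "lock and key" fit can be made concrete with Boltzmann weighting. The sketch below is a minimal illustration under assumptions, not VSPM: configuration energies (kcal/mol) and region labels are hypothetical inputs, for example from docking poses, and kT is taken as about 0.5925 kcal/mol near 298 K. Summing weights per region gives a crude spatial map in which a region with many moderately good configurations can outweigh the single best-scoring pose.

```python
import math
from collections import defaultdict

KT = 0.5925  # k_B * T in kcal/mol at about 298 K


def boltzmann_weights(energies, kt=KT):
    """Normalized Boltzmann probabilities of bound configurations (energies in kcal/mol)."""
    e_min = min(energies)  # shifting by the minimum avoids overflow and cancels on normalizing
    w = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(w)
    return [x / z for x in w]


def ensemble_free_energy(energies, kt=KT):
    """-kT ln(sum_i exp(-E_i / kT)); never above the single best energy."""
    e_min = min(energies)
    return e_min - kt * math.log(sum(math.exp(-(e - e_min) / kt) for e in energies))


def region_map(configs, kt=KT):
    """Total Boltzmann probability per protein region; configs = [(region, energy), ...]."""
    probs = boltzmann_weights([e for _, e in configs], kt)
    totals = defaultdict(float)
    for (region, _), p in zip(configs, probs):
        totals[region] += p
    return dict(totals)


# Hypothetical poses: one best-scoring pose at site A, three slightly weaker ones at site B.
poses = [("site A", -8.0), ("site B", -7.5), ("site B", -7.5), ("site B", -7.5)]
print(region_map(poses))            # site B carries more total probability than site A
print(ensemble_free_energy([e for _, e in poses]))
```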

Date Created
  • 2010

Opportunistic scheduling, cooperative relaying and multicast in wireless networks

Description

This dissertation builds a clear understanding of the role of information in wireless networks and devises adaptive strategies to optimize overall performance. The meaning of information ranges from channel/network states to the structure of the signal itself. Under the common thread of characterizing the role of information, this dissertation investigates opportunistic scheduling, relaying, and multicast in wireless networks. To assess the role of channel state information, the problem of distributed opportunistic scheduling (DOS) with incomplete information is considered for ad-hoc networks in which many links contend for the same channel using random access. The objective is to maximize system throughput. In practice, link state information is noisy and may result in throughput degradation. Refining the state information by additional probing can therefore improve throughput, but probing itself incurs a cost. Capitalizing on optimal stopping theory, the optimal scheduling policy is shown to be threshold-based and is characterized by either one or two thresholds, depending on network settings. To understand the benefits of side information in cooperative relaying scenarios, a basic model is explored for two-hop transmissions of two information flows that interfere with each other. While the first hop is a classical interference channel, the second hop can be treated as an interference channel with transmitter side information. Various cooperative relaying strategies are developed to enhance the achievable rate. In another context, a simple sensor network is considered in which a sensor node acts as a relay and aids a fusion center in detecting an event. Two relaying schemes are considered: analog relaying and digital relaying. Sufficient conditions are provided for the optimality of analog relaying over digital relaying in this network. To illustrate the role of information about the signal structure in joint source-channel coding, multicast of compressible signals over lossy channels is studied. The focus is on network outage from the perspective of signal distortion across all receivers. 
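The threshold structure of such a scheduling policy can be illustrated with a deliberately simplified renewal-reward model: each probe costs a fixed time tau and reveals a link rate R drawn from a known discrete distribution; if R meets the threshold x, the link transmits for time T, otherwise another probe is made. The long-run throughput is then T E[R 1{R >= x}] / (tau + P(R >= x) T). This sketch ignores contention collisions and the noisy, two-level probing studied in the dissertation, and its numbers are hypothetical; it only shows how the best threshold shifts as probing becomes more expensive.

```python
def threshold_throughput(rates, probs, x, tau, T):
    """Long-run throughput of 'probe until R >= x, then transmit for T' (renewal-reward).

    Per probe: expected time tau + q*T and expected reward T * E[R; R >= x],
    where q = P(R >= x).
    """
    q = sum(p for r, p in zip(rates, probs) if r >= x)
    reward = T * sum(r * p for r, p in zip(rates, probs) if r >= x)
    return reward / (tau + q * T)


def best_threshold(rates, probs, tau, T):
    """Best threshold among the support points of R (any other x acts like the next one up)."""
    return max(sorted(set(rates)), key=lambda x: threshold_throughput(rates, probs, x, tau, T))


rates, probs = [1.0, 2.0, 4.0], [0.5, 0.3, 0.2]  # hypothetical rate distribution
print(best_threshold(rates, probs, tau=0.1, T=1.0))  # cheap probing: be selective (4.0)
print(best_threshold(rates, probs, tau=1.0, T=1.0))  # costly probing: accept anything (1.0)
```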
Based on extreme value theory, the network outage is characterized in terms of key parameters. A new method using subblock network coding is devised, which prioritizes resource allocation based on the signal information structure.

Date Created
  • 2011