Matching Items (13)

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Via l1-norm-based regularization penalties, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications which can benefit from the group-structured sparse learning technique. In the first application, I study the Alzheimer's disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in gene expression images. In addition, it provides a stage score that enables each embryo to be annotated more finely, dividing embryos into early and late periods of development within standard stage demarcations. Stage scores help illuminate global gene activities and changes, and the more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.
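
As a rough illustration of the overlapping group structure mentioned above, the following Python sketch evaluates a weighted group-lasso penalty in which one feature belongs to two groups; the groups, weights, and coefficient vector are hypothetical and are not taken from the thesis.

```python
# A minimal sketch of the overlapping group-lasso penalty used in structured
# sparse learning; group definitions and weights here are illustrative only.
import numpy as np

def overlapping_group_penalty(beta, groups, weights):
    """Sum of weighted l2 norms over (possibly overlapping) feature groups."""
    return sum(w * np.linalg.norm(beta[g]) for g, w in zip(groups, weights))

beta = np.array([0.0, 1.2, -0.5, 0.0, 0.3])
groups = [np.array([0, 1, 2]), np.array([2, 3, 4])]   # feature 2 is shared by both groups
weights = [np.sqrt(len(g)) for g in groups]           # common sqrt(|g|) weighting

print(overlapping_group_penalty(beta, groups, weights))
```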

Date Created
  • 2013

Structured sparse methods for imaging genetics

Description

Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Oftentimes, imaging genetics studies are challenging due to the relatively small number of subjects but the extremely high dimensionality of both imaging data and genomic data. In this dissertation, I carry out my research on imaging genetics with particular focus on two tasks---building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods---which can produce interpretable models and are robust to overfitting---for imaging genetics. With carefully designed sparsity-inducing regularizers, different biological priors are incorporated into the learning models. More specifically, in the Allen brain image--gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label-structure-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's disease neuroimaging initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to rapidly identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into the modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes the SNP spatial structure and is robust to the choice of reference alleles in genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurately predicting Alzheimer's disease-related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods.
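
As a loose illustration of the SNP-selection step, the sketch below fits an l1-penalized (Lasso) regression to synthetic genotype data and reports the selected SNPs; it does not reproduce the EDPP or MLFre screening rules, the absolute fused Lasso, or the ADNI data described in the dissertation.

```python
# Illustrative sketch: selecting candidate SNPs with an l1-penalized (Lasso)
# regression on synthetic genotype data; sizes and penalty are arbitrary.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_subjects, n_snps = 200, 5000
X = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)  # 0/1/2 genotype coding
true_idx = rng.choice(n_snps, size=10, replace=False)            # 10 "causal" SNPs
y = X[:, true_idx] @ rng.normal(size=10) + rng.normal(scale=0.5, size=n_subjects)

model = Lasso(alpha=0.5, max_iter=10000).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} SNPs selected out of {n_snps}")
```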

Date Created
  • 2017

Exploring latent structure in data: algorithms and implementations

Description

Feature representation of raw data is one of the most important components of a machine learning system. Traditionally, features are hand-crafted by domain experts, which can often be a time-consuming process. Furthermore, such features do not generalize well to unseen data and novel tasks. Recently, there have been many efforts to generate data-driven representations using clustering and sparse models. This dissertation focuses on building data-driven unsupervised models for analyzing raw data and developing efficient feature representations.

Simultaneous segmentation and feature extraction approaches for silicon-pore sensor data are considered. Aggregating the data into a matrix and performing low-rank and sparse matrix decompositions with additional smoothness constraints is proposed to solve this problem. Comparisons of several variants of the approaches, along with results for signal de-noising and translocation/trapping event extraction, are presented. Algorithms to improve transform-domain features for ion-channel time-series signals based on matrix completion are also presented. The improved features achieve better performance in classification tasks and in reducing false alarm rates when applied to analyte detection.
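
The sketch below illustrates the general low-rank-plus-sparse idea with a naive alternation between singular-value thresholding and soft-thresholding on a synthetic matrix; the dissertation's variants add smoothness constraints and are more elaborate than this toy version.

```python
# Simplified sketch of a low-rank + sparse decomposition (M ~ L + S) by
# alternating singular-value thresholding and soft-thresholding.
import numpy as np

def soft(X, t):
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def low_rank_plus_sparse(M, tau=1.0, lam=0.1, n_iter=50):
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * np.maximum(s - tau, 0.0)) @ Vt   # singular-value thresholding
        S = soft(M - L, lam)                      # elementwise sparse residual
    return L, S

M = np.add.outer(np.ones(40), np.linspace(0, 1, 60))   # rank-1 background signal
M[10:12, 20:25] += 5.0                                  # a few sparse "events"
L, S = low_rank_plus_sparse(M)
print(np.linalg.matrix_rank(L, tol=1e-6), np.count_nonzero(S))
```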

Developing representations for multimedia is an important and challenging problem, with applications ranging from scene recognition, multimedia retrieval, and personal life-logging systems to field robot navigation. In this dissertation, we present a new framework for feature extraction for challenging natural environment sounds. The proposed features outperform traditional spectral features on challenging environmental sound datasets. Several algorithms are proposed that perform supervised tasks such as recognition and tag annotation, and ensemble methods are proposed to improve the tag annotation process.

To facilitate the use of large datasets, fast implementations are developed for sparse coding, the key component in our algorithms. Several strategies to speed up the Orthogonal Matching Pursuit algorithm using CUDA kernels on a GPU are proposed. Implementations are also developed for a large-scale image retrieval system, in which image-based "exact search" and "visually similar search" using the image-patch sparse codes are performed. Results demonstrate a large speed-up over CPU implementations, and good retrieval performance is also achieved.
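
For reference, a plain NumPy version of Orthogonal Matching Pursuit for a single signal is sketched below; the GPU implementations described above batch many such problems in CUDA kernels, which is not attempted here, and the dictionary and signal are synthetic.

```python
# Plain NumPy sketch of Orthogonal Matching Pursuit for a single signal.
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy OMP: D has unit-norm columns (atoms), y is the signal."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))  # best-correlated atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)
y = D[:, [3, 17, 100]] @ np.array([1.0, -2.0, 0.5])
print(np.flatnonzero(omp(D, y, 3)))
```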

Date Created
  • 2014

Graph-based sparse learning: models, algorithms, and applications

Description

Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, a priori structural information has been shown to be powerful for improving the performance of sparse learning models. A graph is a fundamental way to represent structural information about features. This dissertation focuses on graph-based sparse learning. The first part of this dissertation aims to integrate a graph into sparse learning to improve performance. Specifically, the problem of feature grouping and selection over a given undirected graph is considered. Three models are proposed along with efficient solvers to achieve simultaneous feature grouping and selection, enhancing estimation accuracy. One major difficulty is that solving large-scale graph-based sparse learning problems remains computationally expensive. An efficient, scalable, and parallel algorithm for one widely used graph-based sparse learning approach, anisotropic total variation regularization, is therefore proposed by explicitly exploiting the structure of the graph. The second part of this dissertation focuses on uncovering the graph structure from the data. Two issues in graphical modeling are considered: the joint estimation of multiple graphical models using a fused lasso penalty, and the estimation of hierarchical graphical models. The key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which reduces the size of the optimization problem and thus dramatically reduces the computational cost.
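
A minimal sketch of the kind of objective involved in feature grouping and selection over a graph is given below: a least-squares loss with an l1 penalty plus an anisotropic total-variation penalty on graph edges, written with CVXPY on a synthetic chain graph; the models and solvers proposed in the dissertation are different and far more scalable.

```python
# Hedged sketch of graph-guided sparse regression: lasso loss plus an
# anisotropic total-variation penalty over graph edges. Edge list, weights,
# and data are synthetic placeholders.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                                   # a connected block of active features
y = X @ beta_true + 0.1 * rng.normal(size=n)
edges = np.array([(i, i + 1) for i in range(p - 1)])  # chain graph over features

beta = cp.Variable(p)
loss = cp.sum_squares(X @ beta - y) / (2 * n)
sparsity = 0.05 * cp.norm1(beta)
graph_tv = 0.05 * cp.sum(cp.abs(beta[edges[:, 0]] - beta[edges[:, 1]]))
cp.Problem(cp.Minimize(loss + sparsity + graph_tv)).solve()
print(np.round(beta.value, 2))
```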

Date Created
  • 2014

High-order sparsity exploiting methods with applications in imaging and PDEs

Description

High-order methods are known for their accuracy and computational performance when applied to solving partial differential equations and have widespread use in representing images compactly. Nonetheless, high-order methods have difficulty representing functions containing discontinuities or functions having slow spectral decay in the chosen basis. Certain sensing techniques such as MRI and SAR provide data in terms of Fourier coefficients, and thus prescribe a natural high-order basis. The field of compressed sensing has introduced a set of techniques based on $\ell^1$ regularization that promote sparsity and facilitate working with functions having discontinuities. In this dissertation, high-order methods and $\ell^1$ regularization are used to address three problems: reconstructing piecewise smooth functions from sparse and noisy Fourier data, recovering edge locations in piecewise smooth functions from sparse and noisy Fourier data, and reducing time-stepping constraints when numerically solving certain time-dependent hyperbolic partial differential equations.
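
As a small illustration of the first problem, the sketch below recovers a piecewise-constant signal from a subset of noisy Fourier coefficients by penalizing the l1 norm of its first differences; the signal, sampling pattern, and weights are arbitrary, and the dissertation's methods are considerably more refined.

```python
# Illustrative sketch: l1 (total-variation) regularized recovery of a
# piecewise-constant signal from sparse, noisy Fourier samples.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N = 128
f_true = np.zeros(N)
f_true[30:70] = 1.0
f_true[90:110] = -0.5
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                 # unitary DFT matrix
rows = rng.choice(N, size=40, replace=False)           # sparse Fourier samples
b = F[rows] @ f_true + 0.01 * (rng.normal(size=40) + 1j * rng.normal(size=40))

A = np.vstack([F[rows].real, F[rows].imag])            # real-valued measurement system
bb = np.concatenate([b.real, b.imag])
D = np.eye(N) - np.eye(N, k=1)                         # first-difference operator

f = cp.Variable(N)
objective = cp.sum_squares(A @ f - bb) + 0.05 * cp.norm1(D @ f)
cp.Problem(cp.Minimize(objective)).solve()
print(np.max(np.abs(f.value - f_true)))
```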

Date Created
  • 2016

Drosophila stage annotation using sparse learning method

Description

Drosophila melanogaster, as an important model organism, is used to explore the mechanisms that govern cell differentiation and embryonic development. Understanding these mechanisms will help to reveal the effects of genes in other species, or even human beings. Currently, digital camera techniques make high-quality imaging of Drosophila gene expression possible. At the same time, owing to advances in biology, gene expression images that reveal spatiotemporal patterns are generated at a high-throughput pace. Thus, an automated and efficient system that can analyze gene expression will become a necessary tool for investigating gene functions, interactions, and developmental processes. One investigation method is to compare the expression patterns of different developmental stages. At present, however, the expression patterns are manually annotated with rough stage ranges, and this annotation requires professional knowledge from experienced biologists. Hence, transferring this biological domain knowledge into an automated system that can annotate the patterns poses a challenging problem for computer scientists. In this thesis, the problem of stage annotation for Drosophila embryos is modeled in a machine learning framework. Three sparse learning algorithms and one ensemble algorithm are used to attack the problem. The sparse algorithms are Lasso, group Lasso, and sparse group Lasso. The ensemble algorithm is based on a voting method. Besides annotating the patterns to individual stages, rather than stage ranges, with high accuracy, the proposed decimal stage annotation algorithm presents a novel way to annotate the patterns with decimal stages. In addition, analyses of algorithm performance are made and corresponding explanations are given. Finally, with the proposed system, all the lateral-view BDGP and FlyFish images are annotated, and several interesting applications of the decimal stage value are revealed.
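
A very rough sketch of stage annotation as sparse multi-class classification is given below: an l1-penalized logistic regression on synthetic image features, followed by a majority vote over the images assigned to the same group; the features, labels, and groups are placeholders and do not correspond to the BDGP or FlyFish data.

```python
# Rough sketch: l1-penalized multi-class classification plus a simple
# majority-vote ensemble step; all data below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_features, n_stages = 300, 100, 6
X = rng.normal(size=(n_images, n_features))
y = rng.integers(0, n_stages, size=n_images)           # placeholder stage labels

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)

group = rng.integers(0, 50, size=n_images)             # images grouped by embryo/gene
pred = clf.predict(X)
voted = {g: np.bincount(pred[group == g]).argmax() for g in np.unique(group)}
print(list(voted.items())[:5])
```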

Date Created
  • 2012

New directions in sparse models for image analysis and restoration

Description

Effective modeling of high-dimensional data is crucial in information processing and machine learning. Classical subspace methods have been very effective in such applications. However, over the past few decades, there has been considerable research towards the development of new modeling paradigms that go beyond subspace methods. This dissertation focuses on the study of sparse models and their interplay with modern machine learning techniques such as manifold, ensemble, and graph-based methods, along with their applications in image analysis and recovery. By considering graph relations between data samples while learning sparse models, graph-embedded codes can be obtained for use in unsupervised, supervised, and semi-supervised problems. Using experiments on standard datasets, it is demonstrated that the codes obtained from the proposed methods outperform several baseline algorithms. In order to facilitate sparse learning with large-scale data, the paradigm of ensemble sparse coding is proposed, and different strategies for constructing weak base models are developed. Experiments with image recovery and clustering demonstrate that these ensemble models perform better than conventional sparse coding frameworks. When examples from the data manifold are available, manifold constraints can be incorporated with sparse models, and two approaches are proposed to combine sparse coding with manifold projection. The improved performance of the proposed techniques in comparison to sparse coding approaches is demonstrated using several image recovery experiments. In addition to these approaches, some applications require combining multiple sparse models with different regularizations. In particular, combining an unconstrained sparse model with non-negative sparse coding is important in image analysis, and it poses several algorithmic and theoretical challenges. A convex algorithm and an efficient greedy algorithm for recovering combined representations are proposed. Theoretical guarantees on sparsity thresholds for exact recovery using these algorithms are derived, and recovery performance is demonstrated using simulations on synthetic data. Finally, the problem of non-linear compressive sensing, where the measurement process is carried out in a feature space obtained using non-linear transformations, is considered. An optimized non-linear measurement system is proposed, and improvements in recovery performance are demonstrated in comparison to using random measurements as well as optimized linear measurements.
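
The ensemble sparse coding idea can be caricatured as follows: several weak dictionaries are learned on random subsets of patches and their reconstructions are averaged. The sketch below uses scikit-learn's MiniBatchDictionaryLearning on synthetic patches and is only a simplified stand-in for the ensemble models proposed in the dissertation.

```python
# Hedged sketch of ensemble sparse coding: average reconstructions from
# several weak dictionaries learned on random subsets of the data.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(2000, 64))                  # stand-in for 8x8 image patches

recons = []
for seed in range(5):                                  # five weak base models
    idx = rng.choice(len(patches), size=500, replace=False)
    dl = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                     batch_size=64, random_state=seed)
    dl.fit(patches[idx])
    codes = dl.transform(patches)
    recons.append(codes @ dl.components_)

ensemble_recon = np.mean(recons, axis=0)               # average the weak reconstructions
print(np.linalg.norm(patches - ensemble_recon) / np.linalg.norm(patches))
```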

Date Created
  • 2013

Improving the execution time of large system simulations

Description

Today, the electric power system faces new challenges from rapidly developing technology and the growing concern about environmental problems. The future of the power system under these new challenges needs to be planned and studied. However, due to the high degree of computational complexity of the optimization problem, conducting a system planning study which takes into account the market structure and environmental constraints on a large-scale power system is computationally taxing. To improve the execution time of large system simulations, such as the system planning study, two possible strategies are proposed in this thesis. The first one is to implement a relatively new factorization method, known as the multifrontal method, to speed up the solution of the sparse linear matrix equations within the large system simulations. The performance of the multifrontal method implemented by UMFPACK is compared with traditional LU factorization on a wide range of power-system matrices. The results show that the multifrontal method is superior to traditional LU factorization on the relatively denser matrices found in other specialty areas, but has poor performance on the sparser matrices that occur in power-system applications. This result suggests that multifrontal methods may not be an effective way to improve execution time for large system simulations, and power system engineers should evaluate the performance of the multifrontal method before applying it to their applications. The second strategy is to develop a small dc equivalent of the large-scale network with satisfactory accuracy for the large-scale system simulations. In this thesis, a modified Ward equivalent is generated for a large-scale power system, namely the full Electric Reliability Council of Texas (ERCOT) system. In this equivalent, all the generators in the full model are retained. The accuracy of the modified Ward equivalent is validated, and the equivalent is used to conduct an optimal generation investment planning study. By using the dc equivalent, the execution time for optimal generation investment planning is greatly reduced. Different scenarios are modeled to study the impact of fuel prices, environmental constraints, and incentives for renewable energy on future investment and retirement in generation.
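
The multifrontal-versus-LU comparison can be sampled with SciPy, whose spsolve routine uses UMFPACK (a multifrontal solver) when scikit-umfpack is installed and otherwise falls back to SuperLU; the sketch below times both settings on a synthetic grid Laplacian rather than an actual power-system admittance matrix.

```python
# Small timing sketch for sparse direct solves. The matrix is a synthetic
# 2-D grid Laplacian, not a power-system matrix; with scikit-umfpack absent,
# both runs fall back to SuperLU.
import time
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n = 200                                                # 200 x 200 grid
off = sp.eye(n, k=1) + sp.eye(n, k=-1)
A = sp.eye(n * n) * 4 - sp.kron(sp.eye(n), off) - sp.kron(off, sp.eye(n))
A = sp.csc_matrix(A)
b = np.ones(n * n)

for use_umf in (False, True):
    t0 = time.perf_counter()
    x = spsolve(A, b, use_umfpack=use_umf)
    print(f"use_umfpack={use_umf}: {time.perf_counter() - t0:.3f} s")
```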

Date Created
  • 2012

Sparse methods in image understanding and computer vision

Description

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component of image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large-scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models for large-scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed, and it is shown to outperform conventional sparse models in compressed recovery and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large-scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between image descriptors, and by allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Using object recognition experiments on standard datasets, it is shown that the proposed approaches outperform other sparse coding-based recognition frameworks. Furthermore, a segmentation technique based on multiple kernel sparse representations is developed and successfully applied to automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering. By performing discriminative embedding using sparse coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to build dictionaries for local sparse coding of image descriptors are presented and applied to object recognition and image retrieval.
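
One of the ideas above, using sparse codes to define relations between data samples, can be sketched as follows: each sample is coded with OMP against the remaining samples, the absolute codes are symmetrized into an affinity matrix, and spectral clustering is applied. The data, sparsity level, and clustering step are illustrative and are not the discriminative embedding developed in the dissertation.

```python
# Loose sketch of a sparse-code graph for unsupervised clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 10)) for c in (-2, 0, 2)])
n = X.shape[0]

W = np.zeros((n, n))
for i in range(n):
    D = np.delete(X, i, axis=0).T                      # code sample i over the others
    code = orthogonal_mp(D, X[i], n_nonzero_coefs=5)
    W[i, np.arange(n) != i] = np.abs(code)
affinity = W + W.T                                     # symmetrized sparse-code graph

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(np.bincount(labels))
```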

Date Created
  • 2013

Sparse learning package with stability selection and application to Alzheimer's disease

Description

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, used to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning such as regression, the data consist of many features, and only a subset of the features may be responsible for the result. The features might also carry special structural requirements, which introduces additional complexity for feature selection. The sparse learning package provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features, which introduce additional requirements, are also supported by the package: features may be grouped together, there may exist hierarchies and overlapping groups among them, and there may be requirements for selecting the most relevant groups among them. Although these methods produce sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, certain techniques provide theoretical justification of why particular features are selected. Stability selection is a method for feature selection which allows the use of existing sparse learning methods to select a stable set of features for a given training sample. This is done by assigning selection probabilities to the features: the training data is sub-sampled, a specific sparse learning technique is used to learn the relevant features on each sub-sample, the process is repeated a large number of times, and the probability for each feature is the fraction of runs in which it is selected. Cross-validation is further used to determine the best parameter value over a range of values by selecting the value which gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties, and the inclusion of various structural dependencies among features, the sparse learning package is a powerful tool for machine learning research. Its modular structure, C implementation, and ATLAS integration for fast linear algebraic subroutines make it one of the best tools for large sparse settings. The varied collection of algorithms, support for group sparsity, and batch algorithms are a few of the notable functionalities of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. Alzheimer's disease (AD) is a neurodegenerative disease which gradually leads to dementia. The SLEP package is used for feature selection to obtain the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features is required to gain valuable insights.
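
The stability selection procedure described above can be sketched directly: sub-sample the data, fit a Lasso on each sub-sample, and record how often each feature receives a nonzero coefficient. The following minimal Python example uses scikit-learn's Lasso on synthetic data; the SLEP package itself provides its own solvers and a wider range of penalties.

```python
# Minimal sketch of stability selection with a Lasso base learner; sizes,
# penalty, and the 0.8 selection threshold are illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 2.5, -1.0]) + rng.normal(size=n)

n_runs, counts = 100, np.zeros(p)
for _ in range(n_runs):
    idx = rng.choice(n, size=n // 2, replace=False)    # sub-sample half the data
    coef = Lasso(alpha=0.2, max_iter=10000).fit(X[idx], y[idx]).coef_
    counts += coef != 0                                # count nonzero selections

selection_prob = counts / n_runs
stable = np.flatnonzero(selection_prob >= 0.8)         # keep frequently-selected features
print(stable)
```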

Date Created
  • 2011