Matching Items (15)
Filtering by

Clear all filters

149991-Thumbnail Image.png
Description
With the introduction of compressed sensing and sparse representation,many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer vision and image processing problems are being solved using compressive sensing and sparse representation algorithms. This thesis assays some applications

With the introduction of compressed sensing and sparse representation,many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer vision and image processing problems are being solved using compressive sensing and sparse representation algorithms. This thesis assays some applications of compressive sensing and sparse representation with regards to image enhancement, restoration and classication. The first application deals with image Super-Resolution through compressive sensing based sparse representation. A novel framework is developed for understanding and analyzing some of the implications of compressive sensing in reconstruction and recovery of an image through raw-sampled and trained dictionaries. Properties of the projection operator and the dictionary are examined and the corresponding results presented. In the second application a novel technique for representing image classes uniquely in a high-dimensional space for image classification is presented. In this method, design and implementation strategy of the image classification system through unique affine sparse codes is presented, which leads to state of the art results. This further leads to analysis of some of the properties attributed to these unique sparse codes. In addition to obtaining these codes, a strong classier is designed and implemented to boost the results obtained. Evaluation with publicly available datasets shows that the proposed method outperforms other state of the art results in image classication. The final part of the thesis deals with image denoising with a novel approach towards obtaining high quality denoised image patches using only a single image. A new technique is proposed to obtain highly correlated image patches through sparse representation, which are then subjected to matrix completion to obtain high quality image patches. Experiments suggest that there may exist a structure within a noisy image which can be exploited for denoising through a low-rank constraint.
ContributorsKulkarni, Naveen (Author) / Li, Baoxin (Thesis advisor) / Ye, Jieping (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2011
150181-Thumbnail Image.png
Description
Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs

Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs and without the need for explicit relearning from scratch. In this thesis, a novel instance transfer technique that adapts a "Cost-sensitive" variation of AdaBoost is presented. The method capitalizes on the theoretical and functional properties of AdaBoost to selectively reuse outdated training instances obtained from a "source" domain to effectively classify unseen instances occurring in a different, but related "target" domain. The algorithm is evaluated on real-world classification problems namely accelerometer based 3D gesture recognition, smart home activity recognition and text categorization. The performance on these datasets is analyzed and evaluated against popular boosting-based instance transfer techniques. In addition, supporting empirical studies, that investigate some of the less explored bottlenecks of boosting based instance transfer methods, are presented, to understand the suitability and effectiveness of this form of knowledge transfer.
ContributorsVenkatesan, Ashok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2011
150189-Thumbnail Image.png
Description
This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focused on drawing more precise metrics from different perspectives of the communication data. Also, I attempted to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce framework based Hadoop Cluster. I described four metrics based on which I observed and analyzed the data and also presented the results which show the required patterns and anomalies to better understand and infer the communication. Also described the usage experience with Pig Latin (scripting language of Apache Pig Libraries) for this research and how they brought the feature of scalability, simplicity, and visibility in this data intensive research work. These approaches are useful in project monitoring, to augment human observation and reporting, in social network analysis, to track individual contributions.
ContributorsMotamarri, Lakshminarayana (Author) / Santanam, Raghu (Thesis advisor) / Ye, Jieping (Thesis advisor) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011
150190-Thumbnail Image.png
Description
Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning like regression, the data consists of many features and only a subset of the features may be responsible for the result. Also, the features might require special structural requirements, which introduces additional complexity for feature selection. The sparse learning package, provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features which introduce additional requirements are also provided as part of the package. The features may be grouped together, and there may exist hierarchies and over- lapping groups among these, and there may be requirements for selecting the most relevant groups among them. In spite of getting sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, there are certain techniques which provide theoretical justification of why certain features are selected. The stability selection, is a method for feature selection which allows the use of existing sparse learning methods to select the stable set of features for a given training sample. This is done by assigning probabilities for the features: by sub-sampling the training data and using a specific sparse learning technique to learn the relevant features, and repeating this a large number of times, and counting the probability as the number of times a feature is selected. Cross-validation which is used to determine the best parameter value over a range of values, further allows to select the best parameter value. This is done by selecting the parameter value which gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties and the inclusion of various structural dependencies among features, the sparse learning package will be a powerful tool for machine learning research. Modular structure, C implementation, ATLAS integration for fast linear algebraic subroutines, make it one of the best tool for a large sparse setting. The varied collection of algorithms, support for group sparsity, batch algorithms, are a few of the notable functionality of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. The Alzheimer Disease(AD) is a neurodegenerative disease, which gradually leads to dementia. The SLEP package is used for feature selection for getting the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features are required to gain valuable insights.
ContributorsThulasiram, Ramesh (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2011
150086-Thumbnail Image.png
Description
Detecting anatomical structures, such as the carina, the pulmonary trunk and the aortic arch, is an important step in designing a CAD system of detection Pulmonary Embolism. The presented CAD system gets rid of the high-level prior defined knowledge to become a system which can easily extend to detect other

Detecting anatomical structures, such as the carina, the pulmonary trunk and the aortic arch, is an important step in designing a CAD system of detection Pulmonary Embolism. The presented CAD system gets rid of the high-level prior defined knowledge to become a system which can easily extend to detect other anatomic structures. The system is based on a machine learning algorithm --- AdaBoost and a general feature --- Haar. This study emphasizes on off-line and on-line AdaBoost learning. And in on-line AdaBoost, the thesis further deals with extremely imbalanced condition. The thesis first reviews several knowledge-based detection methods, which are relied on human being's understanding of the relationship between anatomic structures. Then the thesis introduces a classic off-line AdaBoost learning. The thesis applies different cascading scheme, namely multi-exit cascading scheme. The comparison between the two methods will be provided and discussed. Both of the off-line AdaBoost methods have problems in memory usage and time consuming. Off-line AdaBoost methods need to store all the training samples and the dataset need to be set before training. The dataset cannot be enlarged dynamically. Different training dataset requires retraining the whole process. The retraining is very time consuming and even not realistic. To deal with the shortcomings of off-line learning, the study exploited on-line AdaBoost learning approach. The thesis proposed a novel pool based on-line method with Kalman filters and histogram to better represent the distribution of the samples' weight. Analysis of the performance, the stability and the computational complexity will be provided in the thesis. Furthermore, the original on-line AdaBoost performs badly in imbalanced conditions, which occur frequently in medical image processing. In image dataset, positive samples are limited and negative samples are countless. A novel Self-Adaptive Asymmetric On-line Boosting method is presented. The method utilized a new asymmetric loss criterion with self-adaptability according to the ratio of exposed positive and negative samples and it has an advanced rule to update sample's importance weight taking account of both classification result and sample's label. Compared to traditional on-line AdaBoost Learning method, the new method can achieve far more accuracy in imbalanced conditions.
ContributorsWu, Hong (Author) / Liang, Jianming (Thesis advisor) / Farin, Gerald (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2011
152300-Thumbnail Image.png
Description
In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set

In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set of 3D morphological differences in the corpus callosum between two groups of subjects. The CCs are segmented from whole brain T1-weighted MRI and modeled as 3D tetrahedral meshes. The callosal surface is divided into superior and inferior patches on which we compute a volumetric harmonic field by solving the Laplace's equation with Dirichlet boundary conditions. We adopt a refined tetrahedral mesh to compute the Laplacian operator, so our computation can achieve sub-voxel accuracy. Thickness is estimated by tracing the streamlines in the harmonic field. We combine areal changes found using surface tensor-based morphometry and thickness information into a vector at each vertex to be used as a metric for the statistical analysis. Group differences are assessed on this combined measure through Hotelling's T2 test. The method is applied to statistically compare three groups consisting of: congenitally blind (CB), late blind (LB; onset > 8 years old) and sighted (SC) subjects. Our results reveal significant differences in several regions of the CC between both blind groups and the sighted groups; and to a lesser extent between the LB and CB groups. These results demonstrate the crucial role of visual deprivation during the developmental period in reshaping the structural architecture of the CC.
ContributorsXu, Liang (Author) / Wang, Yalin (Thesis advisor) / Maciejewski, Ross (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2013
151402-Thumbnail Image.png
Description
Drosophila melanogaster, as an important model organism, is used to explore the mechanism which governs cell differentiation and embryonic development. Understanding the mechanism will help to reveal the effects of genes on other species or even human beings. Currently, digital camera techniques make high quality Drosophila gene expression imaging possible.

Drosophila melanogaster, as an important model organism, is used to explore the mechanism which governs cell differentiation and embryonic development. Understanding the mechanism will help to reveal the effects of genes on other species or even human beings. Currently, digital camera techniques make high quality Drosophila gene expression imaging possible. On the other hand, due to the advances in biology, gene expression images which can reveal spatiotemporal patterns are generated in a high-throughput pace. Thus, an automated and efficient system that can analyze gene expression will become a necessary tool for investigating the gene functions, interactions and developmental processes. One investigation method is to compare the expression patterns of different developmental stages. Recently, however, the expression patterns are manually annotated with rough stage ranges. The work of annotation requires professional knowledge from experienced biologists. Hence, how to transfer the domain knowledge in biology into an automated system which can automatically annotate the patterns provides a challenging problem for computer scientists. In this thesis, the problem of stage annotation for Drosophila embryo is modeled in the machine learning framework. Three sparse learning algorithms and one ensemble algorithm are used to attack the problem. The sparse algorithms are Lasso, group Lasso and sparse group Lasso. The ensemble algorithm is based on a voting method. Besides that the proposed algorithms can annotate the patterns to stages instead of stage ranges with high accuracy; the decimal stage annotation algorithm presents a novel way to annotate the patterns to decimal stages. In addition, some analysis on the algorithm performance are made and corresponding explanations are given. Finally, with the proposed system, all the lateral view BDGP and FlyFish images are annotated and several interesting applications of decimal stage value are revealed.
ContributorsPan, Cheng (Author) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Farin, Gerald (Committee member) / Arizona State University (Publisher)
Created2012
151963-Thumbnail Image.png
Description
Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning based system that learns new meanings of words in terms of lambda-calculus formulas given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm which uses these components to learn new meanings of words. Similarly, we re-developed an inbuilt parser of the system in Answer Set Programming (ASP) and also integrated external parser with the system. Apart from this, we added some new rich features like various system configurations and memory cache in the learning component of the NL2KR system. These enhancements helped in learning more meanings of the words, boosted performance of the system by reducing the computation time by a factor of 8 and improved the usability of the system. We evaluated the NL2KR system on iRODS domain. iRODS is a rule-oriented data system, which helps in managing large set of computer files using policies. This system provides a Rule-Oriented interface langauge whose syntactic structure is like any procedural programming language (eg. C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which then can be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to IPDL langauge. This corpus is then used for the evaluation of NL2KR system. We performed 10 fold cross validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.
ContributorsKumbhare, Kanchan Ravishankar (Author) / Baral, Chitta (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2013
151154-Thumbnail Image.png
Description
Alzheimer's Disease (AD) is the most common form of dementia observed in elderly patients and has significant social-economic impact. There are many initiatives which aim to capture leading causes of AD. Several genetic, imaging, and biochemical markers are being explored to monitor progression of AD and explore treatment and detection

Alzheimer's Disease (AD) is the most common form of dementia observed in elderly patients and has significant social-economic impact. There are many initiatives which aim to capture leading causes of AD. Several genetic, imaging, and biochemical markers are being explored to monitor progression of AD and explore treatment and detection options. The primary focus of this thesis is to identify key biomarkers to understand the pathogenesis and prognosis of Alzheimer's Disease. Feature selection is the process of finding a subset of relevant features to develop efficient and robust learning models. It is an active research topic in diverse areas such as computer vision, bioinformatics, information retrieval, chemical informatics, and computational finance. In this work, state of the art feature selection algorithms, such as Student's t-test, Relief-F, Information Gain, Gini Index, Chi-Square, Fisher Kernel Score, Kruskal-Wallis, Minimum Redundancy Maximum Relevance, and Sparse Logistic regression with Stability Selection have been extensively exploited to identify informative features for AD using data from Alzheimer's Disease Neuroimaging Initiative (ADNI). An integrative approach which uses blood plasma protein, Magnetic Resonance Imaging, and psychometric assessment scores biomarkers has been explored. This work also analyzes the techniques to handle unbalanced data and evaluate the efficacy of sampling techniques. Performance of feature selection algorithm is evaluated using the relevance of derived features and the predictive power of the algorithm using Random Forest and Support Vector Machine classifiers. Performance metrics such as Accuracy, Sensitivity and Specificity, and area under the Receiver Operating Characteristic curve (AUC) have been used for evaluation. The feature selection algorithms best suited to analyze AD proteomics data have been proposed. The key biomarkers distinguishing healthy and AD patients, Mild Cognitive Impairment (MCI) converters and non-converters, and healthy and MCI patients have been identified.
ContributorsDubey, Rashmi (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Wu, Tong (Committee member) / Arizona State University (Publisher)
Created2012
149373-Thumbnail Image.png
Description
Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to develop a more intuitive human-computer interaction. The research community has developed ways to translate natural language to mathematical formalisms. It has not yet been shown,

Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to develop a more intuitive human-computer interaction. The research community has developed ways to translate natural language to mathematical formalisms. It has not yet been shown, however, how to automatically translate different kinds of knowledge in English to distinct formal languages. Most of the recent work presents the problem that the translation method aims to a specific formal language or is hard to generalize. In this research, I take a first step to overcome this difficulty and present two algorithms which take as input two lambda-calculus expressions G and H and compute a lambda-calculus expression F. The expression F returned by the first algorithm satisfies F@G=H and, in the case of the second algorithm, we obtain G@F=H. The lambda expressions represent the meanings of words and sentences. For each formal language that one desires to use with the algorithms, the language must be defined in terms of lambda calculus. Also, some additional concepts must be included. After doing this, given a sentence, its representation and knowing the representation of several words in the sentence, the algorithms can be used to obtain the representation of the other words in that sentence. In this work, I define two languages and show examples of their use with the algorithms. The algorithms are illustrated along with soundness and completeness proofs, the latter with respect to typed lambda-calculus formulas up to the second order. These algorithms are a core part of a natural language semantics system that translates sentences from English to formulas in different formal languages.
ContributorsAlvarez Gonzalez, Marcos (Author) / Baral, Chitta (Thesis advisor) / Lee, Joohyung (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2010