Matching Items (8)
Filtering by

Clear all filters

151544-Thumbnail Image.png
Description
Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models for large scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed, and it is shown to outperform conventional sparse models in compressed recovery, and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between the image descriptors, and allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Using object recognition experiments on standard datasets it is shown that the proposed approaches outperform other sparse coding-based recognition frameworks. Furthermore, a segmentation technique based on multiple kernel sparse representations is developed, and successfully applied for automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering. By performing discriminative embedding using sparse coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to build dictionaries for local sparse coding of image descriptors are presented, and applied to object recognition and image retrieval.
ContributorsJayaraman Thiagarajan, Jayaraman (Author) / Spanias, Andreas (Thesis advisor) / Frakes, David (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2013
151204-Thumbnail Image.png
Description
There is a growing interest for improved high-accuracy camera calibration methods due to the increasing demand for 3D visual media in commercial markets. Camera calibration is used widely in the fields of computer vision, robotics and 3D reconstruction. Camera calibration is the first step for extracting 3D data from a

There is a growing interest for improved high-accuracy camera calibration methods due to the increasing demand for 3D visual media in commercial markets. Camera calibration is used widely in the fields of computer vision, robotics and 3D reconstruction. Camera calibration is the first step for extracting 3D data from a 2D image. It plays a crucial role in computer vision and 3D reconstruction due to the fact that the accuracy of the reconstruction and 3D coordinate determination relies on the accuracy of the camera calibration to a great extent. This thesis presents a novel camera calibration method using a circular calibration pattern. The disadvantages and issues with existing state-of-the-art methods are discussed and are overcome in this work. The implemented system consists of techniques of local adaptive segmentation, ellipse fitting, projection and optimization. Simulation results are presented to illustrate the performance of the proposed scheme. These results show that the proposed method reduces the error as compared to the state-of-the-art for high-resolution images, and that the proposed scheme is more robust to blur in the imaged calibration pattern.
ContributorsPrakash, Charan Dudda (Author) / Karam, Lina J (Thesis advisor) / Frakes, David (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2012
151120-Thumbnail Image.png
Description
Diabetic retinopathy (DR) is a common cause of blindness occurring due to prolonged presence of diabetes. The risk of developing DR or having the disease progress is increasing over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of

Diabetic retinopathy (DR) is a common cause of blindness occurring due to prolonged presence of diabetes. The risk of developing DR or having the disease progress is increasing over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of blindness among American adults. Recent studies have shown that diagnosis based on digital retinal imaging has potential benefits over traditional face-to-face evaluation. Yet there is a dearth of computer-based systems that can match the level of performance achieved by ophthalmologists. This thesis takes a fresh perspective in developing a computer-based system aimed at improving diagnosis of DR images. These images are categorized into three classes according to their severity level. The proposed approach explores effective methods to classify new images and retrieve clinically-relevant images from a database with prior diagnosis information associated with them. Retrieval provides a novel way to utilize the vast knowledge in the archives of previously-diagnosed DR images and thereby improve a clinician's performance while classification can safely reduce the burden on DR screening programs and possibly achieve higher detection accuracy than human experts. To solve the three-class retrieval and classification problem, the approach uses a multi-class multiple-instance medical image retrieval framework that makes use of spectrally tuned color correlogram and steerable Gaussian filter response features. The results show better retrieval and classification performances than prior-art methods and are also observed to be of clinical and visual relevance.
ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Frakes, David (Committee member) / Arizona State University (Publisher)
Created2012
136785-Thumbnail Image.png
Description
This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of

This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of a chair to provide vibrotactile stimulation in the context of a dyadic (one-on-one) interaction across a table. This work explores the design of spatiotemporal vibration patterns that can be used to convey the basic building blocks of facial movements according to the Facial Action Unit Coding System. A behavioral study was conducted to explore the factors that influence the naturalness of conveying affect using vibrotactile cues.
ContributorsBala, Shantanu (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Department of Psychology (Contributor)
Created2014-05
157758-Thumbnail Image.png
Description
Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual

Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual recognition models. Despite significant advances made over the past decade, the fact remains collecting and annotating the data needed to successfully train a model is a prohibitively expensive endeavor. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent works in the development of deep adaptation networks seek to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has illustrated unprecedented potential for utilizing unlabeled data to train classification models in defiance of discouragingly meager sets of annotated data.

In this thesis, a novel domain adaptation algorithm -- Domain Adaptive Fusion (DAF) -- is proposed, which encourages a domain-invariant linear relationship between the pixel-space of different domains and the prediction-space while being trained under a domain adversarial signal. The thoughtful combination of key components in unsupervised domain adaptation and semi-supervised learning enable DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation endorse the efficacy of this hybrid approach, outperforming all of the baseline architectures on most of the transfer tasks.
ContributorsDudley, Andrew, M.S (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2019
158120-Thumbnail Image.png
Description
Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of the current vision and language models are modality-specific and, in

Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of the current vision and language models are modality-specific and, in many cases, extensively use deep-learning based attention mechanisms for learning powerful representations. This work discusses the role of attention in associating vision and language for generating shared representation. Language Image Transformer (LIT) is proposed for learning multi-modal representations of the environment. It uses a training objective based on Contrastive Predictive Coding (CPC) to maximize the Mutual Information (MI) between the visual and linguistic representations. It learns the relationship between the modalities using the proposed cross-modal attention layers. It is trained and evaluated using captioning datasets, MS COCO, and Conceptual Captions. The results and the analysis offers a perspective on the use of Mutual Information Maximisation (MIM) for generating generalizable representations across multiple modalities.
ContributorsRamakrishnan, Raghavendran (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth Kumar (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158259-Thumbnail Image.png
Description
In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of

In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning; (i) Design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) Development of a hashing based clustering algorithm for efficient incremental learning; (iii) Design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications.
ContributorsPatil, Rishabh (Author) / Venkateswara, Hemanth (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158278-Thumbnail Image.png
Description
Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain

Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors such as image background, image resolution, color, camera perspective and variations in the objects are responsible for the domain gap between the training data (source domain) and testing data (target domain). Domain adaptation algorithms aim to overcome the domain gap between the source and target domains and learn robust models that can perform well across both the domains.

This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more generic problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows. (1) Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation by aligning the features of the source and target domain using deep neural networks. (2) A multi-adversarial deep learning model for generalized domain adaptation. (3) A gating model that detects out-of-distribution samples for generalized domain adaptation.

The models were tested across multiple computer vision datasets for domain adaptation.

The dissertation concludes with a discussion on the proposed approaches and future directions for research in closed set and generalized domain adaptation.
ContributorsNagabandi, Bhadrinath (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020