Matching Items (9)
Filtering by

Clear all filters

150181-Thumbnail Image.png
Description
Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs

Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs and without the need for explicit relearning from scratch. In this thesis, a novel instance transfer technique that adapts a "Cost-sensitive" variation of AdaBoost is presented. The method capitalizes on the theoretical and functional properties of AdaBoost to selectively reuse outdated training instances obtained from a "source" domain to effectively classify unseen instances occurring in a different, but related "target" domain. The algorithm is evaluated on real-world classification problems namely accelerometer based 3D gesture recognition, smart home activity recognition and text categorization. The performance on these datasets is analyzed and evaluated against popular boosting-based instance transfer techniques. In addition, supporting empirical studies, that investigate some of the less explored bottlenecks of boosting based instance transfer methods, are presented, to understand the suitability and effectiveness of this form of knowledge transfer.
ContributorsVenkatesan, Ashok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2011
156430-Thumbnail Image.png
Description
Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the

Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the feature space to learn transferable and disentangled rep-

resentations. Transferable feature representations help in training machine learning

models that are robust across different distributions of data. For example, with the

application of transferable features in domain adaptation, models trained on a source

distribution can be applied to a data from a target distribution even though the dis-

tributions may be different. In style transfer and image-to-image translation, disen-

tangled representations allow for the separation of style and content when translating

images.

This thesis examines learning transferable data representations in novel deep gen-

erative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversar-

ial methods and cross-domain weight sharing in a neural network to extract trans-

ferable representations. These transferable interpretations can then be decoded into

the original image or a similar image in another domain. The Explicit Disentangling

Network (EDN) utilizes generative methods to disentangle images into their core at-

tributes and then segments sets of related attributes. The EDN can separate these

attributes by controlling the ow of information using a novel combination of losses

and network architecture. This separation of attributes allows precise modi_cations

to speci_c components of the data representation, boosting the performance of ma-

chine learning tasks. The effectiveness of these models is evaluated across domain

adaptation, style transfer, and image-to-image translation tasks.
ContributorsEusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2018
155339-Thumbnail Image.png
Description
The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source domain is transferred to a target domain in the form of learned models and efficient feature representations.

The dissertation outlines novel domain adaptation approaches across different feature spaces; (i) a linear Support Vector Machine model for domain alignment; (ii) a nonlinear kernel based approach that embeds domain-aligned data for enhanced classification; (iii) a hierarchical model implemented using deep learning, that estimates domain-aligned hash values for the source and target data, and (iv) a proposal for a feature selection technique to reduce cross-domain disparity. These adaptation procedures are tested and validated across a range of computer vision applications like object classification, facial expression recognition, digit recognition, and activity recognition. The dissertation also provides a unique perspective of domain adaptation literature from the point-of-view of linear, nonlinear and hierarchical feature spaces. The dissertation concludes with a discussion on the future directions for research that highlight the role of domain adaptation in an era of rapid advancements in artificial intelligence.
ContributorsDemakethepalli Venkateswara, Hemanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Chakraborty, Shayok (Committee member) / Arizona State University (Publisher)
Created2017
149310-Thumbnail Image.png
Description
The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about a decision, or when they are not. Machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence on a prediction such as learning algorithms based on the Bayesian theory or the Probably Approximately Correct theory require strong assumptions or often produce results that are not practical or reliable. The recently developed Conformal Predictions (CP) framework - which is based on the principles of hypothesis testing, transductive inference and algorithmic randomness - provides a game-theoretic approach to the estimation of confidence with several desirable properties such as online calibration and generalizability to all classification and regression methods. This dissertation builds on the CP theory to compute reliable confidence measures that aid decision-making in real-world problems through: (i) Development of a methodology for learning a kernel function (or distance metric) for optimal and accurate conformal predictors; (ii) Validation of the calibration properties of the CP framework when applied to multi-classifier (or multi-regressor) fusion; and (iii) Development of a methodology to extend the CP framework to continuous learning, by using the framework for online active learning. These contributions are validated on four real-world problems from the domains of healthcare and assistive technologies: two classification-based applications (risk prediction in cardiac decision support and multimodal person recognition), and two regression-based applications (head pose estimation and saliency prediction in images). The results obtained show that: (i) multiple kernel learning can effectively increase efficiency in the CP framework; (ii) quantile p-value combination methods provide a viable solution for fusion in the CP framework; and (iii) eigendecomposition of p-value difference matrices can serve as effective measures for online active learning; demonstrating promise and potential in using these contributions in multimedia pattern recognition problems in real-world settings.
ContributorsNallure Balasubramanian, Vineeth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Vovk, Vladimir (Committee member) / Arizona State University (Publisher)
Created2010
157758-Thumbnail Image.png
Description
Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual

Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual recognition models. Despite significant advances made over the past decade, the fact remains collecting and annotating the data needed to successfully train a model is a prohibitively expensive endeavor. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent works in the development of deep adaptation networks seek to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has illustrated unprecedented potential for utilizing unlabeled data to train classification models in defiance of discouragingly meager sets of annotated data.

In this thesis, a novel domain adaptation algorithm -- Domain Adaptive Fusion (DAF) -- is proposed, which encourages a domain-invariant linear relationship between the pixel-space of different domains and the prediction-space while being trained under a domain adversarial signal. The thoughtful combination of key components in unsupervised domain adaptation and semi-supervised learning enable DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation endorse the efficacy of this hybrid approach, outperforming all of the baseline architectures on most of the transfer tasks.
ContributorsDudley, Andrew, M.S (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2019
157788-Thumbnail Image.png
Description
Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into

Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into PRT implementation has also shown that parents can be

trained to be effective interventionists for their children. The current difficulty in PRT

training is how to disseminate training to parents who need it, and how to support and

motivate practitioners after training.

Evaluation of the parents’ fidelity to implementation is often undertaken using video

probes that depict the dyadic interaction occurring between the parent and the child during

PRT sessions. These videos are time consuming for clinicians to process, and often result

in only minimal feedback for the parents. Current trends in technology could be utilized to

alleviate the manual cost of extracting data from the videos, affording greater

opportunities for providing clinician created feedback as well as automated assessments.

The naturalistic context of the video probes along with the dependence on ubiquitous

recording devices creates a difficult scenario for classification tasks. The domain of the

PRT video probes can be expected to have high levels of both aleatory and epistemic

uncertainty. Addressing these challenges requires examination of the multimodal data

along with implementation and evaluation of classification algorithms. This is explored

through the use of a new dataset of PRT videos.

The relationship between the parent and the clinician is important. The clinician can

provide support and help build self-efficacy in addition to providing knowledge and

modeling of treatment procedures. Facilitating this relationship along with automated

feedback not only provides the opportunity to present expert feedback to the parent, but

also allows the clinician to aid in personalizing the classification models. By utilizing a

human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the

classification models by providing additional labeled samples. This will allow the system

to improve classification and provides a person-centered approach to extracting

multimodal data from PRT video probes.
ContributorsCopenhaver Heath, Corey D (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Davulcu, Hasan (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)
Created2019
158120-Thumbnail Image.png
Description
Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of the current vision and language models are modality-specific and, in

Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of the current vision and language models are modality-specific and, in many cases, extensively use deep-learning based attention mechanisms for learning powerful representations. This work discusses the role of attention in associating vision and language for generating shared representation. Language Image Transformer (LIT) is proposed for learning multi-modal representations of the environment. It uses a training objective based on Contrastive Predictive Coding (CPC) to maximize the Mutual Information (MI) between the visual and linguistic representations. It learns the relationship between the modalities using the proposed cross-modal attention layers. It is trained and evaluated using captioning datasets, MS COCO, and Conceptual Captions. The results and the analysis offers a perspective on the use of Mutual Information Maximisation (MIM) for generating generalizable representations across multiple modalities.
ContributorsRamakrishnan, Raghavendran (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth Kumar (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158259-Thumbnail Image.png
Description
In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of

In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning; (i) Design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) Development of a hashing based clustering algorithm for efficient incremental learning; (iii) Design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications.
ContributorsPatil, Rishabh (Author) / Venkateswara, Hemanth (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158278-Thumbnail Image.png
Description
Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain

Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors such as image background, image resolution, color, camera perspective and variations in the objects are responsible for the domain gap between the training data (source domain) and testing data (target domain). Domain adaptation algorithms aim to overcome the domain gap between the source and target domains and learn robust models that can perform well across both the domains.

This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more generic problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows. (1) Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation by aligning the features of the source and target domain using deep neural networks. (2) A multi-adversarial deep learning model for generalized domain adaptation. (3) A gating model that detects out-of-distribution samples for generalized domain adaptation.

The models were tested across multiple computer vision datasets for domain adaptation.

The dissertation concludes with a discussion on the proposed approaches and future directions for research in closed set and generalized domain adaptation.
ContributorsNagabandi, Bhadrinath (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020