Matching Items (10)
Filtering by

Clear all filters

156430-Thumbnail Image.png
Description
Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the

Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the feature space to learn transferable and disentangled rep-

resentations. Transferable feature representations help in training machine learning

models that are robust across different distributions of data. For example, with the

application of transferable features in domain adaptation, models trained on a source

distribution can be applied to a data from a target distribution even though the dis-

tributions may be different. In style transfer and image-to-image translation, disen-

tangled representations allow for the separation of style and content when translating

images.

This thesis examines learning transferable data representations in novel deep gen-

erative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversar-

ial methods and cross-domain weight sharing in a neural network to extract trans-

ferable representations. These transferable interpretations can then be decoded into

the original image or a similar image in another domain. The Explicit Disentangling

Network (EDN) utilizes generative methods to disentangle images into their core at-

tributes and then segments sets of related attributes. The EDN can separate these

attributes by controlling the ow of information using a novel combination of losses

and network architecture. This separation of attributes allows precise modi_cations

to speci_c components of the data representation, boosting the performance of ma-

chine learning tasks. The effectiveness of these models is evaluated across domain

adaptation, style transfer, and image-to-image translation tasks.
ContributorsEusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2018
155511-Thumbnail Image.png
Description
The Internet is a major source of online news content. Online news is a form of large-scale narrative text with rich, complex contents that embed deep meanings (facts, strategic communication frames, and biases) for shaping and transitioning standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative

The Internet is a major source of online news content. Online news is a form of large-scale narrative text with rich, complex contents that embed deep meanings (facts, strategic communication frames, and biases) for shaping and transitioning standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains untapped due—in large part—to human limitations. The human ability to comprehend rich text and extract hidden meanings is far superior to known computational algorithms but remains unscalable. In this research, computational treatment is given to online news framing for exposing a deeper level of expressivity coined “double subjectivity” as characterized by its cumulative amplification effects. A visual language is offered for extracting spatial and temporal dynamics of double subjectivity that may give insight into social influence about critical issues, such as environmental, economic, or political discourse. This research offers benefits of 1) scalability for processing hidden meanings in big data and 2) visibility of the entire network dynamics over time and space to give users insight into the current status and future trends of mass communication.
ContributorsCheeks, Loretta H. (Author) / Gaffar, Ashraf (Thesis advisor) / Wald, Dara M (Committee member) / Ben Amor, Hani (Committee member) / Doupe, Adam (Committee member) / Cooke, Nancy J. (Committee member) / Arizona State University (Publisher)
Created2017
157758-Thumbnail Image.png
Description
Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual

Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual recognition models. Despite significant advances made over the past decade, the fact remains collecting and annotating the data needed to successfully train a model is a prohibitively expensive endeavor. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent works in the development of deep adaptation networks seek to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has illustrated unprecedented potential for utilizing unlabeled data to train classification models in defiance of discouragingly meager sets of annotated data.

In this thesis, a novel domain adaptation algorithm -- Domain Adaptive Fusion (DAF) -- is proposed, which encourages a domain-invariant linear relationship between the pixel-space of different domains and the prediction-space while being trained under a domain adversarial signal. The thoughtful combination of key components in unsupervised domain adaptation and semi-supervised learning enable DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation endorse the efficacy of this hybrid approach, outperforming all of the baseline architectures on most of the transfer tasks.
ContributorsDudley, Andrew, M.S (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2019
157788-Thumbnail Image.png
Description
Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into

Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into PRT implementation has also shown that parents can be

trained to be effective interventionists for their children. The current difficulty in PRT

training is how to disseminate training to parents who need it, and how to support and

motivate practitioners after training.

Evaluation of the parents’ fidelity to implementation is often undertaken using video

probes that depict the dyadic interaction occurring between the parent and the child during

PRT sessions. These videos are time consuming for clinicians to process, and often result

in only minimal feedback for the parents. Current trends in technology could be utilized to

alleviate the manual cost of extracting data from the videos, affording greater

opportunities for providing clinician created feedback as well as automated assessments.

The naturalistic context of the video probes along with the dependence on ubiquitous

recording devices creates a difficult scenario for classification tasks. The domain of the

PRT video probes can be expected to have high levels of both aleatory and epistemic

uncertainty. Addressing these challenges requires examination of the multimodal data

along with implementation and evaluation of classification algorithms. This is explored

through the use of a new dataset of PRT videos.

The relationship between the parent and the clinician is important. The clinician can

provide support and help build self-efficacy in addition to providing knowledge and

modeling of treatment procedures. Facilitating this relationship along with automated

feedback not only provides the opportunity to present expert feedback to the parent, but

also allows the clinician to aid in personalizing the classification models. By utilizing a

human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the

classification models by providing additional labeled samples. This will allow the system

to improve classification and provides a person-centered approach to extracting

multimodal data from PRT video probes.
ContributorsCopenhaver Heath, Corey D (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Davulcu, Hasan (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)
Created2019
158101-Thumbnail Image.png
Description
Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming a critical safety problem. Speed, drunk driving as well as

Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming a critical safety problem. Speed, drunk driving as well as distracted driving are the three leading factors in the fatal car crashes. Distraction, which is defined as an excessive workload and limited attention, is the main paradigm that guides this research area. Driver behavior analysis can be used to address the distraction problem and provide an intelligent adaptive agent to work closely with the driver, fay beyond traditional algorithmic computational models. A variety of machine learning approaches has been proposed to estimate or predict drivers’ fatigue level using car data, driver status or a combination of them.

Three important features of intelligence and cognition are perception, attention and sensory memory. In this thesis, I focused on memory and attention as essential parts of highly intelligent systems. Without memory, systems will only show limited intelligence since their response would be exclusively based on spontaneous decision without considering the effect of previous events. I proposed a memory-based sequence to predict the driver behavior and distraction level using neural network. The work started with a large-scale experiment to collect data and make an artificial intelligence-friendly dataset. After that, the data was used to train a deep neural network to estimate the driver behavior. With a focus on memory by using Long Short Term Memory (LSTM) network to increase the level of intelligence in two dimensions: Forgiveness of minor glitches, and accumulation of anomalous behavior., I reduced the model error and computational expense by adding attention mechanism on the top of LSTM models. This system can be generalized to build and train highly intelligent agents in other domains.
ContributorsMonjezi Kouchak, Shokoufeh (Author) / Gaffar, Ashraf (Thesis advisor) / Doupe, Adam (Committee member) / Ben Amor, Hani (Committee member) / Cheeks, Loretta (Committee member) / Arizona State University (Publisher)
Created2020
157623-Thumbnail Image.png
Description
Feature embeddings differ from raw features in the sense that the former obey certain properties like notion of similarity/dissimilarity in it's embedding space. word2vec is a preeminent example in this direction, where the similarity in the embedding space is measured in terms of the cosine similarity. Such language embedding models

Feature embeddings differ from raw features in the sense that the former obey certain properties like notion of similarity/dissimilarity in it's embedding space. word2vec is a preeminent example in this direction, where the similarity in the embedding space is measured in terms of the cosine similarity. Such language embedding models have seen numerous applications in both language and vision community as they capture the information in the modality (English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks, 1. Image Hashing 2. Zero Shot Learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using several distance metrics like hamming / euclidean / cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task specific embeddings were learnt which can be used for Image Retrieval and Classification separately.

Image Hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications like Image Retrieval and Image Classification. Further, this framework requires no labelled data, leaving it very inexpensive. Results show that the proposed approach surpasses the state-of-art approaches by a significant margin.

Zero-shot classification is the task of classifying the test sample into a new class which was not seen during training. This is possible by establishing a relationship between the training and the testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using the handcrafted attribute vectors and word vectors but doesn’t require the expensive attribute vectors during test time. More specifically, an intermediate space is learnt between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore.
ContributorsGattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2019
158259-Thumbnail Image.png
Description
In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of

In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning; (i) Design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) Development of a hashing based clustering algorithm for efficient incremental learning; (iii) Design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications.
ContributorsPatil, Rishabh (Author) / Venkateswara, Hemanth (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158273-Thumbnail Image.png
Description
Traumatic injuries are the leading cause of death in children under 18, with head trauma being the leading cause of death in children below 5. A large but unknown number of traumatic injuries are non-accidental, i.e. inflicted. The lack of sensitivity and specificity required to diagnose Abusive Head Trauma (AHT)

Traumatic injuries are the leading cause of death in children under 18, with head trauma being the leading cause of death in children below 5. A large but unknown number of traumatic injuries are non-accidental, i.e. inflicted. The lack of sensitivity and specificity required to diagnose Abusive Head Trauma (AHT) from radiological studies results in putting the children at risk of re-injury and death. Modern Deep Learning techniques can be utilized to detect Abusive Head Trauma using Computer Tomography (CT) scans. Training models using these techniques are only a part of building AI-driven Computer-Aided Diagnostic systems. There are challenges in deploying the models to make them highly available and scalable.

The thesis models the domain of Abusive Head Trauma using Deep Learning techniques and builds an AI-driven System at scale using best Software Engineering Practices. It has been done in collaboration with Phoenix Children Hospital (PCH). The thesis breaks down AHT into sub-domains of Medical Knowledge, Data Collection, Data Pre-processing, Image Generation, Image Classification, Building APIs, Containers and Kubernetes. Data Collection and Pre-processing were done at PCH with the help of trauma researchers and radiologists. Experiments are run using Deep Learning models such as DCGAN (for Image Generation), Pretrained 2D and custom 3D CNN classifiers for the classification tasks. The trained models are exposed as APIs using the Flask web framework, contained using Docker and deployed on a Kubernetes cluster.



The results are analyzed based on the accuracy of the models, the feasibility of their implementation as APIs and load testing the Kubernetes cluster. They suggest the need for Data Annotation at the Slice level for CT scans and an increase in the Data Collection process. Load Testing reveals the auto-scalability feature of the cluster to serve a high number of requests.
ContributorsVikram, Aditya (Author) / Sanchez, Javier Gonzalez (Thesis advisor) / Gaffar, Ashraf (Thesis advisor) / Findler, Michael (Committee member) / Arizona State University (Publisher)
Created2020
158278-Thumbnail Image.png
Description
Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain

Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors such as image background, image resolution, color, camera perspective and variations in the objects are responsible for the domain gap between the training data (source domain) and testing data (target domain). Domain adaptation algorithms aim to overcome the domain gap between the source and target domains and learn robust models that can perform well across both the domains.

This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more generic problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows. (1) Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation by aligning the features of the source and target domain using deep neural networks. (2) A multi-adversarial deep learning model for generalized domain adaptation. (3) A gating model that detects out-of-distribution samples for generalized domain adaptation.

The models were tested across multiple computer vision datasets for domain adaptation.

The dissertation concludes with a discussion on the proposed approaches and future directions for research in closed set and generalized domain adaptation.
ContributorsNagabandi, Bhadrinath (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
168538-Thumbnail Image.png
Description
Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity in warm user purchase behavior. This thesis introduces a novel representation for user-vectors by combining user demographics and user preferences, making the model

Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity in warm user purchase behavior. This thesis introduces a novel representation for user-vectors by combining user demographics and user preferences, making the model a hybrid system which uses Collaborative Filtering and Content Based Recommendation. This system models user purchase behavior using weighted user-product preferences (explicit feedback) rather than binary user-product interactions (implicit feedback). Using this a novel sparse adversarial model, Sparse ReguLarized Generative Adversarial Network (SRLGAN), is developed for Cold-Start Recommendation. SRLGAN leverages the sparse user-purchase behavior which ensures training stability and avoids over-fitting on warm users. The performance of SRLGAN is evaluated on two popular datasets and demonstrate state-of-the-art results.
ContributorsShah, Aksheshkumar Ajaykumar (Author) / Venkateswara, Hemanth (Thesis advisor) / Berman, Spring (Thesis advisor) / Ladani, Leila J (Committee member) / Arizona State University (Publisher)
Created2022