Why Pop? A System to Explain How Deep Learning Models Classify Music

Description

The impact of Artificial Intelligence (AI) on daily life has increased significantly. AI is making big strides into critical areas of life such as healthcare, but also into areas such as entertainment and leisure. Deep neural networks have been pivotal in making these advancements possible. However, a well-known problem with deep neural networks is the lack of explanations for the choices they make. To combat this, researchers have tried several methods. One example is ranking individual features by how influential they are in the decision-making process. In contrast, a newer class of methods based on Concept Activation Vectors (CAVs) focuses on extracting higher-level concepts from the trained model, capturing information as a mixture of several features rather than just one. The goal of this thesis is to employ concepts in a novel domain: explaining how a deep learning model uses computer vision to classify music into different genres. Due to the advances in deep learning for computer vision classification tasks, it is now standard practice to convert an audio clip into corresponding spectrograms and use those spectrograms as image inputs to the deep learning model. Thus, a pre-trained model can classify the spectrogram images (representing songs) into musical genres. The proposed explanation system, called “Why Pop?”, tries to answer questions about the classification process, such as which parts of the spectrogram influence the model the most, what concepts were extracted, and how they differ across classes. These explanations help the user gain insights into the model’s learnings, biases, and decision-making process.
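
The audio-to-spectrogram step mentioned in this abstract can be sketched in a few lines of Python. This is only a minimal illustration using a plain short-time Fourier transform; the function name, frame size, hop length, and Hann window are assumptions chosen for the example, and the abstract does not say which spectrogram variant or parameters the thesis uses.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds DFT magnitudes for bins 0..frame_size//2.

    Hypothetical helper: frame_size and hop are illustrative defaults.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of one frequency bin (slow but dependency-free).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# A pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram = magnitude_spectrogram(tone)
```

Each row of `spectrogram` is one time slice and each column one frequency bin, so the result can be rendered as a grayscale image and passed to an image classifier. In practice a library routine and a mel or log scaling would likely replace the direct DFT used here.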

Contributors: Sharma, Shubham (Author) / Bryan, Chris (Thesis advisor) / McDaniel, Troy (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2022

A Person-Centric Design Framework for At-Home Motor Learning in Serious Games

Description

In motor learning, real-time multimodal feedback is a critical element in guided training. Serious games have been introduced as a platform for at-home motor training due to their highly interactive and multimodal nature. This dissertation explores the design of a multimodal environment for at-home training in which an autonomous system observes and guides the user in place of a live trainer, providing real-time assessment, feedback, and difficulty adaptation as the subject masters a motor skill. After an in-depth review of the latest solutions in this field, this dissertation proposes a person-centric approach to the design of this environment, in contrast to the standard techniques implemented in related work, to address many of the limitations of these approaches. The unique advantages and restrictions of this approach are presented in the form of a case study in which a system entitled the "Autonomous Training Assistant", consisting of both hardware and software for guided at-home motor learning, is designed and adapted for a specific individual and trainer.

In this work, the design of an autonomous motor learning environment is approached from three areas: motor assessment, multimodal feedback, and serious game design. For motor assessment, a three-dimensional assessment framework is proposed, comprising two spatial domains (posture, progression) and one temporal domain (pacing) of real-time motor assessment. For multimodal feedback, a rod-shaped device called the "Intelligent Stick" is combined with an audio-visual interface to provide feedback to the subject in three domains (audio, visual, haptic). Feedback domains are mapped to modalities, and feedback is provided whenever the user's performance deviates from the ideal performance level by more than an adaptive threshold. Approaches for multimodal integration and feedback fading are discussed. Finally, a novel approach for stealth adaptation in serious game design is presented. This approach allows serious games to incorporate motor tasks in a more natural way, facilitating self-assessment by the subject. Three different stealth adaptation approaches are evaluated using the flow-state ratio metric. The dissertation concludes with directions for future work in the integration of stealth adaptation techniques across the field of exergames.
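
As a rough illustration of the threshold-based feedback rule described above, the following Python sketch triggers feedback when an observation deviates from the ideal value by more than a threshold, and then adapts that threshold. The adaptation rule (tighten after a success, relax after an error), the class and function names, and all numeric values are assumptions made for this example; the abstract does not specify how the dissertation adapts its threshold.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Hypothetical adaptive tolerance around an ideal performance value."""
    threshold: float = 10.0  # assumed starting tolerance (arbitrary units)
    minimum: float = 2.0     # assumed floor so the task stays achievable
    maximum: float = 20.0    # assumed ceiling so feedback stays meaningful
    step: float = 1.0        # assumed adaptation step per observation

    def update(self, observed: float, ideal: float) -> bool:
        """Return True if feedback should be given for this observation."""
        needs_feedback = abs(observed - ideal) > self.threshold
        if needs_feedback:
            # Relax the tolerance after an error (assumed rule).
            self.threshold = min(self.maximum, self.threshold + self.step)
        else:
            # Tighten the tolerance after a success (assumed rule).
            self.threshold = max(self.minimum, self.threshold - self.step)
        return needs_feedback


def feedback_trace(observations, ideal, controller=None):
    """Run a sequence of observations through one controller."""
    controller = controller or AdaptiveThreshold()
    return [controller.update(value, ideal) for value in observations]


# Example: a joint angle (degrees) tracked against an ideal of 90.
trace = feedback_trace([95, 85, 100, 70], ideal=90)
```

Here `trace` marks which observations would trigger audio, visual, or haptic feedback. Because the tolerance shrinks while the user performs well, a deviation that was acceptable early in a session can trigger feedback later on.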

Contributors: Tadayon, Ramin (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Amresh, Ashish (Committee member) / Glenberg, Arthur (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2017

Analysis of Machine Learning Assisted Fatigue Identification in Radiology Readings

Description

Fatigue in radiology is a well-studied area. Machine learning concepts applied to the identification of fatigue are also readily available. However, the intersection between the two areas is comparatively unexplored. This study explores that intersection by analyzing temporal trends in multivariate time series data. A novel methodological approach using support vector machines to observe temporal trends in time-based aggregations of time series data is proposed. The data used in the study is captured in a real-world, unconstrained radiology setting where gaze and facial metrics are captured from radiologists performing live image reviews. The captured data is formatted into classes whose labels represent a window of time during the radiologist’s review. Using the labeled classes, the decision function and accuracy of trained linear support vector machine models are evaluated to produce a visualization of temporal trends and critical inflection points as well as the contribution of individual features. The study finds potential justification for the suggested methods. It offers a prospective use of maximum-margin classification to demarcate the effect of an abstract phenomenon such as fatigue on temporal data. Potential applications are envisioned that could improve the workload distribution of the medical act.

Contributors: Hayes, Matthew (Author) / McDaniel, Troy (Thesis advisor) / Coza, Aurel (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)

Created: 2022

"Can I Consider You My Friend?" Moving Beyond One-Sided Conversation in Social Robotics

Description

As people begin to live longer and the population shifts to having more older adults on Earth than young children, radical solutions will be needed to ease the burden on society. It will be essential to develop technology that can age with the individual. One solution is to keep older adults in their homes longer through smart home and smart living technology, allowing them to age in place. People have many choices when choosing where to age in place, including their own homes, assisted living facilities, nursing homes, or the homes of family members. No matter where people choose to age, they may face isolation and financial hardships. It is crucial to keep finances in mind when developing smart home technology. Smart home technologies seek to allow individuals to stay inside their homes for as long as possible, yet little work looks at how we can use technology in different life stages. Robots are poised to impact society and ease burdens at home and in the workforce. Special attention has been given to social robots to ease isolation. As social robots become accepted into society, researchers need to understand how these robots should mimic natural conversation. My work attempts to answer this question within social robotics by investigating how to make conversational robots natural and reciprocal. I investigated this through a 2x2 Wizard of Oz between-subjects user study. The study lasted four months, testing four different levels of interactivity with the robot. None of the levels were significantly different from the others, an unexpected result. I then investigated the robot’s personality, the participant’s trust, and the participant’s acceptance of the robot and how these influenced the study.

Contributors: Miller, Jordan (Author) / McDaniel, Troy (Thesis advisor) / Michael, Katina (Committee member) / Cooke, Nancy (Committee member) / Bryan, Chris (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2022

Deep domain fusion for adaptive image classification

Description

Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual recognition models. Despite significant advances made over the past decade, the fact remains that collecting and annotating the data needed to successfully train a model is a prohibitively expensive endeavor. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent works in the development of deep adaptation networks seek to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has illustrated unprecedented potential for utilizing unlabeled data to train classification models in defiance of discouragingly meager sets of annotated data.

In this thesis, a novel domain adaptation algorithm -- Domain Adaptive Fusion (DAF) -- is proposed, which encourages a domain-invariant linear relationship between the pixel-space of different domains and the prediction-space while being trained under a domain adversarial signal. The thoughtful combination of key components in unsupervised domain adaptation and semi-supervised learning enables DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation endorse the efficacy of this hybrid approach, which outperforms all of the baseline architectures on most of the transfer tasks.

Contributors: Dudley, Andrew, M.S (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)

Created: 2019

Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

Description

Parents fulfill a pivotal role in early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavioral analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulty in PRT training is how to disseminate training to parents who need it, and how to support and motivate practitioners after training.

Evaluation of the parents’ fidelity to implementation is often undertaken using video probes that depict the dyadic interaction occurring between the parent and the child during PRT sessions. These videos are time-consuming for clinicians to process, and often result in only minimal feedback for the parents. Current trends in technology could be utilized to alleviate the manual cost of extracting data from the videos, affording greater opportunities for providing clinician-created feedback as well as automated assessments.

The naturalistic context of the video probes along with the dependence on ubiquitous recording devices creates a difficult scenario for classification tasks. The domain of the PRT video probes can be expected to have high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examination of the multimodal data along with implementation and evaluation of classification algorithms. This is explored through the use of a new dataset of PRT videos.

The relationship between the parent and the clinician is important. The clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling of treatment procedures. Facilitating this relationship along with automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to aid in personalizing the classification models. By utilizing a human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the classification models by providing additional labeled samples. This will allow the system to improve classification and provide a person-centered approach to extracting multimodal data from PRT video probes.

Contributors: Copenhaver Heath, Corey D (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Davulcu, Hasan (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created: 2019

Language Image Transformer

Description

Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most current vision and language models are modality-specific and, in many cases, extensively use deep-learning-based attention mechanisms for learning powerful representations. This work discusses the role of attention in associating vision and language for generating a shared representation. Language Image Transformer (LIT) is proposed for learning multi-modal representations of the environment. It uses a training objective based on Contrastive Predictive Coding (CPC) to maximize the Mutual Information (MI) between the visual and linguistic representations. It learns the relationship between the modalities using the proposed cross-modal attention layers. It is trained and evaluated using the captioning datasets MS COCO and Conceptual Captions. The results and the analysis offer a perspective on the use of Mutual Information Maximisation (MIM) for generating generalizable representations across multiple modalities.
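
To make the CPC-based objective more concrete, the sketch below computes the InfoNCE loss, the contrastive loss introduced with Contrastive Predictive Coding, for a batch of paired image and text embeddings. It is an illustration under assumptions: the dot-product similarity, the temperature parameter, and all function names are chosen for the example, and the abstract does not state the exact form of the loss used to train LIT.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(image_vecs, text_vecs, temperature=1.0):
    """Average InfoNCE loss over a batch of paired embeddings.

    Pair i (image_vecs[i], text_vecs[i]) is the positive; every other
    text in the batch serves as a negative for image i.
    """
    n = len(image_vecs)
    total = 0.0
    for i in range(n):
        scores = [dot(image_vecs[i], t) / temperature for t in text_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total / n


def mi_lower_bound(loss, batch_size):
    """InfoNCE bound: I(image; text) >= log(batch_size) - loss."""
    return math.log(batch_size) - loss


# Uninformative embeddings versus embeddings aligned with their partners.
uninformative = info_nce_loss([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
aligned = info_nce_loss([[5.0, 0.0], [0.0, 5.0]], [[5.0, 0.0], [0.0, 5.0]])
```

Minimizing this loss raises a lower bound on the mutual information between the two representations, which is why aligned pairs yield a smaller loss than uninformative ones. The bound cannot exceed the logarithm of the batch size, so larger batches allow a tighter estimate.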

Contributors: Ramakrishnan, Raghavendran (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth Kumar (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)

Created: 2020

Haptic Vision: Augmenting Non-visual Travel Tools, Techniques, and Methods by Increasing Spatial Knowledge Through Dynamic Haptic Interactions

Description

Access to real-time situational information, including the relative position and motion of surrounding objects, is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the high-resolution information the eyes are able to receive from afar. As a sensory organ in particular, the eyes have an unparalleled ability to adjust to varying degrees of light, color, and distance. Therefore, in the case of a non-visual traveler, someone who is blind or has low vision, access to visual information is unattainable if it is positioned beyond the reach of the preferred mobility device or outside the path of travel. Although electronic travel aids (ETAs) have received considerable attention within assistive technology over the last two decades, the field has, surprisingly, seen little work focused on augmenting rather than replacing current non-visual travel techniques, methods, and tools. Consequently, this work describes the design of an intuitive tactile language and series of wearable tactile interfaces (the Haptic Chair, HaptWrap, and HapBack) to deliver real-time spatiotemporal data. The overall intuitiveness of the haptic mappings conveyed through the tactile interfaces is evaluated using a combination of absolute identification accuracy of a series of patterns and subjective feedback through post-experiment surveys. Two types of spatiotemporal representations are considered: static patterns representing object location at a single time instance, and dynamic patterns, added in the HaptWrap, which represent object movement over a time interval. Results support the viability of multi-dimensional haptics applied to the body to yield an intuitive understanding of dynamic interactions occurring around the navigator during travel.
Lastly, it is important to point out that the guiding principle of this work centered on providing the navigator with spatial knowledge otherwise unattainable through current mobility techniques, methods, and tools, thus providing the navigator with the information necessary to make informed navigation decisions independently, at a distance.

Contributors: Duarte, Bryan Joiner (Author) / McDaniel, Troy (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)

Created: 2020

Anticipatory and Invisible Interfaces to Address Impaired Proprioception in Neurological Disorders

Description

The burden of adaptation has been a major limiting factor in the adoption rates of new wearable assistive technologies. This burden has created a necessity for the exploration and combination of two key concepts in the development of upcoming wearables: anticipation and invisibility. The combination of these two topics has created the field of Anticipatory and Invisible Interfaces (AII).

In this dissertation, a novel framework is introduced for the development of anticipatory devices that augment the proprioceptive system in individuals with neurodegenerative disorders in a seamless way that scaffolds off of existing cognitive feedback models. The framework suggests three main categories of consideration in the development of devices which are anticipatory and invisible:

• Idiosyncratic Design: How can the design of assistive aids encapsulate the unique characteristics of the individual?

• Adaptation to Intrapersonal Variations: As individuals progress through the various stages of a disability/neurological disorder, how can the technology adapt thresholds for feedback over time to address these shifts in ability?

• Context-Aware Invisibility: How can the mechanisms of interaction be modified in order to reduce cognitive load?

The concepts proposed in this framework can be generalized to a broad range of domains; however, there are two primary applications for this work: rehabilitation and assistive aids. In preliminary studies, the framework is applied in the areas of Parkinsonian freezing of gait anticipation and the anticipation of body non-compliance during rehabilitative exercise.

Contributors: Tadayon, Arash (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Krishnamurthi, Narayanan (Committee member) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2020

Incremental Learning With Sample Generation From Pretrained Networks

Description

In the last decade, deep-learning-based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning: (i) an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) a hashing-based clustering algorithm for efficient incremental learning; and (iii) a student-teacher coupled neural network that distills knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications.

Contributors: Patil, Rishabh (Author) / Venkateswara, Hemanth (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)

Created: 2020
