Matching Items (166)
Filtering by

Clear all filters

Description
Self-Driving cars are a long-lasting ambition for many AI scientists and engineers. In the last decade alone, many self-driving cars like Google Waymo, Tesla Autopilot, Uber, etc. have been roaming the streets of many cities. As a rapidly expanding field, researchers all over the world are attempting to develop more

Self-Driving cars are a long-lasting ambition for many AI scientists and engineers. In the last decade alone, many self-driving cars like Google Waymo, Tesla Autopilot, Uber, etc. have been roaming the streets of many cities. As a rapidly expanding field, researchers all over the world are attempting to develop more safe and efficient AI agents that can navigate through our cities. However, driving is a very complex task to master even for a human, let alone the challenges in developing robots to do the same. It requires attention and inputs from the surroundings of the car, and it is nearly impossible for us to program all the possible factors affecting this complex task. As a solution, imitation learning was introduced, wherein the agents learn a policy, mapping the observations to the actions through demonstrations given by humans. Through imitation learning, one could easily teach self-driving cars the expected behavior in many scenarios. Despite their autonomous nature, it is undeniable that humans play a vital role in the development and execution of safe and trustworthy self-driving cars and hence form the strongest link in this application of Human-Robot Interaction. Several approaches were taken to incorporate this link between humans and self-driving cars, one of which involves the communication of human's navigational instruction to self-driving cars. The communicative channel provides humans with control over the agent’s decisions as well as the ability to guide them in real-time. In this work, the abilities of imitation learning in creating a self-driving agent that can follow natural language instructions given by humans based on environmental objects’ descriptions were explored. The proposed model architecture is capable of handling latent temporal context in these instructions thus making the agent capable of taking multiple decisions along its course. The work shows promising results that push the boundaries of natural language instructions and their complexities in navigating self-driving cars through towns.
ContributorsMoudhgalya, Nithish B (Author) / Amor, Hani Ben (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)
Created2021
161431-Thumbnail Image.png
Description
In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an

In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an agent and their ultimate goal even if the action gets disrupted or unintentional effects occur. Inculcating this ability in artificially intelligent agents would make them better social learners by not just learning from their own mistakes, i.e, reinforcement learning, but also learning from other's mistakes. For example, this could greatly reduce the search space for artificially intelligent agents for finding the correct action sequence when trying to achieve a new goal, since they would be able to learn from others what not to do as well as how/when actions result in undesired outcomes.To validate this ability of deep learning models to perform this task, the Weakly Augmented Oops (W-Oops) dataset is proposed, built upon the Oops dataset. W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 33 unintentional video-level activity labels collected through human annotations. Inspired by previous methods on tasks such as weakly supervised action localization which show promise for achieving good localization results without ground truth segment annotations, this paper proposes a weakly supervised algorithm for localizing the goal-directed as well as the unintentional temporal region of a video using only video-level labels. In particular, an attention mechanism based strategy is employed that predicts the temporal regions which contributes the most to a classification task, leveraging solely video-level labels. Meanwhile, our designed overlap regularization allows the model to focus on distinct portions of the video for inferring the goal-directed and unintentional activity, while guaranteeing their temporal ordering. Extensive quantitative experiments verify the validity of our localization method.
ContributorsChakravarthy, Arnav (Author) / Yang, Yezhou (Thesis advisor) / Davulcu, Hasan (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)
Created2021
Description

Standardization is sorely lacking in the field of musical machine learning. This thesis project endeavors to contribute to this standardization by training three machine learning models on the same dataset and comparing them using the same metrics. The music-specific metrics utilized provide more relevant information for diagnosing the shortcomings of

Standardization is sorely lacking in the field of musical machine learning. This thesis project endeavors to contribute to this standardization by training three machine learning models on the same dataset and comparing them using the same metrics. The music-specific metrics utilized provide more relevant information for diagnosing the shortcomings of each model.

ContributorsHilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2021-12
154380-Thumbnail Image.png
Description
In brain imaging study, 3D surface-based algorithms may provide more advantages over volume-based methods, due to their sub-voxel accuracy to represent subtle subregional changes and solid mathematical foundations on which global shape analyses can be achieved on complicated topological structures, such as the convoluted cortical surfaces. On the other hand,

In brain imaging study, 3D surface-based algorithms may provide more advantages over volume-based methods, due to their sub-voxel accuracy to represent subtle subregional changes and solid mathematical foundations on which global shape analyses can be achieved on complicated topological structures, such as the convoluted cortical surfaces. On the other hand, given the enormous amount of data being generated daily, it is still challenging to develop effective and efficient surface-based methods to analyze brain shape morphometry. There are two major problems in surface-based shape analysis research: correspondence and similarity. This dissertation covers both topics by proposing novel surface registration and indexing algorithms based on conformal geometry for brain morphometry analysis.

First, I propose a surface fluid registration system, which extends the traditional image fluid registration to surfaces. With surface conformal parameterization, the complexity of the proposed registration formula has been greatly reduced, compared to prior methods. Inverse consistency is also incorporated to drive a symmetric correspondence between surfaces. After registration, the multivariate tensor-based morphometry (mTBM) is computed to measure local shape deformations. The algorithm was applied to study hippocampal atrophy associated with Alzheimer's disease (AD).

Next, I propose a ventricular surface registration algorithm based on hyperbolic Ricci flow, which computes a global conformal parameterization for each ventricular surface without introducing any singularity. Furthermore, in the parameter space, unique hyperbolic geodesic curves are introduced to guide consistent correspondences across subjects, a technique called geodesic curve lifting. Tensor-based morphometry (TBM) statistic is computed from the registration to measure shape changes. This algorithm was applied to study ventricular enlargement in mild cognitive impatient (MCI) converters.

Finally, a new shape index, the hyperbolic Wasserstein distance, is introduced. This algorithm computes the Wasserstein distance between general topological surfaces as a shape similarity measure of different surfaces. It is based on hyperbolic Ricci flow, hyperbolic harmonic map, and optimal mass transportation map, which is extended to hyperbolic space. This method fills a gap in the Wasserstein distance study, where prior work only dealt with images or genus-0 closed surfaces. The algorithm was applied in an AD vs. control cortical shape classification study and achieved promising accuracy rate.
ContributorsShi, Jie, Ph.D (Author) / Wang, Yalin (Thesis advisor) / Caselli, Richard (Committee member) / Li, Baoxin (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created2016
156898-Thumbnail Image.png
Description
Virtual digital assistants are automated software systems which assist humans by understanding natural languages such as English, either in voice or textual form. In recent times, a lot of digital applications have shifted towards providing a user experience using natural language interface. The change is brought up by the degree

Virtual digital assistants are automated software systems which assist humans by understanding natural languages such as English, either in voice or textual form. In recent times, a lot of digital applications have shifted towards providing a user experience using natural language interface. The change is brought up by the degree of ease with which the virtual digital assistants such as Google Assistant and Amazon Alexa can be integrated into your application. These assistants make use of a Natural Language Understanding (NLU) system which acts as an interface to translate unstructured natural language data into a structured form. Such an NLU system uses an intent finding algorithm which gives a high-level idea or meaning of a user query, termed as intent classification. The intent classification step identifies the action(s) that a user wants the assistant to perform. The intent classification step is followed by an entity recognition step in which the entities in the utterance are identified on which the intended action is performed. This step can be viewed as a sequence labeling task which maps an input word sequence into a corresponding sequence of slot labels. This step is also termed as slot filling.

In this thesis, we improve the intent classification and slot filling in the virtual voice agents by automatic data augmentation. Spoken Language Understanding systems face the issue of data sparsity. The reason behind this is that it is hard for a human-created training sample to represent all the patterns in the language. Due to the lack of relevant data, deep learning methods are unable to generalize the Spoken Language Understanding model. This thesis expounds a way to overcome the issue of data sparsity in deep learning approaches on Spoken Language Understanding tasks. Here we have described the limitations in the current intent classifiers and how the proposed algorithm uses existing knowledge bases to overcome those limitations. The method helps in creating a more robust intent classifier and slot filling system.
ContributorsGarg, Prashant (Author) / Baral, Chitta (Thesis advisor) / Kumar, Hemanth (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2018
153595-Thumbnail Image.png
Description
A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. Generalized concepts are used to overcome this problem. Generalization may

A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. Generalized concepts are used to overcome this problem. Generalization may result into word sense disambiguation failing to find similarity. This is addressed by taking into account contextual synonyms. Concept discovery based on contextual synonyms reveal information about the semantic roles of the words leading to concepts. Merger engine generalize the concepts so that it can be used as features in learning algorithms.
ContributorsKedia, Nitesh (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steve R (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2015