Search Content

ARGOS: Adaptive Recognition and Gated Operation System for Real-time Vision Applications

Description

Deep neural network-based methods have been proved to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the ``deeper model with deeper confidence'' belief to gain a higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices.…

Deep neural network-based methods have been proved to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the ``deeper model with deeper confidence'' belief to gain a higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices. For instance, the intersection management of Connected Autonomous Vehicles (CAVs) requires running computationally intensive object recognition algorithms on low-power traffic cameras. This dissertation aims to study the effect of a dynamic hardware and software approach to address this issue. Characteristics of real-world applications can facilitate this dynamic adjustment and reduce the computation. Specifically, this dissertation starts with a dynamic hardware approach that adjusts itself based on the toughness of input and extracts deeper features if needed. Next, an adaptive learning mechanism has been studied that use extracted feature from previous inputs to improve system performance. Finally, a system (ARGOS) was proposed and evaluated that can be run on embedded systems while maintaining the desired accuracy. This system adopts shallow features at inference time, but it can switch to deep features if the system desires a higher accuracy. To improve the performance, ARGOS distills the temporal knowledge from deep features to the shallow system. Moreover, ARGOS reduces the computation furthermore by focusing on regions of interest. The response time and mean average precision are adopted for the performance evaluation to evaluate the proposed ARGOS system.

ContributorsFarhadi, Mohammad (Author) / Yang, Yezhou (Thesis advisor) / Vrudhula, Sarma (Committee member) / Wu, Carole-Jean (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)

Created2022

Attributable Watermarking of Speech Generative Models

Description

Generative models in various domain such as images, speeches, and videos are beingdeveloped actively over the last decades and recent deep generative models are now capable of synthesizing multimedia contents are difficult to be distinguishable from authentic contents. Such capabilities cause concerns such as malicious impersonation, Intellectual property theft(IP theft) and copyright infringement. One…

Generative models in various domain such as images, speeches, and videos are beingdeveloped actively over the last decades and recent deep generative models are now capable of synthesizing multimedia contents are difficult to be distinguishable from authentic contents. Such capabilities cause concerns such as malicious impersonation, Intellectual property theft(IP theft) and copyright infringement. One method to solve these threats is to embedded attributable watermarking in synthesized contents so that user can identify the user-end models where the contents are generated from. This paper investigates a solution for model attribution, i.e., the classification of synthetic contents by their source models via watermarks embedded in the contents. Existing studies showed the feasibility of model attribution in the image domain and tradeoff between attribution accuracy and generation quality under the various adversarial attacks but not in speech domain. This work discuss the feasibility of model attribution in different domain and algorithmic improvements for generating user-end speech models that empirically achieve high accuracy of attribution while maintaining high generation quality. Lastly, several experiments are conducted show the tradeoff between attributability and generation quality under a variety of attacks on generated speech signals attempting to remove the watermarks.

ContributorsCho, Yongbaek (Author) / Yang, Yezhou (Thesis advisor) / Ren, Yi (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)

Created2021

Cognitive Mapping for Object Searching in Indoor Scenes

Description

Visual navigation is a multi-disciplinary field across computer vision, machine learning and robotics. It is of great significance in both research and industrial applications. An intelligent agent with visual navigation ability will be capable of performing the following tasks: actively explore in environments, distinguish and localize a requested target and…

Visual navigation is a multi-disciplinary field across computer vision, machine learning and robotics. It is of great significance in both research and industrial applications. An intelligent agent with visual navigation ability will be capable of performing the following tasks: actively explore in environments, distinguish and localize a requested target and approach the target following acquired strategies. Despite a variety of advances in mobile robotics, enabling an autonomous with above-mentioned abilities is still a challenging and complex task. However, the solution to the task is very likely to accelerate the landing of assistive robots.

Reinforcement learning is a method that trains autonomous robot based on rewarding desired behaviors to help it obtain an action policy that maximizes rewards while the robot interacting with the environment. Through trial and error, an agent learns sophisticated and skillful strategies to handle complex tasks in the environment. Inspired by navigation procedures of human beings that when navigating through environments, humans reason about accessible spaces and geometry of the environment a lot based on first-person view, figure out the destination and then ease over, this work develops a model that maps from pixels to actions and inherently estimate the target as well as the free-space map. The model has three major constituents: (i) a cognitive mapper that maps the topologic free-space map from first-person view images, (ii) a target recognition network that locates a desired object and (iii) an action policy deep reinforcement learning network. Further, a planner model with cascade architecture based on multi-scale semantic top-down occupancy map input is proposed.

ContributorsZheng, Shibin (Author) / Yang, Yezhou (Thesis advisor) / Zhang, Wenlong (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)

Created2019

Physically Realizable Targeted Adversarial Attacks on Autonomous Driving

Description

Autonomous Driving (AD) systems are being researched and developed actively in recent days to solve the task of controlling the vehicles safely without human intervention. One method to solve such task is through deep Reinforcement Learning (RL) approach. In deep RL, the main objective is to find an optimal control…

Autonomous Driving (AD) systems are being researched and developed actively in recent days to solve the task of controlling the vehicles safely without human intervention. One method to solve such task is through deep Reinforcement Learning (RL) approach. In deep RL, the main objective is to find an optimal control behavior, often called policy performed by an agent, which is AD system in this case. This policy is usually learned through Deep Neural Networks (DNNs) based on the observations that the agent perceives along with rewards feedback received from environment.However, recent studies demonstrated the vulnerability of such control policies learned through deep RL against adversarial attacks. This raises concerns about the application of such policies to risk-sensitive tasks like AD. Previous adversarial attacks assume that the threats can be broadly realized in two ways: First one is targeted attacks through manipu- lation of the agent’s complete observation in real time and the other is untargeted attacks through manipulation of objects in environment. The former assumes full access to the agent’s observations at almost all time, while the latter has no control over outcomes of attack. This research investigates the feasibility of targeted attacks through physical adver- sarial objects in the environment, a threat that combines the effectiveness and practicality. Through simulations on one of the popular AD systems, it is demonstrated that a fixed optimal policy can be malfunctioned over time by an attacker e.g., performing an unintended self-parking, when an adversarial object is present. The proposed approach is formulated in such a way that the attacker can learn a dynamics of the environment and also utilizes common knowledge of agent’s dynamics to realize the attack. Further, several experiments are conducted to show the effectiveness of the proposed attack on different driving scenarios empirically. Lastly, this work also studies robustness of object location, and trade-off between the attack strength and attack length based on proposed evaluation metrics.

ContributorsBuddareddygari, Prasanth (Author) / Yang, Yezhou (Thesis advisor) / Ren, Yi (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)

Created2021

Modeling Human Adaptation with Game-theoretic Intention Decoding in Human-Robot Interactions

Description

With the substantial development of intelligent robots, human-robot interaction (HRI) has become ubiquitous in applications such as collaborative manufacturing, surgical robotic operations, and autonomous driving. In all these applications, a human behavior model, which can provide predictions of human actions, is a helpful reference that helps robots to achieve intelligent…

With the substantial development of intelligent robots, human-robot interaction (HRI) has become ubiquitous in applications such as collaborative manufacturing, surgical robotic operations, and autonomous driving. In all these applications, a human behavior model, which can provide predictions of human actions, is a helpful reference that helps robots to achieve intelligent interaction with humans. The requirement elicits an essential problem of how to properly model human behavior, especially when individuals are interacting or cooperating with each other. The major objective of this thesis is to utilize the human intention decoding method to help robots enhance their performance while interacting with humans. Preliminary work on integrating human intention estimation with an HRI scenario is shown to demonstrate the benefit. In order to achieve this goal, the research topic is divided into three phases. First, a novel method of an online measure of the human's reliance on the robot, which can be estimated through the intention decoding process from human actions，is described. An experiment that requires human participants to complete an object-moving task with a robot manipulator was conducted under different conditions of distractions. A relationship is discovered between human intention and trust while participants performed a familiar task with no distraction. This finding suggests a relationship between the psychological construct of trust and joint physical coordination, which bridges the human's action to its mental states. Then, a novel human collaborative dynamic model is introduced based on game theory and bounded rationality, which is a novel method to describe human dyadic behavior with the aforementioned theories. The mutual intention decoding process was also considered to inform this model. Through this model, the connection between the mental states of the individuals to their cooperative actions is indicated. A haptic interface is developed with a virtual environment and the experiments are conducted with 30 human subjects. The result suggests the existence of mutual intention decoding during the human dyadic cooperative behaviors. Last, the empirical results show that allowing agents to have empathy in inference, which lets the agents understand that others might have a false understanding of their intentions, can help to achieve correct intention inference. It has been verified that knowledge about vehicle dynamics was also important to correctly infer intentions. A new courteous policy is proposed that bounded the courteous motion using its inferred set of equilibrium motions. A simulation, which is set to reproduce an intersection passing case between an autonomous car and a human driving car, is conducted to demonstrate the benefit of the novel courteous control policy.

ContributorsWang, Yiwei (Author) / Zhang, Wenlong (Thesis advisor) / Berman, Spring (Committee member) / Lee, Hyunglae (Committee member) / Ren, Yi (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2021

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting a combination of differently constructed image classification neural networks to observe which combinations applied to this framework were most effective in maximizing classification accuracy. Furthermore, the…

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting a combination of differently constructed image classification neural networks to observe which combinations applied to this framework were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others is explored.

ContributorsMazboudi, Yassine Ahmad (Author) / Yang, Yezhou (Thesis director) / Ren, Yi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Theses and Dissertations

Filtering by

ARGOS: Adaptive Recognition and Gated Operation System for Real-time Vision Applications

Attributable Watermarking of Speech Generative Models

Cognitive Mapping for Object Searching in Indoor Scenes

Physically Realizable Targeted Adversarial Attacks on Autonomous Driving

Modeling Human Adaptation with Game-theoretic Intention Decoding in Human-Robot Interactions

Moving Target Defense: Defending against Adversarial Defense