Learning Transferable Data Representations Using Deep Generative Models

Description

Machine learning models convert raw data in the form of video, images, audio, text, etc. into feature representations that are convenient for computational processing. Deep neural networks have proven to be very efficient feature extractors for a variety of machine learning tasks. Generative models based on deep neural networks introduce constraints on the feature space to learn transferable and disentangled representations. Transferable feature representations help in training machine learning models that are robust across different distributions of data. For example, with the application of transferable features in domain adaptation, models trained on a source distribution can be applied to data from a target distribution even though the distributions may be different. In style transfer and image-to-image translation, disentangled representations allow for the separation of style and content when translating images.

This thesis examines learning transferable data representations in novel deep generative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversarial methods and cross-domain weight sharing in a neural network to extract transferable representations. These transferable interpretations can then be decoded into the original image or a similar image in another domain. The Explicit Disentangling Network (EDN) utilizes generative methods to disentangle images into their core attributes and then segments sets of related attributes. The EDN can separate these attributes by controlling the flow of information using a novel combination of losses and network architecture. This separation of attributes allows precise modifications to specific components of the data representation, boosting the performance of machine learning tasks. The effectiveness of these models is evaluated across domain adaptation, style transfer, and image-to-image translation tasks.

Contributors: Eusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)

Created: 2018

Addressing Problems Facing Unmanned Aerial System Scheduling Systems in Urban Environments

Description

Research literature was reviewed to find recommended tools and technologies for operating Unmanned Aerial Systems (UAS) fleets in an urban environment. However, restrictive legislation prohibits fully autonomous flight without an operator. Existing literature covers considerations for operating UAS fleets in a controlled environment, with an emphasis on the effect different networking approaches have on the topology of the UAS network. UAS communications are most commonly implemented with 802.11 protocols, which can transmit telemetry and a video stream using off-the-shelf hardware. Other implementations use low-frequency radios for long-distance communication, or higher-latency 4G LTE modems to access existing network infrastructure. However, a gap remains in testing different network topologies outside of a controlled environment.

With the correct permits in place, further research can explore how different UAS network topologies behave in an urban environment when implemented with off-the-shelf UAS hardware. In addition to testing different network topologies, this thesis covers the implementation of a secure, scalable system built with modern cloud computation tools and services and capable of supporting a variable number of UAS. The system also supports end-to-end simulation, considering factors such as battery life and realistic UAS kinematics. The implementation of the system leads to new findings needed to deploy UAS fleets in urban environments.

Contributors: D'Souza, Daniel (Author) / Panchanathan, Sethuraman (Thesis advisor) / Berman, Spring (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2018

Convolutional Neural Networks for Facial Expression Recognition

Description

This paper presents work that was done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors. The extracted information can be used to understand the image better by recognizing different features present within the image. Deep CNNs, however, require training sets that can be larger than a million pictures in order to fine-tune their feature detectors. For facial expressions, no datasets of this size are available. Due to this limited availability of the data required to train a new CNN, the idea of using naïve domain adaptation is explored. Instead of creating and using a new CNN trained specifically to extract features related to FER, a CNN previously trained for another computer vision task is used. Work for this research involved creating a system that can run a CNN, extract feature vectors from the CNN, and classify these extracted features. Once this system was built, different aspects of the system were tested and tuned. These aspects include the pre-trained CNN that was used, the layer from which features were extracted, the normalization used on input images, and the training data for the classifier. Once properly tuned, the created system returned results more accurate than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
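The final stage described above, classifying feature vectors that were already extracted from a pre-trained CNN, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the classifier, so a nearest-centroid rule stands in for it, and the function names, the L2 normalization step, and the toy two-dimensional "features" are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(features, labels):
    """Average the normalized feature vectors of each expression label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        vec = l2_normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the normalized vector."""
    vec = l2_normalize(vec)
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy stand-ins for feature vectors taken from a CNN layer (assumed values).
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["happy", "happy", "sad", "sad"]
centroids = fit_centroids(train_features, train_labels)
```

In a real pipeline the vectors would come from a chosen layer of the pre-trained network and would have hundreds or thousands of dimensions; the classifier stage itself stays this small.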

Contributors: Eusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

The Dyadic Interaction Assistant for Individuals with Visual Impairments

Description

This paper presents an overview of The Dyadic Interaction Assistant for Individuals with Visual Impairments with a focus on the software component. The system is designed to communicate facial information (facial Action Units, facial expressions, and facial features) to an individual with visual impairments in a dyadic interaction between two people sitting across from each other. Composed of (1) a webcam, (2) software, and (3) a haptic device, the system can also be described as a series of input, processing, and output stages, respectively. The processing stage of the system builds on the open source FaceTracker software and the Computer Expression Recognition Toolbox (CERT) application. While these two sources provide the facial data, a program developed in the Qt Creator IDE and several AppleScripts are used to adapt the information to a Graphical User Interface (GUI) and output the data to a comma-separated values (CSV) file. It is the first software to convey all 3 types of facial information at once in real time. Future work includes testing and evaluating the quality of the software with human subjects (both sighted and blind/low vision), integrating the haptic device to complete the system, and evaluating the entire system with human subjects (sighted and blind/low vision).

Contributors: Brzezinski, Chelsea Victoria (Author) / Balasubramanian, Vineeth (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2013-05

Modeling and control for vision based rear wheel drive robot and solving indoor SLAM problem using LIDAR

Description

To achieve the ambitious long-term goal of a fleet of cooperating Flexible Autonomous Machines operating in an uncertain Environment (FAME), this thesis addresses several critical modeling, design, and control objectives for rear-wheel drive ground vehicles. One central objective of the thesis was to show how to build a low-cost multi-capability robot platform that can be used for conducting FAME research.

A TFC-KIT car chassis was augmented to provide a suite of substantive capabilities. The augmented vehicle (FreeSLAM Robot) costs less than $500 but offers the capability of commercially available vehicles costing over $2000.

All demonstrations presented involve the rear-wheel drive FreeSLAM robot. The following summarizes the key hardware demonstrations presented and analyzed:

(1) Cruise (v, θ) control along a line,

(2) Cruise (v, θ) control along a curve,

(3) Planar (x, y) Cartesian Stabilization for rear wheel drive vehicle,

(4) Finish the track with camera pan tilt structure in minimum time,

(5) Finish the track without camera pan tilt structure in minimum time,

(6) Vision based tracking performance with different cruise speed vx,

(7) Vision based tracking performance with different camera fixed look-ahead distance L,

(8) Vision based tracking performance with different delay Td from vision subsystem,

(9) Manually remote controlled robot to perform indoor SLAM,

(10) Autonomously line guided robot to perform indoor SLAM.

For most cases, hardware data is compared with, and corroborated by, model-based simulation data. In short, the thesis uses a low-cost, self-designed rear-wheel drive robot to demonstrate many capabilities that are critical in order to reach the longer-term FAME goal.

Contributors: Lu, Xianglong (Author) / Rodriguez, Armando Antonio (Thesis advisor) / Berman, Spring (Committee member) / Artemiadis, Panagiotis (Committee member) / Arizona State University (Publisher)

Created: 2016

Design of a Graph Neural Network Coupled with an Advantage Actor-Critic Reinforcement Learning Algorithm for Multi-Agent Navigation

Description

A Graph Neural Network (GNN) is a type of neural network architecture that operates on data consisting of objects and their relationships, which are represented by a graph. Within the graph, nodes represent objects and edges represent associations between those objects. The representation of relationships and correlations between data is unique to graph structures. GNNs exploit this feature of graphs by augmenting both forms of data, individual and relational, and have been designed to allow for communication and sharing of data within each neural network layer. These benefits allow each node to have an enriched perspective, or a better understanding, of its neighbouring nodes and its connections to those nodes. The ability of GNNs to efficiently process high-dimensional node data and multi-faceted relationships among nodes gives them advantages over neural network architectures such as Convolutional Neural Networks (CNNs) that do not implicitly handle relational data. These quintessential characteristics of GNN models make them suitable for solving problems in which the correspondences among input data are needed to produce an accurate and precise representation of these data. GNN frameworks may significantly improve existing communication and control techniques for multi-agent tasks by implicitly representing not only information associated with the individual agents, such as agent position, velocity, and camera data, but also their relationships with one another, such as distances between the agents and their ability to communicate with one another. One such task is a multi-agent navigation problem in which the agents must coordinate with one another in a decentralized manner, using proximity sensors only, to navigate safely to their intended goal positions in the environment without collisions or deadlocks. 
The contribution of this thesis is the design of an end-to-end decentralized control scheme for multi-agent navigation that utilizes GNNs to prevent inter-agent collisions and deadlocks. The contributions consist of the development, simulation, and performance evaluation of an advantage actor-critic (A2C) reinforcement learning algorithm that employs actor and critic networks for training, which simultaneously approximate the policy function and value function, respectively. These networks are implemented using GNN frameworks for navigation by groups of 3, 5, 10, and 15 agents in simulated two-dimensional environments. It is observed that in 40% to 50% of the simulation trials, between 70% and 80% of the agents reach their goal positions without colliding with other agents or becoming trapped in deadlocks. The model is also compared to a random-run simulation, in which actions are chosen randomly for the agents, and it is observed that the model performs notably well for smaller groups of agents.
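The relationship between the actor (policy) and critic (value) signals in an advantage actor-critic update can be illustrated with a small sketch. It shows the generic one-step A2C quantities only, not the thesis's GNN-based implementation; the function names, the discount factor, and the sample numbers are assumptions made for illustration.

```python
def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    The bootstrap term gamma * V(s_{t+1}) is dropped when the episode ended.
    """
    advantages = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if done else gamma * v_next
        advantages.append(r + bootstrap - v)
    return advantages


def a2c_losses(log_probs, advantages):
    """Mean actor and critic losses for a batch of transitions.

    Actor: negative log-probability of the chosen action, weighted by advantage.
    Critic: mean squared one-step TD error (the advantage itself).
    """
    n = len(advantages)
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss, critic_loss


# Two hypothetical transitions; the second one ends the episode.
advs = td_advantages([1.0, 0.0], [0.5, 0.2], [0.2, 0.0], [False, True], gamma=0.5)
actor_loss, critic_loss = a2c_losses([-1.0, -2.0], advs)
```

In practice the values and log-probabilities would be outputs of the critic and actor networks, and the advantage would be treated as a constant when differentiating the actor loss.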

Contributors: Ayalasomayajula, Manaswini (Author) / Berman, Spring (Thesis advisor) / Mian, Sami (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)

Created: 2022

AvaCAR

Description

For a system of autonomous vehicles functioning together in a traffic scene, 3D understanding of the participants in the field of view or surroundings is essential for assessing the safe operation of those involved. This problem can be decomposed into online pose and shape estimation, which has been a core research area of computer vision for over a decade. This work is an add-on to support and improve the joint estimation of the pose and shape of vehicles from monocular cameras. The objective of jointly estimating the vehicle pose and shape online is enabled by an offline reconstruction pipeline. In the offline reconstruction step, an approach to obtain the vehicle 3D shape with labeled keypoints is formulated. This work proposes a multi-view reconstruction pipeline using images and masks that can create an approximate shape of vehicles, which can be used as a shape prior. Then a 3D model-fitting optimization approach is developed to refine the shape prior using high-quality computer-aided design (CAD) models of vehicles. A dataset of such 3D vehicles with 20 annotated keypoints is prepared and called the AvaCAR dataset. The AvaCAR dataset can be used to estimate vehicle shape and pose without the need to collect the significant amounts of data required for adequate training of a neural network. The online reconstruction can use this synthetic dataset to generate novel viewpoints and simultaneously train a neural network for pose and shape estimation. Most methods in the current literature that use deep neural networks trained to estimate the pose of an object from a single image are inherently biased toward the viewpoint of the images used. This approach aims to address these limitations by providing the online estimation with a shape prior that can generate novel views to account for the viewpoint bias.
The dataset is provided with ground-truth extrinsic parameters and compact vector-based shape representations, which along with the multi-view dataset can be used to efficiently train neural networks for vehicle pose and shape estimation. The vehicles in this library are evaluated with standard metrics to ensure they are capable of aiding online estimation and model-based tracking.

Contributors: DUTTA, PRABAL BIJOY (Author) / Yang, Yezhou (Thesis advisor) / Berman, Spring (Committee member) / Lu, Duo (Committee member) / Arizona State University (Publisher)

Created: 2022

An Investigation into Modern Facial Expressions Recognition by a Computer

Description

Facial Expression Recognition using Convolutional Neural Networks has been actively researched in the last decade due to its many applications in the human-computer interaction domain. Because Convolutional Neural Networks have an exceptional ability to learn, they outperform methods that use handcrafted features. Though state-of-the-art models achieve high accuracy on lab-controlled images, they still struggle with in-the-wild expressions. Wild expressions are captured in real-world settings and are natural. Wild databases pose many challenges, such as occlusion and variations in lighting conditions and head poses. In this work, I address these challenges and propose a new model containing a Hybrid Convolutional Neural Network with a Fusion Layer. The Fusion Layer utilizes a combination of the knowledge obtained from two different domains for enhanced feature extraction from in-the-wild images. I tested my network on two publicly available in-the-wild datasets, namely RAF-DB and AffectNet. Next, I tested my trained model on the CK+ dataset for a cross-database evaluation study. I show that my model achieves results comparable to state-of-the-art methods. I argue that it can perform well on such datasets because it learns features from two different domains rather than a single domain. Last, I present a real-time facial expression recognition system as part of this work, in which images are captured in real time using a laptop camera and passed to the model to obtain a facial expression label. This indicates that the proposed model has low processing time and can produce output almost instantly.

Contributors: Chhabra, Sachin (Author) / Li, Baoxin (Thesis advisor) / Venkateswara, Hemanth (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)

Created: 2019

Multi-Agent Control for Collective Construction using Chemical Reaction Network Models

Description

Chemical Reaction Networks (CRNs) provide a useful framework for modeling and controlling large numbers of agents that undergo stochastic transitions between a set of states in a manner similar to chemical compounds. By utilizing CRN models to design agent control policies, some of the computational challenges in the coordination of multi-agent systems can be overcome. In this thesis, a CRN model is developed that defines agent control policies for a multi-agent construction task. The use of surface CRNs to overcome the tradeoff between speed and accuracy of task performance is explained. The computational difficulties involved in coordinating multiple agents to complete collective construction tasks are then discussed. A method for stochastic task and motion planning (TAMP) is proposed to explain how a TAMP solver can be applied with CRNs to coordinate multiple agents. This work defines a collective construction scenario in which a group of noncommunicating agents must rearrange blocks on a discrete domain with obstacles into a predefined target distribution. Four different construction tasks are considered with 10, 20, 30, or 40 blocks, and a simulation of each scenario with 2, 4, 6, or 8 agents is performed. As the number of blocks increases, the construction problem becomes more complex, and a given population of agents requires more time to complete the task. Populations of fewer than 8 agents are unable to solve the 30-block and 40-block problems in the allotted simulation time. This suggests an inflection point for computational feasibility, beyond which the solution times for fewer than 8 agents would be expected to increase significantly. For a group of 8 agents, the time to complete the task generally increases as the number of blocks increases, except for the 30-block problem, which has specifications that make the task slightly easier for the agents to complete compared to the 20-block problem.
For the 10-block and 20-block problems, the time to complete the task decreases as the number of agents increases; however, the marginal effect of each additional two agents on this time decreases. This can be explained through the pigeonhole principle: since there are a finite number of states, when the number of agents is greater than the number of available spaces, deadlocks start to occur and the overall solution time is expected to tend to infinity.
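The idea of agents undergoing stochastic transitions between states, in the manner of chemical species, can be made concrete with a minimal Gillespie-style simulation. This is a generic sketch rather than the thesis's surface CRN or TAMP method: the state names, the rate constants, and the function name `simulate_crn` are hypothetical, and only unimolecular transitions of the form A -> B are modeled.

```python
import random


def simulate_crn(counts, reactions, t_end, seed=0):
    """Simulate unimolecular transitions src -> dst until time t_end.

    counts:    dict mapping state name to number of agents in that state
    reactions: list of (src, dst, rate) tuples; propensity = rate * counts[src]
    """
    rng = random.Random(seed)
    counts = dict(counts)
    t = 0.0
    while True:
        propensities = [rate * counts[src] for src, _dst, rate in reactions]
        total = sum(propensities)
        if total == 0:
            break  # no transition can fire
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which transition fires, with probability proportional to propensity.
        src, dst, _rate = rng.choices(reactions, weights=propensities)[0]
        counts[src] -= 1
        counts[dst] += 1
    return counts


# Eight agents switching between two assumed behaviors.
result = simulate_crn(
    {"searching": 8, "carrying": 0},
    [("searching", "carrying", 1.0), ("carrying", "searching", 0.5)],
    t_end=5.0,
)
```

Because every transition moves exactly one agent from one state to another, the total number of agents is conserved throughout the run, which mirrors a fixed robot population switching among behaviors.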

Contributors: Kamojjhala, Pranav (Author) / Berman, Spring (Thesis advisor) / Fainekos, Gergios E (Thesis advisor) / Pavlic, Theodore P (Committee member) / Arizona State University (Publisher)

Created: 2022

Decentralized Motion Planning for Autonomous Multi-Agent Systems: Multi-Segment Manipulators and Mobile Robot Collectives

Description

Multi-segment manipulators and mobile robot collectives are examples of multi-agent robotic systems, in which each segment or robot can be considered an agent. Fundamental motion control problems for such systems include the stabilization of one or more agents to target configurations or trajectories while preventing inter-agent collisions, agent collisions with obstacles, and deadlocks. Despite extensive research on these control problems, there are still challenges in designing controllers that (1) are scalable with the number of agents; (2) have theoretical guarantees on collision-free agent navigation; and (3) can be used when the states of the agents and the environment are only partially observable. Existing centralized and distributed control architectures have limited scalability due to their computational complexity and communication requirements, while decentralized control architectures are often effective only under impractical assumptions that do not hold in real-world implementations. The main objective of this dissertation is to develop and evaluate decentralized approaches for multi-agent motion control that enable agents to use their onboard sensors and computational resources to decide how to move through their environment, with limited or absent inter-agent communication and external supervision. Specifically, control approaches are designed for multi-segment manipulators and mobile robot collectives to achieve position and pose (position and orientation) stabilization, trajectory tracking, and collision and deadlock avoidance. These control approaches are validated in both simulations and physical experiments to show that they can be implemented in real-time while remaining computationally tractable. First, kinematic controllers are proposed for position stabilization and trajectory tracking control of two- or three-dimensional hyper-redundant multi-segment manipulators. 
Next, robust and gradient-based feedback controllers are presented for individual holonomic and nonholonomic mobile robots that achieve position stabilization, trajectory tracking control, and obstacle avoidance. Then, nonlinear Model Predictive Control methods are developed for collision-free, deadlock-free pose stabilization and trajectory tracking control of multiple nonholonomic mobile robots in known and unknown environments with obstacles, both static and dynamic. Finally, a feedforward proportional-derivative controller is defined for collision-free velocity tracking of a moving ground target by multiple unmanned aerial vehicles.

Contributors: Salimi Lafmejani, Amir (Author) / Berman, Spring (Thesis advisor) / Tsakalis, Konstantinos (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Marvi, Hamidreza (Committee member) / Arizona State University (Publisher)

Created: 2022
