Matching Items (21)

A Study on Resources Utilization of Deep Learning Workloads

Description

Deep learning and AI have grabbed tremendous attention in the last decade. The substantial accuracy improvements by neural networks on common tasks such as image classification and speech recognition have made deep learning a replacement for many conventional machine learning techniques. Training deep neural networks requires a lot of data, and therefore vast amounts of computing resources to process the data and train the model. The most direct way to address this problem is to reduce the time it takes to train deep neural networks.
AI and deep learning workloads differ from conventional cloud and mobile workloads with respect to: (1) computational intensity, (2) I/O characteristics, and (3) communication patterns. While there is a considerable amount of research activity on the theoretical aspects of AI and deep learning algorithms that run with greater efficiency, there are only a few studies on the infrastructural impact of deep learning workloads on computing and storage resources in distributed systems.
It is typical to utilize a heterogeneous mixture of CPU and GPU devices to perform training on a neural network. Google Brain has developed a reinforcement learning model that can place training operations across a heterogeneous cluster, though it has so far been tested only with local devices in a single cluster. This study explores the method's capabilities and attempts to apply it to a cluster with nodes distributed across a network.
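To make the placement problem concrete, the following is a minimal sketch of manually pinning operations to heterogeneous devices in TensorFlow. This is not the Google Brain method, which learns such placements with a reinforcement learning policy; the device strings and tensor shapes here are illustrative.

```python
import tensorflow as tf

# Allow ops pinned to an absent device to fall back gracefully.
tf.config.set_soft_device_placement(True)

x = tf.random.normal([64, 512])

# Hand-chosen placement: a placement policy would select these device
# strings automatically instead of hard-coding them.
with tf.device('/CPU:0'):
    w1 = tf.random.normal([512, 256])
    h = tf.matmul(x, w1)            # executed on the CPU

with tf.device('/GPU:0'):
    w2 = tf.random.normal([256, 10])
    logits = tf.matmul(h, w2)       # executed on the GPU, if one exists

print(logits.device)
```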

Date Created
2019-05

Deep Periodic Networks

Description

In the field of machine learning, reinforcement learning stands out for its ability to discover approaches to complex, high-dimensional problems that outperform even expert humans. For robotic locomotion tasks, reinforcement learning provides an approach to solving them without the need for purpose-built controllers. In this thesis, two reinforcement learning algorithms, Deep Deterministic Policy Gradient and Group Factor Policy Search, are compared based on their performance in the bipedal walking environment provided by OpenAI gym. The algorithms are evaluated on both their performance in the environment and their sample efficiency.
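For reference, a minimal evaluation loop of the kind such a comparison relies on, using the classic (pre-0.26) gym step API; the random policy below is a stand-in for a trained DDPG or Group Factor Policy Search actor:

```python
import gym
import numpy as np

def evaluate(policy, episodes=10, seed=0):
    """Average return of a policy in BipedalWalker-v3 (classic gym API)."""
    env = gym.make("BipedalWalker-v3")
    env.seed(seed)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    env.close()
    return np.mean(returns)

# A random policy as a stand-in for a trained actor network.
random_policy = lambda obs: np.random.uniform(-1, 1, size=4)
print(evaluate(random_policy))
```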

Date Created
2018-12

Intelli-Trail

Description

Intelli-Trail is a game in which the player controls a small blue man with the simple goal of reaching the purple door. The player interacts with the game primarily through combat. The game itself reacts to patterns in the player's behavior, progressively becoming harder for the player to win.
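As a hedged sketch of the adaptive idea (the names and parameters are hypothetical, not taken from the Intelli-Trail source), the game could track simple statistics about how the player fights and scale enemy parameters against the player's dominant habits:

```python
class DifficultyDirector:
    """Hypothetical adaptive-difficulty sketch: count how the player
    lands hits, then bias enemy stats against the favored style."""
    def __init__(self):
        self.melee_hits = 0
        self.ranged_hits = 0

    def record_hit(self, kind):
        if kind == "melee":
            self.melee_hits += 1
        else:
            self.ranged_hits += 1

    def enemy_params(self):
        total = self.melee_hits + self.ranged_hits
        melee_bias = self.melee_hits / total if total else 0.5
        # Counter the player's habits: more armor against melee spam,
        # faster projectiles against ranged spam.
        return {"armor": 1.0 + melee_bias,
                "projectile_speed": 2.0 - melee_bias}

director = DifficultyDirector()
director.record_hit("melee"); director.record_hit("melee")
print(director.enemy_params())   # armor scales up against melee spam
```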

Date Created
2020-05

Application of Deep Reinforcement Learning to Wide Area Power System and Big Data Analysis to Smart Meter Status Monitoring

Description

Due to the large scale of power systems, latency uncertainty in communication can cause severe problems in wide-area measurement systems. To resolve the issue, a significant amount of past work has focused on emerging machine learning methods such as Q-learning to address latency in modern controls. Although such methods can deal with the stochastic characteristics of communication latency in the long run, Q-learning tends to overestimate Q-values, leading to high bias. To solve the overestimation-bias issue, the learning structure is redesigned around a twin-delayed deep deterministic policy gradient (TD3) algorithm to handle the damping control problem under unknown latency in the power network. Meanwhile, a new reward function is proposed that takes into account machine speed deviation, episode termination prevention, and feedback from the action space. In this way, the system optimally damps frequency oscillations while maintaining stable and reliable operation within defined limits. The simulation results validate the proposed algorithm from several perspectives, including a latency sensitivity analysis under high renewable energy penetration and a comparison with other machine learning algorithms. For example, when the proposed twin-delayed deep deterministic policy gradient algorithm is applied, low-frequency oscillations are damped significantly better than with existing algorithms.
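A minimal sketch of a reward of the kind described, combining a machine-speed-deviation penalty, a large penalty for premature episode termination, and a penalty on control effort; the weights and penalty values are illustrative assumptions, not the dissertation's actual choices:

```python
import numpy as np

def damping_reward(speed_dev, action, terminated,
                   w_dev=1.0, w_act=0.1, term_penalty=100.0):
    """Illustrative reward for damping control under latency.

    speed_dev  : machine speed deviation from synchronous speed (p.u.)
    action     : control signal applied this step
    terminated : True if the episode ended by violating stability limits
    """
    r = -w_dev * speed_dev**2                      # damp oscillations
    r -= w_act * float(np.sum(np.square(action)))  # action-space feedback
    if terminated:
        r -= term_penalty                          # discourage instability
    return r
```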
Furthermore, under the mentorship of Dr. Yang Weng, a big data analysis software project has been developed in collaboration with the Salt River Project (SRP), a major power utility in Arizona. A thorough examination of the project's data showed that SRP suffers from numerous smart meter data issues. An important goal of the project is to design big data software that monitors SRP smart meter data and presents indicators of abnormalities and special events. The big data software interface developed for SRP has already been successfully adopted by other utilities, research institutes, and laboratories.

Date Created
2021

Disaster Analytics for Critical Infrastructures: Methods and Algorithms for Modeling Disasters and Proactive Recovery Preparedness

Description

Natural disasters are occurring increasingly often around the world, causing significant economic losses. To alleviate their adverse effects, it is crucial to plan, in a proactive manner, what should be done in response to them. This research aims at developing proactive and real-time recovery algorithms for large-scale power networks exposed to weather events under uncertainty. These algorithms support recovery decisions that mitigate the disaster's impact, resulting in faster recovery of the network. The challenges associated with developing these algorithms are summarized below:
1. Even ignoring uncertainty, when the operating cost of the network is considered, the problem becomes a bi-level optimization, which is NP-hard.
2. To meet the requirement of real-time decision making under uncertainty, the problem can be formulated as a Stochastic Dynamic Program with the aim of minimizing total cost. However, considering the operating cost of the network violates the underlying assumptions of this approach.
3. The Stochastic Dynamic Programming approach is also not applicable to realistic problem sizes, due to the curse of dimensionality.
4. Uncertainty-based approaches for failure modeling rely on point-generation of failures and ignore the network structure.
To deal with the first challenge, in chapter 2, a heuristic solution framework is proposed and its performance is evaluated through numerical experiments. To address the second challenge, in chapter 3, after formulating the problem as a Stochastic Dynamic Program, an approximate dynamic programming heuristic is proposed to solve it; numerical experiments on synthetic and realistic test-beds show the satisfactory performance of the proposed approach. To address the third challenge, in chapter 4, an efficient base heuristic policy and an aggregation scheme in the action space are proposed; numerical experiments on a realistic test-bed verify the ability of the proposed method to recover the network more efficiently. Finally, to address the fourth challenge, in chapter 5, a simulation-based model is proposed that, using historical data and accounting for the interaction between network components, allows for analyzing the impact of adverse events on regional service level. A realistic case study is then conducted to showcase the applicability of the approach.
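To illustrate the flavor of the rollout-style heuristics in chapters 3 and 4, here is a minimal sketch of one-step rollout with a greedy base policy on a toy recovery-sequencing problem. The cost model (unserved demand multiplied by time-to-repair) is an illustrative assumption, not the dissertation's network model:

```python
def schedule_cost(order):
    """Unserved energy when components are repaired sequentially."""
    t, cost = 0.0, 0.0
    for c in order:
        t += c["repair_time"]
        cost += c["demand"] * t
    return cost

def greedy(remaining):
    """Base heuristic: highest demand-to-repair-time ratio first."""
    return sorted(remaining, key=lambda c: -c["demand"] / c["repair_time"])

def rollout_first_action(remaining):
    """Try each component first, complete the schedule with the base
    heuristic, and keep the first action with lowest simulated cost."""
    best = None
    for c in remaining:
        rest = [x for x in remaining if x is not c]
        cost = schedule_cost([c] + greedy(rest))
        if best is None or cost < best[1]:
            best = (c, cost)
    return best[0]

failed = [{"repair_time": 2.0, "demand": 5.0},
          {"repair_time": 1.0, "demand": 4.0},
          {"repair_time": 3.0, "demand": 9.0}]
print(rollout_first_action(failed))
```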

Date Created
2021

Data-Efficient Reinforcement Learning Control of Robotic Lower-Limb Prosthesis With Human in the Loop

Description

Robotic lower limb prostheses provide new opportunities to help transfemoral amputees regain mobility. However, their application is impeded by the fact that the impedance control parameters need to be tuned and optimized manually by prosthetists for each individual user in different task environments. Reinforcement learning (RL) is capable of automatically learning from interaction with the environment, which makes it a natural candidate to replace human prosthetists in customizing the control parameters. However, neither traditional RL approaches nor the popular deep RL approaches are readily suitable for learning with a limited number of samples and samples with large variations. This dissertation aims to explore new RL-based adaptive solutions that are data-efficient for controlling robotic prostheses.

This dissertation begins by proposing a new flexible policy iteration (FPI) framework. To improve sample efficiency, FPI can utilize either an on-policy or an off-policy learning strategy, can learn from either online or offline data, and can even adopt existing knowledge from an external critic. Approximate convergence to Bellman optimal solutions is guaranteed under mild conditions. Simulation studies validated that FPI was data-efficient compared to several established RL methods. Furthermore, a simplified version of FPI was implemented to learn from offline data, and the learned policy was then successfully tested for tuning the control parameters online on a human subject.
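FPI itself is the dissertation's contribution; as background, the classical policy iteration loop that it generalizes can be sketched on a small tabular MDP (the transition and reward tables here are random toy data):

```python
import numpy as np

# Toy MDP: P[s, a, s'] transition probabilities, R[s, a] rewards.
nS, nA, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))

policy = np.zeros(nS, dtype=int)
for _ in range(100):
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(nS), policy]
    R_pi = R[np.arange(nS), policy]
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
    # Policy improvement: greedy one-step lookahead on Q(s, a).
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, V)
```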

Next, the dissertation discusses RL control with information transfer (RL-IT), or knowledge-guided RL (KG-RL), which is motivated by the benefit of transferring knowledge acquired from one subject to another. To explore its feasibility, knowledge was extracted from data measurements of able-bodied (AB) subjects and transferred to guide Q-learning control for an amputee in OpenSim simulations. This result again demonstrated that data and time efficiency were improved by using prior knowledge.
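One common way to inject such prior knowledge, shown here as a hedged sketch rather than the dissertation's exact mechanism, is to warm-start tabular Q-learning from a Q-table estimated on source-subject data:

```python
import numpy as np

def q_learning(env_step, q_init, episodes=200, alpha=0.1,
               gamma=0.99, eps=0.1, horizon=100):
    """Tabular Q-learning warm-started from q_init (e.g., a Q-table
    fit to able-bodied subject data) instead of zeros."""
    Q = q_init.copy()
    nS, nA = Q.shape
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env_step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

# Toy usage: a 5-state chain where action 1 moves right, reward at the end.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == 4), s2 == 4

prior = np.zeros((5, 2))           # stands in for a source-subject Q-table
Q = q_learning(chain_step, prior)
```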

While the present study is new and promising, there are still many open questions to be addressed in future research. To account for human adaptation, the learning control objective function may be designed to incorporate human-prosthesis performance feedback such as symmetry, user comfort and satisfaction, and user energy consumption. To make RL-based control parameter tuning practical in real life, it should be further developed and tested in different use environments, such as from level-ground walking to stair ascent or descent, and from walking to running.

Date Created
2020

Sample-Efficient Reinforcement Learning of Robot Control Policies in the Real World

Description

The goal of reinforcement learning is to enable systems to autonomously solve tasks in the real world, even in the absence of prior data. To succeed in such situations, reinforcement learning algorithms collect new experience through interactions with the environment to further the learning process. The behaviour is optimized by maximizing a reward function, which assigns high numerical values to desired behaviours. Especially in robotics, such interactions with the environment are expensive in terms of the required execution time, human involvement, and mechanical degradation of the system itself. Therefore, this thesis aims to introduce sample-efficient reinforcement learning methods which are applicable to real-world settings and control tasks such as bimanual manipulation and locomotion. Sample efficiency is achieved through directed exploration, either by using dimensionality reduction or trajectory optimization methods. Finally, it is demonstrated how data-efficient reinforcement learning methods can be used to optimize the behaviour and morphology of robots at the same time.
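As an illustration of directed exploration through dimensionality reduction (a sketch of the general idea, not of the thesis's specific algorithms), one can search over policy parameters in a low-dimensional random subspace rather than the full parameter space:

```python
import numpy as np

def subspace_random_search(evaluate, dim, k=5, iters=100, sigma=0.1, seed=0):
    """Search a k-dimensional random subspace of a dim-dimensional
    policy parameter vector; evaluate(theta) returns the episode return."""
    rng = np.random.default_rng(seed)
    basis = rng.normal(size=(dim, k)) / np.sqrt(k)  # fixed projection
    z_best = np.zeros(k)
    best = evaluate(basis @ z_best)
    for _ in range(iters):
        z = z_best + sigma * rng.normal(size=k)     # explore in k dims only
        score = evaluate(basis @ z)
        if score > best:
            best, z_best = score, z
    return basis @ z_best, best

# Toy quadratic objective standing in for an expensive episode return.
theta_opt = np.linspace(-1, 1, 50)
ret = lambda theta: -np.sum((theta - theta_opt) ** 2)
print(subspace_random_search(ret, dim=50)[1])
```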

Date Created
2019

Haptic Perception, Decision-Making, and Learning for Manipulation with Artificial Hands

Description

Robotic systems are outmatched by the abilities of the human hand to perceive and manipulate the world. Human hands are able to physically interact with the world to perceive, learn, and act to accomplish tasks. Limitations of robotic systems to interact with and manipulate the world diminish their usefulness. In order to advance robot end effectors, specifically artificial hands, rich multimodal tactile sensing is needed. In this work, a multi-articulating, anthropomorphic robot testbed was developed for investigating tactile sensory stimuli during finger-object interactions. The artificial finger is controlled by a tendon-driven remote actuation system that allows for modular control of any tendon-driven end effector and capabilities for both speed and strength. The artificial proprioception system enables direct measurement of joint angles and tendon tensions while temperature, vibration, and skin deformation are provided by a multimodal tactile sensor.

Next, attention was focused on real-time artificial perception for decision-making. A robotic system needs to perceive its environment in order to make decisions. Specific actions such as "exploratory procedures" can be employed to classify and characterize object features. Prior work on offline perception was extended to develop an anytime predictive model that returns the probability of having touched a specific feature of an object based on minimally processed sensor data. Developing models for anytime classification of features facilitates real-time action-perception loops.

Finally, by combining real-time action-perception with reinforcement learning, a policy was learned to complete a functional contour-following task: closing a deformable ziplock bag. The approach relies only on proprioceptive and localized tactile data. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards within a finite time period by balancing exploration versus exploitation of the action space. Performance of the C-MAB learner was compared to a benchmark Q-learner that eventually returns the optimal policy. To assess robustness and generalizability, the learned policy was tested on variations of the original contour-following task. The work presented contributes to the full range of tools necessary to advance the abilities of artificial hands with respect to dexterity, perception, decision-making, and learning.
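A minimal sketch of a contextual multi-armed bandit learner of the kind described, using epsilon-greedy selection over per-(context, action) value estimates; the example contexts and actions are hypothetical placeholders for the tactile percepts and manipulation primitives:

```python
import random
from collections import defaultdict

class ContextualBandit:
    """Epsilon-greedy C-MAB: one running value estimate per
    (context, action) pair, updated from observed rewards."""
    def __init__(self, actions, eps=0.1):
        self.actions = actions
        self.eps = eps
        self.value = defaultdict(float)   # (context, action) -> mean reward
        self.count = defaultdict(int)

    def act(self, context):
        if random.random() < self.eps:                 # explore
            return random.choice(self.actions)
        return max(self.actions,                       # exploit
                   key=lambda a: self.value[(context, a)])

    def update(self, context, action, reward):
        k = (context, action)
        self.count[k] += 1
        self.value[k] += (reward - self.value[k]) / self.count[k]

# Hypothetically, contexts might discretize the sensed contact location
# and actions might be "pinch", "slide", "rotate" motion primitives.
learner = ContextualBandit(actions=["pinch", "slide", "rotate"])
a = learner.act(context="contact_left")
learner.update("contact_left", a, reward=1.0)
```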

Date Created
2016

Learning Complex Behaviors from Simple Ones: An Analysis of Behavior-Based Modular Design for RL Agents

Description

Traditional Reinforcement Learning (RL) learns policies with respect to the reward available from the environment, but learning in a complex domain sometimes requires wisdom that comes from a wide range of experience. In behavior-based robotics, it is observed that a complex behavior can be described by a combination of simpler behaviors. It is tempting to apply a similar idea, combining simpler behaviors in a meaningful way to tailor a complex one. Such an approach would enable faster learning and modular design of behaviors. Complex behaviors can in turn be combined with other behaviors to create even more advanced behaviors, resulting in a rich set of possibilities. As in RL, the combined behavior can keep evolving by interacting with the environment. The requirement of this method is to specify a reasonable set of simple behaviors. In this research, I present an algorithm that aims at combining behaviors such that the resulting behavior has characteristics of each individual behavior. This approach is inspired by behavior-based robotics, such as the subsumption architecture and motor schema-based design. The combination algorithm outputs n weights to combine the behaviors linearly; the weights are state-dependent and change dynamically at every step in an episode. The idea is tested on discrete and continuous environments such as OpenAI's "Lunar Lander" and "Biped Walker". Results are compared with related domains such as multi-objective RL, hierarchical RL, transfer learning, and basic RL. It is observed that this combination of behaviors is a novel way of learning that helps the agent achieve the required characteristics. Because a combination is learned for a given state, the agent is able to learn faster and more efficiently than with other, similar approaches. The agent demonstrates characteristics of multiple behaviors, which helps it learn and adapt to the environment. Future directions are also suggested as possible extensions to this research.
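A minimal sketch of the combination step under stated assumptions: n pretrained behavior policies each propose an action, and a state-conditioned weight function mixes them linearly. The softmax weighting and the linear weight map are illustrative choices; in the research, the weights are learned by interacting with the environment:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class BehaviorCombiner:
    """Linearly combine n behavior policies with state-dependent weights.
    Here the weight function is a fixed linear map followed by softmax;
    in practice it would be trained against the environment."""
    def __init__(self, behaviors, state_dim, seed=0):
        self.behaviors = behaviors                  # list of state -> action
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(len(behaviors), state_dim))

    def act(self, state):
        w = softmax(self.W @ state)                 # n weights, sum to 1
        actions = np.stack([b(state) for b in self.behaviors])
        return w @ actions                          # weighted action blend

# E.g., blend a "hover" and a "descend" behavior for a lander (toy stubs):
hover   = lambda s: np.array([0.8, 0.0])
descend = lambda s: np.array([0.1, 0.0])
combiner = BehaviorCombiner([hover, descend], state_dim=4)
print(combiner.act(np.ones(4)))
```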

Date Created
2021

A Comparative Study of Multi-Agent Reinforcement Learning on Real World Problems

Description

This work investigates multi-agent reinforcement learning methods that are applicable to real-world scenarios, including stochastic, partially observable, and infinite-horizon problems. These problems are hard due to large state and control spaces and may require some form of intelligent multi-agent behavior to achieve the target objective. The study also introduces novel rollout-based methods that provide reasonable guarantees of cost improvement, obtain sub-optimal solutions to such problems, and are amenable to distributed computation and hence faster runtimes. These methods, first introduced and developed for single-agent scenarios, are gradually extended to multi-agent variants and have been named multi-agent rollout methods. The problems studied in this work each target one or more of three major challenges of real-world problems: the Spider and Fly problem deals with stochastic environments; the multi-robot repair problem is an example of a partially observable Markov decision process (POMDP); and the Flatland challenge is an RL benchmark that aims to solve the vehicle rescheduling problem. The study also includes comparisons with existing methods that are widely used for such problems, such as POMCP, DESPOT, and MADDPG, and delineates and compares the behaviors arising from our methods and from existing ones, thereby establishing the efficacy of the rollout-based methods in solving real-world multi-agent reinforcement learning problems. Additionally, the source code and problem environments have been released for the community to further research in this field. The source code and related research can be found at https://sahilbadyal.com/marl.
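A minimal sketch of the agent-by-agent selection at the heart of multi-agent rollout: agents commit actions one at a time, each minimizing a simulated cost with earlier agents' choices fixed and later agents following the base policy. The simulator and base policy here are caller-supplied stubs, and the sketch omits the distributed-computation aspect:

```python
def multiagent_rollout_step(state, agents, actions_of, simulate_cost, base_policy):
    """One-at-a-time action selection. Agents commit actions in order;
    agent i tries its candidate actions while agents before i keep their
    committed actions and agents after i follow the base policy."""
    committed = []
    for i in agents:
        best_a, best_c = None, float("inf")
        for a in actions_of(state, i):
            trial = committed + [a] + [base_policy(state, j)
                                       for j in agents[len(committed) + 1:]]
            c = simulate_cost(state, trial)   # rollout under the base policy
            if c < best_c:
                best_a, best_c = a, c
        committed.append(best_a)
    return committed
```

The appeal of this scheme is that it replaces a joint search over |A|^n action combinations with roughly n·|A| rollout evaluations per step, while retaining the cost-improvement property of rollout over the base policy.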

Date Created
2021