Matching Items (12)

Intelli-Trail

Description

Intelli-Trail is a game where the player plays as a small blue man with the simple goal of reaching the purple door. The player will primarily interact with the game through combat. The game itself will react to the patterns in the player's behavior to progressively become harder for the player to win.

Date Created
  • 2020-05

A Study on Resources Utilization of Deep Learning Workloads

Description

Deep learning and AI have grabbed tremendous attention in the last decade. The substantial accuracy improvement by neural networks in common tasks such as image classification and speech recognition has made deep learning a replacement for many conventional machine learning techniques. Training deep neural networks requires a lot of data, and therefore vast amounts of computing resources to process the data and train the model. The most obvious approach to this problem is to reduce the time it takes to train deep neural networks.

AI and deep learning workloads differ from conventional cloud and mobile workloads with respect to (1) computational intensity, (2) I/O characteristics, and (3) communication pattern. While there is a considerable amount of research activity on the theoretical aspects of making AI and deep learning algorithms run with greater efficiency, there are only a few studies on the infrastructural impact of deep learning workloads on computing and storage resources in distributed systems.

It is typical to utilize a heterogeneous mixture of CPU and GPU devices to train a neural network. Google Brain has developed a reinforcement learning model that can place training operations across a heterogeneous cluster, though it has only been tested with local devices in a single cluster. This study explores the method's capabilities and attempts to apply the method on a cluster with nodes across a network.

Date Created
  • 2019-05

Deep Periodic Networks

Description

In the field of machine learning, reinforcement learning stands out for its ability to explore approaches to complex, high-dimensional problems that outperform even expert humans. For robotic locomotion tasks, reinforcement learning provides an approach to solving them without the need for unique controllers. In this thesis, two reinforcement learning algorithms, Deep Deterministic Policy Gradient and Group Factor Policy Search, are compared based upon their performance in the bipedal walking environment provided by OpenAI gym. These algorithms are evaluated on their performance in the environment and their sample efficiency.
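
The following is a minimal evaluation-loop sketch, not code from the thesis: it assumes the classic OpenAI gym API (where step() returns four values) and uses a random policy as a stand-in for a trained DDPG or Group Factor Policy Search agent; the environment id is likewise an assumption.

import gym
import numpy as np

def evaluate(policy, episodes=10, env_id="BipedalWalker-v3"):
    # Average episodic return of a policy in the bipedal walking environment.
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return np.mean(returns)

# Random baseline over the 4-dimensional continuous action space.
random_policy = lambda obs: np.random.uniform(-1.0, 1.0, size=4)
print(evaluate(random_policy))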

Date Created
  • 2018-12

Data-Efficient Reinforcement Learning Control of Robotic Lower-Limb Prosthesis With Human in the Loop

Description

Robotic lower limb prostheses provide new opportunities to help transfemoral amputees regain mobility. However, their application is impeded by the fact that the impedance control parameters need to be tuned and optimized manually by prosthetists for each individual user in different task environments. Reinforcement learning (RL) is capable of automatically learning from interaction with the environment, which makes it a natural candidate to replace human prosthetists in customizing the control parameters. However, neither traditional RL approaches nor the popular deep RL approaches are readily suitable for learning with a limited number of samples and samples with large variations. This dissertation aims to explore new RL-based adaptive solutions that are data-efficient for controlling robotic prostheses.

This dissertation begins by proposing a new flexible policy iteration (FPI) framework. To improve sample efficiency, FPI can utilize either an on-policy or an off-policy learning strategy, can learn from either online or offline data, and can even adopt existing knowledge from an external critic. Approximate convergence to Bellman optimal solutions is guaranteed under mild conditions. Simulation studies validated that FPI was data efficient compared to several established RL methods. Furthermore, a simplified version of FPI was implemented to learn from offline data, and the learned policy was then successfully tested for tuning the control parameters online on a human subject.
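
As a point of reference for the Bellman-optimality fixed point that FPI approximates, the following is a minimal tabular policy-iteration sketch; the transition tensor P, reward matrix R, and discount factor are illustrative placeholders, not the prosthesis model used in the dissertation.

import numpy as np

def policy_iteration(P, R, gamma=0.9):
    # P: transition probabilities of shape (S, A, S); R: rewards of shape (S, A).
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
        P_pi = P[np.arange(S), policy]
        R_pi = R[np.arange(S), policy]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the one-step lookahead.
        Q = R + gamma * P @ V
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy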

Next, the dissertation discusses RL control with information transfer (RL-IT), or knowledge-guided RL (KG-RL), which is motivated by the potential benefit of transferring knowledge acquired from one subject to another. To explore its feasibility, knowledge was extracted from data measurements of able-bodied (AB) subjects and transferred to guide Q-learning control for an amputee in OpenSim simulations. This result again demonstrated that data and time efficiency were improved by using previous knowledge.

While the present study is new and promising, there are still many open questions to be addressed in future research. To account for human adaptation, the learning control objective function may be designed to incorporate human-prosthesis performance feedback such as symmetry, user comfort level and satisfaction, and user energy consumption. To make RL-based control parameter tuning practical in real life, it should be further developed and tested in different use environments, such as from level-ground walking to stair ascending or descending, and from walking to running.

Date Created
  • 2020

Is the Click the Trick? The Efficacy of Clickers and Other Reinforcement Methods in Training Naïve Dogs to Perform New Tasks

Description

A handheld metal noisemaker known as a “clicker” is widely used to train new behaviors in dogs; however, evidence for the superior efficacy of clickers as opposed to providing solely primary reinforcement or other secondary reinforcers in the acquisition of novel behavior in dogs is almost entirely anecdotal. Three experiments were conducted to determine under what circumstances a clicker may result in acquisition of a novel behavior more rapidly or to a higher level compared to other readily available reinforcement methods. In Experiment 1, three groups of 30 dogs each were trained to emit a novel sit and stay behavior of increasing duration with either the delivery of food alone, a verbal stimulus paired with food, or a clicker with food. The group that received only a primary reinforcer reached a significantly higher criterion of training success than the group trained with a verbal secondary reinforcer. Performance of the group experiencing a clicker secondary reinforcer was intermediate between the other two groups, but not significantly different from either. In Experiment 2, three different groups of 25 dogs each were shaped to emit a nose targeting behavior and then perform that behavior at increasing distances from the experimenter using the same three methods of positive reinforcement as in Experiment 1. No statistically significant differences between the groups were found. In Experiment 3, three groups of 30 dogs each were shaped to emit a nose-targeting behavior upon an array of wooden blocks with task difficulty increasing throughout testing using the same three methods of positive reinforcement as previously. No statistically significant differences between the groups were found. Overall, the findings suggest that both clickers and other forms of positive reinforcement can be used successfully in training a dog to perform a novel behavior, but that no positive reinforcement method has significantly greater efficacy than any other.

Date Created
  • 2020

A novel approach to study task organization in animal groups

Description

A key factor in the success of social animals is their organization of work. Mathematical models have been instrumental in unraveling how simple, individual-based rules can generate collective patterns via self-organization. However, existing models offer limited insights into how these patterns are shaped by behavioral differences within groups, in part because they focus on analyzing specific rules rather than general mechanisms that can explain behavior at the individual level. My work argues for a more principled approach that focuses on the question of how individuals make decisions in costly environments.

In Chapters 2 and 3, I demonstrate how this approach provides novel insights into factors that shape the flexibility and robustness of task organization in harvester ant colonies (Pogonomyrmex barbatus). My results show that the degree to which colonies can respond to work in fluctuating environments depends on how individuals weigh the costs of activity and update their behavior in response to social information. In Chapter 4, I introduce a mathematical framework to study the emergence of collective organization in heterogeneous groups. My approach, which is based on the theory of multi-agent systems, focuses on myopic agents whose behavior emerges out of an independent valuation of alternative choices in a given work environment. The product of this dynamic is an equilibrium organization in which agents perform different tasks (or abstain from work) with an analytically defined set of threshold probabilities. The framework is minimally developed, but can be extended to include other factors known to affect task decisions, including individual experience and social facilitation. This research contributes a novel approach to developing (and analyzing) models of task organization that can be applied in a broader range of contexts where animals cooperate.
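
As an illustration of threshold-based task choice in a heterogeneous group, the sketch below simulates the classic response-threshold rule (an agent engages in a task with probability s^2 / (s^2 + theta^2), otherwise it abstains); the rule, parameters, and two-task setup are illustrative assumptions and are not the specific framework developed in Chapter 4.

import numpy as np

rng = np.random.default_rng(0)
n_agents, n_tasks, steps = 50, 2, 200
thresholds = rng.uniform(0.1, 1.0, size=(n_agents, n_tasks))  # behavioral heterogeneity
stimuli = np.full(n_tasks, 0.5)
demand_rate, work_rate = 0.05, 0.02

for _ in range(steps):
    # Each agent engages in each task with probability s^2 / (s^2 + theta^2).
    p = stimuli**2 / (stimuli**2 + thresholds**2)
    active = rng.random((n_agents, n_tasks)) < p
    # Task demand grows at a fixed rate and is worked down by the active agents.
    stimuli = np.maximum(stimuli + demand_rate - work_rate * active.sum(axis=0), 0.0)

print("final stimulus levels:", stimuli)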

Date Created
  • 2016

Recommender System using Reinforcement Learning

Description

Currently, recommender systems are used extensively to find the right audience with the "right" content over various platforms. Recommendations generated by these systems aim to offer relevant items to users. Different approaches have been suggested to solve this problem, mainly by using the rating history of the user or by identifying the preferences of similar users. Most of the existing recommendation systems are formulated in an identical fashion, where a model is trained to capture the underlying preferences of users over different kinds of items. Once it is deployed, the model is expected to suggest precise personalized recommendations, and it is assumed that the preferences of users are perfectly reflected by the historical data. However, such user data might be limited in practice, and the characteristics of users may constantly evolve during their intensive interaction with recommendation systems.

Moreover, most of these recommender systems suffer from the cold-start problem, where insufficient data for new users or products results in reduced overall recommendation output. In the current study, we have built a recommender system to recommend movies to users. A biclustering algorithm is used to cluster the users and movies simultaneously at the beginning to generate explainable recommendations, and these biclusters are used to form a gridworld where Q-learning is used to learn a policy for traversing the grid. The reward function uses the Jaccard index, a measure of the users common to two biclusters. Demographic details of new users are used to generate recommendations that also address the cold-start problem.
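
A minimal sketch of the gridworld-over-biclusters idea described above is given below: biclusters are treated as grid cells, moves between adjacent cells are actions, and the reward for a move is the Jaccard index of the two biclusters' user sets. The toy grid, user sets, and hyperparameters are placeholders, not the movie data or tuning used in the study.

import numpy as np

def jaccard(a, b):
    # Jaccard index: size of the intersection over size of the union.
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy 2x2 grid of biclusters, each holding a set of user ids.
bicluster_users = {(0, 0): {1, 2, 3}, (0, 1): {2, 3, 4},
                   (1, 0): {3, 4, 5}, (1, 1): {1, 4, 5}}
moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]             # right, left, down, up
Q = {s: np.zeros(len(moves)) for s in bicluster_users}
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(5000):
    s = (0, 0)
    for _ in range(10):
        a = rng.integers(len(moves)) if rng.random() < epsilon else int(Q[s].argmax())
        nxt = (s[0] + moves[a][0], s[1] + moves[a][1])
        if nxt not in Q:                               # off the grid: stay put, no reward
            nxt, r = s, 0.0
        else:
            r = jaccard(bicluster_users[s], bicluster_users[nxt])
        # Standard temporal-difference update of the Q value for (state, action).
        Q[s][a] += alpha * (r + gamma * Q[nxt].max() - Q[s][a])
        s = nxt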

Lastly, the implemented algorithm is evaluated on a real-world dataset against a widely used recommendation algorithm, including its performance on the cold-start cases.

Date Created
  • 2020

A Deep Reinforcement Learning Approach for Robotic Bicycle Stabilization

Description

Bicycle stabilization has become a popular topic because of the bicycle's complex dynamic behavior and the large body of bicycle modeling research. Riding a bicycle requires accurately performing several tasks, such as balancing and navigation, which may be difficult for disabled people; their difficulties could be partially reduced by providing steering assistance. For stabilization of these highly maneuverable and efficient machines, many control techniques have been applied, achieving interesting results but with limitations that include strict environmental requirements. This thesis expands on the work of Randlov and Alstrom, using reinforcement learning for bicycle self-stabilization with robotic steering. It applies the deep deterministic policy gradient algorithm, which can handle continuous action spaces, something that is not possible with the Q-learning technique. The research involved training the algorithm in virtual environments, followed by simulations to assess its results. Furthermore, hardware testing was conducted on Arizona State University's RISE lab Smart bicycle platform to evaluate its self-balancing performance. A detailed analysis of the bicycle trial runs is presented. Testing was validated by plotting the real-time states and actions collected during the outdoor testing, including the roll angle of the bicycle. Further improvements regarding model training and hardware testing are also presented.
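
A minimal PyTorch sketch of a DDPG-style actor that outputs a bounded continuous steering command is shown below; the state dimension, layer sizes, and torque bound are illustrative assumptions and do not come from the Smart bicycle platform.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Deterministic policy mapping the bicycle state to a continuous steering action.
    def __init__(self, state_dim=5, action_dim=1, max_torque=2.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),      # bounded output in [-1, 1]
        )
        self.max_torque = max_torque

    def forward(self, state):
        # Scale the bounded output to the steering actuator's range.
        return self.max_torque * self.net(state)

# Example: the state vector could hold roll angle, roll rate, steering angle, etc.
actor = Actor()
action = actor(torch.tensor([[0.02, 0.10, 0.00, 0.00, 0.00]]))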

Date Created
  • 2020

FPGA accelerator architecture for Q-learning and its applications in space exploration rovers

Description

Achieving human-level intelligence is a long-term goal for many Artificial Intelligence (AI) researchers. Recent developments in combining deep learning and reinforcement learning have helped us move a step forward in achieving this goal. Reinforcement learning using a delayed reward mechanism is an approach to machine intelligence that studies decision making and control, and how a decision-making agent can learn to act optimally under environment-unaware conditions.

Q-learning is one of the model-free reinforcement learning strategies; it uses temporal differences to estimate the performance of state-action pairs, called Q values. A simple implementation of the Q-learning algorithm can be done using a Q table in memory to store and update the Q values. However, as the state space grows due to a complex environment and the number of possible actions an agent can perform increases, the Q table reaches its space limit and becomes difficult to scale. Q-learning with neural networks eliminates the use of the Q table by approximating the Q function with a neural network.
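
The table-based update described above can be sketched in a few lines; the grid size, hyperparameters, and transition shown are placeholders rather than any particular rover environment.

import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Temporal-difference update toward the bootstrapped target r + gamma * max_a' Q(s', a').
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Q table indexed by (state, action); a neural network approximates Q(s, a)
# when the state space becomes too large for an explicit table.
Q = np.zeros((16, 4))
Q = q_learning_step(Q, s=0, a=1, r=0.0, s_next=1)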

Autonomous agents need to develop cognitive properties and become self-adaptive to be deployable in any environment. Reinforcement learning with Q-learning has been very effective in solving such problems. However, embedded systems like space rovers and autonomous robots rarely implement such techniques due to constraints on processing power, chip area, convergence rate, and the cost of the chip. These problems present a need for a portable, low-power, area-efficient hardware accelerator to speed up such learning.

This problem is targeted by implementing a hardware schematic architecture for Q-learning using artificial neural networks. The architecture exploits the massive parallelism of the neural network together with the dedicated fine-grained parallelism provided by a Field Programmable Gate Array (FPGA), thereby processing the Q values at a high throughput. Mars exploration rovers currently use Xilinx space-grade FPGA devices for image processing, pyrotechnic operation control, and obstacle avoidance. The hardware resource consumption of the architecture has been synthesized with the Xilinx Virtex-7 FPGA as the target device.

Date Created
  • 2016

Haptic perception, decision-making, and learning for manipulation with artificial hands

Description

Robotic systems are outmatched by the abilities of the human hand to perceive and manipulate the world. Human hands are able to physically interact with the world to perceive, learn, and act to accomplish tasks. Limitations of robotic systems to interact with and manipulate the world diminish their usefulness. In order to advance robot end effectors, specifically artificial hands, rich multimodal tactile sensing is needed. In this work, a multi-articulating, anthropomorphic robot testbed was developed for investigating tactile sensory stimuli during finger-object interactions. The artificial finger is controlled by a tendon-driven remote actuation system that allows for modular control of any tendon-driven end effector and capabilities for both speed and strength. The artificial proprioception system enables direct measurement of joint angles and tendon tensions while temperature, vibration, and skin deformation are provided by a multimodal tactile sensor. Next, attention was focused on real-time artificial perception for decision-making. A robotic system needs to perceive its environment in order to make decisions. Specific actions such as “exploratory procedures” can be employed to classify and characterize object features. Prior work on offline perception was extended to develop an anytime predictive model that returns the probability of having touched a specific feature of an object based on minimally processed sensor data. Developing models for anytime classification of features facilitates real-time action-perception loops. Finally, by combining real-time action-perception with reinforcement learning, a policy was learned to complete a functional contour-following task: closing a deformable ziplock bag. The approach relies only on proprioceptive and localized tactile data. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards within a finite time period by balancing exploration versus exploitation of the action space. Performance of the C-MAB learner was compared to a benchmark Q-learner that eventually returns the optimal policy. To assess robustness and generalizability, the learned policy was tested on variations of the original contour-following task. The work presented contributes to the full range of tools necessary to advance the abilities of artificial hands with respect to dexterity, perception, decision-making, and learning.
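
In the spirit of the C-MAB learner described above, the following is an illustrative epsilon-greedy contextual bandit with a per-action linear reward model; the context features, action set, and learning rate are placeholders, not the contour-following task itself.

import numpy as np

class EpsilonGreedyCMAB:
    def __init__(self, n_actions, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.w = np.zeros((n_actions, n_features))    # per-action linear reward estimates
        self.epsilon, self.lr = epsilon, lr
        self.rng = np.random.default_rng(seed)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the current estimates.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.w)))
        return int((self.w @ context).argmax())

    def update(self, context, action, reward):
        # Move the chosen action's reward estimate toward the observed reward.
        error = reward - self.w[action] @ context
        self.w[action] += self.lr * error * context

# Usage: pick an action from tactile/proprioceptive features, then update with the reward.
bandit = EpsilonGreedyCMAB(n_actions=3, n_features=4)
context = np.array([0.2, 0.0, 1.0, 0.5])
a = bandit.select(context)
bandit.update(context, a, reward=1.0)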

Date Created
  • 2016