Matching Items (7)
Description
Learning by trial and error requires retrospective information about whether a past action resulted in a rewarded outcome. The previous outcome may in turn provide information to guide future behavioral adjustment. However, the specific contribution of this information to learning a task, and the neural representations during the trial-and-error learning process, are not well understood. In this dissertation, such learning is analyzed by means of single-unit neural recordings in the rats' agranular medial (AGm) and agranular lateral (AGl) motor areas while the rats learned to perform a directional choice task. Multichannel chronic recordings using microelectrodes implanted in the rat's brain were essential to this study. In addition, for fundamental scientific investigations in general and for some applications such as brain-machine interfaces, the recorded neural waveforms must first be analyzed to identify neural action potentials as basic computing units. Prior to analyzing and modeling the recorded neural signals, this dissertation proposes an advanced spike sorting system, the M-Sorter, to extract the action potentials from raw neural waveforms. The M-Sorter shows better or comparable performance compared with two other popular spike sorters in automatic mode. With the sorted action potentials in place, neuronal activity in the AGm and AGl areas of rats during learning of a directional choice task is examined. Systematic analyses suggest that the rats' neural activity in AGm and AGl was modulated by previous trial outcomes during learning. Single-unit-based neural dynamics during task learning are described in detail in the dissertation. Furthermore, the differences in neural modulation between fast- and slow-learning rats were compared. The results show that the level of neural modulation by previous trial outcome differs between fast- and slow-learning rats, which may in turn suggest an important role for previous trial outcome encoding in learning.
Contributors: Yuan, Yu'an (Author) / Si, Jennie (Thesis advisor) / Buneo, Christopher (Committee member) / Santello, Marco (Committee member) / Chae, Junseok (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Animals learn to choose a proper action among alternatives according to the circumstances. Through trial and error, animals improve their odds by making correct associations between their behavioral choices and external stimuli. While there is an extensive literature on the theory of learning, it is still unclear how individual neurons and a neural network adapt as learning progresses. In this dissertation, single units in the medial and lateral agranular (AGm and AGl) cortices were recorded as rats learned a directional choice task. The task required the rat to press the left or right lever when a light cue appeared on the corresponding side of the interface panel. Behavior analysis showed that the rats' movement parameters during performance of directional choices became stereotyped very quickly (2-3 days), while learning to solve the directional choice problem took weeks. The entire learning process was further broken down into three stages, each having a similar number of recording sessions (days). Single-unit-based firing rate analysis revealed that 1) directional rate modulation was observed in both cortices; 2) the averaged mean rate between left and right trials in the neural ensemble each day did not change significantly among the three learning stages; 3) the rate difference between left and right trials of the ensemble did not change significantly either. In addition, for either left or right trials, the trial-to-trial firing variability of single neurons did not change significantly over the three stages. To explore the spatiotemporal neural pattern of the recorded ensemble, support vector machines (SVMs) were constructed each day to decode the direction of choice in single trials. Improved classification accuracy indicated enhanced discriminability between neural patterns of left and right choices as learning progressed. When a restricted Boltzmann machine (RBM) model was used to extract features from neural activity patterns, the results further supported the idea that neural firing patterns adapted during the three learning stages to facilitate the neural codes of directional choices. Taken together, these findings suggest a spatiotemporal neural coding scheme in a rat AGl and AGm neural ensemble that may be responsible for, and contribute to, learning the directional choice task.
Contributors: Mao, Hongwei (Author) / Si, Jennie (Thesis advisor) / Buneo, Christopher (Committee member) / Cao, Yu (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Peripheral Vascular Disease (PVD) is a debilitating chronic disease of the lower extremities, particularly affecting older adults and diabetics. It reduces blood flow to peripheral tissue and sometimes causes tissue damage, such that PVD patients suffer from pain in the lower legs, thighs, and buttocks after activity. Electrical neurostimulation based on the "Gate Theory of Pain" is a known way to reduce pain, but current devices for this purpose are bulky and not well suited to implantation in peripheral tissues. There is also an increased risk associated with surgery, which limits the use of these devices. This research has designed and constructed wireless ultrasound-powered microstimulators that are much smaller and injectable, and so involve less implantation trauma. These devices are small enough to fit through an 18-gauge syringe needle, increasing their potential for clinical use. These piezoelectric microdevices convert mechanical energy into electrical energy that is then used to block pain. The design and performance of these miniaturized devices were modeled by computer, while constructed devices were evaluated in animal experiments. The devices are capable of producing 500 ms pulses with an intensity of 2 mA into a 2 kilo-ohm load. Using the rat as an animal model, a series of experiments was conducted to evaluate the in-vivo performance of the devices.
Contributors: Zong, Xi (Author) / Towe, Bruce (Thesis advisor) / Kleim, Jeffrey (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2014
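
To put the output figures quoted in the abstract above into perspective, the following Python sketch works out the implied voltage, power, and per-pulse energy. It assumes a rectangular constant-current pulse into a purely resistive load, which the abstract does not state; the function name `pulse_figures` and its parameter names are illustrative only.

```python
def pulse_figures(current_a, load_ohm, pulse_s):
    """Return (voltage in V, power in W, energy per pulse in J).

    Assumes a rectangular constant-current pulse into a purely
    resistive load (an assumption, not stated in the abstract).
    """
    voltage_v = current_a * load_ohm        # Ohm's law: V = I * R
    power_w = current_a ** 2 * load_ohm     # P = I^2 * R
    energy_j = power_w * pulse_s            # E = P * t for a flat pulse
    return voltage_v, power_w, energy_j


# Figures quoted in the abstract: 2 mA, 2 kilo-ohm load, 500 ms pulse.
voltage, power, energy = pulse_figures(2e-3, 2e3, 0.5)
print(f"{voltage:.1f} V, {power * 1e3:.1f} mW, {energy * 1e3:.1f} mJ per pulse")
```

Under these assumptions the quoted figures correspond to roughly 4 V across the load, 8 mW during the pulse, and 4 mJ per pulse.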
Description
The primary motor cortex (M1) plays a vital role in motor planning and execution, as well as in motor learning. Baseline corticospinal excitability (CSE) in M1 is known to increase as a result of motor learning, but less is understood about how learning modulates CSE at the pre-execution planning stage. This question was addressed using single-pulse transcranial magnetic stimulation (TMS) to measure the modulation of both baseline and planning CSE due to learning a reach-to-grasp task. It was hypothesized that baseline CSE would increase and planning CSE would decrease as a function of trial; an increase in baseline CSE would replicate established findings in the literature, while a decrease in planning CSE would be a novel finding. Eight right-handed subjects were visually cued to exert a precise grip force, with the goal of producing that force accurately and consistently. Subjects effectively learned the task in the first 10 trials, but no significant trends were found in the modulation of baseline or planning CSE. The lack of significant results may be due to the very quick learning phase or the lower intensity of training compared to past studies. The findings presented here suggest that planning and baseline CSE may be modulated along different time courses as learning occurs, and point to some important considerations for future studies addressing this question.
Contributors: Moore, Dalton Dale (Author) / Santello, Marco (Thesis director) / Kleim, Jeff (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)
Created: 2015-05
Description
A previous study demonstrated that learning to lift an object is context-based and that, in the presence of both memory and visual cues, the acquired sensorimotor memory to manipulate an object in one context interferes with the performance of the same task in the presence of visual information about a different context (Fu et al., 2012).
The purpose of this study is to determine whether the primary motor cortex (M1) plays a role in sensorimotor memory. It was hypothesized that temporary disruption of M1 following learning to minimize tilt using an 'L'-shaped object would negatively affect the retention of sensorimotor memory and thus reduce interference between the memory acquired in one context and the visual cues to perform the same task in a different context.
Significant findings were shown in blocks 1, 2, and 4. In block 3, subjects displayed an insignificant amount of learning. However, it cannot be concluded that there is full interference in block 3. Therefore, three effects were examined in the statistical analysis: the main effect of block, the main effect of trial, and the block × trial interaction. The block effect has a p-value of 0.001, and the trial effect has a p-value of less than 0.001. Both of these effects indicate that learning is occurring. However, the block × trial interaction has a p-value of 0.002 (< 0.05), indicating a significant interaction between sensorimotor memories. Based on these results, interference is present in all the blocks, but not enough to justify the use of TMS to reduce interference, because there is a partial reduction of interference from the control experiment. The time delay between context switches may be the issue. By reducing the time delay between blocks 2 and 3 from 10 minutes to 5 minutes, I hope to see significant learning occur from the first trial to the second trial.
Contributors: Hasan, Salman Bashir (Author) / Santello, Marco (Thesis director) / Kleim, Jeffrey (Committee member) / Helms Tillery, Stephen (Committee member) / Barrett, The Honors College (Contributor) / W. P. Carey School of Business (Contributor) / Harrington Bioengineering Program (Contributor)
Created: 2014-05
Description
This dissertation focuses on reinforcement learning (RL) controller design aimed at real-life applications in continuous state and control problems. It involves three major research investigations spanning design, analysis, implementation, and evaluation. The application case addresses automatically configuring robotic prosthesis impedance parameters. Major contributions of the dissertation include the following. 1) An "echo control" using the intact knee profile as the target is designed to overcome the limitation of a designer-prescribed robotic knee profile. 2) Collaborative multiagent reinforcement learning (cMARL) is proposed to directly take into account human influence in the robot control design. 3) A phased actor in actor-critic (PAAC) reinforcement learning method is developed to reduce learning variance in RL. The design of the "echo control" is based on a new formulation of direct heuristic dynamic programming (dHDP) for tracking control of a robotic knee prosthesis to mimic the intact knee profile. A systematic simulation of the proposed control is provided using a human-robot system simulation in OpenSim. The tracking controller is then tested on able-bodied and amputee subjects. This is the first real-time human testing of RL tracking control of a robotic knee to mirror the profile of an intact knee. The cMARL is a new solution framework for the human-prosthesis collaboration (HPC) problem. This is the first attempt at considering human influence on human-robot walking in the presence of an RL-controlled lower limb prosthesis. Results show that treating the human and robot as coupled, collaborating agents and using an estimated human adaptation in robot control design help improve human walking performance. The above studies have demonstrated the great potential of RL control in solving continuous problems. For more complex real-life tasks with multiple control inputs and high-dimensional state spaces, high variance, low data efficiency, slow learning, and even instability are major roadblocks to be addressed. A novel PAAC method is proposed to improve learning performance in policy gradient RL by accounting for both the Q value and the TD error in actor updates. Systematic and comprehensive demonstrations show its effectiveness through qualitative analysis and quantitative evaluation in the DeepMind Control Suite.
Contributors: Wu, Ruofan (Author) / Si, Jennie (Thesis advisor) / Huang, He (Committee member) / Santello, Marco (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2023
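
The abstract above mentions using both the Q value and the TD error in actor updates. As a reminder of what the second quantity is, here is a minimal Python sketch of the textbook one-step temporal-difference error. It is not the PAAC update itself, which the abstract does not spell out; the function name `td_error` and all argument names are illustrative assumptions.

```python
def td_error(reward, gamma, value_s, value_next, done=False):
    """Textbook one-step TD error: r + gamma * V(s') - V(s).

    When the episode has ended (done=True) there is no successor
    state to bootstrap from, so the gamma * V(s') term is dropped.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


# Hypothetical numbers: reward 1.0, discount 0.9, V(s) = 2.0, V(s') = 3.0.
delta = td_error(reward=1.0, gamma=0.9, value_s=2.0, value_next=3.0)
print(round(delta, 3))
```

A positive TD error means the outcome was better than the critic's current estimate predicted; a negative one means it was worse.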
Description
Robotic lower limb prostheses provide new opportunities to help transfemoral amputees regain mobility. However, their application is impeded by the need for prosthetists to manually tune and optimize the impedance control parameters for each individual user in different task environments. Reinforcement learning (RL) is capable of learning automatically from interaction with the environment, which makes it a natural candidate to replace human prosthetists in customizing the control parameters. However, neither traditional RL approaches nor the popular deep RL approaches are readily suitable for learning from a limited number of samples or from samples with large variations. This dissertation aims to explore new RL-based adaptive solutions that are data-efficient for controlling robotic prostheses.

This dissertation begins by proposing a new flexible policy iteration (FPI) framework. To improve sample efficiency, FPI can utilize either on-policy or off-policy learning strategies, can learn from either online or offline data, and can even adopt existing knowledge from an external critic. Approximate convergence to Bellman optimal solutions is guaranteed under mild conditions. Simulation studies validated that FPI was data-efficient compared to several established RL methods. Furthermore, a simplified version of FPI was implemented to learn from offline data, and the learned policy was then successfully tested for tuning the control parameters online on a human subject.

Next, the dissertation discusses RL control with information transfer (RL-IT), or knowledge-guided RL (KG-RL), which is motivated by the benefit of transferring knowledge acquired from one subject to another. To explore its feasibility, knowledge was extracted from data measurements of able-bodied (AB) subjects and transferred to guide Q-learning control for an amputee in OpenSim simulations. This result again demonstrated that data and time efficiency were improved by using previous knowledge.

While the present study is new and promising, there are still many open questions to be addressed in future research. To account for human adaptation, the learning control objective function may be designed to incorporate human-prosthesis performance feedback such as symmetry, user comfort level and satisfaction, and user energy consumption. To make RL-based control parameter tuning practical in real life, it should be further developed and tested in different use environments, such as from level-ground walking to stair ascending or descending, and from walking to running.
Contributors: Gao, Xiang (Author) / Si, Jennie (Thesis advisor) / Huang, He Helen (Committee member) / Santello, Marco (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2020
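
The abstract above refers to Q-learning control guided by previously acquired knowledge. For readers unfamiliar with the underlying update, the following Python sketch shows the standard tabular Q-learning step. It is a generic illustration under stated assumptions, not the dissertation's FPI or KG-RL algorithm: the function `q_learning_update`, the state and action labels, and the learning-rate and discount values are all hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the new Q(s, a).

    q is a mapping from (state, action) pairs to value estimates.
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    # Greedy estimate of the value of the successor state.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
actions = ["increase", "decrease"]
q_table = defaultdict(float)
# One possible (assumed) way to inject prior knowledge: seed an entry
# before learning starts instead of leaving the table at zero.
q_table[("s1", "increase")] = 2.0
new_value = q_learning_update(q_table, "s0", "increase", 1.0, "s1", actions)
print(round(new_value, 3))
```

Seeding the table is only one simple way prior knowledge could enter such a learner; how the dissertation performs the transfer is not described in this abstract.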