This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations and theses includes degree information, committee members, an abstract, and supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
In this thesis work, a novel learning approach to solving the problem of controlling a quadcopter (drone) swarm is explored. To deal with large swarm sizes, swarm control is often achieved in a distributed fashion by combining different behaviors such that each behavior implements some desired swarm characteristic, such as avoiding obstacles or staying close to neighbors. One common approach in distributed swarm control uses potential fields. A limitation of this approach is that the potential fields often depend statically on a set of control parameters that are manually specified a priori. This thesis introduces Dynamic Potential Fields for flexible swarm control. These potential fields are modulated by a set of dynamic control parameters (DCPs) that can change under different environmental conditions. Because learning focuses only on these DCPs, the learning problem is simplified and becomes feasible for practical use. This approach uses soft actor-critic (SAC), where the actor determines only how to modify the DCPs in the current situation, resulting in more flexible swarm control. The results show that the DCP approach allows drones to traverse environments with obstacles better than several state-of-the-art swarm control methods that use a fixed set of control parameters. The approach also obtains a higher score on a safety metric commonly used to assess swarm behavior. A comparison against a basic reinforcement learning approach demonstrates faster convergence. Finally, an ablation study validates the design of the approach.
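To make the idea concrete, the following is a minimal sketch, in Python, of a potential-field controller whose gains act as the dynamic control parameters described above. Every name here (potential_field_velocity, the gain keys, the influence radius, the gain bounds) is an illustrative assumption, not the thesis's actual implementation; the trained SAC actor is stubbed as an arbitrary policy callable that outputs DCP deltas.

import numpy as np

# Hypothetical illustration of the DCP idea: a per-drone potential-field
# controller with three gains. In the static baseline the gains are fixed
# a priori; in the DCP approach a learned policy (e.g. a SAC actor) outputs
# small updates to them at every step. All names are assumptions.

def potential_field_velocity(pos, goal, obstacles, neighbors, dcp):
    """Combine attractive, repulsive, and cohesion fields into a velocity."""
    # Attractive field pulls the drone toward the goal.
    v = dcp["k_att"] * (goal - pos)

    # Repulsive fields push the drone away from nearby obstacles.
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff) + 1e-9
        if d < 2.0:  # influence radius (assumed)
            v += dcp["k_rep"] * diff / d**3

    # Cohesion field keeps the drone close to its neighbors' centroid.
    if len(neighbors) > 0:
        centroid = np.mean(neighbors, axis=0)
        v += dcp["k_coh"] * (centroid - pos)
    return v

def step(pos, goal, obstacles, neighbors, dcp, policy=None, obs_vec=None, dt=0.1):
    """Advance one drone by one control step, optionally modulating the DCPs."""
    if policy is not None:
        # The actor only outputs deltas for the few DCPs, which is what keeps
        # the learning problem small compared to learning raw motor commands.
        delta = policy(obs_vec)
        for key, dk in zip(sorted(dcp), delta):
            dcp[key] = float(np.clip(dcp[key] + dk, 0.0, 5.0))  # bounds assumed
    return pos + dt * potential_field_velocity(pos, goal, obstacles, neighbors, dcp)

Passing policy=None reproduces the fixed-parameter baseline the abstract compares against; supplying a policy turns the same controller into the dynamic variant, since only the gains change while the field structure stays intact.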
This work has improved the quality of the solution to the sparse rewards problem by combining reinforcement learning (RL) with knowledge-rich planning. Classical methods for coping with sparse rewards during reinforcement learning modify the reward landscape so as to better guide the learner. In contrast, this work combines RL with a planner in order to utilize other information about the environment. Because the scope for representing environmental information is limited in RL, this work integrates a model-free learning algorithm, temporal difference (TD) learning, with a Hierarchical Task Network (HTN) planner to accommodate rich environmental information in the algorithm. In the perpetual sparse rewards problem, rewards reemerge after being collected within a fixed interval of time, so there is no well-defined goal state to serve as an exit condition for the problem. Incorporating planning into the learning algorithm not only improves the quality of the solution; it also avoids the ambiguity of encoding a profit-maximization goal when using a planning algorithm alone to solve this problem. By occasionally invoking the HTN planner, the algorithm provides the necessary nudge toward the optimal solution. In this work, I demonstrate an on-policy algorithm that improves the quality of the solution over vanilla reinforcement learning. The objective of this work has been to observe the capacity of the synthesized algorithm to find optimal policies that maximize rewards while maintaining awareness of the environment and of other agents in the vicinity.
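As a rough illustration of the combination described above, here is a minimal sketch of on-policy TD learning (SARSA-style, matching the on-policy claim) that occasionally defers action selection to an HTN planner. The environment interface (reset, step, actions), the planner's suggest_action method, and the plan_rate parameter are assumptions for illustration, not the thesis's actual code.

import random

# Hypothetical sketch: tabular on-policy TD (SARSA) learning in which a
# fraction of action choices is delegated to a knowledge-rich HTN planner.
# env and htn_planner interfaces are assumed for illustration.

def td_with_htn(env, htn_planner, episodes=500, alpha=0.1, gamma=0.95,
                epsilon=0.1, plan_rate=0.2):
    Q = {}  # state-action values keyed by (state, action)

    def q(s, a):
        return Q.get((s, a), 0.0)

    def choose(s):
        # Occasionally consult the HTN planner for guidance...
        if random.random() < plan_rate:
            return htn_planner.suggest_action(s)  # assumed planner API
        # ...otherwise act epsilon-greedily on the learned values.
        if random.random() < epsilon:
            return random.choice(env.actions(s))
        return max(env.actions(s), key=lambda a: q(s, a))

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s2, r, done = env.step(a)  # r is zero almost everywhere (sparse)
            a2 = choose(s2) if not done else None
            # On-policy TD target: bootstrap on the action actually taken next.
            target = r + (gamma * q(s2, a2) if not done else 0.0)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s, a = s2, a2
    return Q

Setting plan_rate=0 recovers plain SARSA, the vanilla baseline the abstract compares against; the occasional planner calls are what inject the environment knowledge that the sparse reward signal alone does not provide.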