SwarmNet: A Graph Based Learning Framework for Creating and Understanding Multi-Agent System Behaviors

Description

A swarm is a group of interacting agents that exhibits complex collective behaviors. Higher-level behavioral patterns of the group are believed to emerge from simple low-level decision-making rules at the agent level. With the potential application of swarms of aerial drones, underwater robots, and other multi-robot systems, there has been increasing interest in approaches for specifying complex collective behavior for artificial swarms. Traditional methods for creating artificial multi-agent behaviors inspired by known swarms analyze the underlying dynamics and hand-craft the low-level control logic that gives rise to the emergent behaviors. Deep learning methods offer an alternative: approximating the behaviors through optimization, with little human intervention.

This thesis proposes a graph-based neural network architecture, SwarmNet, for learning the swarming behaviors of multi-agent systems. Given only observations of the trajectories of an expert multi-agent system, SwarmNet is able to learn sensible representations of the internal low-level interactions, in addition to approximating the high-level behaviors and making long-term predictions of the motion of the system. Challenges in scaling SwarmNet, and graph neural networks in general, are discussed in detail, and measures to alleviate the scaling issue in generalization are proposed. Using the trained network as a control policy, it is shown that combining imitation learning and reinforcement learning improves the policy more efficiently. It is also shown that, to some extent, the low-level interactions are successfully identified and separated, and that the separated functionality enables finely controlled custom training.