Robots are often used in long-duration scenarios, such as on the surface of Mars, where they may need to adapt to environmental changes. Traditionally, robots have been built for single tasks, such as moving boxes in a warehouse or surveying construction sites. However, there is a modern trend away from human hand-engineering and toward robot learning. In this view, the ideal robot is not engineered, but automatically designed for a specific task. This thesis focuses on robots that learn path-planning algorithms for specific environments. Learning is accomplished via genetic programming. Path-planners are represented as Python code, which is optimized via Pareto evolution. These planners are encouraged to explore curiously and efficiently. This research asks two questions: “How can robots exhibit lifelong learning, adapting robustly to changing environments?” and “How can robots learn to be curious?”
Optimal foraging theory provides a suite of tools that model the best way for an animal to structure its searching and processing decisions in uncertain environments. It has been successful in characterizing real patterns of animal decision making, thereby providing insights into why animals behave the way they do. However, it does not speak to how animals make decisions that tend to be adaptive. Using simulation studies, prior work has shown empirically that a simple decision-making heuristic tends to produce prey-choice behaviors that, on average, match the behaviors predicted by optimal foraging theory. That heuristic chooses to spend time processing an encountered prey item if that prey item's marginal rate of caloric gain (in calories per unit of processing time) is greater than the forager's current long-term rate of accumulated caloric gain (in calories per unit of total searching and processing time). Although this heuristic may seem intuitive, a rigorous mathematical argument for why it tends to produce the behavior predicted by optimal foraging theory has not been developed. In this thesis, an analytical argument is given for why this simple decision-making heuristic is expected to realize the optimal performance predicted by optimal foraging theory. This theoretical guarantee not only provides support for why such a heuristic might be favored by natural selection, but it also provides support for why such a heuristic might be a reliable tool for decision-making in autonomous engineered agents moving through theatres of uncertain rewards. Ultimately, this simple decision-making heuristic may provide a recipe for reinforcement learning in small robots with limited computational capabilities.
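
The rate-comparison heuristic described above can be illustrated with a minimal Python sketch. This is not the thesis's simulation code: the function names (`should_process`, `simulate_forager`), the fixed search time per encounter, and the uniform random draw of prey types are all assumptions made for illustration.

```python
import random


def should_process(calories, processing_time, long_term_rate):
    """Heuristic rule: process the item only if its marginal rate of
    caloric gain exceeds the forager's current long-term rate."""
    return calories / processing_time > long_term_rate


def simulate_forager(prey_types, search_time=1.0, n_encounters=10000, seed=0):
    """Run a forager that applies should_process at every encounter.

    prey_types: list of (calories, processing_time) tuples.
    Assumptions (not from the source): each encounter costs a fixed,
    positive search_time, and the encountered type is drawn uniformly.
    Returns (long-term rate, number of accepted items per prey type).
    """
    rng = random.Random(seed)
    total_calories = 0.0
    total_time = 0.0
    accepted = [0] * len(prey_types)

    for _ in range(n_encounters):
        # Time spent searching before the next encounter.
        total_time += search_time
        index = rng.randrange(len(prey_types))
        calories, processing_time = prey_types[index]

        # Long-term rate: calories per unit of total search + processing time.
        long_term_rate = total_calories / total_time

        if should_process(calories, processing_time, long_term_rate):
            total_calories += calories
            total_time += processing_time
            accepted[index] += 1

    return total_calories / total_time, accepted


if __name__ == "__main__":
    # Hypothetical environment: one rich, quick item and one poor, slow item.
    rate, accepted = simulate_forager([(10.0, 1.0), (1.0, 10.0)])
    print("long-term rate:", rate)
    print("accepted per prey type:", accepted)
```

In this sketch the forager needs only two running totals (calories gained and time elapsed), which is in keeping with the abstract's point that the heuristic may suit small robots with limited computational capabilities.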