<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-22T05:11:53Z</responseDate><request verb="GetRecord" metadataPrefix="oai_dc">https://keep.lib.asu.edu/oai/request</request><GetRecord><record><header><identifier>oai:keep.lib.asu.edu:node-193002</identifier><datestamp>2024-12-23T18:01:48Z</datestamp><setSpec>oai_pmh:repo_items</setSpec></header><metadata><oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:identifier>193002</dc:identifier>
          <dc:identifier>https://hdl.handle.net/2286/R.2.N.193002</dc:identifier>
                  <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
          <dc:rights>All Rights Reserved</dc:rights>
                  <dc:date>2024</dc:date>
          <dc:date>2026-05-01T09:24:11</dc:date>
                  <dc:format>50 pages</dc:format>
                  <dc:type>Masters Thesis</dc:type>
          <dc:type>Academic theses</dc:type>
          <dc:type>Text</dc:type>
                  <dc:language>eng</dc:language>
                  <dc:contributor>Kim, Hyohun</dc:contributor>
          <dc:contributor>Xu, Zhe ZX</dc:contributor>
          <dc:contributor>Lee, Hyunglae HL</dc:contributor>
          <dc:contributor>Berman, Spring SB</dc:contributor>
          <dc:contributor>Arizona State University</dc:contributor>
                  <dc:description>Partial requirement for: M.S., Arizona State University, 2024</dc:description>
          <dc:description>Field of study: Engineering</dc:description>
          <dc:description>Multi-agent reinforcement learning (MARL) plays a pivotal role in artificial intelligence by facilitating the learning process in complex environments inhabited by multiple entities. This thesis explores the integration of learning high-level knowledge through reward machines (RMs) with MARL to effectively manage non-Markovian reward functions in non-cooperative stochastic games. Reward machines offer a sophisticated way to model the temporal structure of rewards, thereby providing an enhanced representation of agent decision-making processes. A novel algorithm, JIRP-SG, is introduced, enabling agents to concurrently learn RMs and optimize their best response policies while navigating the intricate temporal dependencies present in non-cooperative settings. This approach employs automata learning to iteratively acquire RMs and utilizes the Lemke-Howson method to update the Q-functions, aiming for a Nash equilibrium. The method is shown to converge reliably, accurately encoding the reward functions and achieving the optimal best response policy for each agent over time. The effectiveness of the proposed approach is validated through case studies, including a Pacman Game scenario and a Factory Assembly scenario, illustrating its superior performance compared to baseline methods. Additionally, the impact of batch size on learning performance is examined, revealing that a diligent agent employing smaller batches can surpass the performance of an agent using larger batches, which fails to summarize experiences as effectively.</dc:description>
                  <dc:subject>Artificial Intelligence</dc:subject>
          <dc:subject>Robotics</dc:subject>
          <dc:subject>Reinforcement Learning</dc:subject>
          <dc:subject>Reward Machine</dc:subject>
                  <dc:title>Joint Learning of Reward Machines and Policies for Multi-Agent Reinforcement Learning in Non-Cooperative Stochastic Games</dc:title></oai_dc:dc></metadata></record></GetRecord></OAI-PMH>
