Matching Items (253)
Description
This work investigates multi-agent reinforcement learning methods that are applicable to real-world scenarios involving stochastic, partially observable, and infinite-horizon problems. These problems are hard due to large state and control spaces and may require some form of intelligent multi-agent behavior to achieve the target objective. The study also introduces novel rollout-based methods that provide reasonable guarantees of cost improvement and obtain a sub-optimal solution to such problems while being amenable to distributed computation, and hence faster runtimes. These methods, first introduced and developed for single-agent scenarios, are gradually extended to multi-agent variants and are named multi-agent rollout methods. The problems studied in this work each target one or more of three major challenges of real-world problems: the Spider-and-Fly problem deals with stochastic environments, the multi-robot repair problem is an example of a partially observable Markov decision process (POMDP), and the Flatland challenge is an RL benchmark that aims to solve the vehicle rescheduling problem. The study also includes comparisons to existing methods widely used for such problems, such as POMCP, DESPOT, and MADDPG. The work further delineates the behaviors arising from our methods and compares them to those of existing methods, thereby demonstrating the efficacy of our rollout-based methods in solving real-world multi-agent reinforcement learning problems. Additionally, the source code and problem environments have been released for the community to further research in this field. The source code and the related research can be found at https://sahilbadyal.com/marl.
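
The one-agent-at-a-time scheme at the heart of multi-agent rollout lends itself to a compact illustration. The following is a minimal, hedged sketch, not the released code: each agent in turn optimizes its own action while earlier agents keep their already-chosen actions and later agents are assumed to follow the base policy. The callables `actions`, `base_policy`, and `simulate_cost` (standing in for a Monte Carlo estimate of the expected cost-to-go) are assumed interfaces for illustration.

```python
def multiagent_rollout(state, agents, actions, base_policy, simulate_cost):
    """Hedged sketch of one-agent-at-a-time rollout.

    Agents choose actions in sequence: earlier agents keep their
    already-chosen actions, the current agent tries each candidate, and
    later agents are assumed to follow the base policy.
    """
    chosen = {}
    for i, agent in enumerate(agents):
        best_action, best_cost = None, float("inf")
        for a in actions(agent, state):
            joint = dict(chosen)                  # earlier agents: fixed
            joint[agent] = a                      # current agent: candidate
            for later in agents[i + 1:]:          # later agents: base policy
                joint[later] = base_policy(later, state)
            cost = simulate_cost(state, joint)
            if cost < best_cost:
                best_action, best_cost = a, cost
        chosen[agent] = best_action
    return chosen
```

Because each agent minimizes over only its own action set rather than the joint action space, the per-step work grows linearly instead of exponentially in the number of agents, and each agent's candidate evaluations can run in parallel, which is the source of the distributed-computation speedup described above.
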
Contributors: Badyal, Sahil (Author) / Gil, Stephanie Dr. (Thesis advisor) / Bertsekas, Dimitri Dr. (Committee member) / Yang, Yingzhen Dr. (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Computer science education is an increasingly vital area of study, with various challenges that raise the difficulty level for new students and result in higher attrition rates. As part of an effort to resolve this issue, a new visual programming language environment was developed for this research: the Visual IoT and Robotics Programming Language Environment (VIPLE). VIPLE is based on computational thinking and flowcharts, which reduce the need to memorize the detailed syntax of text-based programming languages. VIPLE has been used at Arizona State University (ASU) across multiple years and sections of FSE100, as well as in universities worldwide. Another major issue with teaching large programming classes is the potential lack of qualified teaching assistants to grade and offer insight into a student's programs at a level beyond output analysis.

In this dissertation, I propose a novel framework for performing semantic autograding, which analyzes student programs at a semantic level to give students additional, systematic help. A general autograder is not practical for general programming languages due to the flexibility of their semantics. A practical autograder is possible in VIPLE because of its simplified syntax and restricted semantic options. The design of this autograder is based on the concept of theorem provers. To achieve this goal, I employ a modified version of Pi-Calculus to represent VIPLE programs and Hoare Logic to formalize program requirements. By building on the inference rules of Pi-Calculus and Hoare Logic, I am able to construct a theorem prover that can perform automated semantic analysis. Furthermore, building on this theorem prover enables me to develop a self-learning algorithm that can learn the conditions for a program's correctness from a given solution program.
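
As a hedged illustration of the Hoare-Logic side of this approach, the sketch below computes the weakest precondition of a straight-line assignment program by backward substitution (the standard Hoare assignment rule), which is one building block a semantic autograder can use to check a program against a formal requirement. This is not the VIPLE autograder; the use of sympy and the example program are assumptions for illustration only.

```python
import sympy as sp

def weakest_precondition(assignments, postcondition):
    """Backward substitution over a list of (variable, expression)
    assignments: the Hoare assignment rule {Q[e/x]} x := e {Q}."""
    condition = postcondition
    for var, expr in reversed(assignments):
        condition = condition.subs(sp.Symbol(var), sp.sympify(expr))
    return sp.simplify(condition)

x = sp.Symbol("x")
program = [("y", "x + 1"), ("x", "2 * y")]   # y := x + 1 ; x := 2 * y
required = sp.Eq(x, 8)                       # requirement: x == 8 afterwards
# Prints the condition that must hold on x before the program runs (x == 3).
print(weakest_precondition(program, required))
```

Checking whether the computed precondition follows from a stated precondition is then an entailment query, which is where the theorem prover over the combined Pi-Calculus/Hoare-Logic inference rules comes in.
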
Contributors: De Luca, Gennaro (Author) / Chen, Yinong (Thesis advisor) / Liu, Huan (Thesis advisor) / Hsiao, Sharon (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
The significance of real-world knowledge for Natural Language Understanding (NLU) has been well known for decades. With advancements in technology, challenging tasks like question answering, text summarization, and machine translation have been made possible by continuous efforts in the field of Natural Language Processing (NLP). Yet, integrating knowledge to answer commonsense questions is still a daunting task. Logical reasoning has been a resort for many problems in NLP and has achieved considerable results in the field, but it is difficult to resolve the ambiguities of natural language. Co-reference resolution is one problem where ambiguity arises due to the semantics of the sentence. Another is cause-and-effect statements, which require causal commonsense reasoning to resolve the ambiguity. Modeling these types of problems with rules or logic is not a simple task. State-of-the-art systems addressing these problems use trained neural network models, which are claimed to embody general knowledge from a huge training corpus. These systems answer questions by using the knowledge embedded in their trained language model. Although language models embed knowledge from the data, they rely on occurrences of words and the frequency of co-occurring words to resolve the prevailing ambiguity. This limits the performance of language models on commonsense reasoning tasks, as they generalize the concept rather than answering the problem specific to its context. For example, "The painting in Mark's living room shows an oak tree. It is to the right of a house" is a co-reference resolution problem that requires knowledge. Language models can resolve whether "it" refers to "painting" or "tree": since "house" and "tree" are commonly co-occurring words, the models resolve "tree" as the co-reference. On the other hand, "The large ball crashed right through the table because it was made of Styrofoam," where "it" can be either "table" or "ball", is difficult for a language model, as it requires more information about the problem.

In this work, I have built an end-to-end framework that uses knowledge automatically extracted for the problem at hand. This knowledge is combined with language models through an explicit reasoning module to resolve the ambiguity. The system is built to improve the accuracy of language-model-based approaches to commonsense reasoning, and it achieves state-of-the-art accuracy on the Winograd Schema Challenge.
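
The following is a minimal, hedged sketch of the knowledge-augmentation idea, not the thesis's actual pipeline: each candidate antecedent is substituted for the pronoun, scored by a language model, and boosted when an automatically extracted fact supports it. `lm_score` (an assumed LM plausibility scorer) and the `facts` strings are hypothetical interfaces for illustration.

```python
def resolve_pronoun(sentence, pronoun, candidates, lm_score, facts):
    """Hedged sketch of knowledge-augmented co-reference resolution.

    lm_score(text) -> float is an assumed language-model plausibility
    score; facts is an assumed collection of extracted knowledge
    strings such as "styrofoam is light".  Each candidate replaces the
    pronoun, is scored by the LM, and gets a bonus when a retrieved
    fact mentions it -- the explicit reasoning step on top of the LM.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        rewritten = sentence.replace(pronoun, candidate, 1)
        score = lm_score(rewritten)
        if any(candidate.lower() in fact.lower() for fact in facts):
            score += 1.0                  # assumed knowledge-support bonus
        if score > best_score:
            best, best_score = candidate, score
    return best
```

For the Styrofoam example above, a retrieved fact about Styrofoam being lightweight and fragile is what lets the explicit reasoning step override the LM's co-occurrence bias.
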
Contributors: Prakash, Ashok (Author) / Baral, Chitta (Thesis advisor) / Devarakonda, Murthy (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Many real-world planning problems can be modeled as Markov Decision Processes (MDPs), which provide a framework for handling uncertainty in the outcomes of action executions. A solution to such a planning problem is a policy that handles possible contingencies that could arise during execution. MDP solvers typically construct policies for a problem instance without reusing information from previously solved instances. Research in generalized planning has demonstrated the utility of constructing algorithm-like plans that reuse such information. However, using such techniques in an MDP setting has not been adequately explored.

This thesis presents a novel approach for learning generalized partial policies that can be used to solve problems with different object names and/or object quantities, using very few example policies for learning. The approach uses abstraction for state representation, which allows the identification of patterns in solutions, such as loops, that are agnostic to problem-specific properties. The thesis also presents theoretical results on the uniqueness and succinctness of the policies computed using such a representation. The presented algorithm can be used as a fast, yet greedy and incomplete, method for policy computation, falling back to a complete policy-search algorithm when needed. Extensive empirical evaluation on discrete MDP benchmarks shows that this approach generalizes effectively and is often able to solve problems much faster than existing state-of-the-art discrete MDP solvers. Finally, the practical applicability of the approach is demonstrated by incorporating it in an anytime stochastic task and motion planning framework to successfully construct free-standing tower structures using Keva planks.
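
To make the abstraction-plus-fallback structure concrete, here is a hedged toy sketch under assumed representations, not the thesis's algorithm: concrete states map object names to properties, the abstraction keeps only the set of property patterns (so learned rules transfer across object names and counts), and unseen abstract states fall back to a complete solver.

```python
def abstract_state(state):
    """Toy abstraction: drop object names, keep only the set of
    property patterns (hypothetical representation for illustration)."""
    return frozenset(tuple(sorted(props.items())) for props in state.values())

class GeneralizedPartialPolicy:
    """Abstract-state -> action rules learned from a few examples,
    falling back to a complete solver on states the rules miss."""

    def __init__(self, fallback_solver):
        self.rules = {}
        self.fallback_solver = fallback_solver

    def learn(self, examples):
        # examples: iterable of (concrete_state, action) pairs taken
        # from a small number of example policies.
        for concrete_state, action in examples:
            self.rules[abstract_state(concrete_state)] = action

    def act(self, concrete_state):
        key = abstract_state(concrete_state)
        if key in self.rules:
            return self.rules[key]                    # fast greedy lookup
        return self.fallback_solver(concrete_state)   # complete search
```

The greedy lookup is what makes the method fast but incomplete; the fallback preserves completeness when the learned rules do not apply.
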
Contributors: Kala Vasudevan, Deepak (Author) / Srivastava, Siddharth (Thesis advisor) / Zhang, Yu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Malicious hackers utilize the World Wide Web to share knowledge. Previous work has demonstrated that information mined from online hacking communities can be used as a precursor to cyber-attacks. In a threat landscape where security alert systems face high false-positive rates, understanding the people behind cyber incidents can help reduce the risk of attacks. However, the rapidly evolving nature of those communities leaves key questions largely unexplored: who are the skilled and influential individuals forming those groups, how do they self-organize along lines of technical expertise, how do ideas propagate within them, and which internal patterns can signal imminent cyber offensives? In this dissertation, I study four key parts of this complex problem set. Initially, I leverage content, social network, and seniority analysis to mine key-hackers on darkweb forums, identifying skilled and influential individuals who are likely to succeed in their cybercriminal goals. Next, as hackers often use Web platforms to advertise and recruit collaborators, I analyze how social influence contributes to user engagement online. On social media, two time constraints are proposed to extend standard influence measures, which increases their correlation with adoption probability and consequently improves hashtag-adoption prediction. On darkweb forums, the prediction of where and when hackers will post a message in the near future is accomplished by analyzing their recurrent interactions with other hackers. After that, I demonstrate how vendors of malware and malicious exploits organically form hidden organizations on darkweb marketplaces, obtaining significant consistency across the vendors' communities extracted from different networks using the similarity of their products. Finally, I predict imminent cyber-attacks by correlating malicious hacking activity on darkweb forums with real-world cyber incidents, showing that social indicators are crucial to the performance of the proposed model. This research is a hybrid of social network analysis (SNA), machine learning (ML), evolutionary computation (EC), and temporal logic (TL), presenting significant contributions to empower cyber defense.
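
The key-hacker mining step combines three signals, which can be illustrated with a hedged sketch. The blend below of a content score, reply-graph PageRank (influence), and account age (seniority), along with the weights, is an illustrative assumption, not the dissertation's calibrated model.

```python
import networkx as nx

def rank_key_hackers(posts, w_content=0.5, w_influence=0.3, w_seniority=0.2):
    """Hedged sketch of key-hacker ranking on a forum.

    posts: iterable of (author, replied_to, timestamp, quality) tuples,
    where quality stands in for a content-analysis score of the post.
    """
    graph = nx.DiGraph()
    first_seen, content = {}, {}
    for author, replied_to, ts, quality in posts:
        first_seen[author] = min(ts, first_seen.get(author, ts))
        content[author] = content.get(author, 0.0) + quality
        if replied_to is not None:
            graph.add_edge(author, replied_to)   # a reply endorses its target
    influence = nx.pagerank(graph) if graph.number_of_edges() else {}
    latest = max(first_seen.values())
    scores = {
        user: (w_content * content[user]
               + w_influence * influence.get(user, 0.0)
               + w_seniority * (latest - first_seen[user]))
        for user in first_seen
    }
    return sorted(scores, key=scores.get, reverse=True)
```
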
Contributors: Santana Marin, Ericsson (Author) / Shakarian, Paulo (Thesis advisor) / Doupe, Adam (Committee member) / Liu, Huan (Committee member) / Ferrara, Emilio (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Image segmentation is one of the most critical tasks in medical imaging; it identifies target segments (e.g., organs, tissues, lesions) in images for ease of analysis. In nearly all online segmentation challenges, deep learning has shown great promise since the invention of U-Net, a fully automated, end-to-end neural architecture designed for segmentation tasks. Recent months have also witnessed the wide success of a framework directly derived from the U-Net architecture, called nnU-Net ("no-new-net"). However, training nnU-Net from scratch takes weeks to converge and suffers from unstable performance. To overcome these two limitations, instead of training from scratch, transfer learning was applied to nnU-Net by transferring generic image representations learned from massive numbers of images to specific target tasks. Although the transfer-learning paradigm has proven to yield significant performance gains in many classification tasks, its effectiveness on segmentation tasks has yet to be sufficiently studied, especially in 3D medical image segmentation. In this thesis, nnU-Net was first pre-trained on large-scale chest CT scans (LUNA 2016), following the self-supervised learning approach introduced in Models Genesis. nnU-Net was then fine-tuned on various target segmentation tasks through transfer learning. Experiments on liver/liver-tumor and lung-tumor segmentation tasks demonstrate significantly improved and stabilized performance for fine-tuning compared with training nnU-Net from scratch. This performance gain is attributed to the scalable, generic, robust image representation learned from the consistent and recurring anatomical structure embedded in medical images.
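
The pre-train-then-fine-tune recipe can be sketched in a few lines. Below is a hedged toy version with a tiny 3D network standing in for nnU-Net; the layer sizes and the cutout restoration proxy task are illustrative assumptions, not Models Genesis or nnU-Net code.

```python
import torch
import torch.nn as nn

class TinySegNet3D(nn.Module):
    """Toy 3D encoder + task head standing in for nnU-Net."""
    def __init__(self, out_channels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv3d(16, out_channels, 1)

    def forward(self, x):
        return self.head(self.encoder(x))

def corrupt(x):
    # Self-supervised proxy task: zero a sub-volume; the network must
    # restore the original intensities from the anatomical context.
    y = x.clone()
    y[..., 8:24, 8:24, 8:24] = 0.0
    return y

# Pre-train on unlabeled scans (random tensors stand in for LUNA16 patches).
pretrain_net = TinySegNet3D(out_channels=1)
optimizer = torch.optim.Adam(pretrain_net.parameters(), lr=1e-3)
scans = torch.randn(2, 1, 32, 32, 32)
for _ in range(2):                              # a couple of toy steps
    loss = nn.functional.mse_loss(pretrain_net(corrupt(scans)), scans)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Fine-tune: reuse the pre-trained encoder weights and swap the head
# for the target segmentation task (e.g., background / liver / tumor).
seg_net = TinySegNet3D(out_channels=3)
seg_net.encoder.load_state_dict(pretrain_net.encoder.state_dict())
```

The transferred encoder is what carries the generic anatomical representation; only the task head starts from scratch on the target dataset.
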
Contributors: Bajpai, Shivam (Author) / Liang, Jianming Dr. (Thesis advisor) / Wang, Yalin Dr. (Committee member) / Venkateswara, Hemanth Kumar Demakethepalli Dr. (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
In settings where a human and an embodied AI (artificially intelligent) agent coexist, the AI agent has to be capable of reasoning with the human's preconceived notions about the environment as well as with the human's perception limitations. In addition, it should be capable of communicating its intentions and objectives effectively to the human-in-the-loop. While acting in the presence of human observers, the AI agent can synthesize interpretable behaviors, such as explicable, legible, and assistive behaviors, by accounting for the human's mental model (inclusive of her sensor model) in its reasoning process. This thesis studies different behavior-synthesis algorithms that focus on improving the interpretability of the agent's behavior in the presence of a human observer. Further, it studies how environment-redesign strategies can be leveraged to improve the overall interpretability of the agent's behavior. At times, the agent's environment may also contain purely adversarial entities or mixed entities (i.e., adversarial as well as cooperative entities) that try to infer information from the AI agent's behavior. In such settings, it is crucial for the agent to exhibit obfuscatory behavior that prevents sensitive information from falling into the hands of the adversarial entities. This thesis shows that it is possible to synthesize interpretable as well as obfuscatory behaviors using a single underlying algorithmic framework.
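
One way to see how a single framework can cover both regimes is that interpretable and obfuscatory synthesis differ only in the sign of an observer-model term. The sketch below is a hedged illustration of that idea under assumed interfaces (`agent_cost` and `observer_likelihood` are hypothetical callables), not the thesis's algorithm.

```python
def synthesize_behavior(candidate_plans, agent_cost, observer_likelihood,
                        mode="explicable", weight=1.0):
    """Hedged sketch: score each candidate plan by the agent's own cost
    plus an observer-model term.

    observer_likelihood(plan) -> float is an assumed score of how
    strongly the observer's mental model predicts the plan.  For
    interpretable (explicable) behavior the term rewards matching the
    observer's expectation; for obfuscatory behavior the sign flips,
    so the selected plan reveals as little as possible.
    """
    sign = -1.0 if mode == "explicable" else 1.0
    return min(candidate_plans,
               key=lambda plan: agent_cost(plan)
               + sign * weight * observer_likelihood(plan))
```
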
Contributors: Kulkarni, Anagha (Author) / Kambhampati, Subbarao (Thesis advisor) / Kamar, Ece (Committee member) / Smith, David E. (Committee member) / Srivastava, Siddharth (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
A skin lesion is a part of the skin that has uncommon growth or appearance compared with the skin around it. While most are harmless, some can be warning signs of skin cancer. Melanoma is the deadliest form of skin cancer, and its early detection in dermoscopic images is crucial and increases the survival rate. The clinical ABCD rule (asymmetry, border irregularity, color variation, and diameter greater than 6 mm) is one of the most widely used methods for early melanoma recognition. However, accurate classification of melanoma is still extremely difficult for reasons including the great visual resemblance between melanoma and non-melanoma skin lesions and the low contrast between the skin and the lesions. There is an ever-growing need for correct and reliable detection of skin cancers. Advances in the field of deep learning make it well suited to the task of automatic detection, and it is very useful to pathologists, aiding them in terms of efficiency and accuracy. In this thesis, various state-of-the-art deep learning frameworks are used, their parameters are analyzed, and innovative techniques are implemented to address the challenges of two skin-lesion tasks, segmentation and classification:

• Segmentation is the task of dividing out regions of interest. It is used to keep only the ROI and separate it from its background.
• Classification is the task of assigning the image a class, i.e., Melanoma (cancer) or Nevus (not cancer).

A pre-trained model is used and fine-tuned as per the needs of the given problem statement/dataset. Experimental results show promise, as the implemented techniques reduce the false-negative rate, i.e., the neural network is less likely to misclassify a melanoma.
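
A hedged sketch of the fine-tuning setup follows. The backbone choice (ResNet-18), the class weight favoring melanoma recall, and the random stand-in batch are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

net = models.resnet18(weights=None)            # load pre-trained weights in practice
net.fc = nn.Linear(net.fc.in_features, 2)      # new head: nevus vs. melanoma

# Up-weighting the melanoma class penalizes false negatives more
# heavily -- the error mode the abstract singles out.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 4.0]))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

images = torch.randn(4, 3, 224, 224)           # stand-in dermoscopic batch
labels = torch.tensor([0, 1, 1, 0])            # 1 = melanoma
loss = criterion(net(images), labels)
loss.backward()
optimizer.step()
```
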
Contributors: Verma, Vivek (Author) / Motsch, Sebastien (Thesis advisor) / Berman, Spring (Thesis advisor) / Zhuang, Houlong (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
To optimize solar cell performance, it is necessary to properly design the doping profile in the absorber layer of the solar cell. For CdTe solar cells, Cu is used to provide p-type doping. Hence, an estimator that, given the diffusion parameter set (time and temperature) and the doping concentration at the junction, gives the junction depth of the absorber layer is essential in the design process of CdTe solar cells (and other cell technologies). In this work, this is called the forward (direct) estimation process. The backward (inverse) problem is then the one in which, given the junction depth and the desired Cu doping concentration at the CdTe/CdS heterointerface, the estimator gives the time and/or temperature needed to achieve the desired doping profile. This is called the backward (inverse) estimation process. Such estimators, both forward and backward, do not exist in the solar cell literature. To train the Machine Learning (ML) estimator, it is necessary to first generate a large dataset using the PVRD-FASP Solver, which has been validated via comparison with experimental values. Note that this big dataset needs to be generated only once. Next, Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI) are used to extract the actual Cu doping profiles that result from the process of diffusion, annealing, and cool-down in the fabrication sequence of CdTe solar cells. Two deep learning neural network models are used: (1) a Multilayer Perceptron Artificial Neural Network (MLPANN) model using the Keras Application Programming Interface (API) with a TensorFlow backend, and (2) a Radial Basis Function Network (RBFN) model, to predict the Cu doping profiles for different temperatures and durations of the annealing process. Excellent agreement between the simulated results obtained with the PVRD-FASP Solver and the predicted values is obtained. It is important to mention that generating the Cu doping profiles from given initial conditions with the PVRD-FASP Solver takes a significant amount of time, because solving the drift-diffusion-reaction model is mathematically a stiff problem that leads to numerical instabilities if the time steps are not small enough, which in turn affects the time needed to complete one simulation run. Generating the same profiles with Machine Learning (ML) is almost instantaneous, so the ML estimator can serve as an excellent simulation tool to guide the future fabrication of optimal doping profiles in CdTe solar cells.
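
Since the abstract names Keras with a TensorFlow backend, the forward estimator can be sketched as a small Keras MLP. This is a hedged illustration only: the layer sizes and the random stand-in arrays are assumptions, and the real model trains on PVRD-FASP solver output rather than random data.

```python
import numpy as np
from tensorflow import keras

# Forward estimator sketch: an MLP mapping annealing parameters to
# junction depth, mirroring the MLPANN-with-Keras setup named above.
model = keras.Sequential([
    keras.layers.Input(shape=(3,)),      # (time, temperature, Cu concentration)
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),               # predicted junction depth
])
model.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(256, 3)         # stand-in for solver-generated inputs
y_train = np.random.rand(256, 1)         # stand-in for solver-computed depths
model.fit(x_train, y_train, epochs=2, verbose=0)
depth = model.predict(np.array([[0.5, 0.7, 0.3]]), verbose=0)
```

The backward (inverse) estimator follows the same pattern with the inputs and outputs swapped: junction depth and target interface concentration in, anneal time and/or temperature out.
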
Contributors: Salman, Ghaith (Author) / Vasileska, Dragica (Thesis advisor) / Goodnick, Stephen M. (Thesis advisor) / Ringhofer, Christian (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
A swarm describes a group of interacting agents exhibiting complex collective behaviors. Higher-level behavioral patterns of the group are believed to emerge from simple low-level rules of decision making at the agent level. With the potential application of swarms of aerial drones, underwater robots, and other multi-robot systems, there has been increasing interest in approaches for specifying complex, collective behavior for artificial swarms. Traditional methods for creating artificial multi-agent behaviors inspired by known swarms analyze the underlying dynamics and hand-craft the low-level control logic that constitutes the emergent behaviors. Deep learning methods offer an approach to approximating these behaviors through optimization, without much human intervention.

This thesis proposes a graph-based neural network architecture, SwarmNet, for learning the swarming behaviors of multi-agent systems. Given observations of only the trajectories of an expert multi-agent system, SwarmNet is able to learn sensible representations of the internal low-level interactions, in addition to approximating the high-level behaviors and making long-term predictions of the motion of the system. Challenges in scaling SwarmNet, and graph neural networks in general, are discussed in detail, and measures to alleviate the scaling issue in generalization are proposed. Using the trained network as a control policy, it is shown that the combination of imitation learning and reinforcement learning improves the policy more efficiently. To some extent, it is shown that the low-level interactions are successfully identified and separated, and that this separated functionality enables finely controlled custom training.
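
A hedged sketch of the graph-network pattern follows: each agent's next state is predicted from its own state plus aggregated messages from the other agents, with separate edge (pairwise interaction) and node (per-agent update) networks. The layer sizes, state layout, and fully connected interaction graph are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class SwarmNetSketch(nn.Module):
    """Hedged sketch of a SwarmNet-style graph network."""
    def __init__(self, state_dim=4, hidden=32):
        super().__init__()
        self.edge_mlp = nn.Sequential(   # pairwise interaction model
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(   # per-agent state update
            nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, states):           # states: (num_agents, state_dim)
        n = states.shape[0]
        messages = []
        for i in range(n):
            # Aggregate edge messages from every other agent.
            incoming = [self.edge_mlp(torch.cat([states[i], states[j]]))
                        for j in range(n) if j != i]
            messages.append(torch.stack(incoming).sum(dim=0))
        update = self.node_mlp(torch.cat([states, torch.stack(messages)], dim=1))
        return states + update            # residual one-step prediction

agents = torch.randn(5, 4)                # 5 agents; state = (x, y, vx, vy), assumed
next_states = SwarmNetSketch()(agents)    # long-term prediction chains this step
```

Because the edge network is shared across all agent pairs, the learned pairwise interaction is the separable component that supports both analysis of the low-level rules and the per-agent-pair scaling behavior discussed above.
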
Contributors: Zhou, Siyu (Author) / Ben Amor, Heni (Thesis advisor) / Walker, Sara I (Thesis advisor) / Davies, Paul (Committee member) / Pavlic, Ted (Committee member) / Presse, Steve (Committee member) / Arizona State University (Publisher)
Created: 2020