Search Content

Interpretable Question Answering using Deep Embedded Knowledge Reasoning to Solve Qualitative Word Problems

Description

One of the measures to determine the intelligence of a system is through Question Answering, as it requires a system to comprehend a question and reason using its knowledge base to accurately answer it. Qualitative word problems are an important subset of such problems, as they require a system to…

One of the measures to determine the intelligence of a system is through Question Answering, as it requires a system to comprehend a question and reason using its knowledge base to accurately answer it. Qualitative word problems are an important subset of such problems, as they require a system to recognize and reason with qualitative knowledge expressed in natural language. Traditional approaches in this domain include multiple modules to parse a given problem and to perform the required reasoning. Recent approaches involve using large pre-trained Language models like the Bidirection Encoder Representations from Transformers for downstream question answering tasks through supervision. These approaches however either suffer from errors between multiple modules, or are not interpretable with respect to the reasoning process employed. The proposed solution in this work aims to overcome these drawbacks through a single end-to-end trainable model that performs both the required parsing and reasoning. The parsing is achieved through an attention mechanism, whereas the reasoning is performed in vector space using soft logic operations. The model also enforces constraints in the form of auxiliary loss terms to increase the interpretability of the underlying reasoning process. The work achieves state of the art accuracy on the QuaRel dataset and matches that of the QuaRTz dataset with additional interpretability.

ContributorsNarayana, Sanjay (Author) / Baral, Chitta (Thesis advisor) / Mitra, Arindam (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)

Created2020

Hidden Fear: Evaluating the Effectiveness of Messages on Social Media

Description

The development of the internet provided new means for people to communicate effectively and share their ideas. There has been a decline in the consumption of newspapers and traditional broadcasting media toward online social mediums in recent years. Social media has been introduced as a new way of increasing democratic…

The development of the internet provided new means for people to communicate effectively and share their ideas. There has been a decline in the consumption of newspapers and traditional broadcasting media toward online social mediums in recent years. Social media has been introduced as a new way of increasing democratic discussions on political and social matters. Among social media, Twitter is widely used by politicians, government officials, communities, and parties to make announcements and reach their voice to their followers. This greatly increases the acceptance domain of the medium.

The usage of social media during social and political campaigns has been the subject of a lot of social science studies including the Occupy Wall Street movement, The Arab Spring, the United States (US) election, more recently The Brexit campaign. The wide

spread usage of social media in this space and the active participation of people in the discussions on social media made this communication channel a suitable place for spreading propaganda to alter public opinion.

An interesting feature of twitter is the feasibility of which bots can be programmed to operate on this platform. Social media bots are automated agents engineered to emulate the activity of a human being by tweeting some specific content, replying to users, magnifying certain topics by retweeting them. Network on these bots is called botnets and describing the collaboration of connected computers with programs that communicates across multiple devices to perform some task.

In this thesis, I will study how bots can influence the opinion, finding which parameters are playing a role in shrinking or coalescing the communities, and finally logically proving the effectiveness of each of the hypotheses.

ContributorsAhmadi, Mohsen (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2020

Kitsune: Structurally-Aware and Adaptable Plagiarism Detection

Description

Plagiarism is a huge problem in a learning environment. In programming classes especially, plagiarism can be hard to detect as source codes' appearance can be easily modified without changing the intent through simple formatting changes or refactoring. There are a number of plagiarism detection tools that attempt to encode knowledge…

Plagiarism is a huge problem in a learning environment. In programming classes especially, plagiarism can be hard to detect as source codes' appearance can be easily modified without changing the intent through simple formatting changes or refactoring. There are a number of plagiarism detection tools that attempt to encode knowledge about the programming languages they support in order to better detect obscured duplicates. Many such tools do not support a large number of languages because doing so requires too much code and therefore too much maintenance. It is also difficult to add support for new languages because each language is vastly different syntactically. Tools that are more extensible often do so by reducing the features of a language that are encoded and end up closer to text comparison tools than structurally-aware program analysis tools.

Kitsune attempts to remedy these issues by tying itself to Antlr, a pre-existing language recognition tool with over 200 currently supported languages. In addition, it provides an interface through which generic manipulations can be applied to the parse tree generated by Antlr. As Kitsune relies on language-agnostic structure modifications, it can be adapted with minimal effort to provide plagiarism detection for new languages. Kitsune has been evaluated for 10 of the languages in the Antlr grammar repository with success and could easily be extended to support all of the grammars currently developed by Antlr or future grammars which are developed as new languages are written.

ContributorsMonroe, Zachary Lynn (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2020

Identification of Compromised Nodes in Collaborative Intrusion Detection Systems for Large Scale Networks Due to Insider Attacks

Description

Large organizations have multiple networks that are subject to attacks, which can be detected by continuous monitoring and analyzing the network traffic by Intrusion Detection Systems. Collaborative Intrusion Detection Systems (CIDS) are used for efficient detection of distributed attacks by having a global view of the traffic events in large…

Large organizations have multiple networks that are subject to attacks, which can be detected by continuous monitoring and analyzing the network traffic by Intrusion Detection Systems. Collaborative Intrusion Detection Systems (CIDS) are used for efficient detection of distributed attacks by having a global view of the traffic events in large networks. However, CIDS are vulnerable to internal attacks, and these internal attacks decrease the mutual trust among the nodes in CIDS required for sharing of critical and sensitive alert data in CIDS. Without the data sharing, the nodes of CIDS cannot collaborate efficiently to form a comprehensive view of events in the networks monitored to detect distributed attacks. The compromised nodes will further decrease the accuracy of CIDS by generating false positives and false negatives of the traffic event classifications. In this thesis, an approach based on a trust score system is presented to detect and suspend the compromised nodes in CIDS to improve the trust among the nodes for efficient collaboration. This trust score-based approach is implemented as a consensus model on a private blockchain because private blockchain has the features to address the accountability, integrity and privacy requirements of CIDS. In this approach, the trust scores of malicious nodes are decreased with every reported false negative or false positive of the traffic event classifications. When the trust scores of any node falls below a threshold, the node is identified as compromised and suspended. The approach is evaluated for the accuracy of identifying malicious nodes in CIDS.

ContributorsYenugunti, Chandralekha (Author) / Yau, Stephen S. (Thesis advisor) / Yang, Yezhou (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)

Created2020

Improved Bi-criteria Approximation for the All-or-Nothing Multicommodity Flow Problem in Arbitrary Networks

Description

This thesis addresses the following fundamental maximum throughput routing problem: Given an arbitrary edge-capacitated n-node directed network and a set of k commodities, with source-destination pairs (s_i,t_i) and demands d_i> 0, admit and route the largest possible number of commodities -- i.e., the maximum throughput -- to satisfy their demands.…

This thesis addresses the following fundamental maximum throughput routing problem: Given an arbitrary edge-capacitated n-node directed network and a set of k commodities, with source-destination pairs (s_i,t_i) and demands d_i> 0, admit and route the largest possible number of commodities -- i.e., the maximum throughput -- to satisfy their demands.

The main contributions of this thesis are three-fold: First, a bi-criteria approximation algorithm is presented for this all-or-nothing multicommodity flow (ANF) problem. This algorithm is the first to achieve a constant approximation of the maximum throughput with an edge capacity violation ratio that is at most logarithmic in n, with high probability. The approach used is based on a version of randomized rounding that keeps splittable flows, rather than approximating those via a non-splittable path for each commodity: This allows it to work for arbitrary directed edge-capacitated graphs, unlike most of the prior work on the ANF problem. The algorithm also works if a weighted throughput is considered, where the benefit gained by fully satisfying the demand for commodity i is determined by a given weight w_i>0. Second, a derandomization of the algorithm is presented that maintains the same approximation bounds, using novel pessimistic estimators for Bernstein's inequality. In addition, it is shown how the framework can be adapted to achieve a polylogarithmic fraction of the maximum throughput while maintaining a constant edge capacity violation, if the network capacity is large enough. Lastly, one important aspect of the randomized and derandomized algorithms is their simplicity, which lends to efficient implementations in practice. The implementations of both randomized rounding and derandomized algorithms for the ANF problem are presented and show their efficiency in practice.

ContributorsChaturvedi, Anya (Author) / Richa, Andréa W. (Thesis advisor) / Sen, Arunabha (Committee member) / Schmid, Stefan (Committee member) / Arizona State University (Publisher)

Created2020

Towards Advanced Malware Classification: A Reused Code Analysis of Mirai Bonnet and Ransomware

Description

Due to the increase in computer and database dependency, the damage caused by malicious codes increases. Moreover, gravity and the magnitude of malicious attacks by hackers grow at an unprecedented rate. A key challenge lies on detecting such malicious attacks and codes in real-time by the use of existing methods,…

Due to the increase in computer and database dependency, the damage caused by malicious codes increases. Moreover, gravity and the magnitude of malicious attacks by hackers grow at an unprecedented rate. A key challenge lies on detecting such malicious attacks and codes in real-time by the use of existing methods, such as a signature-based detection approach. To this end, computer scientists have attempted to classify heterogeneous types of malware on the basis of their observable characteristics. Existing literature focuses on classifying binary codes, due to the greater accessibility of malware binary than source code. Also, for the improved speed and scalability, machine learning-based approaches are widely used. Despite such merits, the machine learning-based approach critically lacks the interpretability of its outcome, thus restricts understandings of why a given code belongs to a particular type of malicious malware and, importantly, why some portions of a code are reused very often by hackers. In this light, this study aims to enhance understanding of malware by directly investigating reused codes and uncovering their characteristics.

To examine reused codes in malware, both malware with source code and malware with binary code are considered in this thesis. For malware with source code, reused code chunks in the Mirai botnet. This study lists frequently reused code chunks and analyzes the characteristics and location of the code. For malware with binary code, this study performs reverse engineering on the binary code for human readers to comprehend, visually inspects reused codes in binary ransomware code, and illustrates the functionality of the reused codes on the basis of similar behaviors and tactics.

This study makes a novel contribution to the literature by directly investigating the characteristics of reused code in malware. The findings of the study can help cybersecurity practitioners and scholars increase the performance of malware classification.

ContributorsLEe, Yeonjung (Author) / Bao, Youzhi (Thesis advisor) / Doupe, Adam (Committee member) / Shoshitaishvili, Yan (Committee member) / Arizona State University (Publisher)

Created2020

Poincare Embeddings for Visualizing Eigenvector Centrality

Description

Hyperbolic geometry, which is a geometry which concerns itself with hyperbolic space, has caught the eye of certain circles in the machine learning community as of late. Lauded for its ability to encapsulate strong clustering as well as latent hierarchies in complex and social networks, hyperbolic geometry has proven itself…

Hyperbolic geometry, which is a geometry which concerns itself with hyperbolic space, has caught the eye of certain circles in the machine learning community as of late. Lauded for its ability to encapsulate strong clustering as well as latent hierarchies in complex and social networks, hyperbolic geometry has proven itself to be an enduring presence in the network science community throughout the 2010s, with no signs of fading into obscurity anytime soon. Hyperbolic embeddings, which map a given graph to hyperbolic space, have particularly proven to be a powerful and dynamic tool for studying complex networks. Hyperbolic embeddings are exploited in this thesis to illustrate centrality in a graph. In network science, centrality quantifies the influence of individual nodes in a graph. Eigenvector centrality is one type of such measure, and assigns an influence weight to each node in a graph by solving for an eigenvector equation. A procedure is defined to embed a given network in a model of hyperbolic space, known as the Poincare disk, according to the influence weights computed by three eigenvector centrality measures: the PageRank algorithm, the Hyperlink-Induced Topic Search (HITS) algorithm, and the Pinski-Narin algorithm. The resulting embeddings are shown to accurately and meaningfully reflect each node's influence and proximity to influential nodes.

ContributorsChang, Alena (Author) / Xue, Guoliang (Thesis advisor) / Yang, Dejun (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2020

Referring Expression Comprehension for CLEVR-Ref+ Dataset

Description

Referring Expression Comprehension (REC) is an important area of research in Natural Language Processing (NLP) and vision domain. It involves locating an object in an image described by a natural language referring expression. This task requires information from both Natural Language and Vision aspect. The task is compositional in nature…

Referring Expression Comprehension (REC) is an important area of research in Natural Language Processing (NLP) and vision domain. It involves locating an object in an image described by a natural language referring expression. This task requires information from both Natural Language and Vision aspect. The task is compositional in nature as it requires visual reasoning as underlying process along with relationships among the objects in the image. Recent works based on modular networks have

displayed to be an effective framework for performing visual reasoning task.

Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset

and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models as well as showing the reasoning process. This work aims to improve the performance on CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of the primitive operation modules which are specific to their functions and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach is performing significantly better than the current State-of-theart in one aspect (Predicted programs) and achieving comparable results for another aspect (Ground truth programs)

ContributorsRathor, Kuldeep Singh (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Committee member) / Simeone, Michael (Committee member) / Arizona State University (Publisher)

Created2020

Generating Trusted Coordination of Collaborative Software Development Using Blockchain

Description

The coordination of developing various complex and large-scale projects using computers has been well established and is the so-called computer-supported cooperative work (CSCW). Collaborative software development consists of a group of teams working together to achieve a common goal for developing a high-quality, complex, and large-scale software system efficiently, and…

The coordination of developing various complex and large-scale projects using computers has been well established and is the so-called computer-supported cooperative work (CSCW). Collaborative software development consists of a group of teams working together to achieve a common goal for developing a high-quality, complex, and large-scale software system efficiently, and it requires common processes and communication channels among these teams. The common processes for coordination among software development teams can be handled by similar principles in CSCW. The development of complex and large-scale software becomes complicated due to the involvement of many software development teams. The development of such a software system can be largely improved by effective collaboration among the participating software development teams at both software components and system levels. The efficiency of developing software components depends on trusted coordination among the participating teams for sharing, processing, and managing information on various participating teams, which are often operating in a distributed environment. Participating teams may belong to the same organization or different organizations. Existing approaches to coordination in collaborative software development are based on using a centralized repository to store, process, and retrieve information on participating software development teams during the development. These approaches use a centralized authority, have a single point of failure, and restricted rights to own data and software. In this thesis, the generation of trusted coordination in collaborative software development using blockchain is studied, and an approach to achieving trusted cooperation for collaborative software development using blockchain is presented. The smart contracts are created in the blockchain to encode software specifications and acceptance criteria for the software results generated by participating teams. The blockchain used in the approach is a private blockchain because a private blockchain has the characteristics of providing non-repudiation, privacy, and integrity, which are required in trusted coordination of collaborative software development. This approach is implemented using Hyperledger, an open-source private blockchain. An example to illustrate the approach is also given.

ContributorsPatel, Jinal Sunilkumar (Author) / Yau, Stephen S. (Thesis advisor) / Bansal, Ajay (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)

Created2020

Peer to Peer Microlending: A Charitable Donation Management Platform on Blockchain

Description

Microlending aims at providing low-barrier loans to small to medium scaled family run businesses that are financially disincluded historically. These borrowers might be in third world countries where traditional financing is not accessible. Lenders can be individual investors or institutions making risky investments or willing to help people who cannot…

Microlending aims at providing low-barrier loans to small to medium scaled family run businesses that are financially disincluded historically. These borrowers might be in third world countries where traditional financing is not accessible. Lenders can be individual investors or institutions making risky investments or willing to help people who cannot access traditional banks or do not have the credibility to get loans from traditional sources. Microlending involves a charitable cause as well where lenders are not really concerned about what and how they are paid.

This thesis aims at building a platform that will support both commercial microlending as well as charitable donation to support the real cause of microlending. The platform is expected to ensure privacy and transparency to the users in order to attract more users to use the system. Microlending involves monetary transactions, hence possible security threats to the system are discussed.

Blockchain is one of the technologies which has revolutionized financial transactions and microlending involves monetary transactions. Therefore, blockchain is viable option for microlending platform. Permissioned blockchain restricts the user admission to the platform and provides with identity management feature. This feature is required to ensure the security and privacy of various types of participants on the microlending platform.

ContributorsSiddharth, Sourabh (Author) / Boscovic, Dragan (Thesis advisor) / Basnal, Srividya (Thesis advisor) / Sanchez, Javier Gonzalez (Committee member) / Arizona State University (Publisher)

Created2020

Filtering by