Description

Plagiarism is a huge problem in a learning environment. In programming classes especially, plagiarism can be hard to detect, as source code's appearance can easily be modified without changing its intent through simple formatting changes or refactoring. There are a number of plagiarism detection tools that attempt to encode knowledge about the programming languages they support in order to better detect obscured duplicates. Many such tools do not support a large number of languages because doing so requires too much code and therefore too much maintenance. It is also difficult to add support for new languages because each language is vastly different syntactically. Tools that are more extensible often achieve this by reducing the number of language features they encode, and end up closer to text comparison tools than structurally aware program analysis tools.

Kitsune attempts to remedy these issues by tying itself to Antlr, a pre-existing language recognition tool with over 200 currently supported languages. In addition, it provides an interface through which generic manipulations can be applied to the parse tree generated by Antlr. Because Kitsune relies on language-agnostic structure modifications, it can be adapted with minimal effort to provide plagiarism detection for new languages. Kitsune has been evaluated successfully on 10 of the languages in the Antlr grammar repository and could easily be extended to support all of the grammars currently developed by Antlr, as well as future grammars developed as new languages are written.
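
As a rough illustration of the kind of language-agnostic comparison this enables, the sketch below (my own, not Kitsune's actual implementation) normalizes parse trees so that renamed identifiers and changed literals do not affect the fingerprint, then scores the overlap between two trees; the Node class and rule names are hypothetical stand-ins for the parse trees Antlr generates.

```python
# A minimal sketch of language-agnostic parse-tree comparison: trees are
# normalized so renamed identifiers and changed literals do not affect the
# fingerprint, then compared for subtree overlap. Node and the rule names
# are hypothetical stand-ins for ANTLR-generated parse trees.
from dataclasses import dataclass, field


@dataclass
class Node:
    rule: str                      # grammar rule name, e.g. "functionDecl"
    children: list = field(default_factory=list)
    is_identifier: bool = False
    is_literal: bool = False


def normalize(node: Node) -> str:
    """Serialize a parse tree, masking identifiers and literals."""
    if node.is_identifier:
        return "ID"
    if node.is_literal:
        return "LIT"
    inner = " ".join(normalize(c) for c in node.children)
    return f"({node.rule} {inner})"


def similarity(a: Node, b: Node) -> float:
    """Jaccard similarity over normalized subtree fingerprints."""
    def subtrees(n: Node, acc: set) -> set:
        acc.add(normalize(n))
        for c in n.children:
            subtrees(c, acc)
        return acc

    fa, fb = subtrees(a, set()), subtrees(b, set())
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0
```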
Contributors: Monroe, Zachary Lynn (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Large organizations have multiple networks that are subject to attacks, which can be detected by continuously monitoring and analyzing the network traffic with Intrusion Detection Systems. Collaborative Intrusion Detection Systems (CIDS) are used for efficient detection of distributed attacks by maintaining a global view of the traffic events in large networks. However, CIDS are vulnerable to internal attacks, which decrease the mutual trust among the nodes in CIDS that is required for sharing critical and sensitive alert data. Without this data sharing, the nodes of CIDS cannot collaborate efficiently to form a comprehensive view of events in the monitored networks to detect distributed attacks. Compromised nodes further decrease the accuracy of CIDS by generating false positives and false negatives in the traffic event classifications. In this thesis, an approach based on a trust score system is presented to detect and suspend compromised nodes in CIDS to improve the trust among the nodes for efficient collaboration. This trust score-based approach is implemented as a consensus model on a private blockchain, because a private blockchain has the features needed to address the accountability, integrity, and privacy requirements of CIDS. In this approach, the trust score of a malicious node is decreased with every reported false negative or false positive in its traffic event classifications. When the trust score of any node falls below a threshold, the node is identified as compromised and suspended. The approach is evaluated for the accuracy of identifying malicious nodes in CIDS.
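
A minimal sketch of the trust-score logic described above, with assumed penalty values and threshold (the thesis realizes this as a consensus model on a private blockchain rather than in a single process):

```python
# Hypothetical parameters: the penalties and threshold are assumptions for
# illustration, not values from the thesis.
INITIAL_SCORE = 100.0
FALSE_POSITIVE_PENALTY = 5.0   # assumed penalty per reported false positive
FALSE_NEGATIVE_PENALTY = 10.0  # assumed penalty per reported false negative
SUSPENSION_THRESHOLD = 50.0    # assumed cutoff for suspending a node


class TrustLedger:
    def __init__(self, node_ids):
        self.scores = {n: INITIAL_SCORE for n in node_ids}
        self.suspended = set()

    def report(self, node_id, false_positives, false_negatives):
        """Record misclassifications reported against a node."""
        if node_id in self.suspended:
            return
        self.scores[node_id] -= (false_positives * FALSE_POSITIVE_PENALTY
                                 + false_negatives * FALSE_NEGATIVE_PENALTY)
        if self.scores[node_id] < SUSPENSION_THRESHOLD:
            self.suspended.add(node_id)  # node no longer trusted to collaborate


ledger = TrustLedger(["ids-a", "ids-b", "ids-c"])
ledger.report("ids-b", false_positives=4, false_negatives=4)  # score 40: suspended
print(ledger.suspended)  # {'ids-b'}
```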
Contributors: Yenugunti, Chandralekha (Author) / Yau, Stephen S. (Thesis advisor) / Yang, Yezhou (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

This thesis addresses the following fundamental maximum throughput routing problem: Given an arbitrary edge-capacitated n-node directed network and a set of k commodities, with source-destination pairs (s_i, t_i) and demands d_i > 0, admit and route the largest possible number of commodities -- i.e., the maximum throughput -- to satisfy their demands.

The main contributions of this thesis are three-fold: First, a bi-criteria approximation algorithm is presented for this all-or-nothing multicommodity flow (ANF) problem. This algorithm is the first to achieve a constant approximation of the maximum throughput with an edge capacity violation ratio that is at most logarithmic in n, with high probability. The approach used is based on a version of randomized rounding that keeps splittable flows, rather than approximating them via a non-splittable path for each commodity: this allows it to work for arbitrary directed edge-capacitated graphs, unlike most of the prior work on the ANF problem. The algorithm also works if a weighted throughput is considered, where the benefit gained by fully satisfying the demand for commodity i is determined by a given weight w_i > 0. Second, a derandomization of the algorithm is presented that maintains the same approximation bounds, using novel pessimistic estimators for Bernstein's inequality. In addition, it is shown how the framework can be adapted to achieve a polylogarithmic fraction of the maximum throughput while maintaining a constant edge capacity violation, if the network capacity is large enough. Lastly, one important aspect of the randomized and derandomized algorithms is their simplicity, which lends itself to efficient implementations in practice. Implementations of both the randomized rounding and derandomized algorithms for the ANF problem are presented, demonstrating their efficiency in practice.
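
To make the rounding step concrete, here is a minimal sketch under simplifying assumptions: a fractional LP relaxation is assumed to be already solved, giving each commodity an admission value x_i in [0, 1], and each commodity is admitted independently with probability x_i / alpha, keeping its splittable flow. The scaling factor alpha is a hypothetical knob; the thesis derives the actual constants from Bernstein's inequality.

```python
# A sketch of the randomized-rounding step, not the thesis's implementation.
import random


def round_admissions(x, alpha=2.0, seed=0):
    """Admit each commodity i independently with probability x[i] / alpha.

    Scaling down by alpha trades a constant factor of throughput for a
    bounded probability of violating edge capacities; admitted commodities
    keep their fractional (splittable) flows rather than a single path.
    """
    rng = random.Random(seed)
    return [i for i, xi in enumerate(x) if rng.random() < xi / alpha]


# Fractional admission values from a (hypothetical) solved LP relaxation.
x = [0.9, 0.2, 0.55, 1.0, 0.05]
admitted = round_admissions(x)
print(admitted)  # subset of commodities whose demands are fully routed
```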
Contributors: Chaturvedi, Anya (Author) / Richa, Andréa W. (Thesis advisor) / Sen, Arunabha (Committee member) / Schmid, Stefan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

As dependency on computers and databases increases, so does the damage caused by malicious code. Moreover, the gravity and magnitude of malicious attacks by hackers grow at an unprecedented rate. A key challenge lies in detecting such malicious attacks and code in real time with existing methods, such as signature-based detection. To this end, computer scientists have attempted to classify heterogeneous types of malware on the basis of their observable characteristics. Existing literature focuses on classifying binary code, since malware binaries are more accessible than source code. Machine learning-based approaches are also widely used for their speed and scalability. Despite these merits, machine learning-based approaches critically lack interpretability of their outcomes, which restricts understanding of why a given piece of code belongs to a particular type of malware and, importantly, why some portions of code are reused very often by hackers. In this light, this study aims to enhance understanding of malware by directly investigating reused code and uncovering its characteristics.

To examine reused code in malware, both malware with source code and malware with binary code are considered in this thesis. For malware with source code, reused code chunks in the Mirai botnet are examined: this study lists frequently reused code chunks and analyzes their characteristics and locations. For malware with binary code, this study performs reverse engineering on the binary code to make it comprehensible to human readers, visually inspects reused code in binary ransomware code, and illustrates the functionality of the reused code on the basis of similar behaviors and tactics.
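
As an illustration of how frequently reused source-code chunks can be surfaced (my own sketch, not the thesis's tooling), the following splits each file into overlapping windows of lines, normalizes whitespace, and counts identical windows appearing across files; the window size is a hypothetical parameter.

```python
# A sketch of cross-file code-chunk reuse counting over C sources.
from collections import Counter
from pathlib import Path


def line_windows(text, size=5):
    """Yield normalized, overlapping windows of `size` non-blank lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(len(lines) - size + 1):
        yield "\n".join(lines[i:i + size])


def reused_chunks(source_dir, size=5, min_count=2):
    """Count identical line windows across all .c files under a directory."""
    counts = Counter()
    for path in Path(source_dir).glob("**/*.c"):
        # A set per file so a chunk is counted once per file it appears in.
        counts.update(set(line_windows(path.read_text(errors="ignore"), size)))
    return [(chunk, n) for chunk, n in counts.most_common() if n >= min_count]
```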

This study makes a novel contribution to the literature by directly investigating the characteristics of reused code in malware. The findings of the study can help cybersecurity practitioners and scholars increase the performance of malware classification.
Contributors: Lee, Yeonjung (Author) / Bao, Youzhi (Thesis advisor) / Doupe, Adam (Committee member) / Shoshitaishvili, Yan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Hyperbolic geometry, the geometry of hyperbolic space, has caught the eye of certain circles in the machine learning community as of late. Lauded for its ability to encapsulate strong clustering as well as latent hierarchies in complex and social networks, hyperbolic geometry has proven itself to be an enduring presence in the network science community throughout the 2010s, with no signs of fading into obscurity anytime soon. Hyperbolic embeddings, which map a given graph to hyperbolic space, have proven to be a particularly powerful and dynamic tool for studying complex networks. Hyperbolic embeddings are exploited in this thesis to illustrate centrality in a graph. In network science, centrality quantifies the influence of individual nodes in a graph. Eigenvector centrality is one such measure, and assigns an influence weight to each node in a graph by solving an eigenvector equation. A procedure is defined to embed a given network in a model of hyperbolic space, known as the Poincaré disk, according to the influence weights computed by three eigenvector centrality measures: the PageRank algorithm, the Hyperlink-Induced Topic Search (HITS) algorithm, and the Pinski-Narin algorithm. The resulting embeddings are shown to accurately and meaningfully reflect each node's influence and proximity to influential nodes.
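
A minimal sketch of one plausible such procedure, assuming a radial mapping of my own choosing rather than the thesis's exact construction: PageRank weights (one of the three measures named above) set the radius, with more influential nodes placed closer to the center of the Poincaré disk, and angles spread uniformly.

```python
# A sketch of centrality-driven placement in the Poincare disk; the radial
# mapping below is a hypothetical choice, not the thesis's procedure.
import math
import networkx as nx


def poincare_embedding(graph):
    """Map each node to (x, y) in the open unit disk by PageRank weight."""
    weights = nx.pagerank(graph)                     # influence weight per node
    w_min, w_max = min(weights.values()), max(weights.values())
    nodes = sorted(graph.nodes())
    coords = {}
    for k, node in enumerate(nodes):
        # Normalize weight to [0, 1]; high influence -> small radius.
        t = (weights[node] - w_min) / ((w_max - w_min) or 1.0)
        r = 0.95 * (1.0 - t)                         # keep strictly inside disk
        theta = 2.0 * math.pi * k / len(nodes)
        coords[node] = (r * math.cos(theta), r * math.sin(theta))
    return coords


print(poincare_embedding(nx.karate_club_graph()))
```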
Contributors: Chang, Alena (Author) / Xue, Guoliang (Thesis advisor) / Yang, Dejun (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Referring Expression Comprehension (REC) is an important area of research in the Natural Language Processing (NLP) and vision domains. It involves locating an object in an image described by a natural language referring expression, a task that requires information from both the natural language and vision aspects. The task is compositional in nature, as it requires visual reasoning as an underlying process along with relationships among the objects in the image. Recent works based on modular networks have shown them to be an effective framework for performing visual reasoning tasks.

Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on the CLEVR-Ref+ dataset deals with these bias issues by constructing a synthetic dataset, and provides an approach for the aforementioned task that performed better than previous state-of-the-art models while also exposing the reasoning process. This work aims to improve performance on the CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of primitive operation modules, each specific to its function, and the output is generated using a separate segmentation module. Empirical results show that this approach performs significantly better than the current state of the art in one aspect (predicted programs) and achieves comparable results in another aspect (ground-truth programs).
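
To illustrate the neural module network idea (a hand-coded toy of my own, not the thesis's learned model), the sketch below composes primitive modules that operate on an attention vector over scene objects according to a program; real modules are learned networks over feature maps.

```python
# A toy CLEVR-style module composition over symbolic object attributes.
import numpy as np

# Hypothetical scene: each object has a color and a size.
SCENE = [{"color": "red", "size": "small"},
         {"color": "blue", "size": "large"},
         {"color": "red", "size": "large"}]


def scene_attention():
    return np.ones(len(SCENE))            # start by attending to every object


def filter_attr(attn, key, value):
    """Keep attention only on objects whose attribute matches."""
    mask = np.array([1.0 if obj[key] == value else 0.0 for obj in SCENE])
    return attn * mask


# Program for "the large red object", executed as a module composition.
program = [("filter", "color", "red"), ("filter", "size", "large")]
attn = scene_attention()
for op, key, value in program:
    if op == "filter":
        attn = filter_attr(attn, key, value)

print(int(np.argmax(attn)))  # 2: the large red object
```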
Contributors: Rathor, Kuldeep Singh (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Committee member) / Simeone, Michael (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

The coordination of developing complex and large-scale projects using computers is well established and is known as computer-supported cooperative work (CSCW). Collaborative software development consists of a group of teams working together to achieve the common goal of developing a high-quality, complex, and large-scale software system efficiently, and it requires common processes and communication channels among these teams. The common processes for coordination among software development teams can be handled by principles similar to those of CSCW. The development of complex and large-scale software becomes complicated due to the involvement of many software development teams, and it can be largely improved by effective collaboration among the participating teams at both the software component and system levels. The efficiency of developing software components depends on trusted coordination among the participating teams for sharing, processing, and managing information on the various participating teams, which often operate in a distributed environment. Participating teams may belong to the same organization or to different organizations. Existing approaches to coordination in collaborative software development are based on using a centralized repository to store, process, and retrieve information on participating software development teams during development. These approaches rely on a centralized authority, have a single point of failure, and restrict participants' rights to own their data and software. In this thesis, the generation of trusted coordination in collaborative software development using blockchain is studied, and an approach to achieving trusted coordination for collaborative software development using blockchain is presented. Smart contracts are created on the blockchain to encode software specifications and acceptance criteria for the software results generated by participating teams. The blockchain used in this approach is a private blockchain, because a private blockchain provides the non-repudiation, privacy, and integrity required for trusted coordination of collaborative software development. The approach is implemented using Hyperledger, an open-source private blockchain, and an example illustrating the approach is given.
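
As a sketch of the acceptance-criteria idea, in plain Python rather than actual Hyperledger chaincode (Fabric chaincode is typically written in Go or Node.js), a team's submitted result is accepted on the ledger only if it meets the encoded criteria; the criteria fields here are hypothetical.

```python
# Hypothetical acceptance criteria encoded contract-style; the ledger is
# simulated as a list rather than a real blockchain.
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    component: str
    required_tests_passed: int   # e.g. size of the agreed test suite
    max_open_defects: int


@dataclass
class SubmittedResult:
    component: str
    tests_passed: int
    open_defects: int
    artifact_hash: str           # hash of the delivered software artifact


def evaluate(criteria: AcceptanceCriteria, result: SubmittedResult, ledger: list):
    """Append an accept/reject record to the (simulated) ledger."""
    accepted = (result.component == criteria.component
                and result.tests_passed >= criteria.required_tests_passed
                and result.open_defects <= criteria.max_open_defects)
    ledger.append({"artifact": result.artifact_hash, "accepted": accepted})
    return accepted


ledger = []
criteria = AcceptanceCriteria("auth-service", required_tests_passed=120, max_open_defects=0)
result = SubmittedResult("auth-service", tests_passed=120, open_defects=0, artifact_hash="ab12f3")
print(evaluate(criteria, result, ledger))  # True: result recorded as accepted
```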
Contributors: Patel, Jinal Sunilkumar (Author) / Yau, Stephen S. (Thesis advisor) / Bansal, Ajay (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Microlending aims to provide low-barrier loans to small and medium-sized, family-run businesses that have historically been financially excluded. These borrowers might be in developing countries where traditional financing is not accessible. Lenders can be individual investors or institutions making risky investments, or parties willing to help people who cannot access traditional banks or do not have the creditworthiness to get loans from traditional sources. Microlending also has a charitable side, where lenders are not primarily concerned with whether or how they are repaid.

This thesis aims to build a platform that supports both commercial microlending and charitable donation, serving the underlying cause of microlending. The platform is expected to ensure privacy and transparency for its users in order to attract more users to the system. Since microlending involves monetary transactions, possible security threats to the system are also discussed.

Blockchain is one of the technologies that has revolutionized financial transactions, and since microlending involves monetary transactions, blockchain is a viable option for a microlending platform. A permissioned blockchain restricts user admission to the platform and provides an identity management feature, which is required to ensure the security and privacy of the various types of participants on the microlending platform.
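
A minimal sketch of the permissioned-admission idea, with hypothetical roles and checks standing in for what a membership service provides in a permissioned blockchain: only enrolled participants may submit transactions, and each role is checked before a loan or donation is recorded.

```python
# Hypothetical enrollment registry and role policy; the ledger is simulated.
ENROLLED = {"alice": "lender", "bob": "borrower", "carol": "donor"}
LEDGER = []


def submit(actor, action, amount, counterparty=None):
    """Record a transaction only for enrolled actors acting within their role."""
    role = ENROLLED.get(actor)
    if role is None:
        raise PermissionError(f"{actor} is not enrolled on the platform")
    allowed = {"lender": {"lend"}, "borrower": {"repay"}, "donor": {"donate"}}
    if action not in allowed[role]:
        raise PermissionError(f"role {role!r} may not perform {action!r}")
    LEDGER.append({"actor": actor, "action": action,
                   "amount": amount, "counterparty": counterparty})


submit("alice", "lend", 500, counterparty="bob")
submit("carol", "donate", 50)
print(len(LEDGER))  # 2 recorded transactions
```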
Contributors: Siddharth, Sourabh (Author) / Boscovic, Dragan (Thesis advisor) / Bansal, Srividya (Thesis advisor) / Sanchez, Javier Gonzalez (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Robot motion planning requires computing a sequence of waypoints from an initial configuration of the robot to the goal configuration. Solving a motion planning problem optimally is proven to be NP-complete. Sampling-based motion planners efficiently compute an approximation of the optimal solution, but because they sample the configuration space uniformly, they fail to sample regions of the environment that have narrow passages or pinch points. These critical regions are analogous to landmarks from the planning literature, as the robot is required to pass through them to reach the goal.

This work proposes a deep learning approach that identifies critical regions in the environment and learns a sampling distribution to sample them effectively in high-dimensional configuration spaces.

A classification-based approach is used to learn the distributions. The robot's degree-of-freedom (DOF) limits are binned, and a distribution is generated by sampling motion-plan solutions. Conditional information such as the goal configuration and robot location, encoded in the network inputs, shows the network learning to bias the identified critical regions toward the goal configuration. Empirical evaluations are performed against state-of-the-art sampling-based motion planners on a variety of tasks that require the robot to pass through critical regions. An empirical analysis of robotic systems with three to eight degrees of freedom indicates that this approach effectively improves planning performance.
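
A minimal sketch of the biased-sampling step, with hypothetical joint limits and a hard-coded stand-in for the learned distribution: each DOF's range is binned, each bin carries a probability, and samples are drawn from high-probability bins uniformly within the bin. A real planner would mix these biased samples with uniform ones to retain probabilistic completeness.

```python
# Sampling configurations from a per-DOF binned distribution; the bin
# probabilities below are random stand-ins for the learned model's output.
import numpy as np

rng = np.random.default_rng(0)

N_DOF, N_BINS = 3, 10
LOW, HIGH = -np.pi, np.pi                     # assumed joint limits
EDGES = np.linspace(LOW, HIGH, N_BINS + 1)

# Stand-in for the learned distribution: probability of each bin per DOF.
bin_probs = rng.dirichlet(np.ones(N_BINS), size=N_DOF)


def sample_configuration():
    """Draw one configuration, biased toward high-probability bins."""
    config = np.empty(N_DOF)
    for d in range(N_DOF):
        b = rng.choice(N_BINS, p=bin_probs[d])           # pick a bin
        config[d] = rng.uniform(EDGES[b], EDGES[b + 1])  # uniform within bin
    return config


print(sample_configuration())
```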
Contributors: Srinet, Abhyudaya (Author) / Srivastava, Siddharth (Thesis advisor) / Zhang, Yu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

The Java programming language was implemented in such a way as to limit the number of ways a program written in Java could be exploited. Unfortunately, all of the protections and safeguards put in place for Java can be circumvented if a Java program utilizes internal or external libraries that were created in a separate, insecure language such as C or C++. A secure Java program can then be made insecure and susceptible to even classic vulnerabilities such as stack overflows, format string attacks, and heap overflows and corruption. Through the internal or external libraries included in the Java program, an attacker could potentially hijack the execution flow of the program. Once the attacker controls where and how the program executes, they can spread their influence to the rest of the system.

However, since these classic vulnerabilities are known weaknesses, special types of protections have been added to the compilers that create the executable code and the systems that run it. The most common forms of protection include Address Space Layout Randomization (ASLR), the Non-eXecutable stack (NX stack), and stack cookies or canaries. Of course, these protections and their implementations vary depending on the system. I intend to look specifically at the Android operating system, which is used in the daily lives of a significant portion of the planet. Most Android applications execute in a Java context and leave little room for exploitation; however, there are also many applications that utilize external libraries to handle more computationally intensive tasks.
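
As an illustration of how such protections can be checked on a native library shipped with an Android app, the sketch below uses pyelftools (an assumed dependency, not a tool from the thesis): an executable GNU_STACK segment indicates a missing NX stack, a reference to __stack_chk_fail in the dynamic symbols indicates compiled-in stack canaries, and a position-independent (ET_DYN) binary is a prerequisite for effective ASLR. The library name is hypothetical.

```python
# Checksec-style inspection of an ELF shared library with pyelftools.
from elftools.elf.elffile import ELFFile
from elftools.elf.constants import P_FLAGS


def check_protections(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        nx = True
        for seg in elf.iter_segments():
            if seg.header.p_type == "PT_GNU_STACK":
                # An executable stack segment means NX is not enforced.
                nx = not (seg.header.p_flags & P_FLAGS.PF_X)
        canary = False
        dynsym = elf.get_section_by_name(".dynsym")
        if dynsym is not None:
            canary = any(sym.name == "__stack_chk_fail"
                         for sym in dynsym.iter_symbols())
        pic = elf.header.e_type == "ET_DYN"   # required for effective ASLR
        return {"nx_stack": nx, "stack_canary": canary, "position_independent": pic}


print(check_protections("libnative-lib.so"))  # hypothetical library name
```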

The goal of this thesis is to take a closer look at such applications and the protections surrounding them, especially how the default system protections mentioned above are implemented and applied to vulnerable external libraries. However, this is only half of the problem: the attacker must get their payload inside the application in the first place. Since it is necessary to understand how this occurs, I will also explore how the Android operating system passes outside information to applications and how developers have chosen to use that information.
Contributors: Gibbs, William (Author) / Doupe, Adam (Thesis advisor) / Wang, Ruoyu (Committee member) / Shoshitaishvili, Yan (Committee member) / Arizona State University (Publisher)
Created: 2020