Towards Advanced Malware Classification: A Reused Code Analysis of Mirai Botnet and Ransomware

Description

As dependence on computers and databases grows, so does the damage caused by malicious code. Moreover, the gravity and magnitude of malicious attacks by hackers are growing at an unprecedented rate. A key challenge lies in detecting such attacks and code in real time with existing methods, such as signature-based detection. To this end, computer scientists have attempted to classify heterogeneous types of malware on the basis of their observable characteristics. Existing literature focuses on classifying binary code, because malware binaries are more accessible than source code. Machine learning-based approaches are also widely used for their improved speed and scalability. Despite these merits, machine learning-based approaches critically lack interpretability, which restricts understanding of why a given piece of code belongs to a particular type of malware and, importantly, why some portions of code are reused so often by hackers. In this light, this study aims to enhance the understanding of malware by directly investigating reused code and uncovering its characteristics.

To examine reused code in malware, this thesis considers both malware available as source code and malware available only as binary code. For malware with source code, this study examines reused code chunks in the Mirai botnet: it lists frequently reused code chunks and analyzes the characteristics and location of that code. For malware with binary code, this study reverse engineers the binary into a form human readers can comprehend, visually inspects reused code in ransomware binaries, and illustrates the functionality of the reused code on the basis of similar behaviors and tactics.
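One simple way to surface reused chunks in source code is to slide a fixed-size window over the normalized lines of each file and record which files share a window. The sketch below is only an illustration of that idea, not the method used in the thesis; the function name `reused_chunks`, the window size, and the sample file names and contents are all hypothetical.

```python
from collections import defaultdict


def reused_chunks(files, window=3):
    """Return chunks of `window` consecutive lines that appear in more than one file.

    `files` maps a file name to its source text (hypothetical input format).
    """
    seen = defaultdict(set)
    for name, text in files.items():
        # Normalize: strip surrounding whitespace and drop blank lines.
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(lines[i:i + window])
            seen[chunk].add(name)
    # Keep only the chunks shared by at least two files.
    return {chunk: sorted(names) for chunk, names in seen.items() if len(names) > 1}


# Hypothetical sample input, not taken from any real malware source.
files = {
    "scanner.c": "int fd = open_sock();\nsend_probe(fd);\nclose(fd);\nreturn 0;",
    "attack.c": "setup();\nint fd = open_sock();\nsend_probe(fd);\nclose(fd);",
}
shared = reused_chunks(files)
```

Exact line matching like this misses chunks that were renamed or reordered, so it should be read as a starting point for manual inspection rather than a complete detector.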

This study makes a novel contribution to the literature by directly investigating the characteristics of reused code in malware. The findings of the study can help cybersecurity practitioners and scholars increase the performance of malware classification.

Contributors: Lee, Yeonjung (Author) / Bao, Youzhi (Thesis advisor) / Doupe, Adam (Committee member) / Shoshitaishvili, Yan (Committee member) / Arizona State University (Publisher)

Created: 2020

Proactive Identification of Cybersecurity Threats Using Online Sources

Description

Many existing applications of machine learning (ML) to cybersecurity are focused on detecting malicious activity already present in an enterprise. However, recent high-profile cyberattacks proved that certain threats could have been avoided. The speed of contemporary attacks along with the high costs of remediation incentivizes avoidance over response. Yet, avoidance implies the ability to predict - a notoriously difficult task due to high rates of false positives, difficulty in finding data that is indicative of future events, and the unexplainable results from machine learning algorithms.

In this dissertation, these challenges are addressed by presenting three artificial intelligence (AI) approaches to support prioritizing defense measures. The first two approaches apply ML to cyberthreat intelligence data to predict whether exploits are going to be used in the wild. The first work focuses on what data feeds are generated after vulnerability disclosures. The developed ML models outperform the current industry-standard method, more than doubling its F1 score. Then, an approach is developed to derive features about who generated those data feeds. Adding these features increases recall by over 19% while maintaining precision. Finally, frequent itemset mining is combined with a variant of a probabilistic temporal logic framework to predict when attacks are likely to occur. In this approach, rules correlating malicious activity on hacking community platforms with real-world cyberattacks are mined. They are then used in a deductive reasoning approach to generate predictions. The developed approach predicted unseen real-world attacks with an average F1 score increase of over 45% compared to a baseline approach.
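The results above are reported in terms of precision, recall, and F1 score. As a reminder of how these metrics relate, the following minimal sketch computes them from counts of true positives, false positives, and false negatives. The function name and the example counts are illustrative assumptions and are not figures from the dissertation.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw prediction counts."""
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 8 correct alerts, 2 false alarms, 8 missed exploits.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Because F1 is a harmonic mean, raising recall while holding precision fixed, as described above, always raises F1 as well.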

Contributors: Almukaynizi, Mohammed (Author) / Shakarian, Paulo (Thesis advisor) / Huang, Dijiang (Committee member) / Maciejewski, Ross (Committee member) / Simari, Gerardo I. (Committee member) / Arizona State University (Publisher)

Created: 2019

Explainable AI in Workflow Development and Verification Using Pi-Calculus

Description

Computer science education is an increasingly vital area of study, but various challenges raise its difficulty for new students and result in higher attrition rates. As part of an effort to resolve this issue, a new visual programming language environment was developed for this research, the Visual IoT and Robotics Programming Language Environment (VIPLE). VIPLE is based on computational thinking and flowcharts, which reduces the need to memorize the detailed syntax of text-based programming languages. VIPLE has been used at Arizona State University (ASU) in multiple years and sections of FSE100 as well as in universities worldwide. Another major issue with teaching large programming classes is the potential lack of qualified teaching assistants to grade and offer insight into a student’s programs at a level beyond output analysis.

In this dissertation, I propose a novel framework for performing semantic autograding, which analyzes student programs at a semantic level to help students learn with additional and systematic help. A general autograder is not practical for general programming languages, due to the flexibility of semantics. A practical autograder is possible in VIPLE, because of its simplified syntax and restricted options of semantics. The design of this autograder is based on the concept of theorem provers. To achieve this goal, I employ a modified version of Pi-Calculus to represent VIPLE programs and Hoare Logic to formalize program requirements. By building on the inference rules of Pi-Calculus and Hoare Logic, I am able to construct a theorem prover that can perform automated semantic analysis. Furthermore, building on this theorem prover enables me to develop a self-learning algorithm that can learn the conditions for a program’s correctness according to a given solution program.

Contributors: De Luca, Gennaro (Author) / Chen, Yinong (Thesis advisor) / Liu, Huan (Thesis advisor) / Hsiao, Sharon (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created: 2020

A Blockchain-Based Approach to Developing Scalable and Auditable E-Voting Systems Without Requiring a Trustworthy Central Authority

Description

The purpose of an election is for the voice of the voters to be heard. All the participants in an election must be able to trust that the result of an election is actually the opinion of the people, unaltered by anything or anyone that may be trying to sway the vote. In the voting process, any "black boxes" or secrets can lead to mistrust in the system. In this thesis, an approach is developed for an electronic voting framework that is transparent, auditable, and scalable, making it trustworthy and usable for a wide-scale election. Based on my analysis, linkable ring signatures are utilized in order to preserve voter privacy while ensuring that a corrupt authenticating authority could not sway the vote. A hierarchical blockchain framework is presented to make ring signatures a viable signature scheme even when working with large populations. The solution is evaluated for compliance with secure voting requirements and scalability.

Contributors: Marple, Sam (Author) / Yau, Sik-Sang (Thesis advisor) / Huang, Dijiang (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)

Created: 2021

MADM-Based Smart Parking Guidance Algorithm

Description

In smart parking environments, choosing suitable parking facilities with various attributes to satisfy certain criteria is an important decision problem. Based on multiple attribute decision making (MADM) theory, this study proposes a smart parking guidance algorithm that considers three representative decision factors (i.e., walk duration, parking fee, and the number of vacant parking spaces) and various driver preferences. In this paper, the expected number of vacant parking spaces is regarded as an important attribute reflecting how difficult it is to find an available parking space, and a queueing theory-based method is proposed to estimate this expected number for candidate parking facilities with different capacities, arrival rates, and service rates. The effectiveness of the MADM-based parking guidance algorithm was investigated and compared with a blind search-based approach in comprehensive scenarios with various distributions of parking facilities, traffic intensities, and user preferences. Experimental results show that the proposed MADM-based algorithm is effective in choosing suitable parking resources to satisfy users’ preferences. Furthermore, it has also been observed that the newly proposed Markov chain-based availability attribute represents the availability of parking spaces more effectively than the arrival rate-based availability attribute proposed in existing research.
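The abstract does not specify the queueing model or the MADM scoring rule, so the following is only a sketch under stated assumptions: each facility is modeled as an M/M/c/c loss system (a birth-death Markov chain) to estimate expected vacant spaces, and facilities are ranked by simple additive weighting over min-max normalized attributes. All function names, dictionary keys, weights, and sample values are hypothetical.

```python
from math import factorial


def expected_vacant(capacity, arrival_rate, service_rate):
    """Expected number of free spaces, assuming an M/M/c/c loss system."""
    load = arrival_rate / service_rate  # offered load
    # Unnormalized stationary probability of n occupied spaces.
    probs = [load**n / factorial(n) for n in range(capacity + 1)]
    total = sum(probs)
    expected_occupied = sum(n * p for n, p in enumerate(probs)) / total
    return capacity - expected_occupied


def normalize(values, benefit):
    """Min-max normalize so that 1.0 is always the most desirable value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if benefit:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank_facilities(facilities, weights):
    """Rank facility names by simple additive weighting (assumed scoring rule)."""
    walk = normalize([f["walk_min"] for f in facilities], benefit=False)
    fee = normalize([f["fee"] for f in facilities], benefit=False)
    vacant = normalize(
        [expected_vacant(f["capacity"], f["arrival_rate"], f["service_rate"])
         for f in facilities],
        benefit=True,
    )
    scores = {
        f["name"]: weights["walk"] * w + weights["fee"] * p + weights["vacant"] * v
        for f, w, p, v in zip(facilities, walk, fee, vacant)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical facilities and driver preference weights.
facilities = [
    {"name": "A", "walk_min": 2, "fee": 5.0, "capacity": 10,
     "arrival_rate": 2.0, "service_rate": 1.0},
    {"name": "B", "walk_min": 10, "fee": 1.0, "capacity": 10,
     "arrival_rate": 9.0, "service_rate": 1.0},
]
ranking = rank_facilities(facilities, {"walk": 0.6, "fee": 0.2, "vacant": 0.2})
```

Changing the weights models different driver preferences: a driver who cares mostly about cost would put more weight on `fee`, which can reverse the ranking.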

Contributors: Li, Bo (Author) / Pei, Yijian (Author) / Wu, Hao (Author) / Huang, Dijiang (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2017-12-13

VIPLE Extensions in Robotic Simulation, Quadrotor Control Platform, and Machine Learning for Multirotor Activity Recognition

Description

Machine learning tutorials often employ an application- and runtime-specific solution for a given problem, and users are expected to have a broad understanding of data analysis and software programming. This thesis focuses on designing and implementing a new, hands-on approach to teaching machine learning by streamlining the process of generating Inertial Measurement Unit (IMU) data from multirotor flight sessions, training a linear classifier, and applying that classifier to solve Multirotor Activity Recognition (MAR) problems in an online lab setting. MAR labs leverage cloud computing and data storage technologies to host a versatile environment capable of logging, orchestrating, and visualizing the solution for an MAR problem through a user interface. MAR labs extend Arizona State University’s Visual IoT/Robotics Programming Language Environment (VIPLE) as a control platform for the multirotors used in data collection. VIPLE is a platform developed for teaching computational thinking, visual programming, and Internet of Things (IoT) and robotics application development. As part of this education platform, this work also develops a 3D simulator capable of simulating the programmable behaviors of a robot within a maze environment and builds a physical quadrotor for use in MAR lab experiments.

Contributors: De La Rosa, Matthew Lee (Author) / Chen, Yinong (Thesis advisor) / Collofello, James (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created: 2018