To examine reused codes in malware, both malware with source code and malware with binary code are considered in this thesis. For malware with source code, reused code chunks in the Mirai botnet. This study lists frequently reused code chunks and analyzes the characteristics and location of the code. For malware with binary code, this study performs reverse engineering on the binary code for human readers to comprehend, visually inspects reused codes in binary ransomware code, and illustrates the functionality of the reused codes on the basis of similar behaviors and tactics.
This study makes a novel contribution to the literature by directly investigating the characteristics of reused code in malware. The findings of the study can help cybersecurity practitioners and scholars increase the performance of malware classification.
In this dissertation, these challenges are addressed by presenting three artificial intelligence (AI) approaches to support prioritizing defense measures. The first two approaches leverage ML on cyberthreat intelligence data to predict if exploits are going to be used in the wild. The first work focuses on what data feeds are generated after vulnerability disclosures. The developed ML models outperform the current industry-standard method with F1 score more than doubled. Then, an approach to derive features about who generated the said data feeds is developed. The addition of these features increase recall by over 19% while maintaining precision. Finally, frequent itemset mining is combined with a variant of a probabilistic temporal logic framework to predict when attacks are likely to occur. In this approach, rules correlating malicious activity in the hacking community platforms with real-world cyberattacks are mined. They are then used in a deductive reasoning approach to generate predictions. The developed approach predicted unseen real-world attacks with an average increase in the value of F1 score by over 45%, compared to a baseline approach.
In this dissertation, I propose a novel framework for performing semantic autograding, which analyzes student programs at a semantic level to help students learn with additional and systematic help. A general autograder is not practical for general programming languages, due to the flexibility of semantics. A practical autograder is possible in VIPLE, because of its simplified syntax and restricted options of semantics. The design of this autograder is based on the concept of theorem provers. To achieve this goal, I employ a modified version of Pi-Calculus to represent VIPLE programs and Hoare Logic to formalize program requirements. By building on the inference rules of Pi-Calculus and Hoare Logic, I am able to construct a theorem prover that can perform automated semantic analysis. Furthermore, building on this theorem prover enables me to develop a self-learning algorithm that can learn the conditions for a program’s correctness according to a given solution program.
In smart parking environments, how to choose suitable parking facilities with various attributes to satisfy certain criteria is an important decision issue. Based on the multiple attributes decision making (MADM) theory, this study proposed a smart parking guidance algorithm by considering three representative decision factors (i.e., walk duration, parking fee, and the number of vacant parking spaces) and various preferences of drivers. In this paper, the expected number of vacant parking spaces is regarded as an important attribute to reflect the difficulty degree of finding available parking spaces, and a queueing theory-based theoretical method was proposed to estimate this expected number for candidate parking facilities with different capacities, arrival rates, and service rates. The effectiveness of the MADM-based parking guidance algorithm was investigated and compared with a blind search-based approach in comprehensive scenarios with various distributions of parking facilities, traffic intensities, and user preferences. Experimental results show that the proposed MADM-based algorithm is effective to choose suitable parking resources to satisfy users’ preferences. Furthermore, it has also been observed that this newly proposed Markov Chain-based availability attribute is more effective to represent the availability of parking spaces than the arrival rate-based availability attribute proposed in existing research.