Malicious IP Address Prediction

Description

IP blacklisting is a popular technique to bolster an enterprise's security, where access to and from designated IP addresses is explicitly restricted. The fundamental idea behind blacklists is to continually add to the list IP addresses that reputable entities, such as security researchers, have labeled as malicious. Currently, IP blacklisting is a reactive method: malicious IP addresses are identified only after their engagement in malicious activities (e.g., hosting malware samples or sending spam emails) is detected. This thesis project aims to address this issue by laying the groundwork for a machine learning tool that proactively identifies malicious IP addresses. The ground truth data derives from VirusTotal, a company that synthesizes security knowledge from prominent sources such as Symantec, Fortinet, and ESET. I passed 307,621 IP addresses found in posts on the D2web (deep and dark web) through VirusTotal. If at least one detected URL is associated with the IP address and VirusTotal deems it positive, I label the IP address as positive (malicious); otherwise, I label it as negative (non-malicious). To give some insight into the ground truth, 6,147 of the original 307,621 IP addresses (about 2%) were identified as positive. Furthermore, to quantify the prediction capabilities of our models, I introduce a metric called lead time. Lead time is the difference between the earliest date an IP address appears on VirusTotal and the date it was first seen on the D2web. For example, if an IP address was mentioned on the D2web on 1/5/2017 and on VirusTotal on 1/25/2017, its lead time is 20 days. After feature selection, in which I handpicked features from the data mined from the D2web, I tried various combinations of classifiers and feature sets in order to create the best model.
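
The labeling rule and the lead-time metric described above can be sketched in a few lines of Python. This is a minimal sketch, not the thesis code: the input format (one detection count per detected URL associated with an IP address) and the function names `label_ip` and `lead_time_days` are hypothetical, and treating a URL as positive when at least one engine flags it is an assumption about what "VirusTotal deems it positive" means.

```python
from datetime import date


def label_ip(detected_url_positives: list[int]) -> int:
    """Return 1 (malicious) if any detected URL tied to the IP is positive, else 0.

    Hypothetical input: one entry per detected URL associated with the IP,
    holding the number of engines that flagged it. Assumption: a URL counts
    as positive when at least one engine flags it.
    """
    return 1 if any(count > 0 for count in detected_url_positives) else 0


def lead_time_days(first_seen_d2web: date, first_seen_virustotal: date) -> int:
    """Days from the first D2web mention to the earliest VirusTotal date.

    A positive value means the D2web mention came first, i.e. the IP "leads".
    """
    return (first_seen_virustotal - first_seen_d2web).days


# Worked example from the text: 1/5/2017 on the D2web, 1/25/2017 on VirusTotal.
print(lead_time_days(date(2017, 1, 5), date(2017, 1, 25)))  # 20
print(label_ip([0, 0, 3]))  # 1
print(label_ip([]))  # 0
```
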
The final machine learning models use a Random Forest classifier with temporal cross-validation: for each testing month in 2017, I train a model on data from 1/1/2016 up to the start of that month and test it on data from that month. The following results come from the model tested on January 2017, which exhibits median performance among the final models. The true positive rate is 0.2558, the false positive rate is 0.3612, and the average lead time (for leading true positives) is 193 days; the model picks up 33.33% of all leading true positives. Although the model finds a respectable number of true positives, it picks up too many false positives. Thus, my approach is ineffective at predicting malicious IP addresses in its current state, and additional effort will be required to transform the current work into a viable tool.
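
The temporal cross-validation scheme and the two reported rates can be sketched as follows. This is a minimal sketch under stated assumptions: records are hypothetical `(day, features, label)` tuples, the training window is taken to end at the first day of the testing month, and the feature dictionaries are made up. In practice a classifier such as scikit-learn's `RandomForestClassifier` would be fit on the training split and scored on the testing month; it is omitted here to keep the sketch dependency-free.

```python
from datetime import date


def temporal_split(records, test_year, test_month, train_start=date(2016, 1, 1)):
    """Split (day, features, label) records into train and test sets.

    Train: train_start <= day < first day of the testing month.
    Test:  days that fall inside the testing month.
    """
    test_start = date(test_year, test_month, 1)
    # The first day of the following month marks the end of the test window.
    if test_month == 12:
        test_end = date(test_year + 1, 1, 1)
    else:
        test_end = date(test_year, test_month + 1, 1)
    train = [r for r in records if train_start <= r[0] < test_start]
    test = [r for r in records if test_start <= r[0] < test_end]
    return train, test


def tpr_fpr(y_true, y_pred):
    """True positive rate TP/(TP+FN) and false positive rate FP/(FP+TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical records: (first D2web mention, features, ground-truth label).
records = [
    (date(2016, 3, 2), {"mentions": 4}, 0),
    (date(2016, 11, 20), {"mentions": 9}, 1),
    (date(2017, 1, 15), {"mentions": 2}, 1),
    (date(2017, 2, 1), {"mentions": 7}, 0),
]
train, test = temporal_split(records, 2017, 1)
print(len(train), len(test))  # 2 1
```

Testing each month against a model trained only on earlier data avoids leaking future information into training, which a random k-fold split would not guarantee.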

Contributors: Alba, Iden (Author) / Shakarian, Paulo (Thesis director) / Shakarian, Jana (Committee member) / Barrett, The Honors College (Contributor)

Created: 2018-05

Darkweb Cyber Threat Intelligence Mining through the I2P Protocol

Description

This thesis project focused on malicious hacking community activities accessible through the I2P protocol. We visited 315 distinct I2P sites to identify those with malicious hacking content. We also wrote software to scrape and parse data from relevant I2P sites. The data was integrated into the CySIS databases for further analysis to contribute to the larger CySIS Lab Darkweb Cyber Threat Intelligence Mining research. We found that the I2P cryptonet was slow and had only a small amount of malicious hacking community activity. However, we also found evidence of a growing perception that Tor anonymity could be compromised. This work will contribute to understanding the malicious hacker community as some Tor users, seeking assured anonymity, transition to I2P.

Contributors: Hutchins, James Keith (Author) / Shakarian, Paulo (Thesis director) / Ahn, Gail-Joon (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Data Driven Game Theoretic Cyber Threat Mitigation

Description

Penetration testing is regarded as the gold standard for understanding how well an organization can withstand sophisticated cyber-attacks. However, the recent prevalence of darknet markets specializing in zero-day exploits makes exploits widely available to potential attackers. The cost associated with these sophisticated kits generally precludes penetration testers from simply obtaining such exploits, so an alternative approach is needed to understand which exploits an attacker will most likely purchase and how to defend against them. In this paper, we introduce a data-driven security game framework to model an attacker and provide policy recommendations to the defender. In addition to providing a formal framework and algorithms to develop strategies, we present experimental results from applying our framework, for various system configurations, to real-world exploit market data actively mined from the darknet.
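
The attacker/defender structure of such a security game can be illustrated with a toy model. This is not the paper's formulation: it assumes a hypothetical setting in which each exploit has a price and a set of software products it targets, the attacker buys the set of exploits within a budget that covers the most products in the defender's configuration, and the defender picks the configuration that minimizes that best-response payoff. Exploit names, prices, and products are made up, and brute force over subsets is exponential, so the sketch only shows the min-max structure.

```python
from itertools import combinations


def attacker_best_response(exploits, budget, system):
    """Return (payoff, chosen exploits) for the best purchase within budget.

    exploits: dict name -> (price, set of software the exploit targets).
    Payoff (a hypothetical choice): number of products in `system` covered.
    """
    best = (0, ())
    names = sorted(exploits)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            cost = sum(exploits[n][0] for n in combo)
            if cost > budget:
                continue
            covered = set().union(*(exploits[n][1] for n in combo)) & system
            if len(covered) > best[0]:
                best = (len(covered), combo)
    return best


def defender_choice(configurations, exploits, budget):
    """Pick the configuration that minimizes the attacker's best-response payoff."""
    return min(
        configurations,
        key=lambda config: attacker_best_response(exploits, budget, config)[0],
    )


exploits = {
    "exploit_a": (10, {"apache"}),
    "exploit_b": (15, {"iis", "windows"}),
    "exploit_c": (5, {"openssl"}),
}
configs = [{"apache", "openssl"}, {"iis", "mysql"}]
print(attacker_best_response(exploits, 15, configs[0]))  # (2, ('exploit_a', 'exploit_c'))
print(defender_choice(configs, exploits, 15))
```
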

Contributors: Robertson, John James (Author) / Shakarian, Paulo (Thesis director) / Doupe, Adam (Committee member) / Electrical Engineering Program (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

An Algorithm for Merging Identities

Description

In online social networks, the identities of users are concealed, often by design. This anonymity makes it possible for a single person to have multiple accounts and to engage in malicious activity such as defrauding a service provider, leveraging social influence, or hiding activities that would otherwise be detected. There are various methods for detecting whether two online users in a network are the same person in reality, and the simplest way to use this information is to merge their identities and treat the two users as a single user. However, this raises the issue of how to deal with these composite identities. To solve this problem, we introduce a mathematical abstraction that represents users and their identities as partitions on a set. We then define a similarity function, SIM, between two partitions, a set of properties that SIM must have, and a threshold that SIM must exceed for two users to be considered the same person. The main theoretical result of our work is a proof that, for any given partition and similarity threshold, there is only one way to merge the identities of similar users such that no two resulting identities are similar. We also present two algorithms, COLLAPSE and SIM_MERGE, that merge the identities of users to find this unique set of identities. We prove that both algorithms execute in polynomial time, and we also perform an experiment on dark web social network data from over 6,000 users that demonstrates the runtime of SIM_MERGE.
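
The idea of merging identities until no two remaining identities are similar can be illustrated with a naive fixpoint loop. This is a sketch under stated assumptions, not COLLAPSE or SIM_MERGE: it starts from the partition in which every account is its own identity, uses Jaccard similarity over hypothetical per-account attribute sets as a stand-in for SIM, and repeatedly merges any pair of identities whose similarity exceeds the threshold. Whether the uniqueness guarantee holds depends on SIM having the properties the thesis defines, which this stand-in is not shown to satisfy.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity: |A intersect B| / |A union B| (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_identities(accounts, threshold):
    """Merge identity blocks until no pair has similarity above `threshold`.

    accounts: dict account_id -> set of attributes (hypothetical features,
    e.g. shared PGP keys or payment addresses).
    """
    blocks = [frozenset([a]) for a in sorted(accounts)]

    def features(block):
        # An identity's features are the union of its accounts' attributes.
        return set().union(*(accounts[a] for a in block))

    merged = True
    while merged:
        merged = False
        for x, y in combinations(blocks, 2):
            if jaccard(features(x), features(y)) > threshold:
                blocks.remove(x)
                blocks.remove(y)
                blocks.append(x | y)
                merged = True
                break  # restart the scan over the updated partition
    return sorted(blocks, key=sorted)


accounts = {
    "u1": {"pgp:abc", "btc:1"},
    "u2": {"pgp:abc", "btc:1", "sig:x"},
    "u3": {"email:z"},
}
print(merge_identities(accounts, 0.5))
```
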

Contributors: Polican, Andrew Dominic (Author) / Shakarian, Paulo (Thesis director) / Sen, Arunabha (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Modeling State to Improve Defensive Cyberattack Strategies

Description

Human civilization within the last two decades has largely transformed into an online one, with many of its associated activities taking place on computers and complex networked systems, their analog and real-world equivalents having been rendered obsolete. These activities run the gamut from the ordinary and mundane, like ordering food, to the complex and large-scale, such as those involving critical infrastructure or global trade and communications. Unfortunately, human activity also includes criminal, adversarial, and malicious acts, and these now have digital equivalents as well. Ransomware, malware, and targeted cyberattacks are a fact of life today and are instigated not only by organized criminal gangs but also by adversarial nation-states and organizations. Such actions have disastrous and harmful real-world consequences. As the complexity and variety of software have evolved, so too has the ingenuity of attacks that exploit them; for example, modern cyberattacks typically involve sequential exploitation of multiple software vulnerabilities. Compared to a decade ago, modern software stacks on personal computers, laptops, servers, mobile phones, and even Internet of Things (IoT) devices involve a dizzying array of interdependent programs and software libraries, each of which presents an attractive attack surface for adversarial actors. However, responses to these threats still rely on paradigms that can neither react quickly enough nor scale to increasingly dynamic and complex software environments. Better approaches are therefore needed that can assess system readiness and vulnerabilities, identify potential attack vectors and strategies (including ways to counter them), and proactively detect vulnerabilities in complex software before they can be exploited.
In this dissertation, I first present a mathematical model and associated algorithms to identify attacker strategies for sequential cyberattacks based on attacker state, attributes, and publicly available vulnerability information. Second, I extend the model and design algorithms to help identify defensive courses of action against attacker strategies. Finally, I present my work to enhance the ability of coverage-based fuzzers to identify software vulnerabilities by providing visibility into complex, internal program states.
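
Reasoning about sequential attacks in terms of attacker state can be sketched as a breadth-first search over states, where each state is the set of capabilities the attacker holds and a vulnerability can be exploited once its preconditions are met. This is a simplified illustration, not the dissertation's model: the capability strings, vulnerability names, and the precondition/postcondition encoding are hypothetical, and attacker attributes and real vulnerability data are omitted.

```python
from collections import deque


def shortest_attack_sequence(initial, goal, vulns):
    """Return the shortest list of exploited vulnerabilities reaching `goal`.

    initial: set of capabilities the attacker starts with.
    goal: capability the attacker wants (e.g. access to a database host).
    vulns: dict name -> (preconditions set, gained capabilities set).
    Returns None if the goal is unreachable.
    """
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal in state:
            return path
        for name, (pre, post) in sorted(vulns.items()):
            if pre <= state:  # all preconditions satisfied in this state
                nxt = state | post  # exploiting only ever adds capabilities
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None


vulns = {
    "vuln_web_rce": ({"network"}, {"user@web"}),
    "vuln_local_privesc": ({"user@web"}, {"root@web"}),
    "vuln_db_pivot": ({"root@web"}, {"user@db"}),
}
print(shortest_attack_sequence({"network"}, "user@db", vulns))
# ['vuln_web_rce', 'vuln_local_privesc', 'vuln_db_pivot']
```

A defender-side analysis could reuse the same search, for example by checking whether removing (patching) a vulnerability makes the goal unreachable.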

Contributors: Paliath, Vivin Suresh (Author) / Doupe, Adam (Thesis advisor) / Shoshitaishvili, Yan (Thesis advisor) / Wang, Ruoyu (Committee member) / Shakarian, Paulo (Committee member) / Arizona State University (Publisher)

Created: 2023

TradingDawg: Operationalizing an Algorithmic Trading Application

Description

For my thesis project, I worked to operationalize an algorithmic trading application called Trading Dawg. Over the year, I implemented several analysis models, covering accuracy, performance, volume, and hyperparameter analysis. With these improvements, we are in a strong position to create valuable tools in the algorithmic trading space.

Contributors: Payne, Colton (Author) / Shakarian, Paulo (Thesis director) / Brandt, William (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Department of Finance (Contributor)

Created: 2023-05

Deriving Intrinsic Baseball Pitcher Value By Predicting Pitcher Performance From Individual Pitch Metrics

Description

Historically, the predominant strategy for evaluating baseball pitchers has been through statistics derived directly from the offensive production against the pitcher, such as ERA. Such statistics are inherently relative to the abilities and competition level of the opposing offense and of the fielding defense, which the pitcher has no control over, making it difficult to compare pitchers across leagues. In this paper, I use cutting-edge pitch-tracking data to develop a pitch evaluation model that is intrinsic to the attributes of the pitches themselves and not influenced directly by the outcome of each individual pitch. I train four different classifiers to predict the probability of each pitch belonging to different subsets of outcomes, then multiply the probability of each outcome by that outcome's average run value and sum the products to arrive at an expected run value for the pitch. I compare the performance of each classifier to a baseline, examine the most impactful features, and compare the top pitchers identified by the model to those identified by a different baseball statistics resource, ultimately concluding that three of the four classification models are productive and that the overall intrinsic evaluation model accurately identifies the sport's top performers.
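
The expected run value of a pitch is a probability-weighted sum over outcomes: $E[\text{RV}] = \sum_o P(o \mid \text{pitch}) \cdot \overline{\text{RV}}_o$. The sketch below computes that sum. The outcome categories, probabilities, and run values are made up for illustration, and it assumes the classifier outputs have already been combined into one probability per outcome; how the thesis's four classifiers are combined is not shown here.

```python
def expected_run_value(outcome_probs, run_values, tol=1e-6):
    """Sum of P(outcome) * average run value of that outcome.

    outcome_probs: dict outcome -> probability (must sum to 1 within `tol`).
    run_values: dict outcome -> average run value of that outcome.
    """
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(p * run_values[outcome] for outcome, p in outcome_probs.items())


# Hypothetical probabilities for a single pitch.
probs = {
    "ball": 0.4,
    "called_strike": 0.2,
    "swinging_strike": 0.1,
    "in_play_out": 0.2,
    "in_play_hit": 0.1,
}
# Hypothetical average run values from the batting team's perspective,
# so a more negative value is better for the pitcher.
run_values = {
    "ball": 0.06,
    "called_strike": -0.07,
    "swinging_strike": -0.11,
    "in_play_out": -0.25,
    "in_play_hit": 0.45,
}
print(round(expected_run_value(probs, run_values), 4))  # -0.006
```

Because the value depends only on predicted probabilities rather than on what actually happened after the pitch, two pitchers can be compared without crediting or penalizing them for their defense or opponents.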

Contributors: Smith, Roman (Author) / Shakarian, Paulo (Thesis director) / Macdonald, Brian (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Probable Perils: An Analysis of the Workings and Social Impact of ChatGPT

Description

This thesis analyzes the potential issues of using ChatGPT: despite its benefits, it raises concerns that may hinder societal progress. The thesis first explains how ChatGPT generates text and how that generation process can lead to a variety of issues in its output, such as hallucinated and biased content. After explaining how these issues occur, the thesis focuses on their impact in important industries such as medicine, education, and security, comparing ChatGPT with popular open-source models such as Llama and Falcon.
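
Language models such as ChatGPT generate text one token at a time, turning scores for candidate next tokens into a probability distribution and choosing from it. The sketch below shows the generic softmax-with-temperature sampling step on a toy three-word vocabulary with made-up scores; it is not ChatGPT's actual decoding code. It illustrates one way an incorrect continuation can surface: any token with nonzero probability can be sampled, and raising the temperature flattens the distribution toward less likely tokens.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical scores for completing "The capital of France is ..."
vocab = ["Paris", "Lyon", "Berlin"]
logits = [4.0, 1.5, 0.5]
print([round(p, 3) for p in softmax(logits)])
print(sample_next_token(vocab, logits, temperature=1.5, rng=random.Random(0)))
```
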

Contributors: Tsai, Brandon (Author) / Martin, Thomas (Thesis director) / Shakarian, Paulo (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2024-05

Understanding Hacking-as-a-Service Markets

Description

This work examines 12 darkweb sites involved in selling hacking services, often referred to as "Hacking-as-a-Service" (HaaS) sites. Data was gathered and analyzed over 7 months via weekly site crawling and parsing. In this empirical study, after examining over 200 forum threads, common categories of services available on HaaS sites are identified, along with their associated topics of conversation. Some of the most common hacking service categories in the HaaS market include Social Media, Database, and Phone hacking. These are the most commonly advertised services, found on over 50% of all HaaS sites, while services related to Malware and Ransomware are advertised on fewer than 30% of these sites. Additionally, the prices of these services are analyzed along with their volume of demand, and the prices listed in posts seeking services are compared with those on sites selling services. Individuals looking to hire hackers for these services offer to pay premium prices: on average, 73% more than what the individual hackers request on their own sites. Overall, this study provides insights into illicit markets for contact-based hacking, especially with regard to services such as social media hacking, email breaches, and website defacement.
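
The two kinds of figures reported above, the share of sites advertising a category and the percentage premium buyers offer over sellers' prices, can be computed as follows. The site names, categories, and prices are made up, and the abstract does not say exactly how its averages were taken; this sketch compares the mean buyer offer with the mean seller price, whereas averaging per-service ratios would be an equally plausible alternative that can give a different number.

```python
from statistics import mean


def share_of_sites(site_categories, category):
    """Fraction of sites whose listings include `category`."""
    hits = sum(1 for cats in site_categories.values() if category in cats)
    return hits / len(site_categories)


def percent_premium(buyer_offers, seller_prices):
    """How much more (in %) buyers offer on average than sellers ask on average."""
    buy, sell = mean(buyer_offers), mean(seller_prices)
    return 100 * (buy - sell) / sell


# Hypothetical crawl results: site -> advertised service categories.
sites = {
    "site_a": {"social_media", "database"},
    "site_b": {"social_media", "phone"},
    "site_c": {"malware"},
    "site_d": {"social_media", "ransomware"},
}
print(share_of_sites(sites, "social_media"))  # 0.75
print(percent_premium([150, 196], [100, 100]))  # 73.0
```
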

Contributors: Vincent, Brian W (Author) / Shakarian, Paulo (Thesis advisor) / Candan, Selcuk (Committee member) / Ahn, Gail-Joon (Committee member) / Arizona State University (Publisher)

Created: 2018

Analysis and Management of Security State for Large-Scale Data Center Networks

Description

With the increasing complexity of computing systems and the rise in the number of risks and vulnerabilities, a scalable security situation awareness tool is needed to help the system administrator protect critical assets and manage the security state of the system. There are many methods for analyzing and managing security states; for instance, a firewall can be used to manage the security state, and graphical analysis tools such as attack graphs can be used for analysis.

Attack graphs are powerful graphical security analysis tools, as they provide a visual representation of all possible attack scenarios that an attacker may use to exploit system vulnerabilities. The scalability of attack graphs, however, is a major concern, since enumerating all possible attack scenarios is considered an NP-complete problem. Many research efforts have tried to develop a scalable solution for attack graphs. Nevertheless, attack-graph-based solutions have not been practical for real-time security analysis.

In this thesis, a new framework, the 3S (Scalable Security States) analysis framework, is proposed. It presents a new approach that uses Software-Defined Networking (SDN)-based distributed firewall capabilities and the concept of a stateful data plane to construct scalable attack graphs in near real time, making it practical to use attack graphs for real-time security decisions. The goal of the proposed work is to control reachability information between different data center segments in order to reduce the dependencies among vulnerabilities and restrict attack graph analysis to a relatively small scope. The proposed framework relies on SDN's programmable capabilities to adjust the distributed firewall policies dynamically according to security situations at run time. It applies whitelist-based security policies that limit the attacker's ability to move between or exploit different segments by allowing only uni-directional vulnerability dependency links between segments. Specifically, several test cases with various attack scenarios are presented to analyze how a distributed firewall and a stateful SDN data plane can significantly reduce the effort of security state construction and analysis. The proposed approach achieved an improvement of over 61% compared with prior approaches in which SDN and a distributed firewall are not used.
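
The effect of allowing only uni-directional vulnerability dependency links between segments can be illustrated with a breadth-first traversal of a toy vulnerability dependency graph. This is not the 3S implementation, which enforces policy in SDN-based distributed firewalls rather than in a graph: the node naming convention `segment:vulnerability`, the whitelist of allowed (source segment, destination segment) pairs, and the topology are all hypothetical. Fewer reachable nodes means a smaller attack graph to construct and analyze.

```python
from collections import deque


def segment(node):
    """Hypothetical naming convention: 'segment:vulnerability'."""
    return node.split(":", 1)[0]


def reachable(edges, start, allowed_cross=None):
    """Vulnerability nodes reachable from `start` via dependency edges.

    edges: dict node -> list of dependent (successor) nodes.
    allowed_cross: optional whitelist of (from_segment, to_segment) pairs;
    when given, cross-segment edges not on the whitelist are ignored.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in edges.get(u, []):
            su, sv = segment(u), segment(v)
            if allowed_cross is not None and su != sv and (su, sv) not in allowed_cross:
                continue  # link blocked by the whitelist policy
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


edges = {
    "web:v1": ["app:v2"],
    "app:v2": ["db:v3", "web:v4"],
    "db:v3": [],
    "web:v4": [],
}
print(len(reachable(edges, "web:v1")))  # 4
print(len(reachable(edges, "web:v1", allowed_cross={("web", "app")})))  # 2
```
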

Contributors: Sabur, Abdulhakim (Author) / Huang, Dijiang (Thesis advisor) / Zhang, Yancho (Committee member) / Shakarian, Paulo (Committee member) / Arizona State University (Publisher)

Created: 2018