Search Content

Topic chains for determining risk of unauthorized information transfer

Description

Corporations invest considerable resources to create, preserve and analyze

their data; yet while organizations are interested in protecting against

unauthorized data transfer, there lacks a comprehensive metric to discriminate

what data are at risk of leaking.

This thesis motivates the need for a quantitative leakage risk metric, and

provides a risk assessment system,…

Corporations invest considerable resources to create, preserve and analyze

their data; yet while organizations are interested in protecting against

unauthorized data transfer, there lacks a comprehensive metric to discriminate

what data are at risk of leaking.

This thesis motivates the need for a quantitative leakage risk metric, and

provides a risk assessment system, called Whispers, for computing it. Using

unsupervised machine learning techniques, Whispers uncovers themes in an

organization's document corpus, including previously unknown or unclassified

data. Then, by correlating the document with its authors, Whispers can

identify which data are easier to contain, and conversely which are at risk.

Using the Enron email database, Whispers constructs a social network segmented

by topic themes. This graph uncovers communication channels within the

organization. Using this social network, Whispers determines the risk of each

topic by measuring the rate at which simulated leaks are not detected. For the

Enron set, Whispers identified 18 separate topic themes between January 1999

and December 2000. The highest risk data emanated from the legal department

with a leakage risk as high as 60%.

ContributorsWright, Jeremy (Author) / Syrotiuk, Violet (Thesis advisor) / Davulcu, Hasan (Committee member) / Yau, Stephen (Committee member) / Arizona State University (Publisher)

Created2014

Managing a user's vulnerability on a social networking site

Description

Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected to be visible to their friends, but sometimes vulnerable to…

Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected to be visible to their friends, but sometimes vulnerable to unwarranted access from others. The recent study suggests that many personal attributes, including religious and political affiliations, sexual orientation, relationship status, age, and gender, are predictable using users' personal data from an OSN site. The majority of users want to remain socially active, and protect their personal data at the same time. This tension leads to a user's vulnerability, allowing privacy attacks which can cause physical and emotional distress to a user, sometimes with dire consequences. For example, stalkers can make use of personal information available on an OSN site to their personal gain. This dissertation aims to systematically study a user vulnerability against such privacy attacks.

A user vulnerability can be managed in three steps: (1) identifying, (2) measuring and (3) reducing a user vulnerability. Researchers have long been identifying vulnerabilities arising from user's personal data, including user names, demographic attributes, lists of friends, wall posts and associated interactions, multimedia data such as photos, audios and videos, and tagging of friends. Hence, this research first proposes a way to measure and reduce a user vulnerability to protect such personal data. This dissertation also proposes an algorithm to minimize a user's vulnerability while maximizing their social utility values.

To address these vulnerability concerns, social networking sites like Facebook usually let their users to adjust their profile settings so as to make some of their data invisible. However, users sometimes interact with others using unprotected posts (e.g., posts from a ``Facebook page\footnote{The term ''Facebook page`` refers to the page which are commonly dedicated for businesses, brands and organizations to share their stories and connect with people.}''). Such interactions help users to become more social and are publicly accessible to everyone. Thus, visibilities of these interactions are beyond the control of their profile settings. I explore such unprotected interactions so that users' are well aware of these new vulnerabilities and adopt measures to mitigate them further. In particular, {\em are users' personal attributes predictable using only the unprotected interactions}? To answer this question, I address a novel problem of predictability of users' personal attributes with unprotected interactions. The extreme sparsity patterns in users' unprotected interactions pose a serious challenge. Therefore, I approach to mitigating the data sparsity challenge by designing a novel attribute prediction framework using only the unprotected interactions. Experimental results on Facebook dataset demonstrates that the proposed framework can predict users' personal attributes.

ContributorsGundecha, Pritam S (Author) / Liu, Huan (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Ye, Jieping (Committee member) / Barbier, Geoffrey (Committee member) / Arizona State University (Publisher)

Created2015

Reasoning about Cyber Threat Actors

Description

Reasoning about the activities of cyber threat actors is critical to defend against cyber

attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult

to determine who the attacker is, what the desired goals are of the attacker, and how they will

carry out their attacks.…

Reasoning about the activities of cyber threat actors is critical to defend against cyber

attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult

to determine who the attacker is, what the desired goals are of the attacker, and how they will

carry out their attacks. These three questions essentially entail understanding the attacker’s

use of deception, the capabilities available, and the intent of launching the attack. These

three issues are highly inter-related. If an adversary can hide their intent, they can better

deceive a defender. If an adversary’s capabilities are not well understood, then determining

what their goals are becomes difficult as the defender is uncertain if they have the necessary

tools to accomplish them. However, the understanding of these aspects are also mutually

supportive. If we have a clear picture of capabilities, intent can better be deciphered. If we

understand intent and capabilities, a defender may be able to see through deception schemes.

In this dissertation, I present three pieces of work to tackle these questions to obtain

a better understanding of cyber threats. First, we introduce a new reasoning framework

to address deception. We evaluate the framework by building a dataset from DEFCON

capture-the-flag exercise to identify the person or group responsible for a cyber attack.

We demonstrate that the framework not only handles cases of deception but also provides

transparent decision making in identifying the threat actor. The second task uses a cognitive

learning model to determine the intent – goals of the threat actor on the target system.

The third task looks at understanding the capabilities of threat actors to target systems by

identifying at-risk systems from hacker discussions on darkweb websites. To achieve this

task we gather discussions from more than 300 darkweb websites relating to malicious

hacking.

ContributorsNunes, Eric (Author) / Shakarian, Paulo (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Baral, Chitta (Committee member) / Cooke, Nancy J. (Committee member) / Arizona State University (Publisher)

Created2018

Cyber Attacks Detection and Mitigation in SDN Environments

Description

Cyber-systems and networks are the target of different types of cyber-threats and attacks, which are becoming more common, sophisticated, and damaging. Those attacks can vary in the way they are performed. However, there are similar strategies

and tactics often used because they are time-proven to be effective. The motivations behind cyber-attacks…

Cyber-systems and networks are the target of different types of cyber-threats and attacks, which are becoming more common, sophisticated, and damaging. Those attacks can vary in the way they are performed. However, there are similar strategies

and tactics often used because they are time-proven to be effective. The motivations behind cyber-attacks play an important role in designating how attackers plan and proceed to achieve their goals. Generally, there are three categories of motivation

are: political, economical, and socio-cultural motivations. These indicate that to defend against possible attacks in an enterprise environment, it is necessary to consider what makes such an enterprise environment a target. That said, we can understand

what threats to consider and how to deploy the right defense system. In other words, detecting an attack depends on the defenders having a clear understanding of why they become targets and what possible attacks they should expect. For instance,

attackers may preform Denial of Service (DoS), or even worse Distributed Denial of Service (DDoS), with intention to cause damage to targeted organizations and prevent legitimate users from accessing their services. However, in some cases, attackers are very skilled and try to hide in a system undetected for a long period of time with the incentive to steal and collect data rather than causing damages.

Nowadays, not only the variety of attack types and the way they are launched are important. However, advancement in technology is another factor to consider. Over the last decades, we have experienced various new technologies. Obviously, in the beginning, new technologies will have their own limitations before they stand out. There are a number of related technical areas whose understanding is still less than satisfactory, and in which long-term research is needed. On the other hand, these new technologies can boost the advancement of deploying security solutions and countermeasures when they are carefully adapted. That said, Software Defined Networking i(SDN), its related security threats and solutions, and its adaption in enterprise environments bring us new chances to enhance our security solutions. To reach the optimal level of deploying SDN technology in enterprise environments, it is important to consider re-evaluating current deployed security solutions in traditional networks before deploying them to SDN-based infrastructures. Although DDoS attacks are a bit sinister, there are other types of cyber-threats that are very harmful, sophisticated, and intelligent. Thus, current security defense solutions to detect DDoS cannot detect them. These kinds of attacks are complex, persistent, and stealthy, also referred to Advanced Persistent Threats (APTs) which often leverage the bot control and remotely access valuable information. APT uses multiple stages to break into a network. APT is a sort of unseen, continuous and long-term penetrative network and attackers can bypass the existing security detection systems. It can modify and steal the sensitive data as well as specifically cause physical damage the target system. In this dissertation, two cyber-attack motivations are considered: sabotage, where the motive is the destruction; and information theft, where attackers aim to acquire invaluable information (customer info, business information, etc). I deal with two types of attacks (DDoS attacks and APT attacks) where DDoS attacks are classified under sabotage motivation category, and the APT attacks are classified under information theft motivation category. To detect and mitigate each of these attacks, I utilize the ease of programmability in SDN and its great platform for implementation, dynamic topology changes, decentralized network management, and ease of deploying security countermeasures.

ContributorsAlshamrani, Adel (Author) / Huang, Dijiang (Thesis advisor) / Doupe, Adam (Committee member) / Ahn, Gail-Joon (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2018

Malware Analysis and Classification Framework: Detecting Financial Malware Using Machine Learning Techniques

Description

Malware that perform identity theft or steal bank credentials are becoming increasingly common and can cause millions of dollars of damage annually. A large area of research focus is the automated detection and removal of such malware, due to their large impact on millions of people each year. Such a…

Malware that perform identity theft or steal bank credentials are becoming increasingly common and can cause millions of dollars of damage annually. A large area of research focus is the automated detection and removal of such malware, due to their large impact on millions of people each year. Such a detector will be beneficial to any industry that is regularly the target of malware, such as the financial sector. Typical detection approaches such as those found in commercial anti-malware software include signature-based scanning, in which malware executables are identified based on a unique signature or fingerprint developed for that malware. However, as malware authors continue to modify and obfuscate their malware, heuristic detection is increasingly popular, in which the behaviors of the malware are identified and patterns recognized. We explore a malware analysis and classification framework using machine learning to train classifiers to distinguish between malware and benign programs based upon their features and behaviors. Using both decision tree learning and support vector machines as classifier models, we obtained overall classification accuracies of around 80%. Due to limitations primarily including the usage of a small data set, our approach may not be suitable for practical classification of malware and benign programs, as evident by a high error rate.

ContributorsAnwar, Sajid (Co-author) / Chan, Tsz (Co-author) / Ahn, Gail-Joon (Thesis director) / Zhao, Ziming (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2016-05

E-mail header injections - an analysis of the World Wide Web

Description

E-Mail header injection vulnerability is a class of vulnerability that can occur in web applications that use user input to construct e-mail messages. E-Mail injection is possible when the mailing script fails to check for the presence of e-mail headers in user input (either form fields or URL parameters). The…

E-Mail header injection vulnerability is a class of vulnerability that can occur in web applications that use user input to construct e-mail messages. E-Mail injection is possible when the mailing script fails to check for the presence of e-mail headers in user input (either form fields or URL parameters). The vulnerability exists in the reference implementation of the built-in “mail” functionality in popular languages like PHP, Java, Python, and Ruby. With the proper injection string, this vulnerability can be exploited to inject additional headers and/or modify existing headers in an e-mail message, allowing an attacker to completely alter the content of the e-mail.

This thesis develops a scalable mechanism to automatically detect E-Mail Header Injection vulnerability and uses this mechanism to quantify the prevalence of E- Mail Header Injection vulnerabilities on the Internet. Using a black-box testing approach, the system crawled 21,675,680 URLs to find URLs which contained form fields. 6,794,917 such forms were found by the system, of which 1,132,157 forms contained e-mail fields. The system used this data feed to discern the forms that could be fuzzed with malicious payloads. Amongst the 934,016 forms tested, 52,724 forms were found to be injectable with more malicious payloads. The system tested 46,156 of these and was able to find 496 vulnerable URLs across 222 domains, which proves that the threat is widespread and deserves future research attention.

ContributorsChandramouli, Sai Prashanth (Author) / Doupe, Adam (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Zhao, Ziming (Committee member) / Arizona State University (Publisher)

Created2016

An investigation of machine learning for password evaluation

Description

Passwords are ubiquitous and are poised to stay that way due to their relative usability, security and deployability when compared with alternative authentication schemes. Unfortunately, humans struggle with some of the assumptions or requirements that are necessary for truly strong passwords. As administrators try to push users towards password complexity…

Passwords are ubiquitous and are poised to stay that way due to their relative usability, security and deployability when compared with alternative authentication schemes. Unfortunately, humans struggle with some of the assumptions or requirements that are necessary for truly strong passwords. As administrators try to push users towards password complexity and diversity, users still end up using predictable mangling patterns on old passwords and reusing the same passwords across services; users even inadvertently converge on the same patterns to a surprising degree, making an attacker’s job easier. This work explores using machine learning techniques to pick out strong passwords from weak ones, from a dataset of 10 million passwords, based on how structurally similar they were to the rest of the set.

ContributorsTodd, Margaret Nicole (Author) / Xue, Guoliang (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2016

iGen: Toward Automatic Generation and Analysis of Indicators of Compromise (IOCs) using Convolutional Neural Network

Description

Field of cyber threats is evolving rapidly and every day multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most of the TI systems examine…

Field of cyber threats is evolving rapidly and every day multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most of the TI systems examine and interpret data manually with the help of analysts. Second, some of them generate Indicators of Compromise (IOCs) directly using regular expressions without understanding the contextual meaning of those IOCs from the data sources which allows the tools to include lot of false positives. Third, lot of TI systems consider either one or two data sources for the generation of IOCs, and misses some of the most valuable IOCs from other data sources.

To overcome these limitations, we propose iGen, a novel approach to fully automate the process of IOC generation and analysis. Proposed approach is based on the idea that our model can understand English texts like human beings, and extract the IOCs from the different data sources intelligently. Identification of the IOCs is done on the basis of the syntax and semantics of the sentence as well as context words (e.g., ``attacked'', ``suspicious'') present in the sentence which helps the approach work on any kind of data source. Our proposed technique, first removes the words with no contextual meaning like stop words and punctuations etc. Then using the rest of the words in the sentence and output label (IOC or non-IOC sentence), our model intelligently learn to classify sentences into IOC and non-IOC sentences. Once IOC sentences are identified using this learned Convolutional Neural Network (CNN) based approach, next step is to identify the IOC tokens (like domains, IP, URL) in the sentences. This CNN based classification model helps in removing false positives (like IPs which are not malicious). Afterwards, IOCs extracted from different data sources are correlated to find the links between thousands of apparently unrelated attack instances, particularly infrastructures shared between them. Our approach fully automates the process of IOC generation from gathering data from different sources to creating rules (e.g. OpenIOC, snort rules, STIX rules) for deployment on

the security infrastructure.

iGen has collected around 400K IOCs till now with a precision of 95\%, better than any state-of-art method.

ContributorsPanwar, Anupam (Author) / Ahn, Gail-Joon (Thesis advisor) / Doupe, Adam (Committee member) / Zhao, Ziming (Committee member) / Arizona State University (Publisher)

Created2017

An Image Analysis Environment for Species Identification of Food Contaminating Beetles

Description

Food safety is vital to the well-being of society; therefore, it is important to inspect food products to ensure minimal health risks are present. A crucial phase of food inspection is the identification of foreign particles found in the sample, such as insect body parts. The presence of certain species…

Food safety is vital to the well-being of society; therefore, it is important to inspect food products to ensure minimal health risks are present. A crucial phase of food inspection is the identification of foreign particles found in the sample, such as insect body parts. The presence of certain species of insects, especially storage beetles, is a reliable indicator of possible contamination during storage and food processing. However, the current approach to identifying species is visual examination by human analysts; this method is rather subjective and time-consuming. Furthermore, confident identification requires extensive experience and training. To aid this inspection process, we have developed in collaboration with FDA analysts some image analysis-based machine intelligence to achieve species identification with up to 90% accuracy. The current project is a continuation of this development effort. Here we present an image analysis environment that allows practical deployment of the machine intelligence on computers with limited processing power and memory. Using this environment, users can prepare input sets by selecting images for analysis, and inspect these images through the integrated pan, zoom, and color analysis capabilities. After species analysis, the results panel allows the user to compare the analyzed images with referenced images of the proposed species. Further additions to this environment should include a log of previously analyzed images, and eventually extend to interaction with a central cloud repository of images through a web-based interface. Additional issues to address include standardization of image layout, extension of the feature-extraction algorithm, and utilizing image classification to build a central search engine for widespread usage.

ContributorsMartin, Daniel Luis (Author) / Ahn, Gail-Joon (Thesis director) / DoupÃÂ©, Adam (Committee member) / Xu, Joshua (Committee member) / Computer Science and Engineering Program (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)

Created2016-05

PyAntiPhish: A Python-Based Machine Learning Detector of Phishing Websites and An Examination of Relevant URL-Based Features

Description

Phishing is one of most common and effective attack vectors in modern cybercrime. Rather than targeting a technical vulnerability in a computer system, phishing attacks target human behavioral or emotional tendencies through manipulative emails, text messages, or phone calls. Through PyAntiPhish, I attempt to create my own version of an…

Phishing is one of most common and effective attack vectors in modern cybercrime. Rather than targeting a technical vulnerability in a computer system, phishing attacks target human behavioral or emotional tendencies through manipulative emails, text messages, or phone calls. Through PyAntiPhish, I attempt to create my own version of an anti-phishing solution, through a series of experiments testing different machine learning classifiers and URL features. With an end-goal implementation as a Chromium browser extension utilizing Python-based machine learning classifiers (those available via the scikit-learn library), my project uses a combination of Python, TypeScript, Node.js, as well as AWS Lambda and API Gateway to act as a solution capable of blocking phishing attacks from the web browser.

ContributorsYang, Branden (Author) / Osburn, Steven (Thesis director) / Malpe, Adwith (Committee member) / Ahn, Gail-Joon (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2024-05

Filtering by