Filtering by
- All Subjects: Machine Learning
attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult
to determine who the attacker is, what the desired goals are of the attacker, and how they will
carry out their attacks. These three questions essentially entail understanding the attacker’s
use of deception, the capabilities available, and the intent of launching the attack. These
three issues are highly inter-related. If an adversary can hide their intent, they can better
deceive a defender. If an adversary’s capabilities are not well understood, then determining
what their goals are becomes difficult as the defender is uncertain if they have the necessary
tools to accomplish them. However, the understanding of these aspects are also mutually
supportive. If we have a clear picture of capabilities, intent can better be deciphered. If we
understand intent and capabilities, a defender may be able to see through deception schemes.
In this dissertation, I present three pieces of work to tackle these questions to obtain
a better understanding of cyber threats. First, we introduce a new reasoning framework
to address deception. We evaluate the framework by building a dataset from DEFCON
capture-the-flag exercise to identify the person or group responsible for a cyber attack.
We demonstrate that the framework not only handles cases of deception but also provides
transparent decision making in identifying the threat actor. The second task uses a cognitive
learning model to determine the intent – goals of the threat actor on the target system.
The third task looks at understanding the capabilities of threat actors to target systems by
identifying at-risk systems from hacker discussions on darkweb websites. To achieve this
task we gather discussions from more than 300 darkweb websites relating to malicious
hacking.
Historically, the predominant strategy for evaluating baseball pitchers has been through statistics created directly from the offensive production against the pitcher, such as ERA. Such statistics are inherently relative to the abilities and competition level of the opposing offense and the field defense, which the pitcher has no control over, making it difficult to compare pitchers across leagues. In this paper, I use cutting edge pitch-tracking data to develop a pitch evaluation model that is intrinsic to the attributes of the pitches themselves, and not influenced directly by the outcomes of each individual pitch. I train four different classifiers to predict the probability of each pitch belonging to different subsets of outcomes, then multiply the probability of each outcome by that outcome’s average run value to arrive at an expected run value for the pitch. I compare the performance of each classifier to a baseline, examine the most impactful features, and compare the top pitchers identified by the model to those identified by a different baseball statistics resource, ultimately concluding that three of the four classification models are productive and that the overall intrinsic evaluation model accurately identifies the sports top performers.
Despite this uniqueness, almost no scientific work has been performed on this public social network. Thus, it is unclear what user interaction features present on other social networks exist on Twitch. Investigating the interactions between users and identifying which, if any, of the common user behaviors on social network exist on Twitch is an important step in understanding how Twitch fits in to the social media ecosystem. For example, there are users that have large followings on Twitch and amass a large number of viewers, but do those users exert influence over the behavior of other user the way that popular users on Twitter do?
This task, however, will not be trivial. The same hyper-focus on live content that makes Twitch unique in the social network space invalidates many of the traditional approaches to social network analysis. Thus, new algorithms and techniques must be developed in order to tap this data source. In this thesis, a novel algorithm for finding games whose releases have made a significant impact on the network is described as well as a novel algorithm for detecting and identifying influential players of games. In addition, the Twitch network is described in detail along with the data that was collected in order to power the two previously described algorithms.
To address the above mentioned challenges, in this dissertation I investigate the propagation of online malicious information from two broad perspectives: (1) content posted by users and (2) information cascades formed by resharing mechanisms in social media. More specifically, first, non-parametric and semi-supervised learning algorithms are introduced to discern potential patterns of human trafficking activities that are of high interest to law enforcement. Second, a time-decay causality-based framework is introduced for early detection of “Pathogenic Social Media (PSM)” accounts (e.g., terrorist supporters). Third, due to the lack of sufficient annotated data for training PSM detection approaches, a semi-supervised causal framework is proposed that utilizes causal-related attributes from unlabeled instances to compensate for the lack of enough labeled data. Fourth, a feature-driven approach for PSM detection is introduced that leverages different sets of attributes from users’ causal activities, account-level and content-related information as well as those from URLs shared by users.