Filtering by
- All Subjects: Machine Learning
- Creators: Bansal, Ajay
- Creators: Baral, Chitta
- Member of: Theses and Dissertations
attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult
to determine who the attacker is, what the desired goals are of the attacker, and how they will
carry out their attacks. These three questions essentially entail understanding the attacker’s
use of deception, the capabilities available, and the intent of launching the attack. These
three issues are highly inter-related. If an adversary can hide their intent, they can better
deceive a defender. If an adversary’s capabilities are not well understood, then determining
what their goals are becomes difficult as the defender is uncertain if they have the necessary
tools to accomplish them. However, the understanding of these aspects are also mutually
supportive. If we have a clear picture of capabilities, intent can better be deciphered. If we
understand intent and capabilities, a defender may be able to see through deception schemes.
In this dissertation, I present three pieces of work to tackle these questions to obtain
a better understanding of cyber threats. First, we introduce a new reasoning framework
to address deception. We evaluate the framework by building a dataset from DEFCON
capture-the-flag exercise to identify the person or group responsible for a cyber attack.
We demonstrate that the framework not only handles cases of deception but also provides
transparent decision making in identifying the threat actor. The second task uses a cognitive
learning model to determine the intent – goals of the threat actor on the target system.
The third task looks at understanding the capabilities of threat actors to target systems by
identifying at-risk systems from hacker discussions on darkweb websites. To achieve this
task we gather discussions from more than 300 darkweb websites relating to malicious
hacking.
The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% computer generated poems were correctly identified by nonexperts while 65% of the computer generated poems were correctly identified by experts. Although the system does not pass the Turing test, the results from the Turing test suggest an improvement of over 17% when compared to previous methods which use Turing tests to evaluate poetry generators.
received increasing attention in recent years. The availability of sheer amounts of
user-generated data presents data scientists both opportunities and challenges. Opportunities are presented with additional data sources. The abundant link information
in social networks could provide another rich source in deriving implicit information
for social data mining. However, the vast majority of existing studies overwhelmingly
focus on positive links between users while negative links are also prevailing in real-
world social networks such as distrust relations in Epinions and foe links in Slashdot.
Though recent studies show that negative links have some added value over positive
links, it is dicult to directly employ them because of its distinct characteristics from
positive interactions. Another challenge is that label information is rather limited
in social media as the labeling process requires human attention and may be very
expensive. Hence, alternative criteria are needed to guide the learning process for
many tasks such as feature selection and sentiment analysis.
To address above-mentioned issues, I study two novel problems for signed social
networks mining, (1) unsupervised feature selection in signed social networks; and
(2) unsupervised sentiment analysis with signed social networks. To tackle the first problem, I propose a novel unsupervised feature selection framework SignedFS. In
particular, I model positive and negative links simultaneously for user preference
learning, and then embed the user preference learning into feature selection. To study the second problem, I incorporate explicit sentiment signals in textual terms and
implicit sentiment signals from signed social networks into a coherent model Signed-
Senti. Empirical experiments on real-world datasets corroborate the effectiveness of
these two frameworks on the tasks of feature selection and sentiment analysis.
In this thesis, several different methods for detecting and removing satellite streaks from astronomic images were evaluated and compared with a new machine learning based approach. Simulated data was generated with a variety of conditions, and the performance of each method was evaluated both quantitatively, using Mean Absolute Error (MAE) against a ground truth detection mask and processing throughput of the method, as well as qualitatively, examining the situations in which each model performs well and poorly. Detection methods from existing systems Pyradon and ASTRiDE were implemented and tested. A machine learning (ML) image segmentation model was trained on simulated data and used to detect streaks in test data. The ML model performed favorably relative to the traditional methods tested, and demonstrated superior robustness in general. However, the model also exhibited some unpredictable behavior in certain scenarios which should be considered. This demonstrated that machine learning is a viable tool for the detection of satellite streaks in astronomic images, however special care must be taken to prevent and to minimize the effects of unpredictable behavior in such models.