Matching Items (2)
Filtering by

Clear all filters

134346-Thumbnail Image.png
Description
Malware forensics is a time-consuming process that involves a significant amount of data collection. To ease the load on security analysts, many attempts have been made to automate the intelligence gathering process and provide a centralized search interface. Certain of these solutions map existing relations between threats and can discover

Malware forensics is a time-consuming process that involves a significant amount of data collection. To ease the load on security analysts, many attempts have been made to automate the intelligence gathering process and provide a centralized search interface. Certain of these solutions map existing relations between threats and can discover new intelligence by identifying correlations in the data. However, such systems generally treat each unique malware sample as its own distinct threat. This fails to model the real malware landscape, in which so many ``new" samples are actually variants of samples that have already been discovered. Were there some way to reliably determine whether two malware samples belong to the same family, intelligence for one sample could be applied to any sample in the family, greatly reducing the complexity of intelligence synthesis. Clustering is a common big data approach for grouping data samples which have common features, and has been applied in several recent papers for identifying related malware. It therefore has the potential to be used as described to simplify the intelligence synthesis process. However, existing threat intelligence systems do not use malware clustering. In this paper, we attempt to design a highly accurate malware clustering system, with the ultimate goal of integrating it into a threat intelligence platform. Toward this end, we explore the many considerations of designing such a system: how to extract features to compare malware, and how to use these features for accurate clustering. We then create an experimental clustering system, and evaluate its effectiveness using two different clustering algorithms.
ContributorsSmith, Joshua Michael (Author) / Ahn, Gail-Joon (Thesis director) / Zhao, Ziming (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)
Created2017-05
155706-Thumbnail Image.png
Description
The volume and frequency of cyber attacks have exploded in recent years. Organizations subscribe to multiple threat intelligence feeds to increase their knowledge base and better equip their security teams with the latest information in threat intelligence domain. Though such subscriptions add intelligence and can help in taking more informed

The volume and frequency of cyber attacks have exploded in recent years. Organizations subscribe to multiple threat intelligence feeds to increase their knowledge base and better equip their security teams with the latest information in threat intelligence domain. Though such subscriptions add intelligence and can help in taking more informed decisions, organizations have to put considerable efforts in facilitating and analyzing a large number of threat indicators. This problem worsens further, due to a large number of false positives and irrelevant events detected as threat indicators by existing threat feed sources. It is often neither practical nor cost-effective to analyze every single alert considering the staggering volume of indicators. The very reason motivates to solve the overcrowded threat indicators problem by prioritizing and filtering them.

To overcome above issue, I explain the necessity of determining how likely a reported indicator is malicious given the evidence and prioritizing it based on such determination. Confidence Score Measurement system (CSM) introduces the concept of confidence score, where it assigns a score of being malicious to a threat indicator based on the evaluation of different threat intelligence systems. An indicator propagates maliciousness to adjacent indicators based on relationship determined from behavior of an indicator. The propagation algorithm derives final confidence to determine overall maliciousness of the threat indicator. CSM can prioritize the indicators based on confidence score; however, an analyst may not be interested in the entire result set, so CSM narrows down the results based on the analyst-driven input. To this end, CSM introduces the concept of relevance score, where it combines the confidence score with analyst-driven search by applying full-text search techniques. It prioritizes the results based on relevance score to provide meaningful results to the analyst. The analysis shows the propagation algorithm of CSM linearly scales with larger datasets and achieves 92% accuracy in determining threat indicators. The evaluation of the result demonstrates the effectiveness and practicality of the approach.
ContributorsModi, Ajay (Author) / Ahn, Gail-Joon (Thesis advisor) / Zhao, Ziming (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2017