There currently exist various challenges in learning cybersecuirty knowledge, along with a shortage of experts in the related areas, while the demand for such talents keeps growing. Unlike other topics related to the computer system such as computer architecture and computer network, cybersecurity is a multidisciplinary topic involving scattered technologies, which yet remains blurry for its future direction. Constructing a knowledge graph (KG) in cybersecurity education is a first step to address the challenges and improve the academic learning efficiency.
With the advancement of big data and Natural Language Processing (NLP) technologies, constructing large KGs and mining concepts, from unstructured text by using learning methodologies, become possible. The NLP-based KG with the semantic similarity between concepts has brought inspiration to different industrial applications, yet far from completeness in the domain expertise, including education in computer science related fields.
In this research work, a KG in cybersecurity area has been constructed using machine-learning-based word embedding (i.e., mapping a word or phrase onto a vector of low dimensions) and hyperlink-based concept mining from the full dataset of words available using the latest Wikipedia dump. The different approaches in corpus training are compared and the performance based on different similarity tasks is evaluated. As a result, the best performance of trained word vectors has been applied, which is obtained by using Skip-Gram model of Word2Vec, to construct the needed KG. In order to improve the efficiency of knowledge learning, a web-based front-end is constructed to visualize the KG, which provides the convenience in browsing related materials and searching for cybersecurity-related concepts and independence relations.