Matching Items (2)

Filtering by

All Subjects: Text Mining
Creators: Gaffar, Ashraf

Hierarchical Sequential Event Prediction and Translation from Aviation Accident Report Data

Description

Sequential event prediction, or sequential pattern mining, is a well-studied topic in the literature. In many real-world scenarios, data is released sequentially, and event sequences are believed to contain repetitive patterns from which future events can be predicted. For example, many companies build recommender systems that predict the next product a user may want from their purchase history. Healthcare systems discover relationships among patients’ sequential symptoms to mitigate the adverse effects of a treatment (drugs or surgery). Modern engineering systems, such as aviation, distributed computing, and energy systems, diagnose failure event logs and take prompt action to avoid disaster when a similar failure pattern occurs. In this dissertation, I focus on building a scalable algorithm for event prediction and extraction in the aviation domain. Understanding accident events is a major safety concern in the aviation system. A flight accident is often caused by a sequence of failure events. Accurate modeling of the failure event sequence and how it leads to the final accident is important for aviation safety. This work studies the relationships within the failure event sequence and evaluates the risk of the final accident according to these failure events. I address three major challenges. (1) Modeling sequential events with hierarchical structure: I aim to improve prediction accuracy by taking advantage of the multi-level, or hierarchical, representation of these rare events. Specifically, I propose a sequential Encoder-Decoder framework with a hierarchical embedding representation of the events. (2) Lack of high-quality and consistent event log data: To acquire more accurate event data from aviation accident reports, I convert the problem into a multi-label classification task. An attention-based Bidirectional Encoder Representations from Transformers model is developed to achieve good performance and interpretability. (3) Ontology-based event extraction: To extract detailed events, I propose solving the problem as a hierarchical classification task. I improve model performance by incorporating an event ontology. By solving these three challenges, I provide a framework to extract events from narrative reports and estimate the risk level of aviation accidents through event sequence modeling.
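The abstract does not say how the hierarchical embedding is built. As a purely illustrative sketch, one common way to give rare events a multi-level representation is to add an embedding for the event's parent category to an embedding for the event itself, so that rare events share information with their siblings. The two-level hierarchy, the event names, the dimension `DIM`, and the additive combination below are all assumptions made for illustration, not the dissertation's method.

```python
import random

DIM = 4  # embedding width, chosen only for illustration


def make_table(keys, seed):
    """Build a lookup table of random vectors, one per key."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1, 1) for _ in range(DIM)] for k in keys}


# Hypothetical two-level event hierarchy: (category, specific event).
EVENTS = [
    ("engine", "fuel starvation"),
    ("engine", "oil leak"),
    ("weather", "icing"),
]

category_table = make_table(sorted({c for c, _ in EVENTS}), seed=0)
event_table = make_table(sorted({e for _, e in EVENTS}), seed=1)


def embed(event):
    """Additive multi-level embedding: category vector plus event vector."""
    category, name = event
    return [c + e for c, e in zip(category_table[category], event_table[name])]


def embed_sequence(events):
    """Embed an ordered event sequence, e.g. as input to a sequence encoder."""
    return [embed(ev) for ev in events]
```

In this sketch, "fuel starvation" and "oil leak" share the "engine" component, so a model trained on one receives some signal about the other. In practice the tables would be learned rather than random.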

Contributors: Zhao, Xinyu (Author) / Yan, Hao (Thesis advisor) / Liu, Yongming (Committee member) / Ju, Feng (Committee member) / Iquebal, Ashif (Committee member) / Arizona State University (Publisher)

Created: 2022

A study of text mining framework for automated classification of software requirements in enterprise systems

Description

Text classification is a rapidly evolving area of data mining, while requirements engineering is a less-explored area of software engineering that deals with the process of defining, documenting, and maintaining a software system's requirements. When researchers blended these two streams, the result was research on automating the classification of software requirements statements into categories that developers can easily comprehend, for faster development and delivery. Until now, this classification was mostly done manually by software engineers, a tedious job. However, most of that research focused on classifying non-functional requirements, which pertain to intangible features such as security, reliability, and quality. Automatically classifying functional requirements, which pertain to how the system will function, is a challenging task, especially for requirements belonging to different and large enterprise systems. This requires exploiting text mining capabilities. This thesis investigates the results of text classification applied to functional software requirements by creating a framework in R that uses algorithms and techniques such as k-nearest neighbors, support vector machines, boosting, bagging, maximum entropy, neural networks, and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used to train the model. Key components for training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated by studying and comparing parameters such as precision, recall, and accuracy.
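To make the ingredients named in the abstract concrete, here is a minimal sketch of a term-frequency representation, a k-nearest neighbors vote, and the precision, recall, and accuracy measures. It is written in Python rather than R, covers only one of the listed algorithms, and does not reproduce the thesis framework. The requirement statements and the category labels ("authentication", "billing") are hypothetical.

```python
from collections import Counter
import math


def term_frequencies(text):
    """Bag-of-words term counts for one requirement statement."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training items."""
    query = term_frequencies(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(term_frequencies(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive):
    """Return (precision, recall, accuracy), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, accuracy


# Hypothetical, manually labeled functional requirements.
TRAIN = [
    ("user logs in with password", "authentication"),
    ("system generates monthly invoice", "billing"),
    ("user resets password by email", "authentication"),
    ("invoice is sent to customer", "billing"),
]
```

Calling `knn_predict(TRAIN, "customer receives invoice")` votes among the three nearest training statements, and `evaluate` can then compare predicted labels with manually assigned ones on held-out data.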

Contributors: Swadia, Japa (Author) / Ghazarian, Arbi (Thesis advisor) / Bansal, Srividya (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created: 2016