The classification of domain concepts in object-oriented systems

Carey, Maurice


The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. Introducing abstraction into reverse engineering will allow engineers to move farther away from the details of the system, increasing their ability to see the role that domain-level concepts play in it. In this thesis, we present a technique that uses a machine learning classification method to filter the classes of existing systems at the source level according to their relationship to domain concepts. We showed that concepts can be identified by a machine learning classifier based on source-level metrics. We developed an Eclipse plugin to assist with manually classifying Java source code and with collecting metrics and classifications into a standard file format. We also developed an Eclipse plugin that acts as a concept identifier, visually indicating whether or not a class is a domain concept. We minimized the size of training sets to ensure that the approach is useful in practice. This allowed us to determine that a training set of 7.5 to 10% of the system is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction.
We also discussed composite concepts, in which a single domain concept is implemented by more than one class. We showed that these composite concepts are difficult to detect because the detection problem is NP-complete.
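The abstract describes labeling each class as a domain concept or not with a KNN learner over source-level metrics, trained on a randomly selected 7.5 to 10% of the system. The Python code below is a minimal sketch of that idea under stated assumptions. It is not the thesis's implementation. The metric tuple, Euclidean distance, k=3, and the helper names `knn_predict` and `random_training_split` are hypothetical choices for illustration. They do not reproduce the optimal feature set or the Eclipse plugins the thesis describes.

```python
import math
import random
from collections import Counter

# Hypothetical per-class metric vector, e.g. (methods, fields, fan_in, fan_out).
# This is an illustrative assumption, not the thesis's optimal feature set.
Metrics = tuple[float, ...]


def euclidean(a: Metrics, b: Metrics) -> float:
    """Straight-line distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query: Metrics, k: int = 3) -> bool:
    """Label `query` (True = domain concept) by majority vote of its k nearest
    labeled classes. `train` is a list of (metrics, label) pairs."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def random_training_split(samples, fraction: float, seed: int = 0):
    """Randomly select `fraction` of the labeled classes for training and
    return (training_set, remaining_classes)."""
    rng = random.Random(seed)  # seeded so a given split can be reproduced
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = max(1, round(len(shuffled) * fraction))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Toy, made-up metric vectors; not data from the thesis.
    labeled = [
        ((12, 8, 5, 2), True),
        ((10, 6, 4, 3), True),
        ((2, 0, 9, 1), False),
        ((1, 1, 7, 0), False),
        ((3, 0, 8, 1), False),
    ]
    print(knn_predict(labeled, (11, 7, 5, 2)))  # nearest neighbours are mostly concepts
```

Raw metrics such as method counts and fan-in have very different ranges. A practical classifier would therefore usually normalize each feature before computing distances. The sketch omits that step for brevity.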


Details

Title

The classification of domain concepts in object-oriented systems

Contributors

Carey, Maurice (Author)
Colbourn, Charles (Thesis advisor)
Collofello, James (Thesis advisor)
Davulcu, Hasan (Committee member)
Sarjoughian, Hessam S. (Committee member)
Ye, Jieping (Committee member)
Arizona State University (Publisher)

Date Created

2013

Resource Type

Text

Collections this item is in

ASU Electronic Theses and Dissertations

Note

Partial requirement for: Ph.D., Arizona State University, 2013 (note type: thesis)
Includes bibliographical references (p. 77-81) (note type: bibliography)
Field of study: Computer science
