The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has grown continuously since the inception of the field. What has not changed is the engineer's mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing the engineer's ability to see the role that domain-level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source-level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code and collecting metrics and classifications into a standard file format. We developed a second Eclipse plugin to act as a concept identifier that visually indicates whether or not a class is a domain concept. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7.5% to 10% of the system is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts, in which a domain concept is implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.
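The classification step described in this abstract can be illustrated with a minimal sketch. It assumes each class is summarized by a small vector of source-level metrics and labeled by hand; the metric names, values, and helper functions below are hypothetical and are not taken from the thesis or its Eclipse plugins.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def random_training_split(samples, fraction, seed=0):
    """Randomly pick `fraction` of the samples as the training set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = max(1, round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical metric vectors (method count, field count, fan-in); values are made up.
labeled = [
    ((12.0, 6.0, 9.0), "concept"),
    ((10.0, 5.0, 8.0), "concept"),
    ((11.0, 7.0, 10.0), "concept"),
    ((2.0, 1.0, 1.0), "other"),
    ((3.0, 0.0, 2.0), "other"),
    ((1.0, 1.0, 0.0), "other"),
]

prediction = knn_predict(labeled, (11.0, 6.0, 9.0), k=3)
train, rest = random_training_split(labeled, 0.5)
```

In practice the training fraction would be much smaller than the 0.5 used here; the abstract reports that 7.5% to 10% of the system is nearly as effective as 50%.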

Contributors: Carey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Application of a temporal database framework for processing event queries

Description

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.
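The idea of specifying event co-occurrence over intervals can be sketched as follows. This is not TEQL syntax or the dissertation's operator semantics; it is a plain illustration that assumes half-open integer intervals and uses a hypothetical `IntervalEvent` type and `co_occurrence` helper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntervalEvent:
    name: str
    start: int  # inclusive timestamp (assumed convention)
    end: int    # exclusive timestamp (assumed convention)


def co_occurrence(a: IntervalEvent, b: IntervalEvent) -> Optional[Tuple[int, int]]:
    """Return the interval during which both events hold, or None if they never overlap."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return (start, end) if start < end else None


login = IntervalEvent("login", 0, 10)
alarm = IntervalEvent("alarm", 5, 15)
later = IntervalEvent("backup", 10, 20)

overlap = co_occurrence(login, alarm)
no_overlap = co_occurrence(login, later)
```

Reporting the overlapping interval itself, rather than a yes/no answer, mirrors the abstract's point that temporal data can appear in the query output.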

Contributors: Shiva, Foruhar Ali (Author) / Urban, Susan D (Thesis advisor) / Chen, Yi (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2012

Sentimental bi-partite graph of political blogs

Description

Analysis of political texts, which contain a large volume of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and large groups of people, is an interesting task that can reveal interactions between political parties and people. Recently, the political blogosphere has played an increasingly important role in politics as a forum for debating political issues. Most political weblogs are biased towards their political parties, and they generally express their sentiments towards their own issues (e.g., leaders and topics) and also towards the issues of the opposing parties. In this thesis, I have modeled these interactions and debates as a sentimental bi-partite graph: a bi-partite graph with the blogs forming the vertices of one disjoint set, the issues (e.g., leaders and topics) forming the other disjoint set, and the edges between the two sets representing the sentiment of the blogs towards the issues. I have used American political blog data to model the sentimental bi-partite graph, in particular a set of popular liberal and conservative political blogs that have clearly declared positions. These blogs contain discussion of social, political, and economic issues and related key individuals from their conservative or liberal point of view. To keep the data focused and polarized, the 22 most popular liberal and conservative blogs from a particular time period, May 2008 to October 2008 (chosen because of the high intensity of debate and discussion just before the presidential election), were considered, involving around 23,800 articles. This thesis addresses the following questions: a) Which are the most liberal and most conservative blogs on the web? b) Who is on which side of the debate, and what are the issues? c) Who are the important leaders? d) How can the relationship between the participants of the debate and the underlying issues be modeled?
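The graph structure described above can be sketched as a small data structure. This is a minimal illustration, assuming sentiment is stored as a signed score on each blog-to-issue edge; the class name, method names, score range, and sample vertices are all hypothetical and do not come from the thesis.

```python
class SentimentBipartiteGraph:
    """Blogs on one side, issues on the other, signed sentiment on the edges."""

    def __init__(self):
        self.blogs = set()
        self.issues = set()
        self.sentiment = {}  # (blog, issue) -> score, assumed to lie in [-1.0, 1.0]

    def add_edge(self, blog, issue, score):
        # Keep the two vertex sets disjoint, as a bipartite graph requires.
        if blog in self.issues or issue in self.blogs:
            raise ValueError("a vertex cannot belong to both sides")
        self.blogs.add(blog)
        self.issues.add(issue)
        self.sentiment[(blog, issue)] = score

    def stance(self, blog):
        """Return the sentiment of one blog towards every issue it discusses."""
        return {i: s for (b, i), s in self.sentiment.items() if b == blog}


graph = SentimentBipartiteGraph()
graph.add_edge("blog_a", "issue_tax", 0.8)    # made-up example data
graph.add_edge("blog_a", "leader_x", -0.6)
graph.add_edge("blog_b", "issue_tax", -0.7)

stance_a = graph.stance("blog_a")
```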

Contributors: Thirumalai, Dananjayan (Author) / Davulcu, Hasan (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2012

Toward customizable multi-tenant SaaS applications

Description

Nowadays, computing is so pervasive that it has indeed become the 5th utility (after water, electricity, gas, and telephony), as Leonard Kleinrock once envisioned. Evolved from utility computing, cloud computing has emerged as a computing infrastructure that enables rapid delivery of computing resources as a utility in a dynamically scalable, virtualized manner. However, the current industrial cloud computing implementations promote segregation among different cloud providers, which leads to user lockdown because of prohibitive migration costs. On the other hand, Service-Oriented Computing (SOC), including service-oriented architecture (SOA) and Web Services (WS), promotes standardization and openness with its enabling standards and communication protocols. This thesis proposes a Service-Oriented Cloud Computing Architecture (SOCCA) that combines the best attributes of the two paradigms to promote an open, interoperable environment for cloud computing development. Multi-tenancy SaaS applications built on top of SOCCA have more flexibility and are not locked down by a certain platform. Each tenant residing on a multi-tenant application appears to be the sole owner of the application and is not aware of the existence of others. A multi-tenant SaaS application accommodates each tenant's unique requirements by allowing tenant-level customization. A complex SaaS application that supports hundreds or even thousands of tenants could have hundreds of customization points, each providing multiple options, and this could result in a huge number of ways to customize the application. This dissertation also proposes innovative customization approaches that study similar tenants' customization choices and each individual user's behaviors, and then provide a guided, semi-automated customization process for future tenants. A semi-automated customization process could enable tenants to quickly implement the customization that best suits their business needs.

Contributors: Sun, Xin (Author) / Tsai, Wei-Tek (Thesis advisor) / Xue, Guoliang (Committee member) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2016

Test algebra for concurrent combinatorial testing

Description

A new algebraic system, Test Algebra (TA), is proposed for identifying faults in combinatorial testing for SaaS (Software-as-a-Service) applications. In the context of cloud computing, SaaS is a new software delivery model, in which mission-critical applications are composed, deployed, and executed on cloud platforms. Testing SaaS applications is challenging because new applications need to be tested once they are composed, and prior to their deployment. A composition of components providing services yields a configuration providing a SaaS application. While individual components in the configuration may have been thoroughly tested, faults still arise due to interactions among the components composed, making the configuration faulty. When there are k components, combinatorial testing algorithms can be used to identify faulty interactions for t or fewer components, for some threshold 2 <= t <= k on the size of interactions considered. In general these methods do not identify specific faults, but rather indicate the presence or absence of some fault. To identify specific faults, an adaptive testing regime repeatedly constructs and tests configurations in order to determine, for each interaction of interest, whether it is faulty or not. In order to perform such testing in a loosely coupled distributed environment such as the cloud, it is imperative that testing results can be combined from many different servers. The TA defines rules to permit results to be combined, and to identify the faulty interactions. Using the TA, configurations can be tested concurrently on different servers and in any order. The results, using the TA, remain the same.
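The property that results can be combined in any order can be illustrated with a simplified stand-in. The sketch below does not reproduce the TA's actual rules; it assumes three statuses and a priority order chosen only for illustration, and shows that a commutative, associative merge gives the same combined report regardless of the order in which server results arrive. All names and data are hypothetical.

```python
from functools import reduce

# Assumed priority for this sketch only: "faulty" outranks "passed",
# which outranks "untested". The thesis defines its own statuses and rules.
PRIORITY = {"untested": 0, "passed": 1, "faulty": 2}


def merge_status(a, b):
    """Keep the higher-priority verdict; commutative and associative."""
    return a if PRIORITY[a] >= PRIORITY[b] else b


def merge_reports(left, right):
    """Combine two per-interaction reports from different servers."""
    merged = dict(left)
    for interaction, status in right.items():
        merged[interaction] = merge_status(merged.get(interaction, "untested"), status)
    return merged


# Made-up reports keyed by 2-way interactions between components.
server_a = {("c1", "c2"): "passed", ("c1", "c3"): "untested"}
server_b = {("c1", "c3"): "faulty"}
server_c = {("c2", "c3"): "passed", ("c1", "c2"): "passed"}

forward = reduce(merge_reports, [server_a, server_b, server_c], {})
backward = reduce(merge_reports, [server_c, server_b, server_a], {})
```

Because the merge depends only on the set of verdicts seen for each interaction, `forward` and `backward` hold the same combined report.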

Contributors: Qi, Guanqiu (Author) / Tsai, Wei-Tek (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Yu, Hongyu (Committee member) / Arizona State University (Publisher)

Created: 2014

Multi-Tenancy and Sub-Tenancy Architecture in Software-As-A-Service (Saas)

Description

Multi-tenancy architecture (MTA) is often used in Software-as-a-Service (SaaS), and the central idea is that multiple tenant applications can be developed using components stored in the SaaS infrastructure. Recently, MTA has been extended so that a tenant application can have its own sub-tenants, with the tenant application acting like a SaaS infrastructure. In other words, MTA is extended to STA (Sub-Tenancy Architecture). In STA, each tenant application not only needs to develop its own functionalities, but also needs to prepare an infrastructure that allows its sub-tenants to develop customized applications. This dissertation formulates eight models for STA and proposes a Variant Point based customization model to help tenants and sub-tenants customize tenant and sub-tenant applications. In addition, this dissertation introduces crowdsourcing as the core of the STA component development life cycle. To discover fit tenant developers or components to help build and compose new components, dynamic and static ranking models are proposed. Further, a rank computation architecture is presented to deal with the case when the number of tenants and components becomes huge. Finally, an experiment is performed to show that the rank models and the rank computation architecture work as designed.

Contributors: Zhong, Peide (Author) / Davulcu, Hasan (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Huang, Dijiang (Committee member) / Tsai, Wei-Tek (Committee member) / Arizona State University (Publisher)

Created: 2017

Distributed RDF Storage and Querying Using In-Memory Processing Engine

Description

The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data management systems that existing work does not address together. The first is querying efficiency, the second is that solutions are optimized for certain types of query patterns and don't necessarily work well for all types, and the third is reducing pre-processing cost. Therefore, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL (SPARQL Protocol and RDF Query Language) query performance regardless of pattern shape and with minimal pre-processing overhead. In this context, the first contribution of this work is a distributed RDF data partitioning schema called 3CStore that extends the existing VP (Vertical Partitioning) approach by using a subset of triples from the VP tables based on different join correlations. This approach speeds up queries at the cost of additional pre-processing overhead. To solve this, a relational partitioning schema called VPExp was developed by splitting predicates based on explicit type information of objects. This approach gains significant query performance only for the specific type of query where the object is bound to a value for a particular predicate. To get efficient query performance on a wide range of query patterns, an improved solution is proposed that extends the existing Property Table approach to a Subset-Property Table and combines it with the VP approach.
Further investigation of distributed RDF processing and querying systems based on typical use cases led to a novel relational partitioning schema called PTP (Property Table Partitioning) that further partitions the whole Property Table by the number of unique properties to minimize query input size and join operations during query evaluation. Finally, an RDF data management system based on the SPARQL-over-SQL approach, called S3QLRDF, is developed that generates the optimal query execution plan using statistics of PTP tables to provide efficient SPARQL query processing on a distributed system.
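The baseline VP approach that this work extends can be sketched in a few lines: each predicate gets its own two-column (subject, object) table, and a query touching two predicates becomes a join on the subject. This is an in-memory illustration of plain vertical partitioning only, not of 3CStore, VPExp, PTP, or S3QLRDF; the function names and sample triples are hypothetical.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return dict(tables)


def subject_join(tables, predicate_a, predicate_b):
    """Answer the pattern: ?s predicate_a ?x . ?s predicate_b ?y"""
    right = defaultdict(list)
    for subject, obj in tables.get(predicate_b, []):
        right[subject].append(obj)
    return [
        (s, x, y)
        for s, x in tables.get(predicate_a, [])
        for y in right.get(s, [])
    ]


# Made-up triples for illustration.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "ASU"),
    ("bob", "worksAt", "ACME"),
    ("bob", "knows", "carol"),
]

tables = vertical_partition(triples)
rows = subject_join(tables, "knows", "worksAt")
```

Only the tables for the predicates named in the query are read, which is the property the partitioning schemas in this abstract build on to reduce query input size.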

Contributors: Hassan, P M Mahmudul Mahmudul (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Davulcu, Hasan (Committee member) / Sarwat Abdelghany Aly Elsayed, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2021

ASU Electronic Theses and Dissertations
