Search Content

Cognitive software complexity analysis

Description

A well-defined Software Complexity Theory which captures the Cognitive means of algorithmic information comprehension is needed in the domain of cognitive informatics & computing. The existing complexity heuristics are vague and empirical. Industrial software is a combination of algorithms implemented. However, it would be wrong to conclude that algorithmic space…

A well-defined Software Complexity Theory which captures the Cognitive means of algorithmic information comprehension is needed in the domain of cognitive informatics & computing. The existing complexity heuristics are vague and empirical. Industrial software is a combination of algorithms implemented. However, it would be wrong to conclude that algorithmic space and time complexity is software complexity. An algorithm with multiple lines of pseudocode might sometimes be simpler to understand that the one with fewer lines. So, it is crucial to determine the Algorithmic Understandability for an algorithm, in order to better understand Software Complexity. This work deals with understanding Software Complexity from a cognitive angle. Also, it is vital to compute the effect of reducing cognitive complexity. The work aims to prove three important statements. The first being, that, while algorithmic complexity is a part of software complexity, software complexity does not solely and entirely mean algorithmic Complexity. Second, the work intends to bring to light the importance of cognitive understandability of algorithms. Third, is about the impact, reducing Cognitive Complexity, would have on Software Design and Development.

ContributorsMannava, Manasa Priyamvada (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2016

An adaptable iOS mobile application for mobile data collection

Description

Mobile data collection (MDC) applications have been growing in the last decade

especially in the field of education and research. Although many MDC applications are

available, almost all of them are tailor-made for a very specific task in a very specific

field (i.e. health, traffic, weather forecasts, …etc.). Since the main users of…

Mobile data collection (MDC) applications have been growing in the last decade

especially in the field of education and research. Although many MDC applications are

available, almost all of them are tailor-made for a very specific task in a very specific

field (i.e. health, traffic, weather forecasts, …etc.). Since the main users of these apps are

researchers, physicians or generally data collectors, it can be extremely challenging for

them to make adjustments or modifications to these applications given that they have

limited or no technical background in coding. Another common issue with MDC

applications is that its functionalities are limited only to data collection and storing. Other

functionalities such as data visualizations, data sharing, data synchronization and/or data updating are rarely found in MDC apps.

This thesis tries to solve the problems mentioned above by adding the following

two enhancements: (a) the ability for data collectors to customize their own applications

based on the project they’re working on, (b) and introducing new tools that would help

manage the collected data. This will be achieved by creating a Java standalone

application where data collectors can use to design their own mobile apps in a userfriendly Graphical User Interface (GUI). Once the app has been completely designed

using the Java tool, a new iOS mobile application would be automatically generated

based on the user’s input. By using this tool, researchers now are able to create mobile

applications that are completely tailored to their needs, in addition to enjoying new

features such as visualize and analyze data, synchronize data to the remote database,

share data with other data collectors and update existing data.

ContributorsAl-Kaf, Zahra M (Author) / Lindquist, Timothy E (Thesis advisor) / Bansal, Srividya (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2016

Dependency analysis in the HTML5, JavaScript and CSS3 Stack

Description

The Internet is transforming its look, in a short span of time we have come very far from black and white web forms with plain buttons to responsive, colorful and appealing user interface elements. With the sudden rise in demand of web applications, developers are making full use of the…

The Internet is transforming its look, in a short span of time we have come very far from black and white web forms with plain buttons to responsive, colorful and appealing user interface elements. With the sudden rise in demand of web applications, developers are making full use of the power of HTML5, JavaScript and CSS3 to cater to their users on various platforms. There was never a need of classifying the ways in which these languages can be interconnected to each other as the size of the front end code base was relatively small and did not involve critical business logic. This thesis focuses on listing and defining all dependencies between HTML5, JavaScript and CSS3 that will help developers better understand the interconnections within these languages. We also explore the present techniques available to a developer to make his code free of dependency related defects. We build a prototype tool, HJCDepend, based on our model, which aims at helping developers discover and remove defects early in the development cycle.

ContributorsVasugupta (Author) / Gary, Kevin (Thesis advisor) / Lindquist, Timothy (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2014

Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data require an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational databases management systems. But as the…

The processing of large volumes of RDF data require an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational databases management systems. But as the volume of RDF data grew to exponential proportions, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis deals with evaluating the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four and eight node Linux clusters installed on Microsoft Windows Azure platform with RDF datasets of size 10, 20, and 30 million triples. The results of the experiment show that Presto has a much higher performance than Hive can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.

ContributorsMammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created2014

Control for Resonant Microbeam Vibrotactile Haptic Displays

Description

The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents…

The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents the development and experimental validation of a control system for a new vibrotactile haptic display that is currently in development. In order to allow the vibrotactile haptic display to be used to represent motion, the control system must be able to change the image displayed at a rate of at least 30 frames/second. In order to achieve this, this thesis introduces and investigates the use of three improvements: threading, change filtering, and wave libraries. Through these methods, it is determined that an average of 40 frames/second can be achieved.

ContributorsKIM, KENDRA (Author) / Sodemann, Angela (Thesis advisor) / Robertson, John (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2018

Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and…

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and efficient, search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph based solutions we propose a new programming language construct - the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.

ContributorsHenderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2018

A comparative analysis of graph vs relational database for instructional module development system

Description

In today's data-driven world, every datum is connected to a large amount of data. Relational databases have been proving itself a pioneer in the field of data storage and manipulation since 1970s. But more recently they have been challenged by NoSQL graph databases in handling data models which have an…

In today's data-driven world, every datum is connected to a large amount of data. Relational databases have been proving itself a pioneer in the field of data storage and manipulation since 1970s. But more recently they have been challenged by NoSQL graph databases in handling data models which have an inherent graphical representation. Graph databases with the ability to store physical relationships between two nodes and native graph processing technique have been doing exceptionally well in graph data storage and management for applications like recommendation engines, biological modeling, network modeling, social media applications, etc.

Instructional Module Development System (IMODS) is a web-based software system that guides STEM instructors through the complex task of curriculum design, ensures tight alignment between various components of a course (i.e., learning objectives, content, assessments), and provides relevant information about research-based pedagogical and assessment strategies. The data model of IMODS is highly connected and has an inherent graphical representation between all its entities with numerous relationships between them. This thesis focuses on developing an algorithm to determine completeness of course design developed using IMODS. As part of this research objective, the study also analyzes the data model for best fit database to run these algorithms. As part of this thesis, two separate applications abstracting the data model of IMODS have been developed - one with Neo4j (graph database) and another with PostgreSQL (relational database). The research objectives of the thesis are as follows: (i) evaluate the performance of Neo4j and PostgreSQL in handling complex queries that will be fired throughout the life cycle of the course design process; (ii) devise an algorithm to determine the completeness of a course design developed using IMODS. This thesis presents the process of creating data model for PostgreSQL and converting it into a graph data model to be abstracted by Neo4j, creating SQL and CYPHER scripts for undertaking experiments on both platforms, testing and elaborate analysis of the results and evaluation of the databases in the context of IMODS.

ContributorsSaha, Abir Lal (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created2017

Minimizing Dataset Size Requirements for Machine Learning

Description

Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, which is difficult to obtain. The dataset requires both high costs and effort to accurately…

Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, which is difficult to obtain. The dataset requires both high costs and effort to accurately label the data into different classes. With abundance of data, it becomes necessary that all the data should be labeled for its proper utilization and this work focuses on reducing the labeling effort for large dataset. The thesis presents a comparison of different classifiers performance to test if small set of labeled data can be utilized to build accurate models for high prediction rate. The use of small dataset for classification is then extended to active machine learning methodology where, first a one class classifier will predict the outliers in the data and then the outlier samples are added to a training set for support vector machine classifier for labeling the unlabeled data. The labeling of dataset can be scaled up to avoid manual labeling and building more robust machine learning methodologies.

ContributorsBatra, Salil (Author) / Femiani, John (Thesis advisor) / Amresh, Ashish (Thesis advisor) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2017

Analysis and Performance Optimization of a GPGPU Implementation of Image Quality Assessment (IQA) Algorithm VSNR

Description

Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not…

Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not highly detectable by humans, the perceived quality of an image needs to be maintained over a certain threshold. Determining this threshold is best done using human subjects, but that is impractical in real-world scenarios. As a solution to this issue, image quality assessment (IQA) algorithms are used to automatically compute a fidelity score of an image.

However, poor performance of IQA algorithms has been observed due to complex statistical computations involved. General Purpose Graphics Processing Unit (GPGPU) programming is one of the solutions proposed to optimize the performance of these algorithms.

This thesis presents a Compute Unified Device Architecture (CUDA) based optimized implementation of full reference IQA algorithm, Visual Signal to Noise Ratio (VSNR) that uses M-level 2D Discrete Wavelet Transform (DWT) with 9/7 biorthogonal filters among other statistical computations. The presented implementation is tested upon four different image quality databases containing images with multiple distortions and sizes ranging from 512 x 512 to 1600 x 1280. The CUDA implementation of VSNR shows a speedup of over 32x for 1600 x 1280 images. It is observed that the speedup scales with the increase in size of images. The results showed that the implementation is fast enough to use VSNR on high definition videos with a frame rate of 60 fps. This work presents the optimizations made due to the use of GPU’s constant memory and reuse of allocated memory on the GPU. Also, it shows the performance improvement using profiler driven GPGPU development in CUDA. The presented implementation can be deployed in production combined with existing applications.

ContributorsGupta, Ayush (Author) / Sohoni, Sohum (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2017

Optimizing Performance Measures in Classification Using Ensemble Learning Methods

Description

Ensemble learning methods like bagging, boosting, adaptive boosting, stacking have traditionally shown promising results in improving the predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to the improvements in computational efficiency and distributed computing advances. However, with the advent of wide…

Ensemble learning methods like bagging, boosting, adaptive boosting, stacking have traditionally shown promising results in improving the predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to the improvements in computational efficiency and distributed computing advances. However, with the advent of wide variety of applications of machine learning techniques to class imbalance problems, further focus is needed to evaluate, improve and optimize other performance measures such as sensitivity (true positive rate) and specificity (true negative rate) in classification. This thesis demonstrates a novel approach to evaluate and optimize the performance measures (specifically sensitivity and specificity) using ensemble learning methods for classification that can be especially useful in class imbalanced datasets. In this thesis, ensemble learning methods (specifically bagging and boosting) are used to optimize the performance measures (sensitivity and specificity) on a UC Irvine (UCI) 130 hospital diabetes dataset to predict if a patient will be readmitted to the hospital based on various feature vectors. From the experiments conducted, it can be empirically concluded that, by using ensemble learning methods, although accuracy does improve to some margin, both sensitivity and specificity are optimized significantly and consistently over different cross validation approaches. The implementation and evaluation has been done on a subset of the large UCI 130 hospital diabetes dataset. The performance measures of ensemble learners are compared to the base machine learning classification algorithms such as Naive Bayes, Logistic Regression, k Nearest Neighbor, Decision Trees and Support Vector Machines.

ContributorsBahl, Neeraj Dharampal (Author) / Bansal, Ajay (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created2017