Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms for graph traversal and comprehension powering many of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, but they struggle to implement a correct and efficient search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph-based solutions, we propose a new programming language construct: the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is placed in a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as the foundation and proof of concept for the search statement. A Go compiler that implements this construct is provided.
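The inputs listed above (a root node, a child-generating procedure, and optional fail-fast accept and reject tests) can be illustrated with an ordinary function. The following is a minimal sketch in Python, not the thesis's Go syntax or compiler: the name `search`, its parameter names, the choice of breadth-first order, and the example graph are all assumptions made for illustration.

```python
from collections import deque


def search(root, children, accept=None, reject=None):
    """Breadth-first search from root; returns the first accepted node or None.

    All names here are illustrative, not the thesis's syntax.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if reject is not None and reject(node):
            continue  # fail fast: prune this node and do not expand it
        if accept is not None and accept(node):
            return node  # fail fast: stop at the first solution
        for child in children(node):
            if child not in seen:  # nodes must be hashable
                seen.add(child)
                queue.append(child)
    return None


# Hypothetical graph: from n you can reach n + 1 and n * 2.
found = search(
    1,
    children=lambda n: (n + 1, n * 2),
    accept=lambda n: n == 10,
    reject=lambda n: n > 10,
)
```

Here the reject test keeps the otherwise infinite graph finite, and the `seen` set prevents a node from being expanded twice.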

Contributors: Henderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2018

UVLabel: A Tool for the Future of Interferometry Analysis

Description

UVLabel was created to enable radio astronomers to view and annotate their own data such that they could then expand their future research paths. It simplifies their data rendering process by providing a simple user interface to better access sections of their data. Furthermore, it provides an interface to track trends in their data through a labelling feature.

The tool was developed following the incremental development process in order to quickly create a functional and testable tool. The incremental process also allowed for feedback from radio astronomers to help guide the project's development.

UVLabel provides both a functional product and a modifiable, scalable code base for radio astronomer developers. This gives astronomers who study various kinds of astronomical interferometric data the ability to label that data. The tool can then be used to improve their filtering methods, pursue machine learning solutions, and discover new trends. Finally, UVLabel will be open source to put customization, scalability, and adaptability in the hands of these researchers.

Contributors: La Place, Cecilia (Author) / Bansal, Ajay (Thesis advisor) / Jacobs, Daniel (Thesis advisor) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2019

Ensemble Learning on Deep Neural Networks for Image Caption Generation

Description

Capturing the information in an image in a natural language sentence is considered a difficult problem for computers to solve. Image captioning involves not just detecting objects in images but also understanding the interactions between the objects so that they can be translated into relevant captions. Expertise in both computer vision and natural language processing is therefore considered crucial for this purpose. The sequence-to-sequence modelling strategy of deep neural networks is the traditional approach to generating a sequential list of words that are combined to describe the image. But these models suffer from high variance: they do not generalize well beyond the training data.

The main focus of this thesis is to reduce the variance, which will help in generating better captions. To achieve this, ensemble learning techniques have been explored, which have a reputation for mitigating the high variance problem that occurs in machine learning algorithms. Three different ensemble techniques, namely k-fold ensemble, bootstrap aggregation ensemble, and boosting ensemble, have been evaluated in this thesis. For each of these techniques, three output combination approaches have been analyzed. Extensive experiments have been conducted on the Flickr8k dataset, which has a collection of 8000 images and 5 different captions for every image. The BLEU score performance metric, which is considered to be the standard for evaluating natural language processing (NLP) problems, is used to evaluate the predictions. Based on this metric, the analysis shows that ensemble learning performs significantly better and generates more meaningful captions than any of the individual models used.
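As a rough illustration of bootstrap aggregation, the sketch below resamples a training set with replacement and then combines the members' outputs by majority vote. The abstract does not name the three output combination approaches, so majority voting is an assumption here, as are the function names, the stand-in training identifiers, and the example captions. No captioning model is trained.

```python
import random
from collections import Counter


def bootstrap_sample(items, rng):
    """Draw len(items) elements from items with replacement."""
    return [rng.choice(items) for _ in items]


def majority_vote(captions):
    """Return the caption predicted by the most ensemble members."""
    return Counter(captions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
training_ids = list(range(10))  # stand-ins for image/caption pairs
resamples = [bootstrap_sample(training_ids, rng) for _ in range(3)]

# Hypothetical outputs of three models, one per resample, for one image.
predictions = ["a dog runs on grass", "a dog runs on grass", "a cat sits"]
best = majority_vote(predictions)
```

Each resample would train one ensemble member. Because the members see different data, their errors are less correlated, which is the usual argument for why combining them lowers variance.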

Contributors: Katpally, Harshitha (Author) / Bansal, Ajay (Thesis advisor) / Acuna, Ruben (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2019

A Neural Network Model for a Tutoring Companion Supporting Students in a Programming with Java Course

Description

Feedback represents a vital component of the learning process and is especially important for Computer Science students. With class sizes that are often large, it can be challenging to provide individualized feedback to students. Consistent, constructive, supportive feedback through a tutoring companion can scaffold the learning process for students.

This work contributes to the construction of a tutoring companion designed to provide this feedback to students. It aims to bridge the gap between the messages the compiler delivers, and the support required for a novice student to understand the problem and fix their code. Particularly, it provides support for students learning about recursion in a beginning university Java programming course. Besides also providing affective support, a tutoring companion could be more effective when it is embedded into the environment that the student is already using, instead of an additional tool for the student to learn. The proposed Tutoring Companion is embedded into the Eclipse Integrated Development Environment (IDE).

This thesis focuses on the reasoning model for the Tutoring Companion, which is developed using neural network techniques. While a student uses the IDE, the Tutoring Companion collects 16 data points, including the presence of certain keywords, cyclomatic complexity, and error messages from the compiler, every time it detects an event in the IDE, such as a run attempt, a debug attempt, or a request for help. These data are used as inputs to the neural network. The neural network produces a single corresponding output code for the feedback to be provided to the student, which is displayed in the IDE.

The effectiveness of the approach is examined among 38 Computer Science students who solve a programming assignment while the Tutoring Companion assists them. Data is collected from these interactions, including all inputs and outputs for the neural network, and students are surveyed regarding their experience. Results suggest that students feel supported while working with the Companion and that using a neural network with an embedded companion holds promise for the future. Challenges in developing an embedded companion are discussed, as well as opportunities for future work.

Contributors: Day, Melissa (Author) / Gonzalez-Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2019

Deep Learning-Based Monocular SLAM

Description

SLAM (Simultaneous Localization and Mapping) is a problem that has existed for a long time in robotics and autonomous navigation. The objective of SLAM is for a robot to simultaneously determine its position in space and map its environment. SLAM is essential for robots that need to navigate autonomously. The description might make it seem like a chicken-and-egg problem, but numerous methods have been proposed to tackle SLAM. Before the rise in popularity of deep learning and AI (Artificial Intelligence), most existing approaches were traditional hard-coded algorithms that would receive and process sensor information and convert it into some solvable sensor-agnostic problem. The challenge for these methods is handling dynamic environments: the more variety in the environment, the poorer the results. In addition, owing to the increase in computational power and the capability of deep learning-based image processing, visual SLAM has become viable and perhaps even preferable to traditional SLAM algorithms. In this research, a deep learning-based solution to the SLAM problem is proposed, specifically monocular visual SLAM, which solves the SLAM problem with a single camera as the only input. The model is tested on the KITTI (Karlsruhe Institute of Technology & Toyota Technological Institute) odometry dataset.

Contributors: Rupaakula, Krishna Sandeep (Author) / Bansal, Ajay (Thesis advisor) / Baron, Tyler (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2023

Improving Ontology Alignment Using Machine Learning Techniques

Description

Ontologies play an important role in storing and exchanging digitized data. As the need for semantic web information grows, organizations from around the globe have defined ontologies in different domains to better represent the data. But different organizations define ontologies of the same entity in their own way. Finding ontologies of the same entity in different fields and domains has become very important for unifying data and improving its interoperability across these domains. Many different techniques have been used over the years, including human-assisted, automated, and hybrid ones. In recent years, with the availability of many machine learning techniques, researchers have been trying to apply these techniques to solve the ontology alignment problem across different domains. In this study I have looked into the use of different machine learning techniques, such as Support Vector Machine, Stochastic Gradient Descent, and Random Forest, for solving the ontology alignment problem with some of the most commonly used datasets from the Ontology Alignment Evaluation Initiative (OAEI). I have proposed a method, OntoAlign, which demonstrates the importance of using different types of similarity measures for feature extraction from ontology data in order to achieve better results for ontology alignment.
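To make the idea of similarity measures as features concrete, the sketch below turns a pair of entity labels into a small feature vector that a classifier such as a Random Forest could consume. The abstract does not say which measures OntoAlign uses, so the two shown here (token-level Jaccard and a character-level ratio from the standard library's `difflib`) are assumptions, as are all function names and example labels.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Token-level Jaccard similarity of two labels, ignoring case."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty labels are treated as identical
    return len(ta & tb) / len(ta | tb)


def char_ratio(a, b):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(a, b):
    """Feature vector for one candidate pair of ontology entity labels."""
    return [jaccard(a, b), char_ratio(a, b)]


# Hypothetical labels taken from two different ontologies.
same = features("Author", "author")
partial = features("has author", "author name")
```

Different measures capture different kinds of agreement (shared words versus shared spelling), which is one reason combining several of them can help a downstream classifier.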

Contributors: Nasim, Tariq M (Author) / Bansal, Srividya (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)

Created: 2022

Transformer-based Automatic Mapping of Clinical Notes to Specific Clinical Concepts

Description

A significant proportion of medical errors exist in crucial medical information, and most stem from misinterpreting non-standardized clinical notes. The Clinical Skills exam offered by the United States Medical Licensing Examination (USMLE) was put in place to certify patient note-taking skills before medical students joined professional practices, offering the first line of defense in protecting patients from medical errors. Nonetheless, the exams were discontinued in 2021 because of the high costs and resource usage involved in scoring them. This thesis compares four transformer-based models, namely BERT (Bidirectional Encoder Representations from Transformers) Base Uncased, Emilyalsentzer Bio_ClinicalBERT, RoBERTa (Robustly Optimized BERT Pre-Training Approach), and DeBERTa (Decoding-enhanced BERT with disentangled attention), with the goal of mapping free text in patient notes to clinical concepts present in the exam rubric. The impact of context-specific embeddings on BERT was also studied to determine the need for a clinical BERT in the Clinical Skills exam. This thesis proposes the use of DeBERTa as a backbone model in patient note scoring for the USMLE Clinical Skills exam after comparing it with the three other transformer models. The disentangled attention and enhanced mask decoder integrated into DeBERTa were credited for its high performance compared to the other models. In addition, the effect of meta pseudo labeling was investigated in this thesis, which further enhanced DeBERTa’s performance.

Contributors: Ganesh, Jay (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created: 2022

Investigating the Utility of Agile and Lean Software Process Metrics for Open Source Software Communities: An Exploratory Study

Description

The adoption of Open Source Software (OSS) by organizations has become a strategic need in a wide variety of software applications and platforms. Open Source has changed the way organizations develop, acquire, use, and commercialize software. Further, OSS projects often incorporate similar principles and practices as Agile and Lean software development projects. Contrary to traditional organizations, the environment in which these projects function has an impact on process-related elements like the flow of work and value definition. Process metrics are typically employed during Agile Software Engineering projects as a means of providing meaningful feedback. Investigating these metrics to see if OSS projects and communities can utilize them in a beneficial way thus becomes an interesting research topic. In that context, this exploratory research investigates whether well-established Agile and Lean software engineering metrics provide useful feedback about OSS projects. This knowledge will assist in educating the Open Source community about the applications of Agile Software Engineering and its variations in Open Source projects. Each of the Open Source projects included in this analysis has a substantial development team that maintains a mature, well-established codebase with process flow information. These OSS projects listed on GitHub are investigated by applying process flow metrics. The methodology used to collect these metrics and relevant findings are discussed in this thesis. This study also compares the results to distinctive Open Source project characteristics as part of the analysis. In this exploratory research best-fit versions of published Agile and Lean software process metrics are applied to OSS, and following these explorations, specific questions are further addressed using the data collected. 
This research's original contribution is to determine whether Agile and Lean process metrics are helpful in OSS, as well as the opportunities and obstacles that may arise when applying Agile and Lean principles to OSS.

Contributors: Suresh, Disha (Author) / Gary, Kevin (Thesis advisor) / Bansal, Srividya (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2022

Kitsune: Structurally-Aware and Adaptable Plagiarism Detection

Description

Plagiarism is a major problem in learning environments. In programming classes especially, plagiarism can be hard to detect because the appearance of source code can easily be modified without changing its intent, through simple formatting changes or refactoring. There are a number of plagiarism detection tools that attempt to encode knowledge about the programming languages they support in order to better detect obscured duplicates. Many such tools do not support a large number of languages because doing so requires too much code and therefore too much maintenance. It is also difficult to add support for new languages because languages differ widely in syntax. Tools that are more extensible often achieve this by reducing the language features they encode, and end up closer to text comparison tools than to structurally-aware program analysis tools.

Kitsune attempts to remedy these issues by tying itself to Antlr, a pre-existing language recognition tool with over 200 currently supported languages. In addition, it provides an interface through which generic manipulations can be applied to the parse tree generated by Antlr. As Kitsune relies on language-agnostic structure modifications, it can be adapted with minimal effort to provide plagiarism detection for new languages. Kitsune has been evaluated for 10 of the languages in the Antlr grammar repository with success and could easily be extended to support all of the grammars currently developed by Antlr or future grammars which are developed as new languages are written.
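One way to picture a language-agnostic structure modification is a transformation that depends only on node kinds in a generic parse tree, not on any particular grammar. The sketch below is an assumption-laden illustration and does not use Antlr or reproduce Kitsune's interface: the `Node` class, the `normalize` function, and the node kind names are all hypothetical. It erases identifier names so that two fragments differing only in renamed variables compare equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A generic parse tree node (hypothetical, not an Antlr class)."""
    kind: str
    text: str = ""
    children: tuple = ()


def normalize(node, erase=frozenset({"Identifier"})):
    """Return a copy of the tree with the text of erased kinds blanked out."""
    text = "" if node.kind in erase else node.text
    kids = tuple(normalize(child, erase) for child in node.children)
    return Node(node.kind, text, kids)


def assign(target, op, operand):
    """Build a tiny tree for a statement such as `total += x`."""
    return Node("Assign", children=(
        Node("Identifier", target),
        Node("Operator", op),
        Node("Identifier", operand),
    ))


original = assign("total", "+=", "x")
renamed = assign("sum", "+=", "value")   # same structure, new names
changed = assign("total", "-=", "x")     # different operator

same_shape = normalize(original) == normalize(renamed)
still_differs = normalize(original) != normalize(changed)
```

Because `normalize` only inspects `kind`, the same function would apply to any grammar whose trees are mapped onto such nodes, which is the property that makes this style of manipulation cheap to extend to new languages.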

Contributors: Monroe, Zachary Lynn (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2020

Domain-Agnostic Context-Aware Assistant Framework for Task-Based Environment

Description

Smart home assistants are becoming the norm due to their ease of use. They employ spoken language as an interface, facilitating easy interaction with their users. Even with their obvious advantages, natural-language-based interfaces are not prevalent outside the domain of home assistants. It is hard to adopt them for computer-controlled systems due to the numerous complexities involved in implementing them in varying fields. The main challenge is grounding natural language base terms in the underlying system's primitives. The existing systems that do use natural language interfaces are each specific to a single problem domain.

In this thesis, a domain-agnostic framework that creates natural language interfaces for computer-controlled systems has been developed by making the mapping between the language constructs and the system primitives customizable. The framework employs ontologies built using OWL (Web Ontology Language) for knowledge representation purposes and machine learning models for language processing tasks. It has been evaluated within a simulation environment consisting of objects and a robot. This environment has been deployed as a web application, providing anonymous user testing for evaluation, and generating training data for machine learning components. Performance evaluation has been done on metrics such as time taken for a task or the number of instructions given by the user to the robot to accomplish a task. Additionally, the framework has been used to create a natural language interface for a database system to demonstrate its domain independence.

Contributors: Tiwari, Sarthak (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2020