Search Content

Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data require an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational databases management systems. But as the…

The processing of large volumes of RDF data require an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational databases management systems. But as the volume of RDF data grew to exponential proportions, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis deals with evaluating the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four and eight node Linux clusters installed on Microsoft Windows Azure platform with RDF datasets of size 10, 20, and 30 million triples. The results of the experiment show that Presto has a much higher performance than Hive can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.

ContributorsMammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created2014

Effects of Error Messages on a Student’s Ability to Understand and Fix Programming Errors

Description

Assemblers and compilers provide feedback to a programmer in the form of error messages. These error messages become input to the debugging model of the programmer. For the programmer to fix an error, they should first locate the error in the program, understand what is causing that error, and finally…

Assemblers and compilers provide feedback to a programmer in the form of error messages. These error messages become input to the debugging model of the programmer. For the programmer to fix an error, they should first locate the error in the program, understand what is causing that error, and finally resolve that error. Error messages play an important role in all three stages of fixing of errors. This thesis studies the effects of error messages in the context of teaching programming. Given an error message, this work investigates how it effects student’s way of 1) understanding the error, and 2) fixing the error. As part of the study, three error message types were developed – Default, Link and Example, to better understand the effects of error messages. The Default type provides an assembler-centric single line error message, the Link type provides a program-centric detailed error description with a hyperlink for more information, and the Example type provides a program centric detailed error description with a relevant example. All these error message types were developed for assembly language programming. A think aloud programming exercise was conducted as part of the study to capture the student programmer’s knowledge model. Different codes were developed to analyze the data collected as part of think aloud exercise. After transcribing, coding, and analyzing the data, it was found that the Link type of error message helped to fix the error in less time and with fewer steps. Among the three types, the Link type of error message also resulted in a significantly higher ratio of correct to incorrect steps taken by the programmer to fix the error.

ContributorsBeejady Murthy Kadekar, Harsha Kadekar (Author) / Sohoni, Sohum (Thesis advisor) / Craig, Scotty D. (Committee member) / Jordan, Shawn S (Committee member) / Gary, Kevin A (Committee member) / Arizona State University (Publisher)

Created2017

Increasing the Effectiveness of Error Messages in a Computer Programming and Simulation Tool

Description

Each programming language has a compiler associated with it which helps to identify logical or syntactical errors in the program. These compiler error messages play important part in the form of formative feedback for the programmer. Thus, the error messages should be constructed carefully, considering the affective and cognitive needs…

Each programming language has a compiler associated with it which helps to identify logical or syntactical errors in the program. These compiler error messages play important part in the form of formative feedback for the programmer. Thus, the error messages should be constructed carefully, considering the affective and cognitive needs of programmers. This is especially true for systems that are used in educational settings, as the messages are typically seen by students who are novice programmers. If the error messages are hard to understand then they might discourage students from understanding or learning the programming language. The primary goal of this research is to identify methods to make the error messages more effective so that students can understand them better and simultaneously learn from their mistakes. This study is focused on understanding how the error message affects the understanding of the error and the approach students take to solve the error. In this study, three types of error messages were provided to the students. The first type is Default type error message which is an assembler centric error message. The second type is Link type error message which is a descriptive error message along with a link to the appropriate section of the PLP manual. The third type is Example type error message which is again a descriptive error message with an example of the similar type of error along with correction step. All these error types were developed for the PLP assembly language. A think-aloud experiment was designed and conducted on the students. The experiment was later transcribed and coded to understand different approach students take to solve different type of error message. After analyzing the result of the think-aloud experiment it was found that student read the Link type error message completely and they understood and learned from the error message to solve the error. The results also indicated that Link type was more helpful compare to other types of error message. The Link type made error solving process more effective compared to other error types.

ContributorsTanpure, Siddhant Bapusaheb (Author) / Sohoni, Sohum (Thesis advisor) / Gary, Kevin A (Committee member) / Craig, Scotty D. (Committee member) / Arizona State University (Publisher)

Created2018

All Purpose Textual Data Information Extraction, Visualization and Querying

Description

Since the advent of the internet and even more after social media platforms, the explosive growth of textual data and its availability has made analysis a tedious task. Information extraction systems are available but are generally too specific and often only extract certain kinds of information they deem necessary and…

Since the advent of the internet and even more after social media platforms, the explosive growth of textual data and its availability has made analysis a tedious task. Information extraction systems are available but are generally too specific and often only extract certain kinds of information they deem necessary and extraction worthy. Using data visualization theory and fast, interactive querying methods, leaving out information might not really be necessary. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction to encode large text corpus to improve human understanding of the information present in textual data.

This thesis presents a modified traversal algorithm on dependency parse output of text to extract all subject predicate object pairs from text while ensuring that no information is missed out. To support full scale, all-purpose information extraction from large text corpuses, a data preprocessing pipeline is recommended to be used before the extraction is run. The output format is designed specifically to fit on a node-edge-node model and form the building blocks of a network which makes understanding of the text and querying of information from corpus quick and intuitive. It attempts to reduce reading time and enhancing understanding of the text using interactive graph and timeline.

ContributorsHashmi, Syed Usama (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Gonzalez Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created2018

Evaluation of instructional module development system

Description

Academia is not what it used to be. In today’s fast-paced world, requirements

are constantly changing, and adapting to these changes in an academic curriculum

can be challenging. Given a specific aspect of a domain, there can be various levels of

proficiency that can be achieved by the students. Considering the wide array…

Academia is not what it used to be. In today’s fast-paced world, requirements

are constantly changing, and adapting to these changes in an academic curriculum

can be challenging. Given a specific aspect of a domain, there can be various levels of

proficiency that can be achieved by the students. Considering the wide array of needs,

diverse groups need customized course curriculum. The need for having an archetype

to design a course focusing on the outcomes paved the way for Outcome-based

Education (OBE). OBE focuses on the outcomes as opposed to the traditional way of

following a process [23]. According to D. Clark, the major reason for the creation of

Bloom’s taxonomy was not only to stimulate and inspire a higher quality of thinking

in academia – incorporating not just the basic fact-learning and application, but also

to evaluate and analyze on the facts and its applications [7]. Instructional Module

Development System (IMODS) is the culmination of both these models – Bloom’s

Taxonomy and OBE. It is an open-source web-based software that has been

developed on the principles of OBE and Bloom’s Taxonomy. It guides an instructor,

step-by-step, through an outcomes-based process as they define the learning

objectives, the content to be covered and develop an instruction and assessment plan.

The tool also provides the user with a repository of techniques based on the choices

made by them regarding the level of learning while defining the objectives. This helps

in maintaining alignment among all the components of the course design. The tool

also generates documentation to support the course design and provide feedback

when the course is lacking in certain aspects.

It is not just enough to come up with a model that theoretically facilitates

effective result-oriented course design. There should be facts, experiments and proof

that any model succeeds in achieving what it aims to achieve. And thus, there are two

research objectives of this thesis: (i) design a feature for course design feedback and

evaluate its effectiveness; (ii) evaluate the usefulness of a tool like IMODS on various

aspects – (a) the effectiveness of the tool in educating instructors on OBE; (b) the

effectiveness of the tool in providing appropriate and efficient pedagogy and

assessment techniques; (c) the effectiveness of the tool in building the learning

objectives; (d) effectiveness of the tool in document generation; (e) Usability of the

tool; (f) the effectiveness of OBE on course design and expected student outcomes.

The thesis presents a detailed algorithm for course design feedback, its pseudocode, a

description and proof of the correctness of the feature, methods used for evaluation

of the tool, experiments for evaluation and analysis of the obtained results.

ContributorsRaj, Vaishnavi (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created2018

Template-Based Question Answering over Linked Data using Recursive Neural Networks

Description

The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible for users as they need specialized knowledge in query languages (such as SPARQL) as well as deep familiarity of the ontologies…

The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible for users as they need specialized knowledge in query languages (such as SPARQL) as well as deep familiarity of the ontologies used by these knowledge graphs. So, to make these knowledge graphs more accessible (even for non- experts) several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, several approaches have been undertaken that include techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML) and the Semantic Web (SW). At a higher level, most question answering systems approach the question answering task as a conversion from the natural language question to its corresponding SPARQL query. These systems then utilize the query to retrieve the desired entities or literals. One approach to solve this problem, that is used by most systems today, is to apply deep syntactic and semantic analysis on the input question to derive the SPARQL query. This has resulted in the evolution of natural language processing pipelines that have common characteristics such as answer type detection, segmentation, phrase matching, part-of-speech-tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, semantic role labeling, etc.

This has lead to NLP pipeline architectures that integrate components that solve a specific aspect of the problem and pass on the results to subsequent components for further processing eg: DBpedia Spotlight for named entity recognition, RelMatch for relational mapping, etc. A major drawback in this approach is error propagation that is a common problem in NLP. This can occur due to mistakes early on in the pipeline that can adversely affect successive steps further down the pipeline. Another approach is to use query templates either manually generated or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD) to generate the SPARQL queries that is basically a set of predefined queries with various slots that need to be filled. This approach potentially shifts the question answering problem into a classification task where the system needs to match the input question to the appropriate template (class label).

This thesis proposes a neural network approach to automatically learn and classify natural language questions into its corresponding template using recursive neural networks. An obvious advantage of using neural networks is the elimination for the need of laborious feature engineering that can be cumbersome and error prone. The input question would be encoded into a vector representation. The model will be trained and evaluated on the LC-QuAD Dataset (Large-scale Complex Question Answering Dataset). The dataset was created explicitly for machine learning based QA approaches for learning complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset spanning 5042 entities and 615 predicates. These queries were annotated based on 38 unique templates that the model will attempt to classify. The resulting model will be evaluated against both the LC-QuAD dataset and the Question Answering Over Linked Data (QALD-7) dataset.

The recursive neural network achieves template classification accuracy of 0.828 on the LC-QuAD dataset and an accuracy of 0.618 on the QALD-7 dataset. When the top-2 most likely templates were considered the model achieves an accuracy of 0.945 on the LC-QuAD dataset and 0.786 on the QALD-7 dataset.

After slot filling, the overall system achieves a macro F-score 0.419 on the LC- QuAD dataset and a macro F-score of 0.417 on the QALD-7 dataset.

ContributorsAthreya, Ram G (Author) / Bansal, Srividya (Thesis advisor) / Usbeck, Ricardo (Committee member) / Gary, Kevin (Committee member) / Arizona State University (Publisher)

Created2018

Investigation in Prolog-based Machine Translation with English-Hungarian Case Study

Description

This undergraduate thesis explores the efficacy of developing a translator generator in the Prolog programming language using Lexical Functional Grammars. A bidirectional machine translator between English and Hungarian, developed as a proof-of-concept case study, is discussed and assessed. The benefits and drawbacks of this approach as generalized to Machine Translation…

This undergraduate thesis explores the efficacy of developing a translator generator in the Prolog programming language using Lexical Functional Grammars. A bidirectional machine translator between English and Hungarian, developed as a proof-of-concept case study, is discussed and assessed. The benefits and drawbacks of this approach as generalized to Machine Translation systems are also discussed, along with possible areas of future work.

ContributorsLane, Ryan Andrew (Author) / Bansal, Ajay (Thesis director) / Bansal, Srividya (Committee member) / Barrett, The Honors College (Contributor)

Created2015-05

Development and Evaluation of an Electrical Engineering and Math Curriculum Module for High School Students

Description

Parents in STEM careers are more apt to guide their kids towards STEM careers (Sherburne-Michigan, 2017). There are STEM programs and classes for students who are interested in related fields, but the conundrum is that students need to be interested in order to choose to participate. The goal of this…

Parents in STEM careers are more apt to guide their kids towards STEM careers (Sherburne-Michigan, 2017). There are STEM programs and classes for students who are interested in related fields, but the conundrum is that students need to be interested in order to choose to participate. The goal of this creative project was to introduce engineering concepts in a high school class to reveal and investigate the ways in which engineering concepts can be successfully introduced to a larger student populace to increase interest in engineering programs, courses, and degrees. A lesson plan and corresponding materials - including circuit kits and a simulated ball launching station with graphical display - were made to accomplish this goal. Throughout the lesson students were asked to (1) use given materials to accomplish a goal, (2) predict outcomes based on conceptual understanding and mathematical calculations, (3) test predictions, (4) record data, and (5) analyze data to generate results. The students first created a simple circuit to understand the circuit components and learn general electrical engineering concepts. A simple light dimmer circuit let students demonstrate understanding of electrical concepts (e.g., voltage, current resistance) before using the circuit to a simulated motor in order to launch a ball. The students were then asked to predict the time and height of a ball launched with various settings of their control circuit. The students were able to test their theories with the simulated launcher test set up shown in Figure 25 and collect data to create a parabolic height versus time graph. Based on the measured graph, the students were able to record their results and compare calculated values to real-world measured values. The results of the study suggest ways to introduce students to engineering while developing hands-on concept modeling of projectile motion and circuit design in math classrooms. Additionally, this lesson identifies a rich topic for teachers and STEM education researchers to explore lesson plans with interdisciplinary connections to engineering. This report will include the inspiration for the product, related work, iterative design process, and the final design. This information will be followed by user feedback, a project reflection, and lessons learned. The report will conclude with a summary and a discussion of future work.

ContributorsBurgess, Kylee Rae (Author) / Jordan, Shawn (Thesis director) / Sohoni, Sohum (Committee member) / Kinach, Barbara (Committee member) / Engineering Programs (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Lambda Starship: A Video Game for Teaching Functional Programming with Lisp

Description

The functional programming paradigm is able to provide clean and concise solutions to many common programming problems, as well as promote safer, more testable code by encouraging an isolation of state-modifying behavior. Functional programming is finding its way into traditionally object-oriented and imperative languages, most notably with the introduction of…

The functional programming paradigm is able to provide clean and concise solutions to many common programming problems, as well as promote safer, more testable code by encouraging an isolation of state-modifying behavior. Functional programming is finding its way into traditionally object-oriented and imperative languages, most notably with the introduction of Java 8 and in LINQ for C#. However, no functional programming language has achieved widespread adoption, meaning that students without a formal computer science background who learn technology on-demand for personal projects or for business may not come across functional programming in a significant way. Programmers need a reason to spend time learning these concepts to not miss out on the subtle but profound benefits they provide. I propose the use of a video game as an environment in which learning functional programming is the player's goal. In this carefully constructed video game, learning functional programming is the key to progression. Players will be motivated to learn and will be given an immediate chance to test and demonstrate their understanding. The game, named Lambda Starship (stylized as (lambda () starship)), is a 3D first-person video game. It takes place in a spaceship that, due to extreme magnetic interference, has lost all on-board software while leaving the hardware completely intact. The player is tasked to write software using functional programming paradigms to replace the old software and bring the spaceship back to a working state. Throughout the process, the player is guided by an in-game manual and other descriptive resources. The game is implemented in Unity and scripted using C#. The game's educational and entertainment value was evaluated with a study case. 24 undergraduate students at Arizona State University (ASU) played the game and were surveyed detailing their experience. During play, user statistics were recorded automatically, providing a data-driven way to analyze where players struggled with the concepts introduced in the game. Reception was neutral or positive in both the entertainment and educational sides of the game. A few players expressed concerns about the manual in its form factor and engagement value.

ContributorsCompton, Tyler Alexander (Author) / Gonzalez-Sanchez, Javier (Thesis director) / Bansal, Srividya (Committee member) / Software Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Smart car technologies: a comprehensive study of the state of the art with analysis and trends

Description

Driving is already a complex task that demands a varying level of cognitive and physical load. With the advancement in technology, the car has become a place for media consumption, a communications center and an interconnected workplace. The number of features in a car has also increased. As a result,…

Driving is already a complex task that demands a varying level of cognitive and physical load. With the advancement in technology, the car has become a place for media consumption, a communications center and an interconnected workplace. The number of features in a car has also increased. As a result, the user interaction inside the car has become overcrowded and more complex. This has increased the amount of distraction while driving and has also increased the number of accidents due to distracted driving. This thesis focuses on the critical analysis of today’s in-car environment covering two main aspects, Multi Modal Interaction (MMI), and Advanced Driver Assistance Systems (ADAS), to minimize the distraction. It also provides deep market research on future trends in the smart car technology. After careful analysis, it was observed that an infotainment screen cluttered with lots of small icons, a center stack with a plethora of small buttons and a poor Voice Recognition (VR) results in high cognitive load, and these are the reasons for the increased driver distraction. Though the VR has become a standard technology, the current state of technology is focused on features oriented design and a sales driven approach. Most of the automotive manufacturers are focusing on making the VR better but attaining perfection in VR is not the answer as there are inherent challenges and limitations in respect to the in-car environment and cognitive load. Accordingly, the research proposed a novel in-car interaction design solution: Multi-Modal Interaction (MMI). The MMI is a new term when used in the context of vehicles, but it is widely used in human-human interaction. The approach offers a non-intrusive alternative to the driver to interact with the features in the car. With the focus on user-centered design, the MMI and ADAS can potentially help to reduce the distraction. To support the discussion, an experiment was conducted to benchmark a minimalist UI design. An engineering based method was used to test and measure distraction of four different UIs with varying numbers of icons and screen sizes. Lastly, in order to compete with the market, the basic features that are provided by all the other competitors cannot be eliminated, but the hard work can be done to improve the HCaI and to make driving safer.

ContributorsNakrani, Paresh Keshubhai (Author) / Gaffar, Ashraf (Thesis advisor) / Sohoni, Sohum (Committee member) / Ghazarian, Arabi (Committee member) / Arizona State University (Publisher)

Created2015