Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters installed on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.
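
To illustrate the kind of SPARQL-to-SQL translation described above, here is a minimal sketch in Python. It is not the thesis's Flex/Bison compiler. It assumes a single three-column table named `triples(s, p, o)`, which is a hypothetical storage layout, and it handles only basic graph patterns. The `ub:` terms in the example are illustrative placeholders.

```python
def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def bgp_to_sql(select_vars, patterns, table="triples"):
    """Translate a basic graph pattern into one SQL query with self-joins.

    Each triple pattern becomes one alias (t0, t1, ...) of the assumed
    triples table. Constants become equality filters; a variable that
    appears more than once becomes a join condition.
    """
    first_seen = {}  # variable -> column reference where it first appeared
    where = []
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if is_var(term):
                if term in first_seen:
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                escaped = term.replace("'", "''")  # escape SQL single quotes
                where.append(f"{ref} = '{escaped}'")
    cols = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {cols} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


# Hypothetical query: names of all students.
sql = bgp_to_sql(
    ["?name"],
    [("?x", "rdf:type", "ub:Student"), ("?x", "ub:name", "?name")],
)
print(sql)
```

Every additional triple pattern adds one more self-join, which is one reason the speed of the underlying SQL engine matters for this workload.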

Contributors: Mammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created: 2014

All Purpose Textual Data Information Extraction, Visualization and Querying

Description

Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth and availability of textual data has made analysis a tedious task. Information extraction systems are available but are generally too specific and often extract only the kinds of information they deem necessary and extraction-worthy. With data visualization theory and fast, interactive querying methods, leaving out information might not really be necessary. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information present in textual data.

This thesis presents a modified traversal algorithm on the dependency parse output of text to extract all subject-predicate-object pairs from the text while ensuring that no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before the extraction is run. The output format is designed specifically to fit a node-edge-node model and form the building blocks of a network, which makes understanding the text and querying information from the corpus quick and intuitive. It attempts to reduce reading time and enhance understanding of the text using an interactive graph and timeline.
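
The node-edge-node model mentioned above can be sketched as a small adjacency structure. The sketch below is not the thesis's extraction algorithm: it skips the dependency parse entirely and starts from already extracted triples. The sample triples and the function names `build_graph` and `neighbors` are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Store each (subject, predicate, object) as a labeled, directed edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def neighbors(graph, node, predicate=None):
    """Return objects reachable from `node`, optionally filtered by edge label."""
    return [o for p, o in graph.get(node, []) if predicate is None or p == predicate]


# Hypothetical triples, as an extractor might emit them.
triples = [
    ("Alice", "founded", "Acme"),
    ("Acme", "acquired", "Widgets Inc"),
    ("Alice", "lives in", "Phoenix"),
]
graph = build_graph(triples)
print(neighbors(graph, "Alice"))
print(neighbors(graph, "Alice", predicate="founded"))
```

Keeping the predicate as the edge label is what lets a query such as "what did Alice found?" become a simple lookup rather than a text search.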

Contributors: Hashmi, Syed Usama (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Gonzalez Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2018

Template-Based Question Answering over Linked Data using Recursive Neural Networks

Description

The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible to users, who need specialized knowledge of query languages (such as SPARQL) as well as deep familiarity with the ontologies used by these knowledge graphs. To make these knowledge graphs more accessible (even for non-experts), several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, several approaches have been undertaken that include techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML) and the Semantic Web (SW). At a high level, most question answering systems approach the task as a conversion from the natural language question to its corresponding SPARQL query. These systems then use the query to retrieve the desired entities or literals. One approach to this problem, used by most systems today, is to apply deep syntactic and semantic analysis to the input question to derive the SPARQL query. This has resulted in the evolution of natural language processing pipelines that have common characteristics such as answer type detection, segmentation, phrase matching, part-of-speech tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, semantic role labeling, etc.

This has led to NLP pipeline architectures that integrate components, each of which solves a specific aspect of the problem and passes its results to subsequent components for further processing, e.g., DBpedia Spotlight for named entity recognition, RelMatch for relational mapping, etc. A major drawback of this approach is error propagation, a common problem in NLP: mistakes made early in the pipeline can adversely affect successive steps further down the pipeline. Another approach is to generate the SPARQL queries from query templates, which are either manually created or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD). A template is basically a predefined query with various slots that need to be filled. This approach potentially turns the question answering problem into a classification task in which the system needs to match the input question to the appropriate template (class label).
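
A query template with slots can be sketched with Python's standard `string.Template`. The two templates and their identifiers below are hypothetical illustrations, not the actual templates used by QALD or by this thesis. Choosing the right identifier for a question is the classification step, and supplying the IRIs is the slot-filling step.

```python
from string import Template

# Hypothetical template set: each entry is a SPARQL query with named slots.
TEMPLATES = {
    "single_fact": Template(
        "SELECT DISTINCT ?uri WHERE { <$entity> <$predicate> ?uri }"
    ),
    "count": Template(
        "SELECT (COUNT(?uri) AS ?n) WHERE { ?uri <$predicate> <$entity> }"
    ),
}


def fill_template(template_id, **slots):
    """Fill the slots of the chosen template; raises KeyError if a slot is missing."""
    return TEMPLATES[template_id].substitute(**slots)


# "What is the capital of Arizona?" mapped to the single_fact template.
query = fill_template(
    "single_fact",
    entity="http://dbpedia.org/resource/Arizona",
    predicate="http://dbpedia.org/ontology/capital",
)
print(query)
```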

This thesis proposes an approach that uses recursive neural networks to automatically learn to classify natural language questions into their corresponding templates. An obvious advantage of using neural networks is that they eliminate the need for laborious feature engineering, which can be cumbersome and error-prone. The input question is encoded into a vector representation. The model is trained and evaluated on the LC-QuAD dataset (Large-scale Complex Question Answering Dataset), which was created explicitly for machine-learning-based QA approaches that learn complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset, spanning 5042 entities and 615 predicates. These queries were annotated based on 38 unique templates that the model attempts to classify. The resulting model is evaluated against both the LC-QuAD dataset and the Question Answering over Linked Data (QALD-7) dataset.

The recursive neural network achieves a template classification accuracy of 0.828 on the LC-QuAD dataset and 0.618 on the QALD-7 dataset. When the two most likely templates are considered, the model achieves an accuracy of 0.945 on the LC-QuAD dataset and 0.786 on the QALD-7 dataset.
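
The difference between plain accuracy and top-2 accuracy can be shown with a small calculation. This is a generic sketch of the metric, not the thesis's evaluation code. The scores and labels are made-up toy values, and `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(scores, gold, k=1):
    """Fraction of examples whose gold label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, gold):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy data: 4 questions, 3 candidate templates each.
scores = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
]
gold = [0, 0, 1, 2]

print(top_k_accuracy(scores, gold, k=1))
print(top_k_accuracy(scores, gold, k=2))
```

On this toy data the top-1 accuracy is 0.25 and the top-2 accuracy is 0.75, because two questions have their correct template ranked second.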

After slot filling, the overall system achieves a macro F-score of 0.419 on the LC-QuAD dataset and a macro F-score of 0.417 on the QALD-7 dataset.

Contributors: Athreya, Ram G (Author) / Bansal, Srividya (Thesis advisor) / Usbeck, Ricardo (Committee member) / Gary, Kevin (Committee member) / Arizona State University (Publisher)

Created: 2018

Improving the Engineering Retention Rate: A tool to match students to their ideal field of study

Description

Engineering is a multidisciplinary field with a variety of applications. However, since there are so many disciplines of engineering, it is often challenging to find the discipline that best suits an individual interested in engineering. Not knowing which area of engineering most aligns to one’s interests is challenging when deciding on a major and a career. With the development of the Engineering Interest Quiz (EIQ), the goal was to help individuals find the field of engineering that is most similar to their interests. Initially, an Engineering Faculty Survey (EFS) was created to gather information from engineering faculty at Arizona State University (ASU) and to determine keywords that describe each field of engineering. With this list of keywords, the EIQ was developed. Data from the EIQ compared the engineering students’ top three results for the best engineering discipline for them with their current engineering major of study. The data analysis showed that 70% of the respondents had their major listed as one of the top three results they were given and 30% of the respondents did not have their major listed. Of that 70%, 64% had their current major listed as the highest or tied for the highest percentage and 36% had their major listed as the second or third highest percentage. Furthermore, the EIQ data was compared between genders. Only 33% of the male students had their current major listed as their highest percentage, but 55% had their major as one of their top three results. Women had higher percentages with 63% listing their current major as their highest percentage and 81% listing it in the top three of their final results.
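
The nested percentages above can be converted into shares of all respondents. The short calculation below reads "of that 70%" as a conditional share, which is how the abstract phrases it. The variable names are illustrative.

```python
# Shares reported in the abstract, in percent.
in_top_three = 70          # major appears among the top three results
not_listed = 30            # major does not appear
highest_given_listed = 64  # of the 70%: major is highest or tied for highest
lower_given_listed = 36    # of the 70%: major is second or third

# Convert the conditional shares into shares of all respondents.
highest_overall = round(in_top_three * highest_given_listed / 100, 1)
lower_overall = round(in_top_three * lower_given_listed / 100, 1)

print(highest_overall, lower_overall, not_listed)
```

Under this reading, about 44.8% of all respondents had their current major as their top result, 25.2% had it in second or third place, and 30% did not have it listed.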

Contributors: Wagner, Avery Rose (Co-author) / Lucca, Claudia (Co-author) / Taylor, David (Thesis director) / Miller, Cindy (Committee member) / Chemical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Helix: A First Game Retrospective

Description

The original version of Helix, the one I pitched when first deciding to make a video game
for my thesis, is an action-platformer, with the intent of metroidvania-style progression
and an interconnected world map.

The current version of Helix is a turn-based role-playing game, with the intent of roguelike
gameplay and a dark fantasy theme. We will first be exploring the challenges that came
with programming my own game - not quite from scratch, but also without a prebuilt
engine - then transition into game design and how Helix has evolved from its original form
to what we see today.

Contributors: Discipulo, Isaiah K (Author) / Meuth, Ryan (Thesis director) / Kobayashi, Yoshihiro (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

RecyclePlus: A Sustainability iOS Application

Description

RecyclePlus is an iOS mobile application that helps users become knowledgeable about sustainability. It encourages users to be environmentally responsible by giving them access to recycling information. In particular, it allows users to search for materials and learn about their recyclability and how to dispose of them properly. Some searches show the locations of nearby facilities that collect certain materials and dispose of them properly. This is a full-stack software project that explores open source software and APIs, UI/UX design, and iOS development.

Contributors: Tran, Nikki (Author) / Ganesh, Tirupalavanam (Thesis director) / Meuth, Ryan (Committee member) / Watts College of Public Service & Community Solut (Contributor) / Department of Information Systems (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Listen Up! Developing Accessible Educational Materials for Aspiring Audio Engineers

Description

In the past ten years, the United States’ sound recording industries have experienced significant decreases in employment opportunities for aspiring audio engineers, caused by economic imbalances in the music industry’s digital streaming era and reductions in government funding for career and technical education (CTE). The Recording Industry Association of America reports promises of music industry sustainability based on increasing annual revenues in paid streaming services and artists’ high creative demand. The rate at which new audio engineers enter the sound recording subsection of the music industry is not sufficient to support streaming artists’ high demand for engineering new music recordings. CTE programs are rarely offered in secondary education, and many aspiring engineers cannot pursue a post-secondary or vocational education because of financial and academic limitations. These aspiring engineers seek an informal education in audio engineering on the Internet, using video sharing services like YouTube to search for tutorials and improve their engineering skills. The shortage of accessible educational materials on the Internet restricts engineers from advancing their own audio engineering education, reducing opportunities to enter a job market in desperate need of independent, home studio-based engineers. Content creators on YouTube take advantage of this situation and commercialize their own video tutorial series, offering some content for free and selling paid subscriptions to exclusive content. This is misleading for newer engineers because these tutorials omit important understandings of fundamental engineering concepts. Instead, content creators teach inflexible engineering methodologies that are mostly beneficial to their own way of thinking.
Content creators do not often assess the incompatibility of teaching their own methodologies to potential entrants in a profession that demands critical thinking skills requiring applied fundamental audio engineering concepts and techniques. This project analyzes potential solutions to the deficiencies in online audio engineering education and experiments with structuring simple, deliverable, accessible educational content and materials for new entrants in audio engineering. Designing clear, easy-to-follow material for these new entrants is essential for developing a strong understanding of how fundamental concepts apply in future engineers’ careers. Approaches to creating and designing educational content require translating complex engineering concepts through simplified mediums that reduce limitations in learning for future audio engineers.

Contributors: Burns, Triston Connor (Author) / Tobias, Evan (Thesis director) / Libman, Jeff (Committee member) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Understanding and Predicting Persistence in First Year Engineering Students

Description

Based on James Marcia's theory, identity development in youth is the degree to which one has explored and committed to a vocation [1], [2]. During the path to an engineering identity, students will experience a crisis, when one's values and choices are examined and reevaluated, and a commitment, when the outcome of the crisis leads the student to commit to becoming an engineer. During the crisis phase, students are offered a multitude of experiences to shape their values and choices to influence commitment to becoming an engineering student. Students' identities in engineering are fostered through mentoring from industry, alumni, and peer coaching [3], [4]; experiences that emphasize awareness of the importance of professional interactions [5]; and experiences that show creativity, collaboration, and communication as crucial components of engineering. Further strategies to increase students' persistence include support in their transition to becoming an engineering student, education about professional engineers and the workplace [6], and engagement in engineering activities beyond the classroom. Though these strategies are applied to all students, there are challenges students face in confronting their current identity and beliefs before they can understand their value to society and achieve personal satisfaction. To understand students' progression in developing their engineering identity, first-year engineering students were surveyed at the beginning and end of their first semester. Students were asked to rate their level of agreement with 22 statements about their engineering experience. Data included 840 cases. Items with a factor loading less than 0.6, suggesting no sufficient explanation, were removed in successive factor analysis to identify the four factors. Factor analysis indicated that 60.69% of the total variance was explained by the successive factors.
Survey questions were categorized into three factors: engineering identity as defined by sense of belonging and self-efficacy, doubts about becoming an engineer, and exploring engineering. Statements in exploring engineering indicated student awareness, interest and enjoyment within engineering. Students were asked to think about whether they spent time learning what engineers do and participating in engineering activities. Statements about doubts about becoming an engineer indicated whether students had formed opinions about their engineering experience and had an understanding of their environment. Engineering identity required thought about belonging and self-efficacy. Belonging statements called for thought about one's opinion of the importance of being an engineer, the meaning of engineering, an attachment to engineering, and self-identification as an engineer. Statements about self-efficacy required students to contemplate their personal judgement of whether they would be able to succeed and their ability to become an engineer. Effort in engineering indicated student willingness to invest time and effort and their choices and effort in their engineering discipline.
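
The 0.6 loading cutoff described above can be sketched as a single filtering pass. This is not the study's analysis code, and it does not fit a factor model: it assumes the loadings are already estimated. The item names, factor names, and loading values are all hypothetical, and the study repeated the analysis after each removal, which this sketch does not do.

```python
def retain_items(loadings, threshold=0.6):
    """Keep items whose strongest absolute loading meets the threshold.

    `loadings` maps item -> {factor: loading}. Returns item -> assigned factor.
    """
    kept = {}
    for item, row in loadings.items():
        # Find the factor on which this item loads most strongly.
        factor, value = max(row.items(), key=lambda kv: abs(kv[1]))
        if abs(value) >= threshold:
            kept[item] = factor
    return kept


# Hypothetical loadings for three survey statements.
loadings = {
    "Q1": {"identity": 0.78, "doubts": 0.12, "exploring": 0.20},
    "Q2": {"identity": 0.41, "doubts": 0.35, "exploring": 0.38},
    "Q3": {"identity": -0.10, "doubts": 0.66, "exploring": 0.05},
}
print(retain_items(loadings))
```

In this toy example Q2 is dropped because none of its loadings reaches 0.6, while Q1 and Q3 are assigned to the factor they load on most strongly.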

Contributors: Nguyen, Amanda (Author) / Ganesh, Tirupalavanam (Thesis director) / Robinson, Carrie (Committee member) / Harrington Bioengineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Robot Head Kit for High School Robotics Education

Description

The field of robotics is rapidly expanding, and the methods of teaching and introducing students to it must advance alongside new technologies. There is a challenge in robotics education, especially at the high school level, to expose students to more modern and practical robots. One way to bridge this gap is human-robot interaction, which offers a more hands-on and impactful experience that leaves students more interested in pursuing the field. Our project is a Robotic Head Kit that can be used in an educational setting to teach about its electrical, mechanical, programming, and psychological concepts. We took an existing robot head prototype and advanced it so that it can be easily assembled while still maintaining human complexity. Our research for this project dove into the electronics, mechanics, software, and even psychological barriers present in order to advance the existing head design. The kit we have developed combines the field of robotics with psychology to add more life-like features and functionality to the robot, nicknamed "James Junior." The initial goal of our Honors Thesis was to fix the electrical, mechanical, and software problems present. We were then tasked with running tests with high school students to validate our assembly instructions while gathering their observations and feedback about the robot's programmed reactions and emotions. The electrical problems were solved with custom PCBs designed to power and program the existing servo motors on the head. A new set of assembly instructions was written, and modifications to the 3D printed parts were made for the kit. In software, existing code was improved to implement a user interface via keypad and joystick to give students control of the robot head they construct themselves.
The results of our tests showed that we were not only successful in creating an intuitive robot head kit that could be easily assembled by high school students, but we were also successful in programming human-like expressions that could be emotionally perceived by the students.

Contributors: Rathke, Benjamin (Co-author) / Rivera, Gerardo (Co-author) / Sodemann, Angela (Thesis director) / Itagi, Manjunath (Committee member) / Engineering Programs (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Blockchain Design and Simulation

Description

This paper details the specification and implementation of a single-machine blockchain simulator. It also includes a brief introduction to the history and underlying concepts of blockchain, with explanations of features such as decentralization, openness, trustlessness, and consensus. The introduction features a brief overview of public interest in and current implementations of blockchain before stating potential use cases for blockchain simulation software. The paper then gives a brief literature review of blockchain's role, both as a disruptive technology and as a foundational technology. The literature review also addresses the potential and difficulties of using blockchain in Internet of Things (IoT) networks, and describes the general limitations of blockchain regarding computational intensity, storage capacity, and network architecture. Next, the paper gives the specification for a generic blockchain structure, with summaries of the behaviors and purposes of transactions, blocks, nodes, miners, public and private key cryptography, signature validation, and hashing. Finally, the author gives an overview of their specific implementation of the blockchain using C/C++ and OpenSSL. The overview includes a brief description of all the classes and data structures involved in the implementation, including their function and behavior. While the implementation meets the requirements set forward in the specification, the results are more qualitative and intuitive, as time constraints did not allow for quantitative measurements of the network simulation. The paper concludes by discussing potential applications for the simulator and the possibility of future hardware implementations of blockchain.
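
The roles of blocks, hashing, and miners mentioned above can be sketched in a few lines. The paper's implementation uses C/C++ and OpenSSL; the sketch below swaps in Python's `hashlib` for brevity and is not the author's design. It assumes a simple proof-of-work rule (a hash with a fixed number of leading zeros), which the abstract does not specify, and it omits signatures and networking. All function and field names are hypothetical.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions, nonce):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps([index, prev_hash, transactions, nonce], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(index, prev_hash, transactions, difficulty=2):
    """Assumed proof-of-work: try nonces until the hash has `difficulty` leading zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(index, prev_hash, transactions, nonce)
        if digest.startswith(prefix):
            return {
                "index": index,
                "prev_hash": prev_hash,
                "transactions": transactions,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1


def is_valid_chain(chain, difficulty=2):
    """Check each block's hash, its difficulty, and its link to the previous block."""
    prefix = "0" * difficulty
    for i, block in enumerate(chain):
        expected = block_hash(
            block["index"], block["prev_hash"], block["transactions"], block["nonce"]
        )
        if block["hash"] != expected or not expected.startswith(prefix):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = mine_block(0, "0" * 64, ["genesis"])
second = mine_block(1, genesis["hash"], ["alice pays bob 5"])
chain = [genesis, second]
print(is_valid_chain(chain))
```

Because each block stores the hash of its predecessor, changing any earlier transaction invalidates every later link, which is the property the validation function checks.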

Contributors: Rauschenbach, Timothy Rex (Author) / Vrudhula, Sarma (Thesis director) / Nakamura, Mutsumi (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-12
