Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters installed on the Microsoft Windows Azure platform, with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.
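To make the SPARQL-to-SQL translation step concrete, here is a minimal Python sketch of one common approach: each triple pattern becomes an alias of a single three-column table, and shared variables become join conditions. This is an illustration only, not the thesis's Flex/Bison compiler. The table name `triples`, its columns `s`, `p`, `o`, and the function `bgp_to_sql` are all assumptions made for this example.

```python
def bgp_to_sql(select_vars, patterns):
    """Translate SPARQL-style triple patterns into one SQL self-join.

    Assumes a hypothetical table triples(s, p, o). Terms that start
    with '?' are variables; every other term is treated as a constant.
    Returns the SQL text and the list of constants to bind to it.
    """
    where, params, first_seen = [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes an equality join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params
```

With the two patterns `("?x", "rdf:type", "ex:Student")` and `("?x", "ex:advisor", "?a")`, the sketch joins the table with itself on the shared variable `?x`. A real translator would also need to handle FILTER, OPTIONAL, prefixes, and the SQL dialect of the target engine.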

Contributors: Mammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created: 2014

All Purpose Textual Data Information Extraction, Visualization and Querying

Description

Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth and availability of textual data has made analysis a tedious task. Information extraction systems are available but are generally too specific and often extract only the kinds of information they deem necessary and worth extracting. With data visualization theory and fast, interactive querying methods, leaving out information may not be necessary. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information present in textual data.

This thesis presents a modified traversal algorithm over the dependency parse output of text that extracts all subject-predicate-object pairs while ensuring that no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before the extraction is run. The output format is designed specifically to fit a node-edge-node model and to form the building blocks of a network, which makes understanding the text and querying information from the corpus quick and intuitive. The approach attempts to reduce reading time and enhance understanding of the text using an interactive graph and timeline.
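As a rough illustration of subject-predicate-object extraction from a dependency parse, the Python sketch below walks a hand-written list of tokens and emits one tuple per subject/object pair under each head word. It is a deliberately simplified stand-in, not the modified traversal algorithm of the thesis. The `Token` structure, the convention that the root token is its own head, and the label sets (`nsubj`, `nsubjpass`, `dobj`, `obj`) are assumptions for this example.

```python
from collections import namedtuple

# Hypothetical token record: head is the index of the governing token,
# and the root token points at itself.
Token = namedtuple("Token", "idx text dep head")

SUBJECT_LABELS = ("nsubj", "nsubjpass")
OBJECT_LABELS = ("dobj", "obj")


def extract_triples(tokens):
    """Return (subject, predicate, object) tuples from dependency edges."""
    # Group every non-root token under its head.
    children = {}
    for tok in tokens:
        if tok.head != tok.idx:
            children.setdefault(tok.head, []).append(tok)

    triples = []
    for tok in tokens:
        kids = children.get(tok.idx, [])
        subjects = [k for k in kids if k.dep in SUBJECT_LABELS]
        objects = [k for k in kids if k.dep in OBJECT_LABELS]
        # Each tuple maps directly onto a node-edge-node element:
        # subject node, predicate edge, object node.
        for s in subjects:
            for o in objects:
                triples.append((s.text, tok.text, o.text))
    return triples
```

Each returned tuple can be read as two nodes joined by a labeled edge, which is the shape the node-edge-node output format calls for. Handling of modifiers, conjunctions, and prepositional objects is omitted here.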

Contributors: Hashmi, Syed Usama (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Gonzalez Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2018

Evaluation of instructional module development system

Description

Academia is not what it used to be. In today’s fast-paced world, requirements are constantly changing, and adapting an academic curriculum to these changes can be challenging. Given a specific aspect of a domain, students can achieve various levels of proficiency. Considering the wide array of needs, diverse groups need customized course curricula. The need for an archetype for designing a course focused on outcomes paved the way for Outcome-based Education (OBE). OBE focuses on the outcomes as opposed to the traditional way of following a process [23]. According to D. Clark, the major reason for the creation of Bloom’s taxonomy was to stimulate and inspire a higher quality of thinking in academia, incorporating not just basic fact-learning and application but also the evaluation and analysis of facts and their applications [7]. The Instructional Module Development System (IMODS) is the culmination of both these models, Bloom’s Taxonomy and OBE. It is open-source, web-based software developed on the principles of OBE and Bloom’s Taxonomy. It guides an instructor, step by step, through an outcomes-based process as they define the learning objectives and the content to be covered and develop an instruction and assessment plan. The tool also provides the user with a repository of techniques based on the choices they make regarding the level of learning while defining the objectives. This helps maintain alignment among all the components of the course design. The tool also generates documentation to support the course design and provides feedback when the course is lacking in certain aspects.

It is not enough to come up with a model that theoretically facilitates effective, result-oriented course design. There should be facts, experiments, and proof that the model succeeds in achieving what it aims to achieve. Thus, this thesis has two research objectives: (i) design a feature for course design feedback and evaluate its effectiveness; (ii) evaluate the usefulness of a tool like IMODS on various aspects: (a) the effectiveness of the tool in educating instructors on OBE; (b) the effectiveness of the tool in providing appropriate and efficient pedagogy and assessment techniques; (c) the effectiveness of the tool in building the learning objectives; (d) the effectiveness of the tool in document generation; (e) the usability of the tool; (f) the effectiveness of OBE on course design and expected student outcomes. The thesis presents a detailed algorithm for course design feedback, its pseudocode, a description and proof of the correctness of the feature, the methods used to evaluate the tool, the evaluation experiments, and an analysis of the results obtained.

Contributors: Raj, Vaishnavi (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2018

Template-Based Question Answering over Linked Data using Recursive Neural Networks

Description

The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible to users, who need specialized knowledge of query languages (such as SPARQL) as well as deep familiarity with the ontologies used by these knowledge graphs. To make these knowledge graphs more accessible (even for non-experts), several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, several approaches have been undertaken that include techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML), and the Semantic Web (SW). At a high level, most question answering systems approach the task as a conversion from the natural language question to its corresponding SPARQL query. These systems then use the query to retrieve the desired entities or literals. One approach to this problem, used by most systems today, is to apply deep syntactic and semantic analysis to the input question to derive the SPARQL query. This has resulted in the evolution of natural language processing pipelines that share common components such as answer type detection, segmentation, phrase matching, part-of-speech tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, semantic role labeling, etc.

This has led to NLP pipeline architectures that integrate components, each of which solves a specific aspect of the problem and passes its results on to subsequent components for further processing, e.g., DBpedia Spotlight for named entity recognition, RelMatch for relational mapping, etc. A major drawback of this approach is error propagation, a common problem in NLP: mistakes made early in the pipeline can adversely affect successive steps further down the pipeline. Another approach is to generate the SPARQL queries from query templates, essentially a set of predefined queries with various slots that need to be filled, which are either manually generated or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD). This approach potentially turns the question answering problem into a classification task in which the system needs to match the input question to the appropriate template (class label).
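The template idea can be sketched in a few lines of Python: once a classifier has picked a template identifier, generating the query reduces to filling the template's slots. The two templates, their identifiers, and the `fill_template` helper below are hypothetical and are not taken from LC-QuAD or QALD. The URIs in the usage note are placeholders as well.

```python
# Hypothetical query templates keyed by class label. %(e)s marks the
# entity slot and %(p)s the predicate slot.
TEMPLATES = {
    1: "SELECT DISTINCT ?uri WHERE { <%(e)s> <%(p)s> ?uri }",
    2: "SELECT DISTINCT ?uri WHERE { ?uri <%(p)s> <%(e)s> }",
}


def fill_template(template_id, entity, predicate):
    """Fill the entity and predicate slots of the chosen template."""
    return TEMPLATES[template_id] % {"e": entity, "p": predicate}
```

For example, `fill_template(1, "http://example.org/A", "http://example.org/p")` yields a query that asks for every `?uri` linked from the entity by that predicate. In a full system the entity and predicate would come from separate entity-linking and relation-mapping steps.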

This thesis proposes a neural network approach that automatically learns to classify natural language questions into their corresponding templates using recursive neural networks. An obvious advantage of using neural networks is the elimination of the need for laborious feature engineering, which can be cumbersome and error prone. The input question is encoded into a vector representation. The model is trained and evaluated on the LC-QuAD dataset (Large-scale Complex Question Answering Dataset), which was created explicitly for machine learning based QA approaches that learn complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset, spanning 5042 entities and 615 predicates. These queries were annotated based on 38 unique templates that the model attempts to classify. The resulting model is evaluated against both the LC-QuAD dataset and the Question Answering Over Linked Data (QALD-7) dataset.

The recursive neural network achieves a template classification accuracy of 0.828 on the LC-QuAD dataset and 0.618 on the QALD-7 dataset. When the two most likely templates are considered, the model achieves an accuracy of 0.945 on the LC-QuAD dataset and 0.786 on the QALD-7 dataset.
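The difference between plain accuracy and accuracy over the two most likely templates can be made concrete with a small Python sketch. A prediction counts as correct at `k` if the gold template is among the `k` highest-scoring classes. The function name and the toy scores in the usage note are assumptions for illustration and are unrelated to the thesis's data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose gold label is among the k best-scored classes.

    scores: one list of per-class scores per example.
    labels: the gold class index for each example.
    """
    hits = 0
    for row, gold in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With `k=1` this is ordinary classification accuracy. Raising `k` can only keep the value the same or increase it, which matches the pattern in the reported figures.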

After slot filling, the overall system achieves a macro F-score of 0.419 on the LC-QuAD dataset and a macro F-score of 0.417 on the QALD-7 dataset.
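For readers unfamiliar with the metric, the Python sketch below shows one common way to compute a macro F-score for question answering: precision, recall, and F1 are computed per question over the returned and gold answer sets, and the per-question F1 values are then averaged. The exact definition used in the thesis's evaluation is not stated here and may differ. The function name and the convention for empty answer sets are assumptions.

```python
def macro_f_score(gold_answers, predicted_answers):
    """Average of per-question F1 over gold and predicted answer sets."""
    f_scores = []
    for gold, pred in zip(gold_answers, predicted_answers):
        gold, pred = set(gold), set(pred)
        if not gold and not pred:
            # Assumed convention: an empty prediction for an empty gold set is correct.
            f_scores.append(1.0)
            continue
        correct = len(gold & pred)
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0:
            f_scores.append(0.0)
        else:
            f_scores.append(2 * precision * recall / (precision + recall))
    return sum(f_scores) / len(f_scores)
```

Because every question contributes equally to the average, a question with many answers does not outweigh a question with a single answer.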

Contributors: Athreya, Ram G (Author) / Bansal, Srividya (Thesis advisor) / Usbeck, Ricardo (Committee member) / Gary, Kevin (Committee member) / Arizona State University (Publisher)

Created: 2018

An Ethical Study: Development of An Electronic Wearable Pregnancy Monitor

Description

This paper will review unethical studies conducted on humans in the last 100 years, including the WWII Concentration Camp studies on hypothermia and sterilization, the Tuskegee Syphilis Study, and the case of Henrietta Lacks. It will analyze why they were deemed unethical, the laws that emerged from these studies, and how they relate to contemporary technology, with a focus on the issues surrounding the development of an electronic wearable pregnancy monitor. The review of each study will include details of how it was conducted, what made it unethical, and an explanation of why its results are unusable. Following the studies will be an explanation of the laws that were put in place afterward, leading into current technologies and how these technologies created a new set of ethics. The Google Mini, the wearable biosensor onesies for infants, and the intensive care unit at Banner Baywood will be described, along with their role in the development of an electronic wearable pregnancy monitor. The mini meta-analysis includes possible features of the monitor as well as a description of what the ethical consent form will look like. The paper concludes that analyzing past unethical studies will help create a new device that goes above and beyond to ensure the physical health of unborn children, in a way that is both ethical and significant.

Contributors: Wallace, Sydney Sarah (Author) / Hall, Rick (Thesis director) / Kamenca, Andrea (Committee member) / Human Systems Engineering (Contributor) / Arizona State University. College of Nursing & Healthcare Innovation (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-12

The Development and Validation of LGBT Bias Content for Use in an Online Training Program

Description

Previous research has shown that an individual's bias can have a negative impact on behavior. One proposed method of modifying such behavior is vicarious (observational) learning. In the current study, the researcher explored the possibility of using vicarious learning to create an effective training video on LGBT bias. The researcher predicted that a vicarious learning video would be more effective at reducing negative LGBT bias than an informationally equivalent control video. Participants completed the Explicit Attitudes of Sexuality questionnaire (EASQ), were randomized into one of two groups (vicarious or control), watched the assigned training video, and then completed the EASQ again to measure any changes in LGBT bias. The results of the study indicated that the vicarious video was no more effective in reducing negative LGBT bias than the control. Additionally, it was found that the vicarious training video was significantly more effective in eliciting new knowledge than the control. The researcher discusses these findings in relation to Social Cognitive Theory for Personal and Social Change by Enabling Media. The researcher also explains how the nonsignificant findings could have been caused by selection bias, self-report bias, and/or insufficient treatment dosage.

Contributors: Ioia, Kody Allan (Author) / Craig, Scotty (Thesis director) / Roscoe, Rod (Committee member) / Human Systems Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

FROM SUBSISTENCE TO SURPLUS: HELPING FARMERS IN RURAL PERU TO INCREASE CROP PRODUCTION BY IMPROVING SOILS

Description

"Seventy five percent of the world's poor live in rural areas of developing countries, where most people's livelihoods rely directly on agriculture." (USAid, 2014) Reduced levels of crop production and the accompanying problems of malnourishment exist all over the world. In rural Peru, for example, 11 percent of the population is malnourished. (Global Healthfacts.org, 2012) Since success in agriculture relies heavily on the fertility of the soil, it is imperative that any efforts at reversing this trend be primarily directed at improving the existing soils. This, in turn, will increase crop yields and, if done properly, will also conserve natural resources and maximize profits for farmers. In order to improve the lives of those at the bottom of the pyramid through agriculture, certain tools and knowledge must be provided to empower such persons to help themselves. An ancient method of soil improvement, known as Terra Preta do Indio (Indian dark earth), was discovered by anthropologists in the 1800s. These dark, carbon-rich soils are notable for their high fertility, high amounts of plant-available nutrients, and high moisture retention rates. The key to their long-lasting fertility and durability is the presence of high levels of biochar, a highly stable organic carbon produced when organic matter (crop residues, food waste, manure, etc.) is burned at low temperatures in the absence of oxygen. Research has shown that when charcoal (biochar) and fertilizers are combined, crops can yield as much as 880 percent more than when fertilizers are used by themselves. (Steiner, University of Bayreuth, 2004)

Contributors: Stefanik, Kathleen Ann (Author) / Henderson, Mark (Thesis director) / Johnson, Nathan (Committee member) / Barrett, The Honors College (Contributor) / Human Systems Engineering (Contributor)

Created: 2014-12

Perceptual Training for Sensor Operators: Scenario Development

Description

Improvised explosive devices (IEDs) have become a major threat to military personnel in recent years. In the United States Army, Mission Payload Operators (MPOs) operate cameras from unmanned aerial vehicles (UAVs) to detect the threat of IEDs using the real-time images they receive. Previous researchers obtained the expert knowledge of twelve MPOs at Fort Huachuca and learned that they rely on "behavioral signatures," the behavioral and environmental cues associated with an IED threat, rather than on the IED itself (Cooke, Hosch, Banas, Hunn, Staszewski & Fensterer, 2010). To the best of our knowledge, no formal MPO training exists and all training is acquired on the job. The end goal is to create training systems for future MPOs, using cognitive engineering based on expert skill (CEBES), that focus on the detection of behavioral cues associated with IED threats. The cues associated with IED emplacement are complex and dynamic: they are influenced by sociocultural knowledge and often develop over significant periods of time. A dynamic full-motion video simulation environment has been created and embedded with cues elicited from expert MPOs. A three-part simulation has been created. The next step is to use eye-tracking equipment to verify the critical cues that MPOs identify and focus on.

Contributors: Knobloch, Ashley Kay (Author) / Cooke, Nancy (Thesis director) / Branaghan, Russ (Committee member) / Barrett, The Honors College (Contributor) / School of Nutrition and Health Promotion (Contributor) / Human Systems Engineering (Contributor)

Created: 2014-12

Investigation in Prolog-based Machine Translation with English-Hungarian Case Study

Description

This undergraduate thesis explores the efficacy of developing a translator generator in the Prolog programming language using Lexical Functional Grammars. A bidirectional machine translator between English and Hungarian, developed as a proof-of-concept case study, is discussed and assessed. The benefits and drawbacks of this approach as generalized to Machine Translation systems are also discussed, along with possible areas of future work.

Contributors: Lane, Ryan Andrew (Author) / Bansal, Ajay (Thesis director) / Bansal, Srividya (Committee member) / Barrett, The Honors College (Contributor)

Created: 2015-05

Teamwork in Orchestras

Description

Knowledge of the cognitive processes of teams, and of how teams work as a system, has broadened drastically in recent years. However, few researchers have applied their findings to an orchestral setting. In the current study, team cognition was observed and analyzed in an 8th-grade orchestra, in addition to the middle and highest-level orchestras at a junior high and high school in the Arizona Public School system. It was found that in the 8th-grade orchestra, most communication is either given or received in the form of auditory cues, both verbal and musical. Regardless of skill level, groups that interact more during practices perform better.

Contributors: Coleman, Pamela Brooke (Author) / Cooke, Nancy (Thesis director) / Craig, Scotty (Committee member) / Human Systems Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05