Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations with the query performance of these big data systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters installed on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.
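As a rough illustration of what translating SPARQL into SQL can involve, the sketch below turns a basic graph pattern into a self-join over a single table of triples. It is not the thesis's Flex/Bison compiler. The table name `triples`, its columns `s`, `p`, `o`, the function name `bgp_to_sql`, and the `ex:` terms in the usage example are all assumptions made for this example, since the abstract does not describe the storage layout.

```python
def bgp_to_sql(patterns, select_vars, table="triples"):
    """Translate (s, p, o) triple patterns into one SQL query string.

    Terms starting with '?' are variables; anything else is a constant.
    Assumes a hypothetical table with columns s, p and o.
    """
    bindings = {}    # variable -> first column reference that binds it
    conditions = []  # WHERE clauses: constant filters and join equalities
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"  # one table alias per triple pattern
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                escaped = term.replace("'", "''")  # escape single quotes
                conditions.append(f"{ref} = '{escaped}'")
    columns = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    # SELECT ?x ?n WHERE { ?x rdf:type ex:Student . ?x ex:name ?n }
    print(bgp_to_sql(
        [("?x", "rdf:type", "ex:Student"), ("?x", "ex:name", "?n")],
        ["?x", "?n"],
    ))
```

Each triple pattern contributes one alias of the table, so a query with many patterns becomes a multi-way self-join. That is one reason the choice of query engine can matter for this kind of workload.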

Contributors: Mammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created: 2014

Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension underpinning many of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph; however, they struggle to implement a correct and efficient search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph-based solutions, we propose a new programming language construct: the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.
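The four ingredients named above (a root, a child-generating procedure, and optional acceptance and rejection tests) can be mimicked as an ordinary library function. The sketch below does this in Python. It is not the Go language construct from the thesis, and the function name `search`, the breadth-first order, the return-first-match behavior, and the requirement that nodes be hashable are all assumptions of this example rather than details taken from the abstract.

```python
from collections import deque


def search(root, children, accept=None, reject=None):
    """Breadth-first search from root; return the first accepted node or None.

    children(node) returns an iterable of child nodes.
    accept(node) returning True ends the search with that node.
    reject(node) returning True prunes the node: it is not expanded.
    """
    accept = accept or (lambda node: False)
    reject = reject or (lambda node: False)
    seen = {root}             # visited set, so cyclic graphs terminate
    frontier = deque([root])  # FIFO queue gives breadth-first order
    while frontier:
        node = frontier.popleft()
        if reject(node):
            continue          # fail fast: skip this node and its subtree
        if accept(node):
            return node       # succeed fast: stop at the first solution
        for child in children(node):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return None


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    found = search("a", lambda n: graph[n], accept=lambda n: n == "d")
    print(found)
```

Because the caller supplies only the child procedure and the two predicates, the same function works on explicit graphs, like the dictionary above, and on implicit ones whose nodes are generated on demand.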

Contributors: Henderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2018

An adaptable iOS mobile application for mobile data collection

Description

Mobile data collection (MDC) applications have become increasingly common over the last decade, especially in education and research. Although many MDC applications are available, almost all of them are tailor-made for a very specific task in a very specific field (e.g., health, traffic, or weather forecasting). Since the main users of these apps are researchers, physicians, or data collectors in general, it can be extremely challenging for them to adjust or modify these applications, given that they have limited or no technical background in coding. Another common issue with MDC applications is that their functionality is limited to collecting and storing data. Other functionality, such as data visualization, data sharing, data synchronization, and data updating, is rarely found in MDC apps.

This thesis addresses the problems mentioned above with two enhancements: (a) the ability for data collectors to customize their own applications based on the project they are working on, and (b) new tools that help manage the collected data. This is achieved by creating a standalone Java application that data collectors can use to design their own mobile apps in a user-friendly Graphical User Interface (GUI). Once the app has been completely designed using the Java tool, a new iOS mobile application is automatically generated based on the user's input. By using this tool, researchers are able to create mobile applications that are completely tailored to their needs, in addition to enjoying new features such as visualizing and analyzing data, synchronizing data to the remote database, sharing data with other data collectors, and updating existing data.

Contributors: Al-Kaf, Zahra M (Author) / Lindquist, Timothy E (Thesis advisor) / Bansal, Srividya (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created: 2016