ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Big Data
First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency rather than search quality and may fail to deliver high-quality results from a semantic perspective. The majority of existing work generates minimal subgraph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily "minimal". However, such result construction methods suffer from the same problem of semantic misalignment between the data and the user query. This work studies the semantics of how to define results that capture users' search intention, and then the generation of search-intention-aware results.
Second, most existing research is incapable of handling large-scale structured data. As data volume has grown rapidly in recent years, the problem of how to efficiently process keyword queries on large-scale structured data has become important. MapReduce is widely acknowledged as an effective programming model for processing big data. For keyword query processing on a data graph, graph algorithms that can efficiently return query results consistent with users' search intention are first proposed. These algorithms are then migrated to MapReduce to support big data. For keyword query processing on a schema graph, a keyword query is first transformed into multiple SQL queries, and all generated SQL queries are then run on the structured data. It is therefore crucial to find the optimal way to execute a SQL query using MapReduce, so that processing time is minimized. In this work, a system called SOSQL is developed which generates the optimal query execution plan using MapReduce for a SQL query $Q$ with time complexity $O(n^2)$, where $n$ is the number of input tables of $Q$.
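To make the idea of running a SQL query with MapReduce concrete, the following is a minimal sketch of a reduce-side equi-join, a common textbook way to express a two-table join as map, shuffle, and reduce steps. It is not SOSQL's plan generation, which the abstract does not describe; it simulates the phases in a single Python process, and every name in it (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_join`, and the `authors` and `papers` tables) is a hypothetical chosen for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_name, rows, key_column):
    """Emit (join_key, (table_name, row)) pairs, as a mapper would."""
    for row in rows:
        yield row[key_column], (table_name, row)


def shuffle(pairs):
    """Group mapper output by key, mimicking the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values, left_name):
    """Pair every left-table row with every right-table row in one key group."""
    left = [row for name, row in values if name == left_name]
    right = [row for name, row in values if name != left_name]
    for left_row, right_row in product(left, right):
        yield {**left_row, **right_row}


def mapreduce_join(left_name, left_rows, right_name, right_rows, key_column):
    """Simulate SELECT * FROM left JOIN right USING (key_column)."""
    pairs = list(map_phase(left_name, left_rows, key_column))
    pairs += list(map_phase(right_name, right_rows, key_column))
    joined = []
    for values in shuffle(pairs).values():
        joined.extend(reduce_phase(values, left_name))
    return joined


# Hypothetical input tables, invented for this example.
authors = [{"author_id": 1, "name": "Ada"}, {"author_id": 2, "name": "Lin"}]
papers = [
    {"author_id": 1, "title": "Graphs"},
    {"author_id": 1, "title": "Joins"},
    {"author_id": 3, "title": "Orphan"},
]

result = mapreduce_join("authors", authors, "papers", papers, "author_id")
for row in result:
    print(row)
```

A query over $n$ input tables would chain several such join jobs, and the order in which the tables are joined affects how much intermediate data is shuffled between jobs. Choosing that order is the kind of decision a query execution plan has to make.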
A critical review of the existing SI modeling paradigms is first presented, which also highlights features of big data that are particular to SI data. Next, a simulation experiment is carried out to evaluate three different statistical modeling frameworks for SI data that are supported by different underlying conceptual frameworks. Then, two approaches are taken to identify the potential and pitfalls associated with two newer sources of data from New York City: bike-share cycling trips and taxi trips. The first approach builds a model of commuting behavior using a traditional census data set and then compares the results for the same model when it is applied to these newer data sources. The second approach examines how the increased temporal resolution of big SI data may be incorporated into SI models.
Several important results are obtained through this research. First, it is demonstrated that different SI models account for different types of spatial effects and that the Competing Destination framework seems to be the most robust for capturing spatial structure effects. Second, newer sources of big SI data are shown to be very useful for complementing traditional sources of data, though they are not sufficient substitutes. Finally, it is demonstrated that the increased temporal resolution of new data sources may usher in a new era of SI modeling that allows us to better understand the dynamics of human behavior.