ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
and social topics (Papacharissi 2002; Himelboim 2010). Hotly debated issues
span all spheres of human activity, from liberal vs. conservative politics, to radical
vs. counter-radical religious debate, to the climate change debate in the scientific community,
to the globalization debate in economics, and to the nuclear disarmament debate in
security. Many prominent 'camps' have emerged within Internet debate rhetoric and
practice (Dahlberg, n.d.).
In this research I used feature extraction and model fitting techniques to process
the rhetoric found in the web sites of 23 Indonesian Islamic religious organizations,
and later of 26 similar organizations from the United Kingdom, to profile their
ideology and activity patterns along a hypothesized radical/counter-radical scale. I also
presented an end-to-end system that helps researchers visualize the data
interactively on a timeline. The data for this study are the articles
downloaded from the web sites of these organizations dating from 2001 to 2011,
and in 2013. I developed algorithms to rank these organizations by assigning them
to probable positions on the scale. I showed that the developed Rasch model fits
the data using Andersen's likelihood ratio (LR) test. I created a gold standard
ranking of these organizations through an expertise elicitation tool. Then, using
my system, I computed expert-to-expert agreements and presented experimental
results comparing the Rasch model against three baseline methods. These results show that the
Rasch model not only outperforms the baseline methods, but is also the only
system that performs at expert-level accuracy.
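As a rough illustration of the kind of model named above, the following is a minimal sketch of the dichotomous Rasch response function, in which the probability of a positive response depends only on the difference between a position on the scale and an item difficulty. The abstract does not say how organizations and discourse topics map onto Rasch parameters, so the organization names, positions, and topic difficulty below are hypothetical values chosen only for illustration, not results from the thesis.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta: position of the respondent on the latent scale (in logits)
    b:     difficulty of the item on the same scale (in logits)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical positions of three organizations on the latent scale.
# These numbers are made up for illustration only.
org_positions = {"org_a": -1.5, "org_b": 0.0, "org_c": 2.0}

# Hypothetical difficulty of endorsing one discourse topic.
topic_difficulty = 0.5

# Probability that each organization endorses the topic.
probs = {
    name: rasch_probability(theta, topic_difficulty)
    for name, theta in org_positions.items()
}

# Ranking organizations by their estimated position on the scale.
ranking = sorted(org_positions, key=org_positions.get)

for name in ranking:
    print(f"{name}: theta={org_positions[name]:+.1f}, P(endorse)={probs[name]:.3f}")
```

In practice the positions and difficulties would be estimated from observed data rather than fixed by hand; the sketch only shows how a fitted scale yields both endorsement probabilities and a ranking.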
I developed an end-to-end system that receives a list of organizations from experts,
mines their web corpus, prepares discourse topic lists with expert support,
ranks the organizations on scales with partial expert interaction, and finally presents them in an
easy-to-use web-based analytic system.
First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high-quality results from a semantic perspective. The majority of the existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily "minimal". However, such result construction methods suffer from the same problem of semantic misalignment between data and user query. This work studies how to define results that capture users' search intention, and how to generate such search-intention-aware results.
Second, most existing research is incapable of handling large-scale structured data. However, as data volume has grown rapidly in recent years, the problem of how to efficiently process keyword queries on large-scale structured data has become important. MapReduce is widely acknowledged as an effective programming model for processing big data. For keyword query processing on a data graph, graph algorithms that efficiently return query results consistent with users' search intention are first proposed. These algorithms are then migrated to MapReduce to support big data. Keyword query processing on a schema graph first transforms a keyword query into multiple SQL queries, and then runs all generated SQL queries on the structured data. It is therefore crucial to find the optimal way to execute a SQL query using MapReduce, so as to minimize the processing time. In this work, a system called SOSQL is developed which generates the optimal MapReduce query execution plan for a SQL query $Q$ with time complexity $O(n^2)$, where $n$ is the number of input tables of $Q$.
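To make the idea of running a SQL query with MapReduce more concrete, here is a minimal single-process sketch of a reduce-side equi-join written as map, shuffle, and reduce steps. It is not the SOSQL system and does not perform any plan optimization; the table names, columns, and helper functions are hypothetical and serve only to show how one join could be phrased in the MapReduce model.

```python
from collections import defaultdict


def map_phase(table_name, rows, key):
    """Emit (join key, (table tag, row)) pairs for every row of one table."""
    for row in rows:
        yield row[key], (table_name, row)


def shuffle(pairs):
    """Group all emitted values by join key, as the framework would."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_phase(groups, left_name, right_name):
    """For each key, combine every left row with every right row."""
    for tagged_rows in groups.values():
        left_rows = [r for tag, r in tagged_rows if tag == left_name]
        right_rows = [r for tag, r in tagged_rows if tag == right_name]
        for left_row in left_rows:
            for right_row in right_rows:
                yield {**left_row, **right_row}


def mapreduce_join(left_name, left_rows, right_name, right_rows, key):
    """Simulate an inner equi-join of two tables on a shared key column."""
    pairs = list(map_phase(left_name, left_rows, key))
    pairs += list(map_phase(right_name, right_rows, key))
    return list(reduce_phase(shuffle(pairs), left_name, right_name))


# Hypothetical input tables, made up for illustration.
authors = [
    {"author_id": 1, "name": "Ada"},
    {"author_id": 2, "name": "Lin"},
]
papers = [
    {"author_id": 1, "title": "Graphs"},
    {"author_id": 1, "title": "Joins"},
    {"author_id": 3, "title": "Orphan"},
]

# Roughly: SELECT * FROM authors JOIN papers USING (author_id)
joined = mapreduce_join("authors", authors, "papers", papers, "author_id")
for row in joined:
    print(row)
```

A query over several tables can be evaluated as a sequence of such join jobs, and the order in which the tables are joined affects how much intermediate data each job must shuffle, which is why the choice of execution plan matters for processing time.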