ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
This dissertation studies how to effectively discover information in health forums. Several challenges have been identified. First, existing work relies on a syntactic information unit, such as a sentence, a post, or a thread, to bind different pieces of information in a forum. However, most information discovery tasks should be based on a semantic information unit, the patient. For instance, given a keyword query that involves the relationship between a treatment and its side effects, the matched keywords are expected to refer to the same patient. In this work, patient-centered mining is proposed to mine patient semantic information units. In a patient information unit, health information such as diseases, symptoms, treatments, and effects is connected by the corresponding patient.
Second, the information published in health forums varies in quality. Some of it is patient-reported personal health experience, while some is hearsay. In this work, a context-aware experience extraction framework is proposed to mine patient-reported personal health experience, which can be used for evidence-based knowledge discovery or for finding patients with similar experiences.
Finally, the proposed patient-centered and experience-aware mining framework is used to build a patient health information database for effectively discovering adverse drug reactions (ADRs) from health forums. ADRs have become a serious health problem and even a leading cause of death in the United States. Health forums provide valuable evidence on a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. Empirical evaluation shows the effectiveness of the proposed approach.
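The difference between a syntactic unit (a post) and a semantic unit (a patient) can be illustrated with a minimal Python sketch. The sample posts, the `patient_id` field, and both function names are assumptions made up for illustration; this is not the dissertation's implementation, and it presumes that each post has already been linked to the patient it describes.

```python
from collections import defaultdict

# Invented sample data: each post is assumed to be linked to one patient already.
posts = [
    {"post_id": 1, "patient_id": "p1", "text": "Started metformin last week"},
    {"post_id": 2, "patient_id": "p1", "text": "Now I have nausea every morning"},
    {"post_id": 3, "patient_id": "p2", "text": "My doctor suggested metformin"},
    {"post_id": 4, "patient_id": "p3", "text": "Nausea after a long flight"},
]


def match_by_post(posts, keywords):
    """Syntactic unit: every keyword must occur within the same post."""
    kws = [k.lower() for k in keywords]
    return [p["post_id"] for p in posts
            if all(k in p["text"].lower() for k in kws)]


def match_by_patient(posts, keywords):
    """Semantic unit: every keyword must occur somewhere in one patient's posts."""
    kws = [k.lower() for k in keywords]
    texts_by_patient = defaultdict(list)
    for p in posts:
        texts_by_patient[p["patient_id"]].append(p["text"].lower())
    return sorted(
        pid for pid, texts in texts_by_patient.items()
        if all(any(k in t for t in texts) for k in kws)
    )


post_hits = match_by_post(posts, ["metformin", "nausea"])
patient_hits = match_by_patient(posts, ["metformin", "nausea"])
```

In this toy data no single post mentions both terms, so the post-level query finds nothing, while the patient-level query links the treatment and the effect through patient `p1`.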
a small set of labeled documents which can be used to classify a larger set of unknown documents. Machine learning techniques can be used to analyze a political scenario in a given society. A lot of research has been going on in this field to understand the interactions of various people in the society in response to actions taken by their organizations.
This work studies the Russian influence on people in Latvia. An effective model is learned from an initial set of documents that combines official party web pages and the social networking sites of important political leaders. Since Twitter is a micro-blogging site that allows people to post their opinions on any topic, the model is used to estimate which tweets support the Russian and Latvian political organizations in Latvia. All the documents collected for analysis are in Latvian and Russian, languages with rich vocabularies, which results in a huge number of features. Hence, feature selection techniques can be used to reduce the vocabulary to the terms relevant to the classification model. This thesis provides a comparative analysis of traditional feature selection techniques and implements a new iterative feature selection method using EM and cross-domain training, along with a supporting visualization tool. This method outperformed the other feature selection methods, reducing the number of features by up to 50% while retaining good model accuracy. The classification results are used to interpret user behavior and political influence patterns across organizations in Latvia through an interactive dashboard that combines several widgets.
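As an illustration of the kind of traditional feature selection baseline such a comparison typically includes, here is a minimal chi-square term-scoring sketch in Python. It is not the thesis's iterative EM-based method, whose details the abstract does not give. The toy documents (English placeholders rather than Latvian or Russian text), the labels, and the function names are all assumptions.

```python
def chi_square_scores(docs, labels, target):
    """Score each term by the chi-square statistic of its 2x2 term/class table."""
    n = len(docs)
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_terms)
    in_class = sum(1 for y in labels if y == target)
    scores = {}
    for term in vocab:
        # a: target docs with term, b: other docs with term,
        # c: target docs without term, d: other docs without term.
        a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == target)
        b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != target)
        c = in_class - a
        d = (n - in_class) - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy corpus with two hypothetical organizations, "A" and "B".
docs = [
    "party alpha wins vote",
    "party alpha rally",
    "party beta wins vote",
    "party beta rally",
]
labels = ["A", "A", "B", "B"]

scores = chi_square_scores(docs, labels, target="A")
selected = select_top_k(scores, k=2)
```

Terms that appear equally often in both classes, such as "party" here, score zero and are dropped first, which is how a filter of this kind shrinks a large vocabulary before classification.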
This research explores why so few published algorithms enter production and why even fewer end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework for building machine learning analytics so that they can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. The deployment needs and apprehensions of industry are identified, and the gaps are addressed through a 4-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified to maximize the return on machine learning deployment.
The framework and process are demonstrated in semiconductor manufacturing, a highly complex process involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical, and software processes, which makes it a highly dynamic, non-stationary system. Due to the 24/7 uptime requirements in manufacturing, high reliability and fail-safe operation are a must. Moreover, the ever-growing volumes mean that the system must be highly scalable. Lastly, due to the high cost of change, a sustained value proposition is a must for any proposed change. Hence the context is ideal for exploring the issues involved. The enterprise use cases are used to demonstrate the robustness of the framework in addressing the challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios.
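One common way to make an evaluation respect temporal shift is walk-forward (rolling-origin) splitting, in which a model is always tested on records that come after its training data. The Python sketch below illustrates that general idea only; it is not the dissertation's evaluation framework, and the function name, parameters, and sample timestamps are assumptions.

```python
def walk_forward_splits(timestamps, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs in time order.

    No test record is earlier than any training record in the same fold,
    and the training set grows from one fold to the next.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    remaining = len(order) - min_train
    if n_folds < 1 or remaining < n_folds:
        raise ValueError("not enough records for the requested folds")
    fold_size = remaining // n_folds
    for k in range(n_folds):
        start = min_train + k * fold_size
        # The last fold absorbs any leftover records.
        stop = start + fold_size if k < n_folds - 1 else len(order)
        yield order[:start], order[start:stop]


# Invented example: six records with unordered arrival times.
timestamps = [5, 1, 4, 2, 3, 6]
splits = list(walk_forward_splits(timestamps, n_folds=2, min_train=2))
```

Compared with a random train/test split, this arrangement exposes performance decay on a non-stationary process, because each fold is scored only on data the model could not have seen at training time.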
- in fact, the de facto - virtual town halls for people to discover, report, share and communicate with others about various types of events. These events range from widely known events such as the U.S. presidential debate to smaller-scale, local events such as a Halloween block party. During these events, we often witness a large amount of commentary contributed by crowds on social media. This burst of social media responses surges with "second-screen" behavior and greatly enriches both the user experience of interacting with the event and people's awareness of it. Monitoring and analyzing this rich and continuous flow of user-generated content can yield unprecedentedly valuable information about the event, since these responses usually offer far richer and more powerful views of the event than mainstream news could provide. Despite these benefits, social media also tends to be noisy, chaotic, and overwhelming, posing challenges to users in seeking and distilling high-quality content from that noise.
In this dissertation, I explore ways to leverage social media as a source of information and to analyze events collectively based on their social media responses. I develop, implement and evaluate EventRadar, an event analysis toolbox that is able to identify, enrich, and characterize events using massive amounts of social media responses. EventRadar contains three automated, scalable tools to handle three core event analysis tasks: Event Characterization, Event Recognition, and Event Enrichment. More specifically, I develop ET-LDA, a Bayesian model, and SocSent, a matrix factorization framework, for handling the Event Characterization task, i.e., characterizing an event in terms of its topics and its audience's response behavior (via ET-LDA), and the sentiments regarding its topics (via SocSent). I also develop DeMa, an unsupervised event detection algorithm for handling the Event Recognition task, i.e., detecting trending events from a stream of noisy social media posts. Last, I develop CrowdX, a spatial crowdsourcing system for handling the Event Enrichment task, i.e., gathering additional first-hand information (e.g., photos) from the field to enrich the given event's context.
Enabled by EventRadar, it becomes more feasible to uncover patterns that have not been explored previously and to revalidate existing social theories with new evidence. As a result, I am able to gain deep insights into how people respond to the events they are engaged in. The results reveal several key insights into people's varied response behavior over an event's timeline; for example, the topical context of people's tweets does not always correlate with the timeline of the event. In addition, I explore the factors that affect a person's engagement with real-world events on Twitter and find that people engage in an event because they are interested in the topics pertaining to that event, and that, while engaging, their engagement is largely affected by their friends' behavior.
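The abstract does not describe how DeMa works. As a generic illustration of unsupervised detection of trending activity in a post stream, the Python sketch below flags time windows whose post count rises far above the recent average. This is a simple z-score burst detector swapped in for illustration, not DeMa; the function name, parameters, and sample counts are assumptions.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of windows whose count exceeds the trailing mean
    by more than `threshold` standard deviations."""
    bursts = []
    for i in range(history, len(counts)):
        window = counts[i - history:i]
        mu = mean(window)
        sigma = pstdev(window)
        # Floor sigma at 1.0 so a perfectly flat history does not flag tiny changes.
        if counts[i] > mu + threshold * max(sigma, 1.0):
            bursts.append(i)
    return bursts


# Invented per-minute post counts for one keyword, with a spike at index 6.
counts = [10, 12, 11, 9, 10, 11, 48, 12, 10]
bursts = detect_bursts(counts)
```

A detector like this needs no labeled events, but it only sees volume; separating a genuine event from spam or noise would require looking at the content of the posts as well.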