Matching Items (15)
Filtering by

Clear all filters

152541-Thumbnail Image.png
Description
Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a subset of items presented to her. The network operator again, shows that activity to a selection of peers, and thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can network operator design social signals to promote a particular activity like sustainability, public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As the result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model on how users behave and how information flows on the platform. The main part of this thesis after that discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using Bayesian model, (2) Calculator: it uses gradient descent method to compute the intermediate activity distributions, (3) Selector: it selects users who can be influenced to adopt or drop specific activities, (4) Designer: it personalizes social signals for each user. I evaluate the performance of IISS framework by simulation on several network topologies such as preferential attachment, small world, and random. I show that the framework gradually nudges users' activities to approach the target distribution. I use both simulation and mathematical method to analyse convergence properties such as how fast and how close we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions, we can achieve KL-divergence as low as 0.05. But for some other distributions KL-divergence can be as large as 0.5.
ContributorsLe, Tien D (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2014
Description
Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, to suggest hashtags as a context or metadata for the orphaned tweets. This in turn would increase user's social engagement and impact Twitter to maintain its monthly active online users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty with this system is that it emphasizes on selecting the suitable candidate set of hashtags from the related tweets of user's social graph (timeline).The system then rank them based on the combination of features scores computed from their tweet and user related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with current state of the art method.
ContributorsVijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2014
153229-Thumbnail Image.png
Description
Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing.

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.
ContributorsNagendra, Mithila (Author) / Candan, Kasim Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Silva, Yasin N. (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)
Created2014
150969-Thumbnail Image.png
Description
Analysis of political texts, which contains a huge amount of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and a large number of people, is an interesting task, which can lead to discover interesting interactions between the political parties and people. Recently, political blogosphere plays an increasingly

Analysis of political texts, which contains a huge amount of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and a large number of people, is an interesting task, which can lead to discover interesting interactions between the political parties and people. Recently, political blogosphere plays an increasingly important role in politics, as a forum for debating political issues. Most of the political weblogs are biased towards their political parties, and they generally express their sentiments towards their issues (i.e. leaders, topics etc.,) and also towards issues of the opposing parties. In this thesis, I have modeled the above interactions/debate as a sentimental bi-partite graph, a bi-partite graph with Blogs forming vertices of a disjoint set, and the issues (i.e. leaders, topics etc.,) forming the other disjoint set,and the edges between the two sets representing the sentiment of the blogs towards the issues. I have used American Political blog data to model the sentimental bi- partite graph, in particular, a set of popular political liberal and conservative blogs that have clearly declared positions. These blogs contain discussion about social, political, economic issues and related key individuals in their conservative/liberal view. To be more focused and more polarized, 22 most popular liberal/conservative blogs of a particular time period, May 2008 - October 2008(because of high intensity of debate and discussions), just before the presidential elections, was considered, involving around 23,800 articles. This thesis involves solving the questions: a) which is the most liberal/conservative blogs on the web? b) Who is on which side of debate and what are the issues? c) Who are the important leaders? d) How do you model the relationship between the participants of the debate and the underlying issues?
ContributorsThirumalai, Dananjayan (Author) / Davulcu, Hasan (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2012
150836-Thumbnail Image.png
Description
Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical and Counter-Radical organizations and how they shape up over a period of time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. A lot of research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other organizations, their target demographics and the area of influence. We have a huge amount of information which is a result of the research done over these topics. This thesis provides a powerful and interactive way to navigate through all this information, using a Visualization Dashboard. The dashboard makes it easier for Social Scientists, Policy Analysts, Military and other personnel to visualize an organization's propensity towards violence and radicalism. It also tracks the peaking religious, political and socio-economic markers, their target demographics and locations. A powerful search interface with parametric search helps in narrowing down to specific scenarios and view the corresponding information related to the organizations. This tool helps to identify moderate Counter-Radical organizations and also has the potential of predicting the orientation of various organizations based on the current information.
ContributorsNair, Shreejay (Author) / Davulcu, Hasan (Thesis advisor) / Dasgpta, Partha (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2012
153988-Thumbnail Image.png
Description
With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain

With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain webpages before getting at the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining at abridged version of documents so that user can have a quick view to understand what exactly the document is about. Email threads from W3C are used in this system. Apart from common IR features like Term Frequency, Inverse Document Frequency, Term Rank, a variation of page rank based on graph model, which can cluster the words with respective to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words with the corpus and evaluates the rank of the word accordingly. Sentences of email threads are ranked as per features and summaries are generated. System implemented the concept of pyramid evaluation in content selection. The system can be considered as a framework for Unsupervised Learning in text summarization.
ContributorsNadella, Sravan (Author) / Davulcu, Hasan (Thesis advisor) / Li, Baoxin (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2015
156205-Thumbnail Image.png
Description
The subliminal impact of framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package" for average citizens on how to make sense of climate change and its consequences to their livelihoods, how

The subliminal impact of framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package" for average citizens on how to make sense of climate change and its consequences to their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag of words and word-level features to detect frames automatically in text. Such works face limitations since standard keyword based features may not generalize well to accommodate surface variations in text when different keywords are used for similar concepts.

This thesis develops a unique type of textual features that generalize triplets extracted from text, by clustering them into high-level concepts. These concepts are utilized as features to detect frames in text. Compared to uni-gram and bi-gram based models, classification and clustering using generalized concepts yield better discriminating features and a higher classification accuracy with a 12% boost (i.e. from 74% to 83% F-measure) and 0.91 clustering purity for Frame/Non-Frame detection.

The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies related to extracting causal relationships from text were based on laborious and incomplete hand-developed lists of explicit causal verbs, such as “causes" and “results in." Such approaches result in limited recall because standard causal verbs may not generalize well to accommodate surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that utilizes generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in written expressions of causal relationships and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor intensive keyword list development and annotated datasets. Experimental evaluations by domain experts achieve an average precision of 82%. Qualitative assessments of causal chains show that results are consistent with the 2014 IPCC report illuminating causal mechanisms underlying the linkages between climatic stresses and social instability.
ContributorsAlashri, Saud (Author) / Davulcu, Hasan (Thesis advisor) / Desouza, Kevin C. (Committee member) / Maciejewski, Ross (Committee member) / Hsiao, Sharon (Committee member) / Arizona State University (Publisher)
Created2018
153858-Thumbnail Image.png
Description
Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who has chosen to subscribe to the tweets from the originator by following the originator. Although, hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of a browser.

A hashtag is a type of label or meta-data tag used in social networks and micro-blogging services which makes it easier for users to find messages with a specific theme or content. The context of a tweet can be defined as a set of one or more hashtags. Users often do not use hashtags to tag their tweets. This leads to the problem of missing context for tweets. To address the problem of missing hashtags, a statistical method was proposed which predicts most likely hashtags based on the social circle of an originator.

In this thesis, we propose to improve on the existing context recovery system by selectively limiting the candidate set of hashtags to be derived from the intimate circle of the originator rather than from every user in the social network of the originator. This helps in reducing the computation, increasing speed of prediction, scaling the system to originators with large social networks while still preserving most of the accuracy of the predictions. We also propose to not only derive the candidate hashtags from the social network of the originator but also derive the candidate hashtags based on the content of the tweet. We further propose to learn personalized statistical models according to the adoption patterns of different originators. This helps in not only identifying the personalized candidate set of hashtags based on the social circle and content of the tweets but also in customizing the hashtag adoption pattern to the originator of the tweet.
ContributorsMallapura Umamaheshwar, Tejas (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2015
153586-Thumbnail Image.png
Description
With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, ideologies and furthermore enforcing them on others. Users have become the source for the production and dissemination of real time information. The content posted by the users can be used to understand

With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, ideologies and furthermore enforcing them on others. Users have become the source for the production and dissemination of real time information. The content posted by the users can be used to understand them and track their behavior. Using this content of the user, data analysis can be performed to understand their social ideology and affinity towards Radical and Counter-Radical Movements. During the process of expressing their opinions people use hashtags in their messages in Twitter. These hashtags are a rich source of information in understanding the content based relationship between the online users apart from the existing context based follower and friend relationship.

An intelligent visual dash-board system is necessary which can track the activities of the users and diffusion of the online social movements, identify the hot-spots in the users' network, show the geographic foot print of the users and to understand the socio-cultural, economic and political drivers for the relationship among different groups of the users.
ContributorsGaripalli, Sravan Kumar (Author) / Davulcu, Hasan (Thesis advisor) / Shakarian, Paulo (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)
Created2015
154589-Thumbnail Image.png
Description
Bank institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach taken where individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability

Bank institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach taken where individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability of customer information and interests. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine predictability of several client demographic and financial attributes and find most contributing factors in each. Data were preprocessed to ensure quality, and then data mining models were generated for the attributes with logistic regression, support vector machine (SVM) and random forest using Orange as the data mining tool. Results were analyzed using precision, recall and F1 score.
ContributorsEjaz, Samira (Author) / Davulcu, Hasan (Thesis advisor) / Balasooriya, Janaka (Committee member) / Candan, Kasim (Committee member) / Arizona State University (Publisher)
Created2016