Matching Items (21)
Filtering by

Clear all filters

156107-Thumbnail Image.png
Description
Online social media is popular due to its real-time nature, extensive connectivity and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical

Online social media is popular due to its real-time nature, extensive connectivity and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical information or gathering perspectives on important issues. Social media platforms are not designed for resource seeking and experience large volumes of messages, leading to requests not being fulfilled satisfactorily. Designing frameworks to facilitate efficient information seeking in social media will help users to obtain appropriate assistance for their needs

and help platforms to increase user satisfaction.

Several challenges exist in the way of facilitating information seeking in social media. First, the characteristics affecting the user’s response time for a question are not known, making it hard to identify prompt responders. Second, the social context in which the user has asked the question has to be determined to find personalized responders. Third, users employ rhetorical requests, which are statements having the

syntax of questions, and systems assisting information seeking might be hindered from focusing on genuine questions. Fouth, social media advocates of political campaigns employ nuanced strategies to prevent users from obtaining balanced perspectives on

issues of public importance.

Sociological and linguistic studies on user behavior while making or responding to information seeking requests provides concepts drawing from which we can address these challenges. We propose methods to estimate the response time of the user for a given question to identify prompt responders. We compute the question specific social context an asker shares with his social connections to identify personalized responders. We draw from theories of political mobilization to model the behaviors arising from the strategies of people trying to skew perspectives. We identify rhetorical questions by modeling user motivations to post them.
ContributorsRanganath, Suhas (Author) / Liu, Huan (Thesis advisor) / Lai, Ying-Cheng (Thesis advisor) / Tong, Hanghang (Committee member) / Vaculin, Roman (Committee member) / Arizona State University (Publisher)
Created2017
156246-Thumbnail Image.png
Description
Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node

Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node in the network for spreading the diffusion and how to top or contain a cascading failure in the network. This dissertation consists of three parts.

In the first part, we study the problem of locating multiple diffusion sources in networks under the Susceptible-Infected-Recovered (SIR) model. Given a complete snapshot of the network, we developed a sample-path-based algorithm, named clustering and localization, and proved that for regular trees, the estimators produced by the proposed algorithm are within a constant distance from the real sources with a high probability. Then, we considered the case in which only a partial snapshot is observed and proposed a new algorithm, named Optimal-Jordan-Cover (OJC). The algorithm first extracts a subgraph using a candidate selection algorithm that selects source candidates based on the number of observed infected nodes in their neighborhoods. Then, in the extracted subgraph, OJC finds a set of nodes that "cover" all observed infected nodes with the minimum radius. The set of nodes is called the Jordan cover, and is regarded as the set of diffusion sources. We proved that OJC can locate all sources with probability one asymptotically with partial observations in the Erdos-Renyi (ER) random graph. Multiple experiments on different networks were done, which show our algorithms outperform others.

In the second part, we tackle the problem of reconstructing the diffusion history from partial observations. We formulated the diffusion history reconstruction problem as a maximum a posteriori (MAP) problem and proved the problem is NP hard. Then we proposed a step-by- step reconstruction algorithm, which can always produce a diffusion history that is consistent with the partial observations. Our experimental results based on synthetic and real networks show that the algorithm significantly outperforms some existing methods.

In the third part, we consider the problem of improving the robustness of an interdependent network by rewiring a small number of links during a cascading attack. We formulated the problem as a Markov decision process (MDP) problem. While the problem is NP-hard, we developed an effective and efficient algorithm, RealWire, to robustify the network and to mitigate the damage during the attack. Extensive experimental results show that our algorithm outperforms other algorithms on most of the robustness metrics.
ContributorsChen, Zhen (Author) / Ying, Lei (Thesis advisor) / Tong, Hanghang (Thesis advisor) / Zhang, Junshan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created2018
156735-Thumbnail Image.png
Description
The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied for network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention and various network embedding algorithms are proposed.

Despite the success of these network embedding methods, the majority of them are dedicated to static plain networks, i.e., networks with fixed nodes and links only; while in social media, networks can present in various formats, such as attributed networks, signed networks, dynamic networks and heterogeneous networks. These social networks contain abundant rich information to alleviate the network sparsity problem and can help learn a better network representation; while plain network embedding approaches cannot tackle such networks. For example, signed social networks can have both positive and negative links. Recent study on signed networks shows that negative links have added value in addition to positive links for many tasks such as link prediction and node classification. However, the existence of negative links challenges the principles used for plain network embedding. Thus, it is important to study signed network embedding. Furthermore, social networks can be dynamic, where new nodes and links can be introduced anytime. Dynamic networks can reveal the concept drift of a user and require efficiently updating the representation when new links or users are introduced. However, static network embedding algorithms cannot deal with dynamic networks. Therefore, it is important and challenging to propose novel algorithms for tackling different types of social networks.

In this dissertation, we investigate network representation learning in social media. In particular, we study representative social networks, which includes attributed network, signed networks, dynamic networks and document networks. We propose novel frameworks to tackle the challenges of these networks and learn representations that not only capture the network structure but also the unique properties of these social networks.
ContributorsWang, Suhang (Author) / Liu, Huan (Thesis advisor) / Aggarwal, Charu (Committee member) / Sen, Arunabha (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2018
156711-Thumbnail Image.png
Description
Social media refers computer-based technology that allows the sharing of information and building the virtual networks and communities. With the development of internet based services and applications, user can engage with social media via computer and smart mobile devices. In recent years, social media has taken the form

Social media refers computer-based technology that allows the sharing of information and building the virtual networks and communities. With the development of internet based services and applications, user can engage with social media via computer and smart mobile devices. In recent years, social media has taken the form of different activities such as social network, business network, text sharing, photo sharing, blogging, etc. With the increasing popularity of social media, it has accumulated a large amount of data which enables understanding the human behavior possible. Compared with traditional survey based methods, the analysis of social media provides us a golden opportunity to understand individuals at scale and in turn allows us to design better services that can tailor to individuals’ needs. From this perspective, we can view social media as sensors, which provides online signals from a virtual world that has no geographical boundaries for the real world individual's activity.

One of the key features for social media is social, where social media users actively interact to each via generating content and expressing the opinions, such as post and comment in Facebook. As a result, sentiment analysis, which refers a computational model to identify, extract or characterize subjective information expressed in a given piece of text, has successfully employs user signals and brings many real world applications in different domains such as e-commerce, politics, marketing, etc. The goal of sentiment analysis is to classify a user’s attitude towards various topics into positive, negative or neutral categories based on textual data in social media. However, recently, there is an increasing number of people start to use photos to express their daily life on social media platforms like Flickr and Instagram. Therefore, analyzing the sentiment from visual data is poise to have great improvement for user understanding.

In this dissertation, I study the problem of understanding human sentiments from large scale collection of social images based on both image features and contextual social network features. We show that neither

visual features nor the textual features are by themselves sufficient for accurate sentiment prediction. Therefore, we provide a way of using both of them, and formulate sentiment prediction problem in two scenarios: supervised and unsupervised. We first show that the proposed framework has flexibility to incorporate multiple modalities of information and has the capability to learn from heterogeneous features jointly with sufficient training data. Secondly, we observe that negative sentiment may related to human mental health issues. Based on this observation, we aim to understand the negative social media posts, especially the post related to depression e.g., self-harm content. Our analysis, the first of its kind, reveals a number of important findings. Thirdly, we extend the proposed sentiment prediction task to a general multi-label visual recognition task to demonstrate the methodology flexibility behind our sentiment analysis model.
ContributorsWang, Yilin (Author) / Li, Baoxin (Thesis advisor) / Liu, Huan (Committee member) / Tong, Hanghang (Committee member) / Chang, Yi (Committee member) / Arizona State University (Publisher)
Created2018
156862-Thumbnail Image.png
Description
Teams are increasingly indispensable to achievements in any organizations. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the {\it social, cognitive} and {\it information} level in relation to team performance and network dynamics. The goal of this dissertation is

Teams are increasingly indispensable to achievements in any organizations. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the {\it social, cognitive} and {\it information} level in relation to team performance and network dynamics. The goal of this dissertation is to create new instruments to {\it predict}, {\it optimize} and {\it explain} teams' performance in the context of composite networks (i.e., social-cognitive-information networks).

Understanding the dynamic mechanisms that drive the success of high-performing teams can provide the key insights into building the best teams and hence lift the productivity and profitability of the organizations. For this purpose, novel predictive models to forecast the long-term performance of teams ({\it point prediction}) as well as the pathway to impact ({\it trajectory prediction}) have been developed. A joint predictive model by exploring the relationship between team level and individual level performances has also been proposed.

For an existing team, it is often desirable to optimize its performance through expanding the team by bringing a new team member with certain expertise, or finding a new candidate to replace an existing under-performing member. I have developed graph kernel based performance optimization algorithms by considering both the structural matching and skill matching to solve the above enhancement scenarios. I have also worked towards real time team optimization by leveraging reinforcement learning techniques.

With the increased complexity of the machine learning models for predicting and optimizing teams, it is critical to acquire a deeper understanding of model behavior. For this purpose, I have investigated {\em explainable prediction} -- to provide explanation behind a performance prediction and {\em explainable optimization} -- to give reasons why the model recommendations are good candidates for certain enhancement scenarios.
ContributorsLi, Liangyue (Author) / Tong, Hanghang (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Buchler, Norbou (Committee member) / Arizona State University (Publisher)
Created2018
157057-Thumbnail Image.png
Description
The pervasive use of social media gives it a crucial role in helping the public perceive reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. It becomes increasingly difficult for online users to find accurate and trustworthy information.

The pervasive use of social media gives it a crucial role in helping the public perceive reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. It becomes increasingly difficult for online users to find accurate and trustworthy information. As witnessed in recent incidents of misinformation, it escalates quickly and can impact social media users with undesirable consequences and wreak havoc instantaneously. Different from some existing research in psychology and social sciences about misinformation, social media platforms pose unprecedented challenges for misinformation detection. First, intentional spreaders of misinformation will actively disguise themselves. Second, content of misinformation may be manipulated to avoid being detected, while abundant contextual information may play a vital role in detecting it. Third, not only accuracy, earliness of a detection method is also important in containing misinformation from being viral. Fourth, social media platforms have been used as a fundamental data source for various disciplines, and these research may have been conducted in the presence of misinformation. To tackle the challenges, we focus on developing machine learning algorithms that are robust to adversarial manipulation and data scarcity.

The main objective of this dissertation is to provide a systematic study of misinformation detection in social media. To tackle the challenges of adversarial attacks, I propose adaptive detection algorithms to deal with the active manipulations of misinformation spreaders via content and networks. To facilitate content-based approaches, I analyze the contextual data of misinformation and propose to incorporate the specific contextual patterns of misinformation into a principled detection framework. Considering its rapidly growing nature, I study how misinformation can be detected at an early stage. In particular, I focus on the challenge of data scarcity and propose a novel framework to enable historical data to be utilized for emerging incidents that are seemingly irrelevant. With misinformation being viral, applications that rely on social media data face the challenge of corrupted data. To this end, I present robust statistical relational learning and personalization algorithms to minimize the negative effect of misinformation.
ContributorsWu, Liang (Author) / Liu, Huan (Thesis advisor) / Tong, Hanghang (Committee member) / Doupe, Adam (Committee member) / Davison, Brian D. (Committee member) / Arizona State University (Publisher)
Created2019
157077-Thumbnail Image.png
Description
Networks naturally appear in many high-impact applications. The simplest model of networks is single-layered networks, where the nodes are from the same domain and the links are of the same type. However, as the world is highly coupled, nodes from different application domains tend to be interdependent on each

Networks naturally appear in many high-impact applications. The simplest model of networks is single-layered networks, where the nodes are from the same domain and the links are of the same type. However, as the world is highly coupled, nodes from different application domains tend to be interdependent on each other, forming a more complex network model called multi-layered networks.

Among the various aspects of network studies, network connectivity plays an important role in a myriad of applications. The diversified application areas have spurred numerous connectivity measures, each designed for some specific tasks. Although effective in their own fields, none of the connectivity measures is generally applicable to all the tasks. Moreover, existing connectivity measures are predominantly based on single-layered networks, with few attempts made on multi-layered networks.

Most connectivity analyzing methods assume that the input network is static and accurate, which is not realistic in many applications. As real-world networks are evolving, their connectivity scores would vary by time as well, making it imperative to keep track of those changing parameters in a timely manner. Furthermore, as the observed links in the input network may be inaccurate due to noise and incomplete data sources, it is crucial to infer a more accurate network structure to better approximate its connectivity scores.

The ultimate goal of connectivity studies is to optimize the connectivity scores via manipulating the network structures. For most complex measures, the hardness of the optimization problem still remains unknown. Meanwhile, current optimization methods are mainly ad-hoc solutions for specific types of connectivity measures on single-layered networks. No optimization framework has ever been proposed to tackle a wider range of connectivity measures on complex networks.

In this thesis, an in-depth study of connectivity measures, inference, and optimization problems will be proposed. Specifically, a unified connectivity measure model will be introduced to unveil the commonality among existing connectivity measures. For the connectivity inference aspect, an effective network inference method and connectivity tracking framework will be described. Last, a generalized optimization framework will be built to address the connectivity minimization/maximization problems on both single-layered and multi-layered networks.
ContributorsChen, Chen (Author) / Tong, Hanghang (Thesis advisor) / Davulcu, Hasan (Committee member) / Sen, Arunabha (Committee member) / Subrahmanian, V.S. (Committee member) / Ying, Lei (Committee member) / Arizona State University (Publisher)
Created2019
154769-Thumbnail Image.png
Description
Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a statistically significant boost in the directional predictive accuracies in the presence of other 1-gram keyword features. Thirdly I test the performance of the system on several time-frames and identify the 4 hour time-frame for both the price charts and for Tweet breakout detection as the best time-frame combination. Finally, I develop a set of price momentum based trade exit rules to cut losing trades early and to allow the winning trades run longer. I show that the Tweet volume breakout based trading system with the price momentum based exit rules not only improves the winning accuracy and the return on investment, but it also lowers the maximum drawdown and achieves the highest overall return over maximum drawdown.
ContributorsAlostad, Hana (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steven (Committee member) / Tong, Hanghang (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created2016
155830-Thumbnail Image.png
Description
Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and the question answering. My research mainly focuses on developing solutions to support solving these two problems.

In image understanding, one important research area is semantic segmentation, which takes images as input and output the label of each pixel. As much manual work is needed to label a useful training set, typical training sets for such supervised approaches are always small. There are also approaches with relaxed labeling requirement, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, there are more and more user-uploaded images available

on-line. Such user-generated content often comes with labels like tags and may be coarsely labelled by various tools. To use these information for computer vision tasks, I propose a new graphic model by considering the neighborhood information and their interactions to obtain the pixel-level labels of the images with only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best answer prediction, which addressed two main research topics: feature design and model construction. In the feature design part, most existing work discussed how to design effective features for answer quality / best answer prediction. However, little work mentioned how to design features by considering the relationship between answers of one given question. To fill this research gap, I designed new features to help improve the prediction performance. In the modeling part, to employ the structure of the feature space, I proposed an innovative learning-to-rank model by considering the hierarchical lasso. Experiments with comparison with the state-of-the-art in the best answer prediction literature have confirmed

that the proposed methods are effective and suitable for solving the research task.
ContributorsTian, Qiongjie (Author) / Li, Baoxin (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2017
155865-Thumbnail Image.png
Description
Node proximity measures are commonly used for quantifying how nearby or otherwise related to two or more nodes in a graph are. Node significance measures are mainly used to find how much nodes are important in a graph. The measures of node proximity/significance have been highly effective in many predictions

Node proximity measures are commonly used for quantifying how nearby or otherwise related to two or more nodes in a graph are. Node significance measures are mainly used to find how much nodes are important in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large size graphs and another problem on the measures is low accuracy when the significance of node and its degree in the graph are not related. The other problem is that their effectiveness is less when information for a graph is uncertain. For an uncertain graph, they require exponential computation costs to calculate ranking scores with considering all possible worlds.

In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR) which is an approximate personalized PageRank calculating node rankings for the locality information for seeds without calculating the entire graph and reusing the precomputed locality information for different locality combinations. For the identification of locality information, I present Impact Neighborhood Indexing (INI) to find impact neighborhoods with nodes' fingerprints propagation on the network. For the accuracy challenge, I introduce Degree Decoupled PageRank (D2PR) technique to improve the effectiveness of PageRank based knowledge discovery, especially considering the significance of neighbors and degree of a given node. To tackle the uncertain challenge, I introduce Uncertain Personalized PageRank (UPPR) to approximately compute personalized PageRank values on uncertainties of edge existence and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M) to compute ranking scores for the case when uncertainty exists on edge weights as interval values.
ContributorsKim, Jung Hyun (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Tong, Hanghang (Committee member) / Sapino, Maria Luisa (Committee member) / Arizona State University (Publisher)
Created2017