Network Representation Learning in Social Media

Description

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied to network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention, and various network embedding algorithms have been proposed.

Despite the success of these network embedding methods, the majority of them are dedicated to static plain networks, i.e., networks with fixed nodes and links only. In social media, however, networks can take various forms, such as attributed networks, signed networks, dynamic networks and heterogeneous networks. These social networks contain rich information that can alleviate the network sparsity problem and help learn better network representations, yet plain network embedding approaches cannot handle such networks. For example, signed social networks can have both positive and negative links. Recent studies of signed networks show that negative links have added value in addition to positive links for many tasks such as link prediction and node classification. However, the existence of negative links challenges the principles used for plain network embedding. Thus, it is important to study signed network embedding. Furthermore, social networks can be dynamic, where new nodes and links can be introduced at any time. Dynamic networks can reveal the concept drift of a user and require efficiently updating the representation when new links or users are introduced. However, static network embedding algorithms cannot deal with dynamic networks. Therefore, it is important and challenging to propose novel algorithms for tackling different types of social networks.

In this dissertation, we investigate network representation learning in social media. In particular, we study representative social networks, including attributed networks, signed networks, dynamic networks and document networks. We propose novel frameworks to tackle the challenges of these networks and learn representations that capture not only the network structure but also the unique properties of these social networks.

Contributors: Wang, Suhang (Author) / Liu, Huan (Thesis advisor) / Aggarwal, Charu (Committee member) / Sen, Arunabha (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created: 2018

Design, Analysis and Computation in Wireless and Optical Networks

Description

In the realm of network science, many topics can be abstracted as graph problems, such as routing, connectivity enhancement, resource/frequency allocation and so on. Though most of them are NP-hard to solve, heuristics as well as approximation algorithms are proposed to achieve reasonably good results. Accordingly, this dissertation studies graph-related problems encountered in real applications. Two of the problems are derived from wireless networks, two arise in FiWi and optical networks, one is in the Radio-Frequency Identification (RFID) domain, and the last is inspired by satellite deployment.

The objective of most relay node placement problems is to place the fewest relay nodes in the deployment area so that the network formed by the sensors and the relay nodes is connected. Under a fixed budget, the expense of procuring the minimum number of relay nodes needed to connect the network may exceed the budget. In this dissertation, we study a family of problems whose goal is to design a network with “maximal connectedness” or “minimal disconnectedness”, subject to a fixed budget constraint. Apart from connectivity, we also study a relay node problem in which a degree constraint is considered. The balance between reducing the degree of the network and maximizing communication forms the basis of our d-degree minimum arrangement (d-MA) problem. In this dissertation, we look at several approaches to solving the generalized d-MA problem, where we embed a graph onto a subgraph of a given degree.

In recent years, considerable research has been conducted on optical and FiWi networks. Utilizing the recently proposed concept of “candidate trees” in optical networks, this dissertation studies a counting problem on complete graphs. Closed-form expressions are given for certain cases, and a polynomial-time counting algorithm for the general case is also presented. Routing plays a major role in FiWi networks. Based on a novel path length metric that emphasizes the “heaviest edge”, this dissertation proposes a polynomial-time algorithm for single-path computation. An NP-completeness proof as well as an approximation algorithm are presented for multi-path routing.

Radio-frequency identification (RFID) technology is extensively used at present for identification and tracking of a multitude of objects. In many configurations, simultaneous activation of two readers may cause a “reader collision” when tags are present in the intersection of the sensing ranges of both readers. This dissertation addresses slotted time access for readers and tries to provide a collision-free scheduling scheme while minimizing total reading time.

Finally, this dissertation studies a monitoring problem on the surface of the earth for significant environmental, social/political and extreme events using satellites as sensors. It is assumed that the impact of a significant event spills into neighboring regions and there will be corresponding indicators. Careful deployment of sensors, utilizing “Identifying Codes”, can ensure that even though the number of deployed sensors is fewer than the number of regions, it may be possible to uniquely identify the region where the event has taken place.
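To make the identifying-code idea concrete, the following minimal sketch checks whether a chosen set of sensor regions lets every region be told apart from every other. It uses the standard graph-theoretic definition of an identifying code. The function name `is_identifying_code`, the adjacency-dictionary representation, and the four-region example are illustrative assumptions and are not taken from the dissertation.

```python
from typing import Dict, Hashable, Iterable, Set


def is_identifying_code(adj: Dict[Hashable, Set[Hashable]],
                        code: Iterable[Hashable]) -> bool:
    """Return True if `code` is an identifying code of the graph `adj`.

    Standard definition: for every vertex v, the closed neighborhood of v
    intersected with the code must be non-empty and must differ from the
    corresponding set of every other vertex.
    """
    code_set = set(code)
    seen = set()
    for v, neighbors in adj.items():
        # Sensors that would react to an event in region v.
        signature = frozenset((neighbors | {v}) & code_set)
        if not signature or signature in seen:
            return False
        seen.add(signature)
    return True


# Hypothetical example: four regions in a row, 0 - 1 - 2 - 3.
regions = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_identifying_code(regions, {0, 1, 2}))  # three sensors, four regions
print(is_identifying_code(regions, {1, 2}))     # regions 1 and 2 look alike
```

In this toy graph, three sensors are enough to distinguish four regions, while the two-sensor placement fails because regions 1 and 2 trigger the same pair of sensors.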

Contributors: Zhou, Chenyang (Author) / Richa, Andrea (Thesis advisor) / Sen, Arunabha (Thesis advisor) / Xue, Guoliang (Committee member) / Walkowiak, Krzysztof (Committee member) / Arizona State University (Publisher)

Created: 2019

Connectivity in Complex Networks: Measures, Inference and Optimization

Description

Networks naturally appear in many high-impact applications. The simplest network model is the single-layered network, where the nodes are from the same domain and the links are of the same type. However, as the world is highly coupled, nodes from different application domains tend to be interdependent, forming a more complex network model called multi-layered networks.

Among the various aspects of network studies, network connectivity plays an important role in a myriad of applications. The diversified application areas have spurred numerous connectivity measures, each designed for some specific tasks. Although effective in their own fields, none of the connectivity measures is generally applicable to all the tasks. Moreover, existing connectivity measures are predominantly based on single-layered networks, with few attempts made on multi-layered networks.

Most connectivity analysis methods assume that the input network is static and accurate, which is not realistic in many applications. As real-world networks evolve, their connectivity scores vary over time as well, making it imperative to keep track of those changing parameters in a timely manner. Furthermore, as the observed links in the input network may be inaccurate due to noise and incomplete data sources, it is crucial to infer a more accurate network structure to better approximate its connectivity scores.

The ultimate goal of connectivity studies is to optimize the connectivity scores by manipulating the network structures. For most complex measures, the hardness of the optimization problem remains unknown. Meanwhile, current optimization methods are mainly ad hoc solutions for specific types of connectivity measures on single-layered networks. No optimization framework has been proposed to tackle a wider range of connectivity measures on complex networks.

In this thesis, an in-depth study of connectivity measures, inference, and optimization problems will be proposed. Specifically, a unified connectivity measure model will be introduced to unveil the commonality among existing connectivity measures. For the connectivity inference aspect, an effective network inference method and connectivity tracking framework will be described. Last, a generalized optimization framework will be built to address the connectivity minimization/maximization problems on both single-layered and multi-layered networks.

Contributors: Chen, Chen (Author) / Tong, Hanghang (Thesis advisor) / Davulcu, Hasan (Committee member) / Sen, Arunabha (Committee member) / Subrahmanian, V.S. (Committee member) / Ying, Lei (Committee member) / Arizona State University (Publisher)

Created: 2019

Smart resource allocation in internet-of-things: perspectives of network, security, and economics

Description

Emerging from years of research and development, the Internet-of-Things (IoT) has finally made its way into our daily lives. From smart homes to Industry 4.0, IoT has been fundamentally transforming numerous domains with its unique ability to interconnect devices worldwide. However, the capability of IoT is largely constrained by the limited resources it can employ in various application scenarios, including computing power, network resources, dedicated hardware, etc. The situation is further exacerbated by the stringent quality-of-service (QoS) requirements of many IoT applications, such as delay, bandwidth, security, reliability, and more. This mismatch between resources and demands has greatly hindered the deployment and utilization of IoT services in many resource-intensive and QoS-sensitive scenarios like autonomous driving and virtual reality.

I believe that the resource issue in IoT will persist in the near future due to technological, economic and environmental factors. In this dissertation, I seek to address this issue by means of smart resource allocation. I propose mathematical models to formally describe various resource constraints and application scenarios in IoT. Based on these, I design smart resource allocation algorithms and protocols to maximize system performance in the face of resource restrictions. Different aspects are tackled, including networking, security, and economics of the entire IoT ecosystem. For different problems, different algorithmic solutions are devised, including optimal algorithms, provable approximation algorithms, and distributed protocols. The solutions are validated with rigorous theoretical analysis and/or extensive simulation experiments.

Contributors: Yu, Ruozhou, Ph.D (Author) / Xue, Guoliang (Thesis advisor) / Huang, Dijiang (Committee member) / Sen, Arunabha (Committee member) / Zhang, Yanchao (Committee member) / Arizona State University (Publisher)

Created: 2019

A Model for Calculating Damage Potential in Computer Systems

Description

For systems having computers as a significant component, it becomes a critical task to identify the potential threats that the users of the system can present, whether they are inside or outside the system. One of the most important factors that differentiate an insider from an outsider is that the insider, being part of the system, holds privileges that give him/her access to the resources and processes of the system through valid capabilities. An insider with malicious intent can potentially be more damaging than an outsider. These differences help to clarify the notion and scope of an insider.

The significant loss to organizations due to the failure to detect and mitigate the insider threat has resulted in increased interest in insider threat detection. The well-studied, effective techniques proposed for defending against attacks by outsiders have not proven successful against insider attacks. Although a number of security policies and models to deal with the insider threat have been developed, the approach taken by most organizations is the use of audit logs after the attack has taken place. Such approaches are inspired by academic research proposals to address the problem by tracking the activities of the insider in the system. Although tracking and logging are important, it is argued that they are not sufficient. Thus, predicting the potential damage of an insider is considered necessary to help build a stronger evaluation and mitigation strategy for insider attacks. In this thesis, the question to be answered is the following: “Considering the relationships that exist between the insiders and their role, their access to the resources and the resource set, what is the potential damage that an insider can cause?”

A general system model is introduced that can capture general insider attacks, including those documented by the Computer Emergency Response Team (CERT) for the Software Engineering Institute (SEI). Further, initial formulations of the damage potential for leakage and availability in the model are introduced. The model's usefulness is shown by expressing 14 actual attacks in the model and showing how, in each case, the attack could have been mitigated.

Contributors: Nolastname, Sharad (Author) / Bazzi, Rida (Thesis advisor) / Sen, Arunabha (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)

Created: 2019

Feature selection techniques for effective model building and estimation on Twitter data to understand the political scenario in Latvia with supporting visualizations

Description

In supervised learning, machine learning techniques can be applied to learn a model on a small set of labeled documents, which can then be used to classify a larger set of unknown documents. Machine learning techniques can be used to analyze a political scenario in a given society. A lot of research has been going on in this field to understand the interactions of various people in the society in response to actions taken by their organizations.

This paper talks about understanding the Russian influence on people in Latvia. This is done by building an effective model learned on an initial set of documents containing a combination of official party web pages and important political leaders' social networking sites. Since Twitter is a micro-blogging site that allows people to post their opinions on any topic, the model is used to estimate the tweets supporting the Russian and Latvian political organizations in Latvia. All the documents collected for analysis are in Latvian and Russian, languages that are rich in vocabulary, resulting in a huge number of features. Hence, feature selection techniques can be used to reduce the vocabulary set relevant to the classification model. This thesis provides a comparative analysis of traditional feature selection techniques and an implementation of a new iterative feature selection method using EM and cross-domain training, along with a supporting visualization tool. This method outperformed other feature selection methods by reducing the number of features by up to 50% while retaining good model accuracy. The results from the classification are used to interpret user behavior and political influence patterns across organizations in Latvia using an interactive dashboard with a combination of powerful widgets.

Contributors: Bollapragada, Lakshmi Gayatri Niharika (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2016

SPSR efficient processing of socially k-nearest neighbors with spatial range filter

Description

Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating a lot of data, everyone is constantly looking for ways to leverage knowledge from social networks to make their systems more personalized for their end users. With the rapid increase in the usage of mobile phones and wearables, social media data is being tied to spatial networks. This research document proposes an efficient technique that answers socially k-Nearest Neighbors with Spatial Range Filter queries. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner so that it can be filtered by spatial constraints (region of interest) and social constraints (top-k closest vertices) at query time. This prunes unnecessary paths during the social graph traversal and returns only the top-k socially close venues. The research document then shows experimentally that the proposed approach outperforms existing baseline approaches by at least a factor of three, and compares how each of the proposed algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
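For orientation, the query itself can be sketched with a straightforward baseline of the kind the proposed index is meant to improve on: a breadth-first traversal of the social graph that keeps only vertices located inside the query rectangle. This is not the thesis's SPSR index. Measuring social closeness by hop count, the function name `social_knn_in_range`, and the toy data are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def social_knn_in_range(friends: Dict[str, Set[str]],
                        location: Dict[str, Point],
                        source: str, k: int, box: Box) -> List[str]:
    """Return up to k vertices inside `box`, ordered by hop distance from `source`."""
    min_x, min_y, max_x, max_y = box
    result: List[str] = []
    visited = {source}
    queue = deque([source])
    while queue and len(result) < k:
        u = queue.popleft()
        for v in sorted(friends.get(u, ())):  # sorted only for a stable order
            if v in visited:
                continue
            visited.add(v)
            queue.append(v)
            point = location.get(v)
            if point is None:
                continue  # vertex has no known location
            x, y = point
            # Spatial range filter: keep only vertices inside the rectangle.
            if min_x <= x <= max_x and min_y <= y <= max_y:
                result.append(v)
                if len(result) == k:
                    break
    return result


# Hypothetical toy data.
friends = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
location = {"a": (0, 0), "b": (5, 5), "c": (1, 1), "d": (2, 2)}
print(social_knn_in_range(friends, location, "a", 2, (0, 0, 3, 3)))
```

This baseline visits vertices regardless of where they are located, which is the cost that a combined social and spatial index is designed to avoid.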

Contributors: Pasumarthy, Nitin (Author) / Sarwat, Mohamed (Thesis advisor) / Papotti, Paolo (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2016

Enhanced topic-based modeling for Twitter sentiment analysis

Description

In this thesis, multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic-based mixture model and a newly proposed topic-based vector model, both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic-based vector model achieves higher accuracy, in terms of averaged F scores, than the other two models.
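The abstract does not say how the F scores are averaged. One common choice, assumed here purely for illustration, is the macro average: compute F1 separately for each sentiment class and take the unweighted mean. The function name `macro_f1` and the sample labels below are hypothetical.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (macro average)."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four tweets.
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every class equally, so it does not let a dominant sentiment class hide poor performance on a rare one.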

Contributors: Baskaran, Swetha (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2016

An empirical evaluation of social influence metrics

Description

Predicting when an individual will adopt a new behavior is an important problem in application domains such as marketing and public health. This thesis examines the performance of a wide variety of social-network-based measurements proposed in the literature, which have not previously been compared directly. This research studies the probability of an individual becoming influenced based on measurements derived from neighborhood (i.e., number of influencers, personal network exposure), structural diversity, locality, temporal measures, cascade measures, and metadata. It also examines how the ability to predict influence depends on the choice of classifier and on the ratio of positive to negative samples in both training and testing, further enabling practical use of these concepts in social influence applications.

Contributors: Nanda Kumar, Nikhil (Author) / Shakarian, Paulo (Thesis advisor) / Sen, Arunabha (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2016

Web Intelligence for Scaling Discourse of Organizations

Description

The Internet and social media devices have created a new public space for debate on political and social topics (Papacharissi 2002; Himelboim 2010). Hotly debated issues span all spheres of human activity, from liberal vs. conservative politics, to radical vs. counter-radical religious debate, to the climate change debate in the scientific community, to the globalization debate in economics, and to the nuclear disarmament debate in security. Many prominent “camps” have emerged within Internet debate rhetoric and practice (Dahlberg, n.d.).

In this research I utilized feature extraction and model fitting techniques to process the rhetoric found in the web sites of 23 Indonesian Islamic religious organizations, and later of 26 similar organizations from the United Kingdom, to profile their ideology and activity patterns along a hypothesized radical/counter-radical scale, and presented an end-to-end system that helps researchers visualize the data interactively on a timeline. The subject data of this study are the articles downloaded from the web sites of these organizations dating from 2001 to 2011, and in 2013. I developed algorithms to rank these organizations by assigning them to probable positions on the scale. I showed that the developed Rasch model fits the data using Andersen’s LR-test (likelihood ratio). I created a gold standard of the ranking of these organizations through an expertise elicitation tool. Then, using my system, I computed expert-to-expert agreements and presented experimental results comparing the performance of three baseline methods, showing that the Rasch model not only outperforms the baseline methods but was also the only system that performs at expert-level accuracy.

I developed an end-to-end system that receives a list of organizations from experts, mines their web corpus, prepares discourse topic lists with expert support, ranks the organizations on scales with partial expert interaction, and finally presents them in an easy-to-use web-based analytic system.

Contributors: Tikves, Sukru (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Liu, Huan (Committee member) / Woodward, Mark (Committee member) / Arizona State University (Publisher)

Created: 2016
