Search Content

Matching Items (3)

Making thin data thick: user behavior analysis with minimum information

Description

With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days, and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have given rise to big data on human behavior.

This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual-level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big-data is a large collection of mostly limited individual-level information. Researchers are often constrained to derive meaningful insights regarding online user behavior with this limited information. Simply put, they have to make thin data thick.

This dissertation investigates how thin data on human behavior can be made thick. Its chief objective is to demonstrate how traces of human behavior can be efficiently gleaned from the often limited individual-level information, thereby introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information, in terms of both link and content data, that is available for any social media user is determined. Utilizing only minimum information in applications on social media, such as prediction or recommendation tasks, allows for solutions that are (1) generalizable to all social media users and (2) easy to implement. However, are applications that employ only minimum information as effective as, or comparable to, applications that use more information?

In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively performed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.

The findings in this dissertation allow the same user to be found across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied.
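As an illustration of friend recommendation from link data alone, the sketch below scores unlinked user pairs by the number of friends they share (the standard common-neighbors heuristic). It is not taken from the dissertation; the function name `common_neighbor_scores` and the toy friend lists are assumptions made only for this example.

```python
from itertools import combinations


def common_neighbor_scores(friends):
    """Score each unlinked pair of users by how many friends they share.

    `friends` maps a user name to the set of that user's friends.
    Only link information is used; no profile or content data is needed.
    """
    scores = {}
    for u, v in combinations(sorted(friends), 2):
        if v in friends[u]:
            continue  # the pair is already linked, nothing to recommend
        shared = len(friends[u] & friends[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Hypothetical toy network used only for illustration.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "dee"},
    "dee": {"ben", "cy"},
}

print(common_neighbor_scores(friends))
```

Here the only unlinked pair is "ana" and "dee", who share two friends, so that pair would be the single recommendation candidate.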

Contributors: Zafarani, Reza, 1983- (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Xue, Guoliang (Committee member) / Leskovec, Jure (Committee member) / Arizona State University (Publisher)

Created: 2015

Optimal resource allocation in social and critical infrastructure networks

Description

We live in a networked world with a multitude of networks, such as communication networks, the electric power grid, transportation networks and water distribution networks, all around us. In addition to such physical (infrastructure) networks, recent years have seen a tremendous proliferation of social networks, such as Facebook, Twitter, LinkedIn, Instagram, Google+ and others. These powerful social networks are not only used for harnessing revenue from the infrastructure networks, but are also increasingly being used as “non-conventional sensors” for monitoring the infrastructure networks. Accordingly, analyses of social and infrastructure networks now go hand in hand. This dissertation studies resource allocation problems encountered in this set of diverse, heterogeneous, and interdependent networks. Three problems studied in this dissertation are encountered in the physical network domain while the three other problems studied are encountered in the social network domain.

The first problem from the infrastructure network domain relates to a distributed file storage scheme whose goal is to make data storage more robust by making it tolerant of large-scale, geographically correlated failures. The second problem relates to the placement of relay nodes in a deployment area with multiple sensor nodes, with the goal of augmenting the connectivity of the resulting network while staying within a budget that specifies the maximum number of relay nodes that can be deployed. The third problem relates to the complex interdependencies that exist between infrastructure networks, such as the power grid and the communication network. Here the progressive recovery problem in an interdependent network is studied; its goal is to maximize system utility over the period in which failed entities are recovered sequentially.

The three problems studied from the social network domain relate to influence propagation in an adversarial environment and to political sentiment assessment in various states in a country, with the goal of creating a “political heat map” of the country. In the first problem of the influence propagation domain, the goal of the second player is to restrict the influence of the first player, while in the second problem the goal of the second player is to gain a larger market share with the least amount of initial investment.

Contributors: Mazumder, Anisha (Author) / Sen, Arunabha (Thesis advisor) / Richa, Andrea (Committee member) / Xue, Guoliang (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)

Created: 2016

Protecting identity and location privacy in online environment

Description

Recent years have witnessed a rapid development of mobile devices and smart devices. As more and more people get involved in the online environment, privacy issues are becoming increasingly important. People’s privacy is much easier to leak in the digital world than in the real world, because every action people take online leaves a trail of information that can be recorded, collected and used by malicious attackers. In addition, service providers might collect users’ information and analyze it, which also leads to a privacy breach. Therefore, preserving people’s privacy is very important in the online environment.

In this dissertation, I study the problems of preserving people’s identity privacy and location privacy in the online environment. Specifically, I study four topics: identity privacy in online social networks (OSNs), identity privacy in anonymous message submission, location privacy in location-based social networks (LBSNs), and location privacy in location-based reminders. In the first topic, I propose a system which can hide users’ identity and data from the untrusted storage site where the OSN provider puts users’ data. I also design a fine-grained access control mechanism which prevents unauthorized users from accessing the data. In the topic of identity privacy in anonymous message submission, I construct a shuffle protocol, based on the secret sharing scheme, that breaks the link between members’ identities and their submitted messages. The message is encrypted on the member side and decrypted on the message collector side. The collector eventually gets all of the messages but does not know who submitted which message. In the third topic, I propose a framework that hides users’ check-in information from the LBSN. Considering the limited computation resources on smart devices, I propose a delegatable pseudorandom function to outsource computations to the much more powerful server while preserving privacy. I also implement efficient revocations. In the topic of location privacy in location-based reminders, I propose a system to hide users’ reminder locations from an untrusted cloud server. I propose a cross-based approach and an improved bar-based approach, respectively, to represent a reminder area. The reminder location and reminder message are encrypted before being uploaded to the cloud server, which can then determine whether the distance between the user’s current location and the reminder location is within the reminder distance without learning anything about the user’s location information or the content of the reminder message.
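The abstract mentions a secret sharing scheme without giving its details. As a rough illustration of the general idea only, the sketch below shows additive secret sharing over a prime field: a value is split into random-looking shares that reveal nothing individually but sum back to the original. This is not the dissertation's shuffle protocol; the modulus `PRIME` and the function names `split` and `reconstruct` are assumptions chosen for this example.

```python
import secrets

# Assumed field modulus for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split(secret, n):
    """Split an integer in [0, PRIME) into n additive shares modulo PRIME."""
    # The first n - 1 shares are uniformly random field elements.
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


shares = split(42, 5)
print(reconstruct(shares))
```

All five shares are required to recover the value; any smaller subset is uniformly distributed and carries no information about it.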

Contributors: Zhao, Xinxin (Author) / Xue, Guoliang (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Huang, Dijiang (Committee member) / Zhang, Yanchao (Committee member) / Arizona State University (Publisher)

Created: 2015