The explosive growth of data generated from different services has opened a new vein of research commonly called ``big data.'' The sheer volume of the information in this data has yielded new applications in a wide range of fields, but the difficulties inherent in processing the enormous amount of data, as well as the rate at which it is generated, also give rise to significant challenges. In particular, processing, modeling, and understanding the structure of online social networks is computationally difficult due to these challenges. The goal of this study is twofold: first to present a new networked data processing framework to model this social structure, and second to highlight the wireless networking gains possible by using this social structure.
The first part of the dissertation considers a new method for modeling social networks via probabilistic graphical models. Specifically, this new method employs the t-cherry junction tree, a recent advancement in probabilistic graphical models, to develop a compact representation and good approximation of an otherwise intractable probabilistic model. There are a number of advantages in this approach: 1) the best approximation possible via junction trees belongs to the class of t-cherry junction trees; 2) constructing a t-cherry junction tree can be largely parallelized; and 3) inference can be performed using distributed computation. To improve the quality of approximation, an algorithm to build a higher order tree gracefully from an existing one, without constructing it from scratch, is developed. this approach is applied to Twitter data containing 100,000 nodes to study the problem of recommending connections to new users.
Next, the t-cherry junction tree framework is extended by considering the impact of estimating the distributions involved from a training data set. Understanding this impact is vital to real-world applications as distributions are not known perfectly, but rather generated from training data. First, the fidelity of the t-cherry junction tree approximation due to this estimation is quantified. Then the scaling behavior, in terms of the size of the t-cherry junction tree, is approximated to show that higher-order t-cherry junction trees, which with perfect information are higher fidelity approximations, may actually result in decreased fidelity due to the difficulties in accurately estimating higher-dimensional distributions. Finally, this part concludes by demonstrating these findings by considering a distributed detection situation in which the sensors' measurements are correlated.
Having developed a framework to model social structure in online social networks, the study then highlights two approaches for utilizing this social network data in existing wireless communication networks. The first approach is a novel application: using social networks to enhance device-to-device wireless communication. It is well known that wireless communication can be significantly improved by utilizing relays to aid in transmission. Rather than deploying dedicated relays, a system is designed in which users can relay traffic for other users if there is a shared social trust between them, e.g., they are ``friends'' on Facebook, and for users that do not share social trust, implements a coalitional game framework to motivate users to relay traffic for each other. This framework guarantees that all users improve their throughput via relaying while ensuring that each user will function as a relay only if there is a social trust relationship or, if there is no social trust, a cycle of reciprocity is established in which a set of users will agree to relay for each other. This new system shows significant throughput gain in simulated networks that utilize real-world social network traces.
The second application of social structure to wireless communication is an approach to reduce the congestion in cellular networks during peak times. This is achieved by two means: preloading and offloading. Preloading refers to the process of using social network data to predict user demand and serve some users early, before the cellular network traffic peaks. Offloading allows users that have already obtained a copy of the content to opportunistically serve other users using device-to-device communication, thus eliminating the need for some cellular traffic. These two methods work especially well in tandem, as preloading creates a base of users that can serve later users via offloading. These two processes can greatly reduce the peak cellular traffic under ideal conditions, and in a more realistic situation, the impact of uncertainty in human mobility and the social network structure is analyzed. Even with the randomness inherent in these processes, both preloading and offloading offer substantial improvement. Finally, potential difficulties in preloading multiple pieces of content simultaneously are highlighted, and a heuristic method to solve these challenges is developed.