Matching Items (60)
Filtering by
- All Subjects: Computer Science
- Creators: Xue, Guoliang
Description
This thesis studies recommendation systems and considers joint sampling and learning. Sampling in recommendation systems is to obtain users' ratings on specific items chosen by the recommendation platform, and learning is to infer the unknown ratings of users to items given the existing data. In this thesis, the problem is formulated as an adaptive matrix completion problem in which sampling is to reveal the unknown entries of a $U\times M$ matrix where $U$ is the number of users, $M$ is the number of items, and each entry of the $U\times M$ matrix represents the rating of a user to an item. In the literature, this matrix completion problem has been studied under a static setting, i.e., recovering the matrix based on a set of partial ratings. This thesis considers both sampling and learning, and proposes an adaptive algorithm. The algorithm adapts its sampling and learning based on the existing data. The idea is to sample items that reveal more information based on the previous sampling results and then learn based on clustering. Performance of the proposed algorithm has been evaluated using simulations.
ContributorsZhu, Lingfang (Author) / Xue, Guoliang (Thesis advisor) / He, Jingrui (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2015
Description
There are many applications where the truth is unknown. The truth values are
guessed by different sources. The values of different properties can be obtained from
various sources. These will lead to the disagreement in sources. An important task
is to obtain the truth from these sometimes contradictory sources. In the extension
of computing the truth, the reliability of sources needs to be computed. There are
models which compute the precision values. In those earlier models Banerjee et al.
(2005) Dong and Naumann (2009) Kasneci et al. (2011) Li et al. (2012) Marian and
Wu (2011) Zhao and Han (2012) Zhao et al. (2012), multiple properties are modeled
individually. In one of the existing works, the heterogeneous properties are modeled in
a joined way. In that work, the framework i.e. Conflict Resolution on Heterogeneous
Data (CRH) framework is based on the single objective optimization. Due to the
single objective optimization and non-convex optimization problem, only one local
optimal solution is found. As this is a non-convex optimization problem, the optimal
point depends upon the initial point. This single objective optimization problem is
converted into a multi-objective optimization problem. Due to the multi-objective
optimization problem, the Pareto optimal points are computed. In an extension of
that, the single objective optimization problem is solved with numerous initial points.
The above two approaches are used for finding the solution better than the solution
obtained in the CRH with median as the initial point for the continuous variables and
majority voting as the initial point for the categorical variables. In the experiments,
the solution, coming from the CRH, lies in the Pareto optimal points of the multiobjective
optimization and the solution coming from the CRH is the optimum solution
in these experiments.
guessed by different sources. The values of different properties can be obtained from
various sources. These will lead to the disagreement in sources. An important task
is to obtain the truth from these sometimes contradictory sources. In the extension
of computing the truth, the reliability of sources needs to be computed. There are
models which compute the precision values. In those earlier models Banerjee et al.
(2005) Dong and Naumann (2009) Kasneci et al. (2011) Li et al. (2012) Marian and
Wu (2011) Zhao and Han (2012) Zhao et al. (2012), multiple properties are modeled
individually. In one of the existing works, the heterogeneous properties are modeled in
a joined way. In that work, the framework i.e. Conflict Resolution on Heterogeneous
Data (CRH) framework is based on the single objective optimization. Due to the
single objective optimization and non-convex optimization problem, only one local
optimal solution is found. As this is a non-convex optimization problem, the optimal
point depends upon the initial point. This single objective optimization problem is
converted into a multi-objective optimization problem. Due to the multi-objective
optimization problem, the Pareto optimal points are computed. In an extension of
that, the single objective optimization problem is solved with numerous initial points.
The above two approaches are used for finding the solution better than the solution
obtained in the CRH with median as the initial point for the continuous variables and
majority voting as the initial point for the categorical variables. In the experiments,
the solution, coming from the CRH, lies in the Pareto optimal points of the multiobjective
optimization and the solution coming from the CRH is the optimum solution
in these experiments.
ContributorsJain, Karan (Author) / Xue, Guoliang (Thesis advisor) / Sen, Arunabha (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2019
Description
Energy management system (EMS) is at the heart of the operation and control of a modern electrical grid. Because of economic, safety, and security reasons, access to industrial grade EMS and real-world power system data is extremely limited. Therefore, the ability to simulate an EMS is invaluable in researching the EMS in normal and anomalous operating conditions.
I first lay the groundwork for a basic EMS loop simulation in modern power grids and review a class of cybersecurity threats called false data injection (FDI) attacks. Then I propose a software architecture as the basis of software simulation of the EMS loop and explain an actual software platform built using the proposed architecture. I also explain in detail the power analysis libraries used for building the platform with examples and illustrations from the implemented application. Finally, I will use the platform to simulate FDI attacks on two synthetic power system test cases and analyze and visualize the consequences using the capabilities built into the platform.
I first lay the groundwork for a basic EMS loop simulation in modern power grids and review a class of cybersecurity threats called false data injection (FDI) attacks. Then I propose a software architecture as the basis of software simulation of the EMS loop and explain an actual software platform built using the proposed architecture. I also explain in detail the power analysis libraries used for building the platform with examples and illustrations from the implemented application. Finally, I will use the platform to simulate FDI attacks on two synthetic power system test cases and analyze and visualize the consequences using the capabilities built into the platform.
ContributorsKhodadadeh, Roozbeh (Author) / Sankar, Lalitha (Thesis advisor) / Xue, Guoliang (Thesis advisor) / Kosut, Oliver (Committee member) / Arizona State University (Publisher)
Created2019
Description
Many applications require efficient data routing and dissemination in Delay Tolerant Networks (DTNs) in order to maximize the throughput of data in the network, such as providing healthcare to remote communities, and spreading related information in Mobile Social Networks (MSNs). In this thesis, the feasibility of using boats in the Amazon Delta Riverine region as data mule nodes is investigated and a robust data routing algorithm based on a fountain code approach is designed to ensure fast and timely data delivery considering unpredictable boat delays, break-downs, and high transmission failures. Then, the scenario of providing healthcare in Amazon Delta Region is extended to a general All-or-Nothing (Splittable) Multicommodity Flow (ANF) problem and a polynomial time constant approximation algorithm is designed for the maximum throughput routing problem based on a randomized rounding scheme with applications to DTNs. In an MSN, message content is closely related to users’ preferences, and can be used to significantly impact the performance of data dissemination. An interest- and content-based algorithm is developed where the contents of the messages, along with the network structural information are taken into consideration when making message relay decisions in order to maximize data throughput in an MSN. Extensive experiments show the effectiveness of the above proposed data dissemination algorithm by comparing it with state-of-the-art techniques.
ContributorsLiu, Mengxue (Author) / Richa, Andréa W. (Thesis advisor) / Johnson, Thienne (Committee member) / Syrotiuk, Violet R. (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created2018
Description
In the realm of network science, many topics can be abstracted as graph problems, such as routing, connectivity enhancement, resource/frequency allocation and so on. Though most of them are NP-hard to solve, heuristics as well as approximation algorithms are proposed to achieve reasonably good results. Accordingly, this dissertation studies graph related problems encountered in real applications. Two problems studied in this dissertation are derived from wireless network, two more problems studied are under scenarios of FIWI and optical network, one more problem is in Radio- Frequency Identification (RFID) domain and the last problem is inspired by satellite deployment.
The objective of most of relay nodes placement problems, is to place the fewest number of relay nodes in the deployment area so that the network, formed by the sensors and the relay nodes, is connected. Under the fixed budget scenario, the expense involved in procuring the minimum number of relay nodes to make the network connected, may exceed the budget. In this dissertation, we study a family of problems whose goal is to design a network with “maximal connectedness” or “minimal disconnectedness”, subject to a fixed budget constraint. Apart from “connectivity”, we also study relay node problem in which degree constraint is considered. The balance of reducing the degree of the network while maximizing communication forms the basis of our d-degree minimum arrangement(d-MA) problem. In this dissertation, we look at several approaches to solving the generalized d-MA problem where we embed a graph onto a subgraph of a given degree.
In recent years, considerable research has been conducted on optical and FIWI networks. Utilizing a recently proposed concept “candidate trees” in optical network, this dissertation studies counting problem on complete graphs. Closed form expressions are given for certain cases and a polynomial counting algorithm for general cases is also presented. Routing plays a major role in FiWi networks. Accordingly to a novel path length metric which emphasizes on “heaviest edge”, this dissertation proposes a polynomial algorithm on single path computation. NP-completeness proof as well as approximation algorithm are presented for multi-path routing.
Radio-frequency identification (RFID) technology is extensively used at present for identification and tracking of a multitude of objects. In many configurations, simultaneous activation of two readers may cause a “reader collision” when tags are present in the intersection of the sensing ranges of both readers. This dissertation ad- dresses slotted time access for Readers and tries to provide a collision-free scheduling scheme while minimizing total reading time.
Finally, this dissertation studies a monitoring problem on the surface of the earth for significant environmental, social/political and extreme events using satellites as sensors. It is assumed that the impact of a significant event spills into neighboring regions and there will be corresponding indicators. Careful deployment of sensors, utilizing “Identifying Codes”, can ensure that even though the number of deployed sensors is fewer than the number of regions, it may be possible to uniquely identify the region where the event has taken place.
The objective of most of relay nodes placement problems, is to place the fewest number of relay nodes in the deployment area so that the network, formed by the sensors and the relay nodes, is connected. Under the fixed budget scenario, the expense involved in procuring the minimum number of relay nodes to make the network connected, may exceed the budget. In this dissertation, we study a family of problems whose goal is to design a network with “maximal connectedness” or “minimal disconnectedness”, subject to a fixed budget constraint. Apart from “connectivity”, we also study relay node problem in which degree constraint is considered. The balance of reducing the degree of the network while maximizing communication forms the basis of our d-degree minimum arrangement(d-MA) problem. In this dissertation, we look at several approaches to solving the generalized d-MA problem where we embed a graph onto a subgraph of a given degree.
In recent years, considerable research has been conducted on optical and FIWI networks. Utilizing a recently proposed concept “candidate trees” in optical network, this dissertation studies counting problem on complete graphs. Closed form expressions are given for certain cases and a polynomial counting algorithm for general cases is also presented. Routing plays a major role in FiWi networks. Accordingly to a novel path length metric which emphasizes on “heaviest edge”, this dissertation proposes a polynomial algorithm on single path computation. NP-completeness proof as well as approximation algorithm are presented for multi-path routing.
Radio-frequency identification (RFID) technology is extensively used at present for identification and tracking of a multitude of objects. In many configurations, simultaneous activation of two readers may cause a “reader collision” when tags are present in the intersection of the sensing ranges of both readers. This dissertation ad- dresses slotted time access for Readers and tries to provide a collision-free scheduling scheme while minimizing total reading time.
Finally, this dissertation studies a monitoring problem on the surface of the earth for significant environmental, social/political and extreme events using satellites as sensors. It is assumed that the impact of a significant event spills into neighboring regions and there will be corresponding indicators. Careful deployment of sensors, utilizing “Identifying Codes”, can ensure that even though the number of deployed sensors is fewer than the number of regions, it may be possible to uniquely identify the region where the event has taken place.
ContributorsZhou, Chenyang (Author) / Richa, Andrea (Thesis advisor) / Sen, Arunabha (Thesis advisor) / Xue, Guoliang (Committee member) / Walkowiak, Krzysztof (Committee member) / Arizona State University (Publisher)
Created2019
Description
Models using feature interactions have been applied successfully in many areas such as biomedical analysis, recommender systems. The popularity of using feature interactions mainly lies in (1) they are able to capture the nonlinearity of the data compared with linear effects and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real world problems and develop efficient algorithms for solving them.
Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved by a convex relaxation in existing studies. I further propose to use the non-convex weak hierarchical Lasso formulation for hypothesis testing on the interaction features with hierarchical assumptions. Secondly, I propose a type of bi-linear models that take advantage of interactions of features for drug discovery problems where specific drug-drug pairs or drug-disease pairs are of interest. These models are learned by maximizing the number of positive data pairs that rank above the average score of unlabeled data pairs. Then I generalize the method to the case of using the top-ranked unlabeled data pairs for representative construction and derive an efficient algorithm for the extended formulation. Last but not least, motivated by a special form of bi-linear models, I propose a framework that enables simultaneously subgrouping data points and building specific models on the subgroups for learning on massive and heterogeneous datasets. Experiments on synthetic and real datasets are conducted to demonstrate the effectiveness or efficiency of the proposed methods.
Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved by a convex relaxation in existing studies. I further propose to use the non-convex weak hierarchical Lasso formulation for hypothesis testing on the interaction features with hierarchical assumptions. Secondly, I propose a type of bi-linear models that take advantage of interactions of features for drug discovery problems where specific drug-drug pairs or drug-disease pairs are of interest. These models are learned by maximizing the number of positive data pairs that rank above the average score of unlabeled data pairs. Then I generalize the method to the case of using the top-ranked unlabeled data pairs for representative construction and derive an efficient algorithm for the extended formulation. Last but not least, motivated by a special form of bi-linear models, I propose a framework that enables simultaneously subgrouping data points and building specific models on the subgroups for learning on massive and heterogeneous datasets. Experiments on synthetic and real datasets are conducted to demonstrate the effectiveness or efficiency of the proposed methods.
ContributorsLiu, Yashu (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Thesis advisor) / Liu, Huan (Committee member) / Mittelmann, Hans D (Committee member) / Arizona State University (Publisher)
Created2018
Description
Many forms of programmable matter have been proposed for various tasks. We use an abstract model of self-organizing particle systems for programmable matter which could be used for a variety of applications, including smart paint and coating materials for engineering or programmable cells for medical uses. Previous research using this model has focused on shape formation and other spatial configuration problems, including line formation, compression, and coating. In this work we study foundational computational tasks that exceed the capabilities of the individual constant memory particles described by the model. These tasks represent new ways to use these self-organizing systems, which, in conjunction with previous shape and configuration work, make the systems useful for a wider variety of tasks. We present an implementation of a counter using a line of particles, which makes it possible for the line of particles to count to and store values much larger than their individual capacities. We then present an algorithm that takes a matrix and a vector as input and then sets up and uses a rectangular block of particles to compute the matrix-vector multiplication. This setup also utilizes the counter implementation to store the resulting vector from the matrix-vector multiplication. Operations such as counting and matrix multiplication can leverage the distributed and dynamic nature of the self-organizing system to be more efficient and adaptable than on traditional linear computing hardware. Such computational tools also give the systems more power to make complex decisions when adapting to new situations or to analyze the data they collect, reducing reliance on a central controller for setup and output processing. Finally, we demonstrate an application of similar types of computations with self-organizing systems to image processing, with an implementation of an image edge detection algorithm.
ContributorsPorter, Alexandra Marie (Author) / Richa, Andrea (Thesis director) / Xue, Guoliang (Committee member) / School of Music (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
Description
Commercial load balancers are often in use, and the production network at Arizona State University (ASU) is no exception. However, because the load balancer uses IP addresses, the solution does not apply to all applications. One such application is Rsyslog. This software processes syslog packets and stores them in files. The loss rate of incoming log packets is high due to the incoming rate of the data. The Rsyslog servers are overwhelmed by the continuous data stream. To solve this problem a software defined networking (SDN) based load balancer is designed to perform a transport-level load balancing over the incoming load to Rsyslog servers. In this solution the load is forwarded to one Rsyslog server at a time, according to one of a Round-Robin, Random, or Load-Based policy. This gives time to other servers to process the data they have received and prevent them from being overwhelmed. The evaluation of the proposed solution is conducted a physical testbed with the same data feed as the commercial solution. The results suggest that the SDN-based load balancer is competitive with the commercial load balancer. Replacing the software OpenFlow switch with a hardware switch is likely to further improve the results.
ContributorsGhaffarinejad, Ashkan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Xue, Guoliang (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)
Created2015
Description
With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have introduced the human behavior's big-data.
This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual-level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big-data is a large collection of mostly limited individual-level information. Researchers are often constrained to derive meaningful insights regarding online user behavior with this limited information. Simply put, they have to make thin data thick.
In this dissertation, how human behavior's thin data can be made thick is investigated. The chief objective of this dissertation is to demonstrate how traces of human behavior can be efficiently gleaned from the, often limited, individual-level information; hence, introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information in terms of both link or content data that is available for any social media user is determined. Utilizing only minimum information in different applications on social media such as prediction or recommendation tasks allows for solutions that are (1) generalizable to all social media users and that are (2) easy to implement. However, are applications that employ only minimum information as effective or comparable to applications that use more information?
In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively performed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.
The findings in this dissertation allows finding the same user across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied.
This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual-level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big-data is a large collection of mostly limited individual-level information. Researchers are often constrained to derive meaningful insights regarding online user behavior with this limited information. Simply put, they have to make thin data thick.
In this dissertation, how human behavior's thin data can be made thick is investigated. The chief objective of this dissertation is to demonstrate how traces of human behavior can be efficiently gleaned from the, often limited, individual-level information; hence, introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information in terms of both link or content data that is available for any social media user is determined. Utilizing only minimum information in different applications on social media such as prediction or recommendation tasks allows for solutions that are (1) generalizable to all social media users and that are (2) easy to implement. However, are applications that employ only minimum information as effective or comparable to applications that use more information?
In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively performed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.
The findings in this dissertation allows finding the same user across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied.
ContributorsZafarani, Reza, 1983- (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Xue, Guoliang (Committee member) / Leskovec, Jure (Committee member) / Arizona State University (Publisher)
Created2015
Description
Mobile Cloud computing has shown its capability to support mobile devices for
provisioning computing, storage and communication resources. A distributed mobile
cloud service system called "POEM" is presented to manage the mobile cloud resource
and compose mobile cloud applications. POEM considers resource management not
only between mobile devices and clouds, but also among mobile devices. It implements
both computation offloading and service composition features. The proposed POEM
solution is demonstrated by using OSGi and XMPP techniques.
Offloading is one major type of collaborations between mobile device and cloud
to achieve less execution time and less energy consumption. Offloading decisions for
mobile cloud collaboration involve many decision factors. One of important decision
factors is the network unavailability. This report presents an offloading decision model
that takes network unavailability into consideration. The application execution time
and energy consumption in both ideal network and network with some unavailability
are analyzed. Based on the presented theoretical model, an application partition
algorithm and a decision module are presented to produce an offloading decision that
is resistant to network unavailability.
Existing offloading models mainly focus on the one-to-one offloading relation. To
address the multi-factor and multi-site offloading mobile cloud application scenarios,
a multi-factor multi-site risk-based offloading model is presented, which abstracts the
offloading impact factors as for offloading benefit and offloading risk. The offloading
decision is made based on a comprehensive offloading risk evaluation. This presented
model is generic and expendable. Four offloading impact factors are presented to show
the construction and operation of the presented offloading model, which can be easily
extended to incorporate more factors to make offloading decision more comprehensive.
The overall offloading benefits and risks are aggregated based on the mobile cloud
users' preference.
The offloading topology may change during the whole application life. A set of
algorithms are presented to address the service topology reconfiguration problem in
several mobile cloud representative application scenarios, i.e., they are modeled as
finite horizon scenarios, infinite horizon scenarios, and large state space scenarios to
represent ad hoc, long-term, and large-scale mobile cloud service composition scenarios,
respectively.
provisioning computing, storage and communication resources. A distributed mobile
cloud service system called "POEM" is presented to manage the mobile cloud resource
and compose mobile cloud applications. POEM considers resource management not
only between mobile devices and clouds, but also among mobile devices. It implements
both computation offloading and service composition features. The proposed POEM
solution is demonstrated by using OSGi and XMPP techniques.
Offloading is one major type of collaborations between mobile device and cloud
to achieve less execution time and less energy consumption. Offloading decisions for
mobile cloud collaboration involve many decision factors. One of important decision
factors is the network unavailability. This report presents an offloading decision model
that takes network unavailability into consideration. The application execution time
and energy consumption in both ideal network and network with some unavailability
are analyzed. Based on the presented theoretical model, an application partition
algorithm and a decision module are presented to produce an offloading decision that
is resistant to network unavailability.
Existing offloading models mainly focus on the one-to-one offloading relation. To
address the multi-factor and multi-site offloading mobile cloud application scenarios,
a multi-factor multi-site risk-based offloading model is presented, which abstracts the
offloading impact factors as for offloading benefit and offloading risk. The offloading
decision is made based on a comprehensive offloading risk evaluation. This presented
model is generic and expendable. Four offloading impact factors are presented to show
the construction and operation of the presented offloading model, which can be easily
extended to incorporate more factors to make offloading decision more comprehensive.
The overall offloading benefits and risks are aggregated based on the mobile cloud
users' preference.
The offloading topology may change during the whole application life. A set of
algorithms are presented to address the service topology reconfiguration problem in
several mobile cloud representative application scenarios, i.e., they are modeled as
finite horizon scenarios, infinite horizon scenarios, and large state space scenarios to
represent ad hoc, long-term, and large-scale mobile cloud service composition scenarios,
respectively.
ContributorsWu, Huijun (Author) / Huang, Dijiang (Thesis advisor) / Xue, Guoliang (Committee member) / Dasgupta, Partha (Committee member) / Mirchandani, Pitu (Committee member) / Arizona State University (Publisher)
Created2016