Matching Items (1,046)
Filtering by

Clear all filters

153085-Thumbnail Image.png
Description
Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. Then the proposed model is further extended to tackle the block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets.
ContributorsXiang, Shuo (Author) / Ye, Jieping (Thesis advisor) / Mittelmann, Hans D (Committee member) / Davulcu, Hasan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created2014
154168-Thumbnail Image.png
Description
This thesis studies recommendation systems and considers joint sampling and learning. Sampling in recommendation systems is to obtain users' ratings on specific items chosen by the recommendation platform, and learning is to infer the unknown ratings of users to items given the existing data. In this thesis, the problem is

This thesis studies recommendation systems and considers joint sampling and learning. Sampling in recommendation systems is to obtain users' ratings on specific items chosen by the recommendation platform, and learning is to infer the unknown ratings of users to items given the existing data. In this thesis, the problem is formulated as an adaptive matrix completion problem in which sampling is to reveal the unknown entries of a $U\times M$ matrix where $U$ is the number of users, $M$ is the number of items, and each entry of the $U\times M$ matrix represents the rating of a user to an item. In the literature, this matrix completion problem has been studied under a static setting, i.e., recovering the matrix based on a set of partial ratings. This thesis considers both sampling and learning, and proposes an adaptive algorithm. The algorithm adapts its sampling and learning based on the existing data. The idea is to sample items that reveal more information based on the previous sampling results and then learn based on clustering. Performance of the proposed algorithm has been evaluated using simulations.
ContributorsZhu, Lingfang (Author) / Xue, Guoliang (Thesis advisor) / He, Jingrui (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2015
156246-Thumbnail Image.png
Description
Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node

Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node in the network for spreading the diffusion and how to top or contain a cascading failure in the network. This dissertation consists of three parts.

In the first part, we study the problem of locating multiple diffusion sources in networks under the Susceptible-Infected-Recovered (SIR) model. Given a complete snapshot of the network, we developed a sample-path-based algorithm, named clustering and localization, and proved that for regular trees, the estimators produced by the proposed algorithm are within a constant distance from the real sources with a high probability. Then, we considered the case in which only a partial snapshot is observed and proposed a new algorithm, named Optimal-Jordan-Cover (OJC). The algorithm first extracts a subgraph using a candidate selection algorithm that selects source candidates based on the number of observed infected nodes in their neighborhoods. Then, in the extracted subgraph, OJC finds a set of nodes that "cover" all observed infected nodes with the minimum radius. The set of nodes is called the Jordan cover, and is regarded as the set of diffusion sources. We proved that OJC can locate all sources with probability one asymptotically with partial observations in the Erdos-Renyi (ER) random graph. Multiple experiments on different networks were done, which show our algorithms outperform others.

In the second part, we tackle the problem of reconstructing the diffusion history from partial observations. We formulated the diffusion history reconstruction problem as a maximum a posteriori (MAP) problem and proved the problem is NP hard. Then we proposed a step-by- step reconstruction algorithm, which can always produce a diffusion history that is consistent with the partial observations. Our experimental results based on synthetic and real networks show that the algorithm significantly outperforms some existing methods.

In the third part, we consider the problem of improving the robustness of an interdependent network by rewiring a small number of links during a cascading attack. We formulated the problem as a Markov decision process (MDP) problem. While the problem is NP-hard, we developed an effective and efficient algorithm, RealWire, to robustify the network and to mitigate the damage during the attack. Extensive experimental results show that our algorithm outperforms other algorithms on most of the robustness metrics.
ContributorsChen, Zhen (Author) / Ying, Lei (Thesis advisor) / Tong, Hanghang (Thesis advisor) / Zhang, Junshan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created2018
156577-Thumbnail Image.png
Description
Network mining has been attracting a lot of research attention because of the prevalence of networks. As the world is becoming increasingly connected and correlated, networks arising from inter-dependent application domains are often collected from different sources, forming the so-called multi-sourced networks. Examples of such multi-sourced networks include critical infrastructure

Network mining has been attracting a lot of research attention because of the prevalence of networks. As the world is becoming increasingly connected and correlated, networks arising from inter-dependent application domains are often collected from different sources, forming the so-called multi-sourced networks. Examples of such multi-sourced networks include critical infrastructure networks, multi-platform social networks, cross-domain collaboration networks, and many more. Compared with single-sourced network, multi-sourced networks bear more complex structures and therefore could potentially contain more valuable information.

This thesis proposes a multi-layered HITS (Hyperlink-Induced Topic Search) algorithm to perform the ranking task on multi-sourced networks. Specifically, each node in the network receives an authority score and a hub score for evaluating the value of the node itself and the value of its outgoing links respectively. Based on a recent multi-layered network model, which allows more flexible dependency structure across different sources (i.e., layers), the proposed algorithm leverages both within-layer smoothness and cross-layer consistency. This essentially allows nodes from different layers to be ranked accordingly. The multi-layered HITS is formulated as a regularized optimization problem with non-negative constraint and solved by an iterative update process. Extensive experimental evaluations demonstrate the effectiveness and explainability of the proposed algorithm.
ContributorsYu, Haichao (Author) / Tong, Hanghang (Thesis advisor) / He, Jingrui (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2018