Matching Items (100)
Filtering by

Clear all filters

150235-Thumbnail Image.png
Description
Source selection is one of the foremost challenges for searching deep-web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based local measures for assessing source quality. These local measures are

Source selection is one of the foremost challenges for searching deep-web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based local measures for assessing source quality. These local measures are necessary but not sufficient as they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of deep-web, have become indispensible for searching deep-web. SourceRank provides a global measure for assessing source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated in single-topic deep-web environments. The goal of the thesis is to extend sourcerank to a multi-topic deep-web environment. Topic-sensitive sourcerank is introduced as an effective way of extending sourcerank to a deep-web environment containing a set of representative topics. In topic-sensitive sourcerank, multiple sourcerank vectors are created, each biased towards a representative topic. At query time, using the topic of query keywords, a query-topic sensitive, composite sourcerank vector is computed as a linear combination of these pre-computed biased sourcerank vectors. Extensive experiments on more than a thousand sources in multiple domains show 18-85% improvements in result quality over Google Product Search and other existing methods.
ContributorsJha, Manishkumar (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011
150174-Thumbnail Image.png
Description
Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around

Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around the world. Given the fundamentally emotional nature of humans and the amount of emotional content that appears in Web 2.0 content, it is important to understand how such websites can affect the emotions of users. This work attempts to determine whether emotion spreads through an online social network (OSN). To this end, a method is devised that employs a model based on a general threshold diffusion model as a classifier to predict the propagation of emotion between users and their friends in an OSN by way of mood-labeled blog entries. The model generalizes existing information diffusion models in that the state machine representation of a node is generalized from being binary to having n-states in order to support n class labels necessary to model emotional contagion. In the absence of ground truth, the prediction accuracy of the model is benchmarked with a baseline method that predicts the majority label of a user's emotion label distribution. The model significantly outperforms the baseline method in terms of prediction accuracy. The experimental results make a strong case for the existence of emotional contagion in OSNs in spite of possible alternative arguments such confounding influence and homophily, since these alternatives are likely to have negligible effect in a large dataset or simply do not apply to the domain of human emotions. A hybrid manual/automated method to map mood-labeled blog entries to a set of emotion labels is also presented, which enables the application of the model to a large set (approximately 900K) of blog entries from LiveJournal.
ContributorsCole, William David, M.S (Author) / Liu, Huan (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Candan, Kasim S (Committee member) / Arizona State University (Publisher)
Created2011
151129-Thumbnail Image.png
Description
Ranking is of definitive importance to both usability and profitability of web information systems. While ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider. The scope of my thesis includes both search and ad

Ranking is of definitive importance to both usability and profitability of web information systems. While ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider. The scope of my thesis includes both search and ad ranking. I consider the emerging problem of ranking the deep web data considering trustworthiness and relevance. I address the end-to-end deep web ranking by focusing on: (i) ranking and selection of the deep web databases (ii) topic sensitive ranking of the sources (iii) ranking the result tuples from the selected databases. Especially, assessing the trustworthiness and relevances of results for ranking is hard since the currently used link analysis is inapplicable (since deep web records do not have links). I formulated a method---namely SourceRank---to assess the trustworthiness and relevance of the sources based on the inter-source agreement. Secondly, I extend the SourceRank to consider the topic of the agreeing sources in multi-topic environments. Further, I formulate a ranking sensitive to trustworthiness and relevance for the individual results returned by the selected sources. For ad ranking, I formulate a generalized ranking function---namely Click Efficiency (CE)---based on a realistic user click model of ads and documents. The CE ranking considers hitherto ignored parameters of perceived relevance and user dissatisfaction. CE ranking guaranteeing optimal utilities for the click model. Interestingly, I show that the existing ad and document ranking functions are reduced forms of the CE ranking under restrictive assumptions. Subsequently, I extend the CE ranking to include a pricing mechanism, designing a complete auction mechanism. My analysis proves several desirable properties including revenue dominance over popular Vickery-Clarke-Groves (VCG) auctions for the same bid vector and existence of a Nash equilibrium in pure strategies. The equilibrium is socially optimal, and revenue equivalent to the truthful VCG equilibrium. Further, I relax the independence assumption in CE ranking and analyze the diversity ranking problem. I show that optimal diversity ranking is NP-Hard in general, and that a constant time approximation algorithm is not likely.
ContributorsBalakrishnan, Nagraj (Author) / Kambhampati, Subbarao (Thesis advisor) / Chen, Yi (Committee member) / Doan, AnHai (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2012
154086-Thumbnail Image.png
Description
Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions.

Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for BioImaging and Bilingual applications. Specifically, I propose deep transfer learning algorithms which combine transfer learning and deep learning to improve image annotation performance. Firstly, I propose to generate the deep features for the Drosophila embryo images via pretrained deep models and build linear classifiers on top of the deep features. Secondly, I propose to fine-tune the pretrained model with a small amount of labeled images. The time complexity and performance of deep transfer learning methodologies are investigated. Promising results have demonstrated the knowledge transfer ability of proposed deep transfer algorithms. Moreover, I propose a novel Robust Principal Component Analysis (RPCA) approach to process the noisy images in advance. In addition, I also present a two-stage re-weighting framework for general domain adaptation problems. The distribution of source domain is mapped towards the target domain in the first stage, and an adaptive learning model is proposed in the second stage to incorporate label information from the target domain if it is available. Then the proposed model is applied to tackle cross lingual spam detection problem at LinkedIn’s website. Our experimental results on real data demonstrate the efficiency and effectiveness of the proposed algorithms.
ContributorsSun, Qian (Author) / Ye, Jieping (Committee member) / Xue, Guoliang (Committee member) / Liu, Huan (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)
Created2015
154137-Thumbnail Image.png
Description
The purpose of information source detection problem (or called rumor source detection) is to identify the source of information diffusion in networks based on available observations like the states of the nodes and the timestamps at which nodes adopted the information (or called infected). The solution of the problem can

The purpose of information source detection problem (or called rumor source detection) is to identify the source of information diffusion in networks based on available observations like the states of the nodes and the timestamps at which nodes adopted the information (or called infected). The solution of the problem can be used to answer a wide range of important questions in epidemiology, computer network security, etc. This dissertation studies the fundamental theory and the design of efficient and robust algorithms for the information source detection problem.

For tree networks, the maximum a posterior (MAP) estimator of the information source is derived under the independent cascades (IC) model with a complete snapshot and a Short-Fat Tree (SFT) algorithm is proposed for general networks based on the MAP estimator. Furthermore, the following possibility and impossibility results are established on the Erdos-Renyi (ER) random graph: $(i)$ when the infection duration $<\frac{2}{3}t_u,$ SFT identifies the source with probability one asymptotically, where $t_u=\left\lceil\frac{\log n}{\log \mu}\right\rceil+2$ and $\mu$ is the average node degree, $(ii)$ when the infection duration $>t_u,$ the probability of identifying the source approaches zero asymptotically under any algorithm; and $(iii)$ when infection duration $
In practice, other than the nodes' states, side information like partial timestamps may also be available. Such information provides important insights of the diffusion process. To utilize the partial timestamps, the information source detection problem is formulated as a ranking problem on graphs and two ranking algorithms, cost-based ranking (CR) and tree-based ranking (TR), are proposed. Extensive experimental evaluations of synthetic data of different diffusion models and real world data demonstrate the effectiveness and robustness of CR and TR compared with existing algorithms.
ContributorsZhu, Kai (Author) / Ying, Lei (Thesis advisor) / Lai, Ying-Cheng (Committee member) / Liu, Huan (Committee member) / Shakarian, Paulo (Committee member) / Arizona State University (Publisher)
Created2015
156107-Thumbnail Image.png
Description
Online social media is popular due to its real-time nature, extensive connectivity and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical

Online social media is popular due to its real-time nature, extensive connectivity and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical information or gathering perspectives on important issues. Social media platforms are not designed for resource seeking and experience large volumes of messages, leading to requests not being fulfilled satisfactorily. Designing frameworks to facilitate efficient information seeking in social media will help users to obtain appropriate assistance for their needs

and help platforms to increase user satisfaction.

Several challenges exist in the way of facilitating information seeking in social media. First, the characteristics affecting the user’s response time for a question are not known, making it hard to identify prompt responders. Second, the social context in which the user has asked the question has to be determined to find personalized responders. Third, users employ rhetorical requests, which are statements having the

syntax of questions, and systems assisting information seeking might be hindered from focusing on genuine questions. Fouth, social media advocates of political campaigns employ nuanced strategies to prevent users from obtaining balanced perspectives on

issues of public importance.

Sociological and linguistic studies on user behavior while making or responding to information seeking requests provides concepts drawing from which we can address these challenges. We propose methods to estimate the response time of the user for a given question to identify prompt responders. We compute the question specific social context an asker shares with his social connections to identify personalized responders. We draw from theories of political mobilization to model the behaviors arising from the strategies of people trying to skew perspectives. We identify rhetorical questions by modeling user motivations to post them.
ContributorsRanganath, Suhas (Author) / Liu, Huan (Thesis advisor) / Lai, Ying-Cheng (Thesis advisor) / Tong, Hanghang (Committee member) / Vaculin, Roman (Committee member) / Arizona State University (Publisher)
Created2017
156080-Thumbnail Image.png
Description
While techniques for reading DNA in some capacity has been possible for decades,

the ability to accurately edit genomes at scale has remained elusive. Novel techniques

have been introduced recently to aid in the writing of DNA sequences. While writing

DNA is more accessible, it still remains expensive, justifying the increased interest in

in

While techniques for reading DNA in some capacity has been possible for decades,

the ability to accurately edit genomes at scale has remained elusive. Novel techniques

have been introduced recently to aid in the writing of DNA sequences. While writing

DNA is more accessible, it still remains expensive, justifying the increased interest in

in silico predictions of cell behavior. In order to accurately predict the behavior of

cells it is necessary to extensively model the cell environment, including gene-to-gene

interactions as completely as possible.

Significant algorithmic advances have been made for identifying these interactions,

but despite these improvements current techniques fail to infer some edges, and

fail to capture some complexities in the network. Much of this limitation is due to

heavily underdetermined problems, whereby tens of thousands of variables are to be

inferred using datasets with the power to resolve only a small fraction of the variables.

Additionally, failure to correctly resolve gene isoforms using short reads contributes

significantly to noise in gene quantification measures.

This dissertation introduces novel mathematical models, machine learning techniques,

and biological techniques to solve the problems described above. Mathematical

models are proposed for simulation of gene network motifs, and raw read simulation.

Machine learning techniques are shown for DNA sequence matching, and DNA

sequence correction.

Results provide novel insights into the low level functionality of gene networks. Also

shown is the ability to use normalization techniques to aggregate data for gene network

inference leading to larger data sets while minimizing increases in inter-experimental

noise. Results also demonstrate that high error rates experienced by third generation

sequencing are significantly different than previous error profiles, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for amending this DNA error that preserve the benefits of third generation sequencing.
ContributorsFaucon, Philippe Christophe (Author) / Liu, Huan (Thesis advisor) / Wang, Xiao (Committee member) / Crook, Sharon M (Committee member) / Wang, Yalin (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)
Created2017
156735-Thumbnail Image.png
Description
The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied for network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention and various network embedding algorithms are proposed.

Despite the success of these network embedding methods, the majority of them are dedicated to static plain networks, i.e., networks with fixed nodes and links only; while in social media, networks can present in various formats, such as attributed networks, signed networks, dynamic networks and heterogeneous networks. These social networks contain abundant rich information to alleviate the network sparsity problem and can help learn a better network representation; while plain network embedding approaches cannot tackle such networks. For example, signed social networks can have both positive and negative links. Recent study on signed networks shows that negative links have added value in addition to positive links for many tasks such as link prediction and node classification. However, the existence of negative links challenges the principles used for plain network embedding. Thus, it is important to study signed network embedding. Furthermore, social networks can be dynamic, where new nodes and links can be introduced anytime. Dynamic networks can reveal the concept drift of a user and require efficiently updating the representation when new links or users are introduced. However, static network embedding algorithms cannot deal with dynamic networks. Therefore, it is important and challenging to propose novel algorithms for tackling different types of social networks.

In this dissertation, we investigate network representation learning in social media. In particular, we study representative social networks, which includes attributed network, signed networks, dynamic networks and document networks. We propose novel frameworks to tackle the challenges of these networks and learn representations that not only capture the network structure but also the unique properties of these social networks.
ContributorsWang, Suhang (Author) / Liu, Huan (Thesis advisor) / Aggarwal, Charu (Committee member) / Sen, Arunabha (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2018
156711-Thumbnail Image.png
Description
Social media refers computer-based technology that allows the sharing of information and building the virtual networks and communities. With the development of internet based services and applications, user can engage with social media via computer and smart mobile devices. In recent years, social media has taken the form

Social media refers computer-based technology that allows the sharing of information and building the virtual networks and communities. With the development of internet based services and applications, user can engage with social media via computer and smart mobile devices. In recent years, social media has taken the form of different activities such as social network, business network, text sharing, photo sharing, blogging, etc. With the increasing popularity of social media, it has accumulated a large amount of data which enables understanding the human behavior possible. Compared with traditional survey based methods, the analysis of social media provides us a golden opportunity to understand individuals at scale and in turn allows us to design better services that can tailor to individuals’ needs. From this perspective, we can view social media as sensors, which provides online signals from a virtual world that has no geographical boundaries for the real world individual's activity.

One of the key features for social media is social, where social media users actively interact to each via generating content and expressing the opinions, such as post and comment in Facebook. As a result, sentiment analysis, which refers a computational model to identify, extract or characterize subjective information expressed in a given piece of text, has successfully employs user signals and brings many real world applications in different domains such as e-commerce, politics, marketing, etc. The goal of sentiment analysis is to classify a user’s attitude towards various topics into positive, negative or neutral categories based on textual data in social media. However, recently, there is an increasing number of people start to use photos to express their daily life on social media platforms like Flickr and Instagram. Therefore, analyzing the sentiment from visual data is poise to have great improvement for user understanding.

In this dissertation, I study the problem of understanding human sentiments from large scale collection of social images based on both image features and contextual social network features. We show that neither

visual features nor the textual features are by themselves sufficient for accurate sentiment prediction. Therefore, we provide a way of using both of them, and formulate sentiment prediction problem in two scenarios: supervised and unsupervised. We first show that the proposed framework has flexibility to incorporate multiple modalities of information and has the capability to learn from heterogeneous features jointly with sufficient training data. Secondly, we observe that negative sentiment may related to human mental health issues. Based on this observation, we aim to understand the negative social media posts, especially the post related to depression e.g., self-harm content. Our analysis, the first of its kind, reveals a number of important findings. Thirdly, we extend the proposed sentiment prediction task to a general multi-label visual recognition task to demonstrate the methodology flexibility behind our sentiment analysis model.
ContributorsWang, Yilin (Author) / Li, Baoxin (Thesis advisor) / Liu, Huan (Committee member) / Tong, Hanghang (Committee member) / Chang, Yi (Committee member) / Arizona State University (Publisher)
Created2018
156862-Thumbnail Image.png
Description
Teams are increasingly indispensable to achievements in any organizations. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the {\it social, cognitive} and {\it information} level in relation to team performance and network dynamics. The goal of this dissertation is

Teams are increasingly indispensable to achievements in any organizations. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the {\it social, cognitive} and {\it information} level in relation to team performance and network dynamics. The goal of this dissertation is to create new instruments to {\it predict}, {\it optimize} and {\it explain} teams' performance in the context of composite networks (i.e., social-cognitive-information networks).

Understanding the dynamic mechanisms that drive the success of high-performing teams can provide the key insights into building the best teams and hence lift the productivity and profitability of the organizations. For this purpose, novel predictive models to forecast the long-term performance of teams ({\it point prediction}) as well as the pathway to impact ({\it trajectory prediction}) have been developed. A joint predictive model by exploring the relationship between team level and individual level performances has also been proposed.

For an existing team, it is often desirable to optimize its performance through expanding the team by bringing a new team member with certain expertise, or finding a new candidate to replace an existing under-performing member. I have developed graph kernel based performance optimization algorithms by considering both the structural matching and skill matching to solve the above enhancement scenarios. I have also worked towards real time team optimization by leveraging reinforcement learning techniques.

With the increased complexity of the machine learning models for predicting and optimizing teams, it is critical to acquire a deeper understanding of model behavior. For this purpose, I have investigated {\em explainable prediction} -- to provide explanation behind a performance prediction and {\em explainable optimization} -- to give reasons why the model recommendations are good candidates for certain enhancement scenarios.
ContributorsLi, Liangyue (Author) / Tong, Hanghang (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Buchler, Norbou (Committee member) / Arizona State University (Publisher)
Created2018