Matching Items (9)

Filtering by

Clear all filters

152100-Thumbnail Image.png

Decentralized information search

Description

Our research focuses on finding answers through decentralized search, for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say answer needs to be found within 15 minutes) associated

Our research focuses on finding answers through decentralized search, for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say answer needs to be found within 15 minutes) associated with the query. In general, human networks are good in answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the answers for a given query. Exploiting an user's social network has several challenges. The major challenge is that the user's immediate social circle may not possess the answer for the given query, and hence the framework designed needs to carry out the query diffusion process across the network. The next challenge involves in finding the right set of seeds to pass the query to in the user's social circle. One other challenge is to incentivize people in the social network to respond to the query and thereby maximize the quality and quantity of replies. Our proposed framework is a mobile application where an individual can either respond to the query or forward it to his friends. We simulated the query diffusion process in three types of graphs: Small World, Random and Preferential Attachment. Given a type of network and a particular query, we carried out the query diffusion by selecting seeds based on attributes of the seed. The main attributes are Topic relevance, Replying or Forwarding probability and Time to Respond. We found that there is a considerable increase in the number of replies attained, even without saturating the user's network, if we adopt an optimal seed selection process. We found the output of the optimal algorithm to be satisfactory as the number of replies received at the interrogator's end was close to three times the number of neighbors an interrogator has. We addressed the challenge of incentivizing people to respond by associating a particular amount of points for each query asked, and awarding the same to people involved in answering the query. Thus, we aim to design a mobile application based on our proposed framework so that it helps in maximizing the replies for the interrogator's query by diffusing the query across his/her social network.

Contributors

Agent

Created

Date Created
2013

152337-Thumbnail Image.png

Study of an epidemic multiple behavior diffusion model in a resource constrained social network

Description

In contemporary society, sustainability and public well-being have been pressing challenges. Some of the important questions are:how can sustainable practices, such as reducing carbon emission, be encouraged? , How can a healthy lifestyle be maintained?Even though individuals are interested, they

In contemporary society, sustainability and public well-being have been pressing challenges. Some of the important questions are:how can sustainable practices, such as reducing carbon emission, be encouraged? , How can a healthy lifestyle be maintained?Even though individuals are interested, they are unable to adopt these behaviors due to resource constraints. Developing a framework to enable cooperative behavior adoption and to sustain it for a long period of time is a major challenge. As a part of developing this framework, I am focusing on methods to understand behavior diffusion over time. Facilitating behavior diffusion with resource constraints in a large population is qualitatively different from promoting cooperation in small groups. Previous work in social sciences has derived conditions for sustainable cooperative behavior in small homogeneous groups. However, how groups of individuals having resource constraint co-operate over extended periods of time is not well understood, and is the focus of my thesis. I develop models to analyze behavior diffusion over time through the lens of epidemic models with the condition that individuals have resource constraint. I introduce an epidemic model SVRS ( Susceptible-Volatile-Recovered-Susceptible) to accommodate multiple behavior adoption. I investigate the longitudinal effects of behavior diffusion by varying different properties of an individual such as resources,threshold and cost of behavior adoption. I also consider how behavior adoption of an individual varies with her knowledge of global adoption. I evaluate my models on several synthetic topologies like complete regular graph, preferential attachment and small-world and make some interesting observations. Periodic injection of early adopters can help in boosting the spread of behaviors and sustain it for a longer period of time. Also, behavior propagation for the classical epidemic model SIRS (Susceptible-Infected-Recovered-Susceptible) does not continue for an infinite period of time as per conventional wisdom. One interesting future direction is to investigate how behavior adoption is affected when number of individuals in a network changes. The affects on behavior adoption when availability of behavior changes with time can also be examined.

Contributors

Agent

Created

Date Created
2013

150235-Thumbnail Image.png

Topic sensitive sourcerank: extending sourcerank for performing context-sensitive search over deep-web

Description

Source selection is one of the foremost challenges for searching deep-web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based

Source selection is one of the foremost challenges for searching deep-web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based local measures for assessing source quality. These local measures are necessary but not sufficient as they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of deep-web, have become indispensible for searching deep-web. SourceRank provides a global measure for assessing source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated in single-topic deep-web environments. The goal of the thesis is to extend sourcerank to a multi-topic deep-web environment. Topic-sensitive sourcerank is introduced as an effective way of extending sourcerank to a deep-web environment containing a set of representative topics. In topic-sensitive sourcerank, multiple sourcerank vectors are created, each biased towards a representative topic. At query time, using the topic of query keywords, a query-topic sensitive, composite sourcerank vector is computed as a linear combination of these pre-computed biased sourcerank vectors. Extensive experiments on more than a thousand sources in multiple domains show 18-85% improvements in result quality over Google Product Search and other existing methods.

Contributors

Agent

Created

Date Created
2011

149307-Thumbnail Image.png

BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search
avigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm.

Contributors

Agent

Created

Date Created
2010

149907-Thumbnail Image.png

CPR complex pattern ranking for evaluating top-k pattern queries over event streams

Description

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however the matches to the user queries are in fact plentiful and the system has to efficiently sift through these many matches to locate only the few most preferable matches. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify top-k matching results satisfying both patterns as well as additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain top-k results dynamically as new events come and old ones expire. We also develop new top-k join execution strategies that are able to adapt to the changing situations (e.g., sorted and random access costs, join rates) without having to assume a priori presence of data statistics. Experiments show significant improvements over existing approaches.

Contributors

Agent

Created

Date Created
2011

151323-Thumbnail Image.png

The interpersonal determinants of green purchasing: an assessment of the empirical record

Description

This study investigates how well prominent behavioral theories from social psychology explain green purchasing behavior (GPB). I assess three prominent theories in terms of their suitability for GPB research, their attractiveness to GPB empiricists, and the strength of their empirical

This study investigates how well prominent behavioral theories from social psychology explain green purchasing behavior (GPB). I assess three prominent theories in terms of their suitability for GPB research, their attractiveness to GPB empiricists, and the strength of their empirical evidence when applied to GPB. First, a qualitative assessment of the Theory of Planned Behavior (TPB), Norm Activation Theory (NAT), and Value-Belief-Norm Theory (VBN) is conducted to evaluate a) how well the phenomenon and concepts in each theory match the characteristics of pro-environmental behavior and b) how well the assumptions made in each theory match common assumptions made in purchasing theory. Second, a quantitative assessment of these three theories is conducted in which r2 values and methodological parameters (e.g., sample size) are collected from a sample of 21 empirical studies on GPB to evaluate the accuracy and generalize-ability of empirical evidence. In the qualitative assessment, the results show each theory has its advantages and disadvantages. The results also provide a theoretically-grounded roadmap for modifying each theory to be more suitable for GPB research. In the quantitative assessment, the TPB outperforms the other two theories in every aspect taken into consideration. It proves to 1) create the most accurate models 2) be supported by the most generalize-able empirical evidence and 3) be the most attractive theory to empiricists. Although the TPB establishes itself as the best foundational theory for an empiricist to start from, it's clear that a more comprehensive model is needed to achieve consistent results and improve our understanding of GPB. NAT and the Theory of Interpersonal Behavior (TIB) offer pathways to extend the TPB. The TIB seems particularly apt for this endeavor, while VBN does not appear to have much to offer. Overall, the TPB has already proven to hold a relatively high predictive value. But with the state of ecosystem services continuing to decline on a global scale, it's important for models of GPB to become more accurate and reliable. Better models have the capacity to help marketing professionals, product developers, and policy makers develop strategies for encouraging consumers to buy green products.

Contributors

Agent

Created

Date Created
2012

153229-Thumbnail Image.png

Efficient processing of skyline queries on static data sources, data streams and incomplete datasets

Description

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing.

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.

Contributors

Agent

Created

Date Created
2014

153303-Thumbnail Image.png

Space adaptation techniques for preference oriented skyline processing

Description

Skyline queries are a well-established technique used in multi criteria decision applications. There is a recent interest among the research community to efficiently compute skylines but the problem of presenting the skyline that takes into account the preferences of the

Skyline queries are a well-established technique used in multi criteria decision applications. There is a recent interest among the research community to efficiently compute skylines but the problem of presenting the skyline that takes into account the preferences of the user is still open. Each user has varying interests towards each attribute and hence "one size fits all" methodology might not satisfy all the users. True user satisfaction can be obtained only when the skyline is tailored specifically for each user based on his preferences.

This research investigates the problem of preference aware skyline processing which consists of inferring the preferences of users and computing a skyline specific to that user, taking into account his preferences. This research proposes a model that transforms the data from a given space to a user preferential space where each attribute represents the preference of the user. This study proposes two techniques "Preferential Skyline Processing" and "Latent Skyline Processing" to efficiently compute preference aware skylines in the user preferential space. Finally, through extensive experiments and performance analysis the correctness of the recommendations and the algorithm's ability to outperform the naïve ones is confirmed.

Contributors

Agent

Created

Date Created
2014

154272-Thumbnail Image.png

Locality sensitive indexing for efficient high-dimensional query answering in the presence of excluded regions

Description

Similarity search in high-dimensional spaces is popular for applications like image

processing, time series, and genome data. In higher dimensions, the phenomenon of

curse of dimensionality kills the effectiveness of most of the index structures, giving

way to approximate methods like Locality Sensitive

Similarity search in high-dimensional spaces is popular for applications like image

processing, time series, and genome data. In higher dimensions, the phenomenon of

curse of dimensionality kills the effectiveness of most of the index structures, giving

way to approximate methods like Locality Sensitive Hashing (LSH), to answer similarity

searches. In addition to range searches and k-nearest neighbor searches, there

is a need to answer negative queries formed by excluded regions, in high-dimensional

data. Though there have been a slew of variants of LSH to improve efficiency, reduce

storage, and provide better accuracies, none of the techniques are capable of

answering queries in the presence of excluded regions.

This thesis provides a novel approach to handle such negative queries. This is

achieved by creating a prefix based hierarchical index structure. First, the higher

dimensional space is projected to a lower dimension space. Then, a one-dimensional

ordering is developed, while retaining the hierarchical traits. The algorithm intelligently

prunes the irrelevant candidates while answering queries in the presence of

excluded regions. While naive LSH would need to filter out the negative query results

from the main results, the new algorithm minimizes the need to fetch the redundant

results in the first place. Experiment results show that this reduces post-processing

cost thereby reducing the query processing time.

Contributors

Agent

Created

Date Created
2016