Matching Items (129)

Filtering by

Clear all filters

149695-Thumbnail Image.png

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.

Contributors

Agent

Created

Date Created
2011

151718-Thumbnail Image.png

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone.

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

Contributors

Agent

Created

Date Created
2013

152310-Thumbnail Image.png

We built this town: raising activity awareness through the workplace using gamification

Description

The wide adoption and continued advancement of information and communications technologies (ICT) have made it easier than ever for individuals and groups to stay connected over long distances. These advances have greatly contributed in dramatically changing the dynamics of the

The wide adoption and continued advancement of information and communications technologies (ICT) have made it easier than ever for individuals and groups to stay connected over long distances. These advances have greatly contributed in dramatically changing the dynamics of the modern day workplace to the point where it is now commonplace to see large, distributed multidisciplinary teams working together on a daily basis. However, in this environment, motivating, understanding, and valuing the diverse contributions of individual workers in collaborative enterprises becomes challenging. To address these issues, this thesis presents the goals, design, and implementation of Taskville, a distributed workplace game played by teams on large, public displays. Taskville uses a city building metaphor to represent the completion of individual and group tasks within an organization. Promising results from two usability studies and two longitudinal studies at a multidisciplinary school demonstrate that Taskville supports personal reflection and improves team awareness through an engaging workplace activity.

Contributors

Agent

Created

Date Created
2013

152236-Thumbnail Image.png

A cloud based continuous delivery software developing system on Vlab platform

Description

Continuous Delivery, as one of the youngest and most popular member of agile model family, has become a popular concept and method in software development industry recently. Instead of the traditional software development method, which requirements and solutions must be

Continuous Delivery, as one of the youngest and most popular member of agile model family, has become a popular concept and method in software development industry recently. Instead of the traditional software development method, which requirements and solutions must be fixed before starting software developing, it promotes adaptive planning, evolutionary development and delivery, and encourages rapid and flexible response to change. However, several problems prevent Continuous Delivery to be introduced into education world. Taking into the consideration of the barriers, we propose a new Cloud based Continuous Delivery Software Developing System. This system is designed to fully utilize the whole life circle of software developing according to Continuous Delivery concepts in a virtualized environment in Vlab platform.

Contributors

Agent

Created

Date Created
2013

152100-Thumbnail Image.png

Decentralized information search

Description

Our research focuses on finding answers through decentralized search, for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say answer needs to be found within 15 minutes) associated

Our research focuses on finding answers through decentralized search, for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say answer needs to be found within 15 minutes) associated with the query. In general, human networks are good in answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the answers for a given query. Exploiting an user's social network has several challenges. The major challenge is that the user's immediate social circle may not possess the answer for the given query, and hence the framework designed needs to carry out the query diffusion process across the network. The next challenge involves in finding the right set of seeds to pass the query to in the user's social circle. One other challenge is to incentivize people in the social network to respond to the query and thereby maximize the quality and quantity of replies. Our proposed framework is a mobile application where an individual can either respond to the query or forward it to his friends. We simulated the query diffusion process in three types of graphs: Small World, Random and Preferential Attachment. Given a type of network and a particular query, we carried out the query diffusion by selecting seeds based on attributes of the seed. The main attributes are Topic relevance, Replying or Forwarding probability and Time to Respond. We found that there is a considerable increase in the number of replies attained, even without saturating the user's network, if we adopt an optimal seed selection process. We found the output of the optimal algorithm to be satisfactory as the number of replies received at the interrogator's end was close to three times the number of neighbors an interrogator has. We addressed the challenge of incentivizing people to respond by associating a particular amount of points for each query asked, and awarding the same to people involved in answering the query. Thus, we aim to design a mobile application based on our proposed framework so that it helps in maximizing the replies for the interrogator's query by diffusing the query across his/her social network.

Contributors

Agent

Created

Date Created
2013

151802-Thumbnail Image.png

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities, however the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code, and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates a class as a domain concept or not. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7:5 to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one to one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.

Contributors

Agent

Created

Date Created
2013

152337-Thumbnail Image.png

Study of an epidemic multiple behavior diffusion model in a resource constrained social network

Description

In contemporary society, sustainability and public well-being have been pressing challenges. Some of the important questions are:how can sustainable practices, such as reducing carbon emission, be encouraged? , How can a healthy lifestyle be maintained?Even though individuals are interested, they

In contemporary society, sustainability and public well-being have been pressing challenges. Some of the important questions are:how can sustainable practices, such as reducing carbon emission, be encouraged? , How can a healthy lifestyle be maintained?Even though individuals are interested, they are unable to adopt these behaviors due to resource constraints. Developing a framework to enable cooperative behavior adoption and to sustain it for a long period of time is a major challenge. As a part of developing this framework, I am focusing on methods to understand behavior diffusion over time. Facilitating behavior diffusion with resource constraints in a large population is qualitatively different from promoting cooperation in small groups. Previous work in social sciences has derived conditions for sustainable cooperative behavior in small homogeneous groups. However, how groups of individuals having resource constraint co-operate over extended periods of time is not well understood, and is the focus of my thesis. I develop models to analyze behavior diffusion over time through the lens of epidemic models with the condition that individuals have resource constraint. I introduce an epidemic model SVRS ( Susceptible-Volatile-Recovered-Susceptible) to accommodate multiple behavior adoption. I investigate the longitudinal effects of behavior diffusion by varying different properties of an individual such as resources,threshold and cost of behavior adoption. I also consider how behavior adoption of an individual varies with her knowledge of global adoption. I evaluate my models on several synthetic topologies like complete regular graph, preferential attachment and small-world and make some interesting observations. Periodic injection of early adopters can help in boosting the spread of behaviors and sustain it for a longer period of time. Also, behavior propagation for the classical epidemic model SIRS (Susceptible-Infected-Recovered-Susceptible) does not continue for an infinite period of time as per conventional wisdom. One interesting future direction is to investigate how behavior adoption is affected when number of individuals in a network changes. The affects on behavior adoption when availability of behavior changes with time can also be examined.

Contributors

Agent

Created

Date Created
2013

152541-Thumbnail Image.png

IISS a framework to influence individuals through social signals on a social network

Description

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a subset of items presented to her. The network operator again, shows that activity to a selection of peers, and thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can network operator design social signals to promote a particular activity like sustainability, public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As the result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model on how users behave and how information flows on the platform. The main part of this thesis after that discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using Bayesian model, (2) Calculator: it uses gradient descent method to compute the intermediate activity distributions, (3) Selector: it selects users who can be influenced to adopt or drop specific activities, (4) Designer: it personalizes social signals for each user. I evaluate the performance of IISS framework by simulation on several network topologies such as preferential attachment, small world, and random. I show that the framework gradually nudges users' activities to approach the target distribution. I use both simulation and mathematical method to analyse convergence properties such as how fast and how close we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions, we can achieve KL-divergence as low as 0.05. But for some other distributions KL-divergence can be as large as 0.5.

Contributors

Agent

Created

Date Created
2014

153901-Thumbnail Image.png

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines if a tweet is a general tweet about the event or it refers to a particular segment of the transcript. One characteristic of the Hu et al’s model is that it expects all the data to be available upfront and uses batch inference procedure. But in many cases we find that data is not available beforehand, and it is often streaming. In such cases it is infeasible to repeatedly run the batch inference algorithm. My thesis presents an online inference algorithm for the ET-LDA model, with a continuous stream of tweet data and compare their runtime and performance to existing algorithms.

Contributors

Agent

Created

Date Created
2015

153085-Thumbnail Image.png

Simultaneous variable and feature group selection in heterogeneous learning: optimization and applications

Description

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. Then the proposed model is further extended to tackle the block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets.

Contributors

Agent

Created

Date Created
2014