Matching Items (31)


Fielding an Autonomous Cobot in a University Environment: Engineering and Evaluation

Description


Many researchers aspire to create robotics systems that assist humans in common office tasks, especially by taking over delivery and messaging tasks. For meaningful interactions to take place, a mobile robot must be able to identify the humans it interacts with and communicate successfully with them. It must also be able to successfully navigate the office environment. While mobile robots are well suited for navigating and interacting with elements inside a deterministic office environment, attempting to interact with human beings in an office environment remains a challenge due to the limited cost-efficient compute power available onboard the robot. In this work, I propose the use of remote cloud services to offload intensive interaction tasks. I detail the interactions required in an office environment and discuss the challenges faced when implementing a human-robot interaction platform in a stochastic office environment. I also experiment with cloud services for facial recognition, speech recognition, and environment navigation, and discuss my results. As part of my thesis, I have integrated a human-robot interaction system utilizing cloud APIs into a mobile robot, enabling it to navigate the office environment, identify humans within the environment, and communicate with them.
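The offloading pattern described above can be illustrated with a minimal sketch: the robot ships a camera frame to a remote service and falls back to onboard behavior when the network fails. The endpoint URL, JSON response schema, and confidence threshold below are hypothetical placeholders, not the actual cloud APIs used in the thesis.

```python
# Minimal sketch of cloud offloading for face identification.
# The endpoint and JSON schema are hypothetical, for illustration only.
from typing import Optional
import requests

CLOUD_FACE_API = "https://example.com/v1/identify"  # placeholder endpoint

def identify_person(image_bytes: bytes, timeout_s: float = 2.0) -> Optional[str]:
    """Send a camera frame to the cloud; return the best-match name, if any."""
    try:
        resp = requests.post(
            CLOUD_FACE_API,
            files={"image": ("frame.jpg", image_bytes, "image/jpeg")},
            timeout=timeout_s,
        )
        resp.raise_for_status()
        candidates = resp.json().get("candidates", [])
    except requests.RequestException:
        return None  # cloud unreachable: fall back to onboard behavior
    best = max(candidates, key=lambda c: c["confidence"], default=None)
    return best["name"] if best and best["confidence"] > 0.8 else None
```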


Date Created
2017-05


RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description


The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limited size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem, which consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on the tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches I propose, as well as an external evaluation in comparison to the current state-of-the-art method.
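The propagation step can be sketched as a damped power iteration over the agreement graph. The damping factor, iteration count, and row-stochastic agreement matrix below are illustrative assumptions, not the exact formulation of RAProp.

```python
# Sketch: propagating tweet reputation over an agreement graph.
# A[i, j] is the normalized content similarity between tweets i and j
# (rows sum to 1); r0 holds the per-tweet reputation seeds.
import numpy as np

def propagate_reputation(A: np.ndarray, r0: np.ndarray,
                         alpha: float = 0.85, iters: int = 50) -> np.ndarray:
    """Blend each tweet's own reputation with reputation flowing in
    from agreeing tweets, iterated until approximately stable."""
    r = r0.copy()
    for _ in range(iters):
        r = (1 - alpha) * r0 + alpha * (A.T @ r)
    return r / r.sum()  # normalize so scores are comparable across queries
```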


Date Created
2013


Making thin data thick: user behavior analysis with minimum information

Description


With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days, and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content constitute big data on human behavior.

This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big data is in fact a large collection of mostly limited individual-level information. Researchers must often derive meaningful insights about online user behavior from this limited information. Simply put, they have to make thin data thick.

This dissertation investigates how thin data on human behavior can be made thick. Its chief objective is to demonstrate how traces of human behavior can be efficiently gleaned from the often limited individual-level information, introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information, in terms of either link or content data, that is available for any social media user is determined. Utilizing only minimum information in different applications on social media, such as prediction or recommendation tasks, allows for solutions that are (1) generalizable to all social media users and (2) easy to implement. However, are applications that employ only minimum information as effective as, or comparable to, applications that use more information?

In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively addressed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical bounds on unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites, and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.

The findings in this dissertation allow finding the same user across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied.
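To convey the flavor of signature-based identification under minimum content information, here is a toy sketch that matches a user across two sites by the overlap of their most frequent words. The signature definition and Jaccard matching are illustrative simplifications; the dissertation's social signatures are defined formally and come with theoretical identification bounds.

```python
# Toy sketch: matching a user across sites using only minimal content
# information (most frequent words). Illustrative only; not the formal
# social signatures of the dissertation.
from collections import Counter

def signature(posts: list, k: int = 20) -> frozenset:
    """A user's k most frequent tokens, as an order-free signature."""
    counts = Counter(w.lower() for p in posts for w in p.split())
    return frozenset(w for w, _ in counts.most_common(k))

def best_match(target_posts: list, candidates: dict) -> str:
    """Return the candidate user whose signature overlaps most (Jaccard)."""
    sig = signature(target_posts)
    def jaccard(posts):
        s = signature(posts)
        return len(sig & s) / max(1, len(sig | s))
    return max(candidates, key=lambda user: jaccard(candidates[user]))
```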


Date Created
2015


Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description


Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events, or other news stories. These social updates complement the written news articles or transcripts of events in capturing popular public opinion about those events, so it would be useful to annotate a transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al. [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines whether a tweet is a general tweet about the event or refers to a particular segment of the transcript. One characteristic of Hu et al.'s model is that it expects all the data to be available upfront and uses a batch inference procedure. In many cases, however, the data is not available beforehand; it arrives as a stream, and repeatedly re-running the batch inference algorithm is infeasible. My thesis presents an online inference algorithm for the ET-LDA model that handles a continuous stream of tweet data, and compares its runtime and performance to those of the existing batch algorithm.
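The general pattern of such online inference, as in online variational Bayes for LDA, is to fold each mini-batch of new tweets into the global parameters with a decaying step size instead of re-running batch inference from scratch. The update below sketches that pattern under illustrative learning-rate parameters; it does not reproduce the model-specific ET-LDA update.

```python
# Sketch of the online-inference pattern: blend sufficient statistics
# estimated from mini-batch t into the global parameters with a
# Robbins-Monro step size, instead of re-running batch inference.
import numpy as np

def online_update(lam_global: np.ndarray, lam_batch: np.ndarray,
                  t: int, tau0: float = 64.0, kappa: float = 0.7) -> np.ndarray:
    """One global step of online variational inference."""
    rho = (tau0 + t) ** (-kappa)  # decays toward 0, so later updates shrink
    return (1.0 - rho) * lam_global + rho * lam_batch
```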


Date Created
2015


Improved, scalable, and personalized context recovery system: E-TweetSense

Description


Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser, who has chosen to subscribe to them by following the originator. Although hashtags are used to tag tweets in an effort to attach context to them, many tweets have no hashtag. Such tweets are called orphan tweets, and they adversely affect the browsing experience.

A hashtag is a type of label or metadata tag used in social networks and micro-blogging services that makes it easier for users to find messages with a specific theme or content. The context of a tweet can be defined as a set of one or more hashtags. Users often do not use hashtags to tag their tweets, which leads to the problem of missing context. To address the problem of missing hashtags, a statistical method was previously proposed that predicts the most likely hashtags based on the social circle of an originator.

In this thesis, we propose to improve on the existing context recovery system by limiting the candidate set of hashtags to those derived from the intimate circle of the originator, rather than from every user in the originator's social network. This reduces computation, increases prediction speed, and scales the system to originators with large social networks, while preserving most of the accuracy of the predictions. We also propose to derive candidate hashtags not only from the social network of the originator but also from the content of the tweet. We further propose to learn personalized statistical models according to the adoption patterns of different originators. This not only identifies a personalized candidate set of hashtags based on the social circle and the content of the tweet, but also customizes the hashtag adoption pattern to the originator of the tweet.
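A minimal sketch of such a combined ranker is given below: candidate hashtags are scored by mixing a social score (frequency in the originator's intimate circle) with a content score (overlap between the tweet and words the hashtag co-occurs with). The scoring functions and the mixing weight are illustrative assumptions, not the learned personalized models of the thesis.

```python
# Sketch: ranking candidate hashtags by a mix of social and content
# evidence. Scores and the mixing weight are illustrative only.
from collections import Counter

def rank_hashtags(tweet_words: set,
                  circle_tags: Counter,  # hashtag -> count in intimate circle
                  tag_context: dict,     # hashtag -> set of co-occurring words
                  w_social: float = 0.6) -> list:
    total = sum(circle_tags.values()) or 1
    def score(tag):
        social = circle_tags[tag] / total
        ctx = tag_context.get(tag, set())
        content = len(tweet_words & ctx) / max(1, len(ctx))
        return w_social * social + (1 - w_social) * content
    candidates = set(circle_tags) | set(tag_context)
    return sorted(candidates, key=score, reverse=True)
```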


Date Created
2015


Trust and profit sensitive ranking for the deep web and on-line advertisements

Description


Ranking is of definitive importance to both the usability and the profitability of web information systems. While the ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider. The scope of my thesis includes both search and ad ranking. I consider the emerging problem of ranking deep web data with respect to trustworthiness and relevance. I address end-to-end deep web ranking by focusing on: (i) ranking and selection of the deep web databases, (ii) topic-sensitive ranking of the sources, and (iii) ranking the result tuples from the selected databases. In particular, assessing the trustworthiness and relevance of results for ranking is hard, since the link analysis used on the surface web is inapplicable (deep web records do not have links). I formulated a method, namely SourceRank, to assess the trustworthiness and relevance of the sources based on the inter-source agreement. Second, I extend SourceRank to consider the topics of the agreeing sources in multi-topic environments. Further, I formulate a ranking sensitive to trustworthiness and relevance for the individual results returned by the selected sources. For ad ranking, I formulate a generalized ranking function, namely Click Efficiency (CE), based on a realistic user click model of ads and documents. The CE ranking considers the hitherto ignored parameters of perceived relevance and user dissatisfaction, and guarantees optimal utility under the click model. Interestingly, I show that the existing ad and document ranking functions are reduced forms of the CE ranking under restrictive assumptions. Subsequently, I extend the CE ranking to include a pricing mechanism, yielding a complete auction mechanism. My analysis proves several desirable properties, including revenue dominance over the popular Vickrey-Clarke-Groves (VCG) auction for the same bid vector and the existence of a Nash equilibrium in pure strategies. The equilibrium is socially optimal and revenue equivalent to the truthful VCG equilibrium. Further, I relax the independence assumption in CE ranking and analyze the diversity ranking problem. I show that optimal diversity ranking is NP-hard in general, and that a constant-factor approximation algorithm is unlikely.
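The agreement-based idea behind SourceRank can be sketched as a random walk over a graph whose edge weights are the overlaps between the result sets sources return for probe queries. The exact-set overlap and PageRank-style walk below are illustrative simplifications; the thesis computes agreement with approximate record matching.

```python
# Sketch: scoring sources by a random walk on an agreement graph built
# from result-set overlap. Illustrative simplification of SourceRank.
import numpy as np

def source_scores(result_sets: list, damping: float = 0.85,
                  iters: int = 100) -> np.ndarray:
    n = len(result_sets)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # agreement of source j's results with source i's
                A[i, j] = len(result_sets[i] & result_sets[j]) / max(1, len(result_sets[j]))
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    s = np.full(n, 1.0 / n)
    for _ in range(iters):
        s = (1 - damping) / n + damping * (A.T @ s)  # PageRank-style update
    return s  # higher score: more corroborated, hence more trustworthy
```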


Date Created
2012


Partial satisfaction planning: representation and solving methods

Description


Automated planning problems classically involve finding a sequence of actions that transforms an initial state into some state satisfying a conjunctive set of goals with no temporal constraints. But in many real-world problems, the best plan may involve satisfying only a subset of the goals or missing defined goal deadlines. For example, this may be required when goals logically conflict, or when time or cost constraints make achieving all goals on time too expensive. In this case, goals and deadlines must be declared soft. I call these partial satisfaction planning (PSP) problems. In this work, I focus on particular types of PSP problems, where goals are given a quantitative value based on whether (or when) they are achieved, and the objective is to find the plan with the best quality. A first challenge is finding adequate goal representations that capture common types of goal achievement rewards and costs. One popular representation is to give a single reward for each goal of a planning problem. I expand on this approach by allowing users to directly introduce utility dependencies, so that the reward for achieving a goal can change depending on which other goals the plan achieves. Afterward, I introduce time-dependent goal costs, where a plan incurs a penalty if it achieves a goal past a specified deadline. To solve PSP problems with goal utility dependencies, I adapt state-of-the-art heuristic search methodologies currently employed for classical planning. In doing so, one faces the challenge of simultaneously determining the best set of goals and the best plan to achieve them, complicated by the user-defined utility dependencies and by cost dependencies within the plan. To address this, I introduce a set of heuristics based on combinations of relaxed plans and integer programming formulations. Further, I explore an approach that improves search through learning techniques, using automatically generated state features to find new states from which to search. Finally, the investigation into handling time-dependent goal costs leads to an improved search technique derived from observations on solving discretized approximations of cost functions.
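To make the goal-selection subproblem concrete, the toy sketch below enumerates goal subsets and picks the one maximizing utility (including dependency bonuses) minus achievement cost. Real PSP planners interleave this choice with plan search rather than fixing a cost per goal; the per-goal costs here are an illustrative simplification.

```python
# Toy sketch: selecting the best goal subset under utility dependencies.
# Per-goal costs are a simplification; real planners derive cost from
# the plan itself. Exponential enumeration: small goal sets only.
from itertools import combinations

def best_goal_set(goal_utils: dict, goal_costs: dict, dep_utils: dict):
    """dep_utils maps a frozenset of goals to a bonus earned only when
    all goals in that set are achieved together."""
    goals = list(goal_utils)
    best, best_val = frozenset(), 0.0
    for r in range(len(goals) + 1):
        for subset in combinations(goals, r):
            s = frozenset(subset)
            val = sum(goal_utils[g] - goal_costs[g] for g in s)
            val += sum(u for deps, u in dep_utils.items() if deps <= s)
            if val > best_val:
                best, best_val = s, val
    return best, best_val
```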


Date Created
2012


TaxiWorld: developing and evaluating solution methods for multi-agent planning domains

Description


TaxiWorld is a MATLAB simulation of a city with a fleet of taxis that operate within it, with the goal of transporting passengers to their destinations. The size of the city, the number of available taxis, and the frequency and general locations of fare appearances can all be set on a scenario-by-scenario basis. The taxis must attempt to service the fares as quickly as possible, picking each one up and carrying it to its drop-off location. The TaxiWorld scenario is formally modeled using both Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and Multi-agent Markov Decision Processes (MMDPs). The purpose of developing formal models is to learn how to build and use formal Markov models that can be given to planners to solve for optimal policies in problem domains. However, finding optimal solutions for Dec-POMDPs is NEXP-complete, so an empirical algorithm was also developed as an improvement to the method already in use on the simulator, and the methods were compared in identical scenarios to determine which is more effective. The empirical method is of course not optimal; rather, it attempts to account for some of the most important factors, achieving an acceptable level of effectiveness while retaining a reasonable level of computational complexity for online solving.
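As a point of reference for the style of cheap online methods such a simulator invites, the sketch below greedily assigns each waiting fare to the nearest free taxi by Manhattan distance on the city grid. This is an illustrative heuristic, not the specific empirical algorithm developed in the thesis, and it is written in Python for consistency with the other sketches rather than in the simulator's MATLAB.

```python
# Sketch: greedy nearest-taxi dispatch on a grid city. Illustrative
# only; not the empirical algorithm developed in the thesis.
def dispatch(taxis: dict, fares: list) -> dict:
    """taxis: name -> (x, y) of each free taxi; fares: pickup (x, y)
    locations. Returns taxi name -> assigned pickup location."""
    free = dict(taxis)
    assignment = {}
    for fx, fy in fares:
        if not free:
            break  # more fares than free taxis; the rest must wait
        nearest = min(free, key=lambda t: abs(free[t][0] - fx) + abs(free[t][1] - fy))
        assignment[nearest] = (fx, fy)
        del free[nearest]
    return assignment
```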


Date Created
2011


Topic sensitive sourcerank: extending sourcerank for performing context-sensitive search over deep-web

Description


Source selection is one of the foremost challenges for searching the deep web. For a user query, source selection involves selecting a subset of deep web sources expected to provide relevant answers to the query. Existing source selection models employ query-similarity-based local measures for assessing source quality. These local measures are necessary but not sufficient, as they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of the deep web, have become indispensable for searching it. SourceRank provides a global measure for assessing source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated in single-topic deep web environments. The goal of this thesis is to extend SourceRank to a multi-topic deep web environment. Topic-sensitive SourceRank is introduced as an effective way of extending SourceRank to a deep web environment containing a set of representative topics. In topic-sensitive SourceRank, multiple SourceRank vectors are created, each biased towards a representative topic. At query time, using the topic of the query keywords, a query-topic-sensitive composite SourceRank vector is computed as a linear combination of these pre-computed biased SourceRank vectors. Extensive experiments on more than a thousand sources in multiple domains show 18-85% improvements in result quality over Google Product Search and other existing methods.
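The query-time composition step is straightforward to sketch: given the pre-computed topic-biased vectors and an estimate of the query's topic distribution, the composite score is a weighted sum. How the topic distribution of the query keywords is estimated is abstracted away here.

```python
# Sketch: query-time composition of topic-biased SourceRank vectors.
# The topic classifier for query keywords is abstracted away.
import numpy as np

def composite_sourcerank(topic_vectors: dict, query_topic_probs: dict) -> np.ndarray:
    """Weighted sum of per-topic SourceRank vectors, weighted by the
    probability that the query belongs to each topic."""
    topics = list(topic_vectors)
    out = np.zeros_like(topic_vectors[topics[0]], dtype=float)
    for t in topics:
        out += query_topic_probs.get(t, 0.0) * topic_vectors[t]
    return out
```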


Date Created
2011


Bridging the gap between classical logic based formalisms and logic programs

Description


Different logic-based knowledge representation formalisms have different limitations, either with respect to expressivity or with respect to computational efficiency. First-order logic, which is the basis of Description Logics (DLs), is not suitable for defeasible reasoning due to its monotonic nature. The nonmonotonic formalisms that extend first-order logic, such as circumscription and default logic, are expressive but lack efficient implementations. The nonmonotonic formalisms based on the declarative logic programming approach, such as Answer Set Programming (ASP), have efficient implementations but are not expressive enough for representing and reasoning with open domains. This dissertation uses the first-order stable model semantics, which extends both first-order logic and ASP, to relate circumscription to ASP and to integrate DLs and ASP, thereby partially overcoming the limitations of these formalisms. By exploiting the relationship between circumscription and ASP, well-known action formalisms, such as the situation calculus, the event calculus, and Temporal Action Logics, are reformulated in ASP. The advantages of these reformulations are shown with respect to the generality of the reasoning tasks that can be handled and with respect to computational efficiency. The integration of DLs and ASP presented in this dissertation provides a framework for integrating rules and ontologies for the semantic web, and enables nonmonotonic reasoning with DL knowledge bases. Observing the need to integrate action theories and ontologies, the dissertation then uses the above results to reformulate the problem of integrating action theories and ontologies as one of integrating rules and ontologies, so that the computational tools developed for the latter can be used for the former.
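The stable model semantics underlying this line of work can be illustrated, for ground normal programs, by brute-force guess-and-check over candidate atom sets: a set is stable exactly when it equals the minimal model of the program's reduct with respect to it. The sketch below implements that definition directly; real ASP solvers are vastly more efficient.

```python
# Guess-and-check sketch of stable models for a ground normal program.
# A rule is (head, positive_body, negative_body); "a :- b, not c" is
# ("a", frozenset({"b"}), frozenset({"c"})). Illustrative only.
from itertools import chain, combinations

def minimal_model(pos_rules):
    """Least model of a negation-free program (fixpoint iteration)."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos, _ in pos_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(rules):
    atoms = {r[0] for r in rules} | {a for r in rules for a in r[1] | r[2]}
    stable = []
    for cand in chain.from_iterable(combinations(sorted(atoms), k)
                                    for k in range(len(atoms) + 1)):
        s = set(cand)
        # Reduct: drop rules whose negative body intersects s; drop "not".
        reduct = [(h, p, frozenset()) for h, p, n in rules if not (n & s)]
        if minimal_model(reduct) == s:
            stable.append(s)
    return stable

# "p :- not q" and "q :- not p" yield two stable models: {p} and {q}.
print(stable_models([("p", frozenset(), frozenset({"q"})),
                     ("q", frozenset(), frozenset({"p"}))]))
```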


Date Created
2012