Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our…

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my work mainly arise from the scale and complexity of user generated behavioral data. The multiple types of interactions, highly dynamic nature of social networking and the volatile user behavior suggest that these data are complex and big in general. Effective and efficient approaches are required to analyze and interpret such data. My work provide effective channels to help connect the like-minded and, furthermore, understand user behavior at a group level. The contributions of this dissertation are in threefold: (1) proposing novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric soical networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

ContributorsWang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

When is temporal planning really temporal

Description

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns…

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns the notable improvements to the format of the temporal fragment of the International Planning Competitions (IPCs). Particularly: the theory I expound upon here is the primary cause of--and justification for--the altered (i) selection of benchmark problems, and (ii) notion of "winning temporal planner". For higher level motivation: robotics, web service composition, industrial manufacturing, business process management, cybersecurity, space exploration, deep ocean exploration, and logistics all benefit from applying domain-independent automated planning technique. Naturally, actually carrying out such case studies has much to offer. For example, we may extract the lesson that reasoning carefully about deadlines is rather crucial to planning in practice. More generally, effectively automating specifically temporal planning is well-motivated from applications. Entirely abstractly, the aim is to improve the theory of automated temporal planning by distilling from its practice. My thesis is that the key feature of computational interest is concurrency. To support, I demonstrate by way of compilation methods, worst-case counting arguments, and analysis of algorithmic properties such as completeness that the more immediately pressing computational obstacles (facing would-be temporal generalizations of classical planning systems) can be dealt with in theoretically efficient manner. So more accurately the technical contribution here is to demonstrate: The computationally significant obstacle to automated temporal planning that remains is just concurrency.

ContributorsCushing, William Albemarle (Author) / Kambhampati, Subbarao (Thesis advisor) / Weld, Daniel S. (Committee member) / Smith, David E. (Committee member) / Baral, Chitta (Committee member) / Davalcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2012

An architecture for on-demand wireless sensor networks

Description

Majority of the Sensor networks consist of low-cost autonomously powered devices, and are used to collect data in physical world. Today's sensor network deployments are mostly application specific & owned by a particular entity. Because of this application specific nature & the ownership boundaries, this modus operandi hinders large scale…

Majority of the Sensor networks consist of low-cost autonomously powered devices, and are used to collect data in physical world. Today's sensor network deployments are mostly application specific & owned by a particular entity. Because of this application specific nature & the ownership boundaries, this modus operandi hinders large scale sensing & overall network operational capacity. The main goal of this research work is to create a mechanism to dynamically form personal area networks based on mote class devices spanning ownership boundaries. When coupled with an overlay based control system, this architecture can be conveniently used by a remote client to dynamically create sensor networks (personal area network based) even when the client does not own a network. The nodes here are "borrowed" from existing host networks & the application related to the newly formed network will co-exist with the native applications thanks to concurrency. The result allows users to embed a single collection tree onto spatially distant networks as if they were within communication range. This implementation consists of core operating system & various other external components that support injection maintenance & dissolution sensor network applications at client's request. A large object data dissemination protocol was designed for reliable application injection. The ability of this system to remotely reconfigure a network is useful given the high failure rate of real-world sensor network deployments. Collaborative sensing, various physical phenomenon monitoring also be considered as applications of this architecture.

ContributorsFernando, M. S. R (Author) / Dasgupta, Partha (Thesis advisor) / Bhattacharya, Amiya (Thesis advisor) / Gupta, Sandeep (Committee member) / Arizona State University (Publisher)

Created2013

Ensuring safety of model-based generated code for pervasive health monitoring systems

Description

Wireless technologies for health monitoring systems have seen considerable interest in recent years owing to it's potential to achieve vision of pervasive healthcare, that is healthcare to anyone, anywhere and anytime. Development of wearable wireless medical devices which have the capability to sense, compute, and send physiological information to a…

Wireless technologies for health monitoring systems have seen considerable interest in recent years owing to it's potential to achieve vision of pervasive healthcare, that is healthcare to anyone, anywhere and anytime. Development of wearable wireless medical devices which have the capability to sense, compute, and send physiological information to a mobile gateway, forming a Body Sensor Network (BSN) is considered as a step towards achieving the vision of pervasive health monitoring systems (PHMS). PHMS consisting of wearable body sensors encourages unsupervised long-term monitoring, reducing frequent visit to hospital and nursing cost. Therefore, it is of utmost importance that operation of PHMS must be reliable, safe and have longer lifetime. A model-based automatic code generation provides a state-of-art code generation of sensor and smart phone code from high-level specification of a PHMS. Code generator intakes meta-model of PHMS specification, uses codebase containing code templates and algorithms, and generates platform specific code. Health-Dev, a framework for model-based development of PHMS, uses code generation to implement PHMS in sensor and smart phone. As a part of this thesis, model-based automatic code generation was evaluated and experimentally validated. The generated code was found to be safe in terms of ensuring no race condition, array, or pointer related errors in the generated code and more optimized as compared to hand-written BSN benchmark code in terms of lesser unreachable code.

ContributorsVerma, Sunit (Author) / Gupta, Sandeep (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)

Created2013

Modeling, experimentation, and analysis of data center waste heat recovery and utilization

Description

Increasing computational demands in data centers require facilities to operate at higher ambient temperatures and at higher power densities. Conventionally, data centers are cooled with electrically-driven vapor-compressor equipment. This paper proposes an alternative data center cooling architecture that is heat-driven. The source is heat produced by the computer equipment. This…

Increasing computational demands in data centers require facilities to operate at higher ambient temperatures and at higher power densities. Conventionally, data centers are cooled with electrically-driven vapor-compressor equipment. This paper proposes an alternative data center cooling architecture that is heat-driven. The source is heat produced by the computer equipment. This dissertation details experiments investigating the quantity and quality of heat that can be captured from a liquid-cooled microprocessor on a computer server blade from a data center. The experiments involve four liquid-cooling setups and associated heat-extraction, including a radical approach using mineral oil. The trials examine the feasibility of using the thermal energy from a CPU to drive a cooling process. Uniquely, the investigation establishes an interesting and useful relationship simultaneously among CPU temperatures, power, and utilization levels. In response to the system data, this project explores the heat, temperature and power effects of adding insulation, varying water flow, CPU loading, and varying the cold plate-to-CPU clamping pressure. The idea is to provide an optimal and steady range of temperatures necessary for a chiller to operate. Results indicate an increasing relationship among CPU temperature, power and utilization. Since the dissipated heat can be captured and removed from the system for reuse elsewhere, the need for electricity-consuming computer fans is eliminated. Thermocouple readings of CPU temperatures as high as 93°C and a calculated CPU thermal energy up to 67Wth show a sufficiently high temperature and thermal energy to serve as the input temperature and heat medium input to an absorption chiller. This dissertation performs a detailed analysis of the exergy of a processor and determines the maximum amount of energy utilizable for work. Exergy as a source of realizable work is separated into its two contributing constituents: thermal exergy and informational exergy. The informational exergy is that usable form of work contained within the most fundamental unit of information output by a switching device within a CPU. Exergetic thermal, informational and efficiency values are calculated and plotted for our particular CPU, showing how the datasheet standards compare with experimental values. The dissertation concludes with a discussion of the work's significance.

ContributorsHaywood, Anna (Author) / Phelan, Patrick E (Thesis advisor) / Herrmann, Marcus (Committee member) / Gupta, Sandeep (Committee member) / Trimble, Steve (Committee member) / Myhajlenko, Stefan (Committee member) / Arizona State University (Publisher)

Created2014

Thermal aware scheduling in hadoop map reduce framework

Description

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to…

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to achieve greening of the data centers. This thesis deals with cooling energy management in data centers running data-processing frameworks. In particular, we propose ther- mal aware scheduling for MapReduce framework and its Hadoop implementation to reduce cooling energy in data centers. Data-processing frameworks run many low- priority batch processing jobs, such as background log analysis, that do not have strict completion time requirements; they can be delayed by a bounded amount of time. Cooling energy savings are possible by being able to temporally spread the workload, and assign it to the computing equipments which reduce the heat recirculation in data center room and therefore the load on the cooling systems. We implement our scheme in Hadoop and performs some experiments using both CPU-intensive and I/O-intensive workload benchmarks in order to evaluate the efficiency of our scheme. The evaluation results highlight that our thermal aware scheduling reduces hot-spots and makes uniform temperature distribution within the data center possible. Sum- marizing the contribution, we incorporated thermal awareness in Hadoop MapReduce framework by enhancing the native scheduler to make it thermally aware, compare the Thermal Aware Scheduler(TAS) with the Hadoop scheduler (FCFS) by running PageRank and TeraSort benchmarks in the BlueTool data center of Impact lab and show that there is reduction in peak temperature and decrease in cooling power using TAS over FCFS scheduler.

ContributorsKole, Sayan (Author) / Gupta, Sandeep (Thesis advisor) / Huang, Dijiang (Committee member) / Varsamopoulos, Georgios (Committee member) / Arizona State University (Publisher)

Created2013

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many…

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would they affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates per each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET)—where the most likely clean candidate for each tuple is chosen, and the rest of the alternatives discarded.

ContributorsRihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with…

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, to suggest hashtags as a context or metadata for the orphaned tweets. This in turn would increase user's social engagement and impact Twitter to maintain its monthly active online users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty with this system is that it emphasizes on selecting the suitable candidate set of hashtags from the related tweets of user's social graph (timeline).The system then rank them based on the combination of features scores computed from their tweet and user related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with current state of the art method.

ContributorsVijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2014

Planning with incomplete user preferences and domain models

Description

Current work in planning assumes that user preferences and/or domain dynamics are completely specified in advance, and aims to search for a single solution plan to satisfy these. In many real world scenarios, however, providing a complete specification of user preferences and domain dynamics becomes a time-consuming and error-prone task.…

Current work in planning assumes that user preferences and/or domain dynamics are completely specified in advance, and aims to search for a single solution plan to satisfy these. In many real world scenarios, however, providing a complete specification of user preferences and domain dynamics becomes a time-consuming and error-prone task. More often than not, a user may provide no knowledge or at best partial knowledge of her preferences with respect to a desired plan. Similarly, a domain writer may only be able to determine certain parts, not all, of the model of some actions in a domain. Such modeling issues requires new concepts on what a solution should be, and novel techniques in solving the problem. When user preferences are incomplete, rather than presenting a single plan, the planner must instead provide a set of plans containing one or more plans that are similar to the one that the user prefers. This research first proposes the usage of different measures to capture the quality of such plan sets. These are domain-independent distance measures based on plan elements if no knowledge of the user preferences is given, or the Integrated Preference Function measure in case incomplete knowledge of such preferences is provided. It then investigates various heuristic approaches to generate plan sets in accordance with these measures, and presents empirical results demonstrating the promise of the methods. The second part of this research addresses planning problems with incomplete domain models, specifically those annotated with possible preconditions and effects of actions. It formalizes the notion of plan robustness capturing the probability of success for plans during execution. A method of assessing plan robustness based on the weighted model counting approach is proposed. Two approaches for synthesizing robust plans are introduced. The first one compiles the robust plan synthesis problems to the conformant probabilistic planning problems. The second approximates the robustness measure with lower and upper bounds, incorporating them into a stochastic local search for estimating distance heuristic to a goal state. The resulting planner outperforms a state-of-the-art planner that can handle incomplete domain models in both plan quality and planning time.

ContributorsNguyễn, Tuấn Anh (Author) / Kambhampati, Subbarao (Thesis advisor) / Baral, Chitta (Committee member) / Do, Minh (Committee member) / Lee, Joohyung (Committee member) / Smith, David E. (Committee member) / Arizona State University (Publisher)

Created2014