Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

A cloud based continuous delivery software developing system on Vlab platform

Description

Continuous Delivery, as one of the youngest and most popular member of agile model family, has become a popular concept and method in software development industry recently. Instead of the traditional software development method, which requirements and solutions must be fixed before starting software developing, it promotes adaptive planning, evolutionary…

Continuous Delivery, as one of the youngest and most popular member of agile model family, has become a popular concept and method in software development industry recently. Instead of the traditional software development method, which requirements and solutions must be fixed before starting software developing, it promotes adaptive planning, evolutionary development and delivery, and encourages rapid and flexible response to change. However, several problems prevent Continuous Delivery to be introduced into education world. Taking into the consideration of the barriers, we propose a new Cloud based Continuous Delivery Software Developing System. This system is designed to fully utilize the whole life circle of software developing according to Continuous Delivery concepts in a virtualized environment in Vlab platform.

ContributorsDeng, Yuli (Author) / Huang, Dijiang (Thesis advisor) / Davulcu, Hasan (Committee member) / Chen, Yinong (Committee member) / Arizona State University (Publisher)

Created2013

Time division multiplexing of network access by security groups in high performance computing environments

Description

It is commonly known that High Performance Computing (HPC) systems are most frequently used by multiple users for batch job, parallel computations. Less well known, however, are the numerous HPC systems servicing data so sensitive that administrators enforce either a) sequential job processing - only one job at a time…

It is commonly known that High Performance Computing (HPC) systems are most frequently used by multiple users for batch job, parallel computations. Less well known, however, are the numerous HPC systems servicing data so sensitive that administrators enforce either a) sequential job processing - only one job at a time on the entire system, or b) physical separation - devoting an entire HPC system to a single project until recommissioned. The driving forces behind this type of security are numerous but share the common origin of data so sensitive that measures above and beyond industry standard are used to ensure information security. This paper presents a network security solution that provides information security above and beyond industry standard, yet still enabling multi-user computations on the system. This paper's main contribution is a mechanism designed to enforce high level time division multiplexing of network access (Time Division Multiple Access, or TDMA) according to security groups. By dividing network access into time windows, interactions between applications over the network can be prevented in an easily verifiable way.

ContributorsFerguson, Joshua (Author) / Gupta, Sandeep Ks (Thesis advisor) / Varsamopoulos, Georgios (Committee member) / Ball, George (Committee member) / Arizona State University (Publisher)

Created2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our…

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my work mainly arise from the scale and complexity of user generated behavioral data. The multiple types of interactions, highly dynamic nature of social networking and the volatile user behavior suggest that these data are complex and big in general. Effective and efficient approaches are required to analyze and interpret such data. My work provide effective channels to help connect the like-minded and, furthermore, understand user behavior at a group level. The contributions of this dissertation are in threefold: (1) proposing novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric soical networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

ContributorsWang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

When is temporal planning really temporal

Description

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns…

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns the notable improvements to the format of the temporal fragment of the International Planning Competitions (IPCs). Particularly: the theory I expound upon here is the primary cause of--and justification for--the altered (i) selection of benchmark problems, and (ii) notion of "winning temporal planner". For higher level motivation: robotics, web service composition, industrial manufacturing, business process management, cybersecurity, space exploration, deep ocean exploration, and logistics all benefit from applying domain-independent automated planning technique. Naturally, actually carrying out such case studies has much to offer. For example, we may extract the lesson that reasoning carefully about deadlines is rather crucial to planning in practice. More generally, effectively automating specifically temporal planning is well-motivated from applications. Entirely abstractly, the aim is to improve the theory of automated temporal planning by distilling from its practice. My thesis is that the key feature of computational interest is concurrency. To support, I demonstrate by way of compilation methods, worst-case counting arguments, and analysis of algorithmic properties such as completeness that the more immediately pressing computational obstacles (facing would-be temporal generalizations of classical planning systems) can be dealt with in theoretically efficient manner. So more accurately the technical contribution here is to demonstrate: The computationally significant obstacle to automated temporal planning that remains is just concurrency.

ContributorsCushing, William Albemarle (Author) / Kambhampati, Subbarao (Thesis advisor) / Weld, Daniel S. (Committee member) / Smith, David E. (Committee member) / Baral, Chitta (Committee member) / Davalcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2012

Automated testing for RBAC policies

Description

Access control is necessary for information assurance in many of today's applications such as banking and electronic health record. Access control breaches are critical security problems that can result from unintended and improper implementation of security policies. Security testing can help identify security vulnerabilities early and avoid unexpected expensive cost…

Access control is necessary for information assurance in many of today's applications such as banking and electronic health record. Access control breaches are critical security problems that can result from unintended and improper implementation of security policies. Security testing can help identify security vulnerabilities early and avoid unexpected expensive cost in handling breaches for security architects and security engineers. The process of security testing which involves creating tests that effectively examine vulnerabilities is a challenging task. Role-Based Access Control (RBAC) has been widely adopted to support fine-grained access control. However, in practice, due to its complexity including role management, role hierarchy with hundreds of roles, and their associated privileges and users, systematically testing RBAC systems is crucial to ensure the security in various domains ranging from cyber-infrastructure to mission-critical applications. In this thesis, we introduce i) a security testing technique for RBAC systems considering the principle of maximum privileges, the structure of the role hierarchy, and a new security test coverage criterion; ii) a MTBDD (Multi-Terminal Binary Decision Diagram) based representation of RBAC security policy including RHMTBDD (Role Hierarchy MTBDD) to efficiently generate effective positive and negative security test cases; and iii) a security testing framework which takes an XACML-based RBAC security policy as an input, parses it into a RHMTBDD representation and then generates positive and negative test cases. We also demonstrate the efficacy of our approach through case studies.

ContributorsGupta, Poonam (Author) / Ahn, Gail-Joon (Thesis advisor) / Collofello, James (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2014

Sharing is caring: a data exchange framework for colocated mobile apps

Description

Mobile apps have improved human lifestyle in various aspects ranging from instant messaging to tele-health. In the current app development paradigm, apps are being developed individually and agnostic of each other. The goal of this thesis is to allow a new world where multiple apps communicate with each other to…

Mobile apps have improved human lifestyle in various aspects ranging from instant messaging to tele-health. In the current app development paradigm, apps are being developed individually and agnostic of each other. The goal of this thesis is to allow a new world where multiple apps communicate with each other to achieve synergistic benefits. To enable integration between apps, manual communication between developers is needed, which can be problematic on many levels. In order to promote app integration, a systematic approach towards data sharing between multiple apps is essential. However, current approaches to app integration require large code modifications to reap the benefits of shared data such as requiring developers to provide APIs or use large, invasive middlewares. In this thesis, a data sharing framework was developed providing a non-invasive interface between mobile apps for data sharing and integration. A separate app acts as a registry to allow apps to register database tables to be shared and query this information. Two health monitoring apps were developed to evaluate the sharing framework and different methods of data integration between apps to promote synergistic feedback. The health monitoring apps have shown non-invasive solutions can provide data sharing functionality without large code modifications and manual communication between developers.

ContributorsMilazzo, Joseph (Author) / Gupta, Sandeep K.S. (Thesis advisor) / Varsamopoulos, Georgios (Committee member) / Nelson, Brian (Committee member) / Arizona State University (Publisher)

Created2014

Thermal aware scheduling in hadoop map reduce framework

Description

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to…

The energy consumption of data centers is increasing steadily along with the associ- ated power-density. Approximately half of such energy consumption is attributed to the cooling energy, as a result of which reducing cooling energy along with reducing servers energy consumption in data centers is becoming imperative so as to achieve greening of the data centers. This thesis deals with cooling energy management in data centers running data-processing frameworks. In particular, we propose ther- mal aware scheduling for MapReduce framework and its Hadoop implementation to reduce cooling energy in data centers. Data-processing frameworks run many low- priority batch processing jobs, such as background log analysis, that do not have strict completion time requirements; they can be delayed by a bounded amount of time. Cooling energy savings are possible by being able to temporally spread the workload, and assign it to the computing equipments which reduce the heat recirculation in data center room and therefore the load on the cooling systems. We implement our scheme in Hadoop and performs some experiments using both CPU-intensive and I/O-intensive workload benchmarks in order to evaluate the efficiency of our scheme. The evaluation results highlight that our thermal aware scheduling reduces hot-spots and makes uniform temperature distribution within the data center possible. Sum- marizing the contribution, we incorporated thermal awareness in Hadoop MapReduce framework by enhancing the native scheduler to make it thermally aware, compare the Thermal Aware Scheduler(TAS) with the Hadoop scheduler (FCFS) by running PageRank and TeraSort benchmarks in the BlueTool data center of Impact lab and show that there is reduction in peak temperature and decrease in cooling power using TAS over FCFS scheduler.

ContributorsKole, Sayan (Author) / Gupta, Sandeep (Thesis advisor) / Huang, Dijiang (Committee member) / Varsamopoulos, Georgios (Committee member) / Arizona State University (Publisher)

Created2013

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many…

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would they affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates per each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET)—where the most likely clean candidate for each tuple is chosen, and the rest of the alternatives discarded.

ContributorsRihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

An ontology-based approach to attribute management in ABAC environment

Description

Attribute Based Access Control (ABAC) mechanisms have been attracting a lot of interest from the research community in recent times. This is especially because of the flexibility and extensibility it provides by using attributes assigned to subjects as the basis for access control. ABAC enables an administrator of a server…

Attribute Based Access Control (ABAC) mechanisms have been attracting a lot of interest from the research community in recent times. This is especially because of the flexibility and extensibility it provides by using attributes assigned to subjects as the basis for access control. ABAC enables an administrator of a server to enforce access policies on the data, services and other such resources fairly easily. It also accommodates new policies and changes to existing policies gracefully, thereby making it a potentially good mechanism for implementing access control in large systems, particularly in today's age of Cloud Computing. However management of the attributes in ABAC environment is an area that has been little touched upon. Having a mechanism to allow multiple ABAC based systems to share data and resources can go a long way in making ABAC scalable. At the same time each system should be able to specify their own attribute sets independently. In the research presented in this document a new mechanism is proposed that would enable users to share resources and data in a cloud environment using ABAC techniques in a distributed manner. The focus is mainly on decentralizing the access policy specifications for the shared data so that each data owner can specify the access policy independent of others. The concept of ontologies and semantic web is introduced in the ABAC paradigm that would help in giving a scalable structure to the attributes and also allow systems having different sets of attributes to communicate and share resources.

ContributorsPrabhu Verleker, Ashwin Narayan (Author) / Huang, Dijiang (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2014