Search Content

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups…

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting an unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representation of the expression images. In the third application, I present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help us to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.

ContributorsYuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created2013

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our…

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my work mainly arise from the scale and complexity of user generated behavioral data. The multiple types of interactions, highly dynamic nature of social networking and the volatile user behavior suggest that these data are complex and big in general. Effective and efficient approaches are required to analyze and interpret such data. My work provide effective channels to help connect the like-minded and, furthermore, understand user behavior at a group level. The contributions of this dissertation are in threefold: (1) proposing novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric soical networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

ContributorsWang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

When is temporal planning really temporal

Description

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns…

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state of the art implementations (as of 2012). At face-value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns the notable improvements to the format of the temporal fragment of the International Planning Competitions (IPCs). Particularly: the theory I expound upon here is the primary cause of--and justification for--the altered (i) selection of benchmark problems, and (ii) notion of "winning temporal planner". For higher level motivation: robotics, web service composition, industrial manufacturing, business process management, cybersecurity, space exploration, deep ocean exploration, and logistics all benefit from applying domain-independent automated planning technique. Naturally, actually carrying out such case studies has much to offer. For example, we may extract the lesson that reasoning carefully about deadlines is rather crucial to planning in practice. More generally, effectively automating specifically temporal planning is well-motivated from applications. Entirely abstractly, the aim is to improve the theory of automated temporal planning by distilling from its practice. My thesis is that the key feature of computational interest is concurrency. To support, I demonstrate by way of compilation methods, worst-case counting arguments, and analysis of algorithmic properties such as completeness that the more immediately pressing computational obstacles (facing would-be temporal generalizations of classical planning systems) can be dealt with in theoretically efficient manner. So more accurately the technical contribution here is to demonstrate: The computationally significant obstacle to automated temporal planning that remains is just concurrency.

ContributorsCushing, William Albemarle (Author) / Kambhampati, Subbarao (Thesis advisor) / Weld, Daniel S. (Committee member) / Smith, David E. (Committee member) / Baral, Chitta (Committee member) / Davalcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2012

Design, analysis and resource allocations in networks in presence of region-based faults

Description

Communication networks, both wired and wireless, are expected to have a certain level of fault-tolerance capability.These networks are also expected to ensure a graceful degradation in performance when some of the network components fail. Traditional studies on fault tolerance in communication networks, for the most part, make no assumptions regarding…

Communication networks, both wired and wireless, are expected to have a certain level of fault-tolerance capability.These networks are also expected to ensure a graceful degradation in performance when some of the network components fail. Traditional studies on fault tolerance in communication networks, for the most part, make no assumptions regarding the location of node/link faults, i.e., the faulty nodes and links may be close to each other or far from each other. However, in many real life scenarios, there exists a strong spatial correlation among the faulty nodes and links. Such failures are often encountered in disaster situations, e.g., natural calamities or enemy attacks. In presence of such region-based faults, many of traditional network analysis and fault-tolerant metrics, that are valid under non-spatially correlated faults, are no longer applicable. To this effect, the main thrust of this research is design and analysis of robust networks in presence of such region-based faults. One important finding of this research is that if some prior knowledge is available on the maximum size of the region that might be affected due to a region-based fault, this piece of knowledge can be effectively utilized for resource efficient design of networks. It has been shown in this dissertation that in some scenarios, effective utilization of this knowledge may result in substantial saving is transmission power in wireless networks. In this dissertation, the impact of region-based faults on the connectivity of wireless networks has been studied and a new metric, region-based connectivity, is proposed to measure the fault-tolerance capability of a network. In addition, novel metrics, such as the region-based component decomposition number(RBCDN) and region-based largest component size(RBLCS) have been proposed to capture the network state, when a region-based fault disconnects the network. Finally, this dissertation presents efficient resource allocation techniques that ensure tolerance against region-based faults, in distributed file storage networks and data center networks.

ContributorsBanerjee, Sujogya (Author) / Sen, Arunabha (Thesis advisor) / Xue, Guoliang (Committee member) / Richa, Andrea (Committee member) / Hurlbert, Glenn (Committee member) / Arizona State University (Publisher)

Created2013

Compiler and runtime for memory management on software managed manycore processors

Description

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale…

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.

ContributorsBai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created2014

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many…

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would they affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates per each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET)—where the most likely clean candidate for each tuple is chosen, and the rest of the alternatives discarded.

ContributorsRihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

Resource allocation in communication and social networks

Description

As networks are playing an increasingly prominent role in different aspects of our lives, there is a growing awareness that improving their performance is of significant importance. In order to enhance performance of networks, it is essential that scarce networking resources be allocated smartly to match the continuously changing network…

As networks are playing an increasingly prominent role in different aspects of our lives, there is a growing awareness that improving their performance is of significant importance. In order to enhance performance of networks, it is essential that scarce networking resources be allocated smartly to match the continuously changing network environment. This dissertation focuses on two different kinds of networks - communication and social, and studies resource allocation problems in these networks. The study on communication networks is further divided into different networking technologies - wired and wireless, optical and mobile, airborne and terrestrial. Since nodes in an airborne network (AN) are heterogeneous and mobile, the design of a reliable and robust AN is highly complex. The dissertation studies connectivity and fault-tolerance issues in ANs and proposes algorithms to compute the critical transmission range in fault free, faulty and delay tolerant scenarios. Just as in the case of ANs, power optimization and fault tolerance are important issues in wireless sensor networks (WSN). In a WSN, a tree structure is often used to deliver sensor data to a sink node. In a tree, failure of a node may disconnect the tree. The dissertation investigates the problem of enhancing the fault tolerance capability of data gathering trees in WSN. The advent of OFDM technology provides an opportunity for efficient resource utilization in optical networks and also introduces a set of novel problems, such as routing and spectrum allocation (RSA) problem. This dissertation proves that RSA problem is NP-complete even when the network topology is a chain, and proposes approximation algorithms. In the domain of social networks, the focus of this dissertation is study of influence propagation in presence of active adversaries. In a social network multiple vendors may attempt to influence the nodes in a competitive fashion. This dissertation investigates the scenario where the first vendor has already chosen a set of nodes and the second vendor, with the knowledge of the choice of the first, attempts to identify a smallest set of nodes so that after the influence propagation, the second vendor's market share is larger than the first.

ContributorsShirazipourazad, Shahrzad (Author) / Sen, Arunabha (Committee member) / Xue, Guoliang (Committee member) / Richa, Andrea (Committee member) / Saripalli, Srikanth (Committee member) / Arizona State University (Publisher)

Created2014

Detection of advanced bots in smartphones through user profiling

Description

This thesis addresses the ever increasing threat of botnets in the smartphone domain and focuses on the Android platform and the botnets using Online Social Networks (OSNs) as Command and Control (C&C;) medium. With any botnet, C&C; is one of the components on which the survival of botnet depends. Individual…

This thesis addresses the ever increasing threat of botnets in the smartphone domain and focuses on the Android platform and the botnets using Online Social Networks (OSNs) as Command and Control (C&C;) medium. With any botnet, C&C; is one of the components on which the survival of botnet depends. Individual bots use the C&C; channel to receive commands and send the data. This thesis develops active host based approach for identifying the presence of bot based on the anomalies in the usage patterns of the user before and after the bot is installed on the user smartphone and alerting the user to the presence of the bot. A profile is constructed for each user based on the regular web usage patterns (achieved by intercepting the http(s) traffic) and implementing machine learning techniques to continuously learn the user's behavior and changes in the behavior and all the while looking for any anomalies in the user behavior above a threshold which will cause the user to be notified of the anomalous traffic. A prototype bot which uses OSN s as C&C; channel is constructed and used for testing. Users are given smartphones(Nexus 4 and Galaxy Nexus) running Application proxy which intercepts http(s) traffic and relay it to a server which uses the traffic and constructs the model for a particular user and look for any signs of anomalies. This approach lays the groundwork for the future host-based counter measures for smartphone botnets using OSN s as C&C; channel.

ContributorsKilari, Vishnu Teja (Author) / Xue, Guoliang (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2013

An SDN-based IPS development framework in cloud networking environment

Description

Security has been one of the top concerns in cloud community while cloud resource abuse and malicious insiders are considered as top threats. Traditionally, Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) have been widely deployed to manipulate cloud security, with the latter one providing additional prevention capability. However,…

Security has been one of the top concerns in cloud community while cloud resource abuse and malicious insiders are considered as top threats. Traditionally, Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) have been widely deployed to manipulate cloud security, with the latter one providing additional prevention capability. However, as one of the most creative networking technologies, Software-Defined Networking (SDN) is rarely used to implement IDPS in the cloud computing environment because the lack of comprehensive development framework and processing flow. Simply migration from traditional IDS/IPS systems to SDN environment are not effective enough for detecting and defending malicious attacks. Hence, in this thesis, we present an IPS development framework to help user easily design and implement their defensive systems in cloud system by SDN technology. This framework enables SDN approaches to enhance the system security and performance. A Traffic Information Platform (TIP) is proposed as the cornerstone with several upper layer security modules such as Detection, Analysis and Prevention components. Benefiting from the flexible, compatible and programmable features of SDN, Customized Detection Engine, Network Topology Finder, Source Tracer and further user-developed security appliances are plugged in our framework to construct a SDN-based defensive system. Two main categories Python-based APIs are designed to support developers for further development. This system is designed and implemented based on the POX controller and Open vSwitch in the cloud computing environment. The efficiency of this framework is demonstrated by a sample IPS implementation and the performance of our framework is also evaluated.

ContributorsXiong, Zhengyang (Author) / Huang, Dijiang (Thesis advisor) / Xue, Guoliang (Committee member) / Dalvucu, Hasan (Committee member) / Arizona State University (Publisher)

Created2014

Filtering by