Search Content

Using contextual information to improve phishing warning effectiveness

Description

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL,…

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL, browsers like Mozilla Firefox and Google Chrome display an 'active' warning message in an attempt to stop the user from making a potentially dangerous decision of visiting the website and sharing confidential information like username-password, credit card information, social security number etc.

However, these warnings are not always successful at safeguarding the user from a phishing attack. On several occasions, users ignore these warnings and 'click through' them, eventually landing at the potentially dangerous website and giving away confidential information. Failure to understand the warning, failure to differentiate different types of browser warnings, diminishing trust on browser warnings due to repeated encounter are some of the reasons that make users ignore these warnings. It is important to address these factors in order to eventually improve a user’s reaction to these warnings.

In this thesis, I propose a novel design to improve the effectiveness and reliability of phishing warning messages. This design utilizes the name of the target website that a fake website is mimicking, to display a simple, easy to understand and interactive warning message with the primary objective of keeping the user away from a potentially spoof website.

ContributorsSharma, Satyabrata (Author) / Bazzi, Rida (Thesis advisor) / Walker, Erin (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created2015

Client-driven dynamic database updates

Description

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management…

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 Gb, but with limited memory buffer size of 24 Mb. This thesis presents the study of the overhead of the update process as a function of the transaction rates and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium size applications and that the update can be achieved with no more than one minute of downtime.

ContributorsTyagi, Preetika (Author) / Bazzi, Rida (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2011

The design and analysis of hash families for use in broadcast encryption

Description

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged…

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged subscribers. Minimal Perfect Hash Families (PHFs) have been shown to provide a basis for the construction of memory-efficient t-resilient Key Pre-distribution Schemes (KPSs) from multiple instances of 1-resilient KPSs. Using this technique, the task of constructing a large t-resilient BES is reduced to finding a near-minimal PHF of appropriate parameters. While combinatorial and probabilistic constructions exist for minimal PHFs with certain parameters, the complexity of constructing them in general is currently unknown. This thesis introduces a new type of hash family, called a Scattering Hash Family (ScHF), which is designed to allow for the scalable and ingredient-independent design of memory-efficient BESs for large parameters, specifically resilience and total number of subscribers. A general BES construction using ScHFs is shown, which constructs t-resilient KPSs from other KPSs of any resilience ≤w≤t. In addition to demonstrating how ScHFs can be used to produce BESs , this thesis explores several ScHF construction techniques. The initial technique demonstrates a probabilistic, non-constructive proof of existence for ScHFs . This construction is then derandomized into a direct, polynomial time construction of near-minimal ScHFs using the method of conditional expectations. As an alternative approach to direct construction, representing ScHFs as a k-restriction problem allows for the indirect construction of ScHFs via randomized post-optimization. Using the methods defined, ScHFs are constructed and the parameters' effects on solution size are analyzed. For large strengths, constructive techniques lose significant performance, and as such, asymptotic analysis is performed using the non-constructive existential results. This work concludes with an analysis of the benefits and disadvantages of BESs based on the constructed ScHFs. Due to the novel nature of ScHFs, the results of this analysis are used as the foundation for an empirical comparison between ScHF-based and PHF-based BESs . The primary bases of comparison are construction efficiency, key material requirements, and message transmission overhead.

ContributorsO'Brien, Devon James (Author) / Colbourn, Charles J (Thesis advisor) / Bazzi, Rida (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)

Created2012

A network function virtualization based load balancer for TCP

Description

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems…

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems are hard to manage and force network administrators to scale up (replacing with more powerful but expensive hardware) when their system can not handle the growing traffic. Software-based solutions have a limitation when dealing with a single large TCP flow. In recent years, with the fast developments of virtualization technology, a new trend of network function virtualization (NFV) is being adopted. Instead of using proprietary hardware, an NFV network infrastructure uses virtual machines running to implement network functions such as load balancers, firewalls, etc. In this thesis, a new load balancing system is designed and evaluated. This system is high performance and flexible. It can fully utilize the bandwidth between a load balancer and back end servers compared to traditional load balancers such as HAProxy. The experimental results show that using this NFV load balancer could have $n$ ($n$ is the number of back end servers) times better performance than HAProxy. Also, an extract, transform and load (ETL) application was implemented to demonstrate that this load balancer can shorten data load time. The experiment shows that when loading a large data set (18.3GB), our load balancer needs only 28\% less time than traditional load balancer.

ContributorsWu, Jinxuan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Bazzi, Rida (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2015

Novel Image Representations and Learning Tasks

Description

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is…

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is generalization of supervised learning, is one

example of task learning that is discussed. In particular, a novel non-parametric k-

NN-based multiple-instance learning is proposed, which is shown to outperform other

existing approaches. This solution is applied to a diabetic retinopathy pathology

detection problem eectively.

In cases of representation learning, generality of neural features are investigated

rst. This investigation leads to some critical understanding and results in feature

generality among datasets. The possibility of learning from a mentor network instead

of from labels is then investigated. Distillation of dark knowledge is used to eciently

mentor a small network from a pre-trained large mentor network. These studies help

in understanding representation learning with smaller and compressed networks.

ContributorsVenkatesan, Ragav (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2017

Mining Marked Nodes in Large Graphs

Description

With the rise of the Big Data Era, an exponential amount of network data is being generated at an unprecedented rate across a wide-range of high impact micro and macro areas of research---from protein interaction to social networks. The critical challenge is translating this large scale network data into actionable…

With the rise of the Big Data Era, an exponential amount of network data is being generated at an unprecedented rate across a wide-range of high impact micro and macro areas of research---from protein interaction to social networks. The critical challenge is translating this large scale network data into actionable information.

A key task in the data translation is the analysis of network connectivity via marked nodes---the primary focus of our research. We have developed a framework for analyzing network connectivity via marked nodes in large scale graphs, utilizing novel algorithms in three interrelated areas: (1) analysis of a single seed node via it’s ego-centric network (AttriPart algorithm); (2) pathway identification between two seed nodes (K-Simple Shortest Paths Multithreaded and Search Reduced (KSSPR) algorithm); and (3) tree detection, defining the interaction between three or more seed nodes (Shortest Path MST algorithm).

In an effort to address both fundamental and applied research issues, we have developed the LocalForcasting algorithm to explore how network connectivity analysis can be applied to local community evolution and recommender systems. The goal is to apply the LocalForecasting algorithm to various domains---e.g., friend suggestions in social networks or future collaboration in co-authorship networks. This algorithm utilizes link prediction in combination with the AttriPart algorithm to predict future connections in local graph partitions.

Results show that our proposed AttriPart algorithm finds up to 1.6x denser local partitions, while running approximately 43x faster than traditional local partitioning techniques (PageRank-Nibble). In addition, our LocalForecasting algorithm demonstrates a significant improvement in the number of nodes and edges correctly predicted over baseline methods. Furthermore, results for the KSSPR algorithm demonstrate a speed-up of up to 2.5x the standard k-simple shortest paths algorithm.

ContributorsFreitas, Scott (Author) / Tong, Hanghang (Thesis advisor) / Maciejewski, Ross (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2018

Towards Learning Representations in Visual Computing Tasks

Description

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle…

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data such as missing samples and noisy input caused by the undesired, external factors of variation. It should also reduce the data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. Next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower dimensional manifold. The latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent features seek a specific structure such as sparsity or low-rank into the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network. Its parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically-relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For both these tasks, I propose novel deep architectures and show significant improvement over the previous state-of-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.

ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2017

Perturbation Robust Representations of Topological Persistence Diagrams

Description

Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision: including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well…

Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision: including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points and hence it is not straightforward to fuse them with features used for contemporary machine learning tools like deep-nets. In this paper theoretically well-grounded approaches to develop novel perturbation robust topological representations are presented, with the long-term view of making them amenable to fusion with contemporary learning architectures. The proposed representation lives on a Grassmann manifold and hence can be efficiently used in machine learning pipelines.

The proposed representation.The efficacy of the proposed descriptor was explored on three applications: view-invariant activity analysis, 3D shape analysis, and non-linear dynamical modeling. Favorable results in both high-level recognition performance and improved performance in reduction of time-complexity when compared to other baseline methods are obtained.

ContributorsThopalli, Kowshik (Author) / Turaga, Pavan Kumar (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2017

Knowledge and Reasoning for Image Understanding

Description

Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques, that are used to locate (“where”), characterize and recognize (“what”) objects, regions, and their attributes in the image. However, the notion of “understanding” (and the goal of artificial intelligent machines) goes beyond…

Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques, that are used to locate (“where”), characterize and recognize (“what”) objects, regions, and their attributes in the image. However, the notion of “understanding” (and the goal of artificial intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty. Thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities that are required to answer questions about an image. Answering questions about images require primarily three components: Image Understanding, question (natural language) understanding, and reasoning based on knowledge. Any question, asking beyond what can be directly seen, requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning.

Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utilities of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of the previous work that utilized background knowledge and reasoning in understanding images. This survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision and reasoning-based methods to solve several applications and show that these approaches benefit in terms of accuracy and interpretability from the explicit use of knowledge and reasoning. We propose novel knowledge representations of image, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering, image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision, and reasoning based on ontological knowledge; and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge and reasoning modules together and achieve large performance boosts over state-of-the-art methods.

ContributorsAditya, Somak (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Aloimonos, Yiannis (Committee member) / Lee, Joohyung (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2018

Multi-layered HITS on Multi-sourced Networks

Description

Network mining has been attracting a lot of research attention because of the prevalence of networks. As the world is becoming increasingly connected and correlated, networks arising from inter-dependent application domains are often collected from different sources, forming the so-called multi-sourced networks. Examples of such multi-sourced networks include critical infrastructure…

Network mining has been attracting a lot of research attention because of the prevalence of networks. As the world is becoming increasingly connected and correlated, networks arising from inter-dependent application domains are often collected from different sources, forming the so-called multi-sourced networks. Examples of such multi-sourced networks include critical infrastructure networks, multi-platform social networks, cross-domain collaboration networks, and many more. Compared with single-sourced network, multi-sourced networks bear more complex structures and therefore could potentially contain more valuable information.

This thesis proposes a multi-layered HITS (Hyperlink-Induced Topic Search) algorithm to perform the ranking task on multi-sourced networks. Specifically, each node in the network receives an authority score and a hub score for evaluating the value of the node itself and the value of its outgoing links respectively. Based on a recent multi-layered network model, which allows more flexible dependency structure across different sources (i.e., layers), the proposed algorithm leverages both within-layer smoothness and cross-layer consistency. This essentially allows nodes from different layers to be ranked accordingly. The multi-layered HITS is formulated as a regularized optimization problem with non-negative constraint and solved by an iterative update process. Extensive experimental evaluations demonstrate the effectiveness and explainability of the proposed algorithm.

ContributorsYu, Haichao (Author) / Tong, Hanghang (Thesis advisor) / He, Jingrui (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2018

Filtering by