Search Content

Using contextual information to improve phishing warning effectiveness

Description

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL,…

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL, browsers like Mozilla Firefox and Google Chrome display an 'active' warning message in an attempt to stop the user from making a potentially dangerous decision of visiting the website and sharing confidential information like username-password, credit card information, social security number etc.

However, these warnings are not always successful at safeguarding the user from a phishing attack. On several occasions, users ignore these warnings and 'click through' them, eventually landing at the potentially dangerous website and giving away confidential information. Failure to understand the warning, failure to differentiate different types of browser warnings, diminishing trust on browser warnings due to repeated encounter are some of the reasons that make users ignore these warnings. It is important to address these factors in order to eventually improve a user’s reaction to these warnings.

In this thesis, I propose a novel design to improve the effectiveness and reliability of phishing warning messages. This design utilizes the name of the target website that a fake website is mimicking, to display a simple, easy to understand and interactive warning message with the primary objective of keeping the user away from a potentially spoof website.

ContributorsSharma, Satyabrata (Author) / Bazzi, Rida (Thesis advisor) / Walker, Erin (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created2015

Client-driven dynamic database updates

Description

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management…

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 Gb, but with limited memory buffer size of 24 Mb. This thesis presents the study of the overhead of the update process as a function of the transaction rates and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium size applications and that the update can be achieved with no more than one minute of downtime.

ContributorsTyagi, Preetika (Author) / Bazzi, Rida (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2011

Time efficient and quality effective K nearest neighbor search in high dimension space

Description

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and…

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and LSH-Forest improve upon the basic LSH algorithm by varying hash bucket size dynamically at query time, so these two algorithms can answer different KNN queries adaptively. However, these two algorithms need a data access post-processing step after candidates' collection in order to get the final answer to the KNN query. In this thesis, Multi-Probe LSH with data access post-processing (Multi-Probe LSH with DAPP) algorithm and LSH-Forest with data access post-processing (LSH-Forest with DAPP) algorithm are improved by replacing the costly data access post-processing (DAPP) step with a much faster histogram-based post-processing (HBPP). Two HBPP algorithms: LSH-Forest with HBPP and Multi- Probe LSH with HBPP are presented in this thesis, both of them achieve the three goals for KNN search in large scale high dimensional data set: high search quality, high time efficiency, high space efficiency. None of the previous KNN algorithms can achieve all three goals. More specifically, it is shown that HBPP algorithms can always achieve high search quality (as good as LSH-Forest with DAPP and Multi-Probe LSH with DAPP) with much less time cost (one to several orders of magnitude speedup) and same memory usage. It is also shown that with almost same time cost and memory usage, HBPP algorithms can always achieve better search quality than LSH-Forest with random pick (LSH-Forest with RP) and Multi-Probe LSH with random pick (Multi-Probe LSH with RP). Moreover, to achieve a very high search quality, Multi-Probe with HBPP is always a better choice than LSH-Forest with HBPP, regardless of the distribution, size and dimension number of the data set.

ContributorsYu, Renwei (Author) / Candan, Kasim S (Thesis advisor) / Sapino, Maria L (Committee member) / Chen, Yi (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2011

Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily…

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

ContributorsLiu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created2011

An information diffusion approach to detecting emotional contagion in online social networks

Description

Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around…

Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around the world. Given the fundamentally emotional nature of humans and the amount of emotional content that appears in Web 2.0 content, it is important to understand how such websites can affect the emotions of users. This work attempts to determine whether emotion spreads through an online social network (OSN). To this end, a method is devised that employs a model based on a general threshold diffusion model as a classifier to predict the propagation of emotion between users and their friends in an OSN by way of mood-labeled blog entries. The model generalizes existing information diffusion models in that the state machine representation of a node is generalized from being binary to having n-states in order to support n class labels necessary to model emotional contagion. In the absence of ground truth, the prediction accuracy of the model is benchmarked with a baseline method that predicts the majority label of a user's emotion label distribution. The model significantly outperforms the baseline method in terms of prediction accuracy. The experimental results make a strong case for the existence of emotional contagion in OSNs in spite of possible alternative arguments such confounding influence and homophily, since these alternatives are likely to have negligible effect in a large dataset or simply do not apply to the domain of human emotions. A hybrid manual/automated method to map mood-labeled blog entries to a set of emotion labels is also presented, which enables the application of the model to a large set (approximately 900K) of blog entries from LiveJournal.

ContributorsCole, William David, M.S (Author) / Liu, Huan (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Candan, Kasim S (Committee member) / Arizona State University (Publisher)

Created2011

The design and analysis of hash families for use in broadcast encryption

Description

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged…

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged subscribers. Minimal Perfect Hash Families (PHFs) have been shown to provide a basis for the construction of memory-efficient t-resilient Key Pre-distribution Schemes (KPSs) from multiple instances of 1-resilient KPSs. Using this technique, the task of constructing a large t-resilient BES is reduced to finding a near-minimal PHF of appropriate parameters. While combinatorial and probabilistic constructions exist for minimal PHFs with certain parameters, the complexity of constructing them in general is currently unknown. This thesis introduces a new type of hash family, called a Scattering Hash Family (ScHF), which is designed to allow for the scalable and ingredient-independent design of memory-efficient BESs for large parameters, specifically resilience and total number of subscribers. A general BES construction using ScHFs is shown, which constructs t-resilient KPSs from other KPSs of any resilience ≤w≤t. In addition to demonstrating how ScHFs can be used to produce BESs , this thesis explores several ScHF construction techniques. The initial technique demonstrates a probabilistic, non-constructive proof of existence for ScHFs . This construction is then derandomized into a direct, polynomial time construction of near-minimal ScHFs using the method of conditional expectations. As an alternative approach to direct construction, representing ScHFs as a k-restriction problem allows for the indirect construction of ScHFs via randomized post-optimization. Using the methods defined, ScHFs are constructed and the parameters' effects on solution size are analyzed. For large strengths, constructive techniques lose significant performance, and as such, asymptotic analysis is performed using the non-constructive existential results. This work concludes with an analysis of the benefits and disadvantages of BESs based on the constructed ScHFs. Due to the novel nature of ScHFs, the results of this analysis are used as the foundation for an empirical comparison between ScHF-based and PHF-based BESs . The primary bases of comparison are construction efficiency, key material requirements, and message transmission overhead.

ContributorsO'Brien, Devon James (Author) / Colbourn, Charles J (Thesis advisor) / Bazzi, Rida (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)

Created2012

A network function virtualization based load balancer for TCP

Description

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems…

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems are hard to manage and force network administrators to scale up (replacing with more powerful but expensive hardware) when their system can not handle the growing traffic. Software-based solutions have a limitation when dealing with a single large TCP flow. In recent years, with the fast developments of virtualization technology, a new trend of network function virtualization (NFV) is being adopted. Instead of using proprietary hardware, an NFV network infrastructure uses virtual machines running to implement network functions such as load balancers, firewalls, etc. In this thesis, a new load balancing system is designed and evaluated. This system is high performance and flexible. It can fully utilize the bandwidth between a load balancer and back end servers compared to traditional load balancers such as HAProxy. The experimental results show that using this NFV load balancer could have $n$ ($n$ is the number of back end servers) times better performance than HAProxy. Also, an extract, transform and load (ETL) application was implemented to demonstrate that this load balancer can shorten data load time. The experiment shows that when loading a large data set (18.3GB), our load balancer needs only 28\% less time than traditional load balancer.

ContributorsWu, Jinxuan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Bazzi, Rida (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2015

Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Due to large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in different ways: Students Visualization, Recommendations for students, Students Modeling, Grouping Students, etc. A lot of programming assignments have the features like automating submissions, examining the test cases to verify the correctness,…

Due to large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in different ways: Students Visualization, Recommendations for students, Students Modeling, Grouping Students, etc. A lot of programming assignments have the features like automating submissions, examining the test cases to verify the correctness, but limited studies compared different statistical techniques with latest frameworks, and interpreted models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore

and predict students’ performances. Multiple machine learning models and the model accuracy were evaluated based on the Shapley Additive Explanation.

The Cross-Validation shows the Gradient Boosting Decision Tree has the best precision 85.93% with average 82.90%. Features like Component grade, Due Date, Submission Times have higher impact than others. Baseline model received lower precision due to lack of non-linear fitting.

ContributorsTian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2019

Digital Fountain for Multi-node Aggregation of Data in Blockchains

Description

Blockchain scalability is one of the issues that concerns its current adopters. The current popular blockchains have initially been designed with imperfections that in- troduce fundamental bottlenecks which limit their ability to have a higher throughput and a lower latency.

One of the major bottlenecks for existing blockchain technologies is fast…

Blockchain scalability is one of the issues that concerns its current adopters. The current popular blockchains have initially been designed with imperfections that in- troduce fundamental bottlenecks which limit their ability to have a higher throughput and a lower latency.

One of the major bottlenecks for existing blockchain technologies is fast block propagation. A faster block propagation enables a miner to reach a majority of the network within a time constraint and therefore leading to a lower orphan rate and better profitability. In order to attain a throughput that could compete with the current state of the art transaction processing, while also keeping the block intervals same as today, a 24.3 Gigabyte block will be required every 10 minutes with an average transaction size of 500 bytes, which translates to 48600000 transactions every 10 minutes or about 81000 transactions per second.

In order to synchronize such large blocks faster across the network while maintain- ing consensus by keeping the orphan rate below 50%, the thesis proposes to aggregate partial block data from multiple nodes using digital fountain codes. The advantages of using a fountain code is that all connected peers can send part of data in an encoded form. When the receiving peer has enough data, it then decodes the information to reconstruct the block. Along with them sending only part information, the data can be relayed over UDP, instead of TCP, improving upon the speed of propagation in the current blockchains. Fountain codes applied in this research are Raptor codes, which allow construction of infinite decoding symbols. The research, when applied to blockchains, increases success rate of block delivery on decode failures.

ContributorsChawla, Nakul (Author) / Boscovic, Dragan (Thesis advisor) / Candan, Kasim S (Thesis advisor) / Zhao, Ming (Committee member) / Arizona State University (Publisher)

Created2018

Expansion Algorithms in Self-Organizing Particle Systems

Description

A primary goal in computer science is to develop autonomous systems. Usually, we provide computers with tasks and rules for completing those tasks, but what if we could extend this type of system to physical technology as well? In the field of programmable matter, researchers are tasked with developing synthetic…

A primary goal in computer science is to develop autonomous systems. Usually, we provide computers with tasks and rules for completing those tasks, but what if we could extend this type of system to physical technology as well? In the field of programmable matter, researchers are tasked with developing synthetic materials that can change their physical properties \u2014 such as color, density, and even shape \u2014 based on predefined rules or continuous, autonomous collection of input. In this research, we are most interested in particles that can perform computations, bond with other particles, and move. In this paper, we provide a theoretical particle model that can be used to simulate the performance of such physical particle systems, as well as an algorithm to perform expansion, wherein these particles can be used to enclose spaces or even objects.

ContributorsLaff, Miles (Author) / Richa, Andrea (Thesis director) / Bazzi, Rida (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05