Matching Items (21)
Filtering by

Clear all filters

153487-Thumbnail Image.png
Description
Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL,

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL, browsers like Mozilla Firefox and Google Chrome display an 'active' warning message in an attempt to stop the user from making a potentially dangerous decision of visiting the website and sharing confidential information like username-password, credit card information, social security number etc.

However, these warnings are not always successful at safeguarding the user from a phishing attack. On several occasions, users ignore these warnings and 'click through' them, eventually landing at the potentially dangerous website and giving away confidential information. Failure to understand the warning, failure to differentiate different types of browser warnings, diminishing trust on browser warnings due to repeated encounter are some of the reasons that make users ignore these warnings. It is important to address these factors in order to eventually improve a user’s reaction to these warnings.

In this thesis, I propose a novel design to improve the effectiveness and reliability of phishing warning messages. This design utilizes the name of the target website that a fake website is mimicking, to display a simple, easy to understand and interactive warning message with the primary objective of keeping the user away from a potentially spoof website.
ContributorsSharma, Satyabrata (Author) / Bazzi, Rida (Thesis advisor) / Walker, Erin (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)
Created2015
150212-Thumbnail Image.png
Description
This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 Gb, but with limited memory buffer size of 24 Mb. This thesis presents the study of the overhead of the update process as a function of the transaction rates and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium size applications and that the update can be achieved with no more than one minute of downtime.
ContributorsTyagi, Preetika (Author) / Bazzi, Rida (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011
150895-Thumbnail Image.png
Description
Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged subscribers. Minimal Perfect Hash Families (PHFs) have been shown to provide a basis for the construction of memory-efficient t-resilient Key Pre-distribution Schemes (KPSs) from multiple instances of 1-resilient KPSs. Using this technique, the task of constructing a large t-resilient BES is reduced to finding a near-minimal PHF of appropriate parameters. While combinatorial and probabilistic constructions exist for minimal PHFs with certain parameters, the complexity of constructing them in general is currently unknown. This thesis introduces a new type of hash family, called a Scattering Hash Family (ScHF), which is designed to allow for the scalable and ingredient-independent design of memory-efficient BESs for large parameters, specifically resilience and total number of subscribers. A general BES construction using ScHFs is shown, which constructs t-resilient KPSs from other KPSs of any resilience ≤w≤t. In addition to demonstrating how ScHFs can be used to produce BESs , this thesis explores several ScHF construction techniques. The initial technique demonstrates a probabilistic, non-constructive proof of existence for ScHFs . This construction is then derandomized into a direct, polynomial time construction of near-minimal ScHFs using the method of conditional expectations. As an alternative approach to direct construction, representing ScHFs as a k-restriction problem allows for the indirect construction of ScHFs via randomized post-optimization. Using the methods defined, ScHFs are constructed and the parameters' effects on solution size are analyzed. For large strengths, constructive techniques lose significant performance, and as such, asymptotic analysis is performed using the non-constructive existential results. This work concludes with an analysis of the benefits and disadvantages of BESs based on the constructed ScHFs. Due to the novel nature of ScHFs, the results of this analysis are used as the foundation for an empirical comparison between ScHF-based and PHF-based BESs . The primary bases of comparison are construction efficiency, key material requirements, and message transmission overhead.
ContributorsO'Brien, Devon James (Author) / Colbourn, Charles J (Thesis advisor) / Bazzi, Rida (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)
Created2012
154142-Thumbnail Image.png
Description
A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems are hard to manage and force network administrators to scale up (replacing with more powerful but expensive hardware) when their system can not handle the growing traffic. Software-based solutions have a limitation when dealing with a single large TCP flow. In recent years, with the fast developments of virtualization technology, a new trend of network function virtualization (NFV) is being adopted. Instead of using proprietary hardware, an NFV network infrastructure uses virtual machines running to implement network functions such as load balancers, firewalls, etc. In this thesis, a new load balancing system is designed and evaluated. This system is high performance and flexible. It can fully utilize the bandwidth between a load balancer and back end servers compared to traditional load balancers such as HAProxy. The experimental results show that using this NFV load balancer could have $n$ ($n$ is the number of back end servers) times better performance than HAProxy. Also, an extract, transform and load (ETL) application was implemented to demonstrate that this load balancer can shorten data load time. The experiment shows that when loading a large data set (18.3GB), our load balancer needs only 28\% less time than traditional load balancer.
ContributorsWu, Jinxuan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Bazzi, Rida (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)
Created2015
157028-Thumbnail Image.png
Description
Due to large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in different ways: Students Visualization, Recommendations for students, Students Modeling, Grouping Students, etc. A lot of programming assignments have the features like automating submissions, examining the test cases to verify the correctness,

Due to large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in different ways: Students Visualization, Recommendations for students, Students Modeling, Grouping Students, etc. A lot of programming assignments have the features like automating submissions, examining the test cases to verify the correctness, but limited studies compared different statistical techniques with latest frameworks, and interpreted models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore

and predict students’ performances. Multiple machine learning models and the model accuracy were evaluated based on the Shapley Additive Explanation.

The Cross-Validation shows the Gradient Boosting Decision Tree has the best precision 85.93% with average 82.90%. Features like Component grade, Due Date, Submission Times have higher impact than others. Baseline model received lower precision due to lack of non-linear fitting.
ContributorsTian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2019
136549-Thumbnail Image.png
Description
A primary goal in computer science is to develop autonomous systems. Usually, we provide computers with tasks and rules for completing those tasks, but what if we could extend this type of system to physical technology as well? In the field of programmable matter, researchers are tasked with developing synthetic

A primary goal in computer science is to develop autonomous systems. Usually, we provide computers with tasks and rules for completing those tasks, but what if we could extend this type of system to physical technology as well? In the field of programmable matter, researchers are tasked with developing synthetic materials that can change their physical properties \u2014 such as color, density, and even shape \u2014 based on predefined rules or continuous, autonomous collection of input. In this research, we are most interested in particles that can perform computations, bond with other particles, and move. In this paper, we provide a theoretical particle model that can be used to simulate the performance of such physical particle systems, as well as an algorithm to perform expansion, wherein these particles can be used to enclose spaces or even objects.
ContributorsLaff, Miles (Author) / Richa, Andrea (Thesis director) / Bazzi, Rida (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created2015-05
134339-Thumbnail Image.png
Description
Implementing a distributed algorithm is more complicated than implementing a non-distributed algorithm. This is because distributed systems involve coordination of different processes each of which has a partial view of the global system state. The only way to share information in a distributed system is by message passing. Task that

Implementing a distributed algorithm is more complicated than implementing a non-distributed algorithm. This is because distributed systems involve coordination of different processes each of which has a partial view of the global system state. The only way to share information in a distributed system is by message passing. Task that are straightforward in a non-distributed system, like deciding on the value of a global system state, can be quite complicated to achieve in a distributed system [1]. On top of the difficulties caused by the distributed nature of the computations, distributed systems typically need to be able to operate normally even if some of the nodes in the system are faulty which further adds to the uncertainty that processes have about the global state. Many factors make the implementation of a distributed algorithms difficult. Design patterns [2] are useful in simplifying the development of general algorithms. A design pattern describes a high level solution to a common, abstract problem that many systems may face. Common structural, creational, and behavioral problems are identified and elegantly solved by design patterns. By identifying features that an algorithm uses, and framing each feature as one of the common problems that a specific design pattern solves, designing a robust implementation of an algorithm becomes more manageable. In this way, design patterns can aid the implementation of algorithms. Unfortunately, design patterns are typically not discussed when developing distributed algorithms. Because correctly developing a distributed algorithm is difficult, many papers (eg. [1], [3], [4]) focus on verifying the correctness of the developed algorithm. Papers that are more practical ([5], [6]) establish the correctness of their algorithm and that their algorithm is efficient enough to be practical. However, papers on distributed algorithms usually make little mention of design patterns. The goal of this work was to gain experience implementing distributed systems including learning the application of design patterns and the application of related technical topics. This was achieved by implementing a currently unpublished algorithm that is tentatively called Bakery Consensus. Bakery Consensus is a replicated state-machine protocol that can tolerate servers with Byzantine faults, but assumes non-faulty clients. The algorithm also establishes non-skipping timestamps for each operation completed by the replicated state-machine. The design of the structure, communication, and creation of the different system parts depended heavily upon the book Design Patterns [2]. After implementing the system, the success of the in implementing its various parts was based upon their ability to satisfy the SOLID [7] principles as well as their ability to establish low coupling and high cohesion [8]. The rest of this paper is organized as follows. We begin by providing background information about distributed algorithms, including replicated state-machine protocols and the Bakery Consensus algorithm. Section 3 gives a background on several design patterns and software engineering principles that were used in the development process. Section 4 discusses the well designed parts of the system that used design patterns, and how these design patterns were chosen. Section 5 discusses well designed system parts that relied upon other technical topics. Section 6 discusses system parts that need redesign. The conclusion summarizes what was accomplished by the implementation process and the lessons learned about design patterns for distributed algorithms.
ContributorsStoutenburg, Tristan Kaleb (Author) / Bazzi, Rida (Thesis director) / Richa, Andrea (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2017-05
153583-Thumbnail Image.png
Description
When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack

When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack a structured description as provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult, and motivating the need for analysis methods. The analysis of ad-hoc workflows using compiler techniques does not address dynamic languages - a program has so few constrains that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding issues with dynamics.

The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior; including elements which static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools.

The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Methods are implemented in Python and are demonstrated on real world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a workflow.
ContributorsAcűna, Ruben (Author) / Bazzi, Rida (Thesis advisor) / Lacroix, Zoé (Thesis advisor) / Candan, Kasim (Committee member) / Arizona State University (Publisher)
Created2015
154497-Thumbnail Image.png
Description
Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is Sharp-P Hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-fi communication.
ContributorsPeruvemba Kumar, Ganesh (Author) / Berman, Spring M (Thesis advisor) / Fainekos, Georgios (Thesis advisor) / Bazzi, Rida (Committee member) / Syrotiuk, Violet (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)
Created2016
154657-Thumbnail Image.png
Description
Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a

Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a metric that defines the probability of system-failure (reliability) through analytical models -- is the most effective mechanism for our current estimation and early design space exploration needs. Previous vulnerability estimation tools are based around the Sim-Alpha simulator which has been to shown to have several limitations. In this thesis, I present gemV: an accurate and comprehensive vulnerability estimation tool based on gem5. Gem5 is a popular cycle-accurate micro-architectural simulator that can model several different processor models in close to real hardware form. GemV can be used for fast and early design space exploration and also evaluate the protection afforded by commodity processors. gemV is comprehensive, since it models almost all sequential components of the processor. gemV is accurate because of fine-grain vulnerability tracking, accurate vulnerability modeling of squashed instructions, and accurate vulnerability modeling of shared data structures in gem5. gemV has been thoroughly validated against extensive fault injection experiments and achieves a 97\% accuracy with 95\% confidence. A micro-architect can use gemV to discover micro-architectural variants of a processor that minimize vulnerability for allowed performance penalty. A software developer can use gemV to explore the performance-vulnerability trade-off by choosing different algorithms and compiler optimizations, while the system designer can use gemV to explore the performance-vulnerability trade-offs of choosing different Insruction Set Architectures (ISA).
ContributorsTanikella, Srinivas Karthik (Author) / Shrivastava, Aviral (Thesis advisor) / Bazzi, Rida (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2016