Locating Arrays: Construction, Analysis, and Robustness

Description

Modern computer systems are complex engineered systems involving a large collection of individual parts, each with many parameters, or factors, affecting system performance. One way to understand these complex systems and their performance is through experimentation. However, most modern computer systems involve such a large number of factors that thorough experimentation on all of them is impossible. An initial screening step is thus necessary to determine which factors are relevant to the system's performance and which factors can be eliminated from experimentation.

Factors may impact system performance in different ways. A factor at a specific level may significantly affect performance as a main effect, or in combination with other main effects as an interaction. For screening, it is necessary both to identify the presence of these effects and to locate the factors responsible for them. A locating array is a relatively new experimental design that causes every main effect and interaction to occur and distinguishes all sets of d main effects and interactions from each other in the tests where they occur. This design is therefore helpful in screening complex systems.
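The distinguishing property described above can be illustrated with a minimal sketch for the simplest case, d = 1: every t-way interaction must occur in at least one test, and no two interactions may occur in exactly the same set of tests. The function names and the list-of-tuples array representation are assumptions made for illustration; they are not taken from the thesis.

```python
from itertools import combinations, product

def rows_covering(array, columns, values):
    """Return the set of row indices in which the interaction occurs."""
    return frozenset(
        i for i, row in enumerate(array)
        if all(row[c] == v for c, v in zip(columns, values))
    )

def is_1_t_locating(array, t, levels):
    """Check the d = 1 locating property for t-way interactions.

    levels[c] is the number of levels of factor c (hypothetical encoding:
    levels are the integers 0 .. levels[c] - 1).
    """
    k = len(array[0])
    seen = set()
    for columns in combinations(range(k), t):
        for values in product(*(range(levels[c]) for c in columns)):
            rows = rows_covering(array, columns, values)
            if not rows:       # the interaction never occurs in any test
                return False
            if rows in seen:   # two interactions share the same set of tests
                return False
            seen.add(rows)
    return True

# Three tests on two binary factors: each main effect has its own row set.
tests = [(0, 0), (0, 1), (1, 0)]
print(is_1_t_locating(tests, t=1, levels=[2, 2]))
```

With only the tests `(0, 0)` and `(1, 1)`, the check fails, because level 0 of both factors occurs in exactly the same test and the two main effects cannot be told apart.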

The process of screening using locating arrays involves multiple steps. First, a locating array is constructed for all possibly significant factors. Next, the system is executed for all tests indicated by the locating array and a response is observed. Finally, the response is analyzed to identify the significant system factors for future experimentation. However, simply constructing a reasonably sized locating array for a large system is no easy task, and analyzing the response of the tests presents additional difficulties due to the large number of possible predictors and the inherent imbalance in the experimental design itself. Further complications can arise from noise in the system or errors in testing.

This thesis has three contributions. First, it provides an algorithm to construct locating arrays using the Lovász Local Lemma with Moser-Tardos resampling. Second, it gives an algorithm to analyze the system response efficiently. Finally, it studies the robustness of the analysis to the heavy-hitters assumption underlying the approach as well as to varying amounts of system noise.

Contributors: Seidel, Stephen (Author) / Syrotiuk, Violet R. (Thesis advisor) / Colbourn, Charles J (Committee member) / Montgomery, Douglas C. (Committee member) / Arizona State University (Publisher)

Created: 2018

Interaction Testing, Fault Location, and Anonymous Attribute-Based Authorization

Description

This dissertation studies three classes of combinatorial arrays with practical applications in testing, measurement, and security. Covering arrays are widely studied in software and hardware testing to indicate the presence of faulty interactions. Locating arrays extend covering arrays to achieve identification of the interactions causing a fault by requiring additional conditions on how interactions are covered in rows. This dissertation introduces a new class, the anonymizing arrays, to guarantee a degree of anonymity by bounding the probability a particular row is identified by the interaction presented. Similarities among these arrays lead to common algorithmic techniques for their construction which this dissertation explores. Differences arising from their application domains lead to the unique features of each class, requiring tailoring the techniques to the specifics of each problem.

One contribution of this work is a conditional expectation algorithm to build covering arrays via an intermediate combinatorial object. Conditional expectation efficiently finds intermediate-sized arrays that are particularly useful as ingredients for additional recursive algorithms. A cut-and-paste method creates large arrays from small ingredients. Performing transformations on the copies makes further improvements by reducing redundancy in the composed arrays and leads to fewer rows.

This work contains the first algorithm for constructing locating arrays for general values of $d$ and $t$. A randomized computational search algorithmic framework verifies if a candidate array is $(\bar{d},t)$-locating by partitioning the search space and performs random resampling if a candidate fails. Algorithmic parameters determine which columns to resample and when to add additional rows to the candidate array. Additionally, analysis is conducted on the performance of the algorithmic parameters to provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both.

This work proposes anonymizing arrays as a class related to covering arrays with a higher coverage requirement and constraints. The algorithms for covering and locating arrays are tailored to anonymizing array construction. An additional property, homogeneity, is introduced to meet the needs of attribute-based authorization. Two metrics, local and global homogeneity, are designed to compare anonymizing arrays with the same parameters. Finally, a post-optimization approach reduces the homogeneity of an anonymizing array.

Contributors: Lanus, Erin (Author) / Colbourn, Charles J (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Montgomery, Douglas C. (Committee member) / Syrotiuk, Violet R. (Committee member) / Arizona State University (Publisher)

Created: 2019

Comparing a commercial and an SDN-based load balancer in a campus network

Description

Commercial load balancers are often in use, and the production network at Arizona State University (ASU) is no exception. However, because the load balancer uses IP addresses, the solution does not apply to all applications. One such application is Rsyslog. This software processes syslog packets and stores them in files. The loss rate of incoming log packets is high due to the incoming rate of the data. The Rsyslog servers are overwhelmed by the continuous data stream. To solve this problem, a software defined networking (SDN) based load balancer is designed to perform transport-level load balancing over the incoming load to Rsyslog servers. In this solution the load is forwarded to one Rsyslog server at a time, according to one of a Round-Robin, Random, or Load-Based policy. This gives the other servers time to process the data they have received and prevents them from being overwhelmed. The evaluation of the proposed solution is conducted on a physical testbed with the same data feed as the commercial solution. The results suggest that the SDN-based load balancer is competitive with the commercial load balancer. Replacing the software OpenFlow switch with a hardware switch is likely to further improve the results.
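The three selection policies named in the abstract can be sketched as interchangeable functions that pick the next server. This is only an illustration: the function names, the server labels, and the reading of "Load-Based" as "pick the server with the smallest reported load" are assumptions, not details from the thesis, and the sketch does not model the SDN controller or OpenFlow rules.

```python
import itertools
import random

def round_robin(servers):
    """Cycle through the servers in a fixed order."""
    cycle = itertools.cycle(servers)
    return lambda load: next(cycle)

def random_policy(servers, rng=None):
    """Pick a server uniformly at random."""
    rng = rng or random.Random()
    return lambda load: rng.choice(servers)

def load_based(servers):
    """Pick the server with the smallest reported load (assumed metric)."""
    return lambda load: min(servers, key=lambda s: load[s])

servers = ["rsyslog-a", "rsyslog-b", "rsyslog-c"]  # hypothetical names
pick = round_robin(servers)
print([pick({}) for _ in range(4)])

pick = load_based(servers)
print(pick({"rsyslog-a": 5, "rsyslog-b": 2, "rsyslog-c": 9}))
```

Each policy takes the current load table as its argument, even when it ignores it, so a dispatcher can swap policies without changing its calling code.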

Contributors: Ghaffarinejad, Ashkan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Xue, Guoliang (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created: 2015

Performance optimization of linux networking for latency-sensitive virtual systems

Description

Virtual machines and containers have steadily improved their performance over time as a result of innovations in their architecture and software ecosystems. Network functions and workloads are increasingly migrating to virtual environments, supported by developments in software defined networking (SDN) and network function virtualization (NFV). Previous performance analyses of virtual systems in this context often ignore significant performance gains that can be achieved with practical modifications to hypervisor and host systems. In this thesis, the network performance of containers and virtual machines is measured with standard network performance tools. The performance of these systems utilizing a standard 3.18.20 Linux kernel is compared to that of a realtime-tuned variant of the same kernel. This thesis motivates improving determinism in virtual systems with modifications to host and guest kernels and thoughtful process isolation. With the system modifications described, the median TCP bandwidth of KVM virtual machines over bridged network interfaces is increased by 10.8% with a corresponding reduction in standard deviation of 87.6%. Docker containers see an 8.8% improvement in median bandwidth and a 4.4% reduction in standard deviation of TCP measurements using similar bridged networking. System tuning also reduces the standard deviation of TCP request/response latency (TCP RR) over bridged interfaces by 86.8% for virtual machines and 97.9% for containers. Hardware devices assigned to virtual systems also see reductions in variance, although not as noteworthy.

Contributors: Welch, James Matthew (Author) / Syrotiuk, Violet R. (Thesis advisor) / Wu, Carole-Jean (Committee member) / Speyer, Gil (Committee member) / Arizona State University (Publisher)

Created: 2015

Fixed verse generation using neural word embeddings

Description

For the past three decades, the design of an effective strategy for generating poetry that matches human creative capabilities and complexities has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed verse poetry generation using neural word embeddings. During the course of generation, a two-layered poetry classifier is developed. The first layer uses a lexicon-based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content with an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space.

The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% of the computer-generated poems were correctly identified by non-experts, while 65% of the computer-generated poems were correctly identified by experts. Although the system does not pass the Turing test, the results from the Turing test suggest an improvement of over 17% when compared to previous methods which use Turing tests to evaluate poetry generators.

Contributors: Magge, Arjun (Author) / Syrotiuk, Violet R. (Thesis advisor) / Baral, Chitta (Committee member) / Hogue, Cynthia (Committee member) / Bazzi, Rida (Committee member) / Arizona State University (Publisher)

Created: 2016

An evaluation of SDN based network virtualization techniques

Description

With the software-defined networking trend growing, several network virtualization controllers have been developed in recent years. These controllers, also called network hypervisors, attempt to manage physical SDN-based networks so that multiple tenants can safely share the same forwarding plane hardware without risk of being affected by or affecting other tenants. However, many areas remain unexplored by current network hypervisor implementations. This thesis presents and evaluates some of the features offered by network hypervisors, such as full header space availability, isolation, and transparent traffic forwarding capabilities for tenants. Flow setup time and throughput are also measured and compared among different network hypervisors. Three different network hypervisors are evaluated: FlowVisor, VeRTIGO and OpenVirteX. These virtualization tools are assessed with experiments conducted on three different testbeds: an emulated Mininet scenario, a physical single-switch testbed, and a remote GENI testbed. The results indicate that network hypervisors bring SDN flexibility to network virtualization, making it easier for network administrators to define with precision how the network is sliced and divided among tenants. This increased flexibility, however, may come at the cost of decreased performance, and also brings additional risks of interoperability due to a lack of standardization of virtualization methods.

Contributors: Stall Rechia, Felipe (Author) / Syrotiuk, Violet R. (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created: 2016

Analysis and visualization of OpenFlow rule conflicts

Description

In traditional networks the control and data plane are highly coupled, hindering development. With Software Defined Networking (SDN), the two planes are separated, allowing innovations on either one independently of the other. Here, the control plane is formed by the applications that specify an organization's policy and the data plane contains the forwarding logic. The application sends all commands to an SDN controller which then performs the requested action on behalf of the application. Generally, the requested action is a modification to the flow tables, present in the switches, to reflect a change in the organization's policy. There are a number of ways to control the network using the SDN principles, but the most widely used approach is OpenFlow.

With the applications now having direct access to the flow table entries, inconsistencies can easily arise in the flow table rules. Since the flow rules are structured similarly to firewall rules, the research done in analyzing and identifying firewall rule conflicts can be adapted to work with OpenFlow rules.
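As a rough illustration of the firewall-style analysis mentioned above, the sketch below flags pairs of rules whose match conditions overlap while their actions differ. It is a deliberately simplified model, not the detection logic implemented in the thesis: the rule dictionaries, field names, and action strings are hypothetical, fields are either an exact value or a wildcard, and priorities and address prefixes are ignored.

```python
from itertools import combinations

WILDCARD = None  # a field that is absent from a match is treated as a wildcard

def fields_overlap(a, b):
    """Two field values overlap if either is a wildcard or they are equal."""
    return a is WILDCARD or b is WILDCARD or a == b

def matches_overlap(m1, m2):
    """Two matches overlap if some packet could satisfy both of them."""
    keys = set(m1) | set(m2)
    return all(fields_overlap(m1.get(k), m2.get(k)) for k in keys)

def find_conflicts(rules):
    """Return index pairs of rules with overlapping matches and different actions."""
    conflicts = []
    for (i, r1), (j, r2) in combinations(enumerate(rules), 2):
        if matches_overlap(r1["match"], r2["match"]) and r1["action"] != r2["action"]:
            conflicts.append((i, j))
    return conflicts

rules = [
    {"match": {"ip_dst": "10.0.0.1"}, "action": "drop"},
    {"match": {"ip_dst": "10.0.0.1", "tcp_dst": 80}, "action": "output:2"},
    {"match": {"ip_dst": "10.0.0.2"}, "action": "drop"},
]
print(find_conflicts(rules))
```

Here the first two rules overlap on traffic to `10.0.0.1` port 80 and disagree on the action, so they are reported as a pair; the third rule matches a different destination and is not flagged.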

The main work of this thesis is to implement flow conflict detection logic in OpenDaylight and inspect the applicability of techniques in visualizing the conflicts. A hierarchical edge-bundling technique coupled with a Reingold-Tilford tree is employed to present the relationship between the conflicting rules. A table-driven approach is also implemented to display the details of each flow.

Both types of visualization are then tested for correctness by providing them with flows which are known to have conflicts. The conflicts were identified properly and displayed by the views.

Contributors: Natarajan, Janakarajan (Author) / Huang, Dijiang (Thesis advisor) / Syrotiuk, Violet R. (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Arizona State University (Publisher)

Created: 2016

Covering arrays: algorithms and asymptotics

Description

Modern software and hardware systems are composed of a large number of components. Often different components of a system interact with each other in unforeseen and undesired ways to cause failures. Covering arrays are a useful mathematical tool for testing all possible t-way interactions among the components of a system.
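To make the coverage requirement concrete, the following minimal sketch checks whether an array covers every t-way interaction: for each choice of t columns, every one of the v^t value combinations must appear in some row. The function name and the assumption that every component takes values 0 to v - 1 are illustrative choices, not notation from the thesis.

```python
from itertools import combinations

def is_covering_array(array, t, v):
    """Check that every t-way interaction over v symbols appears in some row."""
    k = len(array[0])
    needed = v ** t  # number of distinct value tuples per choice of t columns
    for columns in combinations(range(k), t):
        seen = {tuple(row[c] for c in columns) for row in array}
        if len(seen) < needed:
            return False
    return True

# Four tests cover all pairwise interactions among three binary components.
tests = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]
print(is_covering_array(tests, t=2, v=2))
```

Exhaustive testing of the same three components would need eight tests; the four rows above already exercise every pair of values in every pair of columns.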

The two major issues concerning covering arrays are explicit construction of a covering array, and exact or approximate determination of the covering array number, that is, the minimum size of a covering array. Although these problems have been investigated extensively for the last couple of decades, in this thesis we present significant improvements on both of these questions using tools from the probabilistic method and randomized algorithms.

First, a series of improvements is developed on the previously known upper bounds on covering array numbers. An estimate for the discrete Stein-Lovász-Johnson bound is derived and the Stein-Lovász-Johnson bound is improved upon using an alteration strategy. Then group actions on the set of symbols are explored to establish two asymptotic upper bounds on covering array numbers that are tighter than any of the presently known bounds.
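For orientation, a basic probabilistic upper bound of the kind being improved here can be derived in a few lines. The notation is an assumption for this note: $\mathrm{CAN}(t,k,v)$ denotes the covering array number for strength $t$, $k$ components, and $v$ symbols. In a uniformly random $N \times k$ array, a fixed $t$-way interaction is missing from every row with probability $(1 - v^{-t})^N$, and there are $\binom{k}{t} v^t$ interactions, so the expected number of uncovered interactions is

\[
\binom{k}{t} v^{t} \left(1 - \frac{1}{v^{t}}\right)^{N}.
\]

Whenever this expectation is below 1, some array with $N$ rows covers everything. Solving for $N$ gives a bound of the form usually associated with the Stein-Lovász-Johnson name:

\[
\mathrm{CAN}(t,k,v) \;\le\; \left\lceil \frac{\ln \binom{k}{t} + t \ln v}{\ln \dfrac{v^{t}}{v^{t}-1}} \right\rceil .
\]

This is only the textbook starting point; the discrete estimate and the alteration-based improvement described in the abstract are not reproduced here.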

Second, an algorithmic paradigm, called the two-stage framework, is introduced for covering array construction. A number of concrete algorithms from this framework are analyzed, and it is shown that they outperform current methods in the range of parameter values that are of practical relevance. In some cases, a reduction in the number of tests by more than 50% is achieved.

Third, the Lovász local lemma is applied on covering perfect hash families to obtain an upper bound on covering array numbers that is the tightest of all known bounds. This bound leads to a Moser-Tardos type algorithm that employs linear algebraic computation over finite fields to construct covering arrays. In some cases, this algorithm outperforms currently used methods by more than an 80% margin.

Finally, partial covering arrays are introduced to investigate a few practically relevant relaxations of the covering requirement. Using probabilistic methods, bounds are obtained on partial covering arrays that are significantly smaller than for covering arrays. Also, randomized algorithms are provided that construct such arrays in expected polynomial time.

Contributors: Sarakāra, Kauśika (Author) / Colbourn, Charles J. (Thesis advisor) / Czygrinow, Andrzej (Committee member) / Richa, Andréa W. (Committee member) / Syrotiuk, Violet R. (Committee member) / Arizona State University (Publisher)

Created: 2016

Cooperative multi-channel MAC protocols for wireless ad hoc networks

Description

Today, many wireless networks are single-channel systems. However, as the interest in wireless services increases, the contention by nodes to occupy the medium is more intense and interference worsens. One direction with the potential to increase system throughput is multi-channel systems. Multi-channel systems have been shown to reduce collisions and increase concurrency, thus producing better bandwidth usage. However, the well-known hidden- and exposed-terminal problems inherited from single-channel systems remain, and a new channel selection problem is introduced. In this dissertation, multi-channel medium access control (MAC) protocols are proposed for mobile ad hoc networks (MANETs) for nodes equipped with a single half-duplex transceiver, using more sophisticated physical layer technologies. These include code division multiple access (CDMA), orthogonal frequency division multiple access (OFDMA), and diversity. CDMA increases channel reuse, while OFDMA enables communication by multiple users in parallel. There is a challenge to using each technology in MANETs, where there is no fixed infrastructure or centralized control. CDMA suffers from the near-far problem, while OFDMA requires channel synchronization to decode the signal. As a result, CDMA and OFDMA are not yet widely used. Cooperative (diversity) mechanisms provide vital information to facilitate communication set-up between source-destination node pairs and help overcome limitations of physical layer technologies in MANETs. In this dissertation, the Cooperative CDMA-based Multi-channel MAC (CCM-MAC) protocol uses CDMA to enable concurrent transmissions on each channel. The Power-controlled CDMA-based Multi-channel MAC (PCC-MAC) protocol uses transmission power control at each node and mitigates collisions of control packets on the control channel by using different sizes of the spreading factor to have different processing gains for the control signals. The Cooperative Dual-access Multi-channel MAC (CDM-MAC) protocol combines the use of OFDMA and CDMA and minimizes channel interference by a resolvable balanced incomplete block design (BIBD). In each protocol, cooperating nodes help reduce the incidence of the multi-channel hidden- and exposed-terminal problems and help address the near-far problem of CDMA by supplying information. Simulation results show that each of the proposed protocols achieves significantly better system performance when compared to IEEE 802.11, other multi-channel protocols, and another CDMA-based protocol.

Contributors: Moon, Yuhan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Huang, Dijiang (Committee member) / Reisslein, Martin (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2010

Heuristics for Arc Routing Problems and Their Applications

Description

Arc Routing Problems (ARPs) are a type of routing problem that finds routes of minimum total cost covering the edges or arcs in a graph representing street or road networks. They find application in many essential services such as residential waste collection, winter gritting, and others. Because ARPs are NP-hard, solutions are usually found using heuristic methods. This dissertation contributes to heuristics for ARPs, with a focus on the Capacitated Arc Routing Problem (CARP) with additional constraints. In operations such as residential waste collection, vehicle breakdown disruptions occur frequently. A new variant, the Capacitated Arc Re-routing Problem for Vehicle Break-down (CARP-VB), is introduced to address the need to re-route using only remaining vehicles to avoid missing services. A new heuristic, Probe, is developed to solve CARP-VB. Experiments on benchmark instances show that Probe is better in reducing the makespan and hence effective in reducing delays and avoiding missing services. In addition to total cost, operators are also interested in solutions that are attractive, that is, routes that are contiguous, compact, and non-overlapping to manage the work. Operators may not adopt a solution that is not attractive even if it is optimum. They are also interested in solutions that are balanced in workload to meet equity requirements. A new multi-objective memetic algorithm, MA-ABC, is developed that optimizes three objectives: attractiveness, makespan, and total cost. On testing with benchmark instances, MA-ABC was found to be effective in providing attractive and balanced route solutions without affecting the total cost. Changes in the problem specification, such as demand and topology, occur frequently in business operations. Machine learning can be applied to learn the distribution behind these changes and generate solutions quickly at time of inference. Splice is a machine learning framework for CARP that generates closer to optimum solutions quickly using a graph neural network and deep Q-learning. Splice can solve several variants of node and arc routing problems using the same architecture without any modification. Splice was trained and tested using randomly generated instances. Splice generated solutions faster, and of better quality, than popular metaheuristics.

Contributors: Ramamoorthy, Muhilan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Forrest, Stephanie (Committee member) / Mirchandani, Pitu (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2022