Matching Items (541)

Description
This thesis introduces a requirement-based regression test selection approach in an agile development context. Regression testing is critical in ensuring software quality but demands substantial time and resources. The rise of agile methodologies emphasizes the need for swift, iterative software delivery, requiring efficient regression testing. Although executing all existing test cases is the most thorough approach, it becomes impractical and resource-intensive for large real-world projects. Regression test selection emerges as a solution to this challenge, focusing on identifying a subset of test cases that efficiently uncover potential faults due to changes in the existing code. Existing literature on regression test selection in agile settings presents strategies that may only partially embrace agile characteristics. This research proposes a regression test selection method by utilizing data from user stories—agile's equivalent of requirements—and the associated business value spanning successive releases to pinpoint regression test cases. Given that value is a chief metric in agile, and testing—particularly regression testing—is often viewed more as value preservation than creation, the approach in this thesis demonstrates that integrating user stories and business value can lead to notable advancements in agile regression testing efficiency.
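As a hedged illustration of the selection idea, the sketch below maps tests to the user stories they cover and ranks them by the business value of stories changed in the current release. All names, fields, and the scoring rule are hypothetical simplifications, not the thesis's actual algorithm.

```python
# Hypothetical sketch: value-weighted regression test selection.
from dataclasses import dataclass

@dataclass
class UserStory:
    story_id: str
    business_value: int          # e.g., value points assigned during planning
    changed_this_release: bool   # story touched by code changes this iteration

@dataclass
class TestCase:
    test_id: str
    covers: set                  # story_ids this test exercises

def select_regression_tests(stories, tests, budget):
    """Keep up to `budget` tests, preferring those that cover the
    highest total business value among changed stories."""
    changed = {s.story_id: s.business_value
               for s in stories if s.changed_this_release}
    def score(t):
        return sum(changed.get(sid, 0) for sid in t.covers)
    ranked = sorted(tests, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:budget]

stories = [UserStory("US-1", 8, True), UserStory("US-2", 3, False),
           UserStory("US-3", 5, True)]
tests = [TestCase("T-1", {"US-1"}), TestCase("T-2", {"US-2"}),
         TestCase("T-3", {"US-1", "US-3"})]
print([t.test_id for t in select_regression_tests(stories, tests, budget=2)])
# ['T-3', 'T-1']: tests tied to high-value changed stories run first
```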
Contributors: Mondal, Aniruddha (Author) / Gary, Kevin (Thesis advisor) / Bansal, Srividya (Thesis advisor) / Tuzmen, Ayca (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
In image classification tasks, images are often corrupted by spatial transformations like translations and rotations. In this work, I utilize an existing method that uses the Fourier series expansion to generate a rotation- and translation-invariant representation of closed contours found in sketches, aiming to attenuate the effects of distribution shift caused by those transformations. I use this technique to transform input images into one of two invariant representations, a Fourier series representation and a corrected raster image representation, prior to passing them to a neural network for classification. The architectures used include convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), and graph neural networks (GNNs). I compare the performance of this method to data augmentation during training, the standard approach for addressing distribution shift, to see which strategy yields the best performance when evaluated against a test set with rotations and translations applied. I include experiments where the augmentations applied during training both do and do not accurately reflect the transformations encountered at test time. Additionally, I investigate the robustness of both approaches to high-frequency noise. In each experiment, I also compare training efficiency across models. I conduct experiments on three datasets: the MNIST handwritten digit dataset, a custom dataset (QD-3) consisting of three classes of geometric figures from the Quick, Draw! hand-drawn sketch dataset, and another custom dataset (QD-345) featuring sketches from all 345 classes found in Quick, Draw!. On the smaller problem spaces of MNIST and QD-3, the networks utilizing the Fourier-based technique to attenuate distribution shift perform competitively with the standard data augmentation strategy. On the more complex problem space of QD-345, the networks using the Fourier technique do not achieve the same test performance as correctly applied data augmentation. However, they still outperform instances where train-time augmentations mispredict test-time transformations, and they outperform a naive baseline model in which no strategy is used to attenuate distribution shift. Overall, this work provides evidence that strategies which attempt to directly mitigate distribution shift, rather than simply increasing the diversity of the training data, can be successful when certain conditions hold.
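The invariance claim rests on standard properties of the Fourier coefficients of a closed contour: translation moves only the DC term, and rotation changes only a global phase. Below is a minimal sketch of the generic Fourier-descriptor recipe; the thesis's exact pipeline, including the corrected raster representation, differs in detail.

```python
# Generic Fourier-descriptor sketch, not the thesis's exact method.
import numpy as np

def fourier_descriptor(contour_xy, n_coeffs=16):
    """contour_xy: (N, 2) array of points tracing a closed contour."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # points as complex numbers
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                    # drop DC term  -> translation invariance
    mags = np.abs(coeffs)              # drop phase    -> rotation invariance
    mags /= mags[1]                    # normalize     -> scale invariance
    return mags[1:n_coeffs + 1]

theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.stack([2 * np.cos(theta), np.sin(theta)], axis=1)
moved = ellipse @ np.array([[0.0, -1.0], [1.0, 0.0]]) + 5.0  # rotate + translate
print(np.allclose(fourier_descriptor(ellipse), fourier_descriptor(moved)))
# True: the descriptor is unchanged by the transformation
```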
Contributors: Watson, Matthew (Author) / Yang, Yezhou (Thesis advisor) / Kerner, Hannah (Committee member) / Yang, Yingzhen (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that concerns the processing of natural language into structured, machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format, which consists of a subject, predicate, and object. The extraction of RDF triples from natural language is an essential step toward importing data into web ontologies as part of the linked open data cloud on the Semantic Web. There are a number of related techniques for extraction of triples from plain natural language text, including but not limited to ClausIE, OLLIE, Reverb, and DeepEx. This study aims to reduce the dependency on conventional machine learning models, since they require training datasets and are not easily customizable or explainable. By leveraging a context-free grammar (CFG) based model, this thesis aims to address some of these issues while minimizing the trade-offs in performance and accuracy. Furthermore, a deep dive is conducted to analyze the strengths and limitations of the proposed approach.
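For intuition, here is a toy sketch of grammar-driven triple extraction using NLTK's chart parser. The grammar and sentence are illustrative placeholders, far simpler than the grammar developed in the thesis.

```python
# Toy CFG-based triple extraction; illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> DT N | N
VP -> V NP
DT -> 'the'
N  -> 'cat' | 'mouse'
V  -> 'chased'
""")
parser = nltk.ChartParser(grammar)

def extract_triple(tokens):
    for tree in parser.parse(tokens):
        subj = " ".join(tree[0].leaves())        # NP under S
        vp = tree[1]                             # VP under S
        pred = " ".join(vp[0].leaves())          # V under VP
        obj = " ".join(vp[1].leaves())           # NP under VP
        return (subj, pred, obj)
    return None

print(extract_triple("the cat chased the mouse".split()))
# ('the cat', 'chased', 'the mouse')
```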
Contributors: Singh, Varun (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of training data for downstream or "Held-In" tasks. Often in real-world situations, there is a scarcity of data available for finetuning, falling somewhere between few-shot inference and fully supervised finetuning. In this work, I demonstrate the sample efficiency of instruction-tuned models over various tasks by estimating the minimal training data required by downstream or "Held-In" tasks to perform transfer learning and match the performance of state-of-the-art (SOTA) supervised models. I conduct experiments on 119 tasks from Super Natural Instructions (SuperNI) in both the single-task learning / expert modelling (STL) and multi-task learning (MTL) settings. My findings reveal that, in the STL setting, instruction-tuned models equipped with 25% of the downstream training data surpass SOTA performance on the downstream tasks. In the MTL setting, an instruction-tuned model trained on only 6% of downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement (ROUGE-L 74.68) over the previous SOTA. I conduct an analysis of T5 vs. Tk-Instruct by developing several baselines to demonstrate that instruction tuning aids in increasing both sample efficiency and transfer learning. Additionally, I observe a consistent ~4% performance increase in both settings when pre-finetuning is performed with instructions. Finally, I conduct a categorical study and find that, contrary to previous results, tasks in the question rewriting and title generation categories suffer from instruction tuning.
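A hedged sketch of the data-budget setup described above: sample a fixed fraction of a task's training split before finetuning. The function and field names are assumptions for illustration, not the thesis's code.

```python
# Illustrative data-budget subsampling for the finetuning experiments.
import random

def subsample_train_split(examples, fraction, seed=42):
    """Return `fraction` of a task's training examples,
    e.g. fraction=0.06 for the 6% MTL budget reported above."""
    rng = random.Random(seed)
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)

# Hypothetical SuperNI-style task instances.
train = [{"input": f"instance {i}", "output": f"label {i}"} for i in range(1000)]
print(len(subsample_train_split(train, 0.06)))  # 60 of 1000 examples
```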
Contributors: Gupta, Himanshu (Author) / Baral, Chitta (Thesis advisor) / Mitra, Arindam (Committee member) / Gopalan, Nakul (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
The evolution of technology, including the proliferation of the Internet of Things (IoT), advanced sensors, intelligent systems, and more, has paved the way for the establishment of smart homes. These homes bring a new era of automation with interconnected devices, offering a wider range of services. However, they also introduce data security and device management challenges. Current smart home technologies are susceptible to security violations, leaving users vulnerable to data compromise, privacy invasions, and physical risks. These systems often fall short in implementing stringent data security safeguards, and the user control process is complex. In this thesis, an approach is presented to improve smart home security by integrating private blockchain technology with situational-awareness access control. Blockchain technology ensures transparency and immutability in data transactions. Transparency from the blockchain enables meticulous tracking of data access, modifications, and policy changes. The immutability of the blockchain strengthens the integrity of data, deterring and preventing unauthorized alterations. While the designed solution leverages these specific blockchain features, it deliberately does not employ blockchain's decentralization, owing to the limited computational resources of IoT devices and the focused requirement for centralized management within a smart home context. Additionally, situational awareness facilitates the dynamic adaptation of access policies. The strategies in this thesis improve on existing solutions by providing fine-grained access control, reliable transaction data storage, data ownership, auditability, transparency, adaptable access policies, and immutability. This approach is thoroughly evaluated against existing smart home security solutions.
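As a minimal sketch of the immutability and auditability argument, the hash-chained ledger below records access-control events so that any edit to history breaks the chain. Field names and the situational context are illustrative assumptions; a real deployment would use a private blockchain platform rather than this toy class.

```python
# Toy hash-chained audit ledger for smart home access events.
import hashlib, json, time

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class AccessLedger:
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "event": "genesis", "ts": 0}]

    def record(self, device, user, action, context):
        self.chain.append({"index": len(self.chain),
                           "prev": block_hash(self.chain[-1]),
                           "event": {"device": device, "user": user,
                                     "action": action, "context": context},
                           "ts": time.time()})

    def verify(self):
        # Recompute every hash link; an edited block breaks the chain.
        return all(b["prev"] == block_hash(self.chain[i])
                   for i, b in enumerate(self.chain[1:]))

ledger = AccessLedger()
ledger.record("front_door", "alice", "unlock", {"home_occupied": True})
ledger.record("camera_feed", "bob", "view", {"home_occupied": False})
print(ledger.verify())                          # True
ledger.chain[1]["event"]["user"] = "mallory"    # tamper with history
print(ledger.verify())                          # False: tampering detected
```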
Contributors: Lin, Zhicheng (Author) / Yau, Stephen S. (Thesis advisor) / Baek, Jaejong (Committee member) / Ghayekhloo, Samira (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
The advancement of cloud technology has impacted society positively in a number of ways, but it has also led to an increase in threats that target private information available on cloud systems. Intrusion prevention systems play a crucial role in protecting cloud systems from such threats. In this thesis, an intrusion prevention approach to detect and prevent such threats in real time is proposed. This approach is designed for network-based intrusion prevention systems and leverages the power of supervised machine learning, with Extreme Gradient Boosting (XGBoost) and Long Short-Term Memory (LSTM) algorithms, to analyze the flow of each packet sent to a cloud system through the network. The innovations of this thesis include developing a custom LSTM architecture, using this architecture to train an LSTM model to identify attacks, and using TCP reset functionality to prevent attacks on cloud systems. The aim of this thesis is to provide a framework for an intrusion prevention system. Based on simulations and experimental results with the NF-UQ-NIDS-v2 dataset, the proposed system is accurate, fast, and scalable and has a low rate of false positives, making it suitable for real-world applications.
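A hedged sketch of the detection half: a generic LSTM over per-packet flow features that outputs an attack probability. The architecture is a stand-in, not the custom design of the thesis; on a positive detection, a network-based system would inject a TCP reset to tear down the offending connection.

```python
# Generic flow classifier sketch (PyTorch); not the thesis's architecture.
import torch
import torch.nn as nn

class FlowLSTM(nn.Module):
    def __init__(self, n_features=12, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, packets, features)
        _, (h_n, _) = self.lstm(x)         # final hidden state summarizes the flow
        return torch.sigmoid(self.head(h_n[-1]))  # P(flow is an attack)

model = FlowLSTM()
flow = torch.randn(1, 30, 12)              # one flow, 30 packets, 12 features
if model(flow).item() > 0.5:
    print("attack suspected: send TCP RST to terminate the flow")
```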
Contributors: Gianchandani, Siddharth (Author) / Yau, Stephen (Thesis advisor) / Zhao, Ming (Committee member) / Lee, Kookjin (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Pan-tilt traffic cameras (PTTC) are a vital component of traffic management systems for monitoring and surveillance. In a real-world scenario, if a vehicle is in pursuit of another vehicle or an accident has occurred at an intersection causing traffic stoppages, accurate and reliable data from PTTC are necessary to quickly localize the cars on a map for an adept emergency response, especially as more and more traffic systems are automated using machine learning. However, the position (orientation) of the PTTC with respect to the environment is often unknown, as most of them lack inertial measurement units or encoders. Current state-of-the-art systems (1) demand high-performance compute and use carbon-footprint-heavy deep neural networks (DNNs), (2) are only applicable to scenarios with appropriate lane markings or roundabouts, and (3) demand complex mathematical computations to determine focal length and optical center before determining the pose. A compute-light approach, "TIPANGLE", is presented in this work. The approach uses the concept of Siamese neural networks (SNNs) encompassing simple mathematical functions, i.e., Euclidean distance and contrastive loss, to achieve the objective. The effectiveness of the approach is assessed through a thorough comparison study with alternative approaches and also by executing the approach on an embedded system, i.e., a Raspberry Pi 3.
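The two ingredients named above are standard, so a brief sketch follows: Euclidean distance between twin embeddings, and the contrastive loss that pulls matching pairs together and pushes non-matching pairs past a margin. Layer sizes and labels are illustrative, not the TIPANGLE architecture.

```python
# Siamese ingredients sketch: Euclidean distance + contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_pose, margin=1.0):
    d = F.pairwise_distance(emb_a, emb_b)              # Euclidean distance
    pos = same_pose * d.pow(2)                         # pull matching pairs together
    neg = (1 - same_pose) * F.relu(margin - d).pow(2)  # push others past margin
    return (pos + neg).mean()

twin = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(),
                     nn.Linear(128, 32))               # shared-weight branch
a, b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,)).float()             # 1 = same camera pose
loss = contrastive_loss(twin(a), twin(b), labels)
loss.backward()
```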
Contributors: Jagadeesha, Shreehari (Author) / Shrivastava, Aviral (Thesis advisor) / Gopalan, Nakul (Committee member) / Arora, Aman (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Because the internet is still in its infancy, there is no consensus among the policy approaches that various countries have taken. These policies range from strict government control to liberal access to the internet, which makes protecting individual private data difficult. There are too many loopholes and too many divergent policies on how to approach protecting data. Individuals, governments, and private entities must all contribute, drawing on theoretical mixed methods, for people to properly protect themselves online.
Contributors: Peralta, Christina A (Author) / Scheall, Scott (Thesis advisor) / Hollinger, Keith (Thesis advisor) / Alozie, Nicholas (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
The rise in popularity of applications and services that charge for access to proprietary trained models has led to increased interest in the robustness of these models and the security of the environments in which inference is conducted. State-of-the-art attacks extract models and generate adversarial examples by inferring relationships between a model’s input and output. Popular variants of these attacks have been shown to be deterred by countermeasures that poison predicted class distributions and mask class boundary gradients. Neural networks are also vulnerable to timing side-channel attacks. This work builds on top of Subneural, an attack framework that uses floating-point timing side channels to extract neural structures. Novel applications of addition timing side channels are introduced, allowing the signs and arrangements of leaked parameters to be discerned more efficiently. Addition timing is also used to leak network biases, making the framework applicable to a wider range of targets. The enhanced framework is shown to be effective against models protected by prediction-poisoning and gradient-masking adversarial countermeasures and to be competitive with adaptive black-box adversarial attacks against stateful defenses. Mitigations necessary to protect against floating-point timing side-channel attacks are also presented.
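For intuition about the underlying channel, the toy measurement below compares multiplication times for normal versus subnormal floating-point operands, since subnormals take a slow microcoded path on many CPUs. This is only a plausibility demo, not the Subneural framework; the gap is hardware- and runtime-dependent and may be masked by interpreter overhead.

```python
# Toy demo of a floating-point timing side channel (subnormal slowdown).
import time

def time_multiplies(x, n=2_000_000):
    t0 = time.perf_counter()
    for _ in range(n):
        y = x * 1.0000001   # one multiply per iteration; operand class varies
    return time.perf_counter() - t0

normal = time_multiplies(1e-300)      # normal double
subnormal = time_multiplies(1e-320)   # subnormal (denormal) double
print(f"normal: {normal:.3f}s  subnormal: {subnormal:.3f}s")
# A consistent gap would leak whether intermediate values are tiny.
```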
Contributors: Vipat, Gaurav (Author) / Shoshitaishvili, Yan (Thesis advisor) / Doupe, Adam (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
When looking at drawings of graphs, questions about graph density, community structure, local clustering, and other graph properties may be of critical importance for analysis. While graph layout algorithms have focused on minimizing edge crossings, symmetry, and other such layout properties, little is known about how these algorithms relate to a user’s ability to perceive graph properties for a given layout. This study applies previously established methodologies for perceptual analysis to identify which graph drawing layout best helps the user perceive a particular graph property. A large-scale (n = 588) crowdsourced experiment is conducted to investigate whether the perception of two graph properties (graph density and average local clustering coefficient) can be modeled using Weber's law. Three graph layout algorithms from three representative classes (force-directed (FD), circular, and multidimensional scaling (MDS)) are studied, and the results of this experiment establish the precision of judgment for these layouts and properties. The findings demonstrate that the perception of graph density can be modeled with Weber's law. Furthermore, the perception of the average clustering coefficient can be modeled as an inverse of Weber's law, and the MDS layout showed a significantly different precision of judgment than the FD layout.
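A brief sketch of the model referenced above: Weber's law says the just-noticeable difference (JND) grows linearly with the base stimulus intensity, JND = kI for a constant Weber fraction k. The data below are made up for illustration.

```python
# Fitting a Weber fraction to hypothetical JND measurements.
import numpy as np

base_density = np.array([0.10, 0.20, 0.30, 0.40])   # stimulus intensity I
jnd = np.array([0.012, 0.021, 0.033, 0.041])        # measured delta-I at threshold

k, _, _, _ = np.linalg.lstsq(base_density[:, None], jnd, rcond=None)
print(f"Weber fraction k = {k[0]:.3f}")  # a constant k supports Weber's law
# The inverse-Weber finding for clustering would fit JND = k / I instead.
```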
Contributors: Soni, Utkarsh (Author) / Maciejewski, Ross (Thesis advisor) / Kobourov, Stephen (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created: 2018