Improving and Automating Machine Learning Model Compression

Description

Machine learning models are increasingly employed by smart devices on the edge to support important applications such as real-time virtual assistants and privacy-preserving healthcare. However, deploying state-of-the-art (SOTA) deep learning models on devices faces multiple serious challenges. First, it is infeasible to deploy large models on resource-constrained edge devices, whereas small models cannot achieve SOTA accuracy. Second, it is difficult to customize models according to diverse application requirements for accuracy and speed and the diverse capabilities of edge devices. This study proposes several novel solutions to comprehensively address these challenges through automated and improved model compression. First, it introduces Automatic Attention Pruning (AAP), an adaptive, attention-based pruning approach that automatically reduces model parameters while meeting diverse user objectives in model size, speed, and accuracy. AAP achieves an impressive 92.72% parameter reduction in ResNet-101 on Tiny-ImageNet without causing any accuracy loss. Second, it presents Self-Supervised Quantization-Aware Knowledge Distillation (SQAKD), a framework for reducing model precision without supervision from labeled training data. For example, it quantizes VGG-8 to 2 bits on CIFAR-10 without any accuracy loss. Finally, the study presents two further works, the Contrastive Knowledge Distillation Framework (CKDF) and Log-Curriculum based Module Replacing (LCMR), for further improving the performance of small models. All the works proposed in this study are designed to address real-world challenges and have been successfully deployed on diverse hardware platforms, including cloud instances and edge devices, catalyzing AI for the edge.
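To make the filter-pruning idea concrete, below is a minimal, hypothetical PyTorch sketch. It uses the normalized L1 norm of each convolutional filter as a stand-in importance score and keeps only the top-scoring filters; AAP's actual attention metric, adaptive thresholds, and iterative pruning schedule are not reproduced here.

```python
# Hypothetical sketch of attention-style filter pruning (not the actual AAP code).
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    # Proxy "attention" score per output filter: normalized L1 norm of its weights.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    return scores / scores.sum()

def prune_conv(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    scores = filter_importance(conv)
    k = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(scores, k).indices.sort().values  # filters to retain
    pruned = nn.Conv2d(conv.in_channels, k, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
smaller = prune_conv(conv, keep_ratio=0.5)  # keep the 50% most "important" filters
print(conv.out_channels, "->", smaller.out_channels)
```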
Date Created
2024

Optimizing Memory and Storage Disaggregation for Data-intensive Systems

Description

Data-intensive systems such as big data and large machine learning (ML) systems experience serious scalability challenges due to the ever-increasing data demand from ML and analytics applications and the resource fragmentation caused by conventional monolithic server architectures. Memory and storage disaggregation emerges as a pivotal technology to address these challenges by decoupling memory and storage resources from individual servers and managing and provisioning them to applications as a shared resource pool. This dissertation investigates several important aspects of memory and storage disaggregation and proposes novel solutions to support data-intensive applications. First, caching is a fundamental way to utilize disaggregated storage, but building a large disaggregated cache is challenging because the commonly used fixed-size cache block allocation scheme is unable to provide good cache performance with low memory overhead for diverse cloud workloads with vastly different I/O patterns. The dissertation proposes a novel adaptive cache block allocation approach that dynamically adjusts cache block sizes based on changing I/O patterns. This approach significantly improves I/O performance while reducing memory usage, outperforming traditional fixed-size cache systems on diverse cloud workloads. Evaluation shows that it improves read latency by 20% and write latency by 9%. It also reduces the amount of I/O traffic to cloud block storage by up to 74% while achieving up to 41% memory savings at a cost of only 2 ms. Second, large ML applications such as large language model (LLM) inference are memory demanding, but supporting them with disaggregated memory brings challenges to memory management, since disaggregated memory has higher access latency than local memory. The dissertation proposes latency-aware memory aggregation, which cautiously distributes memory accesses to minimize the latency gap between local and disaggregated memory. It also proposes NUMA-aligned tensor parallelism to further improve computing efficiency. With these optimizations, LLM inference achieves substantial speedups: for example, first token latency improves by 61% and end-to-end latency improves by 43% for an LLM inference task using a 66-billion-parameter model with a batch size of 8. Finally, to address the cost, power consumption, and volatility of DRAM, the dissertation proposes to incorporate flash memory into memory pools within the disaggregation framework. By establishing a tiered memory architecture that combines fast-tier local DRAM with slow-tier DRAM and flash memory in the memory pool and effectively migrating data across memory tiers based on hotness, this approach not only reduces expenses but also maintains the overall performance and scalability of data-intensive systems. For example, with a 50% saving in memory cost, the performance degradation of training ResNet50 on the ImageNet dataset is only 2.68%. Together, these contributions systematically optimize the use of memory and storage disaggregation to deliver more efficient, scalable, and cost-effective systems for supporting the data explosion in today’s and future computing systems.
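As a rough illustration of the adaptive allocation idea (a hypothetical policy with illustrative candidate sizes, not the dissertation's actual algorithm), the sketch below picks a cache block size from a sliding window of recent I/O request sizes:

```python
# Hypothetical sketch: choose a cache block size that tracks the recent I/O pattern.
from collections import deque

class AdaptiveBlockSizer:
    SIZES = [4096, 16384, 65536, 262144]  # candidate block sizes in bytes (illustrative)

    def __init__(self, window=256):
        self.recent = deque(maxlen=window)  # sliding window of request sizes

    def observe(self, io_size: int):
        self.recent.append(io_size)

    def block_size(self) -> int:
        if not self.recent:
            return self.SIZES[0]
        # Use the median request size; pick the smallest block that covers it,
        # trading cache-hit granularity against per-block memory overhead.
        median = sorted(self.recent)[len(self.recent) // 2]
        for size in self.SIZES:
            if size >= median:
                return size
        return self.SIZES[-1]

sizer = AdaptiveBlockSizer()
for s in (4096, 8192, 4096, 131072):
    sizer.observe(s)
print(sizer.block_size())  # adapts as the workload's I/O pattern shifts
```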
Date Created
2024

A Network-Based Intrusion Prevention Approach for Cloud Systems Using XGBoost and LSTM Models

Description

The advancement of cloud technology has impacted society positively in a number of ways, but it has also led to an increase in threats that target private information available on cloud systems. Intrusion prevention systems play a crucial role in protecting cloud systems from such threats. In this thesis, an intrusion prevention approach to detect and prevent such threats in real time is proposed. This approach is designed for network-based intrusion prevention systems and leverages the power of supervised machine learning with the Extreme Gradient Boosting (XGBoost) and Long Short-Term Memory (LSTM) algorithms to analyze the flow of each packet that is sent to a cloud system through the network. The innovations of this thesis include developing a custom LSTM architecture, using this architecture to train an LSTM model to identify attacks, and using TCP reset functionality to stop attacks against cloud systems. The aim of this thesis is to provide a framework for an intrusion prevention system. Based on simulations and experimental results with the NF-UQ-NIDS-v2 dataset, the proposed system is accurate, fast, and scalable, with a low rate of false positives, making it suitable for real-world applications.
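A minimal sketch of the detect-and-reset flow is shown below, assuming tabular flow features in the spirit of NF-UQ-NIDS-v2. The feature names, toy data, and the reset helper are illustrative, and the thesis's custom LSTM architecture is not reproduced; only the XGBoost half and the TCP-reset idea are sketched.

```python
# Hedged sketch: classify flows, then tear down flagged connections with a TCP RST.
import numpy as np
from xgboost import XGBClassifier
from scapy.all import IP, TCP, send  # sending raw packets requires root privileges

# Toy flow features: [duration_s, bytes_in, bytes_out, packet_count]
X = np.array([[0.2, 1200, 300, 10], [9.5, 9e6, 2e5, 8000]])
y = np.array([0, 1])  # 0 = benign, 1 = attack

clf = XGBClassifier(n_estimators=50, max_depth=4)
clf.fit(X, y)

def maybe_block(flow_features, src, dst, sport, dport):
    """If the model flags a flow, send a spoofed TCP RST to terminate it.
    A production IPS must also track sequence numbers for the RST to be accepted."""
    if clf.predict(np.array([flow_features]))[0] == 1:
        send(IP(src=dst, dst=src) / TCP(sport=dport, dport=sport, flags="R"))
```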
Date Created
2023

Towards Scalable Security State Management in The Cloud

Description

Modern data center networks require efficient and scalable security analysis approaches that can analyze the relationships between vulnerabilities. Utilizing Attack Representation Methods (ARMs) and Attack Graphs (AGs) enables the security administrator to understand the cloud network’s current security situation at a low level. However, the AG approach suffers from scalability challenges: it relies on the connectivity between services and the vulnerabilities associated with those services to allow the system administrator to realize the network's security state. In addition, the security policies created by the administrator can conflict with one another, which is often detected only in the data plane of the Software Defined Networking (SDN) system. Such conflicts can cause security breaches and increase flow rule processing delay. This dissertation addresses these challenges with novel solutions that tackle the scalability issue of Attack Graphs and detect security policy conflicts in the application plane before they are transmitted into the data plane for final installation. Specifically, it introduces a segmentation-based scalable security state (S3) framework for the cloud network. This framework utilizes the well-known divide-and-conquer approach to divide a large network region into smaller, manageable segments, following a segmentation approach derived from the K-means clustering algorithm to partition the system into segments based on the similarity between services. Furthermore, the dissertation presents unified intent rules that abstract network administration from the underlying network controller’s format. It develops a networking service solution that uses a bounded formal model for network service compliance checking, significantly reducing the complexity of flow rule conflict checking at the data plane level. The solution can be extended from a single SDN domain to multiple SDN domains and hybrid networks by applying network service function chaining (SFC) for inter-domain policy management.
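Since the abstract names K-means-based segmentation explicitly, a small sketch of that step follows. The per-service feature encoding here is an assumption for illustration; the dissertation's actual similarity measure is not reproduced.

```python
# Illustrative sketch: cluster services into segments with K-means so each
# segment can be analyzed with its own, much smaller attack graph.
import numpy as np
from sklearn.cluster import KMeans

# Toy per-service features, e.g., [port, mean CVSS score, connectivity degree]
services = np.array([
    [80,   7.5, 12],  # web-facing services
    [443,  7.1, 11],
    [3306, 9.0,  3],  # database services
    [5432, 8.6,  2],
])
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(services)
print(segments)  # services sharing a segment label are analyzed together
```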
Date Created
2023

Nexus Modeling and Distributed Simulation: A RESTful Framework for Understanding and Predicting Dynamics of Interacting Water and Energy Systems

Description

Water, energy, and food are essential resources for sustaining the development of society. The Food-Energy-Water Nexus (FEW-Nexus) must account for synergies and trade-offs among these resources. The nexus concept highlights the importance of integrative solutions that secure supplies to meet demands sustainably. Existing frameworks and tools do not focus on formal model composability, a key capability for building simulations from separately developed models. The Knowledge Interchange Broker (KIB) approach is used to model the interactions among models to achieve composition flexibility for the FEW-Nexus. Domain experts generally use the Water Evaluation and Planning (WEAP) and Low Emissions Analysis Platform (LEAP) systems to study water and energy systems, respectively. The food part of FEW systems can be modeled inside the WEAP system. An internal linkage mechanism is available for combining and simulating WEAP and LEAP models; this mechanism is used for the validation and performance evaluation of the independent modeling and simulation proposed in this research. The Componentized WEAP and LEAP RESTful frameworks are component-based representations of the legacy, closed-source WEAP and LEAP systems. These modularized systems simplify their use with other simulation frameworks. This research proposes two interaction model frameworks based on the Knowledge Interchange Broker approach. First, an Algorithmic Interaction Model (Algorithmic-IM) was developed to integrate the WEAP and LEAP models. The Algorithmic-IM can be defined via a programming language and has a fixed cyclic execution protocol. However, this approach tightly interweaves the interaction model with its execution and has limited support for flexibly creating model hierarchies. To overcome these restrictions, the system-theoretic Parallel DEVS formalism is used to develop a DEVS-Based Interaction Model (DEVS-IM). As with the Algorithmic-IM, the DEVS-IM is implemented as a RESTful framework, uses MongoDB for defining structural DEVS models, and supports automatic code generation for the DEVS-Suite simulator. The DEVS-IM offers modular, hierarchical structural modeling, reusability, flexibility, and maintainability for integrating disparate systems. The Phoenix Active Management Area (AMA) is used to demonstrate the real-world application of the proposed research, and the correctness and performance of the presented frameworks are evaluated using the Phoenix-AMA model.
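The sketch below gives a conceptual, toy rendering of the KIB/Algorithmic-IM idea: a broker translates one model's outputs into the other's inputs on a fixed cyclic schedule. All interfaces, units, and relationships here are hypothetical stand-ins; the actual Componentized WEAP/LEAP frameworks expose RESTful APIs instead of in-process calls.

```python
# Conceptual sketch of a cyclic interaction model with toy WEAP/LEAP stand-ins.
class ToyWEAP:
    def __init__(self):
        self.energy_price = 0.10            # feedback variable from the energy side
    def run(self, year):
        # Pumped water volume shrinks as energy gets more expensive (toy relation).
        return {"pumped_m3": 1e9 / (1.0 + self.energy_price)}
    def set_energy_price(self, price):
        self.energy_price = price

class ToyLEAP:
    def run(self, year, inputs):
        # Energy price rises with pumping demand (toy relation).
        return {"price": 0.08 + inputs["pumping_demand_gwh"] / 5000.0}

class InteractionModel:
    """Broker that translates outputs of one model into inputs of the other."""
    def __init__(self, water, energy):
        self.water, self.energy = water, energy
    def step(self, year):
        w_out = self.water.run(year)
        e_in = {"pumping_demand_gwh": w_out["pumped_m3"] * 4e-7}  # unit mapping
        e_out = self.energy.run(year, e_in)
        self.water.set_energy_price(e_out["price"])  # close the feedback loop

im = InteractionModel(ToyWEAP(), ToyLEAP())
for year in range(2024, 2027):  # fixed cyclic execution protocol
    im.step(year)
print(im.water.energy_price)
```

A DEVS-based interaction model would express the same exchange as coupled models with explicit input and output ports, which is what enables the hierarchical, reusable composition described above.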
Date Created
2023

Enhancing Stress Detection Systems Using Real-World Data and Deep Neural Networks

Description

As threats emerge and change, the life of a police officer continues to intensify. To better support police training curriculums and police cadets through this critical career juncture, this thesis proposes a state-of-the-art framework for stress detection using real-world data and deep neural networks. As an integral step of a larger study, this thesis investigates data processing techniques to handle the ambiguity of data collected in naturalistic contexts and leverages data structuring approaches to train deep neural networks. The analysis used data collected from 37 police training cadets in five different training cohorts at the Phoenix Police Regional Training Academy. The data were collected at different intervals during the cadets’ rigorous six-month training course; in total, data were collected over 11 months across all cohorts combined. All cadets were equipped with a Fitbit wearable device running a custom-built application to collect biometric data, including heart rate and self-reported stress levels. Throughout the data collection period, the cadets were asked to wear the Fitbit device and respond to stress level prompts to capture real-time responses. To manage this naturalistic data, this thesis leveraged heart rate filtering algorithms, including the Hampel, median, Savitzky-Golay, and Wiener filters, to remove potentially noisy data. After data processing and noise removal, the heart rate data and corresponding stress level labels were processed into two datasets of different sizes. The data were then fed into the Deep ECGNet (created by Prajod et al.), a simple feed-forward network (created by Sim et al.), and a Multilayer Perceptron (MLP) network for binary classification. Experimental results show that the feed-forward network achieves the highest accuracy (90.66%) for data from a single cohort, while the MLP model performs best on data across cohorts, achieving 85.92% accuracy. These findings suggest that stress detection is feasible on a varied set of real-world data using deep neural networks.
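Since the abstract names the four filters, here is a small sketch applying them to a toy heart-rate series with SciPy. Window sizes and thresholds are illustrative, not the thesis's tuned parameters, and the Hampel filter is implemented by hand since SciPy has no built-in version.

```python
# Sketch of the noise-removal step using the four named filters.
import numpy as np
from scipy.signal import medfilt, savgol_filter, wiener

def hampel(x, window=5, n_sigmas=3):
    """Replace rolling-MAD outliers with the rolling median (simple Hampel filter)."""
    x = x.astype(float).copy()
    for i in range(len(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        med = np.median(x[lo:hi])
        mad = 1.4826 * np.median(np.abs(x[lo:hi] - med))  # robust sigma estimate
        if mad > 0 and abs(x[i] - med) > n_sigmas * mad:
            x[i] = med
    return x

hr = np.array([72, 74, 73, 180, 75, 74, 76, 73], dtype=float)  # 180 = sensor spike
print(hampel(hr))                                   # spike replaced by local median
print(medfilt(hr, kernel_size=3))                   # median filter
print(savgol_filter(hr, window_length=5, polyorder=2))  # Savitzky-Golay smoothing
print(wiener(hr, mysize=3))                         # Wiener filter
```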
Date Created
2023

GPU-enabled Functional-as-a-Service

Description

Function-as-a-Service (FaaS) is emerging as an important cloud computing service model, as it can improve scalability and usability for a wide range of applications, especially Machine Learning (ML) inference tasks that require scalable computation resources and complicated configurations. Many applications, including ML inference, rely on Graphics Processing Units (GPUs) to achieve high performance; however, support for GPUs is currently lacking in existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must consider the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This thesis presents a new GPU-enabled FaaS solution that enables functions to efficiently utilize GPUs to accelerate computations such as model inference. First, the work extends existing open-source FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions, along with global management of GPU memory to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the thesis proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and the proposed locality-aware scheduler achieves a speedup of 34x compared to the default, load-balancing-only scheduler.
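A hypothetical sketch of the locality-aware scheduling idea follows. The names and data structures are illustrative, not the thesis's implementation: prefer a GPU whose memory already caches the requested model, and fall back to the least loaded GPU with LRU eviction on a cache miss.

```python
# Illustrative sketch: locality-aware GPU function scheduling with an LRU model cache.
from collections import OrderedDict

class GPU:
    def __init__(self, gpu_id, mem_gb):
        self.id, self.free_gb, self.load = gpu_id, mem_gb, 0
        self.cache = OrderedDict()            # model name -> size in GB (LRU order)

def schedule(gpus, model, model_gb):
    hits = [g for g in gpus if model in g.cache]
    if hits:                                  # cache hit: no host-to-GPU copy needed
        g = min(hits, key=lambda g: g.load)
        g.cache.move_to_end(model)            # refresh LRU position
    else:
        g = min(gpus, key=lambda g: g.load)   # cache miss: fall back to load balancing
        while g.free_gb < model_gb and g.cache:
            _, size = g.cache.popitem(last=False)  # evict least recently used model
            g.free_gb += size
        g.cache[model] = model_gb
        g.free_gb -= model_gb
    g.load += 1                               # one more function running on this GPU
    return g

gpus = [GPU(0, 16), GPU(1, 16)]
print(schedule(gpus, "resnet50", 2).id)       # miss: model loaded onto a GPU
print(schedule(gpus, "resnet50", 2).id)       # hit: routed to the same GPU
```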
Date Created
2022

Risk-based Network Vulnerability Prioritization

Description

This dissertation investigates the problem of efficiently and effectively prioritizing vulnerability risks in a computer networking system. Vulnerability prioritization is one of the most challenging issues in vulnerability management and affects how preventive and defensive resources are allocated across the network. Due to the large number of identified vulnerabilities, it is very challenging to remediate them all in a timely fashion; thus, an efficient and effective vulnerability prioritization framework is required. To deal with this challenge, this dissertation proposes a novel risk-based vulnerability prioritization framework that integrates recent artificial intelligence techniques (i.e., neuro-symbolic computing and logic reasoning). The proposed work enhances the vulnerability management process by prioritizing high-risk vulnerabilities, refining the initial risk assessment with network constraints. This dissertation is organized as follows. The first part presents an overview of the proposed risk-based vulnerability prioritization framework, which contains two stages. The second part investigates vulnerability risk features in a computer networking system. The third part proposes the first stage of the framework, a vulnerability risk assessment model that captures the pattern of vulnerability risk features to provide a more comprehensive risk assessment for a vulnerability. The fourth part proposes the second stage of the framework, a vulnerability prioritization reasoning engine that derives network constraints from interactions between vulnerabilities and network environment elements based on network and system setups. The framework thus assesses a vulnerability based on its actual security impact, refining the initial risk assessment with the network constraints.
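The toy sketch below illustrates the two-stage refinement idea in miniature. The scores, constraint flags, and discount factors are all invented for illustration; the dissertation's neuro-symbolic assessment model and logic-based reasoning engine are far richer than this.

```python
# Illustrative sketch: refine an initial risk score with simple network constraints.
vulns = {
    "CVE-A": {"risk": 9.8, "reachable": False, "requires_local": False},
    "CVE-B": {"risk": 7.5, "reachable": True,  "requires_local": False},
}

def refined_risk(v):
    score = v["risk"]             # stage 1: initial risk assessment
    if not v["reachable"]:
        score *= 0.2              # stage 2: unreachable over the network -> discounted
    if v["requires_local"]:
        score *= 0.5              # needs prior local access -> partially discounted
    return score

ranking = sorted(vulns, key=lambda k: refined_risk(vulns[k]), reverse=True)
print(ranking)  # CVE-B outranks CVE-A despite its lower base score
```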
Date Created
2022

Understanding Social Media Influence, Semantic Network Analysis, and Thematic Campaign Classification Using Machine Learning

Description

Individuals and organizations have greater access to the world's population than ever before, and the effects of Social Media Influence have already impacted the behaviour and actions of the world's population. This research employed mixed methods to investigate the mechanisms of Social Media Influence Campaigns (SMICs), to further the understanding of how SMICs impact the global community, and to develop tools and frameworks for conducting analysis. The research qualitatively examined perceptions of Social Media, specifically how leadership believes it will change and what role it will play within future conflict. It developed and tested semantic ontological modelling to provide insights into the network-related behaviour of SMICs, and it also developed exemplar data sets of SMICs. The insights gained from the initial research were used to train Machine Learning classifiers to identify thematically related campaigns. This work has been conducted in close collaboration with Alliance Plus Network partner the University of New South Wales and the Australian Defence Force.
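As a rough illustration of thematic campaign classification (toy data and a generic text pipeline; the research's actual features and classifiers are not described in this abstract), a minimal sketch might look like:

```python
# Hedged sketch: classify posts into campaign themes with TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = ["vote early, your voice matters", "new vaccine danger exposed",
         "register to vote this weekend", "doctors hide the vaccine truth"]
themes = ["election", "health", "election", "health"]  # toy theme labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(posts, themes)
print(clf.predict(["vaccine side effects they won't tell you"]))  # -> ['health']
```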
Date Created
2022

Exploration of Edge Machine Learning-based Stress Detection Using Wearable Devices

Description

Stress is one of the critical factors in daily life, as it has a profound impact on performance at work and on decision-making processes. With the development of IoT technology, smart wearables can handle diverse operations, including networking and recording biometric signals. It has also become easier for individual users to self-detect stress with recorded data, since these wearables, as well as their accompanying smartphones, now have data processing capability. Edge computing on such devices enables real-time feedback and, in turn, preemptive identification of reactions to stress. This can provide an opportunity to prevent more severe consequences that might result if stress is left unaddressed. From a system perspective, leveraging edge computing saves resources such as network bandwidth and reduces latency, since data is processed in proximity to its source. It can also strengthen privacy by performing stress prediction on local devices without transferring personal information to the public cloud. This thesis presents a framework for real-time stress prediction using Fitbit and machine learning with support from cloud computing. Fitbit is a wearable tracker that records biometric measurements using optical sensors on the wrist; it also provides developers with platforms to design custom applications. I developed an application for the Fitbit and the user’s accompanying mobile device to collect heart rate fluctuations and corresponding stress levels entered by users, and I established a dataset collected from police cadets during their academy training program. Machine learning classifiers for stress prediction were built using classic models and TensorFlow in the cloud. Lastly, I optimized the classifiers using model compression techniques for deployment on smartphones and analyzed how efficiently stress prediction can be performed on the edge.
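A small sketch of the compression-for-edge step follows, assuming a trained Keras model. Post-training quantization via TensorFlow Lite is shown as one representative technique; the model shape and file name are illustrative, and the thesis may use other compression methods as well.

```python
# Sketch: compress a Keras stress classifier for edge deployment with TFLite.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(60,)),              # e.g., 60 heart-rate samples
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # stressed / not stressed
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()
with open("stress_model.tflite", "wb") as f:
    f.write(tflite_model)  # compact model deployable on the accompanying smartphone
```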
Date Created
2022