Continual Learning with Novelty Detection: Algorithms and Applications to Image and Microelectronic Design

193492-Thumbnail Image.png
Description
Machine learning techniques have found extensive application in dynamic fields like drones, self-driving vehicles, surveillance, and more. Their effectiveness stems from meticulously crafted deep neural networks (DNNs), extensive data gathering efforts, and resource-intensive model training processes. However, due to the

Machine learning techniques have found extensive application in dynamic fields like drones, self-driving vehicles, surveillance, and more. Their effectiveness stems from meticulously crafted deep neural networks (DNNs), extensive data gathering efforts, and resource-intensive model training processes. However, due to the unpredictable nature of the environment, these systems will inevitably encounter input samples that deviate from the distribution of their original training data, resulting in instability and performance degradation.To effectively detect the emergence of out-of-distribution (OOD) data, this dissertation first proposes a novel, self-supervised approach that evaluates the Mahalanobis distance between the in-distribution (ID) and OOD in gradient space. A binary classifier is then introduced to guide the label selection for gradients calculation, which further boosts the detection performance. Next, to continuously adapt the new OOD into the existing knowledge base, an unified framework for novelty detection and continual learning is proposed. The binary classifier, trained to distinguish OOD data from ID, is connected sequentially with the pre-trained model to form a “N + 1” classifier, where “N” represents prior knowledge which contains N classes and “1” refers to the newly arrival OOD. This continual learning process continues as “N+1+1+1+...”, assimilating the knowledge of each new OOD instance into the system. Finally, this dissertation demonstrates the practical implementation of novelty detection and continual learning within the domain of thermal analysis. To rapidly address the impact of voids in thermal interface material (TIM), a continuous adaptation approach is proposed, which integrates trainable nodes into the graph at the locations where abnormal thermal behaviors are detected. With minimal training overhead, the model can quickly adapts to the change caused by the defects and regenerate accurate thermal prediction. In summary, this dissertation proposes several algorithms and practical applications in continual learning aimed at enhancing the stability and adaptability of the system. All proposed algorithms are validated through extensive experiments conducted on benchmark datasets such as CIFAR-10, CIFAR-100, TinyImageNet for continual learning, and real thermal data for thermal analysis.
Date Created
2024
Agent

Sensing for Wireless Communication: From Theory to Reality

191748-Thumbnail Image.png
Description
Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however,

Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however, have several challenges that need to be addressed to realize their gains in practice. First, they need to deploy large antenna arrays and use narrow beams to guarantee sufficient receive power. Adjusting the narrow beams of the large antenna arrays incurs massive beam training overhead. Second, the sensitivity to blockages is a key challenge for mmWave and THz networks. Since these networks mainly rely on line-of-sight (LOS) links, sudden link blockages highly threaten the reliability of the networks. Further, when the LOS link is blocked, the network typically needs to hand off the user to another LOS basestation, which may incur critical time latency, especially if a search over a large codebook of narrow beams is needed. A promising way to tackle both these challenges lies in leveraging additional side information such as visual, LiDAR, radar, and position data. These sensors provide rich information about the wireless environment, which can be utilized for fast beam and blockage prediction. This dissertation presents a machine-learning framework for sensing-aided beam and blockage prediction. In particular, for beam prediction, this work proposes to utilize visual and positional data to predict the optimal beam indices. For the first time, this work investigates the sensing-aided beam prediction task in a real-world vehicle-to-infrastructure and drone communication scenario. Similarly, for blockage prediction, this dissertation proposes a multi-modal wireless communication solution that utilizes bimodal machine learning to perform proactive blockage prediction and user hand-off. Evaluations on both real-world and synthetic datasets illustrate the promising performance of the proposed solutions and highlight their potential for next-generation communication and sensing systems.
Date Created
2024
Agent

Increasing the Efficiency of Heterogeneous System Operation: from Scheduling to Implementation of Split Federated Learning

190975-Thumbnail Image.png
Description
This thesis addresses the problems of (a) scheduling multiple streaming jobs with soft deadline constraints to minimize the risk/energy consumption in heterogeneous Systems-on-chip (SoCs), and (b) training a neural network model with high accuracy and low training time using split

This thesis addresses the problems of (a) scheduling multiple streaming jobs with soft deadline constraints to minimize the risk/energy consumption in heterogeneous Systems-on-chip (SoCs), and (b) training a neural network model with high accuracy and low training time using split federated learning (SFL) with heterogeneous clients. Designing a scheduler for heterogeneous SoC SoCs built with different types of processing elements (PEs) is quite challenging, especially when it has to balance the conflicting requirements of low energy consumption, low risk, and high throughput for randomly streaming jobs at run time. Two probabilistic deadline-aware schedulers are designed for heterogeneous SoCs for such jobs with soft deadline constraints with the goals of optimizing job-level risk and energy efficiency. The key idea of the probabilistic scheduler is to calculate the task-to-PE allocation probabilities when a job arrives in the system. This allocation probability, generated by manually designed or neural network (NN) based allocation function, is used to compute the intra-job and inter-job contentions to derive the task-level slack. The tasks are allocated to the PEs that can complete the task within the task-level slack with minimum risk or minimum energy consumption. SFL is an edge-friendly decentralized NN training scheme, where the model is split and only a small client-side model is trained in the clients. The communication overhead in SFL is significant since the intermediate activations and gradients of every sample are transmitted in every epoch. Two communication reduction methods have been proposed, namely, loss-aware selective updating to reduce the number of training epochs and bottleneck layer (BL) to reduce the feature size.Next, the SFL system is trained with heterogeneous clients having different data rates and operating on non-IID data. The communication time of clients in low-end group with slow data rates dominates the training time. To reduce the training time without sacrificing accuracy significantly, HeteroSFL is built with HetBL and bi- directional knowledge sharing (BDKS). HetBL compresses data with different factors in low- and high-end groups using narrow and wide bottleneck layers respectively. BDKS is proposed to mitigate the label distribution skew across different groups. BDKS can also be applied in Federated Learning to address the label distribution skew.
Date Created
2023
Agent

In-Memory Computing Based Hardware Accelerators Advancing the Implementation of AI/ML Tasks in Edge Devices

190780-Thumbnail Image.png
Description
Artificial Intelligence (AI) and Machine Learning (ML) techniques have come a long way since their inception and have been used to build intelligent systems for a wide range of applications in everyday life. However they are very computationintensive and require

Artificial Intelligence (AI) and Machine Learning (ML) techniques have come a long way since their inception and have been used to build intelligent systems for a wide range of applications in everyday life. However they are very computationintensive and require transfer of large volume of data from memory to the computation units. This memory access time constitute significant part of the computational latency and a performance bottleneck. To address this limitation and the ever-growing demand for implementation in hand-held and edge-devices, In-memory computing (IMC) based AI/ML hardware accelerators have emerged. First, the dissertation presents an IMC static random access memory (SRAM) based hardware modeling and optimization framework. A unified systematic study closely models the IMC hardware, and investigates how a number of design variables and non-idealities (e.g. device mismatch and ADC quantization) affect the Deep Neural Network (DNN) accuracy of the IMC design. The framework allows co-optimized selection of different design variables accounting for sources of noise in IMC hardware and robust implementation of a high accuracy DNN. Next, it presents a kNN hardware accelerator in 65nm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The accelerator combines an IMC SRAM that is developed for binarized deep neural networks and other digital hardware that performs top-k sorting. The simulated k Nearest Neighbor accelerator design processes up to 17.9 million query vectors per second while consuming 11.8 mW, demonstrating >4.8× energy-efficiency improvement over prior works. This dissertation also presents a novel floating-point precision IMC (FP-IMC) macro with a hybrid architecture that configurably supports two Floating Point (FP) precisions. Implementing FP precision MAC has been a challenge owing to its complexity. The design is implemented on 28nm CMOS, and taped-out on chip demonstrating 12.1 TFLOPS/W and 66.1 TFLOPS/W for 8-bit Floating Point (FP8) and Block Floating point (BF8) respectively. Finally, another iteration of the FP design is presented that is modeled to support multiple precision modes from FP8 up to FP32. Two approaches to the architectural design were compared illustrating the throughput-area overhead trade-off. The simulated design shows a 2.1 × normalized energy-efficiency compared to the on-chip implementation of the FP-IMC.
Date Created
2023
Agent

System Solutions Towards High-Precision Visual Computing at Low Power

189373-Thumbnail Image.png
Description
Efficient visual sensing plays a pivotal role in enabling high-precision applications in augmented reality and low-power Internet of Things (IoT) devices. This dissertation addresses the primary challenges that hinder energy efficiency in visual sensing: the bottleneck of pixel traffic across

Efficient visual sensing plays a pivotal role in enabling high-precision applications in augmented reality and low-power Internet of Things (IoT) devices. This dissertation addresses the primary challenges that hinder energy efficiency in visual sensing: the bottleneck of pixel traffic across camera and memory interfaces and the energy-intensive analog readout process in image sensors. To overcome the bottleneck of pixel traffic, this dissertation proposes a visual sensing pipeline architecture that enables application developers to dynamically adapt the spatial resolution and update rates for specific regions within the scene. By selectively capturing and processing high-resolution frames only where necessary, the system significantly reduces energy consumption associated with memory traffic. This is achieved by encoding only the relevant pixels from the commercial image sensors with standard raster-scan pixel read-out patterns, thus minimizing the data stored in memory. The stored rhythmic pixel region stream is decoded into traditional frame-based representations, enabling seamless integration into existing video pipelines. Moreover, the system includes runtime support that allows flexible specification of the region labels, giving developers fine-grained control over the resolution adaptation process. Experimental evaluations conducted on a Xilinx Field Programmable Gate Array (FPGA) platform demonstrate substantial reductions of 43-64% in interface traffic, while maintaining controllable task accuracy. In addition to the pixel traffic bottleneck, the dissertation tackles the energy intensive analog readout process in image sensors. To address this, the dissertation proposes aggressive scaling of the analog voltage supplied to the camera. Extensive characterization on off-the-shelf sensors demonstrates that analog voltage scaling can significantly reduce sensor power, albeit at the expense of image quality. To mitigate this trade-off, this research develops a pipeline that allows application developers to adapt the sensor voltage on a frame-by-frame basis. A voltage controller is integrated into the existing Raspberry Pi (RPi) based video streaming pipeline, generating the sensor voltage. On top of that, the system provides a software interface for vision applications to specify the desired voltage levels. Evaluation of the system across a range of voltage scaling policies on popular vision tasks demonstrates that the technique can deliver up to 73% sensor power savings while maintaining reasonable task fidelity.
Date Created
2023
Agent

On Addressing Security Issues in Edge Neural Network Systems

189327-Thumbnail Image.png
Description
In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly

In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly accessible for deployment on edge devices. This accessibility brings significant benefits, including real-time inference on the edge, which mitigates communication latency, and on-device learning, which addresses privacy concerns and enables continuous improvement. However, the resource limitations of edge devices pose challenges in equipping them with robust safety protocols, making them vulnerable to various attacks. Two notable attacks that affect edge DNN systems are Bit-Flip Attacks (BFA) and architecture stealing attacks. BFA compromises the integrity of DNN models, while architecture stealing attacks aim to extract valuable intellectual property by reverse engineering the model's architecture. Furthermore, in Split Federated Learning (SFL) scenarios, where training occurs on distributed edge devices, Model Inversion (MI) attacks can reconstruct clients' data, and Model Extraction (ME) attacks can extract sensitive model parameters. This thesis aims to address these four attack scenarios and develop effective defense mechanisms. To defend against BFA, both passive and active defensive strategies are discussed. Furthermore, for both model inference and training, architecture stealing attacks are mitigated through novel defense techniques, ensuring the integrity and confidentiality of edge DNN systems. In the context of SFL, the thesis showcases defense mechanisms against MI attacks for both supervised and self-supervised learning applications. Additionally, the research investigates ME attacks in SFL and proposes countermeasures to enhance resistance against potential ME attackers. By examining and addressing these attack scenarios, this research contributes to the security and privacy enhancement of edge DNN systems. The proposed defense mechanisms enable safer deployment of DNN models on resource-constrained edge devices, facilitating the advancement of real-time applications, preserving data privacy, and fostering the widespread adoption of edge computing technologies.
Date Created
2023
Agent

Improving the Programmability of a Systolic Array Processor

171954-Thumbnail Image.png
Description
This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP) that was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits,

This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP) that was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without trading off much on programmability and reconfigurability. The structure of a typical DAP code for each Processing Element (PE) is very different from any other programming language format. As a result, writing code for DAP requires the programmer to acquire processor-specific knowledge including configuration rules, cycle accurate execution state for memory and datapath components within each PE, etc. Each code must be carefully handcrafted to meet the strict timing and resource constraints, leading to very long programming times and low productivity. In this thesis, a code generation and optimization tool is introduced to improve the programmability of DAP and make code development easier. The tool consists of a configuration code generator, optimizer, and a scheduler. An Instruction Set Architecture (ISA) has been designed specifically for DAP. The programmer writes the assembly code for each PE using the DAP ISA. The assembly code is then translated into a low-level configuration code. This configuration code undergoes several optimizations passes. Level 1 (L1) optimization handles instruction redundancy and performs loop optimizations through code movement. The Level 2 (L2) optimization performs instruction-level parallelism. Use of L1 and L2 optimization passes result in a code that has fewer instructions and requires fewer cycles. In addition, a scheduling tool has been introduced which performs final timing adjustments on the code to match the input data rate.
Date Created
2022
Agent

Modeling of Intel’s Advanced Interface Bus and its Evaluation on a Chiplet-based System

171924-Thumbnail Image.png
Description
Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous

Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The main challenge is to interconnect these chiplets while keeping the power and performance closer to monolithic grade. Intel’s Advanced Interface Bus (AIB) is a short reach interface that offers high bandwidth, power efficient, low latency, and cost effective on-package connectivity between chiplets. It supports flexible interconnection of the chiplets with high speed data transfer. Specifically, it is a die to die parallel interface implemented with multiple configurable channels, routed between micro-bumps. In this work, the AIB model is synthesized in 65nm technology node and a performancemodel is generated. This model generates area, power and latency results for multiple technology nodes using technology scaling methods. For all nodes, the area, power and latency values increase linearly with frequency and number of channels. The bandwidth also increases linearly with the number of input/output lanes, which is a function of the micro-bump pitch. Next, the AIB performance model is integrated with the benchmarking simulator, Scalable In-Memory Acceleration With Mesh (SIAM), to realize a scalable chipletbased end-to-end system. The Ground-Referenced Signaling (GRS) driver model in SIAM is replaced with the AIB model and an end-to-end evaluation of Deep Neural Network (DNN) performance is carried out for two contemporary DNN models. Comparative analysis between SIAM with GRS and SIAM with AIB show that while the area of AIB transmitter is less compared to GRS transmitter, the AIB transmitter offers higher bandwidth than GRS transmitter at the expense of higher energy. Furthermore, SIAM with AIB provides more realistic timing numbers since the NoP driver latency is also taken into consideration.
Date Created
2022
Agent

Exploration of Security and Privacy Challenges through Adversarial Weight Perturbation in Deep Learning Models

171895-Thumbnail Image.png
Description
Adversarial threats of deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks(DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class of threats that attempts to perturb

Adversarial threats of deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks(DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class of threats that attempts to perturb the weight parameters of DNNs to breach security and privacy.In this thesis, the first weight perturbation attack introduced is called Bit-Flip Attack (BFA), which can maliciously flip a small number of bits within a computer’s main memory system storing the DNN weight parameter to achieve malicious objectives. Our developed algorithm can achieve three specific attack objectives: I) Un-targeted accuracy degradation attack, ii) Targeted attack, & iii) Trojan attack. Moreover, BFA utilizes the rowhammer technique to demonstrate the bit-flip attack in an actual computer prototype. While the bit-flip attack is conducted in a white-box setting, the subsequent contribution of this thesis is to develop another novel weight perturbation attack in a black-box setting. Consequently, this thesis discusses a new study of DNN model vulnerabilities in a multi-tenant Field Programmable Gate Array (FPGA) cloud under a strict black-box framework. This newly developed attack framework injects faults in the malicious tenant by duplicating specific DNN weight packages during data transmission between off-chip memory and on-chip buffer of a victim FPGA. The proposed attack is also experimentally validated in a multi-tenant cloud FPGA prototype. In the final part, the focus shifts toward deep learning model privacy, popularly known as model extraction, that can steal partial DNN weight parameters remotely with the aid of a memory side-channel attack. In addition, a novel training algorithm is designed to utilize the partially leaked DNN weight bit information, making the model extraction attack more effective. The algorithm effectively leverages the partial leaked bit information and generates a substitute prototype of the victim model with almost identical performance to the victim.
Date Created
2022
Agent

Bayesian Filtering and Smoothing for Tracking in High Noise and Clutter Environments

171768-Thumbnail Image.png
Description
Object tracking refers to the problem of estimating a moving object's time-varying parameters that are indirectly observed in measurements at each time step. Increased noise and clutter in the measurements reduce estimation accuracy as they increase the uncertainty of tracking

Object tracking refers to the problem of estimating a moving object's time-varying parameters that are indirectly observed in measurements at each time step. Increased noise and clutter in the measurements reduce estimation accuracy as they increase the uncertainty of tracking in the field of view. Whereas tracking is performed using a Bayesian filter, a Bayesian smoother can be utilized to refine parameter state estimations that occurred before the current time. In practice, smoothing can be widely used to improve state estimation or correct data association errors, and it can lead to significantly better estimation performance as it reduces the impact of noise and clutter. In this work, a single object tracking method is proposed based on integrating Kalman filtering and smoothing with thresholding to remove unreliable measurements. As the new method is effective when the noise and clutter in the measurements are high, the main goal is to find these measurements using a moving average filter and a thresholding method to improve estimation. Thus, the proposed method is designed to reduce estimation errors that result from measurements corrupted with high noise and clutter. Simulations are provided to demonstrate the improved performance of the new method when compared to smoothing without thresholding. The root-mean-square error in estimating the object state parameters is shown to be especially reduced under high noise conditions.
Date Created
2022
Agent