Continual Learning with Novelty Detection: Algorithms and Applications to Image and Microelectronic Design

193492-Thumbnail Image.png
Description
Machine learning techniques have found extensive application in dynamic fields like drones, self-driving vehicles, surveillance, and more. Their effectiveness stems from meticulously crafted deep neural networks (DNNs), extensive data gathering efforts, and resource-intensive model training processes. However, due to the

Machine learning techniques have found extensive application in dynamic fields like drones, self-driving vehicles, surveillance, and more. Their effectiveness stems from meticulously crafted deep neural networks (DNNs), extensive data gathering efforts, and resource-intensive model training processes. However, due to the unpredictable nature of the environment, these systems will inevitably encounter input samples that deviate from the distribution of their original training data, resulting in instability and performance degradation.To effectively detect the emergence of out-of-distribution (OOD) data, this dissertation first proposes a novel, self-supervised approach that evaluates the Mahalanobis distance between the in-distribution (ID) and OOD in gradient space. A binary classifier is then introduced to guide the label selection for gradients calculation, which further boosts the detection performance. Next, to continuously adapt the new OOD into the existing knowledge base, an unified framework for novelty detection and continual learning is proposed. The binary classifier, trained to distinguish OOD data from ID, is connected sequentially with the pre-trained model to form a “N + 1” classifier, where “N” represents prior knowledge which contains N classes and “1” refers to the newly arrival OOD. This continual learning process continues as “N+1+1+1+...”, assimilating the knowledge of each new OOD instance into the system. Finally, this dissertation demonstrates the practical implementation of novelty detection and continual learning within the domain of thermal analysis. To rapidly address the impact of voids in thermal interface material (TIM), a continuous adaptation approach is proposed, which integrates trainable nodes into the graph at the locations where abnormal thermal behaviors are detected. With minimal training overhead, the model can quickly adapts to the change caused by the defects and regenerate accurate thermal prediction. In summary, this dissertation proposes several algorithms and practical applications in continual learning aimed at enhancing the stability and adaptability of the system. All proposed algorithms are validated through extensive experiments conducted on benchmark datasets such as CIFAR-10, CIFAR-100, TinyImageNet for continual learning, and real thermal data for thermal analysis.
Date Created
2024
Agent

In-Memory Computing Based Hardware Accelerators Advancing the Implementation of AI/ML Tasks in Edge Devices

190780-Thumbnail Image.png
Description
Artificial Intelligence (AI) and Machine Learning (ML) techniques have come a long way since their inception and have been used to build intelligent systems for a wide range of applications in everyday life. However they are very computationintensive and require

Artificial Intelligence (AI) and Machine Learning (ML) techniques have come a long way since their inception and have been used to build intelligent systems for a wide range of applications in everyday life. However they are very computationintensive and require transfer of large volume of data from memory to the computation units. This memory access time constitute significant part of the computational latency and a performance bottleneck. To address this limitation and the ever-growing demand for implementation in hand-held and edge-devices, In-memory computing (IMC) based AI/ML hardware accelerators have emerged. First, the dissertation presents an IMC static random access memory (SRAM) based hardware modeling and optimization framework. A unified systematic study closely models the IMC hardware, and investigates how a number of design variables and non-idealities (e.g. device mismatch and ADC quantization) affect the Deep Neural Network (DNN) accuracy of the IMC design. The framework allows co-optimized selection of different design variables accounting for sources of noise in IMC hardware and robust implementation of a high accuracy DNN. Next, it presents a kNN hardware accelerator in 65nm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The accelerator combines an IMC SRAM that is developed for binarized deep neural networks and other digital hardware that performs top-k sorting. The simulated k Nearest Neighbor accelerator design processes up to 17.9 million query vectors per second while consuming 11.8 mW, demonstrating >4.8× energy-efficiency improvement over prior works. This dissertation also presents a novel floating-point precision IMC (FP-IMC) macro with a hybrid architecture that configurably supports two Floating Point (FP) precisions. Implementing FP precision MAC has been a challenge owing to its complexity. The design is implemented on 28nm CMOS, and taped-out on chip demonstrating 12.1 TFLOPS/W and 66.1 TFLOPS/W for 8-bit Floating Point (FP8) and Block Floating point (BF8) respectively. Finally, another iteration of the FP design is presented that is modeled to support multiple precision modes from FP8 up to FP32. Two approaches to the architectural design were compared illustrating the throughput-area overhead trade-off. The simulated design shows a 2.1 × normalized energy-efficiency compared to the on-chip implementation of the FP-IMC.
Date Created
2023
Agent

Deep Learning based Air Quality Prediction with Time Series Sensor Data

Description
ABSTRACT With the fast development of industry, it brings indelible pollution to the natural environment. As a consequence, the air quality is getting worse which will seriously affect people's health. With such concern, continuous air quality monitoring and

ABSTRACT With the fast development of industry, it brings indelible pollution to the natural environment. As a consequence, the air quality is getting worse which will seriously affect people's health. With such concern, continuous air quality monitoring and prediction are necessary. Traditional air quality monitoring methods cannot use large amount of historical data to make accurate predic-tions. Moreover, the traditional prediction method can only roughly predict the air quality level in a short time. With the development of artificial intelligence al-gorithms [1] and high performance computing, the latest mathematical methods and algorithms are able to generate much more accurate predictions based on long term past data. In this master thesis project, it explore to develop deep learning based air quality prediction based on real sensor network time series air quality data from STAIR system [3].
Date Created
2023
Agent

Enabling Deep Learning at Edge: From Efficient and Dynamic Inference to On-Device Learning

189353-Thumbnail Image.png
Description
In recent years, Artificial Intelligence (AI) (e.g., Deep Neural Networks (DNNs), Transformer) has shown great success in real-world applications due to its superior performance in various cognitive tasks. The impressive performance achieved by AI models normally accompanies the cost of

In recent years, Artificial Intelligence (AI) (e.g., Deep Neural Networks (DNNs), Transformer) has shown great success in real-world applications due to its superior performance in various cognitive tasks. The impressive performance achieved by AI models normally accompanies the cost of enormous model size and high computational complexity, which significantly hampers their implementation on resource-limited Cyber-Physical Systems (CPS), Internet-of-Things (IoT), or Edge systems due to their tightly constrained energy, computing, size, and memory budget. Thus, the urgent demand for enhancing the \textbf{Efficiency} of DNN has drawn significant research interests across various communities. Motivated by the aforementioned concerns, this doctoral research has been mainly focusing on Enabling Deep Learning at Edge: From Efficient and Dynamic Inference to On-Device Learning. Specifically, from the inference perspective, this dissertation begins by investigating a hardware-friendly model compression method that effectively reduces the size of AI model while simultaneously achieving improved speed on edge devices. Additionally, due to the fact that diverse resource constraints of different edge devices, this dissertation further explores dynamic inference, which allows for real-time tuning of inference model size, computation, and latency to accommodate the limitations of each edge device. Regarding efficient on-device learning, this dissertation starts by analyzing memory usage during transfer learning training. Based on this analysis, a novel framework called "Reprogramming Network'' (Rep-Net) is introduced that offers a fresh perspective on the on-device transfer learning problem. The Rep-Net enables on-device transferlearning by directly learning to reprogram the intermediate features of a pre-trained model. Lastly, this dissertation studies an efficient continual learning algorithm that facilitates learning multiple tasks without the risk of forgetting previously acquired knowledge. In practice, through the exploration of task correlation, an interesting phenomenon is observed that the intermediate features are highly correlated between tasks with the self-supervised pre-trained model. Building upon this observation, a novel approach called progressive task-correlated layer freezing is proposed to gradually freeze a subset of layers with the highest correlation ratios for each task leading to training efficiency.
Date Created
2023
Agent

On Addressing Security Issues in Edge Neural Network Systems

189327-Thumbnail Image.png
Description
In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly

In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly accessible for deployment on edge devices. This accessibility brings significant benefits, including real-time inference on the edge, which mitigates communication latency, and on-device learning, which addresses privacy concerns and enables continuous improvement. However, the resource limitations of edge devices pose challenges in equipping them with robust safety protocols, making them vulnerable to various attacks. Two notable attacks that affect edge DNN systems are Bit-Flip Attacks (BFA) and architecture stealing attacks. BFA compromises the integrity of DNN models, while architecture stealing attacks aim to extract valuable intellectual property by reverse engineering the model's architecture. Furthermore, in Split Federated Learning (SFL) scenarios, where training occurs on distributed edge devices, Model Inversion (MI) attacks can reconstruct clients' data, and Model Extraction (ME) attacks can extract sensitive model parameters. This thesis aims to address these four attack scenarios and develop effective defense mechanisms. To defend against BFA, both passive and active defensive strategies are discussed. Furthermore, for both model inference and training, architecture stealing attacks are mitigated through novel defense techniques, ensuring the integrity and confidentiality of edge DNN systems. In the context of SFL, the thesis showcases defense mechanisms against MI attacks for both supervised and self-supervised learning applications. Additionally, the research investigates ME attacks in SFL and proposes countermeasures to enhance resistance against potential ME attackers. By examining and addressing these attack scenarios, this research contributes to the security and privacy enhancement of edge DNN systems. The proposed defense mechanisms enable safer deployment of DNN models on resource-constrained edge devices, facilitating the advancement of real-time applications, preserving data privacy, and fostering the widespread adoption of edge computing technologies.
Date Created
2023
Agent

Analog-based Neural Network Implementation Using Hexagonal Boron Nitride Memristors

187773-Thumbnail Image.png
Description
Resistive random-access memory (RRAM) or memristor, is an emerging technology used in neuromorphic computing to exceed the traditional von Neumann obstacle by merging the processing and memory units. Two-dimensional (2D) materials with non-volatile switching behavior can be used as the

Resistive random-access memory (RRAM) or memristor, is an emerging technology used in neuromorphic computing to exceed the traditional von Neumann obstacle by merging the processing and memory units. Two-dimensional (2D) materials with non-volatile switching behavior can be used as the switching layer of RRAMs, exhibiting superior behavior compared to conventional oxide-based RRAMs. The use of 2D materials allows scaling the resistive switching layer thickness to sub-nanometer dimensions enabling devices to operate with low switching voltages and high programming speeds, offering large improvements in efficiency and performance as well as ultra-dense integration. This dissertation presents an extensive study of linear and logistic regression algorithms implemented with 1-transistor-1-resistor (1T1R) memristor crossbars arrays. For this task, a simulation platform is used that wraps circuit-level simulations of 1T1R crossbars and physics-based model of RRAM to elucidate the impact of device variability on algorithm accuracy, convergence rate, and precision. Moreover, a smart pulsing strategy is proposed for the practical implementation of synaptic weight updates that can accelerate training in real crossbar architectures. Next, this dissertation reports on the hardware implementation of analog dot-product operation on arrays of 2D hexagonal boron nitride (h-BN) memristors. This extends beyond previous work that studied isolated device characteristics towards the application of analog neural network accelerators based on 2D memristor arrays. The wafer-level fabrication of the memristor arrays is enabled by large-area transfer of CVD-grown few-layer h-BN films. The dot-product operation shows excellent linearity and repeatability, with low read energy consumption, with minimal error and deviation over various measurement cycles. Moreover, the successful implementation of a stochastic linear and logistic regression algorithm in 2D h-BN memristor hardware is presented for the classification of noisy images. Additionally, the electrical performance of novel 2D h-BN memristor for SNN applications is extensively investigated. Then, using the experimental behavior of the h-BN memristor as the artificial synapse, an unsupervised spiking neural network (SNN) is simulated for the image classification task. A novel and simple Spike-Timing-Dependent-Plasticity (STDP)-based dropout technique is presented to enhance the recognition task of the h-BN memristor-based SNN.
Date Created
2023
Agent

Modeling of Intel’s Advanced Interface Bus and its Evaluation on a Chiplet-based System

171924-Thumbnail Image.png
Description
Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous

Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The main challenge is to interconnect these chiplets while keeping the power and performance closer to monolithic grade. Intel’s Advanced Interface Bus (AIB) is a short reach interface that offers high bandwidth, power efficient, low latency, and cost effective on-package connectivity between chiplets. It supports flexible interconnection of the chiplets with high speed data transfer. Specifically, it is a die to die parallel interface implemented with multiple configurable channels, routed between micro-bumps. In this work, the AIB model is synthesized in 65nm technology node and a performancemodel is generated. This model generates area, power and latency results for multiple technology nodes using technology scaling methods. For all nodes, the area, power and latency values increase linearly with frequency and number of channels. The bandwidth also increases linearly with the number of input/output lanes, which is a function of the micro-bump pitch. Next, the AIB performance model is integrated with the benchmarking simulator, Scalable In-Memory Acceleration With Mesh (SIAM), to realize a scalable chipletbased end-to-end system. The Ground-Referenced Signaling (GRS) driver model in SIAM is replaced with the AIB model and an end-to-end evaluation of Deep Neural Network (DNN) performance is carried out for two contemporary DNN models. Comparative analysis between SIAM with GRS and SIAM with AIB show that while the area of AIB transmitter is less compared to GRS transmitter, the AIB transmitter offers higher bandwidth than GRS transmitter at the expense of higher energy. Furthermore, SIAM with AIB provides more realistic timing numbers since the NoP driver latency is also taken into consideration.
Date Created
2022
Agent

Exploration of Security and Privacy Challenges through Adversarial Weight Perturbation in Deep Learning Models

171895-Thumbnail Image.png
Description
Adversarial threats of deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks(DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class of threats that attempts to perturb

Adversarial threats of deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks(DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class of threats that attempts to perturb the weight parameters of DNNs to breach security and privacy.In this thesis, the first weight perturbation attack introduced is called Bit-Flip Attack (BFA), which can maliciously flip a small number of bits within a computer’s main memory system storing the DNN weight parameter to achieve malicious objectives. Our developed algorithm can achieve three specific attack objectives: I) Un-targeted accuracy degradation attack, ii) Targeted attack, & iii) Trojan attack. Moreover, BFA utilizes the rowhammer technique to demonstrate the bit-flip attack in an actual computer prototype. While the bit-flip attack is conducted in a white-box setting, the subsequent contribution of this thesis is to develop another novel weight perturbation attack in a black-box setting. Consequently, this thesis discusses a new study of DNN model vulnerabilities in a multi-tenant Field Programmable Gate Array (FPGA) cloud under a strict black-box framework. This newly developed attack framework injects faults in the malicious tenant by duplicating specific DNN weight packages during data transmission between off-chip memory and on-chip buffer of a victim FPGA. The proposed attack is also experimentally validated in a multi-tenant cloud FPGA prototype. In the final part, the focus shifts toward deep learning model privacy, popularly known as model extraction, that can steal partial DNN weight parameters remotely with the aid of a memory side-channel attack. In addition, a novel training algorithm is designed to utilize the partially leaked DNN weight bit information, making the model extraction attack more effective. The algorithm effectively leverages the partial leaked bit information and generates a substitute prototype of the victim model with almost identical performance to the victim.
Date Created
2022
Agent

Energy Efficient ASIC/FPGA Neural Network Accelerators

171744-Thumbnail Image.png
Description
Convolutional neural networks(CNNs) achieve high accuracy on large datasets but requires significant computation and storage requirement for training/testing. While many applications demand low latency and energy-efficient processing of the images, deploying these complex algorithms on the hardware is a challenging

Convolutional neural networks(CNNs) achieve high accuracy on large datasets but requires significant computation and storage requirement for training/testing. While many applications demand low latency and energy-efficient processing of the images, deploying these complex algorithms on the hardware is a challenging task. This dissertation first presents a compiler-based CNN training accelerator using DDR3 and HBM2 memory. An optimized RTL library is implemented to perform training-specific tasks and an RTL compiler is developed to generate FPGA-synthesizable RTL based on user-defined constraints. High Bandwidth Memory(HBM) provides efficient off-chip communication and improves the training performance. The impact of HBM2 on CNN training workloads is analyzed and compressively compared with DDR3. For training ResNet-20/VGG-like CNNs for the CIFAR-10 dataset, the proposed CNN training accelerator on Stratix-10 GX FPGA(DDR3) demonstrates 479 GOPS performance, and on Stratix-10 MX FPGA(HBM) shows 4.5/9.7 X energy-efficiency improvement compared to Tesla V100 GPU. Next, the FPGA online learning accelerator is presented. Adopting model segmentation techniques from Progressive Segmented Training(PST), the online learning accelerator achieved a 4.2X reduction in training latency. Furthermore, this dissertation presents an 8-bit floating-point (FP8) training processor which implements (1) Highly parallel tensor cores that maintain high PE utilization, (2) Hardware-efficient channel gating for dynamic output activation sparsity (3) Dynamic weight sparsity based on group Lasso (4) Gradient skipping based on FP prediction error. The 28nm prototype chip demonstrates significant improvements in FLOPs reduction (7.3×), energy efficiency (16.4 TFLOPS/W), and overall training latency speedup (4.7×) for both supervised training and self-supervised training tasks. In addition to the training accelerators, this dissertation also presents a CNN inference accelerator on ASIC(FixyNN) and FPGA(FixyFPGA). FixyNN consists of a fixed-weight feature extractor that generates ubiquitous CNN features and a conventional programmable CNN accelerator. In the fixed-weight feature extractor, the network weights are hard-coded into hardware and used as a fixed operand for the multiplication. Experimental results demonstrate FixyNN can achieve very high energy efficiencies up to 26.6 TOPS/W, and FixyFPGA achieves $2.34\times$ higher GOPS on ImageNet classification. In summary, this dissertation comprehensively discusses novel architectures of high-performance and energy-efficient ASIC/FPGA CNN inference/training accelerators.
Date Created
2022
Agent

Energy-Efficient In-Memory Acceleration of Deep Neural Networks Through a Hardware-Software Co-Design Approach

171380-Thumbnail Image.png
Description
Deep neural networks (DNNs), as a main-stream algorithm for various AI tasks, achieve higher accuracy at the cost of increased computational complexity and model size, posing great challenges to hardware platforms. This dissertation first tackles the design challenges of resistive

Deep neural networks (DNNs), as a main-stream algorithm for various AI tasks, achieve higher accuracy at the cost of increased computational complexity and model size, posing great challenges to hardware platforms. This dissertation first tackles the design challenges of resistive random-access-memory (RRAM) based in-memory computing (IMC) architectures. A new metric, model stability from the loss landscape, is proposed to help shed light on accuracy under variations and model compression and guide a novel variation-aware training (VAT) solution. The proposed method effectively improves post-mapping accuracy of multiple datasets. Next, a hybrid RRAM/SRAM IMC DNN inference accelerator is developed, that integrates an RRAM-based IMC macro, a reconfigurable SRAM-based multiply-accumulate (MAC) macro, and a programmable shifter. The hybrid IMC accelerator fully recovers the inference accuracy post the mapping. Furthermore, this dissertation researches on architectural optimizations for high IMC utilization, low on-chip communication cost, and low energy-delay product (EDP), including on-chip interconnect design, PE array utilization, and tile-to-router mapping and scheduling. The optimal choice of on-chip interconnect results in up to 6x improvement in energy-delay-area product for RRAM IMC architectures. Furthermore, the PE and NoC optimizations show up to 62% improvement in PE utilization, 78% reduction in area, and 78% lower energy-area product for a wide range of modern DNNs. Finally, this dissertation proposes a novel chiplet-based IMC benchmarking simulator, SIAM, and a heterogeneous chiplet IMC architecture to address the limitations of a monolithic DNN accelerator. SIAM utilizes model-based and cycle-accurate simulation to provide a scalable and flexible architecture. SIAM is calibrated against a published silicon result, SIMBA, from Nvidia. The heterogeneous architecture utilizes a custom mapping with a bank of big and little chiplets, and a hybrid network-on-package (NoP) to optimize the utilization, interconnect bandwidth, and energy efficiency. The proposed big-little chiplet-based RRAM IMC architecture significantly improves energy efficiency at lower area, compared to conventional GPUs. In summary, this dissertation comprehensively investigates novel methods that encompass device, circuits, architecture, packaging, and algorithm to design scalable high-performance and energy-efficient IMC architectures.
Date Created
2022
Agent