Search Content

Distributed Learning and Adaptive Algorithms for Edge Networks

Description

Edge networks pose unique challenges for machine learning and network management. The primary objective of this dissertation is to study deep learning and adaptive control aspects of edge networks and to address some of the unique challenges therein. This dissertation explores four particular problems of interest at the intersection of…

Edge networks pose unique challenges for machine learning and network management. The primary objective of this dissertation is to study deep learning and adaptive control aspects of edge networks and to address some of the unique challenges therein. This dissertation explores four particular problems of interest at the intersection of edge intelligence, deep learning and network management. The first problem explores the learning of generative models in edge learning setting. Since the learning tasks in similar environments share model similarity, it is plausible to leverage pre-trained generative models from other edge nodes. Appealing to optimal transport theory tailored towards Wasserstein-1 generative adversarial networks, this part aims to develop a framework which systematically optimizes the generative model learning performance using local data at the edge node while exploiting the adaptive coalescence of pre-trained generative models from other nodes. In the second part, a many-to-one wireless architecture for federated learning at the network edge, where multiple edge devices collaboratively train a model using local data, is considered. The unreliable nature of wireless connectivity, togetherwith the constraints in computing resources at edge devices, dictates that the local updates at edge devices should be carefully crafted and compressed to match the wireless communication resources available and should work in concert with the receiver. Therefore, a Stochastic Gradient Descent based bandlimited coordinate descent algorithm is designed for such settings. The third part explores the adaptive traffic engineering algorithms in a dynamic network environment. The ages of traffic measurements exhibit significant variation due to asynchronization and random communication delays between routers and controllers. Inspired by the software defined networking architecture, a controller-assisted distributed routing scheme with recursive link weight reconfigurations, accounting for the impact of measurement ages and routing instability, is devised. The final part focuses on developing a federated learning based framework for traffic reshaping of electric vehicle (EV) charging. The absence of private EV owner information and scattered EV charging data among charging stations motivates the utilization of a federated learning approach. Federated learning algorithms are devised to minimize peak EV charging demand both spatially and temporarily, while maximizing the charging station profit.

ContributorsDedeoglu, Mehmet (Author) / Zhang, Junshan (Thesis advisor) / Kosut, Oliver (Committee member) / Zhang, Yanchao (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)

Created2021

Accelerating Genome Quantification in FPGA

Description

The growth in speed and density of programmable logic devices, such as Field programmable gate arrays (FPGA), enables sophisticated designs to be created within a short time frame. The flexibility of a programmable device alleviates the difficulty of the integration of a design with a wide range of components on…

The growth in speed and density of programmable logic devices, such as Field programmable gate arrays (FPGA), enables sophisticated designs to be created within a short time frame. The flexibility of a programmable device alleviates the difficulty of the integration of a design with a wide range of components on a single chip. FPGAs bring both performance and power efficiency, especially for compute or data-intensive applications. Efficient and accurate mRNA quantification is an essential step for molecular signature identification, disease outcome prediction, and drug development, which is a typical compute- and data-intensive compute workload. In this work, I propose to accelerate mRNA quantification with FPGA implementation. I analyze the performance of mRNA Quantification with FPGA, which shows better or similar performance compared to that of CPU implementation.

ContributorsKim, Kiju (Author) / Fan, Deliang (Thesis advisor) / Cao, Kevin (Committee member) / Zhang, Wei (Committee member) / Arizona State University (Publisher)

Created2022

On Addressing Security Issues in Edge Neural Network Systems

Description

In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly accessible for deployment on edge devices. This accessibility brings significant…

In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly accessible for deployment on edge devices. This accessibility brings significant benefits, including real-time inference on the edge, which mitigates communication latency, and on-device learning, which addresses privacy concerns and enables continuous improvement. However, the resource limitations of edge devices pose challenges in equipping them with robust safety protocols, making them vulnerable to various attacks. Two notable attacks that affect edge DNN systems are Bit-Flip Attacks (BFA) and architecture stealing attacks. BFA compromises the integrity of DNN models, while architecture stealing attacks aim to extract valuable intellectual property by reverse engineering the model's architecture. Furthermore, in Split Federated Learning (SFL) scenarios, where training occurs on distributed edge devices, Model Inversion (MI) attacks can reconstruct clients' data, and Model Extraction (ME) attacks can extract sensitive model parameters. This thesis aims to address these four attack scenarios and develop effective defense mechanisms. To defend against BFA, both passive and active defensive strategies are discussed. Furthermore, for both model inference and training, architecture stealing attacks are mitigated through novel defense techniques, ensuring the integrity and confidentiality of edge DNN systems. In the context of SFL, the thesis showcases defense mechanisms against MI attacks for both supervised and self-supervised learning applications. Additionally, the research investigates ME attacks in SFL and proposes countermeasures to enhance resistance against potential ME attackers. By examining and addressing these attack scenarios, this research contributes to the security and privacy enhancement of edge DNN systems. The proposed defense mechanisms enable safer deployment of DNN models on resource-constrained edge devices, facilitating the advancement of real-time applications, preserving data privacy, and fostering the widespread adoption of edge computing technologies.

ContributorsLi, Jingtao (Author) / Chakrabarti, Chaitali (Thesis advisor) / Fan, Deliang (Committee member) / Cao, Yu (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)

Created2023

Deep Learning based Air Quality Prediction with Time Series Sensor Data

Description

ABSTRACT With the fast development of industry, it brings indelible pollution to the natural environment. As a consequence, the air quality is getting worse which will seriously affect people's health. With such concern, continuous air quality monitoring and prediction are necessary. Traditional air quality monitoring methods cannot use…

ABSTRACT With the fast development of industry, it brings indelible pollution to the natural environment. As a consequence, the air quality is getting worse which will seriously affect people's health. With such concern, continuous air quality monitoring and prediction are necessary. Traditional air quality monitoring methods cannot use large amount of historical data to make accurate predic-tions. Moreover, the traditional prediction method can only roughly predict the air quality level in a short time. With the development of artificial intelligence al-gorithms [1] and high performance computing, the latest mathematical methods and algorithms are able to generate much more accurate predictions based on long term past data. In this master thesis project, it explore to develop deep learning based air quality prediction based on real sensor network time series air quality data from STAIR system [3].

ContributorsZhou, Zeming (Author) / Fan, Deliang (Thesis advisor) / Cao, Yu (Committee member) / Yu, Haofei (Committee member) / Arizona State University (Publisher)

Created2023

Energy Efficient ASIC/FPGA Neural Network Accelerators

Description

Convolutional neural networks(CNNs) achieve high accuracy on large datasets but requires significant computation and storage requirement for training/testing. While many applications demand low latency and energy-efficient processing of the images, deploying these complex algorithms on the hardware is a challenging task. This dissertation first presents a compiler-based CNN training accelerator…

Convolutional neural networks(CNNs) achieve high accuracy on large datasets but requires significant computation and storage requirement for training/testing. While many applications demand low latency and energy-efficient processing of the images, deploying these complex algorithms on the hardware is a challenging task. This dissertation first presents a compiler-based CNN training accelerator using DDR3 and HBM2 memory. An optimized RTL library is implemented to perform training-specific tasks and an RTL compiler is developed to generate FPGA-synthesizable RTL based on user-defined constraints. High Bandwidth Memory(HBM) provides efficient off-chip communication and improves the training performance. The impact of HBM2 on CNN training workloads is analyzed and compressively compared with DDR3. For training ResNet-20/VGG-like CNNs for the CIFAR-10 dataset, the proposed CNN training accelerator on Stratix-10 GX FPGA(DDR3) demonstrates 479 GOPS performance, and on Stratix-10 MX FPGA(HBM) shows 4.5/9.7 X energy-efficiency improvement compared to Tesla V100 GPU. Next, the FPGA online learning accelerator is presented. Adopting model segmentation techniques from Progressive Segmented Training(PST), the online learning accelerator achieved a 4.2X reduction in training latency. Furthermore, this dissertation presents an 8-bit floating-point (FP8) training processor which implements (1) Highly parallel tensor cores that maintain high PE utilization, (2) Hardware-efficient channel gating for dynamic output activation sparsity (3) Dynamic weight sparsity based on group Lasso (4) Gradient skipping based on FP prediction error. The 28nm prototype chip demonstrates significant improvements in FLOPs reduction (7.3×), energy efficiency (16.4 TFLOPS/W), and overall training latency speedup (4.7×) for both supervised training and self-supervised training tasks. In addition to the training accelerators, this dissertation also presents a CNN inference accelerator on ASIC(FixyNN) and FPGA(FixyFPGA). FixyNN consists of a fixed-weight feature extractor that generates ubiquitous CNN features and a conventional programmable CNN accelerator. In the fixed-weight feature extractor, the network weights are hard-coded into hardware and used as a fixed operand for the multiplication. Experimental results demonstrate FixyNN can achieve very high energy efficiencies up to 26.6 TOPS/W, and FixyFPGA achieves $2.34\times$ higher GOPS on ImageNet classification. In summary, this dissertation comprehensively discusses novel architectures of high-performance and energy-efficient ASIC/FPGA CNN inference/training accelerators.

ContributorsKolala Venkataramaniah, Shreyas (Author) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Chakrabarti, Chaitali (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)

Created2022

Modeling of Intel’s Advanced Interface Bus and its Evaluation on a Chiplet-based System

Description

Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The…

Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The main challenge is to interconnect these chiplets while keeping the power and performance closer to monolithic grade. Intel’s Advanced Interface Bus (AIB) is a short reach interface that offers high bandwidth, power efficient, low latency, and cost effective on-package connectivity between chiplets. It supports flexible interconnection of the chiplets with high speed data transfer. Specifically, it is a die to die parallel interface implemented with multiple configurable channels, routed between micro-bumps. In this work, the AIB model is synthesized in 65nm technology node and a performancemodel is generated. This model generates area, power and latency results for multiple technology nodes using technology scaling methods. For all nodes, the area, power and latency values increase linearly with frequency and number of channels. The bandwidth also increases linearly with the number of input/output lanes, which is a function of the micro-bump pitch. Next, the AIB performance model is integrated with the benchmarking simulator, Scalable In-Memory Acceleration With Mesh (SIAM), to realize a scalable chipletbased end-to-end system. The Ground-Referenced Signaling (GRS) driver model in SIAM is replaced with the AIB model and an end-to-end evaluation of Deep Neural Network (DNN) performance is carried out for two contemporary DNN models. Comparative analysis between SIAM with GRS and SIAM with AIB show that while the area of AIB transmitter is less compared to GRS transmitter, the AIB transmitter offers higher bandwidth than GRS transmitter at the expense of higher energy. Furthermore, SIAM with AIB provides more realistic timing numbers since the NoP driver latency is also taken into consideration.

ContributorsCHERIAN, NINOO SUSAN (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)

Created2022

Experimental Evaluation of the Feasibility of Wearable Piezoelectric Energy Harvesting

Description

Technological advances in low power wearable electronics and energy optimization techniques

make motion energy harvesting a viable energy source. However, it has not been

widely adopted due to bulky energy harvester designs that are uncomfortable to wear. This

work addresses this problem by analyzing the feasibility of powering low wearable power

devices using piezoelectric…

Technological advances in low power wearable electronics and energy optimization techniques

make motion energy harvesting a viable energy source. However, it has not been

widely adopted due to bulky energy harvester designs that are uncomfortable to wear. This

work addresses this problem by analyzing the feasibility of powering low wearable power

devices using piezoelectric energy generated at the human knee. We start with a novel

mathematical model for estimating the power generated from human knee joint movements.

This thesis’s major contribution is to analyze the feasibility of human motion energy harvesting

and validating this analytical model using a commercially available piezoelectric

module. To this end, we implemented an experimental setup that replicates a human knee.

Then, we performed experiments at different excitation frequencies and amplitudes with

two commercially available Macro Fiber Composite (MFC) modules. These experimental

results are used to validate the analytical model and predict the energy harvested as a function

of the number of steps taken in a day. The model estimates that 13μWcan be generated

on an average while walking with a 4.8% modeling error. The obtained results show that

piezoelectricity is indeed a viable approach for powering low-power wearable devices.

ContributorsBandyopadhyay, Shiva (Author) / Ogras, Umit Y. (Thesis advisor) / Fan, Deliang (Committee member) / Trichopoulos, Georgios (Committee member) / Arizona State University (Publisher)

Created2020

Efficient and Secure Deep Learning Inference System: A Software and Hardware Co-design Perspective

Description

The advances of Deep Learning (DL) achieved recently have successfully demonstrated its great potential of surpassing or close to human-level performance across multiple domains. Consequently, there exists a rising demand to deploy state-of-the-art DL algorithms, e.g., Deep Neural Networks (DNN), in real-world applications to release labors from repetitive work. On…

The advances of Deep Learning (DL) achieved recently have successfully demonstrated its great potential of surpassing or close to human-level performance across multiple domains. Consequently, there exists a rising demand to deploy state-of-the-art DL algorithms, e.g., Deep Neural Networks (DNN), in real-world applications to release labors from repetitive work. On the one hand, the impressive performance achieved by the DNN normally accompanies with the drawbacks of intensive memory and power usage due to enormous model size and high computation workload, which significantly hampers their deployment on the resource-limited cyber-physical systems or edge devices. Thus, the urgent demand for enhancing the inference efficiency of DNN has also great research interests across various communities. On the other hand, scientists and engineers still have insufficient knowledge about the principles of DNN which makes it mostly be treated as a black-box. Under such circumstance, DNN is like "the sword of Damocles" where its security or fault-tolerance capability is an essential concern which cannot be circumvented.

Motivated by the aforementioned concerns, this dissertation comprehensively investigates the emerging efficiency and security issues of DNNs, from both software and hardware design perspectives. From the efficiency perspective, as the foundation technique for efficient inference of target DNN, the model compression via quantization is elaborated. In order to maximize the inference performance boost, the deployment of quantized DNN on the revolutionary Computing-in-Memory based neural accelerator is presented in a cross-layer (device/circuit/system) fashion. From the security perspective, the well known adversarial attack is investigated spanning from its original input attack form (aka. Adversarial example generation) to its parameter attack variant.

Contributorshe, zhezhi (Author) / Fan, Deliang (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Cao, Yu (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2020

Processing-in-Memory for Data-Intensive Applications, From Device to Algorithm

Description

Over the past decades, the amount of data required to be processed and analyzed by computing systems has been increasing dramatically to exascale (10^18 bytes/s or ops). However, modern computing platforms' inability to deliver both energy-efficient and high-performance computing solutions leads to a gap between meets and needs, especially in…

Over the past decades, the amount of data required to be processed and analyzed by computing systems has been increasing dramatically to exascale (10^18 bytes/s or ops). However, modern computing platforms' inability to deliver both energy-efficient and high-performance computing solutions leads to a gap between meets and needs, especially in resource-constraint Internet of Things (IoT) devices. Unfortunately, such a gap will keep widening mainly due to limitations in both devices and architectures. With this motivation, this dissertation's focus is on cross-layer (device/circuit/architecture/application) co-design of energy-efficient and high-performance Processing-in-Memory (PIM) platforms for implementing complex big data applications, i.e., deep learning, bioinformatics, graph processing tasks, and data encryption. The dissertation shows how to leverage innovations from device, circuit, and architecture to integrate memory and logic to break the existing memory and power walls and dramatically increase computing efficiency of today’s non-Von-Neumann computing systems.The proposed PIM platforms transform current volatile and non-volatile random access memory arrays to computational units capable of working as both memory and low-area-overhead, massively parallel, fast, reconfigurable in-memory logic. Instead of integrating complex logic units in cost-sensitive memory, the explored designs exploit hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a reduced clock cycle, overcoming the multi-cycle logic issue in modern PIM platforms. Besides, new customized in-memory algorithms and mapping methods are developed to convert the crucial iteratively-used big data application's functions to bit-wise PIM-supported logic. To quantitatively analyze the performance of various PIM platforms running big data applications, a generic and comprehensive evaluation framework is presented. The overall system computing performance (throughput, latency, energy efficiency) for each application is explored through the developed framework. The device-to-algorithm co-simulation results on neural network acceleration demonstrate that the proposed platforms can obtain 36.8× higher energy-efficiency and 22× speed-up compared to state-of-the-art Graphics Processing Unit (GPU). In accelerating bioinformatics tasks such as biological sequence alignment, the presented PIM designs result in ~2×, 43.8×, 458× more throughput per Watt compared to state-of-the-art Application-Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA), and GPU platforms, respectively.

ContributorsAngizi, Shaahin (Author) / Fan, Deliang (Thesis advisor) / Seo, Jae-Sun (Committee member) / Awad, Amro (Committee member) / Zhang, Wei (Committee member) / Arizona State University (Publisher)

Created2021

Machine Learning Assisted Security for Edge Computing Applications

Description

Edge computing applications have recently gained prominence as the world of internet-of-things becomes increasingly embedded into people's lives. Performing computations at the edge addresses multiple issues, such as memory bandwidth-latency bottlenecks, exposure of sensitive data to external attackers, etc. It is important to protect the data collected and processed by…

Edge computing applications have recently gained prominence as the world of internet-of-things becomes increasingly embedded into people's lives. Performing computations at the edge addresses multiple issues, such as memory bandwidth-latency bottlenecks, exposure of sensitive data to external attackers, etc. It is important to protect the data collected and processed by edge devices, and also to prevent unauthorized access to such data. It is also important to ensure that the computing hardware fits well within the tight energy and area budgets for the edge devices which are being progressively scaled-down in size. Firstly, a novel low-power smart security prototype chip that combines multiple entropy sources, such as real-time electrocardiogram (ECG) data, and SRAM-based physical unclonable functions (PUF), for authentication and cryptography applications is proposed. Up to ~12X improvement in the equal error rate compared to a prior ECG-only authentication system is achieved by combining feature vectors obtained from ECG, heart rate variability, and SRAM PUF. The resulting vectors can also be utilized for secure cryptography applications. Secondly, a novel in-memory computing (IMC) hardware noise-aware training algorithms that make DNNs more robust to hardware noise is developed and evaluated. Up to 17% accuracy was recovered in deep neural networks (DNNs) deployed on IMC prototype hardware. The noise-aware training principles are also used to improve the adversarial robustness of DNNs, and successfully defend against both adversarial input and weight attacks. Up to ~10\% improvement in robustness against adversarial input attacks, and up to 33% improvement in robustness against adversarial weight attacks are achieved. Finally, a DNN training algorithm that pursues and optimises both activation and weight sparsity simultaneously is proposed and evaluated to obtain highly compressed DNNs. This lead to up to 4.7x reduction in the total number of flops required to perform complex image recognition tasks. A custom sparse inference accelerator is designed and synthesized to evaluate the benefits of the above flop reduction. A speedup of 4.24x is achieved. In summary, this dissertation contains innovative algorithm and hardware design techniques aided by machine learning, which enhance the security and efficiency of edge computing applications.

ContributorsCherupally, Sai Kiran (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Cao, Yu (Kevin) (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)

Created2022

ASU Electronic Theses and Dissertations

Filtering by

Distributed Learning and Adaptive Algorithms for Edge Networks

Accelerating Genome Quantification in FPGA

On Addressing Security Issues in Edge Neural Network Systems

Deep Learning based Air Quality Prediction with Time Series Sensor Data

Energy Efficient ASIC/FPGA Neural Network Accelerators

Modeling of Intel’s Advanced Interface Bus and its Evaluation on a Chiplet-based System

Experimental Evaluation of the Feasibility of Wearable Piezoelectric Energy Harvesting

Efficient and Secure Deep Learning Inference System: A Software and Hardware Co-design Perspective

Processing-in-Memory for Data-Intensive Applications, From Device to Algorithm

Machine Learning Assisted Security for Edge Computing Applications