Matching Items (31)
Description
Students learn in various ways: through visualization, listening, memorizing, or making analogies. Traditional lecturing in engineering courses and the learning styles of engineering students are mismatched, putting students at a disadvantage based on their learning style (Felder & Silverman, 1988). My study analyzes the traditional approach to learning coding skills, which is unnatural to engineering students with no previous exposure, and examines whether visual learning enhances introductory computer science education. Visual and text-based learning are evaluated to determine how students learn introductory coding skills and the associated problem-solving skills. The study was conducted to observe how the two types of learning help students learn to solve problems, as well as how much knowledge can be obtained in a short period of time. Scratch was used for visual learning and Repl.it for text-based learning. Two exams were made to measure the progress of each student. The topics covered by the exams were initialization, variable reassignment, output, if statements, if-else statements, nested if statements, logical operators, arrays/lists, while loops, type casting, functions, object orientation, and sorting. Analysis of the data collected in the study allows us to observe whether the traditional method of teaching programming or block-based programming is more beneficial, and for which introductory computer science topics.
Contributors: Vidaure, Destiny Vanessa (Author) / Meuth, Ryan (Thesis director) / Yang, Yezhou (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Convolutional neural networks (CNNs) achieve high accuracy on large datasets but require significant computation and storage for training and testing. While many applications demand low-latency and energy-efficient processing of images, deploying these complex algorithms on hardware is a challenging task. This dissertation first presents a compiler-based CNN training accelerator using DDR3 and HBM2 memory. An optimized RTL library is implemented to perform training-specific tasks, and an RTL compiler is developed to generate FPGA-synthesizable RTL based on user-defined constraints. High Bandwidth Memory (HBM) provides efficient off-chip communication and improves training performance. The impact of HBM2 on CNN training workloads is analyzed and comprehensively compared with DDR3. For training ResNet-20/VGG-like CNNs on the CIFAR-10 dataset, the proposed CNN training accelerator demonstrates 479 GOPS performance on a Stratix-10 GX FPGA (DDR3), and shows a 4.5×/9.7× energy-efficiency improvement over a Tesla V100 GPU on a Stratix-10 MX FPGA (HBM). Next, the FPGA online learning accelerator is presented. Adopting model segmentation techniques from Progressive Segmented Training (PST), the online learning accelerator achieves a 4.2× reduction in training latency. Furthermore, this dissertation presents an 8-bit floating-point (FP8) training processor that implements (1) highly parallel tensor cores that maintain high PE utilization, (2) hardware-efficient channel gating for dynamic output activation sparsity, (3) dynamic weight sparsity based on group Lasso, and (4) gradient skipping based on FP prediction error. The 28nm prototype chip demonstrates significant improvements in FLOPs reduction (7.3×), energy efficiency (16.4 TFLOPS/W), and overall training latency speedup (4.7×) for both supervised and self-supervised training tasks.
In addition to the training accelerators, this dissertation also presents CNN inference accelerators on ASIC (FixyNN) and FPGA (FixyFPGA). FixyNN consists of a fixed-weight feature extractor that generates ubiquitous CNN features and a conventional programmable CNN accelerator. In the fixed-weight feature extractor, the network weights are hard-coded into hardware and used as a fixed operand for the multiplication. Experimental results demonstrate that FixyNN can achieve very high energy efficiencies of up to 26.6 TOPS/W, and that FixyFPGA achieves 2.34× higher GOPS on ImageNet classification. In summary, this dissertation comprehensively discusses novel architectures of high-performance and energy-efficient ASIC/FPGA CNN inference/training accelerators.
Contributors: Kolala Venkataramaniah, Shreyas (Author) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Chakrabarti, Chaitali (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)
Created: 2022
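The entry above mentions dynamic weight sparsity based on group Lasso. As a rough illustration only, the standard group Lasso penalty sums the L2 norms of groups of weights, which tends to push whole groups toward zero together. The sketch below assumes groups are plain Python lists and omits the per-group size weighting that some formulations use; the function name and the example values are hypothetical, and this is not the dissertation's implementation.

```python
import math


def group_lasso_penalty(groups, lam=1.0):
    """Return lam times the sum of L2 norms of the weight groups.

    `groups` is a list of weight lists (hypothetical layout); `lam` is the
    regularization strength (an assumed parameter name).
    """
    return lam * sum(math.sqrt(sum(w * w for w in group)) for group in groups)


# Example: three groups, one of which is already all zeros.
groups = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = group_lasso_penalty(groups, lam=0.5)
```

A group that is entirely zero contributes nothing to the penalty, which is why minimizing it favors pruning complete groups (for example, whole channels) rather than scattered individual weights.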
Description
A video clip carries not merely an aggregation of static entities, but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing approach (dubbed CAVAN). CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, for each training caption, commonsense knowledge is queried from the generic knowledge atlas ATOMIC to complement the caption and form the commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model, and penalizes the model for generating semantically misaligned captions. In extensive empirical evaluations on the MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.
Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Adversarial threats to deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks (DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class of threats that attempts to perturb the weight parameters of DNNs to breach security and privacy. In this thesis, the first weight perturbation attack introduced is called Bit-Flip Attack (BFA), which can maliciously flip a small number of bits within a computer’s main memory system storing the DNN weight parameters to achieve malicious objectives. Our developed algorithm can achieve three specific attack objectives: (i) untargeted accuracy degradation attack, (ii) targeted attack, and (iii) Trojan attack. Moreover, BFA utilizes the rowhammer technique to demonstrate the bit-flip attack in an actual computer prototype. While the bit-flip attack is conducted in a white-box setting, the subsequent contribution of this thesis is to develop another novel weight perturbation attack in a black-box setting. Consequently, this thesis discusses a new study of DNN model vulnerabilities in a multi-tenant Field Programmable Gate Array (FPGA) cloud under a strict black-box framework. This newly developed attack framework injects faults in the malicious tenant by duplicating specific DNN weight packages during data transmission between off-chip memory and on-chip buffer of a victim FPGA. The proposed attack is also experimentally validated in a multi-tenant cloud FPGA prototype. In the final part, the focus shifts toward deep learning model privacy and the attack popularly known as model extraction, which can steal partial DNN weight parameters remotely with the aid of a memory side-channel attack. In addition, a novel training algorithm is designed to utilize the partially leaked DNN weight bit information, making the model extraction attack more effective.
The algorithm effectively leverages the partially leaked bit information and generates a substitute prototype of the victim model with almost identical performance to the victim.
Contributors: Rakin, Adnan Siraj (Author) / Fan, Deliang (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Seo, Jae-Sun (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2022
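To see why flipping only a few stored bits can matter, as the entry above describes, it may help to look at the arithmetic for a single weight. The sketch below assumes weights are quantized to signed 8-bit integers in two's complement, which the abstract does not state; the function name is hypothetical, and the code only shows the numeric effect of one flipped bit, not the attack or the bit-selection algorithm from the thesis.

```python
def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit weight stored in two's complement.

    Assumes 8-bit quantization (an illustration choice, not from the source).
    """
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be between 0 and 7")
    raw = weight & 0xFF          # view the weight as an unsigned byte
    raw ^= 1 << bit              # toggle the chosen bit
    return raw - 256 if raw >= 128 else raw  # back to a signed value


small_change = flip_bit(3, 0)    # lowest bit: 3 becomes 2
large_change = flip_bit(3, 7)    # sign bit: 3 becomes -125
```

Under this assumed encoding, toggling the most significant bit moves a small positive weight to a large negative one, while toggling the least significant bit changes it by one.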
Description
This thesis addresses the problems of (a) scheduling multiple streaming jobs with soft deadline constraints to minimize the risk/energy consumption in heterogeneous Systems-on-Chip (SoCs), and (b) training a neural network model with high accuracy and low training time using split federated learning (SFL) with heterogeneous clients. Designing a scheduler for heterogeneous SoCs built with different types of processing elements (PEs) is quite challenging, especially when it has to balance the conflicting requirements of low energy consumption, low risk, and high throughput for randomly streaming jobs at run time. Two probabilistic deadline-aware schedulers are designed for heterogeneous SoCs for such jobs with soft deadline constraints, with the goals of optimizing job-level risk and energy efficiency. The key idea of the probabilistic scheduler is to calculate the task-to-PE allocation probabilities when a job arrives in the system. This allocation probability, generated by a manually designed or neural network (NN) based allocation function, is used to compute the intra-job and inter-job contentions to derive the task-level slack. The tasks are allocated to the PEs that can complete the task within the task-level slack with minimum risk or minimum energy consumption. SFL is an edge-friendly decentralized NN training scheme, where the model is split and only a small client-side model is trained in the clients. The communication overhead in SFL is significant since the intermediate activations and gradients of every sample are transmitted in every epoch. Two communication reduction methods have been proposed, namely, loss-aware selective updating to reduce the number of training epochs and a bottleneck layer (BL) to reduce the feature size. Next, the SFL system is trained with heterogeneous clients having different data rates and operating on non-IID data. The communication time of clients in the low-end group with slow data rates dominates the training time.
To reduce the training time without sacrificing accuracy significantly, HeteroSFL is built with HetBL and bi-directional knowledge sharing (BDKS). HetBL compresses data with different factors in low- and high-end groups using narrow and wide bottleneck layers, respectively. BDKS is proposed to mitigate the label distribution skew across different groups. BDKS can also be applied in Federated Learning to address the label distribution skew.
Contributors: Chen, Xing (Author) / Chakrabarti, Chaitali (Thesis advisor, Committee member) / Ogras, Umit (Committee member) / Fan, Deliang (Committee member) / Zhang, Jeff (Committee member) / Arizona State University (Publisher)
Created: 2023
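The final allocation step described in the entry above (assign a task to a PE that can finish within the task-level slack at minimum energy) can be sketched as follows. This is a simplified illustration under stated assumptions: the slack is taken as given rather than derived from allocation probabilities and contention, the `PE` fields and the fallback to the fastest PE when nothing fits are hypothetical choices, and none of the names come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PE:
    """A processing element with assumed per-task estimates."""
    name: str
    exec_time: float  # estimated time to finish the task on this PE
    energy: float     # estimated energy to run the task on this PE


def pick_pe(pes, slack):
    """Return the lowest-energy PE whose exec_time fits within the slack.

    If no PE fits, fall back to the fastest PE (an assumption made here so
    the sketch always returns something).
    """
    feasible = [pe for pe in pes if pe.exec_time <= slack]
    if not feasible:
        return min(pes, key=lambda pe: pe.exec_time)
    return min(feasible, key=lambda pe: pe.energy)


pes = [
    PE("big_core", exec_time=2.0, energy=9.0),
    PE("little_core", exec_time=6.0, energy=3.0),
    PE("accelerator", exec_time=3.0, energy=4.0),
]
relaxed = pick_pe(pes, slack=10.0)  # every PE fits, so the cheapest wins
tight = pick_pe(pes, slack=4.0)     # little_core is too slow here
```

With a generous slack the low-power core is chosen; as the slack tightens, the choice shifts to faster and more energy-hungry PEs, which is the trade-off the scheduler has to balance.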
Description
In recent years, Artificial Intelligence (AI) (e.g., Deep Neural Networks (DNNs), Transformers) has shown great success in real-world applications due to its superior performance in various cognitive tasks. The impressive performance achieved by AI models normally comes at the cost of enormous model size and high computational complexity, which significantly hampers their implementation on resource-limited Cyber-Physical Systems (CPS), Internet-of-Things (IoT), or Edge systems due to their tightly constrained energy, computing, size, and memory budgets. Thus, the urgent demand for enhancing the efficiency of DNNs has drawn significant research interest across various communities. Motivated by these concerns, this doctoral research has mainly focused on Enabling Deep Learning at Edge: From Efficient and Dynamic Inference to On-Device Learning. Specifically, from the inference perspective, this dissertation begins by investigating a hardware-friendly model compression method that effectively reduces the size of an AI model while simultaneously achieving improved speed on edge devices. Additionally, because different edge devices have diverse resource constraints, this dissertation further explores dynamic inference, which allows for real-time tuning of inference model size, computation, and latency to accommodate the limitations of each edge device. Regarding efficient on-device learning, this dissertation starts by analyzing memory usage during transfer learning training. Based on this analysis, a novel framework called "Reprogramming Network" (Rep-Net) is introduced that offers a fresh perspective on the on-device transfer learning problem. Rep-Net enables on-device transfer learning by directly learning to reprogram the intermediate features of a pre-trained model.
Lastly, this dissertation studies an efficient continual learning algorithm that facilitates learning multiple tasks without the risk of forgetting previously acquired knowledge. In practice, through the exploration of task correlation, an interesting phenomenon is observed: with a self-supervised pre-trained model, the intermediate features are highly correlated between tasks. Building upon this observation, a novel approach called progressive task-correlated layer freezing is proposed to gradually freeze the subset of layers with the highest correlation ratios for each task, leading to improved training efficiency.
Contributors: Yang, Li (Author) / Fan, Deliang (Thesis advisor) / Seo, Jae-Sun (Committee member) / Zhang, Junshan (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2023
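The layer-selection idea behind the progressive task-correlated layer freezing described in the entry above can be illustrated with a very small sketch: given a correlation ratio per layer, pick the layers with the highest ratios to freeze. How the ratios are measured and how the schedule progresses over training are not given in the abstract, so the per-layer values, the layer names, and the function name below are all hypothetical.

```python
def layers_to_freeze(correlation, k):
    """Return the names of the k layers with the highest correlation ratio.

    `correlation` maps a layer name to an assumed correlation ratio in [0, 1].
    Ties keep the dictionary's insertion order because sorted() is stable.
    """
    ranked = sorted(correlation, key=correlation.get, reverse=True)
    return ranked[:k]


# Hypothetical per-layer correlation ratios for one new task.
correlation = {"conv1": 0.9, "conv2": 0.4, "conv3": 0.7, "fc": 0.2}
frozen = layers_to_freeze(correlation, k=2)
```

Layers selected this way would be excluded from gradient updates for the task, which is where the training-efficiency gain would come from; increasing `k` over time would give a progressive variant.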
Description
Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogeneous integration of multiple dies into a single package is possible because of advances in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The main challenge is to interconnect these chiplets while keeping the power and performance close to monolithic grade. Intel’s Advanced Interface Bus (AIB) is a short-reach interface that offers high-bandwidth, power-efficient, low-latency, and cost-effective on-package connectivity between chiplets. It supports flexible interconnection of the chiplets with high-speed data transfer. Specifically, it is a die-to-die parallel interface implemented with multiple configurable channels, routed between micro-bumps. In this work, the AIB model is synthesized in a 65nm technology node and a performance model is generated. This model generates area, power and latency results for multiple technology nodes using technology scaling methods. For all nodes, the area, power and latency values increase linearly with frequency and number of channels. The bandwidth also increases linearly with the number of input/output lanes, which is a function of the micro-bump pitch. Next, the AIB performance model is integrated with the benchmarking simulator, Scalable In-Memory Acceleration With Mesh (SIAM), to realize a scalable chiplet-based end-to-end system. The Ground-Referenced Signaling (GRS) driver model in SIAM is replaced with the AIB model, and an end-to-end evaluation of Deep Neural Network (DNN) performance is carried out for two contemporary DNN models. Comparative analysis between SIAM with GRS and SIAM with AIB shows that while the area of the AIB transmitter is smaller than that of the GRS transmitter, the AIB transmitter offers higher bandwidth than the GRS transmitter at the expense of higher energy.
Furthermore, SIAM with AIB provides more realistic timing numbers since the NoP driver latency is also taken into consideration.
Contributors: CHERIAN, NINOO SUSAN (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)
Created: 2022
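The linear bandwidth scaling noted in the entry above can be made concrete with simple arithmetic: for a parallel interface, aggregate bandwidth is the number of lanes times the per-lane data rate. The sketch below is only that arithmetic; the lane counts and the 2.0 Gbps per-lane rate are hypothetical values chosen for illustration, not figures from the thesis or from the AIB specification.

```python
def aggregate_bandwidth_gbps(lanes: int, gbps_per_lane: float) -> float:
    """Return total bandwidth in Gbps for a parallel interface.

    Assumes every lane runs at the same data rate, so bandwidth grows
    linearly with the number of lanes.
    """
    if lanes < 0 or gbps_per_lane < 0:
        raise ValueError("lanes and gbps_per_lane must be non-negative")
    return lanes * gbps_per_lane


base = aggregate_bandwidth_gbps(lanes=40, gbps_per_lane=2.0)
doubled = aggregate_bandwidth_gbps(lanes=80, gbps_per_lane=2.0)
```

Doubling the lane count doubles the bandwidth in this model; since the number of lanes that fit along a die edge depends on the micro-bump pitch, a finer pitch would translate into proportionally more bandwidth.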
Description
The Internet-of-Things (IoT) paradigm is reshaping the ways to interact with the physical space. Many emerging IoT applications need to acquire, process, gain insights from, and act upon the massive amount of data continuously produced by ubiquitous IoT sensors. It is nevertheless technically challenging and economically prohibitive for each IoT application to deploy and maintain a dedicated large-scale sensor network over wide, distributed geographic areas. Built upon the Sensing-as-a-Service paradigm, cloud-sensing service providers are emerging to provide heterogeneous sensing data to various IoT applications with a shared sensing substrate. Cyber threats are among the biggest obstacles to the faster development of cloud-sensing services. This dissertation presents novel solutions to achieve trustworthy IoT sensing-as-a-service. Chapter 1 introduces the cloud-sensing system architecture and the outline of this dissertation. Chapter 2 presents MagAuth, a secure and usable two-factor authentication scheme that explores commercial off-the-shelf wrist wearables with magnetic strap bands to enhance the security and usability of password-based authentication for touchscreen IoT devices. Chapter 3 presents SmartMagnet, a novel scheme that combines smartphones and cheap magnets to achieve proximity-based access control for IoT devices. Chapter 4 proposes SpecKriging, a new spatial-interpolation technique based on graph neural networks for secure cooperative spectrum sensing, which is an important application of cloud-sensing systems. Chapter 5 proposes a trustworthy multi-transmitter localization scheme based on SpecKriging. Chapter 6 discusses future work.
Contributors: Zhang, Yan (Author) / Zhang, Yanchao YZ (Thesis advisor) / Fan, Deliang (Committee member) / Xue, Guoliang (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability of human-driven vehicles. Given that eliminating accidents is impossible, an achievable goal is to design AVs so that they will not be blamed for any accident in which they are involved. This work proposes BlaFT, a Blame-Free motion planning algorithm in hybrid Traffic. BlaFT is designed to be compatible with HVs and other AVs, and will not be blamed for accidents in a structured road environment. The work also proves that no accidents will happen if all AVs use the BlaFT motion planner, and that in hybrid traffic, an AV using BlaFT will be blame-free even if it is involved in a collision. The work instantiated scores of BlaFT and HV vehicles in an urban roadscape loop in Simulation of Urban MObility (SUMO), ran the simulation for several hours, and observed that as the percentage of BlaFT vehicles increases, the traffic becomes safer. Adding BlaFT vehicles to HVs also increases the efficiency of traffic as a whole by up to 34%.
Contributors: Park, Sanggu (Author) / Shrivastava, Aviral (Thesis advisor) / Wang, Ruoyu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed by the use of clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is some spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective in nature. Meanwhile, technological advancements in speech and natural language processing have grown exponentially over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many have seen an opportunity to leverage signal processing and machine learning applications to objectively assess clinical speech samples in order to automatically compute objective measures of neurological health. This document summarizes several contributions to expand upon this body of research. Mainly, there is still a large gap between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of speech and language features used in models for assessing specific aspects of mental health; numerous methods may exist to measure the same or similar constructs and lead researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection, while also performing a smoothing operation on features across several domains of interest. 
Then, these composite features are used to perform a much wider range of analyses than is typical of previous studies, looking at everything from diagnosis to functional competency assessments. Lastly, potential improvements to address practical implementation challenges associated with the use of speech and language technology in a real-world environment are investigated. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners toward improvements in quality of life outcomes for their patients.
Contributors: Voleti, Rohit Nihar Uttam (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie M (Thesis advisor) / Turaga, Pavan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2022