Matching Items (21)
Filtering by

Clear all filters

156610-Thumbnail Image.png
Description
Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such

Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, there have been very few studies on application of quantization and structured sparsity together on a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains optimal setting of 2-bit weight and 2-bit activation coupled with 4X structured compression by performing combined exploration of quantization and structured compression settings. The optimal DNN model achieves 50X weight memory reduction compared to floating-point uncompressed DNN. This memory saving is significant since applying only structured sparsity constraints achieves 2X memory savings and only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high and low capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNN outperforms shallow-dense DNN with varying level of memory savings depending on DNN precision and sparsity levels. This work further proposed a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory which includes activation memory and weight memory. It was found that there is only a small change in the memory footprint of the optimal designs corresponding to the low sparsity DNNs. However, activation memory cannot be ignored for high sparsity DNNs.
ContributorsSrivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2018
133397-Thumbnail Image.png
Description
Students learn in various ways \u2014 visualization, auditory, memorizing, or making analogies. Traditional lecturing in engineering courses and the learning styles of engineering students are inharmonious causing students to be at a disadvantage based on their learning style (Felder & Silverman, 1988). My study analyzes the traditional approach to learning

Students learn in various ways \u2014 visualization, auditory, memorizing, or making analogies. Traditional lecturing in engineering courses and the learning styles of engineering students are inharmonious causing students to be at a disadvantage based on their learning style (Felder & Silverman, 1988). My study analyzes the traditional approach to learning coding skills which is unnatural to engineering students with no previous exposure and examining if visual learning enhances introductory computer science education. Visual and text-based learning are evaluated to determine how students learn introductory coding skills and associated problem solving skills. My study was conducted to observe how the two types of learning aid the students in learning how to problem solve as well as how much knowledge can be obtained in a short period of time. The application used for visual learning was Scratch and Repl.it was used for text-based learning. Two exams were made to measure the progress made by each student. The topics covered by the exam were initialization, variable reassignment, output, if statements, if else statements, nested if statements, logical operators, arrays/lists, while loop, type casting, functions, object orientation, and sorting. Analysis of the data collected in the study allow us to observe whether the traditional method of teaching programming or block-based programming is more beneficial and in what topics of introductory computer science concepts.
ContributorsVidaure, Destiny Vanessa (Author) / Meuth, Ryan (Thesis director) / Yang, Yezhou (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
154545-Thumbnail Image.png
Description
Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis explores the possibility of using natural language processing in order to detect declining linguistic complexity from more natural discourse. We use existing data from public figures suspected (or at risk) of suffering from cognitive-linguistic decline, downloaded from the Internet, to detect changes in linguistic complexity. In particular, we focus on two case studies. The first case study analyzes President Ronald Reagan’s transcribed spontaneous speech samples during his presidency. President Reagan was diagnosed with Alzheimer’s disease in 1994, however my results showed declining linguistic complexity during the span of the 8 years he was in office. President George Herbert Walker Bush, who has no known diagnosis of Alzheimer’s disease, shows no decline in the same measures. In the second case study, we analyze transcribed spontaneous speech samples from the news conferences of 10 current NFL players and 18 non-player personnel since 2007. The non-player personnel have never played professional football. Longitudinal analysis of linguistic complexity showed contrasting patterns in the two groups. The majority (6 of 10) of current players showed decline in at least one measure of linguistic complexity over time. In contrast, the majority (11 out of 18) of non-player personnel showed an increase in at least one linguistic complexity measure.
ContributorsWang, Shuai (Author) / Berisha, Visar (Thesis advisor) / LaCross, Amy (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2016
155401-Thumbnail Image.png
Description
This work presents a communication paradigm, using a context-aware mixed reality approach, for instructing human workers when collaborating with robots. The main objective of this approach is to utilize the physical work environment as a canvas to communicate task-related instructions and robot intentions in the form of visual cues. A

This work presents a communication paradigm, using a context-aware mixed reality approach, for instructing human workers when collaborating with robots. The main objective of this approach is to utilize the physical work environment as a canvas to communicate task-related instructions and robot intentions in the form of visual cues. A vision-based object tracking algorithm is used to precisely determine the pose and state of physical objects in and around the workspace. A projection mapping technique is used to overlay visual cues on tracked objects and the workspace. Simultaneous tracking and projection onto objects enables the system to provide just-in-time instructions for carrying out a procedural task. Additionally, the system can also inform and warn humans about the intentions of the robot and safety of the workspace. It was hypothesized that using this system for executing a human-robot collaborative task will improve the overall performance of the team and provide a positive experience to the human partner. To test this hypothesis, an experiment involving human subjects was conducted and the performance (both objective and subjective) of the presented system was compared with a conventional method based on printed instructions. It was found that projecting visual cues enabled human subjects to collaborate more effectively with the robot and resulted in higher efficiency in completing the task.
ContributorsKalpagam Ganesan, Ramsundar (Author) / Ben Amor, Hani (Thesis advisor) / Yang, Yezhou (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created2017
155809-Thumbnail Image.png
Description
Light field imaging is limited in its computational processing demands of high

sampling for both spatial and angular dimensions. Single-shot light field cameras

sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing

incoming rays onto a 2D sensor array. While this resolution can be recovered using

compressive sensing, these iterative solutions are slow

Light field imaging is limited in its computational processing demands of high

sampling for both spatial and angular dimensions. Single-shot light field cameras

sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing

incoming rays onto a 2D sensor array. While this resolution can be recovered using

compressive sensing, these iterative solutions are slow in processing a light field. We

present a deep learning approach using a new, two branch network architecture,

consisting jointly of an autoencoder and a 4D CNN, to recover a high resolution

4D light field from a single coded 2D image. This network decreases reconstruction

time significantly while achieving average PSNR values of 26-32 dB on a variety of

light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7

minutes as compared to the dictionary method for equivalent visual quality. These

reconstructions are performed at small sampling/compression ratios as low as 8%,

allowing for cheaper coded light field cameras. We test our network reconstructions

on synthetic light fields, simulated coded measurements of real light fields captured

from a Lytro Illum camera, and real coded images from a custom CMOS diffractive

light field camera. The combination of compressive light field capture with deep

learning allows the potential for real-time light field video acquisition systems in the

future.
ContributorsGupta, Mayank (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2017
168714-Thumbnail Image.png
Description
Deep neural network-based methods have been proved to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the ``deeper model with deeper confidence'' belief to gain a higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices.

Deep neural network-based methods have been proved to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the ``deeper model with deeper confidence'' belief to gain a higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices. For instance, the intersection management of Connected Autonomous Vehicles (CAVs) requires running computationally intensive object recognition algorithms on low-power traffic cameras. This dissertation aims to study the effect of a dynamic hardware and software approach to address this issue. Characteristics of real-world applications can facilitate this dynamic adjustment and reduce the computation. Specifically, this dissertation starts with a dynamic hardware approach that adjusts itself based on the toughness of input and extracts deeper features if needed. Next, an adaptive learning mechanism has been studied that use extracted feature from previous inputs to improve system performance. Finally, a system (ARGOS) was proposed and evaluated that can be run on embedded systems while maintaining the desired accuracy. This system adopts shallow features at inference time, but it can switch to deep features if the system desires a higher accuracy. To improve the performance, ARGOS distills the temporal knowledge from deep features to the shallow system. Moreover, ARGOS reduces the computation furthermore by focusing on regions of interest. The response time and mean average precision are adopted for the performance evaluation to evaluate the proposed ARGOS system.
ContributorsFarhadi, Mohammad (Author) / Yang, Yezhou (Thesis advisor) / Vrudhula, Sarma (Committee member) / Wu, Carole-Jean (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created2022
171818-Thumbnail Image.png
Description
Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability

Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability of human-driven vehicles. Given that eliminating accidents is impossible, an achievable goal of designing AVs is to design them in a way so that they will not be blamed for any accident in which they are involved in. This work proposes BlaFT – a Blame-Free motion planning algorithm in hybrid Traffic. BlaFT is designed to be compatible with HVs and other AVs, and will not be blamed for accidents in a structured road environment. Also, it proves that no accidents will happen if all AVs are using the BlaFT motion planner and that when in hybrid traffic, the AV using BlaFT will be blame-free even if it is involved in a collision. The work instantiated scores of BlaFT and HV vehicles in an urban road scape loop in the 'Simulation of Urban MObility', ran the simulation for several hours, and observe that as the percentage of BlaFT vehicles increases, the traffic becomes safer. Adding BlaFT vehicles to HVs also increases the efficiency of traffic as a whole by up to 34%.
ContributorsPark, Sanggu (Author) / Shrivastava, Aviral (Thesis advisor) / Wang, Ruoyu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022
168821-Thumbnail Image.png
Description
It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents

It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed as CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried to complement per training caption by querying a generic knowledge atlas ATOMIC, and form the commonsense- caption entailment corpus. A BERT based language entailment model trained from this corpus then serves as a commonsense discriminator for the training of video captioning model, and penalizes the model from generating semantically misaligned captions. With extensive empirical evaluations on MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveals that the use of commonsense knowledge contributes to the video caption generation.
ContributorsShao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created2022
157886-Thumbnail Image.png
Description
Visual navigation is a multi-disciplinary field across computer vision, machine learning and robotics. It is of great significance in both research and industrial applications. An intelligent agent with visual navigation ability will be capable of performing the following tasks: actively explore in environments, distinguish and localize a requested target and

Visual navigation is a multi-disciplinary field across computer vision, machine learning and robotics. It is of great significance in both research and industrial applications. An intelligent agent with visual navigation ability will be capable of performing the following tasks: actively explore in environments, distinguish and localize a requested target and approach the target following acquired strategies. Despite a variety of advances in mobile robotics, enabling an autonomous with above-mentioned abilities is still a challenging and complex task. However, the solution to the task is very likely to accelerate the landing of assistive robots.

Reinforcement learning is a method that trains autonomous robot based on rewarding desired behaviors to help it obtain an action policy that maximizes rewards while the robot interacting with the environment. Through trial and error, an agent learns sophisticated and skillful strategies to handle complex tasks in the environment. Inspired by navigation procedures of human beings that when navigating through environments, humans reason about accessible spaces and geometry of the environment a lot based on first-person view, figure out the destination and then ease over, this work develops a model that maps from pixels to actions and inherently estimate the target as well as the free-space map. The model has three major constituents: (i) a cognitive mapper that maps the topologic free-space map from first-person view images, (ii) a target recognition network that locates a desired object and (iii) an action policy deep reinforcement learning network. Further, a planner model with cascade architecture based on multi-scale semantic top-down occupancy map input is proposed.
ContributorsZheng, Shibin (Author) / Yang, Yezhou (Thesis advisor) / Zhang, Wenlong (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created2019
157977-Thumbnail Image.png
Description
Deep neural networks (DNNs) have had tremendous success in a variety of

statistical learning applications due to their vast expressive power. Most

applications run DNNs on the cloud on parallelized architectures. There is a need

for for efficient DNN inference on edge with low precision hardware and analog

accelerators. To make trained models more

Deep neural networks (DNNs) have had tremendous success in a variety of

statistical learning applications due to their vast expressive power. Most

applications run DNNs on the cloud on parallelized architectures. There is a need

for for efficient DNN inference on edge with low precision hardware and analog

accelerators. To make trained models more robust for this setting, quantization and

analog compute noise are modeled as weight space perturbations to DNNs and an

information theoretic regularization scheme is used to penalize the KL-divergence

between perturbed and unperturbed models. This regularizer has similarities to

both natural gradient descent and knowledge distillation, but has the advantage of

explicitly promoting the network to and a broader minimum that is robust to

weight space perturbations. In addition to the proposed regularization,

KL-divergence is directly minimized using knowledge distillation. Initial validation

on FashionMNIST and CIFAR10 shows that the information theoretic regularizer

and knowledge distillation outperform existing quantization schemes based on the

straight through estimator or L2 constrained quantization.
ContributorsKadambi, Pradyumna (Author) / Berisha, Visar (Thesis advisor) / Dasarathy, Gautam (Committee member) / Seo, Jae-Sun (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2019