Matching Items (49)
Filtering by

Clear all filters

171768-Thumbnail Image.png
Description
Object tracking refers to the problem of estimating a moving object's time-varying parameters that are indirectly observed in measurements at each time step. Increased noise and clutter in the measurements reduce estimation accuracy as they increase the uncertainty of tracking in the field of view. Whereas tracking is performed using

Object tracking refers to the problem of estimating a moving object's time-varying parameters that are indirectly observed in measurements at each time step. Increased noise and clutter in the measurements reduce estimation accuracy as they increase the uncertainty of tracking in the field of view. Whereas tracking is performed using a Bayesian filter, a Bayesian smoother can be utilized to refine parameter state estimations that occurred before the current time. In practice, smoothing can be widely used to improve state estimation or correct data association errors, and it can lead to significantly better estimation performance as it reduces the impact of noise and clutter. In this work, a single object tracking method is proposed based on integrating Kalman filtering and smoothing with thresholding to remove unreliable measurements. As the new method is effective when the noise and clutter in the measurements are high, the main goal is to find these measurements using a moving average filter and a thresholding method to improve estimation. Thus, the proposed method is designed to reduce estimation errors that result from measurements corrupted with high noise and clutter. Simulations are provided to demonstrate the improved performance of the new method when compared to smoothing without thresholding. The root-mean-square error in estimating the object state parameters is shown to be especially reduced under high noise conditions.
ContributorsSeo, Yongho (Author) / Papandreaou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel W (Committee member) / Chakrabarti, Chaitali (Committee member) / Moraffah, Bahman (Committee member) / Arizona State University (Publisher)
Created2022
171924-Thumbnail Image.png
Description
Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The

Many of the advanced integrated circuits in the past used monolithic grade die due to power, performance and cost considerations. Today, heterogenous integration of multiple dies into a single package is possible because of the advancement in packaging. These heterogeneous multi-chiplet systems provide high performance at minimum fabrication cost. The main challenge is to interconnect these chiplets while keeping the power and performance closer to monolithic grade. Intel’s Advanced Interface Bus (AIB) is a short reach interface that offers high bandwidth, power efficient, low latency, and cost effective on-package connectivity between chiplets. It supports flexible interconnection of the chiplets with high speed data transfer. Specifically, it is a die to die parallel interface implemented with multiple configurable channels, routed between micro-bumps. In this work, the AIB model is synthesized in 65nm technology node and a performancemodel is generated. This model generates area, power and latency results for multiple technology nodes using technology scaling methods. For all nodes, the area, power and latency values increase linearly with frequency and number of channels. The bandwidth also increases linearly with the number of input/output lanes, which is a function of the micro-bump pitch. Next, the AIB performance model is integrated with the benchmarking simulator, Scalable In-Memory Acceleration With Mesh (SIAM), to realize a scalable chipletbased end-to-end system. The Ground-Referenced Signaling (GRS) driver model in SIAM is replaced with the AIB model and an end-to-end evaluation of Deep Neural Network (DNN) performance is carried out for two contemporary DNN models. Comparative analysis between SIAM with GRS and SIAM with AIB show that while the area of AIB transmitter is less compared to GRS transmitter, the AIB transmitter offers higher bandwidth than GRS transmitter at the expense of higher energy. Furthermore, SIAM with AIB provides more realistic timing numbers since the NoP driver latency is also taken into consideration.
ContributorsCHERIAN, NINOO SUSAN (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)
Created2022
171954-Thumbnail Image.png
Description
This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP) that was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without trading off much

This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP) that was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without trading off much on programmability and reconfigurability. The structure of a typical DAP code for each Processing Element (PE) is very different from any other programming language format. As a result, writing code for DAP requires the programmer to acquire processor-specific knowledge including configuration rules, cycle accurate execution state for memory and datapath components within each PE, etc. Each code must be carefully handcrafted to meet the strict timing and resource constraints, leading to very long programming times and low productivity. In this thesis, a code generation and optimization tool is introduced to improve the programmability of DAP and make code development easier. The tool consists of a configuration code generator, optimizer, and a scheduler. An Instruction Set Architecture (ISA) has been designed specifically for DAP. The programmer writes the assembly code for each PE using the DAP ISA. The assembly code is then translated into a low-level configuration code. This configuration code undergoes several optimizations passes. Level 1 (L1) optimization handles instruction redundancy and performs loop optimizations through code movement. The Level 2 (L2) optimization performs instruction-level parallelism. Use of L1 and L2 optimization passes result in a code that has fewer instructions and requires fewer cycles. In addition, a scheduling tool has been introduced which performs final timing adjustments on the code to match the input data rate.
ContributorsVipperla, Anish (Author) / Chakrabarti, Chaitali (Thesis advisor) / Bliss, Daniel (Committee member) / Akoglu, Ali (Committee member) / Arizona State University (Publisher)
Created2022
171406-Thumbnail Image.png
Description
Coarse-grain reconfigurable architectures (CGRAs) have shown significant improvements as hardware accelerator whilst demanding low power. Such acceleration inherits from the nature of instruction-level parallelism and exploited by many techniques. Modulo scheduling is a popular approach to software pipelining techniques that provides an efficient heuristic to accelerations on loops, repetitive regions

Coarse-grain reconfigurable architectures (CGRAs) have shown significant improvements as hardware accelerator whilst demanding low power. Such acceleration inherits from the nature of instruction-level parallelism and exploited by many techniques. Modulo scheduling is a popular approach to software pipelining techniques that provides an efficient heuristic to accelerations on loops, repetitive regions of an application. Existing scheduling algorithms for modulo scheduling heuristic persist on loop exiting problems that limit CGRA acceleration to only loops with known trip count and no exit statements. Another notable limitation is the early exit problem, where loops can only terminate after certain iterations as CGRA moves to kernel stage. In attempts to circumvent such obstacles, COMSAT introduces a modified modulo scheduling technique that acts as an external module and can be applied to any existing scheduling/mapping algorithms with minimal hardware changes. Experiments from MiBench and Rodinia benchmark suites have shown that COMSAT achieved an average speedup of 3x in overall benchmarks and 10x speedup in kernel regions. Without COMSAT techniques, only 25% of said loops would have been able to accelerate, reducing benchmark and kernel speedups to 1.25x and 3.63x respectively.
ContributorsTa, Vinh (Author) / Shrivastava, Aviral (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kinsey, Michel (Committee member) / Arizona State University (Publisher)
Created2022
168739-Thumbnail Image.png
Description
Visual navigation is a useful and important task for a variety of applications. As the preva­lence of robots increase, there is an increasing need for energy-­efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the lit­erature, but there is a lack of work

Visual navigation is a useful and important task for a variety of applications. As the preva­lence of robots increase, there is an increasing need for energy-­efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the lit­erature, but there is a lack of work on evaluation of the efficiency of the image sensors. In this thesis, two methods are evaluated: adaptive image sensor quantization for traditional camera pipelines as well as new event­-based sensors for low­-power computer vision.The first contribution in this thesis is an evaluation of performing varying levels of sen­sor linear and logarithmic quantization with the task of visual simultaneous localization and mapping (SLAM). This unconventional method can provide efficiency benefits with a trade­ off between accuracy of the task and energy-­efficiency. A new sensor quantization method, gradient­-based quantization, is introduced to improve the accuracy of the task. This method only lowers the bit level of parts of the image that are less likely to be important in the SLAM algorithm since lower bit levels signify better energy­-efficiency, but worse task accuracy. The third contribution is an evaluation of the efficiency and accuracy of event­-based camera inten­sity representations for the task of optical flow. The results of performing a learning based optical flow are provided for each of five different reconstruction methods along with ablation studies. Lastly, the challenges of an event feature­-based SLAM system are presented with re­sults demonstrating the necessity for high quality and high­ resolution event data. The work in this thesis provides studies useful for examining trade­offs for an efficient visual navigation system with traditional and event vision sensors. The results of this thesis also provide multiple directions for future work.
ContributorsChristie, Olivia Catherine (Author) / Jayasuriya, Suren (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022
156822-Thumbnail Image.png
Description
Hardware implementation of deep neural networks is earning significant importance nowadays. Deep neural networks are mathematical models that use learning algorithms inspired by the brain. Numerous deep learning algorithms such as multi-layer perceptrons (MLP) have demonstrated human-level recognition accuracy in image and speech classification tasks. Multiple layers of processing elements

Hardware implementation of deep neural networks is earning significant importance nowadays. Deep neural networks are mathematical models that use learning algorithms inspired by the brain. Numerous deep learning algorithms such as multi-layer perceptrons (MLP) have demonstrated human-level recognition accuracy in image and speech classification tasks. Multiple layers of processing elements called neurons with several connections between them called synapses are used to build these networks. Hence, it involves operations that exhibit a high level of parallelism making it computationally and memory intensive. Constrained by computing resources and memory, most of the applications require a neural network which utilizes less energy. Energy efficient implementation of these computationally intense algorithms on neuromorphic hardware demands a lot of architectural optimizations. One of these optimizations would be the reduction in the network size using compression and several studies investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. Additionally, numerous recent works have concentrated on reducing the precision of activations and weights with some reducing to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks have not been comprehensively explored. Output activations in these deep neural network algorithms are habitually non-binary making it difficult to exploit sparsity. On the other hand, biologically realistic models like spiking neural networks (SNN) closely mimic the operations in biological nervous systems and explore new avenues for brain-like cognitive computing. These networks deal with binary spikes, and they can exploit the input-dependent sparsity or redundancy to dynamically scale the amount of computation in turn leading to energy-efficient hardware implementation. This work discusses configurable spiking neuromorphic architecture that supports multiple hidden layers exploiting hardware reuse. It also presents design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. Area, performance and energy results of these DNN and SNN hardware is reported for the MNIST dataset. The Neuromorphic hardware designed for SNN algorithm in 28nm CMOS demonstrates high classification accuracy (>98% on MNIST) and low energy (51.4 - 773 (nJ) per classification). The optimized DNN hardware designed in 40nm CMOS that combines 8X structured compression and 3-bit weight precision showed 98.4% accuracy at 33 (nJ) per classification.
ContributorsKolala Venkataramanaiah, Shreyas (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2018
156790-Thumbnail Image.png
Description
Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked

Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked image sensors with near-sensor vision processing units. The characterization reveals that near-sensor processing reduces system power but degrades image quality. For reasonable image fidelity, the sensor temperature needs to stay below a threshold, situationally determined by application needs. Fortunately, the characterization also identifies opportunities -- unique to the needs of near-sensor processing -- to regulate temperature based on dynamic visual task requirements and rapidly increase capture quality on demand.

Based on the characterization, the work proposes and investigate two thermal management strategies -- stop-capture-go and seasonal migration -- for imaging-aware thermal management. The work present parameters that govern the policy decisions and explore the trade-offs between system power and policy overhead. The work's evaluation shows that the novel dynamic thermal management strategies can unlock the energy-efficiency potential of near-sensor processing with minimal performance impact, without compromising image fidelity.
ContributorsKodukula, Venkatesh (Author) / LiKamWa, Robert (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Brunhaver, John (Committee member) / Arizona State University (Publisher)
Created2019
156610-Thumbnail Image.png
Description
Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such

Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, there have been very few studies on application of quantization and structured sparsity together on a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains optimal setting of 2-bit weight and 2-bit activation coupled with 4X structured compression by performing combined exploration of quantization and structured compression settings. The optimal DNN model achieves 50X weight memory reduction compared to floating-point uncompressed DNN. This memory saving is significant since applying only structured sparsity constraints achieves 2X memory savings and only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high and low capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNN outperforms shallow-dense DNN with varying level of memory savings depending on DNN precision and sparsity levels. This work further proposed a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory which includes activation memory and weight memory. It was found that there is only a small change in the memory footprint of the optimal designs corresponding to the low sparsity DNNs. However, activation memory cannot be ignored for high sparsity DNNs.
ContributorsSrivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2018
157644-Thumbnail Image.png
Description
Background. Despite extensive research in the literature aimed at understanding the role of hypertension as a major risk factor for numerous leading causes of death in the United

States, rates of this disease continue to rise. Recent findings suggest that antiseptic mouthwash use may increase blood pressure through elimination of oral

Background. Despite extensive research in the literature aimed at understanding the role of hypertension as a major risk factor for numerous leading causes of death in the United

States, rates of this disease continue to rise. Recent findings suggest that antiseptic mouthwash use may increase blood pressure through elimination of oral bacteria that facilitate the enterosalivary nitrate-nitrite-nitric oxide pathway.

Objective. The purpose of this randomized, controlled, crossover trial was to examine the effects of antiseptic mouthwash use and sodium intake on blood pressure and salivary nitrate levels in prehypertensive adults.

Methods. Healthy adults (n=10; 47.3±12.5) with mildly elevated blood pressure (average baseline blood pressure of 114.9/75.2 mmHg) were recruited and were randomly assigned to a control condition, antiseptic mouthwash use, or antiseptic mouthwash use + consumption of three pickles per day (~6000 mg/day of sodium) for a total of 7 days. Given the crossover design of this study, participants adhered to a 1-week washout period between each condition and all participants received all three treatments. Findings were considered significant at a p-value of <0.05 and a repeated measures ANOVA test was used to compare change data of each condition.

Results. Changes in systolic and diastolic blood pressure were not statistically significant (p=0.469 and p=0.859, respectively). Changes in salivary nitrite levels were not statistically significant (p=0.493). Although there appeared to be fluctuations in sodium intake between interventions, differences in sodium intake were not statistically significant when pickles were not accounted for (p=0.057).

Conclusion. Antiseptic mouthwash use did not appear to induce significant changes in systolic or diastolic blood pressure in this population.
ContributorsShaw, Karrol (Author) / Johnston, Carol (Thesis advisor) / Alexon, Christy (Committee member) / Sweazea, Karen (Committee member) / Arizona State University (Publisher)
Created2019
156936-Thumbnail Image.png
Description
In recent years, conventional convolutional neural network (CNN) has achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNN ignores important spatial information which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves the capabilities of traditional

In recent years, conventional convolutional neural network (CNN) has achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNN ignores important spatial information which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves the capabilities of traditional CNN. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.

In this work, we first use capsule network for overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence and training time per epoch. We show that capsule network achieves higher accuracy when training set size is small. When training set size is larger, capsule network and conventional CNN have comparable recognition accuracy. The training time per epoch for capsule network is longer than conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can help decrease the time complexity of the dynamic routing algorithm significantly.

Next, we design a capsule network for speech recognition, specifically, overlapping word recognition. We use both capsule network and conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that capsule network achieves a considerably higher recognition accuracy (96.92%) compared to conventional CNN (85.19%). Our results show that capsule network recognizes overlapping word by recognizing each individual word in the speech. We also verify the scalability of capsule network by increasing the number of word classes from 5 to 10. Capsule network still shows a high recognition accuracy of 95.42% in case of 10 words while the accuracy of conventional CNN decreases sharply to 73.18%.
ContributorsXiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)
Created2018