Matching Items (50)
Item 154757
Description
Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems have very good performance, they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost of keyword detection and speech recognition networks (or DNNs) are presented.

The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation, with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Additional reduction in memory is achieved by a technique called weight pruning, where the weights are classified as sensitive and insensitive, and the sensitive weights are represented with higher precision. A combination of these two techniques helps reduce the memory footprint by 81-84% for the speech recognition and keyword detection networks, respectively.
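As a rough illustration of this b-bit representation, the sketch below uniformly quantizes a weight matrix to a signed fixed-point grid; the range selection and rounding mode are assumptions, not necessarily the thesis's exact scheme.

```python
import numpy as np

def quantize_weights(w, num_bits=5):
    """Uniformly quantize a weight array to a signed fixed-point grid.

    A minimal sketch of b-bit weight quantization; the thesis's exact
    fixed-point format (range selection, rounding mode) may differ.
    """
    levels = 2 ** (num_bits - 1) - 1          # symmetric signed range
    scale = np.max(np.abs(w)) / levels        # step size from weight range
    q = np.round(w / scale)                   # map onto the integer grid
    q = np.clip(q, -levels, levels)           # saturate outliers
    return q * scale                          # dequantized values for inference

# Example: 5-bit weights give roughly a 6x memory reduction vs. 32-bit floats.
w = np.random.randn(256, 256).astype(np.float32)
w_q = quantize_weights(w, num_bits=5)
print("max quantization error:", np.max(np.abs(w - w_q)))
```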

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory requirement by ~95% compared to a fully-connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate.
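The block-level dropping can be illustrated with a minimal mask-based sketch; how blocks are chosen during actual training (and the block size) are the thesis's design choices, so the random selection below is purely illustrative.

```python
import numpy as np

def block_sparsify(w, block=16, drop_ratio=0.75, rng=None):
    """Zero out entire `block x block` tiles of a weight matrix.

    A sketch of hardware-aware coarse-grain sparsity: dropping whole
    blocks lets weight memory and computation skip them wholesale.
    Assumes w's dimensions are divisible by `block`; block choice by
    random draw here stands in for the training-time criterion.
    """
    rows, cols = w.shape[0] // block, w.shape[1] // block
    rng = rng or np.random.default_rng(0)
    mask = rng.random((rows, cols)) >= drop_ratio       # keep ~25% of blocks
    full_mask = np.kron(mask, np.ones((block, block)))  # expand to weight shape
    return w * full_mask

w = np.random.randn(256, 256)
w_s = block_sparsify(w)  # ~75% of weights dropped in aligned blocks
```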
Contributors: Arunachalam, Sairam (Author) / Chakrabarti, Chaitali (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2016
Item 155207
Description
The radar performance of detecting a target and estimating its parameters can deteriorate rapidly in the presence of high clutter. This is because radar measurements due to clutter returns can be falsely detected as if originating from the actual target. Various data association methods and multiple hypothesis filtering approaches have been considered to solve this problem. Such methods, however, can be computationally intensive for real-time radar processing. This work proposes a new approach that is based on the unsupervised clustering of target and clutter detections before target tracking using particle filtering. In particular, Gaussian mixture modeling is first used to separate the detections into two distinct Gaussian mixture components. Using eigenvector analysis, the eccentricity of the covariance matrix of each component is computed and compared to threshold values that are obtained a priori. The thresholding allows only target detections to be used for target tracking. Simulations demonstrate the performance of the new algorithm and compare it with a variant that uses k-means clustering instead of Gaussian mixture modeling.
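A compact sketch of this pre-tracking step, under assumed details (eccentricity measured as the ratio of extreme covariance eigenvalues, and an illustrative threshold value), might look like the following.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_target_clutter(detections, ecc_threshold=5.0):
    """Cluster radar detections into two Gaussian components and keep
    the component(s) whose covariance is least eccentric.

    A sketch of the pre-tracking step described above; the eccentricity
    measure (largest/smallest covariance eigenvalue) and the threshold
    are illustrative assumptions, not the thesis's exact values.
    """
    gmm = GaussianMixture(n_components=2, random_state=0).fit(detections)
    labels = gmm.predict(detections)
    keep = []
    for k in range(2):
        eigvals = np.linalg.eigvalsh(gmm.covariances_[k])  # ascending order
        eccentricity = eigvals[-1] / eigvals[0]
        if eccentricity < ecc_threshold:       # compact cluster -> target-like
            keep.append(k)
    return detections[np.isin(labels, keep)]   # detections passed to tracking
```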
Contributors: Freeman, Matthew Gregory (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2016
Item 155926
Description
With the new-age Internet of Things (IoT) revolution, there is a need to connect a wide range of devices with varying throughput and performance requirements. In this thesis, a wireless system is proposed which is targeted towards very low power, delay-insensitive IoT applications with low throughput requirements. The low-cost receivers for such devices will have very low complexity, consume very little power and hence will run for several years.

Long Term Evolution (LTE) is a standard developed and administered by the 3rd Generation Partnership Project (3GPP) for high-speed wireless communications for mobile devices. As part of Release 13, another standard called narrowband IoT (NB-IoT) was introduced by 3GPP to serve the needs of IoT applications with low throughput requirements. Working along similar lines, this thesis proposes yet another LTE-based solution called very narrowband IoT (VNB-IoT), which further reduces the complexity and power consumption of the user equipment (UE) while maintaining the base station (BS) architecture as defined in NB-IoT.

In the downlink operation, the transmitter of the proposed system uses the NB-IoT resource block, with each subcarrier modulated with data symbols intended for a different user. On the receiver side, each UE locks to a particular subcarrier frequency instead of the entire resource block and operates as a single-carrier receiver. On the uplink, the system uses a single-tone transmission as specified in the NB-IoT standard.

The performance of the proposed system is analyzed in an additive white Gaussian noise (AWGN) channel, followed by an analysis of the inter-carrier interference (ICI). A relationship between the overall filter bandwidth and ICI is established towards the end.
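To illustrate the single-carrier receiver idea, the sketch below recovers the symbols on one subcarrier by correlating each symbol-length block against a single complex exponential instead of computing a full FFT. Cyclic-prefix handling and synchronization are omitted, and the FFT size and frame layout are illustrative rather than the NB-IoT numerology.

```python
import numpy as np

def single_subcarrier_rx(samples, k, n_fft=128, n_symbols=10):
    """Recover the data symbols on subcarrier k without a full FFT.

    A sketch of the single-carrier UE: each OFDM-symbol-length block is
    correlated against only the k-th DFT basis vector, so complexity per
    symbol is one length-n_fft dot product rather than an FFT.
    """
    n = np.arange(n_fft)
    tone = np.exp(-2j * np.pi * k * n / n_fft)   # k-th DFT basis vector
    out = []
    for s in range(n_symbols):
        block = samples[s * n_fft:(s + 1) * n_fft]
        out.append(np.dot(block, tone) / n_fft)  # one correlation per symbol
    return np.array(out)
```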
Contributors: Sharma, Prashant (Author) / Bliss, Daniel (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / McGiffen, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 156077
Description
The goal is to provide accurate measurement of the channel between a ground source and a receiving satellite. The effects of the ionosphere on radio propagation are a long-studied subject, but the primary focus has been ground-to-ground propagation by means of ionospheric reflection and space-to-ground correction of ionospheric distortions of GPS; the effects of the ionosphere on ground-to-space propagation of radio waves in the 3-30 MHz HF band remain an unstudied subject. Because of the plasma properties of the ionosphere, there is a strong dependence on the operating frequency: the GPS L1 (1575.42 MHz) and L2 (1227.60 MHz) signals are much less affected than the 3-30 MHz HF band used for skywave propagation. The channel between the ground transmitter and the satellite receiver is characterized by two unique polarization modes with respective delays and Dopplers. Accurate estimates of delay and Doppler are obtained using polynomial fit functions, and polarimetric separation of the two propagating polarizations improves the quality of the delay and Doppler estimates for each mode. These methods yield good channel models and an effective channel estimation method well suited to ground-to-space propagation.
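A minimal sketch of the polynomial-fit estimator follows: noisy per-measurement delays are smoothed with a low-order polynomial in time, and Doppler is read off the fitted delay rate. The polynomial order, carrier frequency, and synthetic data are illustrative assumptions.

```python
import numpy as np

fc = 10e6                                   # illustrative HF carrier, Hz
t = np.linspace(0.0, 1.0, 200)              # measurement times, s
# Synthetic noisy delay track for one polarization mode (assumed values).
tau_meas = 2.4e-3 + 1e-6 * t + 5e-9 * np.random.randn(t.size)

coeffs = np.polyfit(t, tau_meas, deg=2)     # polynomial delay model tau(t)
tau_fit = np.polyval(coeffs, t)             # smoothed delay estimate
tau_rate = np.polyval(np.polyder(coeffs), t)
doppler = -fc * tau_rate                    # f_D = -fc * d(tau)/dt
```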
Contributors: Standage-Beier, Wylie S (Author) / Bliss, Daniel W (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / McGiffen, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 155477
Description
Many real-time vision applications require accurate estimation of optical flow. This problem is quite challenging due to extremely high computation and memory requirements. This thesis focuses on designing low-complexity dense optical flow algorithms.

First, a new method for optical flow that is based on Semi-Global Matching (SGM), a popular dynamic programming algorithm for stereo vision, is presented. In SGM, the disparity of each pixel is calculated by aggregating local matching costs over the entire image to resolve local ambiguity in texture-less and occluded regions. The proposed method, Neighbor-Guided Semi-Global Matching (NG-fSGM), achieves significantly lower complexity compared to SGM by 1) operating on a subset of the search space that has been aggressively pruned based on neighboring pixels' information, 2) using a simple cost aggregation function, and 3) approximating the aggregated cost array and embedding pixel-wise matching cost computation and flow computation in aggregation. Evaluation on the Middlebury benchmark suite showed that, compared to a prior SGM extension for optical flow, the proposed basic NG-fSGM provides robust optical flow with a 0.53% accuracy improvement, a 40x reduction in the number of operations and a 6x reduction in memory size. To further reduce the complexity, a sparse-to-dense flow estimation method is proposed. The number of operations and memory size are reduced by 68% and 47%, respectively, with only 0.42% accuracy degradation, compared to the basic NG-fSGM.
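As a rough sketch of the neighbor-guided pruning idea (step 1 above), the helper below builds a per-pixel candidate set from already-processed neighbors' best flow vectors plus a few random samples. The neighbor set, search radius, and random-sample count are illustrative assumptions, not the exact NG-fSGM configuration.

```python
import numpy as np

def neighbor_guided_candidates(best_flow, x, y, n_random=4, rng=None):
    """Build a pruned per-pixel search space from neighbors' best flows.

    `best_flow` is an (H, W, 2) array of the best flow found so far.
    Instead of evaluating every flow vector, pixel (x, y) tests only the
    best flows of causal neighbors plus a few random samples.
    """
    rng = rng or np.random.default_rng(0)
    cands = []
    for dx, dy in [(-1, 0), (0, -1), (-1, -1), (1, -1)]:    # causal neighbors
        nx, ny = x + dx, y + dy
        if 0 <= nx < best_flow.shape[1] and 0 <= ny < best_flow.shape[0]:
            cands.append(tuple(best_flow[ny, nx]))
    for _ in range(n_random):                               # keep exploring
        cands.append(tuple(rng.integers(-8, 9, size=2)))
    return list(set(cands))                                 # deduplicated subset
```

Matching costs would then be computed only for this small candidate list, which is where the reported reduction in operations comes from.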

A parallel block-based version of NG-fSGM is also proposed. The image is divided into overlapping blocks and the blocks are processed in parallel to improve throughput, latency and power efficiency. To minimize the amount of overlap among blocks with minimal effect on the accuracy, temporal information is used to estimate a flow map that guides flow vector selections for pixels along block boundaries. The proposed block-based NG-fSGM achieves significant reduction in complexity with only 0.51% accuracy degradation compared to the basic NG-fSGM.
Contributors: Xiang, Jiang (Author) / Chakrabarti, Chaitali (Thesis advisor) / Karam, Lina (Committee member) / Kim, Hun Seok (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 155967
Description
This thesis presents the simulation of Bluetooth and Wi-Fi radios in real-life interference environments. When information is transmitted via communication channels, data may get corrupted due to noise and other channel discrepancies. In order to receive the information safely and correctly, error correction coding schemes are generally employed during the design of communication systems. Simulations of wireless communication systems are usually done in such a way that they focus on one aspect of communications and neglect the others. Currently available simulators perform either network-layer or physical-layer simulations. Many situations, however, require simulations that capture inter-layer aspects of communication systems. For all such scenarios, a simulation environment called WiscaComm, which is based on time-domain samples, is built. WiscaComm allows the study of network and physical layer interactions in detail. The advantage of time-domain sampling is that it allows the simulation of different radios together, which is better than the complex baseband representation of symbols. The environment also supports the study of multiple protocols operating simultaneously, which is of increasing importance in today's environment.
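To make the time-domain-sample idea concrete, here is a toy sketch (assumed parameters throughout, not WiscaComm code) in which two radios' complex-baseband waveforms share one sample clock and are summed sample by sample, so cross-radio interference arises naturally rather than being modeled abstractly.

```python
import numpy as np

fs = 20e6                                        # common sample rate, Hz
n = np.arange(20000)
rng = np.random.default_rng(0)

def qpsk(n_sym, sps):
    """Random QPSK symbols with rectangular pulses, sps samples/symbol."""
    bits = rng.integers(0, 4, n_sym)
    syms = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))
    return np.repeat(syms, sps)

# Two dissimilar radios on one sample clock (offsets/rates illustrative).
wifi_like = qpsk(1000, 20)                                    # wideband signal
bt_like = qpsk(2000, 10) * np.exp(2j * np.pi * 3e6 * n / fs)  # 3 MHz offset
rx = wifi_like + bt_like + 0.05 * (rng.standard_normal(n.size)
                                   + 1j * rng.standard_normal(n.size))
```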
Contributors: Nolastname, Ujjwala (Author) / Bliss, Daniel W. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / McGiffen, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 155804
Description
When one considers the current state of wireless communications, it becomes clear that it is both absolutely amazing and something of a mess. Present communications standards are the result of local optimizations over time that led to a confusing set of suboptimal and fragile wireless standards. Starting from a clean sheet of paper, the Bliss Laboratory for Information, Signals, and Systems (BLISS) is considering a fluid set of communications standards co-optimized with flexible but power-efficient computational implementations that will enable the next revolution of wireless communications. The main aim is to enable much higher data rates and much lower data rates with correspondingly lower power consumption as the needs of the users vary.

The thesis mainly looks at the different sections of the work done to prime the development of the protocol development engine. It discusses channel modeling and the system integration of receiver and channel noise. It also proposes a Carrier-Sense Multiple Access (CSMA) Media Access Control (MAC) layer protocol implementation for the Wi-Fi (Wireless Fidelity) protocol. This work also describes the Graphical User Interface (GUI), which is part of the Protocol Development Kit (PDK) - a combination of the Protocol Recommendation Engine (PRE) and a simulation package to aid the development of protocols. It also sheds light on the Automatic Dependent Surveillance - Broadcast (ADS-B) radio protocol, which will eventually replace radar as Air Traffic Control's (ATC) primary tool for separating aircraft.
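A minimal sketch of a CSMA-style sense-and-backoff loop is shown below; the slot time, contention-window policy, and busy-channel model are illustrative assumptions, and Wi-Fi details such as inter-frame spacing and ACKs are omitted.

```python
import random
import time

SLOT_TIME_S = 9e-6                                  # illustrative slot length

def csma_transmit(channel_busy, max_attempts=8, cw_min=16):
    """Sense the medium, back off while busy, and retry.

    `channel_busy` is any zero-argument callable returning True while
    the medium is occupied. This is a sketch, not the thesis's MAC: it
    doubles a contention window on each busy sensing, which only loosely
    mirrors Wi-Fi's binary exponential backoff.
    """
    cw = cw_min
    for _ in range(max_attempts):
        if not channel_busy():                      # carrier sense
            return True                             # medium free: transmit now
        time.sleep(random.randint(0, cw - 1) * SLOT_TIME_S)
        cw = min(cw * 2, 1024)                      # widen contention window
    return False                                    # give up after max_attempts

# Example: a toy channel that is busy 30% of the time.
print(csma_transmit(lambda: random.random() < 0.3))
```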

All the algorithms used in this thesis to define radio operation were in principle defined by mathematical descriptions; however, to test and implement these algorithms, they had to be converted to a computer language. There were multiple phases of this conversion. In the first phase, these algorithms were implemented in Matrix Laboratory (MATLAB). To aid this development, basic radio finite state machines and radio algorithmic tools were provided.
Contributors: Rupakula, Venkata Sai Karteek (Author) / Bliss, Daniel W (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / McGiffen, Tom (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 155805
Description
This thesis addresses two problems in digital baseband design of wireless communication systems, namely, those in Internet of Things (IoT) terminals that support long range communications and those in full-duplex systems that are designed for high spectral efficiency.

IoT terminals for long range communications are typically based on Orthogonal Frequency-Division Multiple Access (OFDMA) and spread spectrum technologies. In order to design an efficient baseband architecture for such terminals, the workload profiles of both systems are analyzed. Since the frame detection unit has by far the highest computational load, a simple architecture that uses only a scalar datapath is proposed. To optimize for low energy consumption, application-specific instructions that minimize register accesses and address generation units for streamlined memory access are introduced. Two parameters, namely, the correlation window size and the threshold value, affect the detection probability, the false alarm probability and hence the energy consumption. Next, energy-optimal operation settings for the correlation window size and threshold value are derived for different channel conditions. For both good and bad channel conditions, if the target signal detection probability is greater than 0.9, the baseband processor has the lowest energy when the frame detection algorithm uses the longest correlation window and the highest threshold value.
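A minimal sketch of such a frame detector follows: a sliding preamble correlation whose window size (the preamble length) and threshold are exactly the two knobs discussed above. The normalization and the example threshold value are assumptions.

```python
import numpy as np

def detect_frame(rx, preamble, threshold=0.8):
    """Sliding-window preamble correlator with a detection threshold.

    A longer `preamble` (correlation window) and a higher `threshold`
    reduce false alarms at the cost of more computation per sample,
    which is the energy trade-off discussed above. The normalized
    metric and the 0.8 default are illustrative choices.
    """
    w = preamble.size
    energy = np.sqrt(np.convolve(np.abs(rx) ** 2, np.ones(w), "valid"))
    corr = np.abs(np.convolve(rx, np.conj(preamble[::-1]), "valid"))
    metric = corr / (energy * np.linalg.norm(preamble) + 1e-12)
    hits = np.flatnonzero(metric > threshold)
    return hits[0] if hits.size else None      # first sample index, or miss
```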

A full-duplex system has high spectral efficiency but suffers from self-interference. Part of the interference can be cancelled digitally using equalization techniques. The cancellation performance and computation complexity of the competing equalization algorithms, namely, Least Mean Square (LMS), Normalized LMS (NLMS), Recursive Least Square (RLS), and feedback equalizers based on LMS, NLMS and RLS, are analyzed, and a trade-off between performance and complexity is established. The NLMS linear equalizer is found to be suitable for resource-constrained mobile devices, and the NLMS decision feedback equalizer is more appropriate for base stations that are not energy constrained.
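As an illustration of the NLMS option, here is a minimal complex-valued NLMS canceller that adapts a short filter to predict the self-interference from a reference signal; the tap count and step size are illustrative assumptions.

```python
import numpy as np

def nlms_cancel(x, d, n_taps=16, mu=0.5, eps=1e-6):
    """NLMS adaptive filter: predict interference in d from reference x.

    Standard NLMS recursion (a sketch, not the thesis's implementation):
    y = w^H u, e = d - y, w += mu * conj(e) * u / (||u||^2 + eps).
    Returns the residual e after cancellation.
    """
    w = np.zeros(n_taps, dtype=complex)
    e = np.zeros(d.size, dtype=complex)
    for n in range(n_taps, d.size):
        u = x[n - n_taps:n][::-1]              # most recent samples first
        y = np.dot(np.conj(w), u)              # filter output
        e[n] = d[n] - y                        # cancellation residual
        w += mu * np.conj(e[n]) * u / (np.dot(np.conj(u), u).real + eps)
    return e
```

The per-sample cost is O(n_taps), which is why NLMS suits resource-constrained devices better than RLS's O(n_taps^2) update.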
Contributors: Wu, Shunyao (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Lee, Hyunseok (Committee member) / Arizona State University (Publisher)
Created: 2017
Item 152527
Description
This thesis report aims at introducing the background of QR decomposition and its applications. QR decomposition using Givens rotations is an efficient method to avoid direct matrix inversion in solving the least-squares minimization problem, which is a typical approach for weight calculation in adaptive beamforming. Furthermore, this thesis introduces the Givens rotations algorithm and two general VLSI (very large scale integrated circuit) architectures, namely the triangular systolic array and the linear systolic array, for numerical QR decomposition. To fulfill this goal, a 4-input-channel triangular systolic array with a 16-bit fixed-point format and a 5-input-channel linear systolic array are implemented on an FPGA (field-programmable gate array). The final results show that estimated clock frequencies of 65 MHz and 135 MHz, respectively, could be achieved according to the post-place-and-route static timing report on a Xilinx Virtex-6 xc6vlx240t chip. Meanwhile, this report proposes a new method to test the dynamic range of QR decomposition; a dynamic range of around 110 dB is achieved for both architectures.
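For reference, a sequential floating-point sketch of QR decomposition by Givens rotations is shown below; each 2x2 rotation corresponds to the work of one systolic-array cell, though the thesis's 16-bit fixed-point mapping and array scheduling are not reproduced here.

```python
import numpy as np

def givens_qr(a):
    """QR decomposition by Givens rotations.

    Zeroes subdiagonal entries one at a time with 2x2 rotations; this is
    the sequential form of the systolic arrays described above. Returns
    Q, R with a = Q @ R.
    """
    r = a.astype(float).copy()
    m, n = r.shape
    q = np.eye(m)
    for j in range(n):                          # zero out below the diagonal
        for i in range(m - 1, j, -1):
            x, y = r[i - 1, j], r[i, j]
            rho = np.hypot(x, y)
            if rho == 0.0:
                continue
            c, s = x / rho, y / rho
            g = np.array([[c, s], [-s, c]])     # 2x2 Givens rotation
            r[i - 1:i + 1, :] = g @ r[i - 1:i + 1, :]
            q[:, i - 1:i + 1] = q[:, i - 1:i + 1] @ g.T
    return q, r

a = np.random.randn(5, 4)
q, r = givens_qr(a)
print(np.allclose(q @ r, a))                    # True up to rounding
```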
Contributors: Yu, Hanguang (Author) / Bliss, Daniel W (Thesis advisor) / Ying, Lei (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014
Item 158693
Description
This Master's thesis includes the design, on-chip integration, and evaluation of a set of imitation learning (IL)-based scheduling policies: deep neural network (DNN) and decision tree (DT). We first developed IL-based scheduling policies for heterogeneous systems-on-chips (SoCs). Then, we tested these policies using a system-level domain-specific system-on-chip simulation framework [11]. Finally, we transformed them into efficient code using a cloud engine [1] and implemented them on a user-space emulation framework [61] on a Unix-based SoC. IL is one area of machine learning (ML) and a useful method to train artificial intelligence (AI) models by imitating the decisions of an expert or Oracle that knows the optimal solution. This thesis's primary focus is to adapt an ML model to work on-chip and optimize the resource allocation for a set of domain-specific wireless and radar system applications. Evaluation results with four streaming applications from the wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle expert with more than 97% accuracy and 1.20× faster execution time. The models have been implemented as an add-on, making it easy to port them to other SoCs.
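As a small illustration of the imitation-learning formulation (not the thesis's framework), the sketch below trains a decision-tree policy on hypothetical (state, Oracle action) pairs; the feature layout, label set, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row of X is a hypothetical scheduler state (e.g., task type, queue
# depths, processing-element availability); y is the processing element
# the Oracle assigned for that state. Random data stands in for traces
# collected from an offline Oracle.
rng = np.random.default_rng(0)
X = rng.random((5000, 8))                   # observed scheduling states
y = rng.integers(0, 4, 5000)                # Oracle's PE choice per state

policy = DecisionTreeClassifier(max_depth=12).fit(X, y)  # imitate the Oracle
print("training fidelity:", policy.score(X, y))          # fraction imitated

# At run time, the shallow tree replaces the expensive Oracle lookup:
next_pe = policy.predict(X[:1])[0]
```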
Contributors: Holt, Conrad Mestres (Author) / Ogras, Umit Y. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Akoglu, Ali (Committee member) / Arizona State University (Publisher)
Created: 2020