Matching Items (86)

Wireless Charging Technologies

Description

In the world we live in today, nothing is impossible. Thanks to advances in technology, people around the globe can hold computers that fit in their pockets. These computers can do marvelous things, but they run on batteries. These batteries need to be charged, and until recently there was only one option available: wired chargers. Advances in technology have since made it possible to transfer power via magnetic fields. The concept has been around since the days of Nikola Tesla, but only recently has society been able to apply his discoveries to charging the computers in our pockets. Unfortunately, current wireless chargers come with a drawback: they are less efficient than wired chargers. This raises the question our group set out to answer: is there any way to improve the efficiency of wireless chargers so that they are as efficient as, or even more efficient than, wired chargers? This paper explores how to improve the efficiency of wireless chargers. Through research, simulations, and testing, the group has identified areas where efficiency can be improved and makes recommendations for changing the wireless chargers on the market today. This paper also explores future applications of wireless chargers that could not only make life much easier but also save lives in some cases. These applications could have many effects on hospitality, the medical field, and the supply chain and logistics of America.

Date Created
  • 2020-05

Within and cross-corpus speech emotion recognition using latent topic model-based features

Description

Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated greater success than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, thereby losing important local information in the process, which affects performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within- and cross-corpus experiments conducted over acted and spontaneous expressions. In a within-corpus scenario, sRSM outperforms competing LTMs, while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered to be a difficult task using only speech. Further analyses with respect to turn duration show that the improvement is even more significant, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In a cross-corpus scenario, two novel adaptation-based approaches, instance selection and weight regularization, are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance of only 2.6% and 2.7%, thereby highlighting the generalization ability of the proposed features.
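
As an illustration of the conventional baseline mentioned above (statistics collected over frames), the following minimal Python sketch collapses a turn into one feature vector. The function name and the choice of statistics (mean, standard deviation, minimum, maximum) are assumptions made for illustration; the abstract does not specify which statistics the baseline uses.

```python
import statistics


def turn_level_functionals(frames):
    """Collapse a turn (a list of per-frame feature vectors) into a single
    vector of per-dimension statistics: mean, population std, min, max.

    Hypothetical helper for illustration only.
    """
    # Transpose so that each tuple holds one feature dimension across frames.
    dims = list(zip(*frames))
    features = []
    for values in dims:
        features.extend([
            statistics.fmean(values),
            statistics.pstdev(values),
            min(values),
            max(values),
        ])
    return features


# Two frames, two acoustic features per frame (made-up numbers).
frames = [[1.0, 10.0], [3.0, 10.0]]
vector = turn_level_functionals(frames)

# Reversing the frame order gives the same vector: the temporal (local)
# structure of the turn is discarded by this kind of summary.
same = turn_level_functionals(list(reversed(frames))) == vector
```

The order-invariance shown in the last line is one way to see the loss of local information that the abstract attributes to statistics-based turn-level features.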

Date Created
  • 2015-01-25

RNS-Based NTT Polynomial Multiplier for Lattice-Based Cryptography

Description

Lattice-based cryptography is an up-and-coming field of cryptography that utilizes the difficulty of lattice problems to design lattice-based cryptosystems that are resistant to quantum attacks and applicable to Fully Homomorphic Encryption (FHE) schemes. In this thesis, the parallelization of the Residue Number System (RNS) and the algorithmic efficiency of the Number Theoretic Transform (NTT) are combined to tackle the most significant bottleneck of polynomial ring multiplication with the hardware design of an optimized RNS-based NTT polynomial multiplier. The design utilizes Negative Wrapped Convolution, the NTT, RNS Montgomery reduction with Bajard and Shenoy extensions, and optimized modular 32-bit channel arithmetic for nine RNS channels to accomplish an RNS polynomial multiplication. In addition to a full software implementation of the whole system, a pipelined and optimized RNS-based NTT unit with 4 RNS butterflies is implemented on the Xilinx Artix-7 FPGA (xc7a200tlffg1156-2L) for size and delay estimates. The hardware implementation achieves an operating frequency of 47.043 MHz and utilizes 13239 LUTs, 4010 FFs, and 330 DSP blocks, allowing for multiple simultaneously operating NTT units depending on FPGA size constraints.
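
To make the role of Negative Wrapped Convolution and the NTT concrete, here is a small Python sketch of polynomial multiplication modulo x^n + 1 over a single prime modulus. It is not the thesis design: it uses a direct O(n^2) transform instead of butterflies, a single toy modulus (q = 17, n = 4, psi = 9) instead of nine 32-bit RNS channels, and no Montgomery reduction. All names and parameters are illustrative assumptions.

```python
def ntt(a, omega, q):
    """Direct O(n^2) number theoretic transform of a modulo q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def negacyclic_mul_ntt(a, b, psi, q):
    """Multiply a and b modulo (x^n + 1, q) via negative wrapped convolution.

    psi must be a primitive 2n-th root of unity modulo q (psi^n = -1).
    """
    n = len(a)
    omega = psi * psi % q                      # primitive n-th root of unity
    # Pre-scale coefficient i by psi^i, then transform.
    a_hat = ntt([a[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    b_hat = ntt([b[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    # Pointwise product in the transform domain.
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    # Inverse transform: use omega^-1 and divide by n.
    n_inv = pow(n, -1, q)
    c_scaled = [x * n_inv % q for x in ntt(c_hat, pow(omega, -1, q), q)]
    # Post-scale coefficient i by psi^-i.
    psi_inv = pow(psi, -1, q)
    return [c_scaled[i] * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) product modulo (x^n + 1, q), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


Q, PSI = 17, 9            # toy parameters: 9^4 = -1 (mod 17)
A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
fast = negacyclic_mul_ntt(A, B, PSI, Q)
slow = negacyclic_mul_schoolbook(A, B, Q)
```

The pre- and post-scaling by powers of psi is what turns the cyclic convolution computed by the NTT into a product modulo x^n + 1 without zero padding. In an RNS setting the same computation would run independently in each channel modulus.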

Date Created
  • 2020

Dash Database: Structured Kernel Data For The Machine Understanding of Computation

Description

As device and voltage scaling cease, ever-increasing performance targets can only be achieved through the design of parallel, heterogeneous architectures. The workloads targeted by these domain-specific architectures must be designed to leverage the strengths of the platform: a task that has proven to be extremely difficult and expensive.
Machine learning has the potential to automate this process by understanding the features of computation that optimize device utilization and throughput.
Unfortunately, applications of this technique have utilized small data sets and specific feature extraction, limiting the impact of their contributions.

To address this problem, I present Dash-Database: a repository of C and C++ programs for software-defined radio applications and neighboring fields, a methodology for structuring the features of computation using kernels, and a set of evaluation metrics to standardize computation data sets. Dash-Database contributes a general data set that supports machine understanding of computation and standardizes the input corpus utilized for machine learning of computation; currently only a small set of benchmarks and features is being used.
I present an evaluation of Dash-Database using three novel metrics (breadth, depth, and richness) and compare its results to a data set largely representative of those used in prior work, indicating a 5x increase in breadth, a 40x increase in depth, and a rich set of sample features.
Using Dash-Database, the broader community can work toward a general machine understanding of computation that can automate the design of workloads for domain-specific computation.

Date Created
  • 2020

Efficient and Online Deep Learning through Model Plasticity and Stability

Description

The rapid advancement of Deep Neural Networks (DNNs), computing, and sensing technology has enabled many new applications, such as the self-driving vehicle, the surveillance drone, and the robotic system. Compared to conventional edge devices (e.g., cell phones or smart home devices), these emerging devices are required to deal with much more complicated and dynamic situations in real time with bounded computation resources. However, there are several challenges, including but not limited to efficiency, real-time adaptation, model stability, and automation of architecture design.

To tackle the challenges mentioned above, model plasticity and stability are leveraged to achieve efficient and online deep learning, especially in the scenario of learning streaming data at the edge:

First, a dynamic training scheme named Continuous Growth and Pruning (CGaP) is proposed to compress the DNNs through growing important parameters and pruning unimportant ones, achieving up to 98.1% reduction in the number of parameters.

Second, this dissertation presents Progressive Segmented Training (PST), which targets catastrophic forgetting problems in continual learning through importance sampling, model segmentation, and memory-assisted balancing. PST achieves state-of-the-art accuracy with 1.5X FLOPs reduction in the complete inference path.

Third, to facilitate online learning in real applications, acquisitive learning (AL) is further proposed to emphasize both knowledge inheritance and acquisition: the majority of the knowledge is first pre-trained in the inherited model and then adapted to acquire new knowledge. The inherited model's stability is monitored by noise injection and the landscape of the loss function, while the acquisition is realized by importance sampling and model segmentation. Compared to a conventional scheme, AL reduces the accuracy drop by >10X on the CIFAR-100 dataset, with a 5X reduction in latency per training image and a 150X reduction in training FLOPs.

Finally, this dissertation presents evolutionary neural architecture search in light of model stability (ENAS-S). ENAS-S uses a novel fitness score, which addresses not only the accuracy but also the model stability, to search for an optimal inherited model for the application of continual learning. ENAS-S outperforms hand-designed DNNs when learning from a data stream at the edge.

In summary, in this dissertation, several algorithms exploiting model plasticity and model stability are presented to improve the efficiency and accuracy of deep neural networks, especially for the scenario of continual learning.

Date Created
  • 2020

Design, Optimization, and Applications of Wearable IoT Devices

Description

Movement disorders are becoming one of the leading causes of functional disability due to aging populations and extended life expectancy. Diagnosis, treatment, and rehabilitation currently depend on the behavior observed in a clinical environment. After the patient leaves the clinic, there is no standard approach to continuously monitor the patient and report potential problems. Furthermore, self-recording is inconvenient and unreliable. To address these challenges, wearable health monitoring is emerging as an effective way to augment clinical care for movement disorders.

Wearable devices are being used in many health, fitness, and activity monitoring applications. However, their widespread adoption has been hindered by several adaptation and technical challenges. First, conventional rigid devices are uncomfortable to wear for long periods. Second, wearable devices must operate under very low-energy budgets due to their small battery capacities. Small batteries create a need for frequent recharging, which in turn leads users to stop using them. Third, the usefulness of wearable devices must be demonstrated through high impact applications such that users can get value out of them.

This dissertation presents solutions to the challenges faced by wearable devices. First, it presents an open-source hardware/software platform for wearable health monitoring. The proposed platform uses flexible hybrid electronics to enable devices that conform to the shape of the user’s body. Second, it proposes an algorithm to enable recharge-free operation of wearable devices that harvest energy from the environment. The proposed solution maximizes the performance of the wearable device under minimum energy constraints. The results of the proposed algorithm are, on average, within 3% of the optimal solution computed offline. Third, a comprehensive framework for human activity recognition (HAR), one of the first steps towards a solution for movement disorders, is presented. It starts with an online learning framework for HAR. Experiments on a low-power IoT device (TI-CC2650 MCU) with twenty-two users show 95% accuracy in identifying seven activities and their transitions with less than 12.5 mW power consumption. The online learning framework is accompanied by a transfer learning approach for HAR that determines the number of neural network layers to transfer among users to enable efficient online learning. Next, a technique to co-optimize the accuracy and active time of wearable applications by utilizing multiple design points with different energy-accuracy trade-offs is presented. The proposed technique switches between the design points at runtime to maximize a generalized objective function under tight harvested energy budget constraints. Finally, we present the first ultra-low-energy hardware accelerator that makes it practical to perform HAR on energy harvested from wearable devices. The accelerator consumes 22.4 microjoules per operation using a commercial 65 nm technology. In summary, the solutions presented in this dissertation can enable the wider adoption of wearable devices.

Date Created
  • 2020

Image processing using approximate data-path units

Description

In this work, we present approximate adders and multipliers to reduce the data-path complexity of specialized hardware for various image processing systems. These approximate circuits have lower area, latency, and power consumption than their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how the choice of algorithm and parallel adder design can be used to implement the 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in [23] with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of a 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks, and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double-precision floating-point MATLAB implementation of the algorithm.
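
Since the results above are reported as PSNR in dB, a short Python sketch of the standard PSNR definition may help readers interpret them. It assumes 8-bit images (peak value 255) stored as nested lists; the function name and the sample pixel values are illustrative and are not taken from the thesis.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    PSNR = 10 * log10(max_value^2 / MSE), where MSE is the mean squared
    pixel error. Returns infinity when the images are identical.
    """
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up 2x2 example: two pixels are off by one grey level (MSE = 0.5).
original = [[0, 255], [255, 0]]
approximate = [[0, 254], [255, 1]]
quality_db = psnr(original, approximate)
```

Because the scale is logarithmic, the 0.47 dB gap between the accurate and approximate SAGA implementations (31.33 dB versus 30.86 dB) corresponds to an MSE ratio of about 10^0.047, roughly 1.11.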

Date Created
  • 2013

Dynamic scheduling of stream programs on embedded multi-core processors

Description

Stream computing has emerged as an important model of computation for embedded system applications, particularly in the multimedia and network processing domains. In the recent past, several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded multi-core processors. The thesis addresses the problem in the context of a multi-tasking environment with a time-varying allocation of processing elements for a particular streaming application. As a solution, the thesis proposes a two-step approach. In the first step, the stream program is compiled to gather key application information and to generate re-targetable code. A lightweight dynamic scheduler forms the second step of the approach. The dynamic scheduler utilizes the static information and available resources to assign or partition the application across the multi-core architecture. The objective of the dynamic scheduler is to maximize the throughput of the application, and it is sensitive to the resource (processing elements, scratch-pad memory, DMA bandwidth) constraints imposed by the target architecture. We evaluate the proposed approach by compiling and scheduling benchmark stream programs on a representative embedded multi-core processor. We present experimental results that evaluate the quality of the solutions generated by the proposed approach through comparisons with existing techniques.

Date Created
  • 2013

Constrained energy optimization in heterogeneous platforms using generalized scaling models

Description

Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery, and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video, image, and audio processors. As a result, optimization approaches targeting mobile computing need to consider the platform at various levels of granularity.

Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, models for the power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints, with and without introducing new resources. It is shown that the optimal design choices depend on the dynamic power management algorithm and that adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on a Qualcomm Snapdragon 800-based MDP/T tablet. Furthermore, use of the framework for both design-time and runtime optimization is also presented.

Date Created
  • 2014

Increasing the efficiency of Doppler processing and backend processing in medical ultrasound systems

Description

Ultrasound imaging is one of the major medical imaging modalities. It is cheap, non-invasive, and has low power consumption. Doppler processing is an important part of many ultrasound imaging systems. It is used to provide blood velocity information and is built on top of B-mode systems. We investigate the performance of two velocity estimation schemes used in Doppler processing systems, namely, directional velocity estimation (DVE) and conventional velocity estimation (CVE). We find that DVE provides better estimation performance and is the only functioning method when the beam-to-flow angle is large. Unfortunately, DVE is computationally expensive and also requires division and square root operations that are hard to implement. We propose two approximation techniques to replace these computations. The simulation results on cyst images show that the proposed approximations do not affect the estimation performance. We also study backend processing, which includes envelope detection, log compression, and scan conversion. Three different envelope detection methods are compared. Among them, the FIR-based Hilbert transform is considered the best choice when phase information is not needed, while quadrature demodulation is a better choice if phase information is necessary. Bilinear and Gaussian interpolation are considered for scan conversion. Through simulations of a cyst image, we show that bilinear interpolation provides contrast-to-noise ratio (CNR) performance comparable to that of Gaussian interpolation and has lower computational complexity. Thus, bilinear interpolation is chosen for our system.
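
The bilinear interpolation chosen for scan conversion can be sketched in a few lines of Python. This is a generic textbook formulation, not the thesis implementation: the function name, the nested-list grid, and the assumption that coordinates lie inside the grid are all illustrative. In scan conversion, the fractional coordinates would come from mapping each Cartesian display pixel back into the acquired beam and sample grid.

```python
import math


def bilinear_sample(grid, row, col):
    """Interpolate grid at fractional (row, col) from its four neighbours.

    Assumes 0 <= row <= len(grid) - 1 and 0 <= col <= len(grid[0]) - 1.
    """
    rows, cols = len(grid), len(grid[0])
    r0 = int(math.floor(row))
    c0 = int(math.floor(col))
    # Clamp the upper neighbours so that samples on the last row or
    # column do not index outside the grid.
    r1 = min(r0 + 1, rows - 1)
    c1 = min(c0 + 1, cols - 1)
    fr = row - r0          # fractional distance along rows
    fc = col - c0          # fractional distance along columns
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


# Made-up 2x2 patch of envelope samples.
patch = [[0.0, 10.0],
         [20.0, 30.0]]
center = bilinear_sample(patch, 0.5, 0.5)
```

Each output pixel needs only four input samples and three linear blends, which is consistent with the lower computational complexity noted above relative to a Gaussian kernel that weights a larger neighbourhood.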

Date Created
  • 2013