Matching Items (51)
Filtering by

Clear all filters

151700-Thumbnail Image.png
Description
Ultrasound imaging is one of the major medical imaging modalities. It is cheap, non-invasive and has low power consumption. Doppler processing is an important part of many ultrasound imaging systems. It is used to provide blood velocity information and is built on top of B-mode systems. We investigate the performance

Ultrasound imaging is one of the major medical imaging modalities. It is cheap, non-invasive and has low power consumption. Doppler processing is an important part of many ultrasound imaging systems. It is used to provide blood velocity information and is built on top of B-mode systems. We investigate the performance of two velocity estimation schemes used in Doppler processing systems, namely, directional velocity estimation (DVE) and conventional velocity estimation (CVE). We find that DVE provides better estimation performance and is the only functioning method when the beam to flow angle is large. Unfortunately, DVE is computationally expensive and also requires divisions and square root operations that are hard to implement. We propose two approximation techniques to replace these computations. The simulation results on cyst images show that the proposed approximations do not affect the estimation performance. We also study backend processing which includes envelope detection, log compression and scan conversion. Three different envelope detection methods are compared. Among them, FIR based Hilbert Transform is considered the best choice when phase information is not needed, while quadrature demodulation is a better choice if phase information is necessary. Bilinear and Gaussian interpolation are considered for scan conversion. Through simulations of a cyst image, we show that bilinear interpolation provides comparable contrast-to-noise ratio (CNR) performance with Gaussian interpolation and has lower computational complexity. Thus, bilinear interpolation is chosen for our system.
ContributorsWei, Siyuan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Frakes, David (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2013
152139-Thumbnail Image.png
Description
ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a certain kind of membrane systems, is inspired by the way the neurons in brain interact using electrical spikes. Compared to the traditional Boolean logic, SNP systems not only perform similar functions but also provide a more promising solution for reliable computation. Two basic neuron types, Low Pass (LP) neurons and High Pass (HP) neurons, are introduced. These two basic types of neurons are capable to build an arbitrary SNP neuron. This leads to the conclusion that these two basic neuron types are Turing complete since SNP systems has been proved Turing complete. These two basic types of neurons are further used as the elements to construct general-purpose arithmetic circuits, such as adder, subtractor and comparator. In this thesis, erroneous behaviors of neurons are discussed. Transmission error (spike loss) is proved to be equivalent to threshold error, which makes threshold error discussion more universal. To improve the reliability, a new structure called motif is proposed. Compared to Triple Modular Redundancy improvement, motif design presents its efficiency and effectiveness in both single neuron and arithmetic circuit analysis. DRAM-based CMOS circuits are used to implement the two basic types of neurons. Functionality of basic type neurons is proved using the SPICE simulations. The motif improved adder and the comparator, as compared to conventional Boolean logic design, are much more reliable with lower leakage, and smaller silicon area. This leads to the conclusion that SNP system could provide a more promising solution for reliable computation than the conventional Boolean logic.
ContributorsAn, Pei (Author) / Cao, Yu (Thesis advisor) / Barnaby, Hugh (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2013
151941-Thumbnail Image.png
Description
With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.
ContributorsDesai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2013
152173-Thumbnail Image.png
Description
Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded

Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded multi-core processors. The thesis addresses the problem in the context of a multi-tasking environment with a time varying allocation of processing elements for a particular streaming application. As a solution the thesis proposes a two step approach where the stream program is compiled to gather key application information, and to generate re-targetable code. A light weight dynamic scheduler incorporates the second stage of the approach. The dynamic scheduler utilizes the static information and available resources to assign or partition the application across the multi-core architecture. The objective of the dynamic scheduler is to maximize the throughput of the application, and it is sensitive to the resource (processing elements, scratch-pad memory, DMA bandwidth) constraints imposed by the target architecture. We evaluate the proposed approach by compiling and scheduling benchmark stream programs on a representative embedded multi-core processor. We present experimental results that evaluate the quality of the solutions generated by the proposed approach by comparisons with existing techniques.
ContributorsLee, Haeseung (Author) / Chatha, Karamvir (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2013
152970-Thumbnail Image.png
Description
Neural activity tracking using electroencephalography (EEG) and magnetoencephalography (MEG) brain scanning methods has been widely used in the field of neuroscience to provide insight into the nervous system. However, the tracking accuracy depends on the presence of artifacts in the EEG/MEG recordings. Artifacts include any signals that do not originate

Neural activity tracking using electroencephalography (EEG) and magnetoencephalography (MEG) brain scanning methods has been widely used in the field of neuroscience to provide insight into the nervous system. However, the tracking accuracy depends on the presence of artifacts in the EEG/MEG recordings. Artifacts include any signals that do not originate from neural activity, including physiological artifacts such as eye movement and non-physiological activity caused by the environment.

This work proposes an integrated method for simultaneously tracking multiple neural sources using the probability hypothesis density particle filter (PPHDF) and reducing the effect of artifacts using feature extraction and stochastic modeling. Unique time-frequency features are first extracted using matching pursuit decomposition for both neural activity and artifact signals.

The features are used to model probability density functions for each signal type using Gaussian mixture modeling for use in the PPHDF neural tracking algorithm. The probability density function of the artifacts provides information to the tracking algorithm that can help reduce the probability of incorrectly estimating the dynamically varying number of current dipole sources and their corresponding neural activity localization parameters. Simulation results demonstrate the effectiveness of the proposed algorithm in increasing the tracking accuracy performance for multiple dipole sources using recordings that have been contaminated by artifacts.
ContributorsJiang, Jiewei (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2014
152892-Thumbnail Image.png
Description
Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video,

Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video, image, and audio processors. As a result, optimization approaches targeting mobile computing needs to consider the platform at various levels of granularity.

Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, the models for power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints with and without introducing new resources. It is shown, the optimal design choices depend on dynamic power management algorithm, and adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on Qualcomm Snapdragon 800 based tablet MDP/T. Furthermore, usage of the framework at both design and runtime optimization is also presented.
ContributorsGupta, Ujjwala (Author) / Ogras, Umit Y. (Thesis advisor) / Ozev, Sule (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2014
152527-Thumbnail Image.png
Description
This thesis report aims at introducing the background of QR decomposition and its application. QR decomposition using Givens rotations is a efficient method to prevent directly matrix inverse in solving least square minimization problem, which is a typical approach for weight calculation in adaptive beamforming. Furthermore, this thesis introduces Givens

This thesis report aims at introducing the background of QR decomposition and its application. QR decomposition using Givens rotations is a efficient method to prevent directly matrix inverse in solving least square minimization problem, which is a typical approach for weight calculation in adaptive beamforming. Furthermore, this thesis introduces Givens rotations algorithm and two general VLSI (very large scale integrated circuit) architectures namely triangular systolic array and linear systolic array for numerically QR decomposition. To fulfill the goal, a 4 input channels triangular systolic array with 16 bits fixed-point format and a 5 input channels linear systolic array are implemented on FPGA (Field programmable gate array). The final result shows that the estimated clock frequencies of 65 MHz and 135 MHz on post-place and route static timing report could be achieved using Xilinx Virtex 6 xc6vlx240t chip. Meanwhile, this report proposes a new method to test the dynamic range of QR-D. The dynamic range of the both architectures can be achieved around 110dB.
ContributorsYu, Hanguang (Author) / Bliss, Daniel W (Thesis advisor) / Ying, Lei (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2014
153414-Thumbnail Image.png
Description
Driven by stringent power and thermal constraints, heterogeneous multi-core processors, such as the ARM big-LITTLE architecture, are becoming increasingly popular. In this thesis, the use of low-power heterogeneous multi-cores as Microservers using web search as a motivational application is addressed. In particular, I propose a new family of scheduling policies

Driven by stringent power and thermal constraints, heterogeneous multi-core processors, such as the ARM big-LITTLE architecture, are becoming increasingly popular. In this thesis, the use of low-power heterogeneous multi-cores as Microservers using web search as a motivational application is addressed. In particular, I propose a new family of scheduling policies for heterogeneous microservers that assign incoming search queries to available cores so as to optimize for performance metrics such as mean response time and service level agreements, while guaranteeing thermally-safe operation. Thorough experimental evaluations on a big-LITTLE platform demonstrate, on an heterogeneous eight-core Samsung Exynos 5422 MpSoC, with four big and little cores each, that naive performance oriented scheduling policies quickly result in thermal instability, while the proposed policies not only reduce peak temperature but also achieve 4.8x reduction in processing time and 5.6x increase in energy efficiency compared to baseline scheduling policies.
ContributorsJain, Sankalp (Author) / Ogras, Umit Y. (Thesis advisor) / Garg, Siddharth (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2015
153249-Thumbnail Image.png
Description
In this thesis we consider the problem of facial expression recognition (FER) from video sequences. Our method is based on subspace representations and Grassmann manifold based learning. We use Local Binary Pattern (LBP) at the frame level for representing the facial features. Next we develop a model to represent the

In this thesis we consider the problem of facial expression recognition (FER) from video sequences. Our method is based on subspace representations and Grassmann manifold based learning. We use Local Binary Pattern (LBP) at the frame level for representing the facial features. Next we develop a model to represent the video sequence in a lower dimensional expression subspace and also as a linear dynamical system using Autoregressive Moving Average (ARMA) model. As these subspaces lie on Grassmann space, we use Grassmann manifold based learning techniques such as kernel Fisher Discriminant Analysis with Grassmann kernels for classification. We consider six expressions namely, Angry (AN), Disgust (Di), Fear (Fe), Happy (Ha), Sadness (Sa) and Surprise (Su) for classification. We perform experiments on extended Cohn-Kanade (CK+) facial expression database to evaluate the expression recognition performance. Our method demonstrates good expression recognition performance outperforming other state of the art FER algorithms. We achieve an average recognition accuracy of 97.41% using a method based on expression subspace, kernel-FDA and Support Vector Machines (SVM) classifier. By using a simpler classifier, 1-Nearest Neighbor (1-NN) along with kernel-FDA, we achieve a recognition accuracy of 97.09%. We find that to process a group of 19 frames in a video sequence, LBP feature extraction requires majority of computation time (97 %) which is about 1.662 seconds on the Intel Core i3, dual core platform. However when only 3 frames (onset, middle and peak) of a video sequence are used, the computational complexity is reduced by about 83.75 % to 260 milliseconds at the expense of drop in the recognition accuracy to 92.88 %.
ContributorsYellamraju, Anirudh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Karam, Lina (Committee member) / Arizona State University (Publisher)
Created2014
150167-Thumbnail Image.png
Description
Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for

Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for implementing high-throughput DSP systems that are also energy-efficient. Data-path components such as adders and multipliers are evaluated with respect to critical path delay, energy and Energy-Delay Product (EDP). A new design for a RBR adder with very good EDP performance has been proposed. The corresponding RBR parallel adder has a much lower critical path delay and EDP compared to two's complement carry select and carry look-ahead adder implementations. Next, several RBR multiplier architectures are investigated and their performance compared to two's complement systems. These include two new multiplier architectures: a purely RBR multiplier where both the operands are in RBR form, and a hybrid multiplier where the multiplicand is in RBR form and the other operand is represented in conventional two's complement form. Both the RBR and hybrid designs are demonstrated to have better EDP performance compared to conventional two's complement multipliers. The hybrid multiplier is also shown to have a superior EDP performance compared to the RBR multiplier, with much lower implementation area. Analysis on the effect of bit-precision is also performed, and it is shown that the performance gain of RBR systems improves for higher bit precision. Next, in order to demonstrate the efficacy of the RBR representation at the system-level, the performance of RBR and hybrid implementations of some common DSP kernels such as Discrete Cosine Transform, edge detection using Sobel operator, complex multiplication, Lifting-based Discrete Wavelet Transform (9, 7) filter, and FIR filter, is compared with two's complement systems. It is shown that for relatively large computation modules, the RBR to two's complement conversion overhead gets amortized. In case of systems with high complexity, for iso-throughput, both the hybrid and RBR implementations are demonstrated to be superior with lower average energy consumption. For low complexity systems, the conversion overhead is significant, and overpowers the EDP performance gain obtained from the RBR computation operation.
ContributorsMahadevan, Rupa (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kiaei, Sayfe (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2011