Matching Items (90)

Filtering by

Clear all filters

152307-Thumbnail Image.png

Adaptive learning and unsupervised clustering of immune responses using microarray random sequence peptides

Description

Immunosignaturing is a medical test for assessing the health status of a patient by applying microarrays of random sequence peptides to determine the patient's immune fingerprint by associating antibodies from a biological sample to immune responses. The immunosignature measurements can

Immunosignaturing is a medical test for assessing the health status of a patient by applying microarrays of random sequence peptides to determine the patient's immune fingerprint by associating antibodies from a biological sample to immune responses. The immunosignature measurements can potentially provide pre-symptomatic diagnosis for infectious diseases or detection of biological threats. Currently, traditional bioinformatics tools, such as data mining classification algorithms, are used to process the large amount of peptide microarray data. However, these methods generally require training data and do not adapt to changing immune conditions or additional patient information. This work proposes advanced processing techniques to improve the classification and identification of single and multiple underlying immune response states embedded in immunosignatures, making it possible to detect both known and previously unknown diseases or biothreat agents. Novel adaptive learning methodologies for un- supervised and semi-supervised clustering integrated with immunosignature feature extraction approaches are proposed. The techniques are based on extracting novel stochastic features from microarray binding intensities and use Dirichlet process Gaussian mixture models to adaptively cluster the immunosignatures in the feature space. This learning-while-clustering approach allows continuous discovery of antibody activity by adaptively detecting new disease states, with limited a priori disease or patient information. A beta process factor analysis model to determine underlying patient immune responses is also proposed to further improve the adaptive clustering performance by formatting new relationships between patients and antibody activity. In order to extend the clustering methods for diagnosing multiple states in a patient, the adaptive hierarchical Dirichlet process is integrated with modified beta process factor analysis latent feature modeling to identify relationships between patients and infectious agents. The use of Bayesian nonparametric adaptive learning techniques allows for further clustering if additional patient data is received. Significant improvements in feature identification and immune response clustering are demonstrated using samples from patients with different diseases.

Contributors

Agent

Created

Date Created
2013

151215-Thumbnail Image.png

Energy and quality-aware multimedia signal processing

Description

Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture level and algorithm-level techniques that reduce energy consumption of these devices with minimal impact on system quality. First, we present

Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture level and algorithm-level techniques that reduce energy consumption of these devices with minimal impact on system quality. First, we present novel techniques to mitigate the effects of SRAM memory failures in JPEG2000 implementations operating in scaled voltages. We investigate error control coding schemes and propose an unequal error protection scheme tailored for JPEG2000 that reduces overhead without affecting the performance. Furthermore, we propose algorithm-specific techniques for error compensation that exploit the fact that in JPEG2000 the discrete wavelet transform outputs have larger values for low frequency subband coefficients and smaller values for high frequency subband coefficients. Next, we present use of voltage overscaling to reduce the data-path power consumption of JPEG codecs. We propose an algorithm-specific technique which exploits the characteristics of the quantized coefficients after zig-zag scan to mitigate errors introduced by aggressive voltage scaling. Third, we investigate the effect of reducing dynamic range for datapath energy reduction. We analyze the effect of truncation error and propose a scheme that estimates the mean value of the truncation error during the pre-computation stage and compensates for this error. Such a scheme is very effective for reducing the noise power in applications that are dominated by additions and multiplications such as FIR filter and transform computation. We also present a novel sum of absolute difference (SAD) scheme that is based on most significant bit truncation. The proposed scheme exploits the fact that most of the absolute difference (AD) calculations result in small values, and most of the large AD values do not contribute to the SAD values of the blocks that are selected. Such a scheme is highly effective in reducing the energy consumption of motion estimation and intra-prediction kernels in video codecs. Finally, we present several hybrid energy-saving techniques based on combination of voltage scaling, computation reduction and dynamic range reduction that further reduce the energy consumption while keeping the performance degradation very low. For instance, a combination of computation reduction and dynamic range reduction for Discrete Cosine Transform shows on average, 33% to 46% reduction in energy consumption while incurring only 0.5dB to 1.5dB loss in PSNR.

Contributors

Agent

Created

Date Created
2012

152459-Thumbnail Image.png

Improving the reliability of NAND Flash, phase-change RAM and spin-torque transfer RAM

Description

Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and

Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and spin torque transfer random access memory (STT-MRAM) are gaining ground. All these technologies suffer from reliability degradation due to process variations, structural limits and material property shift. To address the reliability concerns of these NVM technologies, multi-level low cost solutions are proposed for each of them. My approach consists of first building a comprehensive error model. Next the error characteristics are exploited to develop low cost multi-level strategies to compensate for the errors. For instance, for NAND Flash memory, I first characterize errors due to threshold voltage variations as a function of the number of program/erase cycles. Next a flexible product code is designed to migrate to a stronger ECC scheme as program/erase cycles increases. An adaptive data refresh scheme is also proposed to improve memory reliability with low energy cost for applications with different data update frequencies. For PRAM, soft errors and hard errors models are built based on shifts in the resistance distributions. Next I developed a multi-level error control approach involving bit interleaving and subblock flipping at the architecture level, threshold resistance tuning at the circuit level and programming current profile tuning at the device level. This approach helped reduce the error rate significantly so that it was now sufficient to use a low cost ECC scheme to satisfy the memory reliability constraint. I also studied the reliability of a PRAM+DRAM hybrid memory system and analyzed the tradeoffs between memory performance, programming energy and lifetime. For STT-MRAM, I first developed an error model based on process variations. I developed a multi-level approach to reduce the error rates that consisted of increasing the W/L ratio of the access transistor, increasing the voltage difference across the memory cell and adjusting the current profile during write operation. This approach enabled use of a low cost BCH based ECC scheme to achieve very low block failure rates.

Contributors

Agent

Created

Date Created
2014

152344-Thumbnail Image.png

Adaptive methods within a sequential Bayesian approach for structural health monitoring

Description

Structural integrity is an important characteristic of performance for critical components used in applications such as aeronautics, materials, construction and transportation. When appraising the structural integrity of these components, evaluation methods must be accurate. In addition to possessing capability to

Structural integrity is an important characteristic of performance for critical components used in applications such as aeronautics, materials, construction and transportation. When appraising the structural integrity of these components, evaluation methods must be accurate. In addition to possessing capability to perform damage detection, the ability to monitor the level of damage over time can provide extremely useful information in assessing the operational worthiness of a structure and in determining whether the structure should be repaired or removed from service. In this work, a sequential Bayesian approach with active sensing is employed for monitoring crack growth within fatigue-loaded materials. The monitoring approach is based on predicting crack damage state dynamics and modeling crack length observations. Since fatigue loading of a structural component can change while in service, an interacting multiple model technique is employed to estimate probabilities of different loading modes and incorporate this information in the crack length estimation problem. For the observation model, features are obtained from regions of high signal energy in the time-frequency plane and modeled for each crack length damage condition. Although this observation model approach exhibits high classification accuracy, the resolution characteristics can change depending upon the extent of the damage. Therefore, several different transmission waveforms and receiver sensors are considered to create multiple modes for making observations of crack damage. Resolution characteristics of the different observation modes are assessed using a predicted mean squared error criterion and observations are obtained using the predicted, optimal observation modes based on these characteristics. Calculation of the predicted mean square error metric can be computationally intensive, especially if performed in real time, and an approximation method is proposed. With this approach, the real time computational burden is decreased significantly and the number of possible observation modes can be increased. Using sensor measurements from real experiments, the overall sequential Bayesian estimation approach, with the adaptive capability of varying the state dynamics and observation modes, is demonstrated for tracking crack damage.

Contributors

Agent

Created

Date Created
2013

152360-Thumbnail Image.png

Image processing using approximate data-path units

Description

In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly

In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how choice of algorithm and parallel adder design can be used to implement 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in [23] with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR Filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double precision floating point MATLAB implementation of the algorithm.

Contributors

Agent

Created

Date Created
2013

152527-Thumbnail Image.png

FPGA-based implementation of QR decomposition

Description

This thesis report aims at introducing the background of QR decomposition and its application. QR decomposition using Givens rotations is a efficient method to prevent directly matrix inverse in solving least square minimization problem, which is a typical approach for

This thesis report aims at introducing the background of QR decomposition and its application. QR decomposition using Givens rotations is a efficient method to prevent directly matrix inverse in solving least square minimization problem, which is a typical approach for weight calculation in adaptive beamforming. Furthermore, this thesis introduces Givens rotations algorithm and two general VLSI (very large scale integrated circuit) architectures namely triangular systolic array and linear systolic array for numerically QR decomposition. To fulfill the goal, a 4 input channels triangular systolic array with 16 bits fixed-point format and a 5 input channels linear systolic array are implemented on FPGA (Field programmable gate array). The final result shows that the estimated clock frequencies of 65 MHz and 135 MHz on post-place and route static timing report could be achieved using Xilinx Virtex 6 xc6vlx240t chip. Meanwhile, this report proposes a new method to test the dynamic range of QR-D. The dynamic range of the both architectures can be achieved around 110dB.

Contributors

Agent

Created

Date Created
2014

153890-Thumbnail Image.png

RRAM-based PUF: design and applications in cryptography

Description

The recent flurry of security breaches have raised serious concerns about the security of data communication and storage. A promising way to enhance the security of the system is through physical root of trust, such as, through use of physical

The recent flurry of security breaches have raised serious concerns about the security of data communication and storage. A promising way to enhance the security of the system is through physical root of trust, such as, through use of physical unclonable functions (PUF). PUF leverages the inherent randomness in physical systems to provide device specific authentication and encryption.

In this thesis, first the design of a highly reliable resistive random access memory (RRAM) PUF is presented. Compared to existing 1 cell/bit RRAM, here the sum of the read-out currents of multiple RRAM cells are used for generating one response bit. This method statistically minimizes any early-lifetime failure due to RRAM retention degradation at high temperature or under voltage stress. Using a device model that was calibrated using IMEC HfOx RRAM experimental data, it was shown that an 8 cells/bit architecture achieves 99.9999% reliability for a lifetime >10 years at 125℃ . Also, the hardware area overhead of the proposed 8 cells/bit RRAM PUF architecture was smaller than 1 cell/bit RRAM PUF that requires error correction coding to achieve the same reliability.

Next, a basic security primitive is presented, where the RRAM PUF is embedded in the cryptographic module, SHA-256. This architecture is referred to as Embedded PUF or EPUF. EPUF has a security advantage over SHA-256 as it never exposes the PUF response to the outside world. Instead, in each round, the PUF response is used to change a few bits of the message word to produce a unique message digest for each IC. The use of EPUF as a key generation module for AES is also shown. The hardware area requirement for SHA-256 and AES-128 is then analyzed using synthesis results based on TSMC 65nm library. It is shown that the area overhead of 8 cells/bit RRAM PUF is only 1.08% of the SHA-256 module and 0.04% of the AES-128 module. The security analysis of the PUF based systems is also presented. It is shown that the EPUF-based systems are resistant towards standard attacks on PUFs, and that the security of the cryptographic modules is not compromised.

Contributors

Agent

Created

Date Created
2015

151941-Thumbnail Image.png

Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators.

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.

Contributors

Agent

Created

Date Created
2013

151971-Thumbnail Image.png

Efficient Bayesian tracking of multiple sources of neural activity: algorithms and real-time FPGA implementation

Description

Electrical neural activity detection and tracking have many applications in medical research and brain computer interface technologies. In this thesis, we focus on the development of advanced signal processing algorithms to track neural activity and on the mapping of these

Electrical neural activity detection and tracking have many applications in medical research and brain computer interface technologies. In this thesis, we focus on the development of advanced signal processing algorithms to track neural activity and on the mapping of these algorithms onto hardware to enable real-time tracking. At the heart of these algorithms is particle filtering (PF), a sequential Monte Carlo technique used to estimate the unknown parameters of dynamic systems. First, we analyze the bottlenecks in existing PF algorithms, and we propose a new parallel PF (PPF) algorithm based on the independent Metropolis-Hastings (IMH) algorithm. We show that the proposed PPF-IMH algorithm improves the root mean-squared error (RMSE) estimation performance, and we demonstrate that a parallel implementation of the algorithm results in significant reduction in inter-processor communication. We apply our implementation on a Xilinx Virtex-5 field programmable gate array (FPGA) platform to demonstrate that, for a one-dimensional problem, the PPF-IMH architecture with four processing elements and 1,000 particles can process input samples at 170 kHz by using less than 5% FPGA resources. We also apply the proposed PPF-IMH to waveform-agile sensing to achieve real-time tracking of dynamic targets with high RMSE tracking performance. We next integrate the PPF-IMH algorithm to track the dynamic parameters in neural sensing when the number of neural dipole sources is known. We analyze the computational complexity of a PF based method and propose the use of multiple particle filtering (MPF) to reduce the complexity. We demonstrate the improved performance of MPF using numerical simulations with both synthetic and real data. We also propose an FPGA implementation of the MPF algorithm and show that the implementation supports real-time tracking. For the more realistic scenario of automatically estimating an unknown number of time-varying neural dipole sources, we propose a new approach based on the probability hypothesis density filtering (PHDF) algorithm. The PHDF is implemented using particle filtering (PF-PHDF), and it is applied in a closed-loop to first estimate the number of dipole sources and then their corresponding amplitude, location and orientation parameters. We demonstrate the improved tracking performance of the proposed PF-PHDF algorithm and map it onto a Xilinx Virtex-5 FPGA platform to show its real-time implementation potential. Finally, we propose the use of sensor scheduling and compressive sensing techniques to reduce the number of active sensors, and thus overall power consumption, of electroencephalography (EEG) systems. We propose an efficient sensor scheduling algorithm which adaptively configures EEG sensors at each measurement time interval to reduce the number of sensors needed for accurate tracking. We combine the sensor scheduling method with PF-PHDF and implement the system on an FPGA platform to achieve real-time tracking. We also investigate the sparsity of EEG signals and integrate compressive sensing with PF to estimate neural activity. Simulation results show that both sensor scheduling and compressive sensing based methods achieve comparable tracking performance with significantly reduced number of sensors.

Contributors

Agent

Created

Date Created
2013

Unified framework for energy-proportional computing in multicore processors: novel algorithms and practical implementation

Description

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a lower power consumption rate. With multi-cores, there is increased need for dynamic energy management (DEM), much more than for single-core processors, as DEM for multi-cores is no more a mechanism just to ensure that a processor is kept under specified temperature limits, but also a set of techniques that manage various processor controls like dynamic voltage and frequency scaling (DVFS), task migration, fan speed, etc. to achieve a stated objective. The objectives span a wide range from maximizing throughput, minimizing power consumption, reducing peak temperature, maximizing energy efficiency, maximizing processor reliability, and so on, along with much more wider constraints of temperature, power, timing, and reliability constraints. Thus DEM can be very complex and challenging to achieve. Since often times many DEMs operate together on a single processor, there is a need to unify various DEM techniques. This dissertation address such a need. In this work, a framework for DEM is proposed that provides a unifying processor model that includes processor power, thermal, timing, and reliability models, supports various DEM control mechanisms, many different objective functions along with equally diverse constraint specifications. Using the framework, a range of novel solutions is derived for instances of DEM problems, that include maximizing processor performance, energy efficiency, or minimizing power consumption, peak temperature under constraints of maximum temperature, memory reliability and task deadlines. Finally, a robust closed-loop controller to implement the above solutions on a real processor platform with a very low operational overhead is proposed. Along with the controller design, a model identification methodology for obtaining the required power and thermal models for the controller is also discussed. The controller is architecture independent and hence easily portable across many platforms. The controller has been successfully deployed on Intel Sandy Bridge processor and the use of the controller has increased the energy efficiency of the processor by over 30%

Contributors

Agent

Created

Date Created
2013