Matching Items (97)
- Genre: Academic theses
- Creators: Chakrabarti, Chaitali
- Status: Published
There is a growing interest in the creation of three-dimensional (3D) images and videos due to the growing demand for 3D visual media in commercial markets. A possible solution to produce 3D media files is to convert existing 2D images and videos to 3D. The 2D to 3D conversion methods that estimate the depth map from 2D scenes for 3D reconstruction present an efficient approach to save on the cost of the coding, transmission and storage of 3D visual media in practical applications. Various 2D to 3D conversion methods based on depth maps have been developed using existing image and video processing techniques. The depth maps can be estimated either from a single 2D view or from multiple 2D views. This thesis presents a MATLAB-based 2D to 3D conversion system from multiple views based on the computation of a sparse depth map. The 2D to 3D conversion system is able to deal with the multiple views obtained from uncalibrated hand-held cameras without knowledge of the prior camera parameters or scene geometry. The implemented system consists of techniques for image feature detection and registration, two-view geometry estimation, projective 3D scene reconstruction and metric upgrade to reconstruct the 3D structures by means of a metric transformation. The implemented 2D to 3D conversion system is tested using different multi-view image sets. The obtained experimental results of reconstructed sparse depth maps of feature points in 3D scenes provide relative depth information of the objects. Sample ground-truth depth data points are used to calculate a scale factor in order to estimate the true depth by scaling the obtained relative depth information using the estimated scale factor. It was found out that the obtained reconstructed depth map is consistent with the ground-truth depth data.
Audio signals, such as speech and ambient sounds convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature, hence, requiring sophisticated techniques. This dissertation presents a set of computational methods, that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and ambient sounds-based applications such as lifelogging.
The expression and perception of emotions varies across speakers and cultures, thus, determining features and classification methods that generalize well to different conditions is strongly desired. A latent topic models-based method is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech only approach and further improves the recognition performance.
Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach, which constrains a classifier trained over acoustic descriptors, to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.
Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real world data and accompanied by a prototypical Android-based user interface.
The proposed methods are also assessed in terms of computation and implementation complexity. Software and field programmable gate array based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
The recent flurry of security breaches have raised serious concerns about the security of data communication and storage. A promising way to enhance the security of the system is through physical root of trust, such as, through use of physical unclonable functions (PUF). PUF leverages the inherent randomness in physical systems to provide device specific authentication and encryption.
In this thesis, first the design of a highly reliable resistive random access memory (RRAM) PUF is presented. Compared to existing 1 cell/bit RRAM, here the sum of the read-out currents of multiple RRAM cells are used for generating one response bit. This method statistically minimizes any early-lifetime failure due to RRAM retention degradation at high temperature or under voltage stress. Using a device model that was calibrated using IMEC HfOx RRAM experimental data, it was shown that an 8 cells/bit architecture achieves 99.9999% reliability for a lifetime >10 years at 125℃ . Also, the hardware area overhead of the proposed 8 cells/bit RRAM PUF architecture was smaller than 1 cell/bit RRAM PUF that requires error correction coding to achieve the same reliability.
Next, a basic security primitive is presented, where the RRAM PUF is embedded in the cryptographic module, SHA-256. This architecture is referred to as Embedded PUF or EPUF. EPUF has a security advantage over SHA-256 as it never exposes the PUF response to the outside world. Instead, in each round, the PUF response is used to change a few bits of the message word to produce a unique message digest for each IC. The use of EPUF as a key generation module for AES is also shown. The hardware area requirement for SHA-256 and AES-128 is then analyzed using synthesis results based on TSMC 65nm library. It is shown that the area overhead of 8 cells/bit RRAM PUF is only 1.08% of the SHA-256 module and 0.04% of the AES-128 module. The security analysis of the PUF based systems is also presented. It is shown that the EPUF-based systems are resistant towards standard attacks on PUFs, and that the security of the cryptographic modules is not compromised.
Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and spin torque transfer random access memory (STT-MRAM) are gaining ground. All these technologies suffer from reliability degradation due to process variations, structural limits and material property shift. To address the reliability concerns of these NVM technologies, multi-level low cost solutions are proposed for each of them. My approach consists of first building a comprehensive error model. Next the error characteristics are exploited to develop low cost multi-level strategies to compensate for the errors. For instance, for NAND Flash memory, I first characterize errors due to threshold voltage variations as a function of the number of program/erase cycles. Next a flexible product code is designed to migrate to a stronger ECC scheme as program/erase cycles increases. An adaptive data refresh scheme is also proposed to improve memory reliability with low energy cost for applications with different data update frequencies. For PRAM, soft errors and hard errors models are built based on shifts in the resistance distributions. Next I developed a multi-level error control approach involving bit interleaving and subblock flipping at the architecture level, threshold resistance tuning at the circuit level and programming current profile tuning at the device level. This approach helped reduce the error rate significantly so that it was now sufficient to use a low cost ECC scheme to satisfy the memory reliability constraint. I also studied the reliability of a PRAM+DRAM hybrid memory system and analyzed the tradeoffs between memory performance, programming energy and lifetime. For STT-MRAM, I first developed an error model based on process variations. I developed a multi-level approach to reduce the error rates that consisted of increasing the W/L ratio of the access transistor, increasing the voltage difference across the memory cell and adjusting the current profile during write operation. This approach enabled use of a low cost BCH based ECC scheme to achieve very low block failure rates.
In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in  and . First, we show how choice of algorithm and parallel adder design can be used to implement 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in  with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in  to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR Filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in . We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) , in hardware. The algorithm is mapped to pipelined hardware blocks and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double precision floating point MATLAB implementation of the algorithm.
We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.
Structural integrity is an important characteristic of performance for critical components used in applications such as aeronautics, materials, construction and transportation. When appraising the structural integrity of these components, evaluation methods must be accurate. In addition to possessing capability to perform damage detection, the ability to monitor the level of damage over time can provide extremely useful information in assessing the operational worthiness of a structure and in determining whether the structure should be repaired or removed from service. In this work, a sequential Bayesian approach with active sensing is employed for monitoring crack growth within fatigue-loaded materials. The monitoring approach is based on predicting crack damage state dynamics and modeling crack length observations. Since fatigue loading of a structural component can change while in service, an interacting multiple model technique is employed to estimate probabilities of different loading modes and incorporate this information in the crack length estimation problem. For the observation model, features are obtained from regions of high signal energy in the time-frequency plane and modeled for each crack length damage condition. Although this observation model approach exhibits high classification accuracy, the resolution characteristics can change depending upon the extent of the damage. Therefore, several different transmission waveforms and receiver sensors are considered to create multiple modes for making observations of crack damage. Resolution characteristics of the different observation modes are assessed using a predicted mean squared error criterion and observations are obtained using the predicted, optimal observation modes based on these characteristics. Calculation of the predicted mean square error metric can be computationally intensive, especially if performed in real time, and an approximation method is proposed. With this approach, the real time computational burden is decreased significantly and the number of possible observation modes can be increased. Using sensor measurements from real experiments, the overall sequential Bayesian estimation approach, with the adaptive capability of varying the state dynamics and observation modes, is demonstrated for tracking crack damage.
In this thesis, an adaptive waveform selection technique for dynamic target tracking under low signal-to-noise ratio (SNR) conditions is investigated. The approach is integrated with a track-before-detect (TBD) algorithm and uses delay-Doppler matched filter (MF) outputs as raw measurements without setting any threshold for extracting delay-Doppler estimates. The particle filter (PF) Bayesian sequential estimation approach is used with the TBD algorithm (PF-TBD) to estimate the dynamic target state. A waveform-agile TBD technique is proposed that integrates the PF-TBD with a waveform selection technique. The new approach predicts the waveform to transmit at the next time step by minimizing the predicted mean-squared error (MSE). As a result, the radar parameters are adaptively and optimally selected for superior performance. Based on previous work, this thesis highlights the applicability of the predicted covariance matrix to the lower SNR waveform-agile tracking problem. The adaptive waveform selection algorithm's MSE performance was compared against fixed waveforms using Monte Carlo simulations. It was found that the adaptive approach performed at least as well as the best fixed waveform when focusing on estimating only position or only velocity. When these estimates were weighted by different amounts, then the adaptive performance exceeded all fixed waveforms. This improvement in performance demonstrates the utility of the predicted covariance in waveform design, at low SNR conditions that are poorly handled with more traditional tracking algorithms.
Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input size data which have to be stored in external memories based Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed. The results shows that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision can be processed in less than 27 ms. Finally, transpose-free imaging flows for range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323ms and 162ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
Research on developing new algorithms to improve information on brain functionality and structure is ongoing. Studying neural activity through dipole source localization with electroencephalography (EEG) and magnetoencephalography (MEG) sensor measurements can lead to diagnosis and treatment of a brain disorder and can also identify the area of the brain from where the disorder has originated. Designing advanced localization algorithms that can adapt to environmental changes is considered a significant shift from manual diagnosis which is based on the knowledge and observation of the doctor, to an adaptive and improved brain disorder diagnosis as these algorithms can track activities that might not be noticed by the human eye. An important consideration of these localization algorithms, however, is to try and minimize the overall power consumption in order to improve the study and treatment of brain disorders. This thesis considers the problem of estimating dynamic parameters of neural dipole sources while minimizing the system's overall power consumption; this is achieved by minimizing the number of EEG/MEG measurements sensors without a loss in estimation performance accuracy. As the EEG/MEG measurements models are related non-linearity to the dipole source locations and moments, these dynamic parameters can be estimated using sequential Monte Carlo methods such as particle filtering. Due to the large number of sensors required to record EEG/MEG Measurements for use in the particle filter, over long period recordings, a large amounts of power is required for storage and transmission. In order to reduce the overall power consumption, two methods are proposed. The first method used the predicted mean square estimation error as the performance metric under the constraint of a maximum power consumption. The performance metric of the second method uses the distance between the location of the sensors and the location estimate of the dipole source at the previous time step; this sensor scheduling scheme results in maximizing the overall signal-to-noise ratio. The performance of both methods is demonstrated using simulated data, and both methods show that they can provide good estimation results with significant reduction in the number of activated sensors at each time step.