
Semantic sparse learning in images and videos

Description

Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many such sparse learning methods focus on designing or applying learning techniques for a certain feature space, without much explicit consideration of the possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieve improved performance while delivering intuitive interpretation of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in sparse learning algorithms. In this work, we identify four problems which are of great importance and broad interest to the community.
Specifically, a novel approach is proposed to incorporate label information to learn a dictionary which is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that the foreground (or salient) objects are sparse in the input domain and the background is sparse in the frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed that utilizes a sparse set of pairwise comparisons among the data, which are easier to obtain and more meaningful and consistent than traditional labels in many scenarios, e.g., evaluating motion skills in surgical simulations. In those four problems, different types of semantic information are modeled and incorporated in designing sparse learning algorithms for the corresponding visual computing tasks. Several real-world applications are selected to demonstrate the effectiveness of the proposed methods, including face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis, and emotion recognition. In those applications, data of different modalities are involved, ranging from audio signals and images to video. Experiments on large-scale real-world data with comparisons to state-of-the-art methods confirm that the proposed approaches deliver salient advantages, showing that adding this semantic information dramatically improves the performance of general sparse learning methods.
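For readers unfamiliar with sparse representation, the following is a minimal sketch of generic sparse coding over a fixed dictionary using iterative soft-thresholding (ISTA). It is not the discriminative dictionary learning method of the thesis; the function names `soft_threshold` and `ista`, the toy dictionary, and the parameter values are all illustrative assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, y, lam, step, iters=100):
    """Minimise 0.5 * ||y - D x||^2 + lam * ||x||_1 by iterative shrinkage.

    D is a list of rows (signal_dim x n_atoms). For convergence, `step`
    should not exceed 1 / (largest eigenvalue of D^T D).
    """
    n_atoms = len(D[0])
    x = [0.0] * n_atoms
    for _ in range(iters):
        # Residual of the current reconstruction: y - D x.
        residual = [yi - sum(d * xj for d, xj in zip(row, x))
                    for row, yi in zip(D, y)]
        # Correlation of each atom with the residual: D^T (y - D x).
        corr = [sum(D[i][j] * residual[i] for i in range(len(D)))
                for j in range(n_atoms)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        x = [soft_threshold(xj + step * cj, step * lam)
             for xj, cj in zip(x, corr)]
    return x


# Toy example (assumed values): with an identity dictionary the solution is
# simply the soft-thresholded signal, so the small coefficient is zeroed out.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(ista(identity, [3.0, 0.5], lam=1.0, step=1.0))  # [2.0, 0.0]
```

The L1 penalty is what produces exact zeros in the code vector; methods that add label or other semantic information typically modify the objective around this basic step.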

Contributors: Zhang, Qiang (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2014

Context recognition methods using audio signals for human-machine interaction

Description

Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach that constrains a classifier trained over acoustic descriptors to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computation and implementation complexity. Software and field programmable gate array based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.

Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2015

Multiple radar target tracking in environments with high noise and clutter

Description

Tracking a time-varying number of targets is a challenging dynamic state estimation problem whose complexity is intensified under low signal-to-noise ratio (SNR) or high clutter conditions. This is important, for example, when tracking multiple, closely spaced targets moving in the same direction such as a convoy of low observable vehicles moving through a forest or multiple targets moving in a crisscross pattern. The SNR in these applications is usually low as the reflected signals from the targets are weak or the noise level is very high.

An effective approach for detecting and tracking a single target under low SNR conditions is the track-before-detect filter (TBDF) that uses unthresholded measurements. However, the TBDF has only been used to track a small fixed number of targets at low SNR. This work proposes a new multiple target TBDF approach to track a dynamically varying number of targets under the recursive Bayesian framework. For a given maximum number of targets, the state estimates are obtained by estimating the joint multiple target posterior probability density function under all possible target existence combinations. The estimation of the corresponding target existence combination probabilities and the target existence probabilities are also derived. A feasible sequential Monte Carlo (SMC) based implementation algorithm is proposed. The approximation accuracy of the SMC method with a reduced number of particles is improved by an efficient proposal density function that partitions the multiple target space into a single target space.

The proposed multiple target TBDF method is extended to track targets in sea clutter using highly time-varying radar measurements. A generalized likelihood function for closely spaced multiple targets in compound Gaussian sea clutter is derived together with the maximum likelihood estimate of the model parameters using an iterative fixed point algorithm.

The TBDF performance is improved by proposing a computationally feasible method to estimate the space-time covariance matrix of rapidly-varying sea clutter. The method applies the Kronecker product approximation to the covariance matrix and uses particle filtering to solve the resulting dynamic state space model formulation.

Contributors: Ebenezer, Samuel P (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Bliss, Daniel (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)

Created: 2015

Multiple detection and tracking in complex time-varying environments

Description

This work considers the problem of multiple detection and tracking in two complex time-varying environments, urban terrain and underwater. Tracking multiple radar targets in urban environments is first investigated by exploiting multipath signal returns; wideband underwater acoustic (UWA) communications channels are then estimated using adaptive learning methods; and multiple UWA communications users are detected by designing the transmit signal to match the environment. For the urban environment, a multi-target tracking algorithm is proposed that integrates multipath-to-measurement association and the probability hypothesis density method implemented using particle filtering. The algorithm is designed to track an unknown time-varying number of targets by extracting information from multiple measurements due to multipath returns in the urban terrain. The path likelihood probability is calculated by considering associations between measurements and multipath returns, and an adaptive clustering algorithm is used to estimate the number of targets and their corresponding parameters. The performance of the proposed algorithm is demonstrated for different multiple target scenarios and evaluated using the optimal subpattern assignment metric. The underwater environment provides a very challenging communication channel due to its highly time-varying nature, resulting in large distortions due to multipath and Doppler-scaling, and frequency-dependent path loss. A model-based wideband UWA channel estimation algorithm is first proposed to estimate the channel support and the wideband spreading function coefficients. A nonlinear frequency modulated signaling scheme is proposed that is matched to the wideband characteristics of the underwater environment. Constraints on the signal parameters are derived to optimally reduce multiple access interference and the UWA channel effects.
The signaling scheme is compared to a code division multiple access (CDMA) scheme to demonstrate its improved bit error rate performance. The overall multi-user communication system performance is finally analyzed by first estimating the UWA channel and then designing the signaling scheme for multiple communications users.
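The optimal subpattern assignment (OSPA) metric mentioned above scores a multi-target tracker on both localization error and errors in the estimated number of targets. The following is a minimal brute-force sketch of the standard OSPA definition, suitable only for small sets; the function name `ospa`, the default cutoff and order, and the toy data are assumptions for illustration and do not come from the thesis.

```python
from itertools import permutations
from math import dist


def ospa(truth, estimates, cutoff=10.0, order=1):
    """OSPA distance between two finite sets of points (brute force)."""
    # `small` is whichever set has fewer points; the metric is symmetric.
    small, large = sorted((list(truth), list(estimates)), key=len)
    m, n = len(small), len(large)
    if n == 0:
        return 0.0  # both sets are empty
    # Cheapest assignment of the m points to m distinct points of the larger
    # set, with each pairwise distance capped at the cutoff.
    best = min(
        sum(min(dist(x, y), cutoff) ** order for x, y in zip(small, perm))
        for perm in permutations(large, m)
    )
    # Every unmatched point (cardinality error) is charged the full cutoff.
    cardinality_penalty = (cutoff ** order) * (n - m)
    return ((best + cardinality_penalty) / n) ** (1.0 / order)


# Toy scenario (assumed values): two true targets, three estimates.
true_targets = [(0.0, 0.0), (10.0, 0.0)]
estimated = [(0.0, 1.0), (10.0, 0.0), (50.0, 50.0)]
print(ospa(true_targets, estimated, cutoff=5.0, order=1))  # 2.0
```

Here the two matched pairs contribute distances 1 and 0, the spurious third estimate contributes the cutoff 5, and the total is averaged over the larger set size, giving 2.0. A practical implementation would replace the permutation search with an assignment solver.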

Contributors: Zhou, Meng (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Kovvali, Narayan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2014

Modeling and simulation tools for aging effects in scaled CMOS design

Description

The aging process due to Bias Temperature Instability (both NBTI and PBTI) and Channel Hot Carrier (CHC) is a key limiting factor of circuit lifetime in CMOS design. Threshold voltage shift due to BTI is a strong function of stress voltage and temperature, complicating stress and recovery prediction. This poses a unique challenge for long-term aging prediction for a wide range of stress patterns. Traditional approaches usually resort to an average stress waveform to simplify the lifetime prediction. They are efficient, but fail to capture circuit operation, especially under dynamic voltage scaling (DVS) or in analog/mixed signal designs where the stress waveform is much more random. This work presents a suite of modelling solutions for BTI that enable aging simulation under all possible stress conditions. Key features of this work are compact models to predict BTI aging based on Reaction-Diffusion theory when the stress voltage is varying. Results for both reaction-diffusion (RD) and trapping-detrapping (TD) mechanisms are presented to cover the underlying physics. Silicon validation of these models is performed at 28nm, 45nm and 65nm technology nodes, at both device and circuit levels. Efficient simulation leveraging the BTI models under DVS and random input waveforms is applied to representative digital and analog circuits such as ring oscillators and LNAs. Both physical mechanisms are combined into a unified model which improves prediction accuracy at 45nm and 65nm nodes. The critical failure condition is also illustrated based on NBTI and PBTI at 28nm. A comprehensive picture of duty cycle shift is shown. DC stress under clock gating schemes results in a monotonic shift in duty cycle, while AC stress causes the duty cycle to converge close to a 50% value. The proposed work provides a general and comprehensive solution to aging analysis under random stress patterns under BTI.

Channel hot carrier (CHC) is another dominant degradation mechanism which affects analog and mixed signal (AMS) circuits, as the transistor operates continuously in the saturation condition. A new model is proposed to account for e-e scattering in advanced technology nodes due to high gate electric field. The model is validated with 28nm and 65nm thick oxide data for different stress voltages. It demonstrates a shift in the worst-case CHC condition to Vgs=Vds from Vgs=0.5Vds. A novel iteration-based aging simulation framework for AMS designs is proposed which eliminates limitations of conventional reliability tools. This approach helps us identify a unique positive feedback mechanism termed Bias Runaway. Bias runaway is a rapid increase of the bias voltage in AMS circuits which occurs when the feedback between the bias current and the effect of channel hot carrier turns positive. CHC degradation is a gradual process, but under specific circumstances the degradation rate can be dramatically accelerated. Such a catastrophic phenomenon is highly sensitive to the initial operating condition, as well as to transistor gate length. Based on 65nm silicon data, our work investigates the critical condition that triggers bias runaway, and the impact of gate length tuning. We develop new compact models as well as the simulation methodology for circuit diagnosis, and propose design solutions and the trade-offs to avoid bias runaway, which is vitally important to reliable AMS designs.

Contributors: Sutaria, Ketul (Author) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Chakrabarti, Chaitali (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created: 2014

In support of high quality 3-D ultrasound imaging for hand-held devices

Description

Three dimensional (3-D) ultrasound is safe, inexpensive, and has been shown to drastically improve system ease-of-use, diagnostic efficiency, and patient throughput. However, its high computational complexity and resulting high power consumption have precluded its use in hand-held applications.

In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound a reality are presented. First, image enhancement methods to improve signal-to-noise ratio (SNR) are proposed. These include virtual source firing techniques and a low overhead digital front-end architecture using orthogonal chirps and orthogonal Golay codes.
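As background on why Golay codes are attractive for coded excitation, the sketch below shows the general complementary property of a Golay pair: the range sidelobes of the two autocorrelations cancel when summed. This illustrates the textbook recursive construction only, not the orthogonal code design or front-end architecture of the dissertation; the names `golay_pair` and `autocorr` are assumptions.

```python
def golay_pair(doublings):
    """Build a complementary pair of length 2**doublings by concatenation."""
    a, b = [1], [1]
    for _ in range(doublings):
        # Standard recursion: a' = a | b, b' = a | -b.
        a, b = a + b, a + [-v for v in b]
    return a, b


def autocorr(seq):
    """Aperiodic autocorrelation for non-negative lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag))
            for lag in range(n)]


a, b = golay_pair(3)  # two codes of length 8
combined = [ra + rb for ra, rb in zip(autocorr(a), autocorr(b))]
print(combined)  # [16, 0, 0, 0, 0, 0, 0, 0]
```

Each code alone has nonzero sidelobes, but the sum is a single peak of height twice the code length, which is the property that allows longer transmit sequences (and hence higher SNR) without degrading axial resolution after pulse compression.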

Second, algorithm-architecture co-design techniques to reduce the power consumption of 3-D SAU imaging systems are presented. These include (i) a subaperture multiplexing strategy and the corresponding apodization method to alleviate the signal bandwidth bottleneck, and (ii) a highly efficient iterative delay calculation method to eliminate complex operations such as multiplications, divisions and square roots in delay calculation during beamforming. These techniques were used to define Sonic Millip3De, a 3-D die stacked architecture for digital beamforming in SAU systems. Sonic Millip3De produces 3-D high resolution images at 2 frames per second with system power consumption of 15W in 45nm technology.

Third, a new beamforming method based on separable delay decomposition is proposed to reduce the computational complexity of the beamforming unit in an SAU system. The method is based on minimizing the root-mean-square error (RMSE) due to delay decomposition. It reduces the beamforming complexity of an SAU system by 19x while providing high image fidelity that is comparable to non-separable beamforming. The resulting modified Sonic Millip3De architecture supports a frame rate of 32 volumes per second while maintaining power consumption of 15W in 45nm technology.

Next, a 3-D plane-wave imaging system that utilizes both separable beamforming and coherent compounding is presented. The resulting system has computational complexity comparable to that of a non-separable non-compounding baseline system while significantly improving contrast-to-noise ratio and SNR. The modified Sonic Millip3De architecture is now capable of generating high resolution images at 1000 volumes per second with 9-fire-angle compounding.

Contributors: Yang, Ming (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Frakes, David (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)

Created: 2015

Constrained energy optimization in heterogeneous platforms using generalized scaling models

Description

Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video, image, and audio processors. As a result, optimization approaches targeting mobile computing need to consider the platform at various levels of granularity.

Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, models for the power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints with and without introducing new resources. It is shown that the optimal design choices depend on the dynamic power management algorithm, and that adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on a Qualcomm Snapdragon 800 based tablet MDP/T. Furthermore, usage of the framework for both design-time and runtime optimization is also presented.

Contributors: Gupta, Ujjwala (Author) / Ogras, Umit Y. (Thesis advisor) / Ozev, Sule (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created: 2014

StreamWorks: an energy-efficient embedded co-processor for stream computing

Description

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction Level Parallelism (ILP) and Data-level parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC) that communicates via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against the existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted for any multi-core architecture that uses software-managed local memories.

The performance and energy-efficiency of the StreamWorks architecture are evaluated for stream kernel and application benchmarks by implementing the architecture in 45nm TSMC and comparing it with a low power RISC core and a contemporary accelerator.

Contributors: Panda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2014

Gain and loss factor for conical horns, and impact of ground plane edge diffractions on radiation patterns of uncoated and coated circular aperture antennas

Description

Horn antennas have been used for over a hundred years. They are a basic and popular microwave antenna with a wide variety of practical applications, such as feed elements for communication reflector dishes on satellites or point-to-point relay antennas. They are also widely utilized as gain standards for calibration and gain measurement of other antennas.

The gain and loss factor of conical horns are revisited in this dissertation based on spherical and quadratic aperture phase distributions. The gain is compared with published classical data in an attempt to confirm their validity and accuracy and to determine whether they were derived based on spherical or quadratic aperture phase distributions. In this work, it is demonstrated that the gain of a conical horn antenna obtained by using a spherical phase distribution is in close agreement with published classical data. Moreover, more accurate expressions for the loss factor, to account for amplitude and phase tapers over the horn aperture, are derived. New formulas for the design of optimum gain conical horns, based on the more accurate spherical aperture phase distribution, are derived.

To better understand the impact of edge diffractions on aperture antenna performance, an extensive investigation of the edge diffractions impact is undertaken in this dissertation for commercial aperture antennas. The impact of finite uncoated and coated PEC ground plane edge diffractions on the amplitude patterns in the principal planes of circular apertures is intensively examined. Similarly, aperture edge diffractions of aperture antennas without ground planes are examined. Computational results obtained by the analytical model are compared with experimental and HFSS-simulated results for all cases studied. In addition, the impact of the ground plane size, coating thickness, and relative permittivity of the dielectric layer on the radiation amplitude in the back region has been examined.

This investigation indicates that the edge diffractions do impact the main forward lobe pattern, especially in the E plane. Their most significant contribution appears in far side and back lobes. This work demonstrates that the finite edge contributors must be considered to obtain more accurate amplitude patterns of aperture antennas.

Contributors: Aboserwal, Nafati Abdasallam (Author) / Balanis, Constantine A (Thesis advisor) / Aberle, James T (Committee member) / Pan, George (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created: 2014

A real-time vision system for a semi-autonomous surface vehicle

Description

In the sport of competitive water skiing, the skill of a human boat driver can affect athletic performance. Driver influence is not necessarily inhibitive to skiers; however, it reduces the fairness and credibility of the sport overall. In response to this problem, this thesis proposes a vision-based real-time control system designed specifically for tournament waterski boats. The challenges addressed in this thesis include: one, the segmentation of floating objects in frame sequences captured by a moving camera; two, the identification of segmented objects which fit a predefined model; and three, the accurate and fast estimation of camera position and orientation from coplanar point correspondences. This thesis discusses current ideas and proposes new methods for the three challenges mentioned. In the end, a working prototype is produced.

Contributors: Walker, Collin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Claveau, David (Committee member) / Arizona State University (Publisher)

Created: 2014
