Matching Items (59)

Description
Continuous monitoring of smartphone sensor data to identify human activities and gestures places a heavy load on the phone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smartphone is exploited to perform compressive analysis and efficient classification of human activities using machine learning techniques. We are interested in generalizing classical signal-approximation tools to newer spaces, such as rotation data, which are best studied in a non-Euclidean setting, and in applying them to activity analysis. Because feature extraction on the non-linear rotation-data space imposes a much heavier load on the smartphone's processor and memory than feature extraction in Euclidean space, the acquired sensor data are indexed and compacted prior to feature extraction, reducing CPU overhead and thereby extending battery life with only a small loss in activity recognition accuracy. The sensor data are represented as unit quaternions, a more intrinsic representation of the smartphone's orientation than Euler angles (which suffer from the gimbal-lock problem) or computationally intensive rotation matrices. Classification algorithms are then employed to classify these manifold sequences in the non-Euclidean space. By performing customized indexing (using the K-means algorithm) of the evolving manifold sequences before feature extraction, considerable savings in the smartphone's battery life are achieved.
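As a rough illustration of the indexing-and-compaction idea described above (K-means indexing of unit-quaternion orientation samples before a cheap feature is computed), the following Python sketch may help. It is not the thesis's implementation; all function names, the histogram feature, and the parameter values are placeholders.

```python
# Minimal sketch: indexing unit-quaternion orientation data with K-means
# before feature extraction. Names and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

def canonicalize(q):
    """Map q and -q to the same representative (both encode one rotation)."""
    return q * np.sign(q[..., :1] + 1e-12)

def quaternion_codebook(quats, n_codewords=32, seed=0):
    """Cluster unit quaternions into a small codebook (compaction step)."""
    q = canonicalize(quats / np.linalg.norm(quats, axis=1, keepdims=True))
    return KMeans(n_clusters=n_codewords, n_init=10, random_state=seed).fit(q)

def encode_sequence(km, quats):
    """Replace each orientation sample by its nearest codeword index."""
    q = canonicalize(quats / np.linalg.norm(quats, axis=1, keepdims=True))
    return km.predict(q)

def histogram_feature(indices, n_codewords):
    """Cheap feature: normalized occupancy histogram of codeword indices."""
    h = np.bincount(indices, minlength=n_codewords).astype(float)
    return h / max(h.sum(), 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quats = rng.normal(size=(500, 4))        # stand-in for phone rotation data
    km = quaternion_codebook(quats, n_codewords=16)
    idx = encode_sequence(km, quats)
    print(histogram_feature(idx, 16))
```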
Contributors: Sivakumar, Aswin (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user's activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because of the subjective nature of such information and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves recognition performance.
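The supra-segmental feature idea above can be illustrated with a minimal sketch: low-level descriptors are vector-quantized into "acoustic words," each utterance becomes a word-count vector, and a latent topic model maps it to a topic posterior used as the utterance-level feature. The use of scikit-learn's KMeans and LatentDirichletAllocation, and all parameter values, are assumptions of this sketch, not the dissertation's implementation.

```python
# Minimal sketch: latent-topic features from low-level acoustic descriptors.
# Frames are vector-quantized into "acoustic words"; an utterance's word-count
# vector is mapped to a topic posterior used as a supra-segmental feature.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def acoustic_vocabulary(frame_features, vocab_size=64, seed=0):
    return KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(frame_features)

def utterance_counts(vocab, utterance_frames):
    words = vocab.predict(utterance_frames)
    return np.bincount(words, minlength=vocab.n_clusters)

def topic_features(count_matrix, n_topics=10, seed=0):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit(count_matrix), lda.transform(count_matrix)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(2000, 13))            # e.g., MFCC-like descriptors
    vocab = acoustic_vocabulary(frames)
    utterances = [rng.normal(size=(100, 13)) for _ in range(20)]
    counts = np.vstack([utterance_counts(vocab, u) for u in utterances])
    _, feats = topic_features(counts, n_topics=5)
    print(feats.shape)                               # (20, 5) topic posteriors
```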

Besides affecting the acoustic properties of speech, emotions have a strong influence on speech articulation kinematics. A learning approach that constrains a classifier trained on acoustic descriptors to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent in large-scale data collection, while exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computational and implementation complexity. Software and field-programmable gate array (FPGA) based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received a lot of attention in the recent past. Such architectures have the attractive feature of combining the two-step process of acquisition and compression of pixel measurements in a conventional camera into a single step. A popular variant is the single-pixel camera, which obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even at sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks: they are (1) computationally very expensive and (2) incapable of yielding high-fidelity reconstructions at high compression ratios. In computer vision, the final goal is usually to perform an inference task using the acquired images, not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need for expensive reconstruction algorithms. Non-linear features are often used for inference tasks in computer vision, but it is currently unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived, and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum, such as in the short-wave infrared (SWIR) region, where sensors are expensive.
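The reconstruction-free idea can be sketched in a few lines: a pseudo-random measurement matrix compresses both the scene and a correlation template, and matching is scored directly on the measurements, relying on Johnson-Lindenstrauss-style preservation of inner products. This is a generic illustration, not the thesis's smashed-filter implementation; the matrix choice and sizes are placeholders.

```python
# Minimal sketch: reconstruction-free matching in the compressed domain.
# A pseudo-random measurement matrix Phi compresses both the signal and a
# template; their correlation is approximated from measurements alone.
import numpy as np

def measurement_matrix(m, n, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # random +/-1 rows

def compress(phi, x):
    return phi @ x

def smashed_score(y_scene, y_template):
    """Correlation score computed from compressed measurements only."""
    return float(y_scene @ y_template)

if __name__ == "__main__":
    n, m = 4096, 256                      # roughly 16x compression
    rng = np.random.default_rng(1)
    template = rng.normal(size=n)
    scene_match = template + 0.1 * rng.normal(size=n)
    scene_other = rng.normal(size=n)

    phi = measurement_matrix(m, n)
    y_t = compress(phi, template)
    print("match   :", smashed_score(compress(phi, scene_match), y_t))
    print("nonmatch:", smashed_score(compress(phi, scene_other), y_t))
    print("true inner product:", float(scene_match @ template))
```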
Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Feature representations of raw data are one of the most important components of a machine learning system. Traditionally, features are hand-crafted by domain experts, which can often be a time-consuming process. Furthermore, such features do not generalize well to unseen data and novel tasks. Recently, there have been many efforts to generate data-driven representations using clustering and sparse models. This dissertation focuses on building data-driven unsupervised models for analyzing raw data and developing efficient feature representations.

Simultaneous segmentation and feature extraction approaches for silicon-pore sensor data are considered. Aggregating the data into a matrix and performing low-rank and sparse matrix decompositions with additional smoothness constraints is proposed to solve this problem. Comparisons of several variants of the approach and results for signal de-noising and translocation/trapping event extraction are presented. Algorithms based on matrix completion to improve transform-domain features for ion-channel time-series signals are also presented. The improved features achieve better performance in classification tasks and reduce false alarm rates when applied to analyte detection.
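For intuition, the following sketch splits a data matrix into a low-rank background plus sparse events using a basic robust-PCA-style inexact ALM iteration (singular-value and soft thresholding). The additional smoothness constraints described above are omitted, and all parameter values are illustrative; this is not the dissertation's algorithm.

```python
# Minimal sketch: low-rank plus sparse split of a sensor-data matrix via a
# simple inexact-ALM (robust PCA style) iteration. Smoothness terms omitted.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svd_threshold(x, t):
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(soft_threshold(s, t)) @ vt

def low_rank_plus_sparse(D, lam=None, n_iter=100):
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(D, 2)
    mu = 1.25 / norm_two
    Y = D / max(norm_two, np.abs(D).max() / lam)   # dual variable init
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(n_iter):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)
        S = soft_threshold(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)
        mu *= 1.05                                  # slowly increase penalty
    return L, S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = np.outer(rng.normal(size=200), rng.normal(size=50))   # rank-1 background
    events = np.zeros((200, 50))
    events[rng.integers(0, 200, 30), rng.integers(0, 50, 30)] = 5.0
    L, S = low_rank_plus_sparse(base + events)
    print("rank(L) ~", np.linalg.matrix_rank(L, tol=1e-3),
          " nnz(S) ~", int((np.abs(S) > 1e-3).sum()))
```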

Developing representations for multimedia is an important and challenging problem, with applications ranging from scene recognition, multimedia retrieval, and personal lifelogging systems to field robot navigation. In this dissertation, we present a new framework for feature extraction from challenging natural environment sounds. The proposed features outperform traditional spectral features on challenging environmental sound datasets. Several algorithms are proposed for supervised tasks such as recognition and tag annotation, and ensemble methods are proposed to improve the tag annotation process.

To facilitate the use of large datasets, fast implementations are developed for sparse coding, the key component in our algorithms. Several strategies for speeding up the Orthogonal Matching Pursuit (OMP) algorithm with CUDA kernels on a GPU are proposed. Implementations are also developed for a large-scale image retrieval system, performing image-based "exact search" and "visually similar search" using image-patch sparse codes. Results demonstrate large speed-ups over CPU implementations, and good retrieval performance is also achieved.
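For reference, here is a plain NumPy (CPU) sketch of the Orthogonal Matching Pursuit routine whose GPU acceleration is discussed above. It shows only the algorithm's structure, not the CUDA kernels; the dictionary and sparsity level are placeholders.

```python
# Minimal CPU sketch of Orthogonal Matching Pursuit (OMP): greedily pick the
# atom most correlated with the residual, then re-fit the selected atoms by
# least squares. Structure only; GPU kernels are not shown.
import numpy as np

def omp(D, x, n_nonzero):
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        if k not in support:
            support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.normal(size=(64, 256))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
    true = np.zeros(256)
    true[[3, 100, 200]] = [1.0, -2.0, 0.5]
    x = D @ true
    est = omp(D, x, n_nonzero=3)
    print(np.nonzero(est)[0])                      # ideally recovers {3, 100, 200}
```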
Contributors: Sattigeri, Prasanna S (Author) / Spanias, Andreas (Thesis advisor) / Thornton, Trevor (Committee member) / Goryll, Michael (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Great advances have been made in the construction of photovoltaic (PV) cells and modules, but array-level management remains much the same as it has been in previous decades. Conventionally, the PV array is connected in a fixed topology, which is not always appropriate in the presence of faults in the array or under varying weather conditions. With the introduction of smarter inverters and solar modules, the data obtained from the photovoltaic array can be used to dynamically modify the array topology and improve the array power output. This is especially beneficial when module mismatches such as shading, soiling, and aging occur in the photovoltaic array. This research focuses on the topology optimization of PV arrays under shading conditions using measurements obtained from a PV array set-up. A scheme known as the topology reconfiguration method is proposed to find the optimal array topology for a given weather condition and given faulty-module information. Topologies such as series-parallel (SP), total cross-tied (TCT), bridge-link (BL), and their bypassed versions are considered. The topology reconfiguration method compares the efficiencies of the topologies and evaluates the percentage gain in generated power that would be obtained by reconfiguring the array, among other factors, to find the optimal topology. This method is applied to various possible shading patterns to predict the best topology. The results demonstrate the benefit of an electrically reconfigurable array topology. The effects of irradiance and shading on array performance are also studied. The simulations are carried out using a SPICE simulator, and the simulation results are validated with experimental data provided by the PACECO Company.
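A tiny illustration of the selection logic described above: given the maximum power attainable by each candidate topology under the current shading pattern, pick the best topology and report the percentage gain over the present configuration. The power values below are made-up placeholders; the actual study obtains them from SPICE simulations and measured PV data.

```python
# Minimal sketch of topology selection: choose the candidate topology with the
# highest maximum power and report the percentage gain over the current one.
def best_topology(power_by_topology, current):
    best = max(power_by_topology, key=power_by_topology.get)
    gain = 100.0 * (power_by_topology[best] - power_by_topology[current]) \
           / power_by_topology[current]
    return best, gain

if __name__ == "__main__":
    # Illustrative maximum-power values (W) under one shading pattern.
    candidates = {"SP": 412.0, "TCT": 455.0, "BL": 431.0,
                  "SP-bypassed": 428.0, "TCT-bypassed": 462.0}
    topo, gain = best_topology(candidates, current="SP")
    print(f"reconfigure to {topo}: {gain:.1f}% more power than SP")
```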
Contributors: Buddha, Santoshi Tejasri (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Thesis advisor) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing the key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results show that the pose features extracted using the proposed methods exhibit excellent invariance to changes in view angle and body shape. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. These algorithms yield excellent human movement analysis results, most of which are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed that automatically detects and learns non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition with multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled: a measure quantifying the complementary strength across cameras is proposed, and experimental results on a real-life gesture recognition dataset show that the optimal camera combinations identified according to this complementary measure always lead to the best gesture recognition results.
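The decision-level product-rule fusion mentioned above has a very compact form: per-camera class posteriors are multiplied elementwise and renormalized, and the class with the highest combined posterior wins. The sketch below uses invented posterior values purely for illustration.

```python
# Minimal sketch of decision-level fusion with the product rule: per-camera
# class posteriors are multiplied and renormalized; the gesture with the
# highest combined posterior is selected. Numbers are illustrative only.
import numpy as np

def product_rule_fusion(posteriors):
    """posteriors: array of shape (n_cameras, n_classes), rows sum to 1."""
    combined = np.prod(posteriors, axis=0)
    return combined / combined.sum()

if __name__ == "__main__":
    cams = np.array([[0.6, 0.3, 0.1],     # camera 1 posteriors over 3 gestures
                     [0.4, 0.5, 0.1],     # camera 2
                     [0.7, 0.2, 0.1]])    # camera 3
    fused = product_rule_fusion(cams)
    print(fused, "-> predicted gesture:", int(np.argmax(fused)))
```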
Contributors: Peng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The demand for handheld portable computing in education, business, and research has resulted in advanced mobile devices with powerful processors and large multi-touch screens. Such devices are capable of handling tasks of moderate computational complexity such as word processing, complex Internet transactions, and even human motion analysis. Apple's iOS devices, including the iPhone, the iPod touch, and the latest in the family, the iPad, are among the best-known and most widely used mobile devices today. Their advanced multi-touch interface and improved processing power can be exploited for engineering and STEM demonstrations. Moreover, these devices have become a part of everyday student life. Hence, the design of exciting mobile applications and software represents a great opportunity to build student interest and enthusiasm in science and engineering. This thesis presents the design and implementation of portable interactive signal processing simulation software on the iOS platform. The iOS-based object-oriented application, called i-JDSP, is based on the award-winning Java-DSP concept. It is implemented in Objective-C and C as a native Cocoa Touch application that can run on any iOS device. i-JDSP offers basic signal processing simulation functions such as the Fast Fourier Transform (FFT), filtering, and spectral analysis on a compact and convenient graphical user interface, and provides a very compelling multi-touch programming experience. Built-in modules also demonstrate concepts such as pole-zero placement. i-JDSP also incorporates sound capture and playback options that can be used in near real-time analysis of speech and audio signals. All simulations can be established visually by forming interactive block diagrams through multi-touch and drag-and-drop. Computations are performed on the mobile device when necessary, making block diagram execution fast. Furthermore, the extensive support for user interactivity provides scope for improved learning. Assessment of i-JDSP among senior undergraduate and first-year graduate students revealed that the software created a significant positive impact and increased the students' interest and motivation in understanding basic DSP concepts.
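As a small offline analogue (in Python, not the Objective-C app itself) of the pole-zero placement exercise mentioned above: chosen poles and zeros are converted to difference-equation coefficients and the magnitude response is evaluated. The pole/zero locations below are arbitrary placeholders, and the use of SciPy is an assumption of this sketch.

```python
# Offline analogue of a pole-zero placement exercise: poles/zeros -> filter
# coefficients -> magnitude response, similar in spirit to the i-JDSP module.
import numpy as np
from scipy.signal import zpk2tf, freqz

def response_from_pole_zero(zeros, poles, gain=1.0, n_points=512):
    b, a = zpk2tf(zeros, poles, gain)      # pole-zero form -> coefficients
    w, h = freqz(b, a, worN=n_points)
    return w, 20 * np.log10(np.abs(h) + 1e-12)

if __name__ == "__main__":
    zeros = [np.exp(1j * np.pi * 0.8), np.exp(-1j * np.pi * 0.8)]   # notch near 0.8*pi
    poles = [0.9 * np.exp(1j * np.pi * 0.2), 0.9 * np.exp(-1j * np.pi * 0.2)]
    w, mag_db = response_from_pole_zero(zeros, poles)
    print("peak magnitude (dB):", float(mag_db.max()))
```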
Contributors: Liu, Jinru (Author) / Spanias, Andreas (Thesis advisor) / Tsakalis, Kostas (Committee member) / Qian, Gang (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
In the last few years, significant advances in nanofabrication have allowed tailoring of structures and materials at the molecular level, enabling nanofabrication with precise control of dimensions and organization at molecular length scales and leading to significant advances in nanoscale systems. Although the direction of progress seems to follow the path of microelectronics, the fundamental physics of a nanoscale system changes more rapidly than in microelectronics as the size scale is decreased. The changes in length, area, and volume ratios due to the reduction in size alter the relative influence of various physical effects determining the overall operation of a system in unexpected ways. One category of nanofluidic structures demonstrating unique ionic and molecular transport characteristics is the nanopore. Nanopores derive their unique transport characteristics from the electrostatic interaction of the nanopore surface charge with aqueous ionic solutions. In this doctoral research, cylindrical nanopores, in single and array configurations, were fabricated in silicon-on-insulator (SOI) using a combination of electron beam lithography (EBL) and reactive ion etching (RIE). The fabrication method presented is compatible with standard semiconductor foundries and allows fabrication of nanopores with desired geometries and precise dimensional control, providing near-ideal and isolated physical modeling systems for studying ion transport at the nanometer level. Ion transport through nanopores was characterized by measuring the ionic conductances of arrays of nanopores of various diameters over a wide range of concentrations of aqueous hydrochloric acid (HCl) solutions. The measured ionic conductances demonstrated two distinct regimes, governed by surface charge interactions at low ionic concentrations and by nanopore geometry at high ionic concentrations. Field-effect modulation of ion transport through nanopore arrays, in a fashion similar to semiconductor transistors, was also studied. Using ionic conductance measurements, it was shown that the concentration of ions in the nanopore volume changed significantly when a gate voltage was applied to the nanopore arrays, hence controlling their transport. Based on the ion transport results, single nanopores were used to demonstrate their application as nanoscale particle counters, using polystyrene nanobeads monodispersed in aqueous HCl solutions of different molarities. The effects of field-effect modulation on particle transition events were also demonstrated.
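The two conductance regimes can be illustrated with a commonly used two-term model for a cylindrical pore (bulk conduction plus surface-charge conduction). This is a textbook approximation from the nanopore literature, not necessarily the model used in the dissertation, and the geometry, mobility, and surface-charge values below are order-of-magnitude placeholders.

```python
# Illustration of the two conductance regimes with a standard two-term model:
# G = (pi d^2 / 4L) * kappa_bulk  +  (pi d / L) * mu_counter * |sigma_surface|
# Parameter values are rough placeholders, not measured values.
import numpy as np

def pore_conductance(c_molar, d=50e-9, length=300e-9,
                     mobility_sum=4.4e-7,      # approx. sum of HCl ion mobilities, m^2/(V s)
                     counter_mobility=3.6e-7,  # approx. H+ counterion mobility, m^2/(V s)
                     surface_charge=0.02):     # |surface charge density|, C/m^2
    e = 1.602e-19
    n_avogadro = 6.022e23
    n = c_molar * 1e3 * n_avogadro             # ions per m^3
    bulk_conductivity = e * n * mobility_sum   # S/m
    g_bulk = (np.pi * d**2 / (4 * length)) * bulk_conductivity
    g_surface = (np.pi * d / length) * counter_mobility * surface_charge
    return g_bulk + g_surface

if __name__ == "__main__":
    for c in [1e-5, 1e-3, 1e-1, 1.0]:          # molar HCl concentrations
        print(f"{c:8.0e} M -> G = {pore_conductance(c):.3e} S")
    # At low concentration G flattens at the surface-charge term; at high
    # concentration it grows with concentration via the bulk (geometry) term.
```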
Contributors: Joshi, Punarvasu (Author) / Thornton, Trevor J (Thesis advisor) / Goryll, Michael (Thesis advisor) / Spanias, Andreas (Committee member) / Saraniti, Marco (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Following the success of incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly or indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes solutions to overcome the high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, and 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem addresses the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model to evaluate different candidate solutions. In this dissertation, frequency-pruning and detector-pruning algorithms are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to an 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for sinusoidal signals; it employs the proposed auditory pattern-combining technique together with a look-up table that stores representative auditory patterns. The second problem concerns obtaining an estimate of the auditory representation that minimizes a perceptual objective function and transforming the auditory pattern back to its equivalent time/frequency representation, thereby avoiding repeated application of the auditory model stages to test different candidate time/frequency vectors when minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages, ensuring that a time/frequency mapping corresponding to the estimated auditory representation is obtained. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.
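A schematic sketch of the pruning idea follows: auditory channels whose excitation falls below a relative threshold are skipped when accumulating loudness, trading a small loudness error for fewer evaluated model stages. The excitation pattern, loudness exponent, and threshold below are toy placeholders, not the dissertation's auditory model.

```python
# Schematic sketch of frequency pruning: skip auditory channels whose
# excitation is far below the maximum when accumulating loudness.
import numpy as np

def specific_loudness(excitation, alpha=0.23):
    """Toy compressive loudness transform applied per auditory channel."""
    return excitation ** alpha

def total_loudness(excitation, prune_db=None):
    exc = np.asarray(excitation, dtype=float)
    if prune_db is not None:
        keep = exc >= exc.max() * 10 ** (-prune_db / 10.0)   # frequency pruning
        exc = exc[keep]
    return specific_loudness(exc).sum(), exc.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    excitation = 10 ** rng.uniform(-6, 0, size=240)          # 240 toy channels
    full, n_full = total_loudness(excitation)
    pruned, n_kept = total_loudness(excitation, prune_db=30)
    print(f"channels evaluated: {n_kept}/{n_full}")
    print(f"relative loudness error: {abs(full - pruned) / full:.2%}")
```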
Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Interictal spikes, together with seizures, are recognized as the two hallmarks of epilepsy, a brain disorder from which 1% of the world's population suffers. Even though the presence of spikes in the brain's electromagnetic activity has diagnostic value, their dynamics are still elusive. An objective of this dissertation was to formulate a mathematical framework within which the dynamics of interictal spikes could be thoroughly investigated. A new epileptic spike detection algorithm was developed by employing data-adaptive morphological filters, and its performance compared favorably with that of other algorithms in the literature. A novel spike spatial synchronization measure was developed and tested on coupled spiking-neuron models. Application of this measure to individual epileptic spikes in EEG from patients with temporal lobe epilepsy revealed long-term trends of increasing synchronization between pairs of brain sites before seizures and desynchronization after seizures, within the same patient as well as across patients, supporting the hypothesis that seizures may occur to break (reset) abnormal spike synchronization in the brain network. Based on these results, a separate spatial analysis of spike rates was conducted that shed light on conflicting results in the literature about the variability of spike rate before and after seizures, and it enabled the automatic classification of seizures as clinical or subclinical. A novel method for epileptogenic focus localization from interictal periods based on spike occurrences was also devised, combining concepts from graph theory, such as eigenvector centrality, with the developed spike synchronization measure; it compared very favorably against the gold standard used in clinical practice for focus localization from seizure onset. Finally, in another application of the resetting of brain dynamics at seizures, it was shown that patients with epileptic seizures (ES) can be differentiated with high accuracy from patients with psychogenic nonepileptic seizures (PNES). These studies of spike dynamics have elucidated many unknown aspects of ictogenesis and are expected to contribute significantly to further understanding of the basic mechanisms that lead to seizures, and to the diagnosis and treatment of epilepsy.
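The graph-theoretic localization idea can be sketched compactly: pairwise spike-synchronization values between recording sites form a weighted adjacency matrix, and the site with the highest eigenvector centrality is flagged as the candidate focus. The synchronization values below are random placeholders, not the dissertation's measure, and the power-iteration routine is a generic implementation.

```python
# Minimal sketch: rank recording sites by eigenvector centrality of a
# pairwise spike-synchronization matrix (placeholder values used here).
import numpy as np

def eigenvector_centrality(W, n_iter=200, tol=1e-10):
    """Power iteration on a symmetric non-negative adjacency matrix."""
    v = np.ones(W.shape[0]) / np.sqrt(W.shape[0])
    for _ in range(n_iter):
        v_new = W @ v
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            break
        v = v_new
    return v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites = 8
    S = rng.uniform(0.1, 0.4, size=(n_sites, n_sites))
    S = (S + S.T) / 2                       # symmetric synchronization matrix
    S[2, :] += 0.3                          # site 2 strongly synchronized
    S[:, 2] += 0.3                          # (keep the matrix symmetric)
    np.fill_diagonal(S, 0.0)
    c = eigenvector_centrality(S)
    print("candidate focus site:", int(np.argmax(c)))
```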
Contributors: Krishnan, Balu (Author) / Iasemidis, Leonidas (Thesis advisor) / Tsakalis, Kostantinos (Committee member) / Spanias, Andreas (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)
Created: 2012