
Description
Disentangling latent spaces is an important research direction in the interpretability of unsupervised machine learning. Several recent deep learning methods are very effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of variation. While this is generally a hard problem because analytical expressions for these variations do not exist, certain factors, such as geometric transforms, can be expressed analytically. Furthermore, in existing frameworks, the disentangled values are not interpretable. The focus of this work is to disentangle these geometric factors of variation (which turn out to be nuisance factors for many applications) from the semantic content of the signal in an interpretable manner, which in turn makes the features more discriminative. Experiments are designed to show the modularity of the approach with other disentangling strategies, as well as its performance on multiple one-dimensional (1D) and two-dimensional (2D) datasets, clearly indicating the efficacy of the proposed approach.
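As a minimal illustration of a geometric factor that can be expressed analytically, the Python sketch below treats a circular time shift of a 1D signal as the nuisance factor. It is not the thesis's method, which works in learned latent spaces; it only shows how such a factor can be pulled out as an interpretable value (the shift, in samples) plus a shift-normalized copy of the content. The reference template and the names `estimate_circular_shift` and `factor_out_shift` are hypothetical.

```python
from typing import List, Tuple


def estimate_circular_shift(template: List[float], signal: List[float]) -> int:
    """Return the shift s that maximizes the circular cross-correlation
    sum_n template[n] * signal[(n + s) % N]."""
    n = len(template)
    if len(signal) != n:
        raise ValueError("template and signal must have the same length")
    best_shift, best_score = 0, float("-inf")
    for s in range(n):
        score = sum(template[i] * signal[(i + s) % n] for i in range(n))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def factor_out_shift(template: List[float], signal: List[float]) -> Tuple[int, List[float]]:
    """Split a signal into an interpretable geometric factor (its shift in
    samples) and a shift-normalized copy of its content.

    Hypothetical helper: the template is assumed to be a known reference."""
    s = estimate_circular_shift(template, signal)
    n = len(signal)
    aligned = [signal[(i + s) % n] for i in range(n)]
    return s, aligned


if __name__ == "__main__":
    template = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    # Circularly delay the template by 3 samples.
    shifted = [template[(m - 3) % len(template)] for m in range(len(template))]
    print(factor_out_shift(template, shifted))  # (3, [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

The brute-force search over all shifts keeps the sketch dependency-free; an FFT-based cross-correlation would be the usual choice for long signals.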
Contributors: Koneripalli Seetharam, Kaushik (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and recovering the articulatory geometry are individually challenging, ill-posed problems, known as speech separation and articulatory inversion, respectively. There exists a trade-off between decomposition and recovered articulatory geometry due to multiple possible mappings between an articulatory configuration and the speech produced. Moreover, if measurements are obtained only from a microphone sensor, they lack any invasive insight, which adds further challenge to an already difficult problem.

A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method.
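Sequential Monte Carlo estimation in a state space model can be sketched with a bootstrap particle filter. The example below is a generic illustration on a hypothetical scalar random-walk model with Gaussian noise; the glottal source and articulatory vocal tract models of the thesis are not reproduced, and the function name, prior, and noise levels are assumptions chosen for the demo.

```python
import math
import random
from typing import List


def bootstrap_particle_filter(observations: List[float], num_particles: int = 500,
                              process_std: float = 0.1, measurement_std: float = 0.5,
                              seed: int = 0) -> List[float]:
    """Posterior-mean state estimates for the toy (assumed) model
        x_k = x_{k-1} + w_k,  w_k ~ N(0, process_std^2)
        y_k = x_k + v_k,      v_k ~ N(0, measurement_std^2)
    using a bootstrap (sampling-importance-resampling) particle filter."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]  # prior N(0, 1)
    estimates = []
    for y in observations:
        # Predict: push every particle through the state equation.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Update: weight by the Gaussian measurement likelihood p(y | x).
        weights = [math.exp(-0.5 * ((y - p) / measurement_std) ** 2) for p in particles]
        total = sum(weights)
        weights = ([w / total for w in weights] if total > 0.0
                   else [1.0 / num_particles] * num_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample (multinomial) to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


if __name__ == "__main__":
    sim = random.Random(1)
    x, truth, obs = 1.0, [], []
    for _ in range(50):
        x += sim.gauss(0.0, 0.1)
        truth.append(x)
        obs.append(x + sim.gauss(0.0, 0.5))
    est = bootstrap_particle_filter(obs)
    filt_err = sum(abs(e - t) for e, t in zip(est, truth)) / len(truth)
    raw_err = sum(abs(o - t) for o, t in zip(obs, truth)) / len(truth)
    print(f"mean abs error: filter {filt_err:.3f}, raw measurements {raw_err:.3f}")
```

In a speech setting the state vector would hold the unknown source and tract parameters, and the likelihood would compare synthesized and observed speech frames; those models are beyond this sketch.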
Contributors: Venkataramani, Adarsh Akkshai (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel W (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information, which is important in many applications. The recently proposed capsule network retains spatial information and improves on the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.
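Dynamic routing can be sketched in plain Python using the routing-by-agreement procedure commonly used in capsule networks: coupling coefficients are a softmax over routing logits, each upper capsule squashes the weighted sum of its predictions, and logits grow where predictions agree with the output. This is an illustrative sketch, not the network from the thesis; the prediction vectors `u_hat` are assumed to be given (in a full network they come from learned transformation matrices), and the iteration count is an assumption.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement.

    u_hat[i][j] is the (assumed precomputed) prediction from lower capsule i
    for upper capsule j. Returns the output vectors v_j of the upper capsules.
    """
    num_lower, num_upper = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * num_upper for _ in range(num_lower)]  # routing logits
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: softmax of each lower capsule's logits.
        c = []
        for i in range(num_lower):
            m = max(b[i])
            exps = [math.exp(x - m) for x in b[i]]
            z = sum(exps)
            c.append([e / z for e in exps])
        # Weighted sum of predictions per upper capsule, then squash.
        v = []
        for j in range(num_upper):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_lower)) for d in range(dim)]
            v.append(squash(s))
        # Raise the logits where a prediction agrees with the output.
        for i in range(num_lower):
            for j in range(num_upper):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

The nested loops over lower capsules, upper capsules, and routing iterations make the extra per-epoch cost of routing visible; it grows with the number of capsules, which is consistent with the abstract's observation that adjusting the capsule structure affects routing time.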

In this work, we first apply a capsule network to the overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small. When the training set is larger, the capsule network and the conventional CNN have comparable recognition accuracy. The training time per epoch for the capsule network is longer than that of the conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can significantly decrease the time complexity of the dynamic routing algorithm.

Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both the capsule network and a conventional CNN to recognize two overlapping words in speech files created from 5 word classes. We show that the capsule network achieves considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10. The capsule network still achieves a high recognition accuracy of 95.42% with 10 word classes, while the accuracy of the conventional CNN decreases sharply to 73.18%.
Contributors: Xiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope of computer vision. With these developments, assessing the quality of human actions is considered an important problem that needs to be tackled. Movement quality assessment finds a wide range of applications in motor control, health care, rehabilitation, and physical therapy. Home-based interactive physical therapy requires the ability to monitor, inform, and assess the quality of everyday movements. The main limitation is obtaining labeled data from trained therapists/experts, which is both expensive and time consuming.

Motivated by recent studies in motor control and therapy, this thesis uses an existing computational framework to assess balance impairment and disease severity in people with Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space of the subjects' center of pressure (CoP) tracings, recorded while they perform dynamic postural shifts. The framework is evaluated on a dataset collected from 43 healthy subjects and 17 subjects with Parkinson's disease, and it outperforms other methods, such as dynamical shift indices and chaotic invariants, in assessing balance impairment.
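Phase space reconstruction from a scalar recording is commonly done with a time-delay embedding. The sketch below assumes that standard construction (the abstract does not state the embedding used) and applies it to one CoP coordinate; the embedding dimension `dim`, the delay `tau`, and the function name are hypothetical, and the shape descriptors computed on the resulting point cloud are not reproduced.

```python
from typing import List, Sequence, Tuple


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    """Time-delay (Takens-style) embedding: map a scalar series x[t] to the points
    (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive")
    span = (dim - 1) * tau
    return [tuple(x[t + k * tau] for k in range(dim)) for t in range(len(x) - span)]


if __name__ == "__main__":
    # Hypothetical anterior-posterior CoP samples; real tracings would be much longer.
    cop_ap = [0.0, 0.4, 0.9, 1.1, 0.8, 0.2, -0.3, -0.7]
    for point in delay_embed(cop_ap, dim=3, tau=2):
        print(point)
```

Each output tuple is one point of the reconstructed attractor; descriptors of the shape of this point cloud would then serve as features.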

This thesis also proposes an unsupervised method for assessing the movement quality of simple actions, such as sit-to-stand and dynamic posture shifts, by modeling the deviation of a given movement from an ideal movement path in the configuration space; that is, movement quality is directly related to the similarity between the observed trajectory and the ideal trajectory between the start and end poses. The S^1xS^1 configuration space is used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space is used to model the subject's CoP during dynamic posture shifts for movement quality estimation.
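A deviation score on the S^1xS^1 configuration space can be sketched as follows, under explicit assumptions: the torus carries its flat metric, the "ideal" path is taken to be the shortest (geodesic) path between the start and end poses, and samples are matched by index. The abstract does not say how the ideal path is defined, so these choices and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Angles = Tuple[float, float]  # (theta1, theta2) joint angles in radians
TWO_PI = 2.0 * math.pi


def circle_dist(a: float, b: float) -> float:
    """Arc-length distance between two angles on the unit circle S^1."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)


def torus_dist(p: Angles, q: Angles) -> float:
    """Geodesic distance on the flat torus S^1 x S^1."""
    return math.hypot(circle_dist(p[0], q[0]), circle_dist(p[1], q[1]))


def geodesic_path(start: Angles, end: Angles, n: int) -> List[Angles]:
    """n >= 2 evenly spaced points on the shortest path from start to end."""
    def signed_step(a: float, b: float) -> float:
        # Shortest signed angular difference b - a, wrapped to [-pi, pi).
        return (b - a + math.pi) % TWO_PI - math.pi

    d1, d2 = signed_step(start[0], end[0]), signed_step(start[1], end[1])
    return [((start[0] + d1 * k / (n - 1)) % TWO_PI,
             (start[1] + d2 * k / (n - 1)) % TWO_PI) for k in range(n)]


def deviation_score(trajectory: List[Angles]) -> float:
    """Mean distance from the sampled trajectory to the assumed ideal path
    between its own endpoints (lower means closer to ideal)."""
    if len(trajectory) < 2:
        raise ValueError("need at least two samples")
    ideal = geodesic_path(trajectory[0], trajectory[-1], len(trajectory))
    return sum(torus_dist(p, q) for p, q in zip(trajectory, ideal)) / len(trajectory)
```

Wrapping angle differences before measuring them is what distinguishes this from treating the joint angles as a point in R^2: a knee at 0.1 rad and one at 2*pi - 0.1 rad are 0.2 rad apart, not nearly 2*pi.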
Contributors: Som, Anirudh (Author) / Turaga, Pavan (Thesis advisor) / Krishnamurthi, Narayanan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
In recent years, there has been increased interest in sharing available bandwidth to avoid spectrum congestion. With an ever-increasing number of wireless users, it is critical to develop signal-processing-based spectrum sharing algorithms that achieve cooperative use of the allocated spectrum among multiple systems and reduce interference between them. This work studies the radar and communications coexistence problem using two main approaches. The first approach develops methodologies to increase radar target tracking performance under low signal-to-interference-plus-noise ratio (SINR) conditions caused by strong coexisting communications interference. The second approach jointly optimizes the performance of both systems by co-designing a common transmit waveform.

To improve radar tracking performance, a pulsed radar tracking a single target in the presence of high-powered communications interference is considered. Although the Cramer-Rao lower bound (CRLB) on the covariance of an unbiased estimator of deterministic parameters bounds the estimation mean squared error (MSE), there exists an SINR threshold below which the estimator covariance rapidly deviates from the CRLB. After demonstrating with the Barankin bound (BB) that different radar waveforms have different estimation SINR thresholds, a new radar waveform design method is proposed that predicts the waveform-dependent BB SINR threshold under low-SINR operating conditions.

A novel method of predicting the SINR threshold for maximum likelihood estimation (MLE) is proposed. A relationship is shown to exist between the formulation of the BB kernel and the probability that the MLE selects a sidelobe. This relationship is shown to provide an accurate means of threshold prediction for radar target parameter estimation of frequency, time delay, and angle of arrival.



For the radar and communications co-design problem, a common transmit waveform is proposed for a pulse-Doppler radar and a multiuser communications system. The signaling scheme for each system is selected from a class of waveforms with nonlinear phase functions by optimizing the waveform parameters to minimize interference between the two systems and interference among communications users. Using multi-objective optimization, a trade-off in system performance is demonstrated when selecting waveforms that minimize both system interference and tracking MSE.
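The trade-off in a multi-objective selection is usually read off the Pareto front: the candidates that no other candidate beats in every objective at once. The sketch below filters hypothetical candidate waveforms scored on two objectives (interference and tracking MSE, lower is better); the candidate names and scores are made-up toy inputs, not results from the thesis, and the thesis's optimization procedure itself is not reproduced.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, float]  # (interference, tracking MSE); lower is better


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates not dominated by any other candidate (input order kept)."""
    return [
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for o, other in candidates.items() if o != name)
    ]


if __name__ == "__main__":
    # Toy scores for hypothetical waveforms A-D (illustrative only).
    toy = {"A": (1.0, 5.0), "B": (2.0, 3.0), "C": (3.0, 4.0), "D": (4.0, 1.0)}
    print(pareto_front(toy))  # ['A', 'B', 'D']; C is dominated by B
```

Moving along the front from A to D trades lower MSE for higher interference, which is the kind of performance trade-off the abstract describes.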
Contributors: Kota, John S (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Berisha, Visar (Committee member) / Bliss, Daniel (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Music players have evolved to support multidimensional and surround sound systems. Audio players are implemented as software applications for different audio hardware systems. Digital formats and wireless networks make audio content readily accessible on smart networked devices. As a result, audio output platforms ranging from high-end multispeaker surround systems to single-unit Bluetooth speakers have been developed. A large body of research has been carried out in audio processing, beamforming, sound fields, etc., and new formats have been developed to create realistic audio experiences.

There is an emerging trend toward high-definition AV systems, virtual reality gear, and gaming applications with multidimensional audio. Next-generation media technology is concentrating on virtual reality experiences and devices, which have applications not only in gaming but also in fields including medicine, entertainment, engineering, and education. All such systems also require realistic audio that corresponds with the visuals.

In the project presented in this thesis, a new portable audio hardware system is designed and developed, along with a dedicated mobile Android application, to render immersive surround sound experiences with real-time audio effects. The tablet and mobile phone allow the user to control, or “play” with, sound directionality and apply various audio effects, including sound rotation, spatialization, and other immersive experiences. The thesis describes the hardware and software design, provides the theory of the sound effects, and presents demonstrations of the sound application that was created.
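A sound rotation effect can be sketched with constant-power pairwise amplitude panning over a ring of speakers: the source azimuth selects the two nearest speakers, and a sine/cosine law splits the signal between them so the summed power stays constant. This is a generic technique shown for illustration, not the thesis's rendering code; the even speaker layout, frame-based rotation, and function names are assumptions.

```python
import math
from typing import List


def ring_pan_gains(azimuth_deg: float, num_speakers: int) -> List[float]:
    """Constant-power pairwise panning for speakers evenly spaced on a circle,
    with speaker k assumed to sit at k * 360 / num_speakers degrees."""
    if num_speakers < 2:
        raise ValueError("need at least two speakers")
    spacing = 360.0 / num_speakers
    pos = (azimuth_deg % 360.0) / spacing  # position in units of speaker spacing
    left = int(pos) % num_speakers         # speaker at or just below the source angle
    right = (left + 1) % num_speakers      # next speaker around the ring
    frac = pos - int(pos)                  # 0 at `left`, approaching 1 at `right`
    gains = [0.0] * num_speakers
    # Sine/cosine law: gains[left]**2 + gains[right]**2 == 1.
    gains[left] = math.cos(frac * math.pi / 2.0)
    gains[right] = math.sin(frac * math.pi / 2.0)
    return gains


def rotation_gains(num_frames: int, degrees_per_frame: float, num_speakers: int) -> List[List[float]]:
    """Per-frame speaker gains for a source rotating around the listener."""
    return [ring_pan_gains(f * degrees_per_frame, num_speakers) for f in range(num_frames)]


if __name__ == "__main__":
    for gains in rotation_gains(num_frames=4, degrees_per_frame=45.0, num_speakers=4):
        print([round(g, 3) for g in gains])
```

Each audio frame would be multiplied by its gain per output channel; smoothing the gains between frames avoids audible zipper noise.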
Contributors: Dharmadhikari, Chinmay (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Committee member) / Ingalls, Todd (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The data explosion of the past decade is due in part to the widespread use of rich sensors that measure various physical phenomena, such as gyroscopes that measure orientation in phones and fitness devices, or the Microsoft Kinect, which measures depth information. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data are Euclidean, i.e., the metric is the standard Euclidean distance induced by the L-2 norm. However, this assumption is violated in many cases, for example when the data lie on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A common theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large feature space, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
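The gap between the Euclidean (L-2) metric and a Riemannian one can be seen on the simplest curved manifold, the unit sphere S^2. The sketch below compares the straight-line chord distance with the great-circle (geodesic) distance and uses the latter for a naive pointwise trajectory comparison. The sphere is only a stand-in for the feature manifolds in the dissertation, the pointwise comparison assumes equally sampled, time-aligned trajectories (no temporal warping), and the function names are hypothetical; the dissertation's representations and Lyapunov generalization are not reproduced.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # assumed to be a unit vector in R^3


def chord_dist(p: Point, q: Point) -> float:
    """Euclidean (L-2) distance, which cuts through the sphere."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def geodesic_dist(p: Point, q: Point) -> float:
    """Great-circle distance on the unit sphere S^2."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp guards against rounding


def trajectory_dist(x: List[Point], y: List[Point]) -> float:
    """Mean pointwise geodesic distance between two time-aligned trajectories."""
    if len(x) != len(y):
        raise ValueError("trajectories must have the same number of samples")
    return sum(geodesic_dist(p, q) for p, q in zip(x, y)) / len(x)


if __name__ == "__main__":
    north, south = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
    print(chord_dist(north, south), geodesic_dist(north, south))  # 2.0 vs pi
```

For antipodal points the chord (2.0) understates the distance a trajectory constrained to the sphere would actually travel (pi), which is why metrics that respect the manifold matter.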
Contributors: Anirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Modern systems that measure dynamical phenomena are often limited in how many sensors can operate at any given time step. This thesis considers a sensor scheduling problem in which the source of a diffusive phenomenon is to be localized using single-point measurements of its concentration. With a linear diffusion model, and in the absence of noise, classical observability theory describes whether or not the system's initial state can be deduced from a given set of linear measurements. However, it does not describe to what degree the system is observable. Different metrics of observability have been proposed in the literature to address this issue. Many of these methods are based on choosing optimal or sub-optimal sensor schedules from a predetermined collection of possibilities. This thesis proposes two greedy algorithms for one-dimensional and two-dimensional discrete diffusion processes. The first algorithm considers a deterministic linear dynamical system and deterministic linear measurements. The second algorithm considers noisy measurements and is compared to a Kalman filter scheduling method described in published work.
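A greedy schedule for the deterministic 1D case can be sketched as follows. Measuring cell s at step k observes the row s of A^k applied to the initial state x_0, so the rows collected so far form an observability Gram matrix G. At each step the sketch picks the cell that most increases log det(G + eps*I), a D-optimal style criterion chosen here as an assumption, since the abstract does not name the thesis's observability metric. The explicit-Euler diffusion matrix with zero-flux ends, the regularizer `eps`, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def diffusion_matrix(n: int, alpha: float) -> Matrix:
    """A = I + alpha * L for a 1-D grid with zero-flux ends (explicit Euler step).

    Assumption: alpha <= 0.5 keeps the update stable."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                a[i][j] += alpha  # inflow from the neighbor
                a[i][i] -= alpha  # outflow to the neighbor
    return a


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def log_det(m: Matrix) -> float:
    """log|det(m)| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def greedy_schedule(a: Matrix, steps: int, eps: float = 1e-6) -> List[int]:
    """Pick one sensing cell per time step, greedily maximizing
    log det(G + eps*I), where G = sum_k r_k r_k^T and r_k is the row of A^k
    that maps the initial state x_0 to the measurement taken at step k."""
    n = len(a)
    gram = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    schedule = []
    for _ in range(steps):
        best_cell, best_score = 0, float("-inf")
        for cell in range(n):
            row = power[cell]
            trial = [[gram[i][j] + row[i] * row[j] + (eps if i == j else 0.0)
                      for j in range(n)] for i in range(n)]
            score = log_det(trial)
            if score > best_score:
                best_cell, best_score = cell, score
        row = power[best_cell]
        gram = [[gram[i][j] + row[i] * row[j] for j in range(n)] for i in range(n)]
        schedule.append(best_cell)
        power = matmul(a, power)  # advance from A^k to A^(k+1)
    return schedule


if __name__ == "__main__":
    print(greedy_schedule(diffusion_matrix(5, 0.25), steps=5))
```

Greedy selection evaluates only n candidates per step instead of all n^T schedules, which is the usual reason to accept a sub-optimal schedule.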
Contributors: Najam, Anbar (Author) / Cochran, Douglas (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Chao (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
In many applications, measured sensor data is meaningful only when the location of sensors is accurately known. Therefore, the localization accuracy is crucial. In this dissertation, both location estimation and location detection problems are considered.

In location estimation problems, sensor nodes at known locations, called anchors, transmit signals to sensor nodes at unknown locations, called nodes, and these transmissions are used to estimate the locations of the nodes. Specifically, location estimation in the presence of fading channels using time of arrival (TOA) measurements with narrowband communication signals is considered. The Cramer-Rao lower bound (CRLB) on the localization error is derived under different assumptions, as are the corresponding maximum likelihood estimators (MLEs).
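A TOA location estimate can be sketched by converting arrival times to ranges (range = propagation speed x time of flight) and solving for the node position. Under the simplifying assumption of i.i.d. zero-mean Gaussian range errors, which replaces the fading model studied in the dissertation, the MLE reduces to nonlinear least squares, solved below with Gauss-Newton iterations. The anchor layout, initial guess, and function name are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def toa_ml_estimate(anchors: List[Point], ranges: List[float], guess: Point,
                    iterations: int = 20) -> Point:
    """Gauss-Newton solution of min_p sum_i (r_i - ||p - a_i||)^2, which is the
    ML estimate when range errors are i.i.d. zero-mean Gaussian (assumed)."""
    x, y = guess
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) delta = J^T residual.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for (ax, ay), r in zip(anchors, ranges):
            d = math.hypot(x - ax, y - ay)
            if d == 0.0:
                continue  # gradient undefined at an anchor position
            ux, uy = (x - ax) / d, (y - ay) / d  # gradient of ||p - a_i||
            res = r - d
            h11 += ux * ux
            h12 += ux * uy
            h22 += uy * uy
            g1 += ux * res
            g2 += uy * res
        det = h11 * h22 - h12 * h12
        if abs(det) < 1e-12:
            break  # degenerate anchor geometry
        dx = (h22 * g1 - h12 * g2) / det
        dy = (h11 * g2 - h12 * g1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-10:
            break
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    node = (3.0, 4.0)
    ranges = [math.hypot(node[0] - ax, node[1] - ay) for ax, ay in anchors]  # noise-free
    print(toa_ml_estimate(anchors, ranges, guess=(5.0, 5.0)))
```

Gauss-Newton needs a reasonable starting point; the anchor centroid is a common choice when nothing better is known.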

In large wireless sensor networks (WSNs), distributed location estimation algorithms are more efficient than centralized algorithms. A sequential localization scheme, which is a type of distributed location estimation algorithm, is considered. In addition, different localization methods, such as TOA, received signal strength (RSS), time difference of arrival (TDOA), direction of arrival (DOA), and large aperture array (LAA), are compared under different signal-to-noise ratio (SNR) conditions. Simulation results show that DOA is the preferred scheme in the low-SNR regime and that the LAA localization algorithm provides better performance for network discovery at high SNRs. The CRLB on the localization error of the TOA method is also derived.
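For orientation, the textbook CRLB for 2D TOA localization can be computed directly when range errors are assumed i.i.d. Gaussian with standard deviation sigma: the Fisher information is J = (1/sigma^2) * sum_i u_i u_i^T, where u_i is the unit vector from anchor i to the node, and the RMSE of any unbiased estimator is at least sqrt(trace(J^-1)). This Gaussian-noise bound is an assumption for illustration and differs from the fading-channel bounds derived in the dissertation; the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def toa_crlb_rmse(anchors: List[Point], node: Point, range_std: float) -> float:
    """Lower bound on position RMSE, sqrt(trace(J^-1)), assuming ranges with
    i.i.d. N(0, range_std^2) errors, where J = sum_i u_i u_i^T / range_std^2."""
    j11 = j12 = j22 = 0.0
    for ax, ay in anchors:
        d = math.hypot(node[0] - ax, node[1] - ay)
        if d == 0.0:
            raise ValueError("node coincides with an anchor")
        ux, uy = (node[0] - ax) / d, (node[1] - ay) / d
        j11 += ux * ux / range_std ** 2
        j12 += ux * uy / range_std ** 2
        j22 += uy * uy / range_std ** 2
    det = j11 * j22 - j12 * j12
    if det <= 0.0:
        return math.inf  # position not identifiable (e.g., collinear anchors)
    # For a 2x2 matrix, trace(J^-1) = (j11 + j22) / det(J).
    return math.sqrt((j11 + j22) / det)


if __name__ == "__main__":
    square = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(toa_crlb_rmse(square, (0.0, 0.0), range_std=1.0))  # 1.0
```

The bound depends on anchor geometry through the unit vectors only: anchors on a line give a singular J, while surrounding anchors give the tightest bound.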

A distributed location detection scheme is proposed that allows each anchor to decide whether a node is active. Once an anchor makes a decision, it transmits a bit to a fusion center (FC). The FC combines all the decisions and uses a design parameter K to make the final decision. Three scenarios are considered in this dissertation: first, location detection at a known location; second, detection of a node in a known region; and third, location detection in the presence of fading. The optimal thresholds are derived, as are the total probabilities of false alarm and detection under the different scenarios.
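One common way a fusion center uses a parameter K is a counting (K-out-of-N) rule: declare the node active if at least K of the N anchor bits are 1. Assuming that rule, and additionally assuming identical, independent anchors (in the dissertation the local probabilities vary with location and fading), the system-level detection and false-alarm probabilities are binomial tail sums, sketched below. The function names are hypothetical.

```python
from math import comb
from typing import Tuple


def k_out_of_n(p: float, n: int, k: int) -> float:
    """P(at least k of n independent anchors send 1), each with probability p."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def fused_performance(pd: float, pfa: float, n: int, k: int) -> Tuple[float, float]:
    """(detection, false alarm) probabilities at the FC under the assumed
    K-out-of-N rule with identical local probabilities pd and pfa."""
    return k_out_of_n(pd, n, k), k_out_of_n(pfa, n, k)


if __name__ == "__main__":
    for k in range(1, 6):
        pd_sys, pfa_sys = fused_performance(pd=0.9, pfa=0.1, n=5, k=k)
        print(f"K={k}: P_D={pd_sys:.4f}, P_FA={pfa_sys:.4f}")
```

Sweeping K shows the design trade-off: K = 1 (OR rule) maximizes detection but also false alarms, while K = N (AND rule) does the opposite.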
Contributors: Zhang, Xue (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Today's world is seeing rapid technological advancement in various fields, with access to faster computers and better sensing devices. With such advancements, recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring, and animation. Traditional approaches to dynamical modeling have included linear and nonlinear methods, each with its own drawbacks. As an alternative, I propose using descriptors of the shape of the dynamical attractor as a feature representation to quantify the nature of the dynamics. The framework has two main advantages over traditional approaches: a) the representation of the dynamical system is derived directly from the observational data, without any inherent assumptions, and b) the proposed features remain stable across different time-series lengths, where traditional dynamical invariants fail.

Approximately 1% of the world population are stroke survivors, making stroke the most common neurological disorder. The increasing demand for rehabilitation facilities has been seen as a significant healthcare problem worldwide. The laborious and expensive process of visual monitoring by physical therapists has motivated my research into novel strategies to supplement in-hospital therapy in a home setting. In this direction, I propose a general framework for tuning component-level kinematic features using therapists’ overall impressions of movement quality, in the context of a Home-based Adaptive Mixed Reality Rehabilitation (HAMRR) system.

Rapid technological advancements in computing and sensing have resulted in large amounts of data that require powerful tools to analyze. In the recent past, topological data analysis methods have been investigated in various communities, and the work by Carlsson establishes that persistent homology can be used as a powerful topological data analysis approach for effectively analyzing large datasets. I have explored suitable topological data analysis methods and propose a framework that uses them for human activity analysis, with applications such as action recognition.
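The simplest piece of persistent homology, 0-dimensional persistence (how connected components merge as a distance scale grows), can be computed without any library: in a Vietoris-Rips filtration every point is born at scale 0, and components die at the edge lengths of a minimum spanning tree, found here with Kruskal's algorithm and union-find. This sketch covers H0 only; higher-dimensional features (loops, voids) need a dedicated TDA library, and how the dissertation builds its features is not reproduced. Using the full Euclidean edge length as the filtration scale is one convention among several, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def h0_persistence(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs of the Vietoris-Rips filtration,
    where an edge enters at its Euclidean length.

    All points are born at 0; a component dies when it merges with another
    (Kruskal / union-find), and one component survives forever.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, math.inf))
    return pairs


if __name__ == "__main__":
    # Two tight clusters: one long-lived bar reveals the gap between them.
    cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    print(h0_persistence(cloud))
```

Applied to, for example, a delay-embedded activity signal, the lengths of these bars form a simple topological summary that can feed a classifier.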
Contributors: Venkataraman, Vinay (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Krishnamurthi, Narayanan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2016