Matching Items (89)
Description
Mixture of experts is a machine learning ensemble approach that consists of individual models that are trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to difficulty in training diverse experts and high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training, and incorporate parameter sharing among experts to reduce computational requirements.
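
As a rough sketch of this formulation (illustrative only, not the dissertation's networks), a mixture of experts combines per-expert predictions using weights produced by a gating network; here both the experts and the gate are simple linear maps:

```python
# Minimal mixture-of-experts sketch: linear experts plus a softmax gate.
# All weights and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_in, d_out = 3, 4, 2
W_experts = rng.normal(size=(n_experts, d_out, d_in))  # one linear expert each
W_gate = rng.normal(size=(n_experts, d_in))            # gating network weights

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def moe_predict(x):
    # Each expert produces its own prediction for the input x.
    expert_outputs = np.stack([W @ x for W in W_experts])  # (n_experts, d_out)
    # The gating network assigns a weight to each expert.
    gate_weights = softmax(W_gate @ x)                     # (n_experts,)
    # Final output is the gate-weighted combination of expert predictions.
    return gate_weights @ expert_outputs

x = rng.normal(size=d_in)
print(moe_predict(x))
```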

First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks on the classification of distorted images, and a model, MixQualNet, that is more robust to distortions is then proposed. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert outputs, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as to increase performance.

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.
Contributors: Dodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Motion estimation is a core task in computer vision, and many applications utilize optical flow methods as fundamental tools to analyze motion in images and videos. Optical flow is the apparent motion of objects in image sequences that results from relative motion between the objects and the imaging perspective. Today, optical flow fields are utilized to solve problems in areas such as object detection and tracking, interpolation, and visual odometry. In this dissertation, three problems from different areas of computer vision, and solutions that make use of modified optical flow methods, are presented.

The contributions of this dissertation are approaches and frameworks that introduce i) a new optical flow-based interpolation method to achieve minimally divergent velocimetry data, ii) a framework that improves the accuracy of change detection algorithms in synthetic aperture radar (SAR) images, and iii) a set of new methods to integrate Proton Magnetic Resonance Spectroscopy (1H-MRSI) data into three-dimensional (3D) neuronavigation systems for tumor biopsies.

In the first application, an optical flow-based approach for the interpolation of minimally divergent velocimetry data is proposed. The velocimetry data of incompressible fluids contain signals that describe the flow velocity, and the approach uses this additional flow-velocity information to guide the interpolation process towards reduced divergence in the interpolated data.
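
The quantity being controlled is the divergence of the velocity field, which is zero for incompressible flow. Below is a minimal sketch, generic rather than the proposed method itself, of measuring the divergence of a naively interpolated 2D field:

```python
# Sketch: how divergent is a naively interpolated 2D velocity field?
# This is the error a divergence-aware interpolator would try to minimize.
import numpy as np

rng = np.random.default_rng(0)

def divergence(u, v, dx=1.0, dy=1.0):
    # div F = du/dx + dv/dy for a 2D vector field F = (u, v);
    # axis 0 is y (rows), axis 1 is x (columns).
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dy, axis=0)
    return du_dx + dv_dy

# Two measured velocity frames and a naive temporal interpolation between them.
u0, v0 = rng.random((64, 64)), rng.random((64, 64))
u1, v1 = rng.random((64, 64)), rng.random((64, 64))
u_mid, v_mid = 0.5 * (u0 + u1), 0.5 * (v0 + v1)

# Mean absolute divergence: a divergence-aware interpolator would drive
# this toward zero for incompressible flow.
print(np.abs(divergence(u_mid, v_mid)).mean())
```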

In the second application, a framework consisting mainly of optical flow methods, along with other image processing and computer vision techniques, is proposed to improve object extraction from synthetic aperture radar (SAR) images. The framework is used to distinguish between actual motion and motion detected due to misregistration in SAR image sets, which can lead to more accurate and meaningful change detection and improved object extraction from SAR datasets.

In the third application, a set of new methods that aim to improve upon the current state-of-the-art in neuronavigation through the use of detailed three-dimensional (3D) 1H-MRSI data is proposed. The result is a progressive form of online MRSI-guided neuronavigation that is demonstrated through phantom validation and clinical application.
Contributors: Kanberoglu, Berkay (Author) / Frakes, David (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Human movement is a complex process influenced by physiological and psychological factors. The execution of movement varies from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees of quality, and quality can be defined with regard to aesthetic, athletic, or health-related ratings. It is useful to measure and track the quality of a person's movements for various applications, especially given the prevalence of low-cost, portable cameras and sensors today. Furthermore, based on such measurements, feedback systems can be designed to help people practice their movements towards certain goals. In this dissertation, I introduce symmetry as a family of measures for movement quality, and utilize recent advances in computer vision and differential geometry to model and analyze different types of symmetry in human movements. Movements are modeled as trajectories on different types of manifolds, according to the representations of movements derived from sensor data. The benefit of such a universal framework is that it can accommodate different existing and future features that describe human movements. The theory and tools developed in this dissertation will also be useful in other scientific areas that analyze symmetry in high-dimensional signals.
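
As a concrete, assumption-laden illustration of this framework (not the dissertation's exact features or measures), one type of symmetry can be scored for a movement modeled as a trajectory on the unit sphere:

```python
# Illustrative sketch: score the mirror symmetry of a movement modeled as
# a trajectory of unit vectors on the sphere, using the geodesic
# (arc-length) distance between each point and its reflection.
import numpy as np

def geodesic_dist(p, q):
    # Great-circle distance between unit vectors p and q.
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def mirror_symmetry_score(traj):
    # traj: (T, 3) array of unit vectors sampled along the movement.
    reflected = traj * np.array([1.0, -1.0, 1.0])  # reflect across x-z plane
    dists = [geodesic_dist(p, q) for p, q in zip(traj, reflected)]
    return np.mean(dists)  # 0 means perfectly mirror-symmetric

# A synthetic, nearly mirror-symmetric trajectory for demonstration.
t = np.linspace(0, 2 * np.pi, 100)
traj = np.stack([np.cos(t), 0.1 * np.sin(t), np.sin(t)], axis=1)
traj /= np.linalg.norm(traj, axis=1, keepdims=True)
print(mirror_symmetry_score(traj))
```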
Contributors: Wang, Qiao (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Srivastava, Anuj (Committee member) / Sha, Xin Wei (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or illumination source is a challenging task with vital applications including surveillance and robotics. Recent NLOS reconstruction advances have been achieved using time-resolved measurements, but acquiring these measurements requires expensive and specialized detectors and laser sources. This work proposes a data-driven approach for NLOS 3D localization that requires only a conventional camera and projector. Localization is performed using both a voxelization formulation and a regression formulation. Accuracy of greater than 90% is achieved in localizing an NLOS object to a 5 cm × 5 cm × 5 cm volume in real data, and by adopting the regression approach an object of width 10 cm is localized to approximately 1.5 cm. To generalize to line-of-sight (LOS) scenes with non-planar surfaces, an adaptive lighting algorithm is adopted. This algorithm, based on radiosity, identifies and illuminates the scene patches in the LOS that contribute most to the NLOS light paths, and can factor in system power constraints. Improvements ranging from 6% to 15% in accuracy with a non-planar LOS wall using adaptive lighting are reported, demonstrating the advantage of combining the physics of light transport with active illumination for data-driven NLOS imaging.
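
The voxelization step can be sketched as follows; only the 5 cm voxel size comes from the abstract above, while the grid origin and extent are assumptions for illustration:

```python
# Hedged sketch of voxelization for NLOS localization: discretize a hidden
# volume into 5 cm voxels so 3D localization can be posed as classification
# (predict the voxel index), with regression refining the continuous position.
import numpy as np

VOXEL = 0.05                        # 5 cm voxel edge, as in the abstract
ORIGIN = np.array([0.0, 0.0, 0.0])  # assumed corner of the hidden volume
GRID = np.array([10, 10, 10])       # assumed 50 cm cube of 5 cm voxels

def position_to_label(p):
    # Map a 3D position (meters) to its voxel's flat class label.
    idx = np.floor((p - ORIGIN) / VOXEL).astype(int)
    idx = np.clip(idx, 0, GRID - 1)
    return int(np.ravel_multi_index(idx, GRID))

def label_to_voxel_center(label):
    # Inverse map: class label back to the voxel's center position.
    idx = np.array(np.unravel_index(label, GRID))
    return ORIGIN + (idx + 0.5) * VOXEL

p = np.array([0.12, 0.31, 0.07])    # hypothetical hidden-object position
lbl = position_to_label(p)
print(lbl, label_to_voxel_center(lbl))
```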
Contributors: Chandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Speech is generated by articulators acting on a phonatory source. Identification of this phonatory source and of the articulatory geometry are individually challenging and ill-posed problems, called speech separation and articulatory inversion, respectively. There exists a trade-off between the decomposition and the recovered articulatory geometry due to the multiple possible mappings between an articulatory configuration and the speech produced. However, if measurements are obtained only from a microphone sensor, they lack any invasive insight, adding further challenge to an already difficult problem.

A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state-space formulation, and the unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method.
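
For background on the estimation machinery, a minimal bootstrap particle filter (sequential Monte Carlo) for a generic scalar state-space model, not the thesis's glottal/vocal-tract model, might look like:

```python
# Minimal bootstrap particle filter sketch: predict / weight / resample.
# The AR(1) dynamics and noise levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 500                    # time steps, particles
a, q, r = 0.95, 0.1, 0.2          # assumed dynamics and noise parameters

# Simulate a ground-truth state sequence and noisy observations.
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + q * rng.normal()
y = x + r * rng.normal(size=T)

particles = rng.normal(size=N)
estimates = []
for t in range(T):
    # Predict: propagate particles through the dynamics.
    particles = a * particles + q * rng.normal(size=N)
    # Weight: likelihood of the observation under each particle.
    w = np.exp(-0.5 * ((y[t] - particles) / r) ** 2) + 1e-300
    w /= w.sum()
    estimates.append(np.sum(w * particles))
    # Resample: draw particles in proportion to their weights.
    particles = particles[rng.choice(N, size=N, p=w)]

print(np.mean((np.array(estimates) - x) ** 2))  # tracking error
```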
Contributors: Venkataramani, Adarsh Akkshai (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel W (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Modern systems that measure dynamical phenomena often have limitations on how many sensors can operate at any given time step. This thesis considers a sensor scheduling problem in which the source of a diffusive phenomenon is to be localized using single point measurements of its concentration. With a linear diffusion model, and in the absence of noise, classical observability theory describes whether or not the system's initial state can be deduced from a given set of linear measurements. However, it does not describe to what degree the system is observable. Different metrics of observability have been proposed in the literature to address this issue, many of them based on choosing optimal or sub-optimal sensor schedules from a predetermined collection of possibilities. This thesis proposes two greedy algorithms for one-dimensional and two-dimensional discrete diffusion processes. The first algorithm considers a deterministic linear dynamical system with deterministic linear measurements. The second algorithm considers noise on the measurements and is compared to a Kalman filter scheduling method described in published work.
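
A hedged sketch of the greedy idea, assuming a 1D diffusion model and using the log-determinant of the accumulated observability Gramian as the metric (the thesis's exact metrics and models may differ):

```python
# Greedy sensor schedule sketch for a 1D discrete diffusion model: at each
# time step, pick the single measurement location that most increases an
# observability metric (log-det of the accumulated Gramian).
import numpy as np

n, T, alpha = 20, 10, 0.2
# Diffusion dynamics x_{k+1} = A x_k with nearest-neighbor coupling.
A = (1 - 2 * alpha) * np.eye(n)
A += alpha * (np.eye(n, k=1) + np.eye(n, k=-1))

def logdet(M, eps=1e-9):
    # Regularized log-determinant, finite even when M is singular.
    return np.linalg.slogdet(M + eps * np.eye(n))[1]

G = np.zeros((n, n))   # accumulated observability Gramian
Ak = np.eye(n)         # A^k
schedule = []
for k in range(T):
    best_i, best_val = None, -np.inf
    for i in range(n):
        row = Ak[i]    # measuring node i at time k observes row . x_0
        val = logdet(G + np.outer(row, row))
        if val > best_val:
            best_i, best_val = i, val
    G += np.outer(Ak[best_i], Ak[best_i])
    schedule.append(best_i)
    Ak = A @ Ak

print(schedule)        # chosen sensor location at each time step
```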
Contributors: Najam, Anbar (Author) / Cochran, Douglas (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Chao (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However, in many cases this assumption is violated, when the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
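
To give the flavor of such a generalization (an illustrative sketch, not the dissertation's definition), a finite-time divergence rate of nearby trajectories on the unit sphere can be estimated with geodesic distances in place of Euclidean ones:

```python
# Sketch: finite-time Lyapunov-style exponent on the unit sphere, measured
# as the growth rate of geodesic separation between two close trajectories.
# The flow below is a made-up nonlinear dynamic for illustration.
import numpy as np

def geodesic(p, q):
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def normalize(v):
    return v / np.linalg.norm(v)

def simulate(x0, T=200, dt=0.05):
    # Integrate a simple nonlinear flow, projecting back onto the sphere.
    traj = [x0]
    for _ in range(T):
        x = traj[-1]
        v = np.array([x[1] * x[2], -x[0] * x[2], 0.3 * x[0] * x[1]])
        traj.append(normalize(x + dt * v))
    return np.array(traj)

x0 = normalize(np.array([1.0, 0.5, 0.2]))
x0_pert = normalize(x0 + 1e-6 * np.array([0.0, 1.0, 0.0]))
d = [geodesic(p, q) for p, q in zip(simulate(x0), simulate(x0_pert))]

# Exponent estimate: slope of log geodesic separation over time (skip t=0).
t = np.arange(1, len(d))
lam = np.polyfit(t, np.log(np.array(d)[1:] + 1e-12), 1)[0]
print(lam)
```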
Contributors: Anirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Music players have evolved into multi-dimensional and surround sound systems, with audio players implemented as software applications for different audio hardware. Digital formats and wireless networks allow audio content to be readily accessible on smart networked devices; consequently, different audio output platforms have been developed, ranging from high-end multi-speaker surround systems to single-unit Bluetooth speakers. A large body of research has been carried out in audio processing, beamforming, sound fields, and related areas, and new formats have been developed to create realistic audio experiences.

An emerging trend is toward high-definition AV systems, virtual reality gear, and gaming applications with multidimensional audio. Next-generation media technology is concentrating on virtual reality experiences and devices, which have applications not only in gaming but also in medicine, entertainment, engineering, and education. All such systems also require realistic audio that corresponds with the visuals.

In the project presented in this thesis, a new portable audio hardware system is designed and developed, along with a dedicated Android mobile application, to render immersive surround-sound experiences with real-time audio effects. The tablet or mobile phone allows the user to control or "play" with sound directionality and to apply various audio effects, including sound rotation, spatialization, and other immersive experiences. The thesis describes the hardware and software design, provides the theory behind the sound effects, and presents demonstrations of the sound application that was created.
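
One of the described effects, sound rotation, can be sketched with a constant-power pan law on a stereo pair; the test tone and sweep rate below are illustrative assumptions, and a real multi-speaker system would generalize the same gain law:

```python
# Sketch: "rotate" a mono source across a stereo pair with constant-power
# panning, sweeping the pan angle over time.
import numpy as np

fs = 44100
t = np.arange(fs * 2) / fs                 # 2 seconds of audio
mono = 0.5 * np.sin(2 * np.pi * 440 * t)   # test tone

# Sweep the apparent position left -> right -> left twice per second... no:
# once per second (0.5 Hz sweep). Pan angle stays in [-pi/2, pi/2].
theta = (np.pi / 2) * np.sin(2 * np.pi * 0.5 * t)
left = mono * np.cos((theta + np.pi / 2) / 2)
right = mono * np.sin((theta + np.pi / 2) / 2)
stereo = np.stack([left, right], axis=1)

# Constant power: left^2 + right^2 equals the mono power at every sample.
print(np.allclose(left**2 + right**2, mono**2))
```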
Contributors: Dharmadhikari, Chinmay (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Committee member) / Ingalls, Todd (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Human motion is defined by an amalgamation of several physical traits such as bipedal locomotion, posture and manual dexterity, and mental expectation. In addition to the “positive” body form defined by these traits, casting light on the body produces a “negative” of the body: its shadow. We often use silhouettes interchangeably with shadows to emphasize indifference to interior features. In a manner of speaking, the shadow is an alter ego that imitates the individual.

The principal value of the shadow is its non-invasive behaviour: it precisely reflects the actions of the individual it is attached to. Nonetheless, we can still think of the body’s shadow not as the body itself but as its alter ego.

Based on this premise, my thesis creates an experiential system that extracts data related to the contour of your human shape and gives it a texture and life of its own, so as to emulate your movements and postures and to be your extension. In technical terms, my thesis extracts an abstraction from a pre-indexed database, generated either from an offline data set or in real time, to complement the actions of a user in front of a low-cost optical motion capture device like the Microsoft Kinect. This abstraction can be seen as the system’s interpretation of the action, creating modularized art through the abstraction’s ‘similarity’ to the live action.

Through my research, I have developed a stable system that tackles the various connotations associated with shadows and the need to determine the ideal features that contribute to the relevance of the actions performed. Factor Oracle [3] pattern interpretation is tested with a feature bin of videos. The system is also flexible towards several Nearest Neighbour search methods and a machine learning module that derive the same output. The overall purpose is to establish this in real time and provide constant feedback to the user; this can be expanded to handle larger dynamic data.

In addition to estimating human actions, my thesis tests various Nearest Neighbour search methods in real time, depending upon the data stream. This provides a basis for understanding the parameters that complement human activity recognition and feature matching in real time.
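
As a generic illustration of that trade-off (the feature dimensions and database below are assumptions, not the thesis's), brute-force search and a k-d tree return the same nearest neighbour, but the tree is built once offline so real-time queries stay fast:

```python
# Sketch: match a live frame's feature against a pre-indexed database with
# brute-force search versus a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
database = rng.normal(size=(10000, 32))   # pre-indexed pose features
query = rng.normal(size=32)               # live frame's feature vector

# Brute force: exact, but O(N) work per query.
brute_idx = np.argmin(np.linalg.norm(database - query, axis=1))

# k-d tree: built once offline, then queried quickly at runtime.
tree = cKDTree(database)
_, tree_idx = tree.query(query, k=1)

print(brute_idx == tree_idx)  # both find the same nearest neighbour
```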
Contributors: Seshasayee, Sudarshan Prashanth (Author) / Sha, Xin Wei (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Tinapple, David A (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
This work examines two main areas in model-based time-varying signal processing, with emphasis on speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the proposed methodologies' applicability to clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physical-based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels.

A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure and is correlated with speech intelligibility. A clinical tool that incorporates the automated VSA is proposed for evaluation and treatment use by speech-language pathologists. Two exploratory studies are performed using two databases, analyzing mean formant trajectories in healthy speech for a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly enabling accurate perception.
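
A minimal sketch of the VSA idea, using rough textbook formant values rather than data from this work: the VSA is the area of the convex hull of (F1, F2) points for the corner vowels.

```python
# Sketch: vowel space area as the convex-hull area of (F1, F2) points.
# The formant values below are approximate illustrative numbers.
import numpy as np
from scipy.spatial import ConvexHull

formants = np.array([
    [270, 2290],   # /i/  (F1, F2) in Hz
    [660, 1720],   # /ae/
    [730, 1090],   # /a/
    [300,  870],   # /u/
])

hull = ConvexHull(formants)
print(f"VSA ~ {hull.volume:.0f} Hz^2")  # in 2D, .volume is the hull's area
```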

A theory for analyzing time-varying signal models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and their corresponding physical interpretations. The Hilbert transform (HT) and the use of the analytic form of a signal are motivated, and a proof is provided to show that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid interpretation. A signal demodulation is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
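
For context, a short sketch of the standard analytic-signal machinery referenced here (not this work's modified algorithms): the Hilbert transform yields the instantaneous amplitude and frequency of an AM-FM tone.

```python
# Sketch: recover AM (envelope) and FM (instantaneous frequency) of a
# synthetic AM-FM tone via the analytic signal x + j*HT(x).
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(fs) / fs
am = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)            # slow envelope
phase = 2 * np.pi * 440 * t + 5 * np.sin(2 * np.pi * 2 * t)
x = am * np.cos(phase)                                # AM-FM signal

analytic = hilbert(x)                                 # analytic signal
inst_amp = np.abs(analytic)                           # recovered envelope
inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)

print(inst_amp[:5], inst_freq.mean())                 # mean ~ 440 Hz
```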
Contributors: Sandoval, Steven, 1984- (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Liss, Julie M (Committee member) / Turaga, Pavan (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)
Created: 2016