Incorporating auditory models in speech/audio applications

Description

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly or indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, and 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model to evaluate different candidate solutions. In this dissertation, a frequency pruning algorithm and a detector pruning algorithm are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to an 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals; it employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns.
The second problem involves obtaining an estimate of the auditory representation that minimizes a perceptual objective function and transforming the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages, which ensures that a time/frequency mapping corresponding to the estimated auditory representation is obtained. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.

Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2011

Multipath mitigating correlation kernels

Description

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two centimeters of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position, and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival estimates are based upon accurate direct sequence spread spectrum (DSSS) code and carrier phase tracking. Current multipath mitigating GNSS solutions include fixed radiation pattern antennas and windowed delay-lock loop code phase discriminators. A new multipath mitigating code tracking algorithm is introduced that utilizes a non-symmetric correlation kernel to reject multipath. Independent parameters provide a means to trade off code tracking discriminant gain against multipath mitigation performance. The algorithm performance is characterized in terms of multipath phase error bias, phase error estimation variance, tracking range, tracking ambiguity and implementation complexity. The algorithm is suitable for modernized GNSS signals including Binary Phase Shift Keyed (BPSK) and a variety of Binary Offset Carrier (BOC) signals. The algorithm compensates for unbalanced code sequences to ensure a code tracking bias does not result from the use of asymmetric correlation kernels. The algorithm does not require explicit knowledge of the propagation channel model. Design recommendations for selecting the algorithm parameters to mitigate precorrelation filter distortion are also provided.

Contributors: Miller, Steven (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created: 2013

Brain dynamics based automated epileptic seizure detection

Description

Approximately 1% of the world population suffers from epilepsy. Continuous long-term electroencephalographic (EEG) monitoring is the gold standard for recording epileptic seizures and assisting in the diagnosis and treatment of patients with epilepsy. However, this process still requires that seizures be visually detected and marked by experienced and trained electroencephalographers. The motivation for the development of an automated seizure detection algorithm in this research was to assist physicians in such a laborious, time-consuming and expensive task. Seizures in the EEG vary in duration (seconds to minutes), morphology and severity (clinical to subclinical, occurrence rate) within the same patient and across patients. The task of seizure detection is also made difficult by the presence of movement and other recording artifacts. An early approach towards the development of automated seizure detection algorithms utilizing both EEG changes and clinical manifestations resulted in a sensitivity of 70-80% and 1 false detection per hour. Approaches based on artificial neural networks have improved the detection performance at the cost of training the algorithm. Measures of nonlinear dynamics, such as Lyapunov exponents, have been applied successfully to seizure prediction. Within the framework of this MS research, a seizure detection algorithm based on measures of linear and nonlinear dynamics, i.e., the adaptive short-term maximum Lyapunov exponent (ASTLmax) and the adaptive Teager energy (ATE), was developed and tested. The algorithm was tested on long-term (0.5-11.7 days) continuous EEG recordings from five patients (3 with intracranial and 2 with scalp EEG) and a total of 56 seizures, producing a mean sensitivity of 93% and mean specificity of 0.048 false positives per hour. The developed seizure detection algorithm is data-adaptive, training-free and patient-independent.
It is expected that this algorithm will assist physicians in reducing the time spent on detecting seizures, lead to faster and more accurate diagnosis, better evaluation of treatment, and possibly to better treatments if it is integrated online and in real time with advanced neuromodulation therapies for epilepsy.
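
The abstract above names Teager energy as one of its measures. As a rough illustration only, the standard discrete Teager-Kaiser operator can be sketched as follows. This is the plain, non-adaptive operator; the adaptive variant (ATE) used in the thesis is not specified in the abstract, and the median-based threshold and the names `teager_energy` and `flag_high_energy` are assumptions made for this sketch.

```python
def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (indices 1 .. len(x) - 2).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def flag_high_energy(x, factor=5.0):
    """Return sample indices whose Teager energy exceeds factor * median energy.

    The median-based threshold is an illustrative assumption, not the
    adaptive rule described in the thesis.
    """
    psi = teager_energy(x)
    median = sorted(psi)[len(psi) // 2]
    # psi[k] belongs to sample k + 1 because the first sample has no left neighbor.
    return [k + 1 for k, energy in enumerate(psi) if energy > factor * median]
```

For a sinusoid of amplitude A and digital frequency w, the operator returns approximately A^2 sin^2(w), so it responds to increases in both amplitude and frequency, which is one reason it is often considered for detecting abrupt changes in oscillatory signals.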

Contributors: Venkataraman, Vinay (Author) / Iasemidis, Leonidas (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2012

On the dynamics of epileptic spikes and focus localization in temporal lobe epilepsy

Description

Interictal spikes, together with seizures, have been recognized as the two hallmarks of epilepsy, a brain disorder that 1% of the world's population suffers from. Even though the presence of spikes in the brain's electromagnetic activity has diagnostic value, their dynamics are still elusive. It was an objective of this dissertation to formulate a mathematical framework within which the dynamics of interictal spikes could be thoroughly investigated. A new epileptic spike detection algorithm was developed by employing data adaptive morphological filters. The performance of the spike detection algorithm compared favorably with that of others in the literature. A novel spike spatial synchronization measure was developed and tested on coupled spiking neuron models. Application of this measure to individual epileptic spikes in EEG from patients with temporal lobe epilepsy revealed long-term trends of increase in synchronization between pairs of brain sites before seizures and desynchronization after seizures, in the same patient as well as across patients, thus supporting the hypothesis that seizures may occur to break (reset) the abnormal spike synchronization in the brain network. Furthermore, based on these results, a separate spatial analysis of spike rates was conducted that shed light on conflicting results in the literature about the variability of spike rate before and after seizures. The ability to automatically classify seizures into clinical and subclinical was a result of the above findings. A novel method for epileptogenic focus localization from interictal periods based on spike occurrences was also devised, combining concepts from graph theory, such as eigenvector centrality, with the developed spike synchronization measure; it tested very favorably against the gold standard used in clinical practice for focus localization from seizure onset.
Finally, in another application of resetting of brain dynamics at seizures, it was shown that it is possible to differentiate with high accuracy between patients with epileptic seizures (ES) and patients with psychogenic nonepileptic seizures (PNES). The above studies of spike dynamics have elucidated many unknown aspects of ictogenesis and are expected to contribute significantly to further understanding of the basic mechanisms that lead to seizures and to the diagnosis and treatment of epilepsy.
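
The abstract above mentions eigenvector centrality as one ingredient of the focus localization method. A minimal sketch of that graph-theoretic concept is given here, assuming a symmetric, non-negative matrix of pairwise synchronization values between recording sites. The matrix layout, the function name `eigenvector_centrality`, and the use of shifted power iteration are assumptions for illustration; the abstract does not describe how the thesis combines centrality with its synchronization measure.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-12):
    """Approximate the leading eigenvector of a non-negative weight matrix.

    weights[i][j] is a hypothetical synchronization value between sites i and j.
    Scores are normalized to sum to 1; a higher score means a more central site.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Iterate with (W + I) instead of W: the eigenvectors are the same,
        # and the shift avoids oscillation on bipartite-like graphs.
        updated = [
            scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(updated)
        updated = [value / total for value in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores
```

In such a sketch, the site with the largest score would be the one most strongly coupled to other strongly coupled sites, which is the intuition behind using centrality to rank candidate focus locations.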

Contributors: Krishnan, Balu (Author) / Iasemidis, Leonidas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Spanias, Andreas (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)

Created: 2012

Signal processing and robust statistics for fault detection in photovoltaic arrays

Description

Photovoltaics (PV) is an important and rapidly growing area of research. With the advent of power system monitoring and communication technology collectively known as the "smart grid," an opportunity exists to apply signal processing techniques to the monitoring and control of PV arrays. In this paper, a monitoring system that provides real-time measurements of each PV module's voltage and current is considered. A fault detection algorithm formulated as a clustering problem and addressed using the robust minimum covariance determinant (MCD) estimator is described; its performance on simulated instances of arc and ground faults is evaluated. The algorithm is found to perform well on many types of faults commonly occurring in PV arrays. Among several types of detection algorithms considered, only the MCD shows high performance on both types of faults.
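
To make the MCD idea in the abstract above concrete, the following is a minimal sketch for a handful of two-dimensional (voltage, current) measurements. It finds the subset of `h` points whose sample covariance has the smallest determinant by exhaustive search and then flags points with a large robust Mahalanobis distance. Several things here are assumptions for illustration: exhaustive search is only feasible for tiny data sets (practical implementations use approximate algorithms), no consistency correction is applied to the covariance, the threshold 13.8 is roughly the 0.999 quantile of a chi-square distribution with 2 degrees of freedom, and all function names are hypothetical. The abstract does not give the thesis's actual procedure.

```python
from itertools import combinations


def mean_and_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (voltage, current) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mcd_fit(points, h):
    """Exhaustive MCD: use the h-subset whose covariance determinant is smallest."""
    best = None
    for subset in combinations(points, h):
        mean, (sxx, sxy, syy) = mean_and_cov(subset)
        det = sxx * syy - sxy * sxy
        if best is None or det < best[0]:
            best = (det, mean, (sxx, sxy, syy))
    return best[1], best[2]


def robust_distance_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def detect_faults(points, h, threshold=13.8):
    """Return indices of modules whose robust distance exceeds the threshold."""
    mean, cov = mcd_fit(points, h)
    return [
        i for i, p in enumerate(points)
        if robust_distance_sq(p, mean, cov) > threshold
    ]
```

Because the location and scatter are estimated only from the most concentrated subset, a few faulty modules cannot drag the estimate toward themselves, which is the property that makes the MCD attractive compared with the ordinary sample mean and covariance.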

Contributors: Braun, Henry (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2012

Energy-efficient distributed estimation by utilizing a nonlinear amplifier

Description

Distributed estimation uses many inexpensive sensors to compose an accurate estimate of a given parameter. It is frequently implemented using wireless sensor networks. There have been several studies on optimizing power allocation in wireless sensor networks used for distributed estimation, the vast majority of which assume linear radio-frequency amplifiers. Linear amplifiers are inherently inefficient, so in this dissertation nonlinear amplifiers are examined to gain efficiency while operating distributed sensor networks. This research presents a method to boost efficiency by operating the amplifiers in the nonlinear region of operation. Operating amplifiers nonlinearly presents new challenges. First, nonlinear amplifier characteristics change across manufacturing process variation, temperature, operating voltage, and aging. Second, the equations conventionally used for estimators and performance expectations in linear amplify-and-forward systems fail. To address the first challenge, predistortion is utilized not to linearize amplifiers but rather to force them to fit a common nonlinear limiting amplifier model close to the inherent amplifier performance. This minimizes the power impact and the training requirements for predistortion. To address the second challenge, new estimators are required that account for transmitter nonlinearity. This research derives analytically and confirms via simulation new estimators and performance expectation equations for use in nonlinear distributed estimation. An additional complication when operating nonlinear amplifiers in a wireless environment is the influence of varied and potentially unknown channel gains. The impact of these varied gains and of both measurement and channel noise sources on estimation performance is analyzed in this paper. Techniques for minimizing the estimate variance are developed.
It is shown that optimizing transmitter power allocation to minimize estimate variance for the most-compressed parameter measurement is equivalent to the problem for linear sensors. Finally, a method for operating distributed estimation in a multipath environment is presented that is capable of developing robust estimates for a wide range of Rician K-factors. This dissertation demonstrates that implementing distributed estimation using nonlinear sensors can boost system efficiency and is compatible with existing techniques from the literature for boosting efficiency at the system level via sensor power allocation. Nonlinear transmitters work best when channel gains are known and channel noise and receiver noise levels are low.

Contributors: Santucci, Robert (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Bakkaloglu, Bertan (Committee member) / Tsakalis, Kostas (Committee member) / Arizona State University (Publisher)

Created: 2013

Addressing the Challenges of Automated Speech and Language Analysis for the Assessment of Mental Health and Functional Competency

Description

Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed by the use of clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is some spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective in nature. Meanwhile, technological advancements in speech and natural language processing have grown exponentially over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many have seen an opportunity to leverage signal processing and machine learning applications to objectively assess clinical speech samples in order to automatically compute objective measures of neurological health. This document summarizes several contributions to expand upon this body of research. Mainly, there is still a large gap between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of speech and language features used in models for assessing specific aspects of mental health; numerous methods may exist to measure the same or similar constructs and lead researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection, while also performing a smoothing operation on features across several domains of interest. 
Then, these composite features are used to perform a much wider range of analyses than is typical of previous studies, looking at everything from diagnosis to functional competency assessments. Lastly, potential improvements to address practical implementation challenges associated with the use of speech and language technology in a real-world environment are investigated. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners toward improvements in quality of life outcomes for their patients.

Contributors: Voleti, Rohit Nihar Uttam (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie M (Thesis advisor) / Turaga, Pavan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2022

Representation Learning for Graph Structured Data using Deep Neural Networks

Description

Dealing with relational data structures is central to a wide range of applications including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in graph structured data have gained prominence in recent times. A recurring idea that appears in all approaches is to encode the nodes in the graph (or the entire graph) as low-dimensional vectors, also known as embeddings, prior to carrying out downstream task-specific learning. It is crucial to eliminate hand-crafted features and instead directly incorporate the structural inductive bias into the deep learning architectures. In this dissertation, deep learning models that directly operate on graph structured data are proposed for effective representation learning. A literature review on existing graph representation learning is provided at the beginning of the dissertation. The primary focus of the dissertation is on building novel graph neural network architectures that are robust against adversarial attacks. The proposed graph neural network models are extended to multiplex graphs (heterogeneous graphs). Finally, a relational neural network model is proposed to operate on a human structural connectome. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed graph neural network models, approaches, and architectures demonstrate significant performance improvements in comparison to the existing state-of-the-art graph embedding strategies.
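
The idea of building node embeddings that carry structural information can be illustrated with a single round of neighborhood averaging, the simplest form of message passing. This sketch is a generic illustration and not one of the architectures proposed in the dissertation: it has no learned weights or nonlinearity, and the function name `aggregate` and the undirected edge-list input are assumptions.

```python
def aggregate(features, edges):
    """One round of mean aggregation over each node and its neighbors.

    features: list of per-node feature vectors (lists of floats).
    edges: iterable of undirected (a, b) node-index pairs.
    Returns a new list of vectors; each is the mean over the node itself
    (self-loop) and its direct neighbors.
    """
    n = len(features)
    neighbors = {i: {i} for i in range(n)}  # every node includes itself
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i])
         for d in range(dim)]
        for i in range(n)
    ]
```

Applying the function k times lets information travel k hops, so a node's vector comes to reflect its k-hop neighborhood. Graph neural networks typically interleave this kind of aggregation with learned linear maps and nonlinearities.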

Contributors: Shanthamallu, Uday Shankar (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman J (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2021

Development of hardware and software for a game-like wireless spatial sound distribution system

Description

Several music players have evolved into multidimensional and surround sound systems. The audio players are implemented as software applications for different audio hardware systems. Digital formats and wireless networks allow audio content to be readily accessible on smart networked devices. Therefore, different audio output platforms ranging from multispeaker high-end surround systems to single-unit Bluetooth speakers have been developed. A large body of research has been carried out in audio processing, beamforming, sound fields, etc., and new formats are being developed to create realistic audio experiences.

An emerging trend is seen towards high-definition AV systems, virtual reality gear, and gaming applications with multidimensional audio. Next-generation media technology is concentrating on virtual reality experiences and devices. It has applications not only in gaming but also in other fields including medicine, entertainment, engineering, and education. All such systems also require realistic audio corresponding to the visuals.

In the project presented in this thesis, a new portable audio hardware system is designed and developed along with a dedicated mobile Android application to render immersive surround sound experiences with real-time audio effects. The tablet and mobile phone allow the user to control or “play” with sound directionality and implement various audio effects including sound rotation, spatialization, and other immersive experiences. The thesis describes the hardware and software design, provides the theory of the sound effects, and presents demonstrations of the sound application that was created.
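
One common way to realize a sound rotation effect on a ring of speakers is pairwise constant-power panning, sketched here. This is an assumed textbook technique chosen for illustration; the abstract does not state which panning law or speaker layout the thesis uses, and the function name `ring_gains` and the evenly spaced ring are hypothetical.

```python
import math


def ring_gains(angle_deg, num_speakers):
    """Per-speaker gains for a virtual source at angle_deg on an evenly spaced ring.

    Speaker k is assumed to sit at k * 360 / num_speakers degrees. The source is
    panned between the two adjacent speakers with a sine/cosine law, so the
    squared gains always sum to 1 (constant total power).
    """
    if num_speakers < 2:
        raise ValueError("at least two speakers are required")
    spacing = 360.0 / num_speakers
    position = (angle_deg % 360.0) / spacing
    left = int(position) % num_speakers
    right = (left + 1) % num_speakers
    fraction = position - int(position)  # 0 at the left speaker, 1 at the right
    gains = [0.0] * num_speakers
    gains[left] = math.cos(fraction * math.pi / 2.0)
    gains[right] = math.sin(fraction * math.pi / 2.0)
    return gains
```

Sweeping `angle_deg` over time and multiplying each output channel by its gain moves the perceived source around the listener without audible level dips between speakers.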

Contributors: Dharmadhikari, Chinmay (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Committee member) / Ingalls, Todd (Committee member) / Arizona State University (Publisher)

Created: 2016

Detection, prediction and control of epileptic seizures

Description

From time immemorial, epilepsy has persisted as one of the greatest impediments to human life for those stricken by it. As the fourth most common neurological disorder, epilepsy causes paroxysmal electrical discharges in the brain that manifest as seizures. Seizures have the effect of debilitating patients on a physical and psychological level. Although not lethal by themselves, they can bring about total disruption in consciousness which can, in hazardous conditions, lead to fatality. Roughly 1% of the world population suffers from epilepsy, and another 30 to 50 new cases per 100,000 people add to the number of affected annually. Controlling seizures in epileptic patients has therefore become a great medical and, in recent years, engineering challenge.

In this study, the conditions of human seizures are recreated in an animal model of temporal lobe epilepsy. The rodents used in this study are chemically induced to become chronically epileptic. Their electroencephalogram (EEG) data is then recorded and analyzed to detect and predict seizures, with the ultimate goal being the control and complete suppression of seizures.

Two methods, the maximum Lyapunov exponent and the Generalized Partial Directed Coherence (GPDC), are applied to EEG data to extract meaningful information. Their effectiveness has been reported in the literature for seizure prediction and seizure focus localization. This study integrates these measures, with some modifications, to robustly detect seizures, to separately find precursors to them, and consequently to provide stimulation to the epileptic brain of rats in order to suppress seizures. Additionally, open-loop stimulation with biphasic currents of various pairs of sites over differing lengths of time has helped us create control efficacy maps. While GPDC tells us about the possible location of the focus, control efficacy maps tell us how effective stimulating a certain pair of sites will be.

The results from computations performed on the data are presented and the feasibility of the control problem is discussed. The results show a new reliable means of seizure detection even in the presence of artifacts in the data. The seizure precursors provide a means of prediction, on the order of tens of minutes, prior to seizures. Closed-loop stimulation experiments based on these precursors and control efficacy maps on the epileptic animals show a maximum reduction of seizure frequency by 24.26% in one animal and reduction of length of seizures by 51.77% in another. Thus, through this study it was shown that the implementation of the methods can ameliorate seizures in an epileptic patient. It is expected that the new knowledge and experimental techniques will provide a guide for future research in an effort to ultimately eliminate seizures in epileptic patients.

Contributors: Shafique, Md Ashfaque Bin (Author) / Tsakalis, Konstantinos (Thesis advisor) / Rodriguez, Armando (Committee member) / Muthuswamy, Jitendran (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2016
