Search Content

Audio processing and loudness estimation algorithms with iOS simulations

Description

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters…

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS App 'iJDSP." These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced for providing the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have been also developed for use in classes and training of practitioners and students.

ContributorsKalyanasundaram, Girish (Author) / Spanias, Andreas S (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2013

Time-dependent modulations in corticospinal excitability during motor learning

Description

The primary motor cortex (M1) plays a vital role in motor planning and execution, as well as in motor learning. Baseline corticospinal excitability (CSE) in M1 is known to increase as a result of motor learning, but less is understand about the modulation of CSE at the pre-execution planning stage…

The primary motor cortex (M1) plays a vital role in motor planning and execution, as well as in motor learning. Baseline corticospinal excitability (CSE) in M1 is known to increase as a result of motor learning, but less is understand about the modulation of CSE at the pre-execution planning stage due to learning. This question was addressed using single pulse transcranial magnetic stimulation (TMS) to measure the modulation of both baseline and planning CSE due to learning a reach to grasp task. It was hypothesized that baseline CSE would increase and planning CSE decrease as a function of trial; an increase in baseline CSE would replicate established findings in the literature, while a decrease in planning would be a novel finding. Eight right-handed subjects were visually cued to exert a precise grip force, with the goal of producing that force accurately and consistently. Subjects effectively learned the task in the first 10 trials, but no significant trends were found in the modulation of baseline or planning CSE. The lack of significant results may be due to the very quick learning phase or the lower intensity of training as compared to past studies. The findings presented here suggest that planning and baseline CSE may be modulated along different time courses as learning occurs and point to some important considerations for future studies addressing this question.

ContributorsMoore, Dalton Dale (Author) / Santello, Marco (Thesis director) / Kleim, Jeff (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)

Created2015-05

The Role of Primary Motor Cortex (M1) in the Context-Dependent Interference

Description

A previous study demonstrated that learning to lift an object is context-based and that in the presence of both the memory and visual cues, the acquired sensorimotor memory to manipulate an object in one context interferes with the performance of the same task in presence of visual information about a…

A previous study demonstrated that learning to lift an object is context-based and that in the presence of both the memory and visual cues, the acquired sensorimotor memory to manipulate an object in one context interferes with the performance of the same task in presence of visual information about a different context (Fu et al, 2012).
The purpose of this study is to know whether the primary motor cortex (M1) plays a role in the sensorimotor memory. It was hypothesized that temporary disruption of the M1 following the learning to minimize a tilt using a ‘L’ shaped object would negatively affect the retention of sensorimotor memory and thus reduce interference between the memory acquired in one context and the visual cues to perform the same task in a different context.
Significant findings were shown in blocks 1, 2, and 4. In block 3, subjects displayed insignificant amount of learning. However, it cannot be concluded that there is full interference in block 3. Therefore, looked into 3 effects in statistical analysis: the main effects of the blocks, the main effects of the trials, and the effects of the blocks and trials combined. From the block effects, there is a p-value of 0.001, and from the trial effects, the p-value is less than 0.001. Both of these effects indicate that there is learning occurring. However, when looking at the blocks * trials effects, we see a p-value of 0.002 < 0.05 indicating significant interaction between sensorimotor memories. Based on the results that were found, there is a presence of interference in all the blocks but not enough to justify the use of TMS in order to reduce interference because there is a partial reduction of interference from the control experiment. It is evident that the time delay might be the issue between context switches. By reducing the time delay between block 2 and 3 from 10 minutes to 5 minutes, I will hope to see significant learning to occur from the first trial to the second trial.

ContributorsHasan, Salman Bashir (Author) / Santello, Marco (Thesis director) / Kleim, Jeffrey (Committee member) / Helms Tillery, Stephen (Committee member) / Barrett, The Honors College (Contributor) / W. P. Carey School of Business (Contributor) / Harrington Bioengineering Program (Contributor)

Created2014-05

Bayesian Inference and Information Learning for Switching Nonlinear Gene Regulatory Networks

Description

This dissertation centers on the development of Bayesian methods for learning differ- ent types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability in GRNs. The new model is aimed at more precisely…

This dissertation centers on the development of Bayesian methods for learning differ- ent types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability in GRNs. The new model is aimed at more precisely capturing the complexity of GRN interactions through the introduction of time-varying kinetic order parameters, while allowing for variability in multiple model parameters. This model is used as the drift function in the development of several stochastic GRN mod- els based on Langevin dynamics. Six models are introduced which capture intrinsic and extrinsic noise in GRNs, thereby providing a full characterization of a stochastic regulatory system. A Bayesian hierarchical approach is developed for learning the Langevin model which best describes the noise dynamics at each time step. The trajectory of the state, which are the gene expression values, as well as the indicator corresponding to the correct noise model are estimated via sequential Monte Carlo (SMC) with a high degree of accuracy. To address the problem of time-varying regulatory interactions, a Bayesian hierarchical model is introduced for learning variation in switching GRN architectures with unknown measurement noise covariance. The trajectory of the state and the indicator corresponding to the network configuration at each time point are estimated using SMC. This work is extended to a fully Bayesian hierarchical model to account for uncertainty in the process noise covariance associated with each network architecture. An SMC algorithm with local Gibbs sampling is developed to estimate the trajectory of the state and the indicator correspond- ing to the network configuration at each time point with a high degree of accuracy. The results demonstrate the efficacy of Bayesian methods for learning information in switching nonlinear GRNs.

ContributorsVélez-Cruz, Nayely (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Moraffah, Bahman (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2023

A Tunable Loss Function for Robust, Rigorous, and Reliable Machine Learning

Description

In the era of big data, more and more decisions and recommendations are being made by machine learning (ML) systems and algorithms. Despite their many successes, there have been notable deficiencies in the robustness, rigor, and reliability of these ML systems, which have had detrimental societal impacts. In the next…

In the era of big data, more and more decisions and recommendations are being made by machine learning (ML) systems and algorithms. Despite their many successes, there have been notable deficiencies in the robustness, rigor, and reliability of these ML systems, which have had detrimental societal impacts. In the next generation of ML, these significant challenges must be addressed through careful algorithmic design, and it is crucial that practitioners and meta-algorithms have the necessary tools to construct ML models that align with human values and interests. In an effort to help address these problems, this dissertation studies a tunable loss function called α-loss for the ML setting of classification. The alpha-loss is a hyperparameterized loss function originating from information theory that continuously interpolates between the exponential (alpha = 1/2), log (alpha = 1), and 0-1 (alpha = infinity) losses, hence providing a holistic perspective of several classical loss functions in ML. Furthermore, the alpha-loss exhibits unique operating characteristics depending on the value (and different regimes) of alpha; notably, for alpha > 1, alpha-loss robustly trains models when noisy training data is present. Thus, the alpha-loss can provide robustness to ML systems for classification tasks, and this has bearing in many applications, e.g., social media, finance, academia, and medicine; indeed, results are presented where alpha-loss produces more robust logistic regression models for COVID-19 survey data with gains over state of the art algorithmic approaches.

ContributorsSypherd, Tyler (Author) / Sankar, Lalitha (Thesis advisor) / Berisha, Visar (Committee member) / Dasarathy, Gautam (Committee member) / Kosut, Oliver (Committee member) / Arizona State University (Publisher)

Created2022

Representation Learning for Graph Structured Data using Deep Neural Networks

Description

Dealing with relational data structures is central to a wide-range of applications including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in the graph structured data have gained prominence in recent times. A recurring idea that appears…

Dealing with relational data structures is central to a wide-range of applications including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in the graph structured data have gained prominence in recent times. A recurring idea that appears in all approaches is to encode the nodes in the graph (or the entire graph) as low-dimensional vectors also known as embeddings, prior to carrying out downstream task-specific learning. It is crucial to eliminate hand-crafted features and instead directly incorporate the structural inductive bias into the deep learning architectures. In this dissertation, deep learning models that directly operate on graph structured data are proposed for effective representation learning. A literature review on existing graph representation learning is provided in the beginning of the dissertation. The primary focus of dissertation is on building novel graph neural network architectures that are robust against adversarial attacks. The proposed graph neural network models are extended to multiplex graphs (heterogeneous graphs). Finally, a relational neural network model is proposed to operate on a human structural connectome. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed graph neural network models, approaches, and architectures demonstrate significant performance improvements in comparison to the existing state-of-the-art graph embedding strategies.

ContributorsShanthamallu, Uday Shankar (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman J (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2021

Characterizing EEG Data for Epileptic Seizures Using a Variety of Data Analysis Methods to Understand the Data and Enable Future Research

Description

In this research, I surveyed existing methods of characterizing Epilepsy from Electroencephalogram (EEG) data, including the Random Forest algorithm, which was claimed by many researchers to be the most effective at detecting epileptic seizures [7]. I observed that although many papers claimed a detection of >99% using Random Forest, it…

In this research, I surveyed existing methods of characterizing Epilepsy from Electroencephalogram (EEG) data, including the Random Forest algorithm, which was claimed by many researchers to be the most effective at detecting epileptic seizures [7]. I observed that although many papers claimed a detection of >99% using Random Forest, it was not specified “when” the detection was declared within the 23.6 second interval of the seizure event. In this research, I created a time-series procedure to detect the seizure as early as possible within the 23.6 second epileptic seizure window and found that the detection is effective (> 92%) as early as the first few seconds of the epileptic episode. I intend to use this research as a stepping stone towards my upcoming Masters thesis research where I plan to expand the time-series detection mechanism to the pre-ictal stage, which will require a different dataset.

ContributorsBou-Ghazale, Carine (Author) / Lai, Ying-Cheng (Thesis director) / Berisha, Visar (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created2022-05

Learning Robust and Repeatable Speech Features for Clinical Applications

Description

Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on…

Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of targets. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.

ContributorsZhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created2023

Methodologies to Improve Fidelity and Reliability of Deep Learning Models for Real-World Deployment

Description

The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain aware knowledge; (iii) i.i.d assumption between the train and target…

The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain aware knowledge; (iii) i.i.d assumption between the train and target distributions and (iv) belief on existing metrics as reliable indicators of performance. When any of these assumptions are violated, the models exhibit brittleness producing adversely varied behavior. This dissertation focuses on methods for accurate model design and characterization that enhance process reliability when certain assumptions are not met. With the need to safely adopt artificial intelligence tools in practice, it is vital to build reliable failure detectors that indicate regimes where the model must not be invoked. To that end, an error predictor trained with a self-calibration objective is developed to estimate loss consistent with the underlying model. The properties of the error predictor are described and their utility in supporting introspection via feature importances and counterfactual explanations is elucidated. While such an approach can signal data regime changes, it is critical to calibrate models using regimes of inlier (training) and outlier data to prevent under- and over-generalization in models i.e., incorrectly identifying inliers as outliers and vice-versa. By identifying the space for specifying inliers and outliers, an anomaly detector that can effectively flag data of varying semantic complexities in medical imaging is next developed. Uncertainty quantification in deep learning models involves identifying sources of failure and characterizing model confidence to enable actionability. A training strategy is developed that allows the accurate estimation of model uncertainties and its benefits are demonstrated for active learning and generalization gap prediction. This helps identify insufficiently sampled regimes and representation insufficiency in models. In addition, the task of deep inversion under data scarce scenarios is considered, which in practice requires a prior to control the optimization. By identifying limitations in existing work, data priors powered by generative models and deep model priors are designed for audio restoration. With relevant empirical studies on a variety of benchmarks, the need for such design strategies is demonstrated.

ContributorsNarayanaswamy, Vivek Sivaraman (Author) / Spanias, Andreas (Thesis advisor) / J. Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created2023

Does Human Speech Follow Benford's Law

Description

Researchers have observed that the frequencies of leading digits in many man-made and naturally occurring datasets follow a logarithmic curve, with digits that start with the number 1 accounting for 30% of all numbers in the dataset and digits that start with the number 9 accounting for 5% of all…

Researchers have observed that the frequencies of leading digits in many man-made and naturally occurring datasets follow a logarithmic curve, with digits that start with the number 1 accounting for 30% of all numbers in the dataset and digits that start with the number 9 accounting for 5% of all numbers in the dataset. This phenomenon, known as Benford's Law, is highly repeatable and appears in lists of numbers from electricity bills, stock prices, tax returns, house prices, death rates, lengths of rivers, and naturally occurring images. This paper will demonstrate that human speech spectra also follow Benford's Law. This observation is used to motivate a new set of features that can be efficiently extracted from speech and demonstrate that these features can be used to classify between human speech and synthetic speech.

ContributorsHsu, Leo (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2022

Filtering by