Correlational Analysis Between Speech and Gait in Parkinson's Disease

Description

Parkinson’s disease is one of the most complex and prevalent neurodegenerative diseases in the world. Previous analyses of Parkinson’s disease have identified both speech and gait deficits throughout the progression of the disease, yet minimal research has examined the correlation between speech and gait deficits in people diagnosed with Parkinson’s. Given the similar pathology and origins of the two deficits, there is strong reason to expect them to be correlated. This exploratory study aims to establish a correlation between gait and speech deficits in people diagnosed with Parkinson’s disease. Using previously identified motor and speech measurements and tasks, I conducted a correlational study of individuals with Parkinson’s disease at baseline. Correlations were found between multiple speech and gait variability outcomes. Expected correlations included average harmonics-to-noise ratio values and average shimmer values, each against anticipatory postural adjustments-lateral peak distance. Unexpected correlations included F2 variability against the average number of steps in a turn and intensity variability against step duration variability. As a secondary outcome, I also analyzed speech changes over 1 year and found that the averages and variabilities of the primary speech outcomes increased over that period. This study provides a basis for developing treatments that may address speech and gait deficits simultaneously in people diagnosed with Parkinson’s. The exploratory study also identifies multiple targets for further investigation to better understand cohesive and compensatory mechanisms.

Contributors: Belnavis, Alexander Salvador (Author) / Peterson, Daniel (Thesis advisor) / Daliri, Ayoub (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2022

A Tunable Loss Function for Robust, Rigorous, and Reliable Machine Learning

Description

In the era of big data, more and more decisions and recommendations are being made by machine learning (ML) systems and algorithms. Despite their many successes, there have been notable deficiencies in the robustness, rigor, and reliability of these ML systems, which have had detrimental societal impacts. In the next generation of ML, these significant challenges must be addressed through careful algorithmic design, and it is crucial that practitioners and meta-algorithms have the necessary tools to construct ML models that align with human values and interests. To help address these problems, this dissertation studies a tunable loss function called α-loss for the ML setting of classification. The α-loss is a hyperparameterized loss function originating from information theory that continuously interpolates between the exponential (α = 1/2), log (α = 1), and 0-1 (α = ∞) losses, hence providing a holistic perspective on several classical loss functions in ML. Furthermore, α-loss exhibits unique operating characteristics depending on the value (and different regimes) of α; notably, for α > 1, α-loss robustly trains models when noisy training data is present. Thus, α-loss can provide robustness to ML systems for classification tasks, with bearing on many applications, e.g., social media, finance, academia, and medicine; indeed, results are presented where α-loss produces more robust logistic regression models for COVID-19 survey data, with gains over state-of-the-art algorithmic approaches.
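
The interpolation can be made concrete. For a model that assigns probability p to the true label, α-loss is commonly written as ℓ_α(p) = (α/(α − 1))(1 − p^((α − 1)/α)) for α ≠ 1, with the log loss −log p at α = 1 and 1 − p (a soft form of the 0-1 loss) as α → ∞. The minimal sketch below assumes that form; with p = σ(z) for a margin z, α = 1/2 gives 1/p − 1 = e^(−z), the exponential loss. The helper names are illustrative.

```python
import math


def alpha_loss(p: float, alpha: float) -> float:
    """alpha-loss of the probability p that a model assigns to the true label.

    Assumed form: (alpha / (alpha - 1)) * (1 - p ** ((alpha - 1) / alpha)),
    with the alpha = 1 and alpha = inf cases taken as limits.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return -math.log(p)  # log loss
    if math.isinf(alpha):
        return 1.0 - p  # soft 0-1 loss
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))


def sigmoid(z: float) -> float:
    """Logistic function mapping a margin z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))
```

For example, alpha_loss(sigmoid(2.0), 0.5) agrees with math.exp(-2.0) up to rounding, and values of α just above 1 approach the log loss continuously.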

Contributors: Sypherd, Tyler (Author) / Sankar, Lalitha (Thesis advisor) / Berisha, Visar (Committee member) / Dasarathy, Gautam (Committee member) / Kosut, Oliver (Committee member) / Arizona State University (Publisher)

Created: 2022

Representation Learning for Graph Structured Data using Deep Neural Networks

Description

Dealing with relational data structures is central to a wide range of applications, including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in graph-structured data have recently gained prominence. A recurring idea across all approaches is to encode the nodes in the graph (or the entire graph) as low-dimensional vectors, also known as embeddings, before carrying out downstream task-specific learning. It is crucial to eliminate hand-crafted features and instead incorporate the structural inductive bias directly into the deep learning architectures. In this dissertation, deep learning models that operate directly on graph-structured data are proposed for effective representation learning. A literature review of existing graph representation learning is provided at the beginning of the dissertation. The primary focus of the dissertation is on building novel graph neural network architectures that are robust against adversarial attacks. The proposed graph neural network models are extended to multiplex graphs (heterogeneous graphs). Finally, a relational neural network model is proposed to operate on a human structural connectome. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed graph neural network models, approaches, and architectures demonstrate significant performance improvements compared with existing state-of-the-art graph embedding strategies.
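
One widely used way to build node embeddings directly from graph structure is the graph convolutional network (GCN) layer of Kipf and Welling, H' = ReLU(D̂^(−1/2)(A + I)D̂^(−1/2) H W), where D̂ is the degree matrix of A + I. The dissertation's robust and multiplex architectures are not specified in this abstract, so this dependency-free sketch shows only that standard layer; the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumes a non-negative, symmetric adjacency matrix.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbor features, project with W, then apply ReLU.
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]
```

Stacking such layers with learned weight matrices gives each node a low-dimensional embedding that mixes its own features with those of its neighbors.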

Contributors: Shanthamallu, Uday Shankar (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman J (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2021

Marmoset Calls Labeling

Description

Callithrix jacchus, also known as the common marmoset, is native to the New World. These marmosets have a wide vocal repertoire that is worth observing to understand their group communication and their fight-or-flight responses to their environment. In this project, I continued the work of a previous student, Jasmin, by collecting more data for her study. For the most part, my project entailed recording the marmosets’ calls and labeling them by type.

Contributors: Tran, Anh (Author) / Zhou, Yi (Thesis director) / Berisha, Visar (Committee member) / Barrett, The Honors College (Contributor)

Created: 2021-05

Machine Learning for the Design of Screening Tests: General Principles and Applications in Criminology and Digital Medicine

Description

This dissertation explores applications of machine learning methods to the design of screening tests, which are ubiquitous in fields ranging from social work to criminology to healthcare. In the first part, a novel Bayesian decision theory framework is presented for designing tree-based adaptive tests. In an application to youth delinquency in Honduras, the method produces a 15-item instrument that is almost as accurate as a full-length 150+ item test. The framework includes specific considerations for the context in which the test will be administered, and provides uncertainty quantification around the trade-offs of shortening lengthy tests. In the second part, classification complexity is explored via theoretical and empirical results from statistical learning theory, information theory, and empirical data complexity measures. A simulation study that explicitly controls two key aspects of classification complexity is performed to relate the theoretical and empirical approaches. Throughout, a unified language and notation that formalizes classification complexity is developed; this same notation is used in subsequent chapters to discuss classification complexity in the context of a speech-based screening test. In the final part, the relative merits of task and feature engineering when designing a speech-based cognitive screening test are explored. Through an extensive classification analysis on a clinical speech dataset from patients with normal cognition and Alzheimer’s disease, the speech elicitation task is shown to have a large impact on test accuracy; carefully performed task and feature engineering are required for best results. A new framework for objectively quantifying speech elicitation tasks is introduced, and two methods are proposed for automatically extracting insights into the aspects of the speech elicitation task that are driving classification performance.
The dissertation closes with recommendations for how to evaluate the obtained insights and use them to guide future design of speech-based screening tests.

Contributors: Krantsevich, Chelsea (Author) / Hahn, P. Richard (Thesis advisor) / Berisha, Visar (Committee member) / Lopes, Hedibert (Committee member) / Renaut, Rosemary (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2023

Music-Remixing Preferences of Prelingual and Postlingual Cochlear Implant Users

Description

The poor spectral and temporal resolution of cochlear implants (CIs) limits their users’ music enjoyment. Remixing music by boosting vocals while attenuating spectrally complex instruments has been shown to benefit music enjoyment of postlingually deaf CI users. However, the effectiveness of music remixing in prelingually deaf CI users is still unknown. This study compared the music-remixing preferences of nine postlingually deaf, late-implanted CI users and seven prelingually deaf, early-implanted CI users, as well as their ratings of song familiarity and vocal pleasantness. Twelve songs were selected from the most streamed tracks on Spotify for testing. There were six remixed versions of each song: Original, Music-6 (6-dB attenuation of all instruments), Music-12 (12-dB attenuation of all instruments), Music-3-3-12 (3-dB attenuation of bass and drums and 12-dB attenuation of other instruments), Vocals-6 (6-dB attenuation of vocals), and Vocals-12 (12-dB attenuation of vocals). It was found that the prelingual group preferred the Music-6 and Original versions over the other versions, while the postlingual group preferred the Vocals-12 version over the Music-12 version. The prelingual group was more familiar with the songs than the postlingual group. However, the song familiarity rating did not significantly affect the patterns of preference ratings in each group. The prelingual group also had higher vocal pleasantness ratings than the postlingual group. For the prelingual group, higher vocal pleasantness led to higher preference ratings for the Music-12 version. For the postlingual group, their overall preference for the Vocals-12 version was driven by their preference ratings for songs with very unpleasant vocals. These results suggest that the patient factor of auditory experience and the stimulus factor of vocal pleasantness may affect the music-remixing preferences of CI users.
As such, the music-remixing strategy needs to be customized for individual patients and songs.
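
The dB values that define the six versions translate into linear amplitude gains via g = 10^(dB/20), so a 6-dB attenuation scales a stem by about 0.50 and a 12-dB attenuation by about 0.25. The sketch below assumes each song is already separated into four stems (vocals, bass, drums, other); the abstract does not say how the stems were obtained, and the stem names and functions are hypothetical.

```python
from typing import Dict, List

# Per-stem gains in dB for each remixed version (the four-stem split is an assumption).
VERSIONS: Dict[str, Dict[str, float]] = {
    "Original":     {"vocals": 0.0,   "bass": 0.0,   "drums": 0.0,   "other": 0.0},
    "Music-6":      {"vocals": 0.0,   "bass": -6.0,  "drums": -6.0,  "other": -6.0},
    "Music-12":     {"vocals": 0.0,   "bass": -12.0, "drums": -12.0, "other": -12.0},
    "Music-3-3-12": {"vocals": 0.0,   "bass": -3.0,  "drums": -3.0,  "other": -12.0},
    "Vocals-6":     {"vocals": -6.0,  "bass": 0.0,   "drums": 0.0,   "other": 0.0},
    "Vocals-12":    {"vocals": -12.0, "bass": 0.0,   "drums": 0.0,   "other": 0.0},
}


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(stems: Dict[str, List[float]], version: str) -> List[float]:
    """Scale each stem by its version-specific gain and sum the stems sample-wise."""
    gains = VERSIONS[version]
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")
    mix = [0.0] * lengths.pop()
    for name, samples in stems.items():
        g = db_to_gain(gains[name])
        for i, s in enumerate(samples):
            mix[i] += g * s
    return mix
```

Working in linear gains keeps the mixing step a simple weighted sum of stems.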

Contributors: Vecellio, Amanda Paige (Author) / Luo, Xin (Thesis advisor) / Ringenbach, Shannon (Committee member) / Berisha, Visar (Committee member) / Zhou, Yi (Committee member) / Arizona State University (Publisher)

Created: 2024

Methodologies to Improve Fidelity and Reliability of Deep Learning Models for Real-World Deployment

Description

The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to (i) the availability of large amounts of training data; (ii) access to domain-aware knowledge; (iii) the i.i.d. assumption between the train and target distributions; and (iv) belief in existing metrics as reliable indicators of performance. When any of these assumptions is violated, the models become brittle and exhibit adversely varied behavior. This dissertation focuses on methods for accurate model design and characterization that enhance process reliability when certain assumptions are not met. With the need to safely adopt artificial intelligence tools in practice, it is vital to build reliable failure detectors that indicate regimes where the model must not be invoked. To that end, an error predictor trained with a self-calibration objective is developed to estimate loss consistent with the underlying model. The properties of the error predictor are described, and their utility in supporting introspection via feature importances and counterfactual explanations is elucidated. While such an approach can signal data regime changes, it is critical to calibrate models using regimes of inlier (training) and outlier data to prevent under- and over-generalization in models, i.e., incorrectly identifying inliers as outliers and vice versa. Next, by identifying the space for specifying inliers and outliers, an anomaly detector that can effectively flag data of varying semantic complexities in medical imaging is developed. Uncertainty quantification in deep learning models involves identifying sources of failure and characterizing model confidence to enable actionability. A training strategy is developed that allows the accurate estimation of model uncertainties, and its benefits are demonstrated for active learning and generalization gap prediction.
This helps identify insufficiently sampled regimes and representation insufficiency in models. In addition, the task of deep inversion under data-scarce scenarios is considered, which in practice requires a prior to control the optimization. After identifying limitations in existing work, data priors powered by generative models and deep model priors are designed for audio restoration. With relevant empirical studies on a variety of benchmarks, the need for such design strategies is demonstrated.

Contributors: Narayanaswamy, Vivek Sivaraman (Author) / Spanias, Andreas (Thesis advisor) / J. Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created: 2023

Assessing the Influence of Apple AirPods with Live Listen feature on Speech Recognition and Memory Retention in Noise Levels Simulating Noisy Healthcare Settings - Insights from QuickSIN

Description

This study aimed to evaluate the efficacy of the Live Listen feature of Apple AirPods Pro (2nd generation) in enhancing word recognition and memory retention among individuals with varying degrees of hearing loss, as determined by their Signal-to-Noise Ratio (SNR) loss. Using a single-group experimental design, the research measured participants' performance on word recognition and memory retention tasks with and without the Live Listen feature. Statistical analysis, including paired t-tests and linear regression, revealed significant improvements in word recognition (from 81.8% to 94.4%) and memory retention (from 43.8% to 59.4%) scores when the Live Listen feature was activated. Moreover, a positive correlation between SNR loss and recognition score improvements suggested a greater benefit for those with higher levels of hearing loss. However, the relationship with memory retention improvements was less pronounced. These findings underscore the potential of the Live Listen feature as an effective assistive listening device, highlighting its importance in enhancing auditory experiences for individuals with hearing impairments and encouraging further research into personalized auditory assistance technologies in noisy healthcare environments.

Contributors: Foroogozar, Mehdi (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Luo, Xin (Committee member) / Arizona State University (Publisher)

Created: 2024

Producing Acoustic-Prosodic Entrainment in a Robotic Learning Companion to Build Learner Rapport

Description

With advances in automatic speech recognition, spoken dialogue systems are assuming increasingly social roles. There is a growing need for these systems to be socially responsive, capable of building rapport with users. In human-human interactions, rapport is critical to patient-doctor communication, conflict resolution, educational interactions, and social engagement. Rapport between people promotes successful collaboration, motivation, and task success. Dialogue systems which can build rapport with their user may produce similar effects, personalizing interactions to create better outcomes.

This dissertation focuses on how dialogue systems can build rapport using acoustic-prosodic entrainment. Acoustic-prosodic entrainment occurs when individuals adapt the acoustic-prosodic features of their speech, such as tone of voice or loudness, to one another over the course of a conversation. Because entrainment is correlated with liking and task success, a dialogue system that entrains may enhance rapport. Entrainment, however, is very challenging to model: people entrain on different features in many ways, and how to design entrainment to build rapport is unclear. The first goal of this dissertation is to explore how acoustic-prosodic entrainment can be modeled to build rapport.

Towards this goal, this work presents a series of studies comparing, evaluating, and iterating on the design of entrainment, motivated and informed by human-human dialogue. These models of entrainment are implemented in the dialogue system of a robotic learning companion. Learning companions are educational agents that engage students socially to increase motivation and facilitate learning. As a learning companion’s ability to be socially responsive increases, so do vital learning outcomes. A second goal of this dissertation is to explore the effects of entrainment on concrete outcomes such as learning in interactions with robotic learning companions.

This dissertation makes both technical and theoretical contributions. Technical contributions include a robust and modular dialogue system capable of producing prosodic entrainment and other socially responsive behavior. Using one of the first systems of its kind, the results demonstrate that an entraining, social learning companion can build rapport and increase learning. This dissertation supports exploring phenomena like entrainment to enhance factors such as rapport and learning, and it provides a platform with which to explore these phenomena in future work.

Contributors: Lubold, Nichola Anne (Author) / Walker, Erin (Thesis advisor) / Pon-Barry, Heather (Thesis advisor) / Litman, Diane (Committee member) / VanLehn, Kurt (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2018

Data-Driven Representation Learning in Multimodal Feature Fusion

Description

Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is key to achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction.

We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and multiple descriptors for activity recognition. Designed to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision and other fields. Using the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.

In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs; consequently, the learning architecture requires special design. To improve temporal modeling for multivariate sequences, we developed two architectures centered on attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss as a metric learning framework, is described to better solve speaker diarization. Compared with state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance with lower computational complexity. Finally, to perform community detection on multilayer graphs, a fusion algorithm is described that derives node embeddings using word embedding techniques and exploits the complementary relational information contained in each layer of the graph.
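
In a diarization setting, a triplet ranking loss typically trains embeddings so that an anchor segment lies closer to a segment from the same speaker (the positive) than to one from a different speaker (the negative) by at least a margin m: L = max(0, d(a, p) − d(a, n) + m). This minimal sketch assumes Euclidean distance and a default margin of 1.0; implementations vary (squared distances are also common), and the dissertation's exact formulation is not given in the abstract.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training effort concentrates on triplets that still violate the ranking.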

Contributors: Song, Huan (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created: 2018
