Matching Items (7)

Description
The present study describes audiovisual sentence recognition in normal-hearing listeners, bimodal cochlear implant (CI) listeners, and bilateral CI listeners. The study explores a new set of sentences (the AzAV sentences) that were created to have equal auditory intelligibility and equal gain from visual information.

The aims of Experiment I were to (i) compare the lip-reading difficulty of the AzAV sentences with that of other sentence materials, (ii) compare the speech-reading ability of CI listeners with that of normal-hearing listeners, and (iii) assess the gain in speech understanding when listeners have both auditory and visual information from easy-to-lip-read and difficult-to-lip-read sentences. In addition, the sentence lists were subjected to a multi-level text analysis to determine the factors that make sentences easy or difficult to speech-read.

The results of Experiment I showed that (i) the AzAV sentences were relatively difficult to lip-read, (ii) CI listeners and normal-hearing listeners did not differ in lip-reading ability, and (iii) sentences with low lip-reading intelligibility (10-15% correct) provide about a 30 percentage point improvement in speech understanding when the visual signal is added to the acoustic stimulus, while sentences with high lip-reading intelligibility (30-60% correct) provide about a 50 percentage point improvement in the same comparison. The multi-level text analyses showed that the familiarity of phrases in the sentences was the primary factor affecting lip-reading difficulty.

The aim of Experiment II was to investigate the value of bimodal hearing and bilateral cochlear implants when visual information is present. The results showed that, when visual information is present, low-frequency acoustic hearing can be of value to speech understanding for patients fit with a single CI. However, when visual information was available, no gain was seen from the provision of a second CI, i.e., from bilateral CIs. As in Experiment I, visual information provided about a 30 percentage point improvement in speech understanding.
Contributors: Wang, Shuai (Author) / Dorman, Michael (Thesis advisor) / Berisha, Visar (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
A multitude of individuals across the globe suffer from hearing loss, and that number continues to grow. Cochlear implants, while having limitations, provide electrical input that enables users to "hear" and to interact more fully with their social environment. There has been a clinical shift toward bilateral placement of implants in both ears and toward bimodal placement of a hearing aid in the contralateral ear when residual hearing is present. However, there is potentially more to speech perception for bilateral and bimodal cochlear implant users than the electric and acoustic input received via these modalities. For normal-hearing listeners, vision plays a role, and Rosenblum (2005) points out that it is a key feature of an integrated perceptual process. Logically, cochlear implant users should also benefit from integrated visual input. The question is how, exactly, vision benefits bilateral and bimodal users.

Eight bilateral and five bimodal participants received randomized experimental phrases previously generated by Liss et al. (1998) in auditory and audiovisual conditions, and they recorded their perception of the input. The data were then analyzed for percent words correct, consonant errors, and lexical boundary error types.

Overall, vision improved speech perception for both bilateral and bimodal cochlear implant participants: each group showed a significant increase in percent words correct when visual input was added. With vision, bilateral participants reduced consonant place errors and made greater use of the syllabic stress cues that guide lexical segmentation. These results suggest that vision benefits bilateral cochlear implant users by granting access to place information and by augmenting cues for syllabic stress in the absence of acoustic input. In contrast, vision did not give the bimodal participants significantly greater access to place or stress cues, so the exact mechanism by which they improved speech perception with the addition of vision is unknown. These results point to the complexities of audiovisual integration during speech perception and to the need for continued research on the benefit vision provides to bilateral and bimodal cochlear implant users.
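For readers curious about the word-level metric mentioned above, the sketch below shows one simple way to compute percent words correct from a target phrase and a listener's transcription. The alignment method, function name, and example phrase are illustrative assumptions; they are not the scoring protocol or materials used in this study.

import difflib

def percent_words_correct(target: str, response: str) -> float:
    # Count target words recovered by the listener using a
    # longest-matching-subsequence alignment of the two word strings.
    target_words = target.lower().split()
    response_words = response.lower().split()
    matcher = difflib.SequenceMatcher(a=target_words, b=response_words)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * correct / len(target_words)

# Hypothetical example: six of seven target words reported correctly (~85.7%)
print(percent_words_correct("account for the sudden turn of events",
                            "account for the sun turn of events"))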
Contributors: Ludwig, Cimarron (Author) / Liss, Julie (Thesis advisor) / Dorman, Michael (Committee member) / Azuma, Tamiko (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
In this study, the Bark transform and the Lobanov method were used to normalize vowel formants in speech produced by persons with dysarthria. The computer classification accuracy for these normalized data was then compared with human perceptual classification accuracy for the actual vowels, and the results were analyzed to determine whether the normalization techniques correlated with the human data.
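For context on the two normalization techniques, the sketch below applies a common form of each to a few hypothetical formant measurements. The Traunmüller (1990) Bark conversion and per-speaker z-scoring shown here are standard formulations, but the abstract does not specify which variants the study used, so treat this as an illustrative sketch.

import numpy as np

def bark_transform(freq_hz):
    # Convert formant frequencies in Hz to the Bark scale
    # (Traunmüller, 1990 approximation).
    freq_hz = np.asarray(freq_hz, dtype=float)
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def lobanov_normalize(formants):
    # Lobanov normalization: z-score each formant using that speaker's
    # own mean and standard deviation. `formants` is an
    # (n_tokens, n_formants) array from a single speaker.
    formants = np.asarray(formants, dtype=float)
    return (formants - formants.mean(axis=0)) / formants.std(axis=0)

# Hypothetical F1/F2 values (Hz) for three vowel tokens from one speaker
f1_f2 = np.array([[300.0, 2300.0],   # /i/-like token
                  [700.0, 1200.0],   # /a/-like token
                  [350.0,  800.0]])  # /u/-like token
print(bark_transform(f1_f2))
print(lobanov_normalize(f1_f2))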
Contributors: Jones, Hanna Vanessa (Author) / Liss, Julie (Thesis director) / Dorman, Michael (Committee member) / Borrie, Stephanie (Committee member) / Barrett, The Honors College (Contributor) / Department of Speech and Hearing Science (Contributor) / Department of English (Contributor) / Speech and Hearing Science (Contributor)
Created: 2013-05
Description
Student to Student: A Guide to Anatomy is an anatomy guide written by students, for students. Its focus is on teaching the anatomy of the heart, lungs, nose, ears, and throat in a manner that isn't overpowering or stress-inducing. Daniel and I have taken numerous anatomy courses and fully understand what it takes to succeed in these classes. We found that the anatomy books recommended for these courses are often completely overwhelming, offering far more information than is needed, which renders them nearly useless for a college student who just wants to learn the essentials. Why would a student even pick one up if they can't find what they need to learn? With that in mind, our goal was to create a comprehensive, easy-to-understand, and easy-to-follow guide to the heart, lungs, and ENT (ear, nose, and throat). We know what information is vital for test day, and we wanted to highlight these key concepts and ideas in our guide. Spending just 60 to 90 minutes studying our guide should help any student with their studying needs, whether the student has medical school aspirations or simply wants to pass the class. We aren't experts, but we know what strategies and methods can help even the most confused students learn. Our guide can also be used as an introductory resource to our respective majors (Daniel: Biology; Charles: Speech and Hearing) for students who are undecided about what they want to do. In the future, Daniel and I would like to see more students creating similar guides and adding to the "Student to Student" title with their own works. After all, who better to teach students than the students who know what it takes?
Contributors: Kennedy, Charles (Co-author) / McDermand, Daniel (Co-author) / Kingsbury, Jeffrey (Thesis director) / Washo-Krupps, Delon (Committee member) / Department of Speech and Hearing Science (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05
Description
Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are used to reveal the perceptual strategies of the listener, and a comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The measures were validated on a dysarthric speech dataset spanning a range of severities, and multiple regression analysis shows that they predict perceptual ratings reliably.

The relationship between the acoustic measures and the listening errors is then investigated to characterize the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test this hypothesis, within-speaker variations were simulated in different speaking modes, and significant changes were detected in both the acoustic signals and the listening errors. The regression results support the hypothesis: changes in articulation-related acoustic features are important in predicting changes in phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, significant correlation was achieved in a cross-validation experiment, indicating that it is possible to predict intelligibility variations from the acoustic signal.
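As a schematic of the prediction analysis described above (not the dissertation's actual features, data, or model), the sketch below fits a cross-validated linear regression that predicts listener error rates from a few hypothetical articulation- and prosody-related acoustic measures.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-utterance acoustic features: the columns might stand in for
# articulation measures (e.g., vowel space area) and prosody measures
# (e.g., pitch variation); names and values are illustrative only.
n_utterances = 60
acoustic_features = rng.normal(size=(n_utterances, 4))

# Hypothetical listener error rates (e.g., lexical boundary errors per phrase),
# simulated as a noisy combination of two of the features.
listening_errors = (0.8 * acoustic_features[:, 0]
                    - 0.5 * acoustic_features[:, 2]
                    + rng.normal(scale=0.3, size=n_utterances))

# 5-fold cross-validated R^2: how well acoustic measures predict listening errors
scores = cross_val_score(LinearRegression(), acoustic_features, listening_errors,
                         cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())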
Contributors: Jiao, Yishan (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Thesis advisor) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
In many biological research studies, including speech analysis, clinical research, and prediction studies, the validity of the study depends on how well the training data set represents the target population. In speech analysis, for example, if one is performing emotion classification from speech, the performance of the classifier depends largely on the size and quality of the training data set. With small sample sizes and unbalanced data, classifiers developed in this context may key on incidental differences in the training data set, such as gender, age, and dialect, rather than on emotion.

This thesis evaluates several sampling methods and a non-parametric approach to determining the sample sizes required to minimize the effect of these nuisance variables on classification performance. Because this work focuses on speech analysis applications, it uses speech features such as Mel-Frequency Cepstral Coefficients (MFCC) and Filter Bank Cepstral Coefficients (FBCC). The non-parametric divergence (D_p divergence) measure is used to study the differences between sampling schemes (stratified and multistage sampling) and the changes introduced by the sentence types included in the sampling set.
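For readers unfamiliar with the D_p divergence, it can be estimated non-parametrically from the minimum spanning tree (MST) of the pooled sample, following the Friedman-Rafsky / Henze-Penrose construction used by Berisha and colleagues. The sketch below is a minimal version of that estimator applied to toy feature vectors; the toy data and the way the function might be used to compare sampling schemes are illustrative assumptions, not the thesis's actual setup.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def dp_divergence(X, Y):
    # MST-based estimate of the D_p (Henze-Penrose) divergence between
    # samples X (n x d) and Y (m x d): near 0 when the samples are
    # indistinguishable, near 1 when they are well separated.
    n, m = len(X), len(Y)
    pooled = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(n), np.ones(m)])
    mst = minimum_spanning_tree(cdist(pooled, pooled)).toarray()
    rows, cols = np.nonzero(mst)
    # Count MST edges that connect a point from X to a point from Y
    cross_edges = np.sum(labels[rows] != labels[cols])
    return 1.0 - (n + m) * cross_edges / (2.0 * n * m)

# Toy usage: 13-dimensional "MFCC-like" vectors drawn from two Gaussians
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 13))
Y_similar = rng.normal(0.2, 1.0, size=(100, 13))
Y_shifted = rng.normal(3.0, 1.0, size=(100, 13))
print(dp_divergence(X, Y_similar))   # small: the two samples largely overlap
print(dp_divergence(X, Y_shifted))   # near 1: the two samples differ markedly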
Contributors: Mariajohn, Aaquila (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state, and researchers have explored speech features for diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study of changes in conventional acoustic features in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals, while also highlighting limitations of conventional acoustic and linguistic features, such as low repeatability and poor generalizability. Two critical characteristics of speech features are (1) robustness, since features need to generalize across different corpora, and (2) repeatability, since features need to be invariant to all confounding factors except the pathological state of the target. This thesis presents two research thrusts for speech signals in clinical applications, which focus on improving the robustness and the repeatability of speech features, respectively.

The first thrust introduces a deep learning framework that generates acoustic feature embeddings sensitive to vocal quality and robust across corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of the embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and produces embeddings that are sensitive to voice quality and robust across corpora.

The second thrust introduces the intra-class correlation coefficient (ICC) as a way to evaluate the repeatability of embeddings and proposes a novel ICC regularizer that pushes deep neural networks to produce embeddings with higher repeatability. The ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results show that the ICC regularizer improves the repeatability of the learned embeddings compared with the contrastive loss, leading to enhanced performance in downstream tasks.
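To make the repeatability idea concrete, the sketch below computes a one-way ICC(1,1) over one embedding dimension, treating several recordings per speaker as repeated measurements. The thesis's ICC regularizer presumably uses a differentiable variant of something like this inside the training loss, but the exact formulation is not given in the abstract, so this is an illustrative assumption.

import numpy as np

def icc_1_1(measurements):
    # One-way random-effects ICC(1,1) for an (n_subjects, k_repeats) array of
    # a single embedding dimension: values near 1 mean the dimension varies
    # little within a subject relative to between-subject spread.
    x = np.asarray(measurements, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy example: 20 speakers, 5 repeated recordings each, one embedding dimension
rng = np.random.default_rng(0)
speaker_effect = rng.normal(size=(20, 1))
repeatable = speaker_effect + 0.1 * rng.normal(size=(20, 5))
noisy = speaker_effect + 2.0 * rng.normal(size=(20, 5))
print(icc_1_1(repeatable))  # high: stable within speakers
print(icc_1_1(noisy))       # much lower: dominated by within-speaker noise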
Contributors: Zhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2023