Matching Items (7)
Filtering by

Clear all filters

151634-Thumbnail Image.png
Description
Two groups of cochlear implant (CI) listeners were tested for sound source localization and for speech recognition in complex listening environments. One group (n=11) wore bilateral CIs and, potentially, had access to interaural level difference (ILD) cues, but not interaural timing difference (ITD) cues. The second group (n=12) wore a

Two groups of cochlear implant (CI) listeners were tested for sound source localization and for speech recognition in complex listening environments. One group (n=11) wore bilateral CIs and, potentially, had access to interaural level difference (ILD) cues, but not interaural timing difference (ITD) cues. The second group (n=12) wore a single CI and had low-frequency, acoustic hearing in both the ear contralateral to the CI and in the implanted ear. These `hearing preservation' listeners, potentially, had access to ITD cues but not to ILD cues. At issue in this dissertation was the value of the two types of information about sound sources, ITDs and ILDs, for localization and for speech perception when speech and noise sources were separated in space. For Experiment 1, normal hearing (NH) listeners and the two groups of CI listeners were tested for sound source localization using a 13 loudspeaker array. For the NH listeners, the mean RMS error for localization was 7 degrees, for the bilateral CI listeners, 20 degrees, and for the hearing preservation listeners, 23 degrees. The scores for the two CI groups did not differ significantly. Thus, both CI groups showed equivalent, but poorer than normal, localization. This outcome using the filtered noise bands for the normal hearing listeners, suggests ILD and ITD cues can support equivalent levels of localization. For Experiment 2, the two groups of CI listeners were tested for speech recognition in noise when the noise sources and targets were spatially separated in a simulated `restaurant' environment and in two versions of a `cocktail party' environment. At issue was whether either CI group would show benefits from binaural hearing, i.e., better performance when the noise and targets were separated in space. Neither of the CI groups showed spatial release from masking. However, both groups showed a significant binaural advantage (a combination of squelch and summation), which also maintained separation of the target and noise, indicating the presence of some binaural processing or `unmasking' of speech in noise. Finally, localization ability in Experiment 1 was not correlated with binaural advantage in Experiment 2.
ContributorsLoiselle, Louise (Author) / Dorman, Michael F. (Thesis advisor) / Yost, William A. (Thesis advisor) / Azuma, Tamiko (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created2013
153453-Thumbnail Image.png
Description
The present study describes audiovisual sentence recognition in normal hearing listeners, bimodal cochlear implant (CI) listeners and bilateral CI listeners. This study explores a new set of sentences (the AzAV sentences) that were created to have equal auditory intelligibility and equal gain from visual information.

The aims of Experiment I

The present study describes audiovisual sentence recognition in normal hearing listeners, bimodal cochlear implant (CI) listeners and bilateral CI listeners. This study explores a new set of sentences (the AzAV sentences) that were created to have equal auditory intelligibility and equal gain from visual information.

The aims of Experiment I were to (i) compare the lip reading difficulty of the AzAV sentences to that of other sentence materials, (ii) compare the speech-reading ability of CI listeners to that of normal-hearing listeners and (iii) assess the gain in speech understanding when listeners have both auditory and visual information from easy-to-lip-read and difficult-to-lip read sentences. In addition, the sentence lists were subjected to a multi-level text analysis to determine the factors that make sentences easy or difficult to speech read.

The results of Experiment I showed that (i) the AzAV sentences were relatively difficult to lip read, (ii) that CI listeners and normal-hearing listeners did not differ in lip reading ability and (iii) that sentences with low lip-reading intelligibility (10-15 % correct) provide about a 30 percentage point improvement in speech understanding when added to the acoustic stimulus, while sentences with high lip-reading intelligibility (30-60 % correct) provide about a 50 percentage point improvement in the same comparison. The multi-level text analyses showed that the familiarity of phrases in the sentences was the primary driving factor that affects the lip reading difficulty.

The aim of Experiment II was to investigate the value, when visual information is present, of bimodal hearing and bilateral cochlear implants. The results of Experiment II showed that when visual information is present, low-frequency acoustic hearing can be of value to speech understanding for patients fit with a single CI. However, when visual information was available no gain was seen from the provision of a second CI, i.e., bilateral CIs. As was the case in Experiment I, visual information provided about a 30 percentage point improvement in speech understanding.
ContributorsWang, Shuai (Author) / Dorman, Michael (Thesis advisor) / Berisha, Visar (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created2015
153419-Thumbnail Image.png
Description
A multitude of individuals across the globe suffer from hearing loss and that number continues to grow. Cochlear implants, while having limitations, provide electrical input for users enabling them to "hear" and more fully interact socially with their environment. There has been a clinical shift to the

A multitude of individuals across the globe suffer from hearing loss and that number continues to grow. Cochlear implants, while having limitations, provide electrical input for users enabling them to "hear" and more fully interact socially with their environment. There has been a clinical shift to the bilateral placement of implants in both ears and to bimodal placement of a hearing aid in the contralateral ear if residual hearing is present. However, there is potentially more to subsequent speech perception for bilateral and bimodal cochlear implant users than the electric and acoustic input being received via these modalities. For normal listeners vision plays a role and Rosenblum (2005) points out it is a key feature of an integrated perceptual process. Logically, cochlear implant users should also benefit from integrated visual input. The question is how exactly does vision provide benefit to bilateral and bimodal users. Eight (8) bilateral and 5 bimodal participants received randomized experimental phrases previously generated by Liss et al. (1998) in auditory and audiovisual conditions. The participants recorded their perception of the input. Data were consequently analyzed for percent words correct, consonant errors, and lexical boundary error types. Overall, vision was found to improve speech perception for bilateral and bimodal cochlear implant participants. Each group experienced a significant increase in percent words correct when visual input was added. With vision bilateral participants reduced consonant place errors and demonstrated increased use of syllabic stress cues used in lexical segmentation. Therefore, results suggest vision might provide perceptual benefits for bilateral cochlear implant users by granting access to place information and by augmenting cues for syllabic stress in the absence of acoustic input. On the other hand vision did not provide the bimodal participants significantly increased access to place and stress cues. Therefore the exact mechanism by which bimodal implant users improved speech perception with the addition of vision is unknown. These results point to the complexities of audiovisual integration during speech perception and the need for continued research regarding the benefit vision provides to bilateral and bimodal cochlear implant users.
ContributorsLudwig, Cimarron (Author) / Liss, Julie (Thesis advisor) / Dorman, Michael (Committee member) / Azuma, Tamiko (Committee member) / Arizona State University (Publisher)
Created2015
137669-Thumbnail Image.png
Description
When listeners hear sentences presented simultaneously, the listeners are better able to discriminate between speakers when there is a difference in fundamental frequency (F0). This paper explores the use of a pulse train vocoder to simulate cochlear implant listening. A pulse train vocoder, rather than a noise or tonal vocoder,

When listeners hear sentences presented simultaneously, the listeners are better able to discriminate between speakers when there is a difference in fundamental frequency (F0). This paper explores the use of a pulse train vocoder to simulate cochlear implant listening. A pulse train vocoder, rather than a noise or tonal vocoder, was used so the fundamental frequency (F0) of speech would be well represented. The results of this experiment showed that listeners are able to use the F0 information to aid in speaker segregation. As expected, recognition performance is the poorest when there was no difference in F0 between speakers, and listeners performed better as the difference in F0 increased. The type of errors that the listeners made was also analyzed. The results show that when an error was made in identifying the correct word from the target sentence, the response was usually (~60%) a word that was uttered in the competing sentence.
ContributorsStanley, Nicole Ernestine (Author) / Yost, William (Thesis director) / Dorman, Michael (Committee member) / Liss, Julie (Committee member) / Barrett, The Honors College (Contributor) / Department of Speech and Hearing Science (Contributor) / Hugh Downs School of Human Communication (Contributor)
Created2013-05
157359-Thumbnail Image.png
Description
Speech intelligibility measures how much a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons of intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradations from both perspectives of the speaker and the listener. Segmental

Speech intelligibility measures how much a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons of intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradations from both perspectives of the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures are developed to quantify variations in the acoustic signal from three perceptual aspects, including articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset with various severity degrees. Multiple regression analysis is employed to show the developed measures could predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesize is that the segmental phoneme errors are mainly caused by the imprecise articulation, while the sprasegmental lexical boundary errors are due to the unreliable phonemic information as well as the abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes have been detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in the articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both of the articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, significant correlation has been achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from acoustic signal.
ContributorsJiao, Yishan (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Thesis advisor) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created2019
158464-Thumbnail Image.png
Description
In many biological research studies, including speech analysis, clinical research, and prediction studies, the validity of the study is dependent on the effectiveness of the training data set to represent the target population. For example, in speech analysis, if one is performing emotion classification based on speech, the performance of

In many biological research studies, including speech analysis, clinical research, and prediction studies, the validity of the study is dependent on the effectiveness of the training data set to represent the target population. For example, in speech analysis, if one is performing emotion classification based on speech, the performance of the classifier is mainly dependent on the number and quality of the training data set. For small sample sizes and unbalanced data, classifiers developed in this context may be focusing on the differences in the training data set rather than emotion (e.g., focusing on gender, age, and dialect).

This thesis evaluates several sampling methods and a non-parametric approach to sample sizes required to minimize the effect of these nuisance variables on classification performance. This work specifically focused on speech analysis applications, and hence the work was done with speech features like Mel-Frequency Cepstral Coefficients (MFCC) and Filter Bank Cepstral Coefficients (FBCC). The non-parametric divergence (D_p divergence) measure was used to study the difference between different sampling schemes (Stratified and Multistage sampling) and the changes due to the sentence types in the sampling set for the process.
ContributorsMariajohn, Aaquila (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created2020
190765-Thumbnail Image.png
Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on

Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of targets. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
ContributorsZhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2023