Matching Items (14)

Examining the Equivalence of Traditional vs. Automated Speech Perception Testing in Adult Listeners with Normal Hearing

Description

The purpose of the present study was to determine if an automated speech perception task yields results that are equivalent to a word recognition test used in audiometric evaluations. This was done by testing 51 normal-hearing adults on a traditional word recognition task (NU-6) and an automated Non-Word Detection task. Stimuli for each task were presented in quiet as well as at six signal-to-noise ratios (SNRs) increasing in 3 dB increments (+0 dB, +3 dB, +6 dB, +9 dB, +12 dB, +15 dB). A two one-sided test (TOST) procedure was used to determine the equivalency of the two tests. This approach required the performance scores for both tasks to be arcsine transformed and converted to z-scores in order to calculate the difference in scores across listening conditions. These values were then compared to a predetermined criterion to establish whether equivalency exists. It was expected that the TOST procedure would reveal equivalency between the traditional word recognition task and the automated Non-Word Detection task. The results confirmed that the two tasks differed by no more than two test items in any of the listening conditions. Overall, the results indicate that the automated Non-Word Detection task could be used in addition to, or in place of, traditional word recognition tests. Moreover, an automated test such as the Non-Word Detection task offers benefits beyond those of traditional speech perception measures, including rapid administration, accurate scoring, and supplemental performance data (e.g., error analyses).
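
Below is a minimal Python sketch of the kind of equivalence analysis described above: proportion-correct scores are arcsine transformed and a paired two one-sided test (TOST) checks the difference against an equivalence margin. The margin, helper names, and input values are illustrative assumptions, not the study's code or data, and the study's z-score conversion step is not reproduced.

```python
import numpy as np
from scipy import stats

def arcsine_transform(p):
    """Angular (arcsine) transform of a proportion correct."""
    return 2.0 * np.arcsin(np.sqrt(p))

def tost_paired(x, y, margin, alpha=0.05):
    """Paired TOST: equivalence is supported when both one-sided tests reject,
    i.e., when the mean difference lies within +/- margin."""
    d = np.asarray(x) - np.asarray(y)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)
    t_lower = (d.mean() + margin) / se   # tests H0: mean difference <= -margin
    t_upper = (d.mean() - margin) / se   # tests H0: mean difference >= +margin
    p_lower = stats.t.sf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    p_tost = max(p_lower, p_upper)
    return p_tost, p_tost < alpha

# Illustrative proportion-correct scores for one listening condition (not study data).
nu6 = arcsine_transform(np.array([0.92, 0.88, 0.96, 0.90]))
nonword = arcsine_transform(np.array([0.90, 0.90, 0.94, 0.92]))
p_value, equivalent = tost_paired(nu6, nonword, margin=0.2)
```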

Date Created
  • 2017-05

Cognitive and Auditory Factors for Speech and Music Perception in Elderly Adult Cochlear Implant Users

Description

Working memory and cognitive functions contribute to speech recognition in normal-hearing and hearing-impaired listeners. In this study, auditory and cognitive functions are measured in young normal-hearing adults, elderly normal-hearing adults, and elderly cochlear implant users. The effects of age and hearing on the different measures are investigated, and the correlations between auditory/cognitive functions and speech/music recognition are examined. The results may demonstrate which factors better explain the variable performance across elderly cochlear implant users.

Date Created
  • 2018-05

Audiovisual perception of dysarthric speech in older adults compared to younger adults

Description

Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores whether the influence of visual information depends upon age. Forty adults participated in the study, which measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups, older adults (ages 47 to 68) and young adults (ages 19 to 36), to examine the influence of age. Findings revealed that all participants, regardless of age, improved their ability to recognize dysarthric speech when visual speech was added to the auditory signal. The magnitude of this benefit, however, was greater for older adults than for younger adults. These results inform our understanding of how visual speech information influences understanding of dysarthric speech.
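
As a simple illustration of the intelligibility metric mentioned above, the hypothetical Python sketch below scores percent words correct for a response against a target phrase and computes the audiovisual benefit as the difference between conditions; the phrases and the word-by-word matching rule are assumptions for illustration only.

```python
def percent_words_correct(target, response):
    """Score a response transcript against the target phrase, word by word."""
    target_words = target.lower().split()
    response_words = response.lower().split()
    hits = sum(t == r for t, r in zip(target_words, response_words))
    return 100.0 * hits / len(target_words)

# One listener's scores in auditory-only (A) and audiovisual (AV) conditions.
a_score = percent_words_correct("the cat chased the dog", "the cap chased a dog")
av_score = percent_words_correct("the cat chased the dog", "the cat chased the dog")
av_benefit = av_score - a_score   # positive values indicate a visual benefit
```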

Date Created
  • 2014

Towards a sensorimotor approach to L2 phonological acquisition

Description

Studies in Second Language Acquisition and Neurolinguistics have argued that adult learners, when dealing with certain phonological features of the L2 such as segmental and suprasegmental ones, face problems of articulatory placement (Esling, 2006; Abercrombie, 1967) and somatosensory stimulation (Guenther, Ghosh, & Tourville, 2006; Waldron, 2010). These studies have argued that adult phonological acquisition is a complex matter that needs to be informed by a specialized sensorimotor theory of speech acquisition. They further suggest that traditional pronunciation pedagogy needs to be enhanced by an approach to learning that offers learners fundamental and practical sensorimotor tools to advance the quality of L2 speech acquisition.

This foundational study designs a sensorimotor approach to pronunciation pedagogy and tests its effect on the L2 speech of five adult (late) learners of American English. Throughout an eight-week classroom experiment, participants from different first-language backgrounds received instruction on Articulatory Settings (Honickman, 1964) and the sensorimotor mechanism of speech acquisition (Waldron, 2010; Guenther et al., 2006). In addition, they attended five adapted lessons of the Feldenkrais technique (Feldenkrais, 1972) designed to develop sensorimotor awareness of the vocal apparatus and improve the quality of L2 speech movement. I hypothesize that such sensorimotor learning triggers overall positive changes in the way L2 learners deal with the speech articulators for the L2 and that over time they develop better pronunciation.

After approximately eight hours of intervention, analysis of the results shows participants’ improvement in speech rate, degree of accentedness, and speaking confidence, but mixed changes in word intelligibility and vowel space area. Albeit not statistically significant (p > .05), these results suggest that such a sensorimotor approach to L2 phonological acquisition warrants further consideration and investigation for use in the L2 classroom.

Date Created
  • 2015

Context recognition methods using audio signals for human-machine interaction

Description

Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to the subjective nature of such information and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A latent topic models-based method is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.
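
The sketch below illustrates one plausible realization of a latent topic model over low-level acoustic descriptors, assuming a bag-of-audio-words front end: frame-level descriptors are quantized with k-means, per-utterance histograms are built, and latent Dirichlet allocation topic posteriors serve as supra-segmental features. The descriptor dimensions, codebook size, and topic count are assumptions, not the dissertation's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Stand-in for frame-level low-level descriptors (e.g., MFCC-like vectors), one array per utterance.
utterances = [rng.normal(size=(200, 13)) for _ in range(50)]

# 1. Learn an acoustic codebook over all frames.
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(np.vstack(utterances))

# 2. Represent each utterance as a histogram of audio-word counts.
histograms = np.array([
    np.bincount(codebook.predict(frames), minlength=64) for frames in utterances
])

# 3. Topic posteriors become fixed-length supra-segmental features per utterance.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
features = lda.fit_transform(histograms)   # shape: (n_utterances, n_topics)
```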

Besides affecting the acoustic properties of speech, emotions have a strong influence on speech articulation kinematics. A learning approach is proposed that constrains a classifier trained on acoustic descriptors to also model articulatory data. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. Its performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.
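
As a rough sketch of the segmentation idea, the hypothetical code below splits a long recording into segments wherever short-term log energy jumps sharply; the frame length and threshold are illustrative assumptions, and the dissertation's framework is considerably more complete.

```python
import numpy as np

def segment_by_energy(x, sr, frame_s=1.0, jump_db=10.0):
    """Return segment boundary times (seconds) where frame energy jumps sharply."""
    frame = int(sr * frame_s)
    n_frames = len(x) // frame
    energy_db = np.array([
        10 * np.log10(np.mean(x[i * frame:(i + 1) * frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    jumps = np.where(np.abs(np.diff(energy_db)) > jump_db)[0] + 1
    return jumps * frame_s

# Example on synthetic audio: a quiet first half followed by a louder second half.
sr = 16000
audio = np.concatenate([0.01 * np.random.randn(sr * 5), 0.5 * np.random.randn(sr * 5)])
boundaries = segment_by_energy(audio, sr)   # expected: a boundary near 5.0 s
```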

The proposed methods are also assessed in terms of computation and implementation complexity. Software and field-programmable gate array (FPGA) based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.

Date Created
  • 2015

Model-driven time-varying signal analysis and its application to speech processing

Description

This work examines two main areas in model-based time-varying signal processing, with an emphasis on speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the applicability of the proposed methodologies to clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physically based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels.

A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure and is correlated with speech intelligibility. A clinical tool incorporating the automated VSA is proposed for use by speech-language pathologists in evaluation and treatment. Two exploratory studies are performed using two databases, analyzing mean formant trajectories in healthy speech for a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly due to accurate perception.
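
As a hedged illustration of a VSA computation consistent with the description above, the Python sketch below takes (F1, F2) formant measurements and reports the area of their convex hull; the formant values are illustrative, and the automated method's formant tracking and normalization steps are omitted.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Illustrative (F1, F2) measurements in Hz for several vowel tokens (not real data).
formants = np.array([
    [300, 2300],   # /i/-like token
    [320, 900],    # /u/-like token
    [750, 1200],   # /a/-like token
    [650, 1900],   # /ae/-like token
    [500, 1500],   # interior token; does not affect the hull
])

hull = ConvexHull(formants)
vsa_hz2 = hull.volume   # for 2-D points, ConvexHull.volume is the enclosed area
```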

A theory for analyzing time-varying signal models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and corresponding physical interpretations. The Hilbert transform (HT) and the use of the analytic form of a signal are motivated, and a proof is provided to show that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid in interpretation. A signal demodulation method is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
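
The sketch below shows a standard analytic-signal demodulation of a synthetic AM-FM tone via the Hilbert transform, in the spirit of the analysis described above; the signal parameters are illustrative, and the dissertation's proof, alternative decompositions, and modified EMD algorithm are not reproduced.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic AM-FM tone: 500 Hz carrier with 5 Hz amplitude modulation and a slow frequency sweep.
x = (1 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * (500 * t + 50 * t ** 2))

analytic = hilbert(x)                                     # analytic form of the signal
inst_amplitude = np.abs(analytic)                         # instantaneous amplitude (envelope)
inst_phase = np.unwrap(np.angle(analytic))
inst_frequency = np.diff(inst_phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
```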

Date Created
  • 2016

The role of primary motor cortex in second language word recognition

Description

The activation of the primary motor cortex (M1) is common in speech perception tasks that involve difficult listening conditions. Although the challenge of recognizing and discriminating non-native speech sounds appears to be an instantiation of listening under difficult circumstances, it is still unknown whether M1 recruitment facilitates second language speech perception. The purpose of this study was to investigate the role of M1 associated with speech motor centers in processing acoustic input in the native (L1) and second language (L2), using repetitive transcranial magnetic stimulation (rTMS) to selectively alter neural activity in M1. Thirty-six healthy English/Spanish bilingual subjects participated in the experiment. Performance on a listening word-to-picture matching task was measured before and after real- and sham-rTMS to the M1 area associated with the orbicularis oris (lip muscle). Vowel Space Area (VSA), obtained from recordings of participants reading a passage in the L2 before and after real-rTMS, was calculated to determine its utility as an rTMS aftereffect measure. There was high variability in the aftereffect of the rTMS protocol to the lip muscle among the participants. Approximately 50% of participants showed an inhibitory effect of rTMS, evidenced by smaller motor evoked potential (MEP) areas, whereas the other 50% showed a facilitatory effect, with larger MEPs. This suggests that rTMS has a complex influence on M1 excitability and that relying on grand-average results can obscure important individual differences in rTMS physiological and functional outcomes. Evidence of motor support for word recognition in the L2 was found. Participants showing an inhibitory aftereffect of rTMS on M1 produced slower and less accurate responses in the L2 task, whereas those showing a facilitatory aftereffect of rTMS on M1 produced more accurate responses in the L2. In contrast, no effect of rTMS was found on the L1, where accuracy and speed were very similar after sham- and real-rTMS. The L2 VSA measure was indicative of the aftereffect of rTMS to M1 associated with speech production, supporting its utility as an rTMS aftereffect measure. This result revealed an interesting and novel relation between cerebral motor cortex activation and speech measures.

Date Created
  • 2018

Enhancing the perception of speech indexical properties of Cochlear implants through sensory substitution

Description

Through decades of clinical progress, cochlear implants have brought the world of speech and language to thousands of profoundly deaf patients. However, the technology has many possible areas for improvement, including conveying information about non-linguistic cues, also called the indexical properties of speech. The field of sensory substitution, which provides information from one sense through another, offers a potential avenue to further assist those with cochlear implants, in addition to the promise it holds for those without existing aids. A user study with a vibrotactile device is presented to evaluate the effectiveness of this approach in an auditory gender discrimination task. Additionally, preliminary computational work is included that demonstrates advantages and limitations encountered when expanding the complexity of future implementations.

Date Created
  • 2015

Individual differences in the perceptual learning of degraded speech: implications for cochlear implant aural rehabilitation

Description

In the noise and commotion of daily life, people achieve effective communication partly because spoken messages are replete with redundant information. Listeners exploit available contextual, linguistic, phonemic, and prosodic cues to decipher degraded speech. When other cues are absent or ambiguous, phonemic and prosodic cues are particularly important because they help identify word boundaries, a process known as lexical segmentation. Individuals vary in the degree to which they rely on phonemic or prosodic cues for lexical segmentation in degraded conditions.

Deafened individuals who use a cochlear implant have diminished access to fine frequency information in the speech signal and consequently have difficulty perceiving phonemic and prosodic cues. Auditory training on phonemic elements improves word recognition for some listeners. Little is known, however, about the potential benefits of prosodic training, or about the degree to which individual differences in cue use affect outcomes.

The present study used simulated cochlear implant stimulation to examine the effects of phonemic and prosodic training on lexical segmentation. Participants completed targeted training with either phonemic or prosodic cues, and received passive exposure to the non-targeted cue. Results show that acuity to the targeted cue improved after training. In addition, both targeted attention and passive exposure to prosodic features led to increased use of these cues for lexical segmentation. Individual differences in degree and source of benefit point to the importance of personalizing clinical intervention to increase flexible use of a range of perceptual strategies for understanding speech.
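
For readers unfamiliar with how cochlear implant stimulation is typically simulated, the sketch below implements a generic noise vocoder: the signal is split into a few analysis bands, each band's temporal envelope modulates band-limited noise, and the channels are summed. The channel count, band edges, and filter settings are illustrative assumptions and not the parameters used in this study.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(x, fs, band_edges_hz):
    """Generic noise vocoder: envelope-modulated band-limited noise per channel."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)
        envelope = np.abs(hilbert(band))                        # temporal envelope of the band
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))   # band-limited noise carrier
        out += envelope * carrier
    return out / np.max(np.abs(out))

# Example: 4-channel vocoding of a synthetic vowel-like signal (illustrative band edges).
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
speech_like = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
vocoded = noise_vocode(speech_like, fs, band_edges_hz=[100, 400, 1000, 2400, 6000])
```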

Date Created
  • 2015

The impact of visual input on the ability of bilateral and bimodal cochlear implant users to accurately perceive words and phonemes in experimental phrases

Description

A multitude of individuals across the globe suffer from hearing loss, and that number continues to grow. Cochlear implants, while having limitations, provide electrical input for users, enabling them to "hear" and to interact more fully with their social environment. There has been a clinical shift toward bilateral placement of implants in both ears and toward bimodal placement of a hearing aid in the contralateral ear when residual hearing is present. However, there is potentially more to subsequent speech perception for bilateral and bimodal cochlear implant users than the electric and acoustic input received via these modalities. For normal-hearing listeners, vision plays a role, and Rosenblum (2005) points out that it is a key feature of an integrated perceptual process. Logically, cochlear implant users should also benefit from integrated visual input; the question is how exactly vision provides benefit to bilateral and bimodal users. Eight bilateral and five bimodal participants received randomized experimental phrases previously generated by Liss et al. (1998) in auditory and audiovisual conditions, and they recorded their perception of the input. Data were then analyzed for percent words correct, consonant errors, and lexical boundary error types. Overall, vision was found to improve speech perception for bilateral and bimodal cochlear implant participants. Each group experienced a significant increase in percent words correct when visual input was added. With vision, bilateral participants reduced consonant place errors and demonstrated increased use of the syllabic stress cues used in lexical segmentation. These results suggest that vision might provide perceptual benefits for bilateral cochlear implant users by granting access to place information and by augmenting cues for syllabic stress in the absence of acoustic input. In contrast, vision did not provide the bimodal participants with significantly increased access to place and stress cues; therefore, the exact mechanism by which bimodal implant users improved speech perception with the addition of vision is unknown. These results point to the complexities of audiovisual integration during speech perception and the need for continued research regarding the benefit vision provides to bilateral and bimodal cochlear implant users.

Date Created
  • 2015