Matching Items (5)
Filtering by

Clear all filters

152801-Thumbnail Image.png
Description
Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores where the influence of visual information is dependent upon age. Forty adults

Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores where the influence of visual information is dependent upon age. Forty adults participated in the study that measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups: older adults (age range 47 to 68) and young adults (age range 19 to 36) to examine the influence of age. Findings revealed that all participants, regardless of age, improved their ability to recognize dysarthric speech when visual speech was added to the auditory signal. The magnitude of this benefit, however, was greater for older adults when compared with younger adults. These results inform our understanding of how visual speech information influences understanding of dysarthric speech.
ContributorsFall, Elizabeth (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)
Created2014
153419-Thumbnail Image.png
Description
A multitude of individuals across the globe suffer from hearing loss and that number continues to grow. Cochlear implants, while having limitations, provide electrical input for users enabling them to "hear" and more fully interact socially with their environment. There has been a clinical shift to the

A multitude of individuals across the globe suffer from hearing loss and that number continues to grow. Cochlear implants, while having limitations, provide electrical input for users enabling them to "hear" and more fully interact socially with their environment. There has been a clinical shift to the bilateral placement of implants in both ears and to bimodal placement of a hearing aid in the contralateral ear if residual hearing is present. However, there is potentially more to subsequent speech perception for bilateral and bimodal cochlear implant users than the electric and acoustic input being received via these modalities. For normal listeners vision plays a role and Rosenblum (2005) points out it is a key feature of an integrated perceptual process. Logically, cochlear implant users should also benefit from integrated visual input. The question is how exactly does vision provide benefit to bilateral and bimodal users. Eight (8) bilateral and 5 bimodal participants received randomized experimental phrases previously generated by Liss et al. (1998) in auditory and audiovisual conditions. The participants recorded their perception of the input. Data were consequently analyzed for percent words correct, consonant errors, and lexical boundary error types. Overall, vision was found to improve speech perception for bilateral and bimodal cochlear implant participants. Each group experienced a significant increase in percent words correct when visual input was added. With vision bilateral participants reduced consonant place errors and demonstrated increased use of syllabic stress cues used in lexical segmentation. Therefore, results suggest vision might provide perceptual benefits for bilateral cochlear implant users by granting access to place information and by augmenting cues for syllabic stress in the absence of acoustic input. On the other hand vision did not provide the bimodal participants significantly increased access to place and stress cues. Therefore the exact mechanism by which bimodal implant users improved speech perception with the addition of vision is unknown. These results point to the complexities of audiovisual integration during speech perception and the need for continued research regarding the benefit vision provides to bilateral and bimodal cochlear implant users.
ContributorsLudwig, Cimarron (Author) / Liss, Julie (Thesis advisor) / Dorman, Michael (Committee member) / Azuma, Tamiko (Committee member) / Arizona State University (Publisher)
Created2015
153352-Thumbnail Image.png
Description
Language and music are fundamentally entwined within human culture. The two domains share similar properties including rhythm, acoustic complexity, and hierarchical structure. Although language and music have commonalities, abilities in these two domains have been found to dissociate after brain damage, leaving unanswered questions about their interconnectedness, including can one

Language and music are fundamentally entwined within human culture. The two domains share similar properties including rhythm, acoustic complexity, and hierarchical structure. Although language and music have commonalities, abilities in these two domains have been found to dissociate after brain damage, leaving unanswered questions about their interconnectedness, including can one domain support the other when damage occurs? Evidence supporting this question exists for speech production. Musical pitch and rhythm are employed in Melodic Intonation Therapy to improve expressive language recovery, but little is known about the effects of music on the recovery of speech perception and receptive language. This research is one of the first to address the effects of music on speech perception. Two groups of participants, an older adult group (n=24; M = 71.63 yrs) and a younger adult group (n=50; M = 21.88 yrs) took part in the study. A native female speaker of Standard American English created four different types of stimuli including pseudoword sentences of normal speech, simultaneous music-speech, rhythmic speech, and music-primed speech. The stimuli were presented binaurally and participants were instructed to repeat what they heard following a 15 second time delay. Results were analyzed using standard parametric techniques. It was found that musical priming of speech, but not simultaneous synchronized music and speech, facilitated speech perception in both the younger adult and older adult groups. This effect may be driven by rhythmic information. The younger adults outperformed the older adults in all conditions. The speech perception task relied heavily on working memory, and there is a known working memory decline associated with aging. Thus, participants completed a working memory task to be used as a covariate in analyses of differences across stimulus types and age groups. Working memory ability was found to correlate with speech perception performance, but that the age-related performance differences are still significant once working memory differences are taken into account. These results provide new avenues for facilitating speech perception in stroke patients and sheds light upon the underlying mechanisms of Melodic Intonation Therapy for speech production.
ContributorsLaCroix, Arianna (Author) / Rogalsky, Corianne (Thesis advisor) / Gray, Shelley (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created2015
Description
Through decades of clinical progress, cochlear implants have brought the world of speech and language to thousands of profoundly deaf patients. However, the technology has many possible areas for improvement, including providing information of non-linguistic cues, also called indexical properties of speech. The field of sensory substitution, providing information relating

Through decades of clinical progress, cochlear implants have brought the world of speech and language to thousands of profoundly deaf patients. However, the technology has many possible areas for improvement, including providing information of non-linguistic cues, also called indexical properties of speech. The field of sensory substitution, providing information relating one sense to another, offers a potential avenue to further assist those with cochlear implants, in addition to the promise they hold for those without existing aids. A user study with a vibrotactile device is evaluated to exhibit the effectiveness of this approach in an auditory gender discrimination task. Additionally, preliminary computational work is included that demonstrates advantages and limitations encountered when expanding the complexity of future implementations.
ContributorsButts, Austin McRae (Author) / Helms Tillery, Stephen (Thesis advisor) / Berisha, Visar (Committee member) / Buneo, Christopher (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2015
154757-Thumbnail Image.png
Description
Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost

of keyword detection and speech recognition networks (or DNNs) are presented.

The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in the

accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6 fold reduction in memory compared to a floating point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5 fold reduction in memory while maintaining an error rate similar to a floating point implementation. Additional reduction in memory is achieved by a technique called weight pruning,

where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques helps reduce the memory

footprint by 81 - 84% for speech recognition and keyword detection networks respectively.

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces

hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant reduction in the number of computations during classification without

loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory

requirement by ~95% compared to a fully-connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate.
ContributorsArunachalam, Sairam (Author) / Chakrabarti, Chaitali (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2016