Matching Items (6)
Description
Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores whether the influence of visual information depends on age. Forty adults participated in the study, which measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups, older adults (age range 47 to 68) and young adults (age range 19 to 36), to examine the influence of age. Findings revealed that all participants, regardless of age, improved their ability to recognize dysarthric speech when visual speech was added to the auditory signal. The magnitude of this benefit, however, was greater for older adults than for younger adults. These results inform our understanding of how visual speech information influences understanding of dysarthric speech.
Contributors: Fall, Elizabeth (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because of its subjective nature and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures, so features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach is proposed here that constrains a classifier trained over acoustic descriptors to also model articulatory data. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computation and implementation complexity. Software-based and field-programmable gate array (FPGA)-based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Language and music are fundamentally entwined within human culture. The two domains share similar properties including rhythm, acoustic complexity, and hierarchical structure. Although language and music have commonalities, abilities in these two domains have been found to dissociate after brain damage, leaving unanswered questions about their interconnectedness, including whether one domain can support the other when damage occurs. Evidence supporting this possibility exists for speech production. Musical pitch and rhythm are employed in Melodic Intonation Therapy to improve expressive language recovery, but little is known about the effects of music on the recovery of speech perception and receptive language. This research is one of the first to address the effects of music on speech perception. Two groups of participants, an older adult group (n = 24; M = 71.63 yrs) and a younger adult group (n = 50; M = 21.88 yrs), took part in the study. A native female speaker of Standard American English created four different types of stimuli: pseudoword sentences of normal speech, simultaneous music-speech, rhythmic speech, and music-primed speech. The stimuli were presented binaurally, and participants were instructed to repeat what they heard following a 15-second time delay. Results were analyzed using standard parametric techniques. It was found that musical priming of speech, but not simultaneous synchronized music and speech, facilitated speech perception in both the younger adult and older adult groups. This effect may be driven by rhythmic information. The younger adults outperformed the older adults in all conditions. The speech perception task relied heavily on working memory, and there is a known working memory decline associated with aging. Thus, participants completed a working memory task to be used as a covariate in analyses of differences across stimulus types and age groups. Working memory ability was found to correlate with speech perception performance, but the age-related performance differences remained significant once working memory differences were taken into account. These results provide new avenues for facilitating speech perception in stroke patients and shed light upon the underlying mechanisms of Melodic Intonation Therapy for speech production.
Contributors: LaCroix, Arianna (Author) / Rogalsky, Corianne (Thesis advisor) / Gray, Shelley (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Through decades of clinical progress, cochlear implants have brought the world of speech and language to thousands of profoundly deaf patients. However, the technology has many possible areas for improvement, including conveying non-linguistic cues, also called indexical properties of speech. The field of sensory substitution, which presents information from one sense through another, offers a potential avenue to further assist those with cochlear implants, in addition to the promise it holds for those without existing aids. A user study with a vibrotactile device evaluates the effectiveness of this approach in an auditory gender discrimination task. Additionally, preliminary computational work is included that demonstrates advantages and limitations encountered when expanding the complexity of future implementations.
Contributors: Butts, Austin McRae (Author) / Helms Tillery, Stephen (Thesis advisor) / Berisha, Visar (Committee member) / Buneo, Christopher (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
The activation of the primary motor cortex (M1) is common in speech perception tasks that involve difficult listening conditions. Although the challenge of recognizing and discriminating non-native speech sounds appears to be an instance of listening under difficult circumstances, it is still unknown whether M1 recruitment facilitates second-language speech perception. The purpose of this study was to investigate the role of M1 associated with speech motor centers in processing acoustic inputs in the native (L1) and second language (L2), using repetitive Transcranial Magnetic Stimulation (rTMS) to selectively alter neural activity in M1. Thirty-six healthy English/Spanish bilingual subjects participated in the experiment. Performance on a listening word-to-picture matching task was measured before and after real- and sham-rTMS to the M1 area associated with the orbicularis oris (lip muscle). Vowel Space Area (VSA), obtained from recordings of participants reading a passage in L2 before and after real-rTMS, was calculated to determine its utility as an rTMS aftereffect measure. There was high variability among participants in the aftereffect of the rTMS protocol to the lip muscle. Approximately 50% of participants showed an inhibitory effect of rTMS, evidenced by smaller motor evoked potential (MEP) area, whereas the other 50% had a facilitatory effect, with larger MEPs. This suggests that rTMS has a complex influence on M1 excitability, and relying on grand-average results can obscure important individual differences in rTMS physiological and functional outcomes. Evidence of motor support to word recognition in the L2 was found. Participants showing an inhibitory aftereffect of rTMS on M1 produced slower and less accurate responses in the L2 task, whereas those showing a facilitatory aftereffect of rTMS on M1 produced more accurate responses in L2. In contrast, no effect of rTMS was found on the L1, where accuracy and speed were very similar after sham- and real-rTMS. The L2 VSA measure was indicative of the aftereffect of rTMS to M1 associated with speech production, supporting its utility as an rTMS aftereffect measure. This result revealed an interesting and novel relation between cerebral motor cortex activation and speech measures.
Contributors: Barragan, Beatriz (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Rogalsky, Corianne (Committee member) / Restrepo, Adelaida (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
A previous study demonstrated that learning to lift an object is context-based and that, in the presence of both memory and visual cues, the acquired sensorimotor memory to manipulate an object in one context interferes with the performance of the same task in the presence of visual information about a different context (Fu et al., 2012).
The purpose of this study is to determine whether the primary motor cortex (M1) plays a role in sensorimotor memory. It was hypothesized that temporary disruption of M1 after learning to minimize tilt using an ‘L’-shaped object would negatively affect the retention of sensorimotor memory and thus reduce interference between the memory acquired in one context and the visual cues to perform the same task in a different context.
Significant findings were shown in blocks 1, 2, and 4. In block 3, subjects did not display a significant amount of learning. However, it cannot be concluded that there is full interference in block 3. Therefore, three effects were examined in the statistical analysis: the main effect of blocks, the main effect of trials, and the blocks * trials interaction. The block effect has a p-value of 0.001, and the trial effect has a p-value of less than 0.001. Both of these effects indicate that learning is occurring. However, for the blocks * trials effect, the p-value is 0.002 (< 0.05), indicating a significant interaction between sensorimotor memories. Based on these results, interference is present in all blocks, but not enough to justify the use of TMS to reduce interference, because there is already a partial reduction of interference from the control experiment. The time delay between context switches might be the issue. By reducing the time delay between blocks 2 and 3 from 10 minutes to 5 minutes, I hope to see significant learning occur from the first trial to the second trial.
Contributors: Hasan, Salman Bashir (Author) / Santello, Marco (Thesis director) / Kleim, Jeffrey (Committee member) / Helms Tillery, Stephen (Committee member) / Barrett, The Honors College (Contributor) / W. P. Carey School of Business (Contributor) / Harrington Bioengineering Program (Contributor)
Created: 2014-05