This collection includes both ASU Theses and Dissertations, submitted by graduate students, and the Barrett, Honors College theses submitted by undergraduate students. 

Description
Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores whether the influence of visual information depends on age. Forty adults participated in the study, which measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups, older adults (age range 47 to 68) and young adults (age range 19 to 36), to examine the influence of age. Findings revealed that all participants, regardless of age, improved their ability to recognize dysarthric speech when visual speech was added to the auditory signal. The magnitude of this benefit, however, was greater for older adults than for younger adults. These results inform our understanding of how visual speech information influences understanding of dysarthric speech.
Contributors: Fall, Elizabeth (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
With advances in automatic speech recognition, spoken dialogue systems are assuming increasingly social roles. There is a growing need for these systems to be socially responsive, capable of building rapport with users. In human-human interactions, rapport is critical to patient-doctor communication, conflict resolution, educational interactions, and social engagement. Rapport between people promotes successful collaboration, motivation, and task success. Dialogue systems which can build rapport with their user may produce similar effects, personalizing interactions to create better outcomes.

This dissertation focuses on how dialogue systems can build rapport utilizing acoustic-prosodic entrainment. Acoustic-prosodic entrainment occurs when individuals adapt their acoustic-prosodic features of speech, such as tone of voice or loudness, to one another over the course of a conversation. Because entrainment is correlated with liking and task success, a dialogue system that entrains may enhance rapport. Entrainment, however, is very challenging to model. People entrain on different features in many ways, and how to design entrainment to build rapport is unclear. The first goal of this dissertation is to explore how acoustic-prosodic entrainment can be modeled to build rapport.

Towards this goal, this work presents a series of studies comparing, evaluating, and iterating on the design of entrainment, motivated and informed by human-human dialogue. These models of entrainment are implemented in the dialogue system of a robotic learning companion. Learning companions are educational agents that engage students socially to increase motivation and facilitate learning. As a learning companion’s ability to be socially responsive increases, so do vital learning outcomes. A second goal of this dissertation is to explore the effects of entrainment on concrete outcomes such as learning in interactions with robotic learning companions.

This dissertation results in contributions both technical and theoretical. Technical contributions include a robust and modular dialogue system capable of producing prosodic entrainment and other socially-responsive behavior. The system is one of the first of its kind, and the results demonstrate that an entraining, social learning companion can build rapport and increase learning. This dissertation provides support for exploring phenomena like entrainment to enhance factors such as rapport and learning and provides a platform with which to explore these phenomena in future work.
Contributors: Lubold, Nichola Anne (Author) / Walker, Erin (Thesis advisor) / Pon-Barry, Heather (Thesis advisor) / Litman, Diane (Committee member) / VanLehn, Kurt (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Glottal fry is a vocal register characterized by low frequency and increased signal perturbation, and is perceptually identified by its popping, creaky quality. Recently, the use of the glottal fry vocal register has received growing awareness and attention in popular culture and media in the United States. The creaky quality that was originally associated with vocal pathologies is indeed becoming “trendy,” particularly among young women across the United States. But while existing studies have defined, quantified, and attempted to explain the use of glottal fry in conversational speech, there is currently no explanation for the increasing prevalence of the use of glottal fry amongst American women. This thesis proposes that conversational entrainment, a communication phenomenon which describes the propensity to modify one’s behavior to align more closely with one’s communication partner, may provide a theoretical framework to explain the growing trend in the use of glottal fry amongst college-aged women in the United States. Female participants (n = 30) between the ages of 18 and 29 years (M = 20.6, SD = 2.95) had conversations with two conversation partners, one who used quantifiably more glottal fry than the other. The study utilized perceptual and quantifiable acoustic information to address the following key question: Does the amount of habitual glottal fry in a conversational partner influence one’s own use of glottal fry? Results yielded the following two findings: (1) according to perceptual annotations, the participants used a greater amount of glottal fry when speaking with the Fry conversation partner than with the Non Fry partner, and (2) statistically significant differences were found in the acoustics of the participants’ vocal qualities based on conversation partner.
While the current study demonstrates that young women are indeed speaking in glottal fry in everyday conversations, and that its use can be attributed in part to conversational entrainment, we still lack a clear explanation of the deeper motivations for women to speak in a lower vocal register. The current study opens avenues for continued analysis of the sociolinguistic functions of the glottal fry register.
Contributors: Delfino, Christine R (Author) / Liss, Julie M (Thesis advisor) / Borrie, Stephanie A (Thesis advisor) / Azuma, Tamiko (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
The purpose of this study was to identify acoustic markers that correlate with accurate and inaccurate /r/ production in children ages 5-8 using signal processing. In addition, the researcher aimed to identify predictive acoustic markers that relate to changes in /r/ accuracy. A total of 35 children (23 accurate, 12 inaccurate, 8 longitudinal) were recorded. Computerized stimuli were presented on a PC laptop computer, and the children were asked to complete five tasks designed to elicit spontaneous and imitated /r/ production in all positions. Files were edited and analyzed using a filter bank approach centered at 40 frequencies based on the Mel scale. T-tests were used to compare the spectral energy of tokens between the accurate and inaccurate groups, and additional t-tests were used to compare the duration of accurate and inaccurate files. Results included significant differences between the accurate and inaccurate productions of /r/, notable differences in the 24-26 mel bin range, and longer durations for inaccurate /r/ productions than for accurate ones. Signal processing successfully identified acoustic features of accurate and inaccurate production of /r/ and candidate predictive markers that may be associated with acquisition of /r/.
Contributors: Becvar, Brittany Patricia (Author) / Azuma, Tamiko (Thesis advisor) / Weinhold, Juliet (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2017