Matching Items (8)
Description
Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user's activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because of the subjective nature of such information, and it therefore requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures, so features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves recognition performance.
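As a rough sketch of how such supra-segmental features can be derived (a minimal illustration, not the dissertation's exact pipeline), frame-level descriptors may be quantized into a codebook of "audio words" and a topic model fit over the per-utterance word counts; the codebook size, topic count, and synthetic MFCC-like data below are all illustrative assumptions:

```python
# Minimal sketch: latent-topic features from low-level acoustic descriptors.
# Synthetic 13-dim "MFCC" frames stand in for real descriptors; codebook
# size (64) and topic count (8) are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
utterances = [rng.normal(size=(rng.integers(80, 200), 13))
              for _ in range(20)]                    # frames x descriptor dims

# 1. Quantize all frames into a codebook of "audio words".
codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(utterances))

# 2. Represent each utterance as a bag-of-audio-words count vector.
counts = np.array([np.bincount(codebook.predict(u), minlength=64)
                   for u in utterances])

# 3. Fit a latent topic model; per-utterance topic posteriors serve as
#    supra-segmental features for a downstream emotion classifier.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
features = lda.fit_transform(counts)                 # (n_utterances, 8)
print(features.shape)
```

Whatever the exact model, the appeal of topic posteriors is that they give a fixed-length utterance representation regardless of duration, which is what lets a standard classifier operate above the frame level.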

Besides affecting the acoustic properties of speech, emotions strongly influence speech articulation kinematics. A learning approach is proposed here that constrains a classifier trained on acoustic descriptors to also model articulatory data. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale articulatory data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.
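One plausible reading of this constraint (an assumption on my part, not necessarily the dissertation's formulation) is a shared encoder with an auxiliary articulatory-reconstruction loss that is active only during training; the layer sizes, loss weight `alpha`, and random data below are illustrative:

```python
# Hedged sketch: a shared encoder over acoustic features feeds (a) an
# emotion classifier and (b) a regressor that must reconstruct
# articulatory trajectories. The articulatory head only shapes training;
# test-time prediction needs acoustics alone.
import torch
import torch.nn as nn

n_acoustic, n_articulatory, n_emotions = 40, 14, 4
encoder = nn.Sequential(nn.Linear(n_acoustic, 64), nn.ReLU())
classify = nn.Linear(64, n_emotions)          # emotion head
reconstruct = nn.Linear(64, n_articulatory)   # articulatory head (train only)

opt = torch.optim.Adam(list(encoder.parameters())
                       + list(classify.parameters())
                       + list(reconstruct.parameters()), lr=1e-3)
ce, mse, alpha = nn.CrossEntropyLoss(), nn.MSELoss(), 0.5

x = torch.randn(32, n_acoustic)              # acoustic descriptors
a = torch.randn(32, n_articulatory)          # articulatory data (training only)
y = torch.randint(0, n_emotions, (32,))      # emotion labels

h = encoder(x)
loss = ce(classify(h), y) + alpha * mse(reconstruct(h), a)
opt.zero_grad(); loss.backward(); opt.step()

# Test time: articulatory data is not required.
pred = classify(encoder(x)).argmax(dim=1)
```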

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. Its performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.
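As an illustration of the segmentation stage such a framework needs (a toy stand-in, not the framework's actual method), a long recording can be split wherever the windowed log-energy jumps; the window length and threshold are illustrative assumptions:

```python
# Toy segmentation of a long recording: mark a boundary wherever the
# log-energy of fixed windows jumps by more than jump_db decibels.
import numpy as np

def segment_boundaries(signal, sr, win_s=1.0, jump_db=6.0):
    """Return sample indices where windowed log-energy jumps by > jump_db."""
    win = int(win_s * sr)
    n = len(signal) // win
    frames = signal[: n * win].reshape(n, win)
    log_e = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    jumps = np.flatnonzero(np.abs(np.diff(log_e)) > jump_db) + 1
    return jumps * win

sr = 16_000
quiet = 0.01 * np.random.randn(10 * sr)   # 10 s of low-level noise
loud = 0.5 * np.random.randn(10 * sr)     # 10 s of a louder "event"
print(segment_boundaries(np.concatenate([quiet, loud]), sr))  # ~[160000]
```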

The proposed methods are also assessed in terms of computational and implementation complexity. Software and field-programmable gate array (FPGA) implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time operation and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Children's speech and language development is measured by performance on standardized articulation tests. Test items on these assessments, however, vary in length and complexity. Word complexity was compared across five articulation tests: the Assessment of Phonological Patterns-Revised (APP-R), the Bankson-Bernthal Test of Phonology (BBTOP), the Clinical Assessment of Articulation and Phonology (CAAP), the Goldman-Fristoe Test of Articulation (GFTA), and the Assessment of Children's Articulation and Phonology (ACAP). Four word-complexity groups were formed along two dimensions: monosyllabic vs. multisyllabic words, and words with consonant clusters vs. words without. The measure of phonological mean length of utterance (pMLU; Ingram, 2001) was used to assess overall word complexity. The tests were found to vary in number of test items and word complexity, with the BBTOP and the CAAP showing the most similarity to word complexity in the spontaneous speech of young children; the APP-R, by contrast, used the most complex words and showed the least similarity. Additionally, case studies were analyzed for three of the tests to examine the effect of word complexity on consonant correctness, as captured by the measures of Percentage of Consonants Correct (PCC) and Proportion of Whole-Word Proximity (PWP). Word complexity was found to affect consonant correctness, and therefore test performance.
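For concreteness, here is a hedged sketch of the whole-word measures named above, following Ingram's commonly described scoring (one point per produced segment, one bonus point per correct consonant; PWP is the production's pMLU divided by the target's). The one-character-per-segment transcriptions and same-length alignment are simplifying assumptions:

```python
# Toy whole-word scoring: pMLU, PWP, and PCC. Transcriptions use one
# character per segment; real scoring aligns segments phonetically, so
# same-length strings are used here to sidestep alignment.
VOWELS = set("aeiou")

def pmlu(production, target):
    segment_points = len(production)          # 1 point per produced segment
    correct_consonants = sum(p == t for p, t in zip(production, target)
                             if t not in VOWELS)
    return segment_points + correct_consonants

def pwp(production, target):
    # Target pMLU: every consonant counts as correct by definition.
    target_pmlu = len(target) + sum(c not in VOWELS for c in target)
    return pmlu(production, target) / target_pmlu

def pcc(production, target):
    pairs = [(p, t) for p, t in zip(production, target) if t not in VOWELS]
    return 100 * sum(p == t for p, t in pairs) / len(pairs)

# Child says "tat" for target "cat" /kat/ (velar fronting):
# pMLU = 3 segments + 1 correct consonant = 4; target pMLU = 5.
print(pwp("tat", "kat"), pcc("tat", "kat"))   # 0.8 50.0
```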
Contributors: Sullivan, Katherine Elizabeth (Author) / Ingram, David (Thesis director) / Bacon, Cathy (Committee member) / Brown, Jean (Committee member) / Barrett, The Honors College (Contributor) / Department of Speech and Hearing Science (Contributor) / T. Denny Sanford School of Social and Family Dynamics (Contributor)
Created: 2013-05
Description
This paper analyzes the British people's attitudes towards the French people both before and after the French Revolution, and looks at how the French émigrés shaped those attitudes. To analyze the opinions of the British people prior to the French Revolution, travel diaries are used; these diaries identify the stereotypes the British attached to the French. The French Revolution prompted the immigration of French people to England, and this immigration led to a change in how the French were treated. Kirsty Carpenter was a pioneer in researching the role émigrés played in changing British attitudes towards the French. For the Revolutionary period, a variety of sources are used to examine what the British thought of the émigrés: the memoirs of Frances Burney and the Comtesse de Boigne; articles and reports found in newspapers such as The Observer; and editorial and political writings by Henry Dundas and Edmund Burke. Analysis of these sources shows that British attitudes towards the French people shifted with the arrival of French émigrés during the Revolution. Prior to the Revolution, many British people thought of the French as foolish, vain, and lazy; the emigrants, however, elicited a sympathetic response. The change in attitude can be explained by the dire circumstances of the emigrants, the violent nature of the Revolution, and the increased contact between the French and British people.
Contributors: Norris, Katie Desiree (Author) / Thompson, Victoria (Thesis director) / Hopkins, Richard (Committee member) / Bruhn, Karen (Committee member) / School of Historical, Philosophical and Religious Studies (Contributor) / Barrett, The Honors College (Contributor)
Created: 2013-05
Description
Clarinet articulation is a process that uses the tongue to interrupt sound production, either by contacting the reed or by disrupting the air stream. The process occurs inside the mouth, hidden from direct view; as a result, clarinet pedagogy has developed various solutions to the problem of teaching without visual feedback. Clarinet pedagogy literature consists of language that makes it possible for clarinetists to discuss, teach, and research various aspects of clarinet playing. Applying theoretical concepts from linguistics, and examining how they map onto the language of clarinet pedagogy, offers a new perspective for understanding the teaching methods used for articulation. To provide insight into the relationship between language and clarinet pedagogy, an overview of several linguistic theories and concepts is presented, including Peircean semiotics, metalanguages, discursive strategies, and articulatory phonetics. Additionally, articulation techniques (single, multiple, flutter, and slap articulation) and commonly used teaching strategies are briefly explained. The language used in clarinet pedagogy literature is surveyed through resources by prominent clarinet pedagogues, including the works of John Anderson, Joshua Gardner, Michèle Gingras, Eric C. Hansen, Howard Klug, Phillip Rehfeldt, Thomas Ridenour, Heather Roche, Robert Spring, and Rachel Yoder. Pedagogical insights from this linguistic analysis are used to create resources for teaching and/or correcting articulation. Since the interdisciplinary application of linguistics to clarinet pedagogy is an underexplored topic, this research also aims to serve as a basis for further interdisciplinary studies.
Contributors: de Alba, Francisco Javier (Author) / Spring, Robert (Thesis advisor) / Gardner, Joshua (Thesis advisor) / Fossum, Dave (Committee member) / Caslor, Jason (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Measures of segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The measures are validated on a dysarthric speech dataset spanning a range of severities, and multiple regression analysis shows that they predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is then investigated to expose the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are caused mainly by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test this hypothesis, within-speaker variations are simulated in different speaking modes, producing significant changes in both the acoustic signals and the listening errors. The regression results support the hypothesis: changes in the articulation-related acoustic features are important in predicting changes in phoneme errors, while changes in both the articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, significant correlation is achieved in a cross-validation experiment, indicating that it is possible to predict intelligibility variations from the acoustic signal.
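A minimal sketch of this style of analysis (with synthetic data; the real work uses articulation-, prosody-, and voice-quality-related measures from dysarthric speech) is multiple regression from acoustic measures to a perceptual rating, checked with a cross-validated correlation:

```python
# Toy version of the regression analysis: predict a perceptual rating
# from acoustic measures, then check how well held-out predictions
# correlate with the true ratings.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                 # acoustic measures per speaker
w = np.array([1.5, -0.8, 0.0, 0.6, 0.0])     # some measures matter, some don't
y = X @ w + 0.3 * rng.normal(size=60)        # perceptual intelligibility rating

model = LinearRegression()
pred = cross_val_predict(model, X, y, cv=5)  # held-out predictions
r = np.corrcoef(y, pred)[0, 1]               # cross-validated correlation
print(f"cross-validated r = {r:.2f}")
```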
Contributors: Jiao, Yishan (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Thesis advisor) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
This project investigates the gleam-glum effect, a well-replicated phonetic emotion association in which words with the [i] vowel sound (as in "gleam") are judged more emotionally positive than words with the [ʌ] vowel sound (as in "glum"). The effect is observed across different modalities and languages and is moderated by mouth movements relevant to word production. This research presents and tests an articulatory explanation for the association in three experiments. Experiment 1 supported the articulatory explanation by comparing recordings of 71 participants completing an emotional recall task and a word read-aloud task: oral movements during positive emotional expressions were more similar to [i] articulation, and those during negative expressions more similar to [ʌ] articulation. Experiment 2 partially supported the explanation with 98 YouTube recordings of natural speech. In Experiment 3, 149 participants judged the emotions expressed by a speaker during [i] and [ʌ] articulation. Contradicting the robust phonetic emotion association, participants more frequently judged the speaker's [ʌ] articulatory movements to be positive emotional expressions and the [i] movements to be negative ones, likely because of visual emotional cues unrelated to oral movements and the order of the word lists the speaker read. Overall, the findings from this project support an articulatory explanation for the gleam-glum effect, which has major implications for language and communication.
Contributors: Yu, Shin-Phing (Author) / Mcbeath, Michael K (Thesis advisor) / Glenberg, Arthur M (Committee member) / Stone, Greg O (Committee member) / Coza, Aurel (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Children with cleft palate with or without cleft lip (CP+/-L) often demonstrate disordered speech. Clinicians and researchers aim for children with CP+/-L to demonstrate typical speech when entering kindergarten; however, this benchmark is not routinely met. Although a large body of previous research has examined speech articulation skills in this clinical population, questions remain regarding the severity of articulation deficits in children with CP+/-L, especially in the age range of children entering school. This dissertation aimed to provide additional information on speech accuracy and speech error usage in children with CP+/-L between the ages of four and seven years, to explore individual and treatment characteristics that may influence articulation skills, and to examine the relationship between speech accuracy during a sentence repetition task and during a single-word naming task.

Children with CP+/-L presented with speech accuracy that differed according to manner of production. Accuracy for fricative phonemes was influenced by severity of hypernasality, although neither age nor secondary surgical status influenced fricative accuracy. For place of articulation, children with CP+/-L demonstrated the strongest accuracy for bilabial and velar phonemes, while alveolar and palatal phonemes were produced with lower accuracy. Children whose clefts involved the lip and alveolus demonstrated reduced accuracy for alveolar phonemes compared to children whose clefts involved only the hard and soft palate.

Participants used a variety of speech error types, with developmental/phonological errors, anterior oral cleft speech characteristics, and compensatory errors occurring most frequently across the sample. Several factors influenced the types of speech errors used, including cleft type, severity of hypernasality, and age.

The results from this dissertation project support previous research findings and provide additional information regarding the severity of speech articulation deficits according to manner and place of consonant production and according to different speech error categories. This study adds information on individual and treatment characteristics that influenced speech accuracy and speech error usage.
Contributors: Lien, Kari (Author) / Scherer, Nancy J. (Thesis advisor) / Nett Cordero, Kelly (Committee member) / Liss, Julie (Committee member) / Sitzman, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
This paper closely examines performance practice regarding articulation in the preludes of Bach's Well-Tempered Clavier, Book I. Recordings by five pianists are studied: Vladimir Feltsman, Glenn Gould, Angela Hewitt, Andras Schiff, and Rosalyn Tureck. The recordings reveal recurring articulation patterns, which are categorized into six techniques: short slurs, long slurs, detached upbeats, accented downbeats, changing articulation, and rolled chords. The author divides the preludes into four groups: preludes with continuous running figuration, lyrical preludes, lyrical preludes with distinct melody and accompaniment, and preludes with non-lyrical themes. Analysis reveals that for each group of preludes, there is a set of principles these pianists follow. Overall, in non-lyrical preludes, the running sixteenth notes are played legato, staccato, or with a short slur followed by staccato, while the slower-moving quarter and eighth notes stay mostly detached or staccato. In lyrical preludes, the melody remains largely legato. Articulation techniques are used more extensively in non-lyrical preludes than in lyrical ones, and more often in the slower-moving eighth notes than in running figuration. They often serve as a means of embellishment, enhancing the individual character of each piece and generating Baroque attributes. Despite the principles observed in the recordings, many isolated performances do not conform to any of them, suggesting that there is no authoritative rule for articulating Bach's works on the piano.
Contributors: Gan, Nan (Author) / Hamilton, Robert (Thesis advisor) / Meir, Baruch (Committee member) / Little, Bliss (Committee member) / Marshall, Kimberly (Committee member) / Arizona State University (Publisher)
Created: 2020