
Two-Sentence Recognition with a Pulse Train Vocoder

Description

When listeners hear two sentences presented simultaneously, they are better able to discriminate between the speakers when the speakers differ in fundamental frequency (F0). This paper explores the use of a pulse train vocoder to simulate cochlear implant listening. A pulse train vocoder, rather than a noise or tonal vocoder, was used so that the F0 of speech would be well represented. The results of this experiment showed that listeners are able to use F0 information to aid in speaker segregation. As expected, recognition performance was poorest when there was no F0 difference between speakers, and performance improved as the F0 difference increased. The types of errors that listeners made were also analyzed. When listeners misidentified a word from the target sentence, the response was usually (~60%) a word uttered in the competing sentence.
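To make the idea concrete, here is a minimal sketch of a pulse train vocoder channel bank in NumPy. It assumes a fixed F0 per talker, crude FFT-masking band-pass filters, a rectify-and-smooth envelope, and hypothetical channel edges; none of these choices are taken from the thesis. Each band keeps its envelope, but its fine structure is replaced by a pulse train at F0, which is why F0 stays available to the listener.

```python
import numpy as np

def pulse_train(f0, fs, n):
    """Unit impulses every fs/f0 samples: a carrier whose periodicity encodes F0."""
    out = np.zeros(n)
    idx = np.round(np.arange(0, n, fs / f0)).astype(int)
    out[idx[idx < n]] = 1.0
    return out

def bandpass_fft(x, fs, lo, hi):
    """Crude zero-phase band-pass: zero every FFT bin outside [lo, hi)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff_hz=50.0):
    """Half-wave rectify, then smooth with a moving average about 1/cutoff_hz seconds long."""
    win = max(1, int(fs / cutoff_hz))
    return np.convolve(np.maximum(x, 0.0), np.ones(win) / win, mode="same")

def pulse_vocoder(x, fs, f0, edges):
    """Replace each band's fine structure with an F0 pulse train modulated by that band's envelope."""
    carrier = pulse_train(f0, fs, len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi), fs)
        out += bandpass_fft(env * carrier, fs, lo, hi)  # keep the modulated carrier in its own band
    return out
```

For a two-talker mixture, one could vocode each sentence with its own (hypothetical) F0 and sum the outputs, so the F0 difference between talkers becomes the manipulated variable.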

Contributors: Stanley, Nicole Ernestine (Author) / Yost, William (Thesis director) / Dorman, Michael (Committee member) / Liss, Julie (Committee member) / Barrett, The Honors College (Contributor) / Department of Speech and Hearing Science (Contributor) / Hugh Downs School of Human Communication (Contributor)

Created: 2013-05

A computational model of the relationship between speech intelligibility and speech acoustics

Description

Speech intelligibility measures how well a listener can understand a speaker. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset spanning a range of severity levels. Multiple regression analysis shows that the developed measures can reliably predict perceptual ratings. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes were detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in listeners' phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors.
Moreover, a significant correlation was achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal.
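The regression-plus-cross-validation logic can be sketched as follows. This is an illustrative assumption, not the dissertation's model: the columns of `X` stand in for acoustic measures (articulation, prosody, vocal quality), `y` for perceptual ratings or error changes, the fit is ordinary least squares, and the evaluation is k-fold cross-validation scored by the Pearson correlation between held-out predictions and observed values.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def cv_correlation(X, y, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, predict the held-out fold, then correlate all predictions with y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred[test_idx] = predict(fit_ols(X[train_idx], y[train_idx]), X[test_idx])
    return float(np.corrcoef(pred, y)[0, 1])
```

Because every prediction comes from a model that never saw that sample, a high correlation here is evidence of generalization rather than fit to the training data. With speaker-level data one would likely split folds by speaker instead of by sample.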

Contributors: Jiao, Yishan (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Thesis advisor) / Zhou, Yi (Committee member) / Arizona State University (Publisher)

Created: 2019

Repeatability and Accuracy of a Widely-Available Voice-Based Stress Analysis Tool

Description

Stress, depression, and anxiety are prevalent mental health issues that affect individuals worldwide. As the search for effective solutions continues, advances in technology have led to the development of digital tools for stress identification and management. The Cigna StressWaves Test (CSWT) is a publicly available stress analysis toolkit that claims to use “clinical-grade” artificial intelligence (AI) technology to evaluate individual stress levels through speech. To investigate this claim, this research was conducted as an independent validation study involving 60 participants over the age of 18. The primary objective of the study is to assess the repeatability and efficacy of the CSWT as a stress measurement tool. Key results indicate that the CSWT lacks test-retest reliability and convergent validity. This implies that the CSWT is not a repeatable tool and does not produce stress outcomes similar to those of an established measure of stress, the Perceived Stress Scale (PSS). These findings cast doubt on the accuracy and effectiveness of the CSWT as a stress assessment tool. The public availability of the CSWT and the claim that it is a “clinical-grade” tool highlight concerns regarding premature deployment of digital health tools for stress management.
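Test-retest reliability and convergent validity are usually quantified with an intraclass correlation and a correlation against the reference scale. The abstract does not say which statistics were used, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) for two CSWT sessions and Pearson's r between CSWT and PSS scores. Note that ICC(2,1) penalizes a systematic shift between sessions, while Pearson's r does not.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    scores: array of shape (n_subjects, k_sessions)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subject variance
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-session variance
    resid = scores - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))          # residual variance
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def pearson_r(a, b):
    """Pearson correlation, e.g. CSWT scores vs. PSS scores for convergent validity."""
    return float(np.corrcoef(a, b)[0, 1])
```

For example, `icc_2_1(np.column_stack([session1, session2]))` near 1 would indicate a repeatable tool, and `pearson_r(cswt, pss)` near 0 would indicate poor convergent validity.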

Contributors: Yawer, Batul (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Luo, Xin (Committee member) / Arizona State University (Publisher)

Created: 2023

Contributing Factors to Orofacial Somatosensory Sensitivity

Description

The brain uses the somatosensory system to interact with the environment and control movements. Additionally, many movement disorders are associated with deficits in the somatosensory system. Thus, understanding the somatosensory system is essential for developing treatments for movement disorders. Previous studies have extensively examined the role of the somatosensory system in controlling the lower and upper extremities; however, little is known about the contributions of the orofacial somatosensory system. The overall goal of this study was to determine factors that influence the sensitivity of the orofacial somatosensory system. To measure the somatosensory system's sensitivity, transcutaneous electrical current stimulation was applied to the skin overlying the trigeminal nerve on the lower portion of the face. Participants' sensitivity was quantified as their perceptual threshold, that is, the lowest stimulation level at which they detected the electrical stimuli. The data analysis focused on the impact of (1) stimulation parameters, (2) electrode placement, and (3) motor tasks on the perceptual threshold. The results showed that, as expected, stimulation parameters (such as stimulation frequency and duration) influenced perceptual thresholds. However, electrode placement (left vs. right side of the face) and motor tasks (lip contraction vs. rest) did not influence perceptual thresholds. Overall, these findings have important implications for designing and developing therapeutic neuromodulation techniques based on trigeminal nerve stimulation.
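The abstract does not state how the perceptual threshold was measured. One common psychophysical approach is an adaptive 1-up/1-down staircase, sketched below under that assumption: the stimulation level drops after each detection and rises after each miss, and the threshold is estimated as the mean level at the reversal points. The `detects` callback, the starting level, and the step size (e.g. in mA) are hypothetical.

```python
from statistics import mean

def staircase_threshold(detects, start, step, n_reversals=8, max_trials=500):
    """1-up/1-down staircase: lower the level after a detection, raise it after a miss.
    Returns the mean level at the reversals, ignoring the first two."""
    level, last_dir, reversals = start, 0, []
    for _ in range(max_trials):
        direction = -1 if detects(level) else 1
        if last_dir and direction != last_dir:
            reversals.append(level)  # the response pattern flipped at this level
            if len(reversals) == n_reversals:
                break
        last_dir = direction
        level = max(level + direction * step, 0.0)
    if len(reversals) <= 2:
        raise RuntimeError("not enough reversals; increase max_trials or step")
    return mean(reversals[2:])
```

A 1-up/1-down rule converges on the level detected about half the time; other rules (e.g. 2-down/1-up) target different points on the psychometric function.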

Contributors: Khoury, Maya Elie (Author) / Daliri, Ayoub (Thesis advisor) / Patten, Jake (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)

Created: 2022

She Said I Love You: The Effect of StartReact and Startle Adjuvant Rehabilitation Therapy in Post-Stroke Aphasia, Apraxia, and Dysarthria of Speech

Description

Stroke is the leading cause of long-term disability in the U.S., with up to 60% of strokes causing speech loss. Individuals with severe stroke, who require the most frequent and intense speech therapy, often cannot adhere to treatment because of high cost and low success rates. Therefore, making functionally significant changes in individuals with severe post-stroke aphasia remains a key challenge for the rehabilitation community. This dissertation aimed to evaluate the efficacy of Startle Adjuvant Rehabilitation Therapy (START), a tele-enabled, low-cost treatment, in improving quality of life and speech in individuals with moderate-to-severe stroke. START consists of exposure to startling acoustic stimuli during practice of motor tasks in individuals with stroke. In severely impaired post-stroke reaching, START increases the speed and intensity of practice, eliciting muscle activity 2-3 times higher than maximum voluntary contraction. Voluntary reaching distance, onset, and final accuracy improved after a session of START, suggesting a rehabilitative effect. However, START had not been evaluated in impaired speech. The objective of this study was to determine whether impaired speech can be elicited by startling acoustic stimuli, and whether three days of START training can enhance clinical measures of moderate-to-severe post-stroke aphasia and apraxia of speech. This dissertation evaluates START in 42 individuals with post-stroke speech impairment via telehealth in a Phase 0 clinical trial. Results suggest that impaired speech can be elicited by startling acoustic stimuli and that START benefits individuals with moderate-to-severe post-stroke impairments in both linguistic and motor speech domains. This fills an important gap in aphasia care, as many speech therapies remain ineffective and financially inaccessible for patients with severe deficits.
START is effective, remotely delivered, and may serve as an affordable adjuvant to traditional therapy for those who have poor access to quality care.
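A statement such as "muscle activity 2-3 times higher than maximum voluntary contraction" expresses trial EMG as a multiple of MVC EMG. The sketch below shows one plausible way to compute that ratio; it assumes the peak of a moving-window RMS (50 ms by default) as the amplitude measure, which is an illustrative choice rather than the procedure used in the cited work.

```python
import numpy as np

def moving_rms(x, win):
    """RMS in a sliding window of `win` samples after removing the DC offset."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="valid"))

def mvc_multiple(trial, mvc, fs, window_s=0.05):
    """Peak moving-RMS EMG in a trial divided by the peak moving-RMS EMG during an MVC."""
    win = max(1, int(round(window_s * fs)))
    return float(moving_rms(trial, win).max() / moving_rms(mvc, win).max())
```

A returned value of 2.5 would correspond to activity 2.5 times the MVC level.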

Contributors: Swann, Zoe Elisabeth (Author) / Honeycutt, Claire F (Thesis advisor) / Daliri, Ayoub (Committee member) / Rogalsky, Corianne (Committee member) / Liss, Julie (Committee member) / Schaefer, Sydney (Committee member) / Arizona State University (Publisher)

Created: 2022

Robust Experimental Design for Speech Analysis Applications

Description

In many biological research studies, including speech analysis, clinical research, and prediction studies, the validity of the study depends on how well the training data set represents the target population. For example, in emotion classification from speech, the performance of the classifier depends mainly on the size and quality of the training data set. With small sample sizes and unbalanced data, classifiers developed in this context may learn differences in the training data set other than emotion (e.g., gender, age, and dialect).

This thesis evaluates several sampling methods, together with a non-parametric approach to determining the sample sizes required to minimize the effect of these nuisance variables on classification performance. The work focused specifically on speech analysis applications, so it used speech features such as Mel-Frequency Cepstral Coefficients (MFCC) and Filter Bank Cepstral Coefficients (FBCC). The non-parametric D_p divergence measure was used to compare sampling schemes (stratified and multistage sampling) and to study the effect of the sentence types included in the sampling set.
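A non-parametric divergence such as D_p can be estimated directly from two feature sets (e.g. MFCC vectors from two sampling schemes) without density estimation. The sketch below assumes the Friedman-Rafsky minimal-spanning-tree estimator in its equal-prior form, D ≈ 1 - C(m+n)/(2mn), where C counts MST edges joining points from different samples; whether the thesis used exactly this estimator is an assumption. The MST is built with a plain O(n²) Prim's algorithm to stay self-contained.

```python
import numpy as np

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph over `points` (shape n x d)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known edge from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges

def dp_divergence(x, y):
    """MST cross-count estimate of the equal-prior D_p divergence: 1 - C (m + n) / (2 m n)."""
    m, n = len(x), len(y)
    labels = np.r_[np.zeros(m), np.ones(n)]
    edges = mst_edges(np.vstack([x, y]))
    cross = sum(labels[i] != labels[j] for i, j in edges)  # edges joining the two samples
    return 1.0 - cross * (m + n) / (2.0 * m * n)
```

Well-separated samples yield few cross edges and a value near 1, while samples from the same distribution yield roughly 2mn/(m+n) cross edges and a value near 0 (it can dip slightly below 0 in finite samples).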

Contributors: Mariajohn, Aaquila (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)

Created: 2020

Examining Speech Production in Children with Cleft Palate with or without Cleft Lip: An Investigation of Characteristics related to Speech Articulation Skills

Description

Children with cleft palate with or without cleft lip (CP+/-L) often demonstrate disordered speech. Clinicians and researchers have a goal for children with CP+/-L to demonstrate typical speech when entering kindergarten; however, this benchmark is not routinely met. There is a large body of previous research examining speech articulation skills in this clinical population; however, there are continued questions regarding the severity of articulation deficits in children with CP+/-L, especially for the age range of children entering school. This dissertation aimed to provide additional information on speech accuracy and speech error usage in children with CP+/-L between the ages of four and seven years. Additionally, it explored individual and treatment characteristics that may influence articulation skills. Finally, it examined the relationship between speech accuracy during a sentence repetition task versus during a single-word naming task.

Children with CP+/-L presented with speech accuracy that differed according to manner of production. Speech accuracy for fricative phonemes was influenced by severity of hypernasality, although age and status of secondary surgery did not influence speech accuracy for fricatives. For place of articulation, children with CP+/-L demonstrated the highest production accuracy for bilabial and velar phonemes, while alveolar and palatal phonemes were produced with lower accuracy. Children with clefting that involved the lip and alveolus demonstrated reduced speech accuracy for alveolar phonemes compared to children with clefts involving the hard and soft palate only.
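Accuracy by place of articulation is essentially a percent-correct score computed within each place category. The sketch below illustrates that bookkeeping using a hypothetical, simplified consonant-to-place map with ARPAbet-style lowercase labels; the postalveolar sounds are grouped as "palatal" here purely for illustration, and the dissertation's phoneme inventory and scoring rules are not reproduced.

```python
from collections import defaultdict

# Illustrative subset only; labels and groupings are assumptions, not the study's coding scheme.
PLACE = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar", "z": "alveolar",
    "sh": "palatal", "ch": "palatal", "jh": "palatal",
    "k": "velar", "g": "velar", "ng": "velar",
}

def accuracy_by_place(pairs):
    """pairs: iterable of (target, produced) consonant labels. Returns percent correct per place."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for target, produced in pairs:
        place = PLACE.get(target)
        if place is None:
            continue  # skip targets outside the illustrative map
        total[place] += 1
        correct[place] += int(produced == target)
    return {place: 100.0 * correct[place] / total[place] for place in total}
```

For example, a child who produces /t/ as [k] (a backing error) lowers the alveolar score without affecting the velar score, which is how place-specific patterns like those above become visible.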

Participants used a variety of speech error types, with developmental/phonological errors, anterior oral cleft speech characteristics, and compensatory errors occurring most frequently across the sample. Several factors impacted the type of speech errors used, including cleft type, severity of hypernasality, and age.

The results from this dissertation project support previous research findings and provide additional information regarding the severity of speech articulation deficits according to manner and place of consonant production and according to different speech error categories. This study adds information on individual and treatment characteristics that influenced speech accuracy and speech error usage.

Contributors: Lien, Kari (Author) / Scherer, Nancy J. (Thesis advisor) / Nett Cordero, Kelly (Committee member) / Liss, Julie (Committee member) / Sitzman, Thomas (Committee member) / Arizona State University (Publisher)

Created: 2020

Specificity of Auditory Modulation during Speech Planning

Description

Previous research has shown that auditory modulation may be affected by pure tone stimuli played prior to the onset of speech production. In this experiment, we examined the specificity of the auditory stimulus by presenting congruent and incongruent speech sounds in addition to a non-speech sound. Electroencephalography (EEG) data were recorded from eleven adult subjects in both speaking (speech planning) and silent reading (no speech planning) conditions. Data were analyzed manually and with custom MATLAB code that combined data sets and calculated auditory modulation (suppression). Results for the P200 component showed that modulation was larger for incongruent stimuli than for congruent stimuli; this was not the case for the N100 component. The pure tone data could not be analyzed because the intensity of this stimulus was substantially lower than that of the speech stimuli. Overall, the results indicate that the P200 component plays a significant role in processing stimuli and determining their relevance, which is consistent with the role of the P200 component in high-level analysis of speech and perceptual processing. This experiment is ongoing, and we hope to obtain data from more subjects to support the current findings.
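The modulation (suppression) calculation can be sketched as a comparison of average ERP amplitudes between conditions. This NumPy version is an illustration under stated assumptions, not the project's MATLAB code: trials are baseline-corrected single-channel arrays of shape `(n_trials, n_samples)`, the component amplitude is the mean within a hypothetical latency window, and modulation is defined as the reading-condition magnitude minus the speaking-condition magnitude. Using magnitudes allows the same sign convention to work for the negative N100 and the positive P200.

```python
import numpy as np

def window_mean(erp, times, t_start, t_end):
    """Mean ERP amplitude between t_start and t_end (seconds, inclusive)."""
    mask = (times >= t_start) & (times <= t_end)
    return float(erp[mask].mean())

def modulation(trials_speak, trials_read, times, window):
    """Suppression = |reading amplitude| - |speaking amplitude| in the given time window.
    Positive values mean a smaller response when speech is being planned."""
    erp_speak = np.mean(trials_speak, axis=0)  # average across trials
    erp_read = np.mean(trials_read, axis=0)
    return abs(window_mean(erp_read, times, *window)) - abs(window_mean(erp_speak, times, *window))
```

Running this separately for congruent, incongruent, and non-speech stimuli, with a window for each component (e.g. an assumed window around 200 ms for the P200), yields the per-condition modulation values that the comparison above relies on.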

Contributors: Taylor, Megan Kathleen (Author) / Daliri, Ayoub (Thesis director) / Liss, Julie (Committee member) / School of Life Sciences (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Using Transcranial Alternating Current Stimulation to Entrain Cortical Oscillations

Description

Transcranial current stimulation (TCS) is a long-established method of modulating neuronal activity in the brain. One type of this stimulation, transcranial alternating current stimulation (tACS), can entrain endogenous oscillations and produce behavioral change. In the present study, we used five stimulation conditions: tACS at three different frequencies (6 Hz, 12 Hz, and 22 Hz), transcranial random noise stimulation (tRNS), and a no-stimulation sham condition. In all stimulation conditions, we recorded electroencephalographic (EEG) data to investigate the link between different frequencies of tACS and their effects on brain oscillations. We recruited 12 healthy participants, each of whom completed 30 trials of the stimulation conditions. In a given trial, we recorded brain activity for 10 seconds, stimulated for 12 seconds, and recorded an additional 10 seconds of brain activity. The difference between the average oscillation power before and after stimulation indicated the change in oscillation amplitude due to that condition. Our results showed that the stimulation conditions entrained brain activity in a subgroup of participants.
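The pre/post comparison can be sketched as a band-power ratio around the stimulation frequency. The abstract does not specify the power estimate, so this assumes a simple periodogram averaged over a ±1 Hz band (a hypothetical half-width) for single-channel 10-second segments, with change reported as (post - pre) / pre.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Mean periodogram power of x between f_lo and f_hi (Hz)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spec[mask].mean())

def power_change(pre, post, fs, f_stim, half_width=1.0):
    """Relative change in power near the stimulation frequency: (post - pre) / pre."""
    p_pre = band_power(pre, fs, f_stim - half_width, f_stim + half_width)
    p_post = band_power(post, fs, f_stim - half_width, f_stim + half_width)
    return (p_post - p_pre) / p_pre
```

Doubling the amplitude of a 12 Hz oscillation quadruples its power, so `power_change` would return roughly 3 in that case; averaging these values over trials and comparing them against the sham condition is one way to test for entrainment.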

Contributors: Chernicky, Jacob Garrett (Author) / Daliri, Ayoub (Thesis director) / Liss, Julie (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Development of a MATLAB Software to Localize the ‘Hotspot’ for TMS Studies

Description

Transcranial magnetic stimulation (TMS) is a non-invasive brain stimulation technique used in a variety of research settings, including speech neuroscience studies. However, one of the difficulties in using TMS for speech studies is the time that it takes to localize the lip motor cortex representation on the scalp. For my project, I used MATLAB to create a software package that facilitates the localization of the ‘hotspot’ for TMS studies in a systematic, reliable manner. The software sends TMS pulses at certain locations, collects electromyography (EMG) data, and extracts motor-evoked potentials (MEPs) to help users visualize the resulting muscle activation. In this way, users can systematically find the subject’s hotspot for TMS stimulation of the motor cortex. The hotspot detection software was found to be an effective and efficient improvement on previous localization methods.
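The core of a hotspot search is to measure MEP size at each candidate site and choose the site with the largest average response. The project's software was written in MATLAB; the NumPy sketch below only illustrates that logic under stated assumptions: each EMG sweep is aligned so the TMS pulse falls at sample `pulse_idx`, MEP size is the peak-to-peak amplitude in a post-pulse window (10-50 ms here, a hypothetical choice that should be set for the target lip muscle), and site names are arbitrary keys.

```python
import numpy as np

def mep_amplitude(emg, fs, pulse_idx, win_ms=(10, 50)):
    """Peak-to-peak EMG amplitude in a post-pulse window (window bounds are assumptions)."""
    start = pulse_idx + int(win_ms[0] * fs / 1000)
    stop = pulse_idx + int(win_ms[1] * fs / 1000)
    seg = np.asarray(emg, dtype=float)[start:stop]
    return float(seg.max() - seg.min())

def find_hotspot(sweeps_by_site, fs, pulse_idx):
    """sweeps_by_site: dict mapping site -> list of EMG sweeps.
    Returns (site with the largest mean MEP, dict of mean MEP per site)."""
    means = {
        site: float(np.mean([mep_amplitude(s, fs, pulse_idx) for s in sweeps]))
        for site, sweeps in sweeps_by_site.items()
    }
    return max(means, key=means.get), means
```

Averaging several pulses per site reduces the influence of trial-to-trial MEP variability, which matters because single MEPs can vary considerably.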

Contributors: Kshatriya, Nyah (Author) / Daliri, Ayoub (Thesis director) / Liss, Julie (Committee member) / Barrett, The Honors College (Contributor) / Business (Minor) (Contributor)

Created: 2022-05