Matching Items (9)
Description
This study consisted of several related projects on dynamic spatial hearing by both human and robot listeners. The first experiment investigated the maximum number of sound sources that human listeners could localize at the same time. Speech stimuli were presented simultaneously from different loudspeakers at multiple time intervals. The maximum number of perceived sound sources was close to four. The second experiment asked whether the amplitude modulation of multiple static sound sources could lead to the perception of auditory motion. On the horizontal and vertical planes, four independent noise sound sources with 60° spacing were amplitude modulated with consecutively larger phase delays. At lower modulation rates, motion could be perceived by human listeners in both cases. The third experiment asked whether several sources at static positions could serve as "acoustic landmarks" to improve the localization of other sources. Four continuous speech sound sources were placed on the horizontal plane with 90° spacing and served as the landmarks. The task was to localize a noise that was played for only three seconds while the listener was passively rotated in a chair in the middle of the loudspeaker array. The human listeners were better able to localize the sound sources with landmarks than without. The remaining experiments used an acoustic manikin in an attempt to fuse binaural recordings and motion data to localize sound sources. A dummy head with recording devices was mounted on top of a rotating chair, and motion data were collected. The fourth experiment showed that an Extended Kalman Filter could be used to localize sound sources in a recursive manner. The fifth experiment demonstrated the use of a fitting method for separating multiple sound sources.
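
To illustrate the recursive localization idea mentioned in this abstract, the sketch below shows a minimal single-state Extended Kalman Filter that tracks a source azimuth from noisy ITD-like measurements while the head rotates. The head radius, noise levels, and source position are assumed values for illustration only; they are not taken from the dissertation.

```python
"""Minimal EKF sketch for single-source azimuth tracking (illustrative only)."""
import numpy as np

a_over_c = 0.09 / 343.0          # head radius / speed of sound (s), assumed
R = (20e-6) ** 2                 # measurement noise variance (s^2), assumed
Q = 1e-6                         # process noise (rad^2), source nearly static

def ekf_azimuth(itd_meas, head_angles, theta0=0.0, P0=np.pi**2):
    """Recursively estimate a world-frame source azimuth from ITD-like data."""
    theta, P = theta0, P0
    estimates = []
    for z, head in zip(itd_meas, head_angles):
        # Predict: static source, so the state model is identity plus noise.
        P = P + Q
        # Update: linearize h(theta) = a_over_c * sin(theta - head).
        h = a_over_c * np.sin(theta - head)
        H = a_over_c * np.cos(theta - head)      # dh/dtheta
        S = H * P * H + R
        K = P * H / S
        theta = theta + K * (z - h)
        P = (1.0 - K * H) * P
        estimates.append(theta)
    return np.array(estimates)

# Toy usage: a source at 60 deg while the head rotates through 90 deg.
rng = np.random.default_rng(0)
true_theta = np.deg2rad(60.0)
head = np.linspace(0.0, np.pi / 2, 200)
itd = a_over_c * np.sin(true_theta - head) + rng.normal(0, 20e-6, head.size)
est = ekf_azimuth(itd, head)
print(f"final estimate: {np.rad2deg(est[-1]):.1f} deg")
```
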
Contributors: Zhong, Xuan (Author) / Yost, William (Thesis advisor) / Zhou, Yi (Committee member) / Dorman, Michael (Committee member) / Helms Tillery, Stephen (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Music training is associated with measurable physiologic changes in the auditory pathway. Benefits of music training have also been demonstrated in the areas of working memory, auditory attention, and speech perception in noise. The purpose of this study was to determine whether long-term auditory experience secondary to music training enhances the ability to detect, learn, and recall new words.

Participants consisted of 20 young adult musicians and 20 age-matched non-musicians. In addition to completing word recognition and non-word detection tasks, each participant learned 10 nonsense words in a rapid word-learning task. All tasks were completed in quiet and in multi-talker babble. Next-day retention of the learned words was examined in isolation and in context. Cortical auditory evoked potentials (CAEPs) to vowel stimuli were recorded to obtain latencies and amplitudes for the N1, P2, and P3a components. Performance was compared across groups and listening conditions. Correlations between the behavioral tasks and the CAEPs were also examined.

No differences were found between groups (musicians vs. non-musicians) on any of the behavioral tasks. Nor did the groups differ in CAEP latencies or amplitudes, with the exception of P2 latencies, which were significantly longer in musicians than in non-musicians. Performance was significantly poorer in babble than in quiet on word recognition and non-word detection, but not on word learning, learned-word retention, or learned-word detection. CAEP latencies collapsed across groups were significantly longer and amplitudes were significantly smaller in babble than in quiet. P2 latencies in quiet were positively correlated with word recognition in quiet, while P3a latencies in babble were positively correlated with word recognition and learned-word detection in babble. No other significant correlations were observed between CAEPs and performance on behavioral tasks.

These results indicated that, for young normal-hearing adults, auditory experience resulting from long-term music training did not provide an advantage for learning new information in either favorable (quiet) or unfavorable (babble) listening conditions. Results of the present study suggest that the relationship between music training and the strength of cortical auditory evoked responses may be more complex or too weak to be observed in this population.
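
As a rough illustration of the CAEP analysis described here, the sketch below extracts a P2-like peak latency from an averaged waveform within an assumed search window and correlates the latencies with a behavioral score. The window, sampling rate, and synthetic data are placeholders, not the study's procedures.

```python
"""Sketch: P2-like peak latency extraction and correlation with behavior."""
import numpy as np
from scipy.stats import pearsonr

fs = 1000                                    # sampling rate (Hz), assumed
t = np.arange(-0.1, 0.5, 1 / fs)             # epoch time axis (s)

def peak_latency(avg_waveform, window=(0.150, 0.250)):
    """Latency (s) of the maximum within an assumed P2 search window."""
    mask = (t >= window[0]) & (t <= window[1])
    return t[mask][np.argmax(avg_waveform[mask])]

# Toy data: one averaged waveform and one word-recognition score per listener.
rng = np.random.default_rng(1)
latencies, scores = [], []
for _ in range(20):
    true_lat = rng.uniform(0.170, 0.220)
    wave = np.exp(-((t - true_lat) ** 2) / (2 * 0.02 ** 2))   # P2-like bump
    wave += rng.normal(0, 0.05, t.size)                       # residual noise
    latencies.append(peak_latency(wave))
    scores.append(70 + 100 * true_lat + rng.normal(0, 2))     # toy behavior
r, p = pearsonr(latencies, scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```
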
Contributors: Stewart, Elizabeth (Author) / Pittman, Andrea (Thesis advisor) / Cone, Barbara (Committee member) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Auditory scene analysis (ASA) is the process through which listeners parse and organize their acoustic environment into relevant auditory objects. ASA functions by exploiting natural regularities in the structure of auditory information. The current study investigates spectral envelope and its contribution to the perception of changes in pitch and loudness. Experiment 1 constructs a perceptual continuum of twelve f0- and intensity-matched vowel phonemes (i.e. a pure timbre manipulation) and reveals spectral envelope as a primary organizational dimension. The extremes of this dimension are i (as in “bee”) and Ʌ (“bun”). Experiment 2 measures the strength of the relationship between produced f0 and the previously observed phonetic-pitch continuum at three different levels of phonemic constraint. Scat performances and, to a lesser extent, recorded interviews were found to exhibit changes in accordance with the natural regularity; specifically, f0 changes were correlated with the phoneme pitch-height continuum. The more constrained case of lyrical singing did not exhibit the natural regularity. Experiment 3 investigates participant ratings of pitch and loudness as stimuli vary in f0, intensity, and the phonetic-pitch continuum. Psychophysical functions derived from the results reveal that moving from i to Ʌ is equivalent to a .38 semitone decrease in f0 and a .75 dB decrease in intensity. Experiment 4 examines the potentially functional aspect of the pitch, loudness, and spectral envelope relationship. Detection thresholds of stimuli in which all three dimensions change congruently (f0 increase, intensity increase, Ʌ to i) or incongruently (no f0 change, intensity increase, i to Ʌ) are compared using an objective version of the method of limits. Congruent changes did not provide a detection benefit over incongruent changes; however, when the contribution of phoneme change was removed, congruent changes did offer a slight detection benefit, as in previous research. While this relationship does not offer a detection benefit at threshold, there is a natural regularity for humans to produce phonemes at higher f0s according to their relative position on the pitch height continuum. Likewise, humans have a bias to detect pitch and loudness changes in phoneme sweeps in accordance with the natural regularity.
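
To make the reported equivalences concrete, the short sketch below converts the 0.38-semitone f0 decrease and the 0.75 dB intensity decrease into frequency and amplitude ratios; the 220 Hz reference f0 is an assumed value for illustration.

```python
# Convert the reported i -> Ʌ equivalences into ratios (arithmetic only).
semitone_shift = -0.38            # equivalent f0 change (semitones)
level_shift_db = -0.75            # equivalent intensity change (dB)

freq_ratio = 2 ** (semitone_shift / 12)       # ~0.978, roughly a 2.2% drop in f0
amp_ratio = 10 ** (level_shift_db / 20)       # ~0.917 in amplitude
power_ratio = 10 ** (level_shift_db / 10)     # ~0.841 in power

f0_ref = 220.0                                # Hz, assumed reference
print(f"0.38 st below {f0_ref} Hz = {f0_ref * freq_ratio:.1f} Hz")
print(f"amplitude ratio = {amp_ratio:.3f}, power ratio = {power_ratio:.3f}")
```
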
Contributors: Patten, K. Jakob (Author) / McBeath, Michael K (Thesis advisor) / Amazeen, Eric L (Committee member) / Glenberg, Arthur W (Committee member) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Spatial awareness (i.e., the sense of the space that we are in) involves the integration of auditory, visual, vestibular, and proprioceptive sensory information about environmental events. Hearing impairment has negative effects on spatial awareness and can result in deficits in communication and in the overall aesthetic experience of life, especially in noisy or reverberant environments. This deficit occurs because hearing impairment reduces the signal strength needed for auditory spatial processing and changes how auditory information is combined with other sensory inputs (e.g., vision). The influence of multisensory processing on spatial awareness in listeners with normal and impaired hearing is not assessed in clinical evaluations, and patients’ everyday sensory experiences are currently not directly measurable. This dissertation investigated the role of vision in auditory localization in listeners with normal and impaired hearing in a naturalistic stimulus setting, using natural gaze orienting responses. Experiments examined two behavioral outcomes, response accuracy and response time, based on eye movements in response to simultaneously presented auditory and visual stimuli. The first set of experiments examined the effects of stimulus spatial saliency on response accuracy and response time and the extent of visual dominance in both metrics during auditory localization. The results indicate that vision can significantly influence both the speed and accuracy of auditory localization, especially when auditory stimuli are more ambiguous. The influence of vision was shown for both normal-hearing and hearing-impaired listeners. The second set of experiments examined the effect of frontal visual stimulation on localizing an auditory target presented from in front of or behind a listener. The results show domain-specific effects of visual capture on both response time and response accuracy. These results support previous findings that auditory-visual interactions are not limited by the spatial rule of proximity. They further suggest a strong influence of vision on both the processing and the decision-making stages of sound source localization for listeners with normal and impaired hearing.
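
The two behavioral outcomes described in this abstract can be pictured with the minimal sketch below, which derives a response time (first crossing of a gaze-velocity threshold) and a localization error (final gaze angle relative to the target) from a single gaze trace. The sampling rate, threshold, and synthetic trace are assumptions, not the dissertation's analysis pipeline.

```python
"""Sketch: response time and localization accuracy from one gaze trace."""
import numpy as np

fs = 500.0                       # eye-tracker sampling rate (Hz), assumed
vel_threshold = 30.0             # deg/s orienting-onset criterion, assumed

def gaze_metrics(gaze_deg, target_deg):
    """Return (response_time_s, abs_final_error_deg) for one trial."""
    velocity = np.abs(np.gradient(gaze_deg) * fs)        # deg/s
    moving = np.flatnonzero(velocity > vel_threshold)
    rt = moving[0] / fs if moving.size else np.nan
    final_error = abs(np.mean(gaze_deg[-int(0.1 * fs):]) - target_deg)
    return rt, final_error

# Toy trial: gaze starts at 0 deg and orients toward a 45 deg target near 300 ms.
t = np.arange(0, 1.0, 1 / fs)
gaze = 45.0 / (1 + np.exp(-(t - 0.3) / 0.02))            # smooth orienting movement
rt, err = gaze_metrics(gaze, target_deg=45.0)
print(f"response time = {rt * 1000:.0f} ms, final error = {err:.1f} deg")
```
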
Contributors: Clayton, Colton (Author) / Zhou, Yi (Thesis advisor) / Azuma, Tamiko (Committee member) / Daliri, Ayoub (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Multisensory integration is the process by which information from different sensory modalities is integrated by the nervous system. This process is important not only from a basic science perspective but also for translational reasons, e.g., for the development of closed-loop neural prosthetic systems. A mixed virtual reality platform was developed to study the neural mechanisms of multisensory integration for the upper limb during motor planning. The platform allows for selection of different arms and manipulation of the locations of physical and virtual target cues in the environment. The system was tested with two non-human primates (NHPs) trained to reach to multiple virtual targets. Arm kinematic data as well as neural spiking data from primary motor cortex (M1) and dorsal premotor cortex (PMd) were collected. The task involved manipulating visual information about initial arm position by rendering the virtual avatar arm either in its actual position (veridical (V) condition) or in a shifted position, with small or large shifts (perturbed (P) condition), prior to movement. Tactile feedback was modulated in blocks by placing or removing the physical start cue on the table (tactile (T) and no-tactile (NT) conditions, respectively). Behaviorally, errors in initial movement direction were larger when the physical start cue was absent. Slightly larger directional errors were found in the P condition compared to the V condition for some movement directions. Both effects were consistent with the idea that erroneous or reduced information about initial hand location led to movement direction-dependent reach planning errors. Neural correlates of these behavioral effects were probed using population decoding techniques. For small shifts in the visual position of the arm, no differences in decoding accuracy between the T and NT conditions were observed in either M1 or PMd. However, for larger visual shifts, decoding accuracy decreased in the NT condition, but only in PMd. Thus, activity in PMd, but not M1, may reflect the uncertainty in reach planning that results when sensory cues regarding initial hand position are erroneous or absent.
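
A rough sketch of the population decoding step is given below: spike counts from a simulated, cosine-tuned pseudo-population are classified by reach target with cross-validation, and decoding accuracy is compared between a low-noise and a high-noise condition standing in for the presence or absence of the tactile cue. Unit counts, tuning model, and classifier choice are assumptions for illustration, not the methods used in the dissertation.

```python
"""Sketch: cross-validated decoding of reach target from spike counts."""
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_units, n_targets, trials_per_target = 40, 4, 50

def simulate_condition(noise_sd):
    """Cosine-tuned spike counts; higher noise stands in for degraded cues."""
    preferred = rng.uniform(0, 2 * np.pi, n_units)
    X, y = [], []
    for target in range(n_targets):
        angle = 2 * np.pi * target / n_targets
        for _ in range(trials_per_target):
            rates = 10 + 8 * np.cos(angle - preferred)          # Hz
            counts = rng.poisson(rates) + rng.normal(0, noise_sd, n_units)
            X.append(counts)
            y.append(target)
    return np.array(X), np.array(y)

for label, noise in [("tactile cue present", 0.0), ("tactile cue absent", 4.0)]:
    X, y = simulate_condition(noise)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"{label}: decoding accuracy = {acc:.2f}")
```
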
Contributors: Phataraphruk, Preyaporn Kris (Author) / Buneo, Christopher A (Thesis advisor) / Zhou, Yi (Committee member) / Helms Tillery, Steve (Committee member) / Greger, Bradley (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Much evidence has shown that the first language (L1) plays an important role in the formation of the second language (L2) phonological system during the L2 learning process. Combined with the fact that different L1s have distinct phonological patterns, this suggests diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and the L2 should manifest themselves in the accented speech produced by speakers from these L1s. To test these hypotheses, this study develops a computational model to analyze accented speech properties in both segmental (short-term speech measurements at the short-segment or phoneme level) and suprasegmental (long-term speech measurements at the word, long-segment, or sentence level) feature spaces. The benefit of using a computational model is that it enables quantitative analysis of the L1's effect on accent in terms of different phonological properties. The core parts of this computational model are feature extraction schemes that extract pronunciation and prosody representations of accented speech based on existing techniques in the speech processing field. Correlation analysis on both segmental and suprasegmental feature spaces is conducted to examine the relationship between acoustic measurements related to L1s and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how the L1's effect impacts the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in segmental and suprasegmental feature spaces. Results show the potential of the methodology to provide quantitative analysis of accented speech and to extend current studies in L2 speech learning theory to a larger scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s.
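
The correlation and multiple-regression steps can be sketched as below, with placeholder feature names and synthetic data (the signs and magnitudes are arbitrary and do not reflect the dissertation's results): segmental and suprasegmental distance measures are first correlated with accentedness ratings and then combined in a multiple regression.

```python
"""Sketch: relate phonological-distance features to accentedness ratings."""
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n_speakers = 60

# Placeholder features: one segmental and one suprasegmental distance per speaker.
segmental_dist = rng.normal(0.5, 0.15, n_speakers)
suprasegmental_dist = rng.normal(0.4, 0.10, n_speakers)
accentedness = (2.0 + 3.0 * segmental_dist + 1.5 * suprasegmental_dist
                + rng.normal(0, 0.3, n_speakers))          # toy ratings

# Per-feature correlation analysis.
for name, feat in [("segmental", segmental_dist),
                   ("suprasegmental", suprasegmental_dist)]:
    r, p = pearsonr(feat, accentedness)
    print(f"{name}: r = {r:.2f} (p = {p:.3g})")

# Multiple regression combining both feature spaces.
X = np.column_stack([segmental_dist, suprasegmental_dist])
model = LinearRegression().fit(X, accentedness)
print("R^2 =", round(model.score(X, accentedness), 2),
      "coefs =", model.coef_.round(2))
```
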
Contributors: Tu, Ming (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie M (Committee member) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal across three perceptual aspects: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset covering a range of severity levels. Multiple regression analysis is employed to show that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes were detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, a significant correlation was achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal.
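
A minimal sketch of the cross-validation idea follows, using placeholder acoustic measures and synthetic listener errors constructed to mirror the stated hypothesis (phoneme errors loading mainly on articulation, lexical boundary errors on articulation and prosody); none of the values come from the dissertation's data.

```python
"""Sketch: cross-validated prediction of listener errors from acoustic measures."""
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
n_samples = 80

# Placeholder acoustic measures grouped by perceptual aspect.
articulation = rng.normal(0, 1, n_samples)      # e.g., vowel-space-like measure
prosody = rng.normal(0, 1, n_samples)           # e.g., rhythm/f0-range-like measure
voice_quality = rng.normal(0, 1, n_samples)

# Toy targets mirroring the hypothesis.
phoneme_err = 0.8 * articulation + rng.normal(0, 0.4, n_samples)
boundary_err = 0.5 * articulation + 0.6 * prosody + rng.normal(0, 0.4, n_samples)

X = np.column_stack([articulation, prosody, voice_quality])
for name, y in [("phoneme errors", phoneme_err),
                ("lexical boundary errors", boundary_err)]:
    pred = cross_val_predict(LinearRegression(), X, y, cv=5)
    r, _ = pearsonr(pred, y)
    print(f"{name}: cross-validated r = {r:.2f}")
```
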
Contributors: Jiao, Yishan (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie (Thesis advisor) / Zhou, Yi (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Neuron models that behave like their biological counterparts are essential for computational neuroscience. Reduced neuron models, which abstract away biological mechanisms in the interest of speed and interpretability, have received much attention due to their utility in large-scale simulations of the brain, but little care has been taken to ensure that these models exhibit behaviors that closely resemble real neurons.
In order to improve the verisimilitude of these reduced neuron models, I developed an optimizer that uses genetic algorithms to align model behaviors with those observed in experiments.
I verified that this optimizer was able to recover model parameters given only observed physiological data; however, I also found that reduced models nonetheless had limited ability to reproduce all observed behaviors, and that this varied by cell type and desired behavior.
These challenges can partly be surmounted by carefully designing the set of physiological features that guide the optimization. In summary, I found evidence that reduced neuron model optimization had the potential to produce accurate reduced models for only a limited range of neuron types.
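
The optimizer described above can be illustrated with the toy genetic algorithm below, which fits two parameters of a stand-in "reduced model" (a rectified-linear f-I curve) to target features; the model, feature set, and GA settings are placeholders, not the dissertation's implementation.

```python
"""Sketch: a toy genetic algorithm fitting reduced-model parameters to features."""
import numpy as np

rng = np.random.default_rng(5)
currents = np.linspace(0.0, 1.0, 11)                    # injected current (nA), assumed

def firing_rate(params, I):
    gain, threshold = params
    return gain * np.maximum(I - threshold, 0.0)        # rectified-linear f-I curve

target = firing_rate((80.0, 0.25), currents)            # "observed" features

def fitness(params):
    return -np.mean((firing_rate(params, currents) - target) ** 2)

def evolve(pop_size=60, n_gen=100, bounds=((1, 200), (0, 1)), mut_sd=(5.0, 0.05)):
    pop = np.column_stack([rng.uniform(lo, hi, pop_size) for lo, hi in bounds])
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]            # truncation selection
        # Crossover: average random parent pairs; mutate with Gaussian noise.
        idx_a = rng.integers(0, len(parents), pop_size)
        idx_b = rng.integers(0, len(parents), pop_size)
        children = (parents[idx_a] + parents[idx_b]) / 2
        children += rng.normal(0, mut_sd, children.shape)
        for j, (lo, hi) in enumerate(bounds):
            children[:, j] = np.clip(children[:, j], lo, hi)
        children[0] = parents[0]                         # elitism: keep the best
        pop = children
    return max(pop, key=fitness)

print("recovered (gain, threshold):", np.round(evolve(), 2))
```
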
Contributors: Jarvis, Russell Jarrod (Author) / Crook, Sharon M (Thesis advisor) / Gerkin, Richard C (Thesis advisor) / Zhou, Yi (Committee member) / Abbas, James J (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
It is increasingly common to see machine learning techniques applied in conjunction with computational modeling for data-driven research in neuroscience. Such applications include using machine learning for model development, particularly for optimization of parameters based on electrophysiological constraints. Alternatively, machine learning can be used to validate and enhance techniques for experimental data analysis or to analyze model simulation data in large-scale modeling studies, which is the approach I apply here. I use simulations of biophysically realistic cortical neuron models to supplement a common feature-based technique for analysis of electrophysiological signals. I leverage these simulated electrophysiological signals to perform feature selection that provides an improved method for neuron-type classification. Additionally, I validate an unsupervised approach that extends this improved feature selection to discover signatures associated with neuron morphologies, in effect performing in vivo histology. The result is a simulation-based discovery of the underlying synaptic conditions responsible for patterns of extracellular signatures that can be applied to understand both simulation and experimental data. I also use unsupervised learning techniques to identify common channel mechanisms underlying electrophysiological behaviors of cortical neuron models. This work relies on an open-source database containing a large number of computational models for cortical neurons. I perform a quantitative data-driven analysis of these previously published ion channel and neuron models that uses information shared across models as opposed to information limited to individual models. The result is simulation-based discovery of model sub-types at two spatial scales, which maps functional relationships between activation/inactivation properties of channel family model sub-types and electrophysiological properties of cortical neuron model sub-types. Further, the combination of unsupervised learning techniques and parameter visualizations serves to integrate characterizations of model electrophysiological behavior across scales.
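
The unsupervised sub-type discovery step can be pictured with the short sketch below, which standardizes placeholder channel-parameter features and selects a cluster count by silhouette score; the feature names and synthetic groups are assumptions, not the database analysis described in the dissertation.

```python
"""Sketch: unsupervised discovery of model sub-types from feature vectors."""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Two toy "sub-types" of models described by three features each
# (e.g., half-activation voltage, slope, time constant).
group_a = rng.normal([-35.0, 6.0, 10.0], [2.0, 0.5, 1.5], size=(40, 3))
group_b = rng.normal([-20.0, 9.0, 30.0], [2.0, 0.5, 3.0], size=(40, 3))
features = np.vstack([group_a, group_b])

X = StandardScaler().fit_transform(features)
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score
print(f"best k = {best_k} (silhouette = {best_score:.2f})")
```
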
Contributors: Haynes, Reuben (Author) / Crook, Sharon M (Thesis advisor) / Gerkin, Richard C (Committee member) / Zhou, Yi (Committee member) / Baer, Steven (Committee member) / Armbruster, Hans D (Committee member) / Arizona State University (Publisher)
Created: 2020