Matching Items (14)
Description
Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. Tools for creating such sounds vary from arbitrary methods of altering recordings to precise simulations of vibrating objects. In this work, methods of sound synthesis by re-sonification are considered. Re-sonification, herein, refers to the general process of analyzing, possibly transforming, and resynthesizing or reusing recorded sounds in meaningful ways, to convey information. Applied to soundscapes, re-sonification is presented as a means of conveying activity within an environment. Applied to the sounds of objects, this work examines modeling the perception of objects as well as their physical properties and the ability to simulate interactive events with such objects. To create soundscapes to re-sonify geographic environments, a method of automated soundscape design is presented. Using recorded sounds that are classified based on acoustic, social, semantic, and geographic information, this method produces stochastically generated soundscapes to re-sonify selected geographic areas. Drawing on prior knowledge, local sounds and those deemed similar comprise a locale's soundscape. In the context of re-sonifying events, this work examines processes for modeling and estimating the excitations of sounding objects. These include plucking, striking, rubbing, and any interaction that imparts energy into a system, affecting the resultant sound. A method of estimating a linear system's input, constrained to a signal-subspace, is presented and applied toward improving the estimation of percussive excitations for re-sonification. To work toward robust recording-based modeling and re-sonification of objects, new implementations of banded waveguide (BWG) models are proposed for object modeling and sound synthesis. 
Previous implementations of BWGs use arbitrary model parameters and may produce a range of simulations that do not match digital waveguide or modal models of the same design. Subject to linear excitations, some models proposed here behave identically to other equivalently designed physical models. Under nonlinear interactions, such as bowing, many of the proposed implementations exhibit improvements in the attack characteristics of synthesized sounds.
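The signal-subspace input-estimation idea can be illustrated with a least-squares sketch (this is an illustration, not the dissertation's algorithm): if the output is y = Hx and the input is constrained to x = Bc for a subspace basis B, then c is recovered by least squares on HB. All matrices, dimensions, and values below are hypothetical.

```python
import numpy as np

def subspace_input_estimate(H, B, y):
    """Least-squares estimate of an input x = B @ c from the output y = H @ x.

    H: (m, n) linear system matrix (e.g., a convolution matrix)
    B: (n, k) basis whose columns span the assumed signal subspace
    y: (m,) observed output
    """
    c_hat, *_ = np.linalg.lstsq(H @ B, y, rcond=None)
    return B @ c_hat

# Toy demonstration with a decaying-impulse system and a smooth input.
rng = np.random.default_rng(0)
n = 64
h = 0.9 ** np.arange(n)  # impulse response of the hypothetical system
H = np.array([[h[i - j] if i >= j else 0.0 for j in range(n)]
              for i in range(n)])
B = np.stack([np.cos(np.pi * k * np.arange(n) / n) for k in range(4)], axis=1)
x_true = B @ np.array([1.0, -0.5, 0.25, 0.1])   # input lies in the subspace
y = H @ x_true + 1e-6 * rng.standard_normal(n)  # near-noiseless observation
x_hat = subspace_input_estimate(H, B, y)
```

Constraining the solution to span(B) regularizes the otherwise ill-posed deconvolution, which is the motivation for subspace-constrained excitation estimation.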
Contributors: Fink, Alex M (Author) / Spanias, Andreas S (Thesis advisor) / Cook, Perry R. (Committee member) / Turaga, Pavan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This study consisted of several related projects on dynamic spatial hearing by both human and robot listeners. The first experiment investigated the maximum number of sound sources that human listeners could localize at the same time. Speech stimuli were presented simultaneously from different loudspeakers at multiple time intervals. The maximum number of perceived sound sources was close to four. The second experiment asked whether the amplitude modulation of multiple static sound sources could lead to the perception of auditory motion. On the horizontal and vertical planes, four independent noise sound sources with 60° spacing were amplitude modulated with consecutively larger phase delays. At lower modulation rates, motion could be perceived by human listeners in both cases. The third experiment asked whether several sources at static positions could serve as "acoustic landmarks" to improve the localization of other sources. Four continuous speech sound sources were placed on the horizontal plane with 90° spacing and served as the landmarks. The task was to localize a noise that was played for only three seconds while the listener was passively rotated in a chair in the middle of the loudspeaker array. The human listeners were better able to localize the sound sources with landmarks than without. The remaining experiments used an acoustic manikin in an attempt to fuse binaural recordings and motion data to localize sound sources. A dummy head with recording devices was mounted on top of a rotating chair and motion data were collected. The fourth experiment showed that an Extended Kalman Filter could be used to localize sound sources in a recursive manner. The fifth experiment demonstrated the use of a fitting method for separating multiple sound sources.
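The amplitude-modulation experiment can be sketched numerically. The toy below uses four static sources at an illustrative 90° spacing (the experiment used 60°) and applies the same raised-cosine envelope to each, delayed by a quarter cycle per source; the energy-weighted centroid of the gains then sweeps around the circle, one way to picture why motion can be perceived from static sources.

```python
import numpy as np

f_mod = 0.5                       # modulation rate in Hz (slow enough to track)
angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])         # illustrative spacing
phase_delays = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi  # quarter-cycle delays
t = np.linspace(0.0, 2.0, 1000, endpoint=False)

# Each static source gets the same raised-cosine amplitude envelope,
# delayed by one quarter cycle per source.
gains = 0.5 * (1.0 + np.cos(2.0 * np.pi * f_mod * t[:, None] - phase_delays))

# The energy-weighted centroid of the four gains rotates with the modulator.
x = (gains * np.cos(angles)).sum(axis=1)
y = (gains * np.sin(angles)).sum(axis=1)
centroid_deg = np.rad2deg(np.arctan2(y, x)) % 360.0
```

At t = 0 the centroid points at the first source; a quarter of a modulation cycle later it points at the second, and so on, completing one revolution per modulation cycle.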
Contributors: Zhong, Xuan (Author) / Yost, William (Thesis advisor) / Zhou, Yi (Committee member) / Dorman, Michael (Committee member) / Helms Tillery, Stephen (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
The role of environmental factors that influence atmospheric propagation of sound originating from freeway noise sources is studied with a combination of field experiments and numerical simulations. Acoustic propagation models are developed and adapted for a refractive index that depends upon meteorological conditions. A high-resolution multi-nested environmental forecasting model forced by coarse global analysis is applied to predict real meteorological profiles at fine scales. These profiles are then used as input for the acoustic models. Numerical methods for producing higher resolution acoustic refractive index fields are proposed. These include spatial and temporal nested meteorological simulations with vertical grid refinement. It is shown that vertical nesting can improve the prediction of finer structures in near-ground temperature and velocity profiles, such as morning temperature inversions and low level jet-like features. Accurate representation of these features is shown to be important for modeling sound refraction phenomena and for enabling accurate noise assessment. Comparisons are made using the acoustic model for predictions with profiles derived from meteorological simulations and from field experiment observations in Phoenix, Arizona. The challenges faced in simulating accurate meteorological profiles at high resolution for sound propagation applications are highlighted and areas for possible improvement are discussed.
A detailed evaluation of the environmental forecast is conducted by investigating the Surface Energy Balance (SEB) obtained from observations made with an eddy-covariance flux tower compared with SEB from simulations using several physical parameterizations of urban effects and planetary boundary layer schemes. Diurnal variations in SEB constituent fluxes are examined in relation to surface layer stability and modeled diagnostic variables. Improvement is found when adapting parameterizations for Phoenix, with reduced errors in the SEB components. Finer model resolution (to 333 m) is seen to have insignificant (<1σ) influence on mean absolute percent difference of 30-minute diurnal mean SEB terms. A new method of representing inhomogeneous urban development density derived from observations of impervious surfaces with sub-grid scale resolution is then proposed for mesoscale applications. This method was implemented and evaluated within the environmental modeling framework. Finally, a new semi-implicit scheme based on Leapfrog and a fourth-order implicit time-filter is developed.
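The time-filtered leapfrog family mentioned in the final sentence can be illustrated with a minimal sketch. Note this uses the classic explicit Robert-Asselin filter as a stand-in, not the fourth-order implicit filter the dissertation develops; the oscillation test problem and parameter values are hypothetical.

```python
import numpy as np

def leapfrog_ra(omega=1.0, dt=0.01, steps=1000, gamma=0.05):
    """Integrate dx/dt = i*omega*x with leapfrog plus a Robert-Asselin filter.

    The filter weakly damps the spurious computational mode that plain
    leapfrog admits; gamma is the (small) filter coefficient.
    """
    x_prev = 1.0 + 0.0j                  # x at t = 0
    x_curr = np.exp(1j * omega * dt)     # exact value at t = dt to start
    for _ in range(steps - 1):
        x_next = x_prev + 2.0 * dt * 1j * omega * x_curr
        # Robert-Asselin filter applied to the middle time level:
        x_prev = x_curr + gamma * (x_prev - 2.0 * x_curr + x_next)
        x_curr = x_next
    return x_curr

x_num = leapfrog_ra()
x_exact = np.exp(1j * 1.0 * 0.01 * 1000)   # exact solution at t = 10
```

The filter's curvature term damps the odd-even decoupling of leapfrog at the cost of slight damping of the physical mode, which is what higher-order filters are designed to reduce.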
Contributors: Shaffer, Stephen R. (Author) / Moustaoui, Mohamed (Thesis advisor) / Mahalov, Alex (Committee member) / Fernando, Harindra J.S. (Committee member) / Ovenden, Nicholas C. (Committee member) / Huang, Huei-Ping (Committee member) / Calhoun, Ronald (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model for evaluation of different candidate solutions. In this dissertation, frequency pruning and detector pruning algorithms are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals; it employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns.
The second problem involves obtaining an estimate of the auditory representation that minimizes a perceptual objective function and transforming the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages; it ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.
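The pruning idea can be caricatured in a few lines. The toy below (not the dissertation's psychoacoustic model) sums a compressive specific loudness across channels and, when pruning is enabled, skips "detectors" far below the strongest channel; the excitation pattern and threshold are hypothetical.

```python
import numpy as np

def loudness(excitation, prune_db=None):
    """Toy loudness: specific loudness E**0.23 summed across channels.

    With prune_db set, channels more than prune_db below the strongest
    channel are skipped -- a crude stand-in for detector pruning.
    """
    E = np.asarray(excitation, dtype=float)
    if prune_db is not None:
        E = E[E > E.max() * 10.0 ** (-prune_db / 10.0)]
    return float(np.sum(E ** 0.23))

# Excitation pattern of a narrowband sound: one strong peak on a weak floor.
channels = np.arange(40)
E = 1e4 * np.exp(-0.5 * ((channels - 20) / 2.0) ** 2) + 1e-2

full = loudness(E)
pruned = loudness(E, prune_db=40.0)
rel_err = abs(full - pruned) / full
```

Because the compressive exponent flattens weak channels, dropping them trades a small relative loudness error for skipping a large fraction of the per-channel work, which is the trade-off the abstract quantifies.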
Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria and overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments was conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed that vowel metrics capturing vowel centralization, vowel distinctiveness, and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility.
The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level agreement between nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
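One widely used centralization metric of the kind described above is the Formant Centralization Ratio (FCR), which rises toward and above 1.0 as corner vowels collapse toward the center of the vowel space. The sketch below uses illustrative formant values, not data from the study.

```python
def formant_centralization_ratio(f):
    """Formant Centralization Ratio from corner-vowel formant frequencies.

    f maps vowel ('i', 'a', 'u') to (F1, F2) in Hz.  FCR rises toward and
    above 1.0 as the vowel space centralizes.
    """
    return ((f['u'][1] + f['a'][1] + f['i'][0] + f['u'][0]) /
            (f['i'][1] + f['a'][0]))

# Illustrative formant values in Hz (not taken from the study's speakers):
typical = {'i': (300, 2300), 'a': (750, 1300), 'u': (350, 900)}
centralized = {'i': (400, 1900), 'a': (650, 1350), 'u': (450, 1100)}

fcr_typical = formant_centralization_ratio(typical)
fcr_centralized = formant_centralization_ratio(centralized)
```

The numerator collects formants that rise with centralization and the denominator those that fall, so the ratio is sensitive to the collapse while being relatively robust to inter-speaker differences.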
Contributors: Lansford, Kaitlin L (Author) / Liss, Julie M (Thesis advisor) / Dorman, Michael F. (Committee member) / Azuma, Tamiko (Committee member) / Lotto, Andrew J (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Traditional consensus in duos with grand piano has been that issues of balance between the piano and the other instrument can be corrected by lowering the lid on the piano, particularly when the other instrument is thought of as less forceful. The expectation is that lowering the lid quiets the piano enough that it does not overwhelm the other instrument, though the physics of the piano and acoustics suggest that it is incorrect to expect this result. Given the physics of the piano, natural laws such as the conservation of energy, and the intricacies of sound propagation, the author hypothesizes that lowering the lid on the piano does not have a significant effect on its sound output for the audience of a musical performance. An experiment was undertaken to determine empirically whether the lid has any significant effect on the piano's volume and tone in the audience seating area. A mechanical set of arms mounted in a wooden frame reproduced an F-major chord with consistent power, inputting consistent energy into the piano; measurements were taken from the audience seating area using a sound pressure level meter and recorded with a Zoom H4N digital recorder for analysis. The results suggested that lowering the lid has a small effect on sound pressure level, but not one significant enough to overcome issues of overtone balance or individual pianists' touch.
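The sound-pressure-level comparison behind the experiment reduces to a decibel calculation. The pressure values below are hypothetical stand-ins, chosen only to show what a roughly 1 dB lid effect looks like next to the 60 dB SPL reference scale.

```python
import math

P_REF = 20e-6   # reference pressure, 20 micropascals

def spl_db(p_rms):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)

# Hypothetical audience-position measurements (not the study's data):
p_lid_up = 0.0200    # Pa, about 60 dB SPL
p_lid_down = 0.0178  # Pa

difference = spl_db(p_lid_up) - spl_db(p_lid_down)   # roughly 1 dB
```

A change on the order of 1 dB is near the threshold of a just-noticeable loudness difference, which is one way to read the study's conclusion that the lid alone cannot fix balance problems.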
Contributors: Lee, Paul Allen (Author) / Campbell, Andrew (Thesis advisor) / DeMars, James (Committee member) / FitzPatrick, Carole (Committee member) / Ryan, Russell (Committee member) / Swoboda, Deanna (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Clarinet multiphonics have become increasingly popular among composers since they were first introduced in the 1950s. However, multiphonics remain poorly understood by both performers and composers, which sometimes leads to the use of acoustically impossible multiphonics in compositions. Producing multiphonics requires precise manipulations of embouchure force, air pressure, and tongue position. These three factors are invisible to the naked eye during clarinet performance, leading to many conflicting theories about multiphonic production strategies, often based on the subjective perception of the performer. This study attempts to observe the last of these factors—tongue motion—during multiphonic production in situ using ultrasound. Additionally, a multiphonic catalog containing 604 dyad multiphonics was compiled as part of this study. The author hypothesized that nearly all, if not all, of the multiphonics could be produced using one of four primary production strategies. The four production strategies are: (A) lowering the back of the tongue while sustaining the upper note; (B) raising the back of the tongue while sustaining the upper note; (C) changing the tongue position to that of the lower note while sustaining the upper note; and (D) raising the root of the tongue (a sensation similar to constricting the throat) while sustaining the upper note. To distill production strategies into four primary categories, the author documented his perceived tongue motion over twenty repetitions of playing every multiphonic in the catalog. These perceptions were then confirmed or corrected through ultrasound investigation sessions after every five repetitions. The production strategies detailed in this study are only for finding the correct voicing to produce the multiphonics.
The catalog compiled during this study is organized in two ways: the first uses the traditional method of organizing by pitch; the second uses a fingering-based system that makes it easier to find the multiphonics in question, since notated pitches of multiphonics often differ between sources.
Contributors: Liang, Jack Yi Jing (Author) / Gardner, Joshua (Thesis advisor) / Spring, Robert (Thesis advisor) / Caslor, Jason (Committee member) / Creviston, Christopher (Committee member) / Rockmaker, Jody (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Ultrasound has become one of the most popular non-destructive characterization tools for soft materials. Compared to conventional ultrasound imaging, quantitative ultrasound (QUS) has the potential to analyze detailed microstructural variation through spectral analysis. Because of its better axial and lateral resolution and its high attenuation coefficient, quantitative high-frequency ultrasound analysis (HFUA) is a very effective tool for small-penetration-depth applications. One of the QUS parameters, peak density, has recently shown a promising response to variation in soft-material microstructure. Acoustic scattering is arguably the most important factor behind different parametric responses in ultrasound spectra. Therefore, to evaluate peak density, acoustic scattering at different frequency levels was investigated. Analytical, computational, and experimental analyses were conducted to observe both single and multiple scattering in different microstructural setups. It was observed that peak density was an effective indicator of the different levels of acoustic scattering produced by microstructural variation. The feasibility of the peak density parameter was further evaluated in ultrasound C-scan imaging. The study was also extended to detect the relative position of the imaged structure in the direction of wave propagation. For this purpose, a derivative parameter of peak density named mean peak to valley distance (MPVD) was developed to address the limitations of peak density. The study then focused on detecting soft tissue malignancy. A histology-based computational study of HFUA was conducted to detect various breast tumor (soft tissue) grades. It was observed that both the peak density and MPVD parameters could identify tumor grades at a certain level.
Finally, the study focused on evaluating the feasibility of ultrasound parameters to detect asymptomatic breast carcinoma, i.e., ductal carcinoma in situ (DCIS), in the surgical margin of the breast tumor. In that computational study, breast pathologies were modeled by including all the phases of DCIS. From an analysis similar to that described above, it was understood that both the peak density and MPVD parameters could detect various breast pathologies like ductal hyperplasia, DCIS, and calcification during intraoperative margin analysis. Furthermore, the spectral features of the frequency spectra from various pathologies also provided significant information to identify them conclusively.
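A simplified reading of the peak density parameter is the number of local maxima per unit bandwidth in the magnitude spectrum: interfering echoes from multiple scatterers add spectral ripple and raise the count. The probe-pulse parameters and scatterer delays below are hypothetical.

```python
import numpy as np

fs = 100e6                      # 100 MHz sampling rate
n = 1024
t = np.arange(n) / fs

def pulse(t0, f0=20e6, width=0.1e-6):
    """Gaussian-windowed tone burst centered at time t0 (hypothetical probe pulse)."""
    return np.exp(-((t - t0) / width) ** 2) * np.cos(2 * np.pi * f0 * (t - t0))

def peak_density(signal, band=(10e6, 30e6)):
    """Spectral peaks per MHz inside a band: a simplified reading of the
    'peak density' parameter (count of local maxima of the magnitude spectrum)."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    m = mag[(freqs >= band[0]) & (freqs <= band[1])]
    peaks = np.sum((m[1:-1] > m[:-2]) & (m[1:-1] > m[2:]))
    return peaks / ((band[1] - band[0]) / 1e6)

one_scatterer = pulse(2e-6)                  # single echo: smooth spectrum
two_scatterers = pulse(2e-6) + pulse(3e-6)   # interfering echoes: spectral ripple

pd_one = peak_density(one_scatterer)
pd_two = peak_density(two_scatterers)
```

Two echoes separated by 1 µs produce ripple with a 1 MHz period, so the peak count within the band jumps by roughly an order of magnitude relative to the single-echo case.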
Contributors: Paul, Koushik (Author) / Ladani, Leila (Thesis advisor) / Razmi, Jafar (Committee member) / Holloway, Julianne (Committee member) / Li, Xiangjia (Committee member) / Liu, Yongming (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Spatial awareness (i.e., the sense of the space that we are in) involves the integration of auditory, visual, vestibular, and proprioceptive sensory information about environmental events. Hearing impairment has negative effects on spatial awareness and can result in deficits in communication and in the overall aesthetic experience of life, especially in noisy or reverberant environments. This deficit occurs because hearing impairment reduces the signal strength needed for auditory spatial processing and changes how auditory information is combined with other sensory inputs (e.g., vision). The influence of multisensory processing on spatial awareness in listeners with normal and impaired hearing is not assessed in clinical evaluations, and patients' everyday sensory experiences are currently not directly measurable. This dissertation investigated the role of vision in auditory localization in listeners with normal and impaired hearing in a naturalistic stimulus setting, using natural gaze-orienting responses. Experiments examined two behavioral outcomes—response accuracy and response time—based on eye movements in response to simultaneously presented auditory and visual stimuli. The first set of experiments examined the effects of stimulus spatial saliency on response accuracy and response time and the extent of visual dominance in both metrics in auditory localization. The results indicate that vision can significantly influence both the speed and accuracy of auditory localization, especially when auditory stimuli are more ambiguous. The influence of vision is shown for both normal-hearing and hearing-impaired listeners. The second set of experiments examined the effect of frontal visual stimulation on localizing an auditory target presented from in front of or behind a listener. The results show domain-specific effects of visual capture on both response time and response accuracy.
These results support previous findings that auditory-visual interactions are not limited by the spatial rule of proximity. They further suggest a strong influence of vision on both the processing and the decision-making stages of sound source localization for listeners with both normal and impaired hearing.
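A standard way to describe such audio-visual weighting is reliability-weighted (maximum-likelihood) cue combination. This is a textbook model, not the dissertation's analysis, and the cue locations and variances below are hypothetical.

```python
def fused_estimate(x_a, var_a, x_v, var_v):
    """Reliability-weighted (maximum-likelihood) fusion of two location cues.

    Each cue is weighted by its inverse variance, so the less ambiguous
    modality dominates the fused percept.
    """
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    return (w_a * x_a + w_v * x_v) / (w_a + w_v)

# Auditory cue at 10 deg azimuth, visual cue at 0 deg (hypothetical numbers):
sharp_audio = fused_estimate(10.0, 4.0, 0.0, 1.0)    # vision pulls estimate to 2 deg
vague_audio = fused_estimate(10.0, 100.0, 0.0, 1.0)  # vision dominates almost fully
```

The model captures the abstract's observation that vision influences localization most when the auditory stimulus is ambiguous: as auditory variance grows, the fused estimate collapses onto the visual location.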
Contributors: Clayton, Colton (Author) / Zhou, Yi (Thesis advisor) / Azuma, Tamiko (Committee member) / Daliri, Ayoub (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. Namely, these tools include a) an open-source software library, Sirens, which can be used to segment long environmental field recordings into individual sonic events and to compare these events in terms of acoustic content, b) a graph-based retrieval system that can use these measures of acoustic similarity, together with measures of semantic similarity from the lexical database WordNet, to perform both text-based retrieval and automatic annotation of environmental sounds, and c) new techniques for the dynamic, real-time parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.
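The acoustic-similarity side of such a retrieval system can be sketched with cosine similarity over per-recording feature vectors and a nearest-neighbor annotation step. Sirens' actual features and the WordNet-based semantic graph are far richer than this toy; the feature values and tags below are invented for illustration.

```python
import numpy as np

def cosine_sim(A):
    """Pairwise cosine similarity between row feature vectors."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    return U @ U.T

# Toy acoustic features for four recordings (e.g., averaged spectral statistics):
feats = np.array([
    [0.9, 0.1, 0.0],   # "birdsong A"
    [0.8, 0.2, 0.1],   # "birdsong B"
    [0.1, 0.9, 0.3],   # "traffic"
    [0.0, 0.8, 0.5],   # "engine"
])
tags = ["bird", "bird", "traffic", "engine"]

S = cosine_sim(feats)
# Annotate recording 0 by its most similar neighbor (excluding itself):
nn = int(np.argsort(S[0])[-2])
suggested_tag = tags[nn]
```

In the full graph-based system, acoustic edges like these are combined with semantic edges between tags so that annotations can propagate even between sounds that share no tags directly.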
Contributors: Mechtley, Brandon Michael (Author) / Spanias, Andreas S (Thesis advisor) / Sundaram, Hari (Thesis advisor) / Cook, Perry R. (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2013