Matching Items (812)
Filtering by

Clear all filters

151722-Thumbnail Image.png
Description
Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. Tools for creating such sounds vary from arbitrary methods of altering recordings to precise simulations of vibrating

Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. Tools for creating such sounds vary from arbitrary methods of altering recordings to precise simulations of vibrating objects. In this work, methods of sound synthesis by re-sonification are considered. Re-sonification, herein, refers to the general process of analyzing, possibly transforming, and resynthesizing or reusing recorded sounds in meaningful ways, to convey information. Applied to soundscapes, re-sonification is presented as a means of conveying activity within an environment. Applied to the sounds of objects, this work examines modeling the perception of objects as well as their physical properties and the ability to simulate interactive events with such objects. To create soundscapes to re-sonify geographic environments, a method of automated soundscape design is presented. Using recorded sounds that are classified based on acoustic, social, semantic, and geographic information, this method produces stochastically generated soundscapes to re-sonify selected geographic areas. Drawing on prior knowledge, local sounds and those deemed similar comprise a locale's soundscape. In the context of re-sonifying events, this work examines processes for modeling and estimating the excitations of sounding objects. These include plucking, striking, rubbing, and any interaction that imparts energy into a system, affecting the resultant sound. A method of estimating a linear system's input, constrained to a signal-subspace, is presented and applied toward improving the estimation of percussive excitations for re-sonification. To work toward robust recording-based modeling and re-sonification of objects, new implementations of banded waveguide (BWG) models are proposed for object modeling and sound synthesis. Previous implementations of BWGs use arbitrary model parameters and may produce a range of simulations that do not match digital waveguide or modal models of the same design. Subject to linear excitations, some models proposed here behave identically to other equivalently designed physical models. Under nonlinear interactions, such as bowing, many of the proposed implementations exhibit improvements in the attack characteristics of synthesized sounds.
ContributorsFink, Alex M (Author) / Spanias, Andreas S (Thesis advisor) / Cook, Perry R. (Committee member) / Turaga, Pavan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created2013
151418-Thumbnail Image.png
Description
ABSTRACT This work seeks to develop a practical solution for short range ultrasonic communications and produce an integrated array of acoustic transmitters on a flexible substrate. This is done using flexible thin film transistor (TFT) and micro electromechanical systems (MEMS). The goal is to develop a flexible system capable of

ABSTRACT This work seeks to develop a practical solution for short range ultrasonic communications and produce an integrated array of acoustic transmitters on a flexible substrate. This is done using flexible thin film transistor (TFT) and micro electromechanical systems (MEMS). The goal is to develop a flexible system capable of communicating in the ultrasonic frequency range at a distance of 10 - 100 meters. This requires a great deal of innovation on the part of the FDC team developing the TFT driving circuitry and the MEMS team adapting the technology for fabrication on a flexible substrate. The technologies required for this research are independently developed. The TFT development is driven primarily by research into flexible displays. The MEMS development is driving by research in biosensors and micro actuators. This project involves the integration of TFT flexible circuit capabilities with MEMS micro actuators in the novel area of flexible acoustic transmitter arrays. This thesis focuses on the design, testing and analysis of the circuit components required for this project.
ContributorsDaugherty, Robin (Author) / Allee, David R. (Thesis advisor) / Chae, Junseok (Thesis advisor) / Aberle, James T (Committee member) / Vasileska, Dragica (Committee member) / Arizona State University (Publisher)
Created2012
ContributorsChang, Ruihong (Performer) / ASU Library. Music Library (Publisher)
Created2018-03-29
152941-Thumbnail Image.png
Description
Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical

Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical rotation of head movement in marmoset monkeys.

Experiments were conducted in a sound-attenuated acoustic chamber. Head movement of marmoset monkey was studied under various auditory and visual stimulation conditions. With increasing complexity, these conditions are (1) idle, (2) sound-alone, (3) sound and visual signals, and (4) alert signal by opening and closing of the chamber door. All of these conditions were tested with either house light on or off. Infra-red camera with a frame rate of 90 Hz was used to capture of the head movement of monkeys. To assist the signal detection, two circular markers were attached to the top of monkey head. The data analysis used an image-based marker detection scheme. Images were processed using the Computation Vision Toolbox in Matlab. The markers and their positions were detected using blob detection techniques. Based on the frame-by-frame information of marker positions, the angular position, velocity and acceleration were extracted in horizontal and vertical planes. Adaptive Otsu Thresholding, Kalman filtering and bound setting for marker properties were used to overcome a number of challenges encountered during this analysis, such as finding image segmentation threshold, continuously tracking markers during large head movement, and false alarm detection.

The results show that the blob detection method together with Kalman filtering yielded better performances than other image based techniques like optical flow and SURF features .The median of the maximal head turn in the horizontal plane was in the range of 20 to 70 degrees and the median of the maximal velocity in horizontal plane was in the range of a few hundreds of degrees per second. In comparison, the natural alert signal - door opening and closing - evoked the faster head turns than other stimulus conditions. These results suggest that behaviorally relevant stimulus such as alert signals evoke faster head-turn responses in marmoset monkeys.
ContributorsSimhadri, Sravanthi (Author) / Zhou, Yi (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2014
153418-Thumbnail Image.png
Description
This study consisted of several related projects on dynamic spatial hearing by both human and robot listeners. The first experiment investigated the maximum number of sound sources that human listeners could localize at the same time. Speech stimuli were presented simultaneously from different loudspeakers at multiple time intervals. The maximum

This study consisted of several related projects on dynamic spatial hearing by both human and robot listeners. The first experiment investigated the maximum number of sound sources that human listeners could localize at the same time. Speech stimuli were presented simultaneously from different loudspeakers at multiple time intervals. The maximum of perceived sound sources was close to four. The second experiment asked whether the amplitude modulation of multiple static sound sources could lead to the perception of auditory motion. On the horizontal and vertical planes, four independent noise sound sources with 60° spacing were amplitude modulated with consecutively larger phase delay. At lower modulation rates, motion could be perceived by human listeners in both cases. The third experiment asked whether several sources at static positions could serve as "acoustic landmarks" to improve the localization of other sources. Four continuous speech sound sources were placed on the horizontal plane with 90° spacing and served as the landmarks. The task was to localize a noise that was played for only three seconds when the listener was passively rotated in a chair in the middle of the loudspeaker array. The human listeners were better able to localize the sound sources with landmarks than without. The other experiments were with the aid of an acoustic manikin in an attempt to fuse binaural recording and motion data to localize sounds sources. A dummy head with recording devices was mounted on top of a rotating chair and motion data was collected. The fourth experiment showed that an Extended Kalman Filter could be used to localize sound sources in a recursive manner. The fifth experiment demonstrated the use of a fitting method for separating multiple sounds sources.
ContributorsZhong, Xuan (Author) / Yost, William (Thesis advisor) / Zhou, Yi (Committee member) / Dorman, Michael (Committee member) / Helms Tillery, Stephen (Committee member) / Arizona State University (Publisher)
Created2015
153171-Thumbnail Image.png
Description
The role of environmental factors that influence atmospheric propagation of sound originating from freeway noise sources is studied with a combination of field experiments and numerical simulations. Acoustic propagation models are developed and adapted for refractive index depending upon meteorological conditions. A high-resolution multi-nested environmental forecasting model forced by coarse

The role of environmental factors that influence atmospheric propagation of sound originating from freeway noise sources is studied with a combination of field experiments and numerical simulations. Acoustic propagation models are developed and adapted for refractive index depending upon meteorological conditions. A high-resolution multi-nested environmental forecasting model forced by coarse global analysis is applied to predict real meteorological profiles at fine scales. These profiles are then used as input for the acoustic models. Numerical methods for producing higher resolution acoustic refractive index fields are proposed. These include spatial and temporal nested meteorological simulations with vertical grid refinement. It is shown that vertical nesting can improve the prediction of finer structures in near-ground temperature and velocity profiles, such as morning temperature inversions and low level jet-like features. Accurate representation of these features is shown to be important for modeling sound refraction phenomena and for enabling accurate noise assessment. Comparisons are made using the acoustic model for predictions with profiles derived from meteorological simulations and from field experiment observations in Phoenix, Arizona. The challenges faced in simulating accurate meteorological profiles at high resolution for sound propagation applications are highlighted and areas for possible improvement are discussed.



A detailed evaluation of the environmental forecast is conducted by investigating the Surface Energy Balance (SEB) obtained from observations made with an eddy-covariance flux tower compared with SEB from simulations using several physical parameterizations of urban effects and planetary boundary layer schemes. Diurnal variation in SEB constituent fluxes are examined in relation to surface layer stability and modeled diagnostic variables. Improvement is found when adapting parameterizations for Phoenix with reduced errors in the SEB components. Finer model resolution (to 333 m) is seen to have insignificant ($<1\sigma$) influence on mean absolute percent difference of 30-minute diurnal mean SEB terms. A new method of representing inhomogeneous urban development density derived from observations of impervious surfaces with sub-grid scale resolution is then proposed for mesoscale applications. This method was implemented and evaluated within the environmental modeling framework. Finally, a new semi-implicit scheme based on Leapfrog and a fourth-order implicit time-filter is developed.
ContributorsShaffer, Stephen R. (Author) / Moustaoui, Mohamed (Thesis advisor) / Mahalov, Alex (Committee member) / Fernando, Harindra J.S. (Committee member) / Ovenden, Nicholas C. (Committee member) / Huang, Huei-Ping (Committee member) / Calhoun, Ronald (Committee member) / Arizona State University (Publisher)
Created2014
153277-Thumbnail Image.png
Description
This study explores the psychophysical and neural processes associated with the perception of sounds as either pleasant or aversive. The underlying psychophysical theory is based on auditory scene analysis, the process through which listeners parse auditory signals into individual acoustic sources. The first experiment tests and confirms that a self-rated

This study explores the psychophysical and neural processes associated with the perception of sounds as either pleasant or aversive. The underlying psychophysical theory is based on auditory scene analysis, the process through which listeners parse auditory signals into individual acoustic sources. The first experiment tests and confirms that a self-rated pleasantness continuum reliably exists for 20 various stimuli (r = .48). In addition, the pleasantness continuum correlated with the physical acoustic characteristics of consonance/dissonance (r = .78), which can facilitate auditory parsing processes. The second experiment uses an fMRI block design to test blood oxygen level dependent (BOLD) changes elicited by a subset of 5 exemplar stimuli chosen from Experiment 1 that are evenly distributed over the pleasantness continuum. Specifically, it tests and confirms that the pleasantness continuum produces systematic changes in brain activity for unpleasant acoustic stimuli beyond what occurs with pleasant auditory stimuli. Results revealed that the combination of two positively and two negatively valenced experimental sounds compared to one neutral baseline control elicited BOLD increases in the primary auditory cortex, specifically the bilateral superior temporal gyrus, and left dorsomedial prefrontal cortex; the latter being consistent with a frontal decision-making process common in identification tasks. The negatively-valenced stimuli yielded additional BOLD increases in the left insula, which typically indicates processing of visceral emotions. The positively-valenced stimuli did not yield any significant BOLD activation, consistent with consonant, harmonic stimuli being the prototypical acoustic pattern of auditory objects that is optimal for auditory scene analysis. Both the psychophysical findings of Experiment 1 and the neural processing findings of Experiment 2 support that consonance is an important dimension of sound that is processed in a manner that aids auditory parsing and functional representation of acoustic objects and was found to be a principal feature of pleasing auditory stimuli.
ContributorsPatten, Kristopher Jakob (Author) / Mcbeath, Michael K (Thesis advisor) / Baxter, Leslie C (Committee member) / Amazeen, Eric L (Committee member) / Dorman, Michael F. (Committee member) / Arizona State University (Publisher)
Created2014
149867-Thumbnail Image.png
Description
Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of auditory model for evaluation of different candidate solutions. In this dissertation, a frequency pruning and a detector pruning algorithm is developed that efficiently implements the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90 % reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals and employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns. The second problem obtains an estimate of the auditory representation that minimizes a perceptual objective function and transforms the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages that ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.
ContributorsKrishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created2011
150496-Thumbnail Image.png
Description
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been

Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria with overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments were conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed vowel metrics that capture vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level agreement between nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
ContributorsLansford, Kaitlin L (Author) / Liss, Julie M (Thesis advisor) / Dorman, Michael F. (Committee member) / Azuma, Tamiko (Committee member) / Lotto, Andrew J (Committee member) / Arizona State University (Publisher)
Created2012
153808-Thumbnail Image.png
Description
Four Souvenirs for Violin and Piano was composed by Paul Schoenfeld (b.1947) in 1990 as a showpiece, spotlighting the virtuosity of both the violin and piano in equal measure. Each movement is a modern interpretation of a folk or popular genre, re- envisioned over intricate jazz harmonies and rhythms. The

Four Souvenirs for Violin and Piano was composed by Paul Schoenfeld (b.1947) in 1990 as a showpiece, spotlighting the virtuosity of both the violin and piano in equal measure. Each movement is a modern interpretation of a folk or popular genre, re- envisioned over intricate jazz harmonies and rhythms. The work was commissioned by violinist Lev Polyakin, who specifically requested some short pieces that could be performed in a local jazz establishment named Night Town in Cleveland, Ohio. The result is a work that is approximately fifteen minutes in length. Schoenfeld is a respected composer in the contemporary classical music community, whose Café Music (1986) for piano trio has recently become a staple of the standard chamber music repertoire. Many of his other works, however, remain in relative obscurity. It is the focus of this document to shed light on at least one other notable composition; Four Souvenirs for Violin and Piano. Among the topics to be discussed regarding this piece are a brief history behind the genesis of this composition, a structural summary of the entire work and each of its movements, and an appended practice guide based on interview and coaching sessions with the composer himself. With this project, I hope to provide a better understanding and appreciation of this work.
ContributorsJanczyk, Kristie Annette (Author) / Ryan, Russell (Thesis advisor) / Campbell, Andrew (Committee member) / Norton, Kay (Committee member) / Arizona State University (Publisher)
Created2015