Matching Items (28)

Description
Following the success of incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly or indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes solutions to the high-complexity issues that arise in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, and 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem addresses the high computational complexity involved in solving perceptual objective functions, which require repeated application of the auditory model to evaluate different candidate solutions. In this dissertation, frequency-pruning and detector-pruning algorithms are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to an 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for sinusoidal signals; it employs the proposed auditory pattern combining technique together with a look-up table of representative auditory patterns. The second problem concerns obtaining an estimate of the auditory representation that minimizes a perceptual objective function and then transforming that auditory pattern back to its equivalent time/frequency representation. This avoids repeated application of the auditory model stages to test different candidate time/frequency vectors when minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages, ensuring that a time/frequency mapping corresponding to the estimated auditory representation is obtained. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.
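To make the pruning idea concrete, here is a minimal Python sketch (an illustration, not the dissertation's implementation) of detector pruning: spectral detectors whose level falls below an audibility threshold are discarded before the expensive per-detector loudness computation. The threshold-in-quiet approximation (Terhardt's formula) and the power-law specific loudness are stand-ins for the full auditory model stages.

```python
import numpy as np

def pruned_loudness(levels_db, center_freqs_hz, margin_db=0.0):
    """Sketch of detector pruning: drop detectors below an audibility
    threshold before the (expensive) per-detector loudness computation."""
    f = np.asarray(center_freqs_hz) / 1000.0          # frequency in kHz
    # Terhardt's approximation of the absolute threshold in quiet (dB SPL).
    atq_db = (3.64 * f ** -0.8
              - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
              + 1e-3 * f ** 4)
    keep = np.asarray(levels_db) > (atq_db + margin_db)   # prune inaudible bins
    # Stand-in specific loudness: Stevens-style power law over kept detectors;
    # the dissertation's auditory model is far more detailed than this.
    intensities = 10.0 ** (np.asarray(levels_db)[keep] / 10.0)
    return float(np.sum(intensities ** 0.23))
```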
Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools from digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. Namely, these tools include a) an open source software library, Sirens, which can be used to segment long environmental field recordings into individual sonic events and to compare these events in terms of acoustic content; b) a graph-based retrieval system that combines these measures of acoustic similarity with measures of semantic similarity drawn from the lexical database WordNet to perform both text-based retrieval and automatic annotation of environmental sounds; and c) new techniques for the dynamic, real-time parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.
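As a rough sketch of the first tool's job (a toy illustration, not Sirens' actual API), a long field recording can be split into candidate sonic events by gating on short-time energy; Sirens itself uses richer per-frame features and probabilistic segmentation:

```python
import numpy as np

def segment_events(x, sr, frame_len=2048, hop=1024, thresh_db=-35.0):
    """Toy event segmentation by short-time RMS gating.
    Returns a list of (onset_s, offset_s) pairs."""
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    rms_db = np.array([
        20 * np.log10(np.sqrt(np.mean(x[i*hop : i*hop+frame_len] ** 2)) + 1e-12)
        for i in range(n_frames)
    ])
    active = rms_db > thresh_db                  # frames above the gate
    events, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i                            # event onset
        elif not on and start is not None:
            events.append((start * hop / sr, i * hop / sr))
            start = None
    if start is not None:                        # event runs to end of file
        events.append((start * hop / sr, len(x) / sr))
    return events
```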
Contributors: Mechtley, Brandon Michael (Author) / Spanias, Andreas S (Thesis advisor) / Sundaram, Hari (Thesis advisor) / Cook, Perry R. (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. Tools for creating such sounds vary from arbitrary methods of altering recordings to precise simulations of vibrating objects. In this work, methods of sound synthesis by re-sonification are considered. Re-sonification, herein, refers to the general process of analyzing, possibly transforming, and resynthesizing or reusing recorded sounds in meaningful ways to convey information. Applied to soundscapes, re-sonification is presented as a means of conveying activity within an environment. Applied to the sounds of objects, this work examines modeling the perception of objects as well as their physical properties and the ability to simulate interactive events with such objects. To create soundscapes that re-sonify geographic environments, a method of automated soundscape design is presented. Using recorded sounds that are classified based on acoustic, social, semantic, and geographic information, this method produces stochastically generated soundscapes to re-sonify selected geographic areas. Drawing on prior knowledge, local sounds and those deemed similar comprise a locale's soundscape. In the context of re-sonifying events, this work examines processes for modeling and estimating the excitations of sounding objects. These include plucking, striking, rubbing, and any interaction that imparts energy into a system, affecting the resultant sound. A method of estimating a linear system's input, constrained to a signal subspace, is presented and applied toward improving the estimation of percussive excitations for re-sonification. To work toward robust recording-based modeling and re-sonification of objects, new implementations of banded waveguide (BWG) models are proposed for object modeling and sound synthesis. Previous implementations of BWGs use arbitrary model parameters and may produce a range of simulations that do not match digital waveguide or modal models of the same design. Subject to linear excitations, some models proposed here behave identically to other equivalently designed physical models. Under nonlinear interactions, such as bowing, many of the proposed implementations exhibit improvements in the attack characteristics of synthesized sounds.
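For intuition, a banded waveguide can be pictured as a bank of feedback loops, one per prominent mode, each combining a delay line (whose length sets the modal frequency) with a narrow bandpass filter and a feedback gain controlling decay. The Python sketch below is illustrative only; the filter order, Q, and gains are my assumptions, not the dissertation's proposed implementations.

```python
import numpy as np
from scipy.signal import butter, lfilter

def banded_waveguide(excitation, sr, mode_freqs, gains, q=50.0):
    """Toy banded-waveguide synthesis: one feedback loop per mode,
    each a delay line (length ~ sr/f0) plus a narrow bandpass filter."""
    excitation = np.asarray(excitation, dtype=float)
    out = np.zeros_like(excitation)
    nyq = sr / 2.0
    for f0, g in zip(mode_freqs, gains):
        delay = max(1, int(round(sr / f0)))            # samples per period of f0
        b, a = butter(2, [f0 * (1 - 0.5 / q) / nyq,
                          f0 * (1 + 0.5 / q) / nyq], btype="band")
        zi = np.zeros(max(len(a), len(b)) - 1)         # filter state
        buf = np.zeros(delay)                          # circular delay line
        for n in range(len(excitation)):
            # Read the sample written `delay` samples ago and feed it back.
            x_n = excitation[n] + g * buf[n % delay]
            y_n, zi = lfilter(b, a, [x_n], zi=zi)
            buf[n % delay] = y_n[0]
            out[n] += y_n[0]
    return out

# Example: a struck bar with two (hypothetical) modes.
# imp = np.zeros(44100); imp[0] = 1.0
# y = banded_waveguide(imp, 44100, mode_freqs=[440.0, 1214.0], gains=[0.995, 0.99])
```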
Contributors: Fink, Alex M (Author) / Spanias, Andreas S (Thesis advisor) / Cook, Perry R. (Committee member) / Turaga, Pavan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
This work seeks to develop a practical solution for short-range ultrasonic communications and to produce an integrated array of acoustic transmitters on a flexible substrate. This is done using flexible thin-film transistors (TFTs) and microelectromechanical systems (MEMS). The goal is to develop a flexible system capable of communicating in the ultrasonic frequency range at a distance of 10-100 meters. This requires a great deal of innovation on the part of the FDC team developing the TFT driving circuitry and the MEMS team adapting the technology for fabrication on a flexible substrate. The technologies required for this research were developed independently: TFT development is driven primarily by research into flexible displays, while MEMS development is driven by research into biosensors and microactuators. This project integrates TFT flexible circuit capabilities with MEMS microactuators in the novel area of flexible acoustic transmitter arrays. This thesis focuses on the design, testing, and analysis of the circuit components required for this project.
Contributors: Daugherty, Robin (Author) / Allee, David R. (Thesis advisor) / Chae, Junseok (Thesis advisor) / Aberle, James T (Committee member) / Vasileska, Dragica (Committee member) / Arizona State University (Publisher)
Created: 2012

Description
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria and overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing the perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments was conducted to address these problems. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and that differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers; vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments evaluated the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed that vowel metrics capturing vowel centralization, vowel distinctiveness, and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment evaluated the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest that distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, above-chance agreement between the nature of vowel misclassifications and misidentification errors was demonstrated for all vowels, suggesting that degraded vowel acoustics are not merely an index of severity in dysarthria, but rather an integral component of the resultant intelligibility disorder.
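For context, two widely used vowel metrics of the kind described here are the vowel space area (the F1-F2 quadrilateral spanned by the corner vowels) and the formant centralization ratio (FCR; Sapir et al.). A minimal Python sketch, assuming mean formant values in Hz for the corner vowels (the example values are hypothetical, and the abstract does not name these particular metrics):

```python
def vowel_space_area(corners):
    """Area of the F1-F2 quadrilateral spanned by the corner vowels
    /i/, /ae/, /a/, /u/, via the shoelace formula.
    corners: dict mapping vowel -> (F1_hz, F2_hz)."""
    pts = [corners[v] for v in ("i", "ae", "a", "u")]
    n = len(pts)
    return 0.5 * abs(sum(pts[k][0] * pts[(k + 1) % n][1]
                         - pts[(k + 1) % n][0] * pts[k][1]
                         for k in range(n)))

def formant_centralization_ratio(corners):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); rises above ~1 as the
    vowel space centralizes (reduced spectral distinctiveness)."""
    f1i, f2i = corners["i"]
    f1a, f2a = corners["a"]
    f1u, f2u = corners["u"]
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

# Hypothetical speaker means (Hz):
corners = {"i": (300, 2300), "ae": (650, 1800), "a": (750, 1100), "u": (320, 900)}
print(vowel_space_area(corners), formant_centralization_ratio(corners))
```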
Contributors: Lansford, Kaitlin L (Author) / Liss, Julie M (Thesis advisor) / Dorman, Michael F. (Committee member) / Azuma, Tamiko (Committee member) / Lotto, Andrew J (Committee member) / Arizona State University (Publisher)
Created: 2012

Description
This project is aimed at detecting gas flow rate in biosensor and medical health applications by means of an acoustic method using a whistle-based device. Considering the challenges involved in maintaining a particular flow rate and back pressure when detecting certain analytes in breath analysis, the proposed system, together with a cell phone, provides a suitable way to maintain the flow rate without any additional battery-driven device. To achieve this, a system-level approach is implemented in which a closed-end whistle is placed inside a tightly fitted constant-back-pressure tube. A pressure-versus-flow-rate curve is first obtained experimentally and used in the design of the whistle. Finally, the flow-rate-versus-frequency characteristic curve is obtained by means of an FFT code on a cell phone. When a person breathes through the device, a whistle sound is generated; the cell phone microphone captures this sound, and an FFT analysis determines its frequency, and hence the flow rate, from the characteristic curve. This approach can detect flow rates as low as 1 L/min. The concept is applied here for the first time to the development and optimization of a breath analyzer.
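A minimal sketch of the analysis step in Python (illustrative only; the calibration arrays stand in for the experimentally measured characteristic curve, and the function names are mine): find the dominant spectral peak of a microphone frame, then interpolate on the frequency-versus-flow-rate curve.

```python
import numpy as np

def flow_rate_from_whistle(frame, sr, calib_freq_hz, calib_flow_lpm):
    """Map a whistle tone to a flow rate: locate the dominant FFT peak,
    then interpolate on the measured characteristic curve (assumed
    monotonic in frequency)."""
    windowed = frame * np.hanning(len(frame))      # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    peak_hz = freqs[np.argmax(spectrum)]           # dominant whistle tone
    return np.interp(peak_hz, calib_freq_hz, calib_flow_lpm)

# Hypothetical calibration: 1-10 L/min mapping to 1.0-2.5 kHz.
calib_f = np.linspace(1000.0, 2500.0, 10)
calib_q = np.linspace(1.0, 10.0, 10)
# flow_lpm = flow_rate_from_whistle(mic_frame, 44100, calib_f, calib_q)
```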
Contributors: Ravichandran, Balaje Dhanram (Author) / Forzani, Erica (Thesis advisor) / Xian, Xiaojun (Committee member) / Huang, Huei-Ping (Committee member) / Arizona State University (Publisher)
Created: 2012

Description
Ultrasound has become one of the most popular non-destructive characterization tools for soft materials. Compared to conventional ultrasound imaging, quantitative ultrasound (QUS) has the potential to analyze detailed microstructural variation through spectral analysis. Because of its better axial and lateral resolution and high attenuation coefficient, quantitative high-frequency ultrasound analysis (HFUA) is a very effective tool for small-penetration-depth applications. One QUS parameter, peak density, has recently shown a promising response to variation in soft material microstructure. Acoustic scattering is arguably the most important factor behind different parametric responses in ultrasound spectra. Therefore, to evaluate peak density, acoustic scattering at different frequency levels was investigated. Analytical, computational, and experimental analyses were conducted to observe both single and multiple scattering in different microstructural setups. It was observed that peak density was an effective tool to express the different levels of acoustic scattering that occurred through microstructural variation. The feasibility of the peak density parameter was further evaluated in ultrasound C-scan imaging. The study was also extended to detect the relative position of the imaged structure in the direction of wave propagation. For this purpose, a derivative parameter of peak density named mean peak to valley distance (MPVD) was developed to address the limitations of peak density. The study then focused on detecting soft tissue malignancy. A histology-based computational study of HFUA was conducted to detect various breast tumor (soft tissue) grades. It was observed that both the peak density and MPVD parameters could identify tumor grades at a certain level. Finally, the study focused on evaluating the feasibility of ultrasound parameters to detect asymptomatic breast carcinoma, i.e., ductal carcinoma in situ (DCIS), in the surgical margin of the breast tumor. In that computational study, breast pathologies were modeled by including all the phases of DCIS. From a similar analysis to that described above, it was understood that both the peak density and MPVD parameters could detect breast pathologies such as ductal hyperplasia, DCIS, and calcification during intraoperative margin analysis. Furthermore, the spectral features of the frequency spectra from the various pathologies also provided significant information for identifying them conclusively.
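As a rough illustration of the two parameters in Python (my reading, not the dissertation's definitions): peak density can be taken as the number of local maxima in the spectrum within an analysis band, and MPVD as a mean peak-to-valley spacing. The spacing is computed here along the frequency axis, since interference-fringe spacing in a spectrum encodes arrival-time offsets and hence structure position along the beam; the exact MPVD definition follows the dissertation.

```python
import numpy as np

def spectral_extrema(s):
    """Indices of local maxima (peaks) and local minima (valleys)."""
    peaks = [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1]]
    valleys = [i for i in range(1, len(s) - 1) if s[i - 1] > s[i] < s[i + 1]]
    return peaks, valleys

def peak_density(spectrum, lo_bin, hi_bin):
    """Peak density: number of peaks inside the analysis band."""
    peaks, _ = spectral_extrema(spectrum[lo_bin:hi_bin])
    return len(peaks)

def mpvd(spectrum, df_hz):
    """One plausible MPVD reading: mean frequency distance (Hz) from each
    peak to the nearest following valley."""
    peaks, valleys = spectral_extrema(spectrum)
    dists, vi = [], 0
    for p in peaks:
        while vi < len(valleys) and valleys[vi] <= p:
            vi += 1
        if vi < len(valleys):
            dists.append((valleys[vi] - p) * df_hz)
    return float(np.mean(dists)) if dists else 0.0
```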
Contributors: Paul, Koushik (Author) / Ladani, Leila (Thesis advisor) / Razmi, Jafar (Committee member) / Holloway, Julianne (Committee member) / Li, Xiangjia (Committee member) / Liu, Yongming (Committee member) / Arizona State University (Publisher)
Created: 2022

Description
Colloidal solutions of nanoparticles have been seen as promising for heat transfer enhancement. Additionally, study of the effects of ultrasound on heat transfer enhancement has accelerated in recent years. A few authors have studied the combined impact of Al2O3 nanofluids and ultrasound on mini channels. This study focused on the combined effects of Al2O3 nanofluids and ultrasound on heat transfer enhancement in a circular mini-channel heat sink. Two concentrations of Al2O3-water nanofluids, 0.5% and 1%, were used for the experiments, along with two heat input conditions, 40 W and 50 W, providing constant heat fluxes of 25,000 W/m² and 31,250 W/m², respectively. The effect of 5 W ultrasound on the nanofluids was analyzed. Experimental observations show that the use of ultrasound increased the heat transfer coefficient, which also increased with nanoparticle concentration and heat flux. The average heat transfer coefficient enhancement for the 0.5% and 1% nanofluids due to the increased heat flux, in the absence of ultrasound, was 12.4% and 9%, respectively. At a constant heat input of 40 W, the introduction of ultrasound enhanced the heat transfer coefficient by 22.8% and 23.9% for the 0.5% and 1% nanofluids, respectively. Similarly, at a constant heat input of 50 W, ultrasound enhanced the heat transfer coefficient by 19.8% and 22.9% for the 0.5% and 1% nanofluids, respectively. Interesting findings are also reported for low heat input with ultrasound versus high heat input without ultrasound (i.e., 40 W with US vs. 50 W without US): the heat transfer coefficient and Nusselt number for the 0.5% and 1% concentrations were enhanced by 9.2% and 13.6%, respectively. Furthermore, for fixed heat input powers of 40 W and 50 W, increasing the concentration from 0.5% to 1% along with ultrasound yielded average enhancements in Nu of 38.3% and 32.4%, respectively.
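Note the quoted powers and fluxes are mutually consistent: 40 W / 25,000 W/m² = 50 W / 31,250 W/m² = 0.0016 m² of heated area. A small Python helper, assuming the standard definition h = q''/(T_wall − T_bulk) (variable names are mine, not the thesis's):

```python
def heat_transfer_coefficient(q_watts, area_m2, t_wall_c, t_bulk_c):
    """Average heat transfer coefficient h = q'' / (T_wall - T_bulk), W/(m^2 K).
    E.g., 40 W over 0.0016 m^2 gives the abstract's 25,000 W/m^2 flux."""
    q_flux = q_watts / area_m2
    return q_flux / (t_wall_c - t_bulk_c)

def enhancement_pct(h_new, h_base):
    """Percent enhancement as quoted in the abstract (e.g., ~22.8% at 40 W)."""
    return 100.0 * (h_new - h_base) / h_base
```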
Contributors: Mastoi, Faisal Ali (Author) / Phelan, Patrick E (Thesis advisor) / Milcarek, Ryan (Committee member) / Kwon, Beomjin (Committee member) / Arizona State University (Publisher)
Created: 2022

Description
Prosodic features such as fundamental frequency (F0), intensity, and duration convey important information about speech intonation (i.e., is it a statement or a question?). Because cochlear implants (CIs) do not adequately encode pitch-related F0 cues, pre-lingually deaf pediatric CI users have poorer speech intonation perception and production than normal-hearing (NH) children. In contrast, post-lingually deaf adult CI users developed speech production skills via normal hearing before deafness and implantation. Further, combined electric hearing (via CI) and acoustic hearing (via hearing aid, HA) may improve CI users' perception of pitch cues in speech intonation. Therefore, this study tested (1) whether post-lingually deaf adult CI users have speech intonation production similar to that of NH adults and (2) whether their speech intonation production improves with auditory feedback via CI+HA (i.e., bimodal hearing). Eight post-lingually deaf adult bimodal CI users and nine NH adults participated in this study. Ten question-and-answer dialogues with an experimenter were used to elicit 10 pairs of syntactically matched questions and statements from each participant. Bimodal CI users were tested under four hearing conditions: no device (ND), HA, CI, and CI+HA. The F0 change, intensity change, and duration ratio between the last two syllables of each utterance were analyzed to evaluate the quality of speech intonation production. The results showed no significant differences between CI and NH participants in any of the acoustic features of questions and statements. For CI participants, the CI+HA condition led to significantly greater F0 decreases for statements than the ND condition, while the ND condition led to significantly greater duration ratios for questions and statements. These results suggest that bimodal CI users change their use of prosodic cues for speech intonation production in different hearing conditions, and that access to auditory feedback via CI+HA may improve their voice pitch control to produce more salient statement intonation contours.
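To illustrate the three acoustic measures, here is a Python sketch; the per-syllable inputs and the semitone/dB units are my assumptions for the example, not necessarily the study's exact definitions. Questions typically show a final F0 rise, statements a final fall.

```python
import numpy as np

def intonation_features(f0_penult, f0_final, rms_penult, rms_final,
                        dur_penult_s, dur_final_s):
    """Prosodic cues between the last two syllables of an utterance.
    f0_* are Hz tracks, rms_* are linear amplitude tracks, dur_* seconds."""
    f0_change_st = 12 * np.log2(np.mean(f0_final) / np.mean(f0_penult))
    intensity_change_db = 20 * np.log10(np.mean(rms_final) / np.mean(rms_penult))
    duration_ratio = dur_final_s / dur_penult_s
    return f0_change_st, intensity_change_db, duration_ratio

# Hypothetical statement-final fall: F0 drops ~4 semitones, syllable lengthens.
print(intonation_features([210, 205], [170, 160], [0.08], [0.06], 0.18, 0.25))
```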
Contributors: Ai, Chang (Author) / Luo, Xin (Thesis advisor) / Daliri, Ayoub (Committee member) / Davidson, Lisa (Committee member) / Arizona State University (Publisher)
Created: 2022

Description
Spatial awareness (i.e., the sense of the space that we are in) involves the integration of auditory, visual, vestibular, and proprioceptive sensory information about environmental events. Hearing impairment has negative effects on spatial awareness and can result in deficits in communication and in the overall aesthetic experience of life, especially in noisy or reverberant environments. This deficit occurs because hearing impairment reduces the signal strength needed for auditory spatial processing and changes how auditory information is combined with other sensory inputs (e.g., vision). The influence of multisensory processing on spatial awareness in listeners with normal and impaired hearing is not assessed in clinical evaluations, and patients' everyday sensory experiences are currently not directly measurable. This dissertation investigated the role of vision in auditory localization in listeners with normal and impaired hearing in a naturalistic stimulus setting, using natural gaze-orienting responses. Experiments examined two behavioral outcomes, response accuracy and response time, based on eye movements in response to simultaneously presented auditory and visual stimuli. The first set of experiments examined the effects of stimulus spatial saliency on response accuracy and response time, and the extent of visual dominance in both metrics in auditory localization. The results indicate that vision can significantly influence both the speed and accuracy of auditory localization, especially when auditory stimuli are more ambiguous. The influence of vision is shown for both normal-hearing and hearing-impaired listeners. The second set of experiments examined the effect of frontal visual stimulation on localizing an auditory target presented from in front of or behind a listener. The results show domain-specific effects of visual capture on both response time and response accuracy. These results support previous findings that auditory-visual interactions are not limited by the spatial rule of proximity, and they further suggest a strong influence of vision on both the processing and decision-making stages of sound source localization for listeners with normal and impaired hearing.
Contributors: Clayton, Colton (Author) / Zhou, Yi (Thesis advisor) / Azuma, Tamiko (Committee member) / Daliri, Ayoub (Committee member) / Arizona State University (Publisher)
Created: 2021