The value of two ears for sound source localization and speech understanding in complex listening environments: two cochlear implants vs. two partially hearing ears and one cochlear implant

Description

Two groups of cochlear implant (CI) listeners were tested for sound source localization and for speech recognition in complex listening environments. One group (n=11) wore bilateral CIs and, potentially, had access to interaural level difference (ILD) cues, but not interaural timing difference (ITD) cues. The second group (n=12) wore a single CI and had low-frequency, acoustic hearing in both the ear contralateral to the CI and in the implanted ear. These 'hearing preservation' listeners, potentially, had access to ITD cues but not to ILD cues. At issue in this dissertation was the value of the two types of information about sound sources, ITDs and ILDs, for localization and for speech perception when speech and noise sources were separated in space. For Experiment 1, normal hearing (NH) listeners and the two groups of CI listeners were tested for sound source localization using a 13-loudspeaker array. For the NH listeners, the mean RMS error for localization was 7 degrees; for the bilateral CI listeners, 20 degrees; and for the hearing preservation listeners, 23 degrees. The scores for the two CI groups did not differ significantly. Thus, both CI groups showed equivalent, but poorer than normal, localization. This outcome, using the filtered noise bands for the normal hearing listeners, suggests that ILD and ITD cues can support equivalent levels of localization. For Experiment 2, the two groups of CI listeners were tested for speech recognition in noise when the noise sources and targets were spatially separated in a simulated 'restaurant' environment and in two versions of a 'cocktail party' environment. At issue was whether either CI group would show benefits from binaural hearing, i.e., better performance when the noise and targets were separated in space. Neither of the CI groups showed spatial release from masking. However, both groups showed a significant binaural advantage (a combination of squelch and summation), which also maintained separation of the target and noise, indicating the presence of some binaural processing or 'unmasking' of speech in noise. Finally, localization ability in Experiment 1 was not correlated with binaural advantage in Experiment 2.
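
The RMS localization error reported above can be illustrated with a minimal sketch. The function name, the angle convention (degrees of azimuth), and the sample values are assumptions made for illustration only; the dissertation's actual scoring procedure is not described in this abstract.

```python
import math


def rms_error(target_deg, response_deg):
    """Root-mean-square error between target and response angles, in degrees.

    Hypothetical helper: both arguments are equal-length sequences of
    azimuths, one entry per trial.
    """
    if len(target_deg) != len(response_deg) or len(target_deg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Square each trial's signed error, average, then take the square root.
    squared = [(r - t) ** 2 for t, r in zip(target_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up trials: every response is off by exactly 2 degrees.
print(rms_error([0, 0, 0, 0], [2, -2, 2, -2]))
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones, which is one reason RMS error is a common summary of localization accuracy.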

Contributors: Loiselle, Louise (Author) / Dorman, Michael F. (Thesis advisor) / Yost, William A. (Thesis advisor) / Azuma, Tamiko (Committee member) / Liss, Julie (Committee member) / Arizona State University (Publisher)

Created: 2013

Two-Sentence Recognition with a Pulse Train Vocoder

Description

When listeners hear sentences presented simultaneously, they are better able to discriminate between speakers when there is a difference in fundamental frequency (F0). This paper explores the use of a pulse train vocoder to simulate cochlear implant listening. A pulse train vocoder, rather than a noise or tonal vocoder, was used so that the F0 of speech would be well represented. The results of this experiment showed that listeners are able to use the F0 information to aid in speaker segregation. As expected, recognition performance was poorest when there was no difference in F0 between speakers, and listeners performed better as the difference in F0 increased. The types of errors that the listeners made were also analyzed. The results show that when an error was made in identifying the correct word from the target sentence, the response was usually (~60%) a word that was uttered in the competing sentence.

Contributors: Stanley, Nicole Ernestine (Author) / Yost, William (Thesis director) / Dorman, Michael (Committee member) / Liss, Julie (Committee member) / Barrett, The Honors College (Contributor) / Department of Speech and Hearing Science (Contributor) / Hugh Downs School of Human Communication (Contributor)

Created: 2013-05

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting a combination of differently constructed image classification neural networks to observe which combinations applied to this framework were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others are explored.
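
One simple way to picture defense by randomization is to pick a classifier at random for each query, so an attacker cannot know in advance which network will answer. The sketch below is an assumption-laden illustration, not the thesis's framework: `randomized_predict`, the uniform selection rule, and the toy classifiers are all hypothetical.

```python
import random


def randomized_predict(models, x, rng=random):
    """Answer one query with a classifier chosen uniformly at random.

    Hypothetical sketch: `models` is a non-empty list of callables that
    each map an input to a label; `rng` is any object with a `choice`
    method (the `random` module or a `random.Random` instance).
    """
    if not models:
        raise ValueError("at least one model is required")
    model = rng.choice(models)  # a fresh draw for every query
    return model(x)


# Toy stand-ins for differently constructed image classifiers.
def classifier_a(image):
    return "cat"


def classifier_b(image):
    return "cat"


print(randomized_predict([classifier_a, classifier_b], x=None))
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible for experiments, while the default module-level generator keeps the choice unpredictable from query to query.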

Contributors: Mazboudi, Yassine Ahmad (Author) / Yang, Yezhou (Thesis director) / Ren, Yi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Somatosensory Modulation during Speech Planning

Description

Previous studies have found that the detection of near-threshold stimuli is decreased immediately before movement and throughout movement production. This has been suggested to occur through the use of the internal forward model processing an efferent copy of the motor command and creating a prediction that is used to cancel out the resulting sensory feedback. Currently, there are no published accounts of the perception of tactile signals for motor tasks and contexts related to the lips during both speech planning and production. In this study, we measured the responsiveness of the somatosensory system during speech planning using light electrical stimulation below the lower lip by comparing perception during mixed speaking and silent reading conditions. Participants were asked to judge whether a constant near-threshold electrical stimulation (subject-specific intensity, 85% detected at rest) was present during different time points relative to an initial visual cue. In the speaking condition, participants overtly produced target words shown on a computer monitor. In the reading condition, participants read the same target words silently to themselves without any movement or sound. We found that detection of the stimulus was attenuated during speaking conditions while remaining at a constant level close to the perceptual threshold throughout the silent reading condition. Perceptual modulation was most intense during speech production and showed some attenuation just prior to speech production during the planning period of speech. This demonstrates that there is a significant decrease in the responsiveness of the somatosensory system during speech production, as well as milliseconds before speech is even produced, which has implications for disorders such as stuttering and schizophrenia that involve pronounced deficits in the somatosensory system.

Contributors: Mcguffin, Brianna Jean (Author) / Daliri, Ayoub (Thesis director) / Liss, Julie (Committee member) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Startle-evoked movement in multi-jointed, two-dimensional reaching tasks

Description

Previous research has shown that a loud acoustic stimulus can trigger an individual's prepared movement plan. This movement response is referred to as a startle-evoked movement (SEM). SEM has been observed in the stroke survivor population, where results have shown that SEM enhances single joint movements that are usually performed with difficulty. While the presence of SEM in the stroke survivor population advances scientific understanding of movement capabilities following a stroke, published studies using the SEM phenomenon have examined only one joint. The ability of SEM to generate multi-jointed movements is understudied, which limits SEM as a potential therapy tool. To apply SEM as a therapy tool, however, the biomechanics of the arm in multi-jointed movement planning and execution must be better understood. Thus, the objective of our study was to evaluate whether SEM could elicit multi-joint reaching movements that were accurate in an unrestrained, two-dimensional workspace. Data were collected from ten subjects with no previous neck, arm, or brain injury. Each subject performed a reaching task to five Targets that were equally spaced in a semi-circle to create a two-dimensional workspace. The subject reached to each Target following a sequence of two non-startling acoustic cues: "Get Ready" and "Go". A loud acoustic stimulus was randomly substituted for the "Go" cue. We hypothesized that SEM is accessible and accurate for unrestricted multi-jointed reaching tasks in a functional workspace and is therefore independent of movement direction. Our results showed that SEM is possible in all five Target directions. The probability of evoking SEM and the movement kinematics (i.e., total movement time, linear deviation, average velocity) to each Target were not statistically different. Thus, we conclude that SEM is possible in a functional workspace and is not dependent on where arm stability is maximized. Moreover, coordinated preparation and storage of a multi-jointed movement is indeed possible.

Contributors: Ossanna, Meilin Ryan (Author) / Honeycutt, Claire (Thesis director) / Schaefer, Sydney (Committee member) / Harrington Bioengineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Towards learning compact visual embeddings using deep neural networks

Description

Feature embeddings differ from raw features in that the former obey certain properties, such as a notion of similarity/dissimilarity in their embedding space. word2vec is a preeminent example in this direction, where similarity in the embedding space is measured in terms of cosine similarity. Such language embedding models have seen numerous applications in both the language and vision communities, as they capture the information in the modality (English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks: (1) image hashing and (2) zero-shot learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using several distance metrics, such as Hamming, Euclidean, and cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task-specific embeddings were learnt, which can be used for image retrieval and classification separately.

Image hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications, such as image retrieval and image classification. Further, this framework requires no labelled data, making it very inexpensive. Results show that the proposed approach surpasses state-of-the-art approaches by a significant margin.

Zero-shot classification is the task of classifying the test sample into a new class which was not seen during training. This is possible by establishing a relationship between the training and the testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using the handcrafted attribute vectors and word vectors but doesn’t require the expensive attribute vectors during test time. More specifically, an intermediate space is learnt between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore.
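
The three distance measures named in this abstract (Hamming, Euclidean, and cosine) can be sketched in a few lines. The helper names and the sample vectors are assumptions for illustration; the thesis's embeddings and evaluation code are not reproduced here.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two real-valued embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Made-up binary hash codes and real-valued embeddings.
print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Hamming distance suits binary hash codes because it reduces to counting differing bits, whereas cosine distance ignores vector length and compares only direction, which is the convention used with word2vec-style embeddings.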

Contributors: Gattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)

Created: 2019

Machine Learning and Mario Speedruns

Description

Machine learning has a near-infinite number of applications, whose potential has yet to be fully harnessed and realized. This thesis will outline two areas in which machine learning can be utilized, and demonstrate the execution of one methodology in each area. The first area that will be described is self-play in video games, where a neural model will be researched and described that will teach a computer to complete a level of Super Mario World (1990) on its own. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, which was written by Kenneth O. Stanley and Risto Miikkulainen of University of Texas at Austin. The model that will actually be described is from YouTuber SethBling of the California Institute of Technology. The second area that will be described is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm utilizes Python and the Volatility framework to detect malicious software in an infected system.

Contributors: Ballecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Reduced Order Models and Approximations for Hardware Acceleration of Neural Networks

Description

Many real-world engineering problems require simulations to evaluate the design objectives and constraints. Often, due to the complexity of the system model, simulations can be prohibitive in terms of computation time. One approach to overcome this issue is to construct a surrogate model, which approximates the original model. The focus of this work is on data-driven surrogate models, in which empirical approximations of the output are performed given the input parameters. Recently, neural networks (NNs) have re-emerged as a popular method for constructing data-driven surrogate models. Although NNs have achieved excellent accuracy and are widely used, they pose their own challenges. This work addresses two common challenges, the need for: (1) hardware acceleration and (2) uncertainty quantification (UQ) in the presence of input variability. The high demand in the inference phase of deep NNs in cloud servers/edge devices calls for the design of low-power custom hardware accelerators. The first part of this work describes the design of an energy-efficient long short-term memory (LSTM) accelerator. The overarching goal is to aggressively reduce the power consumption and area of the LSTM components using approximate computing, and then use architectural-level techniques to boost the performance. The proposed design is synthesized and placed and routed as an application-specific integrated circuit (ASIC). The results demonstrate that this accelerator is 1.2X more energy-efficient and 3.6X more area-efficient than the baseline LSTM. In the second part of this work, a robust framework is developed based on an alternate data-driven surrogate model referred to as polynomial chaos expansion (PCE) for addressing UQ. In contrast to many existing approaches, no assumptions are made on the elements of the function space, and UQ is a function of the expansion coefficients. Moreover, the sensitivity of the output with respect to any subset of the input variables can be computed analytically by post-processing the PCE coefficients. This provides a systematic and incremental method for pruning or changing the order of the model. This framework is evaluated on several real-world applications from different domains and is extended for classification tasks as well.

Contributors: Azari, Elham (Author) / Vrudhula, Sarma (Thesis advisor) / Fainekos, Georgios (Committee member) / Ren, Fengbo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2021

Toward Reliable Graph Matching: from Deterministic Optimization to Combinatorial Learning

Description

Graph matching is a fundamental but notoriously difficult problem due to its NP-hard nature, and serves as a cornerstone for a series of applications in machine learning and computer vision, such as image matching, dynamic routing, and drug design, to name a few. Although there has been extensive previous investigation of high-performance graph matching solvers, it remains a challenging task to tackle the matching problem under real-world scenarios with severe graph uncertainty (e.g., noise, outliers, misleading or ambiguous links). In this dissertation, a main focus is to investigate the essence of, and propose solutions to, graph matching with higher reliability under such uncertainty. To this end, the proposed research was conducted taking into account three perspectives related to reliable graph matching: modeling, optimization, and learning. For modeling, graph matching is extended from the typical quadratic assignment problem to a more generic mathematical model by introducing a specific family of separable functions, achieving higher capacity and reliability. In terms of optimization, a novel, highly gradient-efficient, determinant-based regularization technique is proposed in this research, showing high robustness against outliers. Then, the learning paradigm for graph matching under intrinsic combinatorial characteristics is explored. First, a study is conducted on how to fill the gap between the discrete problem and its continuous approximation under a deep learning framework. Then, this dissertation continues to investigate the necessity of a more reliable latent topology of graphs for matching, and proposes an effective and flexible framework to obtain it. Coherent findings in this dissertation include a theoretical study and several novel algorithms, with rich experiments demonstrating their effectiveness.

Contributors: Yu, Tianshu (Author) / Li, Baoxin (Thesis advisor) / Wang, Yalin (Committee member) / Yang, Yezhou (Committee member) / Yang, Yingzhen (Committee member) / Arizona State University (Publisher)

Created: 2021

Machine Learning Approaches to Tumor Estimation of Whole Slide Images

Description

Molecular pathology makes use of estimates of tumor content (tumor percentage) for pre-analytic and analytic purposes, such as molecular oncology testing, massively parallel sequencing, or next-generation sequencing (NGS). Assessment of sample acceptability, accurate quantitation of variants, assessment of copy number changes (among other applications), and determination of specimen viability for testing (since many assays require a minimum tumor content to report variants at the limit of detection) may all be improved with more accurate and reproducible estimates of tumor content. Currently, tumor percentages of samples submitted for molecular testing are estimated by pathologists through visual examination of Hematoxylin and Eosin (H&E) stained tissue slides under the microscope. These estimations can be automated, expedited, and rendered more accurate by applying machine learning methods to digital whole slide images (WSI).

Contributors: Cirelli, Claire (Author) / Yang, Yezhou (Thesis director) / Yalim, Jason (Committee member) / Velu, Priya (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05
