Matching Items (79)
Description
The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome the limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. Beyond the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app 'iJDSP.' These functions are described in detail in this thesis. Several enhancements to the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students.
Contributors: Kalyanasundaram, Girish (Author) / Spanias, Andreas S (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Epilepsy affects numerous people around the world and is characterized by recurring seizures, which motivates efforts to predict them so that precautionary measures can be taken. One promising algorithm extracts spatiotemporal correlation-based features from intracranial electroencephalography signals for use with support vector machines. The robustness of this methodology is tested through a sensitivity analysis, which also provides insight into how to construct more effective feature vectors.
Contributors: Ma, Owen (Author) / Bliss, Daniel (Thesis director) / Berisha, Visar (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)
Created: 2015-05
Description
Passive radar can be used to reduce the demand for radio frequency spectrum bandwidth. This paper explains how a MATLAB simulation tool was developed to analyze the feasibility of using passive radar with digitally modulated communication signals. The first stage of the simulation creates a binary phase-shift keying (BPSK) signal, quadrature phase-shift keying (QPSK) signal, or digital terrestrial television (DTTV) signal. A scenario is then created from user-defined parameters that simulates reception of the original signal on two different channels, a reference channel and a surveillance channel. The signal on the surveillance channel is delayed and Doppler shifted according to a point target scattering profile. An ambiguity function detector is implemented to identify the time delays and Doppler shifts associated with reflections off the simulated targets. The results of an example are included in this report to demonstrate the simulation capabilities.
Contributors: Scarborough, Gillian Donnelly (Author) / Cochran, Douglas (Thesis director) / Berisha, Visar (Committee member) / Wang, Chao (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2014-05
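The ambiguity function detection described in the passive radar abstract above can be illustrated with a small sketch. It is written in Python rather than the MATLAB used by the thesis, and it is not the author's tool: the function names, the circular delay model, the grid sizes and the single noiseless point target are all assumptions chosen to keep the example short. It builds a random BPSK reference signal, forms a delayed and Doppler-shifted surveillance signal, and searches a delay-Doppler grid for the strongest correlation.

```python
import cmath
import random


def cross_ambiguity(ref, surv, max_delay, max_doppler_bin):
    """Return {(delay, doppler_bin): magnitude} over a delay-Doppler grid."""
    n_samples = len(ref)
    surface = {}
    for delay in range(max_delay + 1):
        for k in range(-max_doppler_bin, max_doppler_bin + 1):
            acc = 0j
            for n in range(n_samples):
                # Compare the surveillance sample with a circularly delayed
                # reference sample, then remove the hypothesised Doppler shift.
                shifted = ref[(n - delay) % n_samples]
                phase = cmath.exp(-2j * cmath.pi * k * n / n_samples)
                acc += surv[n] * shifted.conjugate() * phase
            surface[(delay, k)] = abs(acc)
    return surface


def simulate_and_detect(true_delay=5, true_doppler_bin=3, n_samples=64, seed=0):
    """Simulate one noiseless point target and return the detected (delay, bin)."""
    rng = random.Random(seed)
    # Hypothetical reference channel: random BPSK symbols (+1 or -1).
    ref = [complex(rng.choice((-1.0, 1.0)), 0.0) for _ in range(n_samples)]
    # Hypothetical surveillance channel: delayed, Doppler-shifted copy.
    surv = [
        ref[(n - true_delay) % n_samples]
        * cmath.exp(2j * cmath.pi * true_doppler_bin * n / n_samples)
        for n in range(n_samples)
    ]
    surface = cross_ambiguity(ref, surv, max_delay=9, max_doppler_bin=4)
    return max(surface, key=surface.get)
```

Calling `simulate_and_detect()` returns the delay and Doppler bin at which the surface peaks. A realistic simulation would add noise, direct-path interference and multiple targets, none of which are modelled here.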
Description
The ability of cochlear implants (CI) to restore auditory function has advanced significantly in the past decade. Approximately 96,000 people in the United States benefit from these devices, which generate and transmit electrical impulses that enable the brain to perceive sound. But because the cochlear implant market is predominantly Western, current CI characterization primarily focuses on improving the quality of American English. Only recently has research begun to evaluate CI performance using other languages such as Mandarin Chinese, which rely on distinct spectral characteristics not present in English. Mandarin, a tonal language, uses four distinct pitch patterns which, when voiced on a syllable, convey different meanings for the same word. This presents a challenge to hearing research, as spectral (frequency-based) information such as pitch is widely acknowledged to be significantly reduced by CI processing algorithms. Thus the present study sought to identify the intelligibility differences between English and Mandarin when processed using current CI strategies. The objective of the study was to pinpoint any notable discrepancies in speech recognition, using voice-coded (vocoded) audio that simulates CI-generated stimuli. This approach allowed 12 normal-hearing English speakers and 9 normal-hearing Mandarin listeners to participate in the experiment. The number of frequency channels available and the carrier type of excitation were varied in order to compare their effects on two cases of Mandarin intelligibility: Case 1) word recognition and Case 2) combined word and tone recognition. The results indicated a statistically significant difference between English and Mandarin intelligibility for Condition 1 (8Ch-Sinewave Carrier, p=0.022) given Case 1, and for Condition 1 (8Ch-Sinewave Carrier, p=0.001) and Condition 3 (16Ch-Sinewave Carrier, p=0.001) given Case 2. The data suggest that the carrier type does have an effect on tonal language intelligibility and warrants further research as a design consideration for future cochlear implants.
Contributors: Schiltz, Jessica Hammitt (Author) / Berisha, Visar (Thesis director) / Frakes, David (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)
Created: 2015-05
Description
The marmoset monkey (Callithrix jacchus) is a New World primate species native to South American rainforests. Because they rely on vocal communication to navigate and survive, marmosets have emerged as a promising primate model to study vocal production, perception, cognition, and social interactions. The purpose of this project is to provide an initial assessment of the vocal repertoire of a marmoset colony raised at Arizona State University and the call types they use in different social conditions. The vocal production of a colony of 16 marmoset monkeys was recorded in three different conditions, with three repeats of each condition. The positive condition involves a caretaker distributing food, the negative condition involves an experimenter taking a marmoset out of its cage to a different room, and the control condition is the normal state of the colony with no human interference. A total of 5396 samples of calls were collected during a total of 256 minutes of audio recordings. Call types were analyzed using semi-automated computer programs developed in the Laboratory of Auditory Computation and Neurophysiology. A total of 5 major call types were identified and their variants in different social conditions were analyzed. The results showed that the total number of calls and the types of calls made differed across the three social conditions, suggesting that marmoset vocalization both signals and depends on the social context.
Contributors: Fernandez, Jessmin Natalie (Author) / Zhou, Yi (Thesis director) / Berisha, Visar (Committee member) / School of International Letters and Cultures (Contributor) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Detecting early signs of neurodegeneration is vital for measuring the efficacy of pharmaceuticals and planning treatments for neurological diseases. This is especially true for Amyotrophic Lateral Sclerosis (ALS), where differences in symptom onset can be indicative of the prognosis. Because they can be measured noninvasively, changes in speech production have been proposed as a promising indicator of neurological decline. However, speech changes are typically measured subjectively by a clinician. These perceptual ratings can vary widely between clinicians and within the same clinician on different patient visits, making clinical ratings less sensitive to subtle early indicators. In this paper, we propose an algorithm for the objective measurement of flutter, a quasi-sinusoidal modulation of fundamental frequency that manifests in the speech of some ALS patients. The algorithm detailed in this paper employs long-term average spectral analysis on the residual F0 track of a sustained phonation to detect the presence of flutter and is robust to longitudinal drifts in F0. The algorithm is evaluated on a longitudinal speech dataset of ALS patients at varying stages in their prognosis. Benchmarking against two stages of perceptual ratings provided by an expert speech pathologist indicates that the algorithm follows perceptual ratings with moderate accuracy and can objectively detect flutter in instances where the variability of the perceptual rating causes uncertainty.
Contributors: Peplinski, Jacob Scott (Author) / Berisha, Visar (Thesis director) / Liss, Julie (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
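The flutter abstract above describes averaging spectra of a residual F0 track. The following Python sketch shows one way such an analysis could look; it is not the thesis algorithm. The linear detrending step, the non-overlapping rectangular frames, the function names and the synthetic F0 track (a 120 Hz baseline with slow drift and a 10 Hz modulation) are all assumptions made for illustration.

```python
import cmath
import math


def detrend_linear(track):
    """Subtract a least-squares line so slow F0 drift does not dominate."""
    n = len(track)
    t_mean = (n - 1) / 2
    y_mean = sum(track) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(track))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - t_mean)) for i, y in enumerate(track)]


def average_spectrum(residual, frame_len):
    """Average DFT magnitudes over consecutive non-overlapping frames."""
    n_frames = len(residual) // frame_len
    n_bins = frame_len // 2 + 1
    avg = [0.0] * n_bins
    for f in range(n_frames):
        frame = residual[f * frame_len:(f + 1) * frame_len]
        for k in range(n_bins):
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * i / frame_len)
                for i, x in enumerate(frame)
            )
            avg[k] += abs(coeff) / n_frames
    return avg


def dominant_modulation(f0_track, frame_rate_hz, frame_len):
    """Return (frequency in Hz, magnitude) of the strongest non-DC bin."""
    spectrum = average_spectrum(detrend_linear(f0_track), frame_len)
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * frame_rate_hz / frame_len, spectrum[peak_bin]


def synthetic_f0(flutter_hz=10.0, frame_rate_hz=100.0, seconds=4):
    """Hypothetical F0 track: 120 Hz baseline, slow drift, sinusoidal flutter."""
    n = int(frame_rate_hz * seconds)
    return [
        120.0
        + 0.5 * (i / frame_rate_hz)
        + 2.0 * math.sin(2 * math.pi * flutter_hz * i / frame_rate_hz)
        for i in range(n)
    ]
```

Calling `dominant_modulation(synthetic_f0(), 100.0, 100)` reports the modulation frequency of the synthetic track. Deciding whether flutter is actually present would additionally need a threshold on the peak, which this sketch leaves open because the abstract does not specify one.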
Description
Speech nasality disorders are characterized by abnormal resonance in the nasal cavity. Hypernasal speech is of particular interest: it is characterized by an inability to prevent improper nasalization of vowels and by poor articulation of plosive and fricative consonants, and it can lead to negative communicative and social consequences. It can be associated with a range of conditions, including cleft lip or palate, velopharyngeal dysfunction (a physical or neurological defect in closure of the soft palate, which regulates resonance between the oral and nasal cavities), dysarthria, or hearing impairment, and can also be an early indicator of developing neurological disorders such as ALS. Hypernasality is typically scored perceptually by a speech-language pathologist (SLP). Misdiagnosis could lead to inadequate treatment plans and poor treatment outcomes for a patient. Also, for some applications, particularly screening for early neurological disorders, the use of an SLP is not practical. Hence this work demonstrates a data-driven approach to objective assessment of hypernasality through the use of Goodness of Pronunciation features. These features capture the overall precision of a speaker's articulation on a phoneme-by-phoneme basis, allowing the demonstrated models to achieve a Pearson correlation coefficient of 0.88 on low-nasality speakers, the population of most interest for this sort of technique. These results are comparable to milestone methods in this domain.
Contributors: Saxon, Michael Stephen (Author) / Berisha, Visar (Thesis director) / McDaniel, Troy (Committee member) / Electrical Engineering Program (Contributor, Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Previous studies have shown that experimentally implemented formant perturbations result in production of compensatory responses in the opposite direction of the perturbations. In this study, we investigated how participants adapt to a) auditory perturbations that shift formants to a specific point in the vowel space and hence remove variability of formants (focused perturbations), and b) auditory perturbations that preserve the natural variability of formants (uniform perturbations). We examined whether the degree of adaptation to focused perturbations was different from adaptation to uniform perturbations. We found that adaptation magnitude of the first formant (F1) was smaller in response to focused perturbations. However, F1 adaptation initially moved in the same direction as the perturbation, and after several trials it changed course toward the opposite direction of the perturbation. We also found that adaptation of the second formant (F2) was smaller in response to focused perturbations than in response to uniform perturbations. Overall, these results suggest that formant variability is an important component of speech, and that our central nervous system takes into account such variability to produce more accurate speech output.
Contributors: Dittman, Jonathan William (Author) / Daliri, Ayoub (Thesis director) / Berisha, Visar (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Divergence functions are both highly useful and fundamental to many areas in information theory and machine learning, but require either parametric approaches or prior knowledge of labels on the full data set. This paper presents a method to estimate the divergence between two data sets in the absence of fully labeled data. This semi-labeled case is common in many domains where labeling data by hand is expensive or time-consuming, or wherever large data sets are present. The theory derived in this paper is demonstrated on a simulated example, and then applied to a feature selection and classification problem from pathological speech analysis.
Contributors: Gilton, Davis Leland (Author) / Berisha, Visar (Thesis director) / Cochran, Douglas (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
This work details the bootstrap estimation of a nonparametric information divergence measure, the Dp divergence measure, using a power law model. To address the challenge of computing accurate divergence estimates from finite-size data, the bootstrap approach is used in conjunction with a power law curve to calculate an asymptotic value of the divergence estimator. Monte Carlo estimates of Dp are found for increasing values of sample size, and a power law fit is used to model the divergence estimates as a function of sample size. The fit is also used to generate a confidence interval that characterizes the quality of the estimate. We compare the performance of this method with other estimation methods. The calculated divergence is applied to the binary classification problem. Using the inherent relation between divergence measures and classification error rate, an analysis of the Bayes error rate of several data sets is conducted using the asymptotic divergence estimate.
Contributors: Kadambi, Pradyumna Sanjay (Author) / Berisha, Visar (Thesis director) / Bliss, Daniel (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
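The power law extrapolation mentioned in the abstract above can be sketched as follows. The abstract does not state the functional form, so the model D(N) = d_inf + a * N**(-b), the scan over candidate exponents and the function name are assumptions made for illustration. The sketch covers only the curve fit: it takes divergence estimates that have already been computed at several sample sizes and returns the fitted asymptote d_inf, without performing the bootstrap resampling or the Dp estimation itself.

```python
def fit_power_law(sample_sizes, estimates, exponents):
    """Fit D(N) = d_inf + a * N**(-b).

    For each candidate exponent b the model is linear in (d_inf, a), so those
    two are solved by ordinary least squares; the b with the smallest squared
    error is kept. Returns (d_inf, a, b).
    """
    m = len(sample_sizes)
    y_mean = sum(estimates) / m
    best = None
    for b in exponents:
        x = [n ** (-b) for n in sample_sizes]
        x_mean = sum(x) / m
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, estimates))
        a = sxy / sxx
        d_inf = y_mean - a * x_mean
        sse = sum((yi - (d_inf + a * xi)) ** 2 for xi, yi in zip(x, estimates))
        if best is None or sse < best[0]:
            best = (sse, d_inf, a, b)
    _, d_inf, a, b = best
    return d_inf, a, b


# Hypothetical noiseless estimates that approach 0.6 as the sample size grows.
sizes = [50, 100, 200, 400, 800, 1600]
values = [0.6 - 1.5 * n ** -0.5 for n in sizes]
d_inf, a, b = fit_power_law(sizes, values, [i * 0.05 for i in range(1, 41)])
```

With real bootstrap estimates the inputs would be noisy averages rather than exact values, and repeating the fit over resampled estimates is one plausible way to obtain a confidence interval, though the abstract does not say how the interval is constructed.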