Search Content

Audio processing and loudness estimation algorithms with iOS simulations

Description

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters…

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS App 'iJDSP." These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced for providing the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have been also developed for use in classes and training of practitioners and students.

ContributorsKalyanasundaram, Girish (Author) / Spanias, Andreas S (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2013

Analytical control grid registration for efficient application of optical flow

Description

Image resolution limits the extent to which zooming enhances clarity, restricts the size digital photographs can be printed at, and, in the context of medical images, can prevent a diagnosis. Interpolation is the supplementing of known data with estimated values based on a function or model involving some or all…

Image resolution limits the extent to which zooming enhances clarity, restricts the size digital photographs can be printed at, and, in the context of medical images, can prevent a diagnosis. Interpolation is the supplementing of known data with estimated values based on a function or model involving some or all of the known samples. The selection of the contributing data points and the specifics of how they are used to define the interpolated values influences how effectively the interpolation algorithm is able to estimate the underlying, continuous signal. The main contributions of this dissertation are three fold: 1) Reframing edge-directed interpolation of a single image as an intensity-based registration problem. 2) Providing an analytical framework for intensity-based registration using control grid constraints. 3) Quantitative assessment of the new, single-image enlargement algorithm based on analytical intensity-based registration. In addition to single image resizing, the new methods and analytical approaches were extended to address a wide range of applications including volumetric (multi-slice) image interpolation, video deinterlacing, motion detection, and atmospheric distortion correction. Overall, the new approaches generate results that more accurately reflect the underlying signals than less computationally demanding approaches and with lower processing requirements and fewer restrictions than methods with comparable accuracy.

ContributorsZwart, Christine M. (Author) / Frakes, David H (Thesis advisor) / Karam, Lina (Committee member) / Kodibagkar, Vikram (Committee member) / Spanias, Andreas (Committee member) / Towe, Bruce (Committee member) / Arizona State University (Publisher)

Created2013

Upper body motion analysis using kinect for stroke rehabilitation at the home

Description

Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera -…

Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera - Kinect is presented. We address this problem by ﬁrst conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. Then a hybrid upper body tracking approach is proposed which combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We proposed several kinematic features reliably extracted from the proposed inexpensive and portable motion capture system and classiﬁers that correlate torso movement to clinical measures of unimpaired and impaired. Experiment results show that the proposed sensing and analysis works reliably on measuring torso movement quality and is promising for end-point tracking. The system is currently being deployed for large-scale evaluations.

ContributorsDu, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)

Created2012

Performance characterization of communication channels through asymptotic and partial ordering analysis

Description

Asymptotic comparisons of ergodic channel capacity at high and low signal-to-noise ratios (SNRs) are provided for several adaptive transmission schemes over fading channels with general distributions, including optimal power and rate adaptation, rate adaptation only, channel inversion and its variants. Analysis of the high-SNR pre-log constants of the ergodic capacity…

Asymptotic comparisons of ergodic channel capacity at high and low signal-to-noise ratios (SNRs) are provided for several adaptive transmission schemes over fading channels with general distributions, including optimal power and rate adaptation, rate adaptation only, channel inversion and its variants. Analysis of the high-SNR pre-log constants of the ergodic capacity reveals the existence of constant capacity difference gaps among the schemes with a pre-log constant of 1. Closed-form expressions for these high-SNR capacity difference gaps are derived, which are proportional to the SNR loss between these schemes in dB scale. The largest one of these gaps is found to be between the optimal power and rate adaptation scheme and the channel inversion scheme. Based on these expressions it is shown that the presence of space diversity or multi-user diversity makes channel inversion arbitrarily close to achieving optimal capacity at high SNR with sufficiently large number of antennas or users. A low-SNR analysis also reveals that the presence of fading provably always improves capacity at sufficiently low SNR, compared to the additive white Gaussian noise (AWGN) case. Numerical results are shown to corroborate our analytical results. This dissertation derives high-SNR asymptotic average error rates over fading channels by relating them to the outage probability, under mild assumptions. The analysis is based on the Tauberian theorem for Laplace-Stieltjes transforms which is grounded on the notion of regular variation, and applies to a wider range of channel distributions than existing approaches. The theory of regular variation is argued to be the proper mathematical framework for finding sufficient and necessary conditions for outage events to dominate high-SNR error rate performance. It is proved that the diversity order being d and the cumulative distribution function (CDF) of the channel power gain having variation exponent d at 0 imply each other, provided that the instantaneous error rate is upper-bounded by an exponential function of the instantaneous SNR. High-SNR asymptotic average error rates are derived for specific instantaneous error rates. Compared to existing approaches in the literature, the asymptotic expressions are related to the channel distribution in a much simpler manner herein, and related with outage more intuitively. The high-SNR asymptotic error rate is also characterized under diversity combining schemes with the channel power gain of each branch having a regularly varying CDF. Numerical results are shown to corroborate our theoretical analysis. This dissertation studies several problems concerning channel inclusion, which is a partial ordering between discrete memoryless channels (DMCs) proposed by Shannon. Specifically, majorization-based conditions are derived for channel inclusion between certain DMCs. Furthermore, under general conditions, channel equivalence defined through Shannon ordering is shown to be the same as permutation of input and output symbols. The determination of channel inclusion is considered as a convex optimization problem, and the sparsity of the weights related to the representation of the worse DMC in terms of the better one is revealed when channel inclusion holds between two DMCs. For the exploitation of this sparsity, an effective iterative algorithm is established based on modifying the orthogonal matching pursuit algorithm. The extension of channel inclusion to continuous channels and its application in ordering phase noises are briefly addressed.

ContributorsZhang, Yuan (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Zhang, Junshan (Committee member) / Reisslein, Martin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created2013

Understanding the processing of degraded speech: electroencephalographic measures as a surrogate for recovery from concussion

Description

The recent spotlight on concussion has illuminated deficits in the current standard of care with regard to addressing acute and persistent cognitive signs and symptoms of mild brain injury. This stems, in part, from the diffuse nature of the injury, which tends not to produce focal cognitive or behavioral deficits…

The recent spotlight on concussion has illuminated deficits in the current standard of care with regard to addressing acute and persistent cognitive signs and symptoms of mild brain injury. This stems, in part, from the diffuse nature of the injury, which tends not to produce focal cognitive or behavioral deficits that are easily identified or tracked. Indeed it has been shown that patients with enduring symptoms have difficulty describing their problems; therefore, there is an urgent need for a sensitive measure of brain activity that corresponds with higher order cognitive processing. The development of a neurophysiological metric that maps to clinical resolution would inform decisions about diagnosis and prognosis, including the need for clinical intervention to address cognitive deficits. The literature suggests the need for assessment of concussion under cognitively demanding tasks. Here, a joint behavioral- high-density electroencephalography (EEG) paradigm was employed. This allows for the examination of cortical activity patterns during speech comprehension at various levels of degradation in a sentence verification task, imposing the need for higher-order cognitive processes. Eight participants with concussion listened to true-false sentences produced with either moderately to highly intelligible noise-vocoders. Behavioral data were simultaneously collected. The analysis of cortical activation patterns included 1) the examination of event-related potentials, including latency and source localization, and 2) measures of frequency spectra and associated power. Individual performance patterns were assessed during acute injury and a return visit several months following injury. Results demonstrate a combination of task-related electrophysiology measures correspond to changes in task performance during the course of recovery. Further, a discriminant function analysis suggests EEG measures are more sensitive than behavioral measures in distinguishing between individuals with concussion and healthy controls at both injury and recovery, suggesting the robustness of neurophysiological measures during a cognitively demanding task to both injury and persisting pathophysiology.

ContributorsUtianski, Rene (Author) / Liss, Julie M (Thesis advisor) / Berisha, Visar (Committee member) / Caviness, John N (Committee member) / Dorman, Michael (Committee member) / Arizona State University (Publisher)

Created2014

Large-scale wireless networks: stochastic geometry and ordering

Description

Recently, the location of the nodes in wireless networks has been modeled as point processes. In this dissertation, various scenarios of wireless communications in large-scale networks modeled as point processes are considered. The first part of the dissertation considers signal reception and detection problems with symmetric alpha stable noise which…

Recently, the location of the nodes in wireless networks has been modeled as point processes. In this dissertation, various scenarios of wireless communications in large-scale networks modeled as point processes are considered. The first part of the dissertation considers signal reception and detection problems with symmetric alpha stable noise which is from an interfering network modeled as a Poisson point process. For the signal reception problem, the performance of space-time coding (STC) over fading channels with alpha stable noise is studied. We derive pairwise error probability (PEP) of orthogonal STCs. For general STCs, we propose a maximum-likelihood (ML) receiver, and its approximation. The resulting asymptotically optimal receiver (AOR) does not depend on noise parameters and is computationally simple, and close to the ML performance. Then, signal detection in coexisting wireless sensor networks (WSNs) is considered. We define a binary hypothesis testing problem for the signal detection in coexisting WSNs. For the problem, we introduce the ML detector and simpler alternatives. The proposed mixed-fractional lower order moment (FLOM) detector is computationally simple and close to the ML performance. Stochastic orders are binary relations defined on probability. The second part of the dissertation introduces stochastic ordering of interferences in large-scale networks modeled as point processes. Since closed-form results for the interference distributions for such networks are only available in limited cases, it is of interest to compare network interferences using stochastic. In this dissertation, conditions on the fading distribution and path-loss model are given to establish stochastic ordering between interferences. Moreover, Laplace functional (LF) ordering is defined between point processes and applied for comparing interference. Then, the LF orderings of general classes of point processes are introduced. It is also shown that the LF ordering is preserved when independent operations such as marking, thinning, random translation, and superposition are applied. The LF ordering of point processes is a useful tool for comparing spatial deployments of wireless networks and can be used to establish comparisons of several performance metrics such as coverage probability, achievable rate, and resource allocation even when closed form expressions for such metrics are unavailable.

ContributorsLee, Junghoon (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Committee member) / Reisslein, Martin (Committee member) / Kosut, Oliver (Committee member) / Arizona State University (Publisher)

Created2014

Geometry aware compressive analysis of human activities: application in a smart phone platform

Description

Continuous monitoring of sensor data from smart phones to identify human activities and gestures, puts a heavy load on the smart phone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smart phone is utilized to perform compressive analysis and efficient…

Continuous monitoring of sensor data from smart phones to identify human activities and gestures, puts a heavy load on the smart phone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smart phone is utilized to perform compressive analysis and efficient classification of human activities by employing machine learning techniques. We are interested in the generalization of classical tools for signal approximation to newer spaces, such as rotation data, which is best studied in a non-Euclidean setting, and its application to activity analysis. Attributing to the non-linear nature of the rotation data space, which involve a heavy overload on the smart phone's processor and memory as opposed to feature extraction on the Euclidean space, indexing and compaction of the acquired sensor data is performed prior to feature extraction, to reduce CPU overhead and thereby increase the lifetime of the battery with a little loss in recognition accuracy of the activities. The sensor data represented as unit quaternions, is a more intrinsic representation of the orientation of smart phone compared to Euler angles (which suffers from Gimbal lock problem) or the computationally intensive rotation matrices. Classification algorithms are employed to classify these manifold sequences in the non-Euclidean space. By performing customized indexing (using K-means algorithm) of the evolved manifold sequences before feature extraction, considerable energy savings is achieved in terms of smart phone's battery life.

ContributorsSivakumar, Aswin (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2014

Audiovisual perception of dysarthric speech in older adults compared to younger adults

Description

Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores where the influence of visual information is dependent upon age. Forty adults…

Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores where the influence of visual information is dependent upon age. Forty adults participated in the study that measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups: older adults (age range 47 to 68) and young adults (age range 19 to 36) to examine the influence of age. Findings revealed that all participants, regardless of age, improved their ability to recognize dysarthric speech when visual speech was added to the auditory signal. The magnitude of this benefit, however, was greater for older adults when compared with younger adults. These results inform our understanding of how visual speech information influences understanding of dysarthric speech.

ContributorsFall, Elizabeth (Author) / Liss, Julie (Thesis advisor) / Berisha, Visar (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)

Created2014

Joint radar-communications performance bounds: data versus estimation information rates

Description

The problem of cooperative radar and communications signaling is investigated. Each system typically considers the other system a source of interference. Consequently, the tradition is to have them operate in orthogonal frequency bands. By considering the radar and communications operations to be a single joint system, performance bounds on a…

The problem of cooperative radar and communications signaling is investigated. Each system typically considers the other system a source of interference. Consequently, the tradition is to have them operate in orthogonal frequency bands. By considering the radar and communications operations to be a single joint system, performance bounds on a receiver that observes communications and radar return in the same frequency allocation are derived. Bounds in performance of the joint system is measured in terms of data information rate for communications and radar estimation information rate for the radar. Inner bounds on performance are constructed.

ContributorsChiriyath, Alex (Author) / Bliss, Daniel W (Thesis advisor) / Kosut, Oliver (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2014

Head rotation detection in marmoset monkeys

Description

Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical…

Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical rotation of head movement in marmoset monkeys.

Experiments were conducted in a sound-attenuated acoustic chamber. Head movement of marmoset monkey was studied under various auditory and visual stimulation conditions. With increasing complexity, these conditions are (1) idle, (2) sound-alone, (3) sound and visual signals, and (4) alert signal by opening and closing of the chamber door. All of these conditions were tested with either house light on or off. Infra-red camera with a frame rate of 90 Hz was used to capture of the head movement of monkeys. To assist the signal detection, two circular markers were attached to the top of monkey head. The data analysis used an image-based marker detection scheme. Images were processed using the Computation Vision Toolbox in Matlab. The markers and their positions were detected using blob detection techniques. Based on the frame-by-frame information of marker positions, the angular position, velocity and acceleration were extracted in horizontal and vertical planes. Adaptive Otsu Thresholding, Kalman filtering and bound setting for marker properties were used to overcome a number of challenges encountered during this analysis, such as finding image segmentation threshold, continuously tracking markers during large head movement, and false alarm detection.

The results show that the blob detection method together with Kalman filtering yielded better performances than other image based techniques like optical flow and SURF features .The median of the maximal head turn in the horizontal plane was in the range of 20 to 70 degrees and the median of the maximal velocity in horizontal plane was in the range of a few hundreds of degrees per second. In comparison, the natural alert signal - door opening and closing - evoked the faster head turns than other stimulus conditions. These results suggest that behaviorally relevant stimulus such as alert signals evoke faster head-turn responses in marmoset monkeys.

ContributorsSimhadri, Sravanthi (Author) / Zhou, Yi (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2014

Theses and Dissertations