Let's Talk Monkey- Quantitative Analysis of Marmoset Monkey Calls

Description

The marmoset monkey (Callithrix jacchus) is a New World primate species native to South American rainforests. Because marmosets rely on vocal communication to navigate and survive, they have emerged as a promising primate model for studying vocal production, perception, cognition, and social interaction. The purpose of this project is to provide an initial assessment of the vocal repertoire of a marmoset colony raised at Arizona State University and of the call types the animals use in different social conditions. The vocalizations of a colony of 16 marmoset monkeys were recorded in three conditions, with three repeats of each condition. In the positive condition, a caretaker distributes food; in the negative condition, an experimenter takes a marmoset out of his cage to a different room; and the control condition is the normal state of the colony with no human interference. A total of 5396 call samples were collected from 256 minutes of audio recordings. Call types were analyzed using semi-automated computer programs developed in the Laboratory of Auditory Computation and Neurophysiology. Five major call types were identified, and their variants in the different social conditions were analyzed. The results showed that both the total number of calls and the types of calls differed across the three social conditions, suggesting that marmoset vocalizations signal, and depend on, social context.

Contributors: Fernandez, Jessmin Natalie (Author) / Zhou, Yi (Thesis director) / Berisha, Visar (Committee member) / School of International Letters and Cultures (Contributor) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Marmoset Calls Labeling

Description

Callithrix jacchus, also known as the common marmoset, is native to the New World. These marmosets possess a wide vocal repertoire that is worth studying to understand their group communication and their fight-or-flight responses to their environment. In this project, I continued the work of a previous student, Jasmin, to collect more data for her study. For the most part, my project entailed recording the marmosets’ calls and labeling them by type.

Contributors: Tran, Anh (Author) / Zhou, Yi (Thesis director) / Berisha, Visar (Committee member) / Barrett, The Honors College (Contributor)

Created: 2021-05

Music-Remixing Preferences of Prelingual and Postlingual Cochlear Implant Users

Description

The poor spectral and temporal resolution of cochlear implants (CIs) limits their users’ music enjoyment. Remixing music by boosting vocals while attenuating spectrally complex instruments has been shown to improve music enjoyment for postlingually deaf CI users. However, the effectiveness of music remixing for prelingually deaf CI users is still unknown. This study compared the music-remixing preferences of nine postlingually deaf, late-implanted CI users and seven prelingually deaf, early-implanted CI users, as well as their ratings of song familiarity and vocal pleasantness. Twelve songs were selected for testing from the most-streamed tracks on Spotify. Each song was presented in six versions: Original, Music-6 (6-dB attenuation of all instruments), Music-12 (12-dB attenuation of all instruments), Music-3-3-12 (3-dB attenuation of bass and drums and 12-dB attenuation of other instruments), Vocals-6 (6-dB attenuation of vocals), and Vocals-12 (12-dB attenuation of vocals). The prelingual group preferred the Music-6 and Original versions over the other versions, while the postlingual group preferred the Vocals-12 version over the Music-12 version. The prelingual group was more familiar with the songs than the postlingual group; however, song familiarity did not significantly affect the patterns of preference ratings in either group. The prelingual group also gave higher vocal pleasantness ratings than the postlingual group. For the prelingual group, higher vocal pleasantness led to higher preference ratings for the Music-12 version. For the postlingual group, the overall preference for the Vocals-12 version was driven by preference ratings for songs with very unpleasant vocals. These results suggest that the patient factor of auditory experience and the stimulus factor of vocal pleasantness may affect the music-remixing preferences of CI users. As such, the music-remixing strategy needs to be customized for individual patients and songs.
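The dB attenuations that define the six versions map to linear amplitude gains via gain = 10^(-dB/20), so a 6-dB cut scales a stem's amplitude by about 0.5 and a 12-dB cut by about 0.25. The Python sketch below applies the presets to separated stems. It assumes the vocals, bass, drums, and other instruments are already available as separate, time-aligned NumPy arrays; the stem names and the `remix` helper are hypothetical illustrations, not the study's processing pipeline.

```python
import numpy as np

def db_to_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude gain (6 dB -> about 0.501)."""
    return 10.0 ** (-attenuation_db / 20.0)

# Attenuation (dB) per stem for each version described above.
# The stem names are hypothetical; how the stems were separated is not described here.
REMIX_PRESETS = {
    "Original":     {"vocals": 0,  "bass": 0,  "drums": 0,  "other": 0},
    "Music-6":      {"vocals": 0,  "bass": 6,  "drums": 6,  "other": 6},
    "Music-12":     {"vocals": 0,  "bass": 12, "drums": 12, "other": 12},
    "Music-3-3-12": {"vocals": 0,  "bass": 3,  "drums": 3,  "other": 12},
    "Vocals-6":     {"vocals": 6,  "bass": 0,  "drums": 0,  "other": 0},
    "Vocals-12":    {"vocals": 12, "bass": 0,  "drums": 0,  "other": 0},
}

def remix(stems: dict, preset: str) -> np.ndarray:
    """Scale each time-aligned stem by its preset gain and sum them into one mix."""
    attenuation = REMIX_PRESETS[preset]
    mix = np.zeros_like(next(iter(stems.values())), dtype=float)
    for name, signal in stems.items():
        mix += db_to_gain(attenuation[name]) * signal
    return mix

# Usage with dummy constant stems (real stems would be arrays of audio samples).
stems = {name: np.ones(4) for name in ("vocals", "bass", "drums", "other")}
mix = remix(stems, "Vocals-12")  # each sample = 3 + 10**(-12/20), about 3.251
```

Working in linear amplitude and summing stems keeps each version a simple weighted mix, so all six versions can be generated from one set of stems.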

Contributors: Vecellio, Amanda Paige (Author) / Luo, Xin (Thesis advisor) / Ringenbach, Shannon (Committee member) / Berisha, Visar (Committee member) / Zhou, Yi (Committee member) / Arizona State University (Publisher)

Created: 2024

Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

Description

Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their use on resource-constrained edge devices has been limited by their high computational cost and large memory requirements.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works have applied these techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, through a combined exploration of quantization and structured compression settings, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression. The optimal DNN model achieves a 50X weight memory reduction compared to an uncompressed floating-point DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with varying levels of memory savings depending on DNN precision and sparsity level. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated in terms of overall DNN memory, which includes both activation memory and weight memory. For the optimal designs corresponding to low-sparsity DNNs, including activation memory changes the memory footprint only slightly; for high-sparsity DNNs, however, activation memory cannot be ignored.
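The Pareto-optimal extraction step can be illustrated with a generic two-objective filter: a design is kept only if no other design has both lower (or equal) memory and higher (or equal) accuracy, with at least one strict improvement. This Python sketch assumes memory and accuracy are the two objectives; the `Design` class, the candidate names, and their numbers are hypothetical placeholders, not results from the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    name: str
    memory_kb: float  # memory footprint (lower is better)
    accuracy: float   # e.g. classification accuracy in percent (higher is better)

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.memory_kb <= b.memory_kb and a.accuracy >= b.accuracy
    strictly_better = a.memory_kb < b.memory_kb or a.accuracy > b.accuracy
    return no_worse and strictly_better

def pareto_front(designs: list) -> list:
    """Keep only designs not dominated by any other design (simple O(n^2) scan)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other is not d)]

# Hypothetical candidates, for illustration only.
candidates = [
    Design("A", 100.0, 90.0),
    Design("B", 50.0, 88.0),
    Design("C", 60.0, 85.0),   # dominated by B: more memory, lower accuracy
    Design("D", 200.0, 89.0),  # dominated by A: more memory, lower accuracy
]
front = pareto_front(candidates)
print([d.name for d in front])  # ['A', 'B']
```

The quadratic scan is adequate for a few hundred candidates; a sort-based sweep over memory would scale better for a much larger design space.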

Contributors: Srivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2018

Head rotation detection in marmoset monkeys

Description

Head movement is known to improve the accuracy of sound localization in humans and animals. The marmoset is a small-bodied New World monkey species that has become an emerging model for studying auditory function. This thesis aims to detect the horizontal and vertical rotation of head movements in marmoset monkeys.

Experiments were conducted in a sound-attenuated acoustic chamber. Head movements of marmoset monkeys were studied under various auditory and visual stimulation conditions. In order of increasing complexity, these conditions were (1) idle, (2) sound alone, (3) sound and visual signals, and (4) an alert signal produced by opening and closing the chamber door. All conditions were tested with the house light either on or off. An infrared camera with a frame rate of 90 Hz was used to capture the monkeys’ head movements. To assist signal detection, two circular markers were attached to the top of each monkey’s head. The data analysis used an image-based marker detection scheme: images were processed using the Computer Vision System Toolbox in MATLAB, and the markers and their positions were detected using blob detection techniques. From the frame-by-frame marker positions, the angular position, velocity, and acceleration were extracted in the horizontal and vertical planes. Adaptive Otsu thresholding, Kalman filtering, and bounds on marker properties were used to overcome several challenges encountered during this analysis, such as finding the image segmentation threshold, continuously tracking markers during large head movements, and rejecting false detections.
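Kalman filtering lets a tracker predict a marker's position in frames where blob detection misses it during fast head movements. Below is a minimal constant-velocity Kalman filter in Python for one marker's (x, y) centroid, sketched under assumptions: the time step of 1/90 s follows from the 90 Hz camera, but the state model, the noise parameters `q` and `r`, the initial covariance, and the `MarkerKalman` class itself are illustrative choices, not the thesis's MATLAB implementation.

```python
import numpy as np

class MarkerKalman:
    """Constant-velocity Kalman filter for one marker's (x, y) pixel position (illustrative)."""

    def __init__(self, x0, y0, dt=1 / 90, q=1.0, r=4.0):
        self.x = np.array([x0, y0, 0.0, 0.0])           # state: [px, py, vx, vy]
        self.P = np.eye(4) * 100.0                       # assumed initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is measured
        self.Q = np.eye(4) * q                           # simplified process noise (assumed)
        self.R = np.eye(2) * r                           # measurement noise in px^2 (assumed)

    def predict(self):
        """Propagate the state one frame ahead and grow its uncertainty."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Correct the prediction with a detected blob centroid z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R           # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()

    def step(self, z=None):
        """Predict, then update only if the marker was detected in this frame."""
        predicted = self.predict()
        return predicted if z is None else self.update(z)

# Usage: None marks a frame in which blob detection missed the marker.
kf = MarkerKalman(x0=10.0, y0=20.0)
for z in [(10.0, 20.0), None, (10.0, 20.0)]:
    estimate = kf.step(z)
```

On a missed frame the filter coasts on its velocity estimate, which is what keeps the track continuous through brief detection failures.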

The results show that the blob detection method combined with Kalman filtering performed better than other image-based techniques such as optical flow and SURF features. The median of the maximal head turn in the horizontal plane ranged from 20 to 70 degrees, and the median of the maximal velocity in the horizontal plane was a few hundred degrees per second. The natural alert signal (door opening and closing) evoked faster head turns than the other stimulus conditions. These results suggest that behaviorally relevant stimuli such as alert signals evoke faster head-turn responses in marmoset monkeys.
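Summary measures such as the maximal head turn and maximal angular velocity can be derived from per-frame marker positions. The Python sketch below assumes a top-down camera view with one marker at the front and one at the back of the head, so the horizontal (yaw) angle is the orientation of the line joining them; that marker layout and the helper names `head_yaw_deg` and `head_turn_stats` are assumptions, and vertical rotation is not covered.

```python
import numpy as np

FRAME_RATE_HZ = 90.0  # camera frame rate stated above

def head_yaw_deg(front_xy: np.ndarray, back_xy: np.ndarray) -> np.ndarray:
    """Yaw angle (deg) of the back-to-front marker line for each frame.

    front_xy and back_xy have shape (n_frames, 2) in image pixel coordinates.
    """
    d = front_xy - back_xy
    angle = np.arctan2(d[:, 1], d[:, 0])
    return np.degrees(np.unwrap(angle))  # unwrap removes jumps at +/-180 degrees

def head_turn_stats(yaw_deg: np.ndarray, fs: float = FRAME_RATE_HZ):
    """Return (maximal head turn in deg, peak angular speed in deg/s) for one trial."""
    max_turn = np.max(np.abs(yaw_deg - yaw_deg[0]))  # excursion from the starting angle
    velocity = np.gradient(yaw_deg, 1.0 / fs)         # finite-difference angular velocity
    return max_turn, np.max(np.abs(velocity))

# Synthetic trial: the head turns 1 degree per frame from 0 to 45 degrees.
angles = np.radians(np.linspace(0.0, 45.0, 46))
back = np.zeros((46, 2))
front = np.stack([10 * np.cos(angles), 10 * np.sin(angles)], axis=1)
turn, peak_speed = head_turn_stats(head_yaw_deg(front, back))  # about 45 deg and 90 deg/s
```

In practice the marker positions would come from the filtered tracks rather than raw detections, since differentiating noisy positions amplifies noise in the velocity estimate.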

Contributors: Simhadri, Sravanthi (Author) / Zhou, Yi (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2014

Context recognition methods using audio signals for human-machine interaction

Description

Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because of the subjective nature of such information, and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, both for speech-based applications involving emotion recognition and keyword detection and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures, so features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches on multiple databases. Cross-corpus studies are conducted to determine how well these features generalize across databases. The proposed method is also applied to derive features from facial expressions; multimodal fusion overcomes the deficiencies of a speech-only approach and further improves recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence on speech articulation kinematics. A learning approach is proposed that constrains a classifier trained on acoustic descriptors to also model articulatory data. This method requires articulatory information only during training, thus overcoming the challenges inherent to large-scale data collection, while exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. Its performance is evaluated on real-world data and accompanied by a prototype Android-based user interface.

The proposed methods are also assessed in terms of computational and implementation complexity. Software and field-programmable gate array (FPGA) implementations are considered for emotion recognition, while virtual platforms are used to model the complexity of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.

Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2015
