This collection includes both ASU Theses and Dissertations, submitted by graduate students, and the Barrett, Honors College theses submitted by undergraduate students. 

Description
Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation arises mainly because semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Additional domain knowledge is also often indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging the corresponding domain knowledge are vital for associating high-level semantics with low-level signals more accurately in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information/domain knowledge in different forms for multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, a spatial prior for the underlying shapes; cross-modality correlations obtained via Kernel Canonical Correlation Analysis are explored, and the learnt space is then used for associating multimedia contents in different forms; and contextual information modeled as a graph is leveraged for regulating interactions among high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, including visual-to-tactile face conversion, photo tag recommendation, wild web video classification and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing. 
Results from experiments on large-scale real-world data with comparisons to other state-of-the-art methods and subjective evaluations with end users confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information/domain knowledge for a wide range of multimedia computing and pattern recognition problems.
Contributors: Wang, Zhesheng (Author) / Li, Baoxin (Thesis advisor) / Sundaram, Hari (Committee member) / Qian, Gang (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors, is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding one's social interactions. These interactions are formed out of the elements of human interpersonal communication, including both verbal and non-verbal cues. While the verbal cues are instructive and delivered through speech, the non-verbal cues are mostly interpretive and require the full attention of the participants to understand, comprehend and respond to them appropriately. Unfortunately, certain situations are not conducive to a person having complete access to their social surroundings, especially the non-verbal cues. For example, a person who is blind or visually impaired may find that non-verbal cues such as smiling, head nods, eye contact, body gestures and facial expressions of their interaction partners are not accessible due to their sensory deprivation. The same could be said of people who are remotely engaged in a conversation and are physically separated, without visual access to each other's body and facial mannerisms. This dissertation describes novel multimedia technologies to aid situations where it is necessary to mediate social situational information between interacting participants. As an example of the proposed system, an evidence-based model for understanding the accessibility problem faced by people who are blind or visually impaired is described in detail. 
From the derived model, a slew of sensing and delivery technologies that use state-of-the-art computer vision algorithms in combination with novel haptic interfaces are developed towards a) a Dyadic Interaction Assistant, capable of helping individuals who are blind to access important head- and face-based non-verbal communicative cues during one-on-one dyadic interactions, and b) a Group Interaction Assistant, capable of providing situational awareness about the interaction partners and their dynamics to a user who is blind, while also providing important social feedback about their own body mannerisms. The goal is to increase the effective social situational information that one has access to, with the conjecture that a good awareness of one's social surroundings gives one the ability to understand and empathize with one's interaction partners better. Extending the work from an important social interaction assistive technology, the need for enriched social situational awareness in everyday professional situations is also discussed, including a) enriched remote interactions between physically separated interaction partners, and b) enriched communication between medical professionals during critical care procedures, towards enhanced patient safety. In the concluding remarks, this dissertation engages readers in a science and technology policy discussion on the potential effect on society of a new technology like the social interaction assistant. Along these policy lines, social disability is highlighted as an important area that requires special attention from researchers and policy makers. Given that the proposed technology relies on inconspicuous wearable cameras, the discussion of privacy policies is extended to encompass newly evolving interpersonal interaction recorders, like the one presented in this dissertation.
Contributors: Krishna, Sreekar (Author) / Panchanathan, Sethuraman (Thesis advisor) / Black, John A. (Committee member) / Qian, Gang (Committee member) / Li, Baoxin (Committee member) / Shiota, Michelle (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This dissertation constructs a new computational processing framework to robustly and precisely quantify retinotopic maps based on their angle distortion properties. More generally, this framework solves the problem of how to robustly and precisely quantify (angle) distortions of noisy or incomplete (boundary-enclosed) two-dimensional surface-to-surface mappings. This framework builds upon the Beltrami Coefficient (BC) description of quasiconformal mappings, which directly quantifies local mapping (circles to ellipses) distortions between diffeomorphisms of boundary-enclosed plane domains homeomorphic to the unit disk. A new map called the Beltrami Coefficient Map (BCM) was constructed to describe distortions in retinotopic maps. The BCM can be used to fully reconstruct the original target surface (retinal visual field) of retinotopic maps. This dissertation also compared retinotopic maps in the visual processing cascade, which is a series of connected retinotopic maps responsible for visual data processing of physical images captured by the eyes. By comparing the BCM results from a large Human Connectome Project (HCP) retinotopic dataset (N=181), a new computational quasiconformal mapping description of the transformed retinal image as it passes through the cascade is proposed, which is not present in any current literature. The description applied to HCP data provided directly visible and quantifiable geometric properties of the cascade in a way that has not been observed before. Because retinotopic maps are generated from noisy in vivo functional magnetic resonance imaging (fMRI), quantifying them comes with a certain degree of uncertainty. To quantify the uncertainties in the quantification results, it is necessary to generate statistical models of retinotopic maps from their BCMs and raw fMRI signals. 
Considering that estimating retinotopic maps from real noisy fMRI time series data using the population receptive field (pRF) model is a time-consuming process, a convolutional neural network (CNN) was constructed and trained to predict pRF model parameters from real noisy fMRI data.
Contributors: Ta, Duyan Nguyen (Author) / Wang, Yalin (Thesis advisor) / Lu, Zhong-Lin (Committee member) / Hansford, Dianne (Committee member) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2022