Search Content

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a…

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

ContributorsChakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Enhancing movie comprehension for individuals who are visually impaired or blind

Description

Typically, the complete loss or severe impairment of a sense such as vision and/or hearing is compensated through sensory substitution, i.e., the use of an alternative sense for receiving the same information. For individuals who are blind or visually impaired, the alternative senses have predominantly been hearing and touch. For…

Typically, the complete loss or severe impairment of a sense such as vision and/or hearing is compensated through sensory substitution, i.e., the use of an alternative sense for receiving the same information. For individuals who are blind or visually impaired, the alternative senses have predominantly been hearing and touch. For movies, visual content has been made accessible to visually impaired viewers through audio descriptions -- an additional narration that describes scenes, the characters involved and other pertinent details. However, as audio descriptions should not overlap with dialogue, sound effects and musical scores, there is limited time to convey information, often resulting in stunted and abridged descriptions that leave out many important visual cues and concepts. This work proposes a promising multimodal approach to sensory substitution for movies by providing complementary information through haptics, pertaining to the positions and movements of actors, in addition to a film's audio description and audio content. In a ten-minute presentation of five movie clips to ten individuals who were visually impaired or blind, the novel methodology was found to provide an almost two time increase in the perception of actors' movements in scenes. Moreover, participants appreciated and found useful the overall concept of providing a visual perspective to film through haptics.

ContributorsViswanathan, Lakshmie Narayan (Author) / Panchanathan, Sethuraman (Thesis advisor) / Hedgpeth, Terri (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2011

Exploring the Design of Vibrotactile Cues for Visio-Haptic Sensory Substitution

Description

This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of…

This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of a chair to provide vibrotactile stimulation in the context of a dyadic (one-on-one) interaction across a table. This work explores the design of spatiotemporal vibration patterns that can be used to convey the basic building blocks of facial movements according to the Facial Action Unit Coding System. A behavioral study was conducted to explore the factors that influence the naturalness of conveying affect using vibrotactile cues.

ContributorsBala, Shantanu (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Department of Psychology (Contributor)

Created2014-05

Mediated social interpersonal communication: evidence-based understanding of multimedia solutions for enriching social situational awareness

Description

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication including both verbal…

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication including both verbal and non-verbal cues. While the verbal cues are instructive and delivered through speech, the non-verbal cues are mostly interpretive and requires the full attention of the participants to understand, comprehend and respond to them appropriately. Unfortunately certain situations are not conducive for a person to have complete access to their social surroundings, especially the non-verbal cues. For example, a person is who is blind or visually impaired may find that the non-verbal cues like smiling, head nod, eye contact, body gestures and facial expressions of their interaction partners are not accessible due to their sensory deprivation. The same could be said of people who are remotely engaged in a conversation and physically separated to have a visual access to one's body and facial mannerisms. This dissertation describes novel multimedia technologies to aid situations where it is necessary to mediate social situational information between interacting participants. As an example of the proposed system, an evidence-based model for understanding the accessibility problem faced by people who are blind or visually impaired is described in detail. From the derived model, a sleuth of sensing and delivery technologies that use state-of-the-art computer vision algorithms in combination with novel haptic interfaces are developed towards a) A Dyadic Interaction Assistant, capable of helping individuals who are blind to access important head and face based non-verbal communicative cues during one-on-one dyadic interactions, and b) A Group Interaction Assistant, capable of provide situational awareness about the interaction partners and their dynamics to a user who is blind, while also providing important social feedback about their own body mannerisms. The goal is to increase the effective social situational information that one has access to, with the conjuncture that a good awareness of one's social surroundings gives them the ability to understand and empathize with their interaction partners better. Extending the work from an important social interaction assistive technology, the need for enriched social situational awareness is everyday professional situations are also discussed, including, a) enriched remote interactions between physically separated interaction partners, and b) enriched communication between medical professionals during critical care procedures, towards enhanced patient safety. In the concluding remarks, this dissertation engages the readers into a science and technology policy discussion on the potential effect of a new technology like the social interaction assistant on the society. Discussing along the policy lines, social disability is highlighted as an important area that requires special attention from researchers and policy makers. Given that the proposed technology relies on wearable inconspicuous cameras, the discussion of privacy policies is extended to encompass newly evolving interpersonal interaction recorders, like the one presented in this dissertation.

ContributorsKrishna, Sreekar (Author) / Panchanathan, Sethuraman (Thesis advisor) / Black, John A. (Committee member) / Qian, Gang (Committee member) / Li, Baoxin (Committee member) / Shiota, Michelle (Committee member) / Arizona State University (Publisher)

Created2011

Conformal predictions in multimedia pattern recognition

Description

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about…

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about a decision, or when they are not. Machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence on a prediction such as learning algorithms based on the Bayesian theory or the Probably Approximately Correct theory require strong assumptions or often produce results that are not practical or reliable. The recently developed Conformal Predictions (CP) framework - which is based on the principles of hypothesis testing, transductive inference and algorithmic randomness - provides a game-theoretic approach to the estimation of confidence with several desirable properties such as online calibration and generalizability to all classification and regression methods. This dissertation builds on the CP theory to compute reliable confidence measures that aid decision-making in real-world problems through: (i) Development of a methodology for learning a kernel function (or distance metric) for optimal and accurate conformal predictors; (ii) Validation of the calibration properties of the CP framework when applied to multi-classifier (or multi-regressor) fusion; and (iii) Development of a methodology to extend the CP framework to continuous learning, by using the framework for online active learning. These contributions are validated on four real-world problems from the domains of healthcare and assistive technologies: two classification-based applications (risk prediction in cardiac decision support and multimodal person recognition), and two regression-based applications (head pose estimation and saliency prediction in images). The results obtained show that: (i) multiple kernel learning can effectively increase efficiency in the CP framework; (ii) quantile p-value combination methods provide a viable solution for fusion in the CP framework; and (iii) eigendecomposition of p-value difference matrices can serve as effective measures for online active learning; demonstrating promise and potential in using these contributions in multimedia pattern recognition problems in real-world settings.

ContributorsNallure Balasubramanian, Vineeth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Vovk, Vladimir (Committee member) / Arizona State University (Publisher)

Created2010

Modern Sensory Substitution for Vision in Dynamic Environments

Description

Societal infrastructure is built with vision at the forefront of daily life. For those with

severe visual impairments, this creates countless barriers to the participation and

enjoyment of life’s opportunities. Technological progress has been both a blessing and

a curse in this regard. Digital text together with screen readers and refreshable Braille

displays have…

Societal infrastructure is built with vision at the forefront of daily life. For those with

severe visual impairments, this creates countless barriers to the participation and

enjoyment of life’s opportunities. Technological progress has been both a blessing and

a curse in this regard. Digital text together with screen readers and refreshable Braille

displays have made whole libraries readily accessible and rideshare tech has made

independent mobility more attainable. Simultaneously, screen-based interactions and

experiences have only grown in pervasiveness and importance, precluding many of

those with visual impairments.

Sensory Substituion, the process of substituting an unavailable modality with

another one, has shown promise as an alternative to accomodation, but in recent

years meaningful strides in Sensory Substitution for vision have declined in frequency.

Given recent advances in Computer Vision, this stagnation is especially disconcerting.

Designing Sensory Substitution Devices (SSDs) for vision for use in interactive settings

that leverage modern Computer Vision techniques presents a variety of challenges

including perceptual bandwidth, human-computer-interaction, and person-centered

machine learning considerations. To surmount these barriers an approach called Per-

sonal Foveated Haptic Gaze (PFHG), is introduced. PFHG consists of two primary

components: a human visual system inspired interaction paradigm that is intuitive

and flexible enough to generalize to a variety of applications called Foveated Haptic

Gaze (FHG), and a person-centered learning component to address the expressivity

limitations of most SSDs. This component is called One-Shot Object Detection by

Data Augmentation (1SODDA), a one-shot object detection approach that allows a

user to specify the objects they are interested in locating visually and with minimal

effort realizing an object detection model that does so effectively.

The Personal Foveated Haptic Gaze framework was realized in a virtual and real-

world application: playing a 3D, interactive, first person video game (DOOM) and

finding user-specified real-world objects. User study results found Foveated Haptic

Gaze to be an effective and intuitive interface for interacting with dynamic visual

world using solely haptics. Additionally, 1SODDA achieves competitive performance

among few-shot object detection methods and high-framerate many-shot object de-

tectors. The combination of which paves the way for modern Sensory Substitution

Devices for vision.

ContributorsFakhri, Bijan (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy L (Committee member) / Venkateswara, Hemanth (Committee member) / Amor, Heni (Committee member) / Arizona State University (Publisher)

Created2020

Filtering by

Batch mode active learning for multimedia pattern recognition

Enhancing movie comprehension for individuals who are visually impaired or blind

Exploring the Design of Vibrotactile Cues for Visio-Haptic Sensory Substitution

Mediated social interpersonal communication: evidence-based understanding of multimedia solutions for enriching social situational awareness

Conformal predictions in multimedia pattern recognition

Modern Sensory Substitution for Vision in Dynamic Environments