Search Content

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a…

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

ContributorsChakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems.…

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variations or subject-based variability in physiological and biomedical data, which leads to difference in data distributions making the task of modeling these data, using traditional machine learning algorithms, complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject based variability in physiological and biomedical data and proposes person adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person adaptive method, for early detection of muscle fatigue using Surface Electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue from 0 to 1 level in a test subject, during isometric or dynamic contractions, at real-time. Besides subject based variability, biomedical image data also varies due to variations in their imaging techniques, leading to distribution differences between the image databases. Hence a classifier learned on one database may perform poorly on the other database. Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards design of automated adaptive systems for real world data.

ContributorsChattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created2013

Probabilistic topic models for human emotion analysis

Description

While discrete emotions like joy, anger, disgust etc. are quite popular, continuous

emotion dimensions like arousal and valence are gaining popularity within the research

community due to an increase in the availability of datasets annotated with these

emotions. Unlike the discrete emotions, continuous emotions allow modeling of subtle

and complex affect dimensions but are…

While discrete emotions like joy, anger, disgust etc. are quite popular, continuous

emotion dimensions like arousal and valence are gaining popularity within the research

community due to an increase in the availability of datasets annotated with these

emotions. Unlike the discrete emotions, continuous emotions allow modeling of subtle

and complex affect dimensions but are difficult to predict.

Dimension reduction techniques form the core of emotion recognition systems and

help create a new feature space that is more helpful in predicting emotions. But these

techniques do not necessarily guarantee a better predictive capability as most of them

are unsupervised, especially in regression learning. In emotion recognition literature,

supervised dimension reduction techniques have not been explored much and in this

work a solution is provided through probabilistic topic models. Topic models provide

a strong probabilistic framework to embed new learning paradigms and modalities.

In this thesis, the graphical structure of Latent Dirichlet Allocation has been explored

and new models tuned to emotion recognition and change detection have been built.

In this work, it has been shown that the double mixture structure of topic models

helps 1) to visualize feature patterns, and 2) to project features onto a topic simplex

that is more predictive of human emotions, when compared to popular techniques

like PCA and KernelPCA. Traditionally, topic models have been used on quantized

features but in this work, a continuous topic model called the Dirichlet Gaussian

Mixture model has been proposed. Evaluation of DGMM has shown that while modeling

videos, performance of LDA models can be replicated even without quantizing

the features. Until now, topic models have not been explored in a supervised context

of video analysis and thus a Regularized supervised topic model (RSLDA) that

models video and audio features is introduced. RSLDA learning algorithm performs

both dimension reduction and regularized linear regression simultaneously, and has outperformed supervised dimension reduction techniques like SPCA and Correlation

based feature selection algorithms. In a first of its kind, two new topic models, Adaptive

temporal topic model (ATTM) and SLDA for change detection (SLDACD) have

been developed for predicting concept drift in time series data. These models do not

assume independence of consecutive frames and outperform traditional topic models

in detecting local and global changes respectively.

ContributorsLade, Prasanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Balasubramanian, Vineeth N (Committee member) / Arizona State University (Publisher)

Created2015

Computational methods for perceptual training in radiology

Description

Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an…

Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists, as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs, as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs.

ContributorsAlzubaidi, Mohammad A (Author) / Panchanathan, Sethuraman (Thesis advisor) / Black, John A. (Committee member) / Ye, Jieping (Committee member) / Patel, Ameet (Committee member) / Arizona State University (Publisher)

Created2012

Somatic ABC's: a theoretical framework for designing, developing and evaluating the building blocks of touch-based information delivery

Description

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's…

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays and high cognitive load; which in turn can lead to a loss of situational awareness, increasing chances for life threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that it is our largest sensory organ with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations including high learning curves, limited applicability and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified, theoretical framework, inspired by natural, spoken language, is proposed called Somatic ABC's for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two very unique application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.

ContributorsMcDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created2012

Domain Adaptive Computational Models for Computer Vision

Description

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences…

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source domain is transferred to a target domain in the form of learned models and efficient feature representations.

The dissertation outlines novel domain adaptation approaches across different feature spaces; (i) a linear Support Vector Machine model for domain alignment; (ii) a nonlinear kernel based approach that embeds domain-aligned data for enhanced classification; (iii) a hierarchical model implemented using deep learning, that estimates domain-aligned hash values for the source and target data, and (iv) a proposal for a feature selection technique to reduce cross-domain disparity. These adaptation procedures are tested and validated across a range of computer vision applications like object classification, facial expression recognition, digit recognition, and activity recognition. The dissertation also provides a unique perspective of domain adaptation literature from the point-of-view of linear, nonlinear and hierarchical feature spaces. The dissertation concludes with a discussion on the future directions for research that highlight the role of domain adaptation in an era of rapid advancements in artificial intelligence.

ContributorsDemakethepalli Venkateswara, Hemanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Chakraborty, Shayok (Committee member) / Arizona State University (Publisher)

Created2017

A computational framework for wearable accelerometer based activity and gesture recognition

Description

Advances in the area of ubiquitous, pervasive and wearable computing have resulted in the development of low band-width, data rich environmental and body sensor networks, providing a reliable and non-intrusive methodology for capturing activity data from humans and the environments they inhabit. Assistive technologies that promote independent living amongst elderly…

Advances in the area of ubiquitous, pervasive and wearable computing have resulted in the development of low band-width, data rich environmental and body sensor networks, providing a reliable and non-intrusive methodology for capturing activity data from humans and the environments they inhabit. Assistive technologies that promote independent living amongst elderly and individuals with cognitive impairment are a major motivating factor for sensor-based activity recognition systems. However, the process of discerning relevant activity information from these sensor streams such as accelerometers is a non-trivial task and is an on-going research area. The difficulty stems from factors such as spatio-temporal variations in movement patterns induced by different individuals and contexts, sparse occurrence of relevant activity gestures in a continuous stream of irrelevant movements and the lack of real-world data for training learning algorithms. This work addresses these challenges in the context of wearable accelerometer-based simple activity and gesture recognition. The proposed computational framework utilizes discriminative classifiers for learning the spatio-temporal variations in movement patterns and demonstrates its effectiveness through a real-time simple activity recognition system and short duration, non- repetitive activity gesture recognition. Furthermore, it proposes adaptive discriminative threshold models trained only on relevant activity gestures for filtering irrelevant movement patterns in a continuous stream. These models are integrated into a gesture spotting network for detecting activity gestures involved in complex activities of daily living. The framework addresses the lack of real world data for training, by using auxiliary, yet related data samples for training in a transfer learning setting. Finally the problem of predicting activity tasks involved in the execution of a complex activity of daily living is described and a solution based on hierarchical Markov models is discussed and evaluated.

ContributorsChatapuram Krishnan, Narayanan (Author) / Panchanathan, Sethuraman (Thesis advisor) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Cook, Diane (Committee member) / Arizona State University (Publisher)

Created2010

Mediated social interpersonal communication: evidence-based understanding of multimedia solutions for enriching social situational awareness

Description

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication including both verbal…

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication including both verbal and non-verbal cues. While the verbal cues are instructive and delivered through speech, the non-verbal cues are mostly interpretive and requires the full attention of the participants to understand, comprehend and respond to them appropriately. Unfortunately certain situations are not conducive for a person to have complete access to their social surroundings, especially the non-verbal cues. For example, a person is who is blind or visually impaired may find that the non-verbal cues like smiling, head nod, eye contact, body gestures and facial expressions of their interaction partners are not accessible due to their sensory deprivation. The same could be said of people who are remotely engaged in a conversation and physically separated to have a visual access to one's body and facial mannerisms. This dissertation describes novel multimedia technologies to aid situations where it is necessary to mediate social situational information between interacting participants. As an example of the proposed system, an evidence-based model for understanding the accessibility problem faced by people who are blind or visually impaired is described in detail. From the derived model, a sleuth of sensing and delivery technologies that use state-of-the-art computer vision algorithms in combination with novel haptic interfaces are developed towards a) A Dyadic Interaction Assistant, capable of helping individuals who are blind to access important head and face based non-verbal communicative cues during one-on-one dyadic interactions, and b) A Group Interaction Assistant, capable of provide situational awareness about the interaction partners and their dynamics to a user who is blind, while also providing important social feedback about their own body mannerisms. The goal is to increase the effective social situational information that one has access to, with the conjuncture that a good awareness of one's social surroundings gives them the ability to understand and empathize with their interaction partners better. Extending the work from an important social interaction assistive technology, the need for enriched social situational awareness is everyday professional situations are also discussed, including, a) enriched remote interactions between physically separated interaction partners, and b) enriched communication between medical professionals during critical care procedures, towards enhanced patient safety. In the concluding remarks, this dissertation engages the readers into a science and technology policy discussion on the potential effect of a new technology like the social interaction assistant on the society. Discussing along the policy lines, social disability is highlighted as an important area that requires special attention from researchers and policy makers. Given that the proposed technology relies on wearable inconspicuous cameras, the discussion of privacy policies is extended to encompass newly evolving interpersonal interaction recorders, like the one presented in this dissertation.

ContributorsKrishna, Sreekar (Author) / Panchanathan, Sethuraman (Thesis advisor) / Black, John A. (Committee member) / Qian, Gang (Committee member) / Li, Baoxin (Committee member) / Shiota, Michelle (Committee member) / Arizona State University (Publisher)

Created2011

Conformal predictions in multimedia pattern recognition

Description

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about…

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about a decision, or when they are not. Machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence on a prediction such as learning algorithms based on the Bayesian theory or the Probably Approximately Correct theory require strong assumptions or often produce results that are not practical or reliable. The recently developed Conformal Predictions (CP) framework - which is based on the principles of hypothesis testing, transductive inference and algorithmic randomness - provides a game-theoretic approach to the estimation of confidence with several desirable properties such as online calibration and generalizability to all classification and regression methods. This dissertation builds on the CP theory to compute reliable confidence measures that aid decision-making in real-world problems through: (i) Development of a methodology for learning a kernel function (or distance metric) for optimal and accurate conformal predictors; (ii) Validation of the calibration properties of the CP framework when applied to multi-classifier (or multi-regressor) fusion; and (iii) Development of a methodology to extend the CP framework to continuous learning, by using the framework for online active learning. These contributions are validated on four real-world problems from the domains of healthcare and assistive technologies: two classification-based applications (risk prediction in cardiac decision support and multimodal person recognition), and two regression-based applications (head pose estimation and saliency prediction in images). The results obtained show that: (i) multiple kernel learning can effectively increase efficiency in the CP framework; (ii) quantile p-value combination methods provide a viable solution for fusion in the CP framework; and (iii) eigendecomposition of p-value difference matrices can serve as effective measures for online active learning; demonstrating promise and potential in using these contributions in multimedia pattern recognition problems in real-world settings.

ContributorsNallure Balasubramanian, Vineeth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Vovk, Vladimir (Committee member) / Arizona State University (Publisher)

Created2010

A Person-Centric Design Framework for At-Home Motor Learning in Serious Games

Description

In motor learning, real-time multi-modal feedback is a critical element in guided training. Serious games have been introduced as a platform for at-home motor training due to their highly interactive and multi-modal nature. This dissertation explores the design of a multimodal environment for at-home training in which an autonomous system…

In motor learning, real-time multi-modal feedback is a critical element in guided training. Serious games have been introduced as a platform for at-home motor training due to their highly interactive and multi-modal nature. This dissertation explores the design of a multimodal environment for at-home training in which an autonomous system observes and guides the user in the place of a live trainer, providing real-time assessment, feedback and difficulty adaptation as the subject masters a motor skill. After an in-depth review of the latest solutions in this field, this dissertation proposes a person-centric approach to the design of this environment, in contrast to the standard techniques implemented in related work, to address many of the limitations of these approaches. The unique advantages and restrictions of this approach are presented in the form of a case study in which a system entitled the "Autonomous Training Assistant" consisting of both hardware and software for guided at-home motor learning is designed and adapted for a specific individual and trainer.

In this work, the design of an autonomous motor learning environment is approached from three areas: motor assessment, multimodal feedback, and serious game design. For motor assessment, a 3-dimensional assessment framework is proposed which comprises of 2 spatial (posture, progression) and 1 temporal (pacing) domains of real-time motor assessment. For multimodal feedback, a rod-shaped device called the "Intelligent Stick" is combined with an audio-visual interface to provide feedback to the subject in three domains (audio, visual, haptic). Feedback domains are mapped to modalities and feedback is provided whenever the user's performance deviates from the ideal performance level by an adaptive threshold. Approaches for multi-modal integration and feedback fading are discussed. Finally, a novel approach for stealth adaptation in serious game design is presented. This approach allows serious games to incorporate motor tasks in a more natural way, facilitating self-assessment by the subject. An evaluation of three different stealth adaptation approaches are presented and evaluated using the flow-state ratio metric. The dissertation concludes with directions for future work in the integration of stealth adaptation techniques across the field of exergames.

ContributorsTadayon, Ramin (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Amresh, Ashish (Committee member) / Glenberg, Arthur (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2017

Filtering by