
Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real-world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating it with class labels is expensive in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing the human labeling effort needed to induce classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real-world multimedia pattern recognition applications.
Four major contributions are proposed in this work: (i) a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question; (ii) a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions; (iii) batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality; and (iv) an active matrix completion algorithm and its application to several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real-world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.
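
The batch selection step that contribution (iii) casts as an integer quadratic program can be illustrated with a small greedy sketch. It assumes per-instance uncertainty is measured by predictive entropy and redundancy by cosine similarity between feature vectors; both choices, the weight `lam`, and the helper names are hypothetical. It also swaps in a plain greedy heuristic for the dissertation's convex relaxations, so it carries none of their guaranteed bounds.

```python
import numpy as np


def entropy(probs):
    """Predictive entropy per row of an (n_samples, n_classes) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def greedy_batch(probs, features, k, lam=1.0):
    """Greedily pick k unlabeled indices, trading uncertainty against redundancy.

    Greedy approximation of
        max_x  u^T x - lam * sum_{i<j} S_ij x_i x_j   s.t. sum(x) = k, x in {0, 1},
    where u is entropy and S is cosine similarity (both illustrative choices).
    """
    unc = entropy(probs)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T  # pairwise cosine similarity

    selected = []
    remaining = list(range(len(unc)))
    for _ in range(min(k, len(unc))):
        best, best_score = None, -np.inf
        for i in remaining:
            # Marginal gain: own uncertainty minus similarity to the batch so far.
            score = unc[i] - lam * sim[i, selected].sum()
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

Raising `lam` pushes the batch toward mutually dissimilar instances; with `lam=0` the sketch reduces to picking the k most uncertain instances.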

Contributors: Chakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Grassmannian learning for facial expression recognition from video

Description

In this thesis we consider the problem of facial expression recognition (FER) from video sequences. Our method is based on subspace representations and Grassmann manifold based learning. We use Local Binary Patterns (LBP) at the frame level to represent the facial features. Next, we develop a model that represents the video sequence both in a lower dimensional expression subspace and as a linear dynamical system using an Autoregressive Moving Average (ARMA) model. As these subspaces lie on a Grassmann manifold, we use Grassmann manifold based learning techniques, such as kernel Fisher Discriminant Analysis (kernel-FDA) with Grassmann kernels, for classification. We consider six expressions, namely Angry (An), Disgust (Di), Fear (Fe), Happy (Ha), Sadness (Sa) and Surprise (Su), for classification. We perform experiments on the extended Cohn-Kanade (CK+) facial expression database to evaluate the expression recognition performance. Our method demonstrates good expression recognition performance, outperforming other state-of-the-art FER algorithms. We achieve an average recognition accuracy of 97.41% using a method based on the expression subspace, kernel-FDA and a Support Vector Machine (SVM) classifier. By using a simpler classifier, 1-Nearest Neighbor (1-NN), along with kernel-FDA, we achieve a recognition accuracy of 97.09%. We find that, when processing a group of 19 frames in a video sequence, LBP feature extraction requires the majority of the computation time (97%), about 1.662 seconds on an Intel Core i3 dual-core platform. However, when only 3 frames (onset, middle and peak) of a video sequence are used, the computational complexity is reduced by about 83.75% to 260 milliseconds, at the expense of a drop in recognition accuracy to 92.88%.
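
The frame-level LBP features and the subspace view of a video can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: it uses the basic 3x3, 256-bin LBP operator on grayscale frames, obtains an expression subspace from an SVD of the stacked per-frame histograms, and compares two videos with the projection kernel, one common Grassmann kernel. The function names and the subspace dimension are hypothetical, and the ARMA-based representation and the kernel-FDA stage are not shown.

```python
import numpy as np

# 8 neighbours of a 3x3 window, clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def video_subspace(frames, dim):
    """Orthonormal basis (256 x dim) spanning the per-frame LBP histograms."""
    features = np.column_stack([lbp_histogram(f) for f in frames])
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :dim]


def projection_kernel(y1, y2):
    """Grassmann projection kernel k(Y1, Y2) = ||Y1^T Y2||_F^2."""
    return float(np.linalg.norm(y1.T @ y2, "fro") ** 2)
```

A Gram matrix of `projection_kernel` values over training videos is what a kernel method such as kernel-FDA would consume; the kernel of a d-dimensional subspace with itself equals d.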

Contributors: Yellamraju, Anirudh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Karam, Lina (Committee member) / Arizona State University (Publisher)

Created: 2014

Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks to facilitate the learning of each individual task. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; they are formulated as mathematical optimization problems that minimize the empirical loss regularized by different structures. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solutions efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches to two real-world applications: (1) automated annotation of Drosophila gene expression pattern images; (2) categorization of Yahoo web pages. Our experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
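
One common way to encode a shared low-dimensional feature space is to stack the task weight vectors as columns of a matrix W and penalize its trace (nuclear) norm. The sketch below assumes a squared loss and solves that problem with proximal gradient descent, whose proximal step is singular value thresholding. It illustrates the general idea only; the specific structured regularizers, algorithms and parameter choices in the dissertation may differ, and `lam`, `step` and `iters` are hypothetical defaults.

```python
import numpy as np


def svt(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    u, s, vt = np.linalg.svd(W, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def trace_norm_mtl(Xs, ys, lam, step=None, iters=500):
    """Minimize sum_t 0.5*||X_t w_t - y_t||^2 + lam*||W||_* by proximal gradient.

    Xs[t] is the (n_t, d) design matrix of task t; column t of W is w_t.
    """
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    if step is None:
        # 1 / Lipschitz constant of the smooth part's gradient.
        step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(iters):
        # Gradient of the squared loss, one column per task.
        G = np.column_stack([X.T @ (X @ W[:, t] - y)
                             for t, (X, y) in enumerate(zip(Xs, ys))])
        W = svt(W - step * G, step * lam)
    return W
```

Larger `lam` zeroes more singular values of W, forcing the tasks onto fewer shared directions.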

Contributors: Chen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created: 2011

Advancing microfluidic-based protein biosensor technology for use in clinical diagnostics

Description

Demand for biosensor research applications is growing steadily. According to a new report by Frost & Sullivan, the biosensor market is expected to reach $14.42 billion by 2016. Clinical diagnostic applications continue to be the largest market for biosensors, and this demand is likely to continue through 2016 and beyond. Biosensor technology for use in clinical diagnostics, however, requires translational research that moves bench science and theoretical knowledge toward marketable products. Despite the high volume of academic research to date, only a handful of biomedical devices have become viable commercial applications. Academic research must increase its focus on practical uses for biosensors. This dissertation is an example of this increased focus and discusses work to advance microfluidic-based protein biosensor technologies for practical use in clinical diagnostics. Four areas of work are discussed. The first involved developing reusable/reconfigurable biosensors for applications, such as biochemical science and analytical chemistry, that require detailed sensor calibration. This work resulted in a prototype sensor and an in-situ electrochemical surface regeneration technique that can be used to produce microfluidic-based reusable biosensors. The second area of work looked at non-specific adsorption (NSA) of biomolecules, a persistent challenge in conventional microfluidic biosensors, and produced design methods that reduce NSA. The third area of work involved a novel microfluidic sensing platform designed to detect target biomarkers using competitive protein adsorption. This technique relies on physical adsorption of proteins to a surface rather than on complex and time-consuming immobilization procedures. This method enabled us to selectively detect a thyroid cancer biomarker, thyroglobulin, in a controlled protein cocktail and a cardiovascular biomarker, fibrinogen, in undiluted human serum.
The fourth area of work expanded this technique into a unique protein identification method: pattern recognition. A sample mixture of proteins generates a distinctive composite pattern upon interaction with a sensing platform consisting of multiple surfaces, each carrying a distinct type of pre-adsorbed protein. The utility of this "pattern-recognition" sensing mechanism was then verified by recognizing a particular biomarker, C-reactive protein, in the cocktail sample mixture.

Contributors: Choi, Seokheun (Author) / Chae, Junseok (Thesis advisor) / Tao, Nongjian (Committee member) / Yu, Hongyu (Committee member) / Forzani, Erica (Committee member) / Arizona State University (Publisher)

Created: 2011

CPR complex pattern ranking for evaluating top-k pattern queries over event streams

Description

Most existing approaches to complex event processing over streaming data rely on the assumption that matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and monitoring of users' credit card purchase patterns, however, matches to user queries are in fact plentiful, and the system has to efficiently sift through these many matches to locate only the few most preferable ones. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify the top-k matching results that satisfy both the patterns and additional criteria. To support real-time processing of data streams, instead of computing top-k results from scratch for each time window, we maintain top-k results dynamically as new events arrive and old ones expire. We also develop new top-k join execution strategies that adapt to changing conditions (e.g., sorted and random access costs, join rates) without assuming that data statistics are available a priori. Experiments show significant improvements over existing approaches.
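
Maintaining top-k results incrementally as events arrive and expire, rather than recomputing them per window, can be sketched with a time-ordered deque plus a score-ordered list. The sketch assumes each completed pattern match arrives as a (timestamp, score, id) triple with unique, mutually comparable ids and non-decreasing timestamps, and that a match expires once its timestamp leaves a fixed-length window. The scoring, the class name and these window semantics are hypothetical, and the CPR join strategies are not modeled.

```python
import bisect
from collections import deque


class SlidingTopK:
    """Top-k scored matches over a sliding time window, maintained incrementally."""

    def __init__(self, k, window):
        self.k = k
        self.window = window
        self._by_time = deque()  # (timestamp, score, match_id) in arrival order
        self._by_score = []      # (score, match_id) kept sorted ascending

    def _expire(self, now):
        # Drop matches whose timestamp is no longer inside (now - window, now].
        while self._by_time and self._by_time[0][0] <= now - self.window:
            _, score, match_id = self._by_time.popleft()
            idx = bisect.bisect_left(self._by_score, (score, match_id))
            del self._by_score[idx]

    def insert(self, timestamp, score, match_id):
        """Add a new match; timestamps are assumed to be non-decreasing."""
        self._expire(timestamp)
        self._by_time.append((timestamp, score, match_id))
        bisect.insort(self._by_score, (score, match_id))

    def topk(self, now):
        """Return ids of the k highest-scoring live matches, best first."""
        self._expire(now)
        if self.k <= 0:
            return []
        return [match_id for _, match_id in reversed(self._by_score[-self.k:])]
```

`bisect.insort` costs O(n) list shifting per insertion; a balanced tree or skip list would be the usual choice for large windows.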

Contributors: Wang, Xinxin (Author) / Candan, K. Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2011

Mining semantics from low-level features in multimedia computing

Description

Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation arises mainly because semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging the corresponding domain knowledge are vital for associating high-level semantics with low-level signals more accurately in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information and domain knowledge in different forms into multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, a spatial prior on the underlying shapes; cross-modality correlations are explored via Kernel Canonical Correlation Analysis, and the learned space is then used to associate multimedia content in different forms; and contextual information modeled as a graph is leveraged to regulate interactions among high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, including visual-to-tactile face conversion, photo tag recommendation, wild web video classification and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing.
Results from experiments on large-scale real-world data with comparisons to other state-of-the-art methods and subjective evaluations with end users confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information/domain knowledge for a wide range of multimedia computing and pattern recognition problems.

Contributors: Wang, Zhesheng (Author) / Li, Baoxin (Thesis advisor) / Sundaram, Hari (Committee member) / Qian, Gang (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2011

Large scale analytical insights of email communication patterns

Description

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and to analyze what these patterns reveal about the progress of that open source project. I analyzed the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focused on drawing more precise metrics from different perspectives of the communication data. I also attempted to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce-based Hadoop cluster. I describe four metrics on which I based my observation and analysis of the data, and present results that show the patterns and anomalies needed to better understand and interpret the communication. I also describe my experience using Pig Latin (the scripting language of Apache Pig) in this research and how it brought scalability, simplicity, and visibility to this data-intensive work. These approaches are useful in project monitoring, to augment human observation and reporting, and in social network analysis, to track individual contributions.
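
The four metrics are not named in the abstract, so the sketch below uses a hypothetical one, messages per sender per month, to show the map, shuffle and reduce phases that Pig Latin scripts are compiled into on Hadoop. The message fields `sender` and `date` (an ISO `YYYY-MM-DD` string) are assumptions, and everything runs in a single Python process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(messages):
    """Emit ((sender, 'YYYY-MM'), 1) for every message, like a MapReduce mapper."""
    for msg in messages:
        yield (msg["sender"], msg["date"][:7]), 1


def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle/sort step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each key, like a MapReduce reducer."""
    return {key: sum(values) for key, values in grouped.items()}


def messages_per_sender_month(messages):
    return reduce_phase(shuffle(map_phase(messages)))
```

In Pig Latin the same aggregation would typically be a `GROUP` over sender and month followed by `COUNT`.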

Contributors: Motamarri, Lakshminarayana (Author) / Santanam, Raghu (Thesis advisor) / Ye, Jieping (Thesis advisor) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2011

Efficient implementation of a low cost object tracking system

Description

Object tracking is an important topic in multimedia, particularly in applications such as teleconferencing, surveillance and human-computer interfaces. Its goal is to determine the position of objects in images continuously and reliably. The key steps involved in object tracking are foreground detection to detect moving objects, clustering to represent each object by its centroid, and tracking the centroids to determine the motion parameters.

In this thesis, a low cost object tracking system is implemented on a hardware accelerator, a warp-based processor for SIMD/vector-style computations. First, different foreground detection techniques are explored to identify the one that requires the fewest computations without compromising performance. The Gaussian Mixture Model proposed by Zivkovic is found to give the best performance with respect to both accuracy and number of computations. Pixel-level parallelization is applied to this algorithm, and it is mapped onto the hardware accelerator.
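
Pixel-level parallelism in background subtraction comes from each pixel's model being updated independently. The sketch below shows that structure with vectorized NumPy operations, but it deliberately substitutes a simpler per-pixel single running Gaussian for Zivkovic's adaptive Gaussian mixture; the class name and the values of `alpha`, `k`, `init_var` and `min_var` are hypothetical. OpenCV's `createBackgroundSubtractorMOG2` is a widely used implementation of Zivkovic's method.

```python
import numpy as np


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (a simplification, not a GMM)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=225.0, min_var=4.0):
        self.mean = np.asarray(first_frame, dtype=float).copy()
        self.var = np.full_like(self.mean, init_var)
        self.alpha = alpha      # learning rate of the running estimates
        self.k = k              # foreground threshold in standard deviations
        self.min_var = min_var  # floor that stops the variance collapsing to zero

    def apply(self, frame):
        """Return a boolean foreground mask and update the background pixels."""
        f = np.asarray(frame, dtype=float)
        d2 = (f - self.mean) ** 2
        foreground = d2 > (self.k ** 2) * self.var
        bg = ~foreground
        # Every pixel is processed independently: this is the pixel-level parallelism.
        self.mean[bg] += self.alpha * (f[bg] - self.mean[bg])
        self.var[bg] += self.alpha * (d2[bg] - self.var[bg])
        np.maximum(self.var, self.min_var, out=self.var)
        return foreground
```

Only pixels classified as background update the model, so a moving object is not absorbed into the background on its first appearance.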

Next, different clustering algorithms are studied. DBSCAN is highly accurate and robust to outliers, but it is very computationally intensive. In contrast, K-means is computationally simple, but it requires the number of means to be specified beforehand. So, a new clustering algorithm is proposed that combines DBSCAN and K-means, along with a diagnostic algorithm on K-means to estimate the right number of centroids. The proposed hybrid algorithm is shown to be ~2.5x faster than DBSCAN with minimal loss in accuracy. Also, a 1D Kalman filter is implemented assuming a constant-acceleration model. Since the Kalman filter computations are just a set of recursive equations, the sequential model itself exhibits good performance, alleviating the need for parallelization. The tracking performance of the low cost implementation is evaluated against the sequential version, and the proposed hybrid algorithm is found to perform very close to the DBSCAN-based reference algorithm.
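
The constant-acceleration 1D Kalman filter can be written as a short predict/update loop over a state of position, velocity and acceleration, observing only position. The sketch assumes a unit time step, isotropic process noise `q`, measurement noise `r` and a vague initial covariance; these values and the function name are hypothetical rather than the thesis's settings.

```python
import numpy as np


def kalman_ca_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Filter 1D position measurements with a constant-acceleration Kalman filter.

    Returns an (n, 3) array of [position, velocity, acceleration] estimates.
    """
    F = np.array([[1.0, dt, 0.5 * dt * dt],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])      # state transition
    H = np.array([[1.0, 0.0, 0.0]])      # only position is measured
    Q = q * np.eye(3)                    # process noise covariance (assumed isotropic)
    R = np.array([[r]])                  # measurement noise covariance
    x = np.zeros(3)                      # initial state guess
    P = 1e3 * np.eye(3)                  # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new position measurement.
        innovation = z - (H @ x)[0]
        S = H @ P @ H.T + R
        K = P @ H.T / S[0, 0]            # Kalman gain, shape (3, 1)
        x = x + K[:, 0] * innovation
        P = (np.eye(3) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

Each step is a handful of 3x3 matrix products, which is consistent with running the filter sequentially rather than parallelizing it.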

Contributors: Sasikumar, Asha (Author) / Chakrabarti, Chaitali (Thesis advisor) / Ogras, Umit Y. (Committee member) / Suppapola, Antonia Pappandreau (Committee member) / Arizona State University (Publisher)

Created: 2015

Community, Collaboration, and Creativity: An Exploration of Original Characters

Description

How do you convey what’s interesting and important to you as an artist in a digital world of constantly shifting attentions? For many young creatives, the answer is original characters, or OCs. An OC is a character that an artist creates for personal enjoyment, whether based on an already existing story or world, or completely from their own imagination.
As creations made for purely personal interests, OCs are an excellent elevator pitch from one creative to another, opening up opportunities for connection in a world where communication is at our fingertips but personal connection is increasingly hard to make. OCs encourage meaningful interaction by serving as muses, avatars, story pieces, and much more, and artists can have their characters interact with other creatives through many different avenues, such as art-making, tabletop games, or word of mouth.

In this thesis, I explore the worlds and aesthetics of many creators and their original characters through qualitative research and collaborative art-making. I begin with a short survey of my creative peers, asking general questions about their characters and thoughts on OCs, then move to sketching characters from various creators. I focus my research on a group of seven core creators and their characters, whom I interview and work closely with to create a series of seven final paintings of their original characters.

Contributors: Cote, Jacqueline (Author) / Button, Melissa M (Thesis director) / Dove-Viebahn, Aviva (Committee member) / School of Art (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Listen Up! Developing Accessible Educational Materials for Aspiring Audio Engineers

Description

In the past ten years, the United States' sound recording industries have experienced significant decreases in employment opportunities for aspiring audio engineers, due to economic imbalances in the music industry's digital streaming era and reductions in government funding for career and technical education (CTE). The Recording Industry Association of America reports promising signs of music industry sustainability based on increasing annual revenues from paid streaming services and artists' high creative demand. However, the rate at which new audio engineers enter the sound recording subsection of the music industry is insufficient to meet streaming artists' high demand for engineering new music recordings. CTE programs in secondary education are rarely available to aspiring engineers, many of whom also lack access to post-secondary or vocational education because of financial and academic limitations. These aspiring engineers seek an informal education in audio engineering on the Internet, using video sharing services like YouTube to search for tutorials and improve their engineering skills. The shortage of accessible educational materials on the Internet restricts engineers from advancing their own audio engineering education, reducing their opportunities to enter a desperate job market in need of independent, home studio-based engineers. Content creators on YouTube take advantage of this situation and commercialize their own video tutorial series, offering them for free while selling paid subscriptions to exclusive content. This is misleading for newer engineers because these tutorials omit important fundamental engineering concepts. Instead, content creators teach inflexible engineering methodologies that mostly reflect their own way of thinking.
Content creators rarely assess whether their own methodologies suit newcomers to a profession that demands critical thinking skills grounded in fundamental audio engineering concepts and techniques. This project analyzes potential solutions to the deficiencies in online audio engineering education and experiments with structuring simple, deliverable, accessible educational content and materials for new entrants to audio engineering. Designing clear, easy-to-follow material for these new entrants is essential for developing a strong understanding of how fundamental concepts apply in future engineers' careers. Creating and designing such educational content requires translating complex engineering concepts through simplified mediums that reduce limitations in learning for future audio engineers.

Contributors: Burns, Triston Connor (Author) / Tobias, Evan (Thesis director) / Libman, Jeff (Committee member) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05
