This collection includes both ASU Theses and Dissertations, submitted by graduate students, and the Barrett, Honors College theses submitted by undergraduate students. 

Displaying 1 - 10 of 16
Filtering by

Clear all filters

153340-Thumbnail Image.png
Description
While discrete emotions like joy, anger, disgust etc. are quite popular, continuous

emotion dimensions like arousal and valence are gaining popularity within the research

community due to an increase in the availability of datasets annotated with these

emotions. Unlike the discrete emotions, continuous emotions allow modeling of subtle

and complex affect dimensions but are

While discrete emotions like joy, anger, disgust etc. are quite popular, continuous

emotion dimensions like arousal and valence are gaining popularity within the research

community due to an increase in the availability of datasets annotated with these

emotions. Unlike the discrete emotions, continuous emotions allow modeling of subtle

and complex affect dimensions but are difficult to predict.

Dimension reduction techniques form the core of emotion recognition systems and

help create a new feature space that is more helpful in predicting emotions. But these

techniques do not necessarily guarantee a better predictive capability as most of them

are unsupervised, especially in regression learning. In emotion recognition literature,

supervised dimension reduction techniques have not been explored much and in this

work a solution is provided through probabilistic topic models. Topic models provide

a strong probabilistic framework to embed new learning paradigms and modalities.

In this thesis, the graphical structure of Latent Dirichlet Allocation has been explored

and new models tuned to emotion recognition and change detection have been built.

In this work, it has been shown that the double mixture structure of topic models

helps 1) to visualize feature patterns, and 2) to project features onto a topic simplex

that is more predictive of human emotions, when compared to popular techniques

like PCA and KernelPCA. Traditionally, topic models have been used on quantized

features but in this work, a continuous topic model called the Dirichlet Gaussian

Mixture model has been proposed. Evaluation of DGMM has shown that while modeling

videos, performance of LDA models can be replicated even without quantizing

the features. Until now, topic models have not been explored in a supervised context

of video analysis and thus a Regularized supervised topic model (RSLDA) that

models video and audio features is introduced. RSLDA learning algorithm performs

both dimension reduction and regularized linear regression simultaneously, and has outperformed supervised dimension reduction techniques like SPCA and Correlation

based feature selection algorithms. In a first of its kind, two new topic models, Adaptive

temporal topic model (ATTM) and SLDA for change detection (SLDACD) have

been developed for predicting concept drift in time series data. These models do not

assume independence of consecutive frames and outperform traditional topic models

in detecting local and global changes respectively.
ContributorsLade, Prasanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Balasubramanian, Vineeth N (Committee member) / Arizona State University (Publisher)
Created2015
150019-Thumbnail Image.png
Description
Currently Java is making its way into the embedded systems and mobile devices like androids. The programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java

Currently Java is making its way into the embedded systems and mobile devices like androids. The programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java code that runs within a JVM to interoperate with applications or libraries that are written in other languages and compiled to the host CPU ISA. JNI plays an important role in embedded system as it provides a mechanism to interact with libraries specific to the platform. This thesis addresses the overhead incurred in the JNI due to reflection and serialization when objects are accessed on android based mobile devices. It provides techniques to reduce this overhead. It also provides an API to access objects through its reference through pinning its memory location. The Android emulator was used to evaluate the performance of these techniques and we observed that there was 5 - 10 % performance gain in the new Java Native Interface.
ContributorsChandrian, Preetham (Author) / Lee, Yann-Hang (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2011
150599-Thumbnail Image.png
Description
Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays and high cognitive load; which in turn can lead to a loss of situational awareness, increasing chances for life threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that it is our largest sensory organ with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations including high learning curves, limited applicability and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified, theoretical framework, inspired by natural, spoken language, is proposed called Somatic ABC's for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two very unique application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.
ContributorsMcDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created2012
155963-Thumbnail Image.png
Description
Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is generalization of supervised learning, is one

example of task learning that is discussed. In particular, a novel non-parametric k-

NN-based multiple-instance learning is proposed, which is shown to outperform other

existing approaches. This solution is applied to a diabetic retinopathy pathology

detection problem eectively.

In cases of representation learning, generality of neural features are investigated

rst. This investigation leads to some critical understanding and results in feature

generality among datasets. The possibility of learning from a mentor network instead

of from labels is then investigated. Distillation of dark knowledge is used to eciently

mentor a small network from a pre-trained large mentor network. These studies help

in understanding representation learning with smaller and compressed networks.
ContributorsVenkatesan, Ragav (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2017
156084-Thumbnail Image.png
Description
The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data such as missing samples and noisy input caused by the undesired, external factors of variation. It should also reduce the data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. Next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower dimensional manifold. The latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent features seek a specific structure such as sparsity or low-rank into the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network. Its parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically-relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For both these tasks, I propose novel deep architectures and show significant improvement over the previous state-of-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.
ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2017
156951-Thumbnail Image.png
Description
Visual processing in social media platforms is a key step in gathering and understanding information in the era of Internet and big data. Online data is rich in content, but its processing faces many challenges including: varying scales for objects of interest, unreliable and/or missing labels, the inadequacy of single

Visual processing in social media platforms is a key step in gathering and understanding information in the era of Internet and big data. Online data is rich in content, but its processing faces many challenges including: varying scales for objects of interest, unreliable and/or missing labels, the inadequacy of single modal data and difficulty in analyzing high dimensional data. Towards facilitating the processing and understanding of online data, this dissertation primarily focuses on three challenges that I feel are of great practical importance: handling scale differences in computer vision tasks, such as facial component detection and face retrieval, developing efficient classifiers using partially labeled data and noisy data, and employing multi-modal models and feature selection to improve multi-view data analysis. For the first challenge, I propose a scale-insensitive algorithm to expedite and accurately detect facial landmarks. For the second challenge, I propose two algorithms that can be used to learn from partially labeled data and noisy data respectively. For the third challenge, I propose a new framework that incorporates feature selection modules into LDA models.
ContributorsZhou, Xu (Author) / Li, Baoxin (Thesis advisor) / Hsiao, Sharon (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2018
154885-Thumbnail Image.png
Description
Computational visual aesthetics has recently become an active research area. Existing state-of-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their

Computational visual aesthetics has recently become an active research area. Existing state-of-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality instead of binary-categorizing them. Furthermore, in such applications, it may be possible that all images belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated in this work. A new data-set of image pairs with relative labels is constructed by carefully selecting images from the popular AVA data-set. Unlike in aesthetics classification, there is no single threshold which would determine the ranking order of the images across the entire data-set.

This problem is attempted using a deep neural network based approach that is trained on image pairs by incorporating principles from relative learning. Results show that such relative training procedure allows the network to rank the images with a higher accuracy than a state-of-art network trained on the same set of images using binary labels. Further analyzing the results show that training a model using the image pairs learnt better aesthetic features than training on same number of individual binary labelled images.

Additionally, an attempt is made at enhancing the performance of the system by incorporating saliency related information. Given an image, humans might fixate their vision on particular parts of the image, which they might be subconsciously intrigued to. I therefore tried to utilize the saliency information both stand-alone as well as in combination with the global and local aesthetic features by performing two separate sets of experiments. In both the cases, a standard saliency model is chosen and the generated saliency maps are convoluted with the images prior to passing them to the network, thus giving higher importance to the salient regions as compared to the remaining. Thus generated saliency-images are either used independently or along with the global and the local features to train the network. Empirical results show that the saliency related aesthetic features might already be learnt by the network as a sub-set of the global features from automatic feature extraction, thus proving the redundancy of the additional saliency module.
ContributorsGattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created2016
154816-Thumbnail Image.png
Description
Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword query to search health forums regarding to some specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidences emerging in health forums.

This dissertation studies how to effectively discover information in health forums. Several challenges have been identified. First, the existing work relies on the syntactic information unit, such as a sentence, a post, or a thread, to bind different pieces of information in a forum. However, most of information discovery tasks should be based on the semantic information unit, a patient. For instance, given a keyword query that involves the relationship between a treatment and side effects, it is expected that the matched keywords refer to the same patient. In this work, patient-centered mining is proposed to mine patient semantic information units. In a patient information unit, the health information, such as diseases, symptoms, treatments, effects, and etc., is connected by the corresponding patient.

Second, the information published in health forums has varying degree of quality. Some information includes patient-reported personal health experience, while others can be hearsay. In this work, a context-aware experience extraction framework is proposed to mine patient-reported personal health experience, which can be used for evidence-based knowledge discovery or finding patients with similar experience.

At last, the proposed patient-centered and experience-aware mining framework is used to build a patient health information database for effectively discovering adverse drug reactions (ADRs) from health forums. ADRs have become a serious health problem and even a leading cause of death in the United States. Health forums provide valuable evidences in a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. Empirical evaluation shows the effectiveness of the proposed approach.
ContributorsLiu, Yunzhong (Author) / Chen, Yi (Thesis advisor) / Liu, Huan (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2016
155830-Thumbnail Image.png
Description
Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and the question answering. My research mainly focuses on developing solutions to support solving these two problems.

In image understanding, one important research area is semantic segmentation, which takes images as input and output the label of each pixel. As much manual work is needed to label a useful training set, typical training sets for such supervised approaches are always small. There are also approaches with relaxed labeling requirement, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, there are more and more user-uploaded images available

on-line. Such user-generated content often comes with labels like tags and may be coarsely labelled by various tools. To use these information for computer vision tasks, I propose a new graphic model by considering the neighborhood information and their interactions to obtain the pixel-level labels of the images with only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best answer prediction, which addressed two main research topics: feature design and model construction. In the feature design part, most existing work discussed how to design effective features for answer quality / best answer prediction. However, little work mentioned how to design features by considering the relationship between answers of one given question. To fill this research gap, I designed new features to help improve the prediction performance. In the modeling part, to employ the structure of the feature space, I proposed an innovative learning-to-rank model by considering the hierarchical lasso. Experiments with comparison with the state-of-the-art in the best answer prediction literature have confirmed

that the proposed methods are effective and suitable for solving the research task.
ContributorsTian, Qiongjie (Author) / Li, Baoxin (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2017
155339-Thumbnail Image.png
Description
The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source domain is transferred to a target domain in the form of learned models and efficient feature representations.

The dissertation outlines novel domain adaptation approaches across different feature spaces; (i) a linear Support Vector Machine model for domain alignment; (ii) a nonlinear kernel based approach that embeds domain-aligned data for enhanced classification; (iii) a hierarchical model implemented using deep learning, that estimates domain-aligned hash values for the source and target data, and (iv) a proposal for a feature selection technique to reduce cross-domain disparity. These adaptation procedures are tested and validated across a range of computer vision applications like object classification, facial expression recognition, digit recognition, and activity recognition. The dissertation also provides a unique perspective of domain adaptation literature from the point-of-view of linear, nonlinear and hierarchical feature spaces. The dissertation concludes with a discussion on the future directions for research that highlight the role of domain adaptation in an era of rapid advancements in artificial intelligence.
ContributorsDemakethepalli Venkateswara, Hemanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Chakraborty, Shayok (Committee member) / Arizona State University (Publisher)
Created2017