Matching Items (6)
Filtering by

Clear all filters

151926-Thumbnail Image.png
Description
In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems.

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variations or subject-based variability in physiological and biomedical data, which leads to difference in data distributions making the task of modeling these data, using traditional machine learning algorithms, complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject based variability in physiological and biomedical data and proposes person adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person adaptive method, for early detection of muscle fatigue using Surface Electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue from 0 to 1 level in a test subject, during isometric or dynamic contractions, at real-time. Besides subject based variability, biomedical image data also varies due to variations in their imaging techniques, leading to distribution differences between the image databases. Hence a classifier learned on one database may perform poorly on the other database. Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards design of automated adaptive systems for real world data.
ContributorsChattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created2013
150158-Thumbnail Image.png
Description
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering

Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.
ContributorsSun, Liang (Author) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Liu, Huan (Committee member) / Mittelmann, Hans D. (Committee member) / Arizona State University (Publisher)
Created2011
150244-Thumbnail Image.png
Description
A statement appearing in social media provides a very significant challenge for determining the provenance of the statement. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data. However, the same characteristics that make the social media environment

A statement appearing in social media provides a very significant challenge for determining the provenance of the statement. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data. However, the same characteristics that make the social media environment challenging, including the massive amounts of data available, large numbers of users, and a highly dynamic environment, provide unique and untapped opportunities for solving the provenance problem for social media. Current approaches for tracking provenance data do not scale for online social media and consequently there is a gap in provenance methodologies and technologies providing exciting research opportunities. The guiding vision is the use of social media information itself to realize a useful amount of provenance data for information in social media. This departs from traditional approaches for data provenance which rely on a central store of provenance information. The contemporary online social media environment is an enormous and constantly updated "central store" that can be mined for provenance information that is not readily made available to the average social media user. This research introduces an approach and builds a foundation aimed at realizing a provenance data capability for social media users that is not accessible today.
ContributorsBarbier, Geoffrey P (Author) / Liu, Huan (Thesis advisor) / Bell, Herbert (Committee member) / Li, Baoxin (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2011
154816-Thumbnail Image.png
Description
Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword query to search health forums regarding to some specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidences emerging in health forums.

This dissertation studies how to effectively discover information in health forums. Several challenges have been identified. First, the existing work relies on the syntactic information unit, such as a sentence, a post, or a thread, to bind different pieces of information in a forum. However, most of information discovery tasks should be based on the semantic information unit, a patient. For instance, given a keyword query that involves the relationship between a treatment and side effects, it is expected that the matched keywords refer to the same patient. In this work, patient-centered mining is proposed to mine patient semantic information units. In a patient information unit, the health information, such as diseases, symptoms, treatments, effects, and etc., is connected by the corresponding patient.

Second, the information published in health forums has varying degree of quality. Some information includes patient-reported personal health experience, while others can be hearsay. In this work, a context-aware experience extraction framework is proposed to mine patient-reported personal health experience, which can be used for evidence-based knowledge discovery or finding patients with similar experience.

At last, the proposed patient-centered and experience-aware mining framework is used to build a patient health information database for effectively discovering adverse drug reactions (ADRs) from health forums. ADRs have become a serious health problem and even a leading cause of death in the United States. Health forums provide valuable evidences in a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. Empirical evaluation shows the effectiveness of the proposed approach.
ContributorsLiu, Yunzhong (Author) / Chen, Yi (Thesis advisor) / Liu, Huan (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2016
155339-Thumbnail Image.png
Description
The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source domain is transferred to a target domain in the form of learned models and efficient feature representations.

The dissertation outlines novel domain adaptation approaches across different feature spaces; (i) a linear Support Vector Machine model for domain alignment; (ii) a nonlinear kernel based approach that embeds domain-aligned data for enhanced classification; (iii) a hierarchical model implemented using deep learning, that estimates domain-aligned hash values for the source and target data, and (iv) a proposal for a feature selection technique to reduce cross-domain disparity. These adaptation procedures are tested and validated across a range of computer vision applications like object classification, facial expression recognition, digit recognition, and activity recognition. The dissertation also provides a unique perspective of domain adaptation literature from the point-of-view of linear, nonlinear and hierarchical feature spaces. The dissertation concludes with a discussion on the future directions for research that highlight the role of domain adaptation in an era of rapid advancements in artificial intelligence.
ContributorsDemakethepalli Venkateswara, Hemanth (Author) / Panchanathan, Sethuraman (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Chakraborty, Shayok (Committee member) / Arizona State University (Publisher)
Created2017
162017-Thumbnail Image.png
Description
Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store the information. The rise in the number of data volumes and data modality has resulted

Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store the information. The rise in the number of data volumes and data modality has resulted in the increased demand for data mining methods and strategies of finding anomalies, patterns, and correlations within large data sets to predict outcomes. Effective machine learning methods are widely adapted to build the data mining pipeline for various purposes like business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The major challenges for effectively and efficiently mining big data include (1) data heterogeneity and (2) missing data. Heterogeneity is the natural characteristic of big data, as the data is typically collected from different sources with diverse formats. The missing value is the most common issue faced by the heterogeneous data analysis, which resulted from variety of factors including the data collecting processing, user initiatives, erroneous data entries, and so on. In response to these challenges, in this thesis, three main research directions with application scenarios have been investigated: (1) Mining and Formulating Heterogeneous Data, (2) missing value imputation strategy in various application scenarios in both offline and online manner, and (3) missing value imputation for multi-modality data. Multiple strategies with theoretical analysis are presented, and the evaluation of the effectiveness of the proposed algorithms compared with state-of-the-art methods is discussed.
Contributorsliu, Xu (Author) / He, Jingrui (Thesis advisor) / Xue, Guoliang (Thesis advisor) / Li, Baoxin (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2021