Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; each is formulated as a mathematical optimization problem that minimizes the empirical loss regularized by a different structure. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solutions efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches to two real-world applications: (1) automated annotation of Drosophila gene expression pattern images and (2) categorization of Yahoo web pages. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
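As an illustration of "empirical loss regularized by a structure", the sketch below fits several least-squares tasks jointly with a trace-norm penalty, one common way to encourage a shared low-dimensional feature space. It is a minimal sketch under stated assumptions, not the dissertation's actual formulations or algorithms: the squared loss, the trace-norm regularizer, the plain proximal-gradient solver, and the names `svd_soft_threshold` and `fit_trace_norm_mtl` are all chosen here for illustration.

```python
import numpy as np


def svd_soft_threshold(W, tau):
    """Proximal operator of tau * (trace norm): shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def fit_trace_norm_mtl(Xs, ys, lam=1.0, n_iter=500):
    """Proximal gradient for
        min_W  sum_t 0.5 * ||X_t w_t - y_t||^2 + lam * ||W||_*
    where column t of W holds the weights of task t (illustrative objective).

    Xs: list of (n_t, d) arrays, ys: list of (n_t,) arrays. Returns W of shape (d, T).
    """
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    # The smooth part is separable over tasks, so its gradient is Lipschitz
    # with constant max_t ||X_t||_2^2 (squared spectral norm).
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(n_iter):
        # Gradient of the squared loss, one column per task.
        G = np.column_stack(
            [X.T @ (X @ W[:, t] - y) for t, (X, y) in enumerate(zip(Xs, ys))]
        )
        # Gradient step followed by singular value shrinkage.
        W = svd_soft_threshold(W - G / L, lam / L)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, n, r = 20, 8, 15, 2  # synthetic sizes, chosen arbitrarily
    W_true = rng.normal(size=(d, r)) @ rng.normal(size=(r, T))  # rank-r task weights
    Xs = [rng.normal(size=(n, d)) for _ in range(T)]
    ys = [X @ W_true[:, t] + 0.01 * rng.normal(size=n) for t, X in enumerate(Xs)]
    W = fit_trace_norm_mtl(Xs, ys, lam=1.0)
    print("singular values of fitted W:", np.round(np.linalg.svd(W, compute_uv=False), 3))
```

Each task has fewer samples than features in this synthetic setup, so the tasks can only be fit well by sharing structure; the penalty weight `lam` controls how strongly the fitted weight matrix is pushed toward low rank.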

Contributors: Chen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created: 2011

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Through l1-norm based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help to illuminate global gene activities and changes much better, and more refined stage annotations improve the ability to interpret results when expression pattern matches are discovered between genes.

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Improving expression vectors for recombinant protein production in plants

Description

Over the past decade, several high-value proteins have been produced using plant-based transient expression systems. However, these studies exposed some limitations that must be overcome to allow plant expression systems to reach their full potential. These limitations are the low level of recombinant protein accumulation achieved in some cases and the lack of efficient co-expression vectors for the production of multi-protein complexes. This study reports that the tobacco Extensin (Ext) gene 3' untranslated region (UTR) can be broadly used to enhance recombinant protein expression in plants. Extensin is a hydroxyproline-rich glycoprotein that constitutes the major protein component of cell walls. Using transient expression, it was found that the Ext 3' UTR increases recombinant protein expression up to 13.5- and 6-fold in non-replicating and replicating vector systems, respectively, compared to previously established terminators. Enhanced protein accumulation was correlated with increased mRNA levels associated with a reduction in read-through transcription. Regions of the Ext 3' UTR essential for maximum gene expression included a poly-purine sequence used as a major poly-adenylation site. Furthermore, modified bean yellow dwarf virus (BeYDV)-based vectors designed to allow co-expression of multiple recombinant genes were constructed and tested for their performance in driving transient expression in plants. Robust co-expression and assembly of the heavy and light chains of the anti-Ebola virus monoclonal antibody 6D8, as well as of E. coli heat-labile toxin (LT), were achieved with the modified vectors. The simultaneous co-expression of three fluoroproteins using the single-replicon, triple-cassette vector is demonstrated by confocal microscopy. In conclusion, this study provides an excellent tool for rapid, cost-effective, large-scale manufacturing of recombinant proteins for use in medicine and industry.

Contributors: Rosenthal, Sun Hee (Author) / Mason, Hugh (Thesis advisor) / Mor, Tsafrir (Committee member) / Chang, Yung (Committee member) / Arntzen, Charles (Committee member) / Arizona State University (Publisher)

Created: 2012

Developmental plasticity: the influence of neonatal diet and immune challenges on carotenoid-based ornamental coloration and adult immune function in mallard ducks

Description

Conditions during development can shape the expression of traits at adulthood, a phenomenon called developmental plasticity. In this context, factors such as nutrition or health state during development can affect current and subsequent physiology, body size, brain structure, ornamentation, and behavior. However, many of the links between developmental and adult phenotype are poorly understood. I performed a series of experiments using a common molecular currency, carotenoid pigments, to track somatic and reproductive investments through development and into adulthood. Carotenoids are red, orange, or yellow pigments that: (a) animals must acquire from their diets, (b) can be physiologically beneficial, acting as antioxidants or immunostimulants, and (c) color the sexually attractive features (e.g., feathers, scales) of many animals. I studied how carotenoid nutrition and immune challenges during ontogeny impacted ornamental coloration and immune function of adult male mallard ducks (Anas platyrhynchos). Male mallards use carotenoids to pigment their yellow beak, and males with yellower beaks are preferred as mates, have increased immune function, and have higher quality sperm. In my dissertation work, I established a natural context for the role that carotenoids and body condition play in the formation of the adult phenotype and examined how early-life experiences, including immune challenges and dietary access to carotenoids, affect adult immune function and ornamental coloration. Evidence from mallard ducklings in the field showed that variation in circulating carotenoid levels at hatch is likely driven by maternal allocation of carotenoids, but that carotenoid physiology shifts during the subsequent few weeks to reflect individual foraging habits. In the lab, adult beak color expression and immune function were more tightly correlated with body condition during growth than with body condition during subsequent stages of development or adulthood. Immune challenges during development affected adult immune function and interacted with carotenoid physiology during adulthood, but did not affect adult beak coloration. Dietary access to carotenoids during development, but not adulthood, also affected adult immune function. Taken together, these results highlight the importance of the developmental stage in shaping certain survival-related traits (i.e., immune function), and lead to further questions regarding the development of ornamental traits.

Contributors: Butler, Michael (Author) / McGraw, Kevin J. (Thesis advisor) / Chang, Yung (Committee member) / Deviche, Pierre (Committee member) / DeNardo, Dale (Committee member) / Rutowski, Ronald (Committee member) / Arizona State University (Publisher)

Created: 2012

Learning from the Data Heterogeneity for Data Imputation

Description

Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store information. The growth in data volume and in the number of data modalities has increased the demand for data mining methods and strategies for finding anomalies, patterns, and correlations within large data sets to predict outcomes. Effective machine learning methods are widely adopted to build the data mining pipeline for various purposes such as business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The major challenges for effectively and efficiently mining big data include (1) data heterogeneity and (2) missing data. Heterogeneity is a natural characteristic of big data, as the data is typically collected from different sources with diverse formats. Missing values are the most common issue in heterogeneous data analysis and result from a variety of factors, including the data collection process, user initiatives, erroneous data entries, and so on. In response to these challenges, this thesis investigates three main research directions with application scenarios: (1) mining and formulating heterogeneous data, (2) missing value imputation strategies for various application scenarios in both offline and online settings, and (3) missing value imputation for multi-modality data. Multiple strategies with theoretical analysis are presented, and the effectiveness of the proposed algorithms is evaluated against state-of-the-art methods.

Contributors: Liu, Xu (Author) / He, Jingrui (Thesis advisor) / Xue, Guoliang (Thesis advisor) / Li, Baoxin (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created: 2021

Transfer Learning for BioImaging and Bilingual Applications

Description

Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has been shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for BioImaging and Bilingual applications. Specifically, I propose deep transfer learning algorithms which combine transfer learning and deep learning to improve image annotation performance. First, I propose to generate the deep features for the Drosophila embryo images via pretrained deep models and build linear classifiers on top of the deep features. Second, I propose to fine-tune the pretrained model with a small number of labeled images. The time complexity and performance of deep transfer learning methodologies are investigated. Promising results demonstrate the knowledge transfer ability of the proposed deep transfer algorithms. Moreover, I propose a novel Robust Principal Component Analysis (RPCA) approach to process the noisy images in advance. In addition, I present a two-stage re-weighting framework for general domain adaptation problems. The distribution of the source domain is mapped towards the target domain in the first stage, and an adaptive learning model is proposed in the second stage to incorporate label information from the target domain if it is available. The proposed model is then applied to tackle the cross-lingual spam detection problem on LinkedIn’s website. The experimental results on real data demonstrate the efficiency and effectiveness of the proposed algorithms.

Contributors: Sun, Qian (Author) / Ye, Jieping (Committee member) / Xue, Guoliang (Committee member) / Liu, Huan (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2015

Tracking the humoral immune response in type 1 diabetes

Description

Type 1 diabetes (T1D) is a chronic autoimmune disease characterized by progressive autoimmune destruction of insulin-producing pancreatic β-cells. Genetic, immunological and environmental factors contribute to T1D development. The focus of this dissertation is to track the humoral immune response in T1D by profiling autoantibodies (AAbs) and anti-viral antibodies using an innovative protein array platform called Nucleic Acid Programmable Protein Array (NAPPA).

AAbs provide value in identifying individuals at risk, stratifying patients with different clinical courses, improving our understanding of autoimmune destruction, identifying antigens for cellular immune response, and providing candidates for prevention trials in T1D. A two-stage serological AAb screening against 6,000 human proteins was performed. A dual specificity tyrosine-phosphorylation-regulated kinase 2 (DYRK2) was validated with 36% sensitivity at 98% specificity by an orthogonal immunoassay. This is the first systematic screening for novel AAbs against a large number of human proteins by protein arrays in T1D. A more comprehensive search for novel AAbs was performed using a knowledge-based approach by ELISA and a screening-based approach against 10,000 human proteins by NAPPA. Six AAbs were identified and validated with sensitivities ranging from 16% to 27% at 95% specificity. These two studies enriched the T1D “autoantigenome” and provided insights into T1D pathophysiology at an unprecedented breadth.
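Figures such as "36% sensitivity at 98% specificity" are typically obtained by choosing a cutoff from the control samples and then counting the cases above it. The sketch below shows one such calculation under stated assumptions: the function name `sensitivity_at_specificity`, the quantile-style cutoff rule, and the toy scores are illustrative and are not taken from the dissertation.

```python
import numpy as np


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Return (sensitivity, achieved_specificity, cutoff).

    The cutoff is the smallest control score such that at least `specificity`
    of the controls are at or below it; a sample is called positive when its
    score is strictly above the cutoff. (Illustrative rule, an assumption.)
    """
    controls = np.sort(np.asarray(control_scores, dtype=float))
    cases = np.asarray(case_scores, dtype=float)
    n = len(controls)
    # Small tolerance guards against floating-point error in specificity * n.
    k = int(np.ceil(specificity * n - 1e-9)) - 1
    k = min(max(k, 0), n - 1)
    cutoff = controls[k]
    achieved_specificity = float(np.mean(controls <= cutoff))
    sensitivity = float(np.mean(cases > cutoff))
    return sensitivity, achieved_specificity, cutoff


if __name__ == "__main__":
    # Toy antibody signal scores, invented for illustration only.
    controls = np.arange(20)            # 0, 1, ..., 19
    cases = np.array([10, 18, 19, 25])
    sens, spec, cutoff = sensitivity_at_specificity(cases, controls, specificity=0.9)
    print(f"cutoff={cutoff}, specificity={spec:.2f}, sensitivity={sens:.2f}")
```

Fixing specificity first and reporting the resulting sensitivity keeps the false-positive rate among controls bounded, which is why screening biomarkers are often summarized this way.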

The rapid rise of T1D incidence suggests the potential involvement of environmental factors, including viral infections. Sero-reactivity to 646 viral antigens was assessed in new-onset T1D patients. The antibody-positive rate for EBV was significantly higher in cases than in controls, suggesting a potential role for EBV in T1D development. A high-density NAPPA platform was demonstrated with high reproducibility and sensitivity in profiling anti-viral antibodies.

This dissertation shows the power of a protein-array-based immunoproteomics approach to characterize the humoral immunoprofile against human and viral proteomes. The identification of novel T1D-specific AAbs and T1D-associated viruses will help to connect the nodes in T1D etiology and provide a better understanding of T1D pathophysiology.

Contributors: Bian, Xiaofang (Author) / LaBaer, Joshua (Thesis advisor) / Mandarino, Lawrence (Committee member) / Chang, Yung (Committee member) / Arizona State University (Publisher)

Created: 2015

Cancer autoantibody biomarker discovery and validation using nucleic acid programmable protein array

Description

Currently in the US, many patients with cancer do not benefit from population-based screening, due to challenges associated with the existing cancer screening scheme. Blood-based diagnostic assays have the potential to detect diseases in a non-invasive way. Proteins released from small early tumors may only be present intermittently and get diluted to tiny concentrations in the blood, making them difficult to use as biomarkers. However, they can induce autoantibody (AAb) responses, which can amplify the signal and persist in the blood even if the antigen is gone. Circulating autoantibodies are a promising class of molecules that have the potential to serve as early detection biomarkers for cancers. This Ph.D. thesis aims to screen for autoantibody biomarkers for the early detection of two deadly cancers: basal-like breast cancer and lung adenocarcinoma. First, a method was developed to display proteins in both native and denatured conformations on protein arrays. This method adopted a novel protein tag technology, called HaloTag, to covalently immobilize proteins on the glass slide surface. The covalent attachment allowed these proteins to endure harsh treatment without dissociating from the slide surface, which enabled the profiling of antibody responses against both conformational and linear epitopes. Next, a plasma screening protocol was optimized to significantly increase the signal-to-noise ratio of protein-array-based AAb detection. Following this, the AAb responses in basal-like breast cancer were explored using nucleic acid programmable protein arrays (NAPPA) containing 10,000 full-length human proteins in 45 cases and 45 controls. After verification in a large sample set (145 basal-like breast cancer cases / 145 controls / 70 non-basal breast cancer) by ELISA, a 13-AAb classifier was developed to differentiate patients from controls with a sensitivity of 33% at 98% specificity. A similar approach was also applied to the lung cancer study to identify AAbs that distinguished lung cancer patients from those with computed-tomography-positive benign pulmonary nodules (137 lung cancer cases, 127 smoker controls, 170 benign controls). In this study, two panels of AAbs were discovered that showed promising sensitivity and specificity. Six out of eight AAb targets were also found to have elevated mRNA levels in lung adenocarcinoma patients using TCGA data. These projects as a whole provide novel insights into the association between AAbs and cancer, as well as general B cell antigenicity against self-proteins.

Contributors: Wang, Jie (Author) / LaBaer, Joshua (Thesis advisor) / Anderson, Karen S (Committee member) / Lake, Douglas F (Committee member) / Chang, Yung (Committee member) / Arizona State University (Publisher)

Created: 2015

Structured sparse methods for imaging genetics

Description

Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Imaging genetics studies are often challenging due to the relatively small number of subjects but extremely high dimensionality of both imaging data and genomic data. In this dissertation, I carry out research on imaging genetics with a particular focus on two tasks: building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods for imaging genetics that can produce interpretable models and are robust to overfitting. With carefully designed sparsity-inducing regularizers, different biological priors are incorporated into learning models. More specifically, in the Allen brain image and gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label structure-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's Disease Neuroimaging Initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to quickly identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes SNP spatial structure and is robust to the choice of reference alleles in genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurately predicting Alzheimer's disease related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods.

Contributors: Yang, Tao (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Thesis advisor) / He, Jingrui (Committee member) / Li, Baoxin (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2017

Frameshift antigens for cancer vaccine development

Description

Immunotherapy has been revitalized with the advent of immune checkpoint blockade treatments, and neo-antigens are the targets of the immune system in cancer patients who respond to the treatments. The cancer vaccine field is focused on using neo-antigens from unique point mutations of the genomic sequence in the cancer patient for making personalized cancer vaccines. However, we choose a different path: to find frameshift neo-antigens at the mRNA level and develop broadly effective cancer vaccines based on frameshift antigens.

In this dissertation, I have summarized and characterized all the potential frameshift antigens from microsatellite regions in humans, dogs, and mice. A list of frameshift antigens was validated by PCR in tumor samples, and the mutation rate was calculated for one candidate, SEC62. I develop a method to screen the antibody response against frameshift antigens in human and dog cancer patients by using frameshift peptide arrays. Frameshift antigens selected by positive antibody response in cancer patients or by MHC predictions show protection in different mouse tumor models. A dog version of the cancer vaccine based on frameshift antigens was developed and tested in a small safety trial. The results demonstrate that the vaccine is safe and can induce strong B and T cell immune responses. Further, I built the human exon junction frameshift database, which includes all possible frameshift antigens from mis-splicing events in exon junctions, and I develop a method to find potential frameshift antigens from large cancer immunosignature datasets with these databases. In addition, I test the idea of ‘early cancer diagnosis, early treatment’ in a transgenic mouse cancer model. The results show that early treatment gives significantly better protection than late treatment and that the correct time point for treatment is crucial to give the best clinical benefit. A model for early treatment is developed with these results.

Frameshift neo-antigens from microsatellite regions and mis-splicing events are abundant at the mRNA level, and they are better antigens than neo-antigens from point mutations in the genomic sequences of cancer patients in terms of high immunogenicity, low probability of causing autoimmune diseases, and low cost of developing a broadly effective vaccine. This dissertation demonstrates the feasibility of using frameshift antigens for cancer vaccine development.

Contributors: Zhang, Jian (Author) / Johnston, Stephen Albert (Thesis advisor) / Chang, Yung (Committee member) / Stafford, Phillip (Committee member) / Chen, Qiang (Committee member) / Arizona State University (Publisher)

Created: 2018
