Combining thickness information with surface tensor-based morphometry for the 3D statistical analysis of the corpus callosum

Description

In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set of 3D morphological differences in the corpus callosum between two groups of subjects. The CCs are segmented from whole-brain T1-weighted MRI and modeled as 3D tetrahedral meshes. The callosal surface is divided into superior and inferior patches, on which we compute a volumetric harmonic field by solving Laplace's equation with Dirichlet boundary conditions. We adopt a refined tetrahedral mesh to compute the Laplacian operator, so the computation can achieve sub-voxel accuracy. Thickness is estimated by tracing the streamlines in the harmonic field. At each vertex, we combine the areal changes found using surface tensor-based morphometry and the thickness information into a single vector, which serves as the metric for the statistical analysis. Group differences are assessed on this combined measure with Hotelling's T² test. The method is applied to statistically compare three groups: congenitally blind (CB), late blind (LB; onset > 8 years old), and sighted (SC) subjects. Our results reveal significant differences in several regions of the CC between both blind groups and the sighted group, and to a lesser extent between the LB and CB groups. These results demonstrate the crucial role of visual deprivation during the developmental period in reshaping the structural architecture of the CC.
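
As a rough illustration of the statistical step, the sketch below computes the two-sample Hotelling's T² statistic for two-dimensional per-vertex vectors (for example, an areal-change value and a thickness value per subject). It is a minimal sketch under stated assumptions, not the authors' implementation: the function name and toy data are hypothetical, it handles a single vertex, and it stops at the statistic. Converting T² to a p-value (for instance through its F approximation or a permutation test) and correcting for multiple comparisons across vertices are left out.

```python
def hotelling_t2_2d(group_a, group_b):
    """Two-sample Hotelling's T^2 for 2-D vectors, e.g. (areal change, thickness)
    measured at one vertex for every subject in each group (hypothetical helper)."""
    n1, n2 = len(group_a), len(group_b)

    def mean_vec(g):
        return [sum(v[0] for v in g) / len(g), sum(v[1] for v in g) / len(g)]

    def scatter(g, m):
        # Sum of outer products of centered vectors (unnormalized covariance).
        s = [[0.0, 0.0], [0.0, 0.0]]
        for v in g:
            d = (v[0] - m[0], v[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    m1, m2 = mean_vec(group_a), mean_vec(group_b)
    s1, s2 = scatter(group_a, m1), scatter(group_b, m2)
    # Pooled covariance with n1 + n2 - 2 degrees of freedom.
    dof = n1 + n2 - 2
    sp = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    inv = [[sp[1][1] / det, -sp[0][1] / det],
           [-sp[1][0] / det, sp[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    # Mahalanobis-type quadratic form of the mean difference.
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return n1 * n2 / (n1 + n2) * quad


a = [(0, 0), (2, 0), (0, 2), (2, 2)]
b = [(3, 0), (5, 0), (3, 2), (5, 2)]
print(hotelling_t2_2d(a, b))
```

For these toy groups, which differ only by a shift of 3 along the first coordinate, the statistic is approximately 13.5. Working with a joint vector rather than two separate univariate tests lets the covariance between areal change and thickness enter the comparison.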

Contributors: Xu, Liang (Author) / Wang, Yalin (Thesis advisor) / Maciejewski, Ross (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Through ℓ1-norm-based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications that can benefit from group-structured sparse learning. In the first application, I study Alzheimer's disease diagnosis using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, which produces a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotating the developmental stage of Drosophila embryos in gene expression images. In addition, it provides a stage score that enables each embryo to be annotated more finely, dividing embryos into early and late periods of development within the standard stage demarcations. Stage scores help illuminate global gene activity and its changes more clearly, and more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.
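
The sketch below illustrates the kind of group penalty described above, assuming a least-squares loss. `overlapping_group_penalty` evaluates the penalty Σ_g ||w_g||₂ when groups may share features. The solver `group_lasso_ista` is a plain proximal-gradient (ISTA) loop that only handles disjoint groups: for overlapping groups the proximal step has no such closed form and needs a dedicated algorithm like the one the thesis proposes, which is not reproduced here. All function names are hypothetical.

```python
import math


def overlapping_group_penalty(w, groups, weights=None):
    """Sum over groups of weight_g * ||w_g||_2; groups may share indices."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(c * math.sqrt(sum(w[i] ** 2 for i in g))
               for c, g in zip(weights, groups))


def group_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||w_g||_2, valid only for DISJOINT groups."""
    out = list(v)  # indices outside every group are left unpenalized
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        # Shrink the whole group toward zero; zero it out if its norm is <= t.
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


def group_lasso_ista(X, y, groups, lam, step, iters=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * sum_g ||w_g||_2 by proximal gradient.
    `step` should not exceed 1 / (largest eigenvalue of X^T X)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(d)]
        w = group_soft_threshold([w[j] - step * grad[j] for j in range(d)],
                                 groups, step * lam)
    return w
```

Because whole groups are shrunk together, a group whose combined signal is weak (here, one whose norm falls below the threshold) is removed as a unit, which is what lets such penalties select feature groups rather than individual features.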

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Machine learning methods for high-dimensional imbalanced biomedical data

Description

Learning from high-dimensional biomedical data has attracted much attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may degrade model performance. In this thesis, I focus on developing learning methods for high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented, in which penalty terms control the sparsity of the CCA projection matrices. The sparse CCA method is then applied to find patterns between biomedical data sets and labels, or among different data sources. In the second part, I discuss several learning problems for imbalanced biomedical data. Traditional learning systems are often biased when the data are imbalanced, so traditional evaluation measures such as accuracy may be inappropriate in such cases; I therefore discuss several alternative criteria for evaluating learning performance. For imbalanced binary classification problems, I use the undersampling-based classifier ensemble (UEM) strategy to obtain accurate models for both classes of samples. A small sphere and large margin (SSLM) approach is also presented to detect rare abnormal samples among a large number of subjects. In addition, I apply multiple feature selection and clustering methods to handle high-dimensional data and data with highly correlated features. Experiments on high-dimensional imbalanced biomedical data illustrate the effectiveness and efficiency of these methods.
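
A minimal sketch of the undersampling-ensemble idea follows, under several assumptions: the majority class is split into chunks roughly the size of the minority class, one base classifier is trained per balanced chunk, and predictions are combined by majority vote. The nearest-centroid base learner, the voting rule, and all function names are hypothetical stand-ins, not the thesis's exact UEM configuration. `balanced_accuracy` shows one of the alternative criteria mentioned above: a classifier that always predicts the majority class can score high accuracy but only 0.5 balanced accuracy.

```python
import random


def nearest_centroid_fit(pos, neg):
    """Hypothetical base learner: stores the mean vector of each class."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return centroid(pos), centroid(neg)


def nearest_centroid_predict(model, x):
    c_pos, c_neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
    return 1 if d_pos < d_neg else 0


def fit_undersampling_ensemble(minority, majority, seed=0):
    """Split the shuffled majority class into chunks about the size of the
    minority class and train one base model per balanced chunk."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // len(minority))
    chunks = [shuffled[i::k] for i in range(k)]
    return [nearest_centroid_fit(minority, chunk) for chunk in chunks]


def predict_ensemble(models, x):
    """Majority vote; ties fall back to the majority class (label 0)."""
    votes = sum(nearest_centroid_predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall for labels {0, 1}; assumes both classes occur."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return (tpr + tnr) / 2
```

Every base model sees all minority samples but only a slice of the majority class, so no single model is dominated by the majority, while the ensemble as a whole still uses all majority samples.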

Contributors: Yang, Tao (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

Understanding and Utilizing Protein Interactions in Diverse Environments

Description

Transient protein-protein and protein-molecule interactions fluctuate between associated and dissociated states. They are widespread in nature and mediate most biological processes. These interactions are complex and are strongly influenced by factors such as concentration, structure, and environment. Understanding and utilizing these types of interactions is useful from both a fundamental and a design perspective. In this dissertation, transient protein interactions are used as the sensing element of a biosensor for small-molecule detection. This is done by using a transcription factor-small molecule pair that mediates the activation of a CRISPR/Cas12a complex. Activation of the Cas12a enzyme results in an amplified readout that is either fluorescence-based or paper-based. This biosensor can successfully detect 9 different small molecules, including antibiotics, with a tuneable detection limit ranging from low µM to low nM. By combining protein- and nucleic acid-based systems, this biosensor has the potential to report on almost any protein-molecule interaction, linking it to the intrinsic amplification that is possible with nucleic acid-based technologies. The second part of this dissertation focuses on understanding protein-molecule interactions at a more fundamental level and, in so doing, exploring the design rules required to generalize sensors like the ones described above. This is done by training a neural network with binding data from high-density peptide microarrays incubated with specific protein targets. Because the peptide sequences were chosen simply to sample all of sequence space evenly, though sparsely, the resulting network provides a comprehensive sequence/binding relationship for a given target protein.
While past work had shown that this approach works well on the arrays, here I explore how well neural networks trained this way predict sequence-dependent binding in the context of protein-protein and peptide-protein interactions. Amino acid sequences, either free in solution or embedded in a protein structure, may display somewhat different binding properties than sequences affixed to the surface of a high-density array. However, the neural network trained on array sequences was able both to identify binding regions between proteins and to predict surface plasmon resonance-based binding propensities for peptides with statistically significant accuracy.

Contributors: Swingle, Kirstie Lynn (Author) / Woodbury, Neal W (Thesis advisor) / Green, Alexander A (Thesis advisor) / Stephanopoulos, Nicholas (Committee member) / Borges, Chad (Committee member) / Arizona State University (Publisher)

Created: 2022

Multi-task learning and its applications to biomedical informatics

Description

In many fields, such as information retrieval, computer vision, and biomedical informatics, one needs to build predictive models for a set of related machine learning tasks. Traditionally, these tasks are treated independently and inference is done separately for each task, which ignores important connections among them. Multi-task learning instead builds models for all tasks simultaneously, leveraging the inherent relatedness of the tasks to improve generalization performance. In this thesis, I first propose a clustered multi-task learning (CMTL) formulation, which simultaneously learns task models and performs task clustering. I provide a theoretical analysis establishing the equivalence between the CMTL formulation and alternating structure optimization, which learns a shared low-dimensional hypothesis space for different tasks. Then I present two real-world biomedical informatics applications that can benefit from multi-task learning. In the first application, I study the disease progression problem and present multi-task learning formulations for it. In these formulations, the prediction at each time point is a regression task, and the tasks at different time points are learned simultaneously, leveraging the temporal smoothness among them. The proposed formulations have been tested extensively on predicting the progression of Alzheimer's disease, and the experimental results demonstrate the effectiveness of the proposed models. In the second application, I present a novel data-driven framework for densifying electronic medical records (EMR) to overcome the sparsity problem in predictive modeling with EMR. Densifying each patient's record is a learning task, and the proposed algorithm densifies all patients simultaneously, so the densification of one patient leverages useful information from other patients.
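
To make "leveraging the temporal smoothness among the tasks" concrete, the sketch below assumes one common objective form: a squared loss per time point, plus a penalty θ Σ_t ||w_t − w_{t+1}||² that pulls the models of adjacent time points together, plus an optional ridge term λ||W||²_F. The abstract does not give the thesis's exact formulation, so this objective, the plain gradient-descent solver, and the function names are assumptions for illustration. The fixed step size must be small relative to the curvature for the iteration to converge.

```python
def fit_temporal_mtl(X, Y, theta=1.0, lam=0.0, step=0.01, iters=2000):
    """Gradient descent on
        ||X W - Y||_F^2 + theta * sum_t ||w_t - w_{t+1}||^2 + lam * ||W||_F^2.
    X: n x d features; Y: n x T targets (column t = time point t).
    Returns W as d x T, where column t is the model for time point t."""
    n, d, T = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * T for _ in range(d)]
    for _ in range(iters):
        # Residuals R = X W - Y (n x T).
        R = [[sum(X[i][k] * W[k][t] for k in range(d)) - Y[i][t]
              for t in range(T)] for i in range(n)]
        G = [[0.0] * T for _ in range(d)]
        for k in range(d):
            for t in range(T):
                g = 2.0 * sum(X[i][k] * R[i][t] for i in range(n))
                # Temporal smoothness couples each task to its neighbours in time.
                if t > 0:
                    g += 2.0 * theta * (W[k][t] - W[k][t - 1])
                if t < T - 1:
                    g += 2.0 * theta * (W[k][t] - W[k][t + 1])
                g += 2.0 * lam * W[k][t]  # ridge term
                G[k][t] = g
        W = [[W[k][t] - step * G[k][t] for t in range(T)] for k in range(d)]
    return W


def temporal_gap(W):
    """Smoothness penalty sum_t ||w_t - w_{t+1}||^2 (without theta)."""
    return sum((row[t] - row[t + 1]) ** 2
               for row in W for t in range(len(row) - 1))
```

With θ = 0 the time points decouple into independent regressions; raising θ trades some fit at each time point for models that change gradually over time, which is the intuition behind learning the tasks jointly.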

Contributors: Zhou, Jiayu (Author) / Ye, Jieping (Thesis advisor) / Mittelmann, Hans (Committee member) / Li, Baoxin (Committee member) / Wang, Yalin (Committee member) / Arizona State University (Publisher)

Created: 2014

Topological analysis of biological pathways : genes, microRNAs and pathways involved in hepatocellular carcinoma

Description

Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might also influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effects of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. This dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways and their association with disease.

This dissertation proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA focuses on detecting pathways whose number of hub genes differs between two phenotypes. The basis for PoTRA is that loss of connectivity is a common topological trait of cancer networks, together with the prior knowledge that a normal biological network is scale-free: its degree distribution follows a power law, so a small number of nodes are hubs and a large number are non-hubs. From normal to cancer, the process of the network losing connectivity may disrupt this scale-free structure, so the number of hub genes might be altered in cancer compared with normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer samples, that pathway might be involved in cancer. MiR2Pathway quantifies the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and ranks rewired miRNA-mediated biological pathways by disease risk. This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by their disease risk. The two methods proposed here can complement existing genomics analysis methods to facilitate the systems-level study of the biological mechanisms behind disease.
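
The PoTRA idea of counting hub genes by PageRank can be sketched as below. This is a minimal illustration, not PoTRA itself: the power-iteration PageRank is standard, but the hub rule (PageRank above a hypothetical `factor` times the uniform value 1/n) and the function names are assumptions, and the dissertation's actual hub cutoff and the statistical test used to compare hub counts between phenotypes are not reproduced.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on an adjacency dict {node: [neighbours]}.
    An undirected pathway graph should list every edge in both directions."""
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new[u] += share
            else:
                # Dangling node: redistribute its rank uniformly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def count_hubs(adj, factor=1.5):
    """Hypothetical hub rule: PageRank above `factor` times the uniform value 1/n."""
    ranks = pagerank(adj)
    threshold = factor / len(ranks)
    return sum(1 for r in ranks.values() if r > threshold)


# Toy pathway: a hub-and-spoke ("normal") wiring vs. a ring where the hub is lost.
star = {"A": ["B", "C", "D", "E"], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["A"]}
ring = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D", "A"]}
print(count_hubs(star), count_hubs(ring))
```

In the toy graphs, the star has one node with PageRank well above 1/n while the ring spreads rank evenly, so the hub count drops from one to zero; this is the kind of change in a pathway's hub structure that PoTRA is designed to flag.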

Contributors: Li, Chaoxing (Author) / Dinu, Valentin (Thesis advisor) / Kuang, Yang (Thesis advisor) / Liu, Li (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2017

Needle in a Haystack: the search for immunogenic epitopes for TPD52

Description

Breast cancer is one of the leading causes of cancer-related death among women in the United States. Traditionally, breast cancer is predominantly treated with a combination of surgery, chemotherapy, and radiation therapy. However, because of the significant negative side effects associated with these traditional treatments, there have been substantial efforts to develop alternative therapies for cancer. One such alternative is a peptide-based therapeutic cancer vaccine. Therapeutic cancer vaccines enhance an individual's immune response to a specific tumor by artificially activating tumor-specific CTLs (cytotoxic T lymphocytes). However, to artificially activate tumor-specific CTLs, a patient must be treated with immunogenic epitopes derived from their specific cancer type. We identified the tumor-associated antigen TPD52 as an ideal target for a therapeutic cancer vaccine because it is overexpressed in a variety of cancer types. To begin developing a therapeutic cancer vaccine for TPD52-related cancers, we devised a two-step strategy. First, we planned to create a list of potential TPD52 epitopes using epitope binding and processing prediction tools. Second, we planned to experimentally identify MHC class I TPD52 epitopes in vitro. We identified 942 potential 9- and 10-amino-acid epitopes for the HLAs A1, A2, A3, A11, A24, B07, B27, B35, and B44. These epitopes were predicted using a combination of 3 binding prediction tools and 2 processing prediction tools. From these 942 potential epitopes, we selected the top 50, ranked by a combination of binding and processing scores. Because some predicted epitopes were promiscuous across multiple HLAs, we ordered 38 synthetic epitopes from the list of the top 50.
We also performed a frequency analysis of the TPD52 protein sequence and identified three regions of high epitope production. After the epitope predictions were completed, we attempted to experimentally detect presented TPD52 epitopes. First, we successfully transduced parental K562 cells with TPD52. After transduction, we began optimizing the immunoprecipitation protocol. This optimization proved more difficult than anticipated and was the main reason we were unable to progress past the transduction of the parental cells. However, we believe we have identified the issues and will be able to complete the experiment in the coming months.
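
The prediction-and-selection workflow above can be sketched as follows, under explicit assumptions. `candidate_peptides` enumerates every 9- and 10-residue window of a protein sequence, which is the candidate space such prediction tools score. The binding and processing scores are assumed to come from external tools (not shown) and to be scaled so that higher is better; the equal-weight sum in `top_ranked` is a hypothetical way of combining them, not the thesis's actual ranking. `unique_peptides` shows why a ranked list of (peptide, HLA) entries can contain fewer distinct peptides than entries when one peptide is predicted for several HLAs. All names and the toy inputs are hypothetical.

```python
def candidate_peptides(sequence, lengths=(9, 10)):
    """All (start, peptide) windows of the given lengths in a protein sequence."""
    return [(start, sequence[start:start + k])
            for k in lengths
            for start in range(len(sequence) - k + 1)]


def top_ranked(scores, n, w_binding=0.5):
    """scores: {(peptide, hla): (binding, processing)}, both scaled so higher is better.
    The weighted sum is a hypothetical way of combining the two scores."""
    combined = {key: w_binding * b + (1.0 - w_binding) * p
                for key, (b, p) in scores.items()}
    # Sort by descending combined score; ties are broken by the key itself.
    return sorted(combined, key=lambda key: (-combined[key], key))[:n]


def unique_peptides(keys):
    """Collapse (peptide, hla) entries to distinct peptides, keeping rank order."""
    seen, out = set(), []
    for peptide, _hla in keys:
        if peptide not in seen:
            seen.add(peptide)
            out.append(peptide)
    return out
```

For a sequence of length L, this enumeration yields (L − 8) + (L − 9) candidate windows before any HLA-specific scoring.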

Contributors: Wilson, Eric Andrew (Author) / Anderson, Karen (Thesis director) / Borges, Chad (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05
