
The role of mutations in protein structural dynamics and function: a multi-scale computational approach

Description

Proteins are a fundamental unit in biology. Although proteins have been extensively studied, there is still much to investigate. The mechanism by which proteins fold into their native state, how evolution shapes structural dynamics, and the dynamic mechanisms of many diseases are not well understood. In this thesis, protein folding is explored using a multi-scale modeling method including (i) geometric-constraint-based simulations that efficiently search for native-like topologies and (ii) reservoir replica exchange molecular dynamics, which identifies the low free energy structures and refines these structures toward the native conformation. A test set of eight proteins and three ancestral steroid receptor proteins is folded to 2.7 Å all-atom RMSD from their experimental crystal structures. Protein evolution and disease-associated mutations (DAMs) are most commonly studied by in silico multiple sequence alignment methods. Here, however, the structural dynamics are incorporated to give insight into the evolution of three ancestral proteins and the mechanism of several diseases in human ferritin protein. The differences in conformational dynamics of these evolutionarily related, functionally diverged ancestral steroid receptor proteins are investigated by obtaining the most collective motion through essential dynamics. Strikingly, this analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace. Rather, those sharing the same function are simultaneously clustered together and distant from functionally diverged homologs. This dynamics analysis also identifies 77% of mutations (functional and permissive) necessary to evolve new function. In silico methods for prediction of DAMs rely on differences in evolution rate due to purifying selection, and therefore the accuracy of DAM prediction decreases at fast and slow evolvable sites. Here, we investigate structural dynamics by computing the contribution of each residue to the biologically relevant fluctuations and from this define a metric: the dynamic stability index (DSI). Using DSI, we study the mechanism for three diseases observed in the human ferritin protein. The T30I and R40G DAMs show a loss of dynamic stability at the C-terminus helix and nearby regulatory loop, agreeing with experimental results implicating the same regulatory loop as a cause in cataracts syndrome.
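The folding accuracy above is reported as an all-atom RMSD. As a rough illustration of that measure only, the sketch below computes the root-mean-square deviation between two equally sized lists of atomic coordinates. It assumes the two structures have already been superimposed and that atoms are listed in matching order; the function name `rmsd` and the sample coordinates are hypothetical and are not taken from the thesis.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists.

    Each argument is a list of (x, y, z) tuples in angstroms, with atoms
    in the same order. No optimal superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example: one atom displaced by 2 angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, crystal))
```

In practice the two structures would first be optimally superimposed before the deviation is measured; that step is omitted here to keep the sketch short.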

Contributors: Glembo, Tyler J (Author) / Ozkan, Sefika B (Thesis advisor) / Thorpe, Michael F (Committee member) / Ros, Robert (Committee member) / Kumar, Sudhir (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)

Created: 2011

Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; each is formulated as a mathematical optimization problem that minimizes the empirical loss regularized by a different structure. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solutions efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches to different real-world applications: (1) automated annotation of Drosophila gene expression pattern images; (2) categorization of Yahoo web pages. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.

Contributors: Chen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created: 2011

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Via the penalization of l-1 norm based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to annotate each embryo more finely, so that embryos are divided into early and late periods of development within standard stage demarcations. Stage scores help to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

HIV evolution: biogeography and intra-individual dynamics

Description

The entire history of HIV-1 is hidden in its ten thousand bases, where information regarding its evolutionary traversal through the human population can only be unlocked with fine-scale sequence analysis. Measurable footprints of mutation and recombination have imparted upon us a wealth of knowledge, from multiple chimpanzee-to-human transmissions to patterns of neutralizing antibody and drug resistance. Extracting maximum understanding from such diverse data can only be accomplished by analyzing the viral population from many angles. This body of work explores two primary aspects of HIV sequence evolution, point mutation and recombination, through cross-sectional (inter-individual) and longitudinal (intra-individual) investigations, respectively. Cross-sectional Analysis: The role of Haiti in the subtype B pandemic has been hotly debated for years; while there have been many studies, up to this point, no one has incorporated the well-known mechanism of retroviral recombination into their biological model. Prior to the use of recombination detection, multiple analyses produced trees where subtype B appears to have first entered Haiti, followed by a jump into the rest of the world. The results presented here contest the Haiti-first theory of the pandemic and instead suggest simultaneous entries of subtype B into Haiti and the rest of the world. Longitudinal Analysis: Potential N-linked glycosylation sites (PNGS) are the most evolutionarily dynamic component of one of the most evolutionarily dynamic proteins known to date. While the number of mutations associated with the increase or decrease of PNGS frequency over time is high, there is a set of relatively stable sites that persist within and between longitudinally sampled individuals. Here, I identify the most conserved stable PNGSs and suggest their potential roles in host-virus interplay. In addition, I have identified, for the first time, what may be a gp120-based environmental preference for N-linked glycosylation sites.

Contributors: Hepp, Crystal Marie, 1981- (Author) / Rosenberg, Michael S. (Thesis advisor) / Hedrick, Philip (Committee member) / Escalante, Ananias (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Combined photo- and thermionic electron emission from low work function diamond films

Description

In this dissertation, combined photo-induced and thermionic electron emission from low work function diamond films is studied through low energy electron spectroscopy analysis and other associated techniques. Nitrogen-doped, hydrogen-terminated diamond films prepared by the microwave plasma chemical vapor deposition method are the primary material studied. The theme of this research is represented by four interrelated issues. (1) An in-depth study describes combined photo-induced and thermionic emission from nitrogen-doped diamond films on molybdenum substrates, which were illuminated with visible light photons while the electron emission spectra were recorded as a function of temperature. The diamond films displayed significant emissivity with a low work function of ~1.5 eV. The results indicate that these diamond emitters can be applied in combined solar and thermal energy conversion. (2) The nitrogen-doped diamond was further investigated to understand the physical mechanism and material-related properties that enable the combined electron emission. Through analysis of the spectroscopy, optical absorbance and photoelectron microscopy results from sample sets prepared with different configurations, it was deduced that the photo-induced electron generation involves both the ultra-nanocrystalline diamond and the interface between the diamond film and metal substrate. (3) Based on results from the first two studies, possible photon-enhanced thermionic emission was examined from nitrogen-doped diamond films deposited on silicon substrates, which could provide the basis for a novel approach to concentrated solar energy conversion. A significant increase of emission intensity was observed at elevated temperatures, which was analyzed using computer-based modeling and a combination of different emission mechanisms. (4) In addition, the electronic structure of vanadium-oxide-terminated diamond surfaces was studied through in-situ photoemission spectroscopy. Thin layers of vanadium were deposited on oxygen-terminated diamond surfaces, which led to oxide formation. After thermal annealing, a negative electron affinity was found on boron-doped diamond, while a positive electron affinity was found on nitrogen-doped diamond. A model based on the barrier at the diamond-oxide interface was employed to analyze the results. Based on the results of this dissertation, applications of diamond-based devices for combined solar and thermal energy conversion are proposed.

Contributors: Sun, Tianyin (Author) / Nemanich, Robert (Thesis advisor) / Ponce, Fernando (Committee member) / Peng, Xihong (Committee member) / Spence, John (Committee member) / Treacy, Michael (Committee member) / Arizona State University (Publisher)

Created: 2013

Identification of neo-antigens for a cancer vaccine by transcriptome analysis

Description

We propose a novel solution to prevent cancer by developing a prophylactic cancer vaccine. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, arising from mistakes in either RNA processing or genomic DNA, may lead to generation of neo-peptides that are foreign to the immune system. Viral peptides presumably would originate from exogenous but integrated viral nucleic acid sequences. Both are non-self and therefore lessen concerns about development of autoimmunity. I have developed a bioinformatic approach to identify these aberrant transcripts in the cancer transcriptome. Their suitability for use in a vaccine is evaluated by establishing their frequencies and predicting possible epitopes along with their population coverage according to the prevalence of major histocompatibility complex (MHC) types. Viral transcripts and transcripts with FS mutations from gene fusion, insertion/deletion at coding microsatellite DNA, and alternative splicing were identified in the NCBI Expressed Sequence Tag (EST) database. Forty-eight FS chimeric transcripts were validated in 50 breast cell lines and 68 primary breast tumor samples, with frequencies from 4% to 98%, by RT-PCR and sequencing confirmation. These 48 FS peptides, if translated and presented, could be used to protect more than 90% of the population in North America based on the prediction of epitopes derived from them. Furthermore, we synthesized 150 peptides that correspond to FS and viral peptides that we predicted would exist in tumor patients, and we tested over 200 different cancer patient sera. We found a number of serologically reactive peptide sequences in cancer patients that had little to no reactivity in healthy controls, which strongly supports our bioinformatic approach. This study describes a process used to identify aberrant transcripts that lead to a new source of antigens that can be tested and used in a prophylactic cancer vaccine. The vast amount of transcriptome data of various cancers from the Cancer Genome Atlas (TCGA) project will enhance our ability to further select better cancer antigen candidates.

Contributors: Lee, HoJoon (Author) / Johnston, Stephen A. (Thesis advisor) / Kumar, Sudhir (Committee member) / Miller, Laurence (Committee member) / Stafford, Phillip (Committee member) / Sykes, Kathryn (Committee member) / Arizona State University (Publisher)

Created: 2012

Identification of Tumor Associated Antigens using Nucleic Acid Programmable Protein Arrays

Description

Identifying disease biomarkers may aid in the early detection of breast cancer and improve patient outcomes. Recent evidence suggests that tumors are immunogenic and therefore patients may launch an autoantibody response to tumor-associated antigens. Single-chain variable fragments of autoantibodies derived from regional lymph node B cells of breast cancer patients were used to discover these tumor-associated biomarkers on protein microarrays. Six candidate biomarkers were discovered from 22 heavy chain-only variable region antibody fragments screened. Validation tests are necessary to confirm the tumorigenicity of these antigens. However, the use of single-chain variable autoantibody fragments presents a novel platform for diagnostics and cancer therapeutics.

Contributors: Sharman, M. Camila (Author) / Magee, Dewey (Mitch) (Thesis director) / Wallstrom, Garrick (Committee member) / Petritis, Brianne (Committee member) / Barrett, The Honors College (Contributor) / College of Liberal Arts and Sciences (Contributor) / Virginia G. Piper Center for Personalized Diagnostics (Contributor) / Biodesign Institute (Contributor)

Created: 2012-12

Gut Bacteria in Children With Autism Spectrum Disorders: Challenges and Promise of Studying How a Complex Community Influences a Complex Disease

Description

Recent studies suggest a role for the microbiota in autism spectrum disorders (ASD), potentially arising from their role in modulating the immune system and gastrointestinal (GI) function or from gut–brain interactions dependent on or independent of the immune system. GI problems such as chronic constipation and/or diarrhea are common in children with ASD, and significantly worsen their behavior and their quality of life. Here we first summarize previously published data indicating that GI dysfunction is common in individuals with ASD and describing the role of the microbiota in ASD. Second, by comparing with other publicly available microbiome datasets, we provide some evidence that the shifted microbiota can be a result of westernization and that this shift could also be framing an altered immune system. Third, we explore the possibility that gut–brain interactions could also be a direct result of microbially produced metabolites.

Contributors: Krajmalnik-Brown, Rosa (Author) / Lozupone, Catherine (Author) / Kang, Dae Wook (Author) / Adams, James (Author) / Biodesign Institute (Contributor)

Created: 2015-03-12

The effect of material properties on energy resolution in gamma-ray detectors

Description

Nuclear proliferation concerns have resulted in a desire for radiation detectors with superior energy resolution. In this dissertation, a Monte Carlo code is developed for calculating energy resolution in gamma-ray detector materials. The effects of basic material properties such as the bandgap and plasmon resonance energy are studied using a model for inelastic electron scattering based on electron energy-loss spectra. From a simplified "toy model" for a generic material, energy resolution is found to oscillate as the plasmon resonance energy is increased, and energy resolution can also depend on the valence band width. By incorporating the model developed here as an extension of the radiation transport code Penelope, photon processes are also included. The enhanced version of Penelope is used to calculate the Fano factor and average electron-hole pair energy in the semiconductors silicon, gallium arsenide, and zinc telluride, and in the scintillators cerium fluoride and lutetium oxyorthosilicate (LSO). If the effects of the valence band density-of-states and phonon scattering are removed, the calculated energy resolution for these materials is fairly close to that for a toy model with a uniform electron energy-loss probability density function. This implies that the details of the electron cascade may in some cases have only a marginal effect on energy resolution.
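The two quantities named above have standard definitions that can be sketched from simulated event data: the Fano factor is the variance of the number of electron-hole pairs divided by its mean, and the average pair energy is the deposited energy divided by the mean number of pairs. The snippet below is a minimal illustration under those definitions; the function name and the sample counts are hypothetical and do not come from the dissertation or from Penelope.

```python
def fano_and_pair_energy(pair_counts, deposited_energy_ev):
    """Estimate the Fano factor and mean energy per electron-hole pair.

    pair_counts: number of electron-hole pairs produced in each simulated
        event, all events depositing the same energy (hypothetical input).
    deposited_energy_ev: energy deposited per event, in eV.
    """
    n_events = len(pair_counts)
    mean_pairs = sum(pair_counts) / n_events
    # Population variance of the pair count across events.
    variance = sum((c - mean_pairs) ** 2 for c in pair_counts) / n_events
    fano = variance / mean_pairs
    pair_energy_ev = deposited_energy_ev / mean_pairs
    return fano, pair_energy_ev

# Made-up counts for four events, each depositing 360 eV.
fano, w = fano_and_pair_energy([100, 102, 98, 100], 360.0)
print(fano, w)
```

A Fano factor below 1 indicates that pair-count fluctuations are smaller than Poisson statistics would predict, which is why it bears directly on energy resolution.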

Contributors: Narayan, Raman (Author) / Rez, Peter (Thesis advisor) / Spence, John (Committee member) / Ponce, Fernando (Committee member) / Comfort, Joseph (Committee member) / Chizmeshya, Andrew (Committee member) / Arizona State University (Publisher)

Created: 2011

Image-level and group-level models for Drosophila gene expression pattern annotation

Description

Background
Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images and is compared with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling, are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
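To make the three pooling functions concrete, the sketch below pools a small set of patch-level sparse codes into one image-level feature vector. It assumes each patch is represented by a code vector of equal length and that pooling is applied per dictionary dimension on the raw code values; whether absolute values are taken first is not stated in the abstract. All function names and numbers are illustrative.

```python
import math

def max_pool(codes):
    """Per-dimension maximum over all patch codes."""
    return [max(column) for column in zip(*codes)]

def average_pool(codes):
    """Per-dimension arithmetic mean over all patch codes."""
    return [sum(column) / len(column) for column in zip(*codes)]

def sqrt_pool(codes):
    """Per-dimension square root of the mean of squared code values."""
    return [math.sqrt(sum(v * v for v in column) / len(column))
            for column in zip(*codes)]

# Two hypothetical patches, each coded over a two-atom dictionary.
patch_codes = [[0.0, 3.0],
               [4.0, 0.0]]
print(max_pool(patch_codes))
print(average_pool(patch_codes))
print(sqrt_pool(patch_codes))
```

Each function maps any number of patch codes to a vector whose length equals the dictionary size, so images with different numbers of patches yield features of the same dimension.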
Conclusion
In our experiments, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
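The undersampling-with-majority-vote idea can be sketched as follows: the larger class is repeatedly sampled down to the size of the smaller one, a classifier is trained on each balanced subset, and the final label is decided by a vote across those classifiers. The example below is only an illustration of that pattern. It uses one-dimensional features and a toy midpoint-threshold classifier as a stand-in for whatever classifier the authors used, and every name and value in it is hypothetical.

```python
import random

def undersample(positives, negatives, rng):
    """Sample both classes down to the size of the smaller class."""
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)

def train_threshold(pos, neg):
    """Toy stand-in classifier: threshold midway between the class means."""
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2.0
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: 1 if sign * (x - threshold) > 0 else 0

def ensemble_predict(positives, negatives, x, rounds=5, seed=0):
    """Majority vote over classifiers trained on balanced subsamples."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(rounds):
        pos, neg = undersample(positives, negatives, rng)
        votes += train_threshold(pos, neg)(x)
    return 1 if 2 * votes > rounds else 0

# Made-up imbalanced data: 3 positive samples versus 10 negative samples.
positives = [0.9, 1.0, 1.1]
negatives = [0.0, 0.1, -0.1, 0.05, 0.2, -0.2, 0.15, 0.0, 0.1, -0.05]
print(ensemble_predict(positives, negatives, 0.95))
print(ensemble_predict(positives, negatives, 0.0))
```

Using an odd number of rounds avoids ties in the vote, and each round sees a different subset of the majority class, so less of its information is discarded than with a single undersampled training set.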

Contributors: Sun, Qian (Author) / Muckatira, Sherin (Author) / Yuan, Lei (Author) / Ji, Shuiwang (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2013-12-03