Search Content

The role of mutations in protein structural dynamics and function: a multi-scale computational approach

Description

Proteins are a fundamental unit in biology. Although proteins have been extensively studied, there is still much to investigate. The mechanism by which proteins fold into their native state, how evolution shapes structural dynamics, and the dynamic mechanisms of many diseases are not well understood. In this thesis, protein folding…

Proteins are a fundamental unit in biology. Although proteins have been extensively studied, there is still much to investigate. The mechanism by which proteins fold into their native state, how evolution shapes structural dynamics, and the dynamic mechanisms of many diseases are not well understood. In this thesis, protein folding is explored using a multi-scale modeling method including (i) geometric constraint based simulations that efficiently search for native like topologies and (ii) reservoir replica exchange molecular dynamics, which identify the low free energy structures and refines these structures toward the native conformation. A test set of eight proteins and three ancestral steroid receptor proteins are folded to 2.7Å all-atom RMSD from their experimental crystal structures. Protein evolution and disease associated mutations (DAMs) are most commonly studied by in silico multiple sequence alignment methods. Here, however, the structural dynamics are incorporated to give insight into the evolution of three ancestral proteins and the mechanism of several diseases in human ferritin protein. The differences in conformational dynamics of these evolutionary related, functionally diverged ancestral steroid receptor proteins are investigated by obtaining the most collective motion through essential dynamics. Strikingly, this analysis shows that evolutionary diverged proteins of the same family do not share the same dynamic subspace. Rather, those sharing the same function are simultaneously clustered together and distant from those functionally diverged homologs. This dynamics analysis also identifies 77% of mutations (functional and permissive) necessary to evolve new function. In silico methods for prediction of DAMs rely on differences in evolution rate due to purifying selection and therefore the accuracy of DAM prediction decreases at fast and slow evolvable sites. Here, we investigate structural dynamics through computing the contribution of each residue to the biologically relevant fluctuations and from this define a metric: the dynamic stability index (DSI). Using DSI we study the mechanism for three diseases observed in the human ferritin protein. The T30I and R40G DAMs show a loss of dynamic stability at the C-terminus helix and nearby regulatory loop, agreeing with experimental results implicating the same regulatory loop as a cause in cataracts syndrome.

ContributorsGlembo, Tyler J (Author) / Ozkan, Sefika B (Thesis advisor) / Thorpe, Michael F (Committee member) / Ros, Robert (Committee member) / Kumar, Sudhir (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)

Created2011

Targeted proteomics studies: design, development and translation of mass spectrometric immunoassays for diabetes and kidney disease

Description

In an effort to begin validating the large number of discovered candidate biomarkers, proteomics is beginning to shift from shotgun proteomic experiments towards targeted proteomic approaches that provide solutions to automation and economic concerns. Such approaches to validate biomarkers necessitate the mass spectrometric analysis of hundreds to thousands of human…

In an effort to begin validating the large number of discovered candidate biomarkers, proteomics is beginning to shift from shotgun proteomic experiments towards targeted proteomic approaches that provide solutions to automation and economic concerns. Such approaches to validate biomarkers necessitate the mass spectrometric analysis of hundreds to thousands of human samples. As this takes place, a serendipitous opportunity has become evident. By the virtue that as one narrows the focus towards "single" protein targets (instead of entire proteomes) using pan-antibody-based enrichment techniques, a discovery science has emerged, so to speak. This is due to the largely unknown context in which "single" proteins exist in blood (i.e. polymorphisms, transcript variants, and posttranslational modifications) and hence, targeted proteomics has applications for established biomarkers. Furthermore, besides protein heterogeneity accounting for interferences with conventional immunometric platforms, it is becoming evident that this formerly hidden dimension of structural information also contains rich-pathobiological information. Consequently, targeted proteomics studies that aim to ascertain a protein's genuine presentation within disease- stratified populations and serve as a stepping-stone within a biomarker translational pipeline are of clinical interest. Roughly 128 million Americans are pre-diabetic, diabetic, and/or have kidney disease and public and private spending for treating these diseases is in the hundreds of billions of dollars. In an effort to create new solutions for the early detection and management of these conditions, described herein is the design, development, and translation of mass spectrometric immunoassays targeted towards diabetes and kidney disease. Population proteomics experiments were performed for the following clinically relevant proteins: insulin, C-peptide, RANTES, and parathyroid hormone. At least thirty-eight protein isoforms were detected. Besides the numerous disease correlations confronted within the disease-stratified cohorts, certain isoforms also appeared to be causally related to the underlying pathophysiology and/or have therapeutic implications. Technical advancements include multiplexed isoform quantification as well a "dual- extraction" methodology for eliminating non-specific proteins while simultaneously validating isoforms. Industrial efforts towards widespread clinical adoption are also described. Consequently, this work lays a foundation for the translation of mass spectrometric immunoassays into the clinical arena and simultaneously presents the most recent advancements concerning the mass spectrometric immunoassay approach.

ContributorsOran, Paul (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Ros, Alexandra (Committee member) / Williams, Peter (Committee member) / Arizona State University (Publisher)

Created2011

Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It…

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; they are respectively formulated as mathematical optimization problems of minimizing the empirical loss regularized by different structures. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solution efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches on different real-world applications: (1) Automated annotation of the Drosophila gene expression pattern images; (2) Categorization of the Yahoo web pages. Our experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.

ContributorsChen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created2011

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups…

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting an unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representation of the expression images. In the third application, I present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help us to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.

ContributorsYuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created2013

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay…

Signal processing techniques have been used extensively in many engineering problems and in recent years its application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA) which has been one of the primary methods of biomarker discovery techniques. MSIA analyzes protein molecules as potential biomarkers using time of flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS related application. Though many peak detection algorithms exist, most of them are based on heuristics models. One of the ways of detecting signal peaks is by deploying stochastic models of the signal and noise observations. Likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is an uniformly most powerful test to decision making in the form of a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for the electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model by employing first principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments in the ESI mass spectrometer. A non-flat baseline in MS data is common. The reasons behind the formation of this baseline has not been fully comprehended. A new signal model explaining the presence of baseline is proposed, though detailed experiments are needed to further substantiate the model assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating the signal peak amplitudes. The performance of the detection methods and ML estimation are evaluated with Monte Carlo simulation which shows promising results. An application of these methods is proposed for fractional abundance calculation for biomarker analysis, which is mathematically robust and fundamentally different than the current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular diseases and is shown to perform better than a linear discriminant analysis based classifier.

ContributorsBuddi, Sai (Author) / Taylor, Thomas (Thesis advisor) / Cochran, Douglas (Thesis advisor) / Nelson, Randall (Committee member) / Duman, Tolga (Committee member) / Arizona State University (Publisher)

Created2012

HIV evolution: biogeography and intra-individual dynamics

Description

The entire history of HIV-1 is hidden in its ten thousand bases, where information regarding its evolutionary traversal through the human population can only be unlocked with fine-scale sequence analysis. Measurable footprints of mutation and recombination have imparted upon us a wealth of knowledge, from multiple chimpanzee-to-human transmissions to patterns…

The entire history of HIV-1 is hidden in its ten thousand bases, where information regarding its evolutionary traversal through the human population can only be unlocked with fine-scale sequence analysis. Measurable footprints of mutation and recombination have imparted upon us a wealth of knowledge, from multiple chimpanzee-to-human transmissions to patterns of neutralizing antibody and drug resistance. Extracting maximum understanding from such diverse data can only be accomplished by analyzing the viral population from many angles. This body of work explores two primary aspects of HIV sequence evolution, point mutation and recombination, through cross-sectional (inter-individual) and longitudinal (intra-individual) investigations, respectively. Cross-sectional Analysis: The role of Haiti in the subtype B pandemic has been hotly debated for years; while there have been many studies, up to this point, no one has incorporated the well-known mechanism of retroviral recombination into their biological model. Prior to the use of recombination detection, multiple analyses produced trees where subtype B appears to have first entered Haiti, followed by a jump into the rest of the world. The results presented here contest the Haiti-first theory of the pandemic and instead suggest simultaneous entries of subtype B into Haiti and the rest of the world. Longitudinal Analysis: Potential N-linked glycosylation sites (PNGS) are the most evolutionarily dynamic component of one of the most evolutionarily dynamic proteins known to date. While the number of mutations associated with the increase or decrease of PNGS frequency over time is high, there are a set of relatively stable sites that persist within and between longitudinally sampled individuals. Here, I identify the most conserved stable PNGSs and suggest their potential roles in host-virus interplay. In addition, I have identified, for the first time, what may be a gp-120-based environmental preference for N-linked glycosylation sites.

ContributorsHepp, Crystal Marie, 1981- (Author) / Rosenberg, Michael S. (Thesis advisor) / Hedrick, Philip (Committee member) / Escalante, Ananias (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created2013

Identification of neo-antigens for a cancer vaccine by transcriptome analysis

Description

We propose a novel solution to prevent cancer by developing a prophylactic cancer. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, from either mistake in RNA…

We propose a novel solution to prevent cancer by developing a prophylactic cancer. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, from either mistake in RNA processing or in genomic DNA, may lead to generation of neo-peptides that are foreign to the immune system. Viral peptides presumably would originate from exogenous but integrated viral nucleic acid sequences. Both are non-self, therefore lessen concerns about development of autoimmunity. I have developed a bioinformatical approach to identify these aberrant transcripts in the cancer transcriptome. Their suitability for use in a vaccine is evaluated by establishing their frequencies and predicting possible epitopes along with their population coverage according to the prevalence of major histocompatibility complex (MHC) types. Viral transcripts and transcripts with FS mutations from gene fusion, insertion/deletion at coding microsatellite DNA, and alternative splicing were identified in NCBI Expressed Sequence Tag (EST) database. 48 FS chimeric transcripts were validated in 50 breast cell lines and 68 primary breast tumor samples with their frequencies from 4% to 98% by RT-PCR and sequencing confirmation. These 48 FS peptides, if translated and presented, could be used to protect more than 90% of the population in Northern America based on the prediction of epitopes derived from them. Furthermore, we synthesized 150 peptides that correspond to FS and viral peptides that we predicted would exist in tumor patients and we tested over 200 different cancer patient sera. We found a number of serological reactive peptide sequences in cancer patients that had little to no reactivity in healthy controls; strong support for the strength of our bioinformatic approach. This study describes a process used to identify aberrant transcripts that lead to a new source of antigens that can be tested and used in a prophylactic cancer vaccine. The vast amount of transcriptome data of various cancers from the Cancer Genome Atlas (TCGA) project will enhance our ability to further select better cancer antigen candidates.

ContributorsLee, HoJoon (Author) / Johnston, Stephen A. (Thesis advisor) / Kumar, Sudhir (Committee member) / Miller, Laurence (Committee member) / Stafford, Phillip (Committee member) / Sykes, Kathryn (Committee member) / Arizona State University (Publisher)

Created2012

Structural characterization of potential cancer biomarker proteins

Description

Cancer claims hundreds of thousands of lives every year in US alone. Finding ways for early detection of cancer onset is crucial for better management and treatment of cancer. Thus, biomarkers especially protein biomarkers, being the functional units which reflect dynamic physiological changes, need to be discovered. Though important, there…

Cancer claims hundreds of thousands of lives every year in US alone. Finding ways for early detection of cancer onset is crucial for better management and treatment of cancer. Thus, biomarkers especially protein biomarkers, being the functional units which reflect dynamic physiological changes, need to be discovered. Though important, there are only a few approved protein cancer biomarkers till date. To accelerate this process, fast, comprehensive and affordable assays are required which can be applied to large population studies. For this, these assays should be able to comprehensively characterize and explore the molecular diversity of nominally "single" proteins across populations. This information is usually unavailable with commonly used immunoassays such as ELISA (enzyme linked immunosorbent assay) which either ignore protein microheterogeneity, or are confounded by it. To this end, mass spectrometric immuno assays (MSIA) for three different human plasma proteins have been developed. These proteins viz. IGF-1, hemopexin and tetranectin have been found in reported literature to show correlations with many diseases along with several carcinomas. Developed assays were used to extract entire proteins from plasma samples and subsequently analyzed on mass spectrometric platforms. Matrix assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) mass spectrometric techniques where used due to their availability and suitability for the analysis. This resulted in visibility of different structural forms of these proteins showing their structural micro-heterogeneity which is invisible to commonly used immunoassays. These assays are fast, comprehensive and can be applied in large sample studies to analyze proteins for biomarker discovery.

ContributorsRai, Samita (Author) / Nelson, Randall (Thesis advisor) / Hayes, Mark (Thesis advisor) / Borges, Chad (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)

Created2012

A Review of Traditional and Novel Treatments for Seizures in Autism Spectrum Disorder: Findings From a Systematic Review and Expert Panel

Description

Despite the fact that seizures are commonly associated with autism spectrum disorder (ASD), the effectiveness of treatments for seizures has not been well studied in individuals with ASD. This manuscript reviews both traditional and novel treatments for seizures associated with ASD. Studies were selected by systematically searching major electronic databases…

Despite the fact that seizures are commonly associated with autism spectrum disorder (ASD), the effectiveness of treatments for seizures has not been well studied in individuals with ASD. This manuscript reviews both traditional and novel treatments for seizures associated with ASD. Studies were selected by systematically searching major electronic databases and by a panel of experts that treat ASD individuals. Only a few anti-epileptic drugs (AEDs) have undergone carefully controlled trials in ASD, but these trials examined outcomes other than seizures. Several lines of evidence point to valproate, lamotrigine, and levetiracetam as the most effective and tolerable AEDs for individuals with ASD. Limited evidence supports the use of traditional non-AED treatments, such as the ketogenic and modified Atkins diet, multiple subpial transections, immunomodulation, and neurofeedback treatments. Although specific treatments may be more appropriate for specific genetic and metabolic syndromes associated with ASD and seizures, there are few studies which have documented the effectiveness of treatments for seizures for specific syndromes. Limited evidence supports l-carnitine, multivitamins, and N-acetyl-l-cysteine in mitochondrial disease and dysfunction, folinic acid in cerebral folate abnormalities and early treatment with vigabatrin in tuberous sclerosis complex. Finally, there is limited evidence for a number of novel treatments, particularly magnesium with pyridoxine, omega-3 fatty acids, the gluten-free casein-free diet, and low-frequency repetitive transcranial magnetic simulation. Zinc and l-carnosine are potential novel treatments supported by basic research but not clinical studies. This review demonstrates the wide variety of treatments used to treat seizures in individuals with ASD as well as the striking lack of clinical trials performed to support the use of these treatments. Additional studies concerning these treatments for controlling seizures in individuals with ASD are warranted.

ContributorsFrye, Richard E. (Author) / Rossignol, Daniel (Author) / Casanova, Manuel F. (Author) / Brown, Gregory L. (Author) / Martin, Victoria (Author) / Edelson, Stephen (Author) / Coben, Robert (Author) / Lewine, Jeffrey (Author) / Slattery, John C. (Author) / Lau, Chrystal (Author) / Hardy, Paul (Author) / Fatemi, S. Hossein (Author) / Folsom, Timothy D. (Author) / MacFabe, Derrick (Author) / Adams, James (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created2013-09-13

Approaches to Studying and Manipulating the Enteric Microbiome to Improve Autism Symptoms

Description

There is a growing body of scientific evidence that the health of the microbiome (the trillions of microbes that inhabit the human host) plays an important role in maintaining the health of the host and that disruptions in the microbiome may play a role in certain disease processes. An increasing…

There is a growing body of scientific evidence that the health of the microbiome (the trillions of microbes that inhabit the human host) plays an important role in maintaining the health of the host and that disruptions in the microbiome may play a role in certain disease processes. An increasing number of research studies have provided evidence that the composition of the gut (enteric) microbiome (GM) in at least a subset of individuals with autism spectrum disorder (ASD) deviates from what is usually observed in typically developing individuals. There are several lines of research that suggest that specific changes in the GM could be causative or highly associated with driving core and associated ASD symptoms, pathology, and comorbidities which include gastrointestinal symptoms, although it is also a possibility that these changes, in whole or in part, could be a consequence of underlying pathophysiological features associated with ASD. However, if the GM truly plays a causative role in ASD, then the manipulation of the GM could potentially be leveraged as a therapeutic approach to improve ASD symptoms and/or comorbidities, including gastrointestinal symptoms.

One approach to investigating this possibility in greater detail includes a highly controlled clinical trial in which the GM is systematically manipulated to determine its significance in individuals with ASD. To outline the important issues that would be required to design such a study, a group of clinicians, research scientists, and parents of children with ASD participated in an interdisciplinary daylong workshop as an extension of the 1st International Symposium on the Microbiome in Health and Disease with a Special Focus on Autism (www.microbiome-autism.com). The group considered several aspects of designing clinical studies, including clinical trial design, treatments that could potentially be used in a clinical trial, appropriate ASD participants for the clinical trial, behavioral and cognitive assessments, important biomarkers, safety concerns, and ethical considerations. Overall, the group not only felt that this was a promising area of research for the ASD population and a promising avenue for potential treatment but also felt that further basic and translational research was needed to clarify the clinical utility of such treatments and to elucidate possible mechanisms responsible for a clinical response, so that new treatments and approaches may be discovered and/or fostered in the future.

ContributorsFrye, Richard E. (Author) / Slattery, John (Author) / MacFabe, Derrick F. (Author) / Allen-Vercoe, Emma (Author) / Parker, William (Author) / Rodakis, John (Author) / Adams, James (Author) / Krajmalnik-Brown, Rosa (Author) / Bolte, Ellen (Author) / Kahler, Stephen (Author) / Jennings, Jana (Author) / James, Jill (Author) / Cerniglia, Carl E. (Author) / Midtvedt, Tore (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created2015-05-07