Next-Generation Sequencing for DNA Methylation Profiling in Blood and Skeletal Muscle

Description

DNA methylation, a subset of epigenetics, has been found to be a significant marker associated with variations in gene expression and activity across the entire human genome. As of now, however, there is little to no information about how DNA methylation varies between different tissues inside a single person's body. By using research data from a preliminary study of lean and obese clinical subjects, this study attempts to put together a profile of the differences in DNA methylation that can be observed between two particular body tissues from this subject group: blood and skeletal muscle. This study allows us to start describing the changes that occur at the epigenetic level that influence how differently these two tissues operate, along with seeing how these tissues change between individuals of different weight classes, especially in the context of the development of symptoms of Type 2 Diabetes.

Contributors: Rappazzo, Micah Gabriel (Author) / Coletta, Dawn (Thesis director) / Katsanos, Christos (Committee member) / Dinu, Valentin (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor) / Department of Psychology (Contributor)

Created: 2013-12

BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search, navigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm.

Contributors: Kanwar, Pradeep (Author) / Davulcu, Hasan (Thesis advisor) / Dinu, Valentin (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2010

Merging Economics and Epidemiology to Improve the Prediction and Management of Infectious Disease

Description

Mathematical epidemiology, one of the oldest and richest areas in mathematical biology, has significantly enhanced our understanding of how pathogens emerge, evolve, and spread. Classical epidemiological models, the standard for predicting and managing the spread of infectious disease, assume that contacts between susceptible and infectious individuals depend on their relative frequency in the population. The behavioral factors that underpin contact rates are not generally addressed. There is, however, an emerging class of models that addresses the feedbacks between infectious disease dynamics and the behavioral decisions driving host contact. Referred to as “economic epidemiology” or “epidemiological economics,” the approach explores the determinants of decisions about the number and type of contacts made by individuals, using insights and methods from economics. We show how the approach has the potential both to improve predictions of the course of infectious disease, and to support development of novel approaches to infectious disease management.

Contributors: Perrings, Charles (Author) / Castillo-Chavez, Carlos (Author) / Chowell-Puente, Gerardo (Author) / Daszak, Peter (Author) / Fenichel, Eli P. (Author) / Finnoff, David (Author) / Horan, Richard D. (Author) / Kilpatrick, A. Marm (Author) / Kinzig, Ann (Author) / Kuminoff, Nicolai (Author) / Levin, Simon (Author) / Morin, Benjamin (Author) / Smith, Katherine F. (Author) / Springborn, Michael (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / W.P. Carey School of Business (Contributor) / Economics (Contributor) / Julie Ann Wrigley Global Institute of Sustainability (Contributor)

Created: 2015-12-01

Did Modeling Overestimate the Transmission Potential of Pandemic (H1N1-2009)? Sample Size Estimation for Post-Epidemic Seroepidemiological Studies

Description

Background
Seroepidemiological studies before and after the epidemic wave of H1N1-2009 are useful for estimating population attack rates with a potential to validate early estimates of the reproduction number, R, in modeling studies.
Methodology/Principal Findings
Since the final epidemic size, the proportion of individuals in a population who become infected during an epidemic, is not the result of a binomial sampling process because infection events are not independent of each other, we propose the use of an asymptotic distribution of the final size to compute approximate 95% confidence intervals of the observed final size. This allows the comparison of the observed final sizes against predictions based on the modeling study (R = 1.15, 1.40 and 1.90), which also yields simple formulae for determining sample sizes for future seroepidemiological studies. We examine a total of eleven published seroepidemiological studies of H1N1-2009 that took place after observing the peak incidence in a number of countries. Observed seropositive proportions in six studies appear to be smaller than that predicted from R = 1.40; four of the six studies sampled serum less than one month after the reported peak incidence. The comparison of the observed final sizes against R = 1.15 and 1.90 reveals that all eleven studies appear not to be significantly deviating from the prediction with R = 1.15, but final sizes in nine studies indicate overestimation if the value R = 1.90 is used.
Conclusions
Sample sizes of published seroepidemiological studies were too small to assess the validity of model predictions except when R = 1.90 was used. We recommend the use of the proposed approach in determining the sample size of post-epidemic seroepidemiological studies, calculating the 95% confidence interval of observed final size, and conducting relevant hypothesis testing instead of the use of methods that rely on a binomial proportion.
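As a rough illustration of how a reproduction number maps to a predicted final size, the sketch below solves the classical final size relation z = 1 - exp(-R z). It assumes a homogeneously mixing, SIR-type epidemic in a fully susceptible population, which may be simpler than the model behind the cited predictions, and it does not reproduce the asymptotic confidence intervals or sample size formulae proposed in the study. The function name `final_size` is hypothetical.

```python
import math


def final_size(r, tol=1e-12, max_iter=100_000):
    """Return the positive root z of z = 1 - exp(-r * z).

    Assumption: homogeneous mixing and a fully susceptible population.
    For r <= 1 the only root is z = 0, so 0.0 is returned.
    """
    if r <= 1.0:
        return 0.0
    z = 1.0  # start above the root; the iteration decreases towards it
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# The three reproduction numbers quoted in the abstract.
for r in (1.15, 1.40, 1.90):
    print(f"R = {r:.2f} -> expected final size = {final_size(r):.3f}")
```

Fixed-point iteration is used here because it needs no external solver; it converges slowly when R is close to 1, so a bracketing root finder may be preferable in that regime.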

Contributors: Nishiura, Hiroshi (Author) / Chowell-Puente, Gerardo (Author) / Castillo-Chavez, Carlos (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor)

Created: 2011-03-24

Joint Exome and Metabolome Analysis in Individuals With Dyslexia: Evidence for Associated Dysregulations of Olfactory Perception and Autoimmune Functions

Description

Dyslexia is a learning disability that negatively affects reading, writing, and spelling development at the word level in 5%-9% of children. The phenotype is variable and complex, involving several potential cognitive and physical concomitants such as sensory dysregulation and immunodeficiencies. The biological pathogenesis is not well-understood. Toward a better understanding of the biological drivers of dyslexia, we conducted the first joint exome and metabolome investigation in a pilot sample of 30 participants with dyslexia and 13 controls. In the metabolite analysis, eight metabolites of interest emerged (pyridoxine, kynurenic acid, citraconic acid, phosphocreatine, hippuric acid, xylitol, 2-deoxyuridine, and acetylcysteine). A metabolite-metabolite interaction analysis identified Krebs cycle intermediates that may be implicated in the development of dyslexia. Gene ontology analysis based on exome variants resulted in several pathways of interest, including the sensory perception of smell (olfactory) and immune system-related responses. In the joint exome and metabolite analysis, the olfactory transduction pathway emerged as the primary pathway of interest. Although the olfactory transduction and Krebs cycle pathways have not previously been described in the dyslexia literature, these pathways have been implicated in other neurodevelopmental disorders including autism spectrum disorder and obsessive-compulsive disorder, suggesting the possibility of these pathways playing a role in dyslexia as well. Immune system response pathways, on the other hand, have been implicated in both dyslexia and other neurodevelopmental disorders.

Contributors: Nandakumar, Rohit (Author) / Dinu, Valentin (Thesis director) / Peter, Beate (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)

Created: 2022-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High throughput transcriptome data analysis like Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data have made significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is core in identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular transition (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drive biological changes in different cell types and the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type specific differentially co-expressed pairs (DCEPs). In part one of Chapter 3, I presented a pipeline that used a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample and correlations among multiple samples from the same patient. In Part 2 of Chapter 3, I presented a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work that focused on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

Unveiling Cellular Heterogeneity, Genetic Regulation, and Protein Trafficking Dynamics Via Novel Integrative Multi-Omic Approaches

Description

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, to capture the multifaceted pathological mechanisms, integrative multi-omics analysis is needed that can provide a comprehensive picture of the disease. Here, I present three novel approaches to multi-omics integrative analysis. I introduce a single-cell integrative clustering method, which leverages multi-omics to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes (CITE-Seq) dataset from human Acute Myeloid Leukemia (AML) and control samples, this approach unveiled nuanced cell populations that otherwise remain elusive. I then shift the focus to a computational framework to discover transcriptional regulatory trios in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data of multiple myeloma samples, this approach discovered synergetic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The next part of this work introduces a novel methodology that leverages the transcriptome and surface protein data at the single-cell level produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.

Contributors: Mudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)

Created: 2023

Pharmacogenomics of Selective Serotonin Reuptake Inhibitor Treatment for Major Depressive Disorder: a Genome Wide Association Study

Description

A genome wide association study (GWAS) of treatment outcomes for citalopram and escitalopram, two frontline SSRI treatments for Major Depressive Disorder, was conducted with 529 subjects on an imputed dataset. While no variants of genome-wide significance were identified, various potentially interesting variants were identified that warrant further exploration. These findings have the potential to elucidate novel mechanisms underlying drug response for SSRIs. This work will be continued further, with machine learning and deep learning analyses to perform non-linear analyses and employing a biologist or geneticist to provide more specialized knowledge for interpretation of results.

Contributors: Leiter-Weintraub, Ethan (Author) / Dinu, Valentin (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / College of Health Solutions (Contributor) / School of Life Sciences (Contributor)

Created: 2024-05

Genetic Variants in GC, CYP2R1, and VDR Genes and Associations of Serum 25-Hydroxyvitamin D Concentrations in a Population of Hispanic and Non-Hispanic Adults Residing in San Diego County, California

Description

Vitamin D is a nutrient that is obtained through the diet and vitamin D supplementation and created from exposure to Ultraviolet B (UVB) radiation. While many factors determine the serum concentration of 25-hydroxyvitamin D (25(OH)D), little is known about how genetic variation in vitamin D-related genes influences serum 25(OH)D concentrations resulting from daily vitamin D intake and exposure to direct sunlight. Previous studies show that common genetic variants rs10741657 (CYP2R1), rs4588 (GC), rs228678 (GC), and rs4516035 (VDR) act as moderators and alter the effect of outdoor time and vitamin D intake on serum 25(OH)D concentrations. The objective of this study is to analyze the associations between serum 25(OH)D concentrations resulting from outdoor time and vitamin D intake, and genetic risk scores (GRS) established from previous studies involving single nucleotide polymorphisms (SNP) located on or near genes involving vitamin D synthesis, transport, activation, and degradation in 102 Hispanic and Non-Hispanic adults in San Diego County, California. This study is a secondary analysis of data from the Community of Mine study. Global Positioning System (GPS) data collected by the Qstarz GPS device worn by each participant was used to measure outdoor time, a proxy measurement for sun exposure time. Vitamin D intake was assessed using two 24-hour dietary recalls. Blood samples were measured for serum 25(OH)D concentrations. DNA was provided to assess each participant for the various genetic variants. Adjusted analyses of the GRS and serum 25(OH)D concentrations showed that individuals with high GRS (3-4) had lower serum 25(OH)D concentrations than individuals with low GRS (0-2) for both Nissen GRS and Rivera-Paredez GRS.

Contributors: Anderson, Heather Ray (Author) / Sears, Dorothy (Thesis advisor) / Alexon, Christy (Committee member) / Dinu, Valentin (Committee member) / Jankowska, Marta (Committee member) / Arizona State University (Publisher)

Created: 2022

Circular RNA characterization and regulatory network prediction in human tissue

Description

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that are formed when exons back-splice to each other and represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate micro-RNAs (miRNAs) as well as parental gene transcription and may thus have key roles in transcriptional regulation. Although circRNAs have continued to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, computational algorithms produce varied results in terms of what circRNAs are detected. This thesis aims to advance current knowledge of circRNA expression in a region-specific context focusing on the human brain, as well as address computational challenges.

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full length circRNAs as well as in silico validation of selected circRNA junctions using assembly; and (iii) application of these assembly based detection and validation workflows, and integrating existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of circRNA expression profiles and regulatory networks in a brain region-specific manner. This resource along with existing databases such as circBase will be invaluable in advancing circRNA research as well as improving our understanding of their role in transcriptional regulation and various neurological conditions.

Contributors: Sekar, Shobana (Author) / Liang, Winnie S (Thesis advisor) / Dinu, Valentin (Thesis advisor) / Craig, David (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)

Created: 2018