BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search, navigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm.
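
The guided query refinement described above can be illustrated with a minimal sketch. It assumes each document already carries entities produced by an upstream extraction step; the document records, the facet names, and the `refine` and `facet_counts` helpers are hypothetical and are not taken from BioEve itself.

```python
from collections import Counter

# Hypothetical documents: each maps a facet (entity type) to extracted entities.
DOCS = [
    {"id": 1, "gene": {"TP53", "BRCA1"}, "disease": {"breast cancer"}},
    {"id": 2, "gene": {"TP53"}, "disease": {"lung cancer"}},
    {"id": 3, "gene": {"BRCA1"}, "disease": {"breast cancer"}},
]


def refine(docs, facet, value):
    """Keep only documents annotated with `value` under `facet`."""
    return [d for d in docs if value in d.get(facet, set())]


def facet_counts(docs, facet):
    """Count how often each entity of type `facet` occurs in the current results."""
    counts = Counter()
    for d in docs:
        counts.update(d.get(facet, set()))
    return counts


# One refinement step: select a gene, then list diseases seen in the remaining results.
results = refine(DOCS, "gene", "TP53")
suggestions = facet_counts(results, "disease")
```

Because the counts are computed over every matching document rather than a top-ranked subset, the suggested concepts reflect the full intermediate result set.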

Contributors: Kanwar, Pradeep (Author) / Davulcu, Hasan (Thesis advisor) / Dinu, Valentin (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2010

Isotropic 3D Nuclear Morphometry of Normal, Fibrocystic and Malignant Breast Epithelial Cells Reveals New Structural Alterations

Description

Background
Grading schemes for breast cancer diagnosis are predominantly based on pathologists' qualitative assessment of altered nuclear structure from 2D brightfield microscopy images. However, cells are three-dimensional (3D) objects with features that are inherently 3D and thus poorly characterized in 2D. Our goal is to quantitatively characterize nuclear structure in 3D, assess its variation with malignancy, and investigate whether such variation correlates with standard nuclear grading criteria.
Methodology
We applied micro-optical computed tomographic imaging and automated 3D nuclear morphometry to quantify and compare morphological variations between human cell lines derived from normal, benign fibrocystic or malignant breast epithelium. To reproduce the appearance and contrast in clinical cytopathology images, we stained cells with hematoxylin and eosin and obtained 3D images of 150 individual stained cells of each cell type at sub-micron, isotropic resolution. Applying volumetric image analyses, we computed 42 3D morphological and textural descriptors of cellular and nuclear structure.
Principal Findings
We observed four distinct nuclear shape categories, the predominant being a mushroom cap shape. Cell and nuclear volumes increased from normal to fibrocystic to metastatic type, but there was little difference in the volume ratio of nucleus to cytoplasm (N/C ratio) between the lines. Abnormal cell nuclei had more nucleoli, markedly higher density and clumpier chromatin organization compared to normal. Nuclei of non-tumorigenic, fibrocystic cells exhibited larger textural variations than metastatic cell nuclei. At p<0.0025 by ANOVA and Kruskal-Wallis tests, 90% of our computed descriptors statistically differentiated control from abnormal cell populations, but only 69% of these features statistically differentiated the fibrocystic from the metastatic cell populations.
Conclusions
Our results provide a new perspective on nuclear structure variations associated with malignancy and point to the value of automated quantitative 3D nuclear morphometry as an objective tool to enable development of sensitive and specific nuclear grade classification in breast cancer diagnosis.

Contributors: Nandakumar, Vivek (Author) / Kelbauskas, Laimonas (Author) / Hernandez, Kathryn (Author) / Lintecum, Kelly (Author) / Senechal, Patti (Author) / Bussey, Kimberly (Author) / Davies, Paul (Author) / Johnson, Roger (Author) / Meldrum, Deirdre (Author) / Ira A. Fulton Schools of Engineering (Contributor) / School of Electrical, Computer and Energy Engineering (Contributor) / Biodesign Institute (Contributor) / Center for Biosignatures Discovery Automation (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Department of Physics (Contributor)

Created: 2012-01-05

Joint Exome and Metabolome Analysis in Individuals With Dyslexia: Evidence for Associated Dysregulations of Olfactory Perception and Autoimmune Functions

Description

Dyslexia is a learning disability that negatively affects reading, writing, and spelling development at the word level in 5%-9% of children. The phenotype is variable and complex, involving several potential cognitive and physical concomitants such as sensory dysregulation and immunodeficiencies. The biological pathogenesis is not well understood. Toward a better understanding of the biological drivers of dyslexia, we conducted the first joint exome and metabolome investigation in a pilot sample of 30 participants with dyslexia and 13 controls. In the metabolite analysis, eight metabolites of interest emerged (pyridoxine, kynurenic acid, citraconic acid, phosphocreatine, hippuric acid, xylitol, 2-deoxyuridine, and acetylcysteine). A metabolite-metabolite interaction analysis identified Krebs cycle intermediates that may be implicated in the development of dyslexia. Gene ontology analysis based on exome variants resulted in several pathways of interest, including the sensory perception of smell (olfactory) and immune system-related responses. In the joint exome and metabolite analysis, the olfactory transduction pathway emerged as the primary pathway of interest. Although the olfactory transduction and Krebs cycle pathways have not previously been described in the dyslexia literature, these pathways have been implicated in other neurodevelopmental disorders including autism spectrum disorder and obsessive-compulsive disorder, suggesting that these pathways may play a role in dyslexia as well. Immune system response pathways, on the other hand, have been implicated in both dyslexia and other neurodevelopmental disorders.

Contributors: Nandakumar, Rohit (Author) / Dinu, Valentin (Thesis director) / Peter, Beate (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)

Created: 2022-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome data, such as single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) data, have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular transition (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, and the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample and correlations among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work that focused on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.
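
The idea of tracking how a TF-TG pair's co-expression changes across time points can be illustrated with a toy sketch. It is not the dissertation's method, which relies on statistical tests and a mixed-effect model; it assumes plain Pearson correlation per time point, and the function names and expression values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def largest_shift(tf_by_time, tg_by_time):
    """Return (time index, change) where co-expression shifts most between consecutive time points."""
    corrs = [pearson(tf, tg) for tf, tg in zip(tf_by_time, tg_by_time)]
    shifts = [abs(b - a) for a, b in zip(corrs, corrs[1:])]
    i = max(range(len(shifts)), key=shifts.__getitem__)
    return i + 1, shifts[i]


# Hypothetical expression of one TF and one TG in four cells at three time points.
tf = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
tg = [[2, 4, 6, 8], [2, 4, 6, 9], [8, 6, 4, 2]]
time_index, shift = largest_shift(tf, tg)
```

In this toy data the pair is positively co-expressed at the first two time points and negatively co-expressed at the third, so the largest shift is reported at index 2. A real analysis would also need to test whether such a shift is statistically significant.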

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

Unveiling Cellular Heterogeneity, Genetic Regulation, and Protein Trafficking Dynamics Via Novel Integrative Multi-Omic Approaches

Description

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, to capture the multifaceted pathological mechanisms, integrative multi-omics analysis is needed that can provide a comprehensive picture of the disease. Here, I present three novel approaches to multi-omics integrative analysis. I introduce a single-cell integrative clustering method, which leverages multi-omics to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes (CITE-Seq) dataset from human Acute Myeloid Leukemia (AML) and control samples, this approach unveiled nuanced cell populations that otherwise remain elusive. I then shift the focus to a computational framework to discover transcriptional regulatory trios in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data of multiple myeloma samples, this approach discovered synergetic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The next part of this work introduces a novel methodology that leverages the transcriptome and surface protein data at the single-cell level produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.

Contributors: Mudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)

Created: 2023

Methods for Multiclass Geospatial Data Visualization

Description

Geographical visualizations are critical for multi-criteria analysis, optimization, and decision making, where the translation of spatial data into a visual form allows analysts to quickly see patterns, explore summaries and relate domain knowledge about underlying geographical phenomena. However, several critical challenges arise when visualizing large spatiotemporal datasets. While the underlying geographical component of the data lends itself well to univariate visualization in the form of traditional cartographic representations (e.g., choropleth, isopleth, dasymetric maps), as the data becomes multivariate, cartographic representations become more complex, requiring new approaches for multiclass map visualization and exploration. In this thesis, novel visual analytics methods and frameworks are proposed to support multiclass map analysis. An interactive conservation portfolio development system that combines visualization, multicriteria analysis, optimization, and decision making is developed that showcases a novel visualization and interaction design to compare different purchasing profiles under various optimization constraints. Such multiclass map analysis is then extended using concepts from scalar field topology for hotspot analysis, including the introduction of a novel visualization construct combining Merge Trees and Streamgraphs.

Contributors: Zhang, Rui (Author) / Maciejewski, Ross RM (Thesis advisor) / Sefair, Jorge JS (Committee member) / Bryan, Chris CB (Committee member) / Hsiao, Sharon SH (Committee member) / Arizona State University (Publisher)

Created: 2022

Pharmacogenomics of Selective Serotonin Reuptake Inhibitor Treatment for Major Depressive Disorder: a Genome Wide Association Study

Description

A genome wide association study (GWAS) of treatment outcomes for citalopram and escitalopram, two frontline SSRI treatments for Major Depressive Disorder, was conducted with 529 subjects on an imputed dataset. While no variants of genome-wide significance were identified, various potentially interesting variants were identified that warrant further exploration. These findings have the potential to elucidate novel mechanisms underlying drug response for SSRIs. This work will be continued with machine learning and deep learning methods for non-linear analyses, and with input from a biologist or geneticist to provide more specialized knowledge for interpreting the results.

Contributors: Leiter-Weintraub, Ethan (Author) / Dinu, Valentin (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / College of Health Solutions (Contributor) / School of Life Sciences (Contributor)

Created: 2024-05

Genetic Variants in GC, CYP2R1, and VDR Genes and Associations of Serum 25-Hydroxyvitamin D Concentrations in a Population of Hispanic and Non-Hispanic Adults Residing in San Diego County, California

Description

Vitamin D is a nutrient that is obtained through the diet and vitamin D supplementation and created from exposure to Ultraviolet B (UVB) radiation. While many factors determine the serum 25-hydroxyvitamin D (25(OH)D) concentration in the body, little is known about how genetic variation in vitamin D-related genes influences serum 25(OH)D concentrations resulting from daily vitamin D intake and exposure to direct sunlight. Previous studies show that common genetic variants rs10741657 (CYP2R1), rs4588 (GC), rs228678 (GC), and rs4516035 (VDR) act as moderators and alter the effect of outdoor time and vitamin D intake on serum 25(OH)D concentrations. The objective of this study is to analyze the associations between serum 25(OH)D concentrations resulting from outdoor time and vitamin D intake, and genetic risk scores (GRS) established from previous studies involving single nucleotide polymorphisms (SNPs) located on or near genes involved in vitamin D synthesis, transport, activation, and degradation in 102 Hispanic and Non-Hispanic adults in San Diego County, California. This study is a secondary analysis of data from the Community of Mine study. Global Positioning System (GPS) data collected by the Qstarz GPS device worn by each participant was used to measure outdoor time, a proxy measurement for sun exposure time. Vitamin D intake was assessed using two 24-hour dietary recalls. Blood samples were measured for serum 25(OH)D concentrations. DNA was provided to assess each participant for the various genetic variants. Adjusted analyses of the GRS and serum 25(OH)D concentrations showed that individuals with high GRS (3-4) had lower serum 25(OH)D concentrations than individuals with low GRS (0-2) for both Nissen GRS and Rivera-Paredez GRS.
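
The low (0-2) and high (3-4) GRS groups mentioned above can be illustrated with a minimal sketch. It assumes an unweighted score that counts how many of the four variants named in the abstract carry a risk allele; the actual Nissen and Rivera-Paredez scores may use different variants or weights, and the genotype coding and function names here are hypothetical.

```python
# The four variants named in the abstract; treating them as one score is an assumption.
SNPS = ("rs10741657", "rs4588", "rs228678", "rs4516035")


def genetic_risk_score(genotype):
    """Count the variants at which the participant carries a risk allele (0 to 4).

    `genotype` is a hypothetical mapping from SNP ID to True/False.
    """
    return sum(1 for snp in SNPS if genotype.get(snp, False))


def grs_group(score):
    """Assign the low (0-2) or high (3-4) group used in the abstract."""
    if not 0 <= score <= len(SNPS):
        raise ValueError("score must be between 0 and 4")
    return "high" if score >= 3 else "low"


# Hypothetical participant carrying a risk allele at three of the four variants.
participant = {"rs10741657": True, "rs4588": True, "rs228678": False, "rs4516035": True}
score = genetic_risk_score(participant)
group = grs_group(score)
```

Comparing serum 25(OH)D between the two groups would then be a separate, covariate-adjusted analysis, which this sketch does not attempt.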

Contributors: Anderson, Heather Ray (Author) / Sears, Dorothy (Thesis advisor) / Alexon, Christy (Committee member) / Dinu, Valentin (Committee member) / Jankowska, Marta (Committee member) / Arizona State University (Publisher)

Created: 2022

Circular RNA characterization and regulatory network prediction in human tissue

Description

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that are formed when exons back-splice to each other and represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate micro-RNAs (miRNAs) as well as parental gene transcription and may thus have key roles in transcriptional regulation. Although circRNAs have continued to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, computational algorithms produce varied results in terms of which circRNAs are detected. This thesis aims to advance current knowledge of circRNA expression in a region-specific context focusing on the human brain, as well as address computational challenges.

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full length circRNAs as well as in silico validation of selected circRNA junctions using assembly; and (iii) application of these assembly based detection and validation workflows, and integrating existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of circRNA expression profiles and regulatory networks in a brain region-specific manner. This resource along with existing databases such as circBase will be invaluable in advancing circRNA research as well as improving our understanding of their role in transcriptional regulation and various neurological conditions.

Contributors: Sekar, Shobana (Author) / Liang, Winnie S (Thesis advisor) / Dinu, Valentin (Thesis advisor) / Craig, David (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)

Created: 2018

Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

Description

Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades, and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems for purposes of orchestrating the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been considered an “implementation concern” requiring unscalable manual human efforts and governance coordination. Drafting a questionnaire engineered to meet the specifications of the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it may be imported and deployed into a live system without significant burdens. Dramatic reduction of the time and effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle, as well as formalization of the human processes by which this occurs.

This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, as a concrete floor-to-ceiling technological blueprint for both provider health IT (HIT) and vendor organizations to incrementally introduce value into existing systems dynamically. This is made possible by service-ization of curated knowledge artifacts, then injected into a highly scalable backend infrastructure by automated orchestration through public marketplaces. Supplementary examples of client app integration are also provided. Compilation of knowledge into platform-specific form has been left flexible, insofar as implementations comply with ARTAKA’s Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service packaging standards.

Towards the goal of interoperable human processes, ARTAKA’s treatment of knowledge artifacts as a specialized form of software allows knowledge engineers to operate as a type of software engineering practice. Thus, nearly a century of software development processes, tools, policies, and lessons offer immediate benefit: in some cases, with remarkable parity. Analysis of experimentation is provided, with guidelines on how choice aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment.

Portions of this culminating document have been further initiated with Standards Developing Organizations (SDOs) intended to ultimately produce normative standards, as have active relationships with other bodies.

Contributors: Lee, Preston Victor (Author) / Dinu, Valentin (Thesis advisor) / Sottara, Davide (Committee member) / Greenes, Robert (Committee member) / Arizona State University (Publisher)

Created: 2018