Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome analyses such as single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) sequencing have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying the time point(s) at which drastic changes in gene transcription are associated with the transition from a homeostatic to a non-homeostatic cellular state (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type-specific, co-expression-based tipping point detection method that identifies target gene (TG) and transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, along with the time point at which these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type-specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data from patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample or among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effects model. In Chapter 4, I summarize my work on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.
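
As a rough illustration of what a differential co-expression test between two time points can look like, the sketch below compares two Pearson correlations with the Fisher z-transformation. This is a generic textbook test, not the dissertation's pipeline; the function names `pearson` and `diff_coexpression_z` are hypothetical, and the test assumes independent observations, which is exactly the assumption the abstract notes is violated by cells from the same sample.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diff_coexpression_z(r1, n1, r2, n2):
    """Two-sided Fisher z-test for a difference between two correlations.

    r1, r2: TF-TG correlations at two time points (must lie strictly in (-1, 1)).
    n1, n2: number of independent observations behind each correlation (> 3).
    Returns the z statistic and its two-sided p-value.
    """
    z1 = math.atanh(r1)  # Fisher transformation of each correlation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

A pair would be flagged as differentially co-expressed when the p-value, after multiple-testing correction across all pairs, falls below a chosen threshold.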

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

Unveiling Cellular Heterogeneity, Genetic Regulation, and Protein Trafficking Dynamics Via Novel Integrative Multi-Omic Approaches

Description

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches use single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical outcomes. However, capturing multifaceted pathological mechanisms requires integrative multi-omics analysis that can provide a comprehensive picture of the disease. Here, I present three novel approaches to multi-omics integrative analysis. First, I introduce a single-cell integrative clustering method that leverages multi-omics data to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-Seq) dataset from human Acute Myeloid Leukemia (AML) and control samples, this approach unveiled nuanced cell populations that would otherwise remain elusive. I then shift the focus to a computational framework for discovering transcriptional regulatory trios in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data from multiple myeloma samples, this approach discovered synergistic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The final part of this work introduces a novel methodology that leverages the single-cell transcriptome and surface protein data produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.

Contributors: Mudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)

Created: 2023

Transorbital Skull Base Surgery; Exploratory Quantitative Assessment of the Microscopic and Endoscopic Surgical Corridor

Description

Transorbital surgery has gained recent prominence due to its incorporation into endoscopic skull base surgery. The body of published literature in the field is cadaveric and observational. The pre-clinical studies focus on the use of the endoscope only. Furthermore, the methodology utilised in the published literature is inconsistent and does not embody the optimal principles of scientific experimentation. This body of work evaluates a minimally invasive novel surgical corridor - the transorbital approach - and its validity in neurosurgical practice, and assesses available technological advances both qualitatively and quantitatively in a robust experimental fashion. While the endoscope is an established means of visualisation in clinical transorbital surgery, the microscope has never been assessed with respect to the transorbital approach. This question is investigated here, and the anatomical and surgical benefits and limitations of microscopic visualisation are demonstrated. The comparative studies provide increased knowledge on specifics pertinent to neurosurgeons and other skull base specialists when planning pre-operatively, such as pathology location, involved anatomical structures, instrument manoeuvrability and the advantages and disadvantages of the distinct visualisation technologies. The intention throughout is to select the most suitable surgical approach and technology, specific to the patient, pathology and anatomy, so as to perform the best surgical procedure. The research findings illustrated in this body of work are diverse, reproducible and applicable. The transorbital surgical corridor has substantive potential for access to the anterior cranial fossa and specific surgical target structures. The neuroquantitative metrics investigated confirm the utility and benefits specific to the respective visualisation technologies, i.e. the endoscope and the microscope. The most appropriate setting in which the approach should be used is also discussed. The transorbital corridor has impressive potential, can utilise all available technological advances, promotes multi-disciplinary co-operation and learning amongst clinicians and, ultimately, is a means of improving operative patient care.

Contributors: Houlihan, Lena Mary (Author) / Preul, Mark C. (Thesis advisor) / Vernon, Brent (Thesis advisor) / O' Sullivan, Michael G.J. (Committee member) / Lawton, Michael T. (Committee member) / Santarelli, Griffin (Committee member) / Smith, Brian (Committee member) / Arizona State University (Publisher)

Created: 2021

Genetic Variants in GC, CYP2R1, and VDR Genes and Associations of Serum 25-Hydroxyvitamin D Concentrations in a Population of Hispanic and Non-Hispanic Adults Residing in San Diego County, California

Description

Vitamin D is a nutrient that is obtained through the diet and vitamin D supplementation and is synthesized following exposure to ultraviolet B (UVB) radiation. While many factors determine the serum 25-hydroxyvitamin D (25(OH)D) concentration in the body, little is known about how genetic variation in vitamin D-related genes influences the serum 25(OH)D concentrations that result from daily vitamin D intake and exposure to direct sunlight. Previous studies show that the common genetic variants rs10741657 (CYP2R1), rs4588 (GC), rs228678 (GC), and rs4516035 (VDR) act as moderators and alter the effect of outdoor time and vitamin D intake on serum 25(OH)D concentrations. The objective of this study is to analyze the associations between serum 25(OH)D concentrations resulting from outdoor time and vitamin D intake and genetic risk scores (GRS) established in previous studies from single nucleotide polymorphisms (SNPs) located on or near genes involved in vitamin D synthesis, transport, activation, and degradation, in 102 Hispanic and Non-Hispanic adults in San Diego County, California. This study is a secondary analysis of data from the Community of Mine study. Global Positioning System (GPS) data collected by the Qstarz GPS device worn by each participant were used to measure outdoor time, a proxy for sun exposure time. Vitamin D intake was assessed using two 24-hour dietary recalls. Serum 25(OH)D concentrations were measured in blood samples. DNA was provided to genotype each participant for the genetic variants. Adjusted analyses of the GRS and serum 25(OH)D concentrations showed that individuals with high GRS (3-4) had lower serum 25(OH)D concentrations than individuals with low GRS (0-2) for both the Nissen GRS and the Rivera-Paredez GRS.

Contributors: Anderson, Heather Ray (Author) / Sears, Dorothy (Thesis advisor) / Alexon, Christy (Committee member) / Dinu, Valentin (Committee member) / Jankowska, Marta (Committee member) / Arizona State University (Publisher)

Created: 2022

Modeling Brain Cancer Progression Using Reaction-Diffusion Equations with Minimal Parameters

Description

This dissertation describes numerical and analytical work on models of the growth and progression of glioblastoma multiforme (GBM), an aggressive form of primary brain cancer. Two reaction-diffusion models are used: the Fisher-Kolmogorov-Petrovsky-Piskunov equation and a 2-population model that divides the tumor into actively proliferating and quiescent (or necrotic) cells. The numerical portion of this work (chapter 2) focuses on simulating GBM expansion in patients undergoing treatment for tumor recurrence following initial surgery. The models are simulated on 3-dimensional brain geometries derived from magnetic resonance imaging (MRI) scans provided by the Barrow Neurological Institute. The study consists of 17 clinical time intervals across 10 patients who have been followed in detail, each of whom shows significant tumor progression over a period of 1 to 3 months on sequential follow-up scans. A Taguchi sampling design is implemented to estimate the variability of the predicted tumors using 144 different choices of model parameters. In 9 cases, model parameters can be identified such that the simulated tumor contains at least 40 percent of the volume of the observed tumor. In the analytical portion of the work (chapters 3 and 4), a positively invariant region for the 2-population model is identified. Then, a rigorous derivation of the critical patch size associated with the model is performed. The critical patch (KISS) size is the minimum habitat size needed for a population to survive in a region. Habitats larger than the critical patch size allow a population to persist, while smaller habitats lead to extinction. The critical patch size of the 2-population model is consistent with that of the Fisher-Kolmogorov-Petrovsky-Piskunov equation, one of the first reaction-diffusion models proposed for GBM. The critical patch size may indicate that GBM tumors have a minimum size that depends on their location in the brain. A theoretical relationship between the size of a GBM tumor at steady state and its maximum cell density is also derived, which has potential applications for patient-specific parameter estimation based on magnetic resonance imaging data.
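
For orientation, the Fisher-Kolmogorov-Petrovsky-Piskunov equation and its classical critical patch size are commonly written in the form below. This is a textbook sketch under assumed notation, not taken from the dissertation: $u$ is cell density, $D$ the diffusion coefficient, $\rho$ the net proliferation rate, $K$ the carrying capacity, and $L_c$ the critical length of a one-dimensional domain with absorbing (zero-density) boundaries. The abstract does not state the dissertation's own symbols, geometry, or boundary conditions.

```latex
\frac{\partial u}{\partial t} = D \nabla^2 u + \rho\, u \left(1 - \frac{u}{K}\right),
\qquad
L_c = \pi \sqrt{\frac{D}{\rho}}
```

The expression for $L_c$ follows from linearizing about $u = 0$: on an interval of length $L$, the slowest-decaying diffusion mode decays at rate $D\pi^2/L^2$, so the population grows only when $\rho > D\pi^2/L^2$, that is, when $L > L_c$.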

Contributors: Harris, Duane C. (Author) / Kuang, Yang (Thesis advisor) / Kostelich, Eric J. (Thesis advisor) / Preul, Mark C. (Committee member) / Crook, Sharon (Committee member) / Gardner, Carl (Committee member) / Arizona State University (Publisher)

Created: 2023

Circular RNA characterization and regulatory network prediction in human tissue

Description

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that are formed when exons back-splice to each other, and they represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate micro-RNAs (miRNAs) as well as parental gene transcription and may thus have key roles in transcriptional regulation. Although circRNAs have continued to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, computational algorithms produce varied results in terms of which circRNAs are detected. This thesis aims to advance current knowledge of circRNA expression in a region-specific context, focusing on the human brain, and to address computational challenges.

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full length circRNAs as well as in silico validation of selected circRNA junctions using assembly; and (iii) application of these assembly based detection and validation workflows, and integrating existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of circRNA expression profiles and regulatory networks in a brain region-specific manner. This resource along with existing databases such as circBase will be invaluable in advancing circRNA research as well as improving our understanding of their role in transcriptional regulation and various neurological conditions.

Contributors: Sekar, Shobana (Author) / Liang, Winnie S (Thesis advisor) / Dinu, Valentin (Thesis advisor) / Craig, David (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)

Created: 2018

Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

Description

Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades, and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems for purposes of orchestrating the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been considered an “implementation concern” requiring unscalable manual human efforts and governance coordination. Drafting a questionnaire engineered to meet the specifications of the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it may be imported and deployed into a live system without significant burdens. Dramatic reduction of the time and effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle, as well as formalization of the human processes by which this occurs.

This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, as a concrete floor-to-ceiling technological blueprint for both provider health IT (HIT) and vendor organizations to incrementally introduce value into existing systems dynamically. This is made possible by service-ization of curated knowledge artifacts, which are then injected into a highly scalable backend infrastructure by automated orchestration through public marketplaces. Supplementary examples of client app integration are also provided. Compilation of knowledge into platform-specific form has been left flexible, insofar as implementations comply with ARTAKA’s Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service packaging standards.

Towards the goal of interoperable human processes, ARTAKA’s treatment of knowledge artifacts as a specialized form of software allows knowledge engineering to operate as a type of software engineering practice. Thus, nearly a century of software development processes, tools, policies, and lessons offer immediate benefit: in some cases, with remarkable parity. An analysis of experimentation is provided, with guidelines on how selected aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment.

Portions of this work have been taken up with Standards Developing Organizations (SDOs) with the intent of ultimately producing normative standards, and active relationships with other bodies have also been initiated.

Contributors: Lee, Preston Victor (Author) / Dinu, Valentin (Thesis advisor) / Sottara, Davide (Committee member) / Greenes, Robert (Committee member) / Arizona State University (Publisher)

Created: 2018

A Novel Approach to the Comparative Genomic Analysis of Canine and Human Cancers

Description

Study of canine cancer’s molecular underpinnings holds great potential for informing veterinary and human oncology. Sporadic canine cancers are highly abundant (~4 million diagnoses/year in the United States), and the dog’s unique genomic architecture due to selective inbreeding and the high similarity between dog and human genomes both confer power for improving understanding of cancer genes. However, characterization of canine cancer genome landscapes has been limited, hindered by a lack of canine-specific tools and resources. To enable robust and reproducible comparative genomic analysis of canine cancers, I have developed a workflow for somatic and germline variant calling in canine cancer genomic data. I first adapted a human cancer genomics pipeline to create a semi-automated canine pipeline, which was used to map genomic landscapes of canine melanoma, lung adenocarcinoma, osteosarcoma and lymphoma. This pipeline also forms the backbone of my novel comparative genomics workflow.

Practical impediments to comparative genomic analysis of dog and human include challenges in identifying similarities in mutation type and function across species. For example, canine genes may have evolved functions that differ from those of their human orthologs. Hence, I undertook a systematic statistical evaluation of dog and human cancer genes and assessed functional similarities and differences between orthologs to improve understanding of the roles of these genes in cancer across species. I tested this pipeline on canine and human Diffuse Large B-Cell Lymphoma (DLBCL), given that canine DLBCL is the most comprehensively genomically characterized canine cancer. Logistic regression on genes bearing somatic coding mutations in each cancer was used to determine whether conservation metrics (sequence identity, network placement, etc.) could explain co-mutation of genes in both species. Using this model, I identified 25 co-mutated and evolutionarily similar genes that may be compelling cross-species cancer genes. For example, PCLO was identified as a co-mutated conserved gene; it has previously been identified as recurrently mutated in human DLBCL, but with an unclear role in oncogenesis. Further investigation of these genes might shed new light on the biology of lymphoma in dogs and humans, and this approach may more broadly serve to prioritize new genes for comparative cancer biology studies.

Contributors: Sivaprakasam, Karthigayini (Author) / Dinu, Valentin (Thesis advisor) / Trent, Jeffrey (Thesis advisor) / Hendricks, William (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)

Created: 2018

Early detection and treatment of breast cancer by random peptide array in neuN transgenic mouse model

Description

Breast cancer is the most common cancer and currently the second leading cause of death among women in the United States. Patients’ five-year relative survival rate decreases from 99% to 25% when breast cancer is diagnosed late. Immune checkpoint blockade has been shown to be a promising therapy for improving patients’ outcomes in many other cancers. However, due to the lack of early diagnosis, the treatment is normally given in the later stages. An early diagnosis system for breast cancer could potentially revolutionize current treatment strategies, improve patients’ outcomes and even eradicate the disease. The current breast cancer diagnostic methods cannot meet this demand. A simple, effective, noninvasive and inexpensive early diagnostic technology is needed. Immunosignature technology leverages the power of the immune system to find cancer early. Antibodies in the blood that target tumor antigens are probed on a high-throughput random peptide array and generate a specific binding pattern called the immunosignature.

In this dissertation, I propose a scenario for using immunosignature technology to detect breast cancer early and to implement an early treatment strategy using the PD-L1 immune checkpoint inhibitor. I develop a methodology for the early diagnosis and treatment of breast cancer in an FVB/N neuN breast cancer mouse model. By comparing FVB/N neuN transgenic mice with age-matched wild-type controls, I have found and validated specific immunosignatures at multiple time points before tumors are palpable. Immunosignatures change along with tumor development. Using a late-stage immunosignature to predict early samples, or vice versa, cannot achieve high prediction performance. By using the immunosignature of early breast cancer, I show that, at the time of diagnosis, early treatment with the checkpoint blockade anti-PD-L1 inhibits tumor growth in the FVB/N neuN transgenic mouse model. mRNA analysis of the PD-L1 level in mouse mammary glands suggests that treatment is more effective when given early.

Novel discoveries are changing understanding of breast cancer and improving strategies in clinical treatment. Researchers and healthcare professionals are actively working in the early diagnosis and early treatment fields. This dissertation provides a step along the road for better diagnosis and treatment of breast cancer.

Contributors: Duan, Hu (Author) / Johnston, Stephen Albert (Thesis advisor) / Hartwell, Leland Harrison (Committee member) / Dinu, Valentin (Committee member) / Chang, Yung (Committee member) / Arizona State University (Publisher)

Created: 2015

Integrative analysis of genomic aberrations in cancer and xenograft Models

Description

No two cancers are alike. Cancer is a dynamic and heterogeneous disease; such heterogeneity arises among patients with the same cancer type, among cancer cells within the same individual’s tumor and even among cells within the same sub-clone over time. The recent application of next-generation sequencing and precision medicine techniques is the driving force behind efforts to uncover the complexity of cancer and to identify the best clinical practice. The core concept of precision medicine is to move away from crowd-based, best-for-most treatment and take individual variability into account when optimizing prevention and treatment strategies. Next-generation sequencing is the method used to sift through the entire 3 billion letters of each patient’s DNA genetic code in a massively parallel fashion.

The deluge of next-generation sequencing data has shifted the bottleneck of cancer research from multiple “-omics” data collection to integrative analysis and data interpretation. In this dissertation, I attempt to address two distinct, but dependent, challenges. The first is to design specific computational algorithms and tools that can process and extract useful information from the raw data in an efficient, robust, and reproducible manner. The second is to develop high-level computational methods and data frameworks for integrating and interpreting these data. Specifically, Chapter 2 presents a tool called Snipea (SNv Integration, Prioritization, Ensemble, and Annotation) to further identify, prioritize and annotate somatic single nucleotide variants (SNVs) called from multiple variant callers. Chapter 3 describes a novel alignment-based algorithm to accurately and losslessly classify sequencing reads from xenograft models. Chapter 4 describes a direct and biologically motivated framework and associated methods for identifying putative aberrations that cause survival differences in GBM patients by integrating whole-genome sequencing, exome sequencing, RNA-Sequencing, methylation array and clinical data. Lastly, Chapter 5 explores longitudinal and intratumor heterogeneity studies to reveal the temporal and spatial context of tumor evolution. The long-term goal is to help patients with cancer, particularly those who are in front of us today. Genome-based analysis of a patient’s tumor can identify genomic alterations unique to that tumor that are candidate therapeutic targets for decreasing therapy resistance and improving clinical outcome.

Contributors: Peng, Sen (Author) / Dinu, Valentin (Thesis advisor) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Arizona State University (Publisher)

Created: 2015
