A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise

Description

The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. CS can aid MB in, for example, the analysis of sequences to find binding sites and the prediction of protein folding patterns. Stem-like cells can be maintained and replicated over long periods and can be differentiated into various tissue types. These behaviors are made possible by controlling the expression of specific genes. These genes then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single-cell RNA sequencing (scRNA-seq). scRNA-seq experiments use next-generation sequencing to measure genome-scale gene expression levels at single-cell resolution. A significant portion of the noise in scRNA-seq data results from extrinsic factors and can only be removed by a customized scRNA-seq analysis pipeline.

Almost every step during analysis and quantification requires the use of a threshold that is often determined empirically, which makes quantification of noise less accurate. In addition, each research group often develops its own data analysis pipeline, making it impossible to compare data from different groups. To remedy this problem, a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments, we identified the possible pipeline stages and the tools needed. Our pipeline is capable of handling data with adapters and barcodes, which was not the case with pipelines from some experiments. Our pipeline can be used to analyze scRNA-seq data from a single experiment and also to compare scRNA-seq data across experiments. Various processes such as data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize scRNA-seq data to minimize technical noise introduced by disparate platforms.

Contributors: Balachandran, Parithi (Author) / Wang, Xiao (Thesis advisor) / Brafman, David (Committee member) / Lockhart, Thurmon (Committee member) / Arizona State University (Publisher)

Created: 2017

Generation of isogenic pluripotent stem cell lines for study of APOE, an Alzheimer’s risk factor

Description

Alzheimer’s disease (AD), despite over a century of research, does not have a clearly defined pathogenesis for the sporadic form that makes up the majority of disease incidence. A variety of correlative risk factors have been identified, including the three isoforms of apolipoprotein E (ApoE), a cholesterol transport protein in the central nervous system. ApoE ε3 is the wild-type variant with no effect on risk. ApoE ε2, the protective and rarest variant, reduces the risk of developing AD by 40%. ApoE ε4, the risk variant, increases risk 3.2-fold and 14.9-fold for heterozygous and homozygous representation, respectively. Study of these isoforms has been historically complex, but the advent of human induced pluripotent stem cells (hiPSC) provides the means for highly controlled, longitudinal in vitro study. The effect of ApoE variants can be further elucidated using this platform by generating isogenic hiPSC lines through precise genetic modification, the objective of this research. As the difference between alleles is determined by two cytosine-thymine polymorphisms, a specialized CRISPR/Cas9 system for direct base conversion was successfully employed. The base conversion method for transitioning from the ε3 to the ε2 allele was first verified using the HEK293 cell line as a model, with delivery via electroporation. Following this verification, the transfection method was optimized using two hiPSC lines derived from ε4/ε4 patients, with a lipofection technique ultimately resulting in successful base conversion at the same site verified in the HEK293 model. Additional research included characterization of the pre-modification genotype with respect to likely off-target sites and methods of isolating clonal variants.

Contributors: Lakers, Mary Frances (Author) / Brafman, David (Thesis advisor) / Haynes, Karmella (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2017

Increased Enrichment and Generation of Isogenic Lines Using a Transient Reporter for Editing Enrichment

Description

Alzheimer’s disease (AD) affects over 5 million individuals each year in the United States. Furthermore, most cases of AD are sporadic, making it extremely difficult to model and study in vitro. CRISPR/Cas9 and base editing technologies have been of recent interest because of their ability to create single-nucleotide edits at nearly any genomic sequence using a Cas9 protein and a single guide RNA (sgRNA). Currently, there is no available phenotype to differentiate edited cells from unedited cells. Past research has employed fluorescent proteins bound to Cas9 proteins to attempt to enrich for edited cells; however, these methods are only reporters of transfection (RoT) and are not indicative of actual base editing. Thus, this study proposes a transient reporter for editing enrichment (TREE) and Cas9-mediated adenosine TREE (CasMasTREE), which use plasmids co-transfected with CRISPR/Cas9 technologies to serve as an indicator of base editing. Specifically, TREE features a blue fluorescent protein (BFP) mutant that, upon a C-T conversion, shifts its emission spectrum to that of a green fluorescent protein (GFP). CasMasTREE features an mCherry and a GFP protein separated by a stop codon, which can be negated using an A-G conversion. By employing an sgRNA that targets one of the TREE plasmids and at least one genomic site, cells can be sorted for GFP(+) cells. Using these methods, base-edited isogenic hiPSC line generation using TREE (BIG-TREE) was created to generate isogenic hiPSC lines with AD-relevant edits. For example, BIG-TREE demonstrates the capability of converting the wild-type (3/3) genotype of Apolipoprotein E (APOE), a gene associated with AD risk, into another isoform, APOE2/2, to create isogenic hiPSC lines. The capabilities of TREE are vast and can be applied to generate various models of diseases with specific genomic edits.

Contributors: Nguyen, Toan Thai Tran (Author) / Brafman, David (Thesis advisor) / Wang, Xiao (Committee member) / Tian, Xiaojun (Committee member) / Arizona State University (Publisher)

Created: 2020

Engineering of Synthetic DNA/RNA Modules for Manipulating Gene Expression and Circuit Dynamics

Description

Gene circuit engineering facilitates the discovery and understanding of fundamental biology and has been widely used in various biological applications. In synthetic biology, gene circuits are often constructed by one of two main strategies: monocistronic or polycistronic construction. The latter architecture is commonly found in prokaryotes, eukaryotes, and viruses and has been widely applied in gene circuit engineering. In this work, the effects of adjacent genes and noncoding regions are systematically investigated through the construction of batteries of gene circuits in diverse scenarios. Data-driven analysis yields a protein expression metric that strongly correlates with the features of adjacent transcriptional regions (ATRs). This novel mathematical tool helps guide circuit construction and has implications for the design of synthetic ATRs to tune gene expression, illustrating its potential to facilitate engineering complex gene networks. The ability to tune RNA dynamics is greatly needed for biotech applications, including therapeutics and diagnostics. Diverse methods have been developed to tune gene expression through transcriptional or translational manipulation. Control of RNA stability/degradation is often overlooked and can be a lightweight alternative for regulating protein yields. To further extend the utility of engineered ATRs to regulate gene expression, a library of RNA modules named degradation-tuning RNAs (dtRNAs) is designed with the ability to form specific 5’ secondary structures prior to the RBS. These modules can modulate transcript stability with minimal interference with translation initiation. Optimization of their functional structural features enables the gene expression level to be tuned over a wide dynamic range. These engineered dtRNAs are capable of regulating gene circuit dynamics as well as noncoding RNA levels and can be further extended to cell-free systems for gene expression control in vitro. Finally, integrating dtRNAs with a synthetic toehold sensor enables improved paper-based viral diagnostics, illustrating the potential of using synthetic dtRNAs for biomedical applications.

Contributors: Zhang, Qi (Author) / Wang, Xiao (Thesis advisor) / Green, Alexander (Committee member) / Brafman, David (Committee member) / Tian, Xiaojun (Committee member) / Plaisier, Christopher (Committee member) / Arizona State University (Publisher)

Created: 2020

Landscape of Gene Regulatory Network Motifs

Description

The human transcriptional regulatory machine utilizes hundreds of transcription factors, which bind to specific genic sites, resulting in either activation or repression of targeted genes. Networks composed of nodes and edges can be constructed to model the relationships between regulators and their targets. Within these biological networks, small enriched structural patterns containing at least three nodes can be identified as potential building blocks from which a network is organized. A first-iteration computational pipeline was designed to generate a disease-specific gene regulatory network for motif detection using established computational tools. The first goal was to identify motifs that can express themselves in a state that results in differential patient survival in one of the 32 different cancer types studied. This study identified issues in detecting strongly correlated motifs that also affect patient survival, yielding preliminary results on possible drivers of cancer etiology. Second, the topology of network motifs was compared across multiple data types to identify possible divergence from a conserved enrichment pattern in network-perturbing diseases. The topology of enriched motifs across all the datasets converged upon a single conserved pattern reported in a previous study, which did not appear to diverge depending on the type of disease. This report highlights possible methods to improve detection of disease-driving motifs that can aid in identifying possible treatment targets in cancer. Finally, networks were only minimally perturbed, suggesting that regulatory programs were run from evolved circuits into a cancer context.

Contributors: Striker, Shawn Scott (Author) / Plaisier, Christopher (Thesis advisor) / Brafman, David (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)

Created: 2020

Investigating the Role of APOE2 in Alzheimer's Disease Using Human Induced Pluripotent Stem Cell Derived Neurons and Astrocytes

Description

Genome-wide association studies (GWAS) have identified polymorphism in the Apolipoprotein E (APOE) gene to be the most prominent risk factor for Alzheimer’s disease (AD). Compared to individuals homozygous for the APOE3 variant, individuals with the APOE4 variant have a significantly elevated risk of AD. On the other hand, longitudinal studies have shown that the presence of the APOE2 variant reduces lifetime risk of developing AD by 40 percent. While significant research has identified the risk-inducing effects of APOE4, the underlying mechanisms by which APOE2 influences AD onset and progression have not been extensively explored. The hallmarks of AD pathology manifest in human neurons in the form of extracellular amyloid deposits and intracellular neurofibrillary tangles, whereas astrocytes are the primary source of the APOE protein in the brain. In this study, an isogenic human induced pluripotent stem cell (hiPSC)-based system is utilized to demonstrate that conversion of APOE3 to APOE2 greatly reduced the production of amyloid-beta (Aβ) peptides in hiPSC-derived neural cultures. Mechanistically, analysis of pure populations of neurons and astrocytes derived from these neural cultures revealed that the mitigating effects of APOE2 are mediated by cell-autonomous and non-autonomous effects. In particular, it was demonstrated that the reduction in Aβ and pathogenic β-C-terminal fragments (APP-βCTF) is potentially driven by a mechanism related to non-amyloidogenic processing of amyloid precursor protein (APP), suggesting a gain of protective function of the APOE2 variant. Together, this study provides insights into the risk-modifying effects associated with the APOE2 allele and establishes a platform to probe the mechanisms by which APOE2 enhances neuroprotection against AD.

Contributors: Raman, Sreedevi (Author) / Brafman, David (Thesis advisor) / Smith, Barbara (Committee member) / Plaisier, Christopher (Committee member) / Wang, Xiao (Committee member) / Tian, Xiaojun (Committee member) / Arizona State University (Publisher)

Created: 2021

Current Systematic Carbon-Cycle Observations and the Need for Implementing a Policy-Relevant Carbon Observing System

Description

A globally integrated carbon observation and analysis system is needed to improve the fundamental understanding of the global carbon cycle, to improve our ability to project future changes, and to verify the effectiveness of policies aiming to reduce greenhouse gas emissions and increase carbon sequestration. Building an integrated carbon observation system requires transformational advances from the existing sparse, exploratory framework towards a dense, robust, and sustained system in all components: anthropogenic emissions, the atmosphere, the ocean, and the terrestrial biosphere. The paper is addressed to scientists, policymakers, and funding agencies who need to have a global picture of the current state of the (diverse) carbon observations.

We identify the current state of carbon observations, and the needs and notional requirements for a global integrated carbon observation system that can be built in the next decade. A key conclusion is the substantial expansion of the ground-based observation networks required to reach the high spatial resolution for CO₂ and CH₄ fluxes, and for carbon stocks for addressing policy-relevant objectives, and attributing flux changes to underlying processes in each region. In order to establish flux and stock diagnostics over areas such as the southern oceans, tropical forests, and the Arctic, in situ observations will have to be complemented with remote-sensing measurements. Remote sensing offers the advantage of dense spatial coverage and frequent revisit. A key challenge is to bring remote-sensing measurements to a level of long-term consistency and accuracy so that they can be efficiently combined in models to reduce uncertainties, in synergy with ground-based data.

Bringing tight observational constraints on fossil fuel and land use change emissions will be the biggest challenge for deployment of a policy-relevant integrated carbon observation system. This will require in situ and remotely sensed data at much higher resolution and density than currently achieved for natural fluxes, although over a small land area (cities, industrial sites, power plants), as well as the inclusion of fossil fuel CO₂ proxy measurements such as radiocarbon in CO₂ and carbon-fuel combustion tracers. Additionally, a policy-relevant carbon monitoring system should also provide mechanisms for reconciling regional top-down (atmosphere-based) and bottom-up (surface-based) flux estimates across the range of spatial and temporal scales relevant to mitigation policies. In addition, uncertainties for each observation data-stream should be assessed. The success of the system will rely on long-term commitments to monitoring, on improved international collaboration to fill gaps in the current observations, on sustained efforts to improve access to the different data streams and make databases interoperable, and on the calibration of each component of the system to agreed-upon international scales.

Contributors: Ciais, P. (Author) / Dolman, A. J. (Author) / Bombelli, A. (Author) / Duren, R. (Author) / Peregon, A. (Author) / Rayner, P. J. (Author) / Miller, C. (Author) / Gobron, N. (Author) / Kinderman, G. (Author) / Marland, G. (Author) / Gruber, N. (Author) / Chevallier, F. (Author) / Andres, R. J. (Author) / Balsamo, G. (Author) / Bopp, L. (Author) / Breon, F. -M. (Author) / Broquet, G. (Author) / Dargaville, R. (Author) / Battin, T. J. (Author) / Borges, A. (Author) / Bovensmann, H. (Author) / Buchwitz, M. (Author) / Butler, J. (Author) / Canadell, J. G. (Author) / Cook, R. B. (Author) / DeFries, R. (Author) / Engelen, R. (Author) / Gurney, Kevin (Author) / Heinze, C. (Author) / Heimann, M. (Author) / Held, A. (Author) / Henry, M. (Author) / Law, B. (Author) / Luyssaert, S. (Author) / Miller, J. (Author) / Moriyama, T. (Author) / Moulin, C. (Author) / Myneni, R. (Author) / College of Liberal Arts and Sciences (Contributor)

Created: 2013-11-30

The Geological Nature of Dark Material on Vesta and Implications for the Subsurface Structure

Description

Deposits of dark material appear on Vesta’s surface as features of relatively low-albedo in the visible wavelength range of Dawn’s camera and spectrometer. Mixed with the regolith and partially excavated by younger impacts, the material is exposed as individual layered outcrops in crater walls or ejecta patches, having been uncovered and broken up by the impact. Dark fans on crater walls and dark deposits on crater floors are the result of gravity-driven mass wasting triggered by steep slopes and impact seismicity. The fact that dark material is mixed with impact ejecta indicates that it has been processed together with the ejected material. Some small craters display continuous dark ejecta similar to lunar dark-halo impact craters, indicating that the impact excavated the material from beneath a higher-albedo surface. The asymmetric distribution of dark material in impact craters and ejecta suggests non-continuous distribution in the local subsurface. Some positive-relief dark edifices appear to be impact-sculpted hills with dark material distributed over the hill slopes.

Dark features inside and outside of craters are in some places arranged as linear outcrops along scarps or as dark streaks perpendicular to the local topography. The spectral characteristics of the dark material resemble those of Vesta’s regolith. Dark material is distributed unevenly across Vesta’s surface with clusters of all types of dark material exposures. On a local scale, some craters expose or are associated with dark material, while others in the immediate vicinity do not show evidence for dark material. While the variety of surface exposures of dark material and their different geological correlations with surface features, as well as their uneven distribution, indicate a globally inhomogeneous distribution in the subsurface, the dark material seems to be correlated with the rim and ejecta of the older Veneneia south polar basin structure. The origin of the dark material is still being debated; however, the geological analysis suggests that it is exogenic, from carbon-rich low-velocity impactors, rather than endogenic, from freshly exposed mafic material or melt, exposed or created by impacts.

Contributors: Jaumann, R. (Author) / Nass, A. (Author) / Otto, K. (Author) / Krohn, K. (Author) / Stephan, K. (Author) / McCord, T. B. (Author) / Williams, David (Author) / Raymond, C. A. (Author) / Blewett, D. T. (Author) / Hiesinger, H. (Author) / Yingst, R. A. (Author) / De Sanctis, M. C. (Author) / Palomba, E. (Author) / Roatsch, T. (Author) / Matz, K-D. (Author) / Preusker, F. (Author) / Scholten, F. (Author) / Russell, C. T. (Author) / College of Liberal Arts and Sciences (Contributor)

Created: 2014-09-15

Endogenous WNT Signaling Regulates hPSC-Derived Neural Progenitor Cell Heterogeneity and Specifies Their Regional Identity

Description

Neural progenitor cells (NPCs) derived from human pluripotent stem cells (hPSCs) are a multipotent cell population that is capable of nearly indefinite expansion and subsequent differentiation into the various neuronal and supporting cell types that comprise the CNS. However, current protocols for differentiating NPCs toward neuronal lineages result in a mixture of neurons from various regions of the CNS. In this study, we determined that endogenous WNT signaling is a primary contributor to the heterogeneity observed in NPC cultures and neuronal differentiation. Furthermore, exogenous manipulation of WNT signaling during neural differentiation, through either activation or inhibition, reduces this heterogeneity in NPC cultures, thereby promoting the formation of regionally homogeneous NPC and neuronal cultures. The ability to manipulate WNT signaling to generate regionally specific NPCs and neurons will be useful for studying human neural development and will greatly enhance the translational potential of hPSCs for neural-related therapies.

Contributors: Moya, Noel (Author) / Cutts, Joshua (Author) / Gaasterland, Terry (Author) / Willert, Karl (Author) / Brafman, David (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2014-12-09

Sensitivity of Simulated CO2 Concentration to Regridding of Global Fossil Fuel CO2 Emissions

Description

Errors in the specification or utilization of fossil fuel CO₂ emissions within carbon budget or atmospheric CO₂ inverse studies can alias the estimation of biospheric and oceanic carbon exchange. A key component in the simulation of CO₂ concentrations arising from fossil fuel emissions is the spatial distribution of the emission near coastlines. Regridding of fossil fuel CO₂ emissions (FFCO₂) from fine to coarse grids to enable atmospheric transport simulations can give rise to mismatches between the emissions and simulated atmospheric dynamics which differ over land or water. For example, emissions originally emanating from the land are emitted from a grid cell for which the vertical mixing reflects the roughness and/or surface energy exchange of an ocean surface. We test this potential "dynamical inconsistency" by examining simulated global atmospheric CO₂ concentration driven by two different approaches to regridding fossil fuel CO₂ emissions. The two approaches are as follows: (1) a commonly used method that allocates emissions to grid cells with no attempt to ensure dynamical consistency with atmospheric transport and (2) an improved method that reallocates emissions to grid cells to ensure dynamically consistent results. Results show large spatial and temporal differences in the simulated CO₂ concentration when comparing these two approaches. The emissions difference ranges from −30.3 TgC grid cell⁻¹ yr⁻¹ (−3.39 kgC m⁻² yr⁻¹) to +30.0 TgC grid cell⁻¹ yr⁻¹ (+2.6 kgC m⁻² yr⁻¹) along coastal margins. Maximum simulated annual mean CO₂ concentration differences at the surface exceed ±6 ppm at various locations and times. Examination of the current CO₂ monitoring locations during the local afternoon, consistent with inversion modeling system sampling and measurement protocols, finds maximum hourly differences at 38 stations exceed ±0.10 ppm with individual station differences exceeding −32 ppm. The differences implied by not accounting for this dynamical consistency problem are largest at monitoring sites proximal to large coastal urban areas and point sources. These results suggest that studies comparing simulated to observed atmospheric CO₂ concentration, such as atmospheric CO₂ inversions, must take measures to correct for this potential problem and ensure flux and dynamical consistency.

Contributors: Zhang, X. (Author) / Gurney, Kevin (Author) / Rayner, P. (Author) / Liu, Y. (Author) / Asefi-Najafabady, Salvi (Author) / College of Liberal Arts and Sciences (Contributor)

Created: 2013-11-30
