Filtering by
- Creators: Barrett, The Honors College
- Creators: Tollis, Marc
Big Data Network Analysis of Genetic Variation and Gene Expression in Individuals with Breast Cancer
Agassiz’s desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid efforts to conserve the genetic diversity of this species, we generated a whole-genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation identifies 20,172 protein-coding genes in the G. agassizii assembly and shows that its gene structure is more similar to that of chicken than to that of other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity, and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation together constitute the first such resource for any tortoise. They will serve as a foundation for future analyses of the genetic basis of adaptation to the desert environment, allow investigation of genomic factors affecting tortoise health, disease, and longevity, and provide a valuable resource for additional studies in this species complex.
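Scaffold N50 is a standard measure of assembly contiguity: it is the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. The function below is a minimal illustrative sketch of that definition, not the pipeline used to assemble G. agassizii; the lengths in the usage lines are toy values, not the tortoise scaffolds.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled length
    # until at least half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical): total = 300, so half = 150.
toy_lengths = [100, 80, 50, 30, 20, 10, 10]
toy_n50 = n50(toy_lengths)  # 100 + 80 = 180 >= 150, so N50 = 80
```

Comparing `2 * running` with `total` avoids dividing by two, so odd totals are handled without rounding questions.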
Data Availability: All genomic and transcriptomic sequence files are available from the NIH-NCBI BioProject database (accession numbers PRJNA352725, PRJNA352726, and PRJNA281763). All genome assembly, transcriptome assembly, predicted protein, transcript, genome annotation, RepeatMasker, phylogenetic tree, VCF, and GO enrichment files are available on Harvard Dataverse (doi:10.7910/DVN/EH2S9K).
Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease (ILD) that causes permanent scarring and damage of lung tissue. There is currently no known cause or viable treatment for this disease, and most patients either receive a lung transplant or die of the disease within five years of diagnosis. This project studies IPF by analyzing gene expression patterns in healthy versus diseased lung tissue using spatial transcriptomics, which measures individual RNA transcripts within cells while preserving their spatial location in the tissue. With MERFISH (multiplexed error-robust fluorescence in situ hybridization), a recently developed technology, we can detect gene expression in its spatial context at single-cell resolution, which allows us to infer patterns of gene expression that are driven solely by the pathology of the disease. A total of 120 cells were selected from 21 lung samples (6 healthy and 15 ILD), drawn from four tissue features within those samples: control, less fibrotic, more fibrotic, and cystic. We built an analysis pipeline in R to analyze cell-type composition around these features at increasing distances from the center cell (0–75, 76–150, and 150–225 μm). Cell types were annotated at both a broad (less specific) and a fine (more specific) level. When we analyzed the relationship between the proportions of cell types and distance from tissue features, we found that at the broad annotation level, the proportion of airway epithelial cells decreased with distance, a relationship that was statistically significant in linear regression models. At the fine annotation level, ciliated/secretory cells showed the same trend. These results support the current understanding of cystic regions in lung tissue and provide a foundation for understanding the pathology of the disease as a whole.
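The neighborhood analysis described above (cell-type proportions in distance bins around a center cell, then a linear regression of proportion on distance) can be sketched as follows. This is a hypothetical Python illustration, not the authors' R pipeline: the half-open bins [0, 75), [75, 150), [150, 225) μm approximate the ranges in the text, the function names `bin_proportions` and `ols_slope` and the toy coordinates are invented for illustration, and the sketch computes only the regression slope, not a significance test.

```python
import math

# Assumed distance bins in micrometres, half-open [lower, upper),
# approximating the 0-75, 76-150 and 150-225 um ranges in the text.
BINS = [(0.0, 75.0), (75.0, 150.0), (150.0, 225.0)]


def bin_proportions(center, neighbors, cell_type, bins=BINS):
    """Fraction of neighbors labelled `cell_type` in each distance bin.

    `center` is an (x, y) pair; `neighbors` is a list of (x, y, label).
    Cells beyond the last bin are ignored; empty bins give NaN.
    """
    cx, cy = center
    totals = [0] * len(bins)
    hits = [0] * len(bins)
    for x, y, label in neighbors:
        d = math.hypot(x - cx, y - cy)  # Euclidean distance to the center cell
        for i, (lo, hi) in enumerate(bins):
            if lo <= d < hi:
                totals[i] += 1
                if label == cell_type:
                    hits[i] += 1
                break
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs, skipping NaN ys."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not math.isnan(y)]
    if len(pairs) < 2:
        raise ValueError("need at least two finite points")
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


# Hypothetical toy neighborhood around one center cell at the origin.
center = (0.0, 0.0)
neighbors = [
    (10.0, 0.0, "airway epithelium"),
    (0.0, 40.0, "airway epithelium"),
    (50.0, 0.0, "fibroblast"),
    (100.0, 0.0, "airway epithelium"),
    (0.0, 120.0, "fibroblast"),
    (200.0, 0.0, "fibroblast"),
    (0.0, 180.0, "fibroblast"),
]
props = bin_proportions(center, neighbors, "airway epithelium")
midpoints = [(lo + hi) / 2 for lo, hi in BINS]
slope = ols_slope(midpoints, props)  # negative: proportion falls with distance
```

In a real analysis the regression would pool many center cells across samples and report a p-value (for example with R's `lm()`); the single-cell toy here only shows how a negative slope arises from the binned proportions.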
Previous studies estimating recombination rates in rhesus macaques have mostly been restricted to a single approach (e.g., using microsatellite loci). Here, we use two complementary approaches, pedigree-based and linkage-disequilibrium (LD)-based, to estimate recombination rates from whole-genome data of rhesus macaques, to identify crossover (CO) and non-crossover (NCO) recombination events, and to compare contemporary and historical rates of recombination.
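To compare the two approaches, both estimates have to be expressed in the same units. The sketch below is a hypothetical illustration of the unit conversions only, not the study's method: a pedigree-based rate follows from crossovers counted per transmitted meiosis (one crossover per meiosis on average corresponds to 1 Morgan, or 100 cM), while LD-based methods estimate the population-scaled rate rho = 4·Ne·r, so converting to r requires an assumed effective population size Ne. The function names and all numbers are invented toy values.

```python
def pedigree_rate_cm_per_mb(n_crossovers, n_meioses, region_bp):
    """Contemporary rate (cM/Mb) from crossovers observed in a pedigree.

    The mean number of crossovers per meiosis in a region equals its
    genetic length in Morgans; 1 Morgan = 100 cM.
    """
    if n_meioses <= 0 or region_bp <= 0:
        raise ValueError("n_meioses and region_bp must be positive")
    morgans = n_crossovers / n_meioses
    return 100.0 * morgans / (region_bp / 1e6)


def ld_rate_cm_per_mb(rho_per_bp, effective_size):
    """Historical rate (cM/Mb) from a population-scaled rho = 4 * Ne * r.

    Assumes rho is per bp and Ne is known; r is then the per-bp,
    per-generation rate, converted to cM/Mb.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    r_per_bp = rho_per_bp / (4.0 * effective_size)
    return r_per_bp * 1e6 * 100.0


# Toy numbers only: 30 crossovers in 100 meioses over a 50 Mb region,
# and an LD estimate rho = 0.0012 per bp with an assumed Ne of 50,000.
contemporary = pedigree_rate_cm_per_mb(30, 100, 50_000_000)  # ~0.6 cM/Mb
historical = ld_rate_cm_per_mb(0.0012, 50_000)              # ~0.6 cM/Mb
```

Because the LD-based rate depends directly on the assumed Ne, a mismatch between the two estimates can reflect an inaccurate Ne as well as a genuine change in recombination rate over time.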