Matching Items (3)
Description
There are several challenges to accurately inferring levels of transcription using RNA-sequencing (RNA-seq) data, including detecting and correcting for reference genome mapping bias. One potential confounder of RNA-seq analysis results from the application of a standardized pipeline to samples of different sexes in species with chromosomal sex determination. The homology between the human X and Y chromosomes will routinely cause mismapping to occur, artificially biasing estimates of sex-biased gene transcription. For this reason, we tested sex-specific mapping scenarios in humans on RNA-seq samples from the brains of 5 genetic females and 5 genetic males to assess how inferences of differential gene expression patterns change depending on the reference genome. We first applied a mapping protocol where we mapped all individuals to the entire human reference genome (complete), including the X and Y chromosomes, and computed differential expression between the set of genetic male and genetic female samples. For the two sex-specific mappings, we next mapped the genetic female samples (46,XX) to the human reference genome with the Y chromosome removed (Y-excluded) and the genetic male samples (46,XY) to the human reference genome including the Y chromosome, but with the pseudoautosomal regions of the Y chromosome hard-masked (YPARs-masked). Using the complete and sex-specific mapping protocols, we compared the differential expression measurements of genetic males and genetic females from Cuffdiff outputs. The sex-specific strategy called 33 additional genes as being differentially expressed between the two sexes when compared to the complete mapping protocol. This research provides a framework for a new standard of reference genome mappings to correct for sex-biased gene expression estimates that can be used in future studies.
Contributors: Brotman, Sarah Marie (Author) / Wilson Sayres, Melissa (Thesis director) / Crook, Sharon (Committee member) / Webster, Timothy (Committee member) / School of Life Sciences (Contributor) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05
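The hard-masking step mentioned in the abstract above (replacing the Y-chromosome pseudoautosomal regions with N so reads cannot align there) can be illustrated with a minimal sketch. The function name `hard_mask`, the toy sequence, and the toy coordinates are all hypothetical; the record does not state which tool or coordinates the author used, and real PAR coordinates depend on the reference assembly.

```python
def hard_mask(sequence, regions):
    """Return a copy of `sequence` with each region replaced by 'N'.

    `regions` is a list of (start, end) pairs, 1-based and inclusive,
    which is an assumed convention for this sketch.
    """
    masked = list(sequence)
    for start, end in regions:
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"invalid region: {start}-{end}")
        # Slice assignment overwrites positions start..end with N.
        masked[start - 1:end] = "N" * (end - start + 1)
    return "".join(masked)


# Toy stand-in for a Y chromosome with a "PAR" at each end (hypothetical).
toy_y = "ACGTACGTACGTACGTACGT"
toy_pars = [(1, 4), (17, 20)]
masked_y = hard_mask(toy_y, toy_pars)
print(masked_y)
```

Masking keeps the sequence length unchanged, so coordinates in the non-masked part of the chromosome remain valid for downstream annotation.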
Description
Unlike the autosomes, recombination on the sex chromosomes is limited to the pseudoautosomal regions (PARs) at each end of the chromosome. PAR1 spans approximately 2.7 Mb from the tip of the proximal arm of each sex chromosome, and a pseudoautosomal boundary between the PAR1 and non-PAR region is thought to have evolved from a Y-specific inversion that suppressed recombination across the boundary. In addition to the two PARs, there is also a human-specific X-transposed region (XTR) that was duplicated from the X to the Y chromosome. Genetic diversity is expected to be higher in recombining than nonrecombining regions, particularly because recombination reduces the effects of linked selection, allowing neutral variation to accumulate. We previously showed that diversity decreases linearly across the previously defined pseudoautosomal boundary (rather than dropping suddenly at the boundary), suggesting that the pseudoautosomal boundary may not be as strict as previously thought. In this study, we analyzed data from 1271 genetic females to explore the extent to which the pseudoautosomal boundary varies among human populations (broadly, African, European, South Asian, East Asian, and the Americas). We found that, in all populations, genetic diversity was significantly higher in the PAR1 and XTR than in the non-PAR regions, and that diversity decreased linearly from the PAR1 to finally reach a non-PAR value well past the pseudoautosomal boundary in all populations. However, we also found that the location at which diversity changes from reflecting the higher PAR1 diversity to the lower non-PAR diversity varied by as much as 500 kb among populations.
The lack of genetic evidence for a strict pseudoautosomal boundary and the variability in patterns of diversity across the pseudoautosomal boundary are consistent with two potential explanations: (1) the boundary itself may vary across populations, or (2) population-specific demographic histories may have shaped diversity across the pseudoautosomal boundary.
Contributors: Cotter, Daniel Juetten (Author) / Wilson Sayres, Melissa (Thesis director) / Stone, Anne (Committee member) / Webster, Timothy (Committee member) / School of Life Sciences (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
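Tracking how diversity changes across a boundary, as in the abstract above, is often done by computing nucleotide diversity in windows along the chromosome. The sketch below assumes biallelic sites and uses the standard per-site estimator 2k(n-k)/(n(n-1)), where k is the alternate-allele count among n sampled chromosomes. The estimator choice, the window size, and the toy data are assumptions for illustration; the record does not specify the author's procedure.

```python
def site_diversity(alt_count, n_chromosomes):
    """Per-site nucleotide diversity for a biallelic site."""
    k, n = alt_count, n_chromosomes
    return 2.0 * k * (n - k) / (n * (n - 1))


def windowed_diversity(sites, n_chromosomes, window_size, region_length):
    """Average per-base diversity in consecutive non-overlapping windows.

    `sites` is a list of (position, alt_count) pairs with 1-based positions.
    Returns a list of (window_start, window_end, diversity) tuples.
    """
    n_windows = -(-region_length // window_size)  # ceiling division
    totals = [0.0] * n_windows
    for position, alt_count in sites:
        totals[(position - 1) // window_size] += site_diversity(
            alt_count, n_chromosomes
        )
    result = []
    for index, total in enumerate(totals):
        start = index * window_size + 1
        end = min((index + 1) * window_size, region_length)
        # Divide by window length so invariant sites count as zero diversity.
        result.append((start, end, total / (end - start + 1)))
    return result


# Toy data (hypothetical): 3 variable sites among 4 sampled chromosomes.
toy_sites = [(2, 1), (5, 2), (12, 2)]
windows = windowed_diversity(toy_sites, n_chromosomes=4,
                             window_size=10, region_length=20)
for start, end, pi in windows:
    print(start, end, round(pi, 4))
```

Plotting the window values against position would show whether diversity drops abruptly or declines gradually near a candidate boundary.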
Description
In recent years, experimental and theoretical evidence has pointed to the existence of biologically active proteins that either include unstructured regions or are entirely unstructured. Referred to as intrinsically disordered proteins (IDPs), they are now known to be involved in diverse functions, much as any folded protein. Mutations in IDPs have been implicated in multiple neurodegenerative diseases. Considering the disordered nature of IDPs, there are limited structural features that can be used to quantify the disordered state. One such pair of variables is the radius of gyration (Rg) and the corresponding Flory’s scaling exponent, both of which characterize the dimension and size of the protein. It is generally understood that the sequence of an IDP affects its Rg and scaling exponent. Properties such as amino acid hydrophobicity and charge can play important roles in determining the Rg of an IDP, much as they affect the structure of a folded protein. However, it is nontrivial to directly predict Rg and scaling exponent from an IDP sequence. In this thesis, a coarse-grained model is used to simulate the Rg and scaling exponents of 10,000 randomly generated sequences mimicking the amino acid propensities of a typical IDP sequence. This database is then fed into an artificial neural network model to directly predict the scaling exponent from the sequence. The framework not only makes accurate and precise predictions (<1% error) relative to the simulation-obtained scaling exponent, but also suggests important sequence descriptors for such prediction. In addition, by varying the number of sequences used to train the model, we suggest that a minimum dataset of 100 sequences might be sufficient to achieve a 5% prediction error, shedding light upon possible predictive models with only experimental inputs.
Contributors: Brown, Matthew D (Author) / Zheng, Wenwei (Thesis director) / Huffman, Holly (Committee member) / College of Integrative Sciences and Arts (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
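The two quantities named in the abstract above can be computed from bead coordinates of a coarse-grained chain. The sketch below computes Rg as the root-mean-square distance of beads from their centroid, and estimates a scaling exponent as the log-log slope of root-mean-square internal distance against sequence separation. This fitting approach is one common choice and is an assumption here, as are the function names and the toy rod; the record does not describe the author's procedure, and in practice the distances would be averaged over a simulated ensemble rather than one conformation.

```python
import math


def radius_of_gyration(coords):
    """RMS distance of beads (equal mass assumed) from their centroid."""
    n = len(coords)
    center = [sum(c[axis] for c in coords) / n for axis in range(3)]
    total = sum(
        sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for c in coords
    )
    return math.sqrt(total / n)


def scaling_exponent(coords):
    """Least-squares slope of log(RMS internal distance) vs log(separation)."""
    n = len(coords)
    if n < 3:
        raise ValueError("need at least 3 beads to fit a slope")
    xs, ys = [], []
    for sep in range(1, n):
        squared = [
            sum((coords[i][a] - coords[i + sep][a]) ** 2 for a in range(3))
            for i in range(n - sep)
        ]
        xs.append(math.log(sep))
        ys.append(0.5 * math.log(sum(squared) / len(squared)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Toy conformation (hypothetical): a straight rod with unit bead spacing.
rod = [(float(i), 0.0, 0.0) for i in range(4)]
rg = radius_of_gyration(rod)
nu = scaling_exponent(rod)
print(rg, nu)
```

A straight rod is a useful sanity check because its internal distances grow linearly with separation, so the fitted exponent should be 1; more compact chains give smaller values.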