To address this, multiple comparative genomics and bioinformatics analyses were conducted to elucidate patterns of evolution in the green anole and across multiple anole species. Comparative genomics analyses were used to infer additional X-linked loci in the green anole, and RNAseq data from male and female samples were analyzed to quantify patterns of sex-biased gene expression across the genome. The extent of dosage compensation on the anole X chromosome was also characterized, providing evidence that the sex chromosomes in the green anole are dosage compensated.
In addition, pairwise rates of evolution were analyzed for genes across the anole genome. In comparisons with other Anolis species, X-linked genes have a lower ratio of nonsynonymous to synonymous substitution rates than autosomal genes. To conduct this analysis, a new pipeline was created for filtering alignments and performing batch calculations on whole-genome coding sequences. This pipeline has been made publicly available.
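As an illustration of the kind of alignment filtering that might precede batch calculation of nonsynonymous to synonymous rate ratios, the following minimal Python sketch screens pairwise codon alignments. The filter criteria (equal length, length divisible by three, no internal stop codons, a cap on gapped codons), the function names, and the `max_gap_fraction` parameter are assumptions made for illustration and are not taken from the pipeline described above.

```python
# Hypothetical alignment filter; the criteria are illustrative assumptions,
# not the published pipeline's actual rules.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split an aligned coding sequence into consecutive codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def passes_filters(seq_a, seq_b, max_gap_fraction=0.5):
    """Return True if a pairwise codon alignment looks usable."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    # Aligned sequences must be non-empty, equal in length, and in frame.
    if len(seq_a) == 0 or len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        return False
    pairs = list(zip(codons(seq_a), codons(seq_b)))
    # Reject alignments with a stop codon anywhere except the final codon.
    for codon_a, codon_b in pairs[:-1]:
        if codon_a in STOP_CODONS or codon_b in STOP_CODONS:
            return False
    # Reject alignments in which too many codon columns contain gaps.
    gapped = sum(1 for codon_a, codon_b in pairs
                 if "-" in codon_a or "-" in codon_b)
    return gapped / len(pairs) <= max_gap_fraction


def filter_batch(alignments, max_gap_fraction=0.5):
    """Keep only the gene alignments that pass every filter."""
    return {gene: pair for gene, pair in alignments.items()
            if passes_filters(*pair, max_gap_fraction=max_gap_fraction)}
```

In a sketch like this, the alignments that survive `filter_batch` would then be passed to a separate substitution-rate estimator; that step is omitted here.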
This can make mutation detection difficult. While increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when no reference genome or other auxiliary information is available.
I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
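To make concrete what it means for a base quality score to be well calibrated, the short Python sketch below compares reported Phred scores with the quality implied by observed error counts, using the standard Phred definition Q = -10 log10(p). It is not the KBBQ algorithm: the function names, the input format, and the add-one smoothing are assumptions made for illustration, and the sketch presumes that the true error status of each base is already known.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def empirical_quality(observations):
    """Map each reported quality to the quality implied by observed errors.

    `observations` is an iterable of (reported_q, is_error) pairs, one per
    base call. This input format is a hypothetical choice for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # reported Q -> [errors, total]
    for reported_q, is_error in observations:
        counts[reported_q][0] += int(is_error)
        counts[reported_q][1] += 1
    result = {}
    for reported_q, (errors, total) in counts.items():
        # Add-one smoothing (an assumption) avoids taking log10 of zero.
        p = (errors + 1) / (total + 2)
        result[reported_q] = error_prob_to_phred(p)
    return result
```

For well-calibrated data, each value returned by `empirical_quality` would be close to its key. A large gap, such as bases reported at Q30 that behave like Q10, indicates the kind of miscalibration that recalibration aims to correct.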
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
Structural Equation Modeling (SEM) is a multivariate analysis methodology that could potentially be used to examine the barrier effect that river systems have on genetic differentiation. In this project, river systems are split into the variables Daily Average Discharge, Average River Width, and Seasonality, which are regressed onto genetic differentiation, measured as Fst. The data were collected from the USGS database (U.S. Geological Survey, 2020), sequencing files from various literature sources, and Google Earth measurements. Several structural equation models are used to represent different system structures and are compared with more traditional methodologies such as Generalized Linear Modeling and Generalized Linear Mixed Modeling. Ultimately, the results were limited by the small sample size; however, interesting patterns still emerged from the models. The structural equation models indicate that Discharge plays a primary role in the genetic differentiation of adjacent river populations. In addition, the results demonstrate how quantification of indirect effects, particularly those relating to discharge, gives more informative interpretations than traditional multivariate statistics alone. These findings prompt further investigation into this potential methodology.
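The idea of quantifying an indirect effect can be sketched with the simplest possible path model, in which a predictor x influences an outcome y both directly and through an intermediate variable m. The Python sketch below is a hedged illustration using ordinary least squares, not the models fitted in this project. The generic names x, m, and y are placeholders, and any mapping onto river width, discharge, or Fst would be an assumption.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def path_effects(x, m, y):
    """Decompose the effect of x on y into direct and indirect parts.

    a: slope of m on x.
    b: partial slope of y on m, controlling for x, obtained by
       regressing y on the residuals of m after removing x.
    indirect = a * b; direct = total - indirect.
    """
    a = slope(x, m)
    mean_x, mean_m = mean(x), mean(m)
    m_resid = [mi - mean_m - a * (xi - mean_x) for xi, mi in zip(x, m)]
    b = slope(m_resid, y)
    total = slope(x, y)
    indirect = a * b
    return {"a": a, "b": b, "indirect": indirect,
            "direct": total - indirect, "total": total}
```

A single multiple regression reports only the direct coefficient of x, whereas the decomposition also exposes how much of the total effect is transmitted through m. Full SEM software additionally handles several mediators, latent variables, and model fit statistics, none of which this sketch covers.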