Matching Items (2)
Description
Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult, and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when no reference genome or other auxiliary information is available.
I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate the base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
Contributors: Orr, Adam James (Author) / Cartwright, Reed (Thesis advisor) / Wilson, Melissa (Committee member) / Kusumi, Kenro (Committee member) / Taylor, Jesse (Committee member) / Pfeifer, Susanne (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Mutation is the source of heritable variation in genotype and phenotype, on which selection may act. Mutation rates are a fundamental parameter of living things, influencing the rate at which evolution may occur, from viral pathogens to human crops and even to aging cells and the emergence of cancer. An understanding of the variables that affect mutation rates and their estimation is necessary to place mutation rate estimates in their proper context. To better understand mutation rate estimates, this research investigates the impact of temperature on transcription error rate estimates; the impact of growing cells in liquid culture versus on agar plates; the impact of many in vitro variables on the estimation of deoxyribonucleic acid (DNA) mutation rates from a single sample; and the mutational hazard induced by expressing clustered regularly interspaced short palindromic repeat (CRISPR) proteins in yeast. This research finds that many of the variables tested did not significantly alter the estimation of mutation rates, strengthening the claims of previous mutation rate estimates made across the tree of life by diverse experimental approaches. However, it is clear that sonication is a DNA mutagen, a finding that was part of an effort which has reduced the sequencing error rate of circle-seq by over 1,000-fold. This research also demonstrates that growth in liquid culture modestly skews the mutation spectrum of mismatch repair-deficient (MMR-) Escherichia coli, though it does not significantly impact the overall mutation rate. Finally, this research demonstrates a modest mutational hazard of expressing Cas9 and similar CRISPR proteins in yeast cells at an untargeted genomic locus, though it is possible the indel rate has been increased by an order of magnitude.
Contributors: Baehr, Stephan (Author) / Lynch, Michael (Thesis advisor) / Geiler-Samerotte, Kerry (Committee member) / Mangone, Marco (Committee member) / Wilson, Melissa (Committee member) / Arizona State University (Publisher)
Created: 2023