The prevalence of childhood obesity in the United States has increased in recent years. Developing obesity at any age, but especially in adolescence, can have lasting negative effects, including cardiometabolic disease, higher healthcare costs, and reduced quality of life. In recent years, a rising trend of obesity in both adults and adolescents has been observed among lower-income groups and certain ethnic groups. Adiposity can be influenced by modifiable factors (physical activity, caloric intake, and sleep) and by non-modifiable factors (ethnicity, genetic predisposition, and socioeconomic status). These factors act on individuals of all ages, including infants. A common indicator of later childhood obesity is rapid weight gain (RWG) within an infant’s first year of life, and the composition of the gut microbiome can act as a predictor of both RWG and the development of childhood obesity. Infants are exposed to an immense microbial load at birth, and their gut microbiome continues to diversify depending on their method of feeding and the subsequent introduction of solid foods. Although the topic remains understudied, cultural and socioeconomic factors are understood to influence the development of the gut microbiome, and this analysis explores that relationship further. DNA was extracted from 51 fecal samples from infants aged 3 weeks to 12 months and sequenced using next-generation sequencing, and the resulting sequences were analyzed using QIIME 2. Alpha-diversity and beta-diversity metrics showed significant differences in the infant gut microbiome between groups defined by infant race/ethnicity, household income, and maternal education. These findings highlight the role of sociodemographic characteristics in shaping the gut microbiome and underscore the need for future gut microbiome studies to include diverse populations.
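Alpha diversity summarizes the diversity within a single sample, while beta diversity measures how different two samples are. The abstract does not name the specific metrics used, so the sketch below simply illustrates the arithmetic of two commonly used ones, the Shannon index (natural log here; some tools default to log base 2) and Bray-Curtis dissimilarity, on a hypothetical count table. It is not QIIME 2's implementation, and the counts are invented for illustration.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over features with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i); 0 = identical, 1 = disjoint."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator

# Hypothetical counts of four taxa in two infant samples (not study data).
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 40, 40]
print(round(shannon(sample_a), 3))                # 1.03
print(round(bray_curtis(sample_a, sample_b), 3))  # 0.6
```

Group comparisons like those described above are then made on these per-sample values (alpha) or on the matrix of pairwise dissimilarities (beta).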
This can make mutation detection difficult, and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increasing depth. The problem of accurate genotyping is exacerbated when no reference genome or other auxiliary information is available.
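Why depth helps with random errors but not systematic ones can be seen with a simple binomial model. The sketch below assumes a naive caller that reports a variant whenever at least 20% of reads show the same non-reference base. It compares a hypothetical random error (1% spread evenly over the three wrong bases) with a hypothetical sequence-specific error that produces the same wrong base in 30% of reads. Both rates and the threshold are illustrative assumptions.

```python
import math
from fractions import Fraction

def prob_false_alt(depth, p_alt_error, min_alt_fraction=Fraction(1, 5)):
    """P(at least min_alt_fraction of reads show one specific wrong base), X ~ Binomial(depth, p)."""
    # Fraction avoids floating-point surprises such as 0.2 * 30 == 6.000000000000001.
    k_min = math.ceil(min_alt_fraction * depth)
    return sum(
        math.comb(depth, k) * p_alt_error**k * (1 - p_alt_error) ** (depth - k)
        for k in range(k_min, depth + 1)
    )

random_error = 0.01 / 3    # hypothetical: 1% error rate spread over 3 wrong bases
systematic_error = 0.30    # hypothetical: context-specific error affecting 30% of reads

for depth in (10, 50, 200):
    print(depth,
          prob_false_alt(depth, random_error),
          prob_false_alt(depth, systematic_error))
```

As depth grows, the chance that random errors cross the threshold shrinks toward zero, while the systematic error crosses it almost every time. More reads make a biased site look more convincing, not less.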
I explore several methods for sensitively detecting mutations in non-model organisms, using a single Eucalyptus melliodora individual as an example. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several variant calling algorithms. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, when the data are structured, a likelihood framework that accounts for this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
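One way to make a likelihood "aware of structure" is to let the tree restrict which samples can share a somatic mutation. A mutation arising on a branch should appear in every sample below that branch and in no others. The toy sketch below scores each branch (clade) as the origin of a heterozygous mutation and compares it with a "no mutation" hypothesis. It assumes independent reads, a single hypothetical per-read error rate, and no prior or penalty on mutations, and it is not the framework used in this work. The tree and read counts are invented.

```python
import math

def site_log_likelihood(counts, mutant_samples, error_rate):
    """Log-likelihood of (ref, alt) read counts when mutant_samples carry a heterozygous mutation.

    Binomial coefficients are omitted because they are identical under every hypothesis.
    """
    ll = 0.0
    for sample, (ref, alt) in counts.items():
        p_alt = 0.5 if sample in mutant_samples else error_rate
        ll += alt * math.log(p_alt) + ref * math.log(1 - p_alt)
    return ll

def best_branch(counts, clades, error_rate=0.01):
    """Return (branch, log-likelihood) of the best hypothesis; branch None means no mutation."""
    best = (None, site_log_likelihood(counts, frozenset(), error_rate))
    for branch, descendants in clades.items():
        ll = site_log_likelihood(counts, descendants, error_rate)
        if ll > best[1]:
            best = (branch, ll)
    return best

# Hypothetical sample tree: leaves A-D with internal clades AB and CD.
clades = {
    "A": frozenset("A"), "B": frozenset("B"), "C": frozenset("C"), "D": frozenset("D"),
    "AB": frozenset("AB"), "CD": frozenset("CD"),
}
# Hypothetical (ref, alt) read counts at one candidate site.
counts = {"A": (10, 9), "B": (12, 8), "C": (20, 0), "D": (18, 1)}
print(best_branch(counts, clades))  # best placement: branch "AB"
```

Because only clades are allowed, an alternate-allele pattern that fits no single branch (for example A and C but not B or D) cannot be explained as one somatic mutation. That inconsistency is itself evidence of error.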
I also use these data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate the base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but they are often inaccurate. The most popular method for correcting them requires a known set of variant sites, which is unavailable in most cases. Using simulated data, I show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores without requiring a reference or other auxiliary information, and that it performs as well as other methods.
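The general idea of reference-free recalibration can be sketched as follows. Rather than labelling mismatches against a reference and masking known variant sites, a base is treated as a likely error if every k-mer overlapping it is rare across the read set. Empirical error rates are then tallied per reported quality. This is a simplified illustration under stated assumptions, not KBBQ's algorithm. The k-mer size, the rarity threshold, the +1/+2 pseudocounts, and binning by reported quality alone are all hypothetical choices. Real recalibrators also bin by covariates such as read group and machine cycle.

```python
import math
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def flag_errors(read, kmer_counts, k, min_count):
    """Flag position j if every k-mer overlapping j occurs fewer than min_count times."""
    n = len(read)
    assert n >= k, "read must be at least k bases long"
    flags = []
    for j in range(n):
        starts = range(max(0, j - k + 1), min(j, n - k) + 1)
        flags.append(all(kmer_counts[read[s:s + k]] < min_count for s in starts))
    return flags

def recalibrate(reads, quals, k=4, min_count=2):
    """Map each reported quality to an empirical Phred score using k-mer error flags."""
    kmer_counts = count_kmers(reads, k)
    totals = defaultdict(lambda: [0, 0])  # reported quality -> [bases, flagged errors]
    for read, qual in zip(reads, quals):
        for q, is_err in zip(qual, flag_errors(read, kmer_counts, k, min_count)):
            totals[q][0] += 1
            totals[q][1] += is_err
    # Pseudocounts keep the estimate finite when a bin has no flagged errors.
    return {q: -10 * math.log10((errs + 1) / (n + 2)) for q, (n, errs) in totals.items()}

# Hypothetical reads: five correct copies and one with an A->T error at position 4.
reads = ["ACGTACGGTCA"] * 5 + ["ACGTTCGGTCA"]
quals = [[30] * 11 for _ in reads]  # every base reported as Q30
print(flag_errors(reads[-1], count_kmers(reads, 4), 4, 2))
print(recalibrate(reads, quals))  # Q30 maps to roughly Q15 on this tiny example
```

The threshold matters. At a true heterozygous site, the variant's k-mers appear at roughly half coverage, so min_count must stay well below that level or real variants will be counted as errors. This is the same failure mode that an incomplete known-sites list causes in reference-based recalibration.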
Finally, I use the Eucalyptus data to investigate how quality score calibration affects the resulting variant calls, and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
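A simple diploid genotype likelihood shows how base quality scores feed into variant calls. Each reported quality Q is converted to an error probability e = 10^(-Q/10). The sketch assumes a biallelic site, independent reads, flat genotype priors, and a hypothetical pileup of 4 alternate and 16 reference reads. It is not the variant caller evaluated in this work.

```python
import math

# Expected fraction of alternate-allele reads under each diploid genotype, before errors.
GENOTYPE_ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}

def phred_to_error(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)

def genotype_log_likelihoods(bases, quals):
    """bases: 'R' (reference) or 'A' (alternate) per read; quals: reported Phred score per base."""
    lls = {}
    for gt, f in GENOTYPE_ALT_FRACTION.items():
        ll = 0.0
        for b, q in zip(bases, quals):
            e = phred_to_error(q)
            p_alt = f * (1 - e) + (1 - f) * e  # chance of observing the alternate base
            ll += math.log(p_alt if b == "A" else 1 - p_alt)
        lls[gt] = ll
    return lls

def call(bases, quals):
    """Return the maximum-likelihood genotype (flat priors assumed)."""
    lls = genotype_log_likelihoods(bases, quals)
    return max(lls, key=lls.get)

bases = ["A"] * 4 + ["R"] * 16
print(call(bases, [20] * 20))  # 0/1: four Q20 alternate reads look like a heterozygote
print(call(bases, [10] * 20))  # 0/0: at Q10 the same reads are explained by errors
```

If these bases were reported at Q20 but their true error rate was closer to Q10, the overconfident scores would produce a false heterozygous call that accurate calibration avoids. Conversely, underconfident scores can cause real variants to be dismissed as errors, reducing sensitivity.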