treatments, and neo-antigens are the targets of the immune system in cancer patients who
respond to the treatments. The cancer vaccine field is focused on using neo-antigens that
arise from unique point mutations in each patient's genomic sequence to make
personalized cancer vaccines. However, we take a different path: we look for frameshift
neo-antigens at the mRNA level and develop broadly effective cancer vaccines based on
frameshift antigens.
In this dissertation, I have summarized and characterized all potential frameshift
antigens from microsatellite regions in human, dog, and mouse. A list of frameshift
antigens was validated by PCR in tumor samples, and the mutation rate was calculated for
one candidate, SEC62. I developed a method to screen for antibody responses against
frameshift antigens in human and dog cancer patients using frameshift peptide arrays.
Frameshift antigens selected by positive antibody response in cancer patients or by MHC
predictions showed protection in different mouse tumor models. A dog version of the
cancer vaccine based on frameshift antigens was developed and tested in a small safety
trial. The results demonstrate that the vaccine is safe and can induce strong B and T cell
immune responses. Further, I built the human exon junction frameshift database, which
includes all possible frameshift antigens from mis-splicing events in exon junctions, and I
developed a method to find potential frameshift antigens in a large cancer
immunosignature dataset using these databases. In addition, I tested the idea of ‘early cancer
diagnosis, early treatment’ in a transgenic mouse cancer model. The results show that
early treatment gives significantly better protection than late treatment and that choosing
the right time point for treatment is crucial for the best clinical benefit. A model for early
treatment is developed from these results.
Frameshift neo-antigens from microsatellite regions and mis-splicing events are
abundant at the mRNA level, and they are better antigens than neo-antigens from point
mutations in the genomic sequences of cancer patients: they offer high immunogenicity,
a low probability of causing autoimmune disease, and a low cost of developing a broadly
effective vaccine. This dissertation demonstrates the feasibility of using frameshift
antigens for cancer vaccine development.
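To illustrate why a frameshift in a microsatellite produces a new peptide, the following minimal sketch translates a short coding sequence before and after a single-base deletion in a homopolymer run. The sequence and the helper names (`translate`, `delete_base`, `novel_tail`) are hypothetical and chosen only for illustration; this is not the pipeline used in the dissertation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from position 0 until the first stop codon or the end."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def delete_base(seq, pos):
    """Return seq with the base at index pos removed (a -1 frameshift)."""
    return seq[:pos] + seq[pos + 1:]


def novel_tail(wild_peptide, mutant_peptide):
    """Return the part of the mutant peptide after it diverges from wild type."""
    i = 0
    while (i < min(len(wild_peptide), len(mutant_peptide))
           and wild_peptide[i] == mutant_peptide[i]):
        i += 1
    return mutant_peptide[i:]


# Hypothetical coding sequence containing a run of nine A's (a microsatellite).
wild_type = "ATGGCTAAAAAAAAAGGCTGTGACCTGTAA"
mutant = delete_base(wild_type, 6)  # drop one A from the run

wild_peptide = translate(wild_type)
mutant_peptide = translate(mutant)
frameshift_tail = novel_tail(wild_peptide, mutant_peptide)
```

Everything downstream of the deletion is read in a shifted frame, so the mutant peptide shares its N-terminal residues with the wild type and then diverges into a sequence that does not occur in the normal protein.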
This can make mutation detection difficult, and while increasing sequencing depth
can often help, sequence-specific errors and other non-random biases cannot be
detected by increased depth. The problem of accurate genotyping is exacerbated when
no reference genome or other auxiliary information is available.
I explore several methods for sensitively detecting mutations in non-model
organisms using an example Eucalyptus melliodora individual. I use the structure of
the tree to find bounds on its somatic mutation rate and evaluate several algorithms
for variant calling. I find that conventional methods are suitable if the genome of a
close relative can be adapted to the study organism. However, with structured data,
a likelihood framework that is aware of this structure is more accurate. I use the
techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer-based base quality score recalibrator
(KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing
data. Base quality scores can help detect errors in sequencing reads, but are often
inaccurate. The most popular method for correcting this issue requires a known
set of variant sites, which is unavailable in most cases. I simulate data and show
that errors in this set of variant sites can cause calibration errors. I then show that
KBBQ accurately recalibrates base quality scores while requiring no reference or other
information and performs as well as other methods.
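As background for what "calibration" means here, the sketch below uses the standard Phred relation Q = -10 log10(p) and compares the quality a sequencer reports with the quality implied by the observed error rate. It is only an illustration under stated assumptions: the function names and the input format (pairs of reported quality and an error flag) are hypothetical, and this is not the KBBQ algorithm, which the text does not describe in detail.

```python
import math
from collections import defaultdict


def phred_to_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def empirical_quality(observations):
    """Map each reported quality to the quality implied by observed errors.

    observations: iterable of (reported_q, is_error) pairs, one per base.
    A +1/+2 pseudocount keeps the estimate finite when no errors are seen.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for q, is_error in observations:
        counts[q][0] += int(is_error)
        counts[q][1] += 1
    return {q: prob_to_phred((errors + 1) / (total + 2))
            for q, (errors, total) in counts.items()}


# Hypothetical example: 98 bases reported at Q30, 9 of which are errors.
observations = [(30, True)] * 9 + [(30, False)] * 89
calibrated = empirical_quality(observations)
```

In this made-up example the bases are reported at Q30 (an implied error rate of 1 in 1000) but behave like roughly Q10 bases, so the reported scores are overconfident. A recalibrator aims to replace reported scores with values closer to the empirical ones; the hard part, which this sketch sidesteps, is deciding which bases are errors without a trusted set of variant sites.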
Finally, I use the Eucalyptus data to investigate the impact of quality score
calibration on the quality of output variant calls and show that improved base quality score
calibration increases the sensitivity and reduces the false positive rate of a variant
calling algorithm.
Pathways of Distinction Analysis of Liver Cancer Data: Genetic Differences Between Males and Females
Mild Cognitive Impairment (MCI) is a transitional stage between normal aging and dementia, and people with MCI are at high risk of progression to dementia. MCI is attracting increasing attention, as it offers an opportunity to target the disease process during an early symptomatic stage. Structural magnetic resonance imaging (MRI) measures have been the mainstay of Alzheimer's disease (AD) imaging research; however, ventricular morphometry analysis remains challenging because of the complicated topological structure of the ventricles. Here we describe a novel ventricular morphometry system based on the hyperbolic Ricci flow method and tensor-based morphometry (TBM) statistics. Unlike prior ventricular surface parameterization methods, hyperbolic conformal parameterization is angle-preserving and does not have any singularities. Our system generates a one-to-one diffeomorphic mapping between ventricular surfaces with consistent boundary matching conditions. The TBM statistics encode a great deal of surface deformation information that could be inaccessible to or overlooked by other methods. We applied our system to the baseline MRI scans of a set of MCI subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI: 71 MCI converters vs. 62 MCI stable). Although the combined ventricular area and volume features did not differ between the two groups, our fine-grained surface analysis revealed significant differences in the ventricular regions close to the temporal lobe and posterior cingulate, structures that are affected early in AD. Significant correlations were also detected between ventricular morphometry, neuropsychological measures, and a previously described imaging index based on fluorodeoxyglucose positron emission tomography (FDG-PET) scans. This novel ventricular morphometry method may offer a new and more sensitive approach for studying preclinical and early symptomatic stage AD.