treatments, and neo-antigens are the targets of the immune system in cancer patients who
respond to the treatments. The cancer vaccine field is focused on using neo-antigens from
unique point mutations in the genomic sequence of each cancer patient to make
personalized cancer vaccines. However, we take a different path: finding frameshift
neo-antigens at the mRNA level and developing broadly effective cancer vaccines based on
frameshift antigens.
In this dissertation, I summarized and characterized all the potential frameshift
antigens from microsatellite regions in humans, dogs and mice. A list of frameshift
antigens was validated by PCR in tumor samples, and the mutation rate was calculated for
one candidate, SEC62. I developed a method to screen the antibody response against
frameshift antigens in human and dog cancer patients by using frameshift peptide arrays.
Frameshift antigens selected by positive antibody response in cancer patients or by MHC
predictions showed protection in different mouse tumor models. A dog version of the
cancer vaccine based on frameshift antigens was developed and tested in a small safety
trial. The results demonstrate that the vaccine is safe and can induce strong B and T cell
immune responses. Further, I built the human exon junction frameshift database, which
includes all possible frameshift antigens from mis-splicing events in exon junctions, and I
developed a method to find potential frameshift antigens from large cancer
immunosignature datasets with these databases. In addition, I tested the idea of ‘early cancer
diagnosis, early treatment’ in a transgenic mouse cancer model. The results show that
early treatment gives significantly better protection than late treatment and the correct
time point for treatment is crucial to give the best clinical benefit. A model for early
treatment is developed with these results.
Frameshift neo-antigens from microsatellite regions and mis-splicing events are
abundant at the mRNA level, and they are better antigens than neo-antigens from point
mutations in the genomic sequences of cancer patients because of their high immunogenicity,
low probability of causing autoimmune disease, and the low cost of developing a broadly effective
vaccine. This dissertation demonstrates the feasibility of using frameshift antigens for
cancer vaccine development.
This can make mutation detection difficult; while increasing sequencing depth
can often help, sequence-specific errors and other non-random biases cannot be
detected by increased depth. The problem of accurate genotyping is exacerbated when
no reference genome or other auxiliary information is available.
I explore several methods for sensitively detecting mutations in non-model
organisms using an example Eucalyptus melliodora individual. I use the structure of
the tree to find bounds on its somatic mutation rate and evaluate several algorithms
for variant calling. I find that conventional methods are suitable if the genome of a
close relative can be adapted to the study organism. However, with structured data,
a likelihood framework that is aware of this structure is more accurate. I use the
techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer based base quality score recalibrator
(KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing
data. Base quality scores can help detect errors in sequencing reads, but are often
inaccurate. The most popular method for correcting this issue requires a known
set of variant sites, which is unavailable in most cases. I simulate data and show
that errors in this set of variant sites can cause calibration errors. I then show that
KBBQ accurately recalibrates base quality scores while requiring no reference or other
information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration
on the quality of output variant calls and show that improved base quality score
calibration increases the sensitivity and reduces the false positive rate of a variant
calling algorithm.
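To make concrete what "calibration" of a base quality score means, the following minimal sketch uses the standard Phred convention, in which a score Q corresponds to an error probability of 10^(-Q/10), and compares reported scores with the error rate actually observed. It is an illustration only and not the KBBQ algorithm; the function names, the toy data, and the add-one smoothing are assumptions introduced here.

```python
import math


def phred_to_error_prob(q):
    """Standard Phred convention: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse of the Phred convention: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def empirical_quality(bases):
    """Map each reported score to the quality implied by observed errors.

    `bases` is an iterable of (reported_q, is_error) pairs. How `is_error`
    is decided is outside this sketch (hypothetical input).
    """
    counts = {}
    for q, is_error in bases:
        n, e = counts.get(q, (0, 0))
        counts[q] = (n + 1, e + int(is_error))
    # Add-one smoothing (an assumption) avoids infinite quality
    # for bins in which no errors were observed.
    return {q: error_prob_to_phred((e + 1) / (n + 2))
            for q, (n, e) in counts.items()}


# Toy data: 1000 bases reported as Q30, of which 10 are actually wrong.
toy = [(30, i < 10) for i in range(1000)]
empirical = empirical_quality(toy)
```

In this toy case the reported score claims one error per 1000 bases while the observed rate is roughly one per 100, so the empirical quality falls near Q20; a well-calibrated score would have the two agree.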
Pathways of Distinction Analysis of Liver Cancer Data: Genetic Differences Between Males and Females
This essay uses census data from the eighteenth century to examine the leadership role of caciques in the Guaraní missions. Cacique succession between 1735 and 1759 confirms that the position of cacique transitioned from the Guaraníes’ flexible interpretation of hereditary succession to the Jesuits’ rigid idea of primogeniture (father to eldest son) succession. This essay argues that scholars overstate the caciques’ leadership role in the Guaraní missions. Adherence to primogeniture succession did not take into account a candidate's leadership qualities, and thus, some caciques functioned as placeholders for organizing the mission population and calculating tribute and not as active leaders. An assortment of other Guaraní leadership positions compensated for this weakness by providing both access to leadership roles for non-caciques who possessed leadership qualities but not the proper bloodline and additional leadership opportunities for more capable caciques. By taking into account leadership qualities and not just descent, these positions provided flexibility and reflected continuity with pre-contact Guaraní ideas about leadership.
Multitrophic communities that maintain the functionality of the extreme Antarctic terrestrial ecosystems, while the simplest of any natural community, are still challenging our knowledge about the limits to life on earth. In this study, we describe and interpret the linkage between the diversity of different trophic level communities and the geological morphology and soil geochemistry in the remote Transantarctic Mountains (Darwin Mountains, 80°S). We examined the distribution and diversity of biota (bacteria, cyanobacteria, lichens, algae, invertebrates) with respect to elevation, age of glacial drift sheets, and soil physicochemistry. Results showed an abiotic spatial gradient with respect to the diversity of the organisms across different trophic levels. More complex communities, in terms of trophic level diversity, were related to the weakly developed younger drifts (Hatherton and Britannia) with higher soil C/N ratio and lower total soluble salts content (thus lower conductivity). Our results indicate that an increase of ion concentration from younger to older drift regions drives a succession of complex to more simple communities, in terms of number of trophic levels and diversity within each group of organisms analysed. This study revealed that integrating diversity across multi-trophic levels of biotic communities with abiotic spatial heterogeneity and geological history is fundamental to understanding the environmental constraints influencing biological distribution in Antarctic soil ecosystems.
This survey of 206 forensic psychologists tested the “filtering” effects of preexisting expert attitudes in adversarial proceedings. Results confirmed the hypothesis that evaluator attitudes toward capital punishment influence willingness to accept capital case referrals from particular adversarial parties. Stronger death penalty opposition was associated with higher willingness to conduct evaluations for the defense and higher likelihood of rejecting referrals from all sources. Conversely, stronger support was associated with higher willingness to be involved in capital cases generally, regardless of referral source. The findings raise the specter of skewed evaluator involvement in capital evaluations, where evaluators willing to do capital casework may have stronger capital punishment support than evaluators who opt out, and evaluators with strong opposition may work selectively for the defense. The results may provide a partial explanation for the “allegiance effect” in adversarial legal settings such that preexisting attitudes may contribute to partisan participation through a self-selection process.
Background: Immunosignaturing is a new peptide microarray based technology for profiling of humoral immune responses. Despite new challenges, immunosignaturing gives us the opportunity to explore new and fundamentally different research questions. In addition to classifying samples based on disease status, the complex patterns and latent factors underlying immunosignatures, which we attempt to model, may have a diverse range of applications.
Methods: We investigate the utility of a number of statistical methods to determine model performance and address challenges inherent in analyzing immunosignatures. Some of these methods include exploratory and confirmatory factor analyses, classical significance testing, structural equation and mixture modeling.
Results: We demonstrate an ability to classify samples based on disease status and show that immunosignaturing is a very promising technology for screening and presymptomatic screening of disease. In addition, we are able to model complex patterns and latent factors underlying immunosignatures. These latent factors may serve as biomarkers for disease and may play a key role in a bioinformatic method for antibody discovery.
Conclusion: Based on this research, we lay out an analytic framework illustrating how immunosignatures may be useful as a general method for screening and presymptomatic screening of disease as well as antibody discovery.
Background: Microarray image analysis processes scanned digital images of hybridized arrays to produce the input spot-level data for downstream analysis, so it can have a potentially large impact on that and subsequent analyses. Signal saturation is an optical effect that occurs when some pixel values for highly expressed genes or peptides exceed the upper detection threshold of the scanner software (2^16 - 1 = 65,535 for 16-bit images). In practice, spots with a sizable number of saturated pixels are often flagged and discarded. Alternatively, the saturated values are used without adjustments for estimating spot intensities. The resulting expression data tend to be biased downwards and can distort high-level analysis that relies on these data. Hence, it is crucial to effectively correct for signal saturation.
Results: We developed a flexible mixture model-based segmentation and spot intensity estimation procedure that accounts for saturated pixels by incorporating a censored component in the mixture model. As demonstrated with biological data and simulation, our method extends the dynamic range of expression data beyond the saturation threshold and is effective in correcting saturation-induced bias when the lost information is not tremendous. We further illustrate the impact of image processing on downstream classification, showing that the proposed method can increase diagnostic accuracy using data from a lymphoma cancer diagnosis study.
Conclusions: The presented method adjusts for signal saturation at the segmentation stage that identifies a pixel as part of the foreground, background or other. The cluster membership of a pixel can be altered versus treating saturated values as truly observed. Thus, the resulting spot intensity estimates may be more accurate than those obtained from existing methods that correct for saturation based on already segmented data. As a model-based segmentation method, our procedure is able to identify inner holes, fuzzy edges and blank spots that are common in microarray images. The approach is independent of microarray platform and applicable to both single- and dual-channel microarrays.
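The downward bias caused by saturation can be illustrated with a small simulation. The sketch below is not the censored mixture model of the paper; it only shows that clipping pixel values at the 16-bit ceiling lowers the naive spot mean. The spot parameters (`true_mean`, `sd`, `n_pixels`) and the Gaussian pixel model are hypothetical choices made for illustration.

```python
import random

SATURATION = 2**16 - 1  # 65,535, the ceiling for 16-bit images


def simulate_spot(true_mean, sd, n_pixels, seed=0):
    """Draw hypothetical pixel intensities and clip them at the ceiling."""
    rng = random.Random(seed)
    true_pixels = [max(0.0, rng.gauss(true_mean, sd)) for _ in range(n_pixels)]
    observed = [min(p, SATURATION) for p in true_pixels]
    return true_pixels, observed


def mean(values):
    return sum(values) / len(values)


# A bright spot whose true intensity distribution straddles the ceiling.
true_pixels, observed = simulate_spot(true_mean=64000, sd=3000, n_pixels=500)

# Fraction of pixels recorded at the ceiling, and the bias of the naive mean.
saturated_fraction = sum(p >= SATURATION for p in observed) / len(observed)
bias = mean(observed) - mean(true_pixels)  # negative: biased downwards
```

Treating the clipped values as truly observed gives the negative `bias` above; a censored model instead treats each saturated pixel as "at least 65,535", which is what allows the intensity estimate to extend beyond the threshold.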
Background: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data and selecting informative features, and then classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear to be completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear which classification algorithm is optimal for analyzing this new type of data.
Results: We characterized several classification algorithms to analyze immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy.
Conclusions: The ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties.
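For readers unfamiliar with the classifier, a minimal Gaussian Naïve Bayes can be sketched in a few lines. This is an illustrative sketch and not the implementation evaluated in the study: the Gaussian likelihood, the class name `GaussianNaiveBayes`, the variance floor, and the toy peptide intensities are all assumptions. It makes visible the per-feature independence assumption that gives the method its simplicity and speed.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are treated as independent
    given the class, each modelled by its own normal distribution."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # Small floor (an assumption) keeps variances strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                         for col, m in zip(columns, means)]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, row, label):
        log_prior, means, variances = self.params[label]
        # Independence assumption: per-feature log-likelihoods simply add.
        log_lik = sum(-0.5 * math.log(2 * math.pi * var)
                      - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(row, c))
                for row in X]


# Hypothetical intensities for three peptides in six samples.
X_train = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [0.9, 5.2, 1.9],
           [3.0, 2.0, 4.0], [3.1, 2.2, 3.9], [2.9, 1.9, 4.2]]
y_train = ["control"] * 3 + ["disease"] * 3

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1.1, 5.0, 2.0], [3.0, 2.1, 4.1]])
```

Because each feature contributes an independent term, training reduces to computing per-class means and variances, which is why the method scales easily to arrays with many peptides.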