AAbs provide value in identifying individuals at risk, stratifying patients with different clinical courses, improving our understanding of autoimmune destruction, identifying antigens for the cellular immune response, and providing candidates for prevention trials in T1D. A two-stage serological AAb screening against 6,000 human proteins was performed. Dual specificity tyrosine-phosphorylation-regulated kinase 2 (DYRK2) was validated with 36% sensitivity at 98% specificity by an orthogonal immunoassay. This is the first systematic screening for novel AAbs against a large number of human proteins by protein arrays in T1D. A more comprehensive search for novel AAbs was then performed using a knowledge-based approach by ELISA and a screening-based approach against 10,000 human proteins by NAPPA. Six AAbs were identified and validated, with sensitivities ranging from 16% to 27% at 95% specificity. These two studies enriched the T1D “autoantigenome” and provided insights into T1D pathophysiology of unprecedented breadth and depth.
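As an illustration of what "sensitivity at a fixed specificity" means for an autoantibody assay, the following is a minimal sketch and not the procedure used in the studies above. It assumes each serum sample yields a single numeric signal, sets the cutoff from the control samples, and counts cases above that cutoff. The function name and the example signals are hypothetical.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Return (cutoff, sensitivity) for a one-sided, higher-is-positive assay.

    The cutoff is the smallest control value such that at least `specificity`
    of the controls score at or below it; cases strictly above it are positive.
    """
    controls = sorted(control_scores)
    # Number of controls that must be called negative (small tolerance
    # guards against floating-point round-up in the product).
    n_negative = math.ceil(specificity * len(controls) - 1e-9)
    cutoff = controls[n_negative - 1]
    positives = sum(1 for score in case_scores if score > cutoff)
    return cutoff, positives / len(case_scores)


# Hypothetical signals: 10 controls and 5 cases.
controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cases = [5, 9, 10, 12, 15]
cutoff, sensitivity = sensitivity_at_specificity(cases, controls, specificity=0.9)
```

With real data the cutoff would normally be fixed on a discovery set and the sensitivity reported on an independent validation set, as in the two-stage design described above.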
The rapid rise of T1D incidence suggests the potential involvement of environmental factors, including viral infections. Sero-reactivity to 646 viral antigens was assessed in new-onset T1D patients. The antibody-positive rate for EBV was significantly higher in cases than in controls, suggesting a potential role of EBV in T1D development. A high-density NAPPA platform was shown to profile anti-viral antibodies with high reproducibility and sensitivity.
This dissertation shows the power of a protein-array-based immunoproteomics approach to characterize the humoral immunoprofile against human and viral proteomes. The identification of novel T1D-specific AAbs and T1D-associated viruses will help to connect the nodes in T1D etiology and provide a better understanding of T1D pathophysiology.
treatments, and neo-antigens are the targets of the immune system in cancer patients who
respond to the treatments. The cancer vaccine field is focused on using neo-antigens from
unique point mutations of genomic sequence in the cancer patient for making
personalized cancer vaccines. However, we chose a different path: finding frameshift
neo-antigens at the mRNA level and developing broadly effective cancer vaccines based on
frameshift antigens.
In this dissertation, I have summarized and characterized all the potential frameshift
antigens from microsatellite regions in human, dog and mouse. A list of frameshift
antigens was validated by PCR in tumor samples and the mutation rate was calculated for
one candidate – SEC62. I developed a method to screen the antibody response against
frameshift antigens in human and dog cancer patients by using frameshift peptide arrays.
Frameshift antigens selected by positive antibody response in cancer patients or by MHC
predictions show protection in different mouse tumor models. A dog version of the
cancer vaccine based on frameshift antigens was developed and tested in a small safety
trial. The results demonstrate that the vaccine is safe and it can induce strong B and T cell
immune responses. Further, I built the human exon junction frameshift database, which
includes all possible frameshift antigens from mis-splicing events in exon junctions, and I
developed a method to find potential frameshift antigens from large cancer
immunosignature datasets with these databases. In addition, I tested the idea of ‘early cancer
diagnosis, early treatment’ in a transgenic mouse cancer model. The results show that
early treatment gives significantly better protection than late treatment and the correct
time point for treatment is crucial to give the best clinical benefit. A model for early
treatment is developed with these results.
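To make the notion of a frameshift antigen concrete, here is a minimal sketch of how a single-base deletion inside a homopolymer run changes the downstream peptide. It is an illustration only, not the pipeline used in this dissertation; the sequence and the helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA coding sequence from position 0 up to the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def delete_base(seq, pos):
    """Return seq with the base at 0-based index pos removed."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical coding sequence with a run of six A's after the start codon.
wild_type = "ATGAAAAAAGCTGGTTAA"
shifted = delete_base(wild_type, 3)  # lose one A from the run

wild_type_peptide = translate(wild_type)
frameshift_peptide = translate(shifted)
```

Here the residues downstream of the deletion differ between the two peptides, which is why such sequences can look foreign to the immune system.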
Frameshift neo-antigens from microsatellite regions and mis-splicing events are
abundant at the mRNA level, and they are better antigens than neo-antigens from point
mutations in the genomic sequences of cancer patients in terms of high immunogenicity,
low probability of causing autoimmune disease, and low cost of developing a broadly effective
vaccine. This dissertation demonstrates the feasibility of using frameshift antigens for
cancer vaccine development.
Agassiz’s desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid conservation efforts for preserving the genetic diversity of this species, we generated a whole genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation reveals 20,172 protein-coding genes in the G. agassizii assembly, and that gene structure is more similar to chicken than other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation is the first such resource for any tortoise, and will serve as a foundation for future analysis of the genetic basis of adaptations to the desert environment, allow for investigation into genomic factors affecting tortoise health, disease and longevity, and serve as a valuable resource for additional studies in this species complex.
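For readers unfamiliar with the scaffold N50 statistic quoted above, the sketch below shows the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the toy lengths are illustrative and are not taken from the G. agassizii assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total length 54; 10 + 9 + 8 = 27 reaches half, so N50 is 8.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```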
Data Availability: All genomic and transcriptomic sequence files are available from the NIH-NCBI BioProject database (accession numbers PRJNA352725, PRJNA352726, and PRJNA281763). All genome assembly, transcriptome assembly, predicted protein, transcript, genome annotation, repeatmasker, phylogenetic trees, .vcf and GO enrichment files are available on Harvard Dataverse (doi:10.7910/DVN/EH2S9K).
This can make mutation detection difficult; and while increasing sequencing depth
can often help, sequence-specific errors and other non-random biases cannot be detected
by increased depth. The problem of accurate genotyping is exacerbated when
there is not a reference genome or other auxiliary information available.
I explore several methods for sensitively detecting mutations in non-model organisms
using an example Eucalyptus melliodora individual. I use the structure of
the tree to find bounds on its somatic mutation rate and evaluate several algorithms
for variant calling. I find that conventional methods are suitable if the genome of a
close relative can be adapted to the study organism. However, with structured data,
a likelihood framework that is aware of this structure is more accurate. I use the
techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer-based base quality score recalibrator
(KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing
data. Base quality scores can help detect errors in sequencing reads, but are often
inaccurate. The most popular method for correcting this issue requires a known
set of variant sites, which is unavailable in most cases. I simulate data and show
that errors in this set of variant sites can cause calibration errors. I then show that
KBBQ accurately recalibrates base quality scores while requiring no reference or other
information and performs as well as other methods.
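As background for what "calibrated" means here, a base quality score Q encodes a claimed error probability p through the Phred relation Q = -10 log10(p). The sketch below compares reported scores with empirically observed error rates. It is not the KBBQ algorithm; it assumes that whether each base is an error is already known (for example from simulated data), and the function names and the pseudocount are illustrative choices.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def calibration_table(observations):
    """Map each reported quality to the empirical quality actually observed.

    observations: iterable of (reported_q, is_error) pairs, one per base.
    A pseudocount of one error and one non-error per bin avoids log10(0).
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for reported_q, is_error in observations:
        counts[reported_q][0] += int(is_error)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (errors, total) in counts.items():
        rate = (errors + 1) / (total + 2)
        table[reported_q] = -10 * math.log10(rate)
    return table


# Hypothetical bin: 98 bases reported at Q30, of which 9 are actually errors.
observations = [(30, True)] * 9 + [(30, False)] * 89
table = calibration_table(observations)
```

In this toy bin the reported Q30 corresponds to an empirical quality of about Q10, so the scores are overconfident; a well-calibrated set of scores would have reported and empirical qualities close to each other.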
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration
on the quality of output variant calls and show that improved base quality score
calibration increases the sensitivity and reduces the false positive rate of a variant
calling algorithm.