The deluge of next-generation sequencing data has shifted the bottleneck of cancer research from collecting multiple “-omics” data sets to integrative analysis and data interpretation. In this dissertation, I attempt to address two distinct but interdependent challenges. The first is to design specific computational algorithms and tools that can process raw data and extract useful information from it in an efficient, robust, and reproducible manner. The second is to develop high-level computational methods and data frameworks for integrating and interpreting these data. Specifically, Chapter 2 presents a tool called Snipea (SNv Integration, Prioritization, Ensemble, and Annotation) that further identifies, prioritizes, and annotates somatic single nucleotide variants (SNVs) called by multiple variant callers. Chapter 3 describes a novel alignment-based algorithm that accurately and losslessly classifies sequencing reads from xenograft models. Chapter 4 describes a direct, biologically motivated framework and associated methods for identifying putative aberrations that cause survival differences in GBM patients by integrating whole-genome sequencing, exome sequencing, RNA-sequencing, methylation array, and clinical data. Lastly, Chapter 5 explores longitudinal and intratumor heterogeneity studies to reveal the temporal and spatial context of tumor evolution. The long-term goal is to help patients with cancer, particularly those who are in front of us today. Genome-based analysis of a patient's tumor can identify genomic alterations unique to that tumor that are candidate therapeutic targets for decreasing therapy resistance and improving clinical outcome.
Early detection of disease is essential for alleviating disease burden, increasing success rates, and decreasing mortality, especially for cancer. To improve disease diagnostics, many candidate biomarkers have been suggested over the past decade using molecular biology or image analysis techniques. The receiver operating characteristic (ROC) curve is a standard technique for evaluating the diagnostic accuracy of biomarkers, but it has some limitations, especially for heterogeneous diseases. As an alternative to ROC curve analysis, we suggest a jittered dot plot (JDP) and two JDP-based evaluation measures: above mean difference (AMD) and averaged above mean difference (AAMD). We demonstrate how JDP, together with AMD or AAMD, evaluates biomarkers better than the standard ROC curve. We apply these methods to real, heterogeneous basal-like breast cancer data.
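As a minimal sketch of the ROC summary that the abstract above treats as the baseline, the area under the ROC curve (AUC) can be computed as the probability that a randomly chosen case scores above a randomly chosen control. The function name `roc_auc` and the toy scores are illustrative assumptions, not data or code from the study. AMD and AAMD are not defined in this abstract, so they are not sketched here.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher than the control; ties count as half a win."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical heterogeneous biomarker: only one of four cases is
# strongly elevated, the rest overlap with the controls.
cases = [9.0, 1.0, 2.0, 3.0]
controls = [1.5, 2.5, 3.5, 4.0]
print(roc_auc(cases, controls))
```

In this toy setting the single strongly elevated case is averaged together with the overlapping cases, which illustrates the kind of limitation for heterogeneous diseases that the abstract mentions.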
Throughout the long history of virus-host co-evolution, viruses have developed delicate strategies to facilitate invasion and replication of their genomes while silencing host immune responses through various mechanisms. Systematic characterization of viral protein-host interactions would yield invaluable information for understanding viral invasion and evasion, for the diagnosis and therapeutic treatment of viral infections, and for understanding mechanisms of host biology. Although more than 2,000 viral genomes have been sequenced, only a small percentage of them are well investigated. Access to these viral open reading frames (ORFs) in a flexible cloning format would greatly facilitate both in vitro and in vivo virus-host interaction studies. However, overall progress in viral ORF cloning has been slow. To facilitate viral studies, we are releasing the initial set of our panviral proteome collection: 2,035 ORF clones from 830 viral genes in the Gateway® recombinational cloning system. Here, we demonstrate several uses of our viral collection, including highly efficient in vitro production of viral proteins using a human cell-free expression system, global identification of host targets for rubella virus using Nucleic Acid Programmable Protein Arrays (NAPPA) containing 10,000 unique human proteins, and detection of host serological responses using microfluidic multiplexed immunoassays. The studies presented here begin to elucidate host-viral protein interactions through our systematic use of viral ORFs, high-throughput cloning, and proteomic technologies. These valuable plasmid resources will be available to the research community to enable continued viral functional studies.
Sera from patients with ovarian cancer contain autoantibodies (AAb) to tumor-derived proteins that are potential biomarkers for early detection. To detect AAb, we probed high-density programmable protein microarrays (NAPPA) expressing 5177 candidate tumor antigens with sera from patients with serous ovarian cancer (n = 34 cases/30 controls) and measured bound IgG. Of these, 741 antigens were selected and probed with an independent set of ovarian cancer sera (n = 60 cases/60 controls). Twelve potential autoantigens were identified, with sensitivities ranging from 13 to 22% at >93% specificity. These were retested on a Luminex bead array with 60 cases and 60 controls, giving sensitivities ranging from 0 to 31.7% at 95% specificity. Three AAb (p53, PTPRA, and PTGFR) had area under the curve (AUC) levels >60% (p < 0.01), with the partial AUC (SPAUC) over 5 times greater than for a nondiscriminating test (p < 0.01). For a panel of these top three AAb, requiring at least two to be positive gave a sensitivity of 23.3% at 98.3% specificity. AAb to at least one of the top three antigens were also detected in 7/20 sera (35%) from patients with low CA 125 levels and in 0/15 controls. AAb to p53, PTPRA, and PTGFR are potential biomarkers for the early detection of ovarian cancer.
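The two reporting conventions used in this abstract, sensitivity at a fixed specificity and a panel that is positive when at least two of three markers are positive, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the cutoff rule (a cutoff taken from the sorted control values), and the toy numbers are hypothetical and are not the study's actual procedure or data.

```python
import math


def cutoff_at_specificity(control_values, specificity):
    """Return an observed control value such that at least `specificity`
    of the controls fall at or below it (and are called negative)."""
    ordered = sorted(control_values)
    # Number of controls that must be called negative; the small
    # tolerance guards against floating-point error in the product.
    needed = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[needed - 1]


def sensitivity(case_values, cutoff):
    """Fraction of cases whose signal is strictly above the cutoff."""
    return sum(value > cutoff for value in case_values) / len(case_values)


def panel_positive(marker_flags, min_positive=2):
    """A sample is panel-positive if at least `min_positive` markers are positive."""
    return sum(marker_flags) >= min_positive


# Hypothetical signal intensities for a single marker.
controls = [1, 2, 3, 4, 5]
cases = [3, 5, 6, 7]
cutoff = cutoff_at_specificity(controls, 0.8)
print(cutoff, sensitivity(cases, cutoff))

# Hypothetical per-marker calls for one serum sample (three markers).
print(panel_positive([True, False, True]))
```

Requiring two positive markers trades sensitivity for specificity, which is consistent with the panel result reported above (lower sensitivity than the best single marker, but higher specificity).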
Background: Tissue-specific RNA plasticity broadly impacts the development, tissue identity and adaptability of all organisms, but changes in its composition and expression levels, and their impact on gene regulation in different somatic tissues, are largely unknown. Here we developed a new method, polyA-tagging and sequencing (PAT-Seq), to isolate high-quality tissue-specific mRNA from Caenorhabditis elegans intestine, pharynx and body muscle tissues and to study changes in their tissue-specific transcriptomes and 3’UTRomes.
Results: We have identified thousands of novel genes and isoforms differentially expressed between these three tissues. The intestine transcriptome is expansive, expressing over 30% of C. elegans mRNAs, while muscle transcriptomes are smaller but contain characteristic unique gene signatures. Active promoter regions in all three tissues reveal both known and novel enriched tissue-specific elements, along with putative transcription factors, suggesting novel tissue-specific modes of transcription initiation. We have precisely mapped approximately 20,000 tissue-specific polyadenylation sites and discovered that about 30% of transcripts in somatic cells use alternative polyadenylation in a tissue-specific manner, with their 3’UTR isoforms significantly enriched with microRNA targets.
Conclusions: For the first time, PAT-Seq allowed us to directly study tissue-specific gene expression changes in an in vivo setting and to compare these changes between three somatic tissues from the same organism at single-base resolution within the same experiment. We pinpoint precise tissue-specific transcriptome rearrangements and, for the first time, link tissue-specific alternative polyadenylation to miRNA regulation, suggesting novel and unexplored tissue-specific post-transcriptional regulatory networks in somatic cells.