Design, analytics and quality assurance for emerging personalized clinical diagnostics based on next-gen sequencing
Major advancements in biology and medicine have been realized during recent decades, including massively parallel sequencing, which allows researchers to collect millions or billions of short reads from a DNA or RNA sample. This capability opens the door to a renaissance in personalized medicine if effectively deployed. Three projects that address major and necessary advancements in massively parallel sequencing are included in this dissertation. The first study involves a pair of algorithms to verify patient identity based on single nucleotide polymorphisms (SNPs). In brief, we developed a method that allows de novo construction of sample relationships, e.g., which ones are from the same individuals and which are from different individuals. We also developed a method to confirm the hypothesis that a tumor came from a known individual. The second study derives an algorithm to multiplex multiple Polymerase Chain Reaction (PCR) reactions, while minimizing interference between reactions that compromise results. PCR is a powerful technique that amplifies pre-determined regions of DNA and is often used to selectively amplify DNA and RNA targets that are destined for sequencing. It is highly desirable to multiplex reactions to save on reagent and assay setup costs as well as equalize the effect of minor handling issues across gene targets. Our solution involves a binary integer program that minimizes events that are likely to cause interference between PCR reactions. The third study involves design and analysis methods required to analyze gene expression and copy number results against a reference range in a clinical setting for guiding patient treatments. Our goal is to determine which events are present in a given tumor specimen. These events may be mutation, DNA copy number or RNA expression. All three techniques are being used in major research and diagnostic projects for their intended purpose at the time of writing this manuscript. The SNP matching solution has been selected by The Cancer Genome Atlas to determine sample identity. Paradigm Diagnostics, Viomics and International Genomics Consortium utilize the PCR multiplexing technique to multiplex various types of PCR reactions on multi-million dollar projects. The reference range-based normalization method is used by Paradigm Diagnostics to analyze results from every patient.