Methods in the assessment of genotype-phenotype correlations in rare childhood disease through orthogonal multi-omics, high-throughput sequencing approaches
Rapid advancements in genomic technologies have increased our understanding of rare human disease. Generation of multiple types of biological data including genetic variation from genome or exome, expression from transcriptome, methylation patterns from epigenome, protein complexity from proteome and metabolite information from metabolome is feasible. "Omics" tools provide comprehensive view into biological mechanisms that impact disease trait and risk. In spite of available data types and ability to collect them simultaneously from patients, researchers still rely on their independent analysis. Combining information from multiple biological data can reduce missing information, increase confidence in single data findings, and provide a more complete view of genotype-phenotype correlations. Although rare disease genetics has been greatly improved by exome sequencing, a substantial portion of clinical patients remain undiagnosed. Multiple frameworks for integrative analysis of genomic and transcriptomic data are presented with focus on identifying functional genetic variations in patients with undiagnosed, rare childhood conditions. Direct quantitation of X inactivation ratio was developed from genomic and transcriptomic data using allele specific expression and segregation analysis to determine magnitude and inheritance mode of X inactivation. This approach was applied in two families revealing non-random X inactivation in female patients. Expression based analysis of X inactivation showed high correlation with standard clinical assay. These findings improved understanding of molecular mechanisms underlying X-linked disorders. In addition multivariate outlier analysis of gene and exon level data from RNA-seq using Mahalanobis distance, and its integration of distance scores with genomic data found genotype-phenotype correlations in variant prioritization process in 25 families. Mahalanobis distance scores revealed variants with large transcriptional impact in patients. In this dataset, frameshift variants were more likely result in outlier expression signatures than other types of functional variants. Integration of outlier estimates with genetic variants corroborated previously identified, presumed causal variants and highlighted new candidate in previously un-diagnosed case. Integrative genomic approaches in easily attainable tissue will facilitate the search for biomarkers that impact disease trait, uncover pharmacogenomics targets, provide novel insight into molecular underpinnings of un-characterized conditions, and help improve analytical approaches that use large datasets.