Matching Items (15)

Evolution-informed modeling improves outcome prediction for cancers

Description

Despite wide applications of high-throughput biotechnologies in cancer research, many biomarkers discovered by exploring large-scale omics data do not provide satisfactory performance when used to predict cancer treatment outcomes. This problem is partly due to the overlooking of functional implications of molecular markers. Here, we present a novel computational method that uses evolutionary conservation as prior knowledge to discover bona fide biomarkers. Evolutionary selection at the molecular level is nature's test on functional consequences of genetic elements. By prioritizing genes that show significant statistical association and high functional impact, our new method reduces the chances of including spurious markers in the predictive model. When applied to predicting therapeutic responses for patients with acute myeloid leukemia and to predicting metastasis for patients with prostate cancers, the new method gave rise to evolution-informed models that enjoyed low complexity and high accuracy. The identified genetic markers also have significant implications in tumor progression and embrace potential drug targets. Because evolutionary conservation can be estimated as a gene-specific, position-specific, or allele-specific parameter on the nucleotide level and on the protein level, this new method can be extended to apply to miscellaneous “omics” data to accelerate biomarker discoveries.

Contributors

Agent

Created

Date Created
  • 2016-10-21

Evolutionary and Topological Properties of Genes and Community Structures in Human Gene Regulatory Networks

Description

The diverse, specialized genes present in today’s lifeforms evolved from a common core of ancient, elementary genes. However, these genes did not evolve individually: gene expression is controlled by a complex network of interactions, and alterations in one gene may drive reciprocal changes in its proteins’ binding partners. Like many complex networks, these gene regulatory networks (GRNs) are composed of communities, or clusters of genes with relatively high connectivity. A deep understanding of the relationship between the evolutionary history of single genes and the topological properties of the underlying GRN is integral to evolutionary genetics. Here, we show that the topological properties of an acute myeloid leukemia GRN and a general human GRN are strongly coupled with its genes’ evolutionary properties. Slowly evolving (“cold”), old genes tend to interact with each other, as do rapidly evolving (“hot”), young genes. This naturally causes genes to segregate into community structures with relatively homogeneous evolutionary histories. We argue that gene duplication placed old, cold genes and communities at the center of the networks, and young, hot genes and communities at the periphery. We demonstrate this with single-node centrality measures and two new measures of efficiency, the set efficiency and the interset efficiency. We conclude that these methods for studying the relationships between a GRN’s community structures and its genes’ evolutionary properties provide new perspectives for understanding evolutionary genetics.

Contributors

Agent

Created

Date Created
  • 2016-06-30

Evolutionary Diagnosis of Non-Synonymous Variants Involved in Differential Drug Response

Description

Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response.

Results:
We found that the available data on the link between drug response and nsSNV is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly.

Conclusions:
The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
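The balanced accuracy reported for Drug-EvoD can be checked from the stated TPR and TNR. The sketch below assumes the standard definition of balanced accuracy (the mean of the true positive rate and the true negative rate); the function name is illustrative and is not taken from the Drug-EvoD software.

```python
def balanced_accuracy(tpr: float, tnr: float) -> float:
    """Balanced accuracy under the standard definition: mean of TPR and TNR."""
    return (tpr + tnr) / 2.0


# Values quoted in the abstract above (TPR of 84%, TNR of 53%).
reported_tpr = 0.84
reported_tnr = 0.53

ba = balanced_accuracy(reported_tpr, reported_tnr)
print(f"balanced accuracy = {ba:.3f}")
```

The result is about 0.685, which is consistent with the reported 69% after rounding.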

Contributors

Agent

Created

Date Created
  • 2015-01-15

Expansion and Application of Pathways of Topological Rank Analysis (PoTRA) to Various Cancers

Description

Cancer is the second leading cause of death in the United States. Cancer is a serious, complex disease which causes cells to grow uncontrollably, causing millions of deaths per year [1]. Cancer is usually caused by a combination of environmental variables and biological pathways. The pathways normally have a very robust structure, but are altered by cancer, resulting in a loss of connectivity between pathways. In order to detect these pathways, a PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) was created, which measures the relative rankings of the genes in each pathway. Applying this algorithm allows us to determine which pathways differ significantly between areas with cancer and areas without cancer. This would allow scientists to focus on specific pathways in order to learn more about the cancer and find more effective ways to treat it. So far, analysis using PoTRA has been successfully conducted on hepatocellular carcinoma (HCC) and its subtypes, with all significant pathways found being cancer-associated. Now, using the TCGA data stored in Google Cloud's BigQuery, we created a pipeline to apply PoTRA to other cancer data sets and see how well it carries over to other cancers. The results show that even though some modifications may be needed to adapt to other datasets, many significant pathways were found for both HCC and breast cancer.
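To illustrate the PageRank idea that PoTRA builds on, the sketch below ranks genes in a toy pathway graph by plain power iteration. It is not the PoTRA implementation: the gene names, edges, damping factor, and iteration count are all assumptions made for illustration, and PoTRA's statistical comparison between conditions is not shown.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on a directed graph.

    nodes: list of node names; edges: list of (source, target) pairs.
    Nodes without outgoing edges spread their rank uniformly over all nodes.
    """
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = out_links[src] or nodes  # dangling node: uniform spread
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical four-gene pathway under two conditions (toy data only).
genes = ["A", "B", "C", "D"]
normal_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")]
altered_edges = [("A", "B"), ("B", "A"), ("D", "C")]  # C loses its link to A

normal_rank = pagerank(genes, normal_edges)
altered_rank = pagerank(genes, altered_edges)

for gene in genes:
    print(gene, round(normal_rank[gene], 3), round(altered_rank[gene], 3))
```

Comparing a gene's rank between the two graphs gives a rough sense of how a change in connectivity shifts its relative importance within the pathway.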

Contributors

Agent

Created

Date Created
  • 2018-05

Analysis of HIV Risk Groups Using Bayesian Analysis

Description

Phylogenetic analyses conducted in the past that used clustering lacked the functionality to inform and implement useful public health decisions. Models can be constructed to conduct further analyses that yield meaningful data for future use in public health informatics. A phylogenetic tree is considered one of the best ways for researchers to visualize and analyze the evolutionary history of a given virus. The focus of this study was to research HIV phylodynamic and phylogenetic methods. This involved identifying fast-growing HIV transmission clusters and rates for certain risk groups in the US. Achieving these results required an HIV database to retrieve real-time data, alignment software for multiple sequence alignment, Bayesian analysis software for the development and manipulation of models, and graphical tools for visualizing the output of the models created. The study began with a literature review on HIV phylogeography and phylodynamics. Sequence data were then obtained from a sequence database and run through multiple sequence alignment software, which was required because the sequences obtained were unaligned. Once the alignment was performed, the aligned file was loaded into Bayesian analysis software to build a model of a phylogenetic tree. Once the model was created, the tree was edited in tree visualization software so that the user could interpret it easily. The output of the resulting tree is attributed to distant homology or the mixing of certain parameters. As a continuation of this study, it would be interesting to use the same aligned sequences with different model parameter selections for the initial creation of the model to see how the output changes, because one small change in a model parameter could greatly affect the resulting phylogenetic tree.

Contributors

Agent

Created

Date Created
  • 2018-05

A Retrospective Investigation to Assess the Potential Application of Predictive Machine Learning Algorithms in Oncology Clinical Trials

Description

The purpose of this investigation is to apply a machine learning algorithm to de-identified, historic oncology clinical trial data to assess the theoretical understanding of predictive modeling and derive potential clinical practice recommendations. Within this study, electronic medical records from the HonorHealth Virginia G. Piper Institute will undergo data visualization to identify potential correlations and trends critical for model creation, as well as to identify potential expansions or limitations of scope regarding model purpose. The hypothesis pursued after data visualization was the development of a predictive model for 6-month survival. The current standard is physician estimation, with an estimated accuracy of 56.5% at 6 months out. This study created supervised learning models using decision trees, KNN, SVM, and ensemble methods, using combinations of LASSO Logistic Regression and Know-GRRF Random Forest for feature selection. An SVM trained on a combined set of LASSO and Know-GRRF features produced the highest performing model at 75.5% with an AUC of 0.82. This study demonstrates the potential for applying predictive modeling to readily available EMR records to drive clinical practice recommendations. The models developed could potentially, with further development, be used as an ancillary tool for jumpstarting patient-physician conversations on survival and life expectancy.

Contributors

Agent

Created

Date Created
  • 2019-05

FlyExpress 7: An Integrated Discovery Platform To Study Coexpressed Genes Using in Situ Hybridization Images in Drosophila

Description

Gene expression patterns assayed across development can offer key clues about a gene’s function and regulatory role. Drosophila melanogaster is ideal for such investigations as multiple individual and high-throughput efforts have captured the spatiotemporal patterns of thousands of embryonic expressed genes in the form of in situ images. FlyExpress (www.flyexpress.net), a knowledgebase based on a massive and unique digital library of standardized images and a simple search engine to find coexpressed genes, was created to facilitate the analytical and visual mining of these patterns. Here, we introduce the next generation of FlyExpress resources to facilitate the integrative analysis of sequence data and spatiotemporal patterns of expression from images. FlyExpress 7 now includes over 100,000 standardized in situ images and implements a more efficient, user-defined search algorithm to identify coexpressed genes via Genomewide Expression Maps (GEMs). Shared motifs found in the upstream 5′ regions of any pair of coexpressed genes can be visualized in an interactive dotplot. Additional webtools and link-outs to assist in the downstream validation of candidate motifs are also provided. Together, FlyExpress 7 represents our largest effort yet to accelerate discovery via the development and dispersal of new webtools that allow researchers to perform data-driven analyses of coexpression (image) and genomic (sequence) data.

Contributors

Created

Date Created
  • 2017-06-30

Fine Mapping Functional Noncoding Genetic Elements Via Machine Learning

Description

All biological processes, such as cell growth, cell differentiation, development, and aging, require a series of steps that are characterized by gene regulation. Studies have shown that gene regulation is the key to various traits and diseases. Various factors affect gene regulation, including genetic signals, epigenetic tracks, genetic variants, etc. Deciphering and cataloging these functional genetic elements in the non-coding regions of the genome is one of the biggest challenges in precision medicine and genetic research. This thesis presents two different approaches to identifying these elements: TreeMap and DeepCORE. The first approach involves identifying putative causal genetic variants in cis-eQTL, accounting for multisite effects and genetic linkage at a locus. TreeMap performs an organized search for individual and multiple causal variants using a tree-guided nested machine learning method. DeepCORE, on the other hand, explores novel deep learning techniques that model the relationship between genetic, epigenetic, and transcriptional patterns across tissues and cell lines and identify co-operative regulatory elements that affect gene regulation. These two methods are believed to be the link for genotype-phenotype association and a necessary step toward explaining various complex diseases and missing heritability.

Contributors

Agent

Created

Date Created
  • 2020

Biomarkers of Familial Speech Sound Disorders: Genes, Perception, and Motor Control

Description

Speech sound disorders (SSDs) are the most prevalent type of communication disorder in children. Clinically, speech-language pathologists (SLPs) rely on behavioral methods for assessing and treating SSDs. Though clients typically experience improved speech outcomes as a result of therapy, there is evidence that underlying deficits may persist even in individuals who have completed treatment for surface-level speech behaviors. Advances in the field of genetics have created the opportunity to investigate the contribution of genes to human communication. Due to the heterogeneity of many communication disorders, the manner in which specific genetic changes influence neural mechanisms, and thereby behavioral phenotypes, remains largely unknown. The purpose of this study was to identify genotype-phenotype associations, along with perceptual and motor-related biomarkers, within families displaying SSDs. Five parent-child trios participated in genetic testing, and five families participated in a combination of genetic and behavioral testing to help elucidate biomarkers related to SSDs. All of the affected individuals had a history of childhood apraxia of speech (CAS) except for one family that displayed a phonological disorder. Genetic investigation yielded several genes of interest relevant for an SSD phenotype: CNTNAP2, CYFIP1, GPR56, HERC1, KIAA0556, LAMA5, LAMB1, MDGA2, MECP2, NBEA, SHANK3, TENM3, and ZNF142. All of these genes showed at least some expression in the developing brain. Gene ontology analysis yielded terms supporting a genetic influence on central nervous system development. Behavioral testing revealed evidence of a sequential processing biomarker for all individuals with CAS, with many showing deficits in sequential motor skills in addition to speech deficits. In some families, participants also showed evidence of a co-occurring perceptual processing biomarker. The family displaying a phonological phenotype showed milder sequential processing deficits compared to CAS families. Overall, this study supports the presence of a sequential processing biomarker for CAS and shows that relevant genes of interest may be influencing a CAS phenotype via sequential processing. Knowledge of these biomarkers can help strengthen precision of clinical assessment and motivate development of novel interventions for individuals with SSDs.

Contributors

Agent

Created

Date Created
  • 2020

Discovering subclones and their driver genes in tumors sequenced at standard depths

Description

Understanding intratumor heterogeneity and its driver genes is critical to designing personalized treatments and improving clinical outcomes of cancers. Such investigations require accurate delineation of the subclonal composition of a tumor, which to date can only be reliably inferred from deep-sequencing data (>300x depth). The algorithm resulting from the work presented here incorporates an adaptive error model into statistical decomposition of mixed populations, which corrects the mean-variance dependency of sequencing data at the subclonal level and enables accurate subclonal discovery in tumors sequenced at standard depths (30-50x). Tested on extensive computer simulations and real-world data, this new method, named model-based adaptive grouping of subclones (MAGOS), consistently outperforms existing methods on minimum sequencing depth, decomposition accuracy, and computational efficiency. MAGOS supports subclone analysis using single nucleotide variants and copy number variants from one or more samples of an individual tumor. The GUST algorithm, on the other hand, is a novel method for detecting cancer-type-specific driver genes. Combining MAGOS and GUST results can provide insights into cancer progression. Applications of MAGOS and GUST to whole-exome sequencing data from samples of 33 different cancer types discovered a significant association between subclonal diversity and its drivers and patient overall survival.

Contributors

Agent

Created

Date Created
  • 2019