ASU Regents' Professors Open Access Works
The title “Regents’ Professor” is the highest faculty honor awarded at Arizona State University. It is conferred on ASU faculty who have made pioneering contributions in their areas of expertise, who have achieved a sustained level of distinction, and who enjoy national and international recognition for these accomplishments. This collection contains primarily open access works by ASU Regents' Professors.
Filtering by
- Creators: Center for Evolution and Medicine
Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response.
Results:
We found that the available data on the link between drug response and nsSNV is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly.
Conclusions:
The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
Methods: The traditional methodology (Forced-Stare [FS]) measures TFBUT and IBI separately. TFBUT is measured under forced-stare conditions by an examiner using a stopwatch, while IBI is measured as the subject watches television. The new methodology (video capture manual analysis [VCMA]) involves retrospective analysis of video data of fluorescein-stained eyes taken through a slit lamp while the subject watches television, and provides TFBUT and BUA for each IBI during the 1-minute video under natural blink conditions. The FS and VCMA methods were directly compared in the same set of dry-eye subjects. The VCMA method was evaluated for the ability to discriminate between dry-eye subjects and normal subjects. The VCMA method was further evaluated in the dry eye subjects for the ability to detect a treatment effect before, and 10 minutes after, bilateral instillation of an artificial tear solution.
Results: Ten normal subjects and 17 dry-eye subjects were studied. In the dry-eye subjects, the two methods differed with respect to mean TFBUTs (5.82 seconds, FS; 3.98 seconds, VCMA; P = 0.002). The FS variables alone (TFBUT, IBI) were not able to successfully distinguish between the dry-eye and normal subjects, whereas the additional VCMA variables, both derived and observed (BUA, BUA/IBI, breakup rate), were able to successfully distinguish between the dry-eye and normal subjects in a statistically significant fashion. TFBUT (P = 0.034) and BUA/IBI (P = 0.001) were able to distinguish the treatment effect of artificial tears in dry-eye subjects.
Conclusion: The VCMA methodology provides a clinically relevant analysis of tear film stability measured in the context of a natural blink pattern.
Methods: Thirty-three dry eye subjects completed a single-center, single-visit, pilot CAE study. The primary endpoint was mean break-up area (MBA) as assessed by the OPI 2.0 system. Secondary endpoints included corneal fluorescein staining, tear film break-up time, and OPI 2.0 system measurements. Subjects were also asked to rate their ocular discomfort throughout the CAE. Dry eye endpoints were measured at baseline, immediately following a 90-minute CAE exposure, and again 30 minutes after exposure.
Results: The post-CAE measurements of MBA showed a statistically significant decrease from the baseline measurements. The decrease was relatively specific to those patients with moderate to severe dry eye, as measured by baseline MBA. Secondary endpoints including palpebral fissure size, corneal staining, and redness, also showed significant changes when pre- and post-CAE measurements were compared. A correlation analysis identified specific associations between MBA, blink rate, and palpebral fissure size. Comparison of MBA responses allowed us to identify subpopulations of subjects who exhibited different compensatory mechanisms in response to CAE challenge. Of note, none of the measures of tear film break-up time showed statistically significant changes or correlations in pre-, versus post-CAE measures.
Conclusion: This pilot study confirms that the tear film metric MBA can detect changes in the ocular surface induced by a CAE, and that these changes are correlated with other, established measures of dry eye disease. The observed decrease in MBA following CAE exposure demonstrates that compensatory mechanisms are initiated during the CAE exposure, and that this compensation may provide the means to identify and characterize clinically relevant subpopulations of dry eye patients.
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
“Stoichioproteomics” relates the elemental composition of proteins and proteomes to variation in the physiological and ecological environment. To help harness and explore the wealth of hypotheses made possible under this framework, we introduce GRASP (http://www.graspdb.net), a public bioinformatic knowledgebase containing information on the frequencies of 20 amino acids and atomic composition of their side chains. GRASP integrates comparative protein composition data with annotation data from multiple public databases. Currently, GRASP includes information on proteins of 12 sequenced Drosophila (fruit fly) proteomes, which will be expanded to include increasingly diverse organisms over time. In this paper we illustrate the potential of GRASP for testing stoichioproteomic hypotheses by conducting an exploratory investigation into the composition of 12 Drosophila proteomes, testing the prediction that protein atomic content is associated with species ecology and with protein expression levels.
Results
Elements varied predictably along multivariate axes. Species were broadly similar, with the D. willistoni proteome a clear outlier. As expected, individual protein atomic content within proteomes was influenced by protein function and amino acid biochemistry. Evolution in elemental composition across the phylogeny followed less predictable patterns, but was associated with broad ecological variation in diet. Using expression data available for D. melanogaster, we found evidence consistent with selection for efficient usage of elements within the proteome: as expected, nitrogen content was reduced in highly expressed proteins in most tissues, most strongly in the gut, where nutrients are assimilated, and least strongly in the germline.
Conclusions
The patterns identified here using GRASP provide a foundation on which to base future research into the evolution of atomic composition in Drosophila and other taxa.
Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
Conclusion
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and image-level scheme leads to consistent performance improvement in keywords annotation.
Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval
Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords.
Results
In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes.
Conclusions
We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features lead to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yields better results.
Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions.
Results
We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/.
Conclusions
Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods.