Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, which we then used to estimate the phylogeny of placental mammals. Using datasets with different levels of missing data, we recovered eight phylogenies that resolved the basal relationships among mammals. The three alternative resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
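The notions of a "variable site" and of datasets with "different levels of missing data" can be illustrated with a toy filter. This is a minimal sketch and not the SISRS implementation: the function name `variable_sites`, the `max_missing` parameter, the missing-data symbols, and the toy sequences are all assumptions made for illustration.

```python
def variable_sites(alignment, max_missing=0):
    """Return column indices that are variable among called bases and
    have at most `max_missing` taxa with missing data.

    `alignment` maps a taxon name to an aligned sequence; any character
    other than A, C, G or T is treated as missing (an assumption).
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    keep = []
    for i in range(length):
        column = [alignment[t][i].upper() for t in taxa]
        called = [b for b in column if b in "ACGT"]
        missing = len(column) - len(called)
        # A site is kept if enough taxa are called and they disagree.
        if missing <= max_missing and len(set(called)) > 1:
            keep.append(i)
    return keep


# Hypothetical toy alignment, not real ape data.
toy = {
    "human":     "ACGTACGT",
    "chimp":     "ACGTGCGA",
    "gorilla":   "ACCTNCGA",
    "orangutan": "ATCT-CGA",
}

strict = variable_sites(toy, max_missing=0)   # no missing data allowed
relaxed = variable_sites(toy, max_missing=2)  # up to two taxa may be missing
```

Relaxing the missing-data threshold admits more sites at the cost of less complete columns, which is the kind of trade-off behind building several datasets with different levels of missing data.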
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
X-ray free-electron lasers are used to measure diffraction patterns from nanocrystals in the 'diffract-before-destroy' mode, outrunning radiation damage. Because the nanocrystals are finite in size, the intensity between Bragg spots can be recovered by removing the modulating function that depends on crystal shape, i.e. the transform of the crystal shape. This shape-transform dividing-out scheme for solving the phase problem has been tested using simulated examples with cubic crystals. It provides a phasing method that does not require atomic-resolution data, chemical modification of the sample, or modelling based on the protein databases. A single unit cell commonly contains multiple structural units (e.g. molecules in symmetry-related positions), so incomplete unit cells (e.g. one additional molecule) can occur at the surface layers of crystals. In this work, the effects of such incomplete unit cells on the 'dividing-out' phasing algorithm are investigated using 2D crystals within the projection approximation. It is found that incomplete unit cells do not hinder the recovery of the scattering pattern from a single unit cell (after dividing out the shape transforms from data merged from many nanocrystals of different sizes), assuming that certain unit-cell types are preferred. The results also suggest that the dynamic range of the data is a critical issue to be resolved before the shape-transform method can be applied in practice.
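The dividing-out idea can be sketched in one dimension. The example below is a simplified illustration under stated assumptions, not the algorithm studied in the work: it uses a noise-free 1D crystal made only of complete unit cells, a hypothetical three-atom unit cell, and the standard finite-lattice shape transform sin²(πNq)/sin²(πq), with q in reciprocal-lattice units. All function names and numerical values are chosen for illustration.

```python
import cmath
import math


def unit_cell_amplitude(q, atoms):
    """Fourier transform of one 1D unit cell; atoms = [(position, weight), ...]."""
    return sum(w * cmath.exp(2j * math.pi * q * x) for x, w in atoms)


def crystal_intensity(q, atoms, n_cells):
    """Intensity from a finite crystal, computed by a direct sum over all atoms."""
    amplitude = sum(
        w * cmath.exp(2j * math.pi * q * (m + x))
        for m in range(n_cells)
        for x, w in atoms
    )
    return abs(amplitude) ** 2


def shape_transform(q, n_cells):
    """Squared modulus of the lattice sum for n_cells unit cells."""
    s = math.sin(math.pi * q)
    if abs(s) < 1e-12:  # at a Bragg position the limit is n_cells**2
        return float(n_cells ** 2)
    return (math.sin(math.pi * n_cells * q) / s) ** 2


def divide_out(q, merged, sizes):
    """Divide merged intensity by the summed shape transforms of all sizes."""
    return merged / sum(shape_transform(q, n) for n in sizes)


# Hypothetical unit cell and crystal sizes (assumptions for illustration).
atoms = [(0.10, 1.0), (0.45, 2.0), (0.80, 0.5)]
sizes = [4, 5, 6, 7]
qs = [i / 40 for i in range(81)]  # q from 0 to 2, including between Bragg spots

merged = [sum(crystal_intensity(q, atoms, n) for n in sizes) for q in qs]
recovered = [divide_out(q, i, sizes) for q, i in zip(qs, merged)]
truth = [abs(unit_cell_amplitude(q, atoms)) ** 2 for q in qs]
max_error = max(abs(r - t) for r, t in zip(recovered, truth))
```

Merging crystals of different sizes matters here because the zeros of the individual shape transforms fall at different q, so the summed denominator stays positive between Bragg positions. The denominator is nevertheless far larger at the Bragg positions than between them, which gives a sense of why dynamic range becomes a concern once noise is present.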