Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole-genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, which we then used to estimate the phylogeny of placental mammals. Using datasets with different levels of missing data, we recovered eight phylogenies that resolved the basal relationships among mammals. The three alternative resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
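The idea of selecting variable sites under different missing-data thresholds can be illustrated with a minimal Python sketch. This is a toy example, not SISRS's own code: the function name `variable_sites`, the `max_missing` parameter, the set of missing-data symbols, and the four short sequences are all assumptions made for illustration.

```python
from typing import Dict, List

# Symbols treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def variable_sites(alignment: Dict[str, str], max_missing: int) -> List[int]:
    """Return 0-based column indices that are variable among taxa and
    have at most `max_missing` taxa with missing data."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be non-empty and of equal length")
    n_sites = lengths.pop()

    kept = []
    for i in range(n_sites):
        column = [seq[i].upper() for seq in alignment.values()]
        called = [base for base in column if base not in MISSING]
        n_missing = len(column) - len(called)
        # A site is variable if the called bases include more than one state.
        if n_missing <= max_missing and len(set(called)) > 1:
            kept.append(i)
    return kept


# Hypothetical toy alignment; the sequences are invented for illustration.
toy = {
    "human": "ACGTAC",
    "chimp": "ACGTGT",
    "gorilla": "ACCTNT",
    "orangutan": "AGCT-C",
}

strict = variable_sites(toy, max_missing=0)   # no missing data allowed
relaxed = variable_sites(toy, max_missing=2)  # up to two taxa may be missing
print(strict, relaxed)
```

Raising `max_missing` retains more sites at the cost of sparser columns, which is the trade-off explored by building datasets with different levels of missing data.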
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
The membrane proximal region (MPR, residues 649–683) and transmembrane domain (TMD, residues 684–705) of the gp41 subunit of HIV-1’s envelope protein are highly conserved and are important in viral mucosal transmission, virus attachment and membrane fusion with target cells. Several structures of the trimeric membrane proximal external region (residues 662–683) of MPR have been reported at the atomic level; however, the atomic structure of the TMD remains unknown. To elucidate the structure of both MPR and TMD, we expressed the region spanning both domains, MPR-TM (residues 649–705), in Escherichia coli as a fusion protein with maltose binding protein (MBP). MPR-TM was initially fused to the C-terminus of MBP via a 42-amino-acid linker containing a TEV protease recognition site (MBP-linker-MPR-TM).
Biophysical characterization indicated that the purified MBP-linker-MPR-TM protein was a monodisperse and stable candidate for crystallization. However, crystals of the MBP-linker-MPR-TM protein could not be obtained in extensive crystallization screens. It is possible that the 42-residue linker between MBP and MPR-TM was interfering with crystal formation. To test this hypothesis, the 42-residue linker was replaced with three alanine residues. The resulting fusion protein, MBP-AAA-MPR-TM, was similarly purified and characterized. Notably, both the MBP-linker-MPR-TM and MBP-AAA-MPR-TM proteins strongly interacted with broadly neutralizing monoclonal antibodies 2F5 and 4E10. With epitopes accessible to the broadly neutralizing antibodies, these MBP/MPR-TM recombinant proteins may be in immunologically relevant conformations that mimic a pre-hairpin intermediate of gp41.