Background
I created SimPlot to learn more about HIV-1 intersubtype recombination analysis after I encountered a mosaic HIV-1 genome while analyzing some clones from India (see the Lole et al. reference on the first page of this Help file). A program for this sort of analysis is available at the Los Alamos National Lab's Human Retroviruses and AIDS Database Web site (http://hiv-web.lanl.gov). It is called the Recombination Identification Program (RIP), and the direct link is: http://linker.lanl.gov/RIP/RIPsubmit.html. It has very nice online documentation and is also described in Siepel AC and Korber BT, Scanning the Database for Recombinant HIV-1 Genomes, in the Human Retroviruses and AIDS Compendium, 1995 (available from the Los Alamos site as an Adobe Acrobat file).
I wanted to do some customization, so here is SimPlot. While the output from SimPlot bears a passing resemblance to that of RIP, I have spent very little time with RIP; I wrote SimPlot without using any prior code for similarity plots. RIP does some things that SimPlot does not, like the "informative mode", which limits the comparison to sites that contain at least 1 mismatch among the reference sequences. If you find SimPlot useful and want that feature I may add it. Similarity plots are only a screening tool, and as such SimPlot is pretty utilitarian.
SimPlot allows identification of a Query, generally the sequence you suspect is mosaic; the rest of the sequences are Reference sequences (or can be ignored; see the Select Function). The graph that is generated is a set of lines (or, optionally, strings of points) that reflect the similarity (or distance) of each Reference sequence (or group) to the Query. To generate this plot, a sliding window is passed across the alignment in small steps (the window size and step size are selectable).
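The sliding-window idea can be sketched in a few lines of Python. This is only an illustration, not SimPlot's own code: the function names, the toy sequences, the default window and step values, the use of plain percent identity as the similarity measure, and the choice to skip gap columns are all assumptions made for the example.

```python
def percent_similarity(a, b):
    """Percent identity between two aligned strings of equal length.

    Assumption for this sketch: columns where either sequence has a gap
    ('-') are skipped rather than counted as mismatches.
    """
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    # A window made up entirely of gap columns has no defined similarity.
    return 100.0 * matches / compared if compared else float("nan")


def similarity_plot(query, references, window=200, step=20):
    """Slide a window across the alignment and score each reference.

    query      -- aligned Query sequence (string)
    references -- dict mapping reference name -> aligned sequence
    Returns a dict mapping reference name -> list of
    (window midpoint, percent similarity to the Query) points.
    """
    curves = {name: [] for name in references}
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        midpoint = start + window // 2
        for name, ref in references.items():
            score = percent_similarity(query[start:end], ref[start:end])
            curves[name].append((midpoint, score))
    return curves


# Toy alignment: the Query matches reference A in its first half
# and reference B in its second half, as a mosaic sequence might.
query = "AAAAAAAACCCCCCCC"
references = {
    "A": "AAAAAAAAGGGGGGGG",
    "B": "TTTTTTTTCCCCCCCC",
}
curves = similarity_plot(query, references, window=8, step=4)
for name, points in curves.items():
    print(name, points)
```

In the toy alignment, the curve for reference A falls as the window moves along the sequence while the curve for reference B rises; a crossover of this kind is the pattern that suggests a possible recombination breakpoint and warrants closer analysis.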
Bootscanning was added in version 2.5, and this documentation will not do justice to this analytical approach. I recommend Mika Salminen, et al. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res Hum Retroviruses. 1995 Nov;11(11):1423-5.
The informative sites analysis was initially based on Robertson DL, Hahn BH, and Sharp PM. Recombination in AIDS viruses. J Mol Evol 1995;40(3):249-59. Interactive analysis of informative sites is arguably the most novel aspect of SimPlot.
For HIV, I recommend that you use the reference alignments available from the Los Alamos WWW site for your reference sequences. Since SimPlot can now create on-the-fly consensus sequences, all you need is the full alignment. You may also want the majority consensus alignment for faster alignment of query sequences to the reference sequences.