SimPlot version 1.3
July 31, 1998
"SimPlot", software, documentation, and SimPlot icon copyright (c) 1998 Stuart C. Ray, M.D.
SimPlot contains modifications of the ReadSeq Pascal code (ca. 1990) generously placed in the public domain by Don Gilbert at Indiana University, who is responsible for how well file formats are detected. However, I am responsible for any bugs in the format detection, as I have changed the code significantly, e.g. adding detection of PHYLIP formats.
If you share this program with others, please include the documentation files.
This program is "HelloWare". If you use it, please send me an email or letter saying so. This program is NOT in the public domain, and may not be sold by anyone other than the author, nor may it be included in any collection of software (such as a CD-ROM) for distribution without the author's written consent.
Disclaimer: This software is distributed "as is", with no warranty expressed or implied. While I have made a reasonable effort to test it, you should make sure the results make sense to you.
Before contacting me about a problem, please check the known problems section below to make sure it is not already on the list.
General features | Background | Installation | How to use it | Version history | Known problems | Contacting the author
General features
Background
I created SimPlot in order to learn more about HIV-1 intersubtype recombination analysis when I encountered a mosaic HIV-1 genome during analysis of some clones from international isolates. There is a program available for doing this sort of analysis at the Los Alamos National Lab's Human Retroviruses and AIDS Database Web site (http://hiv-web.lanl.gov.). The program is called the Recombination Identification Program (RIP), and the direct link is: http://hiv-web.lanl.gov./HTML/rip.html. It has very nice online documentation, and is also described in Siepel AC and Korber BT, Scanning the Database for Recombinant HIV-1 Genomes, in the Human Retroviruses and AIDS Compendium, 1995 (available from the Los Alamos site as an Adobe Acrobat file).
The RIP server was having problems at that time, and I wanted to do some customization, so here is SimPlot. While the output from SimPlot bears a passing resemblance to that of RIP, I have used RIP very little, and modeled SimPlot after figures in various published reports. RIP does some things that SimPlot does not, like the "informative mode", which limits the comparison to sites that contain at least one mismatch among the reference sequences. If you find SimPlot useful and want that feature I may add it. Similarity plots are only a screening tool, and as such SimPlot is pretty utilitarian.
SimPlot allows identification of one Query sequence, generally the one you suspect is mosaic, and the rest of the sequences are either Reference sequences or Hidden. The graph that is generated is a set of lines (or optionally strings of points) that reflect the similarity of each Reference sequence to the query sequence. In order to generate this plot a sliding window is passed across the alignment in small steps, with the size of the window and step selectable.
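As a rough illustration of the sliding-window idea described above (this is not SimPlot's actual code), the following Python sketch computes percent identity between a query and one reference in each window. The function name, the default window and step values, and the choice to treat a gap as an ordinary character are all assumptions made for the example.

```python
def similarity_plot(query, reference, window=200, step=20):
    """Return a list of (window midpoint, percent identity) points.

    Hypothetical helper: the window/step defaults are illustrative only,
    and gap characters are compared like any other symbol.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    # Slide the window across the alignment in fixed steps.
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window].upper()
        r = reference[start:start + window].upper()
        matches = sum(1 for a, b in zip(q, r) if a == b)
        points.append((start + window // 2, 100.0 * matches / window))
    return points
```

Calling this once per Reference sequence gives one line (or string of points) per reference, which is the general shape of the plot described above.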
The informative sites module is largely based on Robertson,D.L., Hahn,B.H., and Sharp,P.M. Recombination in AIDS viruses. J Mol Evol 1995;40(3):249-59.
I strongly recommend that you use the reference alignments available from the Los Alamos WWW site for your reference sequences. I find the threshold (e.g. 50%) consensus alignments particularly valuable. In order to use these I first align to the standard (majority consensus) alignment, then copy my query sequence(s) to the threshold consensus file.
Installation
This is a 32-bit program, so you need to be running Windows 95. It probably does fine under Windows 98 and Windows NT, but I have not tested it. It takes up less than a megabyte of hard drive space. I plan to test its memory requirements, but have not yet. It allocates memory dynamically, meaning that it uses what it needs for the data set you use. On my machine with 48 MB RAM it can handle at least 15 sequences of 9.7 kb each. Please let me know if you have any memory problems. I recommend using a screen resolution higher than 640x480 if you can, such as 800x600 or 1024x768, to avoid the need for screen scrolling. This will not result in higher resolution plots.
To install, just create a folder wherever you want it and copy SimPlot.exe there. To run it, double click the file (or a Windows shortcut made from it). You can also add it to the Start menu (for instructions, use the Start menu to access the Windows Help system, then use the Index to find "adding programs to the Start menu").
How to use SimPlot
As of version 1.3, SimPlot reads most sequence file formats. The format is automatically detected using code based on Don Gilbert's ReadSeq code (please see first page of this file). First, prepare your sequences by aligning them, and save them in a standard format such as FASTA/Pearson format. SimPlot can use no more than 10 sequences, but the alignment can contain more - you will be prompted to select the sequences you want to analyze.
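If you want to check an alignment file before loading it, a small script along the following lines can help. It is only a sketch in Python, not part of SimPlot: it parses FASTA/Pearson text, keeps the sequences you name, and enforces the 10-sequence limit mentioned above. The function names are hypothetical, and the equal-length check is an assumption about what "aligned" should mean here.

```python
MAX_SEQUENCES = 10  # limit stated in the text above


def read_fasta(text):
    """Parse FASTA/Pearson text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def select_sequences(records, wanted):
    """Keep the named records; check the count limit and aligned lengths."""
    chosen = [(n, s) for n, s in records if n in wanted]
    if len(chosen) > MAX_SEQUENCES:
        raise ValueError("no more than %d sequences can be used" % MAX_SEQUENCES)
    if len({len(s) for _, s in chosen}) > 1:
        raise ValueError("selected sequences differ in length (not aligned?)")
    return chosen
```

For example, `select_sequences(read_fasta(open("my_alignment.fas").read()), {"query", "refA", "refB"})` would return the three named records, where the file name and sequence names are placeholders.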
When you run the program, you will see:
Use the File menu to Open a sequence file. You can also use the Ctrl-O (^O) key combination. If the file is read successfully, you will be prompted to select the sequences you want to analyze. The 10 buttons on the right will then be updated with the 10 sequence names. You can click on these buttons to change the color associated with a sequence. You now need to select a Query sequence (by clicking one of the radio buttons next to a sequence button on the right). Using those same radio buttons you can Hide any sequences you want to. Any others are left as Reference (the default).
Now you can either hit the DoSimPlot button, or alter some of the options. The most apparent options are the Window size and Step size, with "spinner" controls visible just below the sequence buttons on the right. You can click on the up and down arrows to change these. Only multiples of 10 are permitted.
Now hit the DoSimPlot button and you see (I used a gag
As you can see, 301904 is my Query sequence and the rest are Reference sequences. The window and step size are the default values. At this point the user can customize the plot using the buttons on the right side of the screen as described above to change sequences used, colors, window and step sizes. There are also a number of options available from the Options menu.
Options (depicted at left) include gap stripping (ignore sites containing a gap in any sequence - the original alignment file is never changed by SimPlot), scatter or line plot (with selectable point or line size or thickness), and 6 toggles for controlling the plot appearance. These should be self-explanatory and trial-and-error will make their effects clear.
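The gap stripping option can be pictured with the short Python sketch below. It is an illustration of the idea as described (drop every site that contains a gap in any sequence), not SimPlot's implementation. The function name is hypothetical, and it assumes that "-" is the gap character and that all sequences have the same length.

```python
def strip_gaps(sequences, gap_chars="-"):
    """Return copies of the aligned sequences without any gapped column.

    The input list is left unchanged, in the same spirit as SimPlot
    never modifying the original alignment file.
    """
    # Indices of columns in which no sequence has a gap character.
    keep = [
        i
        for i, column in enumerate(zip(*sequences))
        if not any(c in gap_chars for c in column)
    ]
    return ["".join(seq[i] for i in keep) for seq in sequences]
```

Note that positions reported after stripping refer to the shortened alignment, so they no longer match column numbers in the original file.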
To zoom in on a plot area, click (with the Left mouse button) on the upper left corner of the region of interest, and while holding the mouse button down, drag (i.e. don't let up on the mouse button yet) down and to the right to enclose the area of interest in the red box that appears (example depicted at left). When you release the mouse button the plot will redraw at the new level of magnification. If you are dealing with a really big alignment this may take a second or so. Below I explain how to zoom back out. While you are zoomed in you can pan around the plot by clicking the Right mouse button and dragging as if you were moving a piece of
At any level of magnification you can get more info about a particular point. This can be especially useful if you want to know where the point of apparent crossover is located. For scatter plots just click on a point. For line plots you need to click on a vertex (a data point used to plot the line, as depicted at left).
When you click on the point the dialog depicted at left is displayed.
In order to return to the original magnification level, click and drag up and to the left. It does not matter how large an area you enclose - this is a signal to
At any time, the current plot can be:
The bitmap file for Save and Copy will work in any Windows program that can handle a bitmap (*.bmp) file, like Word, WordPerfect, PowerPoint, Harvard Graphics, or the Paint program that comes as part of Windows.
When you are ready to choose 4 sequences for informative site analysis, hit the FindSites button at the lower right corner. There are instructions built into this part, and if you understand the theory this should be enough to get you going [please refer to Robertson,D.L., Hahn,B.H., and Sharp,P.M. Recombination in AIDS viruses. J Mol Evol 1995;40(3):249-59].
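For readers who want a feel for what an informative site is, here is a minimal Python sketch. It uses the usual parsimony definition for four aligned sequences (a site where two sequences share one nucleotide and the other two share a different one), which is an assumption on my part; it does not reproduce the exact procedure of Robertson et al. or of the FindSites module, and the function name is hypothetical. Comparison is case-insensitive, and sites with gaps or ambiguity codes are skipped.

```python
from collections import Counter


def informative_sites(seqs):
    """Return 0-based columns where four aligned sequences split two-and-two."""
    if len(seqs) != 4:
        raise ValueError("exactly four sequences are required")
    sites = []
    for i, column in enumerate(zip(*seqs)):
        column = tuple(c.upper() for c in column)  # ignore letter case
        if any(c not in "ACGT" for c in column):
            continue  # skip gaps and ambiguous bases
        counts = Counter(column)
        # Informative: exactly two states, each seen in two sequences.
        if sorted(counts.values()) == [2, 2]:
            sites.append(i)
    return sites
```

Which two sequences pair up at each such site is what indicates the relationship supported at that position, so a fuller version would also record the pairing.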
Version History
version 1.3 - July 31, 1998
Fixed bug which prevented recognition of informative sites if sequences were in lowercase
version 1.2.2 - May 19, 1998
Removed bevel from edge of chart (looked bad after copy & paste)
version 1.2.1 - May 18, 1998
Fixed problem with fasta file reading detected by Carla Kuiken at the HIV Database and Analysis Group, Los Alamos National Laboratory - thank you!
version 1.2 - May 11, 1998
version 1.1 - April 26, 1998 - First public version
Known Problems/Plans
Please report other problems or suggest additions/priorities for the improvement list.
Contacting the author
I welcome comments and suggestions.
Stuart Ray, M.D.
home page: http://www.welch.jhu.edu/~sray
Division of Infectious Diseases
Johns Hopkins University School of Medicine
720 Rutland Avenue, Ross 1159
Baltimore, MD, 21205
Windows is a registered trademark of the Microsoft Corporation.