First start up sip4 by typing in:
sip4 &

Then select the Simple sub option from the Load sequences option of the File pull down menu. Load two sequences from the default EMBL sequence database:

For the Horizontal sequence, the EMBL entry with the EntryName xlacacr.
For the Vertical sequence, the EMBL entry with the EntryName xlactcag.

Once you have things as illustrated, click on the OK button of the Load sequences window. The selected sequences will be read in by sip4 and their details displayed in sip4's Output window. You have loaded two Xenopus laevis actin sequences. One is genomic DNA (xlactcag) and the other is the corresponding cDNA (xlacacr).
Next, select the Local alignment option from the Comparison pull down menu.
In the local alignment window, change the penalty for each residue in gap from its default setting of 0.2 to 1, and then click OK. You have made gaps substantially more expensive to extend than by default, but retained the default cost for starting a gap.

A textual representation of the alignment you compute will appear in the Output window of sip4. A corresponding graphical representation will also be generated in a new window labelled sip plot. Use the Output window scroll bar to look at the computed alignment.
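To see what this change means in numbers, here is a minimal Python sketch of an affine gap cost, in which a gap costs a fixed amount to start plus a per-residue amount to extend. The function name gap_cost, the exact formula and the gap-starting penalty of 12 are assumptions made for illustration only; this exercise does not state sip4's default for starting a gap, nor exactly how sip4 combines the two penalties.

```python
def gap_cost(length: int, open_penalty: float, extend_penalty: float) -> float:
    """Cost of one gap of `length` residues under an assumed affine model:
    a fixed cost to start the gap plus a cost for each residue in it."""
    if length <= 0:
        return 0.0
    return open_penalty + extend_penalty * length


# HYPOTHETICAL penalty for starting a gap; not taken from sip4.
OPEN_PENALTY = 12.0

if __name__ == "__main__":
    # Compare the default per-residue penalty (0.2) with the value used here (1).
    for gap_length in (1, 10, 100, 500):
        default_cost = gap_cost(gap_length, OPEN_PENALTY, 0.2)
        raised_cost = gap_cost(gap_length, OPEN_PENALTY, 1.0)
        print(gap_length, default_cost, raised_cost)
```

Under this model a short gap costs much the same with either setting, whereas an intron-sized gap becomes several times more expensive once the per-residue penalty is raised to 1.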
Move to the plot. The diagonal line indicates the aligned regions. Click on the crosshairs button. Note that as you move the crosshairs around, the indicated positions in the Horizontal and Vertical sequences are shown in boxes in the top right hand corner of the plot. Click on the crosshairs button once more and the crosshairs will disappear.
Double click with the middle mouse button near the line indicating the aligned regions. The sequence_display will appear. Move to the sequence_display and click on Nearest match. This will move the sequence_display to one end of the aligned region.

Click on the Lock button. Move around the sequence_display using its slide bars and/or the graphics screen crosshair (middle mouse button positioned over the green and blue lines indicating the position of the sequence_display) to look along the aligned region. Note that movement of the sequence_display is strictly limited to the current alignment diagonal.

Click on the sequence_display Lock button again (turning Lock off). Move the sequence_display once more using the scroll bars and/or the graphics screen crosshair. Note that you are no longer constrained to the current alignment diagonal.
Get rid of the sequence_display by selecting the Exit option from its File pull down menu.
Once again, select the Local alignment option from the Comparison pull down menu. As before, set the penalty for each residue in gap to 1. This time, also click on the alignments above score button. This asks that all alignments scoring more than 20 (by default) are shown. The default is that only the one best alignment is displayed. Click OK.

Look at the textual output in the sip4 Output window and the corresponding graphical output. Note that this time sip4 has reported 7 aligned regions (corresponding to the 7 exons represented in both sequences and separated by 6 introns in the genomic sequence).

You now have your two graphical alignments one on top of the other. Separate them by picking up either one (middle mouse button held down over the coloured square next to the graphic to be repositioned) and moving it out of the current graphics display window (you can also move it just above or below other graphics in the current window).
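If you would like to see what a "best local alignment" and its score amount to, the following Python sketch implements the textbook Smith-Waterman recurrence. It is not necessarily what sip4 does internally: the function name, the scoring values (match 2, mismatch -1) and the single linear gap penalty (rather than separate penalties for starting and extending a gap) are all simplifying assumptions. It reports only the single best alignment, which corresponds to the default behaviour described above; finding every alignment above a cutoff score would need further work that is omitted here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Simplified illustration: linear gap penalty, identity-based scoring.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below zero, which is
    # what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,       # gap in b
                          h[i][j - 1] + gap)       # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two made-up sequences sharing the segment ACGTACG.
    print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))
```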
For the third time, select the Local alignment option from the Comparison pull down menu. This time, accept all the default settings and click OK.
Look at the textual output in the sip4 Output window and the corresponding graphical output. Note that this time sip4 has reported only 1 aligned region, but that region spans 5 of the 7 reported by the previous analysis. The default settings of the gap penalty values are such that sip4 introduces gaps in the cDNA to match the introns in the genomic sequence. Look at both the textual and graphical output. Note that neither sequence is included in the alignment in its entirety. Move the graphical output into a separate window.
For the fourth and final time, select the Local alignment option from the Comparison pull down menu. This time, click only on the alignments above score button and then go for the OK button. This time you generate 3 alignments: one covering 5 of the 7 exons and two others, each covering one of the remaining two exons.
Select the results manager from the View pull down menu. Remove the 4 raster plot entries. Go to the Output window. Note that, using your right hand mouse button, you can output any of your textual creations to a disc file. Instead, remove all textual output so you have a nice clean start for the next section.
GLOBAL ALIGNMENT WITH SIP4
Select Sequence manager from the File pull down menu. Put the mouse over the xlacacr entry and hold down the right hand button. Select the Set range option. Set the Start position of xlacacr to 500. Set the End position of xlacacr to 800.

Put the mouse over the xlactcag entry and hold down the right hand button. Select the Set range option. Set the Start position of xlactcag to 4700. Set the End position of xlactcag to 5500.

Put the mouse over the xlacacr (500..800) entry and hold down the right hand button. Select the Horizontal option. Put the mouse over the xlactcag (4700..5500) entry and hold down the right hand button. Select the Vertical option.

Select Align sequences from the Comparison pull down menu. Note the very much larger default value for penalty for each residue in gap. By default, long gaps will be expensive. This method is therefore far less likely than the local method to gap the introns in xlactcag correctly. Click on the OK button of the align sequences window for a default global alignment.
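If you ever need to reproduce these ranges outside sip4, the small Python sketch below shows one way to cut out the same regions. It assumes that the Start and End positions count from 1 and include both ends, which is the usual convention for sequence coordinates but is not stated in this exercise; the function name subrange is invented for illustration.

```python
def subrange(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, counting from 1, both ends included.

    ASSUMPTION: this mirrors how a Set range of start..end is interpreted.
    """
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("need 1 <= start <= end <= len(seq)")
    # Python slices count from 0 and exclude the end, hence start - 1.
    return seq[start - 1:end]


if __name__ == "__main__":
    toy = "ACGTACGT"
    print(subrange(toy, 3, 5))
    # Under this convention a range of 500..800 holds 301 residues
    # and a range of 4700..5500 holds 801.
    print(800 - 500 + 1, 5500 - 4700 + 1)
```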
Look first at the textual output in the sip4 Output window. The regions you are aligning contain 2 of the 7 exons in both xlacacr and xlactcag. It should be clear that sip4 has correctly aligned one of the two exons but aligned the other with the intervening intron. The gap penalties were such that, from the simplistic viewpoint of the program, this is preferable to matching the intron in xlactcag with a gap in xlacacr. Note that at the bottom of the display sip4 reports that it has:

Added sequence xlacacr_s1_a4

If you look again at your Sequence manager, you will see that there are two new entries. These are the aligned portions of xlacacr and xlactcag, including padding characters.
[Sample Output window text: Percentage mismatch 71.9, followed by the alignment with a position ruler running from 500 to 550.]
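A figure such as Percentage mismatch can be recomputed from any pair of padded, aligned sequences. The Python sketch below shows one plausible definition: the share of alignment columns in which the two characters differ, with a residue opposite a padding character counted as a mismatch. This definition, the function name and the use of "-" as the padding character in the example are assumptions; the exercise does not say exactly how sip4 arrives at its figure.

```python
def percent_mismatch(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns whose two characters differ.

    ASSUMPTION: a residue opposite a padding character counts as a
    mismatch, and every column of the alignment is counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    differing = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x.upper() != y.upper())
    return 100.0 * differing / len(aligned_a)


if __name__ == "__main__":
    # Made-up six-column alignment: one substitution and one padded column.
    print(round(percent_mismatch("ACGT-A", "ACCTTA"), 1))
```

A high value such as the one reported here is what you would expect when a long stretch of one sequence sits opposite padding or an unrelated intron.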
Now take a look at your graphical output. You should see a single diagonal line passing through the correctly aligned exon. Using your textual alignment as a guide, double click with your middle mouse button on the diagonal somewhere around where the alignment is real. This will cause the sequence_display window to come into view and the green and blue sequence position crosshairs to appear. Click on the Nearest match button in the sequence_display window to move exactly to a properly aligned region. Click on the Lock button and move along the aligned exon. Remove your incorrect graphical alignment and your sequence_display.
Do further Global alignments, changing the alignment parameters until you generate a believable alignment like the one illustrated graphically here.
Hint (as if you need one): you have to make gaps, particularly long gaps, cheaper and/or correctly aligned bases better rewarded and/or incorrectly aligned bases less severely penalised. Once you have succeeded, remove all your results in order to start afresh for the next section.
DOT PLOTS WITH SIP4
First do a dot plot of the whole of xlacacr against the whole of xlactcag. Logically, one would do this before playing around with the Global and/or Local alignment tools of sip4. Dot plots are for generating an overview showing roughly how sequences compare. This overview should be used to plan the use of the textual alignment tools.
To retain a little "mystery" we leave the obvious first step until last in this exercise. So, load the whole of xlacacr as the Horizontal sequence and the whole of xlactcag as the Vertical sequence. Select the Find similar spans option from the Comparison pull down menu and request a default dot plot by clicking the OK button. The dot plot clearly shows the 7 exons that were revealed bit by bit during the previous analyses.
Fine, but not that interesting as dot plots go.
The exons revealed are all very strong (almost identical regions). They
offer but a small challenge to sip4.
Remove all current results, textual and graphical. Select the Sequence manager from the File pull down menu. Put the mouse cursor over each sequence in turn, depress the right hand mouse button and select the delete option. Your Sequence manager should be empty when you have finished.

Select the Simple sub option from the Load sequences option of the File pull down menu. Select the SWISSPROT database for both the Horizontal and Vertical sequences. Enter the Entry Name egfr_human for both the Horizontal and Vertical sequences. Click the OK button of the Load sequences window.
Select the Find similar spans option from the Comparison pull down menu and change the default window length of 11 to 25. Note how the default minimum score adjusts automatically to reflect the change in window length. Click on the OK button in the find similar spans window.
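The idea behind a windowed dot plot can be sketched in a few lines of Python. Every window of the chosen length in one sequence is compared with every window in the other, and a dot is recorded wherever the window score reaches the minimum score. This is only an illustration under stated assumptions: it scores by simple identity (one point per identical position) rather than with a proper score matrix as one would use for proteins, the function name is invented, and nothing here is taken from sip4's own code or defaults.

```python
def similar_spans(horizontal, vertical, window, min_score):
    """Return (h_start, v_start) pairs, counting from 0, for every pair of
    windows whose identity score is at least min_score.

    Brute-force sketch: identity scoring only, no score matrix.
    """
    dots = []
    for i in range(len(horizontal) - window + 1):
        for j in range(len(vertical) - window + 1):
            # Count identical positions within the two windows.
            score = sum(1 for k in range(window)
                        if horizontal[i + k] == vertical[j + k])
            if score >= min_score:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Two made-up sequences that share ACGTACGT, itself a repeat of ACGT.
    h = "GGGGACGTACGT"
    v = "ACGTACGTCCCC"
    print(len(similar_spans(h, v, window=8, min_score=8)))
    print(len(similar_spans(h, v, window=4, min_score=4)))
```

Running the sketch with the longer window finds only the one long shared span, while the shorter window also picks up the short internal repeat. The minimum score is lowered along with the window length here because a window cannot score more than its own length under identity scoring.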
sip4 notices that you have given it a sequence
to compare against itself. By default, for self comparisons
sip4 does
not plot the inevitable leading diagonal or the mirror image top half of
the plot.
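The saving made by hiding duplicate matches in a self comparison can be sketched as follows. The Python function below scores a sequence against itself and, when asked to hide duplicates, looks only at window pairs on one side of the leading diagonal. Identity scoring, the function name and the choice of which triangle to keep are assumptions for illustration; sip4's own implementation is not described in this exercise.

```python
def self_spans(seq, window, min_score, hide_duplicates=True):
    """Dot positions (i, j), counting from 0, for seq compared with itself.

    With hide_duplicates=True only pairs with j < i are scored, so the
    leading diagonal (i == j) and the mirror-image triangle are skipped.
    """
    dots = []
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        j_stop = i if hide_duplicates else n_windows
        for j in range(j_stop):
            score = sum(1 for k in range(window) if seq[i + k] == seq[j + k])
            if score >= min_score:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    toy = "ACGTTTACGT"  # made-up sequence with ACGT at both ends
    print(self_spans(toy, 4, 4))
    print(len(self_spans(toy, 4, 4, hide_duplicates=False)))
```

With duplicates hidden, the toy sequence yields a single dot for its internal repeat; with the full plot, the same repeat appears twice and every window also matches itself along the leading diagonal.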
This plot illustrates that, contrary to common initial reaction, comparing a sequence with itself is not silly. Such plots can show up interesting internal features. Here you should see a fairly strong diagonal line of dots indicating clear evidence of a reasonably faithful repeat of the first 300 or so amino acids. Also, at the end of this repeat there appears to be a region of several other very short repeats.

To investigate the repeat region, bring up the sequence_display by double clicking with the middle mouse button near an interesting feature. Click on the Nearest match button of the sequence_display window to position the display exactly over the region of interest. Move along the repeat region by clicking on the Lock button, sliding the sequence_display scroll bar along a bit, clicking on the Lock button once more (to unlock the sequence displays) and then clicking on the Nearest match button once more.
Once you have seen enough, remove the sequence_display and the plot. Next, produce the same plot again, but this time compute the whole plot, including the leading diagonal that merely indicates that egfr_human is remarkably similar to itself.
To ensure the whole plot is generated, select the Hide duplicate matches option from the Options pull down menu. Once more, select the Find similar spans option from the Comparison pull down menu and change the default window length of 11 to 25. Click the OK button. You should see the full dot plot as illustrated to the left of this text.
Next, look at the effect of varying the window size used for the dot plot. Varying the window size is the most effective way of controlling the sensitivity of a dot plot: smaller window sizes generate more sensitive plots.
First, configure the plot you have just computed so that it is displayed using thick white dots. To do this, put your mouse cursor over the coloured square corresponding to the plot you have just generated. Hold your right mouse button down and select the configure option. A window labelled cbox will emerge. In this window, adjust the Line width setting to 4 and position the Red, Green and Blue sliders to produce white (i.e. slide all three to the extreme right). This done, click on the cbox OK button. The next step is to redraw the dot plot using a smaller window length (the default size of 11). Smaller window lengths generate more accurate plots.
Select once more the Find similar spans option
from the Comparison pull down menu. This time compute a default
dot plot by simply clicking on the OK button.
The second, more accurate plot is drawn on top of the less accurate (window length of 25) plot. In order to make comparison of the two plots as easy as possible, draw the plot on top in black with thin lines. Use the configure option from the second plot to do this as before. The superimposed plots show three important effects of using more accurate smaller window sizes. They are:
Try zooming into an interesting region (for example
the region with the small repeats) by holding the Ctrl key and the
right hand mouse button down and defining a rectangle around the region
you wish to magnify.
To "unzoom", use the Back button. Finally, try using the Help buttons. Help buttons are available from all sip4 windows. Depending on how things are set up, very full context dependent Help will be made available either in a web browser window or in a display tool specific to the Staden package.