KalignP: Improved multiple sequence alignments using position specific gap penalties in Kalign2

Help



1. Summary

KalignP is a modification of Kalign2 that accepts externally supplied position specific gap penalties. We show that KalignP, using position specific gap penalties obtained from predicted secondary structures, consistently improves on Kalign2 when tested on Balibase 3.0 as well as on a dataset derived from Pfam-A seed alignments. Further, KalignP is more flexible than Kalign2, as researchers can modify the behavior of the alignment using other knowledge.


2. Usage
To use this web server, paste or upload your sequences in FASTA or Enhanced FASTA format and click the Submit button. The results will be available for download on the following page shortly afterwards, although you may have to wait a moment if your input contains many long sequences. Under Advanced options you can choose the sequence type and the output format. The sequence type can be set to one of the following three:
  1. Uncategorized
  2. Non membrane proteins
  3. Membrane proteins
If you do not know the type of your proteins, leave the sequence type set to Uncategorized, which is the default. The output format can be chosen from the following list:
  1. aln (Example)
  2. msf/gcg (Example)
  3. clu (Example)
  4. fasta (Example)
  5. macsim (Example)
Under Advanced options, you may also choose whether to use the Add position specific gap penalty function, which is enabled by default. When it is enabled and the input is a plain FASTA file, the server automatically adds position specific gap penalties based on the secondary structures predicted by PSIPRED (using only a single sequence).


How KalignP works

KalignP changes the behavior of the alignment in an intuitive way through the externally supplied position specific gap penalties. Generally speaking, to force a gap after the first residue of a sequence, set the gap open penalty at the second residue position of that sequence to a small value. For example, consider the following four sequences:

>seq1
ASNLSKLFLSDSDA
>seq2
ASNLDA
>seq3
ASNLKFFFDDDAA
>seq4
LLNFFSDAAAAA

The multiple sequence alignment (MSA) with all parameters set to their defaults is as follows (in ClustalW format):

seq1 ASNLSKLFLSDSDA-
seq2 ASNLDA-----
seq3 ASNLKF-FFDDDAA-
seq4 LLN--FFSDAAAAA 

The tree of the MSA is (((seq1, seq2), seq3), seq4). To open a gap after the second residue in seq1, set the gap open penalty at the third position to a negative value (e.g. -5). An example ESPSGP (externally supplied position specific gap penalties) setting is as follows. To ensure that only one gap is opened after the second residue S in seq1, so that the effect of the gap open penalty is clearly visible, the gap open penalties for seq2 and the gap open penalty at the fourth position of seq1 are set to large positive values.

>seq1
ASNLSKLFLSDSDA
{gpo: 1 1 -5 10 1 1 1 1 1 1 1 1 1 1 }
>seq2
ASNLDA
{gpo: 10 10 10 10 10 10 }
>seq3
ASNLKFFFDDDAA
>seq4
LLNFFSDAAAAA

The alignment will become


seq1 AS-NLSKLFLSDSDA-
seq2 -ASNLDA-----
seq3 -ASNLKF-FFDDDAA-
seq4 --LLN-FFSDAAAAA
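
The rule used in this example (to favor a gap after residue k, lower the gap open penalty at position k+1 and raise it at the next position) can be sketched in Python. This is only an illustration: the helper name `gpo_for_gap_after` and its default values (1 as the baseline, -5 as the low value, 10 as the guard value) are assumptions copied from the example above, not settings defined by KalignP.

```python
def gpo_for_gap_after(seq, k, low=-5, guard=10, base=1):
    """Build a '{gpo: ...}' line that favors a gap after the k-th residue (1-based).

    Hypothetical helper: the numeric values mirror the example above and
    are not defaults defined by KalignP.
    """
    if not 1 <= k < len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    penalties = [base] * len(seq)      # one value per residue
    penalties[k] = low                 # 0-based index k is position k+1
    if k + 1 < len(seq):
        penalties[k + 1] = guard       # discourage a second gap right after
    return "{gpo: " + " ".join(str(p) for p in penalties) + " }"


line = gpo_for_gap_after("ASNLSKLFLSDSDA", 2)
print(line)
```

For seq1 with k = 2, this produces the same gpo line as in the example above.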

To additionally open a gap after the 10th residue D in seq3, set the gap open penalty at the 11th position in seq3 to a negative value (e.g. -5). An example ESPSGP setting is as follows. Again, to ensure that a gap is opened only after the 10th residue in seq3, the gap open penalties of seq4 and of the neighboring residue positions in seq3 are set to large positive values.

>seq1
ASNLSKLFLSDSDA
{gpo: 1 1 -5 10 1 1 1 1 1 1 1 1 1 1 }
>seq2
ASNLDA
{gpo: 10 10 10 10 10 10 }
>seq3
ASNLKFFFDDDAA
{gpo: 1 1 1 1 1 1 1 1 1 10 -5 20 10 }
>seq4
LLNFFSDAAAAA
{gpo: 10 10 10 10 10 10 10 10 10 10 10 10}

The alignment will become

seq1 -----AS-NLSKLFLSDSDA
seq2 -----ASNLDA----
seq3 ASNLKFFFDD--DAA----
seq4 ------LLNFFSDAAAAA
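
Before submitting input like the example above, it can help to check that every penalty line carries exactly one value per residue. The following is a minimal sketch of such a check. The function name `parse_enhanced_fasta` is hypothetical, and the one-value-per-residue rule is an assumption inferred from the examples on this page rather than a documented requirement.

```python
import re

# Matches annotation lines such as "{gpo: 1 1 -5 10 }" or "{tgpe:10 10}".
ANNOTATION = re.compile(r"^\{(\w+):\s*(.*?)\s*\}$")


def parse_enhanced_fasta(text):
    """Parse Enhanced FASTA text into a list of record dicts.

    Assumption: each '{key: v1 v2 ...}' line belongs to the preceding
    sequence and holds one numeric value per residue.
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append({"name": line[1:], "seq": "", "penalties": {}})
            continue
        if not records:
            raise ValueError("content found before the first '>' header")
        match = ANNOTATION.match(line)
        if match:
            key = match.group(1)
            values = [float(v) for v in match.group(2).split()]
            if len(values) != len(records[-1]["seq"]):
                raise ValueError(
                    f"{records[-1]['name']}: {key} has {len(values)} values, "
                    f"sequence has {len(records[-1]['seq'])} residues"
                )
            records[-1]["penalties"][key] = values
        else:
            records[-1]["seq"] += line   # plain sequence line
    return records


example = """>seq3
ASNLKFFFDDDAA
{gpo: 1 1 1 1 1 1 1 1 1 10 -5 20 10 }
>seq4
LLNFFSDAAAAA
"""
records = parse_enhanced_fasta(example)
print(records[0]["penalties"]["gpo"].index(-5.0) + 1)  # 1-based position of the low penalty
```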

The alignment above might not be ideal, since there are many gaps at the termini. To reduce the number of terminal gaps, increase the terminal gap extension penalty, as shown in the example below.

>seq1
ASNLSKLFLSDSDA
{gpo: 1 1 -5 10 1 1 1 1 1 1 1 1 1 1 }
{tgpe: 10 10 10 10 10 10 10 10 10 10 10 10 10 10}
>seq2
ASNLDA
{gpo: 10 10 10 10 10 10 }
{tgpe:10 10 10 10 10 10}
>seq3
ASNLKFFFDDDAA
{gpo: 1 1 1 1 1 1 1 1 1 10 -5 20 10}
{tgpe: 10 10 10 10 10 10 10 10 10 10 10 10 10}
>seq4
LLNFFSDAAAAA
{gpo: 10 10 10 10 10 10 10 10 10 10 10 10}
{tgpe:10 10 10 10 10 10 10 10 10 10 10 10}

The alignment will become

seq1 AS-NLSKLFLSDS-DA
seq2 -ASNLDA-----
seq3 AS-NLKFFFDD-D-AA
seq4 --LLNFFSDAAAAA-
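
Writing the same penalty value for every residue by hand is error-prone, so input such as the tgpe example above can also be generated. The sketch below assumes the `{key: values}` layout shown on this page; the function name `format_record` is hypothetical, and the value 10 is simply the one used in the example.

```python
def format_record(name, seq, **penalties):
    """Format one Enhanced FASTA record with optional penalty lines.

    Hypothetical helper: keys such as 'gpo' and 'tgpe' are taken from the
    examples on this page; each needs one value per residue.
    """
    lines = [">" + name, seq]
    for key, values in penalties.items():
        values = list(values)
        if len(values) != len(seq):
            raise ValueError(f"{key}: expected {len(seq)} values, got {len(values)}")
        lines.append("{" + key + ": " + " ".join(str(v) for v in values) + "}")
    return "\n".join(lines)


seq2 = "ASNLDA"
record = format_record("seq2", seq2, gpo=[10] * len(seq2), tgpe=[10] * len(seq2))
print(record)
```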

Sometimes a gap will not be opened at the expected position if many customized gap penalties are set in multiple sequences. This is because KalignP forbids neighboring gap opens in two aligned sequences, as in the following example:

ALDDS-D-S
ALED-D-S-


3. Output

The server outputs the multiple sequence alignment in the specified format. The default format is ClustalW (Example).


4. References

Nanjiang Shu and Arne Elofsson. KalignP: Improved multiple sequence alignments using position specific gap penalties in Kalign2. Bioinformatics, 2011; 27(12): 1702-1703.


5. Contact

Arne Elofsson group

Center for Biomembrane Research
Department for Biochemistry and Biophysics
The Arrhenius Laboratories for Natural Sciences
Stockholm University
SE-106 91 Stockholm, Sweden




E-mail:     arne@bioinfo.se
Phone:     (+46)-8-16 4672
Fax:     (+46)-8-15 3679
 
 
© 2010-2011 Stockholm University, Stockholm Bioinformatics Center