|
|
Abstract
Multiple sequence alignment is fundamental in biological sequence
analysis. It is essential for protein family analysis, phylogenetic
tree construction, remote homology detection and protein structure
prediction. Thanks to recent advances in sequencing technology, the
number of biological sequences in databases has grown to
millions and is still increasing. This calls for multiple sequence
alignment tools that are both efficient and accurate. Kalign2 is one
of the fastest yet still accurate methods for aligning
large numbers of sequences. However, in contrast to
other methods, Kalign2 does not accept externally supplied
position-specific gap penalties. Here, we present a modification to Kalign2,
KalignP, so that it accepts such penalties. Further, we show that
KalignP using position-specific gap penalties obtained from
predicted secondary structures yields a steady improvement over
Kalign2 when tested on Balibase 3.0 as well as on a dataset
derived from Pfam-A seed alignments.
|