Alfredo Pulvirenti, University of Catania, Catania, Italy

Sequence Alignment Algorithms
Sequence alignment is a mandatory step in many biological applications, from sequence similarity search to structure modelling and phylogenetic studies. However, designing tools able to produce high-quality alignments for distantly related sequences is still a challenging computational problem. In this tutorial, algorithmic approaches to pairwise and multiple sequence alignment (MSA) are reviewed. The basic computational formulation of pairwise alignment, together with its dynamic programming solution, is introduced first. Next, the much harder multiple sequence alignment problem is examined. Some key techniques for providing acceptable solutions are surveyed: probabilistic consistency, exploiting additional non-sequence information (structure, profiles, etc.), gap handling, adding flexibility for aligning proteins with different domain architectures, and improving scalability. Finally, MSA validation issues such as de facto standard benchmarks, scoring systems and ROC analysis are discussed.
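
As a rough illustration of the dynamic programming formulation of pairwise alignment mentioned above, the following is a minimal sketch of global alignment in the Needleman-Wunsch style. The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example, not parameters taken from the tutorial; practical tools typically use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not recommended defaults.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

The table has (n+1)(m+1) cells, so time and memory both grow with the product of the two sequence lengths. Extending the same recurrence to k sequences would require a k-dimensional table, which is one way to see why the multiple alignment problem is so much harder and why heuristic approaches are used in practice.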

Pei J and Grishin NV. (2007). PROMALS: towards accurate multiple sequence alignments of distantly related proteins, Bioinformatics, 23:802–808.

Schwartz AL and Pachter L. (2007). Multiple alignment by sequence annealing, Bioinformatics, 23:24-29.

Edgar RC and Batzoglou S. (2007). Multiple sequence alignment, Curr Opin Struct Biol.

Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. (2005). ProbCons: probabilistic consistency-based multiple sequence alignment, Genome Res, 15:330-340.

Raphael B, Zhi D, Tang H, Pevzner P. (2004). A novel method for multiple alignment of sequences with repeated and shuffled elements, Genome Res, 14:2336-2346.

Löytynoja A and Goldman N. (2005). An algorithm for progressive multiple alignment of sequences with insertions, Proc. Natl. Acad. Sci. USA, 102:10557–10562.

Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res, 32:1792-1797.

Di Pietro C, Di Pietro V, Emmanuele G, Ferro A, Maugeri T, Modica E, Pigola G, Purrello M, Pulvirenti A, Ragusa M, Scalia M, Shasha D, Travali S, Zimmitti V. (2003). ANTICLUSTAL: Multiple Sequence Alignment by Antipole Clustering and Linear Approximate 1-Median Selection, Proceedings of IEEE Computer Society Bioinformatics Conference (CSB03), 326-336.

Apostolico A and Giancarlo R. (1999). Sequence Alignments in Molecular Biology, Journal of Computational Biology, 9:120-162.