Raffaele Giancarlo, University of Palermo, Palermo, Italy

Alignment-free Classification and Comparison of Biological Sequences and Structures
Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Those measures offer the additional advantage of being applicable to structural data, as long as the structure is encoded into a string. A series of recent ground-breaking results have given evidence that those techniques are valid alternatives to classic methods, based on alignments, for classification and comparison of biological sequences and structures. In this tutorial, the state of the art in this area is presented. In depth attention will be given to the Universal Similarity Metric, a methodology based on Kolmogorov Complexity. In particular, the first set of extensive quantitative experiments showing that USM is successfully applicable in Biology will be presented.

References:
Ferragina P, Giancarlo R., Greco V., Manzini G., Valiente G. (2007). Compression-based classification of biological sequences and structures via the Universal Similarity Metric: Experimental Assessment, BMC Bioinformatics , 8 :252.

Gilbert D, Rossello F, Valiente G, Veeramalai M. (2007). Alignment-Free Comparison of TOPS Strings, In London Algorithmics and Stringology 2006 , Volume . Edited by Daykin J, Mohamed M, Steinhofel K, College Publications:177-197.

Cilibrasi R, Vitanyi PMB. (2005). Clustering by Compression. IEEE Tr. Inform. Theory , 51 :1523-1545.

Keogh E, Lonardi S, Rtanamahata C. (2004). Towards parameter-free data mining, Proc. 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, ACM :206-215.

Krasnogor N, Pelta DA. (2004). Measuring the Similarity of Protein Structures by Means of the Universal Similarity Metric, Bioinformatics , 20 :1015-1021.

Li M, Chen X, Li X, Ma B, Vitanyi PMB. (2004). The Similarity Metric, IEEE Tr. Inform. Theory , 50 :3250--3264.

Vinga S, Almeida J. (2003). Alignment-Free Sequence Comparison: A Review. Bioinformatics , 19 :513-523.