Raffaele Giancarlo, University of Palermo, Italy

Computational Cluster Validation for Post-Genomic Data Analysis
This tutorial will focus on computational and statistical techniques that have found some use in microarray data analysis, in particular for clustering. A detailed list of topics follows.

Statistics, Algorithms and Intrinsic Limitations of the Technology. Cluster Analysis as a three-step process: normalization and distance function selection; algorithm selection and parameter setting; selection of validation techniques.
Distance and Similarity Functions. Connections with Sequence Alignment Methods.
Fundamental Clustering Methods: Hierarchical, K-means. Advanced Clustering Methods: CAST, CLICK. Internal Validation Techniques: FOM (Figure of Merit), Consensus, Gap Statistic, Silhouette.
External Validation Techniques, or how to assess how good clustering algorithms are: the F-index and the Adjusted Rand Index.
The Need for Benchmark Datasets. "One-Stop-Shop" Software Systems for Microarray Data Analysis.
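
As a rough illustration of the two families of validation techniques listed above, the following Python sketch computes one internal index (the mean silhouette width) and one external index (the Adjusted Rand Index). It is not taken from the tutorial material: the function names are hypothetical, expression profiles are assumed to be plain tuples of floats, and Euclidean distance is only a default that can be replaced by any other distance function, for example a correlation-based one.

```python
from math import comb, dist


def adjusted_rand_index(labels_a, labels_b):
    """External index: agreement between two partitions, corrected for chance."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same n >= 2 items")
    # Contingency table cells n_ij, plus its row and column sums.
    cells, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        cells[(a, b)] = cells.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(labels_a), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both partitions trivial (all singletons or one cluster)
        return 1.0
    return (index - expected) / (max_index - expected)


def silhouette(points, labels, distance=dist):
    """Internal index: mean silhouette width of a partition of `points`."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance from p to the other members of its own cluster
        # (p itself is in `own` but contributes distance 0).
        a = sum(distance(p, q) for q in own) / (len(own) - 1)
        # b: mean distance from p to the members of the nearest other cluster.
        b = min(sum(distance(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


if __name__ == "__main__":
    profiles = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    print(silhouette(profiles, [0, 0, 1, 1]))  # well separated, about 0.86
    print(silhouette(profiles, [0, 1, 0, 1]))  # poor partition, negative
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition
```

The silhouette needs only the data and the partition, which is why it counts as internal; the Adjusted Rand Index needs a reference partition (e.g. a known gold-standard classification), which is why it counts as external. The ARI equals 1 for identical partitions regardless of how clusters are labeled, and is about 0 in expectation for random ones.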

Bibliography:

Terry Speed (ed), Statistical Analysis of Gene Expression Microarray Data, CRC Press, 2003.

Raffaele Giancarlo, Davide Scaturro and Filippo Utro, Statistical Indices for Computational and Data Driven Class Discovery in Microarray Data, in Biological Data Mining, J.Y. Chen and S. Lonardi (eds), Taylor and Francis, 2009.

J. Handl, J. Knowles and D. B. Kell, Computational Cluster Validation in Post-Genomic Data Analysis, Bioinformatics 21:3201-3212, 2005.