Anna Tramontano, University of Rome “La Sapienza”, Rome, Italy, and Istituto Pasteur Fondazione Cenci Bolognetti, Rome, Italy

Assessing Algorithmic Performance: The CASP (Critical Assessment of Techniques for Protein Structure Prediction) Experience
The seventh edition of the Critical Assessment of Protein Structure Prediction Experiment (CASP) was held in Asilomar (CA) in December 2006. A few months before, ninety-six protein sequences of unknown, but soon to be determined, structure were distributed to more than two hundred and fifty groups whose task was to predict their structure and a number of other features, such as their domain boundaries, the presence of disordered regions, contacts between specific amino acids and their molecular function. As soon as the structure of a target protein became available, the Prediction Center at UC Davis (http://www.predictioncenter.org) compared it with all submitted predictions, computed several parameters and passed them to a set of experts who were tasked with drawing conclusions about the state of the art in the field. Last year the assessors for structure prediction were Neil Clarke (Genome Institute of Singapore), Torsten Schwede (Swiss Institute of Bioinformatics, CH) and Randy Read (Cambridge Institute for Medical Research, UK). Alfonso Valencia (Spanish National Cancer Research Centre, ES), Lorenza Bordoli (Swiss Institute of Bioinformatics, CH), Neil Clarke and Mike Tress (Spanish National Cancer Research Centre, ES) assessed the so-called “other categories”, namely function prediction, disordered regions, contacts and domain boundary prediction. A new category was introduced in 2006: predictors were given a set of models produced by automatic servers and were asked to predict the “quality” of each model (i.e. its distance from the native structure). After the structures were made available, Andriy Kryshtafovych (University of California Davis, USA) and I analysed the correctness of the predictions in this area. This complex process culminated in a meeting, held in Asilomar (California, USA), where the scientists discussed what went right and what went wrong in their modelling efforts, as revealed by the blind tests of the experiment.
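The comparison step above rests on scores that measure how far a model lies from the native structure; the best known is GDT_TS, which averages, over several distance cutoffs, the fraction of residues modelled within that cutoff. The following is a minimal sketch of that idea only, assuming the two coordinate sets are already optimally superposed (the real GDT calculation searches over many superpositions, which is omitted here):

```python
import math

def gdt_ts(pred, native, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score: the average, over four distance cutoffs (in
    angstroms), of the fraction of residues whose predicted C-alpha lies
    within that cutoff of its native position.  Assumes the two lists of
    (x, y, z) coordinates are already superposed."""
    dists = [math.dist(p, n) for p, n in zip(pred, native)]
    fractions = [sum(d <= t for d in dists) / len(dists) for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)

# Toy example: a four-residue "model" uniformly displaced by 1.5 A.
# All residues are within 2, 4 and 8 A but outside 1 A, so the score
# is 100 * (0 + 1 + 1 + 1) / 4 = 75.
native = [(0.0, 0.0, 0.0)] * 4
model = [(1.5, 0.0, 0.0)] * 4
print(gdt_ts(model, native))  # -> 75.0
```

A perfect model scores 100; a model with no residue within 8 Å of its native position scores 0, which is what makes such scores convenient for ranking hundreds of predictions against one target.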
The results of the assessment and discussions, as well as some of the ideas that will be part of the next CASP edition, will be the subject of my talk. The group of Yang Zhang (University of Kansas, USA) improved the TASSER method and produced models of impressive quality. So did David Baker (University of Washington, USA), who improved the Rosetta method mainly by recruiting hundreds of thousands of CPUs around the world in his Rosetta@home project (http://boinc.bakerlab.org/rosetta/) and was therefore able to sample the conformational space of proteins much more thoroughly than was previously possible. I will discuss the methods for assessment used in CASP, which are by now very professional, based on sound statistical analysis and essentially uncontroversial. I will also talk about the problem of establishing whether there has been progress between experiments. This is a difficult question: the targets are different in each experiment and therefore the complexity of predicting them can differ. Furthermore, sequence and structure databases grow continuously, so that the task of identifying suitable structural templates and of obtaining a good sequence alignment for a given protein target becomes progressively easier. For these reasons, comparing results from different experiments is a tricky business, and methods for defining a “difficulty scale” for a protein structure prediction target are continuously being proposed. I will also survey the results in the other categories. In domain boundary prediction few conclusions could be drawn: most of the target proteins came from structural genomics projects, where, apparently, multi-domain proteins are rarely selected, and therefore the results in this area were inconclusive. Nothing much happened in disorder prediction either: no new methods appeared, and the old ones performed exactly as expected, i.e. at the same level as in the previous CASP experiment.
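Difficulty scales of the kind mentioned above typically combine properties of the best available template, such as its sequence identity to the target and how much of the target it covers. The sketch below is a hypothetical toy measure built on that intuition alone; the function name, weighting and range are illustrative assumptions, not the scale actually used at CASP:

```python
def target_difficulty(seq_identity, coverage):
    """Toy 'difficulty scale' for a prediction target, illustrating the
    intuition behind CASP relative-difficulty measures: the higher the
    sequence identity and alignment coverage of the best template, the
    easier the target.  Both inputs are fractions in [0, 1]; the
    (hypothetical) result runs from 0.0 (trivial comparative modelling)
    to 1.0 (no usable template, i.e. free-modelling territory)."""
    ease = seq_identity * coverage
    return 1.0 - ease

# A target whose best template has 30% sequence identity over 80% of
# the sequence would be rated fairly hard on this toy scale.
print(round(target_difficulty(0.30, 0.80), 2))  # -> 0.76
```

Because databases grow between experiments, the same target would score as easier in a later CASP, which is exactly why such normalisation is needed before cross-experiment comparisons.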
Assessing function prediction was as difficult as in the past experiment, to the point that it was decided that, in the next CASP experiment, predictors will be asked to concentrate only on predicting residues involved in binding sites, rather than the full-fledged molecular function of the target proteins. The problem is that it is difficult to assess the predictions, since the assessors have no more information than the predictors at the end of the experiment. The issue has been discussed before, and we published a retrospective analysis of the CASP6 function predictions concluding that the predictions can be useful, provided that a large number of groups participate. Yet this did not convince more than a few members of the community to take part in the challenge, and too few predictions were again submitted this time around. Finally, it is extremely relevant to have an idea of the quality of the thousands of models produced every day all over the world if we want to make use of them in a sensible way. The category of “quality prediction” was introduced exactly for this reason, and we are happy to report that there are methods able to evaluate “a priori” the quality of a model on the basis of the atomic coordinates, the best ones being those available at the Stockholm Bioinformatics Centre (http://www.sbc.su.se/). Two hundred and fifty groups working on a project for three months every two years (the effort of the CASP community in the prediction season, without taking into account all the time and effort needed to develop and test the methods) represent a substantial investment in time and money, probably comparable with some of the large worldwide genomic and post-genomic projects. It is legitimate to ask whether the results are worth the effort. Undoubtedly everybody has a different answer to this question.
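One idea behind consensus-based quality prediction of the kind developed in Stockholm (e.g. the Pcons family, see Wallner & Elofsson below) is that a model resembling many of the other models submitted for the same target is likely to be close to the native structure. This is a minimal sketch of that consensus idea only, with a crude pairwise similarity in place of the structural scores real methods use, and assuming all models are already superposed:

```python
import math

def pairwise_similarity(a, b, cutoff=3.0):
    """Crude stand-in for a structural similarity score: the fraction of
    positions where two superposed models agree to within `cutoff` A."""
    dists = [math.dist(p, q) for p, q in zip(a, b)]
    return sum(d <= cutoff for d in dists) / len(dists)

def consensus_quality(models):
    """Consensus-style quality estimate: score each model by its mean
    similarity to all the other models of the same target.  Models that
    agree with the bulk of the ensemble get high scores; outliers low."""
    scores = []
    for i, m in enumerate(models):
        others = [pairwise_similarity(m, o)
                  for j, o in enumerate(models) if j != i]
        scores.append(sum(others) / len(others))
    return scores

# Three near-identical models and one outlier: the outlier is ranked last.
ensemble = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]] * 3 \
         + [[(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]]
scores = consensus_quality(ensemble)
print(scores.index(min(scores)))  # -> 3 (the outlier)
```

Note that no knowledge of the native structure is needed, which is what makes such "a priori" estimates usable on the thousands of models produced every day.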
Personally, I strongly believe that CASP is necessary to the community for at least three reasons: first, to remove false claims of success from a field as important as structure prediction; second, to highlight areas where we are not able to make measurable progress; and, last but not least, to urge the prediction community to continuously develop and improve its methods.

References
Moult, J., Pedersen, J., Judson, R. & Fidelis, K. (1995) A large-scale experiment to assess protein structure prediction methods. Proteins 23, ii–v.

Zhang, Y. & Skolnick, J. (2004) Automated structure prediction of weakly homologous proteins on a genomic scale. Proc. Natl. Acad. Sci. USA 101, 7594–7599.

Simons, K. T., Kooperberg, C., Huang, E. & Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225.

Soro, S. & Tramontano, A. (2005) The prediction of protein function at CASP6. Proteins 61, 201–213.

Pellegrini-Calace, M., Soro, S. & Tramontano, A. (2006) Revisiting the prediction of protein function at CASP6. FEBS J. 273, 2977–2983.

Wallner, B. & Elofsson, A. (2005) Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics 21, 4248–4254.