RNAs: structure, function and therapy

Knut Reinert, Freie Universität Berlin, Germany

SeqAn, a generic C++ library for the analysis of biological sequences
Biological sequence analysis is the heart of computational biology. Many successful algorithms (e.g., Myers' bit-vector search algorithm, BLAST) and data structures (e.g., suffix arrays, q-gram based string indices, sequence profiles) have been developed over the last fifteen years. The assembly of large eukaryotic genomes such as those of Drosophila melanogaster, human, and mouse is a prime example of algorithm research being successfully applied to a biological problem. However, with entire genomes in hand, large-scale analysis algorithms that require considerable computing resources are becoming increasingly important (e.g., Lagan, MUMmer, MGA, Mauve). Although these tools use slightly different algorithms, nearly all of them require some basic algorithmic components, such as a suffix array, exact or approximate string searches, chaining of fragments, or local alignments. Implementing these components is, however, non-trivial, so suboptimal data types and ad hoc algorithms are frequently employed. The lack of readily available, sophisticated implementations of the accepted algorithms and data types greatly hinders the rapid development of large-scale applications. In this tutorial we present SeqAn, a generic C++ library for the analysis of biological sequences (see www.seqan.de for more information and publications). In the first half of the tutorial we discuss the design principles of SeqAn. We show how generic it is and which mechanisms guarantee high performance. We discuss its content and which high-level applications have already been implemented. In the second half of the tutorial we give a hands-on example of how to rapidly prototype a high-level application in SeqAn.
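
To make the notion of a "basic algorithmic component" concrete, the following is a minimal sketch of a suffix array with exact string search, written in plain standard C++. It does not use SeqAn and does not reflect SeqAn's API. The function names buildSuffixArray and findOccurrences are hypothetical, and the construction is deliberately naive (it sorts all suffixes directly), which is exactly the kind of ad hoc solution that does not scale to whole genomes.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort all suffix start positions lexicographically.
// Illustrative only; the worst case is O(n^2 log n), far from what
// genome-scale tools need.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        return text.compare(a, std::string::npos,
                            text, b, std::string::npos) < 0;
    });
    return sa;
}

// Exact search: all suffixes that start with `pattern` form one contiguous
// range of the suffix array, located here with two binary searches.
// Returns the start positions of all occurrences in increasing order.
std::vector<int> findOccurrences(const std::string& text,
                                 const std::vector<int>& sa,
                                 const std::string& pattern) {
    // First suffix whose prefix of length |pattern| is not smaller than pattern.
    auto first = std::lower_bound(
        sa.begin(), sa.end(), pattern,
        [&text](int pos, const std::string& p) {
            return text.compare(pos, p.size(), p) < 0;
        });
    // First suffix whose prefix of length |pattern| is greater than pattern.
    auto last = std::upper_bound(
        first, sa.end(), pattern,
        [&text](const std::string& p, int pos) {
            return text.compare(pos, p.size(), p) > 0;
        });
    std::vector<int> hits(first, last);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

For the text "ACGTACGTAC", calling findOccurrences with the pattern "ACG" on the suffix array returned by buildSuffixArray yields the positions 0 and 4. A production-quality index would replace the naive sort with a linear-time or near-linear-time construction, which is where a well-engineered library becomes valuable.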

Lipari School