Itzik Pe'er, Columbia University in the City of New York

High Throughput Sequencing for Comprehensively Cataloging Variants
Within the short time since its introduction, high-throughput (next-generation) sequencing has revolutionized molecular genetics and, with it, bioinformatics. For medical and population genetics in particular, this array of technologies has made it possible to ask and answer a new set of questions, taking advantage of an increase in data throughput that exceeds Moore's law. The bioinformatics challenges therefore range from basic data-processing pipelines to innovative analysis.

In the first and major part of this course, we will explore the algorithms used for mapping short reads at a genomic scale, assembling genomes de novo, calling variants, and achieving all of these in a scalable, parallel manner. Computational constructs we will discuss include suffix arrays, Euler graphs, likelihood ratios and the Map/Reduce paradigm.
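As a small illustration of one of the constructs named above, the following Python sketch shows how a suffix array can support exact-match lookup of a short read in a reference sequence. It is a toy example under stated assumptions, not the method of any particular aligner: the function names (build_suffix_array, find_read) and the example sequence are hypothetical, the construction is deliberately naive, and real read mappers use far more compact indexes and tolerate mismatches.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, sorted lexicographically.

    Naive construction by sorting suffix slices; suitable for toy input only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_read(text: str, sa: List[int], read: str) -> List[int]:
    """Return the sorted positions at which read occurs exactly in text.

    Uses two binary searches over the suffix array. This works because the
    length-m prefixes of lexicographically sorted suffixes are themselves in
    non-decreasing order.
    """
    m = len(read)

    # Lower bound: first suffix whose length-m prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= read:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# Hypothetical toy reference and read, for illustration only.
genome = "ACGTACGTGA"
sa = build_suffix_array(genome)
hits = find_read(genome, sa, "ACG")
print(hits)
```

Each lookup needs only a logarithmic number of string comparisons in the reference length, which is why sorted-suffix indexes are attractive when many short reads are mapped against one large genome.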

Secondly, we will discuss the current paradigm shift in genetics from common- to rare-variant association. This has implications for us not only as scientists but also as members of society, as it brings new meaning to the annotation of personal genomes on the way to personalized medicine.

Finally, specific attention will be devoted to research in the Pe'er lab, focusing on the genetics of isolated populations, along with the unique opportunities and challenges they bring to the era of the personal genome.

Bibliography:

Introduction - the human genome
Initial sequencing and analysis of the human genome - Lander et al. (2001) www.nature.com/nature/journal/v409/n6822/full/409860a0.html

The Sequence of the Human Genome - Venter et al. (2001) www.sciencemag.org/cgi/content/short/291/5507/1304

Technologies
Evaluation of next generation sequencing platforms for population targeted sequencing studies - Harismendy et al. (2009) www.genomebiology.com/2009/10/3/R32

Comparing Platforms for C. elegans Mutant Identification Using High-Throughput Whole-Genome Sequencing - Shen et al. (2008) www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004012

Individual sequencing
The diploid genome sequence of an individual human - Levy et al. (2008) www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050254

The complete genome of an individual by massively parallel DNA sequencing - Wheeler et al. (2008) http://www.nature.com/nature/journal/v452/n7189/full/nature06884.html

Accurate whole human genome sequencing using reversible terminator chemistry - Bentley et al. (2008) http://www.nature.com/nature/journal/v456/n7218/full/nature07517.html

The diploid genome sequence of an Asian individual - Wang et al. (2008) http://www.nature.com/nature/journal/v456/n7218/full/nature07484.html

The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group - Ahn et al. (2009) http://genome.cshlp.org/content/early/2009/05/26/gr.092197.109.full.pdf+html

A highly annotated whole-genome sequence of a Korean individual - Kim et al. (2009) www.nature.com/nature/journal/vaop/ncurrent/full/nature08211.html

Single-molecule sequencing of an individual human genome - Pushkarev et al. (2009) http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1561.html

Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two base encoding - McKernan et al. (2009) genome.cshlp.org/content/early/2009/06/18/gr.091868.109.full.pdf+html

Applications
A Metagenomic Survey of Microbes in Honey Bee Colony Collapse Disorder - Cox-Foster et al. (2007) www.sciencemag.org/cgi/content/full/318/5848/283

Genetic Variation in an Individual Human Exome - Ng et al. (2008) www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000160

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing - Campbell et al. (2008) http://www.nature.com/ng/journal/v40/n6/full/ng.128.html

Recurring Mutations Found by Sequencing an Acute Myeloid Leukemia Genome - Mardis et al. (2009) http://content.nejm.org/cgi/content/abstract/NEJMoa0903840v1

Quantification of the yeast transcriptome by single-molecule sequencing - Lipson et al. (2009) www.nature.com/nbt/journal/v27/n7/full/nbt.1551.html

Limitations and possibilities of small RNA digital gene expression profiling - Linsen et al. (2009) www.nature.com/nmeth/journal/v6/n7/full/nmeth0709-474.html

Mapping and Assembly
Velvet: algorithms for de novo short read assembly using de Bruijn graphs - Zerbino & Birney (2008) http://genome.cshlp.org/content/18/5/821.full

Short read fragment assembly of bacterial genomes - Chaisson & Pevzner (2008) http://genome.cshlp.org/content/18/2/324.full

SOAP: short oligonucleotide alignment program - Li et al. (2008) http://bioinformatics.oxfordjournals.org/cgi/content/full/24/5/713

Mapping short DNA sequencing reads and calling variants using mapping quality scores - Li et al. (2008) http://genome.cshlp.org/content/18/11/1851.full

Fast and accurate short read alignment with Burrows-Wheeler transform - Li & Durbin (2009) http://bioinformatics.oxfordjournals.org/cgi/content/full/25/14/1754

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome - Langmead et al. (2009) http://genomebiology.com/2009/10/3/R25

SOAP2: an improved ultrafast tool for short read alignment - Li et al. (2009) http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp336

SNP detection for massively parallel whole-genome resequencing - Li et al. (2009) http://genome.cshlp.org/content/early/2009/05/06/gr.088013.108

Sensitive and accurate detection of copy number variants using read depth of coverage - Yoon et al. (2009) http://genome.cshlp.org/content/early/2009/08/05/gr.092981.109.full.pdf+html

High-resolution mapping of copy-number alterations with massively parallel sequencing - Chiang et al. (2009) http://www.nature.com/nmeth/journal/v6/n1/full/nmeth.1276.html

Reviews
Bioinformatics challenges of new sequencing technology - Pop & Salzberg (2008) dx.doi.org/10.1016/j.tig.2007.12.006

The impact of next-generation sequencing technology on genetics - Mardis (2007) http://dx.doi.org/10.1016/j.tig.2007.12.007