Iuliana Ionita-Laza, Department of Biostatistics, Columbia University, USA

Statistical Challenges in Genetic Association Studies
It is now well recognized that the genetic architecture of most common diseases, such as cancer, autism, and schizophrenia, is complex, with many different genes influencing disease risk. The most widely used approach to identifying genetic factors underlying complex diseases has been the genome-wide association study. With recent progress in massively parallel sequencing technologies, sequence-based association studies have become more affordable and are likely to contribute significantly to our knowledge of complex disease genetics. A potential limitation of both genome-wide and sequence-based association studies is that they focus on one variant or one gene at a time and do not take advantage of existing biological knowledge. We will discuss the basics of genome-wide and sequence-based association studies, as well as statistical methodology that incorporates prior information on biological networks to (1) improve the power to identify disease-related genes and (2) identify new pathways involved in disease. The methodology will be illustrated using relevant examples from genetic studies of complex traits.

References

1. Network-constrained regularization and variable selection for analysis of genomic data. http://www.ncbi.nlm.nih.gov/pubmed/18310618 Li C, Li H. Bioinformatics. 2008 May 1;24(9):1175-82. doi: 10.1093/bioinformatics/btn081. Epub 2008 Mar 1.

2. Five years of GWAS discovery. http://www.ncbi.nlm.nih.gov/pubmed/22243964 Visscher PM, Brown MA, McCarthy MI, Yang J. Am J Hum Genet. 2012 Jan 13;90(1):7-24. doi: 10.1016/j.ajhg.2011.11.029. Review.

3. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. http://www.ncbi.nlm.nih.gov/pubmed/18577223 Liu D, Ghosh D, Lin X. BMC Bioinformatics. 2008 Jun 24;9:292. doi: 10.1186/1471-2105-9-292.

4. Optimal tests for rare variant effects in sequencing association studies. http://www.ncbi.nlm.nih.gov/pubmed/22699862 Lee S, Wu MC, Lin X. Biostatistics. 2012 Sep;13(4):762-75. doi: 10.1093/biostatistics/kxs014. Epub 2012 Jun 14.

5. Scan-statistic approach identifies clusters of rare disease variants in LRP2, a gene linked and associated with autism spectrum disorders, in three datasets. http://www.ncbi.nlm.nih.gov/pubmed/22578327 Ionita-Laza I, Makarov V; ARRA Autism Sequencing Consortium, Buxbaum JD. Am J Hum Genet. 2012 Jun 8;90(6):1002-13. doi: 10.1016/j.ajhg.2012.04.010. Epub 2012 May 10.

6. Incorporating network structure in integrative analysis of cancer prognosis data. http://www.ncbi.nlm.nih.gov/pubmed/23161517 Liu J, Huang J, Ma S. Genet Epidemiol. 2013 Feb;37(2):173-83. doi: 10.1002/gepi.21697. Epub 2012 Nov 17.