Soren Brunak, Center for Biological Sequence Analysis, Technical University of Denmark

Data Integration in the Era of Personalized Genomics

Data integration within biology is a rather old idea, and as a concept it now strongly influences how complex biological mechanisms are understood and how disease etiology is revealed. Phenotype-specific information is available in computer-accessible form from a multitude of sources, ranging from molecular-level, gene-specific knowledge all the way to healthcare sector data such as electronic patient records. Clinical data contain, for example, information on the chemical and nutritional environment that, together with genotype information from the individual, can reveal disease etiology in novel ways.

Molecular bioinformatics and systems biology, where properties of genes or networks of genes are revealed, have so far had little overlap with the medical research communities, where results are derived from information in patient records, public registries, and epidemiological studies. By using its "past history", the medical sector can enable increased personalization of treatment and more precise monitoring for prevention and recovery, and thus significantly improve both quality of care and productivity.

One example described in this series of lectures will be "Heart Development", which involves hundreds of genes, some of which are implicated in congenital heart disease. However, systems-level analyses of how these genes integrate into functional molecular networks, and insight into how these networks are affected in disease, have been missing. We have combined detailed phenotype information from 255 mouse mutants with high-confidence experimental interactome data to create overviews of the time- and tissue-specific networks coordinating human heart development. The networks were experimentally validated in human embryonic hearts at different developmental stages. The data show that morphogenesis is coordinated by a small set of functional modules that are extensively recycled across developmental stages, but each stage is defined by a unique combination of modules. We observe a striking temporal correlation between organ complexity and the number of discrete functional modules coordinating cardiac morphogenesis.

This series of lectures will discuss the perspectives within data integration, including the medical informatics area, both in terms of the data and the methods used to integrate them. Electronic patient records remain a rather unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and cohort-independent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore in principle language independent. As a use case we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which are subsequently mapped to systems biology frameworks.
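
The dictionary-based extraction and co-occurrence counting described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the work presented: the three-entry dictionary, the function names, the patient records, and the observed-to-expected ratio used as the co-occurrence statistic are all assumptions made for illustration. A real system would also need to handle negation ("no signs of depression"), spelling variants, and family history, none of which is attempted here.

```python
import re
from collections import Counter
from itertools import combinations

# Toy dictionary: ICD-10 code -> surface terms (illustrative, not a real lexicon).
ICD_TERMS = {
    "F20": ["schizophrenia"],
    "F32": ["depressive episode", "depression"],
    "E11": ["type 2 diabetes"],
}

def extract_codes(note, dictionary=ICD_TERMS):
    """Return the set of ICD codes whose terms occur in one free-text note."""
    text = note.lower()
    return {
        code
        for code, terms in dictionary.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)
    }

def patient_profile(structured_codes, notes, dictionary=ICD_TERMS):
    """Union of structured codes and codes mined from the patient's notes."""
    mined = set()
    for note in notes:
        mined |= extract_codes(note, dictionary)
    return set(structured_codes) | mined

def cooccurrence(profiles):
    """Count patients per code and per unordered code pair."""
    single, pairs = Counter(), Counter()
    for codes in profiles.values():
        single.update(codes)
        pairs.update(combinations(sorted(codes), 2))
    return single, pairs

def observed_over_expected(a, b, single, pairs, n_patients):
    """Observed pair count divided by the count expected under independence."""
    expected = single[a] * single[b] / n_patients
    observed = pairs[tuple(sorted((a, b)))]
    return observed / expected if expected else float("nan")

# Made-up records for four hypothetical patients.
structured = {"p1": {"F20"}, "p2": {"F32"}, "p3": {"F20"}, "p4": {"F32"}}
notes = {
    "p1": ["Known schizophrenia. History of type 2 diabetes."],
    "p2": ["Recurrent depression; type 2 diabetes treated with metformin."],
    "p3": ["Type 2 diabetes noted at admission."],
    "p4": ["Follow-up visit, mood stable."],
}
profiles = {pid: patient_profile(structured[pid], notes[pid]) for pid in structured}
single, pairs = cooccurrence(profiles)
print(profiles["p3"])
print(observed_over_expected("E11", "F20", single, pairs, len(profiles)))
```

In this toy data the structured record of patient p3 holds only F20, and the E11 code is recovered from the free text alone, which is the sense in which text mining extends the structured data. Because matching goes through codes rather than words, swapping in a term list for another language would leave the downstream counting unchanged.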

Relevant papers include (2010-2007):

Dissecting spatio-temporal protein networks driving human heart development and related disorders. Lage K, Møllgård K, Greenway S, Wakimoto H, Gorham JM, Workman CT, Bendsen E, Hansen NT, Rigina O, Roque FS, Wiese C, Christoffels VM, Roberts AE, Smoot LB, Pu WT, Donahoe PK, Tommerup N, Brunak S, Seidman CE, Seidman JG, Larsen LA. Mol Syst Biol. 2010 Jun 22;6:381. PMID: 20571530

Deciphering diseases and biological targets for environmental chemicals using toxicogenomics networks. Audouze K, Juncker AS, Roque FJ, Krysiak-Baltyn K, Weinhold N, Taboureau O, Jensen TS, Brunak S. PLoS Comput Biol. 2010 May 20;6(5):e1000788. PMID: 20502671

Protein annotation in the era of personal genomics. Blicher T, Gupta R, Wesolowska A, Jensen LJ, Brunak S. Curr Opin Struct Biol. 2010 Jun;20(3):335-41. Epub 2010 Apr 18. PMID: 20403684

A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, Donahoe PK, Szallasi Z, Jensen TS, Brunak S. Proc Natl Acad Sci U S A. 2008 Dec 30;105(52):20870-5. Epub 2008 Dec 22. PMID: 19104045

A human phenome-interactome network of protein complexes implicated in genetic disorders. Lage K, Karlberg EO, Størling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N, Moreau Y, Brunak S. Nat Biotechnol. 2007 Mar;25(3):309-16. PMID: 17344885