Abstract
The environment's contribution to health has been conceptualized as the exposome. Biomedical research interest in environmental exposures as a determinant of physiopathological processes is rising as such data increasingly become available. The panoply of miniaturized sensing devices now accessible and affordable for individuals to use to monitor a widening range of parameters opens up a new world of research data. Biomedical informatics (BMI) must provide a coherent framework for dealing with multi-scale population data including the phenome, the genome, the exposome, and their interconnections. The combination of these more continuous, comprehensive, and personalized data sources requires new research and development approaches to data management, analysis, and visualization. This article analyzes the implications of a new paradigm for the discipline of BMI, one that recognizes genome, phenome, and exposome data and their intricate interactions as the basis for biomedical research now and for clinical care in the near future.
The opportunities
The phenotype of an individual results from the interplay between the genome (the complete set of genetic information) and the external/environmental elements to which it is exposed.1 The environment's contribution to health has been conceptualized as the exposome, defined as ‘every exposure to which an individual is subjected from conception to death, requiring consideration of the nature of the exposures and their changes and can be considered as internal, specific external and general external.’2
Biomedical research interest in environmental exposures as a determinant of physiopathological processes is rising as such data become increasingly available. The collection of new types of data on microbiomes,3 epigenomics,4 and physiological changes5 is proving very valuable in exposure assessment. Moreover, the panoply of miniaturized sensing devices now accessible and affordable for individuals to use to monitor a widening range of parameters—from clinical parameters such as blood pressure or glucose levels, to environmental parameters such as physical activity, food intake, the ambient temperature, or the presence of pollutants6—opens up a new world of research data. All of these data can be considered relevant for understanding the exposome; their integration and combined analysis looks very promising for advancing biomedical research.7
This situation presents new opportunities for biomedical informatics (BMI) to evolve as a discipline. For most of the 20th century, BMI mainly studied, represented, and analyzed phenotypic information related to health and disease states. In the last 20 years, due to advances in molecular medicine, BMI has started to deal significantly with ‘-omics’ information, and this has had a profound impact on BMI as a discipline.8 Many studies combining phenomic and genomic data, including genome-wide association studies (GWAS), have yielded important results. However, these approaches have also been criticized for their limited capability to explain the mechanisms underlying complex diseases.9 There is also increasing evidence that major determinants of common disease are based on exposure and behaviors.10,11 Now advances in exposome data collection12,13 and processing may be extending BMI again, probably pushing it towards another substantial revision.
A new paradigm for BMI is demanded by the increasing need to deal with inter-related exposome, genome, and phenome data or, as it has been termed, exposure science information.14 Five examples illustrate this point. First, continuous collection of real-time, highly dynamic environmental, genetic, and physiological data is now possible, using the new sensors.15 This is also closely related to the concept of ‘reality mining,’ which refers to the analysis of behavioral and self-reported data extracted from social networks and handheld devices such as mobile phones and applications.16 Second, genetic phenomena such as mosaicism and chimerism (eg, gene therapy, allogenic organ transplant, or intra-tumor cell genome heterogeneity17) reveal that a single individual might be composed of different genomes, adding a dynamic dimension to our previously static view of genomes. Third, epigenetic changes in response to environmental factors involve new probabilistic and multidimensional elements in health and disease.18 Fourth, advances in nanotechnology and its applications in medicine require the consideration of data on nanomaterials and their effects on living cells, as another aspect to be included in exposome informatics.19,20 Fifth, data from the human microbiome21 project sit at the intersection of genome, exposome, and phenome information. Definitions for key concepts are provided in table 1.
Definition of key concepts
| Concept | Definition | Source |
|---|---|---|
| Mosaicism | Condition in which cells within the same person have a different genetic makeup | Medline Plus http://www.nlm.nih.gov/medlineplus/ency/article/001317.htm |
| Epigenetics | Concerns the mechanisms that make organisms or parts of organisms look different, despite the fact they have the same genes and are in the same environment | The Conversation http://theconversation.com/explainer-what-is-epigenetics-13877 |
| Nanomaterial | Materials with at least one external dimension in the size range from approximately 1–100 nanometers | Centers for Disease Control and Prevention http://www.cdc.gov/niosh/docs/2009-125/ |
| Microbiome | Collective genomes of the microbes (composed of bacteria, bacteriophage, fungi, protozoa, and viruses) that live inside and on the human body | National Human Genome Research Institute http://www.genome.gov/27549400 |
These are examples of how the equation ‘Phenotype=Genotype×Environment’ poses enormous challenges to current biomedical research information systems. Current systems show something like a snapshot of the information available at certain stages. In comparison, future information systems for research will have to use new methods to process the flow and mix of data that will generate the coming wave of biomedical information and insights.
This new paradigm for BMI will bring a change in focus as well as in methods, insofar as it realizes the vision of more personalized biomedical research. Traditionally, most available exposure data have been captured through population studies. However, with the new sensors each individual can monitor their own exposures autonomously. Furthermore, new approaches to data integration can support individuals to combine such data with geospatial and behavioral tracking data.22 We have moved into an era when complex data monitoring and handling processes can be driven not only through large formal health research infrastructures, but also by individuals who wish to build their personal understanding of their own health (figure 1).
The challenges
The combination of these more continuous, comprehensive, and personalized data sources requires new BMI research and development approaches to data management, analysis, and visualization. BMI must provide a coherent framework for dealing with multi-scale population data including the phenome, the genome, the exposome, and their interconnections (figure 2). The work involves defining an informatics infrastructure able to handle all of these types of data with a three-fold goal: (i) to perform population-based analysis that improves our knowledge of basic human health behaviors and determinants of common diseases; (ii) to provide data for basic and clinical research that combines phenotype, genotype, and exposure data at the level of the individual; and (iii) to build an augmented, data-rich personal health record which produces personal research results, tracking a person's exposome and giving him or her highly individualized, multi-faceted, disease risk profiles. A number of technical, organizational, and societal challenges have to be faced in implementing this BMI infrastructure to support both institutional and personal research.
New research data types will require changes in biomedical informatics methods.
Let us consider what is involved in dealing with the ‘general external exposome’ (GEE).2 GEE data are generated routinely by everyone who engages in the information society: through communications on mobile phones, movements tracked by transit passes and security cameras, purchases on bank cards, metered household utility consumption, and lifestyle choices reflected in social media, complemented by fixed and wearable sensors for sporting activity, ambulatory care monitoring, and ambient assisted living in smart homes. These data are heterogeneous and selective (variety), they are produced in huge amounts (volume), and they must be processed quickly for optimal use (velocity). An additional crucial dimension of GEE data is time, which is characterized by multiple granularities: the GEE may include signals collected by sensors (on a time scale of seconds, minutes, or hours), lifestyle data such as information on food and nutrition (on a time scale of days or months), and long-term exposure data such as the presence of pollutants (on a time scale of years or decades). In other words, GEE data are not simply ‘big data’23 but time series of big data.
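To make the point about multiple time granularities concrete, the following minimal Python sketch joins three streams (a per-minute sensor signal, a daily lifestyle record, and a yearly pollutant level) on a common daily axis. The stream names, values, and the choice of a daily axis with per-day averaging are illustrative assumptions, not part of any system described in this article.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

# Hypothetical GEE streams at three granularities (all values are invented).
heart_rate = [  # sensor signal, one reading per minute
    (datetime(2014, 3, 1, 8, 0), 62.0),
    (datetime(2014, 3, 1, 8, 1), 64.0),
    (datetime(2014, 3, 2, 8, 0), 70.0),
]
calories = [  # lifestyle data, one entry per day
    (date(2014, 3, 1), 2100.0),
    (date(2014, 3, 2), 1900.0),
]
pm25_yearly = {2014: 14.5}  # long-term exposure, one value per year


def daily_mean(readings):
    """Collapse fine-grained (timestamp, value) readings to one mean per day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def align_daily(signal, lifestyle, long_term):
    """Join the three streams on a common daily time axis.

    Fine-grained signals are averaged per day, and yearly exposure values
    are repeated for every day of that year. Missing values become None.
    """
    signal_by_day = daily_mean(signal)
    lifestyle_by_day = dict(lifestyle)
    days = sorted(set(signal_by_day) | set(lifestyle_by_day))
    return [
        {
            "day": day,
            "heart_rate_mean": signal_by_day.get(day),
            "calories": lifestyle_by_day.get(day),
            "pm25_annual": long_term.get(day.year),
        }
        for day in days
    ]


rows = align_daily(heart_rate, calories, pm25_yearly)
```

Averaging to a daily axis is only one possible choice. It discards within-day variation, so an analysis interested in short-lived exposures would need to keep the finer granularity and align the coarser streams to it instead.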
The very nature of GEE data therefore requires BMI implementation studies of novel informatics architectures that integrate recent data warehousing efforts, such as i2b224 and tranSMART,25 which are aimed at managing phenotypes and molecular data, with NoSQL (Not only SQL) frameworks26 such as CouchDB27 and Cassandra,28 which are naturally scalable and can be implemented in a distributed environment, storing petabytes of data.
BMI also has a critical contribution to make in organizing these data conceptually, relying on a knowledge representation layer, based on suitable domain ontologies. For instance, the unstructured nature of GEE data requires extra effort in cataloging the information sources and the type of queries that can be performed in NoSQL repositories, making metadata essential to assess the quality of evidence that can be extracted from such data by suitable analytics.29
The analytical methods suited to distributed, heterogeneous data are another area that needs particular attention from BMI, both in terms of scope (including information-based correlation analysis, detection of emergent phenomena, visualization, trends, and temporal abstractions) and in terms of computational efficiency.30 Pioneering efforts have already been made in association studies with environmental/genomic/phenomic data,31–36 comprehensive molecular self-monitoring,37 the data collection surveys carried out by some direct-to-consumer genomic-testing companies,38 and previous epidemiological studies. However, those approaches lack the comprehensive treatment of data that is proposed here, namely coverage of individual exposure data facilitated by new technologies and sensors.
Last but not least, the design and implementation of a global BMI infrastructure for GEE data raises fundamental issues of security, privacy, and national and international legal compliance. These issues are related to the three-fold goal that a GEE-enabled biomedical research information system may pursue.
In the first case, of population-based analysis, the main concern is the implementation of a secure and reliable system for data gathering and data anonymization, that is, permanently and completely removing personal identifiers from data so that they can no longer be re-associated with an individual in any manner. This is a true challenge given the nature of GEE data, but could be achieved by providing aggregated data as advocated by the European Union eHealth Taskforce under the theme ‘Liberate the data.’39
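As an illustration of what providing aggregated data could look like, the sketch below counts records per group and publishes only the counts, with no personal identifiers. Dropping groups below a minimum size is an additional safeguard assumed here, and the threshold of 5, the field names, and the records are all hypothetical. This is one simple technique and does not by itself meet the strict definition of anonymization given above.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed suppression threshold, not taken from the article


def aggregate_counts(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per combination of `keys` and drop small groups.

    Only group-level counts are returned, so identifiers such as
    `person_id` never leave this function.
    """
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {group: n for group, n in counts.items() if n >= min_cell_size}


# Hypothetical individual-level records (invented for illustration).
records = [
    {"person_id": i, "region": "north", "noise_band": "high"} for i in range(6)
] + [
    {"person_id": 100, "region": "south", "noise_band": "low"},
]

table = aggregate_counts(records, ["region", "noise_band"])
```

Here the single record from the south region is suppressed because a group of one would point directly at an individual, while the six records from the north region are published as a count.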
A second, more complex issue is also one whose resolution is potentially much more valuable. This entails the definition of up-to-date strategies and policies for managing GEE data for clinical research at the individual level, even if de-identified, within the proper biomedical research governance infrastructure, including careful management of informed consent and risk management.40
Lastly, a cornerstone of a GEE-enabled biomedical research information system is the issue of building and maintaining a personal health record capable of including all clinical, genetic, and exposome data in a virtual repository. This must be under the ultimate control of ‘participatory biocitizens,’41 who may grant access for clinical care, clinical research, or epidemiological studies on a ‘my data my decision’ basis.39
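The ‘my data my decision’ principle can be sketched as a record whose owner grants and revokes access per requester and per purpose. The class, method names, and requester identifiers below are hypothetical and chosen only for illustration. A real system would also need authentication, auditing, and legal compliance, which this sketch omits.

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    """Purposes of access named in the text."""
    CLINICAL_CARE = "clinical care"
    CLINICAL_RESEARCH = "clinical research"
    EPIDEMIOLOGY = "epidemiological studies"


@dataclass
class PersonalHealthRecord:
    """Toy virtual repository controlled by its owner (hypothetical design)."""
    owner: str
    data: dict = field(default_factory=dict)   # category -> list of entries
    grants: set = field(default_factory=set)   # {(requester, Purpose)}

    def grant(self, requester, purpose):
        """Owner allows `requester` to read the record for `purpose`."""
        self.grants.add((requester, purpose))

    def revoke(self, requester, purpose):
        """Owner withdraws a previously given permission, if any."""
        self.grants.discard((requester, purpose))

    def read(self, requester, purpose, category):
        """Return entries of `category` if the requester is allowed to see them."""
        allowed = requester == self.owner or (requester, purpose) in self.grants
        if not allowed:
            raise PermissionError(
                f"{requester} has no grant for {purpose.value}"
            )
        return list(self.data.get(category, []))


record = PersonalHealthRecord(owner="alice", data={"exposome": ["noise: 70 dB"]})
record.grant("research-team", Purpose.CLINICAL_RESEARCH)
entries = record.read("research-team", Purpose.CLINICAL_RESEARCH, "exposome")
```

Keying each grant on both the requester and the purpose means that consent given for clinical research does not silently extend to epidemiological studies or clinical care.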
Ways forward
In this article we have focused only on GEE, the first of Wild's2 three categories of exposures, but the complexity and volume of data exponentially increase when we incorporate the other two categories (table 2).
Examples of the data of interest for future information systems
| Group | Subgroup | Measure |
|---|---|---|
| Exposome | General external | Climate |
| | | Education |
| | | Socio-economical aspects |
| | | Natural and built environment |
| | Specific external | Noise, humidity, CO, NOx, temperature, O3, radiation, particulate matter |
| | | Medication, nanomaterials, medical procedures |
| | | Sedentary behaviors, physical activity |
| | | Smoking, diet, sleep, alcohol consumption |
| | | Infectious agents |
| | Internal | Metabolites, hormones, oxidative stress, inflammation |
| Phenome | Molecular traits | Gene expression, proteomics |
| | | Lipids, HDL, triglycerides |
| | Cellular traits | Signaling pathways |
| | | Cell cycle, apoptosis |
| | | Cell migration |
| | Tissue/organ traits | Organ malformations, morphology, medical imaging |
| | | Blood pressure |
| | Organismal traits | Body mass index, weight, height |
| | Disease phenotypes | Pathologies |
| | Behavior | Stress, mood |
| | Endophenotypes | Cholesterol, immunoglobulins |
| Genome | Sequence information | Whole genome, exome |
| | Genomic variation | Single nucleotide variants (SNPs, mutations, …), structural variants (CNVs, In/Dels, …) |
| | Haplotypes | Blocks of variants |
| | Epigenomics | Methylation profiles |
Moreover, the internal exposome category (eg, metabolism, hormones, oxidative stress) can be measured using molecular biomarkers, reinforcing the points this article makes about data. Furthermore, these data too can be collected not only through sophisticated equipment available in institutions, but also through personalized, real-time, continuous input from affordable devices and DIY services.
It is worth noting that Wild's classification looks at the problem mainly from the data collection angle. BMI can provide not only instruments for data analysis but also tools for data representation and storage, which allow a clear description of the information and its consequent integration into an information system. For example, well-known disease nosology systems that include behavior and exposures, such as SNOMED, provide clean ways to describe exposure factors, albeit orthogonal to Wild's view, by offering different axes of classification (ie, Living organisms; Physical agents, activities, and forces; Chemical, drugs and biological products). Such axes may then be properly exploited when the exposome is fitted into, for example, an electronic medical record.
What are the implications of a new paradigm for the discipline of BMI, one that recognizes genome, phenome, and exposome data, and their intricate interactions as the basis for biomedical research now and for clinical care in the near future?
The new generation of researchers in BMI should be familiar with the main methods and technological solutions required for the management of these new types of data (including big data, sensors, privacy and security, ontologies, systems analysis, and advanced visualization including geospatial systems). The new data types and sources may complement other studies and provide insights that are useful to understand the risks and the causes of the development of disease phenotypes. This has important consequences for the way we design BMI training programs and for the way we structure and specify the underlying competencies of experts in the discipline. In connection with this, the organization of BMI forums for professional development and knowledge exchange may need review to ensure sufficient scope for both established and new topics and themes.
The development of new information systems capable of linking these new data types and sources with personal health records could entrench recognition of the role of BMI expertise within other areas of biomedical research and development. BMI has the potential and the tools, including a collection of ontologies, terminologies, and standards, to deal with such a challenge. There will be a growing expectation that biomedical research will routinely include the design, implementation, and evaluation of comprehensive data-rich environments in which to investigate the causative elements associated with pathologies, to improve risk profiling, and so to contribute to advancing preventive medicine. To our knowledge, no one is yet fully engaged in realizing the vision proposed in this article, although recent initiatives will probably require many of the elements described herein.42,43
Lastly, the way we think about the contribution of BMI as a discipline will need to take account of the new insights that the exposome will bring into the connections between human health and the health of the biosphere. BMI may increasingly support shared decision making in settings beyond traditional health sciences.
Contributors
FMS designed the general structure of the paper, prepared the figures and table, contributed to writing the ‘Ways forward’ section, and reviewed the paper. KG critically revised the paper and edited it. RB wrote the ‘The challenges’ section, provided technical oversight, and reviewed the paper. GLC contributed to writing the ‘The opportunities’ section and provided scientific oversight.
Competing interests
None.
Provenance and peer review
Not commissioned; externally peer reviewed.

