Summary: The package adegenet for the R software is dedicated to the multivariate analysis of genetic markers. It extends the ade4 package of multivariate methods by implementing formal classes and functions to manipulate and analyse genetic markers. Data can be imported from common population genetics software and exported to other software and R packages. adegenet also implements standard population genetics tools along with more original approaches for spatial genetics and hybridization.
Availability: The stable version is available from CRAN: http://cran.r-project.org/mirrors.html. The development version is available from the adegenet website: http://adegenet.r-forge.r-project.org/. Both versions can be installed directly from R. adegenet is distributed under the GNU General Public Licence (v.2).
Supplementary information: Supplementary data are available at Bioinformatics online.
Genetic markers are now widely used in many fields of population biology, and can be analysed using various approaches. Among these, multivariate methods such as principal component analysis (PCA) have an important role to play because they can summarize the genetic variability without making strong assumptions about an evolutionary model: they do not rely on Hardy–Weinberg equilibrium, nor do they suppose the absence of linkage disequilibrium. This is especially valuable when little or no information is available about the system under study, as is frequent in landscape genetics (Manel et al., 2003). Recently, multivariate methods have proven useful to assess the consensus genetic structuring among a set of genetic markers (Laloë et al., 2007), as well as to investigate the spatial pattern of the genetic variability (Jombart et al., in press). However, the multivariate methods currently available in population genetics software are very limited, despite the fairly large number of these programs (Excoffier and Heckel, 2006). An exception is the R software (R Development Core Team, 2008), which contains both packages devoted to multivariate methods, like ade4 (Chessel et al., 2004; Dray et al., 2007), and packages dedicated to the analysis of genetic markers (http://cran.r-project.org/web/views/Genetics.html). However, there is currently no bridge between the multivariate analysis packages and the genetic marker packages, so genetic marker data cannot be readily analysed using multivariate approaches.
The purpose of adegenet is to build this connection. The package extends ade4 so that genetic markers can be analysed using multivariate methods. This is achieved by defining new classes of objects to represent genetic markers, and by providing functions to import, export and manipulate these objects. Moreover, adegenet also implements some standard population genetics methods, as well as more original tools for spatial genetics and data simulation. This article presents an overview of these functionalities.
2.1 Data representation
Basic genetic marker data consist of genotypes obtained for a set of markers, each allele being coded by a character string (Warnes, 2003). Such information cannot be used directly by statistical methods, and must first be recoded numerically into a matrix of allelic frequencies. In adegenet, allelic frequencies of genotypes are stored inside objects of the class
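The recoding of character-coded genotypes into a matrix of allelic frequencies can be sketched in base R. This is a minimal illustration and not the package's own implementation: the toy genotypes, the "/" allele separator and the helper name recode_locus are all assumptions made for the example.

```r
# Toy diploid genotypes: 4 individuals x 2 loci, alleles separated by "/".
# All names and values are invented for illustration.
geno <- data.frame(
  locA = c("120/120", "120/124", "124/124", "120/124"),
  locB = c("a/b", "b/b", "a/a", NA),
  row.names = paste0("ind", 1:4),
  stringsAsFactors = FALSE
)

# Recode one locus into per-individual allele frequencies
# (hypothetical helper, not an adegenet function).
recode_locus <- function(x, locus) {
  alleles <- strsplit(x, "/", fixed = TRUE)       # one allele vector per individual
  levs <- sort(unique(unlist(alleles)))           # observed alleles (sort drops NA)
  freq <- do.call(rbind, lapply(alleles, function(a) {
    if (anyNA(a)) return(rep(NA_real_, length(levs)))  # missing genotype stays NA
    as.numeric(table(factor(a, levels = levs))) / length(a)
  }))
  colnames(freq) <- paste(locus, levs, sep = ".")
  freq
}

# Bind the per-locus matrices into one individuals x alleles table.
tab <- do.call(cbind, unname(Map(recode_locus, geno, names(geno))))
rownames(tab) <- rownames(geno)
print(tab)
```

Each row of the resulting table describes one genotype, and within each locus the frequencies of a non-missing genotype sum to one. A heterozygote therefore has two entries of 0.5 at that locus, and a homozygote a single entry of 1.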
Great attention was devoted to developing input/output functions, because interoperability of data is crucial to facilitate data analysis. Until now, data could only be imported into R from FSTAT (Goudet, 2002) using the hierfstat package (Goudet, 2005). Currently, adegenet can read files from the software GENETIX (Belkhir et al., 1996–2004), STRUCTURE (Pritchard et al., 2000), FSTAT (Goudet, 2002), and Genepop (Raymond and Rousset, 1995), which are among the most common data formats in population genetics software (Excoffier and Heckel, 2006). Data can also be read inside R from a
To perform analyses at a population level, a
The last goal of adegenet is to implement more original methods, either by extending existing ones, or by proposing new methods. Hybridization between individuals from two
This example illustrates how a theoretical hybrid population would appear on a typology provided by a multivariate method. First, we load the required packages, and the dataset
To simulate a hybrid population, two parent breeds (Salers and Zebu) are isolated:
The hybrid population (‘Zebler’) is obtained using the
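The principle behind such a simulation can be sketched in base R. The sketch below is not the package's function. It assumes that each hybrid receives one gamete from each parental population, that alleles are drawn independently at each locus from the parental allele frequencies, and that the frequencies shown (which are invented) are known. The helper names draw_gametes and simulate_hybrids are hypothetical.

```r
set.seed(1)  # make the toy example reproducible

# Invented allele frequencies at two loci in two parental populations.
freq_p1 <- list(locA = c("120" = 0.9, "124" = 0.1), locB = c(a = 0.8, b = 0.2))
freq_p2 <- list(locA = c("120" = 0.2, "124" = 0.8), locB = c(a = 0.1, b = 0.9))

# Draw n gametes (one allele per locus) from a list of allele frequencies.
# Returns an n x loci character matrix.
draw_gametes <- function(freq, n) {
  g <- lapply(freq, function(p) sample(names(p), n, replace = TRUE, prob = p))
  matrix(unlist(g), nrow = n, dimnames = list(NULL, names(freq)))
}

# Combine one gamete from each parental population into diploid genotypes.
simulate_hybrids <- function(f1, f2, n) {
  g1 <- draw_gametes(f1, n)
  g2 <- draw_gametes(f2, n)
  matrix(paste(g1, g2, sep = "/"), nrow = n,
         dimnames = list(paste0("hyb", seq_len(n)), names(f1)))
}

hybrids <- simulate_hybrids(freq_p1, freq_p2, n = 5)
print(hybrids)
```

In practice the parental allele frequencies would be estimated from the observed genotypes of the two breeds rather than fixed by hand.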
Now we seek a typology displaying the diversity between breeds. For this, the inter-class PCA (Dolédec et al., 1987) is appropriate: this modification of PCA maximizes the variance between populations (here, breeds), instead of the total variance. Missing data are replaced (
The resulting typology (Fig. 1) is obtained by:
The first principal axis of the analysis (Fig. 1) differentiates African and French breeds, while the second axis expresses the genetic variability between African breeds. Interestingly enough, the simulated hybrid population (Zebler) appears between its parent populations (Salers and Zebu).
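The idea of maximizing the variance between groups rather than the total variance can be illustrated with a short base R sketch. It does not reproduce the analysis above: the data are random numbers standing in for allele frequencies, the function name between_pca is hypothetical, and the code assumes a complete numeric table (no missing values) with groups weighted by their sizes.

```r
set.seed(42)

# Toy data: 30 individuals in 3 groups, 4 numeric variables.
grp <- factor(rep(c("A", "B", "C"), each = 10))
X <- matrix(rnorm(30 * 4), nrow = 30)
X[grp == "B", 1] <- X[grp == "B", 1] + 3   # shift group B on variable 1
X[grp == "C", 2] <- X[grp == "C", 2] + 3   # shift group C on variable 2

# Between-class PCA: eigen-decomposition of the weighted covariance
# of the group means, followed by projection of the individuals.
between_pca <- function(X, grp) {
  grp <- droplevels(as.factor(grp))
  Xc <- scale(X, center = TRUE, scale = FALSE)   # centre each column
  n_g <- as.numeric(table(grp))                  # group sizes, in level order
  w <- n_g / sum(n_g)                            # group weights
  M <- rowsum(Xc, grp) / n_g                     # group means (groups x variables)
  B <- crossprod(M * sqrt(w))                    # between-group covariance matrix
  eig <- eigen(B, symmetric = TRUE)
  k <- nlevels(grp) - 1                          # at most (groups - 1) axes
  axes <- eig$vectors[, seq_len(k), drop = FALSE]
  list(eig = eig$values[seq_len(k)],
       group_scores = M %*% axes,                # coordinates of the group means
       ind_scores = Xc %*% axes)                 # individuals projected on the axes
}

res <- between_pca(X, grp)
print(res$eig)
plot(res$ind_scores, col = as.integer(grp), pch = 19,
     xlab = "Axis 1", ylab = "Axis 2")
```

Each eigenvalue equals the weighted variance of the group means along the corresponding axis, which is the quantity the method maximizes. Projecting the individuals onto these axes gives a scatterplot in which groups are separated as much as possible.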
The first contribution of the R package adegenet is to implement classes and functions to facilitate the multivariate analysis of genetic markers. This led to the definition of new formal classes for genotypes (
The author is grateful to R-Forge for hosting adegenet, to P. Sólymos for his contribution and to A.-B. Dufour, S. Devillard, D. Laloë and D. Pontier for their constructive comments.
Conflict of Interest: none declared.