Abstract

Summary: snp.plotter is a newly developed R package which produces high-quality plots of results from genetic association studies. The main features of the package include options to display a linkage disequilibrium (LD) plot below the P-value plot using either the r² or D′ LD metric, to set the X-axis to equal spacing or to use the physical map of markers, and to specify plot labels, colors, symbols and LD heatmap color scheme. snp.plotter can plot single SNP and/or haplotype data and simultaneously plot multiple sets of results. R is a free software environment for statistical computing and graphics available for most platforms. The proposed package provides a simple way to convey both association and LD information in a single appealing graphic for genetic association studies.

Availability: Downloadable R package and example datasets are available at http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html and http://www.r-project.org

Contact: nicodemusk@mail.nih.gov

1 INTRODUCTION

Genetic association studies have been an important strategy for identifying susceptibility genes for a range of diseases including Alzheimer disease, deep vein thrombosis, inflammatory bowel disease, hypertriglyceridemia, diabetes and schizophrenia (Morton, 2005). Single nucleotide polymorphisms (SNPs) are often used to test for a statistical association between a disease phenotype and single markers or multiple markers via haplotype-based analyses. SNPs may be tightly linked and exhibit correlation or linkage disequilibrium (LD). Knowledge of LD aids in the selection of SNPs and haplotypes to be examined for association with a disease (Abecasis et al., 2005) and in localizing a putative causal variant. Given the importance of LD to genetic association studies, researchers often plot the results of association studies in relation to LD present in the chromosomal region or gene examined. However, researchers often create the LD plot and association result plot separately using different software, which can lead to difficulty in aligning the two plots, making the resulting graphic unclear. We propose snp.plotter, which produces Portable Document Format (PDF) or Encapsulated Postscript (EPS) images of genetic association results using single SNP and/or haplotype data with a corresponding LD heatmap in one correctly aligned graphic.
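To make the two LD metrics concrete, the following is a minimal R sketch of the standard textbook definitions of D, D′ and r² for a pair of biallelic SNPs, computed from known haplotype and allele frequencies. The function name ld_metrics and the example frequencies are hypothetical; this is not the estimator used inside snp.plotter or its dependencies, which work from genotype data.

```r
# Pairwise LD for two biallelic SNPs, given known frequencies.
# p.ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
# p.a, p.b: frequencies of allele A and allele B
ld_metrics <- function(p.ab, p.a, p.b) {
  # Raw disequilibrium coefficient
  d <- p.ab - p.a * p.b
  # Largest |D| attainable for these allele frequencies, given the sign of D
  d.max <- if (d >= 0) {
    min(p.a * (1 - p.b), (1 - p.a) * p.b)
  } else {
    min(p.a * p.b, (1 - p.a) * (1 - p.b))
  }
  d.prime <- if (d.max > 0) abs(d) / d.max else NA_real_
  # Squared correlation between the two loci
  r2 <- d^2 / (p.a * (1 - p.a) * p.b * (1 - p.b))
  c(D = d, Dprime = d.prime, r2 = r2)
}

# Hypothetical frequencies chosen only for illustration
ld.example <- ld_metrics(p.ab = 0.4, p.a = 0.5, p.b = 0.5)
print(round(ld.example, 3))
```

With these illustrative frequencies, D′ is larger than r², which is typical: D′ is rescaled by the maximum attainable D, whereas r² is a squared correlation.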

2 SOFTWARE OVERVIEW

snp.plotter is a package for R, the freely available statistical computing and graphics environment, which is available for several platforms including Windows, MacOS and UNIX/Linux (R Development Core Team, 2006). Nearly all aspects of the images produced by snp.plotter are customizable, including labels, symbols, colors and color schemes, LD metric, graph P-value threshold, Y-axis scale, and lines corresponding to user-specified P-value thresholds. snp.plotter can visualize multiple sets of SNP and haplotype association results. Haplotype results can be plotted using global and/or individual haplotype P-values. P-values may be plotted using physical spacing or even spacing; even spacing helps to separate results in regions with dense SNP maps. Figures are produced in two print sizes (3.5 and 7 inches) corresponding to one and two columns, respectively, on a printed page in resolution-independent formats (PDF and EPS) for ease of use in manuscript preparation. snp.plotter figures can be easily imported into LaTeX documents, and because the formats are resolution-independent, figures can be converted into raster image formats such as JPG, PNG and BMP without a loss in quality.
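The difference between physical and even spacing can be sketched in a few lines of R. The positions below are illustrative base-pair coordinates, and the variable names are hypothetical; this shows only the idea of the two X-axis layouts, not the package's internal code.

```r
# Illustrative base-pair positions for four SNPs (assumed values)
loc <- c(126272509, 126274467, 126275017, 126275750)

# Physical spacing: X positions proportional to map distance, rescaled to [0, 1]
x.physical <- (loc - min(loc)) / (max(loc) - min(loc))

# Even spacing: the same SNP order, but equal gaps between neighbours
x.even <- seq(0, 1, length.out = length(loc))

print(round(rbind(physical = x.physical, even = x.even), 3))
```

Under physical spacing, closely neighbouring SNPs sit nearer to each other on the axis than under even spacing, which is why even spacing is easier to read in dense regions.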

3 DATA INPUT

snp.plotter uses four types of input files: a configuration file, a single SNP result file and a haplotype result file for each result set, and a genotype data file; all files are plain-text and tab-delimited. The configuration file is the preferred method of running snp.plotter because it allows users to save preferred settings and avoids the difficulty of writing extended R commands.

SNP.FILE=snp20_ss.txt,snp20_ss2.txt
HAP.FILE=snp20_haplo.txt,snp20_haplo2.txt
GENOTYPE.FILE=snp20_geno.txt
DISP.LDMAP=TRUE
COLOR.LIST=blue,red
SYMBOLS=circle-fill,square
LD.TYPE=rsquare
IMAGE.TYPE=pdf
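A configuration file such as the one above can be written from R. The sketch below uses only the keys and values shown in the example, writes them to a temporary directory, and reads them back into a named list as a sanity check. The KEY=VALUE parsing is a hypothetical helper for checking the file, not the way snp.plotter itself reads it.

```r
# Settings copied from the example configuration file
config.lines <- c(
  "SNP.FILE=snp20_ss.txt,snp20_ss2.txt",
  "HAP.FILE=snp20_haplo.txt,snp20_haplo2.txt",
  "GENOTYPE.FILE=snp20_geno.txt",
  "DISP.LDMAP=TRUE",
  "COLOR.LIST=blue,red",
  "SYMBOLS=circle-fill,square",
  "LD.TYPE=rsquare",
  "IMAGE.TYPE=pdf"
)

# Write to a temporary location so the sketch has no side effects
config.path <- file.path(tempdir(), "config.txt")
writeLines(config.lines, config.path)

# Read back: split each line at "=", then split comma-separated values
parts <- strsplit(readLines(config.path), "=", fixed = TRUE)
config <- setNames(
  lapply(parts, function(p) strsplit(p[2], ",", fixed = TRUE)[[1]]),
  vapply(parts, function(p) p[1], character(1))
)
str(config)
```

Comma-separated values line up by position: the first SNP file, haplotype file, color and symbol belong to the first result set, and the second of each to the second result set.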

The single SNP result set, SNP.FILE, includes four necessary columns: ASSOC, SNP.NAME, LOC and SS.PVAL, corresponding to positive or negative association (indicating susceptibility or protective alleles), a SNP label, the location and a P-value for each SNP.

ASSOC  NAME  LOC        SS.PVAL
+      rs1   126272509  0.065
       rs2   126274467  0.029
+      rs3   126275017  0.046
       rs4   126275750  0.005
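As a rough illustration, a single SNP result file with the four columns named in the text can be produced with write.table. The positions and P-values are those of the example table. The "-" code for negative association is an assumption, since the table above shows only "+" and empty cells, and the file name snp_example.txt is hypothetical.

```r
# Example single SNP results; column names follow the text above.
# ASSUMPTION: "-" marks a negative association (not confirmed by the source).
snp <- data.frame(
  ASSOC    = c("+", "-", "+", "-"),
  SNP.NAME = c("rs1", "rs2", "rs3", "rs4"),
  LOC      = c(126272509, 126274467, 126275017, 126275750),
  SS.PVAL  = c(0.065, 0.029, 0.046, 0.005),
  stringsAsFactors = FALSE
)

# Write a plain-text, tab-delimited file with a header row
snp.path <- file.path(tempdir(), "snp_example.txt")
write.table(snp, snp.path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and run basic checks before plotting
snp.check <- read.delim(
  snp.path,
  colClasses = c("character", "character", "numeric", "numeric")
)
stopifnot(all(c("ASSOC", "SNP.NAME", "LOC", "SS.PVAL") %in% names(snp.check)))
stopifnot(all(snp.check$SS.PVAL > 0 & snp.check$SS.PVAL <= 1))
stopifnot(!is.unsorted(snp.check$LOC))
```

Checking the column names and P-value range up front makes formatting problems easier to diagnose than an error raised during plotting.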
 

Haplotypes are specified using three necessary columns: ASSOC, GBL.PVAL and IND.PVAL, corresponding to positive or negative association, a global P-value, and an individual P-value for each haplotype followed by a set of columns of SNPs containing the corresponding haplotypes. Haplotypes are presented with the major allele given as 1 and the minor allele as 2; haplotype variants for a set of SNPs should be grouped together in the file. SNP labels in HAP.FILE must be the same as in SNP.FILE, and only SNPs with corresponding haplotypes need to be included.

ASSOC  G.PVAL  I.PVAL  rs1  rs2  rs3  rs4
       0.015   0.004   1    1    1
+      0.015   0.062   1    2    2
+      0.075   0.079   1    1    1
+      0.075   0.039   2    2    2
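The requirement that SNP labels in HAP.FILE match those in SNP.FILE can be checked before running the package. The helper below is a hypothetical sketch: it takes the column names of a haplotype file and the SNP labels of a single SNP file, and returns any haplotype SNP columns that have no counterpart. The required column names are those given in the text.

```r
# Return haplotype SNP columns that do not appear among the SNP labels.
# hap.columns: column names of the haplotype file
# snp.names:   SNP labels from the single SNP file
unmatched_hap_labels <- function(hap.columns, snp.names,
                                 required = c("ASSOC", "GBL.PVAL", "IND.PVAL")) {
  missing.required <- setdiff(required, hap.columns)
  if (length(missing.required) > 0) {
    stop("Missing required column(s): ",
         paste(missing.required, collapse = ", "))
  }
  # Everything after the required columns is treated as a SNP column
  setdiff(setdiff(hap.columns, required), snp.names)
}

snp.names   <- c("rs1", "rs2", "rs3", "rs4")
hap.columns <- c("ASSOC", "GBL.PVAL", "IND.PVAL", "rs1", "rs2", "rs3", "rs4")

# No unmatched labels for the example header
unmatched.ok <- unmatched_hap_labels(hap.columns, snp.names)

# A label absent from the SNP file ("rs9" is made up) is reported
unmatched.bad <- unmatched_hap_labels(c(hap.columns, "rs9"), snp.names)
print(unmatched.bad)
```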
 

Genotype data are formatted in modified LINKAGE format pedigree files; this marker information is used in the creation of LD plots and may be based on the controls from a case-control study or the founders in a family-based study. An optional file type can be used to specify color schemes for LD plots; PALETTE.FILE colors are hexadecimal HTML color codes with one color per line. The first and last colors correspond to the lowest and highest value of the chosen LD metric, respectively.
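A palette file of this kind can be generated in R. The sketch below uses grDevices::colorRampPalette to interpolate five colors from white (lowest LD) to red (highest LD) and writes one hexadecimal code per line. The choice of colors, the number of steps, the file name palette.txt and the leading "#" on each code are assumptions; the exact form expected by PALETTE.FILE should be confirmed against the package documentation.

```r
# Interpolate five colors from white (lowest LD) to red (highest LD)
ld.palette <- grDevices::colorRampPalette(c("white", "red"))(5)

# One hexadecimal color code per line, lowest value first
palette.path <- file.path(tempdir(), "palette.txt")
writeLines(ld.palette, palette.path)

print(readLines(palette.path))
```

Because the first and last lines map to the extremes of the LD metric, reversing the vector with rev(ld.palette) before writing would invert the color scale.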

4 snp.plotter USAGE

The package makes use of the grid graphics package for creation and placement of individual graphic elements, and the genetics package is used for the calculation of linkage disequilibrium (Warnes and Leisch, 2005). Modified code from the LDheatmap package is used to create an LD heatmap (Shin et al., 2006). Once snp.plotter and its dependencies are installed, snp.plotter can be loaded into R using this command:

library(snp.plotter)

snp.plotter is then run using the following command; this command produces the desired figure in the current working directory:

snp.plotter(config.file="config.txt")

In addition, an optional web interface for snp.plotter, built on the Rpad R package, is available for download. The web interface is best suited to intranet environments because users have complete access to any command in R and any system command (Short and Grosjean, 2006). snp.plotter must be installed on the machine running Rpad. Instructions for server deployment are presented on the Rpad website. The interface includes the majority of features but is limited to one result set. The snp.plotter interface can be extended with basic knowledge of HTML and R to change the options presented or to perform additional analyses the researcher may require.

5 EXAMPLE

The HapMap Project catalogs SNPs from populations with African, Asian and European ancestry (The International HapMap Consortium, 2005). Sample data for 20 SNPs were obtained from HapMap, and two case-control populations with 500 cases and 500 controls were simulated using the Simulation of Haplotype Heterogeneity, Interaction and Population Stratification (SH2IPS) R package (Nicodemus and Luna, 2006). Logistic regression was used to determine association of each SNP with the disease phenotype. Haplotypes were analyzed using haplo.stats to evaluate disease association of haplotypes using a 3-SNP sliding window (Schaid et al., 2002). The results are presented in Figure 1 using snp.plotter. Single SNP and global haplotype P-values are shown for the two populations; the adjoining LD plot uses the r² metric.
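The single SNP step of such an analysis can be sketched with base R. The example below simulates one SNP for 500 cases and 500 controls and fits a logistic regression with glm. The allele frequencies, the additive (0/1/2) genotype coding and the random seed are assumptions made for illustration; the data are a toy simulation and do not reproduce the SH2IPS simulation or the published results.

```r
set.seed(1)

n.cases <- 500
n.controls <- 500

# Disease status: 1 = case, 0 = control
status <- rep(c(1, 0), times = c(n.cases, n.controls))

# Minor-allele counts (0, 1 or 2) under an assumed additive coding;
# the minor allele is made more frequent in cases (0.35 vs 0.25)
genotype <- c(rbinom(n.cases, 2, 0.35), rbinom(n.controls, 2, 0.25))

# Logistic regression of disease status on genotype
fit <- glm(status ~ genotype, family = binomial)
coefs <- summary(fit)$coefficients

# P-value and direction of effect for this SNP
snp.pval <- coefs["genotype", "Pr(>|z|)"]
snp.estimate <- coefs["genotype", "Estimate"]

cat("P-value:", signif(snp.pval, 3), "\n")
cat("Log odds ratio per minor allele:", round(snp.estimate, 3), "\n")
```

The P-value would go in the SS.PVAL column of a single SNP result file, and the sign of the estimate indicates whether the association is positive or negative.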

Fig. 1.

Association results and LD presented using snp.plotter of two simulated populations using data obtained from HapMap; haplotypes are indicated by sample symbols connected by a solid line, and single SNPs are represented by single symbols. Dotted lines represent P-value thresholds.


ACKNOWLEDGEMENTS

We are grateful to Dr Daniel Weinberger, Dr Steven Huffaker, and Anushka Aqil for comments and feedback and to Dr Richard Coppola for help with Rpad.

Conflict of Interest: none declared.

REFERENCES

Abecasis GR, et al. Linkage disequilibrium: ancient history drives the new genetics. Hum. Hered., 2005, vol. 59 (pg. 118-124)
The International HapMap Consortium. A haplotype map of the human genome. Nature, 2005, vol. 437 (pg. 1299-1320)
Morton NE. Linkage disequilibrium maps and association mapping. J. Clin. Invest., 2005, vol. 115 (pg. 1425-1430)
Nicodemus KK, Luna A. Simulation of haplotype heterogeneity, interaction, and population stratification. R package version 1.0, 2006
R Development Core Team. R: a language and environment for statistical computing, 2006
Schaid DJ, et al. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet., 2002, vol. 70 (pg. 425-434)
Shin J, et al. LDheatmap: Graphical display of pairwise linkage disequilibria between SNPs. R package version 0.2, 2006
Short T, Grosjean P. Rpad: workbook-style, web-based interface to R. R package version 1.1.1, 2006
Warnes G, Leisch F. Genetics: population genetics. R package version 1.2.0, 2005

Author notes

Associate Editor: Martin Bishop
