Two classes of package user might be interested in this book: users who are happy with their package but fancy a few extensions in R to add novel analyses or to check results, and users who fancy a completely new approach and perhaps want to start programming from scratch. According to the preface, readers will become able to read data into R, manage their data, create publication-quality graphs and run basic analyses. The claim is,

‘If you know SAS or SPSS, reading this book should be a nice leisurely step before diving into a book [on programming in R]’.

Regrettably, as a leisure experience, this seemed to me nearer to being stuck in a bank-holiday traffic jam.

Having used SPSS since 1973, and watched with interest the growing use of R without yet participating, I would put myself in the first class of users. It would be very useful at times to extend SPSS and to take advantage of the techniques that are contributed through R libraries. Users of SAS, and of other statistics packages, will hopefully see parallels. SPSS offers a convenient link through the SPSS-R integration package from the company Web site. Even this revealed the capacity of software to irritate. Install SPSS (version 17.0); fetch the latest R (version 2.8); then run the integration package. Oops, it will not accept any R later than version 2.7. So uninstall R, fetch the previous version, and rerun the integration package.

Package users may prefer to use R through a graphical user interface such as R Commander. Although this receives one mention, the book focuses exclusively on the core R language and selected libraries.

The first few pages introduce R, noting that its workspace can contain objects of various types and that an SPSS or SAS data set corresponds to a data frame. Then some examples of functions and output manipulation hint at the command structure, but there is no overview or philosophy of object-oriented programming. The suggestion,

‘You can use R while knowing very little about it’,

appears refuted by each sudden introduction of a library to get round an apparent deficiency. I would recommend reading the documents that come with the R installation, such as ‘An introduction to R’, in parallel with the early chapters.

Further chapters plunge into examples for getting data into a data frame and various data manipulations. About 250 pages work through features of SAS and SPSS and show how they can be emulated in R. Why would an experienced package user not follow the initial advice, prepare their data set as usual, and call R functions only for new stuff? In addition, too often the impression is left that the R approach is fragile and dangerous. I hope that I am not just picking sour cherries in noting the following.

‘Occasionally two packages will have functions with the same name. That can be very confusing until you realise what is happening’ (page 14).

‘Help files are written for intermediate to advanced users [and] can be somewhat intimidating at first’ (page 41).

‘Variable selection in R is both more flexible and quite a bit more complex’ (page 103).

‘WARNING! The attach function is hazardous to variable creation’ (page 149).

‘The rbind function needs both data frames to have the exact same variable names. Luckily, Hadley Wickham’s reshape package has a function that binds whichever variables it finds that match and fills in missing values for those that do not [just like SAS or SPSS]’ (page 188).

‘… but also occasionally leads to confusion among names. Note that R comes with a function named reshape, and the Hmisc package has one named reShape … but the reshape package is the one we are using’ (page 219).

‘Merging aggregates with original data … is an area in which R has a distinct advantage over SAS and SPSS [by doing] both multilevel calculations and selections in a single step. … [In SAS and SPSS] you would have to create the aggregate-level data and then merge it back into the individual-level dataset. R can use this approach too, and as the number of levels increases, it becomes more reasonable to do so’ (pages 198–199).
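To make the contrast in that last quotation concrete, here is a minimal base R sketch. It is not taken from the book: the data frame and its names (scores, group, score) are invented for illustration, and ave() is assumed here as one way to do the single-step calculation, which may not be the function that the book uses.

```r
# Hypothetical data, invented for illustration only.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 4, 6)
)

# Single step: ave() returns the group mean aligned to every original row.
scores$group_mean <- ave(scores$score, scores$group, FUN = mean)

# Two steps, as one would do in SAS or SPSS: aggregate, then merge back.
means <- aggregate(score ~ group, data = scores, FUN = mean)
names(means)[names(means) == "score"] <- "group_mean2"
merged <- merge(scores, means, by = "group")

# Selection that uses the group-level value: rows above their own group mean.
above <- scores[scores$score > scores$group_mean, ]
```

Both routes attach the same group means to the individual rows; the two-step route simply leaves an intermediate table (means) that has to be named and merged.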

There are few misprints, but some occur in contexts that are likely to confuse.

‘The where function gets the index values for the TRUE values of a logical vector’ (page 130)

is followed by examples that use which, the function's actual name. In the SPSS comparisons, SYSMIS should be $SYSMIS.
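For readers who have not met it, the behaviour that the quoted sentence describes belongs to the base R function which. A minimal sketch, using a made-up vector that is not from the book:

```r
# Hypothetical vector with one missing value, for illustration only.
x <- c(5, NA, 12, 8)

is_large <- x > 6        # logical vector: FALSE NA TRUE TRUE
idx <- which(is_large)   # index values of the TRUE elements; the NA is dropped
large_values <- x[idx]   # the elements of x that exceed 6
```

Indexing with which(is_large) rather than with is_large itself avoids carrying the missing value into the result.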

The methods of data analysis that are described cover graphics (three chapters, describing two inconsistent plotting systems) and, finally, one chapter of basic analyses (correlation, regression and analysis of variance).

This book reports a very interesting academic exercise comparing the features of two major packages and the current language of choice, but it appeals more to computer scientists than to statisticians. It is not a ringing endorsement of R. Users of the R integration route will find too much detail duplicating their package use, and those who want to learn R will find too little systematic description.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)