Abstract

R. Allan Reese reviews three textbooks for users of the statistical programming language R.

I have always been ambivalent about R. On the one hand, it is an admirable collective enterprise providing a tool for general statistical programming; on the other, it is pushed as the tool for teaching and using statistics at all levels, and eagerly adopted by novices “because it's free”. But to my mind, R is a computing language of great power, and picking the odd function from selected packages, as is done in the Beckerman and Hector books, does not give an adequate introduction for a new user.

Hector points out that, “Occasionally, there can be conflict between different packages”, but the book gives no hint as to how this would be detected. Both Beckerman and Hector assume data will be compatible with a data frame, and do not discuss the many other objects that R can manipulate, generate or coerce.
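As a minimal sketch of how such a conflict might be detected, the following uses only base R. The attached list `demo_pkg` is a hypothetical stand-in for a package that redefines `mean()`; it is not taken from any of the books reviewed.

```r
# Hypothetical stand-in for a package: an attached list that redefines mean().
demo_pkg <- list(mean = function(x, ...) "masked version")
attach(demo_pkg, name = "demo_pkg")   # R reports the masking when attaching

# List the names that are defined in more than one place on the search path.
clashes <- conflicts(detail = TRUE)
print(clashes[["demo_pkg"]])

# An unqualified call finds the masking definition first ...
unqualified <- mean(c(1, 2, 3))
# ... whereas the :: operator names the intended package explicitly.
qualified <- base::mean(c(1, 2, 3))

# Tidy up the search path.
detach("demo_pkg", character.only = TRUE)
```

The same check applies after loading real packages with `library()`: the masking message, `conflicts()` and the `pkg::fun` form are the usual ways to see and resolve a clash.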

Biologists are commonly disparaged as less numerate than other scientists, but some appreciation of computer science is needed to safely use a tool such as R, and it cannot help to anthropomorphise the machine, as Beckerman does. Example quote: “This bit of R magic [emphasis added] is very important… it clears R's brain.” Elsewhere, an effect of finite-precision working is dismissed as “computer mumbo jumbo”. Any introductory text should stress the need to understand objects and their manipulation through functions, and to confirm that the script agrees with your logic.
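A short sketch in base R may make these points concrete. The data frame `plants` and its values are invented for illustration and do not come from any of the books.

```r
# Finite-precision working: 0.1, 0.2 and 0.3 have no exact binary representation.
exact_match <- (0.1 + 0.2 == 0.3)                  # FALSE
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: compares within a tolerance

# Objects: a data frame is a list of columns, and coercion can change types.
plants <- data.frame(species = c("a", "b"),
                     height  = c(12.5, 30.1))      # invented values
is_list   <- is.list(plants)                       # TRUE
as_matrix <- as.matrix(plants)
mat_type  <- typeof(as_matrix)                     # "character": heights were coerced

# Confirm that the script agrees with your logic.
stopifnot(is.numeric(plants$height), all(plants$height > 0))
```

Nothing here is magic: each result follows from how R stores numbers and objects, which is why an explicit check such as `stopifnot()` is worth the extra line.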

Both the Beckerman and Hector books contain typos and errors that may mislead. For example, Hector claims sample data of plant heights contained “a negative value lying apart from the others with a height of 12”, while the appendix that briefly introduces R incorrectly describes the assignment arrow (<-) as “a dash followed by the ‘greater than’ symbol”.
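For the record, the assignment arrow is the “less than” symbol followed by a dash. The sketch below, with arbitrary variable names, shows the correct form, the rightward variant that the misdescription matches, and how a stray space changes the meaning.

```r
x <- 5              # assignment: 'less than' followed by a dash
5 -> y              # rightward assignment: dash followed by 'greater than'
is_less <- x < -5   # with a space this is a comparison: is x less than -5?
```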

My special interest is graphics, a field in which R is said to excel. Beckerman notes, as have many authors, that “with R you can make outstanding publication quality … figures”. But users – and authors – should recognise the gap between “functional” and “publication quality” graphs. Both introductory books stress the importance of graphing data before analysis, and both opt for the ggplot2 package. Hector uses various themes and other parameters without discussion, except to note “how simple it is to use different symbol shapes, colours and backgrounds”. Beckerman gives a little more detail, under the heading “pimping your graph” (ugh!).

Where Beckerman's and Hector's books are scattergun and superficial, Murrell's is comprehensive and rigorous, but only on graphics. Novice R users might benefit from a complementary introduction to the language but should not find Murrell's text intimidating or cultish.

Murrell is at pains to stress that his book does not discuss which is the most appropriate graph type in given circumstances, so provides the tools to create bad graphs as much as good, and covers statistical graphs, maps, diagrams and infographics. All three books make ex cathedra claims that particular graph forms are deprecated, despite being in common use. But that implies that users are encouraged to make design choices, which R gives the power to do.

R Graphics clarifies that R includes a core graphics engine, a base graphics system, and a grid graphics system that underpins both the lattice and ggplot2 packages. Murrell explains in detail each system and how the levels of coding interact; plots drawn in the two high-level systems can be modified and combined using lower-level functions from grid or the base. You can switch between systems, but I would not try to “mix ‘n’ match”. People new to R will quickly learn that the power derives from a necessary complexity. They need to be clear and logical, and not expect correct results by “magic”.
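The distinction between the systems can be sketched with the two that ship with R itself. This is an illustration under my own assumptions, not an example from Murrell's book: the grob name `"label"` is arbitrary, and `pdf(NULL)` is used only so that the code runs without writing a file.

```r
library(grid)                  # grid is distributed with R

pdf(NULL)                      # null device: no file is written

# Base graphics: draw a plot, then add to it with further base calls.
plot(1:10, (1:10)^2, main = "Base graphics")
abline(h = 50, lty = 2)

# grid graphics, on a fresh page: output consists of named objects (grobs)
# placed in viewports, which can be edited after they are drawn.
grid.newpage()
pushViewport(viewport(width = 0.8, height = 0.8))
grid.rect(gp = gpar(col = "grey"))
lab <- textGrob("grid graphics", name = "label")
grid.draw(lab)
grid.edit("label", gp = gpar(col = "red"))   # modify the drawn grob by name
popViewport()

dev.off()
```

Each system is kept to its own page here, in line with the advice not to mix them casually.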

As books to encourage correct and safe statistical programming, Beckerman's and Hector's miss the mark, but Murrell's is a hit.

See bit.ly/sigreviews for a list of titles available for review. To request a book, contact reviews editor Allan Reese at reviews@rss.org.uk

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)