This is an exceptional book. The author is very clear that this book has been written as a course and is not suitable for ‘random access’. It is aimed at researchers in the natural and social sciences from doctoral students onwards, but it contains such a depth of understanding and exposition of the principles and practice of statistical modelling that it deserves to be read carefully by a much wider audience, even if it must be read from cover to cover.

Strengths of the book include this clear conceptual exposition of statistical thinking as well as the focus on applying the material to real phenomena. The author makes plentiful use of thought-provoking metaphors, ranging from the ‘statistical Golems’ in Chapter 1 through to a very accessible explanation of Markov chain Monte Carlo methods in Chapter 8, built around planning ‘Good King Markov's’ visits to different islands in his kingdom.

The first three chapters set out a framework for understanding and investigating statistical models and the way that they are intended to represent natural or social phenomena. Chapters 4 and 5 cover first simple and then multiple linear regression, with interactions covered in Chapter 7. Chapter 5 contains a particularly succinct and finely polished overview of multiple regression, tackling subjects such as multicollinearity in a thoughtful way. It does not just address the technical issues but also deals with often neglected or downplayed interpretative problems that arise in this context. Omitted variables and post-treatment bias are covered in the same chapter.

Overfitting is a core theme developed throughout the book. It is mentioned right at the start, and a whole chapter (Chapter 6) is then devoted to overfitting as well as regularization and information criteria. Regularization is particularly well explained, especially (in my opinion) in relation to multilevel models. As noted, Chapter 8 provides an overview of Markov chain Monte Carlo methods, with considerable insight into some problems that can occur (such as those that emerge from using flat priors with variance parameters).

Chapters 9–14 then provide an overview of a wide range of practically important methods, ranging from standard generalized linear models through to mixtures, multilevel models and handling missing data. The book does provide a fair degree of the mathematical theory and introduces topics such as entropy that are not always covered in this context. It does not claim to be a definitive formal mathematical text. It relies heavily on computer exploration to engage the reader with the key issues, whether these concern concepts, interpreting phenomena or understanding methodology. For that, it is supported by a very thorough and well-constructed R package, ‘rethinking’, which facilitates carefully designed access to the Stan package (Carpenter et al., 2016).

Effort has gone into selecting interesting data sets both for examples and for end-of-chapter exercises. The book contains a good selection of extension activities labelled according to difficulty. The book itself contains optional paragraphs labelled ‘rethinking’ or ‘overthinking’, indicating either further material or fine details.

I would unreservedly recommend this book to a wide audience interested in the principles of modern statistical modelling.

Reference

Carpenter, R., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P. and Riddell, A. (2016) Stan: a probabilistic programming language. J. Statist. Softwr., to be published.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)