The analysis and modelling of high-dimensional gene expression and DNA methylation data are demanding research areas in bioinformatics and biostatistics. The author aims to provide practical guidelines for various methodological issues related to quality control, data pre-processing, data mining and further assessment of such data sets.

The eight chapters of the book can be broadly divided into four major parts. The first three chapters provide a background description of genome-scale genetic and epigenetic data, techniques for data generation, data quality control and data pre-processing. The fourth and fifth chapters discuss data mining tools: (i) non-parametric and semi-parametric data screening techniques such as random forests and support vector machines, and (ii) clustering methods including hierarchical clustering, bi-clustering, cluster analysis for linear regression and joint clustering. Chapters six and seven focus on different factor selection methods such as LASSO, elastic net, adaptive LASSO, SCAD, Bayesian methods based on Zellner’s g-prior and non-parametric/semi-parametric approaches. Finally, chapter eight discusses network construction and analysis techniques.

The author of the book is a prolific researcher in statistical methodology development for variable selection, joint clustering and Bayesian networks with application to phenotypic data, and this is well reflected in the book.

The book does not go into the detailed technicalities of the different analytical tools and focuses mainly on their implementation and application. Readers need a basic background in data mining tools, Bayesian methods and semi-parametric/non-parametric techniques, along with a little mathematical background. A big asset of the book is its well-illustrated applications and reproducible R code for thoroughly analysing gene expression and DNA methylation data sets at the genome scale, along with the ‘pipeline’ for analytical methods. This makes it a remarkable contribution and an ideal reference book for students of statistics, biostatistics and bioinformatics, as well as for applied workers and researchers interested in exploring high-dimensional genetic and epigenetic data. The source links of the datasets used in the various examples are given, and the R code can also be downloaded from the author’s website: https://www.memphis.edu/sph/contact/faculty_profiles/zhang.php.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)