Abstract

Motivation

Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results

The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 From raw sequences to counts

Automated Sanger sequencing served as the primary sequencing tool for decades, ushering in significant accomplishments including the sequencing of the entire human genome (Metzker, 2010). Since the mid-2000s, however, attention has shifted away from this ‘first-generation technology’ toward new technologies collectively known as next-generation sequencing (NGS) (Metzker, 2010). A number of NGS products exist, each differing in the sample preparation required and chemistry used (Metzker, 2010). Although each product tends toward a different application, they all work by determining the base order (i.e. sequence) from a population of fragmented nucleotide sequences (i.e. a cDNA library), such that it becomes possible to estimate the abundances of unique sequences (Metzker, 2010). However, these sequence abundances are not absolute abundances because the total number of sequences measured by NGS technology (i.e. the library size) ultimately depends on the chemistry of the assay, not the input material.

Depending on the input material, NGS has many uses. These include (i) variant discovery, (ii) genome assembly, (iii) transcriptome assembly, (iv) epigenetic and chromatin profiling (e.g. ChIP-seq, methyl-seq and DNase-seq), (v) meta-genomic species classification or gene discovery and (vi) transcript abundance quantification (Metzker, 2010). The application of NGS to catalog transcript abundance is better known as RNA-Seq (Metzker, 2010) and can be used to estimate the portional presence of transcript isoforms, gene archetypes or other targets. RNA-Seq works by taking a population of (total or fractionated) RNA, converting them to a library of cDNA fragments, optionally amplifying the fragments, and then sequencing those fragments in a ‘high-throughput manner’ (Wang et al., 2009). When sequencing smaller RNA (e.g. microRNA), an additional size selection step is used to ensure a uniform size of the RNA product (Head et al., 2014).

The result of RNA-Seq is a virtual ‘library’ of many short sequence fragments that are converted to a numeric dataset through alignment (most often to a previously established reference genome or transcriptome) and quantification (Griffith et al., 2015). The alignment and quantification steps summarize the raw sequence data (i.e. reads) as a ‘count matrix’, a table containing the estimated number of times a sequence successfully aligns to a given reference annotation. The ‘count matrix’ therefore provides a numeric distillation of the raw sequence reads collected by the assay; as such, it constitutes the data routinely used in statistical modeling, including differential expression analysis (Griffith et al., 2015). Two factors complicate alignment and quantification. First, assembled references (e.g. genomes or transcriptomes) are only just references: sequences measured from biological samples will have an expected amount of variation, either systematic or random, when compared with the reference. This variation necessitates that the alignment procedure accommodates (at least optionally) a certain amount of mismatch (Conesa et al., 2016). Meanwhile, some reads (notably short reads) can ambiguously map to multiple reference sites, an undesired outcome that is amplified by mismatch tolerance (Conesa et al., 2016). Many alignment and quantification methods exist and are reviewed elsewhere (e.g. Baruzzo et al., 2017; Benjamin et al., 2014; Wang et al., 2014).

The ‘count matrix’ (or equivalent) produced by alignment and quantification is routinely analyzed using statistical hypothesis testing (e.g. generalized linear models) or data science techniques (e.g. clustering or classification). Most commonly, data are studied using differential expression analysis, a constellation of methods that seek to identify which unique sequence fragments (if any) differ in abundance across the experimental condition(s). Like alignment and quantification, many differential expression methods exist and are reviewed elsewhere (e.g. Merino et al., 2017; Seyednasrollah et al., 2015; Teng et al., 2016). However, it is important to note that conclusions drawn from RNA-Seq data appear to have a certain ‘robustness’ to the choice of alignment and quantification method, such that the choice of differential expression method impacts the final result most (Williams et al., 2017).

The focus of this review is not to elaborate on the subtleties of alignment, quantification or differential expression, but rather to discuss the relative (i.e. compositional) nature of sequencing count data and the implications this has on many analyses (including differential expression analysis). In this review, we show how sequencing count data measure abundances as portions which (in the absence of normalization or transformation) render many conventional methods invalid. We then discuss methods available for dealing with portional data, focusing on methods that do not use normalization per se. Finally, we conclude by discussing challenges specific to these analyses and by considering advancements to this field of study. Although we emphasize RNA-Seq data throughout this paper, the principles discussed here apply to any NGS abundance dataset.

2 Counts as parts of a whole

2.1 Image brightness as portions

As an analogy, let us imagine that we instructed two photographers to take a series of black and white photographs using a digital camera. We can represent the captured images as a set of N-dimensional vectors where each element (i.e. pixel) records the amount of light that hit a corresponding part of the film sensor. Considering this dataset, let us ask a pointed experimental question: which photographer captured their photographs in brighter light? Better yet, for which pixels, on average, did Photographer A capture brighter light than Photographer B?

At first glance, this appears straightforward. However, we want to know about the amount of light present when the photograph was taken, not the amount of light recorded by the film sensor. Although the two are related, many factors influence the light measured at a given pixel, including, for example, exposure time, aperture diameter and the sensitivity of the film sensor. Changing any one of these parameters will change the image. Of course, such a change in the image does not imply a change in reality.

At each pixel, we could then define two variables: luminance, the amount of light present at the moment of the photograph, and brightness, the amount of light perceived by the film sensor. Intuitively, we can understand brightness (the observed value, o), as a function, f, of luminance (the actual value, a):
$\mathbf{o} = f(\mathbf{a})$
(1)
Even if we do not know the function, $f$, that relates these two measures, we see here that the total brightness recorded (i.e. $\sum \mathbf{o}$) is an artifact of the conditions under which the luminance is measured. Yet, if we can assume that the film sensor responds proportionally to light and does not clip (an unrealistic and idealized assumption), then the portional brightness would equal the portional luminance:
$\frac{\mathbf{o}}{\sum \mathbf{o}} = \frac{\mathbf{a}}{\sum \mathbf{a}}$
(2)

In this scenario, we can understand each element of o as a portion of the whole. As such, the brightness of a single pixel is only meaningful when interpreted relative to the total brightness (or to the brightness of the other pixels). Importantly, it follows that the ratio of any two parts of brightness will equal the ratio of any two parts of luminance.
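
To make Equation (2) concrete, here is a minimal sketch in R with made-up pixel values: scaling the luminance vector by a constant (standing in for exposure or sensor settings) leaves both the portions and the part-to-part ratios unchanged.

```r
# A toy check of Equation (2): if the sensor responds proportionally to light,
# closing the data (dividing by the total) gives the same portions for the
# observed and the actual values, and part-to-part ratios are preserved.
luminance  <- c(pixel1 = 40, pixel2 = 10, pixel3 = 50)  # hypothetical actual values, a
brightness <- 0.2 * luminance                           # hypothetical observed values, o

all.equal(brightness / sum(brightness),
          luminance / sum(luminance))                   # TRUE: portions are identical

brightness[["pixel1"]] / brightness[["pixel2"]]         # 4
luminance[["pixel1"]]  / luminance[["pixel2"]]          # 4: ratios are identical too
```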

2.2 Sequence abundance as portions

RNA-Seq data, through alignment and quantification, measure transcript abundance as counts. However, like the brightness of a digitized image, the amount of RNA estimated for each transcript depends on factors other than the number of RNA molecules present in the assayed cell. Like a photograph, it is possible to change the observed magnitude while keeping the actual input the same. As such, RNA-Seq count data are not actually counts per se, but rather portions of a whole.

In fact, this is a property of all NGS abundance data: the abundances for each sample are constrained by an arbitrary total sum (i.e. the library size) (Soneson and Delorenzi, 2013). Since the library size is arbitrary, the individual values of the observed counts are irrelevant (i.e. provided the counts themselves are sufficiently large). However, the relative abundances of the observed counts still carry meaning. We can understand this by considering how, for a given sample, $\mathbf{o}$, the library size (i.e. $\sum \mathbf{o}$) cancels for a ratio of any two transcripts, $i$ and $j$:
$\frac{o_i}{o_j} = \frac{o_i / \sum \mathbf{o}}{o_j / \sum \mathbf{o}}$
(3)

Analogous to how the relationship between luminance and brightness is unique to each photograph, the relationship between the actual abundances and the observed abundances is unique to each sample. Each independent sample, whether derived from a human subject or a cell line, may have undergone systematic or random differences in processing at any stage of RNA extraction, library preparation or sequencing, causing between-sample biases (Soneson and Delorenzi, 2013). As such, library sizes typically differ between samples, making direct comparisons impossible (Soneson and Delorenzi, 2013). However, because the counts are portions of a whole, the interpretation is complicated even when library sizes are constant. For example, a large increase (or large decrease) in only a few transcripts will necessarily lead to a decrease (or increase) in all other measured counts (Soneson and Delorenzi, 2013). Supplementary Figure S1 provides an abstracted visualization of how this might happen. Supplementary Figure S2 provides a more realistic example prepared using 1000 simulated transcripts (900 of which have increased absolute abundance in one of two groups), illustrating how equally abundant transcripts can appear under-expressed when measured as portions [an extreme but biologically plausible scenario (Lovén et al., 2012)].
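
The same arithmetic can be sketched with a toy example in R (hypothetical transcript counts): if the absolute abundance of a single transcript doubles while the others stay constant, a fixed library size forces the observed counts of the unchanged transcripts downward.

```r
# Toy counts (hypothetical): doubling the absolute abundance of transcript t1
# in condition B, with t2 and t3 unchanged, lowers the observed counts of t2
# and t3 because the library size is fixed by the assay.
absolute_A <- c(t1 = 500,  t2 = 300, t3 = 200)   # absolute copies, condition A
absolute_B <- c(t1 = 1000, t2 = 300, t3 = 200)   # condition B: only t1 changes

library_size <- 10000                            # arbitrary, identical for both samples
observed_A <- round(library_size * absolute_A / sum(absolute_A))
observed_B <- round(library_size * absolute_B / sum(absolute_B))

observed_A   # t1 = 5000, t2 = 3000, t3 = 2000
observed_B   # t1 = 6667, t2 = 2000, t3 = 1333: t2 and t3 appear 'down' despite no change
```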

3 Counts as compositional data

3.1 The definition of compositional data

Compositional data measure each sample as a composition, a vector of strictly positive values (i.e. components) carrying relative information (Aitchison, 1986). Compositional data have two unique properties. First, the total sum of all component values (i.e. the library size) is an artifact of the sampling procedure (van den Boogaart and Tolosana-Delgado, 2008). Second, the difference between component values is only meaningful proportionally [e.g. the difference between 100 and 200 counts carries the same information as the difference between 1000 and 2000 counts (van den Boogaart and Tolosana-Delgado, 2008)].

Examples of compositional data include anything measured as a percent or proportion. It also includes other data that are incidentally constrained to an arbitrary sum. NGS abundance data have compositional properties, but differ slightly from the formally defined compositional data in that they contain integer values only. However, except possibly at near-zero values, we can treat so-called count compositional data as compositional data (Lovell et al., 2015; Quinn et al., 2017b). Note that it is not a requirement for the arbitrary sum to represent complete unity (Aitchison, 1982): many datasets (including possibly NGS abundance data) lack information about potential components and hence exist as incomplete compositions.

3.2 The consequences of compositional data

Compositional data do not exist in real Euclidean space, but rather in a sub-space known as the simplex (Aitchison, 1986). Yet, many commonly used metrics implicitly assume otherwise; such metrics are invalid for relative data. This includes distance measures, correlation coefficients and multivariate statistical models (Boogaart and Tolosana-Delgado, 2013a). For compositional data, the distance between any two variables is erratically sensitive to the presence or absence of other components (Aitchison et al., 2000). Meanwhile, correlation reveals spurious (i.e. falsely positive) associations between unrelated variables (Pearson, 1896). In addition, multivariate statistics yield erroneous results because representing variables as portions of the whole makes them mutually-dependent, multivariate objects (i.e. increasing the abundance of one decreases the portional abundance of the others) (Boogaart and Tolosana-Delgado, 2013a). All of this applies to NGS abundance data too (Lovell et al., 2015).
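
A small simulation (with assumed abundance distributions) illustrates the spurious-correlation problem: two independently generated genes appear strongly associated once the data are closed, because a third, highly variable component dominates the total.

```r
# Simulated (assumed) absolute abundances: genes g1 and g2 are independent,
# while g3 is abundant and highly variable. Closure induces a strong, spurious
# association between g1 and g2.
set.seed(42)
n  <- 100
g1 <- rlnorm(n, meanlog = 5, sdlog = 0.1)
g2 <- rlnorm(n, meanlog = 5, sdlog = 0.1)
g3 <- rlnorm(n, meanlog = 8, sdlog = 1.0)

absolute <- cbind(g1, g2, g3)
relative <- absolute / rowSums(absolute)   # closure, as imposed by a fixed library size

cor(absolute[, "g1"], absolute[, "g2"])    # near 0, as simulated
cor(relative[, "g1"], relative[, "g2"])    # strongly positive: a spurious association
```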

In the life sciences, count data are usually modeled using the Poisson distribution or negative binomial distribution (Bliss and Fisher, 1953). For NGS abundance data, the negative binomial model is preferred because it accommodates situations in which the variance is much larger than the mean [a common feature of biological replicates in RNA-Seq studies (Soneson and Delorenzi, 2013)]. These distributions are typically used to model the abundance of each gene across samples, and are necessary because analyzing non-normalized and non-transformed count data as if they were normally distributed would imply that it is possible to sample negative and non-integer values, contradicting the assumptions behind many statistical hypotheses (Buccianti, 2013) [although it is possible to extend Gaussian analysis to counts by use of precision weights (Law et al., 2014)]. Yet, NGS abundance data are compositional counts, not counts, meaning that the measured variables (i.e. components) are not univariate objects (Boogaart and Tolosana-Delgado, 2013b). This fact necessitates (at the very least) an additional normalization step that corrects for the arbitrary library sizes.

3.3 Normalization to effective library size

The simplest normalization would involve rescaling counts by the library size (i.e. the total number of mapped reads from a sample) (Soneson and Delorenzi, 2013), but this does not transform compositional counts into absolute counts. Similarly, RPKM and also TPM cannot be considered valid normalizations in this sense (see the Supplementary Material for more details). Instead, analysts most often use other, more elaborate normalization methods that (generally speaking) offset the individual counts of each sample based on the counts of a reference (or pseudo-reference) gene (Dillies et al., 2013). The magnitude of this offset is related to the effective library size (i.e. the sum of a library’s counts put on a common scale with the other samples) (see the Supplementary Material for more details). Effective library size normalization for RNA-seq data was first proposed in an attempt to address the relative (i.e. closed) nature of the data through a method known as the trimmed mean of M-values (TMM) (Robinson and Oshlack, 2010). This normalization works by inferring an ideal (i.e. unchanged) reference from a subset of transcripts based on the assumption that the majority of transcripts remain unchanged across conditions. Here, the reference was chosen to be a weighted and trimmed mean (Robinson and Oshlack, 2010), although others have proposed using the median over the transcripts as the reference (Anders and Huber, 2010). The TMM normalizes data to an effective library size based on the principle that if counts are evaluated relative to (i.e. divided by) an unchanged reference, the original scale of the data is recovered. In the language of compositional data analysis, this approach is described as an attempt to ‘open’ the closed data, and is often criticized on the basis that ‘there is no magic powder that can be sprinkled on closed data to make them open’ (Aitchison, 2003). Yet, if the data were open originally (and only incidentally closed by the sequencing procedure), this point of view is perhaps extreme. In this case, if the analyst were to identify an offset that places the observed abundances relative to an ideal reference, the normalization [e.g. of the kind used by edgeR (Robinson et al., 2010) or DESeq2 (Anders and Huber, 2010)] would indeed render the univariate analysis of otherwise compositional data valid (see the Supplementary Material for more details).
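
For illustration only, the sketch below shows a reference-based (effective library size) normalization in the spirit of the median-over-transcripts reference described above. It is not the edgeR (TMM) or DESeq2 implementation; the simulated count matrix and the pseudo-count of 1 are assumptions made for the example.

```r
# A simplified, illustrative sketch of effective library size normalization
# using a per-gene geometric-mean pseudo-reference and a per-sample median ratio.
set.seed(1)
counts <- matrix(rpois(4000, lambda = 50), nrow = 1000, ncol = 4,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))

# Pseudo-reference: the per-gene geometric mean across samples (on the log scale)
log_ref <- rowMeans(log(counts + 1))

# Per-sample scaling factor: median log-ratio of the sample to the pseudo-reference
size_factors <- apply(log(counts + 1), 2, function(x) exp(median(x - log_ref)))

normalized <- sweep(counts, 2, size_factors, "/")   # counts placed on a common scale
size_factors
```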

On the other hand, if the cells themselves produce closed data by default [e.g. due to their limited capacity for mRNA production (Scott et al., 2010)], any attempt to open the data might prove futile. Nevertheless, given the difficulties in identifying a truly unchanged reference (and in interpreting it correctly in the case that closed data are produced by the cells themselves), avoiding normalization altogether would seem desirable. After all, the choice of normalization method impacts the final results of an analysis. For example, the number and identity of genes reported as differentially expressed change with the normalization method (Lin et al., 2016), as do false discovery rates (Li et al., 2015). This also holds true for compositional metabolomic data (Saccenti, 2017). Moreover, at least some normalization methods are sensitive to the removal of lowly abundant counts (Lin et al., 2016), as well as to data asymmetry (Soneson and Delorenzi, 2013).

4 Principles of compositional data analysis

4.1 Approaches to compositional data

In lieu of normalization, many compositional data analyses begin with a transformation. Although compositional data exist in the simplex, Aitchison first documented that these data could get mapped into real space by use of the log-ratio transformation (Aitchison, 1986). By transforming data into real space, measurements like Euclidean distance become meaningful (Aitchison et al., 2000). However, it is also possible to analyze compositional data without log-ratio transformations. One approach involves performing calculations on the components themselves (called the ‘staying-in-the-simplex’ approach) (Mateu-Figueras et al., 2011). Another involves performing calculations on ratios of the components (called the ‘pragmatic’ approach) (Greenacre, 2017). Nevertheless, many compositional data analyses still begin with a log-ratio transformation.

Unlike normalizations, log-ratio transformations do not claim to open the data. Instead, the interpretation of the transformed data (and some of their results) depends on the reference used. In contrast, normalizations assume that an unchanged reference is available to recover the data (i.e. up to a proportionality constant) as they existed prior to closure by sequencing. Yet, while log-ratio transformations are conceptually distinct from normalizations, they are sometimes interpreted as if they were normalizations themselves [e.g. as in Fernandes et al. (2014)]. Although this contradicts compositional data analysis principles, conceiving of transformations as normalizations is helpful in understanding their use in some RNA-Seq analyses. Such log-ratio ‘normalizations’, like conventional normalizations, aim to recast compositional data in absolute terms, allowing for a straight-forward univariate interpretation of the data. Like effective library size normalization, this is done through use of an ideal reference.

4.2 The log-ratio transformation

First, let us consider a small relative dataset with only 3 features measured across 100 samples. These samples belong to one of two groups. One of the features, ‘X’, can differentiate these groups perfectly. The other features, ‘Y’ and ‘Z’, constitute noise. We can turn an absolute dataset into a compositional dataset by dividing each element of the sample vector by the total sum. Supplementary Figure S3 shows how the relationship between samples (represented as points) might change when made compositional. Although the two groups appear clearly linearly separable in absolute space, the boundaries between groups become unclear in relative space. Meanwhile, Supplementary Figure S2 provides a more realistic example prepared using 1000 simulated transcripts, illustrating how the distributions of absolute abundances and observed portions (and the distances and correlations thereof) can differ tremendously in a biologically plausible scenario (Lovén et al., 2012).

When analyzing compositional data, it is sometimes possible to reclaim the discriminatory potential of relative data through transformation. For example, by setting all or some of the features relative to (i.e. divided by) a reference feature, one might discover that the resultant ratios can separate the groups (Thomas and Aitchison, 2006). In fact, any separation revealed by such ratios can be analyzed by standard statistical techniques (Thomas and Aitchison, 2006). This illustrates the concept behind the additive log-ratio (alr) transformation, achieved by taking the logarithm of each measurement within a composition (i.e. each sample vector j containing relative measurements) as divided by a reference feature (commonly chosen as the one with index D, with D being the total number of features) (Aitchison, 1986):
$\mathrm{alr}(\mathbf{x}_j) = \left[\ln\frac{x_{1j}}{x_{Dj}}, \ldots, \ln\frac{x_{(D-1)j}}{x_{Dj}}\right]$
(4)
Here the components of $\mathbf{x}_j$ sum to unity, but we can replace the parts $x_{gj}$ by the observed counts $y_{gj}$ without altering the expression because the library sizes cancel.
Instead of a specific reference feature, one could use an abstracted reference. In the case of the centered log-ratio (clr) transformation, the geometric mean of the composition (i.e. sample vector) is used in place of $x_{Dj}$ (Aitchison, 1986). We use the notation $g(\mathbf{x})$ to indicate the geometric mean of the sample vector, $\mathbf{x}$. Note that because these transformations apply to each sample vector independently, the presence of an outlier sample does not alter the transformation of the other samples:
$\mathrm{clr}(\mathbf{x}_j) = \left[\ln\frac{x_{1j}}{g(\mathbf{x}_j)}, \ldots, \ln\frac{x_{Dj}}{g(\mathbf{x}_j)}\right]$
(5)
Again, we could replace the parts $x_{gj}$ directly by the observed counts $y_{gj}$ [and $g(\mathbf{x}_j)$ by $g(\mathbf{y}_j)$] without changing the expression.
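
The alr and clr transformations of Equations (4) and (5) can be written compactly in base R. The toy matrix below (samples as rows, features as columns) is hypothetical, and zeros are assumed to have been dealt with already; the compositions package mentioned above provides full implementations.

```r
# Base-R sketches of the alr (Eq. 4) and clr (Eq. 5) transformations.
alr_transform <- function(x, ref = ncol(x)) {
  # log of each part divided by the reference part; the reference column is dropped
  log(x[, -ref, drop = FALSE] / x[, ref])
}

clr_transform <- function(x) {
  # log of each part divided by the geometric mean of its own sample (row)
  log(x) - rowMeans(log(x))
}

y <- matrix(c(10, 20, 70,
              30, 30, 40), nrow = 2, byrow = TRUE)   # two samples, three features
clr_transform(y)            # each row sums to zero, as expected for the clr
alr_transform(y, ref = 3)   # features 1 and 2 expressed relative to feature 3
```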

Likewise, other transformations exist that use the geometric mean of a feature subset as the reference. For example, the ALDEx2 package introduces the inter-quartile log-ratio (iqlr) transformation, which includes only features that fall within the inter-quartile range of total variance in the geometric mean calculation (Fernandes et al., 2013; Fernandes et al., 2014). Another, more complex, transformation, called the isometric log-ratio (ilr) transformation (Egozcue et al., 2003), also exists and is used in geological studies (Buccianti, 2013) and at least one analysis of RNA-Seq data (Topa and Honkela, 2016). The ilr transforms the data with respect to an orthonormal coordinate system that is constructed from sequential binary partitions of features (Boogaart and Tolosana-Delgado, 2013b). Its default application to standard problems has been criticized by Aitchison on the basis that it lacks interpretability (Aitchison, 2008). Applications where the basis construction follows a microbiome phylogeny seem an interesting possibility, however (Washburne et al., 2017).

4.3 The log-ratio ‘normalization’

In some instances, the log-ratio transformation is technically equivalent to a normalization. For example, let us consider the case where we know the identity of a feature whose absolute abundance is fixed across all samples. We could then use a log-ratio procedure to ‘sacrifice’ this feature in order to ‘back-calculate’ the absolute abundances. This is akin to using the alr transformation as a kind of normalization. However, because a single unchanged reference is rarely available or knowable [although synthetic RNA spike-ins may represent one way forward (Jiang et al., 2011)], we could try to approximate an unchanged reference from the data. For this, one might use the geometric mean of a feature subset, thereby using a clr (or iqlr) transformation as if it were a normalization. We refer the reader to the Supplementary Material for a detailed discussion on how alr and clr transformations are formally similar to effective library size normalizations.

Although log-ratio ‘normalizations’ differ from log-ratio transformations only in the interpretation of their results, transformations alone are still useful even when they do not normalize the data. This is because they provide a way to move from the simplex into real space (Aitchison et al., 2000), rendering Euclidean distances meaningful. Importantly, clr- and ilr-transformed data impart four key properties to analyses: scale invariance (i.e. multiplying a composition by a constant k will not change the results), perturbation invariance (i.e. converting a composition between equivalent units will not change the results), permutation invariance (i.e. changing the order of the components within a composition will not change the results) and sub-compositional dominance (i.e. using a subset of a complete composition carries less information than using the whole) (Boogaart and Tolosana-Delgado, 2013b). Yet, the interpretation of transformation-based analyses remains complicated because the analyst must consider their results with respect to the chosen reference, or otherwise translate the results back into compositional terms.

4.4 Measures of distance

Euclidean distances do not make sense for (unnormalized and untransformed) compositional data (Aitchison et al., 2000). In contrast, the Aitchison distance does, providing a measure of distance between two $D$-dimensional compositions, $\mathbf{x}_i$ and $\mathbf{x}_j$ (Aitchison et al., 2000):
$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{g=1}^{D}\left[\ln\frac{x_{gi}}{g(\mathbf{x}_i)} - \ln\frac{x_{gj}}{g(\mathbf{x}_j)}\right]^{2}}$
(6)

Although the Aitchison distance is simply the Euclidean distance between clr-transformed compositions, this distance (unlike Euclidean distance) has scale invariance, perturbation invariance, permutation invariance and sub-compositional dominance. Few other distance measures satisfy all four of these properties, including none of the metrics routinely used in hierarchical clustering (Martín-Fernández et al., 1998) (a routine part of RNA-Seq analysis). The property of sub-compositional dominance is especially important: even if the log-ratio transformation does not normalize the data, the addition of more sequence data will never make two samples appear less distant. This follows logically: as the amount of information grows, the distance between samples should not shrink. However, since the clr transformation is formally similar to an effective library size normalization (see the Supplementary Material for more details), we can expect the Euclidean distance of normalized counts to compare directly with the Aitchison distance. Measures of dissimilarity based on sample-wise correlation coefficients are often used for clustering, and can be used for compositional data too. Interestingly, the problems arising for feature-wise correlations do not occur for sample-wise correlations because any normalization or transformation factors implicitly cancel. However, such measures lack sub-compositional dominance, and can thus cause misleading results.
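
Because the Aitchison distance is simply the Euclidean distance of clr coordinates, it reduces to a one-liner in base R; the toy matrix below is hypothetical and is constructed so that one sample is a rescaled copy of another, illustrating scale invariance.

```r
# Aitchison distance (Eq. 6): the Euclidean distance between clr-transformed samples.
aitchison_dist <- function(x) {
  clr <- log(x) - rowMeans(log(x))   # clr transform each sample (row)
  dist(clr)                          # dist() defaults to the Euclidean distance
}

y <- matrix(c(10, 20,  70,
              30, 30,  40,
              20, 40, 140), nrow = 3, byrow = TRUE)   # sample 3 = 2 x sample 1
aitchison_dist(y)   # the distance between samples 1 and 3 is zero (scale invariance)
```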

4.5 Measures of association

Like the Aitchison distance, there also exists a compositionally valid measure of association: the log-ratio variance (VLR) measures the agreement between two feature vectors across $N$ compositions. Let us denote the vector of the feature $g$ using an upper index, $\mathbf{x}^{g} = (x_{g1}, \ldots, x_{gN})$. The VLR computes the variance of the logarithm of one variable as divided by a second variable. As such, a dataset with $D$ compositional variables contains $D^2$ associations (albeit with symmetry). Unlike the Aitchison distance, however, the VLR does not require a log-ratio transformation whatsoever; in fact, if using log-ratio transformed data, the reference denominators would cancel out. Note that, while distances occur between compositions (i.e. between samples), associations occur between variables (i.e. between transcripts):
$\mathrm{VLR}(\mathbf{x}^{g}, \mathbf{x}^{h}) = \mathrm{var}\left[\ln\frac{x_{g1}}{x_{h1}}, \ldots, \ln\frac{x_{gN}}{x_{hN}}\right]$
(7)
Again, replacing the parts by the raw counts $y_{gi}$ would not alter the expression. We can gain an intuition of the VLR by considering its formula. Recall that the relationship between compositional variables is one of relative importance: for a two-dimensional feature pair, the coordinates (2, 4) and (4, 8) have equivalent meaning. Therefore, it follows that the features with indices $g$ and $h$ are associated if $x_{gi}/x_{hi}$ remains constant across all samples. Hence, we measure the variance of the (log-) ratios, such that the VLR ranges from $[0, \infty)$, where 0 indicates a perfect association. (Taking the logarithm makes the measure symmetric with respect to reciprocal ratios.) Unfortunately, the VLR lacks an intuitive scale, making non-zero values difficult to interpret (Lovell et al., 2015).

Importantly, the VLR is sub-compositionally coherent: the removal of a third feature $\mathbf{x}^{f}$ from the data matrix would have no bearing on the variance of the (log-) ratios $x_{gi}/x_{hi}$. Yet, the VLR suffers from a key limitation: it is unscaled with respect to the variances of the log components (Lovell et al., 2015). In other words, the magnitude of VLR depends partially on the variances of its constituent parts (i.e. $\mathrm{var}(\ln \mathbf{x}^{g})$ and $\mathrm{var}(\ln \mathbf{x}^{h})$). It was claimed that this makes it difficult to compare VLR across pairs (e.g. comparing $x_{gi}/x_{hi}$ with $x_{hi}/x_{fi}$) (Lovell et al., 2015). Still, unlike correlation, the VLR does not produce spurious results for compositional data, and in fact, provides the same result for both relative data and the absolute counter-part, all without requiring normalization or transformation.
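
Equation (7) amounts to a single variance of log-ratios, as in the base-R sketch below; the counts are hypothetical, and the second check illustrates that arbitrary per-sample scaling (i.e. library sizes) leaves the VLR unchanged.

```r
# The log-ratio variance of Equation (7) for a pair of features measured across
# samples; raw counts can be used directly because per-sample library sizes cancel.
vlr <- function(xg, xh) var(log(xg / xh))

xg <- c(100, 200, 400, 800)   # hypothetical counts for feature g across 4 samples
xh <- c( 50, 100, 200, 400)   # feature h is always exactly half of feature g
vlr(xg, xh)                   # 0: a perfect association

lib <- c(2, 0.5, 1, 3)        # arbitrary per-sample scaling (library sizes)
vlr(xg * lib, xh * lib)       # still 0: the result is unchanged by closure
```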

4.6 Principal component analysis

Just as there are problems regarding between-sample distances and between-feature correlations, it follows that Principal Component Analysis (PCA) should not get applied directly to (unnormalized and untransformed) compositional data. Although RNA-Seq software packages will typically apply PCA (or, alternatively, multi-dimensional scaling) to normalized counts, analysts could instead apply PCA to clr-transformed data (resulting in an additional centering of the rows after log-transformation) (Aitchison and Greenacre, 2002). However, analysts must take care when interpreting the resultant PCA: covariances and correlations between features now exist with respect to the geometric mean reference. As such, when plotting features as arrows in the new coordinate space, the angles between them (i.e. the correlations) will usually change when subsets of the data are analyzed. However, the distances between feature pairs (i.e. the links between the arrow heads) remain invariable with respect to sub-compositions: these correspond to their log-ratio variance (Aitchison and Greenacre, 2002). Meanwhile, the usual PCA plot (with samples as points in a new coordinate space) projects the distances between samples using the Aitchison distance (which has the desired property of sub-compositional dominance).

In combining these into a joint visualization of features and samples, the resultant log-ratio biplot (i.e. the ‘relative variation biplot’) reveals associations between samples and features, and can also be used to infer power law relationships between features in an exploratory analysis (Aitchison and Greenacre, 2002). Such biplots are reminiscent of the visualizations obtained by Correspondence Analysis (CA). In fact, CA can indeed be used to approximate relative variation biplots provided the data are raised to a (small) power (Greenacre, 2009), the optimal size of which can be obtained by analyzing sub-compositional incoherence (Greenacre, 2011). Using CA with power transformation has the advantage that zeros in the data are handled naturally by the technique.
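
As an illustrative sketch (with a simulated, hypothetical count matrix), PCA of clr-transformed data can be run with base R's prcomp(); the resulting biplot is only a rough analogue of the relative variation biplot of Aitchison and Greenacre (2002).

```r
# An illustrative PCA of clr-transformed data using base R.
set.seed(7)
counts <- matrix(rpois(200, lambda = 100) + 1, nrow = 20, ncol = 10,
                 dimnames = list(paste0("sample", 1:20), paste0("gene", 1:10)))

clr_counts <- log(counts) - rowMeans(log(counts))   # clr, as in Equation (5)
pca <- prcomp(clr_counts)                           # prcomp() centers the columns

summary(pca)$importance[, 1:3]   # variance explained by the first components
biplot(pca)                      # samples as points, genes as arrows
```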

5 Compositional methods for sequence data

5.1 Methods for differential abundance

The ALDEx2 package, available for the R programming language, uses compositional data analysis principles to measure differential expression between two or more groups (Fernandes et al., 2013, 2014). Unlike conventional approaches to differential expression, ALDEx2 uses log-ratio transformation instead of effective library size normalization. The algorithm has five main parts. First, ALDEx2 uses the input data to create randomized instances based on the compositionally valid Dirichlet distribution (Fernandes et al., 2013, 2014). This renders the data free of zeros. Second, each of these so-called Monte Carlo (MC) instances undergoes log-ratio transformation, most commonly the clr or iqlr transformation (Fernandes et al., 2013, 2014). Third, conventional statistical tests (i.e. Welch’s t and Wilcoxon tests for two groups; glm and Kruskal-Wallis for two or more groups) get applied to each MC instance to generate p-values (p) and Benjamini-Hochberg adjusted p-values (BH) for each transcript (Fernandes et al., 2013, 2014). Fourth, these p-values get averaged across all MC instances to yield expected p-values (Fernandes et al., 2013, 2014). Fifth, one considers any transcript with an expected BH < α as statistically significant (Fernandes et al., 2013, 2014).
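
The five steps can be caricatured in a few lines of base R. This is a conceptual sketch only, not the ALDEx2 code: the simulated counts, the Dirichlet prior of 0.5 and the use of 16 Monte Carlo instances are assumptions made for illustration, and only Welch's t-test with BH adjustment is shown.

```r
# A stripped-down caricature of the five ALDEx2-style steps described above.
set.seed(3)
counts <- matrix(rpois(2000, lambda = 100), nrow = 100, ncol = 20)  # genes x samples
group  <- rep(c("A", "B"), each = 10)

dirichlet_draw <- function(counts_j) {
  # One Dirichlet draw per sample via gamma sampling; the +0.5 prior removes zeros
  g <- rgamma(length(counts_j), shape = counts_j + 0.5, rate = 1)
  g / sum(g)
}

n_mc <- 16
p_bh <- replicate(n_mc, {
  props <- apply(counts, 2, dirichlet_draw)                      # step 1: MC instance
  clr   <- sweep(log(props), 2, colMeans(log(props)), "-")       # step 2: clr per sample
  p     <- apply(clr, 1, function(z) t.test(z ~ group)$p.value)  # step 3: Welch's t
  p.adjust(p, method = "BH")                                     # ...with BH adjustment
})
expected_bh <- rowMeans(p_bh)       # step 4: average across MC instances
which(expected_bh < 0.05)           # step 5: 'significant' genes (none expected here)
```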

Although popular among meta-genomics researchers for analyzing the differential abundance of operational taxonomic units (OTUs) (e.g. Urbaniak et al., 2016), the ALDEx2 package has not received widespread adoption in the analysis of RNA-Seq data. In part, this may have to do with our observation that ALDEx2 requires a large number of samples (Quinn et al., 2017a). This requirement may stem from its use of non-parametric testing, as suggested by the reduced power of other non-parametric differential expression methods (Seyednasrollah et al., 2015; Williams et al., 2017), for example NOISeq (Tarazona et al., 2015). However, competing software packages like limma (Smyth, 2004) and edgeR (Robinson et al., 2010) also benefit from moderated t-tests that ‘share information between genes’ to reduce per-transcript variance estimates and increase statistical power.

Still, even in the setting of large sample sizes, ALDEx2 has one major limitation: its usefulness depends largely on interpreting the log-ratio transformation as a normalization. If the log-ratio transformation does not sufficiently approximate an unchanged reference, the statistical tests will yield results that are hard to interpret. Another tool developed for analyzing the differential abundance of OTUs suffers from a similar limitation: ANCOM (Mandal et al., 2015) uses presumed invariant features to guide the log-ratio transformation. The tendency to interpret differential abundance results as if they were derived from log-ratio ‘normalizations’ highlights the importance of pursuing numeric and experimental techniques that can establish an unchanged reference. It also highlights the benefit of seeking novel methods that do not require using log-ratio transformations as a kind of normalization.

5.2 Methods for association

The SparCC package, available for the R programming language, replaces Pearson’s correlation coefficient with an estimation of correlation based on its relationship to the VLR (and other terms) (Friedman and Alm, 2012). The algorithm works by iteratively calculating a ‘basis correlation’ under the assumption that the majority of pairs do not correlate (i.e. a sparse network) (Friedman and Alm, 2012). Another algorithm, SPIEC-EASI, makes the same assumption that the underlying network is sparse, but bases its method on the inverse covariance matrix of clr-transformed data (Kurtz et al., 2015). The propr package (Quinn et al., 2017b), available for the R programming language, implements proportionality as introduced in Lovell et al. (2015) and expounded in Erb and Notredame (2016). Proportionality provides an alternative measure of association that is valid for relative data. One could think of proportionality as a modification to the VLR that uses information about the variability of individual features (gained by a log-ratio transformation) to give the VLR scale. It can be defined for the g-th and h-th features (e.g. transcripts) of a log-ratio transformed data matrix, and thus also depends on the reference used for transformation. Unlike SparCC and SPIEC-EASI, proportionality does not assume an underlying sparse network.

At least three measures of proportionality exist. The first, $\phi$, ranges from $[0, \infty)$ with 0 indicating perfect proportionality (Lovell et al., 2015). Its definition adjusts the VLR (in the numerator) by the variance of one of the log-ratio transformed features in that pair (in the denominator). The use of only one feature variance in the adjustment makes $\phi$ asymmetric. The second, $\phi_s$, also ranges from $[0, \infty)$ with 0 indicating perfect proportionality, but has a natural symmetry (Quinn et al., 2017b). Its definition adjusts the VLR by the variance of the log-product of the two features. The third, $\rho_p$, like correlation, takes on values from $[-1, 1]$, where a value of 1 indicates perfect proportionality (Erb and Notredame, 2016). Its definition adjusts the VLR by the sum of the variances of the log-ratio transformed features in that pair (as subtracted from the value 1). Thus, $\rho_p$ is also symmetric.
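
Following the textual definition of $\rho_p$ above (1 minus the VLR scaled by the sum of the two clr variances), a minimal base-R sketch might look as follows. The count matrix is hypothetical, with one feature constructed to be exactly proportional to another; the propr package provides the full implementations.

```r
# A minimal sketch of the proportionality coefficient rho described above.
set.seed(11)
counts <- matrix(rpois(500, lambda = 100) + 1, nrow = 50, ncol = 10)  # samples x features
counts[, 2] <- 3 * counts[, 1]               # feature 2 is exactly proportional to feature 1

clr <- log(counts) - rowMeans(log(counts))   # clr transform each sample (row)

rho <- function(a, b) 1 - var(a - b) / (var(a) + var(b))

rho(clr[, 1], clr[, 2])   # exactly 1: a perfectly proportional pair
rho(clr[, 1], clr[, 3])   # near 0: an unrelated pair
```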

Unlike Pearson’s correlation coefficient, proportionality coefficients tend not to produce spurious results (Quinn et al., 2017b). Instead, proportionality serves as a robust measure of association when analyzing relative data (Lovell et al., 2015). Although proportionality gives VLR scale, it is limited in that its interpretation still depends partly on using transformation as a kind of normalization (i.e. for the calculation of individual feature variances) (Erb and Notredame, 2016). Still, its interpretability, along with its observed resilience to spurious results, makes it a good choice for inferring co-expression (Lovell et al., 2015) or co-abundance (Bian et al., 2017) from sequencing data.

6 Challenges to compositional analyses

6.1 Challenges unique to count compositions

Compositional data analysis, because it relies on log-transformations, does not work when the data contain zeros. Yet, count compositional data are notably prone to zeros, which could signify either that a component is absent from a sample or that it is present only at a quantity below the detection limit (Boogaart and Tolosana-Delgado, 2013c). For NGS abundance data, the difference between a zero and a one might be stochastic. How best to handle zeros remains a topic of ongoing research. However, it is common to replace zeros with a number less than the detection limit (Boogaart and Tolosana-Delgado, 2013c). Other replacement strategies would include adding a fixed value to all components, replacing zeros with the value one, or omitting zero-laden components altogether. A more principled (yet computationally expensive) way of replacing zeros is the Dirichlet sampling procedure implemented in ALDEx2 (as described above). Note that the simple addition of a pseudo-count to all components does not preserve the ratios between them, which can be amended by modifying the non-zero components in a multiplicative way (Martín-Fernández and Thió-Henestrosa, 2006).
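
A minimal sketch of the multiplicative replacement idea follows; the values and the choice of delta are illustrative, and the formal treatment is given in Martín-Fernández and Thió-Henestrosa (2006).

```r
# Multiplicative zero replacement sketch: zeros become a small value delta and
# the non-zero parts of the same sample are shrunk so that the total and the
# ratios among the non-zero parts are preserved.
multiplicative_replace <- function(x, delta = 0.5) {
  n_zero <- sum(x == 0)
  out <- x * (1 - n_zero * delta / sum(x))   # shrink the non-zero parts
  out[x == 0] <- delta                       # fill the zeros
  out
}

x <- c(0, 12, 30, 0, 58)
y <- multiplicative_replace(x)
y                  # the zeros are now 0.5 and the total (100) is preserved
y[3] / y[5]        # identical to 30/58: ratios among non-zero parts are unchanged
x[3] / x[5]
```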

Moreover, while count compositional data carry relative information, they differ from true compositional data in that they contain integer values only. Restricting the data to integer space can introduce problems with an analysis because the sampling variation becomes more noticeable as the measurements approach zero (Quinn et al., 2017b). In other words, the difference between 1 and 2 counts is not exactly the same as the difference between 1000 and 2000 counts (Quinn et al., 2017b). While it is not mathematically necessary to remove low counts, analysts should proceed carefully in their presence.

6.2 Challenges unique to sequencing data

In the second section, we discussed how between-sample biases render NGS abundances incomparable between samples, thus necessitating normalization or transformation. However, we did not address two important sources of within-sample biases for sequencing data. The first is read length bias, in which more reads map to longer transcripts (Soneson and Delorenzi, 2013). The second is GC content bias, in which more reads map to high GC regions (Dohm et al., 2008). Such biases distort the ratios between features and are thus relevant to compositional analysis as well. Yet, because within-sample biases are usually assumed to have the same proportional impact across all samples, they are usually ignored (Soneson and Delorenzi, 2013). For the same reason, one might also ignore these biases when interpreting NGS abundance data as compositions (as long as we are only interested in between-sample effects). However, if a sample were to contain, for example, a polymorphic or epigenetic change which alters the size or GC content of a transcript, the compositional nature of sequencing data could cause a skew in the observed abundances for all other transcripts (for reasons suggested by Supplementary Figs S1 and S2). More work is needed to understand the extent to which within-sample biases impact compositional data analysis in practice.

6.3 Limitations of transformation-based analysis

Formal transformation-based approaches often suffer from a lack of interpretability or otherwise get interpreted erroneously. For example, when using the centered log-ratio (clr) transformation, one may be tempted to interpret the transformed data as if they referred to single features (e.g. transcripts); however, the transformed data actually refer to the ratios of the transcripts to their geometric mean. As such, an analyst must interpret results with regard to their dependence on this mean. Moreover, because the geometric mean can change with the removal of features, the transformed data are incoherent with respect to sub-compositions.

When log-ratio transformations are used for scaled measures of association (i.e. proportionality), the resulting covariations depend on the implicitly chosen reference. Therefore, they will not give the same results for absolute and relative data (unless both were identically transformed). The formal relationship of results when applying $\rho_p$ with and without transformation is investigated elsewhere (Erb and Notredame, 2016). Although lacking a natural scale, the log-ratio variance (VLR) has an advantage in that it provides identical results for both absolute and relative data, without requiring normalization or transformation.

6.4 The merits of ratio-based analysis

Aitchison’s preferred summary of the covariance structure of a compositional dataset was a matrix containing the log-ratio variances for all feature pairs (i.e. the relative variation matrix) (Aitchison, 1986). Although this matrix formally contains a lot of redundant information, an analyst who is familiar with the features might still find this kind of representation useful. Recently, the focus on ratios has been called the ‘pragmatic’ approach to compositional data analysis (Greenacre, 2017), and offers some benefits. For one, transformation (i.e. the restriction to ratios with the same denominator) is not needed. Instead, the ratios can be dealt with directly as if they were unconstrained (i.e. absolute) data (Thomas and Aitchison, 2006). Moreover, ratios may carry a clear meaning to the analyst interpreting them. Recently, Greenacre proposed a formal procedure to select a non-redundant subset of feature pairs that contains the entire variability of the data (Greenacre, 2017).

Such ratio-based analyses are also applicable to NGS abundance data. For example, Erb et al. proposed a method to identify the differential expression of gene ratios, a technique comprising part of what is termed differential proportionality analysis (Erb et al., 2017). When comparing gene ratios across two groups, this method selects ratios in which only a small portion of the total log-ratio variance (i.e. VLR) is explained by the sum of the within-group log-ratio variances (Erb et al., 2017). These selected gene ratios tend to show differences in the group means of those ratios, analogous to how genes selected by differential expression analysis show differences between their means (Erb et al., 2017). Reinforcing the analogy further, Erb et al. have shown how it is possible to use the limma package to apply an empirical Bayes model with underlying count-based precision weights (Law et al., 2014; Smyth, 2004) to gene ratios, thus quantifying ‘second order’ expression effects while still avoiding normalization (Erb et al., 2017).
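
The selection criterion can be sketched as a simple ratio of within-group to total log-ratio variation. The statistic below is a conceptual stand-in (the exact definition and weighting used by Erb et al., 2017 may differ), and the simulated counts are hypothetical.

```r
# A rough sketch of the ratio-selection idea behind differential proportionality:
# compare within-group log-ratio variation to total log-ratio variation for a pair.
theta_stat <- function(xg, xh, group) {
  lr     <- log(xg / xh)
  total  <- var(lr) * (length(lr) - 1)                              # total sum of squares
  within <- sum(tapply(lr, group, function(z) var(z) * (length(z) - 1)))
  within / total   # small values flag a shift in the ratio between groups
}

set.seed(5)
group <- rep(c("A", "B"), each = 10)
xg <- c(rpois(10, 100), rpois(10, 400))   # gene g roughly quadruples in group B
xh <- rpois(20, 100)                      # gene h is unchanged across groups
theta_stat(xg, xh, group)                 # well below 1: a differentially proportional pair
```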

In addition to measuring differences in the means of gene ratios between groups, ratio-based methods (such as those used in differential proportionality analysis) can also help identify differences in the coordination of gene pairs. Such ‘differential coordination analysis’ would otherwise depend on correlation (Yu and Bai, 2011), and therefore fall susceptible to spurious results. Instead, we can harness the advantages of the VLR to define a sub-compositionally coherent measure that tests for changes in the magnitude (i.e. slope of association) or strength (i.e. coefficient of association) of co-regulated gene pairs. Moreover, ratio-based analyses could work as normalization-free feature selection methods for data science applications (such as clustering and classification). Such techniques would especially suit large datasets aggregated from multiple sequencing centers, platforms or modalities, where heterogeneity and batch effects are not easily normalized.

7 Summary

All NGS abundance data are compositional because sequencers sample only a portion of the total input material. However, RNA-Seq data might have compositional properties regardless, owing to constraints on the cellular capacity for mRNA production. Whatever the reason, compositional data cannot undergo conventional analysis directly, at least not without prior normalization or transformation. Otherwise, measures of differential expression, correlation, distance and principal components become unreliable.

In the analysis of RNA-Seq data, effective library size normalization is used to recast the data in absolute terms prior to analysis. However, successful normalization requires meeting certain (often untestable) assumptions. Alternatively, log-ratio transformations provide a way to interrogate the data using familiar methods, but analysts must interpret their results with respect to the chosen reference. Sometimes, log-ratio transformations can be used to normalize the data, but this requires an approximation of an unchanged reference. Instead, shifting focus to the analysis of ratios yields methods that avoid normalization and transformation entirely. These ratio-based methods may represent an important future direction in the compositional analysis of relative NGS abundance data, although more work is needed to determine how they compare to other popular methods.

Acknowledgements

I.E. thanks Cedric Notredame for support and Elie Maza for discussion.

Conflict of Interest: none declared.

References

Aitchison, J. (1982) The statistical analysis of compositional data. J. R. Stat. Soc. Ser. B (Methodological), 44, 139–177.
Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Chapman & Hall, Ltd., London, UK.
Aitchison, J. (2003) A concise guide to compositional data analysis. In: 2nd Compositional Data Analysis Workshop, Girona, Italy.
Aitchison, J. (2008) The single principle of compositional data analysis, continuing fallacies, confusions and misunderstandings and some suggested remedies. In: Proceedings of CoDaWork’08.
Aitchison, J. and Greenacre, M. (2002) Biplots of compositional data. J. R. Stat. Soc. Ser. C (Appl. Stat.), 51, 375–392.
Aitchison, J. et al. (2000) Logratio analysis and compositional distance. Math. Geol., 32, 271–275.
Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data. Genome Biol., 11, R106.
Baruzzo, G. et al. (2017) Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat. Methods, 14, 135–139.
Benjamin, A.M. et al. (2014) Comparing reference-based RNA-Seq mapping methods for non-human primate data. BMC Genomics, 15, 570.
Bian, G. et al. (2017) The gut microbiota of healthy aged Chinese is similar to that of the healthy young. mSphere, 2, e00327-17.
Bliss, C.I. and Fisher, R.A. (1953) Fitting the negative binomial distribution to biological data. Biometrics, 9, 176–200.
Boogaart, K.G.v.d. and Tolosana-Delgado, R. (2013a) Descriptive analysis of compositional data. In: Gentleman, R. et al. (eds.) Analyzing Compositional Data with R, Use R! Springer, Berlin, pp. 73–93.
Boogaart, K.G.v.d. and Tolosana-Delgado, R. (2013b) Fundamental concepts of compositional data analysis. In: Gentleman, R. et al. (eds.) Analyzing Compositional Data with R, Use R! Springer, Berlin, pp. 13–50.
Boogaart, K.G.v.d. and Tolosana-Delgado, R. (2013c) Zeroes, missings, and outliers. In: Gentleman, R. et al. (eds.) Analyzing Compositional Data with R, Use R! Springer, Berlin, pp. 209–253.
Buccianti, A. (2013) Is compositional data analysis a way to see beyond the illusion? Comput. Geosci., 50, 165–173.
Conesa, A. et al. (2016) A survey of best practices for RNA-seq data analysis. Genome Biol., 17, 13.
Dillies, M.-A. et al. (2013) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinf., 14, 671–683.
Dohm, J.C. et al. (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res., 36, e105.
Egozcue, J.J. et al. (2003) Isometric logratio transformations for compositional data analysis. Math. Geol., 35, 279–300.
Erb, I. and Notredame, C. (2016) How should we measure proportionality on relative gene expression data? Theory Biosci., 135, 21–36.
Erb, I. et al. (2017) Differential proportionality – a normalization-free approach to differential gene expression. In: Proceedings of CoDaWork 2017, The 7th Compositional Data Analysis Workshop; available under bioRxiv, 134536.
Fernandes, A.D. et al. (2013) ANOVA-Like Differential Expression (ALDEx) analysis for mixed population RNA-Seq. PLoS One, 8, e67019.
Fernandes, A.D. et al. (2014) Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome, 2, 15.
Friedman, J. and Alm, E.J. (2012) Inferring correlation networks from genomic survey data. PLoS Comput. Biol., 8, e1002687.
Greenacre, M. (2009) Power transformations in correspondence analysis. Comput. Stat. Data Anal., 53, 3107–3116.
Greenacre, M. (2011) Measuring subcompositional incoherence. Math. Geosci., 43, 681–693.
Greenacre, M. (2017) Towards a pragmatic approach to compositional data analysis. Technical Report 1554, Department of Economics and Business, Universitat Pompeu Fabra.
Griffith, M. et al. (2015) Informatics for RNA sequencing: a web resource for analysis on the cloud. PLoS Comput. Biol., 11, e1004393.
Head, S.R. et al. (2014) Library construction for next-generation sequencing: overviews and challenges. BioTechniques, 56, 61, passim.
Jiang, L. et al. (2011) Synthetic spike-in standards for RNA-seq experiments. Genome Res., 21, 1543–1551.
Kurtz, Z.D. et al. (2015) Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol., 11, e1004226.
Law, C.W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol., 15, R29.
Li, J.-H. et al. (2015) Discovery of protein–lncRNA interactions by integrating large-scale CLIP-Seq and RNA-Seq datasets. Bioinf. Comput. Biol., 2, 88.
Lin, Y. et al. (2016) Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics, 17.
Lovell, D. et al. (2015) Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol., 11, e1004075.
Lovén, J. et al. (2012) Revisiting global gene expression analysis. Cell, 151, 476–482.
Mandal, S. et al. (2015) Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis., 26.
Martín-Fernández, J. and Thió-Henestrosa, S. (2006) Rounded zeros: some practical aspects for compositional data. Geol. Soc. London Special Publ., 264, 191–201.
Martín-Fernández, J. et al. (1998) Measures of difference for compositional data and hierarchical clustering methods. In: Proceedings of IAMG, Vol. 98, pp. 526–531.
Mateu-Figueras, G. et al. (2011) The principle of working on coordinates. In: Pawlowsky-Glahn, V. and Buccianti, A. (eds.) Compositional Data Analysis. John Wiley & Sons, Ltd., West Sussex, UK, pp. 29–42.
Merino, G.A. et al. (2017) A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies. Brief. Bioinform., doi: 10.1093/bib/bbx122.
Metzker, M.L. (2010) Sequencing technologies—the next generation. Nat. Rev. Genet., 11, 31–46.
Pearson, K. (1896) Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philos. Trans. R. Soc. Lond. Ser. A, 187, 253–318.
Quinn, T. et al. (2017a) Differential expression analysis of log-ratio transformed counts: benchmarking methods for RNA-Seq data. bioRxiv, 231175.
Quinn, T.P. et al. (2017b) propr: an R-package for Identifying Proportionally Abundant Features Using Compositional Data Analysis. Sci. Rep., 7, 16252.
Robinson, M.D. and Oshlack, A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol., 11, R25.
Robinson, M.D. et al. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139–140.
Saccenti, E. (2017) Correlation patterns in experimental data are affected by normalization procedures: consequences for data analysis and network inference. J. Proteome Res., 16, 619.
Scott, M. et al. (2010) Interdependence of cell growth and gene expression: origins and consequences. Science, 330, 1099–1102.
Seyednasrollah, F. et al. (2015) Comparison of software packages for detecting differential expression in RNA-seq studies. Brief. Bioinf., 16, 59–70.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 3, Article 3.
Soneson, C. and Delorenzi, M. (2013) A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics, 14, 91.
Tarazona, S. et al. (2015) Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res., 43, e140.
Teng, M. et al. (2016) A benchmark for RNA-seq quantification pipelines. Genome Biol., 17, 74.
Thomas, C.W. and Aitchison, J. (2006) Log-ratios and geochemical discrimination of Scottish Dalradian limestones: a case study. Geol. Soc. Lond. Special Publ., 264, 25–41.
Topa, H. and Honkela, A. (2016) Analysis of differential splicing suggests different modes of short-term splicing regulation. Bioinformatics, 32, i147–i155.
Urbaniak, C. et al. (2016) Human milk microbiota profiles in relation to birthing method, gestation and infant gender. Microbiome, 4, 1.
van den Boogaart, K.G. and Tolosana-Delgado, R. (2008) “compositions”: a unified R package to analyze compositional data. Comput. Geosci., 34, 320–338.
Wang, W.A. et al. (2014) Comparisons and performance evaluations of RNA-seq alignment tools. In: 2014 International Conference on Electrical Engineering and Computer Science (ICEECS), pp. 215–218.
Wang, Z. et al. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet., 10, 57–63.
Washburne, A.D. et al. (2017) Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ, 5, e2969.
Williams, C.R. et al. (2017) Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq. BMC Bioinformatics, 18.
Yu, T. and Bai, Y. (2011) Capturing changes in gene expression dynamics by gene set differential coordination analysis. Genomics, 98, 469–477.
