Expanded functionality, increased accuracy, and enhanced speed in the de novo genotyping-by-sequencing pipeline GBS-SNP-CROP

Melo, Arthur T O; Hale, Iago

doi:10.1093/bioinformatics/bty1073

Bioinformatics (2019), 35, 1783–1785, doi: 10.1093/bioinformatics/bty873.

In the original article, there was an error in the formatting of Table 1.

Table 1.

Comparative summary of GBS-SNP-CROP v.4.0 performance, based on a set of simulated data from GBS-Pacecar

Pipeline^a	MR geno^b	Time (min)^c	Variants called^d	Type I error^e	Type II error^f	Accuracy^g
UNEAK	NA	8.5	2642	0.9%	92.5%	7.5%
GSC v.1.0	1	370.8	23 395	1.3%	34.1%	65.4%
GSC v.4.0	1	121.7	29 738	0.6%	15.6%	84.0%
	5	156.9	26 885	0.6%	23.6%	76.0%
	10	171.5	26 854	0.5%	23.7%	76.1%
	15	179.1	26 897	0.5%	23.6%	76.1%
	20	183.0	26 892	0.5%	23.6%	76.1%
	25	163.2	26 901	0.5%	23.5%	76.2%

Note: In total, 25 000 SNPs and 10 000 indels were simulated across a genomic space of 100 000 GBS fragments. A total of 60 002 165 single-end reads were simulated for a population of 25 individuals (average of 2.4 million reads per genotype), with a sequencing error rate of 1.1%. See Supplementary Table S1 for more details.

^a UNEAK = TASSEL-UNEAK; GSC = GBS-SNP-CROP.

^b The number of genotypes used for mock reference (MR) assembly.

^c Computation time (minutes) required to run the full analysis on a Unix workstation with 16 GB RAM and a 2.6 GHz Dual Intel processor.

^d Number of variants called by a pipeline (Note: a total of 35 000 variants were simulated, consisting of 25 000 SNPs and 10 000 indels).

^e Percentage of called variants that could not be validated (false positives).

^f Percentage of true, simulated variants that were not detected by the pipeline.

^g Overall accuracy: 100 * [number of validated variants/(total number of simulated variants + number of non-validated variants)].
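The three metrics defined in footnotes e to g can be recomputed from raw counts. The sketch below is illustrative only: Table 1 reports the number of variants called and a rounded Type I error, not the number of validated variants, so the hypothetical helper `back_calculate_validated` estimates that count, and the results can differ from the tabulated values in the last decimal place. It also assumes that each validated call corresponds to one distinct simulated variant. The function names are hypothetical and are not part of GBS-SNP-CROP.

```python
"""Sketch of the Table 1 error and accuracy metrics (footnotes e to g).

Assumption: validated counts are back-calculated from the rounded Type I
error in Table 1, so results match the table only to within rounding.
Assumption: each validated call matches one distinct simulated variant.
"""

SIMULATED = 25_000 + 10_000  # simulated SNPs + indels (footnote d)


def table_metrics(called: int, validated: int, simulated: int = SIMULATED) -> dict[str, float]:
    """Hypothetical helper: Type I error, Type II error and accuracy, in percent."""
    non_validated = called - validated  # false positives
    return {
        # e: share of called variants that could not be validated
        "type_I": 100 * non_validated / called,
        # f: share of simulated variants not detected (validated) by the pipeline
        "type_II": 100 * (simulated - validated) / simulated,
        # g: validated / (simulated + non-validated)
        "accuracy": 100 * validated / (simulated + non_validated),
    }


def back_calculate_validated(called: int, type_i_pct: float) -> int:
    """Hypothetical helper: estimate validated calls from a rounded Type I error."""
    return round(called * (1 - type_i_pct / 100))


if __name__ == "__main__":
    # GSC v.4.0 with one genotype used for mock reference assembly (Table 1)
    called = 29_738
    validated = back_calculate_validated(called, 0.6)
    for name, value in table_metrics(called, validated).items():
        print(f"{name}: {value:.1f}%")
```

For the GSC v.4.0 row with one MR genotype this prints 0.6%, 15.5% and 84.0%; the Type II value differs from the tabulated 15.6% only because the validated count is reconstructed from a rounded percentage.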

This has been corrected, and the corrected table appears above.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
