Construction of a high-density genetic map for hexaploid kiwifruit (Actinidia chinensis var. deliciosa) using genotyping by sequencing

Abstract Commercially grown kiwifruit (genus Actinidia) are generally of two sub-species which have a base haploid genome of 29 chromosomes. The yellow-fleshed Actinidia chinensis var. chinensis, is either diploid (2n = 2x = 58) or tetraploid (2n = 4x = 116) and the green-fleshed cultivar A. chinensis var. deliciosa “Hayward,” is hexaploid (2n = 6x = 174). Advances in breeding green kiwifruit could be greatly sped up by the use of molecular resources for more efficient and faster selection, for example using marker-assisted selection (MAS). The key genetic marker that has been implemented for MAS in hexaploid kiwifruit is for gender testing. The limited marker-trait association has been reported for other polyploid kiwifruit for fruit and production traits. We have constructed a high-density linkage map for hexaploid green kiwifruit using genotyping-by-sequence (GBS). The linkage map obtained consists of 3686 and 3940 markers organized in 183 and 176 linkage groups for the female and male parents, respectively. Both parental linkage maps are co-linear with the A. chinensis “Red5” reference genome of kiwifruit. The linkage map was then used for quantitative trait locus (QTL) mapping, and successfully identified QTLs for king flower number, fruit number and weight, dry matter accumulation, and storage firmness. These are the first QTLs to be reported and discovered for complex traits in hexaploid kiwifruit.


Introduction
Originating in China and Southeast Asia, kiwifruit (genus Actinidia) has been described as a woody, perennial, deciduous vine, producing cylindrical fleshy fruit (Schmid 1978). In the last 60 years, kiwifruit has become the largest horticultural export item for New Zealand (Ferguson 2004) generating NZD$2.3 billion in exports in 2019 (https://www.freshfacts.co.nz/). The two most commonly exported species are Actinidia chinensis var. deliciosa and A. chinensis var. chinensis. Outside of China, the main A. chinensis var. deliciosa cultivar is green-fleshed "Hayward," accounting for 80% of global kiwifruit production (Ferguson 2015). The genus Actinidia is very diverse in terms of morphological and genetic variation (Ferguson and Huang 2007). Naturally occurring ploidy levels are diploid, tetraploid, hexaploid, and octoploid (Guijun et al. 1994). The A. chinensis var. deliciosa genotypes (green-fleshed kiwifruit) from the New Zealand breeding program are hexaploid with six sets of 29 chromosomes (2n ¼ 6x ¼ 174) (Watanabe et al. 1990).
Polyploid species can be classified as allo-and autopolyploid. An allopolyploid is described as being comprised of sub-genomes originating from multiple related species and displaying chromosomal preferential pairing during meiosis. Notable allopolyploid crop species include hexaploid bread wheat (Triticum aestivum) and tetraploid canola (Brassica napus). By contrast, an autopolyploid is created during a genome duplication event and in general does not display preferential chromosome pairing. An analysis of A. chinensis var. deliciosa using restricted fragment length polymorphism (RFLP) markers proposed A. chinensis var. chinensis as one progenitor for hexaploid kiwifruit (Atkinson et al. 1997). However, additional progenitors contributing to at least two sets of homologs are unknown. Microsatellite markers used to track the inheritance of alleles in a hybrid population (A. chinensis var. deliciosa crossed with A. chinensis var. eriantha) suggested there is no preferential pairing between the chromosome homologs (Mertten et al. 2012). Therefore, A. chinensis var. deliciosa could be classified as an allopolyploid due to its homeologous subgenomes obtained from several species, but functionally pairing as an autopolyploid. Phenotypically, a significant increase in fruit weight is observed as the ploidy level increases (Li et al. 2013). This has been confirmed with the genome doubling of diploid kiwifruit to autotetraploid using colchicine (Wu et al. 2012).
Reference genome assemblies have been developed for diploid A. chinensis var. chinensis Pilkington et al. 2018;Wu et al. 2019), A. eriantha (Tang et al. 2019), and A. rufa (Zhang et al. 2015). Both the A. eriantha and A. chinensis genome v3.0 (Wu et al. 2019) were based on long read sequencing technologies incorporating high-throughput chromatin capture (Hi-C) methodologies with each genome assembly being assigned to the 29 pseudo-chromosomes. An assembly of a diploid A. chinensis selection (Pilkington et al. 2018), known as Red5, presents an improvement in terms of percentage of the assembly anchored to pseudo-chromosomes (only 6 Mb of the assembly were not assigned to chromosomes) and an improved gene annotation compared to the first draft assembly obtained from the cultivar A. chinensis Hongyang: [164 Mb unassigned to chromosomes ]. Linkage maps have been constructed for diploid Actinidia (Testolin et al. 2001;Fraser et al. 2009;Scaglione et al. 2015;Liu et al. 2017) using a range of molecular markers including amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs), and genotyping by sequencing (GBS) (Elshire et al. 2011). GBS has recently been the method of choice for building linkage maps in diploid kiwifruit, including RAD-seq (Scaglione et al. 2015;Liu et al. 2017), GBS (Tahir et al. 2019), and hybridization-based (Liu et al. 2016) approaches. The advantage of GBS is that this method can provide the sequence and haplotype data while generating the number of markers to sufficiently saturate all the homologs from a complex ploidy species.
Due to A. chinensis var. deliciosa requiring 3-5 years (Testolin et al. 2016) to establish a canopy before flower/fruit development occurs, advances in breeding green kiwifruit are constrained by the long generation time. Marker-assisted selection (MAS) would facilitate a more efficient and faster selections, but the high ploidy makes this a considerable challenge with only dominant traits such as gender testing (Gill et al. 1998) and the recently developed vitamin C marker being deployed (McCallum et al. 2019). The objective of this study was to construct a high-density linkage map for hexaploid green kiwifruit (A. chinensis var. deliciosa) using GBS to allow more complex traits to be mapped and selected for using MAS in the future.

Plant material
A F1 population from a bi-parental (A. chinensis var. deliciosa) cross between a hexaploid female ZE and hexaploid male 28 was used in this study (Supplementary Figure S1). Kiwifruit is dioecious, the male and female reproductive organs are on separate plants, which promotes outcrossing (allogamy). The cross was completed in November 2007 with the seeds from the resulting ZE28 family extracted in April 2008. To test the relatedness of the parents, the coefficient of coancestry (f XY ) was calculated using the pedigree structure (Bernardo 2002) and f XY ¼ 0.
The population was generated by germinating seedlings on which MAS gender testing was performed (Gill et al. 1998) and used to identify and reduce the male proportion prior to planting in the orchard. Thousand one hundred and sixty-eight seedlings were planted across four different trial blocks at the Plant & Food Research, Te Puke Research Center (Table 1) of which 86% of these were female. Blocks 3 and 52 were planted at a density of 10 plants per 6 m bay, with 4 m between rows (4167 plants/Ha) while blocks 14 W and 35 were planted with 14 plants (seven plants on each side of the row) at a density of seven plants in a 6 m bay with a row spacing of 5 m (4667 plants/Ha). Males or unidentified gender plants were at a density of one in each bay to ensure adequate pollination. Each seedling planted was a unique genotype.

Library preparation
For each individual, high molecular weight DNA was extracted from approximately 100 mg of leaf tissue using a standard cetyl trimethylammonium bromide (CTAB) protocol (Doyle 1987). The GBS method of Elshire et al. (2011) was used to obtain reduced representation of the genomes for the two parents and 278 individuals of their progeny selected for displaying the extremes of king flower per winter bud production (high or low) with a small number of individuals showing a median king flower production. GBS libraries were developed using the ApeKI restriction enzyme and set of 96 barcodes. Initial data were generated from GBS libraries from 96 individuals by Macrogen Ltd (https://dna.macro gen.com/eng/), and additional data were generated for these individuals as well as an additional 182 individuals by the Australian Genome Research Facility (http://www.agrf.org.au/). Samples were run on the Illumina HiSeq2000 (Illumina, San Diego, CA, USA) generating 100 bp single-end sequences.

Variant calling and filtering
Variants were called using Freebayes (Garrison and Marth 2012) with command line options "-p 6 -C 5 -k -min-mapping-quality 10 -genotype-qualities -use-mapping-quality." Variants that were homozygous for one parent and heterozygous for the other parent and estimated to be in one single dose (simplex Â nulliplex segregation) based on the allelic ratios in each sample were selected for further analysis. Raw variants were filtered using Freebayes based on the proportion of individuals successfully scored (call rate > 0.7) and maximum read depth (DP) of 30,000 (Supplementary Table S1).

Linkage map construction
Linkage maps for the six homologs of both parents and 29 chromosomes were constructed using simplex Â nulliplex markers and using the double-pseudo test cross strategy (Grattapaglia and Sederoff 1994) adapted to hexaploid. JoinMap V R 3.0 (https:// www.kyazma.nl/) was used to construct the LGs. Grouping of loci was achieved with a minimum LOD (logarithm of the odds) score of 5 and regression mapping was used for map calculation using the Kosambi mapping function. A first draft of the linkage map was developed and then manually checked for the presence of double recombinants, due to genotypic errors. During that process, genotypic calls that were anomalous and would have misplaced markers compared to their expected location within the reference genome were manually corrected. The single nucleotide polymorphism (SNP) markers mapped onto the final linkage map were named according to their location in the reference A. chinensis genome assembly of "Red5" (PS1.69.0) (Pilkington et al. 2018). GBS marker locations based on the read mapping done on version PS1.68.5 were converted to physical locations in the assembly version PS1.69.0 using custom perl scripts.

Collinearity of the genetic and physical maps and genome-wide recombination rates
The markers identified through the construction of the linkage map were arranged by their genetic distance (in cM) and physical distance (in Mb) in R version 1.2.5042 using the xyplot function in the lattice library, grouped by each homolog.

Phenotypic assessment and QTL mapping
Flowering and fruit attributes were measured from the ZE28 seedlings in four different flowering/fruiting seasons (Supplementary Table S2 In evaluating seasons one, two, three, and four, the total number of fruits per vine, the average fruit weight (total bulk fruit weight/number of fruit weighed), and the dry matter content were measured on all fruit-bearing vines. King flowers per winter bud and fruit firmness following 12 weeks of storage were measured in season four.
Fruit dry matter content was measured by cutting one equatorial slice of approximately 2 mm thickness and drying at 65 for 24 hours. The fruit dry matter content was calculated from the final dry weight and initial wet weight of the slices, recorded as a percentage of fresh weight. King flowers per winter bud were assessed in spring of the evaluation season as per the flowering protocol described by Snelgar et al. (1997). Fruit firmness, expressed as kilogram-force (kgf), was recorded 12 weeks postharvest using a GUSS penetrometer with a 7.9 mm diameter probe traveling at a downward speed of 5 mm/s. Penetrometer measurements were made on both the flat and round side of the fruit, at 90-degree angles, and then averaged.
For QTL detection, the traits were processed yearly (due to the climatic differences each season and the increase in the seedling maturity) and QTL detection with permutation tests were performed using MapQTL v5.0 (https://www.kyazma.nl/) and interval mapping. QTLs were declared significant if their LOD score was above 3.

Data availability
Raw DNA-seq reads are available at NCBI under BioProject number PRJNA721532. Genotypic data for both parents are presented in Supplementary Table S1. The phenotypic data used for QTL detection are presented, in the same order at the genotypic data, in Supplementary Table S2. Supplementary Figure S1 contains the pedigree structure of the ZE28 family. Supplementary material is available at figshare: https://doi.org/10.25387/g3.13218716. The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables.

Results
Generation of a bi-parental A. deliciosa cross A large bi-parental cross containing 1168 seedlings that displayed a wide distribution of yield characters was used in this study  (Table 3). Of particular interest was the variation of numbers of king flowers per winter bud. In the pedigree structure, the male parent was an accession from China, while the female was from the fourth generation of crossing from an accession originally imported into New Zealand. Both the female and male parents had inbreeding levels of 0 and so were unrelated to each other (coefficient of ancestry ¼ 0). From this population, 278 individuals were selected for genotyping based on the extremes (high and low) king flower production.

Sequencing data, SNP detection and filtering
To select an appropriate restriction enzyme that provides a balance between coverage depth and fragment length, an in silico digestion of the kiwifruit genome was modeled for 5 candidate restriction enzymes (Supplementary Figure S2). The ApeKI restriction enzyme was chosen for DNA fragmentation as it provided a good spread of fragments and was methylation sensitive reducing the frequency of repetitive DNA represented (Hilario et al. 2015). In total, 3.5 billion raw sequencing reads were obtained from GBS libraries of triplicates of both parents and the 278 selected segregating full-sib individuals, and 74.6% of sequencing reads successfully aligned to the reference genome assembly of A. chinensis (Table 4). Ten sets of reads from the full sibling seedlings were ultimately removed due to low read counts and poor mapping (<50%), providing genotyping for 268 individuals. The estimated mean and median mapping rate for the remaining population was 76% and 75%, ranging from 53 to 94%. A total of 435,901 variants were detected. The median of the mean depth for each sample was distributed around 40X ( Figure 1A). Both parents had higher read depth (138X and 127X for the female and male parent, respectively) due to three replicates being sequenced and merged. The call rate for SNPs, measured as the proportion of SNPs being successfully scored in the progeny and parents, was distributed over 90% ( Figure 1B). The mean SNP depth was mostly consistent with a median value of 44.2 ( Figure 1C) and the call rate, calculated as the proportion of sample being successfully scored, had a median of 0.99 ( Figure 1D).

Linkage map construction
SNPs that were present in the parents in a simplex Â nulliplex mode (e.g., AAAAAG Â AAAAAA), and reciprocal, were selected for the linkage map construction. Many genotyping errors were found in the data and visible as suspicious double recombinants occurring between markers positioned at small physical distance on the reference genome. After a manual correction of these errors, there were 3686 and 3940 mapped SNP markers across 183 and 176 LGs in the female and male parents, respectively (Table 4, Figure 2 and Supplemental Figure S3, A and B). In the female ZE parent, the map spanned a total of 14,957 cM, with an average of 81.7 cM per LG, an average marker distance of 4.6 cM, and a maximum distance between markers of 16.5 cM. For the male 28 parents, the map spanned a total of 10,869 cM with an average LG length of 107.6 cM. The average distance between the male parent markers was 5.8 cM, with the largest gap being 39 cM. The marker content and order along the linkage groups (LGs) were consistent with the Red5 genome assembly of A. chinensis, as demonstrated by the high collinearity between the reference genome and the LGs (Figure 3). On average, the female and male linkage maps covered respectively 63.6% and 75.2% of the genome assembly. The male linkage map was more saturated than the female map, with 25 out of the 29 chromosomes having six LGs, corresponding to the six expected homologs, and 119 out of 176 LGs (67.6%) covering greater than 75% of the chromosome physical length based on the Red5 assembly (PS1.69.0). In the female map, 17 of the 29 chromosomes had six homologs, and 89 out of 183 LGs (48.6%) covered more than 75% of the chromosome length (Figure 2).

Fruit and yield phenotypes
The industry standard cv "Hayward," of comparable age, were grown in the same block locations as the ZE28 progeny to act as a reference. Key phenotypic traits including the number of king flowers per winter bud, fruit number and weight, and size were measured over 1-4 seasons (Table 3 and Figure 4). As the ZE28 seedlings matured, there was an increase in the number of fruit produced as well as the average fruit weight each year. On average, "Hayward" produced less fruit than the ZE28 seedling population average, 13.8 vs 23.2 fruit per vine. However, fruit weight was higher for "Hayward," 99.5 vs 68.6 g, respectively. The dry matter content of "Hayward" was in the lower quartile of the seedling distribution for all years assessed, 17.4 vs 18.8% dry matter content. While the distribution of the ZE28 seedlings was wider than that of "Hayward" for fruit firmness 12-weeks postharvest, the average of "Hayward" was higher, 0.83 vs 1.38 kgf. The distribution of numbers of king flowers per winter bud ranged from 0.45 to 2.36 in Hayward with a mean of 1.23. The

QTL detection
Nine QTLs were identified with LOD scores greater than 3.0 across the four evaluation seasons, two from the female parent and seven from the male parent, for the following traits: the number of king flowers per winter bud, fruit number, fruit weight (g), dry matter content (%), fruit firmness 12-weeks post-harvest (Table 5). QTLs were inherited from both the female and male. The variance explained for these QTLs ranged between 5.6 and 14.5%. Permutation tests conducted on the QTLs indicated two were strong and the others were suggestive.

Discussion
The study demonstrates that it is possible to identify QTLs in a hexaploid population using a GBS approach. QTLs for king flowers per winter bud, dry matter content, fruit size, fruit number, and fruit firmness were discovered. These are the first QTLs to be reported in hexaploid kiwifruit. QTL detection was possible because the progeny size was sufficient for such analysis. It is well known that the ability to detect significant QTLs is based, among other things, on the population size (Beavis et al. 1994). The population size in previously published SNP-based diploid kiwifruit genetic linkage maps ranged from 94 individuals (Scaglione et al. 2015) to 230 individuals ( female vines when reporting a QTL for fruit size on chromosome 6 explaining 41.8% of the variance. Beavis et al. (1994) illustrated how mapping studies with 100 individuals were subjected to greatly inflated genetic estimates of QTL effect sizes. Our study using 278 individuals provided QTLs with a lower explained variance, which is probably better estimated than studies using smaller progeny sizes. Yield attributes and their associated components (plant height, branching types, root structures, and so on) are known to be highly polygenic traits influenced by the environment. This has been illustrated in multiple crops, with yield-associated QTLs previously published in tomato (Gur et al. 2011), wheat (Maccaferri et al. 2008), barley (Von Korff et al. 2008), and brassica (Shi et al. 2009). The fact that this linkage map was able to identify QTLs for highly polygenetic traits (yield and fruit firmness) illustrates its usefulness. GBS, as a process, will have errors introduced at each step in the process, both in the laboratory as well as in the data processing. Challenges exist between differentiating what is an error and what is an accurate sequence read. The most common errors in reading the GBS data are missing genotype information due to low sequence read depth (Bilton et al. 2018) and making the distinction between a real double crossover event or a false data point (Cartwright et al. 2007). A manual process was decided to be appropriate to review the presence of double cross-over events by taking into account the physical distances between markers. The removal of genotyping errors masked as double crossover events improved the quality of the linkage map. However, in comparison to other map construction software (Bilton et al. 2018), JoinMap typically results in an expanded genetic linkage map as well as some incorrect marker identification/order. A linkage map was built for both parents of a hexaploid A. chinensis var. deliciosa population, using GBS and SNP markers segregating in a simplex Â nulliplex fashion. This is the first map for hexaploid kiwifruit and a new addition to maps for polyploid Ericaceae species, as a linkage map of tetraploid blueberry was published (McCallum et al. 2016). Our linkage map produced a similar density of markers compared to what is reported in previous SNP and SSR-based maps of diploid kiwifruit (Liu et al. 2017) based on the average number of markers per LG (Testolin et al. 2001;Fraser et al. 2009;Scaglione et al. 2015). As this experiment was a hexaploid progeny and simplex Â nulliplex markers were used for building LGs corresponding to homologs, it was encouraging that markers were placed across all 29 chromosomes for each parent. In total, 183 and 176 LGs were built for the female and male parent, respectively. Theoretically, 174 (29 chromosomes Â 6 homologs) LGs were expected. The difference between the expected and obtained LG numbers can be attributed to the number of generations of selection in the parents and the inability to get full map saturation along each homolog. The female parent ZE has had more selection pressure than the male parent 28 and has a larger number of LGs, including some chromosomes for which up to nine LGs were built. The nine LGs may be a result of incomplete coverage of a single homolog resulting in two homologs of unknown order.
The plots of the genetic and physical maps (Figure 3) indicate the markers placed on the physical map were effective. An acceptable plot is where the markers are defined by a monotonically increasing function. Most of the homologs fit the monotonically increasing function, but there were exceptions on the distal ends of chromosome 6B, 8A, 21D, and 29B for the ZE parent and all homologs of chromosome 16, and 2F for parent 28. The slope of the plots is indicative of the recombination rates (Rezvoy et al. 2007). Outside the centromere location, which typically has lower rates of recombination, the recombination rates were appropriate for most homologs.
The strategy to use simplex Â nulliplex markers may explain the lack of full saturation for the ZE map as it may not be possible to find such markers segregating for homologs that are fixed. A potential solution for saturating the map would be to include other types of marker segregation, such as duplex Â multiplex, simple Â Â simplex, and higher dosage ratios. However, it was not possible to achieve this using this dataset because the read depth obtained (40X) was insufficient to accurately estimate the allelic dosages in the progeny. As the linkage map is based on SNP markers aligned to the reference genome of kiwifruit, it was possible to obtain an indication of how much of the genome was covered by markers. Again, the female ZE map is less saturated than the male 28 maps, possibly because of a higher selection pressure or higher homozygosity between homologs in the female. The male linkage map is well saturated with only four chromosomes for which the number of LGs is not equal to six. Future directions would include integrating the dosage effects as reported in chrysanthemum and potato (Hackett et al. 2013;van Geest et al. 2017) by using other genotyping technologies such as SNP arrays or hybridization-based sequencing technology. These technologies are capable of more accurate allelic    dosage estimation which is crucial for using polyploid-aware software for genetic linkage analysis, such as polyMapR (Bourke et al. 2018).