On the Importance of Variation: A High-Resolution Map of Copy Number Variants in Arabidopsis.

Linking genotype to phenotype is a major challenge in plant biology. Phenotypic variation observed between individuals of the same plant species is the consequence of a vast array of genetic variation, including single-nucleotide polymorphisms (SNPs) and small or large structural variants including copy number variation (CNV). CNVs are genetic polymorphisms in which a segment of the genome differs in number between individuals. CNVs may result from deletions, insertions, duplications, and larger amplifications occurring at coding and noncoding genomic regions. CNVs are formed through diverse genetic mechanisms including unequal crossing over, DNA double-strand break repair, and transposon activity. CNVs may have facilitated the evolution and diversification of plant phenotypes, mostly through modification of gene dosage. Nevertheless, a population-scale analysis of CNVs to resolve their diversity and consequences on plant phenotypes is still lacking. In a new study, Zmienko et al. (2020) provide a population-scale map of DNA CNVs in Arabidopsis (Arabidopsis thaliana) in an effort to facilitate the exploration of the genetic determinants of phenotypic variation in the model plant.
Taking advantage of the short-read whole-genome sequencing data released through the 1001 Genomes Consortium (2016), the authors called CNVs on 1064 Arabidopsis accessions (Zmienko et al., 2020). CNVs were identified with different tools, based on read-depth, read-pair, split-read, or hybrid strategies. Through this combined approach, the authors uncovered 19,003 CNVs in the Arabidopsis genome. In parallel, 70,137 large insertions/deletions (indels) were called only by read pair-based callers, to expand the repertoire of structural variation identified in Arabidopsis. Data are accessible at http://athcnv.ibch.poznan.pl through a user-friendly interface. After extensive benchmarking, analysis of the distribution and the genomic content of CNVs revealed an overlap with 18.3% of protein-coding genes, particularly enriched in defense- and stress-response genes. Most of the CNVs were located in genomic segments that are enriched in transposable elements (TEs) and associated with high levels of genomic rearrangements. Conversely, CNVs overlapping with TEs tend to lie farther from genes compared with non-CNV TEs. Taken together, those two observations suggest that the distribution of CNVs is in part defined by the local genomic context (see figure).
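To make the read-depth strategy mentioned above more concrete, the following is a minimal sketch of the general idea, not the pipeline used by Zmienko et al. (2020). It assumes that mean read depth has already been computed for consecutive fixed-size genomic windows, that the genome-wide median depth corresponds to one reference copy, and that rounding the normalized depth gives a usable copy number estimate. The function names, the input list, and the `reference_copies` parameter are hypothetical and chosen only for illustration.

```python
from statistics import median


def estimate_copy_number(window_depths, reference_copies=1):
    """Estimate a relative copy number for each window from read depth.

    window_depths: mean read depth per fixed-size window (hypothetical input).
    Assumption: the median depth across all windows reflects
    `reference_copies` copies of the sequence.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    # Normalize each window by the baseline and round to an integer copy number.
    return [round(depth / baseline * reference_copies) for depth in window_depths]


def call_cnv_segments(copy_numbers, reference_copies=1):
    """Merge adjacent windows that share a non-reference copy number.

    Returns (start_index, end_index_exclusive, copy_number) tuples.
    """
    segments = []
    start = None
    current = None
    for i, cn in enumerate(copy_numbers):
        if cn != current:
            # Close the previous run if it deviates from the reference.
            if current is not None and current != reference_copies:
                segments.append((start, i, current))
            start, current = i, cn
    # Close a run that extends to the last window.
    if current is not None and current != reference_copies:
        segments.append((start, len(copy_numbers), current))
    return segments


if __name__ == "__main__":
    # Toy depths: a duplicated stretch (about 2x) and a deleted stretch (about 0x).
    depths = [30, 31, 29, 60, 62, 59, 30, 0, 1, 30]
    copies = estimate_copy_number(depths)
    print(copies)
    print(call_cnv_segments(copies))
```

Real callers additionally correct for GC content and mappability and use statistical segmentation rather than simple rounding, which is one reason combining read-depth evidence with read-pair and split-read evidence, as described above, can improve confidence in a call.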
The authors then harnessed the newly identified gene-associated CNVs as markers to infer Arabidopsis population structure. Compared with the classical approach of using SNPs, the CNV-based approach performed better at identifying the global distribution of each accession but was less sensitive for detecting genetic subgroups. Use of CNVs as markers for population structure analysis thus represents a valuable complement to SNP markers. At the individual accession level, up to 26.9% of gene-associated CNVs showed variation within the population. The authors investigated the consequences of change in gene copy number on phenotypic variation. Using CNV markers in a genome-wide association study, they show a strong association between resistance to Pseudomonas and change in the copy number of RPS5 and RPM1 resistance genes. This example shows that CNVs can be efficiently used as markers for a genome-wide association study.
Genomic Distribution of the Newly Identified CNVs and Large Indels.
Circos plot showing the chromosomal distribution of CNVs and large indels identified by Zmienko et al. (2020). SNPs were identified by the 1001 Genomes Consortium (2016). CNVs and large indels are distributed throughout the chromosomes and are most abundant in pericentromeric regions. (Adapted from Zmienko et al. [2020], Figure 2.)
[OPEN] Articles can be viewed without a subscription. www.plantcell.org/cgi/doi/10.1105/tpc.20.00257
The availability of a copy number variant map in Arabidopsis offers unprecedented perspectives for assessing the consequences of gene CNV on quantitative phenotypes in plants. The increasing availability and refinement of long-read DNA-sequencing technologies (Michael et al., 2018; Jiao and Schneeberger, 2020) will help to resolve particularly complex CNVs and to provide further insights into the consequences of structural variation on the evolution of plant phenotypes.