Down, then up: non-parallel genome size changes and a descending chromosome series in a recent radiation of the Australian allotetraploid plant species, Nicotiana section Suaveolentes (Solanaceae)

Abstract Background and Aims The extent to which genome size and chromosome numbers evolve in concert is little understood, particularly after polyploidy (whole-genome duplication), when a genome returns to a diploid-like condition (diploidization). We study this phenomenon in 46 species of allotetraploid Nicotiana section Suaveolentes (Solanaceae), which formed <6 million years ago and radiated in the arid centre of Australia. Methods We analysed newly assessed genome sizes and chromosome numbers within the context of a restriction site-associated nuclear DNA (RADseq) phylogenetic framework. Key Results RADseq generated a well-supported phylogenetic tree, in which multiple accessions from each species formed unique genetic clusters. Chromosome numbers and genome sizes vary from n = 2x = 15 to 24 and 2.7 to 5.8 pg/1C nucleus, respectively. Decreases in both genome size and chromosome number occur, although neither consistently nor in parallel. Species with the lowest chromosome numbers (n = 15–18) do not possess the smallest genome sizes and, although N. heterantha has retained the ancestral chromosome complement, n = 2x = 24, it nonetheless has the smallest genome size, even smaller than that of the modern representatives of ancestral diploids. Conclusions The results indicate that decreases in genome size and chromosome number occur in parallel down to a chromosome number threshold, n = 20, below which genome size increases, a phenomenon potentially explained by decreasing rates of recombination over fewer chromosomes. We hypothesize that, more generally in plants, major decreases in genome size post-polyploidization take place while chromosome numbers are still high because in these stages elimination of retrotransposons and other repetitive elements is more efficient. Once such major genome size change has been accomplished, then dysploid chromosome reductions take place to reorganize these smaller genomes, producing species with small genomes and low chromosome numbers such as those observed in many annual angiosperms, including Arabidopsis.


INTRODUCTION
Chromosome number and genome size changes in angiosperms have been poorly explored in a phylogenetic context at the species level, especially in a post-polyploid (whole-genome duplication, WGD) context. Aside from WGD, it has long been known that amplification and deletion of highly repetitive DNA, especially retroelements, are mechanisms by which genome size (C-value) change occurs (Bennetzen and Kellogg, 1997;Wang et al., 2021). Genome size is thus a compromise between the activities of various mechanisms in the ancestry of a clade, including intensity of (retro)transposition and frequency of WGDs, that increase genome size and recombination-related processes that purge portions of the genome (Hawkins et al., 2009;Grover and Wendel, 2010;Michael, 2014).
In general, the packaging of chromatin, DNA break repair and activity of (retro)transposons and other repetitive elements is under epigenetic control (Fedoroff, 2012), probably influencing changes in genome size (through increases/decreases in repeat numbers and structure), frequency and occurrence of chromosome rearrangements and genome stability ( Van der Knaap et al., 2004;Schubert and Vu, 2016). In previous studies, a positive correlation between rates of genome size evolution and speciation across the angiosperm phylogenetic tree has been shown Puttick et al., 2015), but there has been no general relationship demonstrated between genome size, direction of chromosome number change and speciation. Angiosperm genome sizes have also been shown not to be directly proportional to ploidy (Leitch and Bennett, 2004;Hufton and Panopoulou, 2009;Rupp et al., 2010;Carta et al., 2020).
In some groups, e.g. Nicotiana section Suaveolentes (approx. 49 species, the subject of this study), high rates of genome size change and chromosome structural changes are correlated with a range of phenomena ) that are putatively promoting the high rates of speciation detected (Clarkson et al., 2017). A change in chromosome number is often more complex than simply fusing two into one (or vice versa) and in most groups involves multiple chromosome segment exchanges (Mandáková and Lysak, 2008); presumably these rearrangements directly alter linkage among genes in the segments that have been reorganized (Morjan and Rieseberg, 2004;Ortiz-Barrientos et al., 2016;Merot et al., 2020). Reduced recombination through the formation of new, often larger, linkage groups (i.e. fewer chromosomes) is hypothesized to protect highly advantageous allele combinations in incipient species (Stebbins, 1950), promoting local adaptation and increasing net diversification (Potter et al., 2017). Although putatively advantageous in this context, fewer chromosomes and the resulting lower rates of recombination could lead to increasing levels of retrotransposon activity because their control is due to recombinationrelated processes, as noted above, which could result in larger genomes. It is in this context that we have focused this study, to study chromosome and genome size divergence over a large range of chromosome numbers (n = 15-24) post-WGD in the framework of a nearly complete species-level phylogenetic analysis of Nicotiana sect. Suaveolentes (Goodspeed, 1954;Purdie et al., 1982;. Nicotiana sect. Suaveolentes has been studied for a long time, starting with the chromosome studies and monograph of Goodspeed (1954), who concluded correctly that the section is ancestrally allotetraploid. Many molecular studies have now demonstrated that they have a single origin via hybridization between two South American diploid species, both with n = 12, one likely to have been itself a diploid hybrid (Chase et al., 2003;Clarkson et al., 2004Clarkson et al., , 2010Kelly et al., 2013;Schiavinato et al., 2019;Dodsworth et al., 2020a), leading to an ancestral Nicotiana sect. Suaveolentes species with n = 24. Divergence of the section has probably involved multiple dysploid reductions to give the current range n = 15-24.
The common ancestor of Nicotiana sect. Suaveolentes arose 5-6 Mya (million years ago; Clarkson et al., 2017;Schiavinato et al., 2019) in central western South America, and then its descendants dispersed widely (Fig. 1), resulting today in species in Africa (one, in Namibia, N. africana), Australia (approx. 46 species, especially numerous in the most arid central regions), New Caledonia (one species, N. fragrans shared with other Pacific islands, plus another, N. forsteri, also in eastern Australia) and several islands in French Polynesia (one species, N. fatuhivensis). None of the species of Nicotiana sect. Suaveolentes is known from the Americas. Extant species with the ancestral (or nearly so) chromosome number, n = 23 or 24, are found in Africa, wetter northern and eastern Australia and the Pacific islands (the chromosome number of N. fatuhivensis is unknown because living material has not been available for cytological study).
About 2 Mya, the common ancestor of the species-rich 'core group' of Nicotiana sect. Suaveolentes invaded the already well-established arid centre of Australia (which became as dry as today about 7 Mya; Byrne et al., 2008) and diversified to produce the currently recognized plethora of Australian species (approx. 49; . Their putative recent origin has resulted in standard molecular markers (e.g. Chase et al., 2003;Clarkson et al., 2010;Marks et al., 2011;Bally et al., 2021) exhibiting low levels of variability, making previous phylogenetic inferences both tentative and weakly supported. In addition, Dodsworth et al. (2020b) found high levels of ancestral polymorphisms that made whole plastome DNA sequences unreliable for estimating species relationships in this section. Accordingly, this study examines the phylogenetics of 46 (out of 49) species of Nicotiana sect. Suaveolentes using restriction site-associated DNA sequencing data (RADseq; Baird et al., 2008), which has worked well to resolve other recently diverged groups (Cruaud et al., 2014;Paun et al., 2016), including some older than these species (e.g. Heckenhauer et al., 2018;Brandrud et al., 2019Brandrud et al., , 2020Wagner et al., 2020).
Using this robust nuclear phylogenetic tree as framework, we examine the relationship between chromosome number and genome size change, and hypothesize that because both genome size and chromosome number changes are impacted by epigenetic phenomena [e.g. the activity of (retro)transposable elements, mechanisms of DNA repair and condensation of chromatin], it is possible that they will exhibit similar levels/ directions of change across the phylogenetic tree. Chromosome numbers typically descend post-WGD in herbaceous species, and genome size shrinkage has also been recorded, and thus we expect to see both occurring among the species of Nicotiana sect. Suaveolentes. We further hypothesize that if chromosome number falls far enough, genome size might start to increase due to the lower number of chiasmata formed during meiosis and, hence, lower levels of recombination. This pattern has been observed previously (Chase et al., 2005;Lysak et al., 2009), but not demonstrated with complete species-level sampling in a group exhibiting simultaneous post-polyploid genome size and extensive chromosome number change.

Taxon sampling, plant material and deposition of vouchers
Our sampling of phylogenetic data included 137 individuals, representing 46 putative species, with the aim to analyse multiple accessions per species, including samples from across the geographic and morphological ranges of each species; however, for some taxa, this was not possible (e.g. N. fatuhivensis and N. murchisonica; Table 1). As we conducted the RADseq sampling, several accessions unexpectedly did not fall with others of the species to which we had assigned them initially, based on their morphological features as assessed in the field (e.g. N. sp. nov. Karara). Species in the group appear to have a relatively high degree of phenotypic plasticity depending on patterns of rainfall and, in some cases, we initially assigned samples mistakenly to the wrong species. We have already described some of the most obvious of these misplaced accessions as new , and other such treatments are in progress. The new species that require more research before they can be described are labelled here as 'sp. nov.' with a locality name (e.g. sp. nov. Karara; Figs 2-4).
We made efforts to include the same accessions that we studied for chromosome numbers and genome sizes in the phylogenetic analysis, but this was not always possible. Chromosome numbers in most species were studied either to verify previous counts from the literature or to re-confirm our first counts (if they deviated from those in the literature), but in a few cases we have relied solely upon counts from the literature (i.e. N. rosulata and N. truncata). In only a few cases did we find something that contradicted what had been published previously, and fortunately genome sizes did not vary enough to make the few unsampled accessions problematic for an examination of general trends in genome sizes vs. chromosome numbers. We have not included parental diploids in this phylogenetic study because the most recent common ancestor of Nicotiana sect. Suaveolentes and any diploid relatives is millions of years greater than the age of the target group (Clarkson et al., 2017). Combining diploids and allotetraploids in the same phylogenetic analysis could also be highly problematic due to the difficulties of confusing maternal and paternal copies, so we confined our phylogenetic studies to just the species of Nicotiana sect. Suaveolentes, minimizing paralogy issues (Brandrud et al., 2020).
Given the phenotypic plasticity in the group and the number of revised species concepts and new species that we have identified, we consider our species determinations more reliable for counts that deviate from those in the literature. We are in the process of identifying vouchers made for the older studies, but many of these were never clearly marked as such in Australian herbaria. In some cases, vouchers were never made. As far as possible, we have included an accession from the same locality as the specimen designated as nomenclatural type, e.g. if we have distantly related genetic clusters of accessions that have been identified previously as in N. rosulata, we have designated as N. rosulata the cluster with the accession from the type locality (e.g. in the case of N. rosulata, the type was collected near Leonora, Western Australia, so the accessions that cluster with the material collected in Leonora are labelled as N. rosulata). For those species in which we report infra-specific chromosome variation, e.g. N. goodspeedii and N. benthamiana, the exact accessions used in the cytological and genome size studies are included in the RADseq matrix (Table 1).

Collecting and import permits
All field-collected material is covered under the following collecting permits: Western Australia SW017148, CE006044, Northern Territory 58658, Victoria 10008399, New South Wales SL101924 and Queensland PTU-18001061. Permission to remove seeds from herbarium specimens was obtained from the curators/collection managers of the following herbaria: BRI, NT and PERTH. All seeds were imported into the UK following published guidelines, and plants were grown in quarantine at the Royal Botanic Gardens, Kew, UK import permit DEFRA PHL2149/194627/5NIRU CERT:106-2019; HMRC TARIFF CODE: 0601209090.

DNA isolation and sequencing
Total DNA was isolated from silica-dried leaves using a cetyltrimethylammonium bromide (CTAB) procedure (Doyle, 1990), after a 20 min pre-treatment on ice with ice-cold sorbitol  buffer (100 mm Tris-HCl, 5 mm EDTA, 0.35 m sorbitol, pH 8.0). After extraction, the DNA was further treated with 2.5 µL of RNase A (Thermo Fischer, USA) for 30 min at 37 °C and the reaction cleaned with a NucleoSpin gDNA clean-up Kit (Machery-Nagel, Germany), following the manufacturer's instructions. DNA was quantified with a Qubit 3.0 spectrophotometer (Thermo Fisher Scientific, USA).
Single-digest RADseq libraries were prepared following a protocol successfully used in previous studies (e.g. Heckenhauer et al., 2018;Brandrud et al., 2019Brandrud et al., , 2020. The protocol used the restriction enzyme PstI to treat batches of 60 individuals per library, including any necessary repeats when not enough had been initially obtained. The inline and index barcodes used differed from each other by at least three positions. The libraries have been sequenced at the Vienna BioCenter Core Facilities (VBCF; https://www.viennabiocenter.org/) on an Illumina Hiseq 2500 as pair-end reads of 125 bp.

Bioinformatic and phylogenomic analyses
The raw reads were demultiplexed first based on index barcodes using BamIndexDecoder v.1.03 (included in the Picard Illumina2Bam package, available from http://gq1.github.io/ illumina2bam/). Demutiplexing based on inline barcodes was then conducted with process_radtags from Stacks v.1.74 (Catchen et al., 2013), together with quality filtering that removed reads containing any uncalled base and those with low quality scores, but rescued barcodes and cut sites with maximum one mismatch.
The reads were mapped with bwa mem (v.0.7.17-r1188; Li and Durbin, 2009) to a reference genome for a member of this section, N. benthamiana (v.1.0.1, Bombarely et al., 2012), a species widely used as a model organism in plant virology and biotechnology (Tregoning et al., 2020). Given that the parents of these allotetraploids were relatively distantly related to each other (from different taxonomic sections; Chase et al., 2003;Clarkson et al., 2004) and that extensive post-WGD chromosomal evolution has already taken place during diploidization, our approach in using a reference genome within the group circumvents as much as possible paralogy issues and can treat the data as effectively 'diploid' (i.e. the homoeologous sequences are expected to map to their own parental sequence). During mapping, the option -M was applied to flag shorter split hits as secondary. The individual mapping rates were investigated to test for mapping bias, potentially driven by phylogenetic relatedness to the reference individual. The resulting aligned sam file was sorted by reference coordinates, and read groups were added using Picard Toolkit v.2.18.17 (available from http://broadinstitute.github.io/picard/). Indel realignment was performed with the Genome Analysis Toolkit v.3.8 (McKenna et al., 2010), thinning the data to a maximum of 100 000 reads per interval. A catalogue has been built and variants were called from the realigned .bam files with the ref_map.pl pipeline in Stacks with default settings. Export_sql.pl and populations from Stacks were used to extract those regions with up to 40 single nucleotide polymorophisms (SNPs) and which had data for at least 50 % of the individuals. We also retained only those variants with a maximum observed heterozygosity of 0.65 to avoid further use of any pooled paralogues. For the chromosome numbers, we also report published counts from the literature. * Chase and Christenhusz, unless otherwise noted. For accessions retrieved via seeds removed from herbarium specimens, the Chase and Christenhusz numbers are provided for the voucher prepared from the cultivated material; the collector and number for the original herbarium specimen are also provided (including the herbarium accession number).
†Accessions not in the RADseq tree.  (Stamatakis, 2014). The analyses were performed with 1000 rapid bootstrap replicates, using an ascertainment bias correction to the likelihood calculations (Lewis, 2001) as recommended for concatenated SNPs. A simultaneous search for the best-scoring ML tree was conducted with a general time-reversible model of nucleotide substitutions (i.e. the GTRCAT model) and disabled rate heterogeneity among sites model (i.e. -v). The best tree was then visualized and annotated in R, using ape v.5.3 (Paradis and Schliep, 2018), biostrings (Pagès et al., 2020), ggplot2 (Wickham, 2016), ggtree (Yu et al., 2017) and treeio (Wang et al., 2020). We assigned N. africana as the outgroup because it was well supported as sister to the rest of Nicotiana sect. Suaveolentes in several phylogenetic analyses using both plastid and nuclear data (Chase et al., 2003;Clarkson et al., 2004Clarkson et al., , 2010Clarkson et al., , 2017Marks et al., 2011;Kelly et al., 2013).
To assess patterns of hybridization/introgression, we constructed a co-ancestry heatmap (Fig. 2) for a set of 64 accessions, corresponding to the accessions in sub-tree B (Fig. 3). For this purpose, we used the genotype-free method implemented in ANGSD v.0.9.10 (Korneliussen et al., 2014) on the indelrealigned .bam files to calculate genotype likelihoods as these were shown to be accurate estimates of genomic parameters for medium to low coverage data (Maas et al., 2018;Warmuth and Ellegren, 2019). Only sites with data for at least 75 % of individuals were retained with a minimum 20 base quality and mapping quality. For 1 085 059 high-confidence (P < 1e-6) variable positions that had a minor allele shared by at least three individuals, we inferred the major and minor alleles frequencies under a GATK-based genotype likelihood model. Starting from covariance matrices calculated using pcangsd v.0.99 (Meisner and Albrechtsen, 2018) from the genotype likelihoods, we further visualized co-ancestry of the different accessions using the heatmaps.2 function (GPLOTS v.3.0.1.1; Warnes et al., 2020).

Chromosome number determination
We used the following protocol for determining chromosome numbers. We first re-potted mature but still actively growing plants in the greenhouse about 2 weeks before harvesting root tips. This forced the plants into producing many actively growing roots, increasing the number of root cells with acceptable mitotic figures. Young root tips were obtained directly from cultivated material and pre-treated with 0.002 m 8-hydroxyquinoline at 10-12 °C for 24 h. Subsequently, the roots were fixed in Farmer's fixative (3:1 absolute ethanol:glacial acetic acid, v/v) for 2-24 h at room temperature. The roots were then washed twice in distilled water (10 min each or until they sank to the bottom of the tube). For slide preparation, the roots were digested on the slide with an enzymatic solution containing 2 % (w/v) cellulase and 20 % (w/v) pectinase in phosphate buffer at 37 °C for 2 h in a wet chamber and washed subsequently to remove the enzyme with a solution containing distilled water and glacial acetic acid (1:1, v/v) for 1 h in a wet chamber. After washing, the meristematic tissue was fragmented with needles in a drop of 45 % acetic acid, placed under a coverslip and squashed. The slide/coverslip assembly was frozen in liquid nitrogen for 5 min, and the coverslip was removed quickly with a razor blade and the slide air dried. Fluorochrome staining followed Schweizer (1976). The slides were first stained with chromomycin A3 (CMA; 0.2 mg mL −1 ) for 1 h and then with 4′,6-diamidino-2-phenylindole (DAPI; 2 μg mL −1 ) in water for 30 min before mounting in glycerol/McIlvaine buffer medium. The best cells were captured on a Zeiss light microscope using an Axio Cam MRC5 video camera and Axiovision 4.8 software. Chromosome images were processed in Photoshop CS3, and counts and measurements were obtained with the software ImageJ.

Genome size estimation
Genome sizes were estimated using seeds instead of leaves or floral tissues. We originally worked with leaf tissue but found that this made genome size estimates difficult or impossible for some species for reasons that are unclear. Perhaps secondary chemistry or unusual leaf pigments negatively impacted the estimates, whereas we experienced few problems using seeds. The genome sizes of these Nicotiana species were measured using a modification of the approach detailed in Pellicer and Leitch (2014). Briefly, 5-10 Nicotiana seeds were co-chopped with a razor blade and 2 cm 2 of leaf from the size standard, Petroselinum crispum (1C = 2.22 Gb/1C; Apiaceae) in 2 mL of isolation buffer (general purpose buffer of Loureiro et al., 2007) supplemented with 0.3 % polyvinylpirrolidone (PVP-40, Sigma Aldrich) and 0.04 % β-mercaptoethanol. The chopped material was then filtered through a 30 μm nylon mesh, stained with propidium iodide (1 mg mL -1 ; Sigma Aldrich in water) at a final concentration of 50 μg mL -1 , and stored on ice for 10-40 min. Three replicate runs per species were conducted, recording 5000 particles using a Partec Cyflow Space with a 532 nm (Partec GmbH, Münster, Germany) flow cytometer fitted with a green laser (30-100 mW). FLOWMAX software (v. 2.7; Partec GmbH). We included here none of the previous estimates (ten in Narayan, 1987) because there were no vouchers made (the seeds were taken from seed banks, which contain no specific information on provenance) and thus we could not be certain about which species were analysed.

Analyses of genome size and chromosome number change
BayesTraits v.3.0.2 (Pagel et al., 2004) was used to infer genome size change across the tree. We used ChromEvol v.2.0 (Glick and Mayrose, 2014) for the analysis of chromosome number change. A suitable tree for modelling both phenomena was created by pruning the tree in Fig. 3, leaving only one representative of each taxon. In taxa with three or more accessions, the accession with the median branch length was chosen as the representative. Tree editing was done in R using ape (Paradis and Schliep, 2018).
Genome size change in Nicotiana sect Suaveolentes was estimated using the continuous model in BayesTraits, with the tree branches scaled to 0.01, and estimating the delta, kappa and lambda parameters. The Markov chain had 11 000 000 iterations, sampled every 500 iterations with a burn-in of 1 000 000 iterations. Estimation of ancestral genome size was limited to values between 2.5 and 6, the range we observed in these species.
The ancestral chromosome number was assessed using the default models, with the root node fixed to n = 24 based on the number of chromosomes in the diploid parents (n = 12). Constant rate models were equal and performed better than linear dependence models. The simplest model was chosen, assessing chromosome gains and losses (all species have the same ploidy, so duplications were not investigated). Another model was constructed in which no chromosome number increases were allowed, again with the root node fixed to be n = 24. All models were run with 100 000 simulations of changes along the branches. Our favoured scenario does not permit number increases, and our assumptions for this are presented in the Discussion, but this choice of model does not affect our general conclusions. Results were checked in Tracer v.1.6 and visualized in R with the packages ape (Paradis and Schliep, 2018), treeio (Wang et al., 2020), gtree (Yu et al., 2017), patchwork (Pedersen, 2020) and ggimage (Yu, 2020).
A simple linear (Brownian motion) model was fitted to test for a specific association between genome size and chromosome number (all included species are of the same ploidy), with the former as the dependent variable and the latter as the independent variable. To account for evolutionary nonindependence between taxa, we also estimated phylogenetic independent contrasts (PICs) for genome size and chromosome number based on the phylogenetic tree in Fig. 4, using ape in R (Paradis and Schliep, 2018). The tree was pruned to include one representative of each taxon for which both data types were available. The PICs were regressed through the origin (Garland et al., 1992) to test whether there was a linear relationship between these two. If there are only two variables, in this case genome size and chromosome number, PIC is equivalent to using the phylogenetic generalized least squares procedures (PGLS; Blomberg et al., 2012). The difference between PIC and PGLS is that the latter returns an intercept, but the slope parameter (which represents the relationship between genome size and chromosome number) is identical.

Phylogenetic analyses
After demultiplexing and quality filtering, we retained on average 2.0 million pairs of reads per accession (  The average coverage across samples obtained after mapping was 11.1× (s.d. 2.2×).
After filtering, numbers of retained SNPs ranged between 130 995 (with data for at least 95 % individuals) and 599 473 (with data for a minimum of 80 % individuals). After comparing the average bootstrap support, the dataset including up to 15 % missing data (i.e. including 457 382 SNPs) was chosen as the final matrix.
The ML tree produced (Fig. 3A, B) exhibits well-supported interspecific relationships [bootstrap percentage (BP) 100], and multiple accessions of species form unique, well-supported clusters. The only major lack of resolution is close to the base, where the position of N. forsteri relative to N. monoschizocarpa is not well supported (BP 73). The 18 major clades identified (numbered as Roman numerals, I-XVIII) are each generally widespread geographically and occur in a variety of habitat types, varying from sheltered (i.e. under trees such as mulga, Acacia aneura, or on the south sides of rock outcrops and in gorges) to open (i.e. sand dunes, dry riverbeds, fields, ruderal sites and gibber plains). The newly recognized species Christenhusz, 2018a, b, c, 2021a, b; are clearly distinct from the concepts in which they were previously included. For example, N. karijini, for which herbarium specimens had been identified previously as N. umbratica, is sister to N. benthamiana (clade VII); N. gascoynica, previously considered to be specimens of N. simulans, is sister to the N. simulans clade (clade XI) plus N. cavicola (clade X); N. yandinga, previously identified as N. maritima, is sister to the whole of the N. suaveolens clade (clade XVIII, which includes N. maritima); and, finally, N. faucicola, which had also routinely been identified as N. maritima (and occasionally as N. velutina; clade XVII) is sister to N. suaveolens plus another as yet undescribed species in the larger N. suaveolens clade (clade XVIII). The major clades identified in the RADseq tree largely conform to the distribution of major differences in vestiture observed, e.g. clade XI with sparse, long, multicellular straight gland-tipped hairs, clade XIII with dense long and short glandtipped hairs, and clade XVIII with long, curly (wooly), multicellular, gland-tipped hairs, but few other major morphological characteristics seem to co-vary with the genetic results for the larger multispecies clades. We are investigating seed morphology, which is variable and potentially taxonomically useful.
Generally, the Australian species of Nicotiana are morphologically similar and not easily distinguished, especially if one is working with the fragmentary material typical of many  herbarium specimens. Inflorescence structure and vestiture are useful traits, with floral traits, especially size, useful in some cases for distinguishing closely related species from each other. Despite their overall highly similar morphology/habit, the high levels of bootstrap support make this a good phylogenetic framework for examining how chromosome number and genome size vary. Some hybrids have been detected, including one that is a neo-allotetraploid (N. notha; Figs 2 and 3B), but all other obvious hybrids have been excluded from this study. As for N. notha, in our results hybrids are obvious due to their isolated positions as sister accessions to larger clades and clear genetic similarities to at least two other species in heatmaps. In Fig. 2, N. notha displays general genetic similarities (brighter colour) to two of the larger clades, XVII and XVIII (Fig. 3B), but specifically N. sp. nov. Strzelecki and N. sp. nov. WAust (bright rose in Fig. 2), which both occur in the general area in which this material was collected (Table 1). Both putative parents are n = 16 and have genome sizes estimated at 3.2 and 3.4-3.8 pg, respectively (Table 1), and N. notha has n = 32 and 6.5 pg. We have found herbarium specimens of this same entity in other nearby localities (labelled as N. suaveolens), so this allotetraploid clearly occurs in more than one place and putatively functions as a species, warranting its formal description (Chase et al., 2021d).

Chromosome number change
Chromosome numbers are shown on the summary RADseq tree (Fig. 4B, C), the data being a combination of our own counts and those taken from the literature (Table 1); however, in the few cases for which our counts differ from earlier reports, we show only our results because we are not sure of the species determinations of previous researchers (and we have been unable to examine the vouchers). Species varied in chromosome number, with numbers forming an almost complete descending dysploid series, ranging from n = 24 in N. monoschizocarpa and N. heterantha to n = 15 in N. yandinga, N. maritima, N. faucicola and N. suaveolens. Chromosome morphology is also highly variable among species, with the occurrence of metacentric, sub-metacentric and acrocentric chromosomes (F. Nollet and M. W. Chase, unpubl. res.), but these are not presented because they are not a focus of this study.
Under our favoured scenario (see the Discussion) in which increases in chromosome number are not permitted (Fig. 4C), the spine of the tree exhibits a stepped decrease at each node in which n = 24-22 occur in sequence, with subsequent multiple independent decreases within many clades. Near the tips of the tree, changes in the spine are precipitous, e.g. skipping from n = 21 to 18 and 16 in clades XVII and XVIII, respectively (Fig. 4C). If a model is applied in which increases and decreases are equally likely, then there is no clear pattern of chromosome number change along the spine of the tree (Supplementary data Fig. S1), but rather it is focused largely within the major clades, resulting in both decreases and increases. For example, N. goodspeedii (n = 20, 21) is surrounded by species with lower numbers (n = 15-18), so under this model an increase is hypothesized in N. goodspeedii. In our favoured model (Fig. 4C), the spine node for this group is n = 21, so changes are all decreases in chromosome number. Our choice of model does not affect our general conclusions about the interactions between genome size and chromosome number change.

Genome size change
Genome size for the allotetraploid ancestor of Nicotiana sect. Suaveolentes, which was hypothesized as n = 24, could be expected to be in the range of 4.8-5.2 pg per 1C nucleus (see the Discussion), corresponding roughly to that of N. africana (n =23) with 5.4-5.5 pg/1C. A decidedly smaller genome size was recorded in two of the n = 24 species, N. monoschizocarpa with 4.3 pg/1C and especially N. heterantha with 2.5 pg/1C. No chromosome or genome size data are available for N. fatuhivensis due to lack of access to appropriate material.
After the above species diverged, genome size (Fig. 4A, B) is estimated to become uniform along the spine of the tree, 3.2-3.3 pg/1C, as well in clades VII (3.3-3.4 pg/1C) and VIII (3.2 pg/1C), then dropping slightly in clades IX (2.7 pg/1C), X (2.7 pg/1C), XI (2.7-2.9 pg/1C) and XII/XIII (2.9 pg/1C). In clade XIV (n = 18, 19) and its sister clade XV (n = 20, 21), genome size ranges from 3.4 to 3.9 pg/1C and from 2.6 to 3.0 pg/1C, respectively. Finally, in the clades with the lowest chromosome numbers, XVII and XVIII, genome size is uniformly 3.2-3.6 pg/1C and close to the estimated ancestral genome size of the core group of species and along the spine (3.2-3.3 pg/1C). Thus, it appears that chromosome numbers and genome sizes are not co-varying (Fig. 4B); one species with the ancestral number, N. heterantha with n = 24, has among the lowest genome sizes (2.5 pg) in the group, and those with the lowest chromosome number, n = 15, 16 (clades XVII and XVIII) are uniformly larger (some up to 30 %) than those in several clades with n = 20, 21. We have left the neoallotetraploid, N. notha, n = 32 and genome size of 6.5 pg/1C, out of these comparisons. Notably, as chromosome numbers decrease in clades XIV, , genome size appears to stabilize or even increase relative to that estimated along the spine (Fig. 4A, B).

Data availability
The data underlying this article are available in the NCBI Short Reads Archive and can be accessed with BioProject ID PRJNA681916, SRA Study SRP295424.

Chromosome number change and environmental correlates
Post-WGD, the general pattern of chromosome number change is reduction during diploidisation (Wendel, 2015;Dodsworth et al., 2015;Soltis et al., 2016;Escudero and Wendel, 2020), which is most obvious in herbaceous groups, as observed here in Nicotiana sect. Suaveolentes, or those with herbaceous ancestry, such as the now mostly woody families in Malpighiales (e.g. Passifloraceae) and Lamiales (e.g. Oleaceae, Bignoniaceae), among others (Carlquist, 2009; chromosome data from the Index to plant chromosome numbers; Goldblatt and Johnson, 1979-onwards).
The general background for chromosome number change in N. sect. Suaveolentes is one in which the species with higher numbers occur in the more dependably wet habitats in northern (summer monsoon) and eastern (rain forest) Australia. This group began to radiate in the arid zone only within the last 2 million years (Clarkson et al., 2017;Cauz-Santos et al., 2022), with formation of the core group of species (clades VII-XVIII), in which chromosome number and genome size both exhibit decreases in general. In all species with <20 pairs of chromosomes (four independent cases), genome size stops decreasing or even increases by up to 30 % compared with relatives with ≥20 pairs (Fig. 4; see below). We appear to have detected a chromosome number inflection point at which genome size begins to stabilize or even increase (see below). Carta et al. (2018) showed in a phylogenetic context for Italian endemic plants that open, disturbed, drought-prone habitats select for low chromosome numbers, whereas long-lived species occurring in shaded, stable habitats are associated with  higher chromosome numbers. Similarly, those Nicotiana species with the lowest chromosome numbers (n = 15, 16) occur in the uniformly driest regions in southern Australia (many with <200 mm of rain per year) and those with the higher numbers in the wetter parts of northern and eastern Australia. These observations support the hypotheses of Darlington (1937) and Stebbins (1950) that environmental instability and stress favour the lower levels of recombination brought about by fewer chromosomes. Protection from interspecific gene flow and recombination of adapted, linked alleles may therefore be the most important effects of changes in chromosome structure (Rieseberg, 2001). The radiation of these Nicotiana species occurred against a background of diploidization associated with invasion of novel habitats, a phenomenon compatible with the lineage-specific ohnologue resolution (LORe) model (Robertson et al., 2017). The redundant, modular structure of duplicated gene regulatory networks offers all polyploid species increased possibilities for novel evolutionary innovation and adaptation through mutations. Whether the radiation of the species of N. section Suaveolentes in the Australian arid zone is specifically adaptive has yet to be documented. Furthermore, there are no explanations for why lower chromosome numbers are so routinely associated in angiosperms with the evolution of annual life histories and inbreeding from outcrossing perennial ancestors (as in these species of Nicotiana).

Drivers of genome size vs. chromosome number change
The ancestral chromosome number in N. sect. Suaveolentes should be n = 24 (the sum of those in the putative parents, n = 12; Chase et al., 2003), which is found in species at the first several nodes in the RADseq tree (Fig. 4B, C). Based on the genome size of N. sylvestris, 2.70 pg/1C (closely related to the paternal diploid parent; Clarkson et al., 2010), N. sect. Noctiflorae, 4.18 pg/1C (Clarkson et al., 2004;Kitamura et al., 2005;Kelly et al., 2013), and N. sect. Alatae, n = 3.7 pg/1C (Chase et al., 2003;Kitamura et al., 2005), we estimate that the ancestral genome size of N. sect. Suaveolentes might be in the 5.40-6.88 pg/1C range. The genome size of N. africana (n = 23), which is sister to the rest of N. sect. Suaveolentes, is 5.45 pg, and is thus close to the ancestral size.
Nicotiana forsteri (n = 24) with 4.9 pg/1C (Table 1; Fig. 4B, C) is also close to the expected genome size range, whereas N. monoschizocarpa (n = 24) with 4.3 pg/1C and particularly N. heterantha (n = 24) with 2.5 pg/1C deviate strongly from the expected genome size and have clearly followed an independent path of reduction, with the last being among the smallest genomes in N. sect. Suaveolentes. The genome size of N. heterantha is also lower than those in the South American diploid progenitors of N. sect. Suaveolentes (see also below). Although N. umbratica (clade VI) has close to the ancestral chromosome number (n = 23), its genome size of 3.8 pg/1C differs little from some of the species with the lowest number, n = 15 (clade XVIII) with up to 3.8 pg/1C.
It is clear that altered repeat content is driving genome size changes in this group. However, chromosome number change is not correlated with the direction of genome size alteration in a systematic manner. Previous studies have shown that relative to parental diploids, e.g. in N. tabacum (a relatively recently formed allotetraploid, >100 000 years ago), there is reduced content of several repeat sequences, including tandem repeats Koukalova et al., 2010;Renny-Byfield et al., 2012), pararetroviral (Gregor et al., 2004) and geminivirus-like (Skalicka et al 2005) sequences and various retrotransposons (Melayah et al., 2004;Petit et al., 2007), frequently from the paternal genome (Mhiri et al., 2019). In another recently formed allotetraploid, N. rustica, the NPAMBO repeat was reduced by at least 10-fold compared with the maternal donor species, N. paniculata, which could have contributed to the observed 2-5 % reduction in DNA amount in this species . In the case of the species of N. sect. Suaveolentes, assessments of repeat content relative to their diploid parents is made difficult by the antiquity of the group (if they appeared 6 Mya, then there is 12 million years of divergence that separates these species from the modern relatives of their parents) and the complexity of their maternal parent, which is likely to have been a diploid hybrid between species in at least two sections of the genus, perhaps Nicotiana sects Alatae and Noctiflorae (Kelly et al., 2013;Schiavinato et al., 2019).
In Oryza, the genome of O. brachyantha (a wild rice species) is 68 % smaller than that of cultivated O. sativa, with the larger cultivated rice genome being associated with the amplification of long terminal repeat (LTR) retrotransposons (Chen et al., 2013). Only 70 % of these two genomes were collinear, with non-homologous end-joining after double-strand breakage accounting for most movements of genes. Such rearrangements could generate reproductive barriers and perhaps lead to speciation if disruptive selection was also operating. It is likely that genome size change and chromosome rearrangements in species of N. sect. Suaveolentes could also be creating interspecific reproductive barriers.
Interspecific hybridization and allopolyploidization can trigger activation of (retro)transposons (Parisod et al., 2010), as can environmental stress (Grandbastien et al., 2005), both potentially triggering chromosome number and/or genome size change. However, given the lag phase between N. sect. Suaveolentes formation (6 Mya) and species radiation (2 Mya) into the arid zone, stress may have been significant to the changes observed. Overall, the ecological and evolutionary features associated with speciation in the harsh conditions of the arid zone in Australia certainly favour chromosome reduction in line with the ideas of Stebbins (1950) and earlier by Darlington (1937). These could be expected to happen in parallel both within and between clades, which we see happening independently in several clades, perhaps in as many as four (Fig. 4): N. benthamiana/karijini (clade VII); N. truncata/excelsior (clade XIV); the N. velutina clade (XVII); and the N. suaveolens clade (XVII). We hypothesize that decreases are the most likely direction of change in this group, despite the appearance of putative increases in a few cases (i.e. N. goodspeedii). Consequently, the optimization of chromosome number change that does not permit increases was modelled (Fig. 4C), even though we admit that it is not the simplest explanation. Further study should be able to clarify this topic, and our preference for number reduction does not influence any of our general conclusions below about chromosome number and genome size change.

Recombination, chromosome number and genome size change
An explanation for increasing genome size observed here in the species with the lowest chromosome numbers might be that with chromosome number decreasing, the recombination rate falls. Assuming chiasma frequency equates with frequency of homologous recombination-based removal of repeats per chromosome complement, the competing rates of repeat increase/elimination reach a tipping point when the recombination rate can no longer compensate for the rate at which repeats are multiplying. This results in the overall repeat content of a genome increasing, leading to larger genome sizes overall. This hypothesis is entirely mechanistic and is the result of intrinsic repeat expansion rates vs. excision rates via recombination.
In the n = 15-19 species of N. sect. Suaveolentes (Fig. 4B, C), genome size increases relative to those with n = 20-21 (N. heterantha, n = 24 and 2.5 pg/1C, being an obvious exception to these patterns of genome size change). Except for N. burbidgeae (clade VIII), clades XI-XV (n = 20-23) have genome sizes that are 2.7-3.0 pg/1C, whereas those in clades VII, XIV, XVII and XVIII (n = 15-19) vary between 3.3 and 3.8 pg, an increase of 10-29 %. Potentially, when chromosome number has dropped far enough to reach an inflexion point, in this case <20 pairs of chromosomes, the number of chiasma per chromosome complement, typically 1-2 chiasma per bivalent (Goodspeed, 1954), dwindles to the point at which genome size increases due to inefficient removal via recombination. We do not expect that this specific number of chromosomes should universally cause this sort of change because it would depend on many different components that govern types and distributions of repeats, the overall genome size range (i.e. Mbp or Gbp) and other factors including population sizes and breeding systems. It also introduces a more general paradox that deserves much more attention: how have the small genomes of annual herbaceous species such as Arabidopsis thaliana become associated with only five pairs of chromosomes?
We hypothesize that the key to understanding this paradox might be in the phenomenon we observe here in N. heterantha (n = 24), which has the smallest genome observed in N. sect. Suaveolentes, even smaller than the South American diploids from which the section was derived. Perhaps because it has more chromosomes and is thus highly efficient in eliminating retrotransposons, it could establish a new starting small genome size that leads in its offspring to even smaller genomes during chromosome number diploidization than those observed in extant species of Nicotiana. Such downward genome size leaps could be important in the ancestry of species with the smallest genomes, despite their possession of only a few chromosomes, which should be associated with increasing genome size, as observed here in Nicotiana. A similar pattern has been observed in Brassicaceae tribes Physarieae and Anchonieae (Lysak et al., 2009) and Caricaceae (Rockinger et al., 2016). In the orchid species in Erycina (Oncidiinae; Chase et al., 2005), which has the lowest number of chromosomes in Orchidaceae (n = 5, 7) and a smaller genome size than most orchids, genome size decreased first in the clades in which this genus is embedded, which have many chromosomes (n = 28, 30). Thus, genome size first decreased in species with many chromosomes and, once it was small, dramatic chromosome decreases took place, after which genome size began to increase in Erycina and its relative Tolumnia (Chase et al., 2005). In Genlisea (Fleischmann et al., 2014), it is the polyploids that exhibit the smallest genome sizes. Based on what we have observed here and these examples from the literature, the lower rates of recombination in species with fewer chromosomes should allow genome size to increase, and it is only through stochastic leaps to smaller genome sizes in species with more chromosomes that massively smaller genome sizes evolve. In this model, major reductions in genome size occur before chromosome number changes. Importantly, it is at the species and population interface that we should expect to find answers to questions regarding which factors induce genome size and chromosome number changes, and what principles govern their interactions.

Conclusions and prospects
Based on previous literature (as reviewed in the Introduction), there is little foundation to our expectation that chromosome number changes in parallel with genome size variation, although it might be expected that as genomes are re-arranged during the formation of a descending dysploid series there would be genome size change (i.e. rates of change in the two are correlated in spite of the directions not being parallel), a situation that we do observe in N. sect. Suaveolentes. Genome size in the species of N. sect. Suaveolentes both decreases and increases as chromosome numbers decrease (Fig. 5), and in one case (N. heterantha, n = 24, 2.5 pg; Table 1) genome size has decreased drastically with no change from the ancestral chromosome number.
If chromosome rearrangements result in reduced introgression for genes carried in the re-organized chromosome arms, we expect to see greater phylogenetic concordance in loci from rearranged than from non-rearranged chromosome arms, and these should also have earlier coalescence than those in parts of the genome still experiencing gene flow. When combined with a detailed karyotypic study, this permits us to ask if different models better fit chromosome segments with varying histories, allowing us to detect distinctive evolutionary dynamics and ultimately to piece together the general history of speciation in this group. If rearrangements instigate divergence, then we expect the times at which they are established to coincide with speciation events; however, if they occur afterwards, this coincidence would not be discovered. Using coalescent models to date multiple speciation events and then mapping rearrangements on these species trees will help determine if changes in chromosome structure (number) have generally been involved in, but are not necessarily driving, speciation (Faria and Navarro, 2010). The key here will be finding systems in which chromosome change is relatively recent so that we can distinguish between the effects of disruptive selection (genic divergence), gene sweeps and changes in genomic architecture on population divergence and speciation. Nicotiana sect. Suaveolentes has many of the attributes of such a system.

SUPPLEMENTARY DATA
Supplementary data are available online at https://academic. oup.com/aob and consist of Figure S1: Chromosome number evolution as estimated with ChromEvol with increases and decreases equally likely.