Establishing Physalis as a Solanaceae model system enables genetic reevaluation of the inflated calyx syndrome

Abstract The highly diverse Solanaceae family contains several widely studied models and crop species. Fully exploring, appreciating, and exploiting this diversity requires additional model systems. Particularly promising are orphan fruit crops in the genus Physalis, which occupy a key evolutionary position in the Solanaceae and capture understudied variation in traits such as inflorescence complexity, fruit ripening and metabolites, disease and insect resistance, self-compatibility, and most notably, the striking inflated calyx syndrome (ICS), an evolutionary novelty found across angiosperms where sepals grow exceptionally large to encapsulate fruits in a protective husk. We recently developed transformation and genome editing in Physalis grisea (groundcherry). However, to systematically explore and unlock the potential of this and related Physalis as genetic systems, high-quality genome assemblies are needed. Here, we present chromosome-scale references for P. grisea and its close relative Physalis pruinosa and use these resources to study natural and engineered variations in floral traits. We first rapidly identified a natural structural variant in a bHLH gene that causes petal color variation. Further, and against expectations, we found that CRISPR–Cas9-targeted mutagenesis of 11 MADS-box genes, including purported essential regulators of ICS, had no effect on inflation. In a forward genetics screen, we identified huskless, which lacks ICS due to mutation of an AP2-like gene that causes sepals and petals to merge into a single whorl of mixed identity. These resources and findings elevate Physalis to a new Solanaceae model system and establish a paradigm in the search for factors driving ICS.


Introduction
The Solanaceae family is one of the most important plant families in fundamental and applied research not only due to its remarkable morphological and ecological diversity but also due to its far-reaching economic value from its many members used as food crops, ornamentals, and sources of pharmaceuticals (Gebhardt, 2016;Shenstone et al., 2020;Añibarro-Ortega et al., 2022). The most studied Solanaceae include major food crops such as eggplant (Solanum melongena), pepper (Capsicum annuum), potato (Solanum tuberosum), and tomato (Solanum lycopersicum), in addition to the model species petunia (Petunia hybrida) and Nicotiana benthamiana. However, various species-specific limitations of the other taxa have made tomato a preferred model for many studies, as it has a full suite of genetic and genomic resources that enable maximal biological discovery and translation to agriculture.
Developing new Solanaceae model systems that equal the utility of tomato is essential to study incompletely explored diversity, including traits of economic importance. The most challenging is identifying potential systems with noteworthy comparative and species-specific variation that, critically, can be dissected by efficient forward and reverse genetics that is enabled by tractable genomics, genome editing, and cultivation. We previously identified species in the genus Physalis as promising in all these aspects (Lemmon et al., 2018). This genus includes orphan crops such as tomatillo (Physalis philadelphica and Physalis ixocarpa), goldenberry (Physalis peruviana), and groundcherry (Physalis grisea and Physalis pruinosa), and many other species that yield edible fruits or are grown as ornamentals.
Physalis occupies a key phylogenetic position that complements other Solanaceae models. It is a representative genus of Physalideae, an under-studied Solanaceae tribe that has the most genera in the family (Zamora-Tavares et al., 2016; Deanna et al., 2019; Pretz and Deanna, 2020), and diverged from established Solanum model systems about 19 million years ago (Ma; Särkinen et al., 2013). In addition, recently discovered Physaloid fruiting fossils dated to about 52 Ma pushed back the evolutionary timing of Solanaceae divergence from other taxa considerably (Wilf et al., 2017; Deanna et al., 2020). Thus, Physalis has great potential to analyze diversification over long evolutionary distances in comparative studies within the Solanaceae. Moreover, Physalis species show substantial variation in developmental and molecular traits, including inflorescence complexity, secondary metabolism, and disease resistance (Baumann and Meier, 1993; Whitson, 2012; Park et al., 2014; Zhang and Tong, 2016; Huang et al., 2020), providing additional avenues for discovery. However, the most conspicuous and impressive feature of Physalis, also found in other angiosperms, is the inflated calyx syndrome (ICS), a remarkable evolutionary novelty where sepals grow excessively large after fertilization to form balloon-like husks that encapsulate fruits (He et al., 2004; Wilf et al., 2017).
Dissecting the evolutionary and mechanistic origins of morphological novelties is a fundamental goal in biology (Muller and Wagner, 1991;Shubin et al., 2009), and it is not surprising that botanists and evolutionary biologists have long been fascinated by ICS (He et al., 2004;Waterfall and Umaldy, 1958;Wilf et al., 2017). Though Physalis has historically lacked molecular and functional genetics tools, studies on ICS over the last few decades have suggested a central role for two MADS-box genes, including an ortholog of one gene in potato, StMADS16 (an ortholog of Arabidopsis thaliana AGAMOUS-LIKE 24), which causes leaf-like sepals when overexpressed in other Solanaceae (He et al., 2004). Prompted by this observation, supportive molecular and functional genetic data generated within Physalis suggested that heterotopic expression of the StMADS16 ortholog MPF2 was key to the evolution of ICS. Later studies suggested this essential role emerged from modified cis-regulatory control of MPF2 by the euAP1-like gene MPF3 (He and Saedler, 2005;Zhao et al., 2013).
A recent genome of Physalis floridana and additional functional work suggested that loss of another MADS-box gene, MBP21/JOINTLESS-2 (J2), a member of the SEPALLATA4 (SEP4) clade, was also critical, and seemingly reinforced an additional conclusion that fertilization is an integral physiological driver of ICS (Lu et al., 2021). The proposed role of fertility and previous findings that flower-specific MPF2 expression is ancestral to ICS suggested this trait may have been lost during evolution (He and Saedler, 2007;Hu and Saedler, 2007). However, a recent deeply sampled taxonomic study showed that, although being invariantly present in a large monophyletic clade such as Physalis subgenus Rydbergi, ICS was gained multiple times throughout the tribe of Physalideae in a stepwise and directional manner, from noninflation to enlarged sepals appressed to the fruit (accrescent-appressed), and finally to an inflated calyx (Deanna et al., 2019). These findings, along with independent emergence of ICS in other angiosperms (Deanna et al., 2019), may indicate that there is a deeper genetic and molecular complexity behind ICS, determined by factors besides MPF2 and other proposed MADS-box genes (Hu and Saedler, 2007;Deanna et al., 2019).
Outstanding questions regarding ICS and our broad interest in Solanaceae biology and agriculture led us several years ago to begin establishing Physalis as a new model system. We developed efficient Agrobacterium-mediated transformation and CRISPR-Cas9 genome editing in the diploid groundcherry species P. grisea, and demonstrated the utility of these tools by mutating orthologs of tomato domestication genes in groundcherry to improve productivity traits (Lemmon et al., 2018;Swartwood and van Eck, 2019). More recently, P. grisea was critical in revealing pleiotropic functions of an ancient homeobox gene, and in dissecting the evolution of redundancy between duplicated signaling peptide genes controlling stem cell proliferation in the Solanaceae (Hendelman et al., 2021;Kwon et al., 2022). However, high-quality reference genomes of P. grisea and other species have been lacking, and are needed to promote the full potential and deployment of this system as has been achieved in tomato. Here, we report high-quality chromosome-scale genomes for P. grisea and its close relative P. pruinosa. We demonstrate the power of these resources in enabling forward and reverse genetics by revealing multiple genotype-to-phenotype relationships in floral development, including ICS. Our work establishes Physalis as a new Solanaceae reference system that can advance comprehensive studies of long-standing and emerging biological questions within and beyond the genus.

Results
Chromosome-scale reference genomes of P. grisea and P. pruinosa
Among Solanaceae genera, Physalis is more closely related to Capsicum (pepper) than Solanum (eggplant, potato, and tomato) (Figure 1A). Chinese lantern (Alkekengi officinarum, closely related to Physalis), tomatillo (P. philadelphica and P. ixocarpa), and many other Physalis orphan crops are self-incompatible, large plants with tetraploid genomes, making them challenging to develop into model systems. In contrast, the groundcherry species P. grisea, P. pruinosa, and close relatives have reasonable genome sizes (estimated ~1-2 Gb), are diploid, self- and cross-compatible, have rapid generation times (first mature fruit 66-70 days after sowing), and are easy to grow and manage in both greenhouses and fields. The taxonomy and naming of Physalis species have a convoluted past that was recently clarified (Pretz and Deanna, 2020). Physalis pruinosa was initially designated to describe Physalis in the northeastern USA, showing erect or prostrate growth with large, thick, and coarsely sinuate-dentate leaves (Rydberg, 1896). A revision of Physalis in the last century proposed P. pubescens var. grisea to differentiate species included in P. pruinosa (Waterfall and Umaldy, 1958). Additional species were then identified (Waterfall, 1967), and P. pubescens var. grisea was ultimately recognized as a separate species, P. grisea (Martínez, 1993; Pretz and Deanna, 2020).
As P. grisea and P. pruinosa are closely related, they share similar vegetative and reproductive shoot and organ morphologies, including inflated calyxes encapsulating fruits of similar size, shape, and color (Figure 1, B-D). Their primary shoots terminate in a single flower inflorescence after five to six leaves, and new shoots emerge according to the sympodial growth habit that is characteristic of all Solanaceae (Lemmon et al., 2018). In Physalis, sympodial units comprise one leaf, one flower, and two axillary (sympodial) shoots (Figure 1C). A conspicuous feature distinguishing P. pruinosa from P. grisea is the absence of purple pigmentation on stems and petal nectar guides. Physalis pruinosa also has narrower leaves and a smaller stature due to shorter internodes (Figure 1, B and D; Supplemental Data Set 1).
Based on the features described, P. grisea and P. pruinosa are excellent candidates occupying a key phylogenetic position among Solanaceae model systems. We integrated PacBio high fidelity (HiFi) and Oxford Nanopore Technology (ONT) long-read sequencing to establish highly accurate and complete chromosome-scale genome assemblies for both species, with assembly sizes of 1.37 Gb for P. grisea and 1.38 Gb for P. pruinosa (Figure 1E). The P. grisea and P. pruinosa assemblies are the first Physalis genus reference-quality assemblies, demonstrating substantially improved contiguity, accuracy, and completeness compared to a recent P. floridana genome (Lu et al., 2021) (Supplemental Table S1). Specifically, the P. floridana genome has an error rate (errors/bp) of 3.83 × 10^-4 and a contig N50 of 4.6 Mbp, whereas our assemblies produced substantially lower error rates of 3.09 × 10^-6 (P. grisea) and 1.66 × 10^-6 (P. pruinosa) and much higher contig N50s of 31.6 and 82.2 Mbp, respectively, with gapless assemblies of chromosomes 5 and 7 for P. pruinosa.
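For readers less familiar with the contiguity metric quoted above, contig N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The following is a minimal sketch of that definition; the function name and the toy lengths are illustrative assumptions and are not taken from the assemblies or pipelines described here.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from this study.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_lengths))
```

Under this definition, a higher N50 at a similar assembly size means that fewer, longer contigs cover half of the genome.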
Based on RNA-sequencing data from vegetative and reproductive tissues (Lemmon et al., 2018; see "Materials and methods"), we annotated 33,833 and 34,187 genes in the P. grisea and P. pruinosa assemblies, respectively (Supplemental Table S2), with most genes concentrated at the ends of the 12 chromosomes, as was observed in other Solanaceae genomes (Xu et al., 2011; Sato et al., 2012; Kim et al., 2014; Wei et al., 2020) (Figure 1E, see "Materials and methods"). Both genomes are highly repetitive, with 79% of the sequence representing transposable elements, especially long terminal repeat (LTR) retrotransposons (Figure 1E). Comparing the two genomes, we observed nearly complete macrosynteny across all 12 chromosomes, consistent with the close relationship of these species, but also detected a few small-scale inversions and translocations (Figure 1E). Calling single-nucleotide polymorphisms (SNPs) using P. pruinosa Illumina short-read sequences against the P. grisea reference revealed 60,087 homozygous SNPs, with predicted high-impact changes (SnpEff, Cingolani et al., 2012) on 43 gene transcripts (Supplemental Tables S3 and S4). Despite the broad similarity of these genomes, we identified over 900 structural variants (SVs) between 30 bp and 10 kb in length, many of which intersect coding and putative cis-regulatory sequences (Figure 1, F and G; Supplemental Table S5; Supplemental Data Set S2). Some of these variants could explain phenotypic differences between P. grisea and P. pruinosa.
Figure 1 Reference-quality genome assemblies of P. grisea and P. pruinosa. A, Phylogeny of selected Solanaceae species based on the 20 most conserved protein sequences (see "Materials and methods"). B, Whole plant images of P. grisea and P. pruinosa 40 days after sowing in greenhouse conditions. Bar = 10 cm. C, Sympodial shoot architectures of P. grisea and P. pruinosa. Quantification of internode lengths is in Supplemental Data Set 1. Bar = 5 cm. D, Images of P. grisea and P. pruinosa calyces and fruits at different stages of development. Husks were manually opened to show fruits. Bar = 2 cm. E, Circos plots comparing P. grisea and P. pruinosa genomes. Circos quantitative tracks are summed in 100-kbp windows and show the number of genes (lower tick = 0, middle tick = 25, higher tick = 49), LTR retrotransposons (lower tick = 0, middle tick = 102, higher tick = 204) and SVs (lower tick = 0, middle tick = 4, higher tick = 9). The inner ribbon track shows whole genome alignments, with blue indicating forward-strand alignments and red indicating reverse-strand alignments (inversions). Darker colors indicate alignment boundaries. F, Distribution of deletion and insertion SVs between 30 bp and 10 kbp from P. pruinosa compared to P. grisea, summed in 200-bp windows. G, Counts of SVs intersecting genomic features, comparing P. pruinosa to P. grisea.
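The quantitative Circos tracks in panel E sum features in fixed 100-kbp windows. As a rough illustration of that kind of binning, the sketch below counts features per window; it assumes 0-based coordinates and assigns each feature to the window containing its start position, both of which are assumptions for illustration rather than details stated in the text. The function name and toy coordinates are hypothetical.

```python
from collections import Counter


def windowed_counts(features, window=100_000):
    """Count features per fixed-size window.

    `features` is an iterable of (chromosome, start) pairs with 0-based starts.
    Keys of the result are (chromosome, window_start) tuples.
    """
    counts = Counter()
    for chrom, start in features:
        window_start = (start // window) * window
        counts[(chrom, window_start)] += 1
    return counts


# Hypothetical feature coordinates, not taken from the annotations.
toy_features = [("chr1", 5_000), ("chr1", 99_999), ("chr1", 100_000), ("chr2", 250_000)]
toy_counts = windowed_counts(toy_features)
for key in sorted(toy_counts):
    print(key, toy_counts[key])
```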

An SV in the bHLH transcription factor gene AN1 controls nectar guide color variation
We first sought to utilize our genomes to map the most conspicuous phenotype distinguishing the two species, nectar guide color variation. Physalis grisea displays deep purple nectar guides typical of most Physalis species, whereas P. pruinosa does not (Figure 2A). This pigmentation difference is also found on stems and branches. Crossing P. grisea and P. pruinosa resulted in F1 hybrids showing purple pigmentation, and an F2 population showed that the yellow color segregated as a single recessive mutation. Mapping-by-sequencing localized the mutation to chromosome 4; however, limited recombination resulted in a large interval spanning most of the chromosome (Figure 2B). To identify candidate genes, we searched for homologs of genes involved in the production of anthocyanins in the Solanaceae genus Petunia. Anthocyanins belong to a class of polyphenolic secondary metabolites named flavonoids, and one outcome of their accumulation in tissues and organs is purple pigmentation (Liu et al., 2018). Many ornamental Petunia species show variation in anthocyanin accumulation, and studies on this diversity have identified enzymes and transcription factors in the anthocyanin pathway (Bombarely et al., 2016; Liu et al., 2018).
Anthocyanin biosynthesis involves three major steps, including the conversion of phenylalanine to 4-coumaroyl-CoA through stepwise enzymatic reactions, and the conversion of 4-coumaroyl-CoA to dihydroflavonols, which are precursors in the final synthesis steps of specific anthocyanins (Figure 2C). We identified four orthologs of anthocyanin pathway genes and their regulators on chromosome 4. Overlaying our SV analysis revealed a mutation in only one of these genes, a 43-bp deletion in the second intron of the P. pruinosa gene Phypru04g010390, which encodes a basic helix-loop-helix (bHLH) transcription factor ortholog of petunia ANTHOCYANIN1 (AN1) (Spelt et al., 2000; Figure 2D). AN1 activates the structural gene DIHYDROFLAVONOL REDUCTASE and other anthocyanin regulators (Spelt et al., 2000). Notably, mutations in petunia AN1 result in loss of anthocyanins in all tissues (Spelt et al., 2000, 2002). Using reverse transcription polymerase chain reaction (RT-PCR) and sequencing of cDNA, we found that AN1 transcripts in P. pruinosa were longer than those in P. grisea due to a retention of 179 bp from intron 2, which results in a premature stop codon (Figure 2E). We validated this result by CRISPR-Cas9 targeting PgAN1 (Phygri04g010290) in P. grisea. Five out of 11 first-generation (T0) transgenic lines failed to produce anthocyanins, and sequencing showed that these plants carried edited alleles of PgAN1 (Figure 2, F and G). Though another variant closely linked to AN1 on chromosome 4 could be responsible for the color variation, our genetic and molecular results strongly support that the SV in P. pruinosa AN1 (PprAN1) underlies the absence of purple pigmentation in P. pruinosa and further demonstrate the utility of our genomic resources in deploying forward genetics in Physalis.
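The mapping-by-sequencing step in this section compares allele frequencies between the yellow-guide and purple-guide F2 pools. A common way to do this is the SNP-index, the fraction of reads at a site that carry the non-reference allele, and the difference between pools (the ΔSNP-index). The sketch below illustrates that calculation under stated assumptions: the function names, the read counts, and the direction of the subtraction (yellow pool minus purple pool) are illustrative and are not taken from the Materials and methods.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(recessive_pool, dominant_pool):
    """Difference in SNP-index between two pools.

    Each pool is given as an (alt_reads, total_reads) pair for one site.
    """
    return snp_index(*recessive_pool) - snp_index(*dominant_pool)


# Hypothetical read counts at a single site, not data from this study.
yellow_pool = (20, 20)   # all reads carry the non-reference allele
purple_pool = (10, 30)   # one third of reads carry the non-reference allele
print(round(delta_snp_index(yellow_pool, purple_pool), 3))
```

For a single recessive locus in an F2, the causal site is expected to approach an index of 1 in the recessive-phenotype pool and about 1/3 in the dominant-phenotype pool (which contains homozygous and heterozygous plants in a 1:2 ratio), so the difference peaks near the causal region and decays with recombination distance.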

The MADS-box genes MPF2 and MPF3 are not essential regulators of ICS
The most striking feature of Physalis is the ICS, which evolved repeatedly in other Solanaceae genera and angiosperms (Paton, 1990; Padmaja et al., 2014; Deanna et al., 2019). Soon after fertilization, sepals undergo remarkable growth and expansion acropetally to encapsulate fruits in balloon-like papery husks, which may provide protection from pathogens and promote seed dispersal (Figure 3A; Baumann and Meier, 1993; Li et al., 2019). Despite longstanding interest, the evolutionary and mechanistic origins of ICS remain unclear. One early defining study proposed that heterotopic expression of MPF2 was essential to the evolution of ICS (He and Saedler, 2005). This hypothesis was based on overexpression of the potato ortholog StMADS16 in tobacco (Nicotiana tabacum), which produced leaf-like sepals. Empirical support in Physalis came from RNA interference (RNAi) knockdown of MPF2 in P. floridana, where multiple transgenic lines showed a reduced calyx size, the severity of which was highly correlated with impaired fertility, but counterintuitively not the level of reduction of MPF2 transcripts (He and Saedler, 2005).
Despite this contradictory result, follow-up studies proposed and tested an extended mechanism involving regulation of MPF2 by the AP1-like transcription factor gene MPF3 (ortholog of Arabidopsis APETALA1 and tomato MACROCALYX), in combination with hormonal control and fertilization (He and Saedler, 2007; Zhao et al., 2013). However, functional data supporting these conclusions were based on overexpression, as well as RNAi and virus-induced gene silencing (VIGS) knockdown of expression. Pleiotropic phenotypic outcomes are common in overexpression experiments and are challenging to relate to the specific genes studied, whereas RNAi and VIGS are difficult to interpret due to variable knockdown efficiencies and potential off-target effects (Xu et al., 2006; Senthil-Kumar and Mysore, 2011). Further convolution of a possible ICS mechanism emerged with the recent publication of the P. floridana genome, and the suggestion that absence of the SEP4 ortholog of the tomato MADS-box gene SlMBP21/J2 in Physalis was yet another critical factor in the origin of ICS (Lu et al., 2021).
To address these inconsistencies and provide a more robust genetic dissection of ICS, we first used CRISPR-Cas9 genome editing to eliminate MPF2 and MPF3 function in P. grisea. We generated five alleles of PgMPF2 (Phygri11g023460) and four alleles of PgMPF3 (Phygri12g018350) (Figure 3B), and these independent mutations caused different premature stop codons. Notably, none of these homozygous mutants disrupted ICS; all Pgmpf2 CR mutants showed similar calyx inflation as wildtype (WT), and Pgmpf3 CR mutants displayed enlarged and more leaf-like tips of sepals before inflation, a phenotype also observed in tomato mc mutants (Figure 3C; Yuste-Lisbona et al., 2016). Although this change of sepal tips was accompanied by a lower calyx height/width ratio (Figure 3G), inflation was unaffected. Besides the sepal phenotype, Pgmpf3 also displayed abnormal branching patterns; Pgmpf3 mutants frequently produced three instead of two sympodial shoots (Figure 3, D-F). Finally, we generated double mutants to test whether eliminating PgMPF2 and PgMPF3 functions together would disrupt inflation. Notably, Pgmpf2 Pgmpf3 plants matched the phenotypes of Pgmpf3 single mutants, including the progression of ICS (Figure 3H). In summary, these CRISPR-Cas9 engineered loss-of-function mutations in PgMPF2 and PgMPF3 show that these MADS-box genes are not responsible for the evolution of ICS and are not essential regulators of this developmental process.
Figure 2 Loss of purple pigmentation in P. pruinosa is due to an intronic SV in the bHLH transcription factor gene ANTHOCYANIN1. A, Images showing the difference in pigmentation between P. grisea and P. pruinosa. Arrows point to purple (P. grisea) compared to yellow (P. pruinosa) pigmentation on stems and flowers. Top bars = 1 cm; bottom bars = 2 mm. B, Mapping by sequencing showing the ΔSNP-index across all twelve chromosomes using P. grisea as the reference, with SNP ratios between yellow-guide and the purple-guide pools from an interspecific F2 population. Yellow line: 95% confidence interval cut-offs of ΔSNP-index. C, Simplified pathway of anthocyanin biosynthesis based on data from petunia.

Targeted mutagenesis of additional MADS-box genes does not abolish ICS
In an effort to identify genes involved in ICS, we embarked on a more comprehensive reverse genetics approach targeting MADS-box genes known to regulate floral organ development in tomato and other species, including additional MADS-box family members that mimic ICS when overexpressed or mutated in non-ICS Solanaceae. For example, we characterized a spontaneous tomato mutant with greatly enlarged fleshy fruit-covering sepals and found a transposon insertion SV upstream of TOMATO AGAMOUS-LIKE1 (TAGL1) that caused more than 80-fold overexpression in developing sepals (Figure 4A). TAGL1 belongs to the AGAMOUS clade of MADS-box transcription factors and is a close paralog of TOMATO AGAMOUS 1 (TAG1). Previous studies showed that both of these genes control flower development, and when either is overexpressed, enlarged and fleshy sepals are produced, in part mimicking ICS (Pnueli et al., 1994; Itkin et al., 2009). To test the roles of the Physalis orthologs of these genes, we generated CRISPR mutants. As observed in corresponding mutants of other species (Yanofsky et al., 1990; Pan et al., 2010), Pgtag1 CR-1 homozygous mutants displayed severe homeotic transformation of stamens to petal-like structures, while Pgtagl1 CR-1 displayed similar but weaker homeotic transformations (Figure 4B). Importantly, despite these floral organ defects, accompanied also by partial or complete loss of self-fertilization, both of these mutants maintained inflation, although calyx size was reduced, potentially due to secondary growth effects (Figure 4, B-E).
Based on their roles in floral organ development and inflorescence architecture, SEPALLATA4 (SEP4) MADS-box genes are another set of ICS candidates. Tomato has four SEP4 clade MADS-box genes: J2, SlMADS1/ENHANCER OF J2 (hereafter EJ2), LONG INFLORESCENCE (LIN), and RIPENING INHIBITOR (RIN). We previously showed that EJ2 and LIN regulate sepal development; mutants of ej2 alone and in combination with lin develop enlarged sepals (Soyk et al., 2017). Analysis of the genome of P. floridana (Lu et al., 2021), and confirmed in our genomes, showed that Physalis lost the ortholog of J2, whereas the other three SEP4 genes are present. Curiously, loss of J2 was proposed to have promoted the evolution of ICS, but non-ICS Solanaceae such as pepper also lack J2. To test the roles of the SEP4 clade in ICS, we used CRISPR-Cas9 to mutate all three SEP4 genes in P. grisea. Notably, multiple independent mutations in PgEJ2, PgLIN, and PgRIN did not inhibit ICS. Similar to our findings in tomato ej2 mutants (Soyk et al., 2017), mutants of Pgej2 CR-1 produced larger sepals in young and fully developed flowers, but inflation proceeded normally, with the only modification being sepal tips failing to coalesce to a single point after inflation is complete (Figure 4C).

Fertilization is not required for ICS
In flower development, B-class MADS-box genes participate in specifying petal and stamen identity, and the loss of B function leads to homeotic transformations of petals and stamens, which impair self-fertilization (Yanofsky et al., 1990; Weigel and Meyerowitz, 1994; Theißen and Saedler, 2001). If fertilization-related signals were required for ICS, as reported (He and Saedler, 2007), mutations in B-class MADS-box genes should result in abnormal ICS development. Previously, a mutation deleting the B-class MADS-box gene GLOBOSA1 (GLO1) was shown to develop a double-layered calyx phenotype in P. floridana when fertilized with WT pollen (Zhang et al., 2014). We identified four B-class MADS-box genes in P. grisea, including the closest homologs of GLO1: PgGLO1 (Phygri01g009190), PgGLO2 (Phygri06g017940), PgDEF (Phygri11g018450), and PgTM6 (Phygri02g012900). CRISPR-Cas9-induced null mutations in all four genes failed to disrupt ICS. Mutants of Pgtm6 CR-1 and Pgglo2 CR-1 appeared WT, whereas Pgglo1 CR-1 and Pgdef CR-1 both displayed expected homeotic transformations of stamens to carpels and petals to sepals. Notably, calyx inflation was unaffected even in the second whorls of Pgglo1 CR-1 and Pgdef CR-1 where petals were converted to sepals (Figure 4, D and E).
Fertility or signals from developing fruits have also been observed to be required for the initiation and progression of inflation, perhaps due to the activity and signaling of hormones such as cytokinin and gibberellin (He and Saedler, 2007). However, many of our MADS-box mutants with severe floral organ homeotic transformations also fail to self-fertilize and have various degrees of defects in fruit development. That ICS is unaffected in these mutants provides compelling genetic evidence that ICS can be uncoupled from normal fertilization. In particular, both Pgdef CR-1 and Pgglo1 CR-1 homozygous mutants cannot self-fertilize and form multiple small fruits without seeds due to homeotic transformations of stamens to carpels, yet the twin outer layers of sepals still form inflated calyces (Figure 4E). Moreover, in Pgtagl1 CR-1 and Pgtag1 CR-1 mutants, which cannot self-fertilize and whose fruits arrest early in development or fail to form entirely, respectively, inflation remained intact (Figure 4E). In summary, although earlier observations, hypotheses, and data suggested critical roles of several MADS-box genes in the evolution of ICS, our results show that calyx inflation is maintained in loss-of-function mutants of the P. grisea AG clade, SEP4 clade, and B-class MADS-box transcription factor genes. These data further demonstrate that although fertilization signals or developing fruit may contribute to the regulation of calyx inflation, neither is absolutely required.

The huskless mutant, caused by a mutation in an AP2-like transcription factor, eliminates inflated calyx
Forward genetics is a powerful and unbiased approach to identify genes controlling traits of interest in model systems. We performed a small-scale ethyl methanesulfonate (EMS) mutagenesis screen in P. grisea to identify genes involved in calyx development (see "Materials and methods"). A recessive mutant bearing fruits without husks was identified and named huskless (hu) (Figure 5, A and B). Scanning electron microscope (SEM) imaging of dissected flower buds showed that hu mutants developed three floral whorls instead of the four found in WT (Figure 5, C and D). To isolate the causative mutation, we sequenced genomic DNA from a pool of hu mutants and WT siblings from the original P. grisea mutagenesis (M2) family (see "Materials and methods"). Aligning Illumina-sequenced reads to the P. grisea genome allowed screening for single-nucleotide variants (SNVs) that were homozygous in the hu pool but not in the WT sibling pool. We scored these SNVs for predicted functional consequences on annotated gene transcripts using SnpEff (Cingolani et al., 2012). Out of eight such SNVs, one was a G-to-A mutation in a 3′ splice site of Phygri09g010120, which encodes an APETALA2 (AP2)-like transcription factor (Figure 5E; Supplemental Table S6). Co-segregation analysis in M3 families confirmed the association of this mutation with the hu phenotype (Supplemental Table S7), and sequencing RT-PCR products of Phygri09g010120 from hu floral tissue showed mis-splicing in the fourth intron, resulting in partial skipping of exon 5 (Figure 5E). Importantly, independent CRISPR-generated mutations of this AP2-like gene in P. grisea caused the same phenotype as hu (Figure 5F).
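The pooled-sequencing screen described above keeps SNVs that appear homozygous in the hu pool but not in the WT sibling pool. A minimal sketch of such a filter is shown below; the record layout, the function name, and the depth and allele-fraction thresholds are hypothetical choices made for illustration and are not the parameters used in this study.

```python
def candidate_snvs(sites, min_depth=10, hom_frac=0.95):
    """Return IDs of sites that look homozygous-alternate in the mutant pool
    but not in the WT sibling pool.

    Each site is a dict with an "id" and, per pool, an (alt_reads, depth) pair.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    keep = []
    for site in sites:
        mut_alt, mut_depth = site["mut"]
        wt_alt, wt_depth = site["wt"]
        # Skip sites with too few reads in either pool to judge zygosity.
        if mut_depth < min_depth or wt_depth < min_depth:
            continue
        mut_hom = mut_alt / mut_depth >= hom_frac
        wt_hom = wt_alt / wt_depth >= hom_frac
        if mut_hom and not wt_hom:
            keep.append(site["id"])
    return keep


# Hypothetical sites, not data from this study.
toy_sites = [
    {"id": "s1", "mut": (30, 30), "wt": (12, 36)},  # fixed in mutants only: keep
    {"id": "s2", "mut": (30, 30), "wt": (40, 40)},  # fixed in both pools: background
    {"id": "s3", "mut": (15, 30), "wt": (10, 30)},  # segregating in both pools
    {"id": "s4", "mut": (5, 5), "wt": (3, 9)},      # too little coverage
]
print(candidate_snvs(toy_sites))
```

Sites that pass a filter of this kind would then be ranked by predicted effect on annotated transcripts, as done here with SnpEff.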
HU is the homolog of P. hybrida AP2B/BLIND ENHANCER (BEN) (Figure 5G), which specifies the second and third floral whorl identity (Morel et al., 2017) with its redundant paralog BROTHER OF BEN (BOB). Petal development is strongly inhibited in ben bob double mutants, resulting in severely reduced or absent petals, and partial conversion of sepals into petals, resembling hu (Morel et al., 2017). Because the P. hybrida genome is highly fragmented (Bombarely et al., 2016), we performed a synteny analysis of the chromosomal segments containing BOB in P. grisea, P. pruinosa, and S. lycopersicum and found that this paralog of HU (BEN) is present in tomato but not in groundcherry (Figure 5H). Thus, hu emerged in our forward genetics mutagenesis screen because the BOB ortholog, and therefore redundancy, is absent in P. grisea.
The first floral whorl of hu displays characteristics of both sepals and petals (Figure 5, I and J). The whorl begins developing with green as the dominant color, like sepals, but gradually turns yellow as the flower matures, maintaining green color at organ tips. Nectar guides are also visible throughout the development of the first whorl, indicative of early petal identity. After fertilization, the first whorl mildly increases in size but fails to fully inflate before gradually senescing as hu fruits develop to the size of WT fruits.
Figure 5 (Continued) Carpel. Bar = 0.5 mm. E, Gene model showing the G-to-A point mutation causing partial skipping of exon 5 in the AP2-like transcription factor gene Phygri09g010120. Blue-colored nucleotides represent exonic sequences; red boxes indicate 3′ splice sites in WT and hu. F, CRISPR-Cas9-generated mutations in Phygri09g010120. Top: gene models showing three independent CRISPR null alleles of hu. Sequences 3′ of the third intron are omitted. hu CR-1 is homozygous for allele 1 (a1). Bottom: images of hu CR-1 flower phenotype. Bar = 2 mm. G, Maximum likelihood consensus tree of the TOE-type euAP2 proteins from A. thaliana (gene names in green), P. axillaris (Peaxi IDs in purple), S. lycopersicum (Solyc IDs in red), and P. grisea (Phygri IDs in black). Bootstrap values (%) based on 500 replicates are indicated near the branching points; branches below 50% have collapsed. H, Local synteny analysis between S. lycopersicum and P. grisea showing the absence of the Solyc10g084340 ortholog (petunia BOB ortholog) in P. grisea. Arrows indicate genes and orientations. Protein identity percentages between orthologs are indicated by ribbon shades in gray scale; only links above 80% identity are shown. I and J, Series of images of WT and hu developing flowers from before anthesis through early fruit development. Bar = 5 mm. K, PCA of WT and hu RNA-seq data. Right image: visual reference of the two stages used for expression profiling from WT and hu floral whorls. Numbers (-1 or -2) in the sample groups represent stage 1 or 2; petal or sepal whorls in WT are denoted as Pe, Se respectively; PeSe represents the merged outer whorl in hu. The top 3,000 DEGs were used for PCA. Bar = 5 mm.
To characterize the role of HU in whorl identity and ICS, we profiled transcriptomes by RNA-seq from WT sepals and petals at two stages of organ maturation and compared them with corresponding stages of hu first whorls (Figure 5K; see "Materials and methods"). Principal component analysis (PCA) revealed that hu expression profiles (denoted as hu-PeSe) were positioned between the profiles of WT sepals and petals at both stages, supporting the mixed-organ identity observed phenotypically. Thus, the loss of the inflated calyx in hu mutants results from a failure to properly specify sepal and petal identity rather than from direct disruption of a mechanistic origin of ICS. Our identification of hu through forward genetics exemplifies how the presence-absence variation of paralogs can shape genetic redundancies and genotype-to-phenotype relationships in related lineages, and further illustrates the value of multiple related model systems.

Discussion
Discoveries in plant development, cell biology, and genetics continue to depend on a limited number of model systems, often centered around A. thaliana and its relatives in the Brassicaceae family (Chang et al., 2016). New models are essential to advance fundamental and applied research beyond the small amount of biological diversity captured by current models. While additional model species have been proposed or are under development (Chang et al., 2016), most lack the powerful combination of efficient genomics and genetics. Moreover, emphasis is largely on neglected lineages and single representative species within them. An approach with complementary benefits relies on multiple models within a lineage to address often overlooked questions of species-specific and comparative evolutionary history over short time frames. The Solanaceae family is ideal in this regard, offering: (1) rich diversity throughout approximately 100 genera and >3,000 species spanning ~30 million years of evolution; (2) broad agricultural importance from >2 dozen major and minor fruit and vegetable crops; and (3) feasibility of rapidly developing and integrating genome editing with reference and pangenome resources.
By establishing high-quality chromosome-scale assemblies for P. grisea and P. pruinosa, we developed these Physalis species as new models to advance Solanaceae systems with genomics and genetics. Most significantly, our integration of these resources revealed that the mechanisms underlying ICS remain elusive. Indeed, despite previous evidence suggesting otherwise, we conclude that none of the 11 candidate MADS-box genes we functionally characterized using genome editing, nor fertility alone, are core regulators of ICS. Our findings therefore force a reset in the search for the physiological, genetic, and molecular mechanistic origins of this evolutionary novelty. Though a logical starting point, the candidate gene approach based on MADS-box overexpression phenotypes in other species was prone to misleading hypotheses and false positives, likely due to the complex evolutionary history of the MADS-box family members and their even more complex genetic and physical interactions. Indeed, multiple MADS-box genes appear to be capable of mimicking ICS through overexpression, possibly due to coordinated activation of closely related paralogs and subsequent complex feedback regulation and interactions among other family members. This might suggest double and higher-order mutants of these or other MADS-box genes not investigated here would ultimately perturb ICS, possibly reflecting a collective role of multiple family members acting redundantly or in a network. However, such a result would not necessarily indicate direct roles for these genes in the evolutionary steps leading to ICS.
Based on our genetics, we expect additional or other genes and molecular programs to be central, and the tools established here provide the foundation to revisit ICS in an unbiased way. ICS is a rapid and dynamic process, where extraordinary morphological changes in sepal growth and inflation occur within a few days. This suggests that the molecular events driving and responding to the inception of the transition from a noninflated sepal whorl to active inflation may be short-lived, happening on the order of hours. We propose that the future dissection of ICS should be based on detailed and integrated temporal, morphological, and molecular analyses to capture these transient events. A recent study in tomato took advantage of transcriptome profiling and computational ordering of hundreds of single shoot apical meristems to capture and reconstruct a highly detailed temporal gene expression map of the floral transition. These data revealed previously hidden genes, short-lived expression programs, and several genes that function in parallel transient pathways critical to the floral transition process (Meir et al., 2021). With the new reference genomes and annotations of P. grisea and P. pruinosa, a similar approach can be applied to ICS, where large numbers of individual sepals can readily and reliably be harvested and profiled throughout calyx development. As opposed to focusing on entire floral buds (Gao et al., 2020), such high-resolution temporal transcriptome profiling of sepals alone would provide comprehensive and unbiased information regarding global and possibly gene-specific molecular signatures in the initiation and maintenance of inflation, and expose new candidates that can be studied using the integrated genomics and genome editing strategies demonstrated here.
Beyond floral development and ICS in Physalis, our work sets a high-quality anchor to broaden biological questions and discoveries in the Solanaceae, and further illustrates fast and efficient approaches to building new model systems.
Establishing new pangenome and genome editing tools in many additional genera of Solanaceae and of other plant families will enable comparative genomic and genetic studies over both short and long evolutionary timescales.

Materials and methods
Plant material, growth conditions, and phenotyping
Seeds of P. grisea and P. pruinosa were obtained from the Solanaceae Germplasm Bank at the Botanical Garden of Nijmegen and from commercial seed sources. Seeds were directly sown into the soil (PRO-MIX BX Mycorrhizae Growing Mix) in 96-well plastic flats and grown in the greenhouse under long-day conditions (16-h light/8-h dark) supplemented with artificial light from high-pressure sodium bulbs (~250 μmol m⁻² s⁻¹). The temperature ranged from 26°C to 28°C during the day and from 18°C to 20°C during the night, with a relative humidity of 40%-60%. Four-week-old seedlings were transplanted to 4-L pots filled with soil (PRO-MIX HP Mycorrhizae Growing Mix) in the same greenhouse, or into the fields at Cold Spring Harbor Laboratory unless otherwise noted. The tomato mutant displaying enlarged fleshy sepals from Figure 4 was a gift from Dr. Dani Zamir, which arose from the whole-genome backcross lines constructed from a cross between Solanum pimpinellifolium (LA1589) and S. lycopersicum inbred variety cv. E6203 (TA209) (Grandillo and Tanksley, 1996). Branching and internode length phenotypes were assayed in greenhouse-grown plants 2 months after sowing.

Extraction of high-molecular weight DNA and long-read sequencing
For long-read sequencing, shoot apices of 3-week-old seedlings were harvested after a 48-h dark treatment. Extraction of high-molecular weight genomic DNA, construction of ONT libraries and PacBio HiFi libraries, and sequencing were described previously (Alonge et al., 2020, 2021). Hi-C experiments were conducted at Arima Genomics (San Diego, CA, USA) from 2 g of flash-frozen leaf tissue.

Physalis grisea chloroplast and mitochondria genome assembly
To assemble the P. grisea chloroplast genome, all HiFi reads were aligned to the previously published Physalis chloroplast reference genome (GenBank ID MH019243.1) with Minimap2 (v2.17-r974-dirty, -k19 -w19) (Li, 2018). All reads with at least one primary alignment spanning at least 90% of the read were assembled with HiCanu (version 2.0, genomeSize = 155k) (Nurk et al., 2020). The three resulting HiCanu unitigs were aligned to themselves with Nucmer (v3.1, -maxmatch) (Kurtz et al., 2004) and manually joined to produce a single trimmed and circularized contig. The contig was rotated to start at the same position as the reference. Liftoff was used to annotate the P. grisea chloroplast genome (Shumate and Salzberg, 2021).
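The read-selection rule in this step (keep reads with at least one primary alignment spanning at least 90% of the read) can be illustrated with a minimal sketch. It assumes the Minimap2 alignments are available in PAF format with the `tp:A:P` tag marking primary alignments; the function name `reads_spanning` and the toy records are hypothetical, and the exact filtering script used for the assembly is not described in the text.

```python
def reads_spanning(paf_lines, min_fraction=0.9):
    """Return names of reads that have at least one primary alignment
    covering >= min_fraction of the read length (PAF input assumed)."""
    keep = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        name = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        # Optional tags start at column 13; tp:A:P marks a primary alignment.
        is_primary = "tp:A:P" in fields[12:]
        if is_primary and qlen > 0 and (qend - qstart) / qlen >= min_fraction:
            keep.add(name)
    return keep


# Toy PAF records (hypothetical read and target names).
example = [
    "read1\t1000\t10\t990\t+\tchl\t155000\t0\t980\t970\t980\t60\ttp:A:P",
    "read2\t1000\t0\t500\t+\tchl\t155000\t0\t500\t490\t500\t60\ttp:A:P",
    "read3\t1000\t0\t1000\t-\tchl\t155000\t0\t1000\t990\t1000\t0\ttp:A:S",
]
selected = reads_spanning(example)
print(sorted(selected))
```

In this toy input only `read1` passes: `read2` is covered for half its length and `read3` is fully covered but only by a secondary alignment.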
Physalis grisea mitochondrial contigs were extracted from the polished ONT Flye assembly (see below). To identify mitochondrial contigs, tobacco (N. tabacum), pepper (C. annuum), tomato (S. lycopersicum), and eggplant (S. melongena) mitochondrial transcript sequences (GenBank IDs NC_006581.1, NC_024624.1, NC_035963.1, and NC_050334.1, respectively) (Sugiyama et al., 2005) were extracted with gffread (Pertea and Pertea, 2020) and aligned to the ONT Flye assembly with Minimap2 (v2.17-r941, -x splice). For each query transcriptome, any ONT contig shorter than 500 kb with at least one alignment at least 100-bp long was considered, and any such contig identified by at least two query transcriptomes was labeled as mitochondrial. These contigs were aligned to the P. grisea chloroplast genome which indicated that they were all mitochondrial and not chloroplast sequences. These ONT mitochondrial sequences were aligned to the raw HiCanu contigs (see below) with Nucmer (v3.1, -maxmatch), and nine ONT contigs were manually replaced with two homologous HiCanu contigs. Liftoff was used to annotate the P. grisea mitochondrial genome using the S. melongena annotation as evidence.
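The contig-labeling rule described above (contigs shorter than 500 kb, alignments of at least 100 bp, support from at least two query transcriptomes) can be sketched as follows. This is only an illustration of the stated thresholds: the input tuples, contig names, and the function `label_mito_contigs` are hypothetical, and the parsing of the actual Minimap2 output is omitted.

```python
from collections import defaultdict


def label_mito_contigs(hits, contig_lengths, max_contig_len=500_000,
                       min_aln_len=100, min_queries=2):
    """hits: iterable of (query_set, contig, alignment_length).
    A contig is labeled mitochondrial when it is shorter than
    max_contig_len and at least min_queries distinct query sets each
    contribute an alignment of at least min_aln_len bp."""
    support = defaultdict(set)
    for query_set, contig, aln_len in hits:
        if contig_lengths[contig] < max_contig_len and aln_len >= min_aln_len:
            support[contig].add(query_set)
    return {c for c, queries in support.items() if len(queries) >= min_queries}


# Toy data (hypothetical contig names and lengths).
contig_lengths = {"ctgA": 300_000, "ctgB": 200_000,
                  "ctgC": 100_000, "ctgD": 2_000_000}
hits = [
    ("tobacco", "ctgA", 350), ("tomato", "ctgA", 120),    # two supporting sets
    ("pepper", "ctgB", 400),                              # one set only
    ("tomato", "ctgC", 90), ("eggplant", "ctgC", 95),     # alignments too short
    ("tobacco", "ctgD", 800), ("pepper", "ctgD", 900),    # contig too long
]
mito = label_mito_contigs(hits, contig_lengths)
print(sorted(mito))
```

Requiring agreement between at least two query transcriptomes reduces the chance that a single spurious alignment labels a nuclear contig as mitochondrial.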

P. pruinosa genome assembly
The P. pruinosa genome was assembled in the same way as the P. grisea genome, with the following distinctions. HiFi reads were assembled with Hifiasm instead of HiCanu (v0.13-r308, -l0) (Cheng et al., 2021). Also, neither a chloroplast nor a mitochondrial genome was assembled for P. pruinosa. To screen organellar contigs, raw Hifiasm primary contigs were aligned to the P. pruinosa reference chloroplast genome (GenBank ID MH019243.1) and the P. grisea mitochondrial genome. As with P. grisea, SVs were called to identify potential misassemblies, and no misassemblies were found in the P. pruinosa scaffolds.

SV detection
Structural variation between P. grisea and P. pruinosa was identified using the same pipeline used to identify SV-like misassemblies described above. However, instead of aligning P. grisea reads to the P. grisea assembly and P. pruinosa reads to the P. pruinosa assembly, P. grisea reads were aligned to the P. pruinosa assembly and P. pruinosa reads were aligned to the P. grisea assembly. Also, Winnowmap2 (version 2.0) was used instead of Winnowmap for alignments (Jain et al., 2020a). SVs intersecting genomic features in Figure 1G were counted as previously described (Alonge et al., 2020) based on P. grisea annotation version 1.3.0.

CRISPR-Cas9 mutagenesis, plant transformation, and selection of mutant alleles
CRISPR-Cas9 mutagenesis was performed following our protocol as previously described (Lemmon et al., 2018; Swartwood and van Eck, 2019). Gene IDs related to this study are listed in Supplemental Table S8. Briefly, guide RNAs (gRNAs) were designed to be used in the Golden Gate cloning system (all gRNAs used in this study are listed in Supplemental Table S9) and were assembled into Level 1 (L1) constructs under the control of the U6 promoter. L1 guide constructs were then assembled with Level 1 constructs pICH47732-NOSpro:NPTII and pICH47742-35Spro:Cas9 into the binary Level 2 vector pAGM4723. The final binary vectors were then transformed into groundcherry by Agrobacterium tumefaciens-mediated transformation through tissue culture (Swartwood and van Eck, 2019). Multiple independent first-generation transgenic plants (T0) were genotyped with specific primers surrounding the target sites. T0 plants were self-pollinated and the T1 generation was genotyped for the target genes and the presence or absence of the CRISPR-Cas9 transgene. We noticed that tissue culture and transformation resulted in a variable frequency of tetraploidy. All mutants were verified as homozygous or biallelic and having only mutant alleles.

Tissue collection, RNA extraction, RT-PCR, and RT-qPCR
All tissues used were immediately frozen in liquid nitrogen before RNA extraction. For the analysis of AN1 transcripts in P. grisea and P. pruinosa, young flower buds were harvested. For TAGL1 gene expression analysis in the tomato calyx mutant, developing sepals at the open flower stage were harvested. Sepal tissue from three different WT plants and four different mutant plants was assayed as three biological replicates and four biological replicates, respectively. For the analysis of huskless (hu) and WT sepal gene expression profiles, the first whorl of hu, and WT sepals and petals at the stages shown in Figure 5K were harvested. Total RNA was extracted with the Zymo Research Quick-RNA Microprep kit following the manufacturer's protocol. cDNA synthesis was performed using SuperScript IV VILO Master Mix (Thermo Fisher Scientific, Waltham, MA, USA) with 500 ng to 1,500 ng total RNA input. RT-PCR was performed with KOD One PCR Master Mix and primers listed in Supplemental Table S10. RT-qPCR was performed using Fast SYBR Green Master Mix with primers listed in Supplemental Table S10 on the Applied Biosystems QuantStudio version 6 system.

Transcriptome analysis of huskless and WT
RNA-seq and differentially expressed genes (DEGs) analyses were performed as previously described with slight modification (Kwon et al., 2022). Briefly, the libraries for RNA-seq were prepared by the KAPA mRNA HyperPrep Kit (Roche, Basel, Switzerland). Paired-end 150-base sequencing was conducted on the Illumina sequencing platform (NextSeq, High-Output). Reads for WT and hu were trimmed by quality using Trimmomatic (Bolger et al., 2014) and quantified with Salmon (Patro et al., 2017). Quantification results from Salmon were imported into R using tximport version 1.24.0 (Soneson et al., 2016). PCA of samples was performed and plotted using DESeq2 version 1.36.0 (Love et al., 2014) and pcaExplorer version 2.22.0 (Marini and Binder, 2019) with counts of the top 3,000 variable genes.
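The gene-selection step that precedes PCA (ranking genes by variance across samples and keeping the top set) can be illustrated with a minimal sketch. It is not the DESeq2 or pcaExplorer implementation: a plain log2(x + 1) transform stands in for the variance-stabilizing transformations those packages provide, and the function name `top_variable_genes` and the toy counts are hypothetical.

```python
import math
from statistics import variance


def top_variable_genes(counts, n_top=3000):
    """counts: dict mapping gene -> list of normalized counts per sample.
    Genes are ranked by sample variance of log2(count + 1); ties are
    broken alphabetically so the result is deterministic."""
    scored = []
    for gene, values in counts.items():
        logged = [math.log2(v + 1) for v in values]
        scored.append((variance(logged), gene))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:n_top]]


# Toy expression table: four samples, four genes (hypothetical values).
toy = {
    "geneA": [10, 12, 11, 9],      # nearly constant
    "geneB": [1, 200, 3, 150],     # strongly variable
    "geneC": [50, 50, 50, 50],     # constant
    "geneD": [5, 40, 6, 35],       # moderately variable
}
top2 = top_variable_genes(toy, n_top=2)
print(top2)
```

Restricting PCA to the most variable genes keeps the components from being dominated by the many genes whose expression barely changes between whorls or stages.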

Mapping of the yellow nectar guide variant
The yellow-guide trait displayed classical patterns of Mendelian inheritance of a single recessive gene in the F1 and F2 populations from the cross between P. grisea and P. pruinosa. A bulk segregant analysis was performed using 20 plants from each of the yellow-guide pool and purple-guide pool in the F2 segregating population. All reads were assessed for overall quality by FastQC version 0.11.9 (Andrews, 2010). Read mapping, variant calling, and SNP-index calculation of the Illumina reads from each pool were done by QTL-seq version 2.2.2 (Takagi et al., 2013). Parameters used for the sliding window SNP-index calculation by the qtlplot command were -n1 20 -n2 20 -F 2 -D 250 -d 5 -w 1000 -s 50. The calculated SNP index in each sliding window was imported into R (R Core Team, 2020) for the final plot.
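The idea behind the sliding-window SNP-index calculation can be sketched as follows. This is an illustration only, not the QTL-seq implementation: the SNP-index is taken as the fraction of reads carrying the alternate allele in each bulk, the difference between bulks is averaged per window, and the depth bounds mirror the -d 5 and -D 250 values given above. The tuple layout, the assignment of bulks to "1" and "2", and the toy window sizes are assumptions.

```python
def snp_index(alt_depth, total_depth):
    """Fraction of reads supporting the alternate allele at one SNP."""
    return alt_depth / total_depth


def sliding_delta_index(snps, window_bp, step_bp, min_depth=5, max_depth=250):
    """snps: list of (pos, alt1, depth1, alt2, depth2) for one chromosome.
    Returns (window_start, window_end, mean delta SNP-index) for every
    window that contains at least one SNP passing the depth filter."""
    usable = [
        (pos, snp_index(a2, d2) - snp_index(a1, d1))
        for pos, a1, d1, a2, d2 in snps
        if min_depth <= d1 <= max_depth and min_depth <= d2 <= max_depth
    ]
    if not usable:
        return []
    last_pos = max(pos for pos, _ in usable)
    windows = []
    start = 0
    while start <= last_pos:
        end = start + window_bp
        deltas = [delta for pos, delta in usable if start <= pos < end]
        if deltas:
            windows.append((start, end, sum(deltas) / len(deltas)))
        start += step_bp
    return windows


# Toy SNPs (hypothetical): position, alt reads and depth in bulk 1, then bulk 2.
snps = [
    (100, 5, 10, 5, 10),    # no difference between bulks
    (600, 2, 10, 9, 10),    # alternate allele enriched in bulk 2
    (900, 1, 10, 10, 10),   # alternate allele nearly fixed in bulk 2
    (1500, 5, 10, 5, 3),    # dropped: bulk 2 depth below min_depth
]
windows = sliding_delta_index(snps, window_bp=1000, step_bp=500)
for start, end, delta in windows:
    print(start, end, round(delta, 3))
```

For a single recessive locus, windows linked to the causal gene are expected to show a mean difference that departs strongly from zero, while unlinked windows stay close to zero.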

EMS mutagenesis and mutant screening in P. grisea
A small-scale EMS mutagenesis was performed using ~1,500 P. grisea seeds (measured by weight). Seeds were soaked in distilled water overnight and then treated with 0.2% EMS (Sigma Aldrich, St. Louis, MO, USA) for 6 h. After treatment, seeds were washed thoroughly with distilled water and sown into 96-well flats. Four-week-old seedlings were then transplanted into the field. At harvest, fruits from every four M1 plants were bulk harvested into one group. For mutant screening, 80 groups of M2 plants were sown, transplanted, and screened for sepal-related phenotypes.

Mapping of huskless
Three plants with the huskless phenotype were identified from the same group. The pooled DNA from the three mutants, and the pooled DNA from 30 WT-looking siblings from the same group, were obtained by CTAB extraction methods. Libraries were prepared for sequencing using the Kapa Hyper PCR-free Kit and sequenced on Illumina Nextseq (PE150, high output). All reads were assessed for overall quality by FastQC version 0.11.9 (Andrews, 2010), and trimmed with Trimmomatic version 0.39 (Bolger et al., 2014) with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:40:15:1:FALSE LEADING:30 TRAILING:30 MINLEN:75 TOPHRED33. Trimmed paired reads were mapped to the reference P. grisea genome using BWA-MEM (Li, 2013). Alignments were then sorted with samtools (Li et al., 2009) and duplicates marked with PicardTools (Picard Toolkit, 2019). Variants were called with freebayes (Garrison and Marth, 2012) and filtered with VCFtools (Danecek et al., 2011) for SNPs with a minimum read depth of 3 and minimum quality value of 20. SNPs that are homozygous in the mutant pool but not homozygous in the WT sibling pool were analyzed for effects on transcripts with snpEff (Cingolani et al., 2012) with P. grisea annotation version 1.3.0.
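The candidate-selection logic in the last step (SNPs homozygous in the mutant pool but not homozygous in the WT sibling pool, after depth and quality filtering) can be sketched as follows. The sketch assumes the relevant VCF fields have already been parsed into dictionaries; the record keys, chromosome names, and the choice to apply the depth threshold to each pool separately are assumptions, since the text does not specify how the VCFtools filters were configured.

```python
def parse_gt(gt):
    """Split a VCF genotype string such as '0/1' or '1|1' into alleles."""
    return gt.replace("|", "/").split("/")


def is_homozygous(gt):
    """True when the genotype is fully called and all alleles are identical."""
    alleles = parse_gt(gt)
    return "." not in alleles and len(set(alleles)) == 1


def candidate_snps(records, min_depth=3, min_qual=20):
    """Keep SNPs that are homozygous non-reference in the mutant pool and
    called but not homozygous in the WT sibling pool."""
    selected = []
    for rec in records:
        if rec["qual"] < min_qual:
            continue
        if rec["mut_dp"] < min_depth or rec["wt_dp"] < min_depth:
            continue
        mut_alleles = parse_gt(rec["mut_gt"])
        mut_hom_alt = is_homozygous(rec["mut_gt"]) and mut_alleles[0] != "0"
        wt_called = "." not in parse_gt(rec["wt_gt"])
        if mut_hom_alt and wt_called and not is_homozygous(rec["wt_gt"]):
            selected.append((rec["chrom"], rec["pos"]))
    return selected


# Toy records (hypothetical positions and values).
records = [
    {"chrom": "chr9", "pos": 1000, "qual": 50, "mut_gt": "1/1", "mut_dp": 12,
     "wt_gt": "0/1", "wt_dp": 40},   # kept
    {"chrom": "chr9", "pos": 2000, "qual": 50, "mut_gt": "1/1", "mut_dp": 12,
     "wt_gt": "1/1", "wt_dp": 40},   # WT pool also homozygous
    {"chrom": "chr3", "pos": 3000, "qual": 10, "mut_gt": "1/1", "mut_dp": 12,
     "wt_gt": "0/1", "wt_dp": 40},   # quality too low
    {"chrom": "chr5", "pos": 4000, "qual": 60, "mut_gt": "0/1", "mut_dp": 12,
     "wt_gt": "0/1", "wt_dp": 40},   # mutant pool heterozygous
    {"chrom": "chr7", "pos": 5000, "qual": 60, "mut_gt": "1/1", "mut_dp": 2,
     "wt_gt": "0/1", "wt_dp": 40},   # mutant depth too low
]
candidates = candidate_snps(records)
print(candidates)
```

The filter reflects the expectation for a recessive mutation: every mutant plant carries two copies of the causal allele, whereas the pool of WT-looking siblings still carries the reference allele at that site.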

Molecular phylogenetic analyses
To determine the phylogenetic relationship between the eleven selected Solanaceae species, 18 genomes were used to define orthogroups by Conservatory (Hendelman et al., 2021). Protein sequences of the twenty most conserved orthogroup genes were aligned with MAFFT (version 7.487) FFT-NS-2 (Katoh and Standley, 2013) (see Supplemental Data Set 6), before constructing the tree by IQ-tree with the following parameters: -st AA -b 100 -pers 0.5 -wbtl (Minh et al., 2020). For the phylogenetic analysis of AP2-like proteins, protein sequences of the orthologs were retrieved from P. grisea, S. lycopersicum, and P. axillaris by BLAST (Altschul et al., 1990). Protein sequences (see Supplemental Data Set 7) were imported into MEGA version 11 (Tamura et al., 2021) and aligned with MUSCLE (default parameters). The tree was constructed using the maximum likelihood method and JTT matrix-based model. Bootstrap values (%) based on 500 replicates are indicated near the branching points; branches <50% have been collapsed. Alignment and tree files are provided as Supplemental Files S1 and S2.

Synteny analysis at the SlBOB locus
Because the scaffold quality of the P. axillaris genome in the vicinity of BOB was suboptimal, we used SL4.0 with the P. grisea genome for the analysis. A BLAST search using Petunia BOB and SlBOB cDNA query sequences against the P. grisea genome failed to retrieve a high-confidence hit other than Phygri09g010120, which is the BEN ortholog. BLAST search of genes upstream and downstream of SlBOB located their syntenic regions in the P. grisea genome. Genomic sequences with annotations from Solyc10g084240 to Solyc10g084420, and from Phygri10g011780 to Phygri10g011960 were used in clinker version 0.0.23 (Gilchrist and Chooi, 2021) to generate gene translation alignments and visualizations.

Accession numbers
Genome assemblies and annotations are available at https://github.com/pan-sol/pan-sol-data/tree/main/Physalis. Raw sequence data from this article can be found in Sequence Read Archive (SRA) under the BioProject PRJNA862958.

Supplemental data
The following materials are available in the online version of this article.
Supplemental Figure S1. Hi-C heatmaps confirm reference assembly structural accuracy.
Supplemental Figure S2. Illustrations of CRISPR-engineered mutations in this study.
Supplemental Figure S3. Maximum likelihood consensus tree of the euAP2 proteins from A. thaliana, P. axillaris, S. lycopersicum, and P. grisea.
Supplemental Table S1. Genome assembly statistics.
Supplemental Table S2. Annotation stats of P. grisea and P. pruinosa genomes.
Supplemental Table S3. Result summary of SNP calls of P. pruinosa Illumina reads against P. grisea as reference.
Supplemental Table S4. High-impact SNP calls of P. pruinosa Illumina reads against P. grisea as reference.
Supplemental Table S5. SVs intersecting CDS.
Supplemental Table S6. SNPs with predicted high impact on transcripts of huskless.
Supplemental Table S7. Co-segregation test of the G/A SNP in Phygri09g010120 and the huskless phenotype.
Supplemental Table S8. Genes related to work in this study.
Supplemental Table S9. CRISPR guides used in this study.
Supplemental Table S10. Primers used in this study.
Supplemental Data Set 1. Internode length measurement of P. grisea and P. pruinosa related to Figure 1, B and C.
Supplemental Data Set 2. SVs intersecting genes.
Supplemental Data Set 3. CRISPR-generated mutations in this study.
Supplemental Data Set 4. Branching phenotype counts for WT, Pgmpf2, and Pgmpf3 related to Figure 3F.
Supplemental Data Set 5. Calyx length and width measurement of WT, Pgmpf2, and Pgmpf3 related to Figure 3G.
Supplemental Data Set 6. Protein sequences used for the phylogenetic analysis of Solanaceae species in Figure 1A.
Supplemental Data Set 7. Protein sequences used for the phylogenetic analysis of AP2-like proteins in Figure 5G.
Supplemental Data Set 8. Statistical analysis tables.
Supplemental File S1. Tree file for the phylogenetic analysis in Figure 1A.
Supplemental File S2. Tree file for the phylogenetic analyses in Figure 5G and Supplemental Figure S3.