Characterization of the F Locus Responsible for Floral Anthocyanin Production in Potato

Anthocyanins are pigmented secondary metabolites produced via the flavonoid biosynthetic pathway and play important roles in plant stress responses, pollinator attraction, and consumer preference. Using RNA-sequencing analysis of a cross between diploid potato (Solanum tuberosum L.) lines segregating for flower color, we identified a homolog of the ANTHOCYANIN 2 (AN2) gene family that encodes a MYB transcription factor, herein termed StFlAN2, as the regulator of anthocyanin production in potato corollas. Transgenic introduction of StFlAN2 in white-flowered homozygous doubled-monoploid plants resulted in a recovery of purple flowers. RNA-sequencing revealed the specific anthocyanin biosynthetic genes activated by StFlAN2 as well as expression differences in genes within pathways involved in fruit ripening, senescence, and primary metabolism. Closer examination of the locus using genomic sequence analysis revealed a duplication in the StFlAN2 locus closely associated with gene expression that is likely attributable to nearby genetic elements. Taken together, this research provides insight into the regulation of anthocyanin biosynthesis in potato while also highlighting how the dynamic nature of the StFlAN2 locus may affect expression.

In petunia (Petunia hybrida), which is often used as a model organism for the study of anthocyanin production, expression of the early biosynthetic genes is initiated in flowers by a MYB transcription factor denoted as PhAN2. PhAN2 also controls expression of the bHLH transcription factor, PhAN1, which regulates the late biosynthetic genes (Quattrocchio et al. 1999; Spelt et al. 2000). In potato tubers, three loci have been proposed to control tuber skin pigmentation: D, P, and R. The D locus is required for any skin pigmentation, while P and R control purple or red skin color (Dodds and Long 1955). All three loci have been mapped and cloned. Subsequent analysis has shown that D encodes a MYB transcription factor on chromosome 10 with high sequence identity to PhAN2, currently called StAN1 (Payyavula et al. 2013; D'Amelia et al. 2014). The naming of StAN1 has been inconsistent because the gene was concurrently characterized and named both StAN1 and StAN2; in the current literature and herein, StAN1 refers to the potato homolog of PhAN2 (Jung et al. 2009; Payyavula et al. 2013; D'Amelia et al. 2014; Liu et al. 2016). P and R encode the biosynthetic genes F3′5′H on chromosome 11 and DFR on chromosome 2, respectively (De Jong et al. 2004; Jung et al. 2005; Zhang et al. 2009a). However, the control of tuber flesh pigmentation is not as clear. A study by Zhang et al. (2009b) showed that a homolog of the bHLH transcription factor PhAN1 [initially called StAN1 but since renamed StbHLH1 (Payyavula et al. 2013; D'Amelia et al. 2014)] mapped to chromosome 9 and plays a significant role in, but is not wholly responsible for, anthocyanin accumulation in tuber flesh. Later studies showed that there is substantial allelic diversity in the C-terminal region, specifically the R-repeat, of the R2R3 MYBs, such as StAN1, and in the bHLH transcription factors (StbHLH1 and StJAF13) that act as co-regulators, resulting in a variety of tuber pigmentations across genotypes (Liu et al. 2016; Strygina et al. 2019).
Similar to tubers, three potato flower loci have been described: F, D, and P (van Eck et al. 1993). The F locus is required for any floral pigmentation, whereas D and P control color shade. The floral D locus, responsible for the biosynthesis of red anthocyanins, maps to chromosome 2 and appears to correspond to the tuber R locus (DFR). The floral P locus is responsible for purple anthocyanin accumulation, localizes to chromosome 11, and appears to correspond to the tuber P locus (F3′5′H). The same study used restriction fragment length polymorphism analysis to map the F locus to chromosome 10, near the tuber D locus. Since purple tuber skins are observed in plants with white flowers and vice versa, it is plausible that there are multiple homologs of PhAN2 (which may have arisen via duplication on chromosome 10) that independently dictate pigment accumulation in different tissues.
As the genetic mechanism of anthocyanin production in potato flowers has received considerably less investigation than anthocyanin production in tubers, we set out to investigate the floral F locus. Using a segregating diploid population in conjunction with inbred and homozygous individuals derived from that population, we identified a separate PhAN2 homolog that underlies the floral F locus. An RNA-sequencing (RNA-seq) approach revealed the regulatory cascade initiated by this homolog, while analysis of the locus itself revealed duplications that may underlie differences in gene expression and floral phenotypes.

Plant material used in this study
The population segregating for purple or white flower color was generated as described by Peterson et al. (2016). Briefly, it is derived from a diploid cross between a white-flowered doubled monoploid, DM 1-3 516 R44, and a heterozygous purple-flowered individual, RH89-039-16 (Potato Genome Sequencing Consortium 2011). In this study, the F1 population (DRH), comprising 95 individuals, was screened for flower color, with the white-flowered plants designated DRH W and the purple-flowered plants designated DRH P. Plants were grown in growth chambers with a 16 h photoperiod, 250 µE m⁻² s⁻¹, 22 °C days, and 18 °C nights. An inbred (S5) individual derived from DRH and fixed for purple flowers, designated DRH P 28-5, was obtained through successive rounds of self-pollination. To generate monoploids from F1 individuals, we conducted anther culture on immature flower buds of numerous DRH plants (Paz and Veilleux 1999), with regenerated plants screened by flow cytometry to identify monoploid individuals according to Teparkum and Veilleux (1998). In addition, nine other monoploid potato clones available from a previous study (Hardigan et al. 2016) were characterized for flower color and sequence. The entirety of germplasm used is presented in Supplementary Figure 1 and Supplementary Table 1.

Genotype analysis
SNP-chip genotype data from Peterson et al. (2016) were analyzed using the purple/white phenotype of the F1 segregating population. Briefly, the Infinium platform (Illumina, Inc.; 8303 SNP array for potato; Felcher et al. 2012) was used to genotype 95 DRH F1 plants with known flower color (44 DRH W and 51 DRH P). Genotyping calls were made using an Illumina iScan reader with the Infinium HD Assay Ultra, and allele calls were made using GenomeStudio (Illumina, Inc.).

Gene expression profiling
Samples were collected by combining corollas from ten DRH P or ten DRH W individuals from the DRH F1 segregating population; two samples of ten corollas were collected for each color. RNA was extracted using a hybrid TRIzol/Qiagen RNeasy mini kit extraction protocol (https://microarray.adelaide.edu.au/protocols/), including a DNase treatment (Ambion Catalog # AM 1906) and followed by quality assessment on a Bioanalyzer (Agilent). Illumina RNA-seq libraries were constructed and sequenced on a HiSeq 2500 platform to obtain 100 nt paired-end reads. All read quality control and subsequent analysis were performed using the CLC Genomics Workbench 7.5. Reads were trimmed to remove adapters and 13 nt from the 5′ end to mitigate bias in the random priming of library preparation. Reads were further cleaned to remove low quality base calls (<20) and short reads (<40 nt). Reads were mapped to the PGSC DM genome v 4.03 (Sharma et al. 2013) using the following parameters: mismatch cost = 2; insertion cost = 3; deletion cost = 3; length fraction = 0.9; similarity fraction = 0.8; max number of hits per read = 10. Expression values were reported in RPKM (reads per kilobase gene model per million mapped reads). Duplicate samples were used in an unpaired empirical analysis of differential gene expression and p-values were corrected for false discovery rate (FDR). Following this, the genes were filtered based on an FDR p-value <0.05 and an absolute value fold-change >2.0. This resulted in 78 annotated genes classified as differentially expressed.
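The expression unit and the two filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench implementation; the function names, the signed fold-change convention, and the example numbers are assumptions made only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_differentially_expressed(fdr_p, fold_change, max_fdr=0.05, min_abs_fc=2.0):
    """Apply the filters stated in the text: FDR p-value < 0.05 and |fold-change| > 2.0.

    Assumes a signed fold-change (negative values for down-regulation).
    """
    return fdr_p < max_fdr and abs(fold_change) > min_abs_fc


# Hypothetical values, not data from the study.
example_rpkm = rpkm(read_count=500, gene_length_bp=1000, total_mapped_reads=10_000_000)
kept = is_differentially_expressed(fdr_p=0.01, fold_change=-3.5)
```

A gene passes only when both conditions hold, so a large fold-change with a weak FDR-corrected p-value is excluded, as is a significant but small change.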

AN2 construct generation
All constructs were created using the pCambia 1305.1 vector as a backbone, excising the 35S promoter and GUS sequences by restriction digest (BamHI and BstEII) and replacing them with the relevant promoter/gene combination. Genomic sequences of the purple haplotype promoter and gene sequences were cloned from the inbred DRH P 28-5. Since sequence data were not available for DRH P 28-5 at the time, primers were designed to conserved regions within purple-flowered individuals of a sequenced monoploid panel (Hardigan et al. 2016). AN2 cDNA was cloned directly from the DRH P samples used for RNA-seq analysis (above). The following primers were used to engineer compatible restriction sites onto the promoter (BamHI) and coding (BstEII) sequence, while a conserved XbaI site located in the first exon was used to join promoters and coding sequences: An2pro2 (AATTATGgaTccTCTTGGTTTTTCTTTTCATATTTATAC), An2proXba1 (ACCAGCTCTAGAAGGAACAAGATGCC), An2CDSExon1 (TTGGGAGTGAGAAAAGGTTCATGG), and AN2CDSBstEII (ATATTAggtGaccCCCTAGTACAAGTAGTAGTACAATACC). Verified products were ligated into the pJET 1.2 cloning vector and sequenced (Thermo Scientific CloneJET PCR Cloning Kit #K1232). pJET plasmids were digested with the appropriate enzymes (BamHI and XbaI for the promoter; XbaI and BstEII for the coding sequence) and the promoter was triple-ligated with the coding sequence into the pCambia 1305.1 binary vector. Binary vectors were introduced to ElectroMAX Agrobacterium tumefaciens LBA4404 cells (Invitrogen #18313-015). Primers AN2cDNAF (GTATCCCTAGTACAAGTAGT) and AN2cDNAR (ACAACATATCATGAATATTGCCA) were designed from cDNA of StFlAN2 and used to amplify genomic DNA extracted from in vitro leaf tissue of DRH 28-5; the two resulting bands were Sanger sequenced at the Virginia Biocomplexity Institute Core Facility.
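As a quick check of the cloning design, the restriction sites carried by the listed primers can be located with a short Python sketch. The primer sequences are copied from the text above; the recognition sequences (BamHI GGATCC, XbaI TCTAGA, BstEII GGTNACC) are the standard ones, and the helper name find_sites is hypothetical.

```python
import re

# Standard recognition sequences written as regular expressions (N = any base).
SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA", "BstEII": "GGT[ACGT]ACC"}

# Primer sequences as given in the text (lowercase marks engineered bases).
PRIMERS = {
    "An2pro2": "AATTATGgaTccTCTTGGTTTTTCTTTTCATATTTATAC",
    "An2proXba1": "ACCAGCTCTAGAAGGAACAAGATGCC",
    "An2CDSExon1": "TTGGGAGTGAGAAAAGGTTCATGG",
    "AN2CDSBstEII": "ATATTAggtGaccCCCTAGTACAAGTAGTAGTACAATACC",
}


def find_sites(primer):
    """Return {enzyme: 0-based start} for the first match of each site in the primer."""
    seq = primer.upper()
    hits = {}
    for enzyme, pattern in SITES.items():
        match = re.search(pattern, seq)
        if match:
            hits[enzyme] = match.start()
    return hits


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

All three recognition sequences are palindromic, so scanning one strand is sufficient. Under these assumptions, An2pro2 carries the BamHI site, An2proXba1 the XbaI site, and AN2CDSBstEII the BstEII site, while An2CDSExon1 carries none of the three.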

Plant transformation
Plant transformation was carried out as described by Rooke and Lindsey (1998) with minor modifications. The resulting shoots were allowed to root in basal MS (Murashige and Skoog 1962) medium (4.43 g L⁻¹ MS salts, 3% sucrose, 7 g L⁻¹ agar, pH 5.7-5.8). Once the rooted shoots were established, the medium was washed off and plants were transferred to peat pellets prior to placement in greenhouse groundbeds for phenotyping.

Genomic analysis of the F locus
To determine the sequence of the white-colored allele of the F locus, we performed whole genome sequencing on the DRH W -derived monoploids as described previously (Hardigan et al. 2016). The purple allele of the F locus was obtained by cloning the gene from the homozygous DRH P 28-5 inbred. Whole genome sequencing alignments were examined for variation at the F locus using the Integrative Genomics Viewer (IGV; Robinson et al. 2011). Sequence and phylogenetic analyses were performed using the Lasergene suite (DNASTAR, Inc., Madison, WI) and sequences of StAN1 (AGC31676) and PhAN2 (A4GRV2) were retrieved from the UniProt database (The UniProt Consortium 2017). Identification of miniature inverted-repeat transposable elements (MITEs) near the AN2 locus was performed using RepeatMasker to query the representative potato MITE sequences retrieved from the P-MITE database (Chen et al. 2014) against the white-flowered DM reference genome v 4.04 (Potato Genome Sequencing Consortium 2011).

Data availability
RNA-seq reads for the purple and white flower bulked samples are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject ID PRJNA636502. Whole genome shotgun reads of DM and the monoploids from Hardigan et al. (2016)

RESULTS
Genetic mapping of flower color in a diploid segregating population
An F1 population segregating for flower color was derived from a cross between a white-flowered homozygous individual, DM, and a purple-flowered heterozygous individual, RH (Peterson et al. 2016). The F1 generation (DRH) displayed an approximate segregation ratio of 1:1 (51 purple DRH P : 44 white DRH W ; χ² = 0.38, P = 0.54), with all individuals being either white- or purple-flowered. The approximate 1:1 ratio and complete penetrance of the flower color phenotype indicated that a single gene segregating from the heterozygous purple-flowered RH was the likely cause. Analysis of previously generated genotype data for this population (Peterson et al. 2016) identified SNPs significantly linked to the phenotype on the distal end of chromosome 10 (Figure 2). This finding is consistent with previous literature regarding the F locus for flower color (van Eck et al. 1993), which mapped the locus to the same approximate position using restriction fragment length polymorphism analysis.
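The 1:1 goodness-of-fit test on the 51 purple and 44 white plants can be reproduced with a minimal Python sketch using only the standard library. The text does not state whether a continuity correction was used; the reported values appear consistent with Yates' correction, which is therefore applied here as an assumption. The function name is hypothetical.

```python
import math


def chi_square_1to1(n_a, n_b, yates=True):
    """Chi-square goodness-of-fit test of two classes against a 1:1 expectation (1 d.f.)."""
    expected = (n_a + n_b) / 2
    correction = 0.5 if yates else 0.0  # Yates' continuity correction (assumed)
    chi2 = sum(
        max(abs(observed - expected) - correction, 0.0) ** 2 / expected
        for observed in (n_a, n_b)
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_1to1(51, 44)  # approximately 0.38 and 0.54
```

Because the P value is well above 0.05, the observed counts do not deviate significantly from the 1:1 ratio expected for a single gene segregating from a heterozygous parent crossed to a homozygous one.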

Transcriptomic analysis of purple vs. white flowers
To identify which genes were differentially expressed between white and purple flowers and potentially identify the transcriptional regulator for flower-color determination, we conducted RNA-seq using bulked RNA from pools of white and purple corollas from individual F1 lines. Using a cutoff of FDR-corrected p-value <0.05 and a minimum fold-change of two, we identified 78 genes that were significantly differentially expressed between the purple and white flower bulked samples (Supplementary Table 2). The gene with the greatest fold-change difference (500× greater in DRH P) was an R2R3 MYB transcription factor (PGSC0003DMG400019217) with 99% identity to a gene annotated by PGSC on Spud DB Genome Browser v4.03 as AN2 on the distal end of chromosome 10 and 58% identity to PhAN2 (Supplementary Figure 2). Henceforth, we will designate this gene as StFlAN2 (Solanum tuberosum Flower AN2) to avoid confusion.
In addition to StFlAN2, 16 genes within the phenylpropanoid and anthocyanin biosynthetic pathways were differentially expressed between DRH P and DRH W bulked samples, including bHLH-encoding ANTHOCYANIN 1 (StbHLH1, initially called StAN1; Zhang et al. 2009b; Payyavula et al. 2013), which was significantly up-regulated in DRH P samples (Supplementary Table 2; Figure 1). Additionally, genes associated with fruit ripening were up-regulated in DRH P samples, including four pectate lyase genes and three pectinesterase genes. Lastly, up-regulation of a subset of primary metabolism genes in the DRH W samples was observed, including photosystem subunits, ribulose bisphosphate carboxylase/oxygenase, and fructose-1,6-bisphosphatase (Supplementary Table 2). These results indicate that the StFlAN2 transcriptional regulator is a likely candidate gene to control the anthocyanin synthesis pathway involved in flower-color determination of potato.

Transgenic recovery of the purple-flower phenotype
To empirically test whether StFlAN2 does confer the purple-flower phenotype, we cloned the StFlAN2 promoter from genomic DNA and the coding sequence from cDNA of DRH P 28-5, an inbred (S5) individual fixed for purple flowers. Interestingly, the promoter cloned from DRH P 28-5 is 1000 bp shorter than would be expected from the DM reference genome sequence and lacks the miniature inverted-repeat transposable element (MITE) present in the DM promoter. When the StFlAN2 construct was introduced into either the white-flowered DM or a white-flowered DRH F1 background, all of the resulting transgenic regenerants (seven and six independent transgenics for DM and F1, respectively) had purple flowers (Figure 3). These results suggest that expression of StFlAN2 is sufficient to convert plant lines with a white-flower phenotype to purple-flowered plants. In the particular DRH F1 transgenic line (DRH F1-171-DRH p-ne::cDNA-26) shown, possible pleiotropic effects on leaf pigmentation and tuber flesh can also be observed; however, this effect was not consistent among all the transgenic lines, indicating that this phenomenon was possibly dependent on the transgene insertion site (Supplementary Figure 3).

Characterization of the StFlAN2 locus
Having identified StFlAN2 as the likely cause of differences in anthocyanin accumulation within the DRH population, we next performed in-depth sequence analysis of the locus in available purple- and white-flowered lines. The nucleotide sequence of the StFlAN2 locus was analyzed by alignment in IGV (Figure 4) of RNA-seq data from the purple DRH bulk segregant pool (DRH P -RNA-seq) and whole genome shotgun sequence (WGS) data from (1)  The locus and the regulation of StFlAN2 expression appear to be more complex than initially expected. The bicolored lines in the histograms of the WGS alignments of purple-flowered monoploids (M1, M8, M9, and M11) in Figure 4 would ordinarily indicate heterozygosity; however, as these are monoploid plants with no possibility of heterozygosity, we interpret this as copy number variation (CNV) in the form of gene duplication of the StFlAN2 locus.
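The reasoning in the preceding paragraph, that apparently heterozygous sites in a monoploid point to reads from more than one gene copy, can be sketched in Python. This is an illustration of the logic only, not the procedure used in the study; the function names, the depth and allele-fraction thresholds, and the read counts are all hypothetical.

```python
def is_mixed_site(ref_reads, alt_reads, min_depth=20, min_minor_fraction=0.2):
    """True when both alleles are well supported at a site (hypothetical thresholds)."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return False  # too few reads to judge
    return min(ref_reads, alt_reads) / depth >= min_minor_fraction


def suggests_duplication(sites, ploidy, min_mixed_sites=3):
    """Interpret mixed sites as evidence of paralogous copies only in a monoploid."""
    if ploidy != 1:
        return False  # in a diploid, mixed sites are explained by heterozygosity
    mixed = sum(is_mixed_site(ref, alt) for ref, alt in sites)
    return mixed >= min_mixed_sites


# Hypothetical (ref, alt) read counts at three variant positions.
mixed_counts = [(30, 28), (25, 31), (40, 35)]
clean_counts = [(60, 0), (55, 1), (0, 58)]
monoploid_call = suggests_duplication(mixed_counts, ploidy=1)
```

The same mixed counts in a diploid would not support a duplication call under this sketch, which mirrors why the monoploid background is what makes the CNV interpretation possible.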
The duplication event appears to have encompassed only the 3′ end of the promoter, as evidenced by a drop in read depth (black arrow in Figure 4). Furthermore, both StFlAN2 loci in the purple-flowered monoploids lack the MITEs found in the promoter region and second intron of the DM reference genome, as demonstrated by a lack of continuous reads aligned to the DM reference in this region (Figure 4). A small number of reads mapped to the MITE regions in these alignments, but they lack continuity against the DM reference on both the 5′ and 3′ ends, indicating likely spurious read alignments (red lines in Figure 4). The alignment of DRHp-RNA-seq Sanger sequence to the reference genome revealed complete identity to DM in exon 1, two SNPs in exon 2, and 18 SNPs plus two indels (6 bp) in exon 3 (Supplementary Figure 4; note that the gene reads from right to left). All 18 SNPs in the third exon are present in all of the read alignments of the purple-flowered bulk RNA sample (Figure 4 and Supplementary Figure 4). We interpret this lack of apparent CNV in the DRH P -RNA-seq sample to indicate that only one of the two StFlAN2 copies in DRH P and the purple-flowered monoploids is expressed. We performed a sequence search of the haplotype-resolved RH genome assembly (http://solanaceae.plantbiology.msu.edu/index.shtml) using Sanger sequences derived from amplification of DRH 28-5 genomic DNA using primers designed to DRH p cDNA sequences. StFlAN2 is located on haplotype 1 of chromosome 10 in RH and has an additional two paralogs within ~100 kbp, confirming our prediction of CNV in DRH 28-5. In contrast, only one allele of StFlAN2 or its paralogs was detected in the corresponding region of haplotype 2 of chromosome 10 of RH, specifically the one that differs from the DM reference genome and is derived from the duplication event (Figure 4).
To distinguish between the two copies, we have named the nonexpressed copy found in all lines StFlAN2ne and the duplicated copy found in purple-flowered lines StFlAN2e. DRH W M, although different from the DM reference, shows no CNV, indicating it harbors only one copy of the StFlAN2 locus. In addition, it lacks both portions of the promoter MITE and the intronic MITE; the alignments (red line in Figure 4) in the central region of the promoter MITE in DRH W M lack continuity against the DM reference on both the 5′ and 3′ ends, indicating that this region may exhibit spurious read alignments (Figure 4). Part of the region downstream of the promoter MITE position that has reduced read depth in the purple-flowered monoploids is lacking in the DRH W M promoter.
With the notable exception of M4, the remaining white-flowered monoploids (DM, M2, M3, M5, M7, and M10) appear to have only a single copy of the StFlAN2 locus (Figure 4). The apparent CNV encompassing the MITE in the promoter region of M5 and M7 indicates the assignment of a duplicate MITE to this region. This also explains the increased relative read depth of this region for M5 and M7 (Figure 4).
The white-flowered monoploid M4 is an exception. Like the purple-flowered monoploids, it also appears to have a duplication of the StFlAN2 locus. Both copies lack the intronic MITE, as demonstrated by a lack of reads mapped to the DM reference in this region (Figure 4). However, based on read-depth variation, it appears that only one of the copies contains the promoter MITE (Figure 4). The M4 promoter MITE matches the duplicated MITE in M5 and M7, whereas the remaining part of the promoter matches the promoters of the purple monoploids. Based on the white flower color, neither copy is expressed.
We exploited the primers used to clone the cDNA from DRH P to amplify and analyze the genic StFlAN2 sequence from the genomes of DRH P , DRH W , DM, and M4 ( Figure 5A). The results for DRH P indicate that the genic region of the expressed copy (StFlAN2e, whose exons match the cDNA) also contains a MITE. However, although it is of the same size (235 bp), it is of a different superfamily (Tc1/Mariner; DTT) than the MITE found in the DM reference StFlAN2 promoter and intron (Mutator; DTM). Hence, the whole genome shotgun sequencing results from the purple-flowered monoploids (M1, M8, M9, and M11) do not show alignments to the DM reference intronic DTM MITE (Figure 4). The genic region of the non-expressed copy (StFlAN2ne, that matches the DM reference genome) does not contain any MITE and is consequently shorter, leading to two different-sized PCR amplicons between 1 and 2 kb ( Figure 5A). Amplification of the DRH W StFlAN2 genic region yielded an amplicon identical to the shorter of the two found in DRH P , matching the results of the whole genome shotgun sequence data for this line (Figure 4). Amplification of the DM StFlAN2 genic region yielded an amplicon of similar size as the longer of the two found in DRH P , but with a DTM MITE in the first intron, matching the DM reference genome (Figure 4). Lastly, amplification of the M4 StFlAN2 genic region yielded two amplicons identical to the two found in DRH P , one that matches the DM reference genome (StFlAN2ne) but lacks the DTM MITE, and one whose exons match the DRH P cDNA (StFlAN2e) and contains a DTT MITE in the first intron ( Figure 5A).
We next exploited the primers used to clone the promoter from DRH P to amplify and analyze the StFlAN2 promoter sequence from the genomes of DRH P (from DRH P 28-5), DRH W (from DRH W M), DM, and M4 (Figure 5B). Only a single band was obtained for each promoter amplification, as the forward primer was designed based on the DM reference genome and only matched the single copy (StFlAN2ne); the reverse primer is at the start of the first exon (Figure 5C). An approximately 1 kb amplicon was obtained from DRH P, matching the DM reference genome minus the DTM MITE (Figure 5B). The DRH W promoter amplicon was slightly smaller, matching expectation based on the whole genome shotgun sequencing analysis (Figure 4), which indicated that it lacked the promoter MITE and was missing 20 bp downstream of the promoter MITE position. The amplicon obtained from DM was approximately 2 kb, representing the promoter including the DTM MITE. Lastly, the amplicon obtained from M4 was identical to the DRH P amplicon, representing the promoter without the DTM MITE and indicating that the MITE reads from the shotgun sequencing analysis (Figure 4) must have come from the duplicated locus.
These results lead us to the following interpretation of how StFlAN2 expression varies in the different white- and purple-flowered lines (Figure 5C). The purple-flowered DRH P and monoploids (M1, M8, M9, and M11) harbor two copies of the gene: StFlAN2ne, which is not expressed and contains neither a promoter MITE nor an intronic MITE, and StFlAN2e, which is expressed and contains an intronic DTT MITE. The white-flowered DRH W harbors only one copy, identical to DRH P StFlAN2ne, which is not expressed and contains neither a promoter MITE nor an intronic MITE. The white-flowered DM and monoploids (M2, M3, M5, M7, and M10) also harbor only one copy, which is not expressed but contains both a promoter DTM MITE and an intronic DTM MITE. The white-flowered monoploid M4 harbors two copies of the gene: StFlAN2ne, which is not expressed and contains neither a promoter MITE nor an intronic MITE, and StFlAN2e*, which is also not expressed and, similar to the purple-flowered DRH P and monoploids, contains an intronic DTT MITE but, in contrast to the purple-flowered DRH P and monoploids, also contains a DTM MITE in its promoter. Lastly, the transgenically complemented, purple-flowered DM line with the inserted StFlAN2 construct harbors an additional gene copy with the StFlAN2ne promoter from DRH P driving the StFlAN2e cDNA, indicating that this promoter can be active outside of the StFlAN2 locus.
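The interpretation above can be summarized as a small data structure. The following Python sketch encodes each gene copy with its promoter MITE, intronic MITE, and expression state as described in the text, and derives flower color from whether any copy is expressed. The class, field names, and the labels for the DM copy and the transgene are hypothetical conveniences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneCopy:
    name: str
    promoter_mite: Optional[str]  # MITE superfamily in the promoter, if any
    intron_mite: Optional[str]    # MITE superfamily in the intron, if any
    expressed: bool


NE = GeneCopy("StFlAN2ne", None, None, False)
E = GeneCopy("StFlAN2e", None, "DTT", True)
E_STAR = GeneCopy("StFlAN2e*", "DTM", "DTT", False)
DM_COPY = GeneCopy("DM reference copy", "DTM", "DTM", False)  # label is hypothetical
TRANSGENE = GeneCopy("StFlAN2ne promoter::StFlAN2e cDNA", None, None, True)

LOCUS = {
    "DRH P": [NE, E],
    "DRH W": [NE],
    "DM": [DM_COPY],
    "M4": [NE, E_STAR],
    "DM + construct": [DM_COPY, TRANSGENE],
}


def flower_color(copies):
    """Purple if at least one copy is expressed, otherwise white."""
    return "purple" if any(copy.expressed for copy in copies) else "white"


colors = {line: flower_color(copies) for line, copies in LOCUS.items()}
```

Laid out this way, the contrast between StFlAN2e and StFlAN2e* is easy to see: the two differ only in the promoter DTM MITE, and only the copy without it is expressed.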

DISCUSSION
StFlAN2 underlies the F locus for flower color in potato
With a combination of genetic mapping, RNA-seq, transgenic complementation, and genomic sequence analysis, we show that a PhAN2 homolog, StFlAN2, is the regulator of floral anthocyanin production in potato flowers, at least in the populations studied. This matches PhAN2 function in petunia, where it is also responsible for anthocyanin production in flowers. The 1:1 segregation pattern for flower color in the DRH F1 segregating population, combined with a presence/absence phenotype, implies that flower color is due to segregation of a single regulatory gene. If there had been a continuum of color or a more complex segregation pattern, a biosynthetic gene might have been a more likely cause, as Sliwka et al. (2017) hypothesized in their study on flower color intensity. Genetic mapping revealed a significant QTL on the distal end of chromosome 10 which, when combined with previous reports that this region harbors the requisite F locus for flower color (van Eck et al. 1993), provides further support for this assertion. RNA-seq analysis shows StFlAN2 to be the most differentially regulated gene between purple- and white-flowered genotypes. Finally, the shift from white to purple flower color in the DM background by a transgenic construct provides yet another level of support that StFlAN2 indeed controls expression of anthocyanin biosynthetic genes in potato corollas.

Figure 4 (caption, partial). (Note that DM is white-flowered.) DRH p-RNA-seq represents the RNA-seq from the purple DRH bulk whereas all other tracks are whole genome sequencing. The gene structure of StFlAN2 is displayed with a purple arrow representing the promoter and boxes representing each exon (reading from right to left as the gene is on the bottom strand). Copy number variation is apparent in all purple tracks except for DRH p-RNA-seq (as only one paralog is expressed). This copy number variation manifests as multicolored bars in the allele frequency histogram, which would otherwise be considered two separate SNP states in a heterozygous background. The locations of the two reference MITEs are displayed with green triangles. The black arrow highlights the start of reduced read depth from the 3′ to the 5′ end of the promoter sequence in the purple monoploids. The red horizontal lines indicate regions of discontinuous alignment within MITE reads.

Regulatory effects exerted by StFlAN2
Transcriptome analyses provide insight into which genes are affected by the expression cascade initiated by StFlAN2. These genes span the gamut of anthocyanin biosynthesis, starting with the initial flux of carbon from aromatic amino acids into the phenylpropanoid pathway catalyzed by phenylalanine ammonia lyase (PAL), to the first committed step of the flavonoid pathway, chalcone synthase (CHS), and finally ending with the assortment of enzymes involved in anthocyanin structural modification, such as glucosyl-, acyl-, and glutathione transferases (Fraser and Chapple 2011). Within the gene annotation of the potato genome (DM 1-3 516 R44 v3.04), there are 11 PAL genes, although only six of them appear to be full-length copies (Supplementary Table 3) and two of them appear to be partially duplicated. One of the two PAL genes identified in our study, PGSC0003DMG400031365, is highly expressed in purple corollas of RH compared to white-flowered DM [University of Toronto BAR ePlant website (http://bar.utoronto.ca/eplant_potato/), accessed 7/24/2020]. The other, PGSC0003DMG400023458, has little or no expression in DM and high expression in the stems of RH. With regard to CHS genes, there are 11 annotated in the potato genome (DM 1-3 516 R44 v3.04), most of which appear to be full-length (Supplementary Table 3). Four of the 11 are highly expressed in the purple flowers of RH (http://bar.utoronto.ca/eplant_potato/, accessed 7/24/2020) and two are expressed in the white flowers of DM, including the gene identified in our study (PGSC0003DMG400019110), which appears to be expressed in flowers regardless of color. Similar to what is observed for the tuber-specific StAN1 and StAN2 [also referred to as StMYBA1 (Liu et al. 2016)] MYBs (Payyavula et al. 2013), this study finds that expression of StFlAN2 correlates with the expression of genes within the phenylpropanoid pathway that do not contribute to production of anthocyanins, such as p-coumaroyl quinate/shikimate 3′-hydroxylase and caffeoyl-CoA O-methyltransferase. As these genes are believed to contribute to the production of other phenylpropanoid-derived compounds, their increased expression is more likely attributable to an increased flux of precursors than the direct action of StFlAN2 itself (Vanholme et al. 2010; Fraser and Chapple 2011).
The remaining differentially expressed genes are not involved directly in the production of anthocyanins but are possibly attributable to the physiological consequences of anthocyanin production. Three major functions stood out among these genes: up-regulation of genes involved in cell-wall degradation and ripening, down-regulation of polyphenoloxidase genes, and down-regulation of genes involved in photosynthesis and carbon fixation in purple corollas. Although care was taken to sample only recently opened, turgid flowers for all samples, the abundance of up-regulated pectinesterase genes and pectate lyase genes in the purple corollas suggests that they were biochemically further along in the 'ripening' process than white corollas. The more than 20-fold reduction in polyphenoloxidase expression relates to anthocyanin levels, as these enzymes are known to mediate anthocyanin degradation (Pifferi and Cultrera 1974; Jiang 2000). The light-shielding function of anthocyanins could serve as an explanation for lower expression of photosynthetic genes, such as RuBisCo and photosystem subunits, in purple corollas. Anthocyanins have been shown to protect against photoinhibition by absorbing light and limiting its permeation into the leaf (Steyn et al. 2002).

Figure 5 (caption, partial). In both A and B, L denotes Invitrogen 1kb+ ladder; C) Scheme of the StFlAN2 locus in DRH P, DRH W, DM, and a white-flowered monoploid haplotype (M4). DRH P includes both the non-expressed (StFlAN2ne) and expressed (StFlAN2e) paralogs. Arrows represent promoters while boxes and lines represent exons (E) and introns, respectively. Green triangles depict MITEs inserted into either promoter or intronic sequences, labeled by superfamily. Excluding transposons, shapes are filled to indicate expression whereas unfilled shapes indicate lack of expression. Striped fill indicates unknown sequence that lacks homology to the reference genome. Locations for primers used in cloning and PCR are displayed with red (promoter) and blue (CDS) arrows. Note: Differences in haplotype ideograms in C are reflected in band size and number in A and B.

Structural variation of the StFlAN2 locus
The analysis of the floral StFlAN2 locus in DM, DRH, and its inbred derivatives as well as a white-flowered monoploid indicates a dynamic local genetic terrain, with both transposon activity and copy number variation. Jung et al. (2009) identified a tuber-specific PhAN2 homolog as the regulator of anthocyanin production in tuber skin. Subsequent analysis showed the region harboring StAN1, located approximately 300 kb distal to the StFlAN2 locus described here, to be replete with MYB homologs, including multiple pseudogenes and at least one other functional gene, named StAN2 [also called StMYBA1 (Liu et al. 2016)], which is responsive to cold and drought stress and responsible for anthocyanin accumulation throughout the plant (André et al. 2009; D'Amelia et al. 2014; D'Amelia et al. 2018). Hence, the region, and the MYB homologs by extension, has been affected by multiple duplications leading to subfunctionalization, with no fewer than three separate potato genes having been referred to as AN2 due to their shared homology with PhAN2 (Supplementary Table 4); for clarity we will continue to refer to the floral locus as StFlAN2.
We report here that another duplication, giving rise to the paralog StFlAN2e, is responsible for segregation of corolla anthocyanin production in the DRH population. In tomato (Solanum lycopersicum), there is also a duplication of PhAN2 homologs on chromosome 10; both homologs are functional but not redundant, with only one regulating fruit color (Kiferle et al. 2015). It has been observed that the regions harboring PhAN2 homologs in Petunia inflata and P. axillaris are also remarkably dynamic, with little synteny between the two despite the recent divergence of the species, perhaps due to high transposon density (Bombarely et al. 2016). Thus, it is possible that complexities surrounding the StAN1 and StFlAN2 loci are the current manifestations of a region especially prone to structural variation for many of the Solanaceae due to a heightened density of repetitive elements and lack of pleiotropic effects of the PhAN2 homologs themselves (Bombarely et al. 2016).
Interestingly, the promoter of the non-expressed StFlAN2 allele (StFlAN2ne) that is present in both the DRH P and DRH W F1 segregants is active when it is inserted elsewhere in the genome, as all 13 independently regenerated transgenic lines derived from two different genotypes generated for this study had purple flowers (Figure 3). Ectopic expression in leaf and stem tissue was apparent in some of the transgenics. This suggests that there are likely some repressive cis-elements nearby but outside of the region cloned here and/or a repressed chromatin state. Sequence organization of the StAN1 promoter has also been suggested to be important for expression of anthocyanin synthesis in potato leaves and tuber skin (Strygina et al. 2019). This may also explain why the duplicated paralog StFlAN2e, which is present only in the DRH P F1 segregants and the DRH P 28-5 inbred line, is expressed and confers purple flower color, if the duplication removed it from the influence of this putative repressive cis-element.