Soybean DICER-LIKE2 Regulates Seed Coat Color via Production of Primary 22-Nucleotide Small Interfering RNAs from Long Inverted Repeats

0000-0002-0217-0666 (J.Z.); In plants, 22-nucleotide small RNAs trigger the production of secondary small interfering RNAs (siRNAs) and enhance silencing. DICER-LIKE2 (DCL2)-dependent 22-nucleotide siRNAs are rare in Arabidopsis ( Arabidopsis thaliana ) and are thought to function mainly during viral infection; by contrast, these siRNAs are abundant in many crops such as soybean ( Glycine max ) and maize ( Zea mays ). Here, we studied soybean 22-nucleotide siRNAs by applying CRISPR-Cas9 to simultaneously knock out the two copies of soybean DCL2 , GmDCL2a and GmDCL2b , in the Tianlong1 cultivar. Small RNA sequencing revealed that most 22-nucleotide siRNAs are derived from long inverted repeats (LIRs) and disappeared in the Gmdcl2a/2b double mutant. De novo assembly of a Tianlong1 reference genome and transcriptome pro ﬁ ling identi ﬁ ed an intronic LIR formed by the chalcone synthase (CHS) genes CHS1 and CHS3 . This LIR is the source of primary 22-nucleotide siRNAs that target other CHS genes and trigger the production of secondary 21-nucleotide siRNAs. Disruption of this process in Gmdcl2a/2b mutants substantially increased CHS mRNA levels in the seed coat, thus changing the coat color from yellow to brown. Our results demonstrated that endogenous LIR-derived transcripts in soybean are predominantly processed by GmDCL2 into 22-nucleotide siRNAs and uncovered a role for DCL2 in regulating natural traits.


INTRODUCTION
In plants, microRNAs (miRNAs) and small interfering RNAs (siR-NAs) trigger cleavage of their complementary mRNA targets via the slicer activity of ARGONAUTE (AGO) proteins (Rogers and Chen, 2013;Song et al., 2019). In Arabidopsis (Arabidopsis thaliana), while the majority of the sliced mRNA targets are degraded after miRNA-mediated cleavage, a small subset of the cleaved targets generate secondary, trans-acting siRNAs that enhance the silencing cascade (Allen et al., 2005;Chen et al., 2007;Howell et al., 2007). Two major models account for the initiation of the production of trans-acting siRNAs: the "two-hit" model for the TAS3 locus, which requires the function of miR390 and AGO7 (Axtell et al., 2006;Xia et al., 2017); and the "one-hit" model, which requires a miRNA trigger to be precisely 22 nucleotides in length (Chen et al., 2010;Cuperus et al., 2010). The TAS2 locus in Arabidopsis mostly generates secondary 21nucleotide siRNAs triggered by miR173, but the 22-nucleotide tasiR2140 from TAS2 39 D6(-) processed by DICER-LIKE4 (DCL4) triggers tertiary siRNAs, although its functional significance is still unclear (Chen et al., 2010).
Compared with miRNAs, 22-nucleotide siRNAs in plants are primarily produced by DCL2 and play essential roles in viral defense and transgene silencing (Gasciolli et al., 2005;Ding and Voinnet, 2007;Parent et al., 2015;Taochy et al., 2017;Wu et al., 2017;Wang et al., 2018b). Recent reports also found that 22nucleotide siRNAs derived from endogenous genes can be induced by nutrient deficiency and may contribute to plant adaptation to environmental stresses (Wu et al., 2020). However, the function of endogenous 22-nucleotide siRNAs under normal growth conditions is still largely unexplored, partially because they are rare in wild-type Arabidopsis, and loss-of-function dcl2 mutants in Arabidopsis and tomato have no visible developmental defects (Henderson et al., 2006;Wang et al., 2018aWang et al., , 2018b. Loss of function of DCL4 in Arabidopsis can induce the production of a large amount of DCL2-dependent 22-nucleotide siRNAs from endogenous genes and cause silencing of their targets, and as a result, plants exhibit various developmental defects (Bouché et al., 2006;Zhang et al., 2015;Wu et al., 2017Wu et al., , 2020. However, endogenous 22-nucleotide siRNAs are abundant in many major crops such as soybean, rice (Oryza sativa), and maize (Zea mays), indicating their potential importance (Lunardon et al., 2020). For example, 22-nucleotide siRNAs from transposable element (TE) regions are abundant in maize, especially in the mediator of paramutation1-1 background, in which 24-nucleotide siRNAs disappear (Nobuta et al., 2008). Yet, their functions remain unclear, partly due to a lack of mutants that disrupt the production of 22-nucleotide siRNAs in major crops. Here, we take advantage of the completed soybean reference genome (Schmutz et al., 2010;Liu et al., 2020) and recent advances in crop genome-editing technologies (Shan et al., 2013) to directly investigate the function of DCL2-dependent 22nucleotide siRNAs in soybean by studying Gmdcl2 clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 lines with frame-shift mutations.

CRISPR-Cas9 Editing of DCL2a/2b in Soybean
To gain insight into the role of 22-nucleotide endogenous siRNAs in soybean, we applied the CRISPR-Cas9 technology in soybean cv Tianlong1 to obtain loss-of-function mutants of the GmDCL2 genes. The soybean genome encodes two copies of DCL2, GmDCL2a (Glyma.09G025300) and GmDCL2b (Glyma.09G025400); these have high similarity (80.4%) in the encoded protein sequences (Supplemental Figures 1A and 1B; Supplemental Files 1 and 2). These two genes are adjacent to each other on chromosome 9, indicating that they are derived from a recent tandem duplication and thus have a high likelihood of being functionally redundant. Taking advantage of their high sequence similarity, we designed a single guide RNA (gRNA) to simultaneously target a conserved coding region on both genes ( Figure 1A), and we performed CRISPR-Cas9-mediated editing through tissue culture (see Methods for details).
From multiple independent mutant lines, we selected three homozygous double mutant lines, Gmdcl2a/2b-8, Gmdcl2a/2b-9, and Gmdcl2a/2b-16, for further studies ( Figure 1A). All three lines harbor deletions around the target site that result in a frame shift and premature stop codon in the coding region of GmDCL2a/2b ( Figure 1A). The mutant lines showed no obvious abnormality in plant architecture except for being slightly dwarf (Supplemental Figure 1C; Supplemental File 3). The most dramatic phenotypic change is that the seed coat of all three lines was brown in contrast to the yellow seed coat of wild-type Tianlong1 ( Figure 1B; Supplemental File 3). In our later work, this phenotype was of particular interest for studying the function of DCL2.

Impacts of DCL2 Loss of Function on Small RNA Biogenesis in Soybean
To study the impacts of GmDCL2a/2b loss of function on small RNA (sRNA) biogenesis and gene regulation, we performed sRNA sequencing (sRNA-seq) and mRNA sequencing (mRNA-seq) in both the wild type (Tianlong1) and the Gmdcl2a/2b-8 mutant. Both the leaf and seed coat tissues were examined with three biological replicates for each sample. The sRNA-seq results showed that wild-type soybean produced abundant 22-nucleotide siRNAs, which decreased substantially in the Gmdcl2a/2b mutant ( Figure 2A).
To further analyze these DCL2-dependent siRNAs, we classified sRNA clusters based on the dominant size class in each cluster using ShortStack (Axtell, 2013;Lunardon et al., 2020). Consistent with previous analyses of sRNAs in soybean (Arikit et al., 2014;Lunardon et al., 2020), the majority of siRNA loci in soybean features 24-nucleotide siRNAs, but 21-and 22-nucleotide loci demonstrated higher sRNA abundances ( Figure 2B). In wild-type libraries, our analysis identified 1304 22-nucleotide siRNA loci from the seed coat and 1371 from leaf sample, which accounted for only ;3% of the total number of loci but over 20% of total siRNA abundance ( Figure 2B). Compared with the wild type, siRNAs from 22nucleotide siRNA loci decreased substantially in the Gmdcl2a/ 2b mutant ( Figures 2B and 2C). For example, over 70% of the 22-nucleotide siRNA loci showed at least a 10-fold decrease in abundance in the seed coat ( Figure 2D). Also, siRNAs from some 21-nucleotide siRNA loci, especially those that share sequence homology with 22-nucleotide siRNA loci, were decreased in the mutant ( Figure 2E), suggesting that DCL2dependent 22-nucleotide sRNAs are capable of targeting in trans and triggering the biogenesis of secondary 21-nucleotide siRNAs. These results indicated that, compared with the low level of 22-nucleotide siRNAs found in Arabidopsis, soybean accumulates a large amount of endogenous 22-nucleotide siRNAs whose biogenesis requires DCL2.
We noticed that a handful of 22-nucleotide siRNA loci remained abundant in the Gmdcl2a/2b mutant (Supplemental Table 1), and a further investigation found that these loci exhibited miRNA-like characteristics. For example, one locus contains two homologous Gypsy TEs in the fifth intron of a clathrin gene (Glyma.14G058300); an RNA from this locus is predicted to form a stem-loop-like structure with bulges and mismatches that resemble a miRNA precursor ( Figure 2F). This locus produces three highly abundant sRNAs in the wild type (namely sRNA-A, sRNA-B, and sRNA-C in Figure 2F). While sRNA-B and sRNA-C nearly disappeared in the Gmdcl2a/2b mutant, the 22-nucleotide sRNA-A was surprisingly even more abundant in the Gmdcl2a/2b mutant ( Figure 2F), suggesting that this stem-loop can be processed not only by DCL2 but also by other DCL proteins, most likely by DCL1. This finding indicates that DCL2-dependent 22-nucleotide siRNAs have the potential to evolve into DCL1-dependent 22-nucleotide miRNAs, thus shedding light on a potential path for the evolution of 22-nucleotide miRNA via DCL2.

A Large Number of TEs Are Targeted by DCL2-Dependent 22-Nucleotide siRNAs in Soybean
In addition to their relatively large number, 22-nucleotide siRNA loci in soybean have another unusual feature compared with Arabidopsis: ;70% of these 22-nucleotide siRNA loci are overlapping with TEs ( Figure 3A; Supplemental Figure 2A), suggesting a potential role in TE silencing. In particular, instead of a typical 24nucleotide length, the siRNAs from the Caulimovirus class of TEs are mainly 22 nucleotides in length; we found that other TE families, such as CMC-Enhancer/Suppressor-mutator (CMC-EnSpm, CMC represents the three conserved nucleotides in terminal inverted repeat region, M 5 A or C), also produced a large amount of 22-nucleotide siRNAs ( Figure 3B; Supplemental Figure 2B). These TE-derived 22-nucleotide siRNAs are mostly strand-specific, and their abundances sharply decreased in the Gmdcl2a/2b mutant, as exemplified by one Caulimovirus TE locus and one CMC-EnSpm locus (Figures 3B and 3C;Supplemental Figures 2B and 2C). In addition to 22-nucleotide loci, we also identified dozens of highly abundant 21-nucleotide loci that are TE-related, and their sRNA accumulation also decreased in the Gmdcl2a/2b mutant ( Figure 3D; Supplemental Figure 2D), such as one CMC-EnSpm locus that is homologous with other 22nucleotide siRNAs generating the CMC-EnSpm family of TEs ( Figure 3C; Supplemental Figure 2C). These findings are consistent with the hypothesis that 22-nucleotide TE-derived siRNAs may target the homologous TE transcripts and trigger the generation of secondary 21-nucleotide siRNAs to enhance their silencing. In the Gmdcl2a/2b mutant, however, we did not observe an elevated level of TE mRNA accumulation ( Figure 3D; Supplemental Figure 2E; with examples shown in Figure 3C and Supplemental Figure 2C), suggesting that there are other redundant mechanisms to suppress TE activities in soybean.  The DNA of TE regions in plant genomes is often highly methylated in CG and non-CG contexts (Law and Jacobsen, 2010). We tested whether these TE-derived DCL2-dependent 22nucleotide siRNAs are capable of triggering DNA methylation, like the well-established role of 24-nucleotide siRNAs in RNA-directed DNA methylation by methylome profiling of the leaf tissues. From our whole-genome bisulfite sequencing (WGBS) data, we found that the methylation level at these 22-nucleotide TE loci is similar between the wild type and the Gmdcl2a/2b mutant (Supplemental Figure 2C). Thus, we conclude that 22-nucleotide siRNAs are not required for the maintenance of DNA methylation at their targeted TEs in leaves.

GmDCL2 Favors Long Inverted Repeats as Its Substrates
Next, knowing that RNA secondary structure is a key feature of at least DCL1 processing, we examined the structural features of 22nucleotide siRNA loci. We found that inverted repeats (IRs) are enriched at a much higher proportion at 22-nucleotide loci compared with 21-or 24-nucleotide loci. For example, ;80% of 22-nucleotide siRNA loci with an sRNA accumulation level higher than 50 TPM overlapped with IRs, compared with only 12% of 21nucleotide siRNA loci with similar abundances overlapping with IRs ( Figure 4A). Our poly(A)-selected mRNA-seq data further showed that, for those IRs that are overlapping with 22-nucleotide siRNA loci and have relatively high sRNA accumulation levels (above 10 TPM), 90% of them are transcribed by Pol II ( Figure 4B). In contrast, IRs overlapping with 24-nucleotide siRNA loci have almost no detectable poly(A) signal ( Figure 4B), as most 24nucleotide siRNAs are derived from Pol IV transcripts that do not have a poly(A) tail (Kuo et al., 2017). We also found that a smaller number of IRs mainly produced 21-nucleotide siRNAs and are also transcribed by Pol II (Figures 4A and 4B).
A previous study showed that DCL4 prefers doublestranded RNA (dsRNA) substrates that are over 100 nucleotides in length (Nagano et al., 2014). In addition, in a number of monocot species, 24-nucleotide reproductive phased siRNAs are derived from many IR precursors (Kakrana et al., 2018). However, the substrate preference for DCL2 remains unknown. Our analysis found that the median length of 22nucleotide siRNA-generating IRs is much larger than that of 21-nucleotide siRNA-generating IRs, especially for loci with high sRNA abundance. For those loci with TPM higher than 50, the median lengths of 21-and 22-nucleotide siRNAgenerating IRs were 254 and 1289 nucleotides, respectively ( Figure 4C). For example, the eighth intron of gene Glyma.15G177500 contained a long IR consisting of Copia TEs. The repeat region of this IR was ;2800 nucleotides and produced a large number of 22-nucleotide siRNAs from only the sense strand in a DCL2-dependent manner ( Figure 4D). In contrast, the pre-mRNAs of gene Glyma.14G055600 contained a short IR whose repeat region is 240 nucleotides and produced a series of 21-nucleotide sense siRNAs that were unaffected by Gmdcl2a/2b mutation ( Figure 4D). These results suggested that long IRs transcribed by Pol II are preferentially processed by DCL2 in soybean. In contrast, shorter IRs (typically 300 nucleotides and shorter) are favored by DCL4 to produce 21-nucleotide siRNAs.

DCL2-Dependent 22-nucleotide siRNAs Regulate the Seed Coat Color in Soybean
Chalcone synthase (CHS) is a key enzyme for the biosynthesis of flavonoids, which cause black/brown pigmentation in the soybean seed coat (Tuteja et al., 2009). Wild soybean accessions have black or brown seed coats, whereas commercial soybean cultivars are yellow due to the presence of a dominant allele of the I locus. There are four genotypes of the I locus (I, i i , i k , and i, with that order of dominance). Seeds of the dominant I allele have no pigment either on the hilum or on seed coat proper; the i i allele exhibits pigment on the hilum but not on the seed coat proper; the i k allele shows a two-colored saddle pattern on the seed coat; and the recessive i alleles feature fully pigmented seed coat Cho et al., 2019). The i i allele contains two similar IR clusters, and each cluster contains three CHS genes (CHS1, CHS3, and CHS4 [Tuteja et al., , 2009; illustrated in Figure 5A). This locus is the source of siRNAs that target and silence other CHS genes in the genome in trans and result in a yellow seed coat (Tuteja et al., , 2009. Large deletions, some of which arise by homologous recombination within the CHS clusters, lead to missing CHS genes in the clusters and thereby can abolish the production of the CHS-derived siRNAs and reverse the seed coat color back to black/brown (Tuteja et al., 2009;Cho et al., 2019). However, the genome sequencing of black W05 wild soybeans showed that it also contains this large CHS cluster, suggesting that the cluster itself is not sufficient to trigger siRNA-mediated silencing (Xie et al., 2019;Liu et al., 2020).
Comparative genome analysis found that a complex inversion and gene duplication event adjacent to CHS gene clusters occurs in most cultivars, such as Williams 82 and ZH13, and results in the promoter as well as the first four exons/introns of a subtilisin gene inserted to the upstream of the CHS gene cluster ( Figure 5A; Shen et al., 2018;Xie et al., 2019;Liu et al., 2020). This leads to a model proposing that the CHS1 antisense transcripts driven by the promoter of the subtilisin gene can base pair with other CHS1 sense transcripts to form dsRNAs that are further processed into 21-nucleotide siRNAs to silence other CHS genes (Xie et al., 2019). Recent studies found that AGO5 is involved in this process, as  (A) The gene structures of the I locus in the wild soybean W05 genome as well as cultivated soybean ZH13 and Tianlong1 genomes. The mRNA-seq and sRNA-seq results of the Tianlong1 seed coat are shown below the gene structures. *, Due to the duplication of CHS1 and CHS3 genes, the reads specifically mapped to CHS1 or CHS3, but not to other CHS genes, are also shown as "Uniquely mapped." The junction read counts (n) are indicated. nt, nucleotide; WT, wild type. a naturally occurring ago5 mutant in soybean has altered color in the saddle region of the seed coat (Cho et al., 2017). DCL2 has not been implicated in this regulatory machinery, as the majority of CHS siRNAs are 21-mers (Tuteja et al., 2009). Yet, we inferred that DCL2 is required to maintain the silencing of the CHS gene family, as Gmdcl2a/2b mutation alters the seed coat color of Tianlong1 ( Figure 1B).
To understand the role of DCL2 in regulating seed coat color, we de novo assembled the I locus in Tianlong1 (i i allele, yellow seed coat with light brown hilum) using a combination of PacBio longread sequencing, Illumina sequencing, and chromosome conformation capture sequencing (Hi-C; see Methods for details). We found that the I locus in Tianlong1 has the same inversion and duplication of the subtilisin gene fragment as previously shown in other cultivars ( Figure 5A). We hypothesize that the transcription potentially driven by the promoter of this subtilisin gene traverses the antisense strand of CHS1 and the sense strand of CHS3 in Tianlong1 ( Figure 5A). This chimeric transcript has been previously discovered in ESTs in the Williams 82 cultivar  and confirmed here by a large number of splicing junction mRNAseq reads spanning the entire antisense-CHS1-sense-CHS3 IR region, connecting the last exon of the inserted subtilisin gene fragment and the exon located downstream of this IR. The antisense CHS1 region and the sense CHS3 region in the subtilisinantisense-CHS1-sense-CHS3 chimeric transcripts can form a long inverted repeat (LIR) because of the close sequence similarity between CHS1 and CHS3. Consistent with our finding that GmDCL2 favors LIRs as its substrates, we detected a series of siRNAs from the antisense CHS1 region and sense CHS3 region that are mainly 22 nucleotides in length ( Figures 5A and 5B) in the wild type and disappeared in the Gmdcl2a/2b mutant ( Figures 5A  and 5C).
Besides CHS1 and CHS3, we also detected a large number of secondary siRNAs from other CHS genes, such as CHS2, CHS7, and CHS8 ( Figure 5B). These siRNAs are mainly 21 nucleotides in length and generated from both the sense and antisense strands ( Figure 5B), indicating the possible involvement of an RNAdependent RNA polymerase in generating the dsRNA precursors. Interestingly, these 21-nucleotide siRNAs also disappeared in the Gmdcl2a/2b mutant ( Figure 5C), suggesting that they are likely secondary siRNAs triggered by DCL2-dependent 22-nucleotide CHS siRNAs ( Figure 5D). This is consistent with a previous study that investigated sRNA profiles at 10 different stages of seed coat development and found that the 22nucleotide CHS siRNAs are more prevalent than the 21nucleotide siRNAs at the very early stages of seed coat development, including 4 to 24 d after flowering, whereas 21nucleotide CHS siRNAs become dominant at the later stages of seed coat development. The transition from 22-nucleotide siRNAs at CHS1/3 (I locus) to predominantly 21-nucleotide secondary siRNAs at CHS7/8 likely occurs at the 5-to 6-mg stages (Cho et al., 2013). In the Gmdcl2a/2b mutant, the mRNAs of CHS2, CHS7, and CSH8 genes accumulate at much higher levels compared with the wild type ( Figure 5C), and the seed coat color changed from yellow to brown ( Figure 1B). Our results demonstrated that the DCL2-dependent 22-nucleotide siRNAs are required to maintain the silencing of the CHS gene family and regulate the color of the seed coat in soybean ( Figure 5D). In addition, by comparing mRNA-seq data of the wild type and the Gmdcl2a/2b mutant, we identified 381 genes differentially accumulated in leaf (Supplemental Data Set 1) and 1912 in seed coat (including multiple CHS family members; Supplemental Data Set 2), suggesting that GmDCL2-dependent 22-nucleotide siRNAs can regulate the expression of genes besides the CHS family.

DISCUSSION
While our results showed that DCL2 can act on the transcripts containing long CHS IRs from the I locus (i i allele in cv Tianlong1), one needs to be extra cautious about the simple hypothesis that transcription originating at the partial subtilisin fragment is the main or only driver for the production of the CHS dsRNAs at the I locus, given its complicated structure. In addition, several lines of evidence suggest a more complicated situation for the regulation of the production of the CHS dsRNAs as well as the CHS siRNAs: (1) according to the recent soybean pan-genome analysis, 7 out of the 24 modern cultivars have a haplotype 2 of CHS arrangement that do not have the partial subtilisin fragment at the I locus yet still have a yellow seed coat (Liu et al., 2020); (2) although the wild species W05 lacks the partial subtilisin promoter fragment and has a black seed coat, it is unclear whether the potential mutations of other modifying factors, such as DCL2, AGO5, or other genes involved in the silencing pathway, result in the inability to produce CHS siRNAs in W05 (Cho et al., 2017); and (3) the rearranged subtilisin promoter hypothesis cannot explain the two-color pattern observed in the i i and i k alleles, as both silencing (yellow) and nonsilencing (pigmented) regions should have the same genomic structure (Cho et al., 2017). Thus, future systemic studies using naturally occurring spontaneous mutations or transgenic lines that have closely related genetic backgrounds are best for investigating the mechanism of such a complex locus in diverse wild and cultivated soybeans.
The precursor of 22-nucleotide siRNAs in soybean resembles the precursor of miRNAs in many ways: a single-stranded RNA transcribed by Pol II forms a hairpin-like structure and does not require the activity of RNA-dependent RNA polymerase. Such types of IR precursors for siRNAs have also been observed for 24nucleotide reproductive phased siRNAs in some monocot species, including asparagus (Asparagus officinalis), lily (Lilium maculatum), and daylily (Hemerocallis lilioasphodelus; Kakrana et al., 2018). Moreover, 22-nucleotide siRNAs in soybean can also trigger the production of secondary 21-nucleotide siRNAs, consistent with recent reports in Arabidopsis on the role of 22nucleotide siRNAs in amplifying silencing signals (Wu et al., 2017(Wu et al., , 2020. The parallels in the structure of the precursor and the mode of function between the 22-nucleotide miRNA and LIRderived 22-nucleotide siRNA imply a potential mechanism for the direct evolutionary descent of 22-nucleotide miRNAs from 22nucleotide siRNAs. Our findings also support the previously proposed model that miRNA precursors can originate from local duplication that forms an IR structure (Allen et al., 2004;Cuperus et al., 2011). Therefore, the large number of LIR loci that generate 22-nucleotide siRNAs in soybean could provide a rich foundation for miRNA origin.
While wild-type Arabidopsis generates few 22-nucleotide siR-NAs, we found that a large number of 22-nucleotide siRNAs are produced mostly from LIR regions in soybean. These 22-mers are capable of triggering the production of secondary 21-nucleotide siRNAs to further silence TEs or protein-coding genes. A recent study of sRNAs from 47 diverse plant species showed that many major crops and vegetables produce 22-nucleotide siRNAs at much higher levels compared with Arabidopsis (Lunardon et al., 2020). Our results here demonstrated that DCL2-dependent 22nucleotide siRNAs are capable of regulating natural traits, and future investigation in more species could help to expand our understanding about the possibly underappreciated function of 22-nucleotide siRNAs in wild-type plants.

Plant Materials and Growth Conditions
For preparation of soybean (Glycine max) leaf samples, the wild-type Tianlong1 and CRISPR/Cas9-engineered mutant lines were grown under short-day conditions (10 h of 300 mmol m 22 s 21 white light/14 h of dark, 26°C) in the greenhouse, and the leaves of 10-d-old seedlings were collected at Zeitgeber time 8 (8 h after light exposure). For preparation of seed coat samples, the soybean plants were grown under long-day conditions (16 h of 300 mmol m 22 s 21 white light/8 h of dark, 26°C), and the seed coats were dissected from the fresh seeds with a weight of 50 to 75 mg (Tuteja et al., 2009) at Zeitgeber time 11 (11 h after light exposure). The RNA and DNA used in mRNA-seq, sRNA-seq, and WGBS were isolated from tissues of the wild type (Tianlong1) and the Gmdcl2a/2b-8 mutant. The seeds can be obtained by contacting the group of Bin Liu at the Chinese Agricultural Academy of Sciences.

Targeted Mutagenesis of GmDCL2 Genes
The gRNAs targeting GmDCL2 genes were designed by the CRISPR-P online tool (http://crispr.hzau.edu.cn/CRISPR2/; Lei et al., 2014). The GmU6 promoter was amplified with a pair of primers, GmU6-F and GmU6-R, and the gRNA scaffold was amplified with a pair of primers, gRNA-F and Scaffold-R, using the plasmid pU3-gRNA as a template. The GmU6:gRNA cassette was constructed by overlapping PCR with the GmU6-F and Scaffold-R primers and then inserted into the 35S-Cas9-Bar vector between the XbaI and BglII sites by infusion technology (Clontech). The constructed GmDCL2-gRNA binary vector was introduced into Agrobacterium tumefaciens strain EHA105 and transformed into wild-type soybean Tianlong1 (Paz et al., 2006). The T0 transgenic plants were regenerated on the medium under the selection of 8 mg/L glufosinate ammonium. For characterizing the targeted mutations, the genomic sequences of GmDCL2a and GmDCL2b were amplified and sequenced using the primer pairs GmDCL2a-SF/SR and GmDCL2b-SF/SR, respectively. Primers are listed in Supplemental Table 2.
Biallelic (both alleles are mutated but have different mutations), heterozygous (one allele is mutated, one is wild type), and chimeric (wild-type allele, and multiple different mutated alleles) mutations frequently occur in the soybean CRISPR/Cas9 transgenic lines. The T0 transgenic line #8 (brown seed coat) is biallelic in both GmDCL2a and GmDCL2b loci, based on the genotyping of 23 T1 transgenic lines (all have brown seed coat): 10 lines were homozygous but in two kinds of genotypes (6 lines:4 lines), and 13 are still biallelic. Both of the T0 transgenic lines #9 and #16 (yellow seed coat) are complex chimeric lines. We genotyped the offspring of transgenic plants to identify homozygous double mutants and only selected one mutant obtained from each line for further analysis; we named them Gmdcl2a/2b-8, Gmdcl2a/2b-9, and Gmdcl2a/2b-16.

RNA Extraction and Sequencing of mRNAs and sRNAs
The RNA samples of the wild type (Tianlong1) and the Gmdcl2a/2b-8 mutant were prepared by the Spectrum Plant Total RNA Kit (Sigma-Aldrich, STRN50) according to the manufacturer's instructions. The strand-specific mRNA and sRNA libraries were prepared and sequenced at BGI-Shenzhen.

WGBS and Analysis
Extraction of genomic DNA and the construction of WGBS libraries were performed by BGI-Shenzhen. After sequencing, the mapping of reads and the calculation of DNA methylation ratios in single-base levels were performed using BSMAP (v2.90; Xi and Li, 2009) with default parameters.

Bioinformatics Analysis of sRNAs
The adapters of raw reads were removed using cutadapt (v2.9; Martin, 2011), and the trimmed reads of 20 to 30 nucleotides in length were mapped to the Tianlong1 genome using bowtie (v1.2.2; Langmead et al., 2009) with the parameters -v 0 -a -S. After filtering out the reads matching to rRNAs, tRNAs, as well as the mitochondrial and chloroplast genome (allowing two mismatches), the clean reads of each library were individually mapped to the Tianlong1 genome by ShortStack (v3.8.5;Axtell, 2013) using the parameters -mismatches 0-ranmax 5 -mincov 2rpm. The resulting files were used to identify siRNA and miRNA loci as previously reported Lunardon et al. (2020). The predicted RNA secondary structure of the miRNA-like locus was visualized using strucVis (Michael J. Axtell at https://github.com/MikeAxtell/strucVis). For CHS siRNA analysis, the siRNAs mapped to a specific CHS gene, but not to other CHS genes, were defined as siRNAs derived from this CHS gene.

Bioinformatics Analysis of mRNA-Seq
The paired-end reads of mRNA-seq were mapped to the Tianlong1 genome using STAR (v2.7.2a; Dobin et al., 2013) with the parameters --outFilterMultimapScoreRange 0 --outSAMattributes Standard --alignIntronMin 20 --alignIntronMax 12,000. The PCR duplication reads were removed by . The uniquely mapped read count of genes, TEs, and IRs were extracted by featureCounts (v2.0.0; Liao et al., 2014) with parameters -O -p. Due to the duplication of CHS genes, the reads specifically mapped to a specific CHS gene, but not to other CHS genes, were defined as reads derived from this CHS gene. The fragments per kilobase of exon model per million mapped reads values were calculated based on read counts by the edgeR package (v3.22.5; Robinson et al., 2010).

De Novo Assembly of the Tianlong1 Genome
Genomic DNA was isolated and sequenced by BGI-Shenzhen. The 20-kb library was sequenced on the PacBio Sequel II System with 803 coverage. The 270-and 500-bp paired-end Illumina libraries were constructed by BGI and sequenced using the HiSeq X Ten platform with 1003 coverage. A total of 0.5 g of leaves of 40-d-old plants were collected and used for in situ Hi-C library construction, following the procedure previously described, with some modification, by Moissiard et al. (2012). In brief, the homogenized tissues were fixed in 1% (v/v) formaldehyde and stopped with 0.125 M Gly. Then, homogenate was filtered through two layers of Miracloth. After centrifugation, the nuclei were digested and ligated in situ as previously described. After reverse-cross-link and DNA purification, the DNA fragmentation was performed by Tn5 using the Vazyme kit (TruePrep DNA Library Prep Kit V2 for Illumina, TD501) and stopped with 0.2% (w/v) SDS at 55°C for 15 min. Biotinylated DNA fragments were pulled down and amplified. The Hi-C library was sequenced on the HiSeq X Ten platform using 150-bp paired-end mode.
De novo assembly was conducted according to a reported pipeline (Shen et al., 2018) with minor modifications. In brief, the PacBio reads were assembled to primary contigs using Canu (v1.7.1; Koren et al., 2017). Theprimary contigs were polished using SMRT Link (v6.0.0) and were corrected by wholegenome resequencing data using Pilon (v1.23;Walker et al., 2014). The contigs had an N50 (given a set of contigs arranged from long to short, N50 is defined as the sequence length of the n-th contig, where the total length of the first n contigs is equal to half of the total length of all contigs) up to 4.34 Mb and were further assembled into chromosome with Hi-C data using 3D DNA (Dudchenko et al., 2017) and Juicebox Assembly Tools (Durand et al., 2016).

Accession Numbers
The genome assembly and annotation of Tianlong1 (v1.0) are available at the National Center for Biotechnology Information with accession number PRJNA645754 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA645754). The sRNA-seq, mRNA-seq, and WGBS data generated in this study were deposited at the National Center for Biotechnology Information under accession number PRJNA644259 (https://www.ncbi.nlm.nih.gov/bioproject/ PRJNA644259) and also hosted at http://ipf.sustech.edu.cn/priv/Tianlong1_ public with Tianlong1 as reference. The sRNA data mapped to the Williams 82 genome is also available at the MPSS plant sRNA website: https://mpss. danforthcenter.org/dbs/index.php?SITE5soy_sRNA_dcl2.

Supplemental Data
Supplemental Figure 1. The phylogenetic analysis of DCL proteins and the growth phenotype of the Gmdcl2a/2b mutants.
Supplemental Table 1. The list of 22-nucleotide siRNA loci with 22nucleotide siRNA TPM more than 5 and not decreased in Gmdcl2a/2b mutant.
Supplemental Data Set 1. The list of genes showing differential mRNA expression between wild type and Gmdcl2a/2b-8 mutant in seed coat.
Supplemental Data Set 2. The list of genes showing differential mRNA expression between wild type and Gmdcl2a/2b-8 mutant in leaf.
Supplemental File 1. Multiple sequence alignment fasta file for Supplemental Figure 1A.
Supplemental File 2. Newick tree file for Supplemental Figure 1A.
Supplemental File 3. The unprocessed images for Figure 1B and Supplemental Figure 1C.