Conserving a threatened North American walnut: a chromosome-scale reference genome for butternut (Juglans cinerea)

Abstract With the advent of affordable and more accurate third-generation sequencing technologies, and the associated bioinformatic tools, it is now possible to sequence, assemble, and annotate more species of conservation concern than ever before. Juglans cinerea, commonly known as butternut or white walnut, is a member of the walnut family, native to the Eastern United States and Southeastern Canada. The species is currently listed as Endangered on the IUCN Red List due to decline from an invasive fungus known as Ophiognomonia clavigignenti-juglandacearum (Oc-j) that causes butternut canker. Oc-j creates visible sores on the trunks of the tree which essentially starves and slowly kills the tree. Natural resistance to this pathogen is rare. Conserving butternut is of utmost priority due to its critical ecosystem role and cultural significance. As part of an integrated undergraduate and graduate student training program in biodiversity and conservation genomics, the first reference genome for Juglans cinerea is described here. This chromosome-scale 539 Mb assembly was generated from over 100 × coverage of Oxford Nanopore long reads and scaffolded with the Juglans mandshurica genome. Scaffolding with a closely related species oriented and ordered the sequences in a manner more representative of the structure of the genome without altering the sequence. Comparisons with sequenced Juglandaceae revealed high levels of synteny and further supported J. cinerea's recent phylogenetic placement. Comparative assessment of gene family evolution revealed a significant number of contracting families, including several associated with biotic stress response.


Introduction
Temporal analysis of biodiversity loss indicates that we are in the midst of the Earth's sixth mass extinction event (Ceballos et al. 2017).Habitat destruction and degradation, overexploitation, pollution, invasive species, climate change, and other anthropogenic factors are the greatest threats to Earth's flora and fauna.The International Union for the Conservation of Nature (IUCN) Red List noted that nearly 30% of the species assessed are at risk of extinction (IUCN 2022), including about 30% of trees.
Preserving adaptive potential through conservation genetics is widely recognized as important to ensuring long-term survival, especially in the face of climate change (Kardos et al. 2021).Conservation managers and geneticists can work together to monitor population genetic diversity and develop restoration plans including assisted breeding.However, several studies have acknowledged a "conservation genetics gap" stemming from, among other things, a lack of training, barriers to collaboration, discipline-specific terminology, and access to resources (Taylor et al. 2017;Sandström et al. 2019).The slow advancement of tool development and implementation in conservation is further impacted by the lack of genomic resources for the vast majority of threatened species, with less than 1% of all eukaryotic species https://doi.org/10.1093/g3journal/jkad189Advance Access Publication Date: 13 September 2023 Genome Report sequenced to date (Lewin et al. 2022).As such, the implementation of genetic tools in conservation and/or restoration benefits from the development of a reference genome (DeWoody et al. 2022).
Butternut (Juglans cinerea L.) is a deciduous, outcrossing, and wind-pollinated canopy tree native to eastern North America.Once valued for its wood and edible nuts, today, it is threatened by butternut canker, a disease caused by the fungus Ophiognomonia clavigignenti-juglandacearum (Nair et al. 1979), discovered in the late 1960s and now found throughout butternut's range.The disease had already killed more than 80% of butternut trees in numerous states by the 1990s (the last detailed censuses available) and is threatening the species with extinction.Diseased trees typically die within 5 to 10 years, and little resistance has been identified outside of hybrids with the introduced Japanese walnut (Juglans ailantifolia Carrière).The search for butternut individuals or populations with less susceptibility to butternut canker is confounded by environmental factors.For example, the severity of butternut canker as well as Armillaria root rot is greater in low-elevation sites with heavy soils and in humid sites with heavy canopies (LaBonte et al. 2015).Dramatic declines over the past few decades have resulted in the listing of butternut as endangered in Canada (Nielsen et al. 2003) and Endangered on the IUCN Red list in 2018.Several US states have also included butternut as protected or endangered, including Minnesota and Wisconsin (Smith 2018;WDNR 2021).
Butternut shows high genetic diversity (moderate cline from south to north) (Hoban et al. 2008;2010).In more isolated populations, lower genetic diversity and inbreeding are observed.Chloroplast markers have revealed population-level genetic differentiation across an east-west gradient that may have originated from multiple Ice Age refugia (Laricchia et al. 2015).Butternut is sympatric with Juglans nigra Linnaeus through much of its range however it cannot hybridize with this species, but it easily hybridizes with its close relative, the non-native J. ailantifolia.Hybrids with J. ailantifolia are more tolerant of butternut canker, grow faster/larger, and are more productive (Boraks and Broders 2014;Brennan et al. 2020).Hybrids between the species occur at low frequencies in numerous forests in several states (Hoban et al. 2012).Given the limited resistance observed among pure J. cinerea individuals, a plan proposed by practitioners seeks to conserve native germplasm and introduce genes for resistance from J. ailantifolia (Pike et al. 2021), similar to efforts in American chestnut (Fernandes et al. 2022).The development of reliable markers and the materials for forward and/or reverse breeding approaches, and for analyzing adaptive introgression of hybrids in the wild, will rely on a high-quality reference genome.
To address the conservation genetics gap, undergraduate and graduate students trained in the newly formed Biodiversity and Conservation Genomics Center at the University of Connecticut worked with practitioners to generate the first reference genome for the threatened J. cinerea.This study represents the intersection of industry (Oxford Nanopore), federal research agencies (USDA Forest Service, Canadian Forest Service (NRCAN)), forest tree conservation (Morton Arboretum), and academia.The chromosome-scale reference was annotated with existing transcriptomic (RNA-Seq) resources and examined in an evolutionary context to recently sequenced members of the Juglandaceae from around the globe.

Plant material, extraction, and sequencing
Leaves were collected from a single individual from a wild population of trees in Sheffield, New Brunswick, Canada (45.88 N, −66.36 W) (Fig. 1a) that appeared healthy (Fig. 1b).A vouchered specimen was deposited at the Connell Memorial Herbarium, University of New Brunswick (UNB #69382).Leaves were flash frozen in liquid nitrogen and stored at −80°C until extraction.Leaf tissue weighing 750 mg was bulked from a single accession and ground with a mortar and pestle in liquid nitrogen.High molecular weight (HMW) genomic DNA extraction was then conducted following Vaillancourt and Buell CTAB-Genomic-tip protocol (Vaillancourt et al. 2019).In brief, the aqueous layer containing the DNA was collected and mixed with isopropanol to precipitate the DNA, which was then resuspended in Buffer G2 and purified using a Qiagen Genomic-tip column.After being eluted in Buffer QF, the purified DNA was further purified by precipitation with isopropanol, followed by a wash with ethanol and elution in Buffer EB overnight at room temperature.DNA was stored at 4°C until shipped to University of Connecticut for library preparation and sequencing.The DNA was sheared to approximately 25 kb using a Covaris ® G-tube (#520079, Covaris, LLC, Woburn, MA, USA).The sequencing library was prepared using the ligation sequencing DNA v14 method (SQK-LSK114, Oxford Nanopore Technologies, Oxford, UK).Three separate loads of the library were run on one flow cell (FLO-PRO114M) over 90 hours, resulting in 5.2 million reads with MinKNOW ® (v.22.08.6).

Annotation
Repeat identification on the selected Canu assembly was performed with RepeatModeler (v.2.0.4), which provided a set of consensus sequences that served as input to RepeatMasker (v.4.1.4)(Supplementary Table 2) (Flynn et al. 2020).Regions identified as repeats were softmasked in preparation for genome annotation.Six public Illumina RNA-Seq libraries were accessed from NCBI (PRJNA413418) for J. cinerea leaf tissue (Supplementary Table 3).Each library was adapter trimmed with FastP (v.0.23.2) (Chen et al. 2018).The adapter trimmed reads were aligned to the softmasked reference with HISAT2 (v.2.2.1) to provide evidence for gene prediction with an 85% mapping rate cutoff, leading us to exclude two libraries (Kim et al. 2019).The RNA alignments and the softmasked genome served as the input to both BRAKER (v2.1.6)and TSEBRA (v.1.0.1) to generate reference annotations (Brůna et al. 2021;Gabriel et al. 2021).BRAKER was run with transcriptomic alignments (RNA-only mode) and viridiplantae protein input from OrthoDB (v.10) (protein-only mode).Both sets of predictions were fed into TSEBRA.This resulted in one transcriptomic set of predictions from BRAKER and one combined run from TSEBRA.The resulting GFF files were summarized with AGAT (v.1.0.0) and functionally annotated with EnTAP (v.0.10.8)(Hart et al. 2020;Dainat et al. 2022).This provided a reciprocal BLAST estimate against NCBI's RefSeq Plant Protein database (v208) as well as annotations derived from EggNOG and Gene Ontology (Huerta-Cepas et al. 2019;Ashburner et al. 2000).Both sets of predictions were filtered by the functional annotations; candidates without an EggNOG assignment were removed.The final annotation was selected from a combination of BUSCO, AGAT statistics, and the annotation rate reported by EnTAP.

Synteny
A macrosynteny analysis was conducted using MCscan (Python version) and jcvi (v.1.3.3) with available Juglans species, including J. cinerea (PRJEB56451), J. nigra (Zhou et al. 2023), J. regia (PRJNA291087), and J. mandshurica (PRJCA006358) (Tang et al. 2008) (Table 1).If not pre-filtered, any unclassified scaffolds were eliminated from both the genome and annotation files, resulting in a set of 16 chromosomes for each Juglans species.Protein-coding sequences from chromosome-scale assemblies, along with their respective annotations (GFF), were used as input for MCscan.The jcvi format module was used to first filter input FASTA files by longest isoform and convert GFF files to BED format.Next, a pairwise synteny search was conducted across J. cinerea and J. mandshurica, J. mandshurica and J. regia, and J. regia and J. nigra, respectively.The resulting anchors were then clustered into synteny blocks with a minimum span of 30 and visualized by the jcvi karyotype graphical module.The final chromosome order was determined by length decreasing from left to right.Juglans regia had eight noticeable inversions when compared to J. mandshurica; however, upon further inspection these chromosomes were found to be reverse complemented.To address this, the strand orientation in the simple anchors file was manually adjusted.

Gene family analysis
Orthologous gene families were identified with OrthoFinder (v2.5.4) (Emms and Kelly 2019) (Supplementary Table 4) using 10 species with reference genomes among Juglans (J.cinerea, J. nigra, J. regia, J. sigillata, J. microcarpa, J. hindsii, J. mandshurica, J. cathayensis), Carya (Carya illinoinensis), and Pterocarya (Pterocarya stenoptera).The species tree, based on resulting gene trees, was constructed in OrthoFinder and rooted using the species tree inference from all genes (STAG) methodology (Emms and Kelly 2018).Support values for internal nodes reflect the proportion of orthogroup trees where the bipartition is observed, and ranges from 0 to 1. Orthogroups only representing one species and orthogroups with oversized representatives from a single species (>100 genes) were removed.The remaining 17,471 orthogroups and ultrametric tree were provided to CAFE v5 (Mendes et al. 2021) to identify gene families with evidence of significant expansion or contraction at P-value <0.01 using the base model.The longest gene from each orthogroup was used to assign a putative function from EnTAP.Gene families associated with retroelements were filtered from the final analysis.

Assembly
A total of 59.7 Gb of high accuracy-basecalled (HAC) duplex reads and 59.9 Gb of super-basecalled (SUP) simplex Oxford Nanopore reads were generated using a PromethION sequencer, which each provided 105 × coverage at an estimated genome size of 570 Mb (Fig. 1c).Contamination filtering removed primarily bacteria, with the highest proportion of reads associated with Acinetobacter baumannii (Supplementary Fig. 2).Filtering resulted in 58.0 Gb of HAC reads with an N50 length of 19.5 Kb, and 58.0 Gb of SUP reads with an N50 length of 22.4 Kb.
With the SUP reads, Flye initially produced a slightly less complete but more contiguous assembly than Canu; 98.9% vs 99.2% BUSCO completeness and N50 of 2.55 b (1,320 scaffolds) vs N50 of 2.39 Mb (2,286 contigs), respectively (Supplementary Table 5; Fig. 1d).Polishing of Flye with Medaka yielded no increase in overall BUSCO completeness, but increased single-copy completeness by 0.1%.Purge Haplotigs increased the contiguity of both assemblies while largely maintaining completeness (N50 of 2.59 Mb in 667 scaffolds for Flye and N50 of 3.02 kb in 345 contigs for Canu).RagTag scaffolding with the J. mandshurica assembly drastically increased the contiguity of both assemblies, leaving Flye with 175 scaffolds and Canu with 58 scaffolds.The longest 16 scaffolds from each assembly were extracted and determined to be chromosome-level based on genome size, completeness, and read alignment.Juglans mandshurica was chosen for its close phylogenetic position, chromosome conservation, and ability to hybridize (Mu et al. 2017).The assemblies were fairly similar in terms of completeness, contiguity, and overall accuracy (Supplementary Table 5).The scaffolded Canu assembly was selected for downstream analyses and distinguished only by fewer gaps when compared to the scaffolded Flye assembly (5.32 vs 9.18 Ns per 100 kb).The final 16 pseudo-chromosome assembly reported is 539 Mb in length with an N50 of 34.86 Mb and an embryophyta BUSCO completeness of 99.0% (Supplementary Table 5; Fig. 1d).To ensure the assembly was not negatively impacted by the scaffolding process, missassembly was evaluated by examining the percentage of unmapped and soft-clipped bases from read alignment with mini-map2.Before scaffolding, 12% of bases were unmapped and 3.58% were soft-clipped.After scaffolding, the percentage of unmapped bases remained the same, and 5.77% of bases were softclipped (Supplementary Fig. 3).This supports minimal impact on overall quality through the scaffolding process.
This exclusively nanopore-assembled genome benefitted from high-quality base calls (SUP reads) and >100 × coverage to generate a genome of 539 Mb (slightly less than the estimated 570 Mb) (Supplementary Table 5).The initial genome size estimates were produced from uncorrected long reads and may not be as reliable of an estimate but were consistent with C-value estimates from Genomes on a Tree (GoaT) (Challis et al. 2023).Juglans cinerea is the 10th member of the Juglandaceae to have its whole genome sequenced.Comparing our final J. cinerea assembly with that of J. mandshurica to which it was scaffolded (Li et al. 2022b), J. cinerea exhibits a smaller scaffold N50 size of 34.86 Mb, higher BUSCO completeness score of 99.0%, larger genome size of 539 Mb, and higher GC content of 36.64%(Table 1).

Annotation
Prior to structural annotation, repeat sequences in the assembly were identified, resulting in 2,353 RepeatScout/RECON families and 1,117 LTRPipeline families.A total of 298 of the LTR families were determined to be redundant and removed, resulting in 3,172 unique repeats.Repeatmasker used this library to softmask 302 Mb (56.02%) of the butternut genome (Fig. 1e).Among the softmasked regions, 28.72% represented retroelements, most of which (22.46%) were predicted to be long terminal repeats (LTRs).Gypsy elements accounted for 8.13% of the genome, whereas Copia elements made up 11.12%.As expected, a large proportion of the repeats (22.42%) were unclassified (Supplementary Table 2).Juglans cinerea exhibited repeat content that is consistent with other sequenced Juglans, with a value that fell between J. sigillata (50.06%) and J. mandshurica (60.99%) (Ning et al. 2020;Li et al. 2022a).Retroelements-mostly LTRscompose the highest classified repeat content in all three genomes.As reported in the J. regia × Juglans microcarpa hybrid and J. regia genomes, the telomeric repeat CCCTAAA was identified in J. cinerea at the ends of each chromosome (Zhu et al. 2019;Marrano et al. 2020).
After trimming, the six RNA-Seq libraries had an average mapping rate of 91.2% against the assembled J.cinerea genome (Supplementary Table 3).Gene prediction was attempted with both BRAKER and TSEBRA on the softmasked assembly (Supplementary Table 6).BRAKER provided a high-quality annotation with only transcriptomic data, resulting in 38,377 predicted genes, 41,357 transcripts, a mono:multi ratio of 0.286, a BUSCO completeness score of 97.2% and an annotation rate of 77.1%.TSEBRA, which combined transcriptomic and protein BRAKER runs, resulted in a slightly lower quality annotation with 35,437 genes, 38,928 transcripts, 0.54 mono:mult ratio, 96.8% BUSCO completeness score, and 86.2% annotation rate.After filtering with EggNOG, the quality of both runs increased, and the number of predicted genes and transcripts was reduced to 30,526 and 33,275,respectively in BRAKER,and 30,328 and 33,694 in TSEBRA.Both programs resulted in a 98.1% annotation rate after filtering, although the BRAKER annotation was closer to the expected mono:multi ratio of plants at 0.21, outperforming that of TSEBRA, 0.36 (Vuruputoor et al. 2022).As a result, BRAKER was chosen to represent the final structural annotation for J. cinerea.These annotation results are similar to other Juglans species, including the J. mandshurica genome released in 2022, which has 39,466 predicted genes, 47,355 transcripts, a mono:multi ratio of 0.347, 95.2% BUSCO completeness, and an annotation rate of 84.9% in chromosome-level scaffolds.Likewise, J. nigra and J. regia, which have 29,949 and 30,709 genes, respectively, 29,949 and 45,959 transcripts, BUSCO completeness of 96.4 and 99.6%, mono:multi of 0.210 and 0.206, and an annotation rate of 96.3 and 99.9% (Table 1).The impressive BUSCO completeness score of the first version of J. cinerea, ranking second to version three of J. regia, is a testament to the effectiveness of our methodology.
All context methylation (based on 5mC) revealed a central peak pattern to each chromosome that did not correlate with CpG content (Fig. 1e).However, it does appear to correspond with small peaks of repeat content, possibly representing centromeres.Future analyses should analyze methylation by type (CpG, CHG, CHH) in association with repeats and across gene lengths.

Synteny
The success of our genome assembly is in part due to the quality of the J. mandshurica genome, which was used to scaffold our de novo assembly.However, utilizing the scaffold mode of RagTag and not the correction mode allowed some rearrangements to be retained in the J. cinerea assembly, as highlighted by the pairwise macrosynteny analysis (Fig. 2).Conserved synteny at chromosome-level was observed in a phylogenetic context, comparing J. cinerea to J. mandshurica, J. mandshurica to J. nigra, and J. nigra to J. regia.A total of 52,993 (NR:38,556) seed synteny blocks (anchors) were found in 294 clusters between J. cinerea and J. mandshurica, 46,251 (NR:37,316) anchors were identified in 297 clusters between J. mandshurica and J. nigra, and 47,683 (NR:37,354) anchors were found in 247 cluster across J. nigra and J. regia.Considering the total number of protein-coding genes in each pairwise comparison, respectively, 68.1%, 66.6%, and 78.6% of compared genomes were collinear, indicating strong support for ancestral collinearity (Li et al. 2022b).

Gene family analysis
The OrthoFinder analysis conducted with 10 Juglandaceae assigned 384,404 genes out of 401,209 (95.8%) into 33,922 orthogroups (Fig. 3a; Supplementary Table 4).A total of 13,173 orthogroups (38.8%) were observed in all species and 4,076 orthogroups (12%) were specific to a single species.There were 807 single-copy orthogroups and 16,805 unassigned genes.In J. cinerea, 32,743 genes (98.4%) were assigned to an orthogroup.Of the total number of orthgroups found across the 10 species, 22,863 (67.4%) included J. cinerea.A total of 132 orthogroups (0.39%) containing 328 genes were unique to J. cinerea, and only 532 genes (1.6%) could not be assigned to an orthogroup.Juglans cinerea shares more one-to-one orthologs outside of the Trachycaryon- A reference genome for Juglans cinerea | 5 Cardiocaryon clade than within the clade, possibly reflecting isolation and divergence of the ancestral Cardiocaryon population to the ancestral J. cinerea population (Fig. 3a; Supplementary Table 4).Members of the Juglandaceae assessed here belong to one of four sections.The North American species J. nigra, J. hindsii and J. microcarpa are members of sect.Rhysocaryon.The sampled Eurasian species (sect.Juglans) include J. sigillata and J. regia, the latter being the predominant cultivated species for walnut production.Juglans cathayensis and J. mandshurica are members of sect.Cardiocaryon.Juglans ailantifolia, also in section Cardiocaryon, but not included here, is sometimes described as a sub-species of J. mandshurica (Mu et al. 2017).This is of note given the ability of J. cinerea to hybridize with J. ailantifolia but not with the sympatric J. nigra (Brennan et al. 2020).
Phylogenetic relationships among Juglandaceae remain enigmatic despite multiple sequence and morphology-based analyses.Using protein sequences sourced from whole genome assemblies, Juglans was found to be sister to Carya with Pterocarya as the outgroup (Fig. 3b).Recent phylogenomic analyses relying on sequence data from de novo aligned SNPs (Mu et al. 2020), RADseq SNPs mapped to the J. regia genome (Zhou et al. 2021), and a deep analysis of subgenomes (Ding et al. 2023) support Juglans as sister to Pterocarya, with Carya as the outgroup (Fig. 3c).The incongruent results may be an artifact of reduced sampling in this study; however, further research is warranted.
Within Juglans, phylogenetic discordance is frequently observed between trees built from different datasets, including whole genome sequencing, and the above-mentioned SNP sets, subgenome analyses, and plastid vs nuclear markers.The tree generated from this study places Juglans section Rhysocaryon sister to sect.Juglans, in turn, are sister to a clade containing sections Cadiocaryon and Trachycaryon (Fig. 3c).However, RADseq SNPs dervied from de novo alignments generate a tree with section Juglans sister to Cardiocaryon and section Rhysocaryon sister to Trachycaryon (Mu et al. 2020), a topology in discordance with ours.Topologies made from nuclear SNPs mapped to the J. regia genome place Cardiocaryon sister to Trachycaryon, as observed here; however, section Juglans subtends the Cardiocaryon-Trachycarhon clade with Rhysocaryon as the outgroup in the nuclear SNP-based topology (Zhou et al. 2021).
The placement of J. cinerea has varied from the earliest plastid and ITS trees (Stanford et al. 2000), to more recent trees built from whole chloroplast genomes and low-copy nuclear markers (Dong et al. 2017).Characteristic of these trees is the uncertainty of the placement of J. cinerea with either Asian or North American Juglans.Phylogenetic evidence from whole plastid genomes of Juglans indicates J. cinerea captured a North American chloroplast genome, while the analysis conducted here places J. cinerea sister to the Asian Butternut clade (Fig. 3b).Clearly given the discordance within Juglans and Juglandaceae, there is a pressing need for additional research using multiple phylogenetic approaches and simulation models.
The overall gene family analysis with CAFE identified a mean gene family birth-death rate (λ) for the whole tree of 0.88, with a tri-modal distribution of gene family expansion/contraction probabilities (Supplementary Fig. 4).The species with the most rapidly evolving gene families (significant at P-value <0.01) was J. regia with 523 gene families.Juglans cinerea was the species within Juglans with the lowest number of significantly evolving gene families, with a total of 71, and the majority of these families were contracting (N = 52, Fig. 3b).After removing orthogroups consisting of retroelements, a total of 54 orthogroups were identified from CAFE as rapidly expanding (9) and rapidly contracting (45) in J. cinerea (Fig. 4; Supplementary Table 7).Among the small set expanding, COG categories broadly described them as cytoskeleton-related, transcription regulation, post-translational modification, cell cycle control, and ion transport.Among those  .Alternate hypotheses of relationships between Carya, Juglans, and Pterocarya using data from RADseq de novo SNPs (Mu et al. 2020), RADseq SNPs mapped to the J. regia genome (Zhou et al. 2021), subgenomes (Ding et al. 2023), and whole genome sequencing and orthogroup gene tree analysis (this study) (c).
A reference genome for Juglans cinerea | 7 hydrolase (LAH), mTERFs, Class III peroxidases (PRX3), UPF0481 proteins, and PAN-like domain proteins.Of these, the following are actually absent in J. cinerea: one NB-LRR gene family, one of the three contracting WAK gene families, one cytochrome P450, one of three contracting glycosyl hydrolases gene families, lipolytic acyl hydrolase (LAH), one of two PRX3s, and both families associated with UPF0481 proteins.WAK are increasingly associated with plant immunity, including neo-and sub-functionalization in response to changing selection pressure (Stephens et al. 2022;Zhou et al. 2023).Among Juglans, they are believed to be absent in J. hindsii and J. sigillata (Trouern-Trend et al. 2020), rapidly contracting in J. mandshurica, and expanding in J. regia (Yan et al. 2021).Evaluation of WAK gene families in J. mandshurica and J. regia found they are under purifying selection (Yan et al. 2021).Expression levels of WAK gene family members were observed to be higher in anthracnose-resistant varieties of J. regia (Li et al. 2022a).In J. cinerea, a strong signature of contraction across two WAK gene families and absence in the third, was observed.Further investigation of WAKs and other pathogenesis-related families in a population context may reveal more about their role in resistance to specific fungal pathogens.

Conclusions
The present study reports Juglans cinerea's first high-quality reference genome, generated using Oxford Nanopore technology and a combination of assembly tools.The J. mandshurica genome enabled pseudo-chromosome generation and chromosome-level analyses.A preliminary comparative view of J. cinerea was presented in context to other species in the Juglandaceae family to assess both synteny and gene family evolution.The construction of the genome was connected to a year-long training program initiated with high molecular weight extraction of the target tree.Students worked closely with graduate students and faculty members to understand the challenges associated with genome assembly and annotation.In doing so, the genome benefitted from robust methods testing.Finally, the program intersected with practitioners who provided input on the reference accession and a framework that will guide the application of this highquality reference in the future.Future studies expanded to population-level analyses of J. cinerea, and J. ailantifolia hybrid resistance to Oc-j, will be needed to develop conservation strategies informed by genetics.https://gitlab.com/PlantGenomicsLab/butternut-genome-assemblyor on Zenodo with the DOI: 10.5281/zenodo.8170410.
Supplemental material available at G3 online.

Fig. 1 .
Fig. 1.Range map from the BIEN database and sample location of the Juglans cinerea reference tree (UNB #69382) (a).Photographs of the reference tree from the Sheffield, NB population exhibiting a healthy crown and trunk absent of cankers (b).k-mer abundance plot (length = 17 bp) of k-mer count (x-axis) vs frequency (y-axis).Genome size from long-read data initially estimated at 570 Mb (c).Genome assemblies represented by their length (Mb) and contiguity (N50 (Kb)) obtained from QUAST.Assemblies software represented by color (Canu = yellow; Flye = blue) and shape (drafts = circles; selected draft = star) with the following abbreviations: MD = Medaka; PH = Purge Haplotigs.BUSCO single (S) and duplicated (D) completeness scores in are shown in parentheses (d).Circos plot representing the following tracks: (0) assembly chromosomes (1) average 5mC methylation in all sequence contexts (CG, CHG, and CHH) over 100 Kb windows (2) CpG site count over 100 Kb windows (3) gene coverage over 10 Kb windows (4) repeat coverage over 10 Kb windows (e).

Fig. 3 .
Fig. 3. Results of OrthoFinder and the phylogenomic analysis of 33,922 orthrogroup gene trees.Upset plot of the total gene families shared by different intersections of 10 species (a).Consensus phylogeny from 33,922 orthogroup gene trees.STAG support values at internal nodes and the number significantly expanding or contracting gene families (P-value <0.01) on branches (b).Alternate hypotheses of relationships between Carya, Juglans, and Pterocarya using data from RADseq de novo SNPs(Mu et al. 2020), RADseq SNPs mapped to the J. regia genome(Zhou et al. 2021), subgenomes(Ding et al. 2023), and whole genome sequencing and orthogroup gene tree analysis (this study) (c).

Table 1 .
Comparison of Juglans chromosome-level genome assemblies and annotations.