A chromosome-level draft genome of the grain aphid Sitobion miscanthi

Abstract

Background: Sitobion miscanthi is an ideal model for studying host plant specificity, parthenogenesis-based phenotypic plasticity, and interactions between insects and other species of various trophic levels, such as viruses, bacteria, plants, and natural enemies. However, the genome of this species has not yet been sequenced and published. Here, we analyzed the entire genome of a parthenogenetic female aphid colony using Pacific Biosciences long-read sequencing and Hi-C data to generate chromosome-length scaffolds and a highly contiguous genome assembly.

Results: The final draft genome assembly from 33.88 Gb of raw data was ∼397.90 Mb in size, with a 2.05 Mb contig N50. Nine chromosomes were further assembled based on Hi-C data to a 377.19 Mb final size with a 36.26 Mb scaffold N50. The identified repeat sequences accounted for 26.41% of the genome, and 16,006 protein-coding genes were annotated. According to the phylogenetic analysis, S. miscanthi is closely related to Acyrthosiphon pisum, with S. miscanthi diverging from their common ancestor ∼25.0–44.9 million years ago.

Conclusions: We generated a high-quality draft of the S. miscanthi genome. This genome assembly should help promote research on the lifestyle and feeding specificity of aphids and their interactions with each other and with species at other trophic levels. It can serve as a resource for accelerating genome-assisted improvements in insecticide resistance management and environmentally safe aphid management.

Prof. Julian Chen

Here, we analyzed the entire genome of a female aphid colony using PacBio long-read sequencing and Hi-C data to generate chromosome-length scaffolds and a highly contiguous genome assembly.

The grain aphid Sitobion miscanthi (NCBI: txid44668, Figure 1), widely misreported as Sitobion avenae in China [1], is a globally distributed sap-sucking specialist of cereal and a [...] Hebei province, was kept in our laboratory for genome sequencing. [...] Table 1).

Genome assembly using PacBio long reads

The genomic DNA libraries were constructed and sequenced using the PacBio Sequel platform. Additionally, 4.35 million subreads (33.88 Gb in total) with an N50 read length of 12,697 bp were obtained after removing the adaptor (Figure S1).
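The N50 values quoted for reads, contigs, and scaffolds all follow the same definition: the length L such that sequences of length L or longer contain at least half of the total bases. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the pipelines used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths): total = 30, half = 15.
# 10 + 6 = 16 reaches the halfway point, so the N50 is 6.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```

Applied to the subread lengths, the same function would give the read N50; applied to contig or scaffold lengths, it would give the contig or scaffold N50.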

De novo genome assembly with long reads was performed using two pipelines, Canu (Canu, [...] To improve genome contiguity, two assemblies generated from the Canu and wtdbg pipelines [...] single-copy and duplicated BUSCOs, respectively (Figure S4). [...] alignable read pairs whose mapping quality was more than 20 were retained for further analysis.
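The mapping-quality filter described above can be sketched as follows. This is only an illustration: the `ReadPair` record, its field names, and the requirement that both mates pass the threshold are assumptions made for the example, not details stated in the text.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one aligned Hi-C read pair."""
    name: str
    mapq1: int  # mapping quality of the first mate
    mapq2: int  # mapping quality of the second mate


def keep_high_quality(pairs, min_mapq=20):
    """Keep pairs in which both mates have mapping quality above min_mapq.

    The comparison is strict ("more than 20"); requiring both mates to
    pass is an assumption of this sketch.
    """
    return [p for p in pairs if p.mapq1 > min_mapq and p.mapq2 > min_mapq]


pairs = [
    ReadPair("pair_a", 60, 60),
    ReadPair("pair_b", 20, 60),  # dropped: first mate is not above 20
    ReadPair("pair_c", 37, 21),
]
kept = keep_high_quality(pairs)
print([p.name for p in kept])
```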

In total, 38.44% of unique mapped read pairs were valid interaction pairs for scaffold correction and were used to cluster, order and orient scaffolds onto chromosomes by LACHESIS [19].
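Hi-C scaffolding relies on the observation that sequences on the same chromosome share more read-pair links than sequences on different chromosomes. The sketch below tallies links between scaffolds from a list of valid pairs. It is a simplified illustration under assumed inputs (tuples of hypothetical scaffold names) and is not the LACHESIS implementation.

```python
from collections import Counter


def count_links(valid_pairs):
    """Count Hi-C links between distinct scaffolds.

    valid_pairs is an iterable of (scaffold_a, scaffold_b) tuples, one per
    valid interaction pair. Pairs within a single scaffold are skipped, and
    (a, b) is treated the same as (b, a).
    """
    counts = Counter()
    for scaffold_a, scaffold_b in valid_pairs:
        if scaffold_a != scaffold_b:
            counts[tuple(sorted((scaffold_a, scaffold_b)))] += 1
    return counts


# Hypothetical scaffold names, used only for illustration.
links = count_links([
    ("scaf1", "scaf2"),
    ("scaf2", "scaf1"),
    ("scaf1", "scaf1"),
    ("scaf2", "scaf3"),
])
for (a, b), n in links.most_common():
    print(a, b, n)
```

A clustering step would then group scaffolds with high mutual link counts; in practice such counts are usually normalized, for example by scaffold length, which this sketch omits.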

Before chromosome assembly, we first performed a preassembly for the error correction of scaffolds, which required the splitting of scaffolds into segments of 50 kb on average. The Hi-C data were mapped to these segments using BWA (version 0.7.10-r789) software. The uniquely mapped data were retained to perform assembly by using LACHESIS software. Any two segments that showed inconsistent connection with information from the raw scaffold were checked manually. These corrected scaffolds were then assembled with LACHESIS.

[...] Hi-C assembly was much higher than that of the 7 previously published aphid genome assemblies constructed using DNA NGS technologies (Table 3).

To identify tandem repeats, we utilized 4 software, namely LTR_FINDER (v1.0.5; [...] (Table 4).

[...] 16,006 genes were annotated based on at least one database (Table S2).

We employed the OrthoMCL program [41] with an e-value threshold of 1e-5 to identify gene