Opsin Gene Duplication in Lepidoptera: Retrotransposition, Sex Linkage, and Gene Expression

Abstract
Color vision in insects is determined by signaling cascades centered on opsin proteins, which confer sensitivity to light of different wavelengths. In certain insect groups, lineage-specific evolution of opsin genes, through changes in copy number, shifts in expression patterns, and functional amino acid substitutions, has altered color vision, with subsequent behavioral and niche adaptations. Lepidoptera are a fascinating model for asking whether changes in opsin gene content and sequence are associated with changes in visual phenotype. Until recently, the lack of high-quality genome data sampled broadly across the lepidopteran phylogeny greatly limited our ability to address this question accurately. Here, we annotate opsin genes in 219 lepidopteran genomes representing 33 families, reconstruct their evolutionary history, and analyze shifts in selective pressures and expression between genes and species. We discover 44 duplication events in opsin genes across ∼300 million years of lepidopteran evolution. While many duplication events are species- or family-specific, we find that an ancient long-wavelength-sensitive (LW) opsin duplication, derived by retrotransposition, has been retained within the speciose superfamily Noctuoidea (in the families Nolidae, Erebidae, and Noctuidae). This conserved LW retrogene shows life stage–specific expression, suggesting visual sensitivities or other sensory functions specific to the early larval stage. This study provides a comprehensive order-wide view of opsin evolution across Lepidoptera, revealing high rates of opsin duplication and changes in expression patterns.

(Upper) Gene tree of all opsin sequences inferred by maximum likelihood using IQ-TREE. The tree and gene clade colours correspond to those in Figure 1B in the main text. (Lower) Gene trees for each of the visual opsins (UV, Blue, LW), with major duplications shared by many species coloured and labelled with the lepidopteran family within which they occurred.
Gene tracks showing independent tandem duplications of the Blue opsin gene in Euclidia mi (Mother Shipton moth) and Ochlodes sylvanus (Large skipper butterfly). Both blue copies are located on chromosome 19 in Euclidia mi, with the duplicated Blue opsin located 83 kb upstream of the parent copy in the opposite orientation. In Ochlodes sylvanus, the three blue opsin copies on chromosome 13 represent one duplication shared with other skipper butterflies (Blue2 in the figure) and a more recent duplication found only in Ochlodes sylvanus in our dataset (Blue3). All blue opsin copies are in the same orientation and closely linked in the genome, separated by 600 bp and 19 kb of intergenic space. The more recent duplicate (Blue3) likely arose by duplication of the Blue2 copy, as the two copies have identical sequences.
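The spatial relationships reported for these tandem duplicates (intergenic distance, relative orientation, and upstream position) follow directly from gene coordinates and strands. The sketch below shows one way to compute them, assuming 1-based inclusive coordinates; the `Gene` class, function names, and coordinates are hypothetical, chosen only to mirror the Euclidia mi layout (83 kb apart, opposite orientation), and are not the annotation data used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation interval (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Number of bases separating two non-overlapping genes on one chromosome."""
    if a.chrom != b.chrom:
        raise ValueError("genes are on different chromosomes")
    left, right = sorted((a, b), key=lambda g: g.start)
    gap = right.start - left.end - 1
    if gap < 0:
        raise ValueError("genes overlap")
    return gap


def same_orientation(a: Gene, b: Gene) -> bool:
    """True if both genes lie on the same strand."""
    return a.strand == b.strand


def is_upstream(query: Gene, ref: Gene) -> bool:
    """True if `query` lies 5' of `ref`, judged on the strand of `ref`."""
    if ref.strand == "+":
        return query.end < ref.start
    return query.start > ref.end


# Hypothetical coordinates, not real annotation data.
parent = Gene("Blue", "chr19", 1_000_000, 1_010_000, "+")
duplicate = Gene("Blue_dup", "chr19", 907_000, 916_999, "-")

print(intergenic_distance(duplicate, parent))  # 83000
print(same_orientation(duplicate, parent))     # False
print(is_upstream(duplicate, parent))          # True
```

Because `is_upstream` is judged relative to the parent copy's strand, the same function also handles parent genes annotated on the minus strand.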

Supplemental Figure S4: Gene tree and alignment of blue opsin duplicate genes. Left: maximum likelihood gene tree of blue opsin genes that underwent duplication in certain lepidopteran species. Alternating grey colours correspond to the families within which the blue opsin gene underwent a duplication event (Lycaenidae, Hesperiidae, Erebidae, and Pieridae). The squid opsin (from Todarodes pacificus, accession no. CAA4990) is used as an outgroup. Right: partial alignment of the corresponding blue opsin paralogs, showing the regions of the opsin protein that form helices and β-strands. Dots and triangles above the alignment, taken from Liénard et al. (2021), mark sites within 5 Å of any carbon atom in the retinal polyene chain and sites shown to cause spectral shifts in the blue opsin, respectively. The numbered sites above the alignment correspond to amino acid positions in the squid opsin protein. Supplemental Figure S5: Autosome-Z chromosome fusion events in Tortricidae and Anorthoa munda. Orthologous BUSCO genes shared between outgroup and ingroup species are painted onto each chromosome using lep_busco_painter (https://github.com/charlottewright/lep_busco_painter). For each species, each chromosome is shown as a rectangle. A grey bar marks a shared orthologous BUSCO gene in the same position as in the outgroup species, and a coloured bar marks a BUSCO gene in a different chromosomal location compared with the outgroup. This is shown for (A) three representative Tortricidae species compared with Plutella xylostella as the outgroup, and (B) Anorthoa munda compared with Tholera decimalis as the outgroup. In both A and B, the top chromosome rectangle represents the Z chromosome in all species (highlighted with a pink box) and shows an autosome-Z chromosome fusion in the ingroup species. The locations of the newly Z-linked LW opsin genes are labelled with green inverted triangles.

(A) Topology of three families within the superfamily Noctuoidea (left) and the corresponding representative genomic location of the LWS2 copy in each family (right). The LWS2 copy is shown in yellow, while the genes within the syntenic region in each family are shown in blue and labelled according to the functional domain each contains. (B) Family-specific syntenic clusters surrounding the LWS2 gene.

For each family-specific LWS2 syntenic block (top, middle, and bottom), the surrounding genes are shown for Meganola albula (Nolidae), Autographa gamma (Noctuidae), Hecatera dysodea (Noctuidae), Lymantria monacha (Erebidae), and Orgyia antiqua (Erebidae). The syntenic genes surrounding the LWS2 copy are coloured, with the corresponding colours and gene names given in the legend. (C) Hypothetical modes of LWS2 origination and subsequent genomic translocations, shown along the topology, which corresponds to that in (A).
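Comparing the genes that flank LWS2 across species is one way to ask whether its genomic neighbourhood is shared or family-specific. The sketch below is a minimal illustration, assuming gene order along the LWS2-bearing scaffold is already known; the function names, the window size `k`, the use of a Jaccard index, and all gene names are hypothetical placeholders, not the method or data used for this figure.

```python
from __future__ import annotations


def flanking_genes(order: list[str], focal: str, k: int) -> set[str]:
    """Return up to k gene names on each side of `focal` in an ordered gene list."""
    i = order.index(focal)  # raises ValueError if focal is absent
    left = order[max(0, i - k):i]
    right = order[i + 1:i + 1 + k]
    return set(left) | set(right)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene orders; names are placeholders, not real annotations.
orders = {
    "species_1": ["g1", "g2", "g3", "LWS2", "g4", "g5", "g6"],
    "species_2": ["g9", "g2", "g3", "LWS2", "g4", "g8", "g7"],
    "species_3": ["h1", "h2", "LWS2", "h3", "h4"],
}

flanks = {sp: flanking_genes(o, "LWS2", k=2) for sp, o in orders.items()}
print(sorted(flanks["species_1"] & flanks["species_2"]))  # ['g2', 'g3', 'g4']
print(jaccard(flanks["species_1"], flanks["species_2"]))  # 0.6
print(jaccard(flanks["species_1"], flanks["species_3"]))  # 0.0
```

Under these assumptions, a high overlap (species_1 and species_2) is consistent with a shared syntenic block, whereas no overlap (species_3) is what a relocated or independently derived copy might look like; real comparisons would match orthologs rather than identical names.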

Summary of LWS2 paralog genomic locations and duplication history.
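The grey-versus-coloured painting described for Supplemental Figure S5, which revealed the autosome-Z fusions that made some LW opsin copies Z-linked, amounts to comparing each BUSCO gene's chromosome in the ingroup with its chromosome in the outgroup. The sketch below assumes each ingroup chromosome is matched to the outgroup chromosome contributing most of its BUSCOs (a majority vote); this matching rule, the function names, and the BUSCO maps are hypothetical, and the code is not lep_busco_painter's implementation.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def assign_chromosomes(outgroup: dict[str, str], ingroup: dict[str, str]) -> dict[str, str]:
    """Match each ingroup chromosome to the outgroup chromosome supplying most of its shared BUSCOs."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for busco, chrom in ingroup.items():
        if busco in outgroup:
            votes[chrom][outgroup[busco]] += 1
    return {chrom: counts.most_common(1)[0][0] for chrom, counts in votes.items()}


def paint(outgroup: dict[str, str], ingroup: dict[str, str]) -> dict[str, str]:
    """Label shared BUSCOs 'grey' if on the matched chromosome, else with their outgroup chromosome."""
    best = assign_chromosomes(outgroup, ingroup)
    labels = {}
    for busco, chrom in ingroup.items():
        if busco not in outgroup:
            continue  # not shared with the outgroup, so nothing to compare
        source = outgroup[busco]
        labels[busco] = "grey" if source == best[chrom] else source
    return labels


# Hypothetical BUSCO-to-chromosome maps in which outgroup autosome "A5"
# is fused onto the Z chromosome in the ingroup.
outgroup = {"b1": "Z", "b2": "Z", "b3": "Z", "b4": "Z",
            "b5": "A5", "b6": "A5", "b7": "A5",
            "b8": "A9", "b9": "A9"}
ingroup = {"b1": "Z", "b2": "Z", "b3": "Z", "b4": "Z",
           "b5": "Z", "b6": "Z", "b7": "Z",
           "b8": "A8", "b9": "A8"}

labels = paint(outgroup, ingroup)
print(labels["b1"], labels["b5"], labels["b8"])  # grey A5 grey
```

In this toy case the block of "A5" labels on the ingroup Z chromosome is the signature of an autosome-Z fusion; a gene sitting inside such a block (as the LW opsin copies do in the figure) would become newly Z-linked.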