Phylogenetic and Protein Structure Analyses Provide Insight into the Evolution and Diversification of the CD36 Domain “Apex” among Scavenger Receptor Class B Proteins across Eukarya

Abstract The cluster of differentiation 36 (CD36) domain defines the characteristic ectodomain associated with class B scavenger receptor (SR-B) proteins. In bilaterians, SR-Bs play critical roles in diverse biological processes including innate immunity functions such as pathogen recognition and apoptotic cell clearance, as well as metabolic sensing associated with fatty acid uptake and cholesterol transport. Although previous studies suggest this protein family is ancient, SR-B diversity across Eukarya has not been robustly characterized. We analyzed SR-B homologs identified from the genomes and transcriptomes of 165 diverse eukaryotic species. The presence of highly conserved amino acid motifs across major eukaryotic supergroups supports the presence of a SR-B homolog in the last eukaryotic common ancestor. Our comparative analyses of SR-B protein structure identify the retention of a canonical asymmetric beta barrel tertiary structure within the CD36 ectodomain across Eukarya. We also identify multiple instances of independent lineage-specific sequence expansions in the apex region of the CD36 ectodomain—a region functionally associated with ligand-sensing. We hypothesize that a combination of both sequence expansion and structural variation in the CD36 apex region may reflect the evolution of SR-B ligand-sensing specificity between diverse eukaryotic clades.


Introduction
Cellular responses to external stimuli are often initiated by membrane-bound pattern recognition receptors (PRRs) (Koropatnick et al. 2004;Ausubel 2005;Schaefer 2014).Many PRR proteins contain functional domains associated with ligand binding and/or cellular adhesion, coupled with signal transduction (Wheeler et al. 2008;Özbek et al. 2010;Richter et al. 2018;Richter and Levin 2019).Scavenger receptors (SRs) are a functionally defined group of structurally diverse transmembrane receptor proteins.Initially identified based on their function of "scavenging" and removing modified lipoproteins, it has been established that SRs recognize a wide array of ligands and can serve as PRRs (reviewed in PrabhuDas et al. 2017;Taban et al. 2022).
Members of the class B scavenger receptors (SR-Bs), also commonly referred to as the "CD36 family,", are uniquely defined among SRs by two transmembrane domains with relatively short N-and C-terminal cytoplasmic tails involved in intracellular signal transduction and an ectodomain domain constituting the CD36 antigen (Taban et al. 2022;fig. 1A).Within mammals, several multifunctional SR-Bs have been described: SCARB1, lysosomal integral membrane protein type 2 (LIMP-2 or SCARB2), and CD36 (also known as SCARB3).In vertebrates, SR-B proteins are expressed in a variety of cell types and bind diverse ligands including lipid-based pheromones; lipid-soluble vitamins; cell adhesion proteins; cholesterols; phospholipids; microbeassociated molecular patterns (MAMPs), as well as damage-associated molecular patterns (DAMPs) from cell debris; and long-chain fatty acids (LCFAs) (Koropatnick et al. 2004;Ausubel 2005;Silverstein and Febbraio 2009;Sheedy et al. 2013;Schaefer 2014;Chen et al. 2022).In mammalian microglial cells, dysregulation of CD36 may be involved in the pathologies of Alzheimer's disease (Stuart et al. 2007;Park et al. 2013;Dobri et al. 2021).SR-B proteins that recognize MAMPs and DAMPs also cooperate with other cell surface PRRs to mediate innate immune responses including phagosome/endosome formation (Erdman et al. 2009;Silverstein and Febbraio 2009;Stewart et al. 2010;Chen et al. 2022).
X-ray crystallography of human SR-B proteins shows that the CD36 ectodomain folds into an asymmetric beta barrel structure with a group of alpha helices at the protein's "apex"-the most distal region of the ectodomain extending farthest from the cell membrane (Neculai et al. 2013;Hsieh et al 2016;supplementary fig. 1, Supplementary Material online).This "three-helix bundle" at the apex plays a critical role in ligand recognition (Kar et al. 2008;Kuda et al. 2013;Neculai et al. 2013; supplementary fig.1A, Supplementary Material online).The apex region is spatially adjacent to a binding pocket and an intramolecular tunnel that assists with transport of cholesterol esters and LCFAs (fig.1A; Neculai et al. 2013;Conrad et al. 2017;Glatz and Luiken 2018;Heybrock et al. 2019).Additionally, the ectodomain contains two to three disulfide bridges between loop regions of the asymmetric beta barrel core (Papale et al. 2011;Yu et al. 2012).SR-B proteins have been identified and functionally characterized in diverse metazoans, from mammals to sponges (Franc et al. 1996;Müller et al. 2004;Dinguirard and Yoshino 2006;Means et al. 2009;Bi et al. 2015;Neubauer et al. 2016).Insects have three distinct lineage-specific clusters of CD36 domain-containing proteins (Nichols and Vogt 2008) with similar structural features to vertebrate homologs, including the three-helix bundle at the apex of the ectodomain, a putative binding pocket, presence of an intramolecular tunnel, and disulfide bridges (Nichols and Vogt 2008;Gomez-Diaz et al. 2016).Intriguingly, the amoebozoan slime mold Dictyostelium discoideum possesses three LIMP-2/CD36 genes (LmpA-C) that have been shown to function in phagocytosis (Karakesisoglou et al. 1999, Sattler et al. 2018).However, little is known regarding the distribution of SR-Bs or their diversity and evolution in other nonmetazoan eukaryotes.
To investigate SR-Bs across a diversity of eukaryotic lineages, we leveraged comparative analysis of proteins using sequence alignments, amino acid (aa) motif prediction (Bailey 2021), and artificial intelligence (AI) algorithm tools for both protein structure prediction (Baek et al. 2021;Jumper et al. 2021) and small molecule interaction prediction (Hekkelman et al. 2023).Our phylogenetic analyses support the presence of an ancestral SR-B-like protein in the last eukaryote common ancestor (LECA).Protein structure predictions show that eukaryotic SR-Bs share a broadly conserved asymmetric beta barrel tertiary structure within the CD36 ectodomain that contrasts with an evolutionarily labile region at the membrane-distal apex of the ectodomain.This membrane-distal apex region correlates with lineage-specific sequence expansions of highly variable alpha helical structure.Importantly, in functionally characterized SR-Bs, aa residues critical for ligand recognition reside in the apex region (Neculai et al. 2013;Hsieh et al. 2016; supplementary fig.1A, Supplementary Material online).Based on our results, we hypothesize that sequence expansion and structural divergence in the apex region of the CD36 ectodomain may reflect the evolution of SR-B ligand specificity between diverse eukaryotic lineages.2021), Li, Hou, et al. (2021), Li, Shen, et al. (2021), andStrassert et al. (2021).Raw CD36 Pfam domain hits following isoform removal for each taxon can be found in supplementary table S2, Supplementary Material online.(C) Length distribution for each of the 279 CD36 ectodomains.Includes aa comprising the predicted noncytoplasmic region containing CD36 ectodomain sequence between transmembrane helices.Colors follow the taxonomic scheme in panel (B).

SR
helices (PrabhuDas et al. 2017;this study).We used HMMER to search for the presence of CD36 domains in 165 publicly available eukaryotic proteomes and recovered a total of 418 unique CD36 domain-containing protein sequences (supplementary tables S1 and S2, Supplementary Material online).CD36 domain-containing protein sequences were then screened for flanking transmembrane domains, from which we identified 279 bona fide SR-B homologs.SR-B homologs were recovered across Eukarya, including in taxa from several early-diverging eukaryotic lineages including the Hemimastigophora, Ancyromonadida, and CRuMs (fig.1B).
Some recovered SR-B protein sequences with unusually long cytoplasmic regions appeared to represent gene model fusions and contained protein domains not typically associated with SR-Bs.For example, the ctenophore Mnemiopsis leidyi SR-B gene (ML01096a) contains a long N-terminal cytoplasmic region with a PHD finger motif that is not found in other eukaryotic SR-Bs.We compared transcript data with the genomic locus and identified exons 1-7 as belonging to a separate PHD finger-containing gene, whereas exons 8 and 9 of the gene model correspond to SR-B transcripts, with a putative transcription start site 75 bp upstream of exon 8 (Moreland et al. 2014).A revised protein sequence translated from this start codon through exons 8 and 9 was used in this study (supplementary file 1, Supplementary Material online).Similarly, the choanoflagellate Monosiga brevicollis sequence (XP_001747074.1)also likely represents a gene model fusion based upon comparison with other choanoflagellate SR-B transcripts (Richter et al. 2018).A revised protein sequence, translated from exons 2-11, was used in this study (supplementary file 1, Supplementary Material online).
We then isolated and aligned CD36 ectodomain aa sequences from recovered bona fide SR-B homologs by removing predicted sequences for transmembrane helices and cytoplasmic tails (supplementary file 2, Supplementary Material online).We recovered significant lineage-specific variation in sequence length among CD36 ectodomains across eukaryotes.Metazoan CD36 ectodomains are between 367 and 616aa, whereas nonmetazoan CD36 ectodomains vary more widely, between 413 and 2,294 aa (fig.1C).In all cases, the most substantial length variation is associated with the membrane-distal apex of the ectodomain (supplementary file 3, Supplementary Material online, positions 433-2064).Notably, this variable sequence expansion region corresponds with two alpha helices of the threehelix bundle that is known to play a functional role in mediating ligand recognition in mammalian SR-Bs (Neculai et al. 2013).

Phylogenetic Analysis of Eukaryotic SR-B Proteins
We employed both maximum likelihood (RAxML) and Bayesian (MrBayes) phylogenetic analyses to explore relationships between CD36 domains of SR-B homologs across Eukarya.Congruent bootstrap (BS ≥ 80%) and Bayesian posterior probability (BPP ≥ 0.9) support were recovered for shallow nodes that unite SR-B orthologs and identify putative lineage-specific paralogs, including some informative clades among metazoans (fig.2; supplementary figs. 2 and 3 and table S3, Supplementary Material online).However, results from both analyses are broadly typified by relatively long branch lengths and a lack of support for deep divergences.For example, vertebrate SCARB2 orthologs group with high support (BS = 100%, BPP = 1) indicating an ancestral SCARB2 protein was likely present in the craniates.However, despite support for each of the three insect SR-B groups (Nichols and Vogt 2008), the relationships between emp-like, crq-like, and SNMP-like insect SR-Bs to other metazoan SR-Bs remain unresolved (fig.2; supplementary table S3, Supplementary Material online).
We did not recover support for deep relationships between nonmetazoan SR-B homologs.For example, the largest supported clade of nonmetazoan SR-Bs is restricted to Peronosporales and Pythiales oomycetes.Among amoebozoans, SR-Bs clustered into two strongly supported clades, one containing putative LmpB-like homologs and the other including both LmpA-like and LmpC-like homologs (fig.2; supplementary table S3, Supplementary Material online; Sattler et al. 2018).Notably, all clades united by nodes with BS ≥ 80% were composed of SR-Bs from monophyletic taxa, with the exception of a single clade comprising six cnidarian and two lophotrochozoan sequences (fig.2; supplementary table S3, Supplementary Material online).In contrast to the majority of metazoan SR-Bs with CD36 ectodomains <500 aa, the cnidarian and lophotrochozoan CD36 ectodomains in this group are all >500 aa in length (fig.1C).Although these eight sequences share three invariant residues (F788, W812, and L1188; supplementary file 3, Supplementary Material online) and an additional independent alignment identified a further three shared invariant residues (supplementary fig.4, Supplementary Material online), we are unable to reject the possibility of long branch attraction.
Our combined data suggest that SR-Bs appear to have undergone significant sequence divergence during the evolution and diversification of eukaryotic lineages, obscuring homolog relationships between lineages.Additionally, eukaryotic genomic diversity remains underrepresented despite significant recent progress in the availability of genomic resources among nonmetazoans (Richter et al. 2022).In this study, ∼45% of CD36 domain-containing sequences are metazoan; thus, taxonomic sampling bias represents a potential limitation for resolving phylogenetic relationships among SR-B homologs.Despite these limitations, we interpret our phylogeny reconstruction as supporting the presence of at least one ancestral SR-B in LECA (fig.2; supplementary figs. 2 and 3, Supplementary Material online).

CD36 Disulfide Bridge Conservation in Metazoa
Three pairs of cysteine residues forming intramolecular disulfide bridges have been identified in the CD36  Rasmussen et al. 1998).Functional analyses have highlighted the importance of these disulfide bridge-forming cysteines for primate SCARB1 and SCARB3 multimerization, as well high-density lipoprotein (HDL) binding and HDL cholesterol ester uptake by SCARB1 in primates (Thorne et al. 1997;Papale et al. 2011;Yu et al. 2012).However, binding of oxidized low-density lipoproteins (oxLDLs) by human SCARB3 appears to be independent of disulfide bridge formation (Jay et al., 2015).We characterized the distribution of cysteine residue pairs with putative homology to human SCARB3 disulfide bridges-1, -2, and -3 across eukaryotic SR-Bs (supplementary table S4, Supplementary Material online).
Disulfide bridge-1 and disulfide bridge-2 cysteine residue pairs are well-represented among bilaterian SR-B sequences (83%), and both disulfide bridge pairs are detected among cnidarian, placozoan, and sponge sequences.Cysteine residue pairs with putative homology to disulfide bridge-3 appear to be specific to bilaterians and among vertebrates are restricted to SCARB3 homologs (supplementary table S4, Supplementary Material online).Among protostomes, disulfide bridge-3 cysteines are found in most arthropod sequences (90%).However, disulfide bridge-3 cysteines appear to be broadly absent from representative lophotrochozoan sequences, only being recovered from the annelid Dimorphilus gyrociliatus.Cnidarian and lophotrochozoan CD36 ectodomains >500 aa in length appear to lack homologous cysteine residues.Notably, no putative disulfide bridge-forming cysteine homolog pairs were identified in representative ctenophore CD36 ectodomain sequences.
Among nonmetazoan sequences, the only cysteine pairs recovered were from three saprolegnialean oomycete SR-Bs (Thrcl_SR-B, Aphin_SR-B, Sapdi_SR-B; supplementary table S4, Supplementary Material online).However, upon visual inspection, these putative disulfide bridge-3 cysteines were found to belong to a region of lineage-specific sequence expansion (supplementary file 2, Supplementary Material online) and thus excluded as false positives.Therefore, among representative nonmetazoans, no putative disulfide bridge-forming cysteine pairs were detected in our analysis.

Motifs within the CD36 Ectodomain
To further characterize patterns of sequence conservation and lineage-specific divergence in the CD36 ectodomain across eukaryotic SR-Bs, we performed an aa motif discovery analysis using the STREME algorithm to identify enriched motifs (Bailey 2021).We recovered 18 motifs that broadly characterize the CD36 ectodomain across the 279 identified eukaryotic SR-B sequences (fig.3A; supplementary fig.5, Supplementary Material online).Conserved regions were associated with high significance motifs; in contrast, lineage-specific regions with lower conservation were associated with relatively lower motif significance (supplementary table S5, Supplementary Material online).Motif position weight matrices were used to probe individual SR-B sequences to predict motif presence (fig.3B).
Motifs M1, M2, and M17 are detected with high confidence in >90% of recovered SR-B sequences (supplementary table S5, Supplementary Material online).These three motifs correspond with the most highly conserved regions of the unfiltered alignment (supplementary file 3, Supplementary Material online).Motifs M1 and M2 are located in the N-terminal region, and motif M17 is located in the C-terminal region of the CD36 ectodomain.These dominant motifs contribute structurally to the longest β-strands within the conserved asymmetric beta barrel of the CD36 ectodomain and are spatially adjacent (supplementary fig.5, Supplementary Material online; figs.4C and 5C).Motif M6 is also highly represented across eukaryotes (211 out of 279 sequences) and corresponds with a relatively short β-strand and α-helical region linking two α-helices in the apex (supplementary figs.5 and 8, Supplementary Material online).
Several motifs with restricted distribution patterns across eukaryotes were also recovered.For example, motif M7 is particularly prevalent among bilaterian SR-Bs (fig.3B) and characterizes a flexible loop region connecting the three-helix bundle at the apex to the asymmetric beta barrel of the CD36 ectodomain (supplementary fig.5, Supplementary Material online).The two lowest scoring motifs, M5 and M8, exhibit a primarily nonmetazoan distribution, with high representation among the oomycete stramenopiles.
A Structurally Homologous Asymmetric Beta Barrel Is Characteristic of Eukaryotic SR-Bs Divergent primary aa sequences can produce highly similar structures (Yang and Honig 2000;Illergård et al. 2009).Our observation of low primary sequence identity across CD36 ectodomains of eukaryotic SR-Bs (∼13%; supplementary file 3, Supplementary Material online) prompted us to interrogate SR-B protein structure.We generated AlphaFold2 (Jumper et al. 2021) and RoseTTAFold (Baek et al. 2021) protein structure models for the complete set of 279 SR-Bs (supplementary files 9 and 10, Supplementary Material online).The relative accuracy of SR-B structure predictions was assessed by comparing the Homo sapiens SCARB3 X-ray crystallographic model (PDB 5LGD) against AlphaFold2 and RoseTTAFold models (Hsieh et al. 2016; supplementary fig.1B, Supplementary Material online).
Our AlphaFold2 analysis accurately predicted SCARB3 secondary and tertiary structural features (RMSD = 0.515 Å across all atom pairs).RoseTTAFold also predicts most SCARB3 secondary and tertiary features, though with lower positional accuracy in the loop and alpha-helix regions when compared with the AlphaFold2 model (RMSD = 1.298Å across all atom pairs; supplementary fig.1C, Supplementary Material online).The relative length consistency of metazoan CD36 ectodomains also prompted us to derive a consensus sequence from an alignment of metazoan SR-Bs (supplementary files 7 and 8, Supplementary Material online) to investigate structural similarities across metazoan SR-Bs (supplementary fig.6A, Supplementary Material online).The consensus metazoan CD36 ectodomain is 401 aa in length and possesses alpha helices structurally congruent with the three-helix bundle at the apex of bilaterian CD36 ectodomains (supplementary fig.6B-F, Supplementary Material online).We leveraged the metazoan consensus SR-B structural model for comparative purposes.
Despite significant primary sequence variation across eukaryotic SR-Bs, all protein structure models recovered an asymmetric beta barrel, initially described in bilaterians, as the most highly conserved tertiary structure (Neculai et al. 2013;Gomez-Diaz et al. 2016;figs. 4 and 5).In particular, the arrangement of antiparallel β-strands is spatially consistent.The majority of conserved sequence motifs are located within the canonical asymmetric beta barrel.For example, sequence motifs M1, M2, M17, and M18 map to the longest β-strands that are also spatially adjacent to one another (supplementary fig.5, Supplementary Material online; figs.4C and 5C).Strikingly, across eukaryotes, we also recover evidence for a conserved pocket, cavity, and intramolecular tunnel feature associated with the canonical asymmetric beta barrel similar to mammalian and insect SR-Bs (supplementary fig.7, Supplementary Material online).
To investigate putative ligand-receptor interactions with structural features of the CD36 asymmetric beta barrel, we queried SR-B AlphaFold2 models with the AlphaFill algorithm (Hekkelman et al. 2023).The AlphaFill databank contains several thousand small molecule compounds, including two known ligands for human SCARB2: cholesterol (CLR) and phosphatidylcholine (PC/ PCW) (Conrad et al. 2017;Heybrock et al. 2019).CLR was recovered as a putative ligand within the intramolecular tunnel associated with the conserved asymmetric beta barrel, and PCW was recovered as putative ligand of the binding pocket adjacent to the conserved asymmetric beta barrel for 90% of metazoan SR-B structural models, including the metazoan consensus model (supplementary table S8, Supplementary Material online).

Discussion
Evidence for a Common Eukaryotic SR-B Origin, with Independent Losses The SR-B proteins are broadly represented across diverse eukaryotic lineages, including in some of the earliest-diverging branches (Discoba, Hemimastigophora, Ancyromonadida, CRuMs), supporting an ancient origin for the SR-B family (fig.1B).Among some eukaryotic clades, derived taxa appear to lack CD36 domain signatures in their proteomes.For example, within the opisthokont groups Ichthyosporea and Holomycota, only early-diverging taxa retain SR-Bs.Thus we infer independent loss of SR-B homologs during the early diversification of the ichthyosporeans and in holomycotans following the divergence of the nucleariids (e.g., Fonticula alba and Parvularia atlantis).We detected similar losses within Archaeplastida, where SR-B homologs can be found within chlorophytes but not among representative streptophyte, glaucophyte, or rhodophyte species.SR-B proteins have been hypthesized to be members of the ancient eukaryotic phagosome proteome (Boulais et al. 2010).While the origins of eukaryotic phagocytosis remain unclear (Yutin et al. 2009;Mills 2020), our analyses Structural superpositions were first performed for each exemplar SR-B with the metazoan consensus SR-B sequence, and alignments were generated based on the most spatially adjacent residues for each superposition.Residues in each model are color-coded by the RMSD-cα (root mean square deviation between the alpha carbons in each residue pair) to visualize structurally conserved versus divergent regions.RMSD-cα values are binned according to the categories of "high" (<1.5 Å), "medium" (1.5-2.7 Å), and "low" (>2.7 Å) resolution used to assess the quality of crystallographic protein models (Wlodawer et al. 2007).Residues that fail to align are "unaligned" and colored magenta.Brackets highlight stretches of unaligned, lineage-specific residues that correspond with arrowheads in the metazoan consensus SR-B and represent the location of sequence expansion.Positionally, lineage-specific apex sequence expansions are consistently found to correspond with the three-helix bundle at the apex of the canonical SR-B structure.
highlight an intriguing absence of SR-B homologs in eukaryotic clades that lack phaocytosis via plasma membrane infolding resulting in the formation of membrane bound intracellular phagosomes.Of note, metazoan sequences remain overrepresented in our phylogenetic analyses, highlighting the need for continued expansion of pan-eukaryotic taxon sampling and genomic resource development.
Structural congruence across eukaryotic SR-B homologs is consistent with the established position that protein tertiary structure typically demonstrates higher conservation than primary aa sequence identity (Yang and Honig 2000;Illergard et al. 2009;figs. 4 and 5).In particular, the CD36 domain retains an asymmetric beta barrel, an adjacent binding pocket, and a connected intramolecular tunnel that are broadly shared structural features across all eukaryotic SR-B proteins (supplementary fig.7, Supplementary Material online).The longest beta strands of the canonical asymmetric beta barrel correspond with sequence motifs that are widely distributed across eukaryotes (M1, M2, M17), further supporting an ancient origin.Additionally, ligand recognition modeling suggests cholesterol interaction with the intramolecular tunnel may represent a highly conserved feature of the SR-B receptor family across eukaryotic lineages.

Evidence for Lineage-Specific CD36 Ectodomain Sequence Expansions Reflecting Apex Region Tertiary Structure Diversity
The membrane-distal apex of the CD36 ectodomain is associated with ligand recognition by a three α-helix bundle, and its functional role has been investigated most thoroughly in mammalian SR-B homologs (Kar et al. 2008;Kuda et al. 2013;Neculai et al. 2013;Hsieh et al. 2016).Our protein models identified structurally homologous three α-helix bundles at the apex of both metazoan and nonmetazoan SR-B homologs that are <500 aa in length (fig.1C; supplementary files 9 and 10, Supplementary Material online).For example, similar three α-helix bundles are present in SR-B homologs found in the early branching discoban and hemimastigophran lineages (supplementary fig.8, Supplementary Material online).We hypothesize that the ancestral CD36 ectodomain apex present in LECA may have resembled the three α-helix bundle structure present in most metazoan SR-B homologs.
Strikingly, structural comparisons between CD36 ectodomains reveal a consistent "deviation" from the putative ancestral three α-helix bundle (exemplified by the metazoan consensus SR-B) positionally corresponding to α4 and α5 (fig.6).We identified several lineage-specific sequence expansions at the membrane-distal apices of SR-B proteins >500 aa in length composed of multiple α-helices that appear to lack homology to known protein domains.We posit that these CD36 apex sequence expansions have occurred independently multiple times across Eukarya.Critically, the α5 helix motif contains aa residues responsible for initial ligand recognition, such as binding of LCFAs, oxLDL, and Plasmodium falciparum MAMPs by human SCARB3 (Neculai et al. 2013;Hsieh et al. 2016).We hypothesize that a combination of lineage-specific sequence expansion and structural variation observed in the apex regions of divergent SR-B homologs may reflect the evolution of lineage-specific SR-B ligand-sensing specificity.

Conclusion
Sensing and processing environmental signals is central for many biological processes, including nutrient uptake, reproduction, and innate immunity.Integral receptor proteins at the interface of the cell and external environment are largely responsible for recognizing proximal extracellular signals and initiating eukaryotic cellular responses, such as signal transduction and/or interactions with coreceptors.Our protein structure data suggest that environmental and physiological ligand interactions may have driven structural variation in the membrane-distal apex region of the CD36 domain across the SR-B receptor family.The formation of alternative tertiary protein structures at the CD36 ectodomain apex may be, in part, constrained by sequence length.Lineage-specific sequence expansions in the putative ligand-sensing apex appear to have contributed to the generation of novel tertiary structures.We hypothesize that these novel, lineage-specific, apex tertiary structures may correlate with ligand interactions and provide a broader understanding of the evolution of this ancient receptor family across Eukarya.Our incorporation of protein structure modeling highlights the utility of using comparative protein structure analyses to derive evolutionary inferences beyond comparison of primary aa sequences.

SR-B Identification Pipeline
The HMMER v3.3.2 package was used to search 165 publicly available genomes and transcriptomes from a diverse range of eukaryotic lineages (supplementary table S1, Supplementary Material online) (Eddy 1998).The hidden Markov model (hmm) for the Pfam CD36 domain, PF01130, was used to query predicted protein sequences from each taxon with the hmmsearch command (Mistry et al. 2021).Sequences that exceeded the default inclusion threshold for hmmsearch were retained.The CD-HIT v4.8.1 package was then used to remove identical protein sequences within each taxon (Fu et al. 2012).For each taxon, a combination of MAFFT v7.450 multiple sequence alignments (MSAs) (Katoh andStandley 2013), FastTree v2.1.12 (Price et al. 2010), and genome browsers (supplementary table S1, Supplementary Material online) was used to identify likely sequence isoforms.Where multiple isoforms were detected, only the most complete representative sequence was retained for subsequent analyses.TMHMM v2.0 (Krogh et al. 2001) and InterProScan v2.0 (Quevillon et al. 2005) were then used to identify transmembrane helices flanking the CD36 domain in order to confirm SR-B-like architecture (TM-CD36-TM).Only sequences with identifiable SR-B-like architectures were used for subsequent alignments and analyses (supplementary file 1, Supplementary Material online).
Where applicable, we corroborated computationally predicted gene models that appeared to be gene model fusions or incomplete proteins with expression data in the form of transcripts or transcriptome data (e.g., Richter et al. 2018;Moreland et al. 2020).See supplementary table S2, Supplementary Material online for search results which specify modified sequences as well as all unique CD36 domain-containing proteins from each taxon, including those without an SR-B-like architecture.

CD36 Domain Phylogenetic Analyses
Full-length CD36 ectodomains were isolated from SR-B sequences by trimming predicted transmembrane regions and cytoplasmic tails (supplementary file 2, Supplementary Material online).For protein sequences with multiple CD36 ectodomains, only the first CD36 ectodomain was used in subsequent alignments (Trica_SR-B_1, Xensp_SR-B_1, and Pytin_SR-B_3).Recovered CD36 domains were then aligned with MAFFT v7.450 (Katoh and Standley 2013) -Gutiérrez et al. 2009), was used to remove blocks of poorly aligned residues in the CD36 ectodomain alignment to generate a filtered alignment optimized for phylogenetic signal using the "-gappyout" option (supplementary file 4, Supplementary Material online).ModelTest-NG v0.1.7 was used to identify LG + G4 + F as the best-fit aa substitution model for the filtered alignment (Darriba et al. 2020).Maximum likelihood (ML) analyses were performed using the Pthreads version of RAxML v8.2.12 (Stamatakis 2014).A total of 300 independent ML searches were performed on randomized maximum parsimony starting trees.One hundred BS replicates were performed and used to draw bipartitions on the best scoring tree (log-likelihood score of −223,409.574461)(supplementary fig. 2 and file 5, Supplementary Material online).
Bayesian analyses were performed with MrBayes v3.2.7 (Ronquist and Huelsenbeck 2003).Initially two independent runs of 5 million generations with five chains each using default heating and the "LG + G4" aa model were performed.The average standard deviation of split frequencies between the initial two runs was 0.147471 reflecting a lack of convergence.Therefore, an additional 1 million runs were performed, for a total of 6 million generations, to assess the feasibility of achieving convergence between the two runs.The average standard deviation of split frequencies between the two 6 million runs was 0.145354; thus, the Bayesian analyses failed to converge.To attempt to recover supported nodes despite the lack of convergence, the initial 25% of trees from each run were removed as burn-in and posterior probabilities (BPP) were calculated (supplementary fig. 3 and file 6, Supplementary Material online).FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/ figtree/) was used to visualize both ML and Bayesian trees.

SR-B Motif Prediction
Aa motifs were identified using STREME v5.5.3 (Bailey 2021).Default motif length settings were used (minimum and maximum motif length of 8 and 30 aa, respectively).The STREME algorithm stops searching for motifs after three calls when the default P value threshold exceeds 0.05.Recovered motifs are evaluated for statistical significance by STREME by determining the probability of discriminating primary sequences from a subset of randomized and shuffled control sequences.When STREME reports more than one motif, the P value does not completely account for multiple testing; in this case, E-values are calculated for each motif to evaluate statistical significance (Bailey 2021).The strength of a putative motif match in a given sequence is determined by deriving a "sequence score" by summing the weight of each position in the motif PWM (Bailey 2021).STREME computes a match threshold for each motif; sequence elements with scores below the match threshold are disregarded.

SR-B Structure Prediction
Complete SR-B sequences were modeled with AlphaFold2 (Jumper et al. 2021) via the ColabFold web server (accessed March 31, 2022-December 15, 2022;Mirdita et al. 2022) and with RoseTTAFold (Baek et al. 2021) via the Robetta web server (accessed March 14, 2022-December 15, 2022;Kim et al. 2004).Sequences >1,401 aa were excluded from both AlphaFold2 and RoseTTAFold structure prediction (ColabFold and Robetta web server parameters require sequences <1,401 aa in length).Protein sequences containing undetermined aa positions (X) were replaced with alanine (A), a nonbulky, nonreactive aa residue commonly used in Evolution and Diversification of the CD36 Domain "Apex" protein structure and stability experiments (Matthews 1996).AlphaFold2 via ColabFold generated five models for each SR-B sequence, ranking each by global accuracy with a predicted template-modeling score (pTM) (Jumper et al. 2021), an approximation of the "true" template modeling score (TM score) (Zhang and Skolnick 2004).pTM scores range from 0 to 1.0, with 1.0 being the best.Models with a pTM of >0.7 were used for comparative analyses.The highest ranked model above this pTM threshold for each SR-B was used for downstream analyses.
Local accuracy was measured with the predicted local distance difference test (pLDDT) computed by AlphaFold which gives per-residue confidence scores (supplementary table S5, Supplementary Material online).pLDDT values range from 0 to 100, with 100 being the best.SR-B RoseTTAFold models were evaluated with the local distance difference test (lDDT) using the deep learning framework DeepAccNet (Mariani et al. 2013;Hiranuma et al. 2021).lDDT scores range from 0 to 1.0, with 1.0 being the best.Local accuracy was assessed by root mean square (RMS) error of α-carbons in Ångstroms (supplementary table S5, Supplementary Material online).Internal cavities were predicted with the CASTp 3.0 web server (Tian et al. 2018) using a probe radius of 0.9 Å (Chovancova et al. 2012).All protein models were visualized in UCSF Chimera v1.17 (Pettersen et al. 2004).
An additional alignment of complete metazoan SR-B sequences was generated to produce a metazoan consensus SR-B sequence (supplementary files 6 and 7, Supplementary Material online).The metazoan consensus SR-B sequence was fed into both AlphaFold2 and RoseTTAFold structure prediction pipelines to yield a metazoan consensus protein structure prediction (supplementary fig.6, Supplementary Material online).
FIG. 1.-Distribution of SR-Bs across Eukarya.(A) Space-filling model of SCARB3.SR-Bs are integral membrane proteins with two transmembrane domains, cytoplasmic N-and C-terminal tails (gray shading) and a CD36 ectodomain (orange shading) characterized by a ligand-sensing apex and binding pocket and containing an intramolecular tunnel.(B) Phylogenetic distribution of SR-Bs.Asterisk indicates taxa for which transcriptomic data were used.An oval next to a row indicates obligate parasitic life history.Number indicates how many SR-B homologs were detected in each representative taxon.Color-coded taxonomic groupings are on the right.Phylogeny based on Burki et al. (2020), Carr et al. (2017), Derelle et al. (2016), Ettahi et al. (2021),Li, Hou, et al. (2021),Li, Shen, et al. (2021), andStrassert et al. (2021).Raw CD36 Pfam domain hits following isoform removal for each taxon can be found in supplementary table S2, Supplementary Material online.(C) Length distribution for each of the 279 CD36 ectodomains.Includes aa comprising the predicted noncytoplasmic region containing CD36 ectodomain sequence between transmembrane helices.Colors follow the taxonomic scheme in panel (B).

FIG. 3
FIG. 3.-CD36 ectodomain aa motifs.(A) Sequence logos for 18 aa motifs recovered from our set of 279 eukaryotic CD36 domains by STREME v5.5.3.Letter height represents the relative aa frequency.(B) A heatmap depicting the sequence score for each motif to each eukaryotic CD36 ectodomain is shown.Raw sequence scores calculated by STREME were transformed to a 0-100 scale represented on the right.The full matrix of motif sequence scores is found in supplementary table S5, Supplementary Material online.Colors correspond with the taxonomic scheme in fig.1B.

FIG. 4 .
FIG. 4.-Metazoan SR-Bs share a highly conserved tertiary structure.The highest ranked AlphaFold2 structure predictions for four exemplar metazoan SR-B proteins.From left to right, H. sapiens SCARB3, Drosophila melanogaster Crq, Nematostella vectensis SR-B_1, and M. leidyi SR-B_1, alongside the consensus structure prediction derived from an all-metazoan SR-B sequence alignment.Protein structure visualized with Chimera.(A) Green residues in the ribbon models reflect ≥40% sequence conservation across eukaryotic SR-Bs.(B) pLDDT scores computed by AlphaFold2 assess per-residue confidence across the CD36 ectodomain region of each model.Residue color reflects high-confidence (gray), moderate-confidence (light blue), and low-confidence (yellow) pLDDT (Varadi et al. 2021).(C) The CD36 ectodomain region of each model is annotated with the aa motifs discovered via STREME.Motifs M1-M4 (cyan) lie more N-terminal to the three-helix bundle at the apex, and motifs M7 and M9-M18 (purple) lie more C-terminal to the three-helix bundle at the apex.Motif M6 (orange) connects α5 to α7 but is not part of the bundle itself.(D) Graphs depict AlphaFold2 pLDDT scores across residue positions for each SR-B.AlphaFold2 via ColabFold generates five protein structure models per input protein sequence and ranks each by global accuracy using the pTM metric (pTM score).Line color indicates the model rank, with rank 1 being the best.

FIG. 5 .
FIG. 5.-Divergent eukaryotic SR-Bs retain a conserved CD36 domain beta barrel core with lineage-specific expansions in the apex region.The highest ranked AlphaFold2 structure predictions for four exemplar eukaryote SR-B proteins with divergent, lineage-specific alpha-helix structures at the apex of the CD36 ectodomain are shown, from left to right, the metazoan consensus structure, A. aurita SR-B, D. discoideum SR-B_2 (LmpA), Phytophthora infestans SR-B_1, and Tetrahymena thermophila SR-B_2.Protein structure visualization was in Chimera.(A) Green residues reflect ≥40% conservation across the alignment of all eukaryotic SR-Bs.(B) pLDDT scores computed by AlphaFold2 assess per-residue confidence across the CD36 ectodomain region of each model.Residue color reflects high-confidence (gray), moderate-confidence (light blue), and low-confidence (yellow) pLDDT (Varadi et al. 2021).(C) The CD36 ectodomain region of each model is annotated with the aa motifs discovered via STREME.Motifs M1-M4 (cyan) lie more N-terminal to the three-helix bundle at the apex, and motifs M7 and M9-M18 (purple) lie more C-terminal to the three-helix bundle at the apex.Motif M6 (orange) connects two apex α-helices but is not part of the bundle itself.M5 and M8 represent the two lowest significance STREME motifs and are short motifs found within the lineage-specific sequence expansions at the apex of the CD36 ectodomain of P. infestans.(D) Graphs depict AlphaFold2 pLDDT scores across residue positions for each SR-B.AlphaFold2 via ColabFold generates five protein structure models per input protein sequence and ranks each by global accuracy using the pTM metric (pTM score).Line color indicates the model rank, with rank 1 being the best.

FIG. 6
FIG.6.-Sequenceexpansions coincident with the SR-B apex are divergent and have independent origins.AlphaFold2 structure predictions for four eukaryotic SR-B exemplars with lineage-specific apex structures, (A) A. aurita SR-B 1, (B) D. discoideum SR-B_2 (LmpA), (C) P. infestans SR-B_1, and (D) T. thermophila SR-B_2, are each presented alongside the metazoan consensus SR-B structure, which serves as a structural proxy for the "three-helix bundle" SR-B apex.Structural superpositions were first performed for each exemplar SR-B with the metazoan consensus SR-B sequence, and alignments were generated based on the most spatially adjacent residues for each superposition.Residues in each model are color-coded by the RMSD-cα (root mean square deviation between the alpha carbons in each residue pair) to visualize structurally conserved versus divergent regions.RMSD-cα values are binned according to the categories of "high" (<1.5 Å), "medium" (1.5-2.7 Å), and "low" (>2.7 Å) resolution used to assess the quality of crystallographic protein models(Wlodawer et al. 2007).Residues that fail to align are "unaligned" and colored magenta.Brackets highlight stretches of unaligned, lineage-specific residues that correspond with arrowheads in the metazoan consensus SR-B and represent the location of sequence expansion.Positionally, lineage-specific apex sequence expansions are consistently found to correspond with the three-helix bundle at the apex of the canonical SR-B structure.
Evolution and Diversification of the CD36 Domain "Apex" ectodomain of several mammalian and insect SR-Bs (supplementary fig.1A,Supplementary Material online; FIG. 2.-ML (RAxML) gene tree for eukaryotic CD36 domains.Circles indicate nodes with BS support ≥ 95%.Diamonds indicate nodes with BS support ≥ 80%.Clades with ≥ 80% BS support are colored according to fig.1B.Unannotated tree with node support values is supplementary fig.2, Supplementary Material online.