Hydrophobicity-Driven Increases in Editing in Mitochondrial mRNAs during the Evolution of Kinetoplastids

Abstract
Kinetoplastids are a diverse group of flagellates that exhibit editing by insertion/deletion of Us in their mitochondrial mRNAs. Some mRNAs require editing to build most of their coding sequences, a process known as pan-editing. Evidence suggests that pan-editing is an ancestral feature in kinetoplastids. Here, we investigate how the transition from nonedited to pan-edited states occurred. The mitochondrial mRNAs and protein sequences from nine kinetoplastids and related groups (diplonemids, euglenids, and jakobids) were analyzed. RNA editing increased protein hydrophobicity to extreme values by introducing Us in the second codon position, despite the absence of editing preferences related to codon position. In addition, hydrophobicity was maintained by purifying selection in species that lost editing by retroposition of the fully edited mRNA. Only a few hydrophobic-to-hydrophilic amino acid changes were inferred for such species. In the protein secondary structure, these changes occurred spatially close to other hydrophilic residues. The analysis of coevolving sites showed that multiple changes are required together for hydrophobicity to be lost, which suggests that the proteins are locked into extended hydrophobicity. Finally, an analysis of the NAD7 protein–protein interactions showed that they can also influence the hydrophobicity increase in the protein and where editing can occur in the mRNA. In conclusion, our results suggest that protein hydrophobicity has influenced editing site selection and how editing expanded in mRNAs. In effect, the hydrophobicity increase was entrenched by a neutral ratchet moved by a mutational pressure to introduce Us, thus helping to explain both RNA editing increase and, possibly, persistence.


Introduction
Kinetoplastids are a clade of flagellate protozoa that includes species with medical, veterinary, and/or ecological importance. The most studied genera in the clade are Leishmania and Trypanosoma, with several species causing neglected diseases: leishmaniasis, Chagas disease, and sleeping sickness. In addition, kinetoplastids have unique features among eukaryotes. One of many peculiarities is a single mitochondrion with a complex network of concatenated DNA rings (Cavalcanti and de Souza 2018). Maxicircles are the DNA rings that code for mitochondrial ribosomal RNAs (9S and 12S) and <20 mitochondrial proteins (Simpson et al. 1987; Ruvalcaba-Trejo and Sturm 2011; Lin et al. 2015). Several of these genes require posttranscriptional modifications in their pre-mRNA to generate functional open reading frames. The process is known as mRNA editing, and it proceeds by inserting (most commonly) or deleting uridines (Us) (Simpson et al. 1996). Some genes are known as cryptogenes because their mRNAs require extensive editing, a phenomenon known as pan-editing. In addition, pan-editing is an ancestral feature of kinetoplastids that was lost in different lineages and genes (Gray 1994; Maslov et al. 1994; Simpson and Maslov 1994a). Some clades phylogenetically close to kinetoplastids, such as diplonemids and euglenids, have no evidence of current or past editing by inserting or deleting Us (Dobakova et al. 2015; Burger and Valach 2018). Consequently, it is unknown how the transition from nonediting to pan-editing occurred in kinetoplastids.
U insertion/deletion requires complex and energetically expensive machinery and thousands of DNA minicircles coding short RNAs called guide RNAs (gRNAs), which indicate where editing must occur (Estevez and Simpson 1999; Simpson et al. 2003; Read et al. 2016; Rusman et al. 2021). Different evolutionary hypotheses have been proposed to explain the origin and persistence of this expensive editing system, although no consensus has been reached yet (Gray et al. 2010; Speijer 2010; Flegontov et al. 2011). Gray et al. proposed several years ago that editing in kinetoplastids is an example of constructive neutral evolution (CNE) (Covello and Gray 1993; Gray et al. 2010; Flegontov et al. 2011). In this framework, editing has no adaptive advantages, and kinetoplastids were trapped in a ratchet of complexity increase. Such a ratchet would require that the editing process cannot be lost by a single mutational step; otherwise, selection pressures would favor cheaper systems without editing. The main criticism of the hypothesis of editing in kinetoplastids as an example of CNE was that editing can be lost in a single mutational step, that is, the replacement of the cryptogene in the maxicircle by a retrotranscribed fully edited mRNA (Speijer 2010). Despite this, CNE should not be discarded if the appearance of novel editing functions is considered or if selective pressures for cheaper systems are not so strong (Lukes et al. 2021). Recently, it has been proposed that CNE can explain the origin of multimeric proteins by a hydrophobic ratchet (Hochberg et al. 2020). Basically, the binding interface between monomers accumulates hydrophobic amino acids by mutation. Such mutations are neutral in the dimeric or multimeric state since such sites stay buried and hidden from the solvent. However, they are deleterious in the monomeric state, when such sites are exposed. Consequently, mutations that change hydrophilic to hydrophobic amino acids at the protein surface may favor multimerization. Purifying selection then entrenches the multimeric form because monomeric ones are now deleterious. Such a ratchet works even if there is no functional advantage for the multimeric form in comparison with the monomeric one. Interestingly, in kinetoplastids, the inserted Us probably contribute to increasing U content. Such an increase may affect protein hydrophobicity since the genetic code determines that codons with U in the second position always code for hydrophobic amino acids. However, to our knowledge, the impact of the mRNA editing process on protein hydrophobicity and a potential hydrophobic ratchet have not been analyzed.
In this paper, we address the evolutionary transition from nonedited (ancestral state) to pan-edited mRNAs. For that purpose, we analyzed available sequences of mitochondrion-encoded mRNA in kinetoplastids, their related groups within Euglenozoa (diplonemids and euglenids), and in a clade related to Euglenozoa (jakobids). In addition, we investigated how the hydrophobicity of the proteins has driven this transition (from nonedited mRNAs to pan-edited ones) and how this transition generated an extreme hydrophobicity increase in the proteins. Finally, we discuss the implications of our results for the neutral and adaptationist hypotheses about mRNA editing in kinetoplastids.

Kinetoplastids Have an Increased U Content in Mitochondrial Pan-Edited mRNAs with No Preferential U Distribution on Codon Positions
To determine how U insertion affected hydrophobicity in mitochondrion-encoded proteins, we first analyzed the U content in mitochondrial mRNAs in kinetoplastids and related groups, that is, diplonemids, euglenids, and jakobids (table 1) for different mitochondrial genes. We divided the analyzed genes into two groups. The first one includes genes that are pan-edited in trypanosomes but have lost pan-editing in some kinetoplastids (pan-edited genes). The second group includes genes that are nonedited or partially edited in all studied kinetoplastids (nonedited or partially edited genes). On average, the U content was higher in kinetoplastids than in the related groups. This higher content in kinetoplastids was observed in both pan-edited genes (COX3, NAD7, NAD9, ATP6, and RPS12) and nonedited or partially edited genes (table 1). Introducing a U at the second position of any codon results in a hydrophobic amino acid, whereas a U at the third position may be related to codon usage preference. Consequently, we first analyzed whether there is a differential distribution of the total Us at different codon positions (fig. 1 and supplementary fig. S1, Supplementary Material online). Considering pan-edited genes (COX3, NAD7, NAD9, ATP6, and RPS12), no differential distribution was observed in terms of codon positions for the total Us for any species (table 2 and fig. 1). This pattern was also observed at different regions in the sequence of such pan-edited mRNAs (fig. 2). In contrast, COX1 and NAD1 (both nonedited) showed a differential distribution favoring Us at the second and third codon positions in kinetoplastids (table 2 and supplementary fig. S1, Supplementary Material online). In addition, the analyzed pan-edited genes in Trypanosoma brucei (the trypanosome with the most complete information about editing) exhibit a homogeneous distribution of inserted Us (table 3). A differential U distribution at codon positions was instead mainly observed for diplonemids, euglenids, and jakobids (table 2). These results suggest that editing has increased the U content in the ancestor of kinetoplastids. In addition, such an increase had no preference for any codon position, which suggests that it was not directed by selective pressures related to codon position.
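The statement that a U at the second codon position yields a hydrophobic amino acid follows from the genetic code itself and can be checked with a minimal Python sketch. The table below is the standard code written in UCAG order. This is an assumption for illustration: the mitochondrial code used by kinetoplastids is generally described as differing at UGA, which does not involve codons with U at the second position. The function name is hypothetical and is not part of the original analysis.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; first, second, and third positions all cycle in UCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def second_position_u_amino_acids():
    """Return the set of amino acids encoded by codons with U at position 2."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == "U"}


if __name__ == "__main__":
    # Phe, Leu, Ile, Met, and Val: all have a positive Kyte-Doolittle index.
    print(sorted(second_position_u_amino_acids()))
```

All 16 codons of the form NUN map to one of five residues (F, L, I, M, V), which is why an inserted U that lands at a second codon position can only produce a hydrophobic amino acid.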

Kinetoplastids Have Increased Hydrophobicity Indexes and Reduced Frequency of Disorder-Promoting Amino Acids in Proteins Coded by Pan-Edited Genes
Considering that U insertion at the second codon position introduces a hydrophobic amino acid, we compared hydrophobicity levels in proteins coded by mitochondrial pan-edited mRNAs from T. brucei against nonedited homologs in diplonemids or euglenids and jakobids. As expected, we observed a higher accumulated hydrophobicity index for all proteins coded by pan-edited mRNAs (fig. 3, left). In addition, differences in amino acid composition were observed. The most frequent amino acids in T. brucei for cytochrome c oxidase subunit III (COX3) and ATP synthase subunit 6 (ATP6), NAD7, RPS12, and NAD9 were leucine (L), phenylalanine (F), valine (V), and cysteine (C), which are order-promoting residues (i.e., the first three are common in secondary structures, and C forms disulfide bonds). In contrast, diplonemids and jakobids had a higher frequency of threonine (T) and disorder-promoting amino acids such as serine (S) and alanine (A) (fig. 3, right).
Considering the higher frequency of hydrophobic (and mostly order-promoting) amino acids, we performed a hydrophobic cluster analysis (HCA) to address how such differences affect secondary structures in the kinetoplastid proteins. HCA is based on a duplicated 2D representation of the protein sequence, which highlights local proximities between amino acids in secondary structures (Woodcock et al. 1992). The HCA plots of COX3 and ATP6 N-terminals for T. brucei against J. libera and Diplonema ambulator are shown in figure 4. In T. brucei, hydrophilic regions are smaller than those observed in jakobids and diplonemids. Interestingly, most hydrophobic clusters were merged, resulting in a pattern of hydrophilic clusters in a hydrophobic background in this kinetoplastid. The average proportion of hydrophobic amino acids for the data set was 0.48 ± 0.03, whereas this proportion was 0.68 for T. brucei. In addition, T. brucei COX3 was ranked as the second protein with the largest number of hydrophobic amino acids. Interestingly, among the organisms

the most hydrophobic ones among other eukaryotes, suggests that the high hydrophobicity of this protein, although possible, is poorly explained by common substitution rates and selection. Instead, a high mutational rate with a T bias or an editing system that inserts Us (both may be considered mutational pressures) may reach such hydrophobicity.

Hydrophobicity Loss Is Prevented by Purifying Selection after Missing mRNA Editing
COX3, ATP6, and NAD7 mRNAs are pan-edited in Trypanosoma sp., but other trypanosomatids have partially or completely lost editing by replacing the gene by retroposition of the fully edited mRNA (Speijer 2010). Since T. brucei shares pan-editing with the ancestor of other kinetoplastids, it can be assumed that most of the amino acid sequence in such an ancestor was similar to the T. brucei sequence. Consequently, a comparative analysis between T. brucei (ancestral) and kinetoplastids that lost pan-editing (derived) may help to infer and understand the changes in hydrophobicity after losing editing and whether these changes had an evolutionary role. Hydrophobicity along the protein sequence was highly conserved. To analyze whether such conserved hydrophobicity was caused by purifying selection, we first compared U content across codon positions among kinetoplastid species (fig. 6B-D). The U content in the second codon position was less variable than that in the first and third ones (fig. 6C). In addition, as mentioned above, protein hydrophobicity in kinetoplastids that lost editing was almost identical to that observed in T. brucei proteins. These results suggest a lower substitution rate in the second codon position, as expected under purifying selection. Moreover, we performed a synonymous/nonsynonymous substitution analysis of COX3 mRNA sequences from different kinetoplastids. An excess of synonymous over nonsynonymous substitutions would suggest purifying selection. The analysis showed that 197/288 codons exhibited a site-specific synonymous substitution rate (dS) higher than their nonsynonymous substitution rate (dN), that is, dN − dS < 0. Of these, 144 codons coded for hydrophobic amino acids in T. brucei. Among codons with dN − dS < 0, 126 were found to be significant using the fixed effects likelihood (FEL) method (P < 0.05) (see fig. 6E). On the other hand, only 40/288 codons had dN − dS > 0, but they were nonsignificant (P > 0.1), suggesting neutral evolution instead of positive (diversifying) selection (see fig. 6E). Most of these codons implied hydrophobic-to-hydrophobic or hydrophilic-to-hydrophilic substitutions, and just nine codons implied hydrophobic-to-hydrophilic substitutions. All these results suggest that purifying selection prevents hydrophobicity loss.

A Hydrophobic Ratchet Prevents Hydrophobicity Reduction after Editing Loss
We performed HCA of protein structures to address where hydrophobic-to-hydrophilic substitutions (hydrophobicity reduction) occurred after editing was lost. First, the COX3 protein structure was analyzed and compared among T. brucei (pan-edited), L. tarentolae (5′ edited), Crithidia fasciculata, and A. deanei (nonedited). Results showed that all hydrophobic-to-hydrophilic substitutions occurred at the edge of hydrophilic clusters (fig. 7A-D). However, sites with strong purifying selection (P < 0.05) were also predominantly distributed within and at the edge of hydrophilic clusters. This suggested that such hydrophilic clusters cannot increase in size, at least not by changing only one amino acid at a time (fig. 7A). Similar patterns were observed for ATP6 and NAD7 (supplementary figs. S7 and S8, Supplementary Material online). These results suggest that replacing a single hydrophobic amino acid may destabilize the protein. In fact, multiple substitutions occurring at a time or in short periods and changing two or more amino acids would be required to reduce hydrophobicity without protein destabilization.
It is known that editing can introduce multiple Us in contiguous or close positions, leading to the introduction of multiple hydrophobic amino acids at once. Often, when two or more interacting hydrophilic amino acids are replaced by hydrophobic ones, the interactions (although of a different type) and the protein structure are preserved. These changes cannot occur step by step because replacing only one hydrophilic amino acid (by a hydrophobic one) breaks the interactions with other hydrophilic residues and destabilizes the protein. This may explain why editing may force increased hydrophobicity while avoiding deleterious states. Consequently, strong correlations between substitutions of nearby amino acids in the sequence should be observed. To test this hypothesis, we used CoeViz2 to detect sites that coevolve in COX3 and NAD7 and to analyze whether such sites clustered closely together in the amino acid sequence. Figure 7E shows a heatmap of sites that coevolved for COX3 (red) and shows clear clusters for closely located amino acids. Finally, we used the heatmap to assess whether hydrophobic amino acids coevolved. A total of 324 site pairs, located within a five amino acid distance in the sequence, scored >0.8 in the coevolution heatmap. These pairs are located along the diagonal of the heatmap. Among these pairs, 52% consisted of interactions between two hydrophobic amino acids. These results suggest that a hydrophobic ratchet prevents protein hydrophobicity reduction after editing loss.
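The filtering of coevolving site pairs described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a hypothetical dictionary that maps 1-based position pairs to coevolution scores (not the actual CoeViz2 output format), "within a five amino acid distance" is read as a separation of at most five positions, and residues with a positive Kyte-Doolittle index are taken as hydrophobic. None of these choices is confirmed by the original analysis.

```python
# Assumption: residues with a positive Kyte-Doolittle index count as hydrophobic.
HYDROPHOBIC = set("ACFILMV")


def nearby_coevolving_pairs(scores, max_distance=5, min_score=0.8):
    """Keep pairs separated by at most max_distance with a score above min_score.

    scores: dict mapping (i, j) 1-based sequence positions to a score.
    """
    pairs = [
        (i, j)
        for (i, j), score in scores.items()
        if 0 < abs(j - i) <= max_distance and score > min_score
    ]
    return sorted(pairs)


def hydrophobic_pair_fraction(pairs, sequence):
    """Fraction of pairs in which both residues are hydrophobic."""
    if not pairs:
        return 0.0
    both = sum(
        1
        for i, j in pairs
        if sequence[i - 1] in HYDROPHOBIC and sequence[j - 1] in HYDROPHOBIC
    )
    return both / len(pairs)


if __name__ == "__main__":
    toy_scores = {(1, 2): 0.9, (1, 10): 0.95, (3, 5): 0.5, (4, 8): 0.85}
    toy_pairs = nearby_coevolving_pairs(toy_scores)
    print(toy_pairs, hydrophobic_pair_fraction(toy_pairs, "LFSTAAAKLL"))
```

Pairs that lie close to the diagonal of the heatmap are exactly those with a small separation between positions, so the distance filter selects the local clusters discussed in the text.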

Hydrophobicity Increase Was Favored at the Protein Surface
A Blast search was performed for T. brucei NAD7, NAD9, ATP6, and COX3 proteins against human protein databases. These searches were run against human sequences because the respiratory complexes are well described for this species. NAD7 was the most conserved protein (table 4), and it is homologous to the NADH dehydrogenase (ubiquinone) iron-sulfur protein 2, which is included in a cryo-EM 3D model of the respiratory complex I (accession: 5XTD), where it corresponds to chain Q. In addition, 44.9% of the exposed sites of NAD7 had higher hydrophobicity than chain Q, while only 20.4% had lower hydrophobicity. Furthermore, the sites with higher hydrophobicity in NAD7 than in chain Q were mainly located on the protein surface (44.9% in exposed sites vs. 31.8% in nonexposed sites, P = 0.01). Similar results were observed by comparing T. brucei and D. ambulator NAD7 proteins (40.9% of sites had increased hydrophobicity in exposed sites vs. 28.0% in nonexposed sites, P = 0.009). However, the editing frequency of codons that correspond to exposed amino acids was similar to that of nonexposed amino acids (both 46%).

Protein-Protein Contact Surfaces May Limit Hydrophobicity Increase and Constrain Editing
Hydrophobicity changes at the protein-protein contact surface were analyzed by searching for T. brucei homologs of human subunits that interact with chain Q in complex I. Four proteins were identified by Blast search (table 5). In particular, two proteins (chains B and P in the 5XTD model) were mitochondrion-encoded in T. brucei, and their mRNAs were pan-edited (called NAD8 and NAD9). Around 44% (52/119) of the chain Q sites that interact with chains B and P were more hydrophobic in NAD7 than in chain Q. In addition, around 54% (29/54) and 47% (17/36) of sites in chains P and B, respectively, were also more hydrophobic in the T. brucei homologs. This may imply compensatory changes (an example is shown in supplementary fig. S11, Supplementary Material online). However, it cannot be distinguished whether these changes in hydrophobicity are related to interactions with NAD7 or to the editing in the mRNA of such proteins (although the two hypotheses are not mutually exclusive). The other two Q-interacting proteins were nuclear-encoded (chains C and M in the 5XTD model of the respiratory complex I). The sites located at the contact surface between chains Q and M were poorly conserved in NAD7 and located in an INDEL region. In contrast, the sites located at the interacting surface between NAD7 and the T. brucei homolog of chain C were better conserved (supplementary fig. S10, Supplementary Material online). Consequently, we addressed hydrophobicity changes at the Q-C interacting surface in T. brucei homologs. The Q-C interface in T. brucei NAD7 had a smaller hydrophobicity increase (calculated as the percentage of amino acids with higher hydrophobicity) than sites outside this interacting surface on the same protein (17% vs. 44%, P = 0.0008). In addition, the contact surface to NAD7 in the T. brucei chain C homolog had a hydrophobicity increase not higher than that observed for other sites in the protein (18.9% vs. 28.5%, P = 0.24). Moreover, the NAD7 codons that code for the Q-C interface were significantly less edited than other sites in the mRNA (25.2% vs. 47.4%, P = 3 × 10⁻⁶). This result shows that this interacting surface may limit the extent of hydrophobicity increase in the NAD7 protein and constrain editing of its mRNA.

Discussion
Since the discovery of the kinetoplastid mRNA editing system in the mitochondrion in the 1980s (Benne et al. 1986; Feagin et al. 1988; Shaw et al. 1988), several questions remain unanswered or, at least, incompletely answered. Why is such an energetically expensive system for editing a dozen (or fewer) mitochondrion-encoded genes maintained? Why did natural selection not erase such complex machinery? This is more disconcerting if we consider that kDNA accounts for about 50% or more of the total cellular DNA in some kinetoplastids such as Perkinsela sp. or Trypanoplasma borreli (Lukes et al. 2018). Several attempts to answer such questions have been made. These hypotheses range from a simple mutation-correcting system (Benne et al. 1986; Gray 2003), an expression regulation method (Shaw et al. 1988), a mutational machinery to generate protein diversity (Landweber and Gilbert 1993) and alternative functions (Ochsenreiter et al. 2008), a mechanism to protect genes from deletions in complex life cycles (Speijer 2006, 2008), and a machinery to generate DNA/RNA hybrids that promotes transcription termination and DNA replication (Leeder et al. 2016) to a dead end moved by an evolutionary neutral ratchet (Gray et al. 2010; Flegontov et al. 2011), or a combination of different ones (Lukes et al. 2021). Another big question is how the transition from nonedited genes to pan-edited ones occurred. In this matter, a three-step model was proposed (Covello and Gray 1993). First, the RNA editing capability appeared. Second, mutations that could be corrected occurred, and they were then fixed by genetic drift. Third, natural selection conserved RNA editing (Covello and Gray 1993). gRNAs directing the editing appeared later. Here, we contribute to the understanding of the increase in editing that resulted in pan-edited genes. Furthermore, we describe some basic features of mRNA editing, considering its effect on the coding sequence and on the protein.
Our results clearly showed that mRNA editing in kinetoplastids mainly introduced Us with no codon position preference. However, such editing favored hydrophobic amino acids and hydrophobic regions in the protein by inserting Us at second codon position. The hydrophobic increase was distributed throughout the protein sequence, with a slightly greater preference for protein surface, at least in NAD7 (such surface preference was not observed for editing). However, protein-protein interactions between NAD7 and a nuclear-encoded protein may have influenced where editing occurred by limiting the hydrophobicity increase. In contrast, the contact surfaces between proteins that can be edited may have undergone compensatory changes that increased their hydrophobicity. Such changes may have been facilitated by editing as observed in the interactions between NAD7 and NAD8 or NAD9. Moreover, after editing was lost by retroposition in some kinetoplastids (Simpson and Maslov 1994b), loss of hydrophobicity was prevented by purifying selection, which is indicated by a lower substitution rate on the second codon position. Only a few substitutions from hydrophobic amino acids to hydrophilic ones were observed, and they occurred spatially close to other hydrophilic residues. And indeed, the analysis of coevolving sites clearly showed many local patterns suggesting that multiple mutations at a time are required for hydrophobicity loss around the hydrophilic clusters. Altogether, these findings suggest that a ratchet may be preventing hydrophobicity reduction; that is, multiple mutations at a time are required for variations in hydrophobicity.

Based on such results, we propose that editing did not occur at random positions in the mRNA. Instead, protein hydrophobicity influenced RNA editing site selection and where editing could be expanded. At the protein level, editing was probably allowed within or at the edge of hydrophobic clusters of amino acids. Additionally, it often involved multiple substitutions of closely situated amino acids simultaneously. If editing was lost by retroposition of the fully edited mRNAs, the changes made by editing could not be easily reverted by single-point mutations because intermediate states are deleterious. Consequently, they entrenched more and more hydrophobicity in the protein. We propose the following scenario for the evolution toward pan-editing. In the kinetoplastid ancestor, eventual deletions of one or few nucleotides occurred within maxicircles coding regions. If those deletions could be corrected, one or few Us would be introduced in the mRNA. Such mRNA editing can be compared with a mutational pressure to introduce Ts in DNA, like in ctenophores (see fig. 5 and Wang and Cheng 2019). However, changes in the amino acid properties in the translated protein generate constraints to the sites where editing can insert Us. This is caused by certain spatial clustering of hydrophobic (or order-promoting) amino acids. Consequently, editing is allowed within and at the edge of such clusters, whereas in other cases, inserting multiple Us at once was necessary to preserve protein stability. Interestingly, as microdeletions in the coding regions may simultaneously affect several codons, mRNA editing could correct deletions of amino acids that were spatially neighbors. Despite this, such multiple codon modifications did not necessarily occur at the same time, particularly if purifying selection is not so strong (Lovell and Robertson 2010). 
It is probable that changes in hydrophobicity could be slow at the beginning, but as more and more hydrophilic amino acids were replaced by hydrophobic ones, later changes could be less destabilizing, and consequently, the process could speed up.
In addition, we propose that natural selection does not appear to play a role in favoring an increase in hydrophobicity, as hydrophobicity can affect various regions of proteins and different types of proteins (such as transmembrane and soluble proteins from respiratory complexes or elsewhere, such as RPS12) (Opperdoes and Michels 2008;Aphasizheva et al. 2013) and editing seemingly occurred wherever it was possible. In addition, editing affects any codon position. A neutral hydrophobic ratchet has been previously proposed for explaining the evolution of multimeric proteins (Hochberg et al. 2020), and it may also apply to intramolecular amino acid composition. Some mutational pressures, for example, mutational bias in ctenophores (Wang and Cheng 2019) or editing in kinetoplastids, not only influenced amino acid composition but we propose they may also have ratcheted it. Anyway, protein function is conserved revealing how flexible amino acid sequences are.
It is unclear why editing was evolutionarily preserved in many genes and many lineages considering the possibility of retroposition of the fully edited mRNA. Complex systems or energetically unfavorable mechanisms, according to the CNE hypothesis, are only conserved if it is highly unlikely to reach the previous state (a ratchet) (Gray et al. 2010; Hochberg et al. 2020). One possibility is that new essential functions have appeared (Lukes et al. 2021). Alternatively, a new hypothesis may be derived from our results. Basically, loss of editing may be unfavorable because the proteins are entrenched in a hydrophobic extreme and mutational pressure tends to reverse the high T content, making most nonsynonymous mutations deleterious. Still, more testing is required.
Alternatively, it is important to note that loss of editing due to gene replacement should leave the gene susceptible to be reedited. There are two possible reasons for this hypothesis: first, the poly-T regions are very susceptible to indels because of polymerase slippage. Second, the hydrophobic ratchet maintains the sequence easily editable; that is, new deletions probably do not imply that editing replaces hydrophilic amino acids by hydrophobic ones. Consequently, editing loss and restoration are expected to be part of a dynamic process. However, it is unclear whether such a process occurs.
Finally, plastid mRNA editing in other organisms such as plants has also been reported to generate substitutions that favor hydrophobicity in the protein and are related to the restoration of functional domains such as alpha-helices (Jobson and Qiu 2008; Yura and Go 2008; He et al. 2016; Koo and Yang 2020). Consequently, could hydrophobic constraints (or requirements) be the link between very different and evolutionarily unrelated mechanisms such as editing in plants and editing in kinetoplastids? Further analyses in more species, sequencing of mitochondrial genomes, and functional analyses may help answer such questions. In conclusion, based on our results, we propose a mechanism for a transition toward pan-edited mRNA in ancestral kinetoplastids directed by hydrophobic constraints in the protein structure and suggest a new hypothesis about the evolution of this complex system.

Sequences
The following mitochondrion-encoded sequences (genes or fully edited mRNAs): COX1, COX2, COX3, CYTB, NAD1, NAD5, NAD7, NAD9, RPS12, and ATP6 were downloaded from GenBank for kinetoplastids (nine species), diplonemids (two species), euglenids (one species), and jakobids (two species). A total of 119 sequences were analyzed. The different species and accession numbers are provided in supplementary material S1, Supplementary Material online. Noncoding regions were trimmed. Additional protein sequences of cytochrome c oxidase subunit III (COX3) were downloaded from the RefSeq database of NCBI (supplementary material S2, Supplementary Material online).

DNA-RNA Sequence Analysis
U content across the mRNA sequence was addressed by a unidimensional U walk (Berger et al. 2004). A line is used to represent the values, showing, for each position, an increase of one unit whenever there is a U and a decrease of one unit when there is another nucleotide. Consequently, it is possible to address both the overall U content (last point of the line) and the U content at different regions in the same graph. Positive slopes indicate more Us than other bases. Regions with the same slopes (parallels) indicate similar U content. The approach was used to address U content at each codon position. In addition, the standard deviation for the U content along every sequence in each codon position was determined using a nonoverlapping sliding window of 30 bases. The hypothesis of uniform distribution of the U content at different codon positions was evaluated by using a chi-square test. In addition, the hypothesis of uniform distribution of inserted Us at different codon positions was also evaluated. Inserted Us were inferred by a manual comparison between the fully edited mRNAs and the corresponding cryptogenes. To address purifying, neutral, and diversifying selection, the site-specific nonsynonymous and synonymous (dN and dS, respectively) substitution rates were inferred by using FEL (Kosakovsky Pond and Frost 2005) in the Datamonkey server (Weaver et al. 2018). The FEL method uses a maximum-likelihood approach to estimate the rates of nonsynonymous (dN) and synonymous (dS) substitutions at each site, given a coding alignment and its phylogeny. Then, a likelihood ratio test is used to test whether dN is significantly different from dS (Kosakovsky Pond and Frost 2005).
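The U walk and the test of uniform U distribution across codon positions can be illustrated with a minimal Python sketch. The function names are hypothetical, the input is assumed to be an in-frame coding sequence whose length is a multiple of three (so that equal expected counts per codon position are reasonable), and the chi-square test shown is a plain goodness-of-fit test that may differ in detail from the one used for the published tables.

```python
import math


def u_walk(seq):
    """Cumulative walk: +1 for each U, -1 for any other nucleotide."""
    walk, level = [], 0
    for base in seq.upper().replace("T", "U"):
        level += 1 if base == "U" else -1
        walk.append(level)
    return walk


def u_counts_by_codon_position(seq):
    """Count Us at the first, second, and third codon positions."""
    counts = [0, 0, 0]
    for index, base in enumerate(seq.upper().replace("T", "U")):
        if base == "U":
            counts[index % 3] += 1
    return counts


def chi_square_uniform(counts):
    """Goodness-of-fit test against equal counts at the three codon positions.

    Returns (statistic, p_value). With two degrees of freedom the chi-square
    survival function is exactly exp(-x / 2), so no external library is needed.
    """
    if len(counts) != 3:
        raise ValueError("expected one count per codon position")
    total = sum(counts)
    if total == 0:
        raise ValueError("the sequence contains no U")
    expected = total / 3
    stat = sum((observed - expected) ** 2 / expected for observed in counts)
    return stat, math.exp(-stat / 2)


if __name__ == "__main__":
    toy = "UUAUCGAUU"  # toy sequence, not real data
    print(u_walk(toy))
    print(u_counts_by_codon_position(toy))
    print(chi_square_uniform([30, 0, 0]))
```

The last value of the walk gives the overall balance of U versus other bases, while the slope over any stretch reflects the local U content, as described above.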

Protein Sequence Analysis
Hydrophobicity of different amino acid sequences was evaluated by using the Kyte-Doolittle index (Kyte and Doolittle 1982). The cumulative hydrophobicity was graphed in order to compare it against U walks. Amino acid composition was estimated by using MEGA v7 (Kumar et al. 2016). Amino acids were classified as order- and disorder-promoting according to the criteria in Dunker et al. (2001). HCA (Gaboriaud et al. 1987; Woodcock et al. 1992; Bitard-Feildel et al. 2018; Lamiable et al. 2019) was performed using the DrawHCA tool (http://osbornite.impmc.upmc.fr/hca/hca-form.html). Human homologs were identified by a Delta-Blast search.
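As an illustration of the cumulative hydrophobicity curve, the following Python sketch sums Kyte-Doolittle values along a protein sequence. The scale values are those of Kyte and Doolittle (1982); the function name, the rounding to one decimal, and the choice to skip unknown symbols are assumptions made here for the example.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cumulative_hydrophobicity(protein):
    """Running sum of Kyte-Doolittle values along the sequence.

    Symbols outside the 20 standard amino acids (for example X or *)
    are skipped, which is a simplification chosen for this sketch.
    """
    total, curve = 0.0, []
    for residue in protein.upper():
        if residue not in KYTE_DOOLITTLE:
            continue
        total += KYTE_DOOLITTLE[residue]
        curve.append(round(total, 1))
    return curve


if __name__ == "__main__":
    print(cumulative_hydrophobicity("LFV"))  # hydrophobic run: rising curve
    print(cumulative_hydrophobicity("ST"))   # hydrophilic run: falling curve
```

Like the U walk, the curve rises over hydrophobic stretches and falls over hydrophilic ones, so the two plots can be compared region by region.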

Protein Structure Analysis
Prediction of the protein structure was made using the RoseTTAFold server (https://robetta.bakerlab.org/submit.php) (Baek et al. 2021), and the structure was drawn with the software ChimeraX v1.2.5 (Pettersen et al. 2021). Coevolving sites were analyzed by using CoeViz2 (Corcoran et al. 2021) with a 20-amino acid alphabet and the Uniprot-Uniref90 database to build the multiple sequence alignment. Surface-exposed amino acids in the above structures and in protein structures of the complex I downloaded from NCBI Protein (accession: 5XTD) were identified by determining the solvent-accessible surface area (SASA) using the server of the Center for Informational Biology (http://cib.cf.ocha.ac.jp/bitool/ASA/). Amino acid positions exposed to the solvent were defined as those with an exposed area higher than 30%. In addition, contact surface sites between two proteins in the 3D model 5XTD of the respiratory complex were studied. Contact surfaces were determined by comparing the SASA of each of two proteins in the complex I individually against their SASA when the proteins are interacting according to the 5XTD model. Sites that showed any reduction in the SASA value when the proteins interact were considered to be located in surface contact areas. Hydrophobicity increases were determined by using the Kyte-Doolittle index. The number of sites for which hydrophobicity was higher in T. brucei proteins than in their human homologs was calculated for exposed, nonexposed, interacting, and noninteracting sites. They were compared by using a chi-square test.
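The site classification and comparison described above can be sketched in Python as follows. The per-site dictionaries are hypothetical inputs (the output format of the SASA server is not reproduced here), the function names are illustrative, and the chi-square test is a plain Pearson test on a 2 × 2 table without continuity correction, which may differ from the exact test settings used for the reported P values.

```python
import math


def exposed_sites(relative_area, threshold=30.0):
    """Return sites whose exposed area (percent) is higher than the threshold."""
    return {site for site, pct in relative_area.items() if pct > threshold}


def contact_sites(sasa_alone, sasa_in_complex):
    """Return sites whose SASA shows any reduction when the partner is present."""
    return {
        site
        for site, area in sasa_alone.items()
        if sasa_in_complex.get(site, area) < area
    }


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the survival
    function is erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Toy values only; they are not taken from the 5XTD model.
    print(exposed_sites({1: 45.0, 2: 30.0, 3: 12.5}))
    print(contact_sites({1: 80.0, 2: 40.0, 3: 10.0}, {1: 80.0, 2: 15.0, 3: 9.9}))
    print(chi_square_2x2(20, 10, 10, 20))
```

In such a table, one row could hold the exposed sites (more hydrophobic, not more hydrophobic) and the other row the nonexposed sites; the same layout applies to interacting versus noninteracting sites.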

Supplementary material
Supplementary data are available at Molecular Biology and Evolution online.