Diversity and metabolic energy in bacteria

Abstract Why are some groups of bacteria more diverse than others? We hypothesize that the metabolic energy available to a bacterial functional group (a biogeochemical group or ‘guild’) has a role in such a group’s taxonomic diversity. We tested this hypothesis by looking at the metacommunity diversity of functional groups in multiple biomes. We observed a positive correlation between estimates of a functional group’s diversity and their metabolic energy yield. Moreover, the slope of that relationship was similar in all biomes. These findings could imply the existence of a universal mechanism controlling the diversity of all functional groups in all biomes in the same way. We consider a variety of possible explanations from the classical (environmental variation) to the ‘non-Darwinian’ (a drift barrier effect). Unfortunately, these explanations are not mutually exclusive, and a deeper understanding of the ultimate cause(s) of bacterial diversity will require us to determine if and how the key parameters in population genetics (effective population size, mutation rate, and selective gradients) vary between functional groups and with environmental conditions: this is a difficult task.


Introduction
Universal patterns of diversity have played a strategically important role in classical ecology (Rosenzweig 1995 ), and microbial ecologists have sought, with vary degrees of success, to observe such patterns in microbial communities (Bell et al. 2005, Horner-Devine et al. 2004 ). We report a ubiquitous pattern in the diversity of bacterial functional groups.
A functional group (also called a guild or biogeochemical group) is a collection of bacteria that exploit the same redox couples. For example, ammonia and o xygen (ammonia-o xidizing bacteria) or glucose and sulphate (sulphate-reducing bacteria). We hypothesize that the energetic yield of a particular redox couple, directly or indir ectl y, contr ols the genetic div ersity of or ganisms belonging to the corresponding functional group.
Ther e ar e m ultiple r easons to suspect that ener getic yield (as opposed to the energy impinging on an environment) plays a role in differences in diversity (Curtis et al. 2008 ). Not only is energy one of the few concepts to have gen uine predicti ve po w er in micr obial ecology (Br oda 1977, Br own 2000, Ettwig et al. 2010, Harte and Newman 2014, Harte et al. 2008, Jetten et al. 1998, it is also held to underlie man y macr oecological patterns in classical ecology (Brown 2000, Harte and Newman 2014, Harte et al. 2008 ) and local diversity in hot springs (Dick and Shock 2013 ). Evolution is implicitly about the processing of information (Kimura 1961 ), and there is a link between energy and information in biology (Wolpert 2016, Kempes et al. 2017. Energy has also been hypothetically linked to the evolution of higher organisms (Lane and Martin 2010 ), though that reasoning has been challenged (Lynch and Marinov 2016 ).
To test this hypothesis, we must estimate three things: the metabolic ener gy av ailable to a member of a given functional group, the extent of bacterial diversity, and the assignment of taxa to functional groups.
The first, the energetic yield of a cell, can be r oughl y modelled from first principles (Rittmann and McCarty 2001 ). For any given functional group, the catabolic energy available to the cell ( G cat ) is calculated from the free energies of the redox couples exploited. Some of this energy is required for anabolism ( G ana kJ/mole biomass formed) (m uc h mor e if carbon is fixed), some for cellular maintenance, and some dissipated ( G dis ) during biomass synthesis . T he maxim um gr owth yield is a function of catabolism, anabolism, and dissipation (Maximum Yield = G cat /( G ana + G dis ) (Kleerebezem and Van Loosdrecht 2010 ) (Table 1 ). This is only an a ppr oximation because: (1) we must assume a single electron donor, (2) the actual energy depends on the environmental conditions (pH, solute concentr ation, temper atur e, and pr essur e), and (3) we neglect maintenance energy. Ho w ever, w e posit that this approximation is 'good enough' to test our hypothesis because we are looking at the differences over nearly 3 orders of magnitude (Table 1 ). T hus , though man y factors could impinge on this r elationship, they are unlikely to obscure a strong and consistent correlation, should it exist. For example, differences in environmental conditions will impact on free energy from any given redox couple. Ho w e v er, the effects ar e linear (driv en by temper atur e and the natural log of the ratio of the reactants and products) and modest when compared to the very large differences between redox couples . Likewise , most monomeric sugars or amino acids would have similar (but lo w er) free energies to one we have assumed. The values obtained depend on the stoichiometries emplo y ed, which are sho wn in the supplementary methods. G cat is the ener gy r eleased by the r edox couple expr essed per mole of electr on donor. G anab is the anabolic energy required to generate a mole of biomass from re presentati ve substrates for a given metabolism and may endergonic or exergonic. G dis r epr esents the energy dissipated in anabolism and catabolism.
Maintenance ener gy, whic h we hav e neglected, is of the order of 1-10 kJ/mole biomass and has a negligible (median < 1%) impact on the calculated yield in cultured heterotrophs. Maintenance energy is more important in organism lo w er energy redox couples, but maintenance energy 'in the wild' is thought to 3-6 orders of magnitude lo w er than the classical Tijhuis data. The second, the diversity of a bacterial community, is usually determined from the variation and proportional abundance of a ubiquitous and conserved molecular marker, typically the gene for the 16S RNA molecule. Bacterial diversity is often thought of as having se v er al dimensions: the two most pr ominent being the absolute number and distribution ('e v enness') of the taxa. The practical and theoretical barriers to determining the absolute diversity or the true distribution are formidable. Consequently, microbial ecologists use a wide range of imperfect indices and estimators instead. We have chosen a robust estimator that is theor eticall y suited to the hypothesis we ar e addr essing. In pr actice, div ersity measur es ar e typicall y highl y corr elated (Walters and Martiny 2020 ).
The third thing we m ust infer, whic h taxa belong to which functional group, is the most problematic. The basic concept is simple one must compare the taxa-in our case, the genus-observed in an environment with a database [such as KEGG (Kanehisa et al. 2023 )] of organisms of known function. There are a number of tools for doing this (Wemheuer et al. 2020 ). Errors will occur, for example, if the databases are imperfect (which is inevitable, all databases are biased to w ar ds cultured organisms) or the taxon observed has no known function (this is also inevitable, especially for less well-studied functions). Furthermore, some judgement is r equir ed when using a database like KEGG to infer function. For example, KEGG reports that the canonical ammonia oxidizer Nitrosomonas europaea has genes for dissimilatory sulphate reduction. The best results are probably best obtained when the reference database is calibrated for a given environment, but this may not be a ppr opriate when, as in this study, multiple environments ar e consider ed. We ther efor e elected to categorize our genera 'by hand' using Bergey's manual (Whitman et al. 2015 ).
Ob viousl y, the effect of an error in any one of these three areas would be to obscure the hypothesized relationship between metabolic energy and bacterial diversity. T hus , if we are able to discern an ener gy-div ersity r elationship, that r elationship is pr obably a robust one.
We tested the hypothesized ener gy-div ersity r elationship using classical methods to determine energetics, the 16S RNA gene to e v aluate div ersity, and a simple classification method to infer function.

Assignment to functional groups
Functional group assignments were determined initially by refer ence to Ber ge y's Man ual of Systematic Bacteriology (Whitman et al. 2015 ), follo w ed b y m ultiple r ounds of manual cur ation. All members of a genus are assigned the same metabolism, and those for which a single classification would be unrealistic have been remov ed fr om the anal ysis. In all 964 gener a wer e classified, and the classifications can be inspected on the Github page ( https: // github.com/beadyallen/ EnergyEvoRate ) in the 'Intro' file.
Facultativ el y, aer obic fermenters ha ve , for the purposes of this study, been grouped along with aerobic heterotrophs. We recognize that in doing so, there is a danger of undue weighting of classifications in favour of higher ener gies, whic h would affect the balance between aerobic heterotrophs (over-representation) and fermenters (under-r epr esentation). Furthermor e , we ha ve elected to exclude the lar ge pr okaryotic gr oup of or ganisms ca pable of phototr ophic gr owth, on the gr ounds that the ener gy model emplo y ed does not r eadil y extend to photosynthesis . We en visage that extensions to this work would include more detailed classification of a taxa's metabolic category through reference to metabolic pathwa y databases .

Energetics
Fr ee ener gies of anabolism and catabolism wer e calculated (for pH, 298.15 K, 1 M, and 1 bar) using the methods described pr e viousl y (Gonzalez-Cabaleir o et al. 2015 ). The Python code and v alues used ar e online ( https:// github.com/beadyallen/ EnergyEv oRate under the metabolics tab). The value of G cat is given by the change in free energy between reactants and products in the r ele v ant complete catabolic reaction, per mole of electron donor. Aer obic heter otr ophs ar e assumed to pr ocess a complete gl ycolysis to tricarboxylic acid cycle conversion to CO 2 and H 2 O, while anaerobic fermenters are assumed to use glucose as the primary substrate in a mixed acid fermentation pathway with acetate and H 2 being produced in stoichiometric quantities . We ha ve also determined the yields for homolactic , heterolactic , and Sticklandtype fermentations and found all values to be similar. While an alternativ e c hoice of fermentation pathway for a particular taxon would lead to slightly different results, we do not believe the overall results of this study will be alter ed. Giv en the difficulty in reliably assigning a single functional group at the genus le v el, we maintain our a ppr oac h is a 'good enough' a ppr oximation.
Anabolism is calculated as a combination of both 'true' anabolic free energy (per mole of new biomass), G ana , and G disthe energy dissipated to the surroundings. For the calculation of G ana , the unit of biomass is taken to be CH 1.8 O 0.5 N 0.2 (Roels 1983 ). G dis is calculated using the Gibbs Dissipation Method (Heijnen et al. 1992 ).
The maximum energetic was calculated and expressed as a ratio.

Maximum energetic yield
Our a ppr oac h is necessaril y simplified and idealized for two r easons. Firstl y, standard physiological conditions will not necessaril y r epr esent the conditions in nature either now or over evolutionary time, and secondly, a single r epr esentativ e or ganic carbon source (glucose) was used in place of the unknown and unknowable range of organic matter bacteria may consume.
To partiall y v alidate our estimates of yield, the modelled yield for aerobic heterotrophs (13.1 j/j) was compared with the empirical yields reported by DeLong et al. ( 2010 ). De-Long et al. ( 2010 ) recorded the maximum specific growth r ate in r ecipr ocal time (d −1 ), wet mass (g), and metabolic rate (W) for 33 cultured bacterial species growing on glucose. We calculated the specific substrate uptake rate by dividing the active (as opposed to basal) metabolic rate of the cell Watts (Js −1 ) by the mass (g) and converting the time unit from seconds to da ys . On this basis, we could determine the energetic yield (gJ −1 ) by dividing the maximum specific growth rate by the specific substrate rate. De-Long made an identical calculation and used the term efficiency of biomass production for energetic yield. To convert the yield from g per joule to joules per joule, we multiplied by 0.2 (to allow for dry weight) (Makarie v a et al. 2005 ) and assumed 22 523 joules per gramme of dry weight (Pr oc hazk et al. 1970 ) (originally 5383 cal/g). The yields are lognormally distributed, and the geometric mean value of 13.47 j/j of the De-Long data are very close to the theoretical value (13.1 j/j) for yield on glucose (Supplementary Fig. S1).

Data sources
The pr ojects wer e selected fr om source biomes in the European Bioinformatics Institute MGnify ( https:// www.ebi.ac.uk/ metage nomics/), Operational Taxonomic Unit tables were downloaded and extracted using a Python script ( https://github.com/beadyal len/EnergyEvoRate ). At the time of r etrie v al, all EBI pr ojects wer e generated by version 2 of the EBI metagenomics portal. Local sam-ple collections were processed through a standard QIIME pipeline (1.9.1) with open r efer ence OTU pic king according to 97% similarity with the Green Genes database (13.5) and classified against a database derived from Berge y's man ual, the database is to be found here: ( https:// github.com/beadyallen/ EnergyEvoRate ).
The sources for the biomes were as follows.  (Han et al. 2018 ). The joining of a hydrothermal vent and a volcanic soil was a mistake (see no 'cherry picking' below'). The Tara Ocean Survey (Tara ERP001736) (Pesant et al. 2015 ) was used as an independent sample.
To avoid 'cherry picking', an EBI 16S dataset, once chosen, could not be withdr awn fr om the study. Ho w e v er, an attempt to use the GreenGenes database (DeSantis et al. 2006 ) to e v aluate the phylogenetic diversity has been recorded in supplementary methods (Supplementary Figs. S3 and S4). We are not relying on this data to support our argument because an informal r e vie wer pointed out the database was out of date, and we (subsequentl y) r ealized that suc h databases ar e biased to w ar ds cultur ed or ganism so ar e not random samples of the diversity of all functional groups.

Di v ersity estimation
All data analysis was performed using R (v3.3.1), and detailed scripts are available for download at ( https://github.com/beadyal len/EnergyEvoRate ). Briefly, alpha diversity (inverse Simpson's Index) was determined using the estimate richness function in the v egan pac ka ge. Metacomm unity div ersity ( θ) was used as a proxy for gamma diversity across metabolic groups and determined for each metabolic group using NMGS code (Harris et al. 2017 ) ( ht tps:// github.com/beadyallen/ EnergyEvoRate ). This method calculates the size of the metacommunity that would be required to explain the diversity observed locally. The θ is a compound parameter and is (theor eticall y) equal to the number of individuals in the metacommunity multiplied by 2 times the speciation rate. We do not assume that communities are neutrally assembled. Other The median value for the significant slopes is 0.95. div ersity measur es can be c hosen using this softwar e and giv e compar able r esults . T he complete GreenGenes 13.5 annotated OTU tree was pruned using functions from the ape package to include onl y gener a r ele v ant to our anal ysis , lea ving 189 743 tips . Phylogenetic diversity was determined as the sum of br anc h lengths within each metabolic group. The data for phylogenetic distance (in the GreenGenes dataset), yield, and catabolism were subject to Bo x-Co x tr ansformations, and a natur al log tr ansformation was selected. The energetic yield data used with the Green-Genes data was slightly better supported by a X −0.5 transformation, but the natural log transformation was preferred for simplicity and clarity and still gav e normall y distributed r esiduals after r egr ession a gainst phylogenetic distance.

Results
T he catabolic , anabolic , and dissipated ener gies wer e determined for eac h r edox couple (under standard physiological conditions) and used to calculate the energetic yield (Table 1 ). Catabolic energy accounts for most of the differences in energetic yield between functional groups . T he model is necessarily a generalization that neglects many important metabolic details . Moreo ver, the r elativ e importance of these other factors will increase as the ener gy av ailable to the cell decr eases. But the comparison with the DeLong data (S1) sets suggests we have correctly estimated the median yields and that the other factors are incorporated into the normally distributed noise. The diversity of individual metabolic groups in each biome was calculated using both the Inverse Simpson's diversity index and metacomm unity div ersity ( θ ), with r ankings of div ersity acr oss metabolic groups being consistent across biome types (Friedman's test P = 9.3 × 10 −5 ).
We initially examined the relationship between the estimated energetic yield and metacommunity diversity in w astew ater treatment plants (Fig. 1 ). Intrigued by the relationship we found, we sought to compare this finding with other biomes to see if it was observed more widely; it was.
There was a good relationship (Fig. 2 ) between the estimated energetic yield and metacommunity diversity in all biomes (slope 0.73, P < 0.001; adjusted R 2 = 40%), with residuals being normally distributed.
The ener gy-div ersity r elationship was significant for all the individual biomes with the exception of the volcanic dataset, where we had accidentally mixed a deep-sea hydrothermal vent with a soil ( Table 2 ). The range of slopes was a r elativ el y narr ow ( ∼0.6-1).
Analogous results can be found by comparing the different div ersity measur es with the catabolic ener gy ( G cat ) (Supplementary Fig. S2). Mor eov er, though metacomm unity div ersity was c hosen as the most a ppr opriate measur e of diversity, other measures of 'richness' can be examined using the software.
We considered that some, or all, of the observed energydiv ersity r elationship could pr osaicall y be attributed to the greater metabolic versatility of functional groups guilds with a range of complex substrates (aerobic heterotrophs, fermenting organisms , sulphate reducers , and iron reducers).
We ther efor e anal ysed the r elationship between ener gy and diversity, examining those functional groups purported to have simple (typically a single) substr ates. A ne w dataset was chosen (the EBI-Tara Oceans) (Sunagawa et al. 2015 ). The relationship between energetic yield and the metacommunity diversity of functional groups with simple substrates was not statistically significant ( Supplementary Fig. 3A). Ho w e v er, a cr edible ( P = 0.02, R 2 = 76%) relationship was found between catabolic energy and metacomm unity div ersity (Fig. 3 B).
Methylotr ophs wer e an ob vious outlier. Our anal ysis distinguishes between the true methanotr ophs (gener a that, as a first ste p, o xidize methane to methanol) and methylotrophs (those deriving ener gy fr om extr acellular methyl compounds suc h as methanol). It could be argued that combining into a single guild makes the most sense. Ho w e v er, since post -hoc r emov al of methylotr ophs fr om the anal ysis could be construed as ' P -hac king', we have elected to retain a separate methylotroph classification.

Discussion
We conclude that there are universal and systematic differences in the diversity of functional groups (on the basis of the consistent r anking in div ersity) and that these differ ences ar e dir ectl y or indir ectl y linked to the metabolic energy of the cell (on the basis of the observ ed corr elations between ener getic yield and diversity). This ener gy-div ersity r elationship is, potentiall y at least, analogous to the patterns in diversity and space seen in classical ecology. Such ubiquitous patterns are imbued with significance because they tell 'us that diversity is a predictable variable susceptible to scientific analysis' (Rosenzweig 1995 ).
We have considered the alternative interpretation: that the putativ e ener gy r elationship was simpl y attributable to the ele v ated diversity of the heterotrophs and variation in the diversity of the low-energy taxa is just noise . T he apparent persistence of our proposed pattern in a separate 'single resource' marine data set (from the TARA study) suggests there is an energy pattern in the nonheter otr ophic taxa. Finall y, we attac h particular significance to the repeatability of the overall pattern across biomes and data sets and the r elativ e consistency of the ener gy-div ersity slopes within biomes.
We note that this pattern was discerned e v en though our methodology had many imperfections. Our study was potentially limited by: the number of functional groups for which we can propose a redox couple, the approximations of free energies, and difficulties in the attribution of function and the estimation diversity (Pielou 1977 ) (Suzuki and Giovannoni 1996 ). We hope that subsequent studies, with impr ov ed methodologies, could yield an e v en better picture of the relationship between metabolic energy and diversity.
Ho w e v er, the true benefit of a finding a consistent pattern in nature comes not from identifying the pattern, but from the search for, and discovery of the underlying mechanism or mechanisms. Consequently, it would seem sensible to ask what mechanisms might underlie the a ppar entl y ubiquitous r elationship between metabolic energy and bacterial diversity.
Classical ecological theory suggests that environmental variation plays a central role in the generation (Levins 1968 ) and maintenance of diversity (Tilman 1982 ). Microbial ecologists typically concur (Cohan 2017 ). If environmental variation explains our Figur e 2. T he relationship between the estimated metacommunity diversity in all biomes and the energetic yield. The ov er all slope is significant (slope = 0.73, P slope < 0.001; adjusted R 2 = 40%). Values for individual biomes are given in Table 2 .
Figur e 3. T he relationship between the meta-community diversity of functional groups in the EBI-Tara dataset thought to have simple substrates and: (A) energetic yield (not significant) (B) catabolic energy a ( P = 0.02, R 2 = 79%). findings, then there is an energetic limit on the amount of envir onmental v ariation and thus the number of ecotypes or niches. This is easy to conceive for functional groups using organic matter as an electron donor (aerobic heterotrophs, fermenter, ir on-r educing bacteria, and sulphate-r educing bacteria), wher e the breadth of viable organic electron donors and end products (Gr oßk opf and So y er 2016 ), and thus the range of ecotypes, could be dictated by the redox value of the electron acceptor. For example, some substr ates ar e onl y ener geticall y viable in the presence of oxygen. For bacteria with single substrates the energy could still affect the range of ecotypes by placing constraints on the r ange of ener geticall y demanding envir onments that ar e viable (Oren 2011 ).
If variation did scale with ener gy, ther e could be a signal in the m utation r ates of the bacteria. A number of theor eticians (using a variety of simplifying assumptions) have proposed that the rate of mutation will come into balance with the rate of selection or envir onmental v ariation (Dawson 1998, Gillespie 1981, Kim ur a 1967, Leigh 1970, Orr 2000, Sturte v ant 1937. Though there is only circumstantial evidence for this proposal (Drake et al. 1998 ). But if the theories were correct and environmental variation did underlie our observations, we would expect mutation rates to scale positiv el y with metabolic energy.
We think it is instructive to at least consider other explanations. Not least because attempts to experimentall y demonstr ate a relationship between bacterial diversity and envir onmental v ariation hav e pr e viousl y failed in both comm unities (Pholc han et al. 2010 ) and experimental bacterial populations (Jasmin and Kassen 2007 ).
A very simple explanation is that the number of generations contr ols div ersity, and gener ation times could be affected by metabolic energy. Ho w ever, the actual number of generations will depend on the amount of redox donor and acceptor available (all of which can vary between biomes) and the age of the lineage (some of the oldest lineages are the least diverse). More generally, generation time does not appear to be the universal and consistent mec hanism r equir ed to explain our observ ations (Gibson and Eyre-Walker 2019 ).
Others hav e pr e viousl y conjectur ed that m utational load (which has an absolute as opposed to relative effect on growth rate) would be felt more acutely by bacteria with less available energy (Curtis et al. 2008 ). Ho w ever, in classical population genetics, the mutational load is dictated by the rate of mutation and not the deleterious effect of the mutations per se (Haldane 1937 ). Though dynamic models of the evolution of mutation have found that the deleterious effect of a mutation can influence mutation rate, the effect is very modest and only apparent at certain parameter v alues (Andr e and Godelle 2006, Johnson 1999. In any case, div ersity r eflects the fixation of m utations, not simpl y the r ate at which they are generated.
The small but unavoidable energetic cost of the loss of entropy that information stor a ge entails could be responsible for the differences in diversity we have observed. In principle, the costs of entropy do place an absolute upper limit on the rate of information processing in a cell and a microbiome (Wolpert 2016, Kempes et al. 2017. In practice, the costs-though not well understoodappear to be very small indeed, and so we, for the time being, discount this mechanism. A simple and testable explanation is also offered by the drift barrier: only alleles with a selection coefficient greater than the po w er of drift ( ∼1/N e ) where N e is the effective population size) are subject to selection and thus fixation (L ynch 2012 , L ynch et al. 2016 , Sung et al. 2012 ). If the metabolic energy affected either the effective population size or the r elativ e impact of selection on fitness, this could lead to a relationship between diversity and energy.
It is difficult to distinguish between these explanations . T he pr oposed mec hanisms ar e not m utuall y exclusiv e . T he generation of bacterial diversity is perhaps subject to a hierarchy of controls. For example , en vir onmental v ariation ( ∼nic hes) but could be important in generating diversity, but within a limit set by the drift barrier. To gain a deeper understanding, we need to determine how the fundamentals of population genetics (effective population size, mutation rates, and selection coefficients) vary between functional groups and with environmental conditions. Determining these fundamentals will not be easy, but the quantitative description of determinants of, and limits to, natural selection in micr oor ganisms is a w orthwhile endeav our (Gould and Lewontin 1979 ).