Island biogeography theory provides a plausible explanation for why larger vertebrates and taller humans have more diverse gut microbiomes

Abstract Prior work has shown a positive scaling relationship between vertebrate body size, human height, and gut microbiome alpha diversity. This observation mirrors commonly observed species area relationships (SARs) in many other ecosystems. Here, we expand these observations to several large datasets, showing that this size–diversity scaling relationship is independent of relevant covariates, like diet, body mass index, age, sex, bowel movement frequency, antibiotic usage, and cardiometabolic health markers. Island biogeography theory (IBT), which predicts that larger islands tend to harbor greater species diversity through neutral demographic processes, provides a simple mechanism for positive SARs. Using a gut-adapted IBT model, we demonstrated that increasing the length of a flow-through ecosystem led to increased species diversity, closely matching our empirical observations. We delve into the possible clinical implications of these SARs in the American Gut cohort. Consistent with prior observations that lower alpha diversity is a risk factor for Clostridioides difficile infection (CDI), we found that individuals who reported a history of CDI were shorter than those who did not and that this relationship was mediated by alpha diversity. We observed that vegetable consumption had a much stronger association with CDI history, which was also partially mediated by alpha diversity. In summary, we find that the positive scaling observed between body size and gut alpha diversity can be plausibly explained by a gut-adapted IBT model, may be related to CDI risk, and vegetable intake appears to independently mitigate this risk, although additional work is needed to validate the potential disease risk implications.


Introduction
From birth, vertebrate animals are colonized by a diverse community of commensal microbiota that they carry throughout their lives [1]. The vast majority of these commensal microbes reside in the gastrointestinal tract [2]. The human gut microbiota has an enormous impact on our phenotype [3], with almost half of the metabolites circulating in blood significantly associated with cross-sectional variation in the ecological composition of the gut microbiome [4,5]. One of the key ecosystem functions that the gut microbiota provides to its host is resistance to enteric bacterial pathogens [6]. Niche saturation or nutrient competition are commonly invoked mechanisms for how the microbiota excludes invaders [6,7]. Specifically, species-diverse commensal communities are more apt to saturate available metabolic niches so that an invasive pathogen is less likely to colonize, outcompete commensals, and cause disease [6][7][8]. Though many determinants of gut microbiome alpha diversity (i.e. taxon richness and/or evenness in a given sample) are known, including diet, intestinal transit time, and antibiotic treatment, much of the variation in gut alpha diversity remains unexplained [9][10][11][12].
Vertebrate body size, which varies over six orders of magnitude, has been shown to be positively associated with gut microbiome alpha diversity, indicating that larger animals with larger guts harbor more species [13]. This prior work used an older method for quantifying alpha diversity that involved rapidly transitioning single-stranded DNA from 16S rRNA gene amplicons from warmer to colder temperatures so that single strands quickly fold into unique shapes that are determined by their primary sequence. These unique, negatively charged single-stranded DNA strands migrate at different rates through a capillary tube during electrophoresis, with each band representing a different taxon, and laser-induced fluorescence detection allows for band/taxon quantification [13,14]. Similarly, recent work from a large human cohort, using 16S rRNA gene amplicon sequencing, showed a positive association between height and gut microbiome alpha diversity [15]. We were curious to understand the mechanism(s) underlying these scaling patterns.
Body size versus gut microbiome diversity scaling patterns mirror similar species area relationships (SARs) seen in other ecosystems, where larger geographic areas correspond to more observed taxa [16][17][18]. The mechanisms underlying SAR-like behavior in the gut, and the potential clinical consequences of this scaling, have yet to be explored. Prior literature has shown that simple neutral processes are able to replicate certain features of gut microbiome community dynamics [19,20]. Island biogeography theory (IBT), a classic neutral model of species immigration/emigration, birth/death, and speciation/extinction, predicts a positive SAR [21]. Specifically, larger islands tend to harbor more individuals, which ultimately gives rise to a larger number of coexisting species [21]. We hypothesized that IBT may explain the size-diversity relationships observed across vertebrates and within humans. To further explore this hypothesis, we built an individual-based model (IBM) that approximates IBT in the gut, allowing for variation in system length, with immigration, emigration, reproduction, and a unidirectional flow-through system. We simulated length ranges that approximate the scaling of vertebrate body sizes and human height variation, with residence time (i.e. the product of the flow rate and the length of the system) as our proxy for area, and compared these simulation results to our empirical observations.
The potential clinical significance of this scaling pattern lies in the fact that lower gut microbiome alpha diversity has been associated with greater susceptibility to enteric infections [8,22]. As mentioned above, the mechanism for this relationship between infection risk and diversity is likely related to the niche saturation hypothesis, which posits that a more diverse commensal microbiota protects the host from enteric pathogens through competitive exclusion [7,23,24]. Clostridioides difficile, an opportunistic enteric pathogen, is the most common cause of hospital-acquired colitis in the USA, and susceptibility to this disease is strongly related to common diversity-reducing disruptions to the commensal gut microbiota, such as antibiotic treatment or diarrhea [8,25,26,27]. Clostridioides difficile infections (CDIs) are initially treated with oral antibiotics, and recurrent illness is common, especially in those starting out with lower gut microbiome alpha diversity [8,28,29]. We hypothesized that, due to the scaling between height and gut microbiome alpha diversity, individuals who report a history of enteric infection may be slightly shorter, on average, than those who do not.
Overall, we demonstrate a consistent scaling between body size and gut microbiome alpha diversity across vertebrates and human populations. We find that this association is independent of many potential confounders, like diet, bowel movement frequency (BMF), body mass index (BMI), age, and sex. We provide a plausible mechanistic hypothesis for this size-diversity scaling, based on a gut-adapted IBT. Finally, we show evidence that this scaling may be relevant to human health, although additional validation will be necessary to assess the clinical relevance of this observation.

Published datasets
In order to investigate the relationship between vertebrate body size and gut microbiome alpha diversity, we used three independent datasets: Godon et al. 2016, Song et al. 2020, and Groussin et al. 2017 [13, 30, 31].
The Godon et al. 2016 dataset included pre-calculated Simpson's diversity [13], derived from capillary electrophoresis single-strand conformation polymorphism (CE-SSCP) fluorescence patterns of DNA amplicons from the V3 region of the 16S rRNA gene [32], and body mass for 71 vertebrate species, where bacterial diversity was assessed by extracting DNA from feces. These samples were obtained from captive or domesticated vertebrates in France from either zoos, farms, aquariums, recreational farms, or individual keepers. The metadata for these samples included information about the individual's diet type and mass, which were curated from literature for the smaller species or provided by the breeders for the larger species.
Song et al. 2020 included 16S rRNA gene amplicon sequencing data derived from 1373 samples from 164 vertebrate species' fecal samples, intestinal contents, or aspirations of the large intestine. Samples that showed signs of contamination from soil or environmental bacteria, from juveniles/newborn individuals, and from diseased individuals were removed. Moreover, duplicate samples from the same individual were removed and samples were included only if the host species had been sampled at least twice. ASVs that were not of bacterial origin were removed. Samples with no information on country of origin and/or information on the preservative used were removed. This dataset included fecal samples from vertebrates living in wild and captive populations. The metadata for this dataset included the body mass and diet type of the vertebrate. We restricted our vertebrate analyses to carnivores and herbivores, removing insectivores and omnivores, so that diet could be treated as a binary categorical variable and its effect estimated more clearly.
Groussin et al. 2017 includes 16S rRNA gene amplicon sequencing data derived from fecal samples from 33 mammalian species. The operational taxonomic unit (OTU) table was downloaded from Groussin's publicly available GitHub repository "MammalianGuts" and had been processed as described in Groussin et al. 2017 (https://github.com/mgroussi/MammalianGuts). Animal masses were not included from this study, but average masses for each species were manually curated from literature and from the AnimalTraits Database, and these masses are now deposited in the GitHub repository for this paper (see data availability section). These samples were taken from both wild and captive populations of vertebrates.
The data collected from the Arivale cohort included height, BMI, sex, BMF, vegetable consumption frequency, and 16S rRNA gene amplicon data derived from stool samples. We used the mbtools workflow (https://github.com/gibbons-lab/mbtools) to denoise the 16S rRNA gene data. Error-model training and removal of chimeric reads using DADA2 were performed separately for each sequencing run to generate amplicon sequence variants (ASVs) for each sample [58].
The American Gut cohort consisted of self-selected adults participating in a citizen-science program, primarily from the USA, the UK, and Australia [9,59]. We downloaded the American Gut data from figshare (https://doi.org/10.6084/m9.figshare.6137192.v1), with the metadata including self-reported height, weight, sex, age, BMF, CDI history, and vegetable intake frequency. We used the self-reported height and weight to calculate BMI for each participant. The sub-operational taxonomic unit (sOTU) table was generated by the original American Gut authors using Deblur [60], and all reads were trimmed to a length of 125 nucleotides. Moreover, because samples had been sent via mail at room temperature, samples with obvious bacterial blooms that occurred in the sample tubes were removed. Samples were rarefied to 1250 reads and samples with <1250 sequences were not included in the sOTU table. Subsequent filtering on the data was performed in order to avoid including erroneous self-reported data: we filtered out individuals who were <18 years old; we removed samples from participants who reported a height >244 cm or a height <122 cm; and we removed participants who reported a weight ≥300 kg.
The Arivale and American Gut cohort metadata included equivalent information about each individual's age, sex, height, and BMI. However, the way that BMF was recorded for the American Gut cohort was different from how it was recorded for the Arivale cohort. In the Arivale cohort questionnaire, participants were prompted to respond to: "I have bowel movements" with the options: "2 or fewer times per week," "3-6 times per week," "1-3 times daily," and "4+ times daily." The American Gut bowel questionnaire prompted individuals to respond to: "How many times do you have a bowel movement in an average day?", where individuals could choose from the options: "Less than one," "one," "two," "three," "four," or "five or more." In addition, dietary reporting questions were slightly different across cohorts. American Gut cohort participants were prompted to report their weekly dietary intake of vegetables by responding to the prompt: "In an average week, how often do you consume at least 2-3 servings of vegetables, including potatoes in a day? (1 serving = 1/2 cup vegetables/potatoes; 1 cup leafy raw vegetables)" and chose from the following responses: "a. Never," "b. Rarely (less than once/week)," "c. Occasionally (1-2 times/week)," "Regularly (3-5 times/week)," and "e. Daily." Individuals in the Arivale cohort were prompted to report their weekly dietary intake of vegetables by responding to the prompt: "How many servings of vegetables (including juice) do you have each day? (1 serving = 1 cup raw/leafy vegetables, 1/2 cup cooked vegetables, or 1/2 cup vegetable juice)" and chose from the following responses: "Zero/less than 1 per day," "1," "2-3," "4-5," or "6 or more." We standardized BMI, age, log-Simpson's diversity, and log-height by calculating their Z-scores (Z = (x − mean(x)) / std(x)) within each cohort prior to statistical analyses. We categorized individuals in each human cohort into "high" and "low" vegetable consumption groups when binning the two groups in separate regressions (Fig. 2C-D). Individuals in both the American Gut and Arivale cohorts were categorized into the high vegetable consumption group if they reported at least 1 serving of vegetables a day, with anyone reporting less being included in the low vegetable consumption group.
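As an illustration of the standardization and binning described above, the following minimal sketch computes Z-scores of log-height and assigns vegetable consumption groups. The arrays are hypothetical stand-in values, not study data, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def zscore(values):
    """Standardize a 1-D array: Z = (x - mean(x)) / std(x)."""
    x = np.asarray(values, dtype=float)
    # Assumption: population standard deviation (ddof=0), NumPy's default
    return (x - x.mean()) / x.std()


# Hypothetical heights in cm; log-transform first, then standardize
heights_cm = np.array([152.0, 165.0, 170.0, 178.0, 191.0])
log_height_z = zscore(np.log(heights_cm))

# Hypothetical daily vegetable servings; "high" means at least 1 serving per day
servings_per_day = np.array([0, 1, 2, 0, 4])
veg_group = np.where(servings_per_day >= 1, "high", "low")

print(log_height_z.round(3), veg_group)
```

In a real analysis the standardization would be applied separately within each cohort, as stated in the text.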

Sequencing data processing and diversity calculations
We used QIIME 2 (version 2022.8) to rarefy our ASV and OTU tables and to compute the Simpson's diversity metric across samples. The rarefaction depth (i.e. the minimum sampling depth) for each dataset was as follows: Song et

Simulations
All of our simulated digestive tracts had identical parameters for the shape of the lognormal meta-community population distribution (lgp = 0.9999), reproduction rate (r = 0.01), and immigration rate (im = 25). Our decision to sample from a lognormal population distribution was meant to approximate the heavy-tailed species abundance distribution (SAD) that has been observed in the gut microbiome. Furthermore, lognormal SADs have been predicted by Hubbell's unified neutral theory of biodiversity [61] and have been successfully applied to describe real-life SADs of both microbial and non-microbial communities [61][62][63][64][65]. For each individual in our simulated system, a species ID (represented by an integer) and an x-coordinate (x-coordinate = 1) were assigned during the immigration process. For each individual simulation, the length of the system was randomly determined using the numpy.random module (version 1.23.3). The reproduction rate determined the percentage of the population randomly selected for reproduction per time step. In our simulations, 1% of the population was randomly selected for reproduction and the resulting progeny inherited the species ID and current x-coordinate of their mother cell. The immigration rate determined the number of new individuals sampled from the meta-community that entered the system at each time step, and individual species IDs were determined according to a lognormal distribution of integers (i.e. there were more 1s than 2s, more 2s than 3s). At each time step, each individual moved one unit down the length of the system relative to their current position during a process defined as "flow." If an individual's x-coordinate was larger than the length of the system, their species ID and x-coordinate were deleted from the model. Apart from receiving a unique ID, each species was functionally identical (i.e. this is a purely neutral model).
Once these parameters were determined, a simulation was initialized, in which the three processes (immigration, reproduction, and flow) were carried out at each time step. The order in which these processes took place was randomized for each time step to remove systematic biases. Simulations continued iterating through these processes until the number of individuals and the number of species reached stationarity, which was determined using an Augmented Dickey-Fuller test from the statsmodels package (version 0.13.5) [66] after at least 1000 time steps had been completed. When stationarity was reached, the simulation stopped and the steady-state Simpson's diversity was calculated based on the individuals and species in the system and printed to a CSV file at a location which can be defined by the user in main.py (see code availability section).

Statistical analyses
Ordinary Least Squares (OLS) regression (statsmodels 0.13.2) was used to test the association between vertebrate body size and gut microbiome alpha diversity, using the formula: log-Simpson's diversity ∼ log-mass (kg). We used the numpy package to log transform the Simpson's diversity (1/D) and the mass (kg) for all vertebrate datasets. To further investigate the role of diet and the interaction between diet and body mass, we also carried out another OLS regression using the formula: log-Simpson's diversity ∼ log-mass (kg) + diet + diet:log-mass (kg). Similarly, the association between height and gut microbiome alpha diversity in the Arivale and the American Gut cohorts was tested using OLS regression, first with a univariate model, with the formula: log-Simpson's diversity ∼ log-height, and secondly with a multivariate model with the formula: log-Simpson's diversity ∼ log-height + age + sex + vegetable consumption + BMI + BMF. In order to assess the significance of height in the OLS models, we used an analysis of variance (ANOVA) F-test (statsmodels 0.13.2) to compare a reduced model with the formula: log-Simpson's diversity ∼ age + sex + vegetable consumption + BMI + BMF to the full model, which includes height, described above. For our health-adjusted analysis in the Arivale dataset, we added multiple health markers measured from participant blood plasma as covariates in our OLS regression, including low-density lipoprotein (LDL) cholesterol, C-reactive protein (CRP), and hemoglobin A1C (HbA1C), excluding individuals who reported antibiotic usage within the last 3 months. We used the following formula: log-Simpson's diversity ∼ log-height + age + sex + vegetable consumption + BMI + BMF + vegetable consumption:log-height + LDL-cholesterol + CRP + HbA1C. In the American Gut cohort, we performed a Welch's t-test (scipy 1.9.1) with Bonferroni correction (correcting for three tests) to assess the mean difference in Simpson's diversity between individuals with and without a history of CDI, as well as within the groups "low vegetable intake" and "high vegetable intake." Similarly, we used a Welch's t-test with Bonferroni correction, correcting for three comparisons per hypothesis, to compare the mean heights of individuals with and without a history of CDI in the entire cohort and within the low and high vegetable intake groups. We defined individuals with high vegetable intake as individuals who ate vegetables at least once a day. Individuals who ate vegetables less than once a day were placed in the low vegetable intake category. We conducted our causal mediation analyses in R, using the "mediation" package (version 4.5.0) [67], with height or diet as the treatment, alpha diversity as the mediator, and CDI history as the response. The significance threshold for all tests was set at P < .05.
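The Welch's t-test with Bonferroni correction described above can be illustrated with a short sketch. The two groups below are synthetic draws from normal distributions, not study data, and the variable names are hypothetical. Welch's test is obtained from `scipy.stats.ttest_ind` by setting `equal_var=False`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in values (e.g. standardized diversity), not study data
cdi = rng.normal(loc=-0.3, scale=1.0, size=138)
no_cdi = rng.normal(loc=0.0, scale=1.2, size=2000)

# Welch's t-test: does not assume equal variances between groups
t_stat, p_raw = stats.ttest_ind(no_cdi, cdi, equal_var=False)

# Bonferroni correction for three tests: multiply by 3 and cap at 1
n_tests = 3
p_bonf = min(p_raw * n_tests, 1.0)

print(f"t = {t_stat:.2f}, raw P = {p_raw:.3g}, Bonferroni P = {p_bonf:.3g}")
```

Multiplying the raw P value by the number of tests is equivalent to comparing the raw P value against a threshold of .05 / 3.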

Data and code availability
All code, notebooks, and intermediate data files related to data analysis and IBM simulations can be found in the following GitHub repository: https://github.com/Gibbons-Lab/IBT-and-the-Gut-Microbiome. Raw data from the Godon et al. 2016, Song et al. 2020, and Groussin et al. 2017 studies can be accessed in the original papers, or above in the "Published datasets" section [13,31,68]. The American Gut dataset was downloaded from figshare and can be found at the following link: https://figshare.com/articles/dataset/American_Gut_Project_fecal_sOTU_counts_table/6137192. Qualified researchers can access the full Arivale deidentified dataset supporting the findings in this study for research purposes through signing of a data use agreement. Requests to access the Arivale data can be made at data-access@isbscience.org and will be responded to within seven business days.
We added an interaction effect term between diet and log-mass in the OLS regression models to assess the impact of diet on the observed body-size scaling (log-Simpson's diversity ∼ log-mass + diet + diet:log-mass; Fig. 1C-D). The diet:log-mass interaction term was statistically significant in the Godon et al. 2016 dataset (P = .0174), but not in the combined Groussin et al. 2017 and Song et al. 2020 datasets (P = .770). If we use Fisher's method to combine the interaction term P values, we do not find evidence for a consistent effect across datasets (combined P = .0710). Overall, the effect of vertebrate body mass on diversity is significant, independent of diet, though results were mixed in the presence of an interaction term between mass and diet.
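Fisher's method, used above to combine the two interaction-term P values, sums −2·ln(P) across k independent tests and compares the total to a chi-squared distribution with 2k degrees of freedom. The sketch below applies it to the two rounded P values reported in the text, using `scipy.stats.combine_pvalues` and a manual check. Whether the authors used this particular function is an assumption.

```python
import numpy as np
from scipy import stats

# Interaction-term P values as reported (rounded) in the text
p_values = [0.0174, 0.770]

# Fisher's method via SciPy
statistic, p_combined = stats.combine_pvalues(p_values, method="fisher")

# Manual calculation: X^2 = -2 * sum(ln p), with 2k degrees of freedom
chi2_stat = -2.0 * np.sum(np.log(p_values))
p_manual = stats.chi2.sf(chi2_stat, df=2 * len(p_values))

print(f"X^2 = {chi2_stat:.3f}, combined P = {p_combined:.4f}")
```

With these rounded inputs the combined P value comes out close to .071, in line with the reported value; small differences are expected because the inputs here are rounded.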

Relationship between human height and gut microbiome alpha diversity is robust to inclusion of relevant covariates
We found that human height and gut microbiome alpha diversity were positively associated across two large, independent human cohorts. We used data from the Arivale cohort (n = 3063) and the American Gut cohort (n = 5516) to compare log-Simpson's diversity versus log-height (Fig. 2). The Arivale cohort consisted of self-selected American adults who had enrolled in a scientific wellness program primarily from the Pacific West of the United States, and the American Gut cohort consisted of self-selected adults participating in a citizen-science program, primarily from the USA, the UK, and Australia [9,59].
Because several demographic variables are known to affect gut microbiome diversity, we ran OLS regressions within each cohort, with log-height, age, sex, BMI, BMF, vegetable consumption, and vegetable consumption:log-height interaction terms (log-Simpson ∼ age + sex + BMI + BMF + vegetable consumption + log-height + vegetable consumption:log-height). Consistent with the literature, we found that in both cohorts, being male, having a higher BMF, and a higher BMI were all negatively associated with log-Simpson's diversity, while height, age, and vegetable consumption were all positively associated with log-Simpson's diversity (Table 1). Log-height was positively associated with Simpson's diversity in the presence and in the absence of the covariates listed above (Table 1). In addition, an ANOVA comparing the univariate model to an intercept-only model found that there was a significantly higher fraction of variance explained by a model including only height across both cohorts (F-test P = 1.31 × 10⁻⁶ for the Arivale cohort; P = 3.11 × 10⁻² for the American Gut cohort; Table 1). Furthermore, an ANOVA, comparing a reduced model without height (covariates only) versus a full model including height, showed that there was a significantly higher fraction of variance explained by the full model across both cohorts (F-test P < 10⁻⁶ for the Arivale cohort; P = 3.36 × 10⁻⁵ for the American Gut cohort; Table 1). A paired Z-test determined that there was no significant difference in height β coefficients between the univariate and multivariate models for the Arivale dataset (Z = 1.25, P = .211). There was a significant difference between the univariate and multivariate models for the American Gut dataset (Z = 2.69, P = .00716), but we did not think this difference in coefficients was interpretable or meaningful given the small magnitude of the coefficients as well as the collinearities among the covariates (e.g. sex and height). We did not find evidence of a significant interaction effect between height and vegetable intake in the Arivale dataset, but we did in the American Gut dataset (P = .576 and P = .0268, respectively; Table 1).
We followed up these regressions with a health-adjusted analysis in the Arivale cohort, in order to exclude the possibility that these patterns were driven by differences in host health status. We excluded individuals who reported antibiotic use in the last 3 months and we added the following health markers as covariates to the multivariate model: LDL cholesterol, CRP, and HbA1C (log-Simpson ∼ age + sex + BMI + BMF + vegetable consumption + log-height + vegetable consumption:log-height + LDL + CRP + HbA1C). In this health-adjusted analysis, height retained its significance (P = 3.76 × 10⁻³) and its beta-coefficient was similar (β = 0.1920) when compared to the non-health-adjusted multivariate model (β = 0.1670). In summary, the relationship between height and diversity is independent of these covariates across two large, independent cohorts.
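The nested-model comparison used in this section (a reduced model with covariates only versus a full model that adds height) can be sketched with the statsmodels formula interface. The data frame below is synthetic, the column names are hypothetical, and only a subset of the covariates is included to keep the example short.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 500

# Synthetic, already-standardized stand-in data (not study data)
df = pd.DataFrame({
    "log_height_z": rng.normal(size=n),
    "age_z": rng.normal(size=n),
    "bmi_z": rng.normal(size=n),
    "sex": rng.choice(["F", "M"], size=n),
})
df["log_simpson_z"] = (
    0.1 * df["log_height_z"] + 0.1 * df["age_z"] - 0.1 * df["bmi_z"]
    + rng.normal(size=n)
)

# Reduced model: covariates only; full model: covariates plus log-height
reduced = smf.ols("log_simpson_z ~ age_z + C(sex) + bmi_z", data=df).fit()
full = smf.ols(
    "log_simpson_z ~ age_z + C(sex) + bmi_z + log_height_z", data=df
).fit()

# F-test on the variance explained by adding height
table = anova_lm(reduced, full)
f_p = table["Pr(>F)"].iloc[1]
print(table)
print(f"height beta = {full.params['log_height_z']:.3f}, F-test P = {f_p:.3g}")
```

When the full model adds a single term, as here, the F-test P value equals the P value of that term's t-test in the full model, which offers a quick consistency check.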

Adapting island biogeography theory to the gut
In order to demonstrate a mechanistic link between body size and gut alpha diversity, we reformulated IBT, adapting it to simulate varying gut lengths instead of island areas (Fig. 3 and Table 2). At each time step, the system iterated through the processes: "immigration," "flow," and "reproduction," in a random order. For the immigration step, individuals were randomly sampled from a mainland (i.e. meta-community) lognormal species abundance distribution, entering the system from the inlet. For the flow step, individuals moved one spatial unit down the simulated system (i.e. upstream to downstream; flow was unidirectional). For the reproduction step, all individuals had the same random chance of reproducing, with a probability defined by the reproduction rate. Simulations progressed through these steps in a random order until the simulation had iterated through at least 1000 time steps, at which point the system checked to see if the number of species and individuals had reached stationarity using an Augmented Dickey-Fuller (ADF) test run on the prior 1000 time steps. If the system had not reached stationarity, the simulation would continue and check for stationarity at each time step beyond the 1000th. When stationarity of the system was reached, the steady-state Simpson's diversity for that model was calculated.
We ran two sets of 1000 simulations, one approximating the vertebrate body size range (three orders of magnitude; Fig. 4A) and another approximating the human size range (∼2-fold; Fig. 4B). All of our simulations had identical and fixed reproduction and immigration rates (Table 2). Most model parameters were fixed, selected following a series of test simulations (Fig. S1). In our series of test simulations, we tested a combination of different parameter settings for immigration rates and reproduction rates. Low, medium, and high immigration rates were defined as 1, 25, and 100, respectively (i.e. the number of individuals sampled from the meta-community at each time step). For testing different reproduction rates, which determined what percentage of the population would be selected for reproduction, we defined low, medium, and high reproduction rates as 0.1, 1, and 10%. Based on test simulation performance (determined by the variance in alpha diversity explained by system length), we decided to use the medium immigration and medium reproduction rates. Thus, all simulations had an immigration rate of 25 individuals added to the system per time step and a reproduction rate of 1% per time step. In both sets of simulations, steady-state Simpson's diversity was positively associated with simulated system length (Fig. 4). OLS regression (log[Simpson (1/D)] ∼ log[system length]) showed that the simulations run on the vertebrate scale had a much larger R² value (R² = 0.50, P < 10⁻⁶) compared to the simulations run on the human scale (R² = 0.01 × 10⁻³, P = 7.98 × 10⁻³). Given an apparent saturation effect observed over the larger size ranges, we fit an exponential model, which yielded a higher R² (0.69), indicating that over larger size ranges an exponential model is a better fit than a linear model. Overall, we found our simulated data recapitulated the empirically observed positive scaling between body mass, height, and Simpson's diversity (Figs. 1-2).
Table 2. The following table lists the parameters in our IBMs and the value(s) of each parameter used in the simulations presented in the main results section.

Exploring the clinical implications of the height-diversity relationship in the American Gut cohort
Figure 3. Schematic of the individual-based model (IBM) used to simulate island biogeography theory (IBT) in the gut. We built a simple IBM that approximated the unidirectional flow of the gut, where we could vary the length of a one-dimensional system. Individuals were randomly generated from a heavy-tailed species abundance distribution, entering the system on one side, flowing along the length of the system over time at a constant rate, and eventually exiting the other side of the system. The number of individuals entering the system per time step was determined by the immigration rate of the simulation (im). In addition to flowing through the system (f), some individuals were randomly selected for reproduction (r) at each time step. Simulations were run until the number of individuals and the number of species reached a steady state, as determined by the augmented Dickey-Fuller (ADF) test, before the diversity metric was calculated for each simulation. ADF tests were performed from T1000 (i.e. the 1000th time step) onward, as depicted here. The bottom diversity vs. time plot shows the initial nonstationary period and the approach toward stationarity over time.

Figure 4. Simulations mirror empirical relationships observed between size and diversity. Both plots show 1000 simulations, where the length of the system was varied within a set range. (A) When varying IBM lengths over three orders of magnitude to approximate the vertebrate gut size range, we see a strong association between size and diversity. We find that an exponential model is a better fit than a linear model. (B) When varying IBM length over a much smaller 2-fold range, which is more in line with the observed range in human heights, we see a much weaker association between size and diversity, similar to empirical observations. The zoom-in box drawn in panel (A), which illustrates the human size range, is not to scale, but has been increased in size to improve visibility.

Low gut alpha diversity has been associated with susceptibility to enteric infections [8]. We hypothesized that shorter individuals, with slightly lower alpha diversity, might be slightly more susceptible to CDI. The American Gut cohort contained individuals with self-reported histories of CDI, which allowed us to explore this hypothesis. Consistent with prior literature, individuals who reported a history of CDI (n = 138) had less diverse gut microbiomes than those who did not (n = 9072; Welch's t-test with Bonferroni correction t = 4.76, P = 2.42 × 10⁻⁵; Fig. 5A). Moreover, individuals who reported a history of CDI had a shorter average height than those who did not (Welch's t-test with Bonferroni correction t = 3.92, P = 4.21 × 10⁻⁴; Fig. 5D). We hypothesized that this association between height, diversity, and CDI could be influenced by diet, as prior research has shown that consuming a larger variety of plants is positively associated with gut microbiome alpha diversity and, more specifically, that eating fewer vegetables is a risk factor for CDI [69,70]. We partitioned our analysis by self-reported vegetable consumption, looking at differences in height between those with and without a history of CDI in the high and low vegetable consumption groups. Of the individuals who reported low vegetable intake, the mean height of individuals with (n = 72) and without (n = 4311) a history of CDI was not significantly different (Fig. 5E; Welch's t-test with Bonferroni correction P = .502). However, of the individuals who reported high vegetable intake, those who reported a history of CDI (n = 64) were significantly shorter, on average, than those who did not (n = 4666; Fig. 5F; Welch's t-test with Bonferroni correction t = 4.19, P = 2.62 × 10⁻⁴). This variable result across vegetable consumption groups is consistent with the observed shift in the scaling relationship between height and diversity across different dietary contexts (Figs. 1-2). Specifically, we might expect the height effect to be dampened in the low vegetable group if most of the population is unlikely to cross the alpha-diversity threshold for lower CDI risk, whereas this lower risk threshold may be crossed for a larger proportion of the population (i.e. for shorter individuals) in the high vegetable consumption group (Fig. S2). In addition, the observed loss of significance in the low vegetable consumption group could be due to a simple lack of statistical power to detect a relatively weak effect when subsetting the population. Gut alpha diversity tended to be lower in individuals with a history of CDI in both vegetable consumption groups, although the association was on the edge of our significance threshold for the high vegetable consumption group (Fig. 5B and 5C; Welch's t-test with Bonferroni correction t = 4.08, P = 5.02 × 10⁻⁴; t = 2.51, P = .0517, for low and high vegetable intake, respectively).
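The group comparisons above rely on Welch's t-test followed by a Bonferroni correction. As a minimal sketch of that calculation (not the authors' code), the following computes the Welch statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function names, the example heights, and the number of tests are hypothetical; converting t to a P value is left to a statistics library.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical heights in cm (illustrative values only, not cohort data).
no_cdi_heights = [171.0, 168.5, 175.2, 180.1, 165.4, 172.8, 177.3]
cdi_heights = [166.2, 169.0, 163.5, 170.4, 167.1]

t_stat, dof = welch_t(no_cdi_heights, cdi_heights)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")

# Adjusting a hypothetical raw P value for a hypothetical family of 6 tests.
print(bonferroni(0.02, 6))
```

Unlike Student's t-test, the Welch form does not assume equal variances, which matters here because the CDI and non-CDI groups differ greatly in size.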
We ran a mediation analysis, with bootstrapping (n = 5001), to investigate whether the associations between diet, height, and CDI history were mediated by alpha diversity (i.e. testing the niche saturation hypothesis). When we classified vegetable consumption as a treatment, Simpson's diversity as a mediator, and CDI history as an outcome, we found that the average causal mediated effect (ACME), the average direct effect (ADE), and the total effect were all statistically significant (P < 4.00 × 10⁻⁴, P = .0116, and P = .00560, respectively; Fig. 6A). We found evidence for partial mediation, with 8% of the effect of vegetable consumption on CDI history mediated by Simpson's diversity (Fig. 6A; mediation fraction P = .00560). We next classified height as a treatment, Simpson's diversity as a mediator, and CDI history as an outcome. In the height mediation analysis, the ACME was negative and statistically significant (ACME = −0.0004, P = 4.00 × 10⁻⁴), but the ADE and the total effect were not statistically significant (Fig. 6B; P = .657 and P = .793, respectively). These results suggest that whatever effect height may have on CDI history may be completely mediated by Simpson's diversity.
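To illustrate how a bootstrapped ACME is obtained, here is a minimal sketch based on the product-of-coefficients approach with linear models. It is an assumption-laden simplification: the outcome in the analysis above (CDI history) is binary and the regressions included covariates, neither of which is handled here. The data are synthetic, and all function names and effect sizes are hypothetical.

```python
import random

def slope_intercept(x, y):
    """Least-squares slope and intercept of y regressed on one predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b, a = slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def acme(treatment, mediator, outcome):
    """Indirect effect a*b under linear models (product of coefficients)."""
    a, _ = slope_intercept(treatment, mediator)  # treatment -> mediator path
    # Mediator coefficient in outcome ~ mediator + treatment, obtained by
    # partialling the treatment out of both variables (Frisch-Waugh theorem).
    b, _ = slope_intercept(residuals(treatment, mediator),
                           residuals(treatment, outcome))
    return a * b

def bootstrap_acme(treatment, mediator, outcome, n_boot=500, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the ACME."""
    rng = random.Random(seed)
    n = len(treatment)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        draws.append(acme([treatment[i] for i in idx],
                          [mediator[i] for i in idx],
                          [outcome[i] for i in idx]))
    draws.sort()
    interval = (draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1])
    return acme(treatment, mediator, outcome), interval

# Synthetic data with a built-in indirect effect of 0.5 * -0.4 = -0.2.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]                 # "treatment"
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]              # "mediator"
y = [-0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # "outcome"

estimate, (ci_low, ci_high) = bootstrap_acme(x, m, y)
print(f"ACME = {estimate:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

A bootstrap interval that excludes zero indicates a mediated effect even when the direct or total effect is not significant, which is the situation reported for height.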

Bigger animals harbor more bacterial taxa in their guts
Prior literature has shown that there are many drivers of alpha diversity in the gut microbiota of vertebrates, including gut morphology, evolutionary history, and diet [71][72][73]. For example, foregut and hindgut fermenters have been shown to have higher gut microbial richness, measured by Shannon diversity, than carnivores with simpler gut anatomies [71]. We found that host body size was strongly associated with gut alpha diversity across several datasets, even when controlling for the host diet (Fig. 1), which suggests that mechanisms independent of gut morphology, phylogeny, and diet are at play.

Human height is associated with gut microbiome alpha diversity, independent of relevant covariates
Many factors are known to affect human gut microbiome alpha diversity [9,15,74,75]. For example, prior work in less-than-healthy older individuals has shown a decline in alpha diversity with age [76], whereas other studies in healthier older people and in community-dwelling centenarians have shown a decline in core taxa and increased alpha diversity with age [59,74,77]. In addition to age, sex has been associated with gut microbiome diversity, with females tending to show higher gut alpha diversity levels than males [74,78]. Obesity and BMI have been negatively associated with gut alpha diversity, perhaps due to lower dietary fiber intake and higher levels of systemic inflammation associated with obesity [75,79]. BMF is negatively associated with gut alpha diversity, with individuals experiencing constipation showing higher diversity and individuals experiencing diarrhea showing lower diversity [80]. Finally, a diet rich in plant-based substrates has been associated with higher gut alpha diversity, which is often attributed to the complex polysaccharide content of these foods [70,81]. To complicate matters further, these demographic variables are highly interrelated. For instance, females, on average, exhibit lower BMFs, shorter heights, and higher fruit and vegetable consumption than males, and the cumulative effects of these entanglements on diversity can be difficult to predict [74,80,82]. Furthermore, BMI's negative association with gut microbiome alpha diversity, and positive association with intestinal inflammation (which can cause shortening of the bowel through smooth muscle contractions), pushes against the overall positive scaling we see between body mass and diversity across vertebrates [75,79]. Thus, we used height, which has been shown to be positively associated with bowel length [83][84][85][86], as an obesity/inflammation-independent proxy for measuring body size in humans. Overall, we were able to control for all of these potentially confounding variables in our regressions, and we found that the associations between height and gut alpha diversity were robust to the inclusion of these variables across two large, independent cohorts (Table 1). Given the extensive literature associating several disease states with gut microbiota dysbiosis [87,88], we followed up our regression analysis with a health-adjusted regression using the Arivale cohort. Controlling for several health-associated clinical chemistries and excluding individuals who had taken antibiotics within the last 3 months, we found that the association between height and gut diversity retained its significance. Similar to the vertebrate case above, these results suggest that size-diversity scaling is driven by a mechanism that is independent of dietary intake, inflammation, health status, or bowel movement frequency.

Adapting IBT to simulate guts of varying lengths
We hypothesized that these size-diversity SARs could be explained by IBT, a simple neutral model that shows how bigger islands harbor a larger number of ecologically equivalent species. We built a simple IBM, designed to approximate IBT in the gut, to show that gut length is indeed positively associated with species diversity (Fig. 4). In our gut-adapted model, intestinal transit time served as a proxy for island area. The amount of variance in Simpson's diversity explained by system length in the IBMs increased with increasing length scales over which our IBMs were simulated, which closely matched what we observed across both vertebrates and human body size scales (Figs. 1-3). The relationship between system length and diversity appeared to behave somewhat asymptotically over the vertebrate body size range (Fig. 4A), which was similar to what we and others have empirically observed (Fig. 1A) [17]. These results provide a plausible mechanism for the observed size-diversity scaling and add to the body of literature showing how simple neutral models can account for macroecological patterns in real-world microbial communities [19,20,89].
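The ingredients of such a flow-through model (immigration, reproduction, and unidirectional flow along a one-dimensional system) can be sketched in a few lines. This is a toy illustration and not the IBM used in this study. Every parameter value, the Zipf-like species pool, the fixed number of time steps (in place of an ADF stationarity check), and the use of the Gini-Simpson index (1 − Σp²) as "Simpson's diversity" are assumptions made for the example.

```python
import random
from collections import Counter

def simulate_gut(length, steps=300, immigration=5, flow=1.0,
                 reproduction=0.02, n_species=200, seed=0):
    """Toy neutral flow-through model; returns species counts at the end."""
    rng = random.Random(seed)
    species_ids = list(range(n_species))
    # Assumed heavy-tailed (Zipf-like) source pool: weight proportional to 1/rank.
    weights = [1.0 / (rank + 1) for rank in species_ids]
    community = []  # each individual is a (position, species) pair
    for _ in range(steps):
        # Reproduction: each resident is selected with a small probability,
        # and its offspring starts at the parent's position.
        offspring = [ind for ind in community if rng.random() < reproduction]
        community.extend(offspring)
        # Flow: everyone advances; individuals past the far end exit the system.
        community = [(pos + flow, sp) for pos, sp in community
                     if pos + flow < length]
        # Immigration: new individuals enter at the upstream end.
        newcomers = rng.choices(species_ids, weights=weights, k=immigration)
        community.extend((0.0, sp) for sp in newcomers)
    return Counter(sp for _, sp in community)

def simpson_diversity(counts):
    """Gini-Simpson index, 1 - sum(p_i^2), from a mapping of species counts."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

short_gut = simulate_gut(length=5)
long_gut = simulate_gut(length=50)
for name, counts in (("short", short_gut), ("long", long_gut)):
    print(name, "individuals:", sum(counts.values()),
          "richness:", len(counts),
          "Simpson:", round(simpson_diversity(counts), 3))
```

In this toy version, a longer system retains more immigrant cohorts at once, so it holds more individuals and more species at any given time. Reproducing the quantitative scaling in Fig. 4 would require the full model and its steady-state criteria.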

Investigating the clinical implications of IBT in humans
A diverse gut microbiota can be a barrier against invasive pathogens, by saturating available metabolic niches [7,23,24,90]. Thus, our results suggest that height may be a weak predictor of enteric pathogen susceptibility, due to its minor influence on alpha diversity. Indeed, we found that, on average, individuals reporting a history of CDI were shorter and had lower gut alpha diversity than those who had no history of CDI (Fig. 5).
In a prior study, CDI patients tended to report consuming fewer vegetables than individuals without CDI, highlighting the role of diet in CDI prevention [69]. Dietary plant intake has a stronger positive influence over diversity than almost any other lifestyle factor in human cohorts [9,13,91,92]. Vegetable intake was, indeed, strongly associated with gut alpha diversity and with CDI history in the American Gut cohort (Table 1 and Fig. 5). However, the interplay between diet and height was more complex than we anticipated, with height showing a significant association with diversity in the high vegetable intake group, but not in the low vegetable intake group. In order to make sense of this result, we postulated that higher vegetable intake shifts height-diversity scaling such that the putative pathogen susceptibility threshold in alpha diversity is crossed at shorter heights than for the low vegetable consumption group (Fig. S2). This shift could account for the lack of association observed between height and CDI history in the low vegetable consumption group, but a lack of statistical power could also explain this result.

Figure 6. American Gut cohort mediation analysis: Simpson's diversity mediates the impact of diet and height on CDI history. All regressions reported include the covariates: age, sex, BMI, diet (not a covariate in the diet mediation analysis), and BMF. (A) Mediation analysis assigning vegetable consumption as a treatment, Simpson's diversity as a mediator, and CDI history as an outcome showed that the ACME, ADE, and total effect coefficients were negative and statistically significant (P < .05). (B) Mediation analysis assigning height as a treatment, Simpson's diversity as a mediator, and reported CDI history as an outcome found that only the ACME was statistically significant (P < .05). Both diet and height were significantly, positively associated with Simpson's diversity (P < .05). Simpson's diversity was significantly negatively associated with CDI history (P < .05).
A formal mediation analysis on the effect of diet and height on CDI history yielded evidence that alpha diversity mediated the treatment effect of both height (complete mediation) and vegetable consumption (partial mediation; Fig. 6).Classical mediation analysis does not usually consider the significance of the ACME in the absence of a significant direct or total effect, as was the case with height (Fig. 6) [93].However, in certain scenarios, this conservative approach can miss true mediation effects, such as when the mediator has nearly the same magnitude effect as the total model or when the direct and mediated effects of the treatment on the outcome are of opposite sign [94][95][96].In these scenarios, one can use bootstrapping to estimate confidence intervals for the ACME, as we did here [97].Overall, diet showed a much stronger association with both diversity and CDI history, suggesting that simple lifestyle interventions, like a higher vegetable intake, can have a larger impact on gut diversity and CDI risk than height.
In conclusion, we find a consistent association between body size and gut microbiome alpha diversity across vertebrates and human populations. The association between human height and gut microbiome alpha diversity was reproducible and was robust to the inclusion of several relevant covariates known to influence gut alpha diversity, including age, sex, BMI, BMF, and diet, across two large, independent cohorts. Furthermore, in the Arivale cohort, we find that this height-diversity scaling is independent of recent antibiotic use and blood measures of LDL cholesterol, HbA1C, and CRP. We showed how this macroecological scaling phenomenon could be plausibly explained by a gut-adapted version of IBT, with simulations closely matching empirical observations. Finally, we explored how the relationship between human height and gut alpha diversity is potentially relevant to CDI risk and how dietary patterns, like vegetable intake, may help mitigate this putative risk. The impact of body size on gut alpha diversity was relatively weak over human body size ranges, and the clinical relevance of this work remains tentative until it can be further validated in additional independent cohorts.

Study limitations
In our regression analyses, we implicitly consider several deterministic explanations for the observed size-diversity scaling, like the fact that herbivores can achieve larger body sizes than carnivores, bowel movement frequency can vary across sexes, and any number of interactions between our demographic and health-related covariates. None of these factors were able to explain away the observed size-diversity scaling. While we find our gut-adapted formulation of IBT to be a plausible neutral mechanism to explain the size-diversity scaling relationships we observe, there is no definitive way to prove a positive [98], and there may well be other mechanisms at play that were not explored here. For example, we were not able to explore the potential for reverse causation of diversity on height in these datasets, but it is possible that early-life diversity (which could, conceivably, be related to the later-life diversity that we measure) may itself influence linear growth of infants and the subsequent height of adults. In addition, in our simulated data, we were only able to span three orders of magnitude in size due to computational constraints, whereas vertebrates span six orders of magnitude in body size. However, we do not necessarily expect a 1:1 scaling relationship between gut length and body mass or height, which is consistent with prior literature [65,99]. Although real-world guts show complex spatial structure, our model is spatially homogeneous (i.e. a one-dimensional tube). Food enters the mouth, largely devoid of gut bacteria, travels unidirectionally through the gastrointestinal tract, and then exits the anus with around 10¹¹ gut bacterial cells per gram of feces [100]. The mucus layer and its microbial inhabitants are continually shed outwards toward the lumen, becoming integrated into fecal material as it passes [101]. Thus, despite the highly simplified spatial structure of our simulations, we believe that they capture the major features that are important to our neutral model of the gut: immigration, growth, and unidirectional flow. Finally, we must reiterate that we were likely underpowered to detect an association between height and CDI history in the American Gut cohort, which we expected to be a small effect size due to the already subtle influence of height on alpha diversity. The fact that we did see some signal there is interesting, but the clinical significance of this result is preliminary and will need to be validated in follow-up studies.

Figure 1. Relationship between body size and gut microbiome Simpson's diversity across vertebrates. (A) Log-body mass and log-Simpson's diversity are positively associated across two independent 16S rRNA gene amplicon sequencing datasets measuring gut alpha diversity among vertebrates. (B) A similar result emerges from a CE-SSCP dataset. (C) Here, we show OLS regression analysis (formula: log-Simpson's diversity ∼ log-mass + diet + diet:log-mass) of the Song et al. 2020 and Groussin et al. 2017 datasets including diet and a diet:log-mass interaction effect. (D) Here, we show OLS regression analysis (formula: log-Simpson's diversity ∼ log-mass + diet + diet:log-mass) including diet and a diet:log-mass interaction effect.

Figure 2. Relationship between height and gut microbiome Simpson's diversity across humans. Both plots are displayed on log-log axes, showing Simpson's diversity versus height in (A) the Arivale cohort (n = 3063) and in (B) the American Gut cohort (n = 5516). Both plots show a similar trend where log-height is positively associated with log-Simpson's diversity. OLS regressions used the formula: log-Simpson's diversity ∼ sex + age + BMI + BMF + vegetable consumption + log-height. Adding an interaction effect to our OLS regressions (log-Simpson's diversity ∼ sex + age + BMI + BMF + vegetable consumption + log-height:vegetable consumption + log-height) did not yield a significant interaction effect in the (C) Arivale cohort (P = .576), but did in the (D) American Gut cohort (P = 2.68 × 10⁻²).

Figure 5. Exploring interactions between height, gut Simpson's diversity, diet, and self-reported history of CDI in the American Gut cohort. For all plots, dashed lines represent the median of each group, which are shaded by CDI history according to the legend. (A) Across the cohort (n = 9210), individuals who reported a history of CDI (n = 138) had a significantly lower mean Simpson's diversity than individuals without a history of CDI. (B) Of individuals who reported low vegetable intake (n = 4311), mean Simpson's diversity was significantly lower in individuals who had a history of CDI (n = 72). (C) Of individuals who reported high vegetable intake (n = 4666), mean Simpson's diversity was not significantly lower in those who reported a history of CDI (n = 64). (D) The mean height of individuals who reported a history of CDI was significantly lower than that of individuals who did not. (E) There were no significant differences in mean height between people with and without a history of CDI if they had low vegetable intake. (F) Among individuals with high vegetable intake, those with a history of CDI were significantly shorter than those without. Brackets with stars indicate various levels of significance when performing a Welch's t-test with Bonferroni correction.

Table 1. Ordinary least squares (OLS) regression shows that height's association with Simpson's diversity is robust to the inclusion of several relevant covariates. Height retains its significant association with diversity in the presence and in the absence of covariates across both cohorts. F-tests, comparing a reduced model without height (including all covariates) to a full model including height, indicate that height provides a significant increase in the variance explained by the full model.
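For reference, the standard F statistic for comparing nested OLS models is given below. We assume this is the form of the test summarized in Table 1; the symbols are introduced here for illustration and do not appear in the original table.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}\right) / q}
         {\mathrm{RSS}_{\text{full}} / \left(n - p_{\text{full}}\right)}
```

Here RSS is the residual sum of squares of each model, q is the number of parameters added in the full model (q = 1 when only log-height is added), n is the number of individuals, and p_full is the number of parameters in the full model, including the intercept. Under the null hypothesis that the added term has no effect, F follows an F distribution with (q, n − p_full) degrees of freedom.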