Legionella relative abundance in shower hose biofilms is associated with specific microbiome members

Abstract Legionella are natural inhabitants of building plumbing biofilms, where interactions with other microorganisms influence their survival, proliferation, and death. Here, we investigated the associations of Legionella with bacterial and eukaryotic microbiomes in biofilm samples extracted from 85 shower hoses of a multiunit residential building. Legionella spp. relative abundance in the biofilms ranged between 0–7.8%, of which only 0–0.46% was L. pneumophila. Our data suggest that some microbiome members were associated with high (e.g. Chthonomonas, Vrihiamoeba) or low (e.g. Aquabacterium, Vannella) Legionella relative abundance. The correlations of the different Legionella variants (30 Zero-Radius OTUs detected) showed distinct patterns, suggesting separate ecological niches occupied by different Legionella species. This study provides insights into the ecology of Legionella with respect to: (i) the colonization of a high number of real shower hoses biofilm samples; (ii) the ecological meaning of associations between Legionella and co-occurring bacterial/eukaryotic organisms; (iii) critical points and future directions of microbial-interaction-based-ecological-investigations.


Introduction
The genus Legionella comprises over 60 species, of which about half ar e r ecognised opportunistic pathogens and the etiological agents of Legionellosis.Legionella pneumophila was the first species identified, and is still linked with most reported Legionellosis cases (Fields et al. 2002 ).Ho w e v er, m ultiple other Legionella species ar e ecologicall y r ele v ant, and sometimes associated with different clinical manifestations of the disease (Muder andVictor 2002 , Chambers et al. 2021 ).Legionella are ubiquitous in freshwater, and commonly detected in engineered aquatic ecosystems such as drinking water systems (Falkinham et al. 2015 ).As such, Legionella have been detected in all segments of building plumbing systems, including sho w er hoses, which often represent a fa vourable en vironment for their proliferation and transmission (Falkinham et al. 2015, Proctor et al. 2016, Gebert et al. 2018 ).In these en vironments , Legionella mainly establish in biofilms , where interactions with other organisms influence their survival and pr olifer ation (Declerc k 2010 ).Shower hoses have been pr e viousl y linked to the transmission of Legionella through inhalation of water aerosol containing the bacteria (Niculita-Hirzel et al. 2022 ), and ranked second in a relative ranking of Legionella exposure pathways from common household water uses (Hines et al. 2014 ).In a few cases, sho w ers have been directly linked to Legionellosis tr ansmission, specificall y in nursing homes, hospitals and hotels (Tobin et al. 1980, Mühlenberg 1993, Bauer et al. 2008 ), although limited information regarding showers as route of transmission is ov er all av ailable.
Interactions with their surrounding microbiota may be either favourable or detrimental for Legionella (Taylor et al. 2009, Cavallaro et al. 2022 ).For example, Legionella are able to infect a variety of eukaryotes such as amoebae , exca vates , and ciliates , escaping their degradation mechanisms and exploiting their intracellular environment for protection and replication (Boamah et al. 2017, Mondino et al. 2020 ).Ho w e v er, ther e ar e also pr otists suc h as Willaertia magna and Paramecium multimicronucleatum that have been documented to degrade Legionella under specific environmental conditions (Dey et al. 2009, Amaro et al. 2015, Boamah et al. 2017 ).Both positive and negative interactions between Legionella and other bacteria have also been described previously.For example, Legionella was demonstrated to grow in association with Cyanobacteria, as well as with Flavobacterium breve (Tison et al. 1980, Berendt 1981, Wado wsk y and Yee 1983 ).Toze and colleagues (Toze et al. 1990 ) observed that heterotrophic bacteria isolated from chlorinated drinking w ater w ere able to support the growth of some species of Legionella but inhibited others.Others have also observed inhibition zones surrounding heterotrophic bacteria colonies isolated from various w ater sour ces when co-cultured with Legionella (Guerrieri et al. 2008, Corre et al. 2018 ).A fe w r ecent studies inv estigated corr elations between Legionella and other microbiome members identified through amplicon sequencing.P ar anja pe and collea gues (P ar anja pe et al. 2020 , P ar anja pe et al. 2020 ) conducted two studies on cooling to w er water and found that the genus Pseudomonas exhibited a strong negativ e corr elation with Legionella , whic h, in turn, positiv el y correlated with the genus Brevundimonas ; the latter positive interaction has been further demonstrated in laboratory experiments with pure culture isolates.Scaturro and co-workers (Scaturro et al. 2022 ) investigated building plumbing water samples from sever al Eur opean cities and found positiv e corr elations with bacteria of the genus Nitrospira , while observing negative associations with the genera Pseudomonas and Sphingopyxis .
Despite these correlation or inhibition observations, it remains unclear to which degree results are general or site-specific, whether amplicon sequencing provides sufficient resolution for such complex ecological interactions, and whether a mechanistic understanding of why some members support and other inhibit Legionella can be deduced from statistical correlations in microbiome data.Mor eov er, these studies also focussed exclusiv el y on water samples (with noticeable exceptions , e .g. (P ania gua et al. 2020 )), which may not accurately reflect biofilm ecology (Ji et al. 2017 ), and large sample numbers are sometimes difficult to obtain in such studies.
Her e, we extr acted 85 sho w er hose biofilms collected in a multiunit residential building with known Legionella contamination that shared the same incoming water, had a similar age, and were made from the same material.We used digital droplet polymerase c hain r eaction (ddPCR) and amplicon sequencing to identify and quantify potential associations between Legionella , bacteria, and eukaryotes in biofilm samples with similar environmental exposure.

Sample collection
Samples were collected from a retirement facility in Switzerland.In total, 85 sho w er hoses were replaced and transported to the laboratory in sealed bags.All sho w ers-hoses w ere (i) from the same manufacturer, (ii) of similar oper ational a ge, and (iii) fed from the same building plumbing system.Unfortunately, no specific information on use habits (e.g.frequency, temperature, etc.) could be obtained due to privacy protection concerns.

Biofilm and DNA extraction
Biofilm extraction was performed as described pr e viousl y (Pr octor et al. 2018 ).Briefly, sho w er hoses w ere cut at both ends and filled with sterile glass beads (2 mm diameter) and filtered (0.2 μm) tap water.With both ends plugged, sho w er hoses wer e inv erted fiv e times in order to allow the beads to distribute internally.Then the hoses were placed in an ultrasonic bath (5 min).After sonication, the sho w er hoses were unplugged and the biofilm-containing-water was collected in autoclaved Schott bottles .T he entire cycle was repeated 5 times, alternating from which end the water was collected.Biofilm-containingw ater samples w er e filter ed concentr ated (0.2 μm pol ycarbonate membrane filters) and stored at −20 • C until DNA extraction.DN A extraction w as carried out using an ada pted v ersion of the DN A Po w erWater Kit (Qiagen), described b y Vosloo and colleagues (Vosloo et al. 2019 ).Briefly, the filters were fragmented with a scalpel, inserted into 2 mL tubes and treated (i) enzymatically with the addition of Lysozyme; (ii) c hemicall y with the addition of Proteinase K; (iii) mechanically using a tube shaker and the beads provided with the kit, after the addition of chloroform/isoamyl alcohol 24:1.The rest of the extraction was conducted according to the manual provided by the manufacturer of the kit.
ddPCR for Legionella spp .and Legionella pneumophila Legionella spp.(ssrA gene) and L. pneumophila (mip gene) were measured using a digital droplet polymerase chain reaction (ddPCR) duplex assay with TaqMan c hemistry.Gene tar get primers and pr obes wer e based on pr e viousl y published assays (Benitez andWinchell 2013 , Rhoads et al. 2022 ).Each 25 μL reaction contained 1XPerfeCT a Multiplex ToughMix 5X (Quantabio), 0.6 μM of ssrA and 0.4 μM of mip gene forw ar d and r e v erse primers, 0.15 μM of each probe, 100 nM Fluorescein (Sigma Aldrich), and 5 μL of DNA template.Primer and pr obe sequences, fluor ophor es utilized, master mix composition, and reaction conditions can be found in Table S1 .A ddPCR r eaction negativ e contr ol (DNAse fr ee water) was included for each batch of master mix.A ddPCR reaction positiv e contr ol (Centr e National de Référ ence des Légionelles) was included in each thermocycling run.Droplet formation and PCR thermoc ycling w er e performed using Stilla geodes and r ead using a Prism6 analyser with Crystal Reader softwar e ima ging settings pre-set and optimized for PerfeCTa multiplex master mix.Dr oplets wer e anal ysed using Crystal Miner softwar e. Onl y wells with a sufficient number of total and analysable droplets, as well as a limited number of saturated signals, were accepted according to the Crystal Miner software quality control.

ddPCR for 16S quantification
Total quantification of the 16S rRN A gene w as carried out using a ddPCR assay with an inter calating-dy e c hemistry.Eac h 25 μL reaction contained 1X PerfeCTa Multiplex ToughMix 5X (Quantabio), 1.5X Ev aGr een (Biotium), 0.8 ng/ μL AlexaFluor 488, 0.1 μM of each primer and 5 μL of the DNA template.A ddPCR r eaction negativ e contr ol (DNAse fr ee w ater) w as included for eac h batc h of master mix pr epar ed.A ddPCR r eaction positiv e contr ol was included in eac h run.The positiv e contr ol corr esponded to the 16S rRNA gene sequence (V4 region) of Escherichia coli and was synthesized in the form of gBlock by Integrated DNA technologies (IDT).Primer sequence, master mix composition, and reaction conditions can be found in Table S2 .

Amplification of 16S and 18S rRNA genes for NGS libr ary prepar a tion and illumina sequencing
For sequencing, the V4 region of the 16S rRNA gene and the V9 region of the 18S rRNA gene were amplified by PCR using the primers Bakt_515F-Bakt_805R (Ca por aso et al. 2011 ) and EUK1391F-EUK1510R (Tsao et al. 2019 ) and the DN A w as quantified by Qubit dsDNA HS Assay.Samples were diluted according to their concentration such that 0.1 to 10 ng DN A w as loaded into eac h r eaction.Two-step PCR pr otocol was used to pr epar e the sequencing library: a first amplification (target PCR) was carried out with 1X KAPA HiFi HotStart DNA pol ymer ase (Roc he), 0.3 μM of each 16S primer and 5 μL of 0.1 to 10 ng of template DNA.After amplification, the PCR products were purified with the Agencourt AMPure System (Beckman Coulter, Inc.).The second PCR (adaptor PCR) was performed with limited cycles to attach specific sequencing Nextera v2 Index adapter (Illumina).After purification, the products were quantified and checked for correct length (bp) with the High Sensitivity D1000 Scr eenTa pe system (Agilent 2200 Ta peStation).Sample concentr ation was adjusted and samples wer e subsequentl y pooled together in a libr ary at a concentr ation 4 nM.The Illumina MiSeq platform was used for pair-end 600 cycle (16S) and paired-end 300 cycle sequencing with 10% PhiX (internal standard) in the sequencing run.Negative controls (PCR grade water) and a positive control (For 16S library; self-made MOCK comm unity) wer e incor por ated.Primer sequences, master mix composition, and reaction conditions can be found in the Table S3 .Experiments and data on community composition were generated in collaboration with the Genetic Diversity Centre (GDC) of ETH Zurich.

Da ta anal ysis
16S and 18S rRNA sequencing data were processed on HPC Euler (ETHZ) using w orkflo ws established b y the GDC, ETHZ.Detailed data processing w orkflo ws are provided in the supplementary materials.For the 16S dataset, because of the low quality of the rev erse r ead, onl y the forw ar d r ead has been pr ocessed further and used for the rest of the analysis.Since the primers amplified only the V4 region of the 16S, excluding the reverse read from the analysis did not cause a loss in length of the amplicon.For the 16S dataset, all R1 reads were trimmed (based on the error estimates) by 25 nt, the primer r egion r emov ed, and quality filter ed.For the 18S dataset, all read pairs were merged, primer sites removed, and quality filter ed.Ultimatel y, using UNOISE3 (Edgar and Fl yvbjer g 2015 ), sequences were denoised with error correction and chimera r emov al and Amplicon Sequence Variants (ASV) established.In this study, the predicted biological sequences will be referred to as Zero-Radius Operational Taxonomic Units (ZOTUs).Taxonomic assignment was performed using the Silva 16S database (v128, 16S amplicon sequencing) and PR 2 (V4.12.0, 18S amplicon sequencing) in combination with the SINT AX classifier .Samples were not rarified to avoid the loss of data due to differences in the sequencing depth (McMurdie and Holmes 2014 ).Alpha-diversity (calculated with Shannon Index, which measures intra-sample diversity accounting for both richness and evenness), distance ordination and av er a ge r elativ e abundance anal ysis wer e performed using R (v ersion 4.2.1) and R studio (version 2022.07.2 + 576) using the packa ges 'ggplot2', 'micr obiomeExplor er' (thr ough a Shin y a pp, (Reeder et al. 2021 )) and the Bioconductor 'phyloseq' (version 1.42.0,(Mc-Murdie and Holmes 2013 )).SparCC correlation analysis was performed with the R pac ka ge 'SpiecEasi' and built as a network with Cytosca pe (v ersion 3.9.1).Unless otherwise specified, all pac ka ges wer e oper ated using the default settings.Random For est anal ysis was performed using Microbiome Analyst (Chong et al. 2020 ).Briefly, the samples were filtered based on a low count filter (minimum count of 4 reads with 20% pr e v alence in samples) and a low variance filter (10% removal with the variance measured using Interquartile-range), using the Microbiome Analyst default settings .T hen, the experimental variable was set ( Legionella spp. or L. pneumophila r elativ e abundance) and a Random Forest is run with 1000 tr ees, 7 pr edictors and a Randomness setting option.The contribution of eac h pr edictor to the model was calculated and reported as Mean Decrease Accuracy (MDA), which indicates how m uc h the model loses confidence if that predictor is removed from the model itself.The Out Of Bag (OOB) error was used as a validation parameter for the accuracy of the models.In general, the OOB value is a result of 1-accurac y.Furthermore, analyses w ere performed at genus le v el as well as at ZOTU le v el in order to detect associations that would not be displayed when grouping single sequences into large genus categories.All the remaining graphs were constructed with the R package 'ggplot2' (version 3.4.0).

Microbial ecology of sho w er hoses biofilms
Sho w er hose biofilms fed with non-chlorinated drinking water consist of complex and div erse pr okaryotic and eukaryotic communities (Fig. 1 ).Amplicon sequencing revealed 1518 ZO-TUs (16S, bacterial) and 1389 ZOTUs (18S, eukaryotic), respectiv el y, with 111 bacterial and 78 eukaryotic taxa classified at genus le v el.These numbers r epr esent a gross underestimation of the true diversity, since unclassified taxa at genus le v el accounted on av er a ge for 64.4 ±1.4% of the bacterial community and 85.3 ±1.2% of the eukaryotic community.Among the bacterial genera, Meiothermus spp.(3.6 ±0.6%), H16 spp.(2.5 ±0.2%) and Caulobacter spp.(3.2 ±0.6%) were the most abundant, with the top ten organisms representing 19.6% of the identified community at the genus le v el.The eukaryotic community was composed of 10 phyla, of which Amoebozoa (7.8 ±0.9%) , Excavata (3.2 ±0.4%) and Opisthokonta (4.0 ±0.6%) were most abundant.Among the eukaryotic community, an average of 78 ±1.3% of the taxa were unclassified e v en at phylum le v el (in contr ast, 0% of the bacterial community was unclassified taxa at phylum le v el).At the genus le v el, the thr ee most abundant kno wn eukary otic taxa w ere Hartmannella (4.5 ±0.9%), Limnofilila spp.(1.6 ±0.3%) and LKM74 (1.6 ±0.2%).
The distance between samples was measured using the Bray-Curtis index for both 16S and 18S data (Fig. 2 ).In both cases, the samples did not show significant clustering with respect to the limited metadata available for this study (such as the location in the building where the sample campaign took place), indicating that the ov er all micr obial composition of the samples was similar, with a few notable exceptions.Howe v er, r elative abundance of individual taxa at genus le v el shows noticeable variations among samples (Fig. S1 ).In both 16S and 18S data, 12 samples (four shared; S001, S004, S008, S101) clustered separ atel y fr om the bulk of the samples .T his was likely due to the low richness of these samples.Alpha-diversity was measured using the Shannon index (H), which evaluates both richness and e v enness of the species present.The analysis for the 16S data set shows high H values within samples, ranging between 3 and 5, with 11 samples having less than 3.The same analysis performed on the 18S dataset shows that most samples range between 3 and 4.8, while 12 samples hav e v alues below 3 ('H', Fig. S2 ).

Legionella spp. and L. pneumophila have different relati v e a bundances across samples
The genus Legionella was a predominant taxon in many samples.With an av er a ge r elativ e abundance of 0.27 ±0.02%, it r anked among the 30 most abundant genera in the community .Notably , 30 unique ZOTUs associated with the genus Legionella (the highest number of ZOTUs for a genus in this dataset) were detected and the r elativ e abundance of these varied considerably across samples from the same building (Fig. S3 ).
We performed ddPCR for specific quantification of Legionella spp.and L. pneumophila , as well as for quantification of total 16S rRNA gene abundance (Fig. 3 ).Subsequently, the abundance of Legionella spp.and L. pneumophila was calculated r elativ e to the total 16S rRNA abundance, and this r elativ e abundance was compar ed acr oss samples.Legionella spp.(r elativ e abundance: 0.001% to 7.8%) was detected in all samples, while L. pneumophila (r elativ e abundance: 0 to 0.46%) was detected in 34% of samples (Fig. 3 A).No clear correlation was observed between the r elativ e abundance of Legionella spp.and L. pneumophila .Importantl y, the r elativ e abundance of Legionella spp.calculated from the 16S rRNA amplicon sequencing data and that calculated with ddPCR (Legionella spp.and L. pneumophila r elativ e to the total 16S rRNA) correlated poorly (R 2 = 0.18; Fig. 3 B).Unless  This study focused on the interactions between microbiome members and the absolute abundance of Legionella in the hoses were not quantified.Howe v er, based on the ddPCR data and the av er a ge biofilm volumes sampled, Legionella spp.wer e pr esent at v alues between a ppr oximatel y 1E + 3-1E + 7 GU/hose., whic h is in a gr eement with pr e vious studies (Proctor et al. 2018 ).Legionella pneumophila were considerably less abundant at approximate values ranging from 0 to 1E + 06 GU/hose.

Associa tions betw een Legionella spp . and members of the community
Random forest analysis identified combinations of bacterial and eukaryotic taxa that predicted the presence of high and low relative abundance of Legionella spp., but the choice of the model threshold affected the prediction (Fig. 4 , Figs S5 and S6 ).Using a median r elativ e abundance thr eshold of 0.347% (equalling 42 samples with 'high' r elativ e abundance and 43 samples with 'low' r elativ e abundance), the model (at genus le v el) sho w ed an OOB error of 0.275 for the 16S dataset (Fig. 4 A) and 0.43 for the 18S Ho w e v er, selection of the criteria, such as the range of relative abundances to analyse for the model, is critical.When performing the analysis only using the samples at the extremes (15 samples with highest r elativ e abundance; 15 samples with lo w est r elativ e abundance), the model returned an OOB error of 0.225 (16S) and 0.567 (18S) at the genus le v el and 0.2 (16S) and 0.5 (18S) at the ZOTU le v el (Fig. 4 A-D).For the bacterial comm unity, SM1A02 and Chthonomonas, consistently with the previous model, predict high Legionella abundance with MDAs of 0.0175, as did ZOTU193 (g_ Legionella , MDA 0.01), ZOTU130 (c_ Acidobacter subgroup , MDA 0.007) and ZOTU198 (g_ Coxiella , MDA 0.005).Aquabacterium was the only trait that predicted low Legionella abundance (MDA 0.15.;Among the eukaryotic predictors, the genera Parav annella and LKM74_lineage_X consistentl y with the models described abo ve , predict respectively high and low Legionella spp.abundance (MD A 0.0175; MD A 0.012).Thr ee pr edictors of low Legionella spp.abundance are detected at the ZOTU level (ZOTU64; ZO TU97; ZO TU109, unclassified sequences).ZO TU96 (unclassified) was the stronger predictor of high Legionella spp.relative abundance, with an MDA of 0.004.We furthermore tested the models decreasing the thresholds between samples with high and low Legionella r elativ e abundance to 0.2% (27 samples with low relative abundance; 58 samples with high r elativ e abundance) and 0.1% (17 samples with low r elativ e abundance; 68 samples with high r elativ e abundance), whic h gener ated gr oups with une v en sample distribution and displayed different associations (Figs S5 and S6 ).

Associa tions betw een L. pneumophila and members of the biofilm microbiome
We also identified predictors of L. pneumophila relative abundance using Random Forest analysis.Using the median value (0.002%, Fig. 3 ) as a threshold for samples with high and low r elativ e abundance of L. pneumophila , the model performed with an OOB of 0.4 (16S) and 0.529 (18S) at genus le v el, and with OOB of 0.352 (16S) and 0.435 (18S) at ZOTU le v el.In contr ast to the observations for Legionella spp.(above), the analysis sho w ed more predictors of low abundance (eight at genus le v el; se v en at ZOTU le v el).Rhodobacter (MDA of 0.014) and ZOTU57 (c_TK10, MDA 0.017) predicted low L. pneumophila relative abundance, while ZOTU52 (g_ Pedomicrobium ; MDA 0.030) was the most important predictor of high r elativ e abundance.Onl y 10 featur es wer e detected with thr ee negativ e associations, of whic h LKM74_lineage_X r esulted as the most important (MDA 0.007).At the ZOTU le v el for the 18S dataset, the first three predictors are all associated with high L. pneumophila numbers (ZO TU342; ZO TU181; ZO TU168, unclassified sequences), while six ZOTUs predicted low L. pneumophila abundance , of which ZO TU230 (unclassified) had the highest importance.
We furthermore performed the analysis using only the extreme samples (25 samples highest abundance, 25 samples lowest abundance; Fig. 3 ).Here, the model performed with an OOB of 0.38 and 0.5 at the genus le v el, and 0.32 (16S) and 0.46 (18S) at the ZOTU le v el.For the 16S data, the r esults wer e similar to the pr e viousl y described model, with mor e pr edictors of low L. pneumophila r elativ e abundance (10).Inter estingl y, the model was not able to associate any predictors to the abundance of L. pneumophila for eukaryotes using these model parameters.At the ZOTU le v el, the bacterial ZOTU869 (g_ Legionella ) had the strongest importance value and an association with high L. pneumophila abundance.In gener al, mor e bacterial ZOTUs (11) predicted high L. pneumophila abundance in this model.For the eukaryotic sequences, the first three important ZOTUs predict low L. pneumophila abundance (ZO TU134; ZO TU48; ZO TU64, unclassified sequences).

Legionella ZOTUs correlations in a SparCC netw ork anal ysis
SparCC anal ysis r e v ealed se v er al corr elations between specific Legionella ZOTUs and other taxa, using only the 16S amplicon sequencing data.Figure 6 shows only significant correlation coefficients with a threshold of ±0.3, with a minimum of −0.36 and a maximum of 0.63.Since not all ZOTUs are assigned to the same le v el of taxonomy, the lo w est le v el av ailable was used to describe our observations.Of the 30 sequences associated to the Legionella genus , 18 ZO TUs did not displa y an y significant corr elation with other microbiome members above the threshold.The remaining 12 ZOTUs correlated in different ways with other sequences.For example , ZO TU193 was the most pr e v alent sequence assigned to the genus Legionella in the dataset (Fig. S3 ) and was also the one displaying the most correlations (67), the majority of which wer e positiv e. Onl y four negativ e corr elations hav e been detected, with ZO TU104 (g_ Ferribacterium ), ZO TU765 (f_ Obscuribacterales ), ZO TU156 (g_ Lacibacter ), and ZO TU1309 (g_ Rhodoplanes ).T he str ongest positiv e corr elation (0.63 corr elation coefficient) is with ZOTU33 (g_ H16) .
The analysis also sho w ed that Legionella ZOTUs associated in different ways with the same taxa.For example, while the Legionella ZOTU193 positiv el y corr elated with ZOTU135 (o_ Obscuribacterales ) and ZOTU196 (f_ Rhodobiaceae ) while Legionella ZO TU1258 displa yed negativ e corr elations with these .T he same phenomenon was observed for ZOTU55 (g_ SM1A02 , positively correlates with Legionella ZOTU193, negatively correlates with Legionella ZOTU1483) and for ZOTUs 323 and 37, (f_ Planctomycetaceae ; g_ P edomicrobium ; positiv el y corr elate with Legionella ZOTU193 and negativ el y corr elate with Legionella ZO TU1379).ZO TU996 mostly displayed negativ e corr elations (nine, r anging fr om −0.3 to −3.6): the only positive correlation (coefficient of 0.322) was with ZO TU817 (g_ Bdellovibrio ), while ZO TU78 (g_ Nitrospira ), ZO TU347 (g_ H16 ) were the ones with the highest negativ e corr elation coefficients.In only two cases, two Legionella ZOTUs correlated with each other, in both cases positively: ZOTU735 and ZOTU640 corr elated with eac h other (corr elation coefficient 0.3), as well as ZOTU462 and ZOTU241 (correlation coefficient 0.31).The genus Chthonomonas w as alw ays associated with high Legionella spp.abundance and positiv el y corr elated with the Legionella ZOTU801 (correlation coefficient 0.3).

Discussion
Comm unity anal ysis of 85 sho w er hose samples r e v ealed a highl y rich bacterial and eukaryotic composition (Fig. 1 ) that was relativ el y consistent among samples (Fig. 2 ).Ho w e v er, Legionella spp.and L. pneumophila r elativ e abundance v aried consider abl y among samples (Fig. 3 ).Various Random Forest analyses demonstrated that se v er al bacterial and eukaryotic taxa were associated with either high or low r elativ e abundance of Legionella spp.and respectiv el y L. pneumophila (Figs 4 -5 ).Mor eov er, SparCC corr elation analysis sho w ed ho w individual Legionella ZOTUs correlated with specific microbiome members (Fig. 6 ).

Microbial ecology in the final meters of water distribution
Sho w er hoses biofilms are inhabited by diverse prokaryotic and eukaryotic communities .T hree main factors promote microbial gr owth her e: (i) temper atur e, whic h is often warmer in the last meters; (ii) flexible materials that leach organic carbon; (iii) usage patterns that often create extensive stagnation periods (Proctor et al. 2016 ).This makes sho w er hoses ideal envir onmental nic hes where biofilms form and microorganisms can proliferate.Differences in the above-mentioned factors also means that there is consider able div ersity between differ ent sho w er biofilm microbiomes (Proctor et al. 2016, Gebert et al. 2018 ).To mitigate the uncertainty caused by suc h differ ences, we collected all hoses on pur pose fr om a single building, whic h shar ed the same incoming water, had a similar age, and were made from the same material.The sho w er hose biofilms sho w ed general similarity in terms of common diversity indices, with dominant taxa similar to those identified from previous studies of non-chlorinated drinking water systems (Liu et al. 2014, Proctor et al. 2018, Neu et al. 2019 ).T he a v er a ge r elativ e abundance of the micr oor ganisms is highl y variable among biofilm samples (Fig. S1 ), as well as the r elativ e abundance of Legionella spp.and L. pneumophila, and Legionellaassociated ZOTUs (Fig. 3 , Fig. S3 ).While the reasons behind this v ariability ar e not clear, the data provided interesting information: (i) Legionella spp.r elativ e abundance r eac hes v ery high v alues in sho w er hoses biofilms (7.8% in our samples); (ii) the higher r elativ e abundance of Legionella spp.compared to L. pneumophila , as well as the 30 ZOTUs classified as Legionella , suggest multiple Legionella species inhabiting individual sho w er hose biofilms.Despite the lack of available studies, the presence of multiple Legionella species with different abundances has been detected in different water systems, although poorly described in sho w er hoses to our knowledge (Lesnik et al. 2016, Per eir a et al. 2017, Logan-Jac kson and Rose 2021, Salinas et al. 2021, Gleason et al. 2023 ).
Man y studies cov er pr okary otic drinking w ater biofilms, but information on eukaryotes in these systems is still se v er el y limited.Studies focussing on eukaryotes detected amongst others the fungal sub-phylum LKM11 and the amoeba clade LKM74 in biofilm samples of different DWDSs, together with other organisms like Streptophyta , Ciliates and Algae, Opisthokonta, Stramenopiles and Alveolata (Inkinen et al. 2019, Soler et al. 2021 ).We also detected these taxa, but, importantly, as well as se v er al taxa that are known carriers of opportunistic pathogens, such as Vannella spp., Hartmannella spp ., Acanthamoeba spp .and Echinamoeba spp .(Inkinen et al. 2019 ).Ho w e v er, the r ele v ant eukaryotic comm unity is likel y under-r epr esented with curr ent 18S amplicon sequencing tools.For example, 18S primers target all eukaryotes, which increases the chances of detection of undesired species/contaminations.In response, specific databases have been developed for certain fractions of eukaryotes of interest to the exclusion of others (Guillou et al. 2013 ).A further major dr awbac k is that the taxonomic identification of eukaryotes is still limited and this calls for impr ov ements in all aspects of eukaryotes ecology r esearc h, fr om tar geted sample collection to dedicated analysis pipelines (Gabrielli et al. 2023 ).
All this information together highlight ho w sho w er hoses biofilms are rich en vironments , where interactions happen from species to kingdom le v el (Sadiq et al. 2022 ).As shown, Legionella spp.are members of these en vironments , and therefore subjected to different ecological interactions.

Legionella abundance
Microbes in biofilms exist in close pr oximity, thus cr eating opportunities to establish positive and/or negative interactions with each other.Random Forest analysis sho w ed a prevalence of positive associations between the resident microbiomes and Legionella spp ./ L. pneumophila (Figs 4 A-D and 5 A-D).Positive bacterial interactions include promoting the growth of other bacteria by increas-Figure 6. SparCC analysis constructed as a network using the 16S amplicon sequencing dataset, subsetted for the ZOTUs associated with Legionella taxonomy.To help inter pr etation, the nodes corresponding to the Legionella ZOTUs are coloured in dark blue, while the edges are in light blue indicating positive correlations and orange for negative correlation coefficients.In the far left are all the Legionella ZOTUs that do not show any significant correlation (above 0.3 correlation coefficient).ing general nutrient availability, creating new niches or directly exchanging important nutrients (e.g.cross feeding of amino acids) (D 'Souza et al. 2018, Kehe et al. 2021 ).In the context of Legionella gr owth, suc h beneficial inter actions can play a decisiv e r ole: Legionella are auxotrophs for at least seven amino acids (Tesh andMiller 1981 , Chien et al. 2004 ).
In contr ast, negativ e associations between bacteria and Legionella can include direct (interference) or indirect (exploitative) competition (Granato et al. 2019, Cavallaro et al. 2022 ).Our results suggest a negative association between Legionella spp.and only two genera, namely Aquabacterium and Sphingopyxis (Fig. 4 ).It is not possible to exactly establish the nature of these interactions using only amplicon sequencing data.While these two taxa are not known to be producers of secondary metabolites with antimicrobial acti vity, the y ar e c har acterized by a highl y v ersatile metabolism, whic h pr ovides adv anta ges with r espect to surviving to unfavourable environmental conditions (Sharma et al. 2021 ).This can in theory give them an adv anta ge in terms of competition for resources in low-nutrient envir onments compar ed to other micr oor ganisms, especiall y the ones with specific and high nutrient r equir ements, like Legionella .
A fe w pr e vious studies have explored associations between Legionella and the resident microbiome in different systems.In two studies focusing on cooling towers, P ar anja pe and collea gues found bacteria associated with the presence or absence of Legionella spp.and L. pneumophila (P ar anja pe et al. 2020 , P ar anja pe et al. 2020 ).Pseudomonas was found as the main genus correlating with the absence of Legionella , as already reported elsewhere (Leoni et al. 2001, Proctor et al. 2018 ).Inter estingl y, our study did not show a str ong negativ e association between Pseudomonas and Legionella .In addition, P ar anja pe and co-workers r eported Sphin-gopyxis as a positiv e corr elating genus, in contradiction with our r esults (Fig. 4 ).A r ecent inv estigation on plumbing systems acr oss four cities of Europe revealed associations between culturable Legionella and the microbial communities detected in the water samples (Scaturro et al. 2022 ).While reinforcing the negative association between Legionella and Pseudomonas , the authors also described a negative association with the genera Staphylococcus and, consistently with our analysis , Sphingop yxis .Any specific associations between the genera detected in this study and Legionella were, to our knowledge, not pr e viousl y described.Notabl y, two genera among the ones associated with Legionella (i.e.Meiothermus and Chthonomonas ) are thermophilic (Lee et al. 2011, Ho et al. 2016 ), suggesting the possibility that some of these associations are dictated by environmental preferences (i.e.shared niches) rather than actual direct interactions.Ho w ever, Chthonomonas spp. is also known for establishing m utualistic r elationship with other or ganisms, specificall y in biofilms (Stott et al. 2018 ).
Associations between bacterial communities and culturable L. pneumophila in hot and cold water were also investigated in a study conducted on high rise buildings in New York City (Ma et al. 2020 ).In that study, associations were described at higher taxonomic le v els and negativ el y linked L. pneumophila and members of the classes Bacteroidia and Solibacterales , while positive associations were detected with Chloracidobacteria , Gemm-1, and Actinobacteria , among others .T he fact that disa gr eement exists between the various studies depends on different factors.For example, pr e vious studies analysed exclusively the water phase, in contrast with our study that focused on the biofilm phase .Moreo ver, the taxonomic classification at the genus (sometimes class) le v el that was used in all of these studies does not allow for specific identification of the organisms involved, and different species within the same genus might exhibit different associations with the same or ganism.We explor ed the latter point by looking at the corr elations that specific Legionella ZOTUs exhibit in a SparCC analysis (Fig. 6 ; discussed below).The real ecological meaning of these associations is not clear, as correlation can ob viousl y not be linked automatically to causation.Many microbial ecology studies aim to find associations between two species that reflect their ecology in the environment (Carr et al. 2019 ).Ho w e v er, inter actions between species often occur in high-order combinations, where an interaction betw een tw o species is modulated by a third one (Bairey et al. 2016 ).T herefore , associations detected by statistical analysis may not necessarily describe direct interactions between micr oor ganisms.It is also possible that two or ganisms ar e correlating with a common environmental variable rather than eac h other.Ne v ertheless, these observ ational studies cr eate good opportunities to further study bacterial interactions using labor atory a ppr oac hes .For example , P ar anja pe et al. (P ar anja pe et al. 2020 ) isolated a strain of Brevundimonas and confirmed the positive effect of the strain to w ar ds the gro wth of L. pneumophila previously observ ed thr ough statistical anal ysis.

Some eukaryotic taxa are also associated with Legionella abundance
Random forest analysis performed on the eukaryotic data set rev ealed mor e negativ e associations with Legionella abundance compared to the bacterial data.Unravelling the meaning of these associations is challenging, as this is likely linked to the intracellular lifestyle of Legionella .Interactions between protists and Legionella can, in fact, hav e v ery differ ent outcomes .For example , Legionella are taken up by their host as a potential food source, but escape the host's degradation mechanisms and start the re plicati ve phase by establishing in the Legionella -containing-vacuole (LCV) (Boamah et al. 2017 ).When this happens, the protist hosts are often killed (thus resulting in increased Legionella abundance and decreased host abundance).Alternatively, some protists effectiv el y gr aze on Legionella as a food source and are resistant to their escape mechanisms, resulting in decreased Legionella abundance and increased host abundance (Dey et al. 2009, Amaro et al. 2015, Boamah et al. 2017, Mondino et al. 2020 ).P ar anja pe and co-workers inter pr eted a positiv e association between Legionella and Oligohymenophorea as an indication of host-prey relationship in which Legionella is able to replicate (potentially without causing the death of the host) (P ar anja pe et al. 2020 ).To add to the complexity, it has been reported that the temper atur e and the abundance of Legionella can change the fate of host-prey relationships, which can be either digested, packaged into vesicles and secreted in pellets, or survive intracellularly with no replication (Boamah et al. 2017 ).
Different Legionella species can also be associated with different outcomes with the same host (Boamah et al. 2017 ).Our dataset identified se v er al associations with protists proposed as Legionella hosts.Vannella spp., Echinamoeba spp.and Hartmannella spp.(now Vermamoeba ) predicted the low abundance of Legionella in the biofilm samples.While Vannella has been pr e viousl y described as able to kill Legionella (Rowbotham 1986 ), Vermamoeba is often r eferr ed to as a permissive host for Legionella , although cases of intr acellular surviv al hav e been r eported (Buse and Ashbolt 2011 ).To our knowledge, no information on how the interaction Echinamoeba -Legionella takes place is a vailable .Additional negative associations include Novel-Gran-6 , which are members of the phylum Cercozoa , described as particularly prone to digest Legionella ( Boamah et al. 2017 ).The genera Vrihiamoeba and Protacan-thamoeba predicted, on the other hand, high Legionella abundance.No clear interaction between these taxa and Legionella has to our knowledge been described, but both are known to prey on bacteria (Delafont et al. 2014, Ma et al. 2016 ).Inter estingl y, Vrihiamoeba is a predictor of low L. pneumophila abundance, indicating that the same interaction could produce a different outcome depending on the specific Legionella species involved.
Associations are often interactions that are happening at small taxonomic scale A limitation of community analysis performed through 16S amplicon sequencing is that genus-based generalization of the interactions is usually made, while species dynamics is not considered.In r esponse, we explor ed the div ersity and associations of Legionella ZOTUs (Fig. 6 ; Fig. S3 ).Pr e vious studies discussed the possibility to identify species of Legionella using only a partial sequence of the 16S rRNA gene (Wilson et al. 2007, Ma et al. 2020 ).If correct, this offers an opportunity to study ecological dynamics of the genus Legionella at species le v el in environmental samples.
While this evidence combined suggest that different species of Legionella inhabit building plumbing biofilms, the Random Forest analysis used either the entire genus or specifically L. pneumophila datasets as external variables for the predictions .To challenge this further, we focused on the correlations of the individual Legionella ZO TUs , which suggested that the 30 sequences display markedl y differ ent associations in the complex ecological network r epr esented by these biofilms.A first observation of this network, consistently with the Random Forest data, highlights a pr e v alence of positiv e associations between Legionella and the r esident micr obiome.In the specific case of this study, different Legionella sequences have dissimilar correlation patterns, and correlations between individual Legionella ZOTUs are rare .T his suggests the possibility that Legionella species live within different and defined ecological niches in biofilms, and that they interact/associate with different microorganisms (Rottjers and Faust 2018 ).T his is , from an ecological point of view, an important aspect as more than 25 species within the genus Legionella are considered pathogenic and we believe that studying the ecology of non-pneumophila species is as important.
Ho w e v er, these observ ations should be r ead in the light of an unequal distribution of Legionella across biofilm samples .T he approximate Legionella abundances ranged between 1E + 3 and 1E + 7 GU/hose, suggesting considerable but variable colonisation of the hoses .Moreo ver, the distribution of the ZOTUs across samples was also une v en, with sequences pr esent in 2 to 81 biofilm samples (Fig. S4 ).Hence, it cannot be excluded that stochastic effects influenced which Legionella ZOTUs colonised specific samples.

Critical consider a tions of the methods used to detect associations
The associations identified in this study display similarities, but also differences with other studies in the field (discussed above).This is influenced by a number of factors, which are not always clearly highlighted in similar studies: (1) Legionella quantification : Legionella r elativ e abundance deriv ed fr om 16S rRNA ddPCR and Legionella -specific ddPCR data did not corr elate particularl y well with r elativ e abundance derived from 16S amplicon sequencing data (Fig. 3 B).While both carry valuable information, the two a ppr oac hes hav e intrinsic differ ences; amplicon sequencing can be used as a quantitative measure, but it is always r elativ e to the total reads.Amplification through ddPCR, on the other hand, allows absolute quantification of a specific tar get expr essed in gene copies/ μL.In an ecological context, a combination of both a ppr oac hes is pr obabl y the best way to get as many information as possible and carry associations-based-anal ysis.Mor eov er, both these methods would amplify and detect non-viable Legionella , of which the importance in ecological-association studies is questionable.Pr e vious studies with a similar focus used cultivation-based quantification methods (Ma et al. 2020, Scaturro et al. 2022 ), which also has known bias towards L. pneumophila and is limited by only analysing cells that grow on agar plates (Toplitsch et al. 2021 ).Clearly, the method of Legionella quantification would consider abl y affect the outcome of any correlation/association analysis.
(2) Tools used to determine associations : Changing the threshold on the r elativ e abundance of Legionella (high Legionella vs low Legionella ) affected the outcome of the Random Forest models (Fig. S6 ).This is partially due to the OOB errors of the models, whic h ar e especiall y higher for the pr ediction of low Legionella abundance.From a biological point of view, this is an important consider ation: c hoosing the median r elativ e abundance value as the threshold is statistically stronger as all the samples are part of the Random Forest analysis.Focusing only on the samples at the extremes w ould, ho w e v er, in theory define more relevant predictors of the presence/absence of Legionella .
In addition, different statistical tools (i.e.SparCC and Random For est anal ysis) pr edictabl y did not pr oduce the same r esult but have complementary value .T he Random Forest analysis identifies general predictors of the abundance of Legionella spp.and L. pneumophila in a given environment; SparCC analysis provides us with evidence on the separate ecological niches that different Legionella species occupy.In the former, the cause for high OOB values indicates the model is less accurate than desired while the latter highlights that associations of micr oor ganisms ideally need to be resolved at a smaller taxonomic and spatial scale .Other studies ha v e used linear r egr ession or LefSe anal ysis to link Legionella to other micr oor ganisms (P ar anja pe et al. 2020 , P ar anja pe et al. 2020 ).Our methodology choice can be justified by the data collected in our study: targeted ddPCR allows us in fact to use the r elativ e abundance of Legionella as an external variable that can be predicted by the amplicon sequencing results (v alidated mor eov er by the fact that Legionella is displayed as a predictor of high Legionella abundance).While not advocating for certain methods over others, it is obvious that the choice of method would affect the result.An ideal situation would be standardizing methods and protocols across different studies, and reporting as much metadata as possible, as to gain generalizable and comparable insights into Legionella ecology and microbial associations in diverse en vironments .
(3) Biofilm vs water phase: In our study, the biofilm phase was specifically collected and analysed, compared to other inv estigations wher e the water phase was analysed (Ma et al. 2020, P ar anja pe et al. 2020, Scaturr o et al. 2022 ).While noting the challenges in biofilm sampling, we do belie v e that this choice is critical for the determination of associations between micr oor ganisms .T he microbial composition of biofilm and water is different and not necessarily reflective of each other (Chan et al. 2019 ), and therefore associations detected in the two phases are likely different.Also, Legionella ar e clearl y r ecognised as biofilm-associated bacteria (Declerck 2010 ), hence, associations within this spatial niche should be most relevant.Finally, the bacterial composition of established biofilms bacterial composition is more stable in time, while water is a more dynamic system (Inkinen et al. 2014 ).

Function vs taxonomy
A strong limitation of conventional amplicon sequencing is that only taxonomic information is obtained after processing, and it is often not possible to r esolv e the dataset at species le v el.This does not allow a good ov ervie w of the species dynamics in given samples, while making it also nearly impossible to look at specific functions shaping the ecology of the environments (Lajoie and Kembel 2019 ).Shotgun metagenomics may be one alternativ e a ppr oac h (Eisen 2007, Per ez-Cobas et al. 2020 ).The advantages of such a method compared to amplicon sequencing are a finer taxonomic resolution (it is possible in some cases to reconstruct entire genomes) and, more importantly, the detection of genes with their abundances) associated with specific functions .T his would benefit investigations on Legionella ecology and interactions with other organisms as well.Our study identified se v er al or ganisms corr elating with/pr edicting the abundance of Legionella and we explor ed potentiall y explanations for these inter actions.Using shotgun meta genomics in these studies would, inform on the specific functions associated with the abundance of Legionella , which would in turn help creating a more detailed ov ervie w on what c har acterizes an envir onment as favourable or undesirable for Legionella proliferation, being moreover able to link the genes to the organisms.We previously proposed potential pr obiotics a ppr oac hes a gainst Legionella in engineer ed aquatic ecosystems (Cav allar o et al. 2022 ), and r e vie wed studies that identified antagonistic behaviours of certain micr oor ganisms .T he use of shotgun metagenomics in observational studies can be a useful resource in this sense, as the identification of specific functions that make the environment undesirable for Legionella can be searched for in specific microorganisms to use in such an appr oac h.

Conclusions
r Non-chlorinated sho w er hose biofilms from the same build- ing plumbing system are inhabited by diverse bacterial and eukaryotic communities, including widely varying relative abundances of Legionella .
r Eukaryotic communities are important for a complete under- standing of the ecology of Legionella in plumbing systems, but a considerable expansion of 18S databases is crucial for the identification of k e y organisms and associations.
r A combination of bacterial and eukaryotic organisms sta- tisticall y pr edicted the abundance of Legionella spp.(and L. pneumophila ), but the direct ecological meaning of specific associations is ambiguous .T he low predicting quality of the models, ho w e v er, needs to be considered when putting the associations in a biological context.
r Analysing the associations looking at single amplicon se- quence variants (rather than gen us) gi ves an overview of potential interactions at species level, and a deeper ecological understanding of Legionella species dynamics.
r Moving to w ar ds function-based a ppr oac hes (i.e.meta ge- nomics of drinking water biofilms) will provide a better understanding of the mechanisms that affect Legionella abundance.This will in turn allow the use of specific known functions as predictors of Legionella abundance.

Figure 1 .
Figure 1.Ov ervie w of the av er a ge micr obial composition of the 85 sho w er hose biofilms collected fr om a single building for this study.The figur e shows av er a ge 16S and 18S data (n = 85) for ZO TUs abo v e 0.01% r elativ e abundance.Size of the nodes depicts the r elativ e abundance of the corresponding taxa, while the connectors show the connection to the lower taxonomic le v el (fr om phylum to genus).Nodes ar e shown until the last le v el that is taxonomically assigned.Unclassified taxa (not shown) at genus level account for 64.4 ±1.4% of the bacterial community and 85.3 ±1.2% of the eukaryotic community.

F igure 2 .
Or dination analysis of the (A) 16S and (B) 18S rRNA amplicon sequencing data from 85 sho w er hose biofilm samples, stress = 0.189 The gr a phs show distance among samples, calculated using the Bray-Curtis method and plotted on NMDS gr a phs.stated otherwise, the following analysis uses the targeted ddPCR data.

Figure 3 .
Figure 3. (A) Relative abundance of Legionella spp.(green bars) and Legionella pneumophila (orange bars) measured with ddPCR.Absolute quantification of Legionella spp.and L. pneumophila was measured with a duplex ddPCR assay, and concentrations were then normalized relative to the total 16S rRNA gene concentrations .T he inla y figure displa ys the linear corr elation (r = 0.5) between the r elativ e abundance of Legionella spp.and L. pneumophila among samples (RA = Relative Abundance); (B) Linear correlation plot of the relative abundance of Legionella spp.calculated both with ddPCR and sequencing data.

Figure 4 .
Figure 4. Random Forest analysis of the 16S and 18S datasets using the relative abundance of Legionella spp. as variable.Red represents predictors of low Legionella abundance while blue and green represent predictors associated with high abundance for bacteria and eukaryotes, respectively.A-B : Main bacteria (genus le v el, A; ZOTUs le v el, B) pr edicting Legionella spp.abundance with a thr eshold of 0.3%; C-D : Main bacteria (genus le v el, C; ZOTU le v el, D) pr edicting Legionella spp.abundance using the samples at the extremes.E-F : Main eukaryotes (genus le v el, E; ZOTU, F) pr edicting Legionella spp.abundance using 0.3% as threshold; G-H : Main eukaryotes (genus level, G; ZOTU, H) predicting Legionella spp.abundance using the samples at the extremes.

Figure 5 .
Figure5.Random Forest analysis of 16S and 18S data sets using the relative abundance of Legionella pneumophila as variable.Red represents predictors of low Legionella abundance while blue and green represent predictors associated with high abundance for bacteria and eukaryotes respectively.A-B : Main bacteria (genus le v el, A; ZOTUs le v el, B) pr edicting L. pneumophila abundance with a thr eshold of 0.02% r elativ e abundance; C-D : Main bacteria (genus le v el, C; ZOTU le v el, D) pr edicting L. pneumophila abundance using the samples at the extr emes.E-F : Main eukaryotes (genus le v el, E; ZOTU, F) predicting L. pneumophila abundance using 0.02% relative abundance as threshold; G : Main eukaryotes (ZOTU, G) predicting L. pneumophila abundance using the samples at the extremes.