Have genetic targets for faecal pollution diagnostics and source tracking revolutionized water quality analysis yet?

Abstract The impacts of nucleic acid-based methods - such as PCR and sequencing - to detect and analyze indicators, genetic markers or molecular signatures of microbial faecal pollution in health-related water quality research were assessed by rigorous literature analysis. A wide range of application areas and study designs has been identified since the first application more than 30 years ago (>1100 publications). Given the consistency of methods and assessment types, we suggest defining this emerging part of science as a new discipline: genetic faecal pollution diagnostics (GFPD) in health-related microbial water quality analysis. Undoubtedly, GFPD has already revolutionized faecal pollution detection (i.e., traditional or alternative general faecal indicator/marker analysis) and microbial source tracking (i.e., host-associated faecal indicator/marker analysis), the current core applications. GFPD is also expanding to many other research areas, including infection and health risk assessment, evaluation of microbial water treatment, and support of wastewater surveillance. In addition, storage of DNA extracts allows for biobanking, which opens up new perspectives. The tools of GFPD can be combined with cultivation-based standardized faecal indicator enumeration, pathogen detection, and various environmental data types, in an integrated data analysis approach. This comprehensive meta-analysis provides the scientific status quo of this field, including trend analyses and literature statistics, outlining identified application areas, and discusses the benefits and challenges of nucleic acid-based analysis in GFPD.


Introduction
Safe drinking water, sanitation, and hygiene (WASH) are prerequisites to good health and well-being. Despite considerable global progress in recent decades, ∼829 000 people still die each year from diarrheal disease, primarily through faecal-oral pathways, due to unsafe WASH practices (World Health Organisation 2019). While there is clear evidence that safely managed water resources, water supply, and adequate sanitation reduce the health risks related to water exposure and consumption (drinking, recreational activities, household exposure as well as transmission through irrigation, aquaculture, and so on), there is a constant, urgent need for more comprehensive, informative, and rapid microbiological assessment approaches to elucidate intricate WASH-related questions and to clarify complex faecal contamination issues.
For well over 100 years, faecal pollution assessment through the microbiological analysis of water has relied on the cultivation-based detection of facultative anaerobic bacterial colonizers of the animal and human gut, e.g. total coliforms, faecal coliforms, Escherichia coli, and intestinal enterococci. Recent advances in nucleic acid sequencing methods and bioinformatics have revealed the immense richness and diversity of the gut microbiota, opening unprecedented possibilities to develop new microbiological assessment approaches. Given the great diversity of assessment types made possible by genetic detection and analysis methods, we introduce the new term 'genetic faecal pollution diagnostics (GFPD)' to cover the entirety of this field, wherein 'genetic' means 'nucleic acid-based'. For terms and definitions, please refer to the 'Glossary'.
Gut microbiotas are profoundly different from free-living microbial communities (e.g. Chen et al. 2018) across the biosphere (Ley et al. 2008). The Human Microbiome Project revealed Bacteroidota and Firmicutes to be the dominant phyla in the human gut, with substantial variability among individuals (The Human Microbiome Project Consortium 2012). The microbiome of municipal wastewater provides a community fingerprint that captures this diversity, with significantly lower community-level variability compared to individuals (Newton et al. 2015). In addition to faecal taxa, the wastewater microbiome also harbours a large proportion of wastewater infrastructure-related microorganisms (Shanks et al. 2013). The within-species variability in the human gut proves to be minor in comparison to the stark differences among other animal species, where both host phylogeny and diet are key drivers (Ley et al. 2008, Youngblut et al. 2019, Mallott and Amato 2021, Youngblut et al. 2021). In addition to the prokaryotic community, the gut also harbours a great diversity of viruses (bacteriophages, viruses of archaea, and of human cells as well as viruses transiently present in food; Liang and Bushman 2021). Novel molecular biological and genetic tools offer fascinating new ways to analyse and track faecal microorganisms or viruses in water. To date, these opportunities have only partially been exploited, and future research is poised to further the discovery and impacts of the GFPD field.
The aim of this work is to assess the impacts of nucleic acid-based methods on faecal pollution detection and analysis in the field of health-related water microbiology (HRWM). For the first time, this review provides a critical analysis of the new possibilities that state-of-the-art genetic methods have opened in a great diversity of application areas. This is accomplished via a systematic literature review to identify GFPD application areas, key research questions, and study designs from more than 1100 peer-reviewed publications, since the very beginning of using such molecular techniques in the environmental water compartment. The review focuses on genetic targets and parameters that take a faecal indication role; therefore, specific pathogen detection is only included if the indicator role is explicitly stated. Furthermore, description of the various methodological developments of molecular methods and their evaluation is outside the scope of this effort (please find a selection of methodological review articles in the section 'Background information on genetic targets and methods: a historical overview'). The outcomes of the systematic literature review include trend analyses of relevant scientific literature ('Outcomes of the systematic study design analysis'), followed by the analysis and discussion of seven identified application areas in HRWM ('In-depth review of the application areas of genetic faecal pollution diagnostic through case studies'). The review concludes with a critical discussion on the benefits and limitations of GFPD in health-related water quality research and management. Figure 1 provides an overview of this article.

Cultivation-based methods for faecal pollution detection: where it all began
The first routine bacteriological analyses of drinking water were initiated by Percy and Grace Frankland in London in 1885, building on the seminal work of Robert Koch and colleagues regarding microbiological media for detecting bacteria (Koch 1881). Around this time, Escherich described the bacterium that was later renamed Escherichia coli, in the faeces of breast-fed children (Escherich 1886, Castellani and Chalmers 1919). E. coli is currently one of the most widely used faecal indicator organisms (FIO; see the section 'Glossary') for water quality testing (Levine 1921, Perry and Bayliss 1936, Geldreich 1966), together with intestinal enterococci (Kjellander 1960, Geldreich and Kenner 1969) and their phages, such as somatic coliphages, and F-specific RNA bacteriophages (Grabow 2001, Jofre et al. 2016). These standardized, cultivation-based FIO parameters have found their way into regulations all over the world and are still the gold standards for monitoring general faecal pollution in most types of water resources. While these FIOs revolutionized water quality testing and public health protection at the end of the 19th century, they also face several limitations. For example, most protocols require more than one working day to produce results, and these FIOs are unable to differentiate between faecal pollution sources (i.e. human, bird, cattle, and so on). It must be mentioned that host-associated cultivable enteric microorganisms, such as human-associated sorbitol-fermenting bifidobacteria, are known (Mara and Oragui 1983, Mushi et al. 2010) and have paved the way for the field of microbial source tracking (MST; see the section 'Glossary'). However, advances in molecular biology offered an unprecedented range of new opportunities to develop genetic technologies that can provide same-day water quality results and characterize key sources of faecal pollution.

The early days of genetic methods for faecal pollution diagnostics
Faecal indicator bacteria often show tremendous genotypic subspecies variation. MST studies in the early 2000s intensively attempted to exploit this strain-level diversity by genetic fingerprinting and -typing methods (e.g. repetitive element PCR, ribotyping, amplified fragment length polymorphism, and pulsed-field gel electrophoresis) to track the origin of E. coli and enterococci isolates (Mott and Smith 2011). Large isolate libraries, covering faecal pollution sources and polluted water bodies in a given catchment of interest, were typed. Band/fingerprint patterns were statistically analysed to account for the high spatial and temporal variation (classical library-based MST; Domingo et al. 2007, Mott and Smith 2011). Such library-based genotyping strategies were also used to evaluate the general faecal indication capacity of faecal indicator bacteria (Ishii et al. 2006, Ishii and Sadowsky 2008).

Detection and quantification of genetic markers for faecal pollution diagnostics
Genetic characterization has led to the identification of key genes associated with a specific host, which represents a significant source of pollution (Bernhard and Field 2000). With the advent of conventional end-point PCR in the 1990s, the first studies appeared on targeted detection of general and host-associated genetic bacterial and viral targets for water quality monitoring (Bej et al. 1990, Puig et al. 1994, Bernhard and Field 2000; reviewed in Scott et al. (2002) and Noble and Weisberg (2005)), which were later adapted to quantitative real-time PCR (qPCR; Seurinck et al. 2005).
The use of conventional PCR for target quantification has many limitations. Thus, qPCR appeared in the field of GFPD in the early 2000s and became the most widespread cultivation-independent technology (Jofre and Blanch 2010). Today, there are numerous qPCR assays for a wide variety of bacterial and viral targets, such as enterococci (USEPA 2012a, 2013), E. coli (Sivaganesan et al. 2019), human- and other animal-associated bacterial markers [original works: (Reischer et al. 2006, Shanks et al. 2008, Mieszkin et al. 2009), large-scale evaluations: (Reischer et al. 2013, Mayer et al. 2018), and reviews: (Wuertz et al. 2011, García-Aljaro et al. 2018)], viral MST markers including crAssphage [(García-Aljaro et al. 2017, Stachler et al. 2017), reviewed in Bivins et al. (2020)] and pepper mild mottle virus [PMMoV, (Rosario et al. 2009), reviewed in Kitajima et al. (2018), Symonds et al. (2018)] or human enteroviruses (reviewed in Farkas et al. (2020)). Archaeal targets (Ufnar et al. 2006) and host mitochondrial DNA targets (Martellini et al. 2005, Schill and Mathes 2008, Malla and Haramoto 2020) have also been proposed as host-associated MST tools. Interestingly, intestinal fungi have not yet been targeted. A good overview of the most useful indicators and MST markers for which qPCR assays are available is provided in the online Global Water Pathogens Project (GWPP) book for bacterial and for viral indicators of faecal pollution (the latter: Ahmed and Harwood 2017) or in a recent review article (Li et al. 2021a). Many of these methods have been subjected to multiple laboratory performance assessments and shown to be highly reproducible when standardized protocols are used (Shanks et al. 2016). Some human-associated qPCR assays are even available as government agency standardized protocols (USEPA 2019a, b) with certified companion reference materials (Kralj et al. 2021, Sivaganesan et al. 2022).
More recent research foci of genetic analysis methods include ease of use, rapid field-testing, and more sensitive and reproducible methods. For example, isothermal amplification assays such as LAMP (loop-mediated isothermal amplification; Martzy et al. 2017) or HDA (helicase dependent amplification; Kolm et al. 2017) have been developed for rapid enterococci detection in environmental waters; an overview can be found in Nieuwkerk et al. (2020).
In contrast to qPCR, where quantification of target genes relies on a calibration model, digital PCR (dPCR) allows quantification based on Poisson statistics of presence/absence results from thousands to millions of reaction mixture compartments per sample. Advances in microfabrication technologies in the 2010s have allowed the development of commercial dPCR platforms, making this an emerging and highly promising technology for the GFPD field (Tiwari et al. 2022).
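The Poisson-based quantification principle of dPCR can be illustrated with a minimal sketch. It assumes that target copies distribute randomly and independently over equally sized compartments, so that the mean number of copies per compartment follows from the fraction of positive compartments as λ = −ln(1 − p). The function names and the compartment volume in the usage note are hypothetical and are not taken from any specific dPCR platform.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per compartment.

    Assumes random, independent distribution of targets over equally
    sized compartments (an idealization of a real dPCR run).
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need 0 <= positive <= total and total > 0")
    if positive == total:
        # No negative compartments left: the Poisson estimate is undefined.
        raise ValueError("all compartments positive; the sample is saturated")
    fraction_positive = positive / total
    return -math.log(1.0 - fraction_positive)


def copies_per_microlitre(positive: int, total: int, partition_volume_ul: float) -> float:
    """Target concentration in the reaction mixture (copies per microlitre)."""
    if partition_volume_ul <= 0:
        raise ValueError("partition volume must be positive")
    return copies_per_partition(positive, total) / partition_volume_ul
```

For instance, `copies_per_microlitre(5000, 20000, 0.00085)` would estimate the concentration for a run with 5000 positive out of 20 000 compartments of an assumed 0.85 nl each. Unlike qPCR, no standard curve enters the calculation, but the estimate becomes unreliable when almost all compartments are positive.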

High-throughput DNA sequencing for genetic faecal pollution diagnostics
With the advent of high-throughput DNA sequencing (HTS) in the 2010s, whole-community profiling revolutionized gut microbiome research. This, in turn, has enabled the identification of new host-associated and general faecal pollution targets followed by the development of new qPCR assays (Eren 2014, Bibby et al. 2019). Applying HTS to environmental samples stimulated the development of entirely new concepts for the GFPD field. HTS-based approaches have evolved rapidly, concomitant with rising capabilities in computing and bioinformatics (Garner et al. 2021). Currently, the two most widely used methods are 16S rRNA gene amplicon sequencing (16S AmpSeq), providing taxonomic information, and whole metagenome sequencing, allowing, in addition to taxonomic profiling, the identification of functional genes, such as virulence or antibiotic resistance genes (ARGs; Chan et al. 2019). There are two strategies to use HTS for faecal pollution analysis in aquatic environments. One approach works by identifying gut-associated taxa within the complex aquatic microbiome signal and thus identifying the presence of faecal pollution (e.g. Ulrich et al. 2016). The other approach relies on predefined faecal reference sequence libraries, based on a local sample collection and public sequence databases, and aims to identify specific sources of faecal pollution. Sophisticated machine learning algorithms such as SourceTracker, FEAST, or FORENSIC are then required for data analysis and interpretation (Tan et al. 2015, Unno et al. 2018, Mathai et al. 2020, Raza et al. 2021). HTS, as currently applied for most applications in microbiomics, only provides relative quantification within the sequence pool recovered (% of target sequences within total recovered sequences). The resolution depends on the applied sequencing depth (i.e. number of total sequence reads per sample). Per se, it does not provide quantitative information on the analysed sequences in relation to their occurrence in the water sample (see the 'Sensitivity of environmental detection of nucleic acid targets' section). In its current form of application in GFPD, HTS seems to be complementary to the qPCR/dPCR quantification of genetic faecal markers.
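The dependence of HTS resolution on sequencing depth can be made concrete with a small sketch. It assumes that reads have already been assigned to taxa and simply expresses target reads as a share of all recovered reads; the function names and read counts are hypothetical.

```python
def relative_abundance(target_reads: int, total_reads: int) -> float:
    """Share of target sequences within all sequences recovered from a sample."""
    if total_reads <= 0 or not 0 <= target_reads <= total_reads:
        raise ValueError("need 0 <= target_reads <= total_reads and total_reads > 0")
    return target_reads / total_reads


def smallest_resolvable_share(total_reads: int) -> float:
    """Smallest non-zero relative abundance observable at a given sequencing depth.

    A single read is the smallest possible signal, so the resolution is 1/depth.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1.0 / total_reads
```

At an assumed depth of 10 000 reads per sample, a faecal taxon contributing less than 0.01% of the sequence pool would, on average, yield less than one read. Neither value says how many target copies were present per volume of water, which is why the text treats HTS as complementary to qPCR/dPCR.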

Literature database searches
The literature databases Scopus and Web of Science/Core Collection were searched for studies on genetic methods to detect microbial faecal pollution in water. In both cases, the query included the following building blocks: 'genetic methods' AND 'faeces' AND 'water quality', with a suite of related words for each term. 'Genetic methods': (genetic OR qPCR OR ddPCR OR PCR OR ribotyp* OR DGGE OR metagenomics OR 'microbial communit*' OR 'bacterial communit*' OR 'microbial diversity' OR (source AND track*)); 'faeces': (feces OR faeces OR fecal OR faecal OR wastewater OR sewage OR enteric OR intestinal); and 'water quality': ((water* OR freshwater OR seawater) AND (quality OR pollution OR contamination)). Each of the blocks was searched in the title, the abstract and the author keyword fields. The document type was restricted to research articles. The time period covered extended from the first such article up until the end of 2022. The resulting list included 3112 articles from Web of Science/Core Collection and 3508 articles from Scopus. After removing duplicates and articles with no DOI, the combined list contained 3554 articles (Fig. 2). The search syntax and the retrieved records are available as supplementary data ('Demeter et al GFPD review Suppl Data.xlsx').
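As an illustration of how the three building blocks combine, the following sketch assembles the Boolean query string and merges two result lists by DOI. It is not the authors' actual workflow: the database-specific field tags for title, abstract and keywords are omitted, and the record structure (a dictionary with a `doi` key) and the case-insensitive DOI matching are assumptions made for the example.

```python
GENETIC_METHODS = [
    "genetic", "qPCR", "ddPCR", "PCR", "ribotyp*", "DGGE", "metagenomics",
    '"microbial communit*"', '"bacterial communit*"', '"microbial diversity"',
    "(source AND track*)",
]
FAECES = [
    "feces", "faeces", "fecal", "faecal",
    "wastewater", "sewage", "enteric", "intestinal",
]
WATER_QUALITY = (
    "((water* OR freshwater OR seawater) "
    "AND (quality OR pollution OR contamination))"
)


def or_block(terms):
    """Join related search terms into one parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"


def build_query():
    """Combine the three building blocks with AND."""
    return " AND ".join([or_block(GENETIC_METHODS), or_block(FAECES), WATER_QUALITY])


def merge_records(*record_lists):
    """Merge result lists, dropping records without a DOI and duplicate DOIs.

    Each record is assumed to be a dict with a 'doi' key (hypothetical layout).
    The first record seen for a DOI is kept.
    """
    merged = {}
    for records in record_lists:
        for record in records:
            doi = (record.get("doi") or "").strip().lower()
            if doi and doi not in merged:
                merged[doi] = record
    return list(merged.values())
```

Calling `build_query()` yields a single string that would still need to be wrapped in the field restrictions of the respective database before submission.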

Article screening
Next, the combined list (titles and abstracts) was screened manually to remove off-topic studies. Only articles that explicitly stated the use of at least one genetic microbial parameter as an indicator for faecal pollution diagnostics (but not if used as, e.g. an enteric pathogen) were retained. Studies developing and evaluating new methods for GFPD as well as their field application were retained. A total of 1122 articles fulfilled these criteria ('all genetic studies', Fig. 2).

Broad categorization of 'all genetic studies'
The 1122 articles in the 'all genetic studies' pool were then categorized based on their broad study aim, as follows:
(1) Method establishment articles: the research question relates to method development and evaluation/validation (sensitivity/specificity, persistence, resistance, and so on).
(2) Application articles: the research question relates to the environment, and the genetic parameter is assumed to have been previously validated. Studies on, e.g. the detection and source tracking of faecal pollution, or the estimation of the associated health risk, belong to this category.
(3) Both: articles having both method establishment and application aspects.
Since the review aims to assess application areas, articles from (2) and (3) were retained for detailed analysis ('application studies', n = 649, Fig. 2).

Systematic analysis of the 'application studies'
Titles and abstracts from all application studies (n = 649, Fig. 2) were reviewed to extract information on five study elements: (i) genetic faecal parameters, (ii) other types of parameters, (iii) sample type and use, (iv) data analysis approach, and (v) application area. The following section and Table 1 describe the study element definitions.
(i) Genetic faecal parameters: the two selection criteria for microbial parameters included here were (1) detection using genetic methods and (2)  Class. All other parameters that the analysed articles reported were assessed on the level of parameter 'class', allowing an overview of the study design. Table 1 lists the 11 parameter classes that were identified. The class 'other' covers diverse parameters with low occurrence, e.g. biological oxygen demand, heterotrophic plate count, observational data on WASH practices.
(iii) Sample type and use. A total of 13 categories of 'sample type', including various water types, faecal matter, and other materials, were identified. If the authors stated the intended use of the water resource, this was also logged. For a list of 'sample type' and 'use type' categories, please refer to Table 1.
(iv) Data analysis approach. This study element describes how the dataset, characterized by the three study elements explained above, was analysed by the authors. In contrast to the three study elements, where several items could be logged, depending on the study design of the article, here each article was assigned to one of the six categories listed in Table 1. In cases in which only summary statistics were reported, we differentiated between qualitative data (occurrences) and quantitative data (minimum, maximum, median, and so on). Correlation analyses, hypothesis testing and simple bioinformatics such as sequence annotation and community analysis (e.g. Bray-Curtis dissimilarities) were grouped together into the category 'correlations, hypothesis tests, or simple bioinformatics'. The category 'multivariate statistics or advanced bioinformatics' includes multivariate statistics, classification algorithms in the case of classical library-based MST, MST algorithms with HTS data, and HTS-based community analyses involving statistical analysis with metadata. Studies performing Quantitative Microbial Risk Assessment (QMRA) or microbial fate and transport models were grouped together in the category 'QMRA, fate & transport modelling'. Other data analysis approaches, such as GIS-based data analysis, or, in the case of classical library-based MST, genotyping fingerprints without reporting a statistical classification method, were assigned to the category 'other data analyses'.
(v) Application area. Each article was assigned to one of the seven scientific application areas identified during the study design analysis. The application assignment is based on the predominant research question. For a list of the application areas, please refer to Table 1.
The assessment was performed in MS Excel. The resulting study design database (available as supplementary data, 'Demeter et al GFPD review Suppl Data.xlsx') was analysed and visualized in R, using tidyverse (Wickham et al. 2019). Co-occurrence networks were computed and visualized using igraph (Csardi and Nepusz 2006), following Ognyanova (2021). The pie diagrams over the map were created using scatterpie (Yu 2023) and ggplot2 (part of tidyverse). Alluvial diagrams, which group and visualize categorical data, were created with ggalluvial (Brunson and Read 2023).

Broad study design trends across all articles
A systematic scientific literature database search followed by manual screening identified 1122 scientific articles (Fig. 2, 'all genetic studies'). Research with genetic methods in this field started in the 1990s with a few articles per year, increasing to almost 100 articles in 2021 (Fig. 3). The broad categorization of study design types revealed three distinct phases: (i) the emergence of genetic methods in the 1990s with just a handful of articles published yearly; (ii) between ∼2003 and 2010, the field started to grow with the main focus of research being on the development and validation (establishment) of new methods, namely, new general and host-associated faecal markers; (iii) since 2011, the field continues to grow, but there is a clear shift from method establishment activities to the implementation across a broad range of applications (Fig. 3). A closer look at the author affiliations reveals that Northern America is the dominant hub of both method establishment and application studies, with Europe and Asia coming second and third, respectively. Cooperation was evident among continents, demonstrating the international and interconnected nature of the GFPD field (Fig. 4).
Since the aspects of establishing methods have been duly reviewed elsewhere (see references in the 'Background information on genetic targets and methods: a historical overview' section), articles focused on these aspects were excluded from further analyses (Fig. 2).

'Application studies' trend analyses
'Application studies' (n = 649; Fig. 2) were reviewed to extract defined study elements ranging from parameters measured to 'application area' (Table 1, 'Methods of the systematic study design analysis'). The following sections describe study element assignments and occurrence trends.
Parameter 'class' assignment and trends
Parameter 'class' assignments were designed to provide a coarse overview of the general experimental study design, where parameter 'class' was defined as a group of similar parameters. A total of 17 parameter 'class' types, including six genetic and eleven other parameter classes, were identified during the systematic review, ranging from 'MST markers' (measured by n = 434 articles) and 'cultivation-based FIOs' (n = 410) to 'epidemiology' (n = 13). A total of 468 articles (72% of 'application studies') included three or fewer parameter classes. A total of four parameter classes were reported by 116 articles, while complex study designs with five or more parameter classes were rare with only 65 articles. A co-occurrence network analysis indicated that the combination 'MST markers' and traditional 'cultivation-based FIO' was the most common one (n = 277 articles). In fact, not only were 'MST markers' paired often with 'cultivation-based FIO', but this was the most common combination for each of the genetic parameter classes. Additionally, 'MST markers' were often combined with 'pathogens' (n = 126 articles) and 'physicochemistry and nutrients' (n = 87 articles, Fig. 5).
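The co-occurrence counts underlying such a network can be sketched as follows. This is a Python illustration of the counting step only, not the R/igraph workflow used for the review; the function name and the toy article list are hypothetical.

```python
from collections import Counter
from itertools import combinations


def class_cooccurrence(articles):
    """Count, for each unordered pair of parameter classes, how many articles report both.

    `articles` is an iterable in which each item lists the parameter classes
    logged for one article. Repeated classes within an article count once.
    """
    pair_counts = Counter()
    for classes in articles:
        for pair in combinations(sorted(set(classes)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy input: three articles and their parameter classes.
toy_articles = [
    ["MST markers", "cultivation-based FIO"],
    ["MST markers", "cultivation-based FIO", "pathogens"],
    ["MST markers", "pathogens"],
]
toy_counts = class_cooccurrence(toy_articles)
```

In a network drawing, each class becomes a node sized by its article count and each pair count becomes the thickness of the connecting line.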
Genetic parameters: 'target organism', 'host', and 'method' assignments
All 'application studies' were mined for detailed information on the genetic parameters. For each parameter reported, the target organism, host organism, and analytical method were recorded, resulting in a total of 952 parameter occurrences across the 649 application studies. The most widely reported target organism was 'prokaryotes' (n = 756 parameter occurrences) followed by 'viruses' (n = 166). In contrast, 'host mitochondrial DNA' and 'other' target organisms collectively accounted for 30 parameter occurrences. Host assignments indicated that 'human' (n = 322) is the most widely researched host animal followed by 'multiple hosts' (n = 209), 'general faecal' (n = 157), and 'nonhuman' (n = 40). Method assignments suggest that PCR-based methods account for the vast majority of parameter occurrences (n = 720), with 'qPCR/dPCR' methods used 82% of the time. 'Sequencing' was the next most prevalent method assignment group (n = 146). An alluvial plot (Fig. 6) illustrates linkages or lack thereof between class, target organism, host, and method parameters.

'Data analysis approach' assignment and trends
While 171 articles (26%) only report summary statistics (qualitative and quantitative), the majority report more sophisticated data analysis approaches such as 'correlation analyses, hypothesis tests or simple bioinformatics' (n = 214, 33%) and 'multivariate statistics or advanced bioinformatics' (n = 177, 27%). 'QMRA or fate & transport modelling' were found to be conducted only by a small portion of the articles (n = 29, 4%, Fig. 9).

Figure 5. Network analysis of the parameter 'class' assignment occurrence in the genetic faecal and other types of parameters (Table 1) in the 'application studies' pool (n_article = 649). The node size is proportional to the number of articles, the line thickness reflects the number of articles for a respective combination. Blue lines mark more than 20 co-occurrences while grey lines show fewer than 20 co-occurrences.

Figure 6. Alluvial plot showing the occurrence of genetic parameter types in the 'application studies' pool (n_article = 649). Each item, i.e. each line, corresponds to one parameter measured in one study, so one 'class'-'target organism'-'host'-'method' assignment. The thickness of the stratum (ribbon) corresponds to the number of studies that measured that particular class-organism-host-method combination. However, since a study might have measured several genetic parameters, the y-axis does not correspond to the number of articles in the 'application studies' pool.

'Application' type assignment and trends
A total of seven genetic method application areas were identified in this systematic literature review (Fig. 10). In addition to faecal pollution detection using general faecal indicators ('Application 1', 91 articles), MST was the predominant use of genetic faecal markers ('Application 2' and 'Application 3', 356 articles in total). Most of these studies performed MST in the classical sense, investigating several potential sources ('Application 3', 230 articles), while 126 articles targeted just one source type, mainly human ('Application 2'). To a much smaller extent, genetic faecal markers were found in performance assessments of (waste)water treatment and in studies of microorganism fate and transport in groundwater as transport surrogates ('Application 4', 44 articles). An equally small, but emerging field is health and infection risk assessment, where genetic methods have been found to be employed as risk indicators, or as support in selected steps of QMRA ('Application 5', 26 articles). Host-associated faecal indicators have also been used to trace the origin of waterborne outbreaks, elucidate pathogen transmission routes and support the interpretation of SARS-CoV-2 wastewater surveillance data ('Application 6', 25 articles). Apart from these core application areas, GFPD tools have also been found to support other scientific disciplines, such as the tracking of the source of nutrients or ARGs, as well as archaeology. The section 'Application 7' provides an overview of these additional areas (107 articles).

Figure 7. Network analysis of the 'sample type' assignment occurrence in the 'application studies' pool (n_article = 649). The node size is proportional to the number of articles, the line thickness reflects the number of articles for a respective combination. Blue lines mark more than 10 co-occurrences while grey lines show fewer than 10 co-occurrences. CSO denotes combined sewer overflow.

In-depth review of the application areas of genetic faecal pollution diagnostic through case studies
The following sections demonstrate the successful implementation of GFPD in the seven identified application areas of water quality research (Fig. 11). To do so, trend analyses of selected study elements for a given application area are presented at the beginning of each section, followed by an illustration of these findings through a collection of cutting-edge case studies.

Application 1: faecal pollution detection
In general, there are two approaches to detect faecal pollution using genetic methods, and the 91 articles in this application category can be divided along these lines, with just a small overlap: (i) the targeted detection of traditional or new general faecal markers, mostly using qPCR (for definitions, see the section 'Systematic analysis of the 'application studies''; n = 36 articles); (ii) the nontargeted detection of faeces-related taxa using HTS (n = 50); and (iii) five articles measuring both. 'Traditional general faecal markers' were used more often than 'new general faecal markers' (n = 37 and n = 9 articles, respectively). In most instances, 'traditional general faecal markers' were measured in parallel with the corresponding 'cultivation-based FIO' parameter (28 out of 37 articles). The dominant method for community composition analysis was 16S AmpSeq (45 articles). 'Freshwater', 'seawater', and 'sediments and sand' were the most common sample types while 'recreational' and 'drinking' were the most frequently observed intended use types.

Targeted detection of general faecal indicators
Regulatory a gencies, suc h as the United States Envir onmental Pr otection Agency (USEPA) have begun to capitalize on the potential of qPCR as a r a pid monitoring solution for r ecr eational waters , pro viding same-da y results ( < 4 h). In 2012, water quality beach action values for qPCR measurements of enterococci were included in the U.S. Recreational Water Quality Criteria (USEPA 2012b ). This addition was based upon epidemiological studies conducted at freshwater and marine beaches that provided evidence that enterococci levels measured by qPCR are predictive for swimmer-related illness ( (Wade et al. 2008(Wade et al. , 2010, see details in the section ' Application 5 ').
Since then, enterococci qPCR (USEPA Methods 1611 and 1609.1) has been applied in several beach monitoring demonstration and implementation programs (Ferretti et al. 2013, Dorevitch et al. 2017, Byappanahalli et al. 2018). In one of the largest studies, nine Chicago beaches were monitored over the course of 894 beach-days in 2015 and 2016, resulting in 1796 water samples that were analysed by enterococci qPCR while maintaining standard E. coli cultivation testing, which is typically used at the Great Lakes (Dorevitch et al. 2017). Side-by-side comparison of the two approaches showed that enterococci qPCR beach action values were exceeded 3.4 times less frequently than E. coli cultivation beach action values (6.6% vs. 22.6% of beach-days) (Dorevitch et al. 2017). However, generalizations, such as that qPCR testing necessarily leads to fewer beach action value exceedances than cultivation-based testing, cannot be made. Several prior studies have found varying levels of agreement between E. coli cultivation and enterococci qPCR beach action value exceedances (Haugland et al. 2014, Byappanahalli et al. 2018). Moreover, data analysis of this large multibeach, multiyear evaluation study found that prior-day E. coli cultivation results are no better than chance alone at predicting current-day water quality at Chicago beaches (Dorevitch et al. 2017). Based upon these findings, enterococci qPCR testing was expanded by the local authority at up to 20 Lake Michigan beach locations from 2017 onwards and E. coli cultivation-based testing was discontinued (Shrestha and Dorevitch 2020).
More recently, the USEPA developed a draft standard method for qPCR testing of E. coli ('Draft Method C'; Sivaganesan et al. 2019), driven by the need for rapid E. coli testing. In a large-scale method comparison effort, data from 101 Michigan (USA) recreational beaches from more than 6000 samples showed 91.5% agreement in beach notification outcomes between the cultivation-based standard of 300 MPN or CFU/100 ml and a putative threshold of 1.863 log10 gene copies/reaction, estimated in this study (Haugland et al. 2021). A strong correlation was observed between cultivation and qPCR results, with a Pearson R-squared value of 0.641 for the pooled data of the 39 sites passing the data eligibility criteria (sample n = 2092) (Haugland et al. 2021).
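To make the two summary statistics above concrete, the following is a minimal sketch of how agreement in notification outcomes and a Pearson R-squared value could be computed from paired measurements. The two thresholds are those quoted in the text; the paired sample values, the function names, and the choice of a strict "greater than" exceedance rule are assumptions made for illustration and are not taken from Haugland et al. (2021).

```python
import math

# Thresholds quoted in the text above.
CULTURE_THRESHOLD = 300.0      # MPN or CFU per 100 ml
QPCR_THRESHOLD_LOG10 = 1.863   # log10 gene copies per reaction


def notification_agreement(culture, qpcr_log10):
    """Fraction of paired samples for which both methods give the same
    outcome (both exceed or both do not exceed their threshold).
    Assumption: an exceedance is a value strictly above the threshold."""
    if len(culture) != len(qpcr_log10) or not culture:
        raise ValueError("need two non-empty sequences of equal length")
    same = sum(
        (c > CULTURE_THRESHOLD) == (q > QPCR_THRESHOLD_LOG10)
        for c, q in zip(culture, qpcr_log10)
    )
    return same / len(culture)


def pearson_r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical paired results for five samples (not real data).
culture = [20.0, 150.0, 410.0, 900.0, 250.0]   # MPN/100 ml
qpcr = [0.9, 1.5, 2.1, 2.6, 1.9]               # log10 copies/reaction

agreement = notification_agreement(culture, qpcr)
r_squared = pearson_r_squared([math.log10(c) for c in culture], qpcr)
print(f"agreement = {agreement:.1%}, R-squared = {r_squared:.3f}")
```

In this toy data set, the last sample exceeds the qPCR threshold but not the cultivation threshold, so four of the five pairs agree. Cultivation results are log10-transformed before correlation here, which is a common but assumed choice.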
The universal Bacteroidales marker BacUni, a new general faecal marker, was evaluated together with three cultivation-based FIOs as a predictor of protozoan and bacterial pathogens in samples from rivers and estuaries in California, USA (Schriewer et al. 2010). The universal Bacteroidales marker was detected in all water samples at concentrations two orders of magnitude higher than cultivation-based FIOs. The results also showed the universal Bacteroidales marker to have a comparable or higher mean predictive potential than cultivation-based FIOs (Schriewer et al. 2010). The high abundance of new general faecal markers is certainly an asset, as sensitivity can become a challenging aspect for genetic faecal pollution detection in water resources with low faecal pollution levels (for details, see 'Sensitivity of environmental detection of nucleic acid targets' in the 'Discussion').
Figure 11. Overview of the application areas of the genetic faecal pollution diagnostics field.
Nontargeted detection of faeces-related taxa using high-throughput sequencing
HTS approaches have emerged in microbial water quality monitoring, allowing for new opportunities. From a public health perspective, HTS surveys have been shown to identify faecal taxa (e.g. Bioinformatic analyses of 16S AmpSeq data showed a drastic restructuring of the bacterial community, associated with hydrological dynamics. The relative abundances of sequences matching faecal bacteria (Bacteroides, Clostridium, and Blautia genera) and potentially pathogenic populations (Campylobacter and Helicobacter) were observed to increase after the peak of the storm (Ulrich et al. 2016). Given that HTS applications can provide profiles of microbial communities and information on faeces-associated taxa, such genetic approaches may become useful as a screening tool in the future for identifying potential health risks and for prioritizing sites for follow-up analysis of water samples using targeted quantitative PCR approaches (Vadde et al. 2019, Jiang et al. 2020).

Application 2: MST of faecal pollution from a single source type
Faecal pollution may originate from a multitude of point and nonpoint sources. The need to identify the sources of faecal pollution arose years ago, and since then, many different approaches have been developed and validated ('Background information on genetic targets and methods: a historical overview'). Investigations often focus on a single type of faecal source (i) if there is evidence regarding the dominant source of pollution such that neglecting other sources is acceptable or (ii) if the investigation specifically addresses one source type because, e.g. some faecal sources could represent a higher public health risk than others. In any case, the hypothesis regarding the origin of contamination needs to be validated using a reliable analytical tool, since scientific evidence facilitates effective subsequent measures.
Of the 126 articles in this application area, the single source was 'human' in 113 cases and only a handful of articles focused on 'nonhuman' sources such as ruminants, gulls, ducks, chickens, or dogs. The majority, 73%, of the articles combine 'MST markers' with the measurement of traditional 'cultivation-based FIOs'. Other parameter classes that often appeared were 'pathogens', 'traditional general faecal markers', 'physicochemistry and nutrients' and 'chemical tracers' (n = 31 to n = 15 articles). 'Freshwater' was most often sampled (n = 80 articles), followed by 'seawater' (n = 23) and 'sewage' (n = 18). A total of 44 articles reported 'summary statistics' (qualitative or quantitative), while 48 articles performed 'correlations, hypothesis testing or simple bioinformatics'. A smaller set of articles performed more advanced data analyses, such as 'multivariate statistics or advanced bioinformatics' (n = 23) or 'QMRA, fate & transport modelling' (n = 4).

Human sources: decentralized wastewater systems
The interpretation of MST results is greatly enhanced by cultivation-based FIO and land-use data or additional parameters that can help to explain the origin, fate and transport of a specific pollution source. For example, in watersheds with more than 1621 septic systems in Michigan, USA, higher concentrations of Bacteroides thetaiotaomicron (human-associated marker) were detected under baseflow conditions, suggesting that control measures should include septic system maintenance and construction in the area (Verhougstraete et al. 2015). In this study, analyses were performed using a classification and regression tree including riparian buffers, septic tanks, and physicochemical data. Beyond chronic pollution scenarios, rainfall events can impair water quality through combined sewer overflows, septic tank seepages, agricultural runoff or other events governed by precipitation. A similar study found that three human-associated Bacteroides markers correlated positively with septic tank density during wet weather, suggesting that septic tanks are a significant source (Peed et al. 2011). Since there was no correlation with FIO during baseflow conditions, the authors postulate that other sources might be implicated in chronic pollution.

Human sources: centralized wastewater systems
In some cases, genetic MST markers can be combined with other types of tracers to strengthen the interpretations and to overcome markers' limitations, such as low specificity, differing decay rates or different transport. For example, the detection of the human-associated genetic marker HF183 and optical brighteners in private drinking water supplies in rural areas of Virginia, USA, revealed sewage as a potential pollution source. However, only a few samples showed E. coli together with the optical brighteners, suggesting a different fate and transport of these indicators within the aquifer (Smith et al. 2014). In Montreal, Canada, a study applied a multiparameter source tracking toolbox combining chemical source tracking markers for sewage (caffeine, theophylline, and carbamazepine) together with the human-associated genetic markers HF183 and mitochondrial DNA to detect illicit wastewater discharges into storm sewers during dry weather (Hachad et al. 2022). The authors used a composite index of the different markers together with the levels of E. coli to identify household cross connections or indirect illicit discharges and verified them successfully with dye tracing.
Hydrological and meteorological data are often indispensable to understand the fate of faecal microorganisms in the environment. For example, hydrological and meteorological data combined with the human-associated marker HMBif, cultivation-based MST parameters and FIO allowed modelling the self-depuration distance of a small Mediterranean river (Pascual-Benito et al. 2020). The obtained models provided information about the recovery of the river's initial conditions after receiving treated sewage discharge. MST tools are also useful after extreme meteorological events. For example, after Hurricane Harvey, the detection of the human-associated markers HF183 and BacHum and their correlation with FIO indicated a large input of sewage through sewage overflows and stormwater in two catchments in Texas, USA (Kapoor et al. 2018).
HTS applications have also been reported. After the pioneering work of Unno et al. (2010) in South Korea, the study by Newton et al. (2013) was one of the first large-scale studies that also demonstrated the complex challenges in data interpretation. The authors examined chronic human faecal pollution at an urban site in Lake Michigan, USA, and set out to identify its sources and delivery routes. By identifying the relative abundance of sewer infrastructure-associated, faecal and human faecal signatures in lake water samples, they identified combined sewer overflows as the dominant pollution source during heavy rainfall events, whereas nonhuman faecal sources exhibited the highest relative abundance during dry weather and during rain events that did not produce combined sewer overflows. More recently, Zimmer-Faust et al. (2021) tracked the plume of a wastewater treatment plant (WWTP) outfall in the coastal Pacific Ocean on the USA/Mexico border and showed that its behaviour differs depending on oceanic and meteorological conditions. They used a human-associated MST marker and 16S AmpSeq together with the algorithm SourceTracker, with pristine marine water, WWTP discharge and a nearby river as potential sources, to derive the spatial extent and concentration gradient of human pollution.

Recreational waters
Coastal waters have important value for leisure, tourism, and coastal ecosystems including shellfish harvesting areas; therefore, MST tools have been extensively tested in these areas (Korajkic et al. 2009, González-Fernández et al. 2021). In Thailand, Kongprajug et al. (2021) used two genetic viral MST markers, crAssphage and HPyV, at various beaches during dry and wet seasons to verify human waste practices as the main faecal source. Their results showed temporal but not spatial variability, and the authors therefore recommended a future monitoring strategy based on more frequent sampling at a single sentinel site. Other studies include environmental data such as precipitation and solar radiation, oceanographic data like tides and currents, and use correlations or more complex models to be able to predict a potential pattern. For example, at different sites in San Francisco, USA, the human-associated marker HF183 was found to correlate mainly with 72 h precipitation but also water temperature, tides or insolation. Cao et al. (2018) sought to develop a standardized data analysis approach that incorporates all qPCR measurements from a defined group of samples (i.e. nondetections, detections, and measurements in the range of quantification) to assess average human faecal pollution levels at recreational water sites. The authors proposed a metric, the human faecal score, that combines the results of the human-associated qPCR marker HF183/BacR287 with a defined sampling strategy (sampling intensity and number of replicates) and a Bayesian weighted average approach. The score can be used to prioritize sites for remediation and has more recently been used to compare source-associated impacts under wet and dry conditions (Shrestha et al. 2020) and identify trends with cultured FIO paired measurements (Li et al. 2021b).
In addition to human sources, wild animals can also contribute to faecal indicator bacterial loads in coastal areas with large gull colonies. The presence of the gull-associated bacterium Catellicoccus marimammalium in 58% of the water samples and at all sampling sites, as well as its correlation with faecal indicators, suggested a chronic impact of gull faeces on the water quality in southern Ontario, Canada (Lu et al. 2011). The same marker showed a decrease together with faecal indicators and bacterial pathogens after gull removal in Lake Michigan, USA (Converse et al. 2012).

Rural areas, domestic animals
Single source characterization is also relevant in rural areas with high agricultural pressure where tracking animals such as swine, ruminants, or poultry can be of interest (Weidhaas et al. 2011, Heaney et al. 2015, Wiesner-Friedman et al. 2021). These studies include, in addition to the relevant genetic faecal marker, data on land uses, land-applied manure, and/or animal feeding operations. For example, after testing for a ruminant-associated marker, BoBac, and including data on animal feeding operations, the authors found that applying manure in the fields implied an increase in faecal indicators in riverbed sediments (Wiesner-Friedman et al. 2021).

Application 3: MST of faecal pollution from multiple sources
Many impaired water bodies are polluted by more than one source. Thus, it is important to characterize key sources because the corresponding health risk as well as the mitigation steps may differ by source. Nevertheless, study design and choice of methods are highly dependent on the water resource type, the intended water use and other factors.
Of the 230 articles with a focus on multisource MST, MST was achieved predominantly using 'MST markers' (n = 180, 78% of articles) followed by classical library-based MST (n = 33, mostly published before 2015) or HTS (n = 21, mostly published after 2015). In multisource MST articles, FIOs are measured predominantly with cultivation-based methods ('cultivation-based FIO', n = 163 articles). In contrast, 'traditional' and 'new genetic faecal markers' played a minor role (n = 17 and n = 20, respectively). The most common parameter combination was 'MST markers' with 'cultivation-based FIO' (n = 132, 57% of articles). Other common parameter classes were 'physicochemistry and nutrients', 'pathogens', 'meteorology', and 'land use' (n = 50 to n = 25 articles). The proportion of articles with four or more parameter classes was higher than in single-source MST (31% in multisource MST and 28% in single-source MST, 'Application 2'). This higher study design complexity was reflected in the data analysis approach: 35% of articles performed 'multivariate statistics or advanced bioinformatics analyses' (18% in single-source MST, 'Application 2').

Elevated pollution levels on a watershed scale
The starting point in watershed studies is usually elevated levels of cultivation-based FIOs in rivers, lakes, or coastal waters. Often, the spatial scale is relatively large and there are multiple potential sources ranging from human faeces (via leaky infrastructure, treated or untreated wastewater, or combined sewer overflows) to livestock (grazing or stabled), pets as well as avian and mammalian wildlife. Often, there is limited knowledge on hydrology, meteorology, and land use. An illustrative example is given by three studies conducted over a span of 16 years in the Tillamook Bay catchment in Oregon, USA, demonstrating how state-of-the-art genetic MST applications have evolved over time. Bernhard et al. (2003) and Shanks et al. (2006) compared PCR-based ruminant and human marker frequencies with faecal pollution levels, considering rainfall patterns and seasonal pollution dynamics, to identify pollution sources. Much more recently, Li et al. (2019) used quality-controlled and, in several cases, standardized qPCR assays for five faecal sources, and high-resolution GIS for land-use and meteorological data, to not only identify but also quantify and locate pollution sources and patterns to guide remediation efforts and risk assessment. In a similar approach, Bushon et al. (2017) ranked tributaries to the Little Blue River catchment in Missouri, USA, based on estimated contributions to water quality impairment. The studies by Nguyen et al. (2018) and Yamahara et al. (2020) demonstrate how hypothesis formulation can support study design for GFPD. Both studies also try to shed light on the potentially confounding role that soil and sediments might have on MST applications, especially in tropical waters. To elucidate the relative roles of human and other animal sources polluting the Danube River and its tributaries, Kirschner et al. (2017) used a combination of a longitudinal survey along more than 2500 km of river and a temporal survey over the course of a year at three sites, successfully identifying human waste as the dominant source. Bambic et al. (2015) encountered difficulties segregating pollution sources due to the confounding influence of disinfected municipal wastewater. Separating wet from dry weather based on meteorological data allowed data interpretation, with municipal wastewater (human) being the dominant dry-weather pollution source, while during wet weather, agricultural runoff and stormwater (ruminant and dog) dominated. Using bacterial and viral markers allowed the authors to demonstrate the difficulty of detecting the presence of viral pathogens when only using bacterial indicators. The authors used cutting-edge data handling methods, including statistical methods to account for the large proportion of nondetects, and an estimation of spatial and temporal variations of same-host contribution using ratios between given Bacteroidales MST markers and a general Bacteroidales marker (Bambic et al. 2015). In a further study, separating the sample set into dry and wet periods allowed differing pollution pathways to be revealed. The results of MST markers agreed with those from 16S AmpSeq and the FEAST algorithm: humans were the main pollution source in the dry season, and ruminant and swine were the main pollution sources in the wet season at this river site near Beijing, China. MST methods have also been used to more generally identify factors and features that promote or reduce watershed faecal pollution rather than just identifying pollution sources. As an example, Green et al. (2021) used MST and cultivation-based FIO in an investigation of 68 streams in New York State, USA, to identify stream features, land use practices and meteorological patterns that drive faecal pollution levels from multiple sources.
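The ratio-based estimation attributed to Bambic et al. (2015) above can be illustrated with a minimal sketch. It only shows the general idea of normalizing a host-associated marker concentration by a general marker concentration from the same sample. The function name, the sample values, and the handling of nondetects (returning no ratio rather than substituting a value) are assumptions for illustration and do not reproduce the authors' actual procedure.

```python
def host_marker_ratio(host_copies, general_copies):
    """Ratio of a host-associated marker to a general marker measured in the
    same sample and in the same units (e.g. gene copies per 100 ml).
    Returns None when either marker was not detected (represented here as
    None or a non-positive value), which is an assumed convention."""
    if host_copies is None or general_copies is None:
        return None
    if host_copies <= 0 or general_copies <= 0:
        return None
    return host_copies / general_copies


# Hypothetical concentrations (host-associated, general) for one site.
samples = {
    "dry weather": (2.0e4, 1.0e6),
    "wet weather": (5.0e3, 5.0e6),
    "nondetect": (None, 3.0e5),
}

ratios = {name: host_marker_ratio(*pair) for name, pair in samples.items()}
for name, ratio in ratios.items():
    print(name, "n/a" if ratio is None else f"{ratio:.4f}")
```

Comparing such ratios across sites or seasons gives a relative picture of how much of the general faecal signal is associated with one host. It does not by itself account for differences in marker decay or in marker abundance between hosts.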

Recreational waters
In contrast to general watershed pollution scenarios, bathing water studies are usually triggered by persistently elevated FIO levels at public beaches directly threatening the health of visitors, necessitating beach closures and inflicting considerable economic damage. Study areas are often smaller, and the potential sources are less diverse (e.g. sewage discharges, birds, and pets). Prudently, studies often make efforts to consider the influence of hydrology (flows, tides, and so on) and the effect of precipitation and solar radiation on water quality changes and to resolve faecal source contributions (Williams et al. 2022). In a proof-of-concept study in Xiamen, China, high-throughput qPCR was used for a large number of assays targeting multiple faecal sources and pathogens to investigate bathing waters.

Drinking water
Impairment of drinking water quality is one of the most pressing issues worldwide. The specific challenge in this application field is that low levels of pollution already pose relevant health risks. For example, elevated FIO levels observed in karst and fractured aquifers after precipitation were the starting point for several MST studies. The problem of highly variable pollution dynamics in the course of very short time periods can be approached by linking sampling to hydrological dynamics (Reischer et al. 2008) and nested sampling with higher sampling frequencies during periods of hydrological fluctuations and during/after rainfall events (Reischer et al. 2011). The very short residence times of faecal pollution in the studied springs also allowed direct source apportionment based on MST marker concentrations in spring water because differential persistence can be disregarded when measuring very recent pollution. To determine the source and risk factors for nitrate and microbial pollution in private dolomite karst wells, Borchardt et al. (2021) used multivariate regression models with potential drivers such as land use, precipitation, hydrogeology, and well construction.

Aquaculture and irrigation water
Shellfish harvesting areas in coastal waters and aquaculture in general are also under a large amount of anthropogenic pressure, often resulting in the contamination of products with FIOs and pathogens. The applicability of MST approaches to identify and prioritize pollution sources has been demonstrated in shellfish harvesting waters and products such as oysters (Mieszkin et al. 2013). Klase et al. (2019) integrated MST markers, ARG assays, and pathogen detection with bacterial community-based analysis to broadly investigate the potential public health risks associated with pollution of fishponds. Similarly, faecal pollution levels, ARG and pathogen occurrence were investigated in irrigation waters used for fresh produce to determine sources of pollution and risk factors (Weller et al. 2020).

Application 4: evaluation of treatment processes
Pathogen removal is one of the primary functions of wastewater and drinking water treatment. However, relying on direct pathogen determination only is not practicable due to the low and varying concentrations in raw water as well as the high number of different pathogens potentially occurring. Thus, treatment performance assessment often relies on treatment indicators used as representative surrogates for pathogen removal (see the section 'Glossary'; Momba et al. 2019). While cultivation-based microbial parameters are the most commonly employed treatment indicators (Jofre et al. 2016, Momba et al. 2019), the systematic literature review revealed 44 articles that used genetic markers as treatment indicators. In this article pool, 'MST markers' and 'traditional general faecal markers' were the most often measured genetic parameter classes (23 and 18 articles), whereas 'cultivation-based FIO' and 'pathogens' were the most common other parameter classes (22 and 15 articles). For the treatment type, 36 articles dealt with engineered treatment processes, with the majority, 24 studies, focusing on wastewater treatment and water reuse. The various steps of drinking water treatment, as well as stormwater and greywater treatment, were the topics of the other 12 articles. The attenuation of microorganisms during groundwater transport was the focus of eight studies. In total, five of these involved natural tracers, and three involved injected tracers. Riverbank filtration, managed aquifer recharge and the drinking water treatment step of slow sand filtration were found to be the main processes studied. Investigations of microorganism attenuation express changes in treatment indicator concentration during a treatment step as percentage reduction or as log10 reduction values (LRV, the difference in log10-transformed concentrations before and after the treatment step; Momba et al. 2019).
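The two ways of expressing attenuation mentioned above can be related with a short worked sketch. The LRV definition follows the text (difference in log10-transformed concentrations before and after the treatment step); the function names and the example concentrations are hypothetical and chosen only for illustration.

```python
import math


def log_reduction_value(conc_before, conc_after):
    """LRV = log10(concentration before) - log10(concentration after).
    Both concentrations must be positive and in the same units."""
    if conc_before <= 0 or conc_after <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_before) - math.log10(conc_after)


def percent_reduction(conc_before, conc_after):
    """Percentage of the initial concentration removed by the step."""
    if conc_before <= 0:
        raise ValueError("initial concentration must be positive")
    return 100.0 * (1.0 - conc_after / conc_before)


def percent_from_lrv(lrv):
    """Convert an LRV into the equivalent percentage reduction."""
    return 100.0 * (1.0 - 10.0 ** (-lrv))


# Hypothetical marker concentrations in gene copies per litre.
influent = 1.0e8
effluent = 1.0e5

lrv = log_reduction_value(influent, effluent)
pct = percent_reduction(influent, effluent)
print(f"LRV = {lrv:.1f} log10, reduction = {pct:.1f}%")
```

With these illustrative numbers, a three-log10 reduction corresponds to 99.9% removal, and an LRV of 1 corresponds to 90%. The logarithmic scale is useful because high removal efficiencies that look almost identical as percentages remain clearly separated as LRVs.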
In summary, the identified studies using GFPD, as representatively shown below, predominantly focus on nucleic acid target concentration changes, as an indication for the decrease of cell and virus concentrations during biological wastewater treatment or aquifer transport. Importantly, investigations of water treatment processes often also determine disinfection efficacies by characterizing the microbicidal and virucidal effects on FIO and pathogens (section 'Generating viability and infectious status information by molecular tools' in the section 'Discussion'). Viability PCR and enzymatic treatment PCR (ET-qPCR) are molecular techniques used to assess the viability and infectious status of microorganisms. In our systematic search, three articles were identified that employed these methods.

Detection of nucleic acids
For the characterization of the removal of pathogens, such as viruses, through wastewater treatment, viral qPCR MST markers have been increasingly used and offer some advantages over traditional indicator viruses such as phages. The most important aspect of qPCR MST markers is that their concentrations in untreated wastewater are expected to be far greater than those of most viral pathogens (Hughes et al. 2017). This is particularly important because an indicator whose concentration is high can be detected consistently and more easily in different stages of treatment processes. The concentrations of coliphages in wastewater were found to be 7-log10 PFU/l, while the concentrations of enteric viruses such as human adenovirus and human polyomaviruses were variable and reported to be on the scale of 6 to 9-log10 copies/l (reviewed in Ahmed et al. 2020). Several studies have reported high numbers of PMMoV, crAssphage, Bacteroides (HF183) and Lachnospiraceae (Lachno3) and other qPCR MST markers in untreated wastewater (Rosario et al. 2009, Hughes et al. 2017, Ahmed et al. 2018). Furthermore, qPCR MST markers show little variation in untreated wastewater, and the concentrations range between 8 and 10 log10 copies/l (Hughes et al. 2017, Ahmed et al. 2019). Several studies determined the log reduction values of human MST qPCR markers such as crAssphage and PMMoV in full-scale WWTPs (reviewed in Ahmed et al. 2020, Sabar et al. 2022). For example, Hamza et al. (2011) reported an ∼3-log10 reduction in PMMoV concentrations in a conventional activated sludge treatment plant in Germany, which was similar to the reduction in polyomavirus and torque teno virus. Hughes et al. (2017) reported an ∼1.1-log10 reduction in PMMoV concentrations in an activated sludge WWTP, which was less than those of HAdV and HPyV but similar to those of norovirus and enterovirus. A similar log reduction value of PMMoV was reported by Kuroda et al. (2015) in a WWTP in Vietnam. Schmitz et al. (2016) reported <1-log10 reduction of PMMoV during activated sludge and biological trickling filter treatment, and the reduction rate was similar to that of aichivirus, norovirus, sapovirus, adenovirus, and polyomavirus. Based on the log reduction values reported in the literature, PMMoV appears to be a conservative viral indicator for the reduction of pathogenic viruses in WWTPs. Several studies reported the reduction of crAssphage, 'the most abundant [known] virus' in the human gut, in WWTPs with activated sludge. Tandukar et al. (2020) reported a log reduction of 3.3 log10, while Farkas et al. (2018) reported a 1.0-2.0 log10 reduction. Asami et al. (2016) determined the log10 reduction of PMMoV and JC polyomavirus for coagulation-sedimentation and rapid sand filtration processes in a drinking water treatment plant (DWTP) in Bangkok, Thailand, using qPCR. The observed removal efficiencies varied depending on treatment step, season, and raw water quality, with LRVs ranging between 0.4 and 1.6 for PMMoV and between 0.5 and 1.9 for JC polyomavirus.

Molecular strategies to indicate the viability and the infectious status
The original idea of applying viability qPCR to bacterial MST markers was to gain information on recent faecal pollution events in water resources Wuertz 2012, 2015). More recently, Jager et al. (2018) used qPCR with and without propidium monoazide (PMA) pretreatment as well as cultivation-based methods for E. coli, enterococci and P. aeruginosa to evaluate the removal efficiency of wastewater ozonation, a tertiary treatment step. PMA is an intercalating DNA dye that penetrates cells with impaired membranes and prevents PCR-based amplification (Nocker et al. 2006). It thus allows for selective detection of viable cells. PMA-qPCR is, therefore, also known as viability qPCR. FIO removal rate estimates were ranked in the following order: cultivation-based > viability qPCR > qPCR (Jager et al. 2018), emphasizing the differences among the culturable population, the viable but not culturable population and the total bacterial DNA. Viability qPCR, in comparison with qPCR, has also been applied to FIO (E. coli) and bacterial and viral MST markers (crAssphage, JC and BK polyomavirus, human adenovirus, human-associated Bacteroides HF183) in sewage sludge flocs to assess their removal and inactivation during potassium ferrate treatment (Wang et al. 2023). Spatial distribution and movement resulting from the potassium ferrate treatment of the FIO and MST markers could be analysed in different compartments of the sludge flocs, encompassing various extracellular polymeric substance fractions. The reduction of the MST marker determined by qPCR was up to two orders of magnitude lower than the reduction determined by viability qPCR (Wang et al. 2023).
Similarly, enzymatic treatment qPCR (ET-qPCR), which applies enzymatic treatment using proteinase K and RNase, was used to estimate infectivity of bacteriophage MS2 in water (Pecson et al. 2009). By utilizing multiple PCR amplicons (providing whole genome coverage) and partial inactivation using different virucidal agents (such as heat, UV-B light, and singlet oxygen), the authors demonstrated that genome damage does not fully explain viral inactivation. Therefore, PCR-based assays would never yield results equivalent to infectivity assays. These assays fail to completely account for specific false positives that may arise when testing for MS2 bacteriophages. Consequently, to effectively monitor MS2 infectivity using ET-qPCR, it becomes crucial to determine a statistical ratio of total inactivation by cell culture in advance. Therefore, this calculation should be established beforehand for the applied treatment conditions and the given virus, but culture methods are not available for all human pathogenic viruses (Pecson et al. 2009). A follow-up study investigating UV-C treatment and relying on qPCR without pretreatment demonstrated that viral inactivation may be estimated in conjunction with mathematical models for JC polyomavirus and HAdV (Calgua et al. 2014). For more discussion on this topic, please refer to the section 'Direct detection of nucleic acids: characteristics and challenges' in the 'Discussion'.
Evaluating microorganism attenuation in groundwater
Pathogen removal during subsurface passage may be studied by investigating infiltrated faecal pollution (e.g. managed aquifer recharge), and monitoring the removal of pathogenic or indicator microorganisms. One way to investigate pathogen removal is to analyse water samples for naturally present microorganisms along a transect. Another way is with tracer tests using an injected target microorganism or surrogates. This can be done either as a laboratory experiment, using columns packed with aquifer material, or in the field.
The vast majority of such transport studies quantify microbial targets with microscopy or cultivation-based methods. Using genetic tools to quantify surrogate or pathogenic organisms (i.e. bacteriophages and enteric viruses) for groundwater transport studies is a relatively novel application of this technology, and therefore, limited literature exists. These approaches allow innovative analyses such as the quantification of multiple microorganism cotransport using multiplex qPCR and differentiating between infectious and inactivated viruses, when qPCR is used together with culture techniques (Betancourt et al. 2014, Bellou et al. 2015, Wang et al. 2022). In addition, genetic methods are a reliable way to enumerate microorganisms attached to particles, such as sediment and microplastics (Hassard et al. 2016). Genetic tools can also be used to confirm possible false-negatives derived from microscopy or cultivation-based methods. This is especially useful, as field tests are often expensive and labour intensive, and practical (small) sampling volumes often yield negative results. Low concentrations of target organisms require sampling larger volumes, which often presents additional challenges (Haramoto et al. 2018, Forés et al. 2022).

Natural tracers
Managed aquifer recharge involves natural subsurface processes to treat intentionally infiltrated surface water or wastewater effluent. In a study of the treatment efficiencies of three such systems in the USA, Betancourt et al. (2014) measured viral pathogens and PMMoV, a human-associated viral marker, by qPCR in the infiltrated water and in a series of wells, providing the log reduction rates over given distances. Near the highly polluted Rocha River in Bolivia, surface water and riverbank filtrate are often used for irrigation, another example of indirect wastewater reuse (Verbyla et al. 2016). The removal (log reduction) during riverbank filtration was assessed for this study using reference pathogens recommended for wastewater reuse, PMMoV, as well as a human-associated bacterial indicator, and a QMRA of the consumption of the irrigated lettuce was performed.
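The log reduction over a given distance mentioned above is simple arithmetic on paired qPCR concentrations. The following minimal Python sketch illustrates the calculation. The function names and all numerical values are hypothetical and are not taken from any of the cited studies, and non-detects are assumed to have been handled beforehand.

```python
import math


def log_reduction(c_in: float, c_out: float) -> float:
    """Return the log10 reduction between two concentrations (same units)."""
    if c_in <= 0 or c_out <= 0:
        # Non-detects need separate handling (e.g. detection-limit substitution).
        raise ValueError("concentrations must be positive")
    return math.log10(c_in / c_out)


def log_reduction_rate(c_in: float, c_out: float, distance_m: float) -> float:
    """Return the log10 reduction per metre of subsurface passage."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return log_reduction(c_in, c_out) / distance_m


# Hypothetical qPCR results in gene copies per litre (illustration only).
infiltrated_water = 1.0e5
monitoring_well = 1.0e2
well_distance_m = 30.0

lrv = log_reduction(infiltrated_water, monitoring_well)
rate = log_reduction_rate(infiltrated_water, monitoring_well, well_distance_m)
print(f"log reduction: {lrv:.2f}")
print(f"log reduction per metre: {rate:.3f}")
```

Expressing removal per unit distance assumes log-linear attenuation along the flow path, which may not hold in heterogeneous aquifers.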

Injected tracers
If the aim is to study the transport of pathogenic microorganisms in field tests, a surrogate is often used as a tracer that mimics the pathogen in size and surface characteristics, while die-off rates are determined separately using batch tests. The transport of the surrogate can be compared to the pathogenic microorganism in small column tests in the laboratory, using aquifer material, while the surrogate is injected or applied at a field site. In this way, it is possible to upscale the transport of dangerous substances using transport models. With this goal in mind, Stevenson et al. (2015) used qPCR to quantify the transport and removal of HAdV and its surrogate, PRD1 phages, in small column tests. With regard to water treatment, the removal of Cryptosporidium parvum and its surrogate Clostridium perfringens by slow sand filtration was evaluated by Hijnen et al. (2007) as the last step in drinking water treatment using water taken from the Rhine River and spiked with the microorganisms. C. perfringens was enumerated using cultivation, and the colonies identified with PCR. Bauer et al. (2011) used qPCR to analyse enteric adenoviruses to evaluate the efficiency of slow sand filtration and river bank filtration as drinking water treatment steps. Wang et al. (2022) investigated the transport of MS2 phages, a surrogate for enteric viruses, from a surface water pond to groundwater via riverbank filtration. The authors differentiated between infectious phages by plaque assay versus the total number of phages detected by qPCR.

Synthetic tracers
A unique application of genetic tools is using synthetic DNA as a tracer, which can be employed as multipoint tracers thanks to the practically unlimited sequence options and their specific quantification using qPCR (Dahlke et al. 2015, Pang et al. 2022). Another innovative idea is the use of DNA-labelled microspheres as surrogates for pathogenic microorganisms (Pang et al. 2014). This enables the enumeration of the pathogen and its surrogate by the same analytical procedure, qPCR, allowing more direct comparability.

Application 5: infection and health risk assessment
GFPD is increasingly applied to support infection and health risk estimation regarding human usage of water and water resources. The range of applications is very broad and includes guidance in hazard identification (e.g. reference pathogen selection), calibration of fate, transport, and QMRA models targeted to specific sources, and the genetic detection of risk indicators and markers, as an alternative to cultivation-based enumeration techniques.
The study design analysis found 26 articles that estimated health risk with the support of GFPD: seven epidemiology studies at 'recreational' water sites and 19 QMRA studies, most of which were conducted in 'recreational' waters, five focussed on 'drinking' water and one on 'irrigation' water. The epidemiology studies compare 'traditional general faecal markers' with illness rate, while the QMRA studies apply 'MST markers' to QMRA, using one of the above approaches. The most prominent parameter classes are 'traditional general faecal markers' and 'MST markers' (n = 7 and n = 21, respectively), measured by qPCR. The relevance of obtaining information on the viability or infectious status for infection and health risk assessment is addressed in the 'Discussion' section ('Direct detection of nucleic acids: characteristics and challenges').

Guidance in hazard identification for QMRA
Host-associated faecal marker quantification in water resources can guide reference pathogen selection for QMRA. This concept has been included in the framework of integrated faecal pollution analysis and management ('3-step approach') of karstic drinking water resources (Farnleitner et al. 2018). The three steps involve (1) catchment pollution source profiling, (2) monitoring of general faecal pollution, and finally, (3) hypothesis-guided qPCR MST marker enumeration in spring water. At a large, complex and hardly accessible alpine karstic spring water catchment with importance for public water supply in Austria, the results pointed at zoonotic pathogens from ruminants, including cattle, as the priority QMRA reference targets (Reischer et al. 2011). The approach introduced by Farnleitner et al. (2018) was later extended to urban river catchments using probabilistic modelling to simulate the occurrence and extent of faecal pollution sources in parallel with zoonotic pathogens from direct human as well as indirect livestock and wildlife faecal pollution sources (Derx et al. 2023). The probabilistic estimates from the catchments and the direct measurements in the river indicated that combined sewer overflows and communal WWTPs were the largest contributors to faecal pollution at the studied site. The developed approach was indicated to be a robust basis for microbial fate and transport modelling and for QMRA (Derx et al. 2023).
MST qPCR marker analysis was also used to associate cases of human illness predicted by QMRA with bovine, human, or unknown sources in contaminated private wells in Wisconsin, USA. Although some of the cases of illness were indicated to be of human pollution origin, the results suggested that most of the cases were caused by bovine faecal pollution. This outcome had important implications for land use and water safety and health risk management of the fractured aquifers (Burch et al. 2021). In a study in the Netherlands, MST qPCR marker analysis was applied to trace back the origin of infection risks from Campylobacter sp. at a stormwater collection site (water plaza). The presence of human MST markers indicated a cross-connection with the combined sewer system (Sales-Ortells and Medema 2015).
Importantly, the performance characteristics (e.g. faecal sensitivity and specificity) of MST markers as well as their application design have to match the infection and health risk characteristics of the human and zoonotic pathogens considered (e.g. specific infectivity, specific health burden) to avoid masking of faecal hazards and their associated risk levels (Table 2).

Calibration of catchment models to estimate pathogen concentrations for QMRA
Genetic faecal marker quantification also proved valuable for catchment-based QMRA modelling of faecal pollution sources. One of the principles of the 'QMRAcatch' philosophy is the catchment-specific calibration of microbial transport (i.e. dilution, advection, and dispersion) and fate (i.e. decay/persistence) models for specific faecal pollution sources by the use of MST markers. The calibrated and verified models can be used to derive pollution and management scenarios for given points of interest (e.g. drinking water abstraction sites) based on pathogen transport/fate simulations. Reference pathogens are quantified in pollution sources or derived from epidemiological data and the literature (Schijven et al. 2015, Derx et al. 2016).
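The principle of calibrating a transport and fate model against MST marker data can be illustrated with a deliberately simplified Python sketch. It is not the QMRAcatch implementation: it assumes complete mixing of a single source into a river (dilution only, with no advection or dispersion terms) and first-order decay. All function names and numbers are hypothetical.

```python
import math


def downstream_concentration(source_conc: float, source_flow: float,
                             river_flow: float, decay_per_day: float,
                             travel_time_days: float) -> float:
    """Marker concentration after complete mixing and first-order decay."""
    dilution = source_flow / (source_flow + river_flow)
    return source_conc * dilution * math.exp(-decay_per_day * travel_time_days)


def calibrate_decay_rate(observed_conc: float, source_conc: float,
                         source_flow: float, river_flow: float,
                         travel_time_days: float) -> float:
    """Decay rate (per day) that reproduces one observed marker concentration."""
    mixed = source_conc * source_flow / (source_flow + river_flow)
    return math.log(mixed / observed_conc) / travel_time_days


# Hypothetical values: marker in gene copies per litre, flows in m3/s.
effluent_marker = 1.0e8
effluent_flow, river_flow = 1.0, 99.0
travel_time = 2.0  # days to the point of interest

observed = downstream_concentration(effluent_marker, effluent_flow,
                                    river_flow, 0.5, travel_time)
k = calibrate_decay_rate(observed, effluent_marker, effluent_flow,
                         river_flow, travel_time)
print(f"calibrated decay rate: {k:.3f} per day")
```

In practice, calibration would use many observations and account for measurement uncertainty. The single-point inversion here only shows how marker data constrain a model parameter.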
In a scenario analysis considering river water as a raw water source for drinking water production, the authors calibrated QMRAcatch for human faecal pollution pathways, such as from communal wastewater disposal, using human-associated MST qPCR marker data for the Austrian section of the Danube River (Demeter et al. 2021). By use of a conceptual semidistributed hydrological model and regional climate model outputs, the authors simulated the interplay of future changes (e.g. climate change, population) and wastewater management measures (enhanced WWTP treatment, prevention of combined sewer overflows) with respect to the infection risks for viral and bacterial reference pathogens. The study demonstrated that the degree to which future changes affect drinking water safety strongly depends on the type and magnitude of faecal pollution sources, and is thus highly site- and scenario-specific.
More recently, the modelling approach was extended towards source-specific calibration to multiple faecal pollution sources, using MST markers for humans, ruminants, pigs, and birds. An improved hydrological module (2D hydrodynamic flow, rainfall-runoff, and differential MST decay) allowed comparing external (allochthonous) and internal (autochthonous) faecal pollution sources and their associated infection risks from zoonotic parasites (Giardia, Cryptosporidium) for the Danube River (human wastewater input) and its floodplains (animal sources) downstream of Vienna. An important result for best management practices is that autochthonous and allochthonous faecal sources during flood and rainfall events contributed pathogen loads with similar orders of magnitude.

Infection and health risk indicator role through epidemiological studies
The traditional method of recreational water quality monitoring of surface waters has been based on the application of cultivation-based FIOs. For example, the relative risk of illness for swimmers and nonswimmers in recreational waters was estimated based on cultivation-based enterococci levels (USEPA 1986). However, a revision of these guidelines in 2012 ('NEEAR study') reported that qPCR measurements of general enterococci concentrations are better predictors of the rate of gastrointestinal illness among swimmers in recreational waters compared to cultivation-based enterococci levels (USEPA 2012a). This study established a combined approach, using cultivation-based E. coli enumeration (beach action value of 235 CFU per 100 ml of water) and genetic enterococci qPCR quantification (beach action value of 1000 calibrator cell equivalents per 100 ml) with a health-based compliance target of 36 cases of gastrointestinal illnesses per 1000 swimmers (USEPA 2012b). MST marker quantification by qPCR has also been incorporated in epidemiological studies. For example, Griffith et al. (2016) applied several bacterial and viral indicators to predict gastrointestinal illness in three Californian beaches (n = 10 785 swimmers) by comparing qPCR and cultivation-based methods. At one beach, human-associated genetic MST marker levels displayed the highest associations with gastrointestinal illness. The authors concluded that performance of a selected parameter is likely site-specific. Napier et al. (2017) conducted a prospective cohort study also using human-associated genetic MST markers in water (self-reported gastrointestinal illness among 12 060 swimmers at six beaches across USA). Inconsistent associations were noted between results; however, the authors concluded that qPCR MST marker data may be useful in assessing human health risks in recreational water bodies.

Table 2. Overview of essential biological-diagnostic attributes of faecal indicators and associated genetic targets. The overview shows the 'big five': sensitivity, specificity, persistence, resistance, and mobility. Most of the shown attributes can be divided into subcharacteristics. Various methods and specification metrics have been suggested for determination.

Methods/metrics

Faecal sensitivity

Persistence varies widely among microorganisms and genetic targets and is influenced by many potential abiotic and biotic ecological factors, such as sunlight, temperature, salinity, grazing, and so on.

Resistance
Extent of survival (i.e. viability, proliferation, and infectivity) or molecular detectability (i.e. nondegraded amenable nucleic acids) of indicator or genetic target, respectively, towards chemical substances (e.g. metals, antibiotics) and during technical treatment and disinfection processes.
Resistance varies widely among microorganisms and genetic targets and is influenced by many chemical and physical factors, such as type of chemicals (chlorine, ozone), concentration and contact time (ct-value), temperature and time in thermal processes, and fluence in UV irradiation.
Inactivation rate and kinetics obtained under carefully controlled conditions (Hoff and Akin 1986)
Inactivation rate constants
Log reduction of the concentration of microorganisms/pathogens; a measure for the effect of a substance or for the efficacy of the process (Guerrero-Latorre et al. 2016). The log reduction to be achieved for a target is determined by risk assessment.

Mobility
Transport characteristics of the indicator or genetic target in the (aquatic) environment
Mobility is influenced by many factors, such as mass and size of the microorganism/phage, its attachment and aggregation behaviour (electrostatic and hydrophobic forces), its detachment behaviour, as well as the motility of certain microorganisms. Mobility characteristics may change as the microorganism decays.
Sedimentation onto the river bed applies to larger-sized microorganisms (protozoa) or microorganisms attached to sediment (
Motility (Becker et al. 2004)

Infection and health risk indicator role through indicator to pathogen ratio and QMRA
An increasing number of studies have attempted to establish a link between genetic MST marker concentrations and infection risks in recreational waters using a QMRA modelling framework. One of the first studies of this type was conducted to estimate the risk of gastrointestinal illness for adults swimming in waters contaminated with untreated sewage (Staley et al. 2012). In this study, norovirus was selected as the reference pathogen. The HF183 marker was detected in sewage dilutions indicating gastrointestinal illness risks greater than or equal to the benchmark value of 10/1000 primary contact recreators in several sampling sites based on the 1986 Ambient Water Quality Criteria (USEPA 1986). Boehm et al. (2015) established a relationship between concentrations of the human-associated qPCR markers HF183 and HumM2 and gastrointestinal illness risk of swimmers in recreational waters using a QMRA approach. The authors noted that the benchmark gastrointestinal illness rate of 30/1000 primary contact recreators occurred when the median concentrations of HF183 and HumM2 marker genes were 4200 and 2800 GC/100 ml of water, respectively. In a subsequent study, Boehm et al. (2018) incorporated the decay of both human faecal-associated markers and norovirus in the model to determine the risk associated with scenarios in which the age of contamination is unknown or water is contaminated by fresh untreated sewage. When an untreated sewage contamination scenario was considered, the risk-based threshold was ∼9700 GC/100 ml. The analysis suggested that a risk-based threshold of 4100 GC/100 ml is warranted for the HF183 marker gene when the age of contamination is unknown. Schoen et al. (2020) modelled risk-based thresholds across different mixture and sewage-age scenarios for crAssphage, HF183 and polyomavirus using QMRA. The authors concluded that genetic markers may not be effective when aged sewage contributes most pathogens relative to fresh contamination. Similar risk-based MST marker thresholds have also been estimated for gull Catellicoccus, human Bacteroides, and human Lachnospiraceae markers (Brown et al. 2017, Boehm et al. 2018, McLellan et al. 2018). Such information can be extremely valuable to regulators in interpreting quantitative MST marker data concerning potential human health risk and developing plans for faecal pollution mitigation and to assess human health risks more accurately.
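The logic behind a risk-based marker threshold can be sketched in a few lines of Python. The sketch assumes a fixed pathogen-to-marker ratio, a fixed ingested volume, and a single-parameter exponential dose-response model. These are simplifications chosen for illustration and do not reproduce the models of the cited studies, which also consider decay and variability. Only the 30/1000 benchmark is taken from the text. All other parameter values are hypothetical.

```python
import math


def illness_risk(marker_gc_per_100ml: float, pathogen_per_marker: float,
                 ingestion_ml: float, r: float) -> float:
    """Risk per swimming event from an exponential dose-response model."""
    pathogen_per_ml = marker_gc_per_100ml * pathogen_per_marker / 100.0
    dose = pathogen_per_ml * ingestion_ml
    return 1.0 - math.exp(-r * dose)


def risk_based_threshold(benchmark_risk: float, pathogen_per_marker: float,
                         ingestion_ml: float, r: float) -> float:
    """Marker concentration (GC/100 ml) at which risk equals the benchmark."""
    dose_at_benchmark = -math.log(1.0 - benchmark_risk) / r
    return dose_at_benchmark / ingestion_ml * 100.0 / pathogen_per_marker


BENCHMARK = 30.0 / 1000.0   # benchmark illness rate quoted in the text
RATIO = 1.0e-4              # hypothetical pathogens per marker gene copy
INGESTION_ML = 30.0         # hypothetical volume swallowed per event
R = 0.1                     # hypothetical dose-response parameter

threshold = risk_based_threshold(BENCHMARK, RATIO, INGESTION_ML, R)
print(f"marker threshold: {threshold:.0f} GC/100 ml")
print(f"risk at threshold: {illness_risk(threshold, RATIO, INGESTION_ML, R):.3f}")
```

Because the pathogen-to-marker ratio changes as sewage ages and both targets decay at different rates, a single fixed ratio is the weakest assumption of this sketch.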

Application 6: outbreak tracing and wastewater surveillance
The GFPD toolbox has also proved useful in fields that traditionally focus on the detection and characterization of pathogens, such as waterborne disease outbreaks or pathogen transmission route characterization. Twenty outbreak and pathogen transmission tracing articles were retrieved, predominantly employing 'MST markers' with paired measurements of 'pathogens' and 'cultivation-based FIO'. Additionally, five of the retrieved articles applied MST markers in wastewater surveillance for SARS-CoV-2. Given the importance of this topic, additional literature searches were performed and revealed three different roles in which MST markers may be implemented for wastewater surveillance.

Outbreak tracing, disease transmission routes, and sanitation trials
Waterborne disease outbreaks occur worldwide and may be caused by several factors; in the case of drinking water, for example, these may include raw water contamination, treatment deficiencies, and drinking water distribution network failures. Tracing an outbreak is done predominantly by tracking the pathogen strain from patients through the transmission routes back to the exposure source by genetic typing and sequencing (molecular epidemiology, e.g. Popa et al. 2021). Alternatively, host-associated genetic faecal indicators can help identify the source of contamination and support the elucidation of disease or pathogen transmission routes. While they provide less specific outbreak-related information compared to pathogen typing, these markers are much more abundant than the pathogen in question, making them easier to detect in the environment. For example, host-associated markers were used in outbreak studies in Finland with ∼450 illness cases to identify the source of pollution and to ensure the success of contaminant removal from the drinking water distribution system (Kauppinen et al. 2019). A novel approach used the human-associated genetic marker HF183 in a norovirus outbreak involving 179 cases in Pennsylvania, USA. It was applied as a microbial tracer to demonstrate the hydrogeological connection between a malfunctioning septic system, drinking water well, and recreational water area and, therefore, helped inform outbreak prevention strategies in the area (Mattioli et al. 2021). The coastal Biobío Region of Chile had been affected by repeated hepatitis A outbreaks. The correlation between human mitochondrial DNA, faecal coliforms, and live microbial biomass was investigated, and the concordance between human faecal pollution in the coastal waters and a seasonal hepatitis A outbreak strongly suggests that the investigated parameters can be used as a proxy to evaluate the risk of outbreaks of thalassogenic diseases (González-Saldía et al. 2019). During a large Campylobacter outbreak in Norway with over 2000 cases and 76 hospitalizations, an old cave used as a drinking water pool was identified to be faecally contaminated as indicated by the presence of E. coli. Host-associated genetic markers for humans, ruminants, horses, pigs, and other animals were applied to generate a faecal source distribution profile. This revealed that the faecal contamination was likely zoogenic in origin (horses) (Paruch et al. 2020).
In settings with poor sanitation facilities and practices, pathogen transmission routes can be multiple; therefore, planning WASH interventions to reduce pathogen exposure is challenging. A study in an urban slum in Nairobi, Kenya, set out to separate two types of human faecal waste, originating from children and from adults, because mitigation steps to reduce contamination could differ (Bauza et al. 2019). Using 16S AmpSeq analysis of faeces from both cohorts and various surfaces and waters, as well as the algorithm SourceTracker, the authors identified child faeces as the dominant pollution source inside households, whereas faecal pollution from adults was more prevalent outside households.
GFPD tools can also be used to evaluate WASH interventions. A controlled, before-and-after trial was performed in neighbourhoods of Maputo, Mozambique to estimate the potential health impacts of a sanitation intervention (installation of improved pit latrines). The authors first assessed the transmission routes through a comprehensive sanitary, environmental, and socioeconomic survey, including the measurement of a set of general and host-associated faecal indicators. They found widespread faecal contamination in soil, water, and food preparation surfaces, including from human sources. However, faecal contamination levels were largely disconnected from these analysed factors (Holcomb et al. 2020). In the before-and-after trial, the authors used a Bayesian hierarchical modelling approach to account for MST marker performance. Bootstrap estimates found no effect of the sanitation intervention on the prevalence of general and human-associated indicators, which highlights the complexity of the system and the need for multisectorial, 'transformative' WASH interventions (Holcomb et al. 2021).
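Why marker performance matters when estimating prevalence can be shown with a much simpler calculation than the Bayesian hierarchical model used in the cited trial. The Python sketch below applies the Rogan-Gladen point correction, which adjusts an observed detection frequency for imperfect sensitivity and specificity. It is a swapped-in textbook estimator, not the authors' method, and the example values are hypothetical.

```python
def adjusted_prevalence(apparent: float, sensitivity: float,
                        specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from apparent prevalence."""
    for value in (apparent, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        # The marker performs no better than chance; no correction is possible.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1.0) / denominator
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical example: a human-associated marker detected in 40% of samples,
# with 80% sensitivity and 90% specificity.
print(f"{adjusted_prevalence(0.40, 0.80, 0.90):.3f}")
```

A hierarchical model additionally propagates the uncertainty in sensitivity and specificity, which this point estimate ignores.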

Wastewater surveillance
Wastewater surveillance, also called wastewater-based epidemiology, seeks to relate the occurrence of a public health target of interest measured in wastewater to the public health of a respective population (e.g. Choi et al. 2018, Lorenzo and Picó 2019). COVID-19 gave a strong boost to the field, where SARS-CoV-2 RNA occurrence in wastewater is used as a proxy for the prevalence and dynamics of the infection in the population. In contrast to HRWM, which focuses on the users of the water (e.g. drinking water, recreation, and irrigation), wastewater surveillance is an 'upstream approach', looking back at the population's health. Samples for wastewater surveillance are taken from raw wastewater collected by centralized sewer systems. Surface waters heavily contaminated by sewage may also exhibit an epidemiological indicator function in terms of wastewater surveillance (e.g. Kolarević et al. 2022, Maidana-Kulesza et al. 2022). Successful wastewater surveillance applications require the accurate measurement of public health targets in wastewater. However, this can be challenging because the proportion of human waste in a wastewater sample can be highly variable in time and space (i.e. between/within sampling site variability). In addition, the sample matrix may be challenging from an analytical point of view. In response, many scientists have suggested using faecal markers (e.g. PMMoV, crAssphage, HF183) to support sample characterization and provide quality control in wastewater surveillance.
One application category is the characterization of surveillance samples, which mainly aims to quantify the human faecal levels in (waste)water but could also be used to characterize other animal sources. One study examining the epidemiological indicator function of SARS-CoV-2 in surface waters for countries with poor wastewater treatment, for example, applied an advanced sampling site characterization approach including measurement of human- (BacHum), ruminant- (BacR), and pig- (Pig-2-Bac) associated genetic faecal markers. By using this approach, they could trace and identify sites with significant raw sewage influence from human populations, which may serve as sampling locations for wastewater surveillance where no obvious sewage outlets exist (Kolarević et al. 2022).
In addition, MST methods have also been used as internal process controls within wastewater surveillance investigations, either as a proxy for the public health target of interest to ensure adequate recovery and/or as performance metrics of sampling/sample processing protocols. In a monitoring study of SARS-CoV-2 in the wastewater and rivers of Tapachula (southern Mexico), for example, PMMoV was not only used as a faecal pollution marker but also as an analytical control to confirm RNA extraction and amplification (Zarza et al. 2022). In another study investigating the intraday variability in 1-h and 24-h composite wastewater samples, the concentrations of the human viral indicators crAssphage and PMMoV were monitored in addition to the less prevalent human pathogen adenovirus (HAdV) to inform the design of appropriate wastewater sampling strategies for wastewater surveillance (Ahmed et al. 2021).
The most widely observed use of faecal markers for wastewater surveillance was the normalization of pathogen occurrence data. In this context, different MST markers were used either to describe spatial and temporal trends of the public health target of interest or to support the prediction of community infection trends. For example, Wolfe et al. (2021) describe how normalizing SARS-CoV-2 concentrations from multiple WWTPs with PMMoV can be used to compare the incidence of laboratory-confirmed new COVID-19 cases by accounting for variability in recovery and differences in human faecal loads within or between WWTPs. Another study investigated the suitability and performance of various normalization parameters and how well they correlated with local clinical cases. Normalization by crAssphage and PMMoV (amongst others) was found to show varying performance for different sampling sites (Mitranescu et al. 2022). Similar findings were described for PMMoV in a study by Nagarkar et al. (2022), suggesting that the most suitable faecal marker for normalization may vary by site and wastewater management practices.
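Normalization by a faecal marker amounts to dividing the target concentration by the marker concentration measured in the same sample. The Python sketch below shows this for two hypothetical treatment plants. The plant names and all concentrations are invented for illustration, and both targets are assumed to be reported in the same units.

```python
def normalise_by_marker(target_gc_per_l: float, marker_gc_per_l: float) -> float:
    """Return the dimensionless ratio of target to faecal marker."""
    if marker_gc_per_l <= 0:
        raise ValueError("marker concentration must be positive")
    return target_gc_per_l / marker_gc_per_l


# Hypothetical samples: (SARS-CoV-2 GC/l, PMMoV GC/l) per treatment plant.
samples = {
    "plant_A": (2.0e4, 1.0e8),
    "plant_B": (2.0e4, 4.0e8),
}

ratios = {name: normalise_by_marker(target, marker)
          for name, (target, marker) in samples.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1e} SARS-CoV-2 copies per PMMoV copy")
```

In this invented example, the two plants have identical raw SARS-CoV-2 concentrations but differ fourfold after normalization, because the sample from plant B carries a higher human faecal load.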
Wastewater surveillance represents an exciting new application for GFPD. However, additional research is warranted, especially in areas highly relevant for wastewater surveillance, such as the behaviour of MST targets in sewer systems, distribution between hosts, or protocol performance assessments with wastewater sample processing methods. Although genetic faecal markers have already proven to be valuable, it remains unclear which of the many available methods are most suitable. Optimal method selection will likely vary by use scenario, surveillance target, and geographic location. In addition, applications will likely not be restricted to MST markers, but will use the entire methodological capacity of GFPD.

Application 7: other applications
Assessing water resources for the possible presence of faecal pathogens is the foundation of GFPD. However, these tools have also proven useful in other arenas. For example, 48 out of the 107 articles in this category had antibiotic resistance as the primary research focus, complemented with a GFPD method, mostly pertaining to MST markers. In total, 12 articles used MST markers to trace nutrient inputs into ambient waters. Interestingly, three articles were observed from the archaeology field, and employed genetic methods for faecal bacteria. These three disciplines are further discussed below.

Identification of the sources of ARGs
Antimicrobial resistance (AMR) is one of the top 10 global public health threats (World Health Organisation 2021). The spread of antibiotic resistant bacteria (ARB) and their ARG from hotspots such as WWTPs or agricultural run-off into freshwater and coastal ecosystems is of growing concern (Gao et al. 2018). Identifying such hotspots is, therefore, a pressing issue. Beyond the monitoring of a large panel of ARB and ARG targets of concern and the genotyping of ARGs (similar to pathogen typing), two additional approaches have been established that allow tracking their source.
The first relies on the differing AMR patterns of the gut microbiota of various host species, reflecting the differing antibiotic usage in human and veterinary medicine. This differing pattern is exploited for MST, where the pattern of the environmental samples of unknown pollution profile is compared to a library of known faecal sources. In the early 2000s, this 'antibiotic resistance analysis' relied on the phenotypic AMR characterization of E. coli or enterococci isolates (see also the section 'The early days of genetic methods for faecal pollution diagnostics'; Mott and Smith 2011). More recently, Li et al. (2018) adapted the Bayesian source tracking tool SourceTracker, originally relying on 16S AmpSeq data, to ARG data from whole metagenome sequencing. At two rivers in China with dense human and livestock populations and with excess nutrient levels, this tool identified WWTPs as the major source of ARG at the majority of sites (Hu et al. 2020). At one site, nonhuman animal faeces proved to be the major pollutant. Correlations with host-associated faecal indicator genera, identified based on 16S AmpSeq data, helped identify swine manure as the main nonhuman faecal input.
The second approach relies on the co-occurrence of host-associated faecal microorganisms and ARG and/or ARB, because of a common source. Williams et al. (2022) studied persistent faecal pollution in an urban coastal bay in Sydney, Australia. qPCR MST and 16S AmpSeq together with SourceTracker were employed to pinpoint which stormwater drains drive dry-weather or wet-weather faecal pollution. Significant correlations between ARGs and the human-associated MST marker HF183 showed that the same stormwater drains were the main sources of ARG and of human faecal pollution. The Bolivian Andes is an intense mining area, and heavy metals exert selective pressure for the coselection of ARGs. Through multiple linear regression between the first principal component of a PCA of ARG data as dependent variable and metals, the human-associated viral marker crAssphage and physicochemical parameters as independent variables, Agramont et al. (2020) demonstrated that it is likely that human wastewater inputs, rather than heavy metals, drive ARG concentrations in the three rivers studied.

Identification of the sources of nutrient inputs
Nutrients, such as nitrite (NO₂⁻), nitrate (NO₃⁻), and phosphate (PO₄³⁻), are essential for plant life. However, excess concentrations can lead to eutrophication and harmful algal blooms (Kendall et al. 2007, Fenech et al. 2012). In addition, ingestion of high amounts of nitrate, e.g. through drinking water, may have serious health consequences such as methemoglobinemia of infants (blue baby syndrome), colorectal cancer, and thyroid disease (Ward et al. 2018). The World Health Organisation Drinking Water Guidelines recommend setting thresholds of nitrate and nitrite concentrations in drinking water (WHO 2017) and some countries also regulate surface water and groundwater (European Union: 91/676/EEC and 2006/118/EC, within the frame of 2000/60/EC). Mitigation of excessive nutrient inputs is, therefore, a key water quality management task. Tracing nutrient inputs relies on the fact that ratios of rare to abundant isotopes of certain elements differ among environmental and biological compartments, due to isotopic fractionation during physicochemical and biochemical reactions. As a typical example, nitrate sources can be tracked using δ¹⁵N and δ¹⁸O isotopes (Kendall et al. 2007, Fenech et al. 2012). Since nitrate has numerous biotic and abiotic sources and isotope tracing cannot separate all source types, a toolbox approach is often useful, which can include MST markers (Fenech et al. 2012).
One of the early studies combining δ¹⁵N and δ¹⁸O isotope tracing with MST was conducted along the Sava River, a tributary of the Danube River, which crosses Slovenia, Croatia, Bosnia and Herzegovina, and Serbia. The combined results indicated that soil nitrification and human wastewater were the primary nitrate sources in the Sava River, and the latter was also the main faecal pollution source (Vrzel et al. 2016). Carrey et al. (2021) assessed the main sources of nitrate pollution in surface water and groundwater across Catalonia, Spain in a government-led effort to review vulnerable zones as defined by the European Union Nitrates Directive (91/676/EEC). Nearly 200 samples were analysed for multiple isotopes (δ¹⁵N, δ¹⁸O, δ²H, and δ¹¹B from various molecules), viral and bacterial FIO, human-, ruminant-, and swine-associated MST markers and complemented by land use data. Each sampling location was interpreted individually. The conclusions from multi-isotopic and MST data agreed or partially agreed in 79% of the samples. The authors offered detailed discussion on the complementary nature of the two approaches and the possible sources of disagreement (Carrey et al. 2021). In the coastal areas of Southwest Florida, harmful algal blooms caused by elevated nutrient levels are a recurring problem. Malfunctioning septic tanks were suspected to be the source of nutrients. Brewton et al. (2022) applied δ¹⁵N and δ¹³C isotope tracing, elemental composition of particulate matter (C:N:P), a panel of nutrients, chemical tracers, cultivation-based FIO and human-, bird-, and gull-associated MST markers to tackle the complex challenge. These multiple lines of evidence pointed to a link between septic systems, groundwater, and surface water, ultimately resulting in harmful algal blooms. Additionally, chemical tracers and bird- and gull-associated MST markers indicated rainfall runoff to be a contributing factor (Brewton et al. 2022).
The Changle River catchment in China has a high human population, intensive livestock farming (swine), and agricultural activities, all of which potentially contribute to the high nutrient levels of the river. A Bayesian isotopic mixing model using data from the nitrate dual stable isotope technique (δ¹⁵N-NO₃⁻ and δ¹⁸O-NO₃⁻) suggested manure and sewage to be the dominant pollution sources (Cao et al. 2022). Since nitrate isotopes cannot differentiate between manure and sewage, Cao et al. (2022) applied MST using 16S AmpSeq together with the algorithm SourceTracker, which suggested untreated and treated domestic wastewater as the main sources. Redundancy analysis brought all lines of evidence (isotopes, MST, land use, and various ions) together to reveal domestic wastewater as a probable cause of nutrient pollution (Cao et al. 2022).
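The limitation of dual nitrate isotopes noted above can be illustrated with a simple mass balance. The following is a minimal sketch, not the Bayesian mixing model used by Cao et al. (2022): it solves a deterministic three-source, two-tracer balance with Cramer's rule. The function names and all end-member signatures are hypothetical placeholders chosen for easy arithmetic, not measured values.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def source_fractions(end_members, mixture):
    """Solve a three-source, two-tracer mass balance with Cramer's rule.

    end_members: three (d15N, d18O) pairs, one per source (hypothetical values).
    mixture: the (d15N, d18O) pair measured in the water sample.
    Returns the three source fractions, which sum to 1.
    """
    n = [em[0] for em in end_members]
    o = [em[1] for em in end_members]
    matrix = [[1.0, 1.0, 1.0], n, o]
    rhs = [1.0, mixture[0], mixture[1]]
    d = det3(matrix)
    if abs(d) < 1e-12:
        # Two sources share the same isotopic signature (or all three are collinear).
        raise ValueError("end-members are collinear; sources cannot be separated")
    fractions = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        fractions.append(det3(replaced) / d)
    return fractions


# Placeholder end-members: atmospheric-type, soil nitrogen, lumped manure/sewage.
fractions = source_fractions([(0.0, 60.0), (5.0, 5.0), (15.0, 5.0)], (10.5, 10.5))
print([round(f, 3) for f in fractions])
```

If manure and sewage are entered as separate sources with identical signatures, the system becomes singular and the function raises an error. This is the numerical counterpart of the statement that nitrate isotopes alone cannot distinguish these two sources, and the reason an independent line of evidence such as MST is needed.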

Archaeology
Genetic markers can remain detectable much longer in sediments than in the overlaying water column (Korajkic et al. 2019). Sediments may therefore offer time-integrated information on faecal pollution. In a tidal freshwater marsh in South Carolina, USA, the ruminant-associated MST marker BoBac was found in all sections of a soil core, the deepest section of which dated to 1961 (Drexler et al. 2014). While in this hydrogeological system the bacterial community of fresh pollution might migrate through the layers, the findings provide evidence of at least recent, but potentially long-term faecal pollution likely from deer and/or cow manure. On much larger timescales, lake sediments may act as biological archives of sedimentary ancient DNA from autochthonous (in-lake) and allochthonous (from the catchment and beyond) sources (Capo et al. 2021). Among other tools, palaeoenvironmental enquiries into ancient human presence and pastoral activities may also use MST markers or DNA sequencing techniques (Capo et al. 2021). In a study in Northern France, the authors documented a shift from agro-pastoral practices to forested landscapes during the Roman period. Testing for ovine and bovine mtDNA markers revealed sheep as the dominant livestock before the transition (Etienne et al. 2015).

Emergence of a new field in health-related water quality analysis
The advent of genetic faecal pollution diagnostics (GFPD)
Our search for peer-reviewed science regarding the analysis of faecal pollution-associated nucleic acid targets in water demonstrates the rapid development of genetic diagnostics within the field of HRWM since the start of the new millennium. The meta-analysis of the currently existing application types also highlights that this novel scientific discipline extends far beyond the enumeration of genetic MST markers. Many traditional HRWM aspects, such as treatment and microbial transport indications, infection risk assessment and QMRA, as well as integration into modelling and simulations, were found to be supplemented by GFPD (sections 'Application 1' through 'Application 7'). In addition, several novel aspects, such as the support of epidemiological outbreak tracing, wastewater surveillance, and supplementing ABR research, have also been developed. The emerging scientific field of GFPD is still growing; no plateau phase is in view (Fig. 3). In the past decade, the focus of research has shifted from method establishment to the implementation of these methods in scientific field research. An emphasis on field implementation is also indicated by the frequent use of certain genetic faecal markers, with some of them already standardized at the national level (section 'Application 1'). However, method development has not halted, and it is very likely that expected future technological developments in molecular biological analytics, sequencing and bioinformatics (e.g. Callaway 2022) will further promote diversification within the field of GFPD research.
It thus seems justified to define this emerging part of science as a new discipline: genetic faecal pollution diagnostics in health-related microbial water quality analysis (see the section 'Glossary'). The aim of GFPD is to open up the 'black box' of microbial faecal pollution of water resources to support problem-oriented water safety management, covering aspects such as catchment protection and management, water quality monitoring, health risk management, and treatment requirement evaluation. Additionally, GFPD can be applied to areas outside the water sector, as exemplified by its use in archaeology (section 'Application 7').

GFPD analyses distinct nucleic acid-based faecal pollution signatures
Vertebrate gut microbial communities fundamentally differ from environmental 'nondigestive' microbial communities (e.g. water, sediment, soil, plant, and nonvertebrate), as first demonstrated by the meta-analysis of 16S AmpSeq data by Ley et al. (2008). Long coevolution between host vertebrate animals (including humans) and their intestinal microbiomes, driven by many selective forces (e.g. adaptive immune system, host selection pressure, and unique biochemical environment), is likely responsible for this clear distinction (Ley et al. 2008). Although cosmopolitan populations do occur, strong vertebrate gut-associations also exist on the individual taxa level of microorganisms (McLellan and Eren 2014, Youngblut et al. 2019, 2021). This clear intestinal versus nonintestinal microbial community dichotomy forms the essential basis of specific detection of faecal pollution in water, targeting nucleic acid-based signatures from gut-associated bacteria, archaea and viruses. Similarly, evolutionary adaptations between macro- and intestinal microorganisms also exist on the host level, providing the basis for MST (section 'Introduction').
GFPD of today primarily focuses on the cultivation-independent detection of nucleic acid-based targets in the environment. The literature analysis highlighted that GFPD thus far predominantly relies on targeted analysis, where faecal pollution-associated sequences are directly detected by amplification methods (e.g. PCR, qPCR, and dPCR), using specific primers and probes. Owing to the enormous technological developments in HTS, nontargeted approaches, using broader taxonomic sequencing and subsequent specific in silico sequence alignment to faecal-associated signatures, have substantially improved during the past decade (Fig. 12, section 'Outcomes of the systematic study design analysis').
Advances in intestinal microbiomics will certainly further benefit GFPD, expanding our understanding of ecophylogenetics and providing access to representative sequence databases to support (i) in silico design and evaluation of molecular assays, and (ii) bioinformatic analysis of big data from HTS (Fig. 13). Human and other animal intestinal microbiome research, with the greatest relevance in life sciences and medicine, is a very young discipline, and much is expected to be achieved in the future.

Genetic faecal pollution detection and MST: a methodological quantum leap
The use of GFPD has fundamentally changed the way scientific questions on faecal pollution problems in the environment can be addressed and answered (Malakoff 2002). MST using genetic methods has opened the way to identify and quantify many different pollution sources, which cultivation-based methods do not allow. Approximately half of the identified GFPD studies (356 out of 649 articles) dealt with MST, i.e. the characterization and origin determination of faecal pollution. Many novel cutting-edge GFPD studies, covering single and multiple sources in differing types of water resources, including elevated faecal pollution levels in watersheds, recreational waters, groundwater resources, aquaculture and others, could be successfully realized (sections 'Application 2' and 'Application 3').

Biobanking: a new key element in HRWM research
Traditional cultivation-based FIO analysis requires sample transport, processing, and subsequent cultivation within a short time period (usually <1 working day). This often significantly constrains the possibilities and extent of research. In contrast, GFPD enables long-term nucleic acid preservation (>1 year) before performing the diagnostic analysis (De Paoli 2005, Jackson et al. 2011, Cary and Fierer 2014). The possibility of storing nucleic acids for posterior analysis has several essential implications for HRWM research. Assuming that there is sufficient capacity to establish a representative sample bank over time and space, researchers can (i) focus on selected samples of interest (e.g. pollution event-based analysis), (ii) focus on the parameters appearing most appropriate at the time of analysis, and (iii) extend the investigation to other samples and/or genetic parameters at any time, if sufficient analyte is available. In hydrological sciences, this type of sample archiving for posterior analysis (e.g. isotopes) has already been a standard practice for decades.

Nucleic acid sample transfer supports the globalization of HRWM research
Nucleic acid sample conservation during field work also opens the way to international network structures, useful for performing centralized analysis in specialized laboratories (Reischer et al. 2013, Mayer et al. 2018). This point is especially interesting for developing regions that lack the infrastructure for advanced GFPD. During the COVID-19 pandemic, infrastructures for molecular biological analysis were established in many urban centres throughout the globe and will likely contribute to centralized GFPD activities in the future. Thus, even advanced GFPD will not be limited to certain regions of the world but will be accessible from any remote location, provided that basic infrastructure for sample collection, processing, storage, and transfer, as well as standard operating procedures, are available.

Characteristics of DNA/RNA-based target analysis
The literature analysis highlighted that GFPD targeting of prokaryotic microbiota (bacteria and archaea) has almost exclusively relied on DNA analysis, with the 16S rRNA gene as the most frequently used diagnostic region. In addition, alternative targets, such as gene regions for protein coding parts, have also been used (Shanks et al. 2008, Green et al. 2014a). The primary aim of tracing intestinal DNA signatures in the environment is the sensitive detection and characterization of faecal pollution. Such DNA analysis does not give any information about the physiological status of the targeted microbiota in the analysed water. Active, inactive, starving, viable but not culturable, or dead microbial populations are often detected equally. Depending on the applied extraction procedure, DNA attached to cells, organic debris, biofilms, or sediments, and even freely suspended DNA, is also detectable (Carini et al. 2016). The same is true for viral targets. Detecting viral DNA or RNA does not provide information on the infectious or noninfectious status of the targeted populations.
Notably, targeting ribosomal RNA via RT-qPCR for bacterial general faecal markers and MST markers was reported to increase the sensitivity and frequency of faecal pollution detection for several water resource types (Pitkänen et al. 2013). In addition, rRNA analysis may also be interesting for viability investigations (section 'Generating viability- and infectious status information by molecular tools').

Relevance of viability- or infectious status information
While the majority of genetic detection methods available do not account for information on the viability or infectivity status of the microorganisms or viruses from which the nucleic acids originate, it is important to note that this is not the main purpose for many GFPD applications. For example, this is clearly the case for most of the identified faecal pollution detection and MST studies throughout the literature analysis (sections 'Application 1' to 'Application 3'). Nevertheless, as outlined below, robust information on the persistence and resistance properties of the genetic targets is essential for the correct selection and application of genetic MST markers and for the appropriate data interpretation (section 'GFPD (MST) application frame: status quo and research needs'). Other identified GFPD application areas, such as the support of outbreak tracing or wastewater surveillance, do not rely on the viability status of the microbial targets either (section 'Application 6').
Even the use in recreational water quality monitoring seems to be a realistic exercise, without the need for a viability endpoint (section 'Application 5'). For example, a recent investigation on swimming-associated health risks, including 80 000 beachgoers at 13 beaches (pooled data), revealed the strongest associations between gastrointestinal symptoms and qPCR-quantified enterococci, but not with cultivation-based enumeration (Wade et al. 2022). It was previously hypothesized that enterococci DNA, as quantifiable by qPCR, better reflects the survival of resistant pathogens during wastewater treatment (e.g. resistant enteric viruses) than cultivable enterococci concentrations (Wade et al. 2006, Srinivasan et al. 2011). Obviously, it is desirable for pathogen die-off kinetics to match the decay kinetics of the analysed indicator signals, irrespective of whether viability- or nonviability-based parameters are considered. Undoubtedly, more research is needed to better understand the principles behind these important relationships in GFPD and health risk assessment. However, the extent of already existing innovative research by nucleic acid-based qPCR analysis for infection- and health risk indication holds great promise for the future (section 'Application 5').
Information on viability or infectious status becomes an essential criterion when microbicidal and virucidal treatments are to be characterized. In particular, the efficacy assessment of disinfection, including all technologies (e.g. by heat, chlorine, ozone, UV light, and so on), requires the application of representative and reliable indicators for viability and especially infectivity, often supplemented by selected reference pathogens. The assessment is historically based on cultivation methods, considered the lege artis gold standard, especially when disinfection processes and log-reduction targets are to be monitored, validated or verified. For example, a recent European Union regulation requires the cultivation-based validation monitoring of reclaimed water for agricultural irrigation (class A) using E. coli, somatic coliphages and C. perfringens spores, with defined performance targets of ≥5, ≥6, and ≥4 log₁₀ reductions within the treatment chain, respectively (European Union 2020).
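To make the performance targets concrete, the following minimal sketch computes log₁₀ reduction values (LRV) from paired inflow and outflow concentrations and compares them with the three targets quoted above. The function names and the example concentrations are hypothetical, and the sketch does not reproduce any procedure prescribed by the regulation itself.

```python
import math

# Validation targets (log10 reductions) quoted in the text for class A reclaimed water.
TARGETS = {"E. coli": 5.0, "somatic coliphages": 6.0, "C. perfringens spores": 4.0}


def log_reduction(conc_in, conc_out):
    """Return the log10 reduction value between inflow and outflow concentrations."""
    if conc_in <= 0 or conc_out <= 0:
        raise ValueError("concentrations must be positive; handle non-detects separately")
    return math.log10(conc_in / conc_out)


def meets_targets(measurements):
    """Map each indicator to (LRV, target met?) for {indicator: (c_in, c_out)}."""
    result = {}
    for indicator, (c_in, c_out) in measurements.items():
        lrv = log_reduction(c_in, c_out)
        result[indicator] = (lrv, lrv >= TARGETS[indicator])
    return result


# Hypothetical concentrations per 100 ml before and after the treatment chain.
example = {
    "E. coli": (1e7, 1e1),
    "somatic coliphages": (1e6, 1e1),
    "C. perfringens spores": (1e5, 1e0),
}
for name, (lrv, ok) in meets_targets(example).items():
    print(f"{name}: LRV = {lrv:.1f}, target met: {ok}")
```

In this made-up example, the coliphage reduction of about 5 log₁₀ falls short of its target of 6, while the other two indicators pass. Outflow non-detects are deliberately rejected here, because a concentration of zero has no defined logarithm and needs a separate convention.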

Generating viability- and infectious status information by molecular tools
In addition to cultivation-based enumeration, cultivation-independent, molecular strategies for viability- and infectious status analysis are also increasingly applied in research. For prokaryotes, a vast array of different techniques, including RNA-based methods (rRNA, messenger RNA), membrane integrity (e.g. viability stains, viability PCR), cellular metabolism (e.g. ATP, respiration, isotope labelling), protein-based methods (e.g. BONCAT), and microcalorimetry, have been suggested within the broad field of microbial ecology (Emerson et al. 2017). However, the delineation of dead versus viable microbial cells is complex and still under debate (Davey 2011, Kirschner et al. 2021). There is consensus that living microbial cells should have (i) intact functional cell membranes, (ii) intact cellular and energy metabolism, and (iii) the capability to reproduce (i.e. intact transcription/translation mechanisms). Straightforward determination strategies frequently address only one of these aspects of microbial viability (e.g. 'live/dead' protocols), leaving room for uncertainty (Emerson et al. 2017). Thus, multiple (more time-consuming) criteria must be applied simultaneously if precise viability characterization of the target microbiota is required (Kirschner et al. 2021). Detection of infectious viruses is equally challenging, and no single method is available to detect all infectious viruses in water (Gerba et al. 2018). At least three criteria must be fulfilled for infectious viruses: (i) sufficient genomic integrity to produce the required proteins for replication and to provide an accurate genetic template for subsequent generations, (ii) protection of the genome from degradation, and (iii) the ability of the virus to recognize and infect the host cell (Pecson et al. 2009, Gerba et al. 2018).
Viability PCR and a similar approach, ET-qPCR, were introduced to the field of GFPD more than a decade ago (Wuertz 2009, Pecson et al. 2009) and have been increasingly applied in HRWM research in recent years. The original idea of applying viability PCR to bacterial MST markers was to gain information on recent faecal pollution events in water resources (Wuertz 2012, 2015). Viability PCR relies on the pretreatment of the sample with an intercalating dye, PMA, that penetrates cells with impaired membranes and prevents PCR-based amplification (Nocker et al. 2006), thus allowing the selective detection of cells with an intact membrane. Virus capsid integrity may also be assessed using the same principles (reviewed in Leifels et al. 2021) or using ET-PCR (Pecson et al. 2009). However, several authors note challenges related to procedural conditions confounding the results and emphasize that experimental conditions need to be optimized and validated for the microorganism under investigation (Fittipaldi et al. 2012, Lazou et al. 2019, Leifels et al. 2021). The application of viability PCR now extends to the assessment of microorganism attenuation during treatment processes (section 'Application 4').
In summary, molecular tools to generate information on viability and infectious state constitute a novel and innovative area of research in GFPD. Relatively little experience exists in comparison to traditional PCR and qPCR analysis (section 'Application 4'). Many challenges are still associated with their application, such as problems with methodical reproducibility, cross-reaction with background or free nucleic acids, selection of optimal reagents, and experimental conditions and protocols (Gerba et al. 2018, Codony et al. 2020). Furthermore, the success of these methods often depends on the particular mechanism of inactivation (e.g. chemical vs. physical agents). Nonetheless, further development activities in the future will likely open new windows of opportunity in HRWM as well as in complementing cultivation-based standards. In addition, many potential areas within the range of these available molecular tools have not yet been exploited (Emerson et al. 2017). For example, and in contrast with viability PCR applications, RNA-based methods have only very rarely been applied and evaluated in GFPD (Pitkänen et al. 2013). As successfully demonstrated in other fields of environmental microbiology, RNA analysis may significantly contribute to information on the activity status of microbial populations (Gourse et al. 1996, Amann and Ludwig 2000, Deutscher 2006).

Sensitivity of environmental detection of nucleic acid targets
A common narrative is that molecular DNA/RNA diagnostics are highly specific and sensitive. This may be true for theoretical considerations. For 'real world' applications, this dictum, especially in relation to sensitivity, must be considered in the context of the overall analytical measurement challenge (Wintzingerode et al. 1997). For example, an optimally designed qPCR test should be able to detect, in theory, one target molecule of DNA/RNA, if present in a single reaction unit. However, as target molecules follow a stochastic distribution during analyte dilution for parallel analysis, the assay limit of detection (aLOD) cannot be less than three target molecules for a 95% detection probability per qPCR analysis, even with perfect PCR kinetics (Bustin et al. 2009). Moreover, overall considerations require whole chain analysis (WCA), including sampling, recovered sampling volume, filtration and enrichment, nucleic acid extraction and purification efficacies, and finally, the amount of nucleic acid analysed (Table S2, Supporting Information). The resulting overall WCA sensitivity, reported for instance as the sample limit of detection (sLOD), can be quite elevated (Domingo et al. 2007). To illustrate, sLOD or alternative estimates on WCA sensitivity for qPCR DNA/RNA target enumeration were reported to be in the range of log₁₀ 1.5-3.9 genetic targets per 100 ml sample (Pitkänen et al. 2013).
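The figure of three target molecules follows from Poisson sampling statistics, and the step from aLOD to sLOD is a matter of scaling through the analytical chain. The sketch below shows both calculations under simplifying assumptions: targets are Poisson distributed across reactions, and losses along the chain are summarized by a single recovery factor. All function names and the example parameter values are hypothetical.

```python
import math


def detection_probability(mean_copies):
    """Probability that a reaction contains at least one target (Poisson)."""
    return 1.0 - math.exp(-mean_copies)


def copies_for_probability(p):
    """Mean copies per reaction needed to detect with probability p."""
    return -math.log(1.0 - p)


def sample_lod(alod_copies, sample_volume_ml, recovery, fraction_analysed, per_ml=100.0):
    """Scale an assay LOD (copies per reaction) to targets per `per_ml` of sample.

    recovery: combined filtration/extraction efficiency (0-1), hypothetical.
    fraction_analysed: share of the nucleic acid extract entering one reaction (0-1).
    """
    copies_in_sample = alod_copies / (recovery * fraction_analysed)
    return copies_in_sample * per_ml / sample_volume_ml


print(round(detection_probability(3.0), 3))    # close to 0.95
print(round(copies_for_probability(0.95), 2))  # close to 3 copies
# Hypothetical chain: 100 ml filtered, 50% recovery, 5% of the extract per reaction.
slod = sample_lod(3.0, 100.0, 0.5, 0.05)
print(round(slod), round(math.log10(slod), 2))
```

With these placeholder values, an aLOD of three copies per reaction corresponds to roughly 120 targets per 100 ml, or about log₁₀ 2.1, which lies within the range quoted above. Filtering a larger volume or analysing a larger share of the extract lowers the sLOD proportionally.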
Genetic assays selected for GFPD often target highly abundant intestinal bacterial and viral populations, as occurring in faecal excreta or wastewater, to compensate for the abovementioned WCA sensitivity issues. This fundamental design criterion is achieved by almost all top performing qPCR assays of genetic faecal markers (Reischer et al. 2013, Green et al. 2014b, Mayer et al. 2018, Sabar et al. 2022). Less abundant intestinal targets, such as traditional E. coli or enterococci (Farnleitner et al. 2010), can still be detected using genetic methods if faecal pollution levels are elevated, as frequently observed for surface waters under communal and agricultural influence. However, in situations with low to very low faecal pollution levels, such as groundwater and drinking water resources, the sensitivity issues of genetic faecal markers can be very limiting. High-volume sampling, specific enrichment or alternative amplification systems may bring improved sensitivity and thus extend the possibilities of GFPD to such situations (Min and Baeumner 2002, Heijnen and Medema 2009, Rhodes et al. 2011, Liu et al. 2012). In scenarios of low faecal pollution levels, it is common for a large proportion (>50%) of measurements to be below a GFPD method limit of quantification. For these censored data, the true genetic target concentration cannot be firmly established and can represent a significant source of bias in downstream statistical analyses. While it may be convenient to ignore censored data, these measurements offer important information. As a result, there is a growing interest in the development and use of statistical methods that can responsibly incorporate censored data into concentration estimates, hypothesis tests, regressions, and other analyses to help minimize potential bias and maximize faecal pollution trend insights. For example, Cao et al. (2018) developed a qPCR censored data faecal score approach to estimate a weighted-average genetic marker concentration from a defined group of samples using all measurements (e.g. nondetection, below the limit of quantification, or within the range of quantification). Additional research is needed to further advance censored data analysis methodologies custom designed for GFPD applications.
HTS applications as identified in our literature analysis (section 'Outcomes of the systematic study design analysis') face challenges in addition to WCA. In fact, the achievable sensitivity of 16S AmpSeq applications, applying general primers for broad taxonomic detection, such as kingdom and phylum level, strongly depends on the relative abundance of faecal pollution-associated intestinal microbiota compared to nonfaecal pollution associated microbiota (i.e. environmental 'background microbiome'). Water resources showing low to moderate faecal pollution levels and abundant aquatic microbiomes (e.g. 10⁹-10¹¹ cells/l for lakes or rivers; Kirschner et al. 2004, Velimirov et al. 2011) become problematic, even when applying high amplicon sequencing depth (Vierheilig et al. 2015). Consequently, identified studies have most frequently focused on water resources with significant municipal and agricultural faecal pollution levels (section 'In-depth review of the application areas of genetic faecal pollution diagnostic through case studies').

Cutting-edge solutions require in-depth expert knowledge
The ability of GFPD methods to detect (is there a pollution problem?), quantify (what is the extent of pollution?), and allocate (what are the sources of pollution?) faecal pollution in water and water resources has undoubtedly revolutionized this area of HRWM research during the last two decades (Malakoff 2002). However, the application of general and host-associated faecal markers to generate accurate information on the responsible faecal pollution sources is not trivial. For example, the available genetic faecal marker targets, as well as their quantification systems, differ in pollution source abundance and environmental persistence. Therefore, differences in these characteristics may severely compromise or prevent meaningful interpretation of results. Box 1 (upper panel, nonoptimal parameter setup) shows a hypothetical MST situation to illustrate the confusing effects that differential abundance and persistence of MST markers can impose for correct indication. Quantitative comparisons of MST results, or the more complex task of source apportionment (i.e. computation of faecal loads from the various sources), solely based on qPCR results, may therefore only be achievable for a limited 'diagnostic space' (see the examples t0, t1, and t2 in Box 1, and 'A toolbox approach with case-dependent selection criteria' below). Having sound expert knowledge on the potentials and limits of GFPD is thus an essential prerequisite for correct application of GFPD in the field.

A toolbox approach with case-dependent selection criteria
No method comes without limitations, and no single method can have a universal application. Each genetic faecal parameter has specific biological-diagnostic and technical-analytical attributes (Table 2; Table S1, Supporting Information). The selection of diagnostic tools, as well as the chosen field investigation strategy, should therefore be designed to best suit the given faecal pollution problem (Schoen et al. 2020), including a sound knowledge of the catchment characteristics and hydrological regime (Reischer et al. 2008). A basic catchment survey or pollution source profile can substantially improve the understanding of the situation and guide the selection of GFPD parameters and methods with appropriate performance characteristics (Reischer et al. 2011, Derx et al. 2023). In addition to persistence, it is equally essential for MST to have appropriate (binary) faecal sensitivity and specificity of the selected genetic marker (Table 2). The minimum acceptable levels of faecal sensitivity and specificity depend on the faecal pollution scenario under investigation (such as the relative abundance of the diagnosed faecal pollution sources). These levels can be determined through statistical considerations or catchment-based scenario simulations (Kildare et al. 2007). A well-selected combination of markers, along with an algorithm that considers the sensitivity and specificity characteristics of the markers, enables more confident source identification compared to an individual marker (Ballesté et al. 2020). Faecal specificity is also important for general faecal markers (should be absent in a pristine environment, Table 2) and, in analogy with MST markers, should be evaluated in the studied catchments (Vierheilig et al. 2012). There are significant knowledge gaps regarding the mobility of indicator microorganisms and viruses detected by GFPD (Table 2). Mobility can be an essential factor in almost any natural and technical aquatic compartment. For example, mobility may codetermine the fate of MST markers (i) in deposited fresh cow pats on pastures (e.g. activation tendency and run-off during rainfall; Devane et al. 2022); (ii) during wastewater treatment (e.g. attachment to sewage sludge fraction or dispersion in the water phase; Wang et al. 2023); (iii) in surface water transport processes, such as river water (e.g. attachment to settling particles and sedimentation or transport in suspended fraction; Fauvel et al. 2017); or (iv) during river bank filtration (e.g. straining or attachment in the aquifer or aquifer transport; Wang et al. 2022). Mobility characteristics are often complex, as they are potentially influenced by physical, chemical and biological processes, depending on the aquatic scenario. Finally, the resistance of genetic markers to technical treatment is an additional important biological-diagnostic attribute (Table 2). Different resistance is expected for cultivation-based parameters than for DNA/RNA-based parameters (section 'Generating viability- and infectious status information'). For instance, in contrast to cultivation-based FIO concentrations, almost no reduction was observed for qPCR-based prokaryotic MST markers during UV-treatment of wastewater and drinking water (Steinbacher et al. 2021).

Box 1: Microbial source tracking markers: diagnostic scenarios

Simple hypothetical MST situation with two different point sources of pollution [e.g. human wastewater and animal (pig) manure] of equivalent discharge and contamination load for a small river. For reasons of simplicity, only dilution at the time of contamination (t0) and decay of the MST markers are considered (i.e. batch-reactor system with complete mixing and no sedimentation). Three time slots (t0, t1, and t2) are chosen to illustrate the different 'diagnostic windows' of MST indications at the given detection limit (sample limit of detection, sLOD).

Nonoptimal toolbox. All four applied MST markers show different abundance in their respective faecal excreta and persistence in the water body. At t0, all four MST markers allow correct qualitative detection of both sources (differential persistence insignificant). Due to the differential abundances of MST markers, no direct estimation of the relative importance of PS1/PS2 is possible. However, mathematical corrections of concentration differences in excreta would make this possible. At t1, MST marker 1B leads to false negative detection of PS1, due to differential persistence. Even in the case of accounting for differential abundance, only MST markers 1A and 2A can be used to estimate the relative importance of pollution of PS1/PS2 thanks to their similar persistence. At t2, only PS2 is detectable by MST marker 2B, thus the diagnosis would miss PS1 (false negative detection at the given sLOD).

Optimal toolbox. Both selected pairs of MST markers show comparable pollution source abundance in faecal excreta and persistence in the water body. The MST marker pair 1A-2A allows the estimation of the relative contribution of PS1 and PS2 at all times (t0-t2). Due to lower source abundance, the MST marker pair 1B-2B only allows detection and comparison at t0 and t1, but not at t2.
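The 'diagnostic windows' of the nonoptimal toolbox in Box 1 can be reproduced numerically. The following Python sketch is illustrative only. It assumes first-order decay after instantaneous dilution, which Box 1 does not specify, and all marker concentrations, decay rates, sampling times and the sLOD are hypothetical values chosen to mimic the described scenario, not data from the reviewed studies.

```python
import math

# Hypothetical markers: name -> (source, concentration at t0 [copies/100 mL],
# first-order decay rate [1/day]). All values are assumptions for illustration.
MARKERS = {
    "1A": ("PS1", 1e5, 0.8),
    "1B": ("PS1", 1e4, 3.0),   # low abundance, fast decay
    "2A": ("PS2", 1e4, 0.8),   # same persistence as 1A, lower abundance
    "2B": ("PS2", 1e5, 0.3),   # slow decay
}
SLOD = 1e2                                   # assumed sample limit of detection
TIMES = {"t0": 0.0, "t1": 2.0, "t2": 10.0}   # assumed sampling times [days]


def marker_concentration(c0, decay_rate, t):
    """Assumed first-order decay: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-decay_rate * t)


def detectable_markers(markers=MARKERS, times=TIMES, slod=SLOD):
    """Return, for each time slot, the sorted names of markers at or above the sLOD."""
    result = {}
    for label, t in times.items():
        result[label] = sorted(
            name
            for name, (_source, c0, k) in markers.items()
            if marker_concentration(c0, k, t) >= slod
        )
    return result


if __name__ == "__main__":
    for label, names in detectable_markers().items():
        print(label, names)
```

With these assumed values, all four markers are detectable at t0, marker 1B drops below the sLOD at t1, and only marker 2B remains detectable at t2. Because markers 1A and 2A share the same decay rate, their concentration ratio stays constant over time, which is what makes a persistence-matched pair usable for estimating relative contributions.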

Prokaryotic targets dominate, but the importance of viral targets increases
Key biological elements in the current state-of-the-art toolbox are the various general and host-associated faecal markers, quantified by qPCR or dPCR assays (section 'Background information on genetic targets and methods: a historical overview'). The systematic literature analysis revealed that prokaryotic faecal markers have dominated GFPD up to now (Fig. 14). However, viral faecal markers have seen an increase in the past 10 years, while mitochondrial markers have been applied to a far lesser extent (Fig. 14). The combined use of selected MST markers, with adequate performance characteristics, holds promise for detecting and allocating faecal pollution with increased confidence, even under challenging faecal pollution scenarios (e.g. undiluted vs. diluted, fresh vs. aged, untreated vs. treated faecal pollution). In this respect, complementing prokaryotic GFPD applications with viral faecal markers can be especially important to account for the increased persistence, resistance and mobility characteristics of such types of intestinal contaminants ('Application 4: Microorganisms attenuation during treatment' and 'Application 5: Estimating of infection and health risk'). Cultivation-based FIO (see discussion 'hybrid application' below), pathogen detection and antibiotic resistance analysis complement the current array of biological elements in GFPD (Fig. 5).
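How much confidence a marker combination can add may be illustrated with Bayes' theorem. The Python sketch below is a simplified illustration and not the algorithm of any cited study. It assumes that marker results are conditionally independent given the true source status, and the prior probabilities, sensitivities and specificities are hypothetical.

```python
def posterior_after_positive(prior, sensitivity, specificity):
    """Probability that the source is present given one positive marker result.

    prior:       assumed probability that the source contributes to the sample
    sensitivity: P(marker positive | source present)
    specificity: P(marker negative | source absent)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


def posterior_after_positives(prior, markers):
    """Update sequentially for several positive markers.

    markers is a list of (sensitivity, specificity) tuples. Sequential
    updating is only valid under the conditional independence assumption.
    """
    probability = prior
    for sensitivity, specificity in markers:
        probability = posterior_after_positive(probability, sensitivity, specificity)
    return probability


if __name__ == "__main__":
    # Hypothetical performance values for two host-associated markers.
    marker_a = (0.90, 0.90)
    marker_b = (0.80, 0.95)
    for prior in (0.5, 0.1):
        single = posterior_after_positive(prior, *marker_a)
        combined = posterior_after_positives(prior, [marker_a, marker_b])
        print(f"prior={prior}: single={single:.3f}, combined={combined:.3f}")
```

In this sketch, the same marker yields a much lower posterior probability when the source is assumed to be rare (prior 0.1) than when it is assumed to be common (prior 0.5). This mirrors the point that the acceptable levels of sensitivity and specificity depend on the pollution scenario, and that a second positive marker raises the posterior in both cases.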
The GFPD toolbox is steadily growing, although now at a slower pace than during the first pioneering decade of the new millennium (Fig. 3, number of establishment/application studies). New genetic faecal markers and/or improved detection systems are, without any doubt, essential for the further development of the discipline. However, it should be kept in mind that providing detailed information on their environmental behaviour and application characteristics (Table 2; Tables S1 and S2, Supporting Information) is equally important, if not more, to successfully implement them in HRWM research (Fig. 13). There is a disparity between the availability of genetic markers and detection systems and the availability of information on their biological-diagnostic characteristics and on their applicability in different types of water resources and subcompartments under given biotic and abiotic conditions (e.g. Boehm et al. 2019, Korajkic et al. 2019, Lu and Imlay 2021).

Integrated data analysis and modelling
The inclusion of other microbiological and environmental parameters into the study design can greatly enhance the information gained from GFPD investigations. Remarkably, almost two-thirds of the identified source tracking studies (single and multiple sources) simultaneously applied genetic MST-marker qPCR quantification and traditional cultivation-based enumeration of FIOs (Fig. 3), determined by standardized parameters such as those for E. coli (ISO 1998a, 2012, 2014, 2018) or intestinal enterococci (ISO 1998b, 2000). The need to determine the causes responsible for faecal pollution in water obviously promotes this most popular 'hybrid application' (sections 'Application 2' and 'Application 3'). In addition, data on pathogen occurrence and physicochemical water quality were used to complement the investigation, although to a far lesser extent. In contrast, the identified GFPD studies hardly utilized information on hydrology, meteorology, land use or epidemiology for statistical data analysis (Fig. 5). This is contrary to expectations, as environmental data, such as data on catchment hydrology and land use (GIS, mapping), have proven essential for an improved application, understanding and interpretation of GFPD in water quality research (Reischer et al. 2008, Peed et al. 2011, Bambic et al. 2015, Verhougstraete et al. 2015, Frick et al. 2020, Green et al. 2021, Diedrich et al. 2023). Without a doubt, there is significant potential to better utilize and integrate environmental data in GFPD analysis in future HRWM research (Fig. 13).
Data from GFPD, together with FIO and pathogens, are increasingly used in modelling and simulation. Potential areas of interest include all issues and scales of HRWM research (ranging from faecal marker persistence/dilution models to catchment-based source/sink transport simulations) as well as application types (such as faecal pollution, MST, treatment, and infection and health risk assessment) as covered in this literature analysis (Dorner et al. 2006, Sokolova et al. 2012, Boehm et al. 2015, Pascual-Benito et al. 2020). Modelling becomes essential, for example, to estimate the required microbial/viral log reduction targets for wastewater or drinking water treatment or to determine the appropriate setback distances during riverbank filtration. Importantly, modelling and simulations also allow the assessment of future scenarios and even the prediction of the management measures that will be required considering future climate and global change phenomena. One of the big challenges of modelling and simulation in health-related water quality research and GFPD is to provide all the data and data collections required (Fig. 13).
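The notion of a required log reduction target can be made concrete with a short calculation. The Python sketch below is a minimal illustration using the standard definition of a log10 reduction. The source concentration, the tolerable concentration and the per-step reductions are hypothetical placeholders, and in practice they would come from monitoring data and a health-based target.

```python
import math


def required_log_reduction(source_conc, target_conc):
    """Required log10 reduction to bring source_conc down to target_conc.

    Both concentrations must use the same unit (e.g. organisms per litre).
    Returns 0.0 if the source is already at or below the target.
    """
    if source_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    return max(0.0, math.log10(source_conc / target_conc))


def remaining_gap(required, step_reductions):
    """Log10 reduction still missing after summing the credited treatment steps."""
    return max(0.0, required - sum(step_reductions))


if __name__ == "__main__":
    # Hypothetical values: 1e3 organisms/L in source water and a tolerable
    # concentration of 1e-5 organisms/L in treated water.
    required = required_log_reduction(1e3, 1e-5)
    # Hypothetical credited reductions, e.g. bank filtration and disinfection.
    gap = remaining_gap(required, [2.0, 4.0])
    print(f"required: {required:.1f} log10, remaining gap: {gap:.1f} log10")
```

Log reductions of sequential treatment steps are summed here, which assumes that the steps act independently on the same target organism.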

Conclusions
→ The tools and approaches developed for GFPD have revolutionized HRWM research in the last two decades in terms of faecal pollution detection and microbial source tracking, the current core areas of application. Together with nucleic acid extract biobanking, GFPD represents a new level of methodological possibilities in health-related water quality research in the 21st century, even in remote or less developed regions.
→ GFPD is ready to expand to many other application areas within and outside the field of HRWM. For instance, it will further gain importance in infection and health risk assessment (e.g. recreational water quality monitoring) and will increasingly support the evaluation and verification of water treatment and disinfection processes, in combination with standardized treatment indicators and cultivation-based enumeration.
→ The COVID-19 pandemic gave a strong boost to the field of wastewater surveillance. Wastewater surveillance for SARS-CoV-2 is currently transforming into a global early warning disease monitoring system. GFPD will likely increasingly support wastewater surveillance in data generation, pollution source characterization, normalization, and quality assurance. Since both 'sister' disciplines use the same molecular biological framework and infrastructure, potential synergies are significant. In general, GFPD has the potential to support any environmental global infectious disease surveillance system, covering human and other animal populations.
→ As demonstrated by the many identified studies, internationally accepted, cultivation-based water quality parameters, such as E. coli or intestinal enterococci, can be effectively complemented with GFPD, thus significantly expanding the methodical possibilities in water quality monitoring and management, when needed (e.g. MST to trace the origin of cultivation-based FIO). GFPD constitutes a toolbox approach. Tailor-made scientific investigation and monitoring solutions can be rapidly established by experts.
→ The current century is 'the Century of Life Sciences', especially considering how molecular biology and bioinformatics rapidly transform health sciences and medicine. It is also the era of information technology, artificial intelligence, and automatization. These driving forces will certainly promote further innovation within genetic faecal pollution detection. Many technological breakthroughs are expected.
→ From science to practice. The water management sector increasingly needs the tools and approaches offered by GFPD to solve future challenges (e.g. challenges related to SDG6). The translation of such tools to practice has to be paralleled by standardization efforts. While some countries have already started such activities (e.g. three assays are standardized in the USA), international standards are still lacking. These needs will have to be defined by the water management sector and translated to future GFPD guidelines and standards by global panels of experts.
→ This meta-analysis provides the scientific status quo of the field of GFPD. It should promote further research to advance the scientific field and serve as a condensed information source for the wider audience, including microbiologists, water hygienists, water management professionals, and public health experts.

Acknowledgements
The authors acknowledge TU Wien Bibliothek for financial support through its Open Access Funding Programme. The authors are thankful to Andreas Pacher from TU Wien Bibliothek for help with database searches and for critical discussion on paper evaluation metrics. We also thank Dr. Mats Leifels for providing useful discussion and references. Information has been subjected to U.S. EPA peer and administrative review and has been approved for external publication. Any opinions expressed in this paper are those of the authors and do not necessarily reflect the official positions and policies of the U.S. EPA. Any mention of trade names or commercial products does not constitute endorsement or recommendation for use. This is a joint research effort of the Interuniversity Cooperation Centre for Water & Health (www.waterandhealth.at) and the Global Water Pathogen Program (GWPP) initiative (www.waterpathogens.org).

Supplementary data
Supplementary data is available at FEMSRE online.

Conflict of interest statement.
The authors declare no conflict of interest.

Funding
This work was supported by the