Convergent Amino Acid Signatures in Polyphyletic Campylobacter jejuni Subpopulations Suggest Human Niche Tropism

Abstract
Human infection with the gastrointestinal pathogen Campylobacter jejuni depends on the opportunity for zoonotic transmission and on the ability of strains to colonize the human host. Certain lineages of this diverse organism are more common in human infection, but the factors underlying this overrepresentation are not fully understood. We analyzed 601 isolate genomes from agricultural animals and human clinical cases, including isolates from the multihost (ecological generalist) ST-21 and ST-45 clonal complexes (CCs). Combined nucleotide and amino acid sequence analysis identified 12 human-only amino acid KPAX clusters among polyphyletic lineages within the CC21 group, a common cause of disease, and no such clusters among CC45 isolates. The sequence types (STs) of isolates in human-only CC21 group KPAX clusters have also been sampled from other hosts, including poultry. Rather than indicating unsampled reservoir hosts, their increased relative frequency in human infection may therefore reflect a genetic bottleneck at the point of human infection. Consistent with this, sequence enrichment analysis of human-only clusters identified nucleotide variation in genes with putative functions in human colonization and pathogenesis. Furthermore, the tight clustering and polyphyly of human-only lineage clusters within a single CC suggest the repeated evolution of human association through acquisition of genetic elements within this complex. Taken together, combined nucleotide and amino acid analysis of large isolate collections may provide clues about human niche tropism and the nature of the forces that promote the emergence of clinically important C. jejuni lineages.


Introduction
Many bacterial species known to cause gastroenteritis are common commensal organisms that cause little or no harm to their host species. For pathogenic strains of these species, the pathway to disease can involve a series of population bottlenecks. Clinical isolates sampled from patients are therefore a subset of the bacterial population, representing strains that had the opportunity to infect a new host and survived the new selective pressures associated with a pathogenic lifestyle.
The common gastrointestinal pathogen Campylobacter jejuni is widely distributed among wild and domesticated animal species and reservoirs (Sheppard et al. 2011), and the majority of human infections result from consumption of contaminated food (Kapperud et al. 2003; Friedman et al. 2004; Skarp et al. 2016). Campylobacter jejuni populations are generally structured by host source (Sheppard et al. 2010, 2011), and this has allowed the source of human infection to be attributed using comparative multilocus sequence typing (MLST) and whole-genome characterization of host and clinical isolates (Sheppard, Dallas, MacRae, et al. 2009; Sheppard, Dallas, Strachan, et al. 2009; Pascoe et al. 2015; Dearlove et al. 2016; Thepault et al. 2017). These studies identified chickens as a major source of human campylobacteriosis (EFSA 2015). If all strains were equally able to infect humans, the abundance of C. jejuni in farmed chickens (Vidal et al. 2016) and the contamination of retail poultry (Wimalarathna et al. 2013) would be enough to explain the importance of chickens as a pathogen reservoir. However, recent studies of C. jejuni in poultry have shown that some common chicken-associated strains are rare among clinical isolates while others increase in relative frequency (Yahara et al. 2017). This suggests that factors other than simple opportunity for transmission are involved in human infection.
In some species, such as Escherichia coli, the emergence of pathogenic strains can be associated with the acquisition of specific attributes that confer an increased ability to cause disease or evade treatment. Examples include genetic elements encoding virulence and persistence in humans, such as those carried by phages and plasmids in E. coli, and the acquisition of antibiotic resistance in Staphylococcus (as reviewed in Kaper et al. 2004; Pantosti et al. 2007). In some cases, the acquisition of a small amount of genetic material increases virulence, as seen in the large-scale outbreak of Shiga-like toxin-producing E. coli O104:H4 (Frank et al. 2011). Where specific pathogenicity elements can be identified, it is relatively simple to identify the agent causing an outbreak and its molecular cause. However, in C. jejuni, traits associated with clinical isolates reflect not only virulence but also attributes that confer a fitness advantage against the various selective pressures encountered in the poultry processing chain, such as survival in the nonhost environment (Yahara et al. 2017).
The increasing availability of whole-genome data provides opportunities to investigate the genomic differences underlying variation in proteins and their motifs that may promote the proliferation of particular pathogenic strains. Epidemiological studies of C. jejuni from clinical samples and animal reservoirs typically reveal genetically diverse populations. However, isolates belonging to CC21 and CC45 are regularly among the most common lineages isolated from human disease (Kärenlampi et al. 2007; Levesque et al. 2008; Mullner et al. 2009; Sheppard, Dallas, MacRae, et al. 2009; Sheppard, Dallas, Strachan, et al. 2009; Sanad et al. 2011; Mughini Gras et al. 2012; Sahin et al. 2012; Guyard-Nicodeme et al. 2015). Both lineages have been isolated from a variety of sources, including ruminants, poultry, wild birds, domesticated companion animals, and environmental samples (Sopwith et al. 2008; Sheppard et al. 2011, 2014). This ecological generalism may reflect a degree of genotypic and phenotypic plasticity that facilitates rapid host adaptation in a multihost environment (Read et al. 2013; Woodcock et al. 2017; Pascoe et al. 2017), but little is known about the specific genomic variation that promotes the proliferation of particular STs within generalist lineages in different niches, such as human hosts.
Here we combine nucleotide-based phylogenetic analysis with amino acid sequence-based clustering to characterize populations of C. jejuni from humans and agricultural animals, and identify candidate genes involved in these possible host associations. Our hypothesis was that a combined methodological approach would identify subtle host-associated differences between isolates from major generalist groups. These analyses identified sublineages of the ST-21 complex that were overrepresented among isolates sampled from human disease. The putative functions of genes within human-only amino acid clusters included those important in human pathogenesis, such as flagella and capsule synthesis. Our study provides a new way of interrogating genomic data sets to identify candidate genes in a subset of strains that may indicate a population bottleneck associated with human colonization.

Bacterial Genomes
A total of 601 C. jejuni genomes, previously published in various studies (Cody et al. 2013; Sheppard, Didelot, Jolley, et al. 2013; Sheppard, Didelot, Meric, et al. 2013; Pascoe et al. 2017; Yahara et al. 2017), were used in this analysis (supplementary table S1, Supplementary Material online). The majority came from clinical isolates (n = 481) and the rest from agricultural sources, either poultry (n = 88) or cattle (n = 32). Most isolates were from the United Kingdom (n = 546/601, 90.8%). A total of 134/601 (22.3%) were from CC45 and 467/601 (77.7%) were from CC21-48-206 (supplementary table S1, Supplementary Material online), which have been shown to form a single sequence cluster in previous studies (Sheppard, Didelot, Meric, et al. 2013). These constituted all the sequenced genomes available to us when this study was initiated. CC21-48-206 is henceforth referred to collectively as the CC21 group. Sequencing was performed on Illumina platforms, and assembly was performed with either Velvet (Zerbino and Birney 2008) or SPAdes (Bankevich et al. 2012). Assembled DNA sequences from various sources (supplementary table S1, Supplementary Material online) were uploaded to a web-based database built on the BIGSdb platform (Jolley and Maiden 2010), which allowed archiving, whole-genome gene-by-gene sequence alignment, and prevalence analyses. In addition, the isolation source of all available CC21 group and CC45 isolate records (n = 17,107) in the pubMLST database (https://pubmlst.org/campylobacter/; last accessed February 07, 2018) was obtained (October 21, 2016) and analyzed to quantify the numbers of different STs isolated from humans and agricultural animals and to contextualize this study.

Phylogenetic Tree Inference
Sequence alignments were obtained using a gene-by-gene approach (Sheppard et al. 2012). Briefly, the presence in all 601 genomes of each of the 1,668 coding sequences (CDS) of the reference C. jejuni NCTC11168 genome (NCBI accession: NC_002163.1) was inferred using BLAST: a gene was considered present when a local alignment with the reference covered >50% of the sequence length with >70% sequence identity. Using these criteria, 1,058 genes were shared by all 601 genomes in our data set, constituting the "core genome." Gene-by-gene alignments produced with MAFFT (Katoh and Standley 2013) were concatenated to create a core genome gene-by-gene alignment that was used in subsequent analyses. For protein trees, each individual gene alignment was translated in-frame using custom scripts (supplementary file 1, Supplementary Material online), and the translations were concatenated. The resulting concatenations were used as input for phylogenetic tree reconstruction, either using an approximation of the maximum-likelihood algorithm implemented in FastTree2 […]

KPAX2 Method: Bayesian Clustering Based on Amino Acid Sequence
KPAX2 is a new Bayesian method for identifying evolutionary signals in amino acid sequences that relate to the differential evolution of lineages, which may be either monophyletic or polyphyletic, for example as a result of the horizontal distribution of relevant genomic elements through recombination (Pessia et al. 2015). Earlier analysis of a database of thousands of influenza A virus H3N2 sequences demonstrated that the method could accurately identify antigenic clusters determined by amino acid variation, as well as the sequence positions relevant for the antigenic differences (Pessia et al. 2015). The concatenated set of 601 core genome sequences corresponded to 153,911 amino acid positions, harboring 17,405 polymorphic sites.
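The gene presence rule used to define the core genome (>50% coverage of the reference CDS at >70% identity) can be made concrete with a short Python sketch. It assumes that, for every genome and reference CDS, the best BLAST hit has already been parsed into an aligned length and a percent identity; the data layout, the function names, and the toy gene and genome identifiers (`geneA`, `g1`) are hypothetical and are not taken from the study's supplementary scripts.

```python
# Sketch of the core-genome presence rule (hypothetical data layout).
# A reference CDS counts as present in a genome when its best local alignment covers
# >50% of the reference CDS length at >70% identity; "core" genes are present in all genomes.

MIN_COVERAGE = 0.50   # fraction of reference CDS length (strictly greater than)
MIN_IDENTITY = 70.0   # percent identity (strictly greater than)


def gene_present(aligned_len: int, ref_len: int, pct_identity: float) -> bool:
    """Apply the coverage and identity thresholds to one best BLAST hit."""
    return aligned_len / ref_len > MIN_COVERAGE and pct_identity > MIN_IDENTITY


def core_genes(hits, ref_lengths, genomes):
    """Return reference genes called present in every genome.

    hits: dict mapping (genome, gene) -> (aligned_len, pct_identity) of the best hit
    ref_lengths: dict mapping gene -> reference CDS length in nucleotides
    genomes: list of genome identifiers
    """
    core = []
    for gene, ref_len in ref_lengths.items():
        if all(
            (genome, gene) in hits
            and gene_present(hits[(genome, gene)][0], ref_len, hits[(genome, gene)][1])
            for genome in genomes
        ):
            core.append(gene)
    return core


# Toy example with hypothetical identifiers.
ref_lengths = {"geneA": 1000, "geneB": 600}
hits = {
    ("g1", "geneA"): (950, 98.0),
    ("g2", "geneA"): (900, 91.5),
    ("g1", "geneB"): (600, 99.0),
    ("g2", "geneB"): (250, 95.0),  # covers only ~42% of geneB, so geneB is not core
}
print(core_genes(hits, ref_lengths, ["g1", "g2"]))  # ['geneA']
```

The strict inequalities mirror the ">50%" and ">70%" wording; whether the original pipeline treated the thresholds as strict or inclusive is not stated.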
KPAX2 was used with the default prior settings, and inference was initialized with a proposal partition of the samples obtained using the k-medoids algorithm, based on Tajima and Nei (1984) pairwise distances between protein sequences with the Tamura and Kumar (2002) correction for heterogeneous patterns. The initial number of clusters was chosen by selecting the k associated with the highest log posterior probability under the KPAX2 model. In total, 100 partitions were then created by applying random modifications to this initial k-medoids partition. Split, merge, and transfer operators were as previously described (Pessia et al. 2015). Each of the 100 partitions was then used independently as a starting state for the KPAX2 posterior maximization algorithm, to ensure that the final estimate was as close to the global posterior mode as possible. The 100 KPAX2 runs were performed in parallel on a computing cluster, with individual runs taking approximately 1-2 weeks to converge. The clustering solution with the highest log posterior probability among the 100 independent runs was chosen as the final estimate. The sources of isolates in each KPAX cluster were classified as: human clinical only (clinical); chicken and human clinical sources (chicken + clinical); cattle and human clinical sources (cattle + clinical); or chicken, cattle and human clinical sources (chicken + cattle + clinical) (supplementary table S2, Supplementary Material online). For each KPAX cluster, characteristic amino acids were determined (Pessia et al. 2015), together with the corresponding proteins and genes in the C. jejuni NCTC11168 reference genome (supplementary table S3, Supplementary Material online). This allowed KPAX clustering results to be compared with genome-wide association study (GWAS) results to identify the genes associated with clinical-only C. jejuni KPAX groups.
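The initialization strategy described above (a k-medoids starting partition, many independent runs, keep the best-scoring one) can be illustrated with a simplified Python sketch. This is not KPAX2. As stated assumptions, it replaces the Tajima-Nei distance with Tamura-Kumar correction by an uncorrected p-distance, and it scores runs by the total k-medoids cost rather than by the KPAX2 log posterior probability. Function names and toy sequences are hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing residues, ignoring gapped sites."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def k_medoids(dist, k, seed, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns (labels, medoids, cost); cost is the summed distance of each item
    to the medoid of its cluster.
    """
    rng = random.Random(seed)
    n = len(dist)
    medoids = rng.sample(range(n), k)

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # The member with the smallest summed distance to its cluster becomes the medoid.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = assign(medoids)
    cost = sum(dist[i][medoids[labels[i]]] for i in range(n))
    return labels, medoids, cost


def best_of_restarts(seqs, k, n_restarts=25):
    """Run k-medoids from several random starts and keep the lowest-cost run."""
    dist = [[p_distance(a, b) for b in seqs] for a in seqs]
    runs = [k_medoids(dist, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


seqs = ["MKTAY", "MKTAF", "MKSAY", "QRLVW", "QRLVF", "QRIVW"]  # hypothetical proteins
labels, medoids, cost = best_of_restarts(seqs, k=2)
print(labels, round(cost, 3))
```

Keeping the best of many independent runs plays the same role here as choosing the highest-posterior solution among the 100 KPAX2 runs: a single run can stop in a poor local optimum when its starting medoids fall in the same natural group.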

Prevalence of STs from Human-Only KPAX Clusters among Isolates from Human and Nonhuman Sources
Total prevalence of C. jejuni STs observed to belong to human-only KPAX clusters was quantified among samples isolated from human and nonhuman sources (mainly poultry and cattle) and was inferred using isolation source information specified in a total of 17,107 CC21, CC48, CC206, and CC45 isolate records, taken from a total of 49,598 archived isolate records from every CC publicly available in the pubMLST database (https://pubmlst.org/campylobacter/; accessed October 21, 2016).
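This prevalence calculation, together with the related proportion of study isolates whose ST also occurs in a nonhuman host, can be sketched in Python. The sketch assumes pubMLST records have been exported as (ST, source) pairs with the source already normalized to labels such as `"human"`, `"chicken"`, or `"cattle"`; the record format and the toy ST labels (`stA`, `stB`, `stC`) are hypothetical.

```python
from collections import Counter


def prevalence_by_source(records, target_sts):
    """Count database records per isolation source for a set of STs.

    records: iterable of (st, source) pairs, e.g. ("stA", "chicken")
    target_sts: set of STs belonging to human-only KPAX clusters
    """
    return Counter(source for st, source in records if st in target_sts)


def fraction_with_nonhuman_st(study_isolate_sts, records):
    """Fraction of study isolates whose ST has at least one nonhuman record."""
    nonhuman_sts = {st for st, source in records if source != "human"}
    hits = sum(1 for st in study_isolate_sts if st in nonhuman_sts)
    return hits / len(study_isolate_sts)


records = [
    ("stA", "human"), ("stA", "chicken"), ("stA", "human"),
    ("stB", "human"), ("stB", "cattle"),
    ("stC", "human"),
]
target = {"stA", "stB", "stC"}
print(prevalence_by_source(records, target))  # Counter({'human': 4, 'chicken': 1, 'cattle': 1})
print(fraction_with_nonhuman_st(["stA", "stA", "stB", "stC"], records))  # 0.75
```

Counting per isolate rather than per ST matters for the second function: an ST carried by many study isolates contributes proportionally more to the fraction.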

SEER Method: Genome-Wide Association Mapping
We used a k-mer enrichment method (Weinert et al. 2015; Lees et al. 2016) to identify, from the nucleotide sequence data, genomic elements that were significantly more prevalent in the human-only KPAX clusters (group 1, n = 103) than in the remainder of the C. jejuni population (group 2, n = 498). This binary trait analysis was performed to ensure that potential gene regulatory elements or accessory genes associated with the clusters would not be missed, because the KPAX2 method is based only on core protein sequence variation. The input assemblies contained approximately 31 million unique k-mers with lengths between 10 and 99 nucleotides. The original k-mer set was reduced by retaining only k-mers that: 1) had >75% frequency in group 1 and <25% frequency in group 2; 2) had a chi-square association test P-value < 10⁻⁸; and 3) had an association P-value < 10⁻⁸ in a logistic regression model that included the first three multidimensional scaling coordinates as a population structure correction. The multidimensional scaling coordinates were calculated from a distance matrix based on 10,000 k-mers randomly selected from the initial set. The final set of genome-wide significant k-mers contained 347 k-mers, which were mapped to an annotated reference genome to identify their genomic context.
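The first two filtering steps can be sketched in Python using a Pearson chi-square test on a 2x2 presence/absence table; for one degree of freedom the P-value equals `erfc(sqrt(chi2 / 2))`. This is an illustration under assumptions, not SEER's implementation: it applies no continuity correction and omits step 3, the logistic regression with multidimensional scaling covariates. Function names are hypothetical.

```python
import math


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]] with 1 degree of freedom."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:  # a k-mer present (or absent) in every genome carries no signal
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(in_g1: int, n_g1: int, in_g2: int, n_g2: int, alpha: float = 1e-8) -> bool:
    """Steps 1 and 2: frequency thresholds, then chi-square association.

    in_g1 / in_g2: number of genomes in group 1 / group 2 containing the k-mer.
    """
    if not (in_g1 / n_g1 > 0.75 and in_g2 / n_g2 < 0.25):
        return False
    # Rows: k-mer present / absent; columns: group 1 / group 2.
    p = chi2_2x2_pvalue(in_g1, in_g2, n_g1 - in_g1, n_g2 - in_g2)
    return p < alpha


# Group sizes from the text: 103 human-only cluster isolates vs 498 others.
print(passes_filters(100, 103, 10, 498))  # True: strongly enriched in group 1
print(passes_filters(70, 103, 10, 498))   # False: 68% is below 75% in group 1
print(passes_filters(4, 5, 0, 5))         # False: frequencies pass but P is about 0.01
```

The last call shows why the frequency filter alone is not enough: with very small groups, a k-mer can be enriched in relative terms yet fall far short of genome-wide significance.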

STs Vary in Frequency in Human Clinical and Agricultural Environments
Direct comparison of the relative prevalence of sequence types was performed using the entire Campylobacter pubMLST database, which contained a total of 49,598 entries on October 21, 2016. Of these, 13,095 belonged to CCs 21, 48, and 206, previously shown to form a single sequence cluster based upon whole-genome analysis, and 4,012 belonged to CC45. Within the CC21 group there were 8,382 human clinical isolates and 3,869 isolates from agricultural animal sources, while in CC45 there were 1,674 human clinical isolates and 1,685 agricultural isolates. The relative abundance of isolate STs belonging to CC21-48-206 and CC45 was determined (fig. 1). In both CCs, the relative frequency of individual STs varied between human clinical and agricultural animal samples.
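A per-ST comparison of this kind can be sketched in Python by expressing each ST as a fraction of all records in the clinical pool and in the agricultural pool. For context, the counts above give a clinical share of 8,382/(8,382 + 3,869) ≈ 68.4% for the CC21 group and 1,674/(1,674 + 1,685) ≈ 49.8% for CC45 among records from these two source types. The record format, the source-to-pool mapping, and the toy ST labels below are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from recorded isolation source to comparison pool.
SOURCE_POOL = {"human": "clinical", "chicken": "agricultural", "cattle": "agricultural"}


def relative_frequencies(records):
    """Return {pool: {st: fraction of that pool's records}} for (st, source) records.

    Sources missing from SOURCE_POOL (e.g. environmental samples) are ignored.
    """
    counts = {"clinical": Counter(), "agricultural": Counter()}
    for st, source in records:
        pool = SOURCE_POOL.get(source)
        if pool is not None:
            counts[pool][st] += 1
    result = {}
    for pool, counter in counts.items():
        total = sum(counter.values())
        result[pool] = {st: n / total for st, n in counter.items()}
    return result


records = (
    [("stA", "human")] * 3 + [("stA", "chicken")]
    + [("stB", "human")] + [("stB", "chicken")] * 2 + [("stB", "cattle")]
    + [("stC", "environment")]
)
freqs = relative_frequencies(records)
print(freqs["clinical"])      # {'stA': 0.75, 'stB': 0.25}
print(freqs["agricultural"])  # {'stA': 0.25, 'stB': 0.75}
```

Normalizing within each pool, rather than comparing raw counts, keeps the comparison meaningful when one pool (here, clinical isolates in the CC21 group) is much larger than the other.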

Amino Acid Sequence-Based Analysis Reveals Human-Only Subclusters
The Bayesian model-based method KPAX2 was used to classify aligned proteins into functionally divergent groups, based upon the amino acid residues of a collection of 601 genomes representing 66 STs belonging to the CC21 group and CC45. The 1,058 core CDS used in the nucleotide phylogeny were translated in silico, and a concatenated amino acid alignment was produced for each genome-sequenced strain. We then performed Bayesian clustering using the KPAX2 algorithm and annotated the tree with the 36 KPAX clusters identified (fig. 2). KPAX groups could be classified into four categories depending on the sources of their isolates: human only (12 KPAX groups, 112 isolates from 20 STs), human and chicken only (10 KPAX groups, 150 isolates from 20 STs), human and cattle only (4 KPAX groups, 33 isolates from 13 STs), and human, chicken and cattle (10 KPAX groups […]). […] genomes of the pubMLST-archived comparative data set; however, it is useful to contextualize KPAX-ST correlation within a wider data set. It should be noted that the ST designation has lower resolution than lineages determined from whole genomes, and therefore an isolate from a nonhuman host in the pubMLST database may lack the genetic elements identified in the present analysis.
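Assigning each KPAX cluster to one of these four categories amounts to looking up the set of isolation sources among its members. A minimal Python sketch follows, reusing the category labels from the Methods; the cluster identifiers and member sources are hypothetical toy values, and a cluster whose source set matches none of the four categories is labelled `"other"`.

```python
from collections import Counter

CATEGORIES = {
    frozenset({"clinical"}): "clinical",
    frozenset({"chicken", "clinical"}): "chicken + clinical",
    frozenset({"cattle", "clinical"}): "cattle + clinical",
    frozenset({"chicken", "cattle", "clinical"}): "chicken + cattle + clinical",
}


def cluster_category(member_sources):
    """Map the isolation sources of one cluster's members to a source category."""
    return CATEGORIES.get(frozenset(member_sources), "other")


def tally(clusters):
    """clusters: dict cluster_id -> list of member sources. Returns category counts."""
    return Counter(cluster_category(sources) for sources in clusters.values())


clusters = {
    1: ["clinical", "clinical", "clinical"],
    2: ["clinical", "chicken", "clinical"],
    3: ["cattle", "clinical"],
    4: ["chicken", "cattle", "clinical", "clinical"],
    5: ["clinical"],
}
print(tally(clusters))
```

Using a `frozenset` as the lookup key makes the category depend only on which sources are present, not on how many isolates come from each or in what order they are listed.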

Identification of Genes with Human-Associated Amino Acid Signatures within the CC21 Group
We sought to identify the discriminatory amino acids that resulted in the clustering of human clinical-only CC21 group isolates. We identified a total of 1,213 amino acid sites, which mapped to 265 genes (supplementary table S4, Supplementary Material online). Mapping the physical location of these sites against the CC21 reference genome NCTC11168 suggested that these loci were distributed across the genome and were not in strong linkage disequilibrium resulting from physical proximity (fig. 3A). Interestingly, 24/265 (9.1%) of these genes had been identified in previous GWASs (supplementary table S4, Supplementary Material online). More specifically, 3 genes were predicted to have a role in survival from farm to clinical disease (Yahara et al. 2017), 8 genes to have a role in in vitro colonization of surfaces and aggregation (Pascoe et al. 2015), and 14 genes to have a role in nonhuman host adaptation (Sheppard, Didelot, Meric, et al. 2013) (supplementary table S4, Supplementary Material online). Although some of these associations were weak in the corresponding studies, they were nonetheless highlighted and are consistent with a general role in transmission and host colonization.
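Mapping characteristic sites back to genes requires converting each position in the concatenated core alignment into a gene and a within-gene position. A Python sketch under stated assumptions: genes are concatenated in a known order with known aligned lengths, positions are 0-based, and the gene names and site positions are hypothetical. Identifying which sites are characteristic is done by KPAX2 itself and is not reproduced here.

```python
from bisect import bisect_right
from itertools import accumulate


def make_locator(gene_lengths):
    """gene_lengths: list of (gene, aligned_length) in concatenation order.

    Returns a function mapping a 0-based concatenated position to
    (gene, 0-based position within that gene's alignment).
    """
    genes = [gene for gene, _ in gene_lengths]
    ends = list(accumulate(length for _, length in gene_lengths))  # exclusive ends
    starts = [0] + ends[:-1]

    def locate(pos):
        if not 0 <= pos < ends[-1]:
            raise IndexError(f"position {pos} outside concatenated alignment")
        i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
        return genes[i], pos - starts[i]

    return locate


locate = make_locator([("geneA", 100), ("geneB", 50), ("geneC", 200)])
sites = [5, 99, 100, 149, 300]  # hypothetical characteristic sites
mapped = [locate(p) for p in sites]
print(mapped)  # [('geneA', 5), ('geneA', 99), ('geneB', 0), ('geneB', 49), ('geneC', 150)]
print(len({gene for gene, _ in mapped}))  # 3 genes carry at least one site
```

Binary search over the gene start offsets keeps each lookup logarithmic in the number of genes, which is convenient when more than a thousand sites are mapped across about a thousand core genes.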
To test whether these loci were associated with a human clinical-only sublineage, we also performed sequence element enrichment analysis using SEER (Lees et al. 2016) to identify the genetic basis of human clinical-only sublineage strains compared with strains from other host sources (fig. 3) […] (Sheppard, Didelot, Meric, et al. 2013; Pascoe et al. 2015; Yahara et al. 2017).
A total of 26 genes were significantly associated with human-only lineages in both the KPAX clustering and SEER association analyses (fig. 3, table 2). Half of these genes have been described as important for host colonization or pathogenesis: nine in studies of humans or human cells, and four in chicken colonization studies […] these genes in host adaptation and/or in multihost fitness. Of particular note among these genes were the flagellar gene flgH, highlighted in a previous GWAS on nonchicken host adaptation (Sheppard, Didelot, Meric, et al. 2013); two genes (ceuC and ceuE) involved in the enterochelin iron uptake system in C. jejuni; a gene (aspB) involved in aspartate metabolism; and a gene (fdhD) encoding a formate dehydrogenase, a function that has been highlighted as important for survival from farm to clinical disease (Yahara et al. 2017). All five of these genes are known to be important in the invasion of mammalian cells and/or human colonization (Palyada et al. 2004; Guerry 2007; Novik et al. 2010; Sheppard, Didelot, Meric, et al. 2013; Yahara et al. 2017).

Fig. 2. […] Isolates are labeled by KPAX group labels (integers) and colored by the source distribution within their KPAX group: chicken and clinical sources (yellow), cattle and clinical sources (blue), chicken, cattle and clinical sources (pink), or clinical only (red). Polyphyletic KPAX groups, containing isolates from multiple lineages on the tree, are indicated with an asterisk. The phylogenetic tree was reconstructed from a whole-genome gene-by-gene amino acid alignment, translated in-frame, using an approximation of the maximum-likelihood algorithm implemented in FastTree2 and a general time reversible model.

Discussion
An important aim in zoonotic pathogen research is to identify genetic and functional variation associated with lineages or sublineages that cause human infection. Comparative analysis of nucleotide sequence variation across the genome has improved understanding of the epidemiology and evolution of Campylobacter (Sheppard, Didelot, Jolley, et al. 2013; Gilbert et al. 2016; Llarena et al. 2016). Although this has provided a basis for identifying candidate genes with potential functional significance (Morley et al. 2015; Pascoe et al. 2015; Yahara et al. 2017), straightforward genome analysis often ignores factors relating to translation and the production of specific amino acid chains and proteins that may be important in host adaptation or pathogenicity. For example, although the four nucleotides can form 64 different triplets, these encode only 20 amino acids (plus three stop signals). The same amino acid can therefore be encoded by different triplets, typically differing at the third base, so divergent genomes may have convergent amino acid sequences that are potentially functionally important in host adaptation or pathogenesis. Analysis of encoded amino acid sequences in this study identified polyphyletic nucleotide sequence clusters within the CC21 group that fell within the same amino acid sequence clusters. These convergent human-only amino acid KPAX clusters, found in divergent genomic backgrounds, may have been overlooked by conventional nucleotide sequence-based approaches.
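The degeneracy argument can be checked directly with the standard genetic code. In the Python sketch below, the two short coding sequences are hypothetical; they differ at 7 of 15 nucleotide positions yet translate to the same peptide, which is the kind of convergence a nucleotide-only comparison would not reveal.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


print(len(CODON_TABLE), len(set(CODON_TABLE.values()) - {"*"}))  # 64 20

seq1 = "ATGCTGAAAGGTTCT"  # hypothetical
seq2 = "ATGTTAAAGGGCAGC"  # hypothetical, synonymous at every codon
print(hamming(seq1, seq2), translate(seq1), translate(seq2))  # 7 MLKGS MLKGS
```

Several of these differences fall outside the third codon position (CTG versus TTA for leucine, TCT versus AGC for serine), so synonymous divergence can exceed what third-position changes alone would produce.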
Comparative analysis of the nucleotide sequences of the 601 C. jejuni genomes in this study identified STs belonging to the CC21 group and CC45 that have been reported at different frequencies among isolates from agricultural animal and human sources. This is consistent with other population genomic studies, in which variation in relative abundance has been explained by the differing capacity of certain strains to survive through the poultry production chain at atmospheric oxygen concentrations (Yahara et al. 2017). Asymptomatic carriage of C. jejuni is not thought to be common in humans in industrialized countries (Lee et al. 2013). Therefore, under a simple transmission model, amino acid clusters would be expected to be present in both reservoir animal and infected human hosts, and the existence of strongly human-only amino acid KPAX clusters is unexpected. There are two possible explanations. First, isolates assigned to human-only KPAX clusters may derive from a source that was not captured by the sampling used in this study. Second, CC21 group isolates in our data set that share these amino acid clusters may increase in relative frequency in humans compared with other hosts. Additionally, asymptomatic carriage of Campylobacter may be underestimated and underreported (Calva et al. 1988; Louwen et al. 2012; Lee et al. 2013; Islam et al. 2017). These factors could influence the evolution and population structure of the bacteria recovered from symptomatic infections.
Examination of isolate records in the entire pubMLST database revealed that 97% of the isolates assigned to human-only amino acid KPAX clusters belong to STs that have been isolated from other host species as well as from humans (table 1). Notably, only five STs from human-only KPAX groups (corresponding to 7/276 isolates in our data set) have never been reported in nonhuman hosts, either in our data set or in pubMLST isolate records. Given the known sources of C. jejuni in human infection, including CC21 group isolates (Sheppard, Dallas, MacRae, et al. 2009; Sheppard, Dallas, Strachan, et al. 2009), the close similarity between C. jejuni populations on food and those from clinical samples (Kittl et al. 2013), and the presence of STs belonging to human-only amino acid KPAX clusters among agricultural hosts in pubMLST, it is unlikely that these clusters indicate an unknown host source population, although this cannot be ruled out in this study.

Fig. 3. […] The outer circle indicates genes from the C. jejuni NCTC11168 reference genome, with core genes shared by all isolates in our data set (black) and accessory genes (gray). Genes containing characteristic amino acid sites defining KPAX groups are shown (red ticks), along with the number of these sites per gene (red dots; scale 0 to 420). Genes containing k-mers associated with clinical-only KPAX groups by SEER are shown (blue ticks), along with the number of these k-mers mapped per gene (blue dots; scale 0 to 25). Black ticks indicate genes containing both KPAX group characteristic sites and SEER-associated k-mers. (B) Difference between COG prevalence (%) among genes containing KPAX characteristic sites (red) or SEER-associated k-mers (blue) and COG prevalence in the C. jejuni NCTC11168 reference genome annotation.
Our results are therefore consistent with an increase, among isolates from humans, in the relative frequency of particular amino acid sequence subclusters that are uncommon in animal hosts. Host colonization potential is influenced by adaptive genomic variation present before and after transmission to the new host species (Geoghegan et al. 2016). In both cases, population bottlenecks at interhost transmission reduce the genetic variance in the population, which would account for the increased relative frequency of human-only amino acid KPAX clusters. It remains difficult to differentiate genetic changes associated with bottlenecks and drift from adaptive physiological changes that directly affect pathogenesis, such as human tissue tropism and virulence. Furthermore, human passage can induce genetic variation in contingency genes encoding surface structures through frameshifts and phase variation (Bayliss et al. 2012; Revez et al. 2013; Thomas et al. 2014). However, the sharing of amino acid sequence clusters by polyphyletic lineages is evidence of homoplasy, and investigating the putative function of these genes may provide clues about their potential role in human colonization. Human-only KPAX clusters are present in every major lineage within the CC21 group (fig. 2) and are notably absent among CC45 isolates. This asymmetry cannot be explained by an insufficient sample size from the CC45 population in our data set and may suggest that, despite being efficient human colonizers, CC45 strains lack a suitable genetic background for acquiring the genomic elements associated with elevated human colonization that we observe in the CC21 group. Further analysis of larger sample sets, potentially including phenotypic analyses, is needed to confirm this.
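The link between a bottleneck and reduced variance can be illustrated with a standard idealized calculation, offered as an assumption-laden sketch rather than an analysis from this study: suppose $N$ founding cells are drawn at random (with replacement) from a source population in which variant $i$ has frequency $p_i$, write $\hat p_i$ for its frequency among the founders, and measure diversity as $H = 1 - \sum_i p_i^2$. Sampling leaves each frequency unbiased but adds variance, so expected diversity falls by a factor of $(1 - 1/N)$.

```latex
\begin{aligned}
\mathbb{E}[\hat p_i] &= p_i, \qquad \operatorname{Var}(\hat p_i) = \frac{p_i(1-p_i)}{N},\\
\mathbb{E}[\hat p_i^{\,2}] &= p_i^2 + \frac{p_i(1-p_i)}{N},\\
\mathbb{E}[\hat H] &= 1 - \sum_i \mathbb{E}[\hat p_i^{\,2}]
  = \Big(1 - \sum_i p_i^2\Big) - \frac{1}{N}\sum_i p_i(1-p_i)
  = H\Big(1 - \frac{1}{N}\Big).
\end{aligned}
```

The last step uses $\sum_i p_i(1-p_i) = 1 - \sum_i p_i^2 = H$. With $N = 10$, for example, expected diversity after a single such bottleneck is 90% of its source value, and the nonzero variance of $\hat p_i$ means that, even under this neutral model, some variants rise in relative frequency purely by chance.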
Genome-wide association methods recently applied to bacteria (Sheppard, Didelot, Meric, et al. 2013) allow investigation of the genetic variation that underlies important phenotypes. By identifying nucleotide sequences enriched across the genomes of isolates in human-only KPAX clusters (Lees et al. 2016), we were able to investigate the putative function of genes with human-only amino acid KPAX clusters. A total of 26 genes were identified (table 2), half of which have previously been linked to host colonization or pathogenesis: nine in humans or human cells and four in chickens. For example, flgH is involved in flagellar assembly (table 2) and has also been associated with adaptation to a mammalian host (Sheppard, Didelot, Meric, et al. 2013). Flagellar motility has been shown to be important for human and chicken colonization, and possibly for the secretion of virulence factors into host cells (Guerry 2007). Genes directly involved in host colonization also included ceuCE, involved in enterochelin uptake (table 2). Siderophore uptake has been described as a virulence and host colonization trait in Campylobacter (Richardson and Park 1995), and a ceuE mutant has been shown to have altered chicken colonization ability (Palyada et al. 2004). Additionally, the cdsA gene is located in the genomic region of known maf adhesins, involved in survival and host colonization (Karlyshev et al. 2002). Knockout mutants of cj0005c, which encodes an uncharacterized oxidoreductase, have been shown to be strongly impaired in infection ability and in adherence to human Caco-2 cells in vitro (Tareen et al. 2011), whereas a neighboring gene, cj0006, encoding a putative transporter, has been shown in global transcriptomic studies to be overexpressed in vivo when C. jejuni infects chickens (Hu et al. 2014). Finally, the tyrS gene, predicted to encode a tyrosyl-tRNA synthetase, has been observed to be overexpressed in a C. jejuni strain that colonizes chickens poorly (Seal et al. 2007). It has also been associated with mammalian (cattle) adaptation in a previous GWAS from our laboratory (Sheppard, Didelot, Meric, et al. 2013).
Genes predicted to have a role in metabolism were also highlighted. The ackA and aspB genes are involved in acetate and aspartate metabolism, respectively, and have been shown in mutagenesis studies to be important for entry into human epithelial cells in vitro (Novik et al. 2010). The fdhD gene, encoding a formate dehydrogenase, was also associated with isolates belonging to human-only amino acid clusters. Formate metabolism has previously been implicated in host association and in survival in the food production chain from farm to human disease (Yahara et al. 2017). The racR gene, which regulates fumarate utilization in low-oxygen environments, also displayed human-associated variation, and racR-deficient mutants have shown reduced chicken colonization in vivo (Bras et al. 1999; van der Stel et al. 2015). Other genes with variation associated with the CC21 group human-only amino acid clusters included dnaX, which encodes a DNA polymerase and is a marker for Guillain-Barré syndrome, a sequela of campylobacteriosis (Godschalk et al. 2006), and trpC, which encodes an indole-3-glycerol-phosphate synthase and lies in a genomic region important for hyperinvasiveness of human cells (Javed et al. 2010).
Genomic variation associated with clinical C. jejuni isolates includes elements associated with the primary host (Sheppard, Didelot, Meric, et al. 2013) and the food production chain (Yahara et al. 2017), as well as variation that confers an adaptive advantage in human colonization and may directly affect pathogenesis (Thompson and Gaynor 2008). Evidence of genetic bottlenecks and selection fostered by this complex fitness landscape will be reflected not only in nucleotide sequence variation but also in features such as gene order, the distribution of CDS on leading and lagging strands, GC skew, and codon usage (Bentley and Parkhill 2004; Rocha 2004). By combining analysis of nucleotide sequence and amino acid variation, we were able to identify a subset of human-associated C. jejuni. As the STs of these isolates are also found in nonhuman hosts, we interpret this as evidence of a genetic bottleneck that increases the relative frequency of certain strains in infected individuals. Although larger scale studies are needed to confirm a potential adaptive role for the human-associated variation, our analysis has identified a group of human-pathogenic C. jejuni that do not exhibit typical source-sink epidemiology, potentially reflecting human tissue tropism or virulence.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.