Exploring the Accuracy and Limits of Algorithms for Localizing Recombination Breakpoints

Abstract Phylogenetic methods are widely used to reconstruct the evolutionary relationships among species and individuals. However, recombination can obscure ancestral relationships as individuals may inherit different regions of their genome from different ancestors. It is, therefore, often necessary to detect recombination events, locate recombination breakpoints, and select recombination-free alignments prior to reconstructing phylogenetic trees. While many earlier studies have examined the power of different methods to detect recombination, very few have examined the ability of these methods to accurately locate recombination breakpoints. In this study, we simulated genome sequences based on ancestral recombination graphs and explored the accuracy of three popular recombination detection methods: MaxChi, 3SEQ, and Genetic Algorithm Recombination Detection. The accuracy of inferred breakpoint locations was evaluated along with the key factors contributing to variation in accuracy across datasets. While many different genomic features contribute to the variation in performance across methods, the number of informative sites consistent with the pattern of inheritance between parent and recombinant child sequences always has the greatest contribution to accuracy. While partitioning sequence alignments based on identified recombination breakpoints can greatly decrease phylogenetic error, the quality of phylogenetic reconstructions depends very little on how breakpoints are chosen to partition the alignment. Our work sheds light on how different features of recombinant genomes affect the performance of recombination detection methods and suggests best practices for reconstructing phylogenies based on recombination-free alignments.


Introduction
Recombination exchanges genetic material horizontally between organisms and thus plays a crucial evolutionary role in creating novel genotypes (Posada et al. 2002).Because recombination allows for individuals to inherit different regions of their genomes from different parents, recombination can bring beneficial alleles from different genomic backgrounds together, increasing the rate of adaptation (Alves et al. 2017).Recombination also allows for deleterious mutations to be purged, avoiding the accumulation of deleterious mutations in a single genetic background known as Muller's ratchet (Muller 1964).Detecting the occurrence of recombination is, therefore, important to understanding the evolutionary history of genomic sequence data and estimating population genetic parameters such as the recombination rate which fundamentally shape the trajectory of evolving populations (Visscher and Hill 2006).
Recombination can also mislead the inference of phylogenetic relationships from sequence data.In the presence of recombination, the true ancestral relationships among individuals may vary across the genome because individuals inherit different regions of their genome from different ancestors.No single phylogeny can, therefore, represent the true ancestral relationships among individuals across a recombining genome (Awadalla 2003;Rasmussen et al. 2014).If recombination is ignored during phylogenetic reconstruction, the inferred phylogeny may not accurately reflect the true relationships for any region of the genome (Kubatko and Degnan 2007).Moreover, ignoring recombination during phylogenetic reconstruction can distort the topology of inferred phylogenetic trees, leading to more star-like trees with longer terminal branches and less clock-like rates of molecular evolution (Schierup and Hein 2000).Hence, it is essential to detect recombination events prior to performing phylogenetic analysis and then either removing recombinant sequences entirely or partition sequence alignments into recombinationfree segments (Bell and Bedford 2017;Boni et al. 2020;Jackson et al. 2021).
Because of the central role of recombination in biological sequence analysis, several algorithms have been developed to detect recombination events.These algorithms can be divided into a few major classes depending on what information and statistical methods are used to detect recombination (Posada and Crandall 2001;Shikov et al. 2022).These include: (1) Substitution distribution methods which aim to detect a mosaic structure in the distribution of substitutions (i.e.polymorphic sites) across the genome.For example, MaxChi uses a χ 2 statistic to test whether there are significantly more substitutions between pairs of sequences to the left or right of a potential breakpoint than would be expected by chance (Smith 1992).Similar ideas can be applied to sequence triplets.For example, 3SEQ considers whether substitutions between a presumed child sequence and two parent sequences are clustered such that the child sequence resembles different parental sequences over the span of the genome (Boni et al. 2007).( 2) Phylogenetic methods aim to identify recombination breakpoints by looking for phylogenetic discordance between trees reconstructed from different regions of a genome.For example, Genetic Algorithm Recombination Detection (GARD) iteratively divides a genome into smaller blocks, builds neighborjoining (NJ) trees for each block, and then tests whether the sequence data strongly supports discordance between trees indicative of possible recombination (Pond et al. 2006).Lastly one can consider "compatibillity" methods which consider whether each site in a sequence is congruent with an evolutionary history without recombination and recurrent mutations (Hudson and Kaplan 1985;Smith and Smith 1998).However, since recurrent mutations, including convergence and reversion, are a frequent cause of homoplasy in rapidly evolving populations, we do not consider compatibility methods here.
Regardless of the exact method employed, detecting and precisely locating recombination breakpoints is challenging.Only a small fraction of sites in a given sequence alignment are likely to be informative about recombination.At a minimum, there needs to be polymorphic sites present in two recombining parental sequences in order for a signal of recombination to be detected since it is not possible to detect recombination among identical sequences.Furthermore, not all polymorphic sites will necessarily be informative about recombination because processes such as recurrent mutation may obscure signals of recombination.There has therefore been great interest in evaluating the performance of recombination detection methods by analyzing their detection power (sensitivity) and false positive rate (specificity) on real and simulated sequences (Posada and Crandall 2001;Wiuf et al. 2001;Posada 2002;Bay and Bielawski 2011).All these studies have shown that detection methods can under certain circumstances (e.g.high sequence diversity) perform well in terms of power and false positive rates.However, these previous studies paid little attention to how accurately and precisely different methods could localize recombination breakpoints.This is an important oversight as the ability to localize breakpoints has many practical applications in bioinformatics including creating nonrecombinant alignments for phylogenetic reconstruction.
When recombination detection methods are used to identify breakpoints prior to phylogenetic reconstruction, the location of identified breakpoints is generally used to partition or "slice" the full alignment into local subalignments presumed to be free of recombination, which are then used to reconstruct local phylogenetic trees for nonrecombinant regions of the genome (Tamura et al. 2023).However, one can choose between several different strategies for slicing alignments based on identified breakpoints.For example, one could be as aggressive as possible and slice the alignments at the exact position of identified breakpoints in order to include as many sites as possible in the resulting subalignments.Alternatively, one could be as conservative as possible and only select subalignments that are free of recombination with high-confidence, excluding all sites within a certain confidence window around identified breakpoints.To our knowledge, how errors in breakpoint localization impact downstream phylogenetic inference and how to best choose recombination-free subalignments based on identified breakpoints has never been systematically explored.
In this paper, we use simulated sequence data to explore the accuracy and precision of recombination detection methods including MaxChi, 3SEQ, and GARD at localizing recombination breakpoints.Our simulation study consists of three parts.In the first part, we simulate triplets of sequences with only a single recombination breakpoint under the coalescent with recombination (Hudson 1983(Hudson , 1990)).We then investigate which genomic features lead to the greatest variability in performance/accuracy between different data sets and methods.For these analyses, we primarily focus on the single breakpoint simulations as this simple scenario provides the greatest insights into which features contribute the most to performance.However, in the second part, we consider performance among the same methods in the more general setting where multiple recombination events generate the observed sequence data.In the third part, we explore how best to partition alignments into recombination-free segments based on the breakpoints identified by different detection methods.Through these three simulation studies, we aim to provide insights into the selection of appropriate recombination detection methods and best practices for subsequently reconstructing phylogenetic trees based on inferred breakpoint locations.

Related Work
We briefly summarize how each of the three recombination detection algorithms explored here test for the presence of recombination and localize the most likely recombination breakpoints.
The MaxChi method, developed by Smith (1992), is a classic substitution distribution method.For a single recombination event, it assumes that the recombinant Cen and Rasmussen • https://doi.org/10.1093/molbev/msae133MBE genome consists of two blocks inherited from two independent parent sequences.For a sequence pair, it makes an arbitrary cut at position k and builds a 2 × 2 contingency table around the position k to find the statistic χ 2 that indicates whether the number of mutations to the left/right of the breakpoint deviates from what would be expected based on the total number of mutations in each region under the null hypothesis that there is no recombination.The site k max that maximizes the χ 2 statistic is considered a candidate breakpoint location.If there are multiple breakpoints to infer, MaxChi will iteratively search each block until no further breakpoints are detected.
GARD is a classic phylogenetic-based recombination detection method developed by Pond et al. (2006).At each site in the alignment, GARD divides the alignment into two blocks and builds a NJ tree individually for each block.GARD then uses corrected Akaike information criterion (AIC c ) values to decide if the likelihood of the sequence data given two trees is significantly higher than under a single tree.If at least one AIC c value of the inferred NJ trees from two blocks is less than the AIC c value of the NJ tree inferred from the whole alignment, that position will be considered a potential breakpoint.When dealing with multiple breakpoints, GARD uses a genetic algorithm to search for candidate positions based on the likelihood of the model (Eshelman 1991;Pond and Frost 2005).
3SEQ is a widely used exact nonparametric method for detecting recombination developed by Boni et al. (2007).Given three sequences, one sequence (C) is presumed to be the recombinant child, while the other two sequences, denoted as P and Q, are presumed to be the parents of C. 3SEQ considers the sequence of mutations in C that match the allelic state of the parents P or Q at a given site.A sequence of several P-like mutations followed by a sequence of several Q-like mutations would therefore suggest C inherited genetic material from P to the left and from Q to the right of a recombination breakpoint.Under the null hypothesis that there is no recombination, the algorithm calculates the exact P-value of observing a sequence of P or Q-like mutations in the child sequence based on a hypergeometric random walk model.A very unlikely sequence of mutations under the null hypothesis of no recombination (low P-values) is considered to be evidence for a recombination event.The test is then repeated for all possible sequence triplets in an alignment.

Ancestral Recombination Graph and Sequence Simulation
To simulate sequence alignments, we used msprime 1.2.0 (Baumdicker et al. 2021) to first simulate ancestral recombination graphs (ARGs) with known recombination events.In our first set of simulations, we fixed the sample size at three and conditioned the simulations on having exactly one recombination event occur between two parent lineages and a child recombinant sequence.The location of the true recombination breakpoint was recorded for comparison with the inferred locations.After simulating each ARG, pyvolve 1.1.0(Spielman and Wilke 2015) was used to simulate sampled sequences along each local tree in a given ARG and then sequences were concatenated end-to-end to obtain a full alignment spanning the entire genome.We varied the mutation rate μ such that pairwise genetic diversity π ranged from 0.01 to 0.2.All other simulation parameters are given in Table 1.In our second set of simulations, we constrained the number of recombination events between 1 and 10 such that there could be multiple breakpoints in the simulated alignments.For these simulations, we expanded the pairwise genetic diversity range up to 0.3 and set the sample size to 10. Simulated ARGs that contained undetectable recombination events (i.e. the Type I events defined below) were discarded.All simulation code and data are available at https://github.com/davidrasm/RecBreakDetect.

Recombination Breakpoint Detection and Performance Evaluation Metrics
Three methods were applied to detect recombination events: MaxChi (Smith 1992), 3SEQ (version 1.7) (Boni et al. 2007) and GARD (version 2.5.40)(Pond et al. 2006).Simulations in which the recombination event was not identified by any method were discarded.We then used two evaluation metrics to compare their relative performance in localizing breakpoints.First, we define the error as the distance of the imputed breakpoint location from the true location in units of base pairs.For single breakpoint simulations, we measure the localization error as the distance between the true breakpoint and the inferred breakpoint.For the multiple breakpoint simulations, different measurement strategies are used depending on the detection method.For 3SEQ, each row in a 3SEQ output file Exploring the accuracy of recombination detection algorithms • https://doi.org/10.1093/molbev/msae133

MBE
indicates a potential recombination event between three sequences.Within the full simulated ARG, we identify the sub-ARG that only contains these three sequences and check whether there is a recombination event in this sub-ARG.If so, for each inferred breakpoint interval, we calculate the distance between the midpoint of the interval and the nearest recombination breakpoint in the sub-ARG.An average localization error for each simulation is calculated from all of the intervals.A similar strategy is used for MaxChi: for all pairs of sequences in which a recombination event is detected, we look at the corresponding sub-ARG involving the two sequences and find the nearest true recombination breakpoint to calculate the mean localization error.For GARD, we directly find the nearest recombination breakpoint for each inferred breakpoint to calculate the mean localization error.Second, we define precision as the reciprocal of the variance in the site-specific probabilities of each site being the recombination breakpoint, such that placing all probability on a single site would have the highest precision and placing uniform probability on all sites would have the lowest precision.The site-specific probabilities were normalized such that these probabilities sum to one.How site-specific breakpoint probabilities were computed for each method is described in the supplementary material 1, Supplementary Material online.

Explaining Variability in Performance Across Data Sets
We looked at several summary statistics or features of the simulated sequence data that could potentially explain the variable performance of different methods across simulations.These features were used in the partial least squares regression model described below.These variables include:

Recombination Type Categorization
Following Wiuf and Hein (1999), we categorize recombination events into three types.An example ARG corresponding to each type is shown in Fig. 1a.Type I recombination events happen when the two parent lineages of a recombinant child immediately coalesce before they merge with other sampled lineages in the genealogy.In this case, recombination has no impact on local tree topologies and leaves no trace in the sequence data.Type I events should therefore be undetectable by any method, but we include For example, the eighth position is an informative site where the child has the same state as the Q parent.But the left side of the child sequence is actually inherited from the P parent.Hence, this site is considered to be an inconsistent informative site.
Cen and Rasmussen • https://doi.org/10.1093/molbev/msae133MBE them here to test the specificity of each method against detecting false positives.Type II events happen when one of the parent lineages coalesces with another sampled lineages to form a new parent lineage that then immediately coalesces with the other parent lineage.Hence, there is no topology change on the local trees, but the branch lengths of the parent lineages may differ.Type III events happen when the recombining parent lineages coalesce with different nonrecombinant lineages on the two sides of the recombination breakpoint.This type results in a clear topology change between the two local trees.

Number of Consistent and Inconsistent Informative Sites
For any sequence triplet, let the true recombinant child be denoted as C, the parental sequence inherited along the left side of the recombinant child sequence be designated as the P sequence and the parental sequence inherited along the right side be the Q sequence.We define informative sites as positions where C and either P or Q share the same mutation/nucleotide, but not both (Boni et al. 2007).We further define consistent informative sites as positions in C such that the nucleotide state of C matches the parent from which it inherited its genetic material at that site.Figure 1b illustrates our definition of consistent versus inconsistent informative sites.However, recurrent mutations and back mutations (reversions) can obscure the pattern of inheritance by causing the nucleotide state of the child to match another sequence from which it did not inherit its genetic material as shown in Fig. 1b.We will refer to these sites as inconsistent informative sites.

Average Distance Between Informative Sites and Breakpoint
In order to measure the effect of the distance between informative sites and the true breakpoint location on detection performance, we define the average distance of informative sites as the mean value of the distance between each informative site and the true breakpoint.

Partial Least Squares Regression
We use a partial least squares regression (PLSR) model to evaluate the contribution of the five predictor variables described above to the performance of the recombination detection methods at localizing breakpoints.A PLSR model was chosen because there is strong multicollinearity among our predictor variables, a condition where standard linear regression often fails, but PLSR excels because it seeks to find a linear combination of predictor variables that explain the maximum variance in the response variable (i.e.localization error) (Haenlein and Kaplan 2004).As in Principle Component Analysis, the contribution of each explanatory variable to the new axes is quantified by a weight.Following the approach described in Ratmann et al. (2016), the explained variance of each predictor is calculated as where k is the number of latent factors in each PLSR model for the three detection methods (k = 5 for GARD, MaxChi and 3SEQ via cross-validation), w 2 ij /w i is the relative weight of the jth predictor in the ith latent factor with w i =  p j=1 w 2 ij and w ij being the loading of the linear combination for each component, and π i is the explained variance by ith latent factor.We used the plsRglm package (version 1.2.1) to fit the PLSR model and construct the confidence intervals of each coefficient using R = 5,000 bootstraps (Bertrand and Maumy 2023).When fitting the PLSR model, we only consider Type II and Type III recombination events as Type I events are undetectable.

Breakpoint Detection for Phylogenetic Reconstruction
In order to evaluate how different partitioning strategies impact phylogenetic reconstruction, we performed simulation experiments in which we set different window sizes around the breakpoints identified by different detection methods.We then sliced the sequence alignment into local subalignments at either the true recombination breakpoints or the different window borders (supplementary material fig.S1, Supplementary Material online).Sites within windows were excluded from the resulting subalignments, such that larger window sizes are more conservative with respect to ensuring recombination breakpoints are excluded from the subalignments.Maximum-likelihood trees were then reconstructed from the resulting local subalignments using RAxML (Stamatakis 2014).The performance of the phylogenetic reconstruction was then evaluated by calculating the Robinson-Foulds (RF) distance (Robinson and Foulds 1981) between the corresponding true local tree and the reconstructed local tree.For genomic regions inside windows excluded from a local subalignment for which a local tree could not be reconstructed, we compute RF distances between the true local tree and the neighboring ML trees to interpolate over excluded sites.Given the midpoint of the window, the neighboring ML tree to the left or right of the window is used to interpolate RF distances for sites between the ends of the window and the midpoint.All RF distances are normalized based on the number of tips in the tree and then averaged over all sites in an alignment.RF distances are calculated using the symmetric_difference function in DendroPy (Sukumaran and Holder 2010).

Detecting and Localizing Single Breakpoints
An Overview of the Three Recombination Detection Methods While previous studies explored the power of different algorithms to detect recombination events (Posada and Crandall 2001;Wiuf et al. 2001;Posada 2002), we begin our exploratory analysis by summarizing the power and specificity of GARD, MaxChi, and 3SEQ.Because Type I recombination events should be undetectable, we consider detection of Type I events to be false positives.Failure to detect Type II or III events is considered to be false Exploring the accuracy of recombination detection algorithms • https://doi.org/10.1093/molbev/msae133MBE negatives.Overall, MaxChi and 3SEQ show a high specificity (i.e. a lower false positive rate) at all levels of genetic diversity, whereas the specificity of GARD decreases as genetic diversity increases (Fig. 2a).On the other hand, GARD shows a much greater power to detect recombination compared to the other two methods (Fig. 2b).For Type II events, MaxChi has a higher power than 3SEQ, while 3SEQ has a higher power than MaxChi when detecting Type III events.We conclude that among the three detection methods, GARD is more sensitive at detecting recombination events, but there is a tradeoff between sensitivity and specificity, i.e. the increased sensitivity of GARD comes at the cost of a higher false positive rate.

Accuracy in Localizing Breakpoints
To determine the accuracy of each method in localizing breakpoints we quantify the error in the inferred breakpoint location in terms of the distance in base pairs (bp) from the true location.The accuracy of the three detection methods in localizing recombination breakpoints substantially differs (Table 2).When detecting Type I recombination events, no method performs better than the expected error of a random guess, which should be approximately 333 bp given the probability of a breakpoint is uniformly distributed along a 1,000 bp sequence.For Type II and III events, 3SEQ generally has a lower localization error (37.37 bp) than the other two methods, whereas GARD has a localization error of nearly 140 bp.However, it is worth noting that GARD operates exclusively on unrooted trees so cannot differentiate between alternative three taxon trees.Hence, any signal of recombination is driven by differences in branch lengths in the trees to the left and right of a potential breakpoint, which likely disadvantages GARD at localizing recombination breakpoints in sequence triplets.
In order to see which features of the sequence data explain the most error in localizing breakpoints, we take the loading on each predictor variable in the PSLR model as a weight to evaluate the explained variance.Overall, 30% to 40% of the variance in performance can be captured by the features in the model for each of the three detection methods (Fig. 3).How each feature contributes to localization error is summarized in Table 3 and explored individually below.
Recombination type.For all methods, error is higher for Type II than Type III events.This result makes intuitive sense as Type II events only impact branch lengths and are therefore expected to leave a weaker signal in the sequence data than Type III events which impact local tree topologies.Overall, MaxChi performs better when detecting Type II events while 3SEQ has a significantly lower error when detecting Type III events (ANOVA P < 0.001, Fig. 4a).The recombination type factor explains more variance in performance for 3SEQ and GARD than for MaxChi (Fig. 3).
Genetic diversity and consistent informative sites.All else being equal, recombination among more genetically divergent sequences should be easier to detect than recombination between genetically similar sequences.In theory then, genetic diversity should closely reflect how much information sequence data contain about recombination.However, as genetic diversity increases the localization error does not decrease as strongly as we might expect, especially for GARD and 3SEQ (Fig. 4b).In order to understand this trend, we further explored the relationship between genetic diversity and the number of informative sites that are either consistent or inconsistent with the allelic state of the parent from which genetic material was inherited in the recombinant child sequence (see Fig. 1b).As expected, the number of consistent informative sites increases with greater genetic diversity, which should benefit all methods (Fig. 4c).However, the number of  Cen and Rasmussen • https://doi.org/10.1093/molbev/msae133MBE inconsistent informative sites also increases with genetic diversity (Fig. 4d) due to recurrent mutations overriding the correct signal of ancestry.Consistent with these expectations, increasing the number of consistent informative sites decreases localization error, while increasing the number of inconsistent sites increases the localization error (Fig. 4e).Together these two factors explain up to half of the total variance in performance (Fig. 3).Moreover, the regression coefficients for genetic diversity in the PLSR model are nonsignificant for all three methods when the number of consistent and inconsistent informative sites are included in the model (Table 3).Therefore, while the direct impact of genetic diversity on the localization error is weak, decomposing overall genetic diversity in this way shows that the number consistent and inconsistent sites play a crucial role in determining localization error.
Average distance between informative sites and recombination breakpoint.For MaxChi and GARD, the farther away informative sites are from the true recombination breakpoint on average, the higher the localization error (Fig. 4f).On the contrary, the distance between informative sites and recombination breakpoint does not significantly affect the error of 3SEQ.The regression coefficients in the PLSR model also match these univariate relationships (Table 3), where the average distance will increase the error for MaxChi and GARD, but have no significant effect on 3SEQ.For MaxChi and GARD, it is clear that information about the breakpoint position is maximized when the informative sites are closer to the breakpoint.However, the random walk model employed by 3SEQ appears less sensitive to the distance between informative sites and the breakpoint.As a result, the average distance explains very little variance in the performance of 3SEQ (Fig. 3).

Precision Analysis
We next considered how precisely each method can locate breakpoints based on the probability mass (i.e.site-wise supporting probability) assigned to each site being the breakpoint.For this analysis, we only consider Type III recombination events.When genetic diversity is relatively low, MaxChi has higher precision than GARD, but as the genetic diversity increases the median precision of both GARD and MaxChi increase and eventually approach one another (Fig. 5a).For MaxChi, the site-wise probabilities generally increase nearly monotonically toward the breakpoint, but for GARD there are often multiple peaks or local maxima in their distribution (Fig. 5b).This could explain why precision varies across data sets much more for GARD than MaxChi.3SEQ has lower precision than the other two methods, which is expected because 3SEQ assigns all uninformative sites between two informative sites the same site-wise probability, which tends to increase the variance and thus lower precision.For all other methods, precision tends to increase with the number of The sign of the impact corresponds to the sign of the regression coefficient estimated under the PLSR model.Type II events are set as the reference level to Type III events.Asterisks indicate significant effects.

Run Time Comparison
We compared the run times of the three methods on single breakpoint alignments with varying numbers of sequences and sequence lengths (supplementary material fig.S2, Supplementary Material online).The run times for GARD and maxChi increase dramatically with the alignment size and length, whereas the run times for 3SEQ remained relatively constant.Additionally, 3SEQ consistently ran faster than the other two methods.For example, it only takes about 1 s for 3SEQ to run an alignment of 10,000 bp with 30 sequences, whereas it takes about 10 min for maxChi and 1 h for GARD.

An Overview of the Three Recombination Detection Methods
We compared the power of each method to detect recombination events in simulations with multiple Type II and Cen and Rasmussen • https://doi.org/10.1093/molbev/msae133

MBE
Type III recombination events at different levels of genetic diversity (Fig. 6a).Here, power refers to the ability to detect at least one recombination event.Overall, GARD outperforms the other two methods in terms of power.When genetic diversity is greater than 0.10, GARD always detects a recombination event.When genetic diversity is extremely low, MaxChi and 3SEQ fail to detect any recombination events.With multiple breakpoints, it is not always possible to tell which of the true breakpoints is detected.We therefore quantify the localization error as the distance between each inferred breakpoint and its nearest true breakpoint (see Methods).MaxChi has a slightly lower localization error than 3SEQ and GARD but a larger variance as well (Table 4).We however note that MaxChi's lower error may result from its limited detection power, such that GARD and 3SEQ have higher errors due to their difficulty localizing breakpoints MaxChi was unable to detect.

Accuracy in Localizing Breakpoints at Different Genetic Diversity Levels
As in our single breakpoint simulations, localization error decreases as genetic diversity increases for 3SEQ and Exploring the accuracy of recombination detection algorithms • https://doi.org/10.1093/molbev/msae133MBE GARD, while the trend for MaxChi is not clear due to its low detection power (Fig. 6b).There is no clear relationship between the localization error and the number of breakpoints in an alignment (Fig. 6c).Combined with detection power, it seems that 3SEQ performs better than the other two methods as it has a decent detection power as well as a low localization error.
Based on our single breakpoint results, localization accuracy should decrease with the number of consistent informative sites in the alignment.However, with multiple breakpoints, we do not know which true breakpoint is detected, such that defining consistent and inconsistent informative sites is difficult.Hence, we only consider the total number of informative sites.Nevertheless, taking 3SEQ as an example, localization error still decreases with the number of informative sites with multiple breakpoints in the alignment (Fig. 6D).

Effects of Breakpoint Localization on Phylogenetic Reconstruction
The breakpoints identified by recombination detection methods are often used to partition sequence alignments into recombination-free subalignments for phylogenetic reconstruction.We, therefore, wanted to see if using different methods to localize breakpoints would have a measurable impact on the quality of phylogenetic reconstruction.To do this, we ran each detection method on simulated sequence data sets where the underlying true ARG, and thus the true local phylogeny for each region of the genome, was known.Alignments were then cut at the identified recombination breakpoints and maximum-likelihood trees were reconstructed from the resulting recombination-free subalignments.To compare performance, we then calculated the RF distance between each reconstructed tree and the corresponding true local tree at each site in the alignment and then averaged RF distances across all sites.As shown in Fig. 7, using the sliced alignment to build the phylogenies can dramatically improve reconstruction accuracy, as seen by the decrease in the average RF distance relative to using the whole alignment which ignores the breakpoints.Generally, the RF distance between the true phylogenies and the phylogenies reconstructed from recombination-free subalignments decreases with genetic diversity for all three recombination detection methods before leveling off at higher values of genetic diversity.Comparing the three detection methods, GARD performs slightly better than the other two methods but the choice of detection methods has little impact on reconstruction error overall.
We further considered how aggressive or conservative we should be about slicing alignments by excluding sites within a certain distance of identified breakpoints (i.e. the breakpoint window, see supplementary material  Window size refers to the number of sites neighboring breakpoints that are excluded from the subalignments.We only consider simulations where at least one recombination event was detected by each method.RF distances between reconstructed and true trees were computed at each site and then averaged across sites.For regions in which a local tree could not be reconstructed due to excluding sites, we use the neighboring ML trees to interpolate between regions when computing RF distances. Cen and Rasmussen • https://doi.org/10.1093/molbev/msae133MBE fig.S1, Supplementary Material online).Increasing the window size therefore lowers the probability of a true breakpoint being erroneously included in a subalignment but at the potential cost of excluding informative sites from the alignment.Overall, RF distances increase slowly as window sizes increase (Fig. 7).It is notable that for 3SEQ, using the conservative intervals given by the algorithm to cut the alignment and build the phylogenies is no better than using the midpoint of these intervals.These results suggest that if we are confident about breakpoint locations (Table 4), aggressive strategies may slightly outperform more conservative approaches.We should therefore retain as many sites as possible by slicing alignments as close as possible to identified breakpoints.But overall, the quality of phylogenetic reconstruction depends very little on how we partition the alignment.This can be explained by the fact that there is a tradeoff between being aggressive, which can lead to missed breakpoints, and being conservative, which can yield shorter subalignments with less informative sites.Both a greater number of missed breakpoints and a smaller number of informative sites increase RF distances (supplementary material fig.S3, Supplementary Material online).Hence, there is generally only a small difference in reconstruction accuracy for different window sizes.Accounting for recombination by partitioning alignments at identified breakpoints, therefore, appears to be much more important than exactly how alignments are partitioned.
In the preceding analysis, the average RF distance across all sites was used to quantify reconstruction accuracy across the entire genome.This performance metric therefore gives more weight to local trees spanning a larger proportion of the genome.On the other hand, if we only care about the accuracy of individual local trees regardless of what proportion of the genome they span, using a more conservative approach with a larger window size improves reconstruction accuracy for GARD, 3SEQ, or MaxChi (supplementary material fig.S4, Supplementary Material online).

Discussion
In this study, we conducted an in-depth exploration of the performance of three widely used recombination detection methods using simulated ARGs and sequence data.Our investigation revealed that these three methods perform very differently in detecting and localizing recombination breakpoints.For instance, 3SEQ exhibited a tendency toward lower localization error but also lower precision, while GARD demonstrated robust detection power, albeit with subpar specificity and somewhat unsatisfactory localization accuracy.More generally, we identified several genetic features that can explain this variable performance in breakpoint localization.These features include recombination type and average distance between informative sites and breakpoint.Overall though, the number of (consistent) informative sites was found to be the most important factor determining the accuracy at which both single and multiple breakpoints could be localized.Armed with these insights, we proceeded to evaluate the accuracy of phylogenies reconstructed from sequence alignments partitioned at the breakpoint positions imputed by each method.Notably, we found that the choice of recombination detection method had minimal impact on phylogenetic reconstruction.These findings show that partitioning alignments on identified breakpoints does improve reconstruction accuracy, irrespective of the specific recombination detection method employed.
The statistical models underlying these different methods each consider unique information coming from sequence data as a signal of recombination.The variable performance of these methods can often be explained by features of the genetic data that may amplify or obscure these signals of recombination.For instance, recombination type plays a pivotal role in GARD's ability to infer breakpoint positions because GARD relies on detecting topological discordance between local phylogenetic trees, which will only be present at Type III recombination events.In contrast, MaxChi considers the clustering of substitutions along pairs of aligned sequences and is therefore much better at detecting Type II recombination events than GARD.However, the accuracy of GARD and MaxChi declines when informative sites are distant from the true breakpoint, as these methods consider the length of local segments on each side of a potential breakpoint.Conversely, 3SEQ remains less affected by the distance of informative sites.Knowledge of how these genetic features impact performance therefore allows for more informed method selection based on prior information.
Among the genetic features we explored, the number of consistent informative sites emerges as the most influential contributor to localization error, especially when we differentiate between consistent and inconsistent informative sites.As previously reported, detection power increases as genetic diversity and therefore the number of informative sites also increases (Posada and Crandall 2001), yet breakpoint localization accuracy only increases slowly with increasing genetic diversity.This pattern can be better understood if we view the mutations generating genetic diversity as a "double-edged" sword for detecting recombination.Higher mutation rates create informative sites where the allelic state of a child recombinant sequence correctly matches the allelic state of the parent from which genetic material was inherited, creating a signal of recombination.However, higher mutation rates also create recurrent mutations (homoplastic sites) which obscure patterns of inheritance and thus signals of recombination.Localization accuracy therefore rapidly improves with the number of consistent informative sites but declines with the number of inconsistent informative sites for all three detection methods.While it is not possible to differentiate consistent from inconsistent sites in real sequence data, efforts could be taken to mask highly homoplastic or other sites likely to be inconsistent with the true pattern of inheritance.
Phylogenetic analysis conventionally assumes that the ancestral relationships among samples can be adequately Exploring the accuracy of recombination detection algorithms • https://doi.org/10.1093/molbev/msae133MBE represented by a single phylogenetic tree.However, this assumption often fails when applied to real-world data, leading to inaccurate phylogenetic inference and estimates of molecular clock rates (Schierup and Hein 2000;Zhu et al. 2022).Consequently, recombination detection methods are often used to identify breakpoint locations to generate recombination-free subalignments from which local phylogenetic trees can be reconstructed.Our findings indicate that as long as genetic diversity is not excessively low, there is indeed a large reduction in phylogenetic error when reconstructing local trees using recombination-free segments (see Fig. 7).However, the exact choice of method used to detect breakpoints had little downstream impact on phylogenetic reconstruction accuracy, suggesting that running a single detection method is likely sufficient.Of the methods we compared, 3SEQ would be our preferred choice as it exhibits robust detection capabilities, high localization accuracy and remains computationally efficient, even for larger datasets.Moreover, we observed remarkably little variation in reconstruction error when different window sizes were used to exclude sites within a certain distance of identified breakpoints.These results suggest that when selecting inferred breakpoints for reconstructing local trees, a more aggressive approach using longer segments may be preferred in order to retain a greater number of informative sites if we want to capture the majority of information of the whole genome (Fig. 7).However, being aggressive comes at the cost of potentially including missed breakpoints in the resulting subalignments.The resulting tradeoff between subalignment length and the inclusion of missed breakpoints likely explains why the exact partitioning method has little impact on reconstruction accuracy.If we are interested in recovering individual local trees independently of what proportion of the genome they represent, using a more conservative approach to exclude other potential breakpoint sites may be preferred (supplementary material fig.S4, Supplementary Material online).
While we only compared three recombination detection methods, the selected methods are representative of the major categories of statistical methods available for both detecting and localizing recombination events.GARD is a classical phylogenetic method, MaxChi falls within the substitution category, and 3SEQ is a tripletbased method.Both MaxChi and 3SEQ are widely used and have been incorporated into popular software packages like RDP5 (Martin et al. 2020).We however note that the results of these methods may depend on the settings and parameters chosen by the user, especially the significance level chosen as the detection threshold.In practical terms, one can employ a higher significance level (lower P-value) threshold for 3SEQ and MaxChi to increase specificity, potentially leading to different results compared to our current settings.Nonetheless, in our study, we adhered to the default parameters, which are commonly used in practice.Consequently, our results remain representative of the general performance of these detection methods.
Our study also sheds light on how to potentially improve the accuracy and precision of recombination detection methods in the future.For example, one could incorporate information about recombination hot spots or other sources of recombination rate variation across sites as an informative prior on the location of breakpoints.More ambitiously, one could estimate the the odds of a site being a consistent or inconsistent informative site based on the observed level of genetic diversity relative to what would be expected given an underlying demographic model.Having prior information about the relative odds of informative sites being generated by true recombination events (consistent sites) rather than recurrent mutations (inconsistent sites) could allow for more accurate and precise inferences which would better reflect our true uncertainty about breakpoint locations.

Fig. 1 .
Fig. 1. a) Illustration of the three recombination event types as categorized byWiuf and Hein (1999).Left: Type I recombination; Middle: Type II recombination; Right: Type III recombination.b) A hypothetical alignment resulting from a single recombination event between the parents P and Q producing the recombinant child sequence C. The consistent informative sites are colored red and the inconsistent sites are colored blue.For example, the eighth position is an informative site where the child has the same state as the Q parent.But the left side of the child sequence is actually inherited from the P parent.Hence, this site is considered to be an inconsistent informative site.

Fig. 2 .
Fig. 2. a) Specificity and b) sensitivity (power) of GARD, MaxChi, and 3SEQ at detecting different types of recombination events at different levels of genetic diversity.

Fig. 3 .
Fig. 3. Variance in breakpoint localization error explained by each predictor variable for three detection methods as quantified by the PLSR model.

Fig. 4 .
Fig. 4. The impact of different genetic features on localization error.a) Localization error by recombination type.A * * * indicates that P < 0.001 by ANOVA after Bonferroni correction.b) Relationship between localization error and genetic diversity.c) Relationship between genetic diversity and the number of consistent informative sites.d) Relationship between genetic diversity and the number of inconsistent informative sites.e) Relationship between localization error and the number of consistent informative sites.f) Relationship between localization error and average distance between informative sites and the breakpoint.

Fig. 5 .
Fig. 5. Precision of different detection methods in localizing breakpoints.a) Precision at different levels of genetic diversity.b) An example of how site-wise probabilities are distributed over sites.Here the dotted line indicates the true breakpoint location.c) Relationship between the logarithm of precision and the logarithm of the number of consistent informative sites

Fig. 7 .
Fig.7.RF distances between true phylogenetic trees and maximum-likelihood trees reconstructed from subalignments sliced at the recombination breakpoints identified by different methods.Medians and confidence intervals were computed from 500 simulations at each level of genetic diversity.Original refers to the full alignment ignoring all breakpoints.Window size refers to the number of sites neighboring breakpoints that are excluded from the subalignments.We only consider simulations where at least one recombination event was detected by each method.RF distances between reconstructed and true trees were computed at each site and then averaged across sites.For regions in which a local tree could not be reconstructed due to excluding sites, we use the neighboring ML trees to interpolate between regions when computing RF distances.

Table 1
Parameters used to simulate ARGs and sequence alignments

Table 2
Localization accuracy (mean ±SEM) of three detection methods averaging over simulations where the true breakpoint is at least 400 bp distant from ends of the alignment Results are averaged over all levels of genetic diversity.

Table 4
Localization accuracy (mean±SEM) of three detection methods for multiple breakpoint inference