A2B-COVID: A Tool for Rapidly Evaluating Potential SARS-CoV-2 Transmission Events

Abstract Identifying linked cases of infection is a critical component of the public health response to viral infectious diseases. In a clinical context, there is a need to make rapid assessments of whether cases of infection have arrived independently onto a ward, or are potentially linked via direct transmission. Viral genome sequence data are of great value in making these assessments, but are often not the only form of data available. Here, we describe A2B-COVID, a method for the rapid identification of potentially linked cases of COVID-19 infection designed for clinical settings. Our method combines knowledge about infection dynamics, data describing the movements of individuals, and evolutionary analysis of genome sequences to assess whether data collected from cases of infection are consistent or inconsistent with linkage via direct transmission. A retrospective analysis of data from two wards at Cambridge University Hospitals NHS Foundation Trust during the first wave of the pandemic showed qualitatively different patterns of linkage between cases on designated COVID-19 and non-COVID-19 wards. The subsequent real-time application of our method to data from the second epidemic wave highlights its value for monitoring cases of infection in a clinical context.

The blue line shows the basic infectivity profile, described by an offset gamma distribution.
The black dots show an approximation to this distribution, derived for the purposes of simulating data. The approximation consists of a sum of offset gamma distributions, describing in turn the infectivity profile conditional on an individual becoming symptomatic a fixed number of days after being infected. Within the simulation regime, the day of symptom onset is used to condition the infectivity profile so that an individual cannot infect someone else before they are themselves infected.

Tables

Table S1: Parameters for the offset gamma distribution fitted to data describing intervals between times of reporting symptoms and positive test results. Inferred values were generated using maximum likelihood; the range describes a window of size two likelihood units from the maximum.

In the main text we stated that:

Model
To derive this result we note that, if it is observed that $C_{AB}(T)=0$, transmission cannot occur at time $T$, so that $P(C_{AB} \mid X_T)=0$. If it is observed that $C_{AB}(T)=1$, we next consider the remaining elements $C_{AB}(t)$ of the contact vector. If an element is observed, we apply our approach to contact patterns, assuming that $P(C_{AB}(t)=1 \mid X_T) = P(C_{AB}(t)=0 \mid X_T) = 0.5$, such that the probability of this observation is 0.5. If $C_{AB}(t)$ is not observed, its probability is obtained by integration: $P(C_{AB}(t) \mid X_T) = w_{AB}(t) \cdot 0.5 + (1 - w_{AB}(t)) \cdot 0.5 = 0.5$. Hence if $C_{AB}(T)=1$, the probability $P(C_{AB} \mid X_T)$ of the whole contact vector is equal to $0.5^{|C|-1}$. Finally, we consider the case in which $C_{AB}(T)$ is missing data. Integrating over the missing value, we have that $P(C_{AB} \mid X_T) = w_{AB}(T) \, P(C_{AB} \mid X_T, C_{AB}(T)=1) + (1 - w_{AB}(T)) \, P(C_{AB} \mid X_T, C_{AB}(T)=0) = w_{AB}(T) \, P(C_{AB} \mid X_T, C_{AB}(T)=1)$, since the second term is zero. Applying again the reasoning above, this gives us the result $P(C_{AB} \mid X_T) = 0.5^{|C|-1} w_{AB}(T)$. As we defined $w_{AB}(T) = C_{AB}(T)$ when $C_{AB}(T)$ was observed, we thus have that $P(C_{AB} \mid X_T) = 0.5^{|C|-1} w_{AB}(T)$ in every case.
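As an illustration of the closed form derived above, the following minimal Python sketch evaluates $0.5^{|C|-1} w_{AB}(T)$ for a contact vector that may contain missing entries. The function name, the use of `None` for missing data, and the separate `weights` list holding $w_{AB}(t)$ for unobserved days are assumptions made for this example; they are not taken from the A2B-COVID code.

```python
from typing import Optional, Sequence


def contact_vector_probability(
    contacts: Sequence[Optional[int]],
    weights: Sequence[float],
    T: int,
) -> float:
    """Return 0.5**(|C|-1) * w_AB(T) for a contact vector.

    contacts[t] is 1 (contact observed), 0 (no contact observed) or
    None (missing). weights[t] is the assumed value of w_AB(t) used
    when contacts[t] is missing; it is ignored for observed entries.
    """
    if len(contacts) != len(weights):
        raise ValueError("contacts and weights must have the same length")
    observed = contacts[T]
    # w_AB(T) equals C_AB(T) when observed, otherwise the supplied weight.
    w_T = float(observed) if observed is not None else weights[T]
    # Every element other than T contributes a factor of 0.5.
    return 0.5 ** (len(contacts) - 1) * w_T
```

With a vector of four days, an observed contact at `T` gives 0.125, an observed absence of contact gives 0, and a missing value at `T` scales 0.125 by the supplied weight.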
In the integral, we note that there are a large number of possible vectors $C_{AB}$ that indicate all times when a pair were in contact. We approximated the sum by generating 100 random vectors $C_{AB}$ for each set of other parameters, and calculating the sum over these vectors, altering the value $0.5^{|C|-1}$ in $P(C_{AB} \mid X_T)$ so as to normalise the integral. Reflecting our approach to contact patterns, we generated the $C_{AB}$ as random vectors of draws from a Bernoulli distribution with probability 0.5. Repeating this calculation with different sets of 100 vectors did not substantially change the thresholds obtained. Our code allows for the generation of alternative thresholds with different probabilities of an element of $C_{AB}$ being equal to 1. We note that if this probability is higher, fewer datasets will be judged consistent with transmission.
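The sampling step described above can be sketched as follows. This is an illustrative Python fragment, not the A2B-COVID implementation: the function names, the default vector length of 14 days, and the choice to normalise by giving each sampled vector equal weight 1/n are assumptions made for the example.

```python
import random
from typing import Callable, List, Optional


def sample_contact_vectors(
    n_vectors: int = 100,
    length: int = 14,
    p_contact: float = 0.5,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Draw contact vectors whose elements are independent Bernoulli(p_contact)."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_contact else 0 for _ in range(length)]
        for _ in range(n_vectors)
    ]


def monte_carlo_average(
    summand: Callable[[List[int]], float],
    vectors: List[List[int]],
) -> float:
    """Approximate a sum over all contact vectors by an equally weighted
    average over the sampled vectors (assumed normalisation)."""
    return sum(summand(c) for c in vectors) / len(vectors)
```

Here `summand` stands for whatever per-vector term appears in the integral; changing `p_contact` corresponds to generating thresholds under a different probability of contact.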

Supplementary Text S3: Simulations
The Mathematica software package v12.3.1.0 was used to generate simulated data. An individual A was assumed to be infected on day 0. After this, A infected either the individual B directly, or the first of a chain of n individuals, the last of whom infected B. Upon being infected, an individual became symptomatic a number of days afterwards, drawn from the symptom onset distribution used within A2B-COVID. The time at which an individual infected another was drawn from an offset gamma distribution, with parameters calculated conditionally upon the time of symptom onset, as described below. A sample was collected for sequencing a whole number of days, between 2 and 10, after symptom onset, this value being drawn from a uniform distribution. The number of substitutions in each genome sequence was drawn from a Poisson distribution, with parameter equal to the sum of two values. The first value was calculated as the rate of evolution used within A2B-COVID multiplied by the time between the divergence in the transmission tree between A and B and the time at which the sample was collected. We note that divergence occurs at the time when A infects another individual. The second value represents noise, and was specified according to the value inferred from within-host data.
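The sampling and substitution steps described above can be sketched in Python as follows. This is a minimal illustration rather than the Mathematica code used in the study: the function names are hypothetical, and `rate_per_day` and `noise` are placeholders for the evolutionary rate and noise values used within A2B-COVID, which are not restated here. The Poisson draw uses Knuth's multiplication method, which is adequate for the small means involved.

```python
import math
import random
from typing import Tuple


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Draw from a Poisson distribution using Knuth's method (small means)."""
    limit = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        k += 1
        product *= rng.random()
        if product <= limit:
            return k - 1


def simulate_sample(
    divergence_day: float,
    symptom_day: int,
    rate_per_day: float,  # placeholder for the A2B-COVID evolutionary rate
    noise: float,         # placeholder for the within-host noise term
    rng: random.Random,
) -> Tuple[int, int]:
    """Return (sample_day, n_substitutions) for one simulated individual."""
    # Sampling occurs a whole number of days, 2 to 10, after symptom onset.
    sample_day = symptom_day + rng.randint(2, 10)
    # Poisson mean: evolution since divergence plus the noise term.
    mean = rate_per_day * (sample_day - divergence_day) + noise
    return sample_day, poisson_draw(mean, rng)
```

For example, `simulate_sample(3.0, 5, 0.06, 0.4, random.Random(0))` returns a sampling day between 7 and 15 together with a non-negative substitution count; the numerical arguments here are arbitrary.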
Simulated data were analysed using a cut-down version of the A2B-COVID code, called A2B-Core, implemented in C++, which facilitates rapid calculations across large numbers of pairs of individuals without the use of an R interface. The code for A2B-Core is included in the GitHub repository for A2B-COVID.

Conditional parameters for an offset-gamma distribution
We first note that generating simulated data requires some elaboration of the distributions used in our study. While the time to symptom onset has a mean of about 5.2 days, the time between symptom onset and infecting another individual, specified by the infectivity profile, has a range of possible outputs, starting at -25 days. If the two distributions are considered independently, it is therefore possible to generate a case in which individual A infected individual B at a time before individual A was infected with the virus.
To solve this problem we generated a conditional infectivity profile. Suppose that individual A became infected on day zero. Then the time SA at which A became symptomatic is given by a distribution similar to that of equation 4.
In our model, if A infects B, then in the absence of further information about the locations of the two individuals, or about the time at which A was infected, the probability that the transmission occurred at time T is given by the standard infectivity profile of equation 3.

For the purpose of generating simulations we elaborated on this model by decomposing the infectivity profile to be conditional on the time between A being infected and A becoming symptomatic. Supposing that A was infected at time zero, we have

$$P(T) = \sum_x P(S_A = x) \, P(T \mid S_A = x)$$

where the first term within the sum is the distribution of the time to symptom onset and the second term is an infectivity profile conditional on x. At this point, this conditional distribution is unknown, except that, for obvious reasons, T is greater than or equal to x: individual A cannot transmit the virus before being infected by the virus.
For the purpose of our simulations, we generated an approximate series of conditional distributions, each having the form of an offset gamma distribution with coefficients $\alpha_x$ and $\beta_x$, such that their sum over x, weighted by the distribution of the time to symptom onset, approximates the original distribution. We then optimised the coefficients $\alpha_x$ and $\beta_x$ so as to minimise the RMS distance D between the two distributions. A simple minimisation routine was implemented to perform this optimisation, terminating after $10^6$ iterations. Supplementary Figure S9 shows the fit of our model to the original distribution, while our inferred parameters are shown in Table S3.
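The fitting procedure described above can be illustrated with a short Python sketch. It is not the routine used in the study: the interpretation of the two coefficients as a gamma shape and rate, the use of the onset day x as the offset of each conditional component, the evaluation grid, and the random-perturbation search are all assumptions made for this example.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[int, Tuple[float, float]]  # onset day x -> (alpha_x, beta_x)


def offset_gamma_pdf(t: float, shape: float, rate: float, offset: float) -> float:
    """Gamma density in (t + offset); zero where t + offset <= 0."""
    u = t + offset
    if u <= 0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(u)
               - rate * u - math.lgamma(shape))
    return math.exp(log_pdf)


def mixture_pdf(t: float, onset_probs: Dict[int, float], params: Params) -> float:
    """Sum over x of P(S_A = x) times the conditional offset gamma density."""
    return sum(p * offset_gamma_pdf(t, params[x][0], params[x][1], x)
               for x, p in onset_probs.items())


def rms_distance(target: Callable[[float], float], onset_probs: Dict[int, float],
                 params: Params, grid: List[float]) -> float:
    """Root-mean-square difference between target and mixture on the grid."""
    squares = [(target(t) - mixture_pdf(t, onset_probs, params)) ** 2 for t in grid]
    return math.sqrt(sum(squares) / len(squares))


def minimise(target: Callable[[float], float], onset_probs: Dict[int, float],
             params: Params, grid: List[float], iterations: int = 2000,
             step: float = 0.05, seed: int = 0) -> Tuple[Params, float]:
    """Perturb one (alpha_x, beta_x) pair at a time, keeping improvements."""
    rng = random.Random(seed)
    best = dict(params)
    best_d = rms_distance(target, onset_probs, best, grid)
    keys = list(best)
    for _ in range(iterations):
        x = rng.choice(keys)
        a, b = best[x]
        trial = dict(best)
        trial[x] = (max(1e-3, a + rng.gauss(0, step)),
                    max(1e-3, b + rng.gauss(0, step)))
        d = rms_distance(target, onset_probs, trial, grid)
        if d < best_d:
            best, best_d = trial, d
    return best, best_d
```

Because a trial is accepted only when it reduces the distance, the returned value of D never exceeds its starting value; a fixed iteration count plays the role of the termination rule.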
While we do not claim that our conditional distributions give a precise description of the reality of SARS-CoV-2 transmission, we obtain from this process a model which closely approximates the unconditional infectivity profile inferred from previous literature, while never producing the unrealistic outcome that an individual infects another without themselves being infected.