Assessing the Suitability of Next-Generation Viral Outgrowth Assays to Measure Human Immunodeficiency Virus 1 Latent Reservoir Size

Abstract

Background. Evaluations of human immunodeficiency virus (HIV) curative interventions require reliable and efficient quantification of replication-competent latent reservoirs. The “classic” quantitative viral outgrowth assay (QVOA) has been regarded as the reference standard, although it is prohibitively resource and labor intensive. We compared 6 “next-generation” viral outgrowth assays, using polymerase chain reaction or ultrasensitive p24 to assess their suitability as scalable proxies for QVOA.

Methods. Next-generation QVOAs were compared with classic QVOA using single leukapheresis-derived samples from 5 antiretroviral therapy–suppressed HIV-infected participants and 1 HIV-uninfected control; each laboratory tested blinded batches of 3 frozen and 1 fresh sample. Markov chain Monte Carlo methods estimated extra-Poisson variation at aliquot, batch, and laboratory levels. Models also estimated the effect of testing frozen versus fresh samples.

Results. Next-generation QVOAs had similar estimates of variation to QVOA. Assays with ultrasensitive readout reported higher infectious units per million values than classic QVOA. Within-batch testing had 2.5-fold extra-Poisson variation (95% credible interval [CI], 2.1–3.5-fold) for next-generation assays. Between-laboratory variation increased extra-Poisson variation to 3.4-fold (95% CI, 2.6–5.4-fold). Frozen storage did not substantially alter infectious units per million values (−18%; 95% CI, −52% to 39%).

Conclusions. The data offer cautious support for use of next-generation QVOAs as proxies for more laborious QVOA, while providing greater sensitivities and dynamic ranges. Measurement of latent reservoirs in eradication strategies would benefit from high-throughput, scalable assays.

be incomplete or insufficient and may not recapitulate the in vivo context of latency reversal. After 20 years of use, QVOA remains laborious, and only recently have there been rigorous assessments of its performance compared with other reservoir measurement strategies [8][9][10].
Next-generation HIV reservoir assays have been developed that are less labor intensive and costly and require lower input cell numbers, but this reduction in input cells and replicate well analyses could affect assay sensitivity and precision. In addition, increased sensitivity achieved by enhanced detection methods may not necessarily reflect intact and replication-competent virus [4,11,12].
In 2019, our group reported findings from a rigorous blinded panel study comparing results from 4 laboratories performing classic QVOAs [10]. In the current follow-up study, we sought to assess 3 things: the precision of next-generation QVOA under real experimental conditions, the effect of cryopreservation on infectious units per million (IUPM) values, and the suitability of next-generation assays as proxies for classic QVOA.
Even under perfect conditions, there is some inherent and unavoidable variability in measuring the HIV reservoir by IUPM outgrowth owing to Poisson sampling variation of a rare population of infected cells, resulting in relatively wide variation in target cells even between split samples from the same collection. To address sources of variation, the Reservoir Assay Validation and Evaluation Network (RAVEN) Study Group developed methods and models for statistical analysis of the accuracy and precision of QVOA in 75 split samples from 5 ART-suppressed participants using 4 classic QVOAs, and measured the impact of cryopreservation and laboratory-specific practices on assay results [10]. Variation at 3 levels was described: between split samples in the same testing batch, between batches tested with the same assay, and between laboratories performing different assays [10]. That study provided evidence for a lack of substantial systematic differences in IUPM measurement between fresh and frozen samples, supporting the use of frozen samples for batched analysis to monitor the impact of reservoir-reducing treatments before and after interventions and avoid the logistical difficulties in performing QVOA on fresh cells.
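The unavoidable Poisson sampling variation described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the study's analysis code: it assumes that the number of infectious units in an aliquot follows a Poisson distribution whose mean is the true IUPM multiplied by the number of cells (in millions). The function names and the example values (1 IUPM, 2 million cells per aliquot) are hypothetical.

```python
import math


def expected_infected_cells(iupm: float, cells: int) -> float:
    """Expected number of infectious units in an aliquot of `cells` cells."""
    return iupm * cells / 1_000_000


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson(mean) model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_most(k: int, mean: float) -> float:
    """Probability of k or fewer events under a Poisson(mean) model."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


# Hypothetical example: true reservoir of 1 IUPM, aliquots of 2 million cells.
mean = expected_infected_cells(iupm=1.0, cells=2_000_000)  # 2 expected units
p_zero = poisson_pmf(0, mean)            # aliquot holds no infectious unit
p_four_plus = 1 - prob_at_most(3, mean)  # aliquot holds 4 or more units
```

Under these assumed values, `p_zero` is about 0.135 and `p_four_plus` is about 0.143, so two split aliquots from the same collection can plausibly contain 0 and 4 infectious units before any assay-related variation is added.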
In the current investigation, we estimated the sources of variation that influence next-generation QVOA relative to classic QVOA results. We applied statistical methods to estimate the extra variation introduced by experimental conditions beyond that expected from Poisson variation, and to estimate the suitability of next-generation QVOA to serve as a proxy for classic QVOA.

Study Design and Objectives
Assessing the suitability of next-generation assays as proxies for classic QVOAs requires considering that, within a large reservoir of cells containing HIV proviruses (HIV infected), a much smaller subset is measurable by QVOA [12][13][14]. Some fraction of this larger HIV-infected reservoir is clinically meaningful, but not the entire pool of HIV DNA-positive cells. Not all provirus-positive cells are replication competent; however, in vitro QVOAs may fail to measure all latently infected cells capable of expressing replication-competent HIV [12]. Next-generation assays arise from an attempt to develop more sensitive and less cumbersome methods to assess the clinically relevant and biologically meaningful pool of inducible infected cells. Using analytic methods developed in our assessment of classic QVOAs, we sought to understand how well several next-generation QVOAs correspond to classic QVOAs and to each other, and to assess the impact of freezing cells on next-generation assay performance.

Experimental Design
Participants in the RAVEN project were enrolled and followed up as part of the University of California, San Francisco, OPTIONS and SCOPE studies, with specific consent for apheresis collections and testing for this study as approved by the University of California, San Francisco, Committee on Human Research (Institutional Review Board). Study design and participant characteristics (Supplementary Table 1) were described in detail elsewhere [10].
Five ART-suppressed HIV-1-infected participants with well-suppressed viral replication for >3 years were included in the comparison, selected on the basis of the timing of ART initiation after infection; persons treated during acute infection (within 6 months) were excluded in order to have a reasonably established reservoir, increasing the potential for measurable and reproducible results. Specifically, participants were selected based on preexisting QVOA data to include participants with detectable and varying levels of previously characterized inducible virus, based on the Siliciano laboratory QVOA results, as reported by Eriksson et al [8]. One HIV-uninfected control was included, and all 6 participants underwent leukapheresis collections of mononuclear cells. Isolated peripheral blood mononuclear cells from each collection were divided into identical replicate aliquots for testing in blinded fresh and frozen panels with multiple classic QVOA and next-generation assays.

Analytical Methods
We used methods described elsewhere [10] to account for unavoidable Poisson variation and estimate additional sources of assay variation. The experimental design permitted us to identify 4 separate levels of random variation, which are as follows, from lowest to highest. First, additional variation may affect each aliquot of a split sample independently, even when the aliquots are measured by the same assay and in the same batch. Second, batch-to-batch variation may cause aliquots assayed in separate batches to tend to differ more than if they were assayed in the same batch. Third, a split sample with aliquots measured by 2 different assays may tend to differ more than if the aliquots were measured by the same assay. This extra variability may reflect differences in procedures (Table 2) that exist among assays of the same general type (ie, classic QVOA, enhanced-sensitivity QVOA, or next-generation QVOA). We measured this variability after correcting for systematic scale differences between assays. Fourth, a split sample with aliquots measured by assays of different types may tend to differ more than if the aliquots were measured using 2 assays of the same type. This additional variability may reflect the fact that the different types of assays are targeting different entities (although the different entities may be substantially correlated). We used this comparison primarily to assess how much each assay tended to differ from the 3 classic QVOAs, beyond how much the 3 differ from one another. These 4 levels of variation were modeled as random effects (normally distributed on the natural log scale, centered at 0), whereas systematic assay scale differences and frozen storage were both modeled as fixed effects. We used Markov chain Monte Carlo methods to obtain posterior medians and 95% CIs for all effects. In addition to modeling all 9 assays together, we compared estimates obtained by modeling subsets separately: the 3 classic QVOAs, the 2 enhanced-sensitivity QVOAs, the 4 next-generation QVOAs, and the 3 classic QVOAs plus each of the other 5 assays (sets of 4 assays modeled together). Finally, we modeled every possible pair of different assays (36 different sets of 2) to obtain the most direct estimates of variation between pairs of assays. Each model assumed that the aliquot and batch sources of variability, along with the effect of frozen storage, were the same for all assays in the model. For reporting, random effect sizes (standard deviations) were exponentiated to obtain fold increases in variation.
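The reporting convention in the last sentence (exponentiating a standard deviation on the natural log scale to obtain a fold variation) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the rule for combining levels assumes that the random effects are independent, so that their variances add on the log scale. The study's own combined estimates come from the fitted models and need not match this simple formula.

```python
import math


def fold_variation(sd_log: float) -> float:
    """Convert a random-effect SD on the natural log scale to a fold variation."""
    return math.exp(sd_log)


def combined_fold_variation(*folds: float) -> float:
    """Combine fold variations from independent levels (assumption: variances add).

    Each fold value is mapped back to a log-scale SD, the SDs are combined in
    quadrature, and the result is exponentiated again.
    """
    sds = [math.log(f) for f in folds]
    return math.exp(math.sqrt(sum(sd * sd for sd in sds)))


# Hypothetical inputs: 1.8-fold aliquot-level and 1.5-fold batch-level variation.
combined = combined_fold_variation(1.8, 1.5)
```

A fold variation of 1 corresponds to a standard deviation of 0, that is, no variation beyond Poisson sampling, which is why 1-fold is the minimum possible value.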
We used the above estimates to project a "bottom line" expected error in each next-generation or enhanced sensitivity assay, relative to the classic QVOAs. Specifically, median absolute error (in log10 terms) was computed, as in reference 9, from each model that included the 3 classic QVOAs plus 1 other assay. A range of IUPM values on the QVOA M scale was assumed, and the error estimates were adjusted for the fixed scale differences, so those differences did not count as error for any assay. We excluded batch-to-batch variation in these calculations, on the assumption that it could be avoided in practice by selecting samples to be run in the same batch. Variation due to assay type was assigned entirely to the alternative assay, reflecting an assumption that the classic QVOAs measure the most relevant biological entities.
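To make the error metric concrete, the sketch below computes a median absolute log10 error from paired values after dividing out a fixed scale factor, so that a systematic scale difference does not count as error. This is a simplified, hypothetical illustration: in the study the metric was projected from the fitted models rather than computed directly from paired measurements, and the function name and inputs here are assumptions.

```python
import math
from statistics import median


def median_abs_log10_error(estimates, references, scale_factor=1.0):
    """Median of |log10(estimate / scale_factor) - log10(reference)|.

    `scale_factor` is the assay's fixed fold difference from the reference
    scale; dividing by it removes the systematic offset before scoring.
    All values must be positive (zero IUPM estimates need separate handling).
    """
    if scale_factor <= 0 or any(v <= 0 for v in list(estimates) + list(references)):
        raise ValueError("all values must be positive to take logarithms")
    errors = [
        abs(math.log10(est / scale_factor) - math.log10(ref))
        for est, ref in zip(estimates, references)
    ]
    return median(errors)


# Hypothetical assay reading 2-fold high on average relative to the reference.
error = median_abs_log10_error([20.0, 200.0, 2.0], [1.0, 1.0, 1.0], scale_factor=2.0)
```

In this made-up example the scale-adjusted estimates are 10, 100, and 1 against a reference of 1, giving absolute log10 errors of 1, 2, and 0 and a median of 1.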

Immunosorbent Assays
The IUPM maximum likelihood estimates and 95% CIs of infection frequency for each aliquot are presented in Figure 1. Assays using polymerase chain reaction or digital p24 antigen (Ag) (Simoa) assays to detect ex vivo-induced virus demonstrated consistently higher IUPM values than classic QVOAs read out by standard-sensitivity p24 Ag enzyme-linked immunosorbent assay (ELISA) (Figure 2). QVOA SR was chosen as the reference because it is a classic QVOA with p24 Ag ELISA readout and also tended to have the lowest values. Relative to typical QVOA SR, the 2 other p24 Ag ELISA-based assays (QVOA S and QVOA M) averaged 2.2- and 3.1-fold higher values, respectively. The 2 QVOAs using ultrasensitive detection (usQVOA) methods (QVOA Simoa and QVOA RNA) averaged 4.5- and 28-fold higher IUPM values, respectively. Next-generation QVOAs reported 4.5- to 444-fold higher IUPM values. At these higher levels of detection, it is important to note the possibility of detecting p24 Ag or RNA produced by replication-defective virus [19], although approximately half of such cells producing RNA are replication competent [20].
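For readers unfamiliar with how an IUPM maximum likelihood estimate arises from limiting-dilution data, the following is a minimal sketch under the standard single-hit Poisson assumption: a well seeded with n cells is positive with probability 1 - exp(-f n), where f is the frequency of infectious units per cell. The function name, the bisection solver, and the example layout are assumptions for illustration; this is not the software used in the study, and it does not compute CIs.

```python
import math


def iupm_mle(cells_per_well, wells, positives):
    """Maximum likelihood IUPM from a limiting-dilution layout.

    cells_per_well[i], wells[i], positives[i] describe dilution i: cells seeded
    per well, number of replicate wells, and number of positive wells.
    """
    rows = list(zip(cells_per_well, wells, positives))
    # Total cells seeded into wells that stayed negative.
    negative_cells = sum(n * (w - p) for n, w, p in rows)
    if all(p == 0 for _, _, p in rows):
        return 0.0       # no positive wells: the likelihood peaks at zero
    if negative_cells == 0:
        return math.inf  # every well positive: the estimate is unbounded

    def score(f):
        """Derivative of the log-likelihood with respect to f (decreasing in f)."""
        total = -negative_cells
        for n, _, p in rows:
            x = f * n
            if p and x < 700:  # for larger x the term is negligible; avoids overflow
                total += p * n / math.expm1(x)
        return total

    lo, hi = 1e-15, 1.0
    while score(hi) > 0:       # widen the bracket if needed
        hi *= 2
    for _ in range(200):       # bisection on a geometric scale
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi) * 1_000_000


# Hypothetical 3-dilution layout: 1e6, 2e5, and 4e4 cells per well, 6 wells each.
estimate = iupm_mle([1_000_000, 200_000, 40_000], [6, 6, 6], [5, 2, 0])
```

As a check on the logic, a single dilution with half of the wells positive at 1 million cells per well gives f n = ln 2, that is, an estimate of about 0.69 IUPM.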

Random Variation Between QVOA Types
Random variation was observed at the aliquot, batch, and assay levels, even after correction for systematic effects between assays and unavoidable Poisson error (Tables 3 and 4). When all 9 assays were considered together, aliquots tested in different batches using the same assay had 2.3-fold excess variation (95% CI, 2.0-2.7-fold). Split aliquots tested using different assays varied 3.1-fold (2.6-3.9-fold) beyond Poisson variation and systematic assay differences. Combined aliquot plus batch variation was estimated to be lower for the 3 classic QVOAs than for the other 2 types of assay, but CIs did overlap (1.9-fold variation for classic QVOA vs 2.7-fold and 2.6-fold variation, respectively, for ultrasensitive QVOA and next-generation QVOA).

Effect of Cryopreservation on IUPM Values
Overall, cryopreservation caused small reductions in IUPM values, but increases or an absence of an effect could not be ruled out. In the primary models tested (Tables 3 and 4), the estimated systematic fixed effect of cryopreservation on IUPM measurements involved reductions of 18%-37%, and all 95% CIs spanned the absence of an effect. At low IUPM values, next-generation assays with higher readout scales tended to have smaller typical errors (median absolute log10 errors) than QVOA M and QVOA SR, owing to detection of more abundant targets (0.1 IUPM) (Supplementary Table 2). They did not similarly outperform QVOA S on this metric because of the high cell input, which is due to the high number of replicates performed in this QVOA. Differences in error diminished with increasing IUPM values, with the exception of iCARED caRNA1, which lost accuracy relative to other assays at high IUPM values; this is potentially attributable to a higher likelihood of all positive replicate wells at many dilutions and consequently a higher scale factor (typical IUPM output, 444-fold above QVOA SR) (Figure 2).

Pairwise Comparison of Variation Between Assays
Seven of the 9 assays studied had correlated readouts (random variation between all pairs, <2-fold) (Figure 3). These assays are all 3 classic QVOAs, both alternate-readout QVOAs, and both next-generation inducible QVOAs using gag templates (iCARED caRNA1 and cfRNA). Within this group, 3 pairs had particularly good agreement with each other (magnitude of between-assay variation not exceeding that of batch variation): iCARED caRNA1 and QVOA M, QVOA RNA and QVOA M, and iCARED cfRNA and QVOA S. The 2 other assays, TILDA and iCARED caRNA2, which both detect multiply spliced tat/rev transcripts, clustered together (random variation, <2-fold).

in reservoir size with desired precision. The classic QVOA, though historically the reference standard, is not scalable for routine use in clinical studies of latent reservoir-reducing interventions. In addition, the low readout scale of classic QVOA limits its sensitivity and dynamic range, and it is not routinely practical to obtain the larger peripheral blood mononuclear cell inputs needed to improve these assay characteristics. The greater sensitivity of next-generation assays reduces cell input requirements and increases the available dynamic range for measuring reductions in the size of the latent reservoir. Although these assays measure related targets (CD4+ cells harboring inducible provirus-derived p24 Ag or RNA), there is variation among experimental approaches, even within assay categories [21,22]. Even among classic QVOA procedures, there are differences in cell stimulation methods, feeder cell type, and even isolation of CD4+ cells (thus determining input cell numbers) [5,6,9,23]. Because of these differences, each assay measures a slightly different aspect of latency, reflected in both the systematic and the random variation observed between assays [4,14].
A single, complete measurement of the clinically relevant latent reservoir remains elusive [24]. Although the classic QVOA provides an underestimate, it is considered the best approximation, pending further progress. If systematic differences in IUPM measurements are quantified between types of assays (ie, replication-competent virus that is detectable as exponential progressive increases in supernatant p24 Ag detected by ELISA in classic QVOA vs induced virus supernatant or cell-associated HIV Ag or RNA detected by next-generation assays), then the more scalable next-generation assays could be used as proxies for classic QVOA, capitalizing on their enhanced sensitivity, relative precision, and dynamic range.
Once systematic differences in assay scale were accounted for, we found that next-generation assay readout both correlated with classic QVOA and exhibited similar levels of random variation. In some cases, the excess variation associated with using a next-generation assay as proxy for classic QVOA was found to be similar to that of batch-to-batch variation (Figure 3). This finding provides initial evidence that some next-generation assays may in fact be suitable proxies. It remains to be seen, however, whether responses to latency-reducing agents or other therapeutic interventions are similar across assays.
In experimental practice, combined aliquot and batch variation may be more relevant than variation at a single level, because aliquot- and batch-level variation would be combined when samples are tested in different batches. In theory, if a research study could batch samples from a participant, then only aliquot variation (the first source of excess variation described in Methods) would contribute to extra variability above Poisson variability. Batch size may be limiting because most laboratories cannot set up large batches; however, this might not be a factor if one needs to assay only 2-3 longitudinal samples from a participant to measure the efficacy of an intervention.
Assays using ultrasensitive means of monitoring QVOA culture supernatants tend to report approximately 4.5- to 28-fold higher IUPM values than classic QVOA when normalized to QVOA SR (Figure 2). The increased sensitivity of monitoring outgrowth by RNA or digital p24 Ag assays may come at the cost of the inability to distinguish clinically relevant virus that is capable of robust replication from defective or ineffective virus or nonpackaged viral RNA [4]. However, it has been recently shown that approximately half of the cell-free virions measured in the more sensitive QVOA RNA assay are replication competent [20], suggesting that the efficient propagation of these virions in culture may be a limiting step contributing to underestimation of the size of the reservoir given by cell culture-based assays.
Interestingly, newer modifications of the QVOA can increase the sensitivity by as much as 20-fold [25]. In addition, cells that produce HIV Ag, but not replication-competent virus, may merit clinical attention as contributors to immune activation and pathogenesis [26]. Cells carrying defective proviruses can produce viral RNA; therefore, higher IUPM values with RNA-based assays may reflect cells with defective proviruses or suboptimal efficiency of QVOAs. Only by demonstrating exponential increases in viral RNA over time can these assays demonstrate replication-competent virus.
In the current study, we found that cryopreservation had a <2-fold effect on IUPM estimates. The cryopreservation process, however, is complex, and the quality of the procedure can vary dramatically between study sites. It is important to note that experienced researchers carried out all laboratory procedures, and thus our results likely represent a best-case scenario. Our analysis also assumes that freezing causes a fixed fold change in all samples. If cryopreservation had opposite effects on different subsets of the reservoir, it could decrease assay reliability in a manner not captured by our study.
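The fixed fold change assumption mentioned above can be written out explicitly. The sketch below assumes that the frozen-storage fixed effect acts multiplicatively on IUPM (additively on the natural log scale), so that a reported percentage change maps to a single fold factor applied to every sample. The function names are hypothetical, and the example values are the pooled estimate and interval reported in the abstract.

```python
import math


def percent_change_to_log_effect(percent: float) -> float:
    """Convert a percentage change in IUPM to an additive effect on the log scale."""
    return math.log(1 + percent / 100)


def log_effect_to_percent_change(beta: float) -> float:
    """Convert a log-scale fixed effect back to a percentage change."""
    return (math.exp(beta) - 1) * 100


def apply_frozen_effect(fresh_iupm: float, percent: float) -> float:
    """Expected frozen-sample IUPM under a fixed fold change (assumption)."""
    return fresh_iupm * math.exp(percent_change_to_log_effect(percent))


def interval_includes_no_effect(lower_percent: float, upper_percent: float) -> bool:
    """True when a credible interval for the percentage change spans zero."""
    return lower_percent <= 0 <= upper_percent


# Pooled estimate from the abstract: -18% (95% CI, -52% to 39%).
frozen = apply_frozen_effect(fresh_iupm=10.0, percent=-18)
spans_zero = interval_includes_no_effect(-52, 39)
```

Under this assumption a fresh value of 10 IUPM corresponds to an expected 8.2 IUPM after freezing, and the interval includes zero change. A model of this form cannot represent the scenario raised in the text, in which freezing affects different subsets of the reservoir in opposite directions.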
Outgrowth assays using culture-based methods tend to underestimate the true size of the latent reservoir, and measurement of outgrowth depends on the capacity of infected cells to produce infectious virus on stimulation [5,22]. Further work is needed to clarify the genetic nature of HIV provirus and induced virions and thus the replication capacity of infected cells producing cell-associated and/or cell-free RNA. This should include genetic characterization of low-level virus observed in cultures lacking the robust kinetics required for detection with classic QVOA monitored by p24 ELISA.
Specifically, it will be important to determine whether induced viral RNA or virus with low-level growth kinetics at concentrations not detectable by ELISA are replication competent, and thus relevant to the reservoir that would rebound when treatment is interrupted. This would be informed by genetic characterization of the HIV transcripts detected and virus present at low levels in culture wells, and by assessment of the intactness of provirus producing such transcripts and virions. Preliminary studies have shown that examination of longitudinal outgrowth kinetics and single-genome sequencing analyses verified replication competence of reactivated virus in some cases [25]. However, what is critically needed for the field is to identify, evaluate, and validate methods that could accurately predict the time to rebound after treatment interruption. Although beyond the scope of this comparison study, development of a consortium similar to the RAVEN program to collaborate with therapeutic trials involving treatment interruption is needed to inform future eradication strategies.
Overall, our results offer cautious support for applying next-generation assays with systematically higher readouts as proxies for the more laborious and less sensitive classic QVOA. Analytical tools introduced by Rosenbloom et al [10] allow rigorous comparison of outgrowth-based dilution coculture and next-generation polymerase chain reaction assays that are designed to specifically quantify intact proviruses or transcripts, pointing the way toward precise, efficient assessments of HIV cure strategies [26][27][28]. The RAVEN program is now executing larger-scale studies evaluating many of these assays.

Figure 3. Between-assay random effect (fold variation in infectious units per million [IUPM] values). Comparison of extra-Poisson variation in split samples tested by different assays. Purple boxes indicate pairs of assays with <2-fold excess random variation; orange boxes, pairs with <1.3-fold excess variation; black circles, median estimates; blue shaded areas, upper limit of credible interval; and size of white center, lower limit of credible interval. One-fold variation is the minimum possible, corresponding to no excess variation after correction for any systematic scale effects (Figure 2). The fold variation in IUPM values is the result of exponentiating the standard deviation of the random effect modeled on the natural log scale. Abbreviations: caRNA1, cell-associated human immunodeficiency virus (HIV) gag RNA; caRNA2, cell-associated HIV tat-rev RNA; cfRNA, cell-free HIV RNA; iCARED, inducible cell-associated RNA expression in dilution; QVOA, quantitative viral outgrowth assay; QVOA M, QVOA by University of Pittsburgh; QVOA RNA, QVOA by University of California, San Diego, with HIV RNA readout; QVOA S, QVOA by Johns Hopkins University; QVOA Simoa, QVOA by Southern Research using Simoa readout; QVOA SR, QVOA by Southern Research; TILDA, tat/rev-induced limiting dilution assay.

Table 4. Estimated Extra-Poisson Variation for Effect of Cryopreservation for Classic Quantitative Viral Outgrowth Assay Versus Next-Generation Assays, After Adjustment for Fixed Scale Differences Between Assays at the Aliquot, Batch, and Assay Levels
Abbreviations: caRNA1, cell-associated human immunodeficiency virus (HIV) gag RNA; caRNA2, cell-associated HIV tat-rev RNA; cfRNA, cell-free RNA; CI, credible interval; NA, not applicable; QVOA, quantitative viral outgrowth assay; QVOA RNA, QVOA by University of California, San Diego, with HIV RNA readout; QVOA Simoa, QVOA by Southern Research using Simoa readout; TILDA, tat/rev-induced limiting dilution assay.