Syphilis Laboratory Guidelines: Performance Characteristics of Nontreponemal Antibody Tests

Abstract
We reviewed the relevant syphilis diagnostic literature to address the following question: what are the performance characteristics, stratified by the stage of syphilis, for nontreponemal serologic tests? The database search included key terms related to syphilis and nontreponemal tests from 1960–2017, and for data related to the venereal disease research laboratory test from 1940–1960. Based on this review, we report the sensitivity and specificity for each stage of syphilis (primary, secondary, early latent, late latent, or unknown duration; tertiary as well as neurosyphilis, ocular syphilis, and otic syphilis). We also report on reactive nontreponemal tests in conditions other than syphilis, false negatives, and automated nontreponemal tests. Overall, many studies were limited by their sample size, lack of clearly documented clinical staging, and lack of well-defined gold standards. There is a need to better define the performance characteristics of nontreponemal tests, particularly in the late stages of syphilis, with clinically well-characterized samples. Published data are needed on automated nontreponemal tests. Evidence-based guidelines are needed for optimal prozone titrations. Finally, improved criteria and diagnostics for neurosyphilis (as well as ocular and otic syphilis) are needed.

Since Treponema pallidum cannot be cultured, and direct detection methods are not routinely available in most clinical settings, the detection of nonspecific or nontreponemal and treponemal antibodies forms the mainstay of syphilis laboratory diagnoses. Of note, the terms "nonspecific" or "nontreponemal" antibodies would be more accurately termed "antiphospholipid" antibodies, since they represent host antibodies made in response to phosphatidylcholine taken up from mammalian tissue by T. pallidum. However, these terms are commonly used in the literature and in clinical practice, so we have elected, for clarity, to use them in this document. For further discussion of treponemal-specific antibodies, please see Park et al in this issue. Antiphospholipid antibodies are used in combination with treponemal antibodies in the clinical context to help diagnose infections with T. pallidum. The primary antiphospholipid antibody tests in current use are the rapid plasma reagin (RPR), the venereal disease research laboratory (VDRL) and, to a much lesser extent, the toluidine red unheated serum test (TRUST) and unheated serum reagin. Both nonautomated and automated platforms are available to detect these antibodies. Nontreponemal tests are generally performed on serum, but some may also be performed on cerebrospinal fluid (CSF) to aid in the diagnosis of neurosyphilis. Finally, nontreponemal antibody titers are used to monitor treatment responses, although this is not the focus of this review [1].
The diagnosis of any syphilis stage relies on a clinical evaluation of patient symptoms and medical history, as well as on an interpretation of laboratory tests. As syphilis rates continue to rise throughout the United States, there is a need to systematically identify the performance characteristics of nontreponemal tests to aid laboratorians as they seek to provide support to clinicians for syphilis diagnoses. We sought to review the literature to address this question: what are the performance characteristics, stratified by the stage of syphilis, for nontreponemal serologic tests?
The combined literature search resulted in 4534 documents. Abstracts for these 4534 publications were reviewed, papers not relevant to syphilis or to nontreponemal syphilis serologies were excluded, and 452 publications were selected for full review. Those that did not address the relevant topic, did not include primary data, related solely to the serofast state, or focused on tests not currently in use were excluded. This yielded 138 papers that were abstracted. A review of references of these 138 papers revealed an additional 2 relevant publications for a total of 140 papers. Additionally, data were obtained from the Food and Drug Administration (FDA) on the 2 approved, automated nontreponemal tests whose primary data had not yet been published.
Papers were assigned to several categories: (1) false negatives; (2) positive nontreponemal tests in conditions other than syphilis (including subsections for pregnancy, leprosy, illicit drug use, malaria, human immunodeficiency virus [HIV], hepatitis C virus, autoimmune diseases, endemic treponematoses, and vaccinations); (3) primary syphilis; (4) secondary syphilis; (5) early latent syphilis; (6) late latent syphilis; (7) tertiary syphilis; (8) neurosyphilis; (9) ocular syphilis; (10) otic syphilis; and (11) automated nontreponemal tests. Papers were additionally categorized by relevance and quality, into those which were most relevant and of high quality, those of moderate quality and relevance, and those of lower quality and relevance, based on specific criteria detailed separately in each syphilis category below. Several studies reported on composite endpoints (eg, primary and secondary syphilis together or "nonspecific" latent syphilis), which limited their impact. Data abstracted from these studies were not included in the findings below.

RESULTS AND DISCUSSION
Overall, many of the studies reviewed were retrospective and had small sample sizes. Many were also limited by a lack of clearly documented clinical staging and well-defined gold standards. Documentation of the treatment status or the time interval between treatment and sample collection was not always clear. There were no published papers examining the performance characteristics of the 2 existing FDA-approved, automated nontreponemal assays at the time that the literature review was conducted. With these caveats, more detailed findings for each subgroup follow below.

Conclusions
Serum RPR and VDRL are 62-78% sensitive for the diagnosis of primary syphilis.

Secondary Syphilis
For secondary syphilis, we identified 11 high-quality papers with a clearly defined gold standard (see Tables 1 and 2) [2, 4, 6-9, 12, 16-18, 20] and 9 lower-quality papers with a less well-defined gold standard [23, 25-31, 101]. There was some heterogeneity in gold-standard definitions within the high-quality papers. Gold standards for the 11 high-quality papers included a clinical diagnosis, a clinical diagnosis "with positive darkfield," and a clinical diagnosis with reactive treponemal or alternative nontreponemal tests (eg, RPR, VDRL, or reagin screening test). Based on high-quality papers, the sensitivity of the VDRL was 100%. RPR was also reported as 100% sensitive, with the exception of 1 high-quality paper that reported a sensitivity of 97.2% [8]. No studies reported on specificity in the setting of secondary syphilis.

Conclusions
Serum RPR and VDRL are 97-100% sensitive for the diagnosis of secondary syphilis.

Early Latent Syphilis
We identified 4 papers on early latent syphilis, and 2 were deemed high-quality based on having a well-defined gold standard [17,20] (see Tables 1 and 2). The 2 others with less well-defined gold standards [23,27] were deemed to be of lower quality. Based on these 4 studies, the overall sensitivity of VDRL ranged from 82.1-100%. Based on the 2 high-quality studies, the sensitivity of VDRL ranged from 85-100% [17,20].

Conclusions
Based on a small number of studies (2 high-quality and 2 lower-quality studies), the sensitivity of VDRL for early latent syphilis ranges from 82-100% overall, and ranges from 85-100% based on the 2 high-quality papers.

Late Latent Syphilis or Syphilis of Unknown Duration
We identified 2 high-quality papers with a clearly defined gold standard [17,21] (see Tables 1 and 2) and 2 lower-quality papers with a less well-defined gold standard for late latent syphilis or syphilis of an unknown duration (see Table 1) [23,27]. There was 1 high-quality study that reported a sensitivity of 61% for RPR [21]. The other 3 studies reported a sensitivity of 64-75% for VDRL [17,23,27]; among these, the other high-quality study reported a sensitivity of 64% for VDRL [17].

Conclusions
Based on a small number of studies (2 high-quality and 2 lower-quality studies), the sensitivity of RPR and VDRL for diagnosing late latent syphilis ranges from 61-75% overall, and ranges from 61-64% based on the 2 high-quality papers.

Tertiary Syphilis
Only 2 lower-quality papers, for which the gold standard was not well defined, were identified for tertiary syphilis (see Tables 1 and 2) [27,31]. The first, based on 17 patients, reported a VDRL sensitivity of 47% [27]. The second, based on 58 patients, reported a VDRL sensitivity of 64% [31].
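To illustrate how little precision samples of 17 and 58 patients afford, the following minimal Python sketch computes a Wilson score 95% confidence interval around a reported sensitivity. The counts of reactive results (8 of 17 and 37 of 58) are back-calculated assumptions chosen to match the reported 47% and 64%; they are not taken from the cited papers, and the function name is illustrative only.

```python
import math
from typing import Tuple


def wilson_interval(reactive: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    reactive: number of confirmed cases with a reactive test
    total:    number of confirmed cases tested
    z:        normal quantile (1.96 corresponds to a 95% interval)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = reactive / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, assumed only so that they reproduce the reported percentages.
for reactive, total in [(8, 17), (37, 58)]:
    low, high = wilson_interval(reactive, total)
    print(f"{reactive}/{total}: sensitivity {reactive / total:.1%}, "
          f"95% CI {low:.1%} to {high:.1%}")
```

Under these assumed counts, the interval for the 17-patient study spans roughly 40 percentage points, which is consistent with describing these data as very limited.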

Conclusions
Based on 2 studies with very limited data, the sensitivity of serum VDRL for tertiary syphilis is 47-64%.

Unspecified Syphilis Stage
We identified 12 high-quality studies [16, 74, 89-94, 101-104] with a well-defined gold standard and 8 lower-quality studies [28, 30, 95, 105-109] with a less well-defined gold standard, all of which provided little or no information on the syphilis stage. Without information on the syphilis stage, these papers were more difficult to interpret. However, the papers which contained data on both RPR and VDRL provided some useful information. There were 6 high-quality papers that reported a similar or higher sensitivity of the serum RPR as compared to the serum VDRL, independent of syphilis stage [74, 89, 90, 92-94], and 1 study that reported a lower sensitivity of RPR as compared with VDRL [91]. In addition, 3 high-quality studies reported a higher specificity of RPR as compared to VDRL [74, 92, 94], while 1 high-quality study reported a slightly lower specificity of RPR as compared to VDRL [91]. Importantly, the serum RPR and VDRL titers were not equivalent (in 1 study, only 29% of sera had concordant titers), suggesting that serum RPR and VDRL titers should not be used interchangeably to manage patients [93] (see Table 1).

Table footnotes: For full references, see the text of the paper and Table 1. Abbreviations: CSF, cerebrospinal fluid; RPR, rapid plasma reagin; VDRL, venereal disease research laboratory. a, 1 high-quality paper reported a sensitivity of 92.7%. b, 1 high-quality paper reported a sensitivity of 50%. c, 1 high-quality paper reported a sensitivity of 97.2%.

Conclusions
In general, serum RPR appears to be more sensitive than serum VDRL at detecting nontreponemal antibodies, independent of the syphilis stage. Based on more limited data, RPR also appears to be more specific than VDRL. Serum RPR and VDRL titers should not be used interchangeably to manage patients.

Neurosyphilis
A neurosyphilis diagnosis is challenging, as no single laboratory test is perfectly sensitive and specific for a diagnosis. Additionally, clinical guidelines focus on obtaining lumbar punctures primarily in situations in which patients are symptomatic, as the significance of CSF abnormalities in the absence of symptoms is unclear. Assessing the literature was difficult as well. Definitions for neurosyphilis differed between studies. Additionally, some definitions were circular (for example, CSF VDRL was often included as part of the definition of neurosyphilis in studies reporting on the sensitivity of CSF VDRL), which complicates an interpretation of nontreponemal test performance characteristics. A mixture of symptomatic and asymptomatic patients was included in many studies, and it was not always possible to determine nontreponemal test characteristics separately for these groups. We identified 4 high-quality studies with a gold standard that did not include a CSF nontreponemal test [32-35] (see Tables 1 and 2); 13 moderate-quality studies where the gold-standard definition was partially circular (ie, included some element of the nontreponemal test itself) [17, 40, 110-120]; and 2 lower-quality studies with a less well-defined gold standard [121,122]. We identified 2 high-quality case reports [36,38] (and 2 lower-quality reports [123,124]) and 1 high-quality study reporting on the amount of CSF blood contamination required to show a false-positive CSF VDRL [37] (see Table 1).
Based on the 4 high-quality studies, the sensitivity of the CSF VDRL ranged from 49-87.5% and the specificity ranged from 74-100% for diagnosing neurosyphilis [32-35]. In the studies in which the specificity was <90%, the gold standard was either CSF PCR for T. pallidum, symptoms (defined as new vision or hearing loss), clinical signs and symptoms with a positive CSF Treponema pallidum particle agglutination (TPPA) assay, or CSF white blood cells ≥10 and a positive CSF TPPA assay [33-35]. The gold standard for the study showing 100% specificity was positive serologies, reactive CSF fluorescent treponemal antibody (FTA), increased CSF protein >45 mg/dL, and CSF pleocytosis ≥10 cells/mm³ [32]. The sensitivity of CSF RPR ranged from 51.5-81.8% and the specificity ranged from 81.8-100% [32,34,35]. For symptomatic neurosyphilis, the sensitivity of CSF VDRL ranged from 66.7-87.5% and the specificity ranged from 78.2-90.2% [32-35]. For symptomatic neurosyphilis, the sensitivity of CSF RPR ranged from 51.5-100% and the specificity ranged from 89.7-90.2% [32,34,35]. However, case definitions for symptomatic neurosyphilis had significant heterogeneity (2 studies by Marra et al [33,34] defined symptoms as "vision or hearing loss"; for the remaining studies, symptoms were not further defined). There was 1 high-quality study that reported a sensitivity of 76.2% and specificity of 93.1% for CSF TRUST [35]. Limited data suggest that the sensitivity of CSF RPR may be lower than that of CSF VDRL [34]. There were 2 good-quality case reports describing false-positive CSF VDRL results in the setting of central nervous system malignancy [36,38]. Finally, 1 paper reported on a laboratory experiment in which syphilitic blood was added to CSF, demonstrating the principle that blood contamination of the CSF during a traumatic tap could lead to a false-positive CSF VDRL in a patient with syphilis, particularly a patient with a high VDRL serum titer [37].
(For a discussion of the use of CSF treponemal antibodies in neurosyphilis, please see Park et al in this issue.) Studies have suggested the neurosyphilis risk is highest in persons living with HIV with a serum RPR of >1:32 [125-127].

Conclusions
A neurosyphilis diagnosis is challenging. Different definitions for neurosyphilis across studies, the heterogeneity of gold standards used, and the inclusion of a mixture of symptomatic and asymptomatic patients were highlighted as limitations. Based on current data, there is no recommendation for the use of 1 assay over the other in the laboratory diagnosis of neurosyphilis, though limited data suggest that CSF RPR may be less sensitive than CSF VDRL. CSF VDRL is 49-87% sensitive and 74-100% specific for diagnosing neurosyphilis. CSF RPR is 51-82% sensitive and 82-100% specific for diagnosing neurosyphilis. Due to significant heterogeneity in case definitions, a lack of gold standards, and the wide range of results, it is not possible to give definitive information on the performance characteristics of CSF nontreponemal tests in neurosyphilis diagnosis.

Conclusions
The sensitivity of CSF VDRL in ocular syphilis is <50%.

Otic Syphilis
Similar to ocular syphilis, the diagnosis of otic syphilis relies on a clinical assessment in the setting of reactive serum serologies, as over 90% of persons with otic syphilis may have normal CSF parameters. We identified 4 lower-quality papers, all case series (see Table 1) [57-60]. Of these, 2 reported a sensitivity of CSF VDRL ranging from 5.4-5.6% [59,60] (see Table 2). The sensitivity of serum VDRL was reported in 1 study at 22.2% [59], and the sensitivity of serum RPR was reported in 2 studies at 55-88.9% [57,58].

Conclusions
Limited data suggest that the sensitivity of CSF VDRL is poor (<10%) in otosyphilis.

False Negatives: Prozone Reaction
We identified 2 high-quality papers with a clear gold standard [61,63] and 1 high-quality case report [62], as well as 2 lower-quality case reports [128,129] (see Table 1). Based on the 2 high-quality papers, false-negative results from nontreponemal syphilis tests are rare (<0.85% of those tested) [61,63]. The prozone reaction generally refers to a false-negative response arising from cases in which high antibody titers interfere with the antigen-antibody lattice network formation necessary to visualize a positive flocculation test [57]. The prozone reaction can occur during any stage of syphilis but is more common in primary and secondary syphilis; neurosyphilis and pregnancy may increase the risk of the prozone reaction [61,63]. There was 1 large study of 46 856 samples that reported that a third of prozone reactions may occur when titers are ≤1:16 [63]. Finally, 1 study reported that a cold centrifuge led to false negatives [62]. There was no formal guidance on the optimal number of serial dilutions when investigating a possible prozone reaction. In 1 study, titers were rechecked with serial dilutions up to 16-fold [61]; in the other, they were diluted from 1:1 to 1:32 [63].
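The 2 dilution schemes described above can be written out explicitly. The following minimal Python sketch assumes twofold (doubling) dilution steps and reads "up to 16-fold" as a series ending at 1:16; neither assumption is stated in the studies as summarized here. The function names and the example reactivity pattern are hypothetical, and the prozone check encodes only the basic principle (nonreactive when undiluted, reactive on dilution), not any laboratory's actual interpretive rules.

```python
from typing import Dict, List


def dilution_series(max_dilution: int) -> List[int]:
    """Twofold series of dilution factors: 1, 2, 4, ... up to max_dilution."""
    if max_dilution < 1 or (max_dilution & (max_dilution - 1)) != 0:
        raise ValueError("max_dilution must be a power of two (1, 2, 4, ...)")
    series = []
    factor = 1
    while factor <= max_dilution:
        series.append(factor)
        factor *= 2
    return series


def shows_prozone(reactive_by_dilution: Dict[int, bool]) -> bool:
    """True when undiluted serum (1:1) is nonreactive but a higher dilution is reactive."""
    if 1 not in reactive_by_dilution:
        raise ValueError("a result for undiluted serum (factor 1) is required")
    undiluted_reactive = reactive_by_dilution[1]
    diluted_reactive = any(
        reactive for factor, reactive in reactive_by_dilution.items() if factor > 1
    )
    return (not undiluted_reactive) and diluted_reactive


# Series ending at 1:16 versus 1:32 (assumed twofold steps).
print(["1:%d" % f for f in dilution_series(16)])
print(["1:%d" % f for f in dilution_series(32)])

# Hypothetical pattern: nonreactive at 1:1 and 1:2, reactive from 1:4 onward.
example = {1: False, 2: False, 4: True, 8: True, 16: True, 32: True}
print("prozone suspected:", shows_prozone(example))
```

Under these assumptions, the series ending at 1:32 adds only 1 tube to the series ending at 1:16.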

Conclusions
False-negative results from nontreponemal syphilis tests are rare (<0.85% of those tested). The prozone phenomenon can occur during any stage of syphilis but is more common in primary and secondary syphilis; neurosyphilis and pregnancy may increase the risk of the prozone reaction. A third of prozone reactions may occur when titers are ≤1:16. False negatives may be more common if sera are centrifuged at colder temperatures (eg, 4℃ vs 27℃).

Conclusions
In most large studies with a well-defined gold standard (generally a reactive treponemal test), the overall prevalence of biological false positives (BFPs) in the general populations tested for syphilis was ≤1.5%. Several factors were associated with BFPs, including older age, certain autoimmune diseases, leprosy, yaws, and HIV. More limited data exist on the associations of malaria, hepatitis B infection, hepatitis C infection, and injecting drugs with increased risks of BFPs. The data for the association of BFPs and pregnancy are conflicting. The data do not suggest a significant impact of vaccination on BFPs. In general, BFPs tend to occur when the nontreponemal titers are low, but there is clear documentation of exceptions where BFPs occur with higher nontreponemal titers. The duration of BFPs may depend on their underlying etiology. BFPs, in most cases, tend to revert to nonreactive.

Automated Nontreponemal Tests
There are limited data available comparing automated nontreponemal tests (platforms which allow for the automated qualitative and quantitative detection of nontreponemal antibodies in serum or plasma) and manual nontreponemal tests [28, 89, 95-98]. Only 2 automated assays are currently FDA approved (see Table 1) [99,100]. Data vary, but studies suggest similar overall performances of automated and nonautomated tests. A large number of samples were tested in the unpublished data submitted to the FDA, although not all the sample testing was relevant. For 1 test where the comparator was an RPR card test, the overall agreement was close to 100%, with the percent positive agreement (PPA) ranging from 95-100% and the percent negative agreement ranging from 99.1-100% [99]. For the other, in which the comparator was "an FDA-approved RPR test," the PPA ranged from 81.5-100% (with 1 instance, involving only 6 samples, in which the PPA was 0%) and the percent negative agreement ranged from 80.7-98% [100]. The automated tests may only provide a limited range of titer dilutions, beyond which manual procedures are necessary to achieve endpoint titration. Endpoint titers are necessary for patient management.
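For readers less familiar with agreement statistics, the following minimal Python sketch shows how percent positive agreement and percent negative agreement are conventionally computed from a 2×2 comparison of a new test against a comparator test. "Agreement" is used rather than sensitivity and specificity because the comparator is itself an imperfect test. All counts below are hypothetical and are not drawn from the FDA submissions; the function and argument names are illustrative.

```python
import math
from typing import Dict


def agreement(both_pos: int, test_pos_only: int,
              comparator_pos_only: int, both_neg: int) -> Dict[str, float]:
    """Percent positive, negative, and overall agreement against a comparator.

    both_pos:            reactive on both the new test and the comparator
    test_pos_only:       reactive on the new test, nonreactive on the comparator
    comparator_pos_only: nonreactive on the new test, reactive on the comparator
    both_neg:            nonreactive on both
    """
    comparator_pos = both_pos + comparator_pos_only
    comparator_neg = both_neg + test_pos_only
    total = comparator_pos + comparator_neg
    return {
        "ppa": 100 * both_pos / comparator_pos if comparator_pos else math.nan,
        "npa": 100 * both_neg / comparator_neg if comparator_neg else math.nan,
        "overall": 100 * (both_pos + both_neg) / total if total else math.nan,
    }


# Hypothetical panel of 1000 sera.
large_panel = agreement(both_pos=95, test_pos_only=2, comparator_pos_only=5, both_neg=898)
print({name: round(value, 1) for name, value in large_panel.items()})

# Hypothetical small stratum: 6 comparator-reactive sera, none reactive on the new test.
small_stratum = agreement(both_pos=0, test_pos_only=0, comparator_pos_only=6, both_neg=50)
print({name: round(value, 1) for name, value in small_stratum.items()})
```

The second call illustrates why a PPA of 0% from a stratum of only 6 samples is hard to interpret: a handful of discordant sera determines the whole estimate.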

Conclusions
Based on limited data (including some unpublished manufacturer's data submitted to the FDA), the automated nontreponemal tests appear to have reasonable performance as compared to the nonautomated tests. However, manual procedures may be necessary to achieve endpoint titration, which is critical for patient management.

GAPS IN THE FIELD AND OVERALL CONCLUSIONS
Overall, there is a need to better define the performance characteristics of nontreponemal tests, particularly in neurosyphilis and the latent stages of syphilis, with clinically well-characterized samples, including in populations living with and without HIV. Published data are needed on FDA-approved, automated nontreponemal tests. Additionally, questions remain regarding whether titers from automated tests are interchangeable with manual tests, and what recommendations should be made regarding manual testing proficiency (especially as expertise declines with the introduction of the automated tests). Data are needed to better define the performance characteristics of nontreponemal tests in neurosyphilis. Studies which exclude the nontreponemal test in question as part of the gold-standard definition would be particularly valuable. Additional data are needed to gain a better understanding of the relationship between disease activity and nontreponemal antibody titers. Evidence-based guidelines for prozone titrations and improved criteria and diagnostics for neurosyphilis (as well as ocular and otic syphilis) are needed.