Contemporary Diagnostic Imaging Modalities for the Staging and Surveillance of Melanoma Patients: a Meta-analysis

Background Meta-analyses were performed to examine the utility of ultrasonography, computed tomography (CT), positron emission tomography (PET), and a combination of both (PET-CT) for the staging and surveillance of melanoma patients. Methods Patient-level data from 74 studies containing 10 528 patients (between January 1, 1990, and June 30, 2009) were used to derive characteristics of the diagnostic tests used. Meta-analyses were conducted by use of Bayesian bivariate binomial models to estimate sensitivity and specificity. Diagnostic odds ratios [ie, (true-positive results/false-negative results)/(false-positive results/true-negative results)] and their 95% credible intervals (CrIs) and positive predictive values were used as indicators of test performance. Results Among the four imaging methods examined for the staging of regional lymph nodes, ultrasonography had the highest sensitivity (60%, 95% CrI = 33% to 83%), specificity (97%, 95% CrI = 88% to 99%), and diagnostic odds ratio (42, 95% CrI = 8.08 to 249.8). For staging of distant metastases, PET-CT had the highest sensitivity (80%, 95% CrI = 53% to 93%), specificity (87%, 95% CrI = 54% to 97%), and diagnostic odds ratio (25, 95% CrI = 3.58 to 198.7).

Staging guidelines for distant metastatic melanoma published by the National Comprehensive Cancer Network indicate that melanoma patients with regional lymph node involvement (ie, American Joint Committee on Cancer stage III) (1) should undergo diagnostic imaging at the time of diagnosis (2), although the detection rate is low, particularly in subsets of asymptomatic patients with microscopically detected disease in the lymph nodes (stage IIIA) (3). The guidelines further state that those with less advanced disease (ie, stage IA or II) with clinical indications (ie, suspected or palpable lymph nodes) should be considered for imaging. Although sentinel lymph node biopsy is the acknowledged gold standard for pathological staging of clinically lymph node-negative patients (4), in some clinical settings, ultrasound has also been used for preoperative lymph node assessment and postoperative surveillance (5,6). Positron emission tomography (PET) and a combination of PET and computed tomography (CT) (PET-CT) have rapidly gained acceptance as the imaging modalities of choice for identifying metastatic melanoma but have often been applied without regard to tumor-specific risk strata or known benefits (7)(8)(9)(10). Given the limited nature of health-care resources, it is critical to examine these and other new technologies as they emerge.
In 2006, the number of melanoma survivors in the United States was estimated to be more than four million (11), largely as a result of the successful treatment of most patients with newly diagnosed early-stage melanoma (12). In up to 50% of these patients, however, the tumor may recur (13)(14)(15), with the risk of first recurrence being greatest in the initial years after diagnosis (16)(17)(18). It has been estimated that 20% of all first recurrences occur locally, 50% occur in the regional lymph nodes, and 30% arise at distant sites (19)(20)(21)(22). Although surgical resection continues to be the standard treatment for local and regional recurrences, reports of surgical resection or metastasectomy for distant recurrence in select patients have also been associated with improved survival (23)(24)(25)(26)(27). These optimistic reports of survival after salvage surgical resection of melanoma recurrences offer a rationale for defining optimal follow-up strategies. Despite the benefit of early detection of locoregional (19,28,29) or distant (23)(24)(25)(26)(27) recurrences in these patients, there are no evidence-based guidelines for their surveillance, and clinical practice patterns vary widely.
Currently, the most commonly used imaging modalities for melanoma patients include ultrasonography, CT, PET, and PET-CT. The proposed advantage of PET-CT is that differences in metabolism and function can be detected that complement anatomical imaging techniques (30). Several studies (7)(8)(9)(10)(31) have reported characteristics of the individual diagnostic imaging tests for the evaluation of melanoma recurrences. However, the utility of each modality as applied to various clinical scenarios has not been examined, and the modalities have not been directly compared. The objective of this meta-analysis was to analyze the contemporary literature related to diagnostic imaging in melanoma patients and to compare the test characteristics of various imaging modalities, such as ultrasonography, CT, PET, and PET-CT, for the staging and surveillance of patients with melanoma.

Prior knowledge
Melanoma may recur in up to 50% of melanoma survivors, especially during the first years after diagnosis. Positron emission tomography (PET) and a combination of PET and computed tomography (CT) (PET-CT) have gained acceptance as the imaging modalities to identify recurrence in survivors and to stage lymph nodes and metastatic melanoma, but evidence-based data on their risks and benefits are scarce.

Study design
Meta-analysis of patient-level data from published studies was used to derive characteristics (sensitivity, specificity, diagnostic odds ratio, and positive predictive value) of the four diagnostic imaging modalities.

Contribution
Among the four imaging methods examined, for regional lymph node staging, ultrasonography had the best performance. For staging of distant metastases, PET-CT had the best performance. Similar patterns were observed for surveillance of melanoma survivors for lymph node involvement and for distant metastases.

Implications
The superior modality for lymph node staging and detecting lymph node involvement was ultrasonography. The superior modality for staging and detecting distant metastases was PET-CT.

Limitations
Diagnostic criteria and the quality of the imaging equipment for each modality varied during the period studied. Most studies included in this meta-analysis had a retrospective design.

A comprehensive literature search of MEDLINE (from January 1, 1990, through June 30, 2009), EMBASE (from January 1, 2001, through June 30, 2009), Cancerlit (from January 1, 1990, through October 31, 2002), and the Controlled Trials Register from the Cochrane Library (from January 1, 1990, through June 30, 2009) was performed with the following keywords: "melanoma"; "lymph node metastasis"; "ultrasound"; "computed tomography"; "positron emission tomography"; and "positron emission tomography with computerized tomography." Articles identified from the search were reviewed in detail and included in the analysis if they met the following criteria: 1) included more than 10 patients with melanoma and 2) included comparisons of single or multiple imaging modalities (ie, ultrasonography, CT, PET, and/or PET-CT) to a gold standard. For primary staging of regional lymph nodes, sentinel lymph node biopsy with pathological confirmation is the gold standard for clinically lymph node-negative patients (2,5,32). For surveillance studies, a minimum of 6 months of follow-up was required for clinical confirmation. No language restrictions were applied, and additional references in identified articles were also reviewed for inclusion.

Studies and Patients Included in the Meta-Analysis
The literature search yielded 1096 unique citations. In total, 1020 (93%) citations were excluded. The two most common reasons for exclusion were inadequate reporting of patient-level data that were required to calculate test characteristics and/or lack of a reported gold standard. Two reports that met inclusion criteria were excluded because of overlapping study populations (33,34). Thus, 74 studies containing 10 528 patients were included in this meta-analysis.
Patient-level data were extracted and used to construct two-by-two tables. Each melanoma patient who was included in this study had undergone ultrasonography, CT, PET, or PET-CT. Their test results had been classified as true positive, true negative, false negative, or false positive, with the histological analysis of lymph node specimens or distant metastasis specimens or the outcome after long-term follow-up (ie, >6 months) serving as the gold standards. Diagnostic test characteristics were analyzed according to standard definitions for each individual study: sensitivity [TP/(TP + FN)], specificity [TN/(FP + TN)], false-negative rate [FN/(FN + TP), or 1 − sensitivity], and the positive predictive value [TP/(TP + FP)] (where TP is the number of patients with a true-positive result, TN is the number of patients with a true-negative result, FP is the number of patients with a false-positive result, and FN is the number of patients with a false-negative result). Accuracy was calculated as [(TP + TN)/number of patients in the study]. Several studies reported true-positive, false-positive, false-negative, and true-negative results at the patient level, whereas others reported these values at the lesion or assessment level. In the latter instances, independence between lesions on the same image and between different assessments for the same patient was assumed.
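The per-study definitions above can be illustrated with a minimal Python sketch. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any study in the meta-analysis; the formulas are the standard definitions given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's two-by-two table (hypothetical container)."""
    tp: int  # true-positive results
    fp: int  # false-positive results
    fn: int  # false-negative results
    tn: int  # true-negative results

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    @property
    def false_negative_rate(self) -> float:
        # Equivalent to 1 - sensitivity.
        return self.fn / (self.fn + self.tp)

    @property
    def positive_predictive_value(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Illustrative counts only; not data from the included studies.
example = TwoByTwo(tp=30, fp=5, fn=20, tn=95)
print(f"sensitivity = {example.sensitivity:.2f}")
print(f"specificity = {example.specificity:.2f}")
print(f"PPV         = {example.positive_predictive_value:.2f}")
print(f"accuracy    = {example.accuracy:.2f}")
```

These are within-study quantities; the pooled estimates reported in this article come from the Bayesian model described under Statistical Analysis, not from summing tables in this way.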
After data abstraction, two raters (Y. Xing and J. N. Cormier) independently assessed the quality of the included studies by use of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) scale (35). Discrepancies were resolved by consensus or third-party review (by R. L. Askew). This scale contains 14 items that examine potential sources of bias in diagnostic studies, with one point assigned for each criterion satisfied. The questions related to the representativeness of the sample, selection criteria, and the appropriateness of the reference standard test. Higher scores reflect higher quality, and no articles were excluded because of the assigned quality score.

Statistical Analysis
Bayesian Bivariate Binomial Model
To provide overall summary estimates for sensitivity and specificity for each imaging modality, Bayesian bivariate binomial models were applied that were similar to those proposed by Chu and Cole (36). This model assumed a binomial distribution for the number of patients with true-positive and true-negative results and allowed the inclusion of covariates and random effects. The inherent association between sensitivity and specificity was modeled by assuming that the random effects followed a bivariate normal distribution. In the full model, i indexes the individual diagnostic studies; n_1,i = TP_i + FN_i (ie, the number of subjects with the disease); n_0,i = TN_i + FP_i (ie, the number of subjects without the disease); X_i and Z_i are vectors of covariates related to specificity and sensitivity, respectively; α and β are the corresponding regression coefficients; µ and ν are the variables representing random effects; N is the normal distribution; σ² is the between-study variance; and ρ is the correlation coefficient.
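The display equation for the model is not reproduced in this text. As a sketch only, a form consistent with the definitions above can be written as follows. It assumes a logit link (the usual choice in bivariate models of this kind) and separate between-study variances for the two random effects; both are assumptions, and the exact specification and prior distributions used by the authors may differ. Here Se_i and Sp_i denote the sensitivity and specificity in study i, α and β are the regression coefficients for specificity and sensitivity, and µ_i and ν_i are the corresponding random effects.

```latex
\begin{aligned}
TP_i \mid Se_i &\sim \mathrm{Binomial}\left(n_{1,i},\, Se_i\right), \\
TN_i \mid Sp_i &\sim \mathrm{Binomial}\left(n_{0,i},\, Sp_i\right), \\
\operatorname{logit}(Se_i) &= Z_i \beta + \nu_i, \\
\operatorname{logit}(Sp_i) &= X_i \alpha + \mu_i, \\
\begin{pmatrix} \mu_i \\ \nu_i \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix}
      \sigma_\mu^2 & \rho\,\sigma_\mu \sigma_\nu \\
      \rho\,\sigma_\mu \sigma_\nu & \sigma_\nu^2
    \end{pmatrix}
  \right)
\end{aligned}
```

The off-diagonal term ρσ_µσ_ν is what lets the model capture the correlation between sensitivity and specificity across studies.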
To fully specify the model, vague prior probability distributions were assigned to the parameters. Separate analyses were conducted for detecting lymph node and distant metastases, but ultrasound imaging was not included in the model for detecting distant metastasis. Both models included the specific surveillance tests used to study melanoma patients (eg, ultrasonography, CT, PET, and PET-CT) as covariates along with other clinically important covariates to account for between-study heterogeneity, including study design (eg, prospective vs retrospective), reason for diagnostic imaging (eg, primary staging only, restaging, or both), and the level of tumor assessment (eg, patient or lesion). Sensitivity, specificity, and diagnostic odds ratios were calculated conditional on specific values of the covariates in the model. The diagnostic odds ratio, defined as [(TP/FN)/(FP/TN)], was used as an indicator of test performance because it combines information from all four cells of the two-by-two table. Its value ranges from zero to infinity, with a higher value indicating better discriminatory power. A value of 1.0 is expected for tests that do not discriminate between disease and nondisease groups (37). The 95% credible intervals (CrIs), the Bayesian analogue of the frequentist confidence interval, were calculated for sensitivity, specificity, and diagnostic odds ratios. Positive predictive values were also calculated for each of the diagnostic modalities by use of the following formula: (sensitivity × prevalence)/{(sensitivity × prevalence) + [(1 − specificity) × (1 − prevalence)]}, with 5-year recurrence serving as prevalence estimates (38). The three prevalence risk categories for lymph node and distant metastasis were defined as follows for the calculation of the positive predictive value: low = 5%, intermediate = 15%, and high = 30%.
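The two summary formulas in this paragraph can be sketched in Python as follows. The function names are hypothetical. The inputs are the point estimates for PET-CT staging of distant metastases quoted in the abstract (sensitivity 80%, specificity 87%), used purely for illustration. Values computed from point estimates this way are not expected to reproduce the posterior summaries reported in the article, which come from the full Bayesian model.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """(TP/FN)/(FP/TN) expressed through sensitivity and specificity."""
    odds_positive_in_diseased = sensitivity / (1.0 - sensitivity)
    odds_positive_in_nondiseased = (1.0 - specificity) / specificity
    return odds_positive_in_diseased / odds_positive_in_nondiseased


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV formula from the text, with prevalence as the pretest risk."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Risk categories defined in the text for the PPV calculation.
RISK_CATEGORIES = {"low": 0.05, "intermediate": 0.15, "high": 0.30}

sens, spec = 0.80, 0.87  # illustrative point estimates from the abstract
print(f"DOR = {diagnostic_odds_ratio(sens, spec):.1f}")
for label, prevalence in RISK_CATEGORIES.items():
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"PPV at {label} risk ({prevalence:.0%}) = {ppv:.1%}")
```

The loop makes the dependence on prevalence explicit: with sensitivity and specificity held fixed, the positive predictive value rises with pretest risk, which is why the same test can be hard to justify in low-risk patients and reasonable in high-risk patients.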

Model Implementation
Bayesian bivariate binomial models were constructed with Markov chain Monte Carlo methods by use of WinBUGS, version 1.4.2 (39). Each covariate was centered about its mean to ensure approximate prior independence between the regression coefficients and good convergence of the three Markov chains (39). Three separate chains with overdispersed starting values were run; in each chain, the first 10 000 draws were discarded, and only the second 10 000 draws were used to obtain posterior estimates. The Brooks, Gelman, and Rubin convergence statistics (40) were used to assess model convergence, and only properly converged models were further considered. The results were therefore based on 30 000 draws, and R-hat (40) for all parameters was equal to 1.0 in both models, indicating model convergence.
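The burn-in and convergence check described above can be illustrated with a small, self-contained Python sketch. It does not reproduce the WinBUGS analysis: the chains are simulated from a simple autoregressive process (an assumption made only so the example runs), the chain lengths are scaled down, and the statistic shown is the classic variance-based Gelman-Rubin R-hat rather than the exact diagnostic implemented in WinBUGS.

```python
import math
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Variance-based Gelman-Rubin R-hat for one parameter.

    `chains` is a list of equal-length lists of draws, one per chain.
    """
    n = len(chains[0])
    within = mean([variance(chain) for chain in chains])
    between = n * variance([mean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def simulate_chain(start, n_draws, rng, phi=0.5):
    """Toy AR(1) sampler standing in for an MCMC chain (illustration only)."""
    draws, x = [], start
    for _ in range(n_draws):
        x = phi * x + rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws


rng = random.Random(2009)
BURN_IN, KEPT = 1000, 1000  # the article used 10 000 and 10 000 per chain

# Three chains with overdispersed starting values; burn-in draws discarded.
chains = [
    simulate_chain(start, BURN_IN + KEPT, rng)[BURN_IN:]
    for start in (-10.0, 0.0, 10.0)
]
r_hat = potential_scale_reduction(chains)
posterior_draws = [x for chain in chains for x in chain]

print(f"R-hat = {r_hat:.3f} from {len(posterior_draws)} retained draws")
```

An R-hat close to 1.0 indicates that the between-chain and within-chain variances agree, which is the sense in which the text reports convergence; values well above 1 suggest that the chains have not yet mixed.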
Funnel plots were created to assess publication bias by examining the relationship between the effect measure (log diagnostic odds ratio) and its standard error. Standard error serves as a good proxy for sample size because of their inversely proportional relationship; a small value reflects high precision for an effect size estimate. Conversely, smaller studies are more likely to exhibit a larger spread around the summary estimate of effect size (41). Egger tests were used to assess asymmetry of the funnel plot and to quantitatively assess bias (42). For this analysis, P values were two-sided and statistical significance was defined as a P value of less than or equal to .05. These analyses were performed with Stata, version 10.0 (Stata Corporation, College Station, TX).
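As a rough sketch of the quantities behind a funnel plot and the Egger regression, the following Python code computes the log diagnostic odds ratio and its standard error from a two-by-two table and then regresses the standardized effect on precision. The function names and example counts are hypothetical. The standard error uses the usual large-sample formula and assumes that no cell is zero. This is not the Stata procedure used in the analysis, and no P value is computed.

```python
import math


def log_dor_and_se(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its large-sample standard error.

    Assumes all four cells are nonzero.
    """
    log_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return log_dor, se


def egger_regression(effects, standard_errors):
    """Least-squares fit of effect/SE on 1/SE.

    Returns (intercept, slope); an intercept far from zero suggests
    funnel plot asymmetry (small-study effects).
    """
    x = [1 / se for se in standard_errors]
    y = [e / se for e, se in zip(effects, standard_errors)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative tables (tp, fp, fn, tn); not data from the included studies.
tables = [(30, 5, 20, 95), (12, 3, 6, 40), (50, 10, 25, 200), (8, 2, 7, 22)]
points = [log_dor_and_se(*t) for t in tables]
intercept, slope = egger_regression([p[0] for p in points],
                                    [p[1] for p in points])
print(f"Egger intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Each (log diagnostic odds ratio, standard error) pair is one point on the funnel plot; the regression intercept summarizes whether less precise studies tend to report systematically different effects.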

Results
Quality assessment scores for the diagnostic studies were calculated as the number of individual criteria satisfied from the 14-point QUADAS scale (35). Figure 1 shows the distribution of the quality rankings for the 74 studies included in this analysis, with an overall mean score of 5.8 (standard deviation = 2.5). Approximately 90% of the studies had a total quality score of less than 9.0, and the majority of articles satisfied items pertaining to details of the reference standard and index test. The most commonly unmet quality criteria related to insufficient detail when reporting patient withdrawals, intermediate results, and the selection and training of raters.
Funnel plots that demonstrate the effects of small study size for each diagnostic imaging modality are presented in Figure 3. Using diagnostic odds ratio as the effect measure, potential publication bias was identified for the studies examining ultrasonography because estimates from 11 of the 16 studies fell outside of the funnel. Results of the Egger test for small study effects were not statistically significant (P = .44), indicating that no trend toward higher levels of test accuracy was observed among studies with smaller sample sizes.

Discussion
The results of this meta-analysis indicate that when selecting among the four diagnostic imaging modalities examined, the anatomical site to be evaluated was more important than the clinical scenario (ie, staging or surveillance). Among the four diagnostic imaging modalities for the assessment of lymph node metastasis, ultrasonography was superior to CT, PET, and PET-CT. PET-CT had the highest positive predictive value for the surveillance of distant metastasis; however, the higher number of false-positive results (ie, lower specificity) from PET-CT led to a loss of precision. Furthermore, for patients at low risk of metastasis, the positive predictive value of PET-CT (ie, 33%, 95% CI = 9% to 61%) indicated that use of PET-CT is not warranted without additional clinical indications.
Practice guidelines are becoming an increasingly important element in disseminating treatment algorithms to physicians who treat patients in a community setting (116)(117)(118)(119)(120)(121)(122)(123), and investigators have suggested that these guidelines can be used as a means of measuring the quality of care delivered (124). However, evidence-based surveillance strategies for survivors of most cancers including melanoma do not exist. A recent report (125) that was based on Surveillance, Epidemiology, and End Results-Medicare data acknowledges geographic and patient variation in the receipt of surveillance after treatment of primary melanoma. With the increasing number of melanoma survivors and rapid advances in health-care technology, the costs of caring for these survivors are rising (126,127). In 1997, Mooney et al. (128) reported that screening for melanoma recurrence (in this report for asymptomatic pulmonary metastasis) accounted for approximately 80% of program costs, totaling between $27 and $32 million for a 20-year program. As technological advances permit us to more precisely determine metastatic tumor spread, physicians and patients alike are faced with making clinical decisions on the basis of contemporary risk assessment. Nevertheless, controversy continues to surround the optimal imaging modality and interval of patient surveillance.
Sentinel lymph node biopsy is the acknowledged gold standard for the pathological staging of clinically lymph node-negative melanoma (3,5). A recent study by Sanki et al. (5) comparing ultrasonography with sentinel lymph node biopsy found that the sensitivity of targeted high-resolution ultrasound was only 24.3% (95% CI = 19.5% to 28.7%) compared with that of the sentinel lymph node biopsy. The combination of preoperative ultrasound and fine needle biopsy in select high-risk patients can, however, eliminate the need for sentinel lymph node biopsy by preoperatively identifying lymph node metastases, which indicate the need for therapeutic lymph node dissection (52,129,130). The primary utility of ultrasonography for the assessment of metastases in regional lymph nodes is for lymph node surveillance (31,46,131). PET-CT was superior for detection of distant metastases. Given the low positive predictive value of CT, PET, and PET-CT in the surveillance of patients at low risk of lymph node metastasis, ultrasonography is the only justifiable imaging choice for lymph node surveillance.
† These studies provided detailed results on different levels of lesions. ‡ This study was not included in the statistical models because of missing data on true-positive, false-positive, false-negative, and true-negative results. § PET + CT means that PET and CT were conducted independently but the final test results were determined using both tests.
The overall point estimates for the diagnostic test characteristics in this study are lower than those reported in two recently published prospective studies (6,132) that evaluated the utility of ultrasonography, CT, and PET in primary staging. Voit et al. (6) reported that ultrasonography combined with fine needle aspiration cytology had a sensitivity of 65% and a specificity of 99% in a cohort of 400 consecutive melanoma patients. Another study (132) reported the sensitivities of PET and CT for 251 patients with clinically palpable (stage III) lymph nodes as 86% and 78%, respectively, with a specificity of 94% for both tests. These discrepancies likely relate to heterogeneity among patient populations in these studies.
The purpose of staging and surveillance is to detect treatable tumors, monitor success of therapy, and provide reassurance and support to patients (133,134). However, these benefits must be balanced with the risks of testing to patients and their associated costs. Costs for CT, PET, and PET-CT can often be more than twice those of ultrasonography, with charges differing by up to fourfold. Although sufficient evidence regarding clinical effectiveness is not yet available to justify the use of new technologies, such as PET-CT, instead of the best existing alternatives, they are already widely used in oncology. Imaging is one of the fastest growing health-care services (135) and is a prime example of technology that must be examined in the context of comparative effectiveness to "improve the quality and affordability of US health care" (136). Quality medical care has been summarized by Earle et al. (137) as the "delivery of optimal health services" (138), with "technical proficiency" (139); "avoiding overuse, underuse, or misuse of technologies" (140); and "incorporating patient centered preferences in shared decision making" (141). Inappropriate imaging, which adds to health-care costs without improving the quality of care, has been attributed to both physician and patient factors (142). Lack of knowledge (143) and fear of liability for missed diagnoses (144) attributed to physicians have commonly resulted in the inappropriate use of imaging. In addition, patients with a newly diagnosed cancer often expect certain examinations (145), particularly whole-body imaging.
Table 3. Estimates of sensitivity, specificity, and diagnostic odds ratio for the staging and surveillance of metastatic sites for ultrasonography (US), computed tomography (CT), positron emission tomography (PET), and PET-CT*
A negative imaging result, even when unnecessary, is often reassuring for the patient and physician and is often perceived to come with few if any negative consequences. However, levels of radiation exposure are known to vary widely even with the same imaging modality, potentially leading to health consequences, including increased lifetime risk of cancer (146,147). Furthermore, incidental abnormalities identified by imaging that do not affect health but require additional evaluation (eg, further imaging or interventional procedures) can result in additional associated costs, complications, and patient anxiety (142).
Compared with previous meta-analyses (7)(8)(9)(10)(31) that examined test characteristics of diagnostic imaging modalities in patients with melanoma, this analysis has a number of strengths. First, all eligible studies from January 1, 1990, through June 30, 2009, with sufficient data on four widely used contemporary imaging modalities (ie, ultrasonography, CT, PET, and PET-CT) were examined. Patient-level data from these studies were extracted and analyzed according to specific clinical scenarios (eg, initial staging vs surveillance); these data have been reported by few studies, despite the large potential impact of diagnostic imaging on both the quality and cost of medical care (148). Second, Bayesian bivariate binomial models were used for the meta-analysis of diagnostic test characteristics to capture the variability in both sensitivity and specificity simultaneously, as well as their intercorrelation. Such models are applicable to both large and small studies without ad hoc correction (36). Because of the methodological advantages of bivariate models, Harbord et al. (149) have recommended that such models be considered standard methods for meta-analysis of diagnostic accuracy.
This study has several limitations that must also be considered. First, technology has advanced over the last two decades, and the diagnostic criteria for each modality have varied during the period studied. Second, selection bias and work-up bias inherent to each individual study could be considerable in this pooled analysis because most of the studies of patients undergoing the index test were retrospective in design. Third, partial verification bias may exist when only those patients undergoing a reference test are included in a sample, and no data were reported on the remaining patients who only underwent the index test. Another well-described drawback of meta-analyses is publication bias because studies with favorable results have a higher likelihood of being published than those with unfavorable results. The studies examining the diagnostic accuracy of ultrasonography reported widely varying estimates of sensitivity and specificity, ranging from 5% to 100%, with similar variations observed for PET imaging. There are several potential explanations for such variation, including small sample sizes in some studies, differing study designs, varying quality of imaging equipment, and differing imaging criteria for diagnosis. An inherent strength of a meta-analysis in evaluating a large body of literature is that it can overcome limitations of small sample sizes and heterogeneous designs of individual trials by pooling the data and obtaining summary sensitivities and specificities.
With the ever-increasing number of melanoma survivors and limited health-care resources, the need to tailor current consensus-based National Comprehensive Cancer Network guidelines toward an evidence-based cost-effective surveillance program is becoming increasingly critical. Test characteristics and performance are considered the first two levels of the evidence hierarchy for all diagnostic technologies (150). The objective of this analysis was to use contemporary techniques of meta-analysis to summarize the existing evidence for four common diagnostic imaging modalities that are used in the staging and surveillance of regional and distant metastasis for patients with melanoma. Future comparative effectiveness analyses should use decision-analytic modeling to simulate the effectiveness and cost-effectiveness of various surveillance strategies with respect to imaging modality and frequency on stage-specific patient outcomes.
In summary, when diagnostic imaging is indicated for staging or surveillance, we found that ultrasonography was the best diagnostic imaging test to detect lymph node metastases and that PET-CT was more suitable for the detection of distant metastases in patients at intermediate or high risk or when distant metastases are clinically indicated. Results of this meta-analysis should provide information for clinical decisions on the staging and surveillance of patients with melanoma.