Abstract

Objectives

The Ki-67 proliferation index is integral to gastroenteropancreatic neuroendocrine tumor (GEP-NET) assessment. Automated Ki-67 measurement would aid clinical workflows, but adoption has lagged owing to concerns of nonequivalency. We sought to address this concern by comparing 2 digital image analysis (DIA) platforms to manual counting with same-case/different-hotspot and same-hotspot/different-methodology concordance assessment.

Methods

We assembled a cohort of GEP-NETs (n = 20) from 16 patients. Two sets of Ki-67 hotspots were manually counted by three observers and by two DIA platforms, QuantCenter and HALO. Concordance between methods and observers was assessed using intraclass correlation coefficient (ICC) measures. For each comparison pair, the number of cases within ±0.2xKi-67 of its comparator was assessed.

Results

DIA Ki-67 showed excellent correlation with manual counting, and ICC was excellent in both within-hotspot and case-level assessments. Across expert-vs-DIA, DIA-vs-DIA, and expert-vs-expert comparisons, the best-performing pairing was DIA Ki-67 by QuantCenter vs manual counting, with 65% of cases within ±0.2xKi-67 of the manual count.

Conclusions

Ki-67 measurement by DIA is highly correlated with expert-assessed values. However, close concordance by strict criteria (>80% within ±0.2xKi-67) is not seen with DIA-vs-expert or expert-vs-expert comparisons. The results show analytic noninferiority and support widespread adoption of carefully optimized and validated DIA Ki-67.

Key Points
  • Expert Ki-67 measurement in gastroenteropancreatic neuroendocrine tumors by manual counting exhibits the same variability and reproducibility as digital image analysis (DIA).

  • Measurement of Ki-67 proliferation index by whole-slide DIA is a robust method that can be adopted in routine clinical practice.

  • We propose a validation framework for digital Ki-67 that (1) ensures DIA Ki-67% exhibits high correlation (r > 0.9) with manual counting and (2) shows more than 50% of cases within ±0.2xKi-67 of manual counting.

The Ki-67 proliferation index is integral to the diagnosis and prognosis of gastroenteropancreatic neuroendocrine tumors (GEP-NETs) and in many instances guides therapy.1 The Ki-67 index has been shown to be more powerful than tumor stage as a prognostic factor in pancreatic tumors,2 and there are significant overall survival differences between World Health Organization (WHO) G1 and G3 tumors.3 Manual counting of camera-captured images represents the de facto gold standard for Ki-67 assessment. Computer-assisted systems (eg, CAS 200)4,5 dating back to the mid-1990s did not widely penetrate clinical practice. Given increased access to whole-slide imaging and technological leaps in computing, the time is ripe for more widespread adoption of Ki-67 digital image analysis (DIA), although concerns about nonequivalency persist.6

The University of Iowa Hospitals and Clinics has a large neuroendocrine tumor program; we follow over 2,000 patients and see 200 new patients a year. We perform Ki-67 immunohistochemistry on all neuroendocrine tumors (NETs) (ie, primary, recurrent, metastatic), and on resections we separately evaluate primary, regional, and distant disease.7 Although we estimate by gestalt (aka “eyeball estimate”) rare Ki-67 proliferation indices (eg, those close to 0%), we manually count most (especially those around the WHO G1/2 and 2/3 thresholds), which is exceedingly time intensive. We perform manual counting of camera-captured images, as eyeball estimation and manual counting under the microscope have been shown to be clearly inferior.8-11 Some in our group have used the ImmunoRatio web application,12,13 which is an automated counting method with specific image input requirements. Observers have contended that DIA-based quantification and immunohistochemical assessment carry limitations. Reid et al,10 for example, ascribed high costs to implementing digital quantification. In addition, they contended that miscounting due to inclusion of nontarget cells was a contributor to inaccuracy with digital methods. This study partly grew from our Ki-67 DIA validation process. One of us is an immunohistochemistry laboratory director, and we drew heavily on that experience and the College of American Pathologists guideline for the analytic validation of immunohistochemical assays (eg, cohort size of 20 cases, expected concordance between comparator and the gold standard of 90%, and investigation of sources of lower concordance).14

The Ki-67 proliferation index is unique in surgical pathology in that it is reported as a continuous variable ranging from 0% to 100%. Studies evaluating Ki-67 DIA assessment across tumor types,15-22 including GEP-NETs,9-11,13,23-26 typically report the Pearson correlation coefficient (r) or the intraclass correlation coefficient (ICC).27 Although Ki-67 is a continuous variable, NET grades are assigned at (arbitrary) proliferation index thresholds (ie, G1/2 at 3%, G2/3 at 20%), and crossing a grade threshold can have clinical import (eg, clinical trial eligibility). Two assays can be highly correlated but consistently produce results on opposite sides of these thresholds. Thus, we concluded that correlation coefficients alone were an insufficient metric for validation and that, more broadly, there was a need for a simpler index or parameter for evaluating agreement between methods in histopathology that produce continuous data. The 3% GEP-NET G1/G2 threshold is frequently encountered in clinical practice; counts within ±10% of this value range from 2.7% to 3.3%. However, a ±10% window was intuitively and by consensus deemed too difficult to reproduce. The ±20% range around this value is 2.4% to 3.6%; this represented our reproducibility target, and we refer to comparisons within ±0.2x of the comparator Ki-67 index as “close matches.”

With the above, we sought to perform an in-depth examination of the reproducibility of digital Ki-67 measurement in GEP-NET with two widely available commercial whole-slide image–based DIA platforms with both hotspot- and case-level concordance assessment in comparison to manual counting. We examined the effect of strict predetermined thresholds to quantify the proximity of Ki-67 indices obtained by manual and digital methods. In addition, we examined the extent of grade-discrepant Ki-67 values in cases obtained by manual and digital methods.

Materials and Methods

Case Inclusion

The validation study was carried out as part of a quality improvement project. The University of Iowa Institutional Review Board determined the validation to be exempt from full review by the board. We initially assembled a cohort of 20 GEP-NETs, including a mix of primary and metastatic tumors, from 16 patients. The cohort was assembled with consideration of the grade distribution in our patient population (66% G1, 30% G2, <5% G3). We intentionally overrepresented G2 tumors (50% of the cohort), as G1 tumors with proliferation indices frequently close to 0% were not considered a “fair test,” and intentionally challenged the G1/G2 threshold (35% of the cohort with reported proliferation indices of 2%-5%). The cases were selected from 2017 and 2018 to reduce variation in counterstain (hematoxylin) quality and intensity. They were all initially signed out by a single expert gastrointestinal (GI) pathologist with 11 years of experience (observer 1), who had performed manual counting of camera-captured Ki-67 hotspot images at the time of clinical reporting. Mitotic activity was not considered in grade assignment.

G3 GEP-NETs are relatively infrequently encountered, although they are overrepresented in our institutional consults. At the recommendation of the reviewers, we evaluated an additional group of five tumors from four patients, targeting Ki-67 proliferation indices in the 10% to 30% range. Clinicopathologic features of these 25 total cases are presented in Table 1.

Table 1

Patient Demographics of the Study Casesa

Patient No. | Slide No. | Age, y | Sex | Tumor Location | Primary or Metastatic | Clinical Reported Ki-67 PI | WHO Tumor Grade
1 | 1 | 51 | F | Distal pancreas | Primary | 6.20 | G2
2 | 2 | 58 | F | Terminal ileum | Primary | 2.00 | G1
3 | 3 | 75 | M | Duodenum | Primary | 2.00 | G1
4 | 4 | 62 | M | Distal pancreas | Primary | 23.00 | G3
5 | 5 | 51 | F | Duodenum | Primary | 1.20 | G1
  | 6 | 51 | F | Lymph node | Metastatic | 1.60 | G1
6 | 7 | 75 | F | CBD lymph node | Metastatic | 4.00 | G2
  | 8 |  |  | Duodenum | Primary | 5.50 | G2
7 | 9 | 69 | F | Duodenum | Primary | 9.00 | G2
8 | 10 | 54 | F | Proximal ileum | Primary | 3.00 | G2
  | 11 |  |  | Ileal tumor nodule | Metastatic | 9.00 | G2
  | 12 |  |  | Liver | Metastatic | 8.50 | G2
9 | 13 | 54 | F | Proximal ileum | Primary | 0.70 | G1
10 | 14 | 70 | F | Pararenal node | Metastatic | 10.70 | G2
  | 15 |  |  | Liver | Metastatic | 10.10 | G2
12 | 16 | 68 | M | Duodenum | Primary | 4.50 | G2
13 | 17 | 63 | M | Liver | Metastatic | 2.30 | G1
14 | 18 | 56 | M | Terminal ileum | Primary | 1.90 | G1
15 | 19 | 67 | M | Diaphragmatic nodule | Metastatic | 2.99 | G1
16 | 20 | 59 | F | Stomach | Primary | 2.00 | G1
17 | 21 | 77 | M | Stomach | Primary | 15 | G2
18 | 22 | 67 | F | Liver | Metastatic | 7.1 | G2
19 | 23 | 60 | F | Ileum | Primary | 8 | G2
20 | 24 | 78 | F | Liver metastasis | Metastatic | 16 | G2
  | 25 |  |  | Pancreas | Primary | 36 | G3

CBD, common bile duct; PI, proliferation index; WHO, World Health Organization.

aKi-67 proliferation index values at clinical reporting and corresponding tumor grades according to the WHO cutoffs (<3%, grade 1; 3%-20%, grade 2; >20%, grade 3) are shown.
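The WHO cutoffs in footnote a map directly to grade assignments. A minimal sketch (Python; the function name is ours, for illustration only):

```python
def who_grade(ki67_percent: float) -> str:
    """Assign WHO GEP-NET grade from the Ki-67 proliferation index,
    using the cutoffs in footnote a: <3% G1, 3%-20% G2, >20% G3."""
    if ki67_percent < 3.0:
        return "G1"
    if ki67_percent <= 20.0:
        return "G2"
    return "G3"

# Spot checks against Table 1: 2.99 -> G1, 3.00 -> G2, 23.00 -> G3.
assert who_grade(2.99) == "G1"
assert who_grade(3.00) == "G2"
assert who_grade(23.00) == "G3"
```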


Ki-67 Counting and Digital Measurement

Manual Counting of Hotspot Photomicrographs From Glass Slides

Immunohistochemistry for Ki-67 was performed using standard techniques (MIB1; 1:200 dilution, heat-induced epitope retrieval; Dako). In routine reporting, Ki-67 proliferation index values were measured by manual counting of immunostained nuclei (all by observer 1).

Hotspot Selection.—In brief, hotspot selection was performed by light microscopy on glass slides, combining low- and high-magnification examination of tumor areas to identify the focus with the highest apparent density of immunopositive tumor nuclei (Figure 1). Depending on the proportions of tumor and stroma, two to three nonoverlapping photomicrographs were acquired (×400, 2,448 × 1,920 pixels, 72 pixels per inch) from each hotspot area with an Olympus DX27 CCD camera (Olympus) (Figure 2A). The captured images together were known from routine practice to contain approximately 1,000 tumor cells.

Manual Counting.—Manual counting was performed with cellSens version 1.7 (Olympus) using the point-annotation function applied to the hotspot images viewed in cellSens on computer monitors (Figure 2C). Immunonegative tumor nuclei were included, and stromal, inflammatory, and endothelial cells were excluded. Ki-67–positive tumor nuclei were counted separately, and positive nontumor elements (inflammatory, stromal, and endothelial nuclei) were excluded by morphologic assessment. The historical/retrospective data from routine reporting as described above were included in the analysis (Table 2). The remainder of the measurements were obtained prospectively.

Figure 2

Ki-67 measurement by manual and digital image analysis (DIA) methods. A representative case (A) with segmented nuclei in HALO (B) and the same field with manual counting annotation of nuclei performed in Olympus cellSens (C). An annotated hotspot on a digital whole-slide image (D). This is subject to DIA measurement in 3D Histech Quantcenter (E) and manual counting in cellSens (F). The open plus symbols in C and F are from manual counting.

Figure 1

Study design. The circles indicate hotspots and the site of hotspot selection (glass vs whole-slide image [WSI]) and the connected rectangles the method of Ki-67 measurement and the image substrate used to perform counting (photomicrograph [PMG]). “Study hotspots” were annotated by the digital image analysis (DIA) pathologist on WSIs and were counted by observer 1 and observer 3 and measured by HALO and QuantCenter. Observer 1 and observer 2 hotspots were identified on glass slides and counted using on-screen photomicrographs in Olympus cellSens. The dashed lines (green, expert vs expert; blue, expert vs DIA; red, DIA vs DIA) indicate the intercomparisons performed in the analysis.

In the study, a second independent set of hotspots was prospectively chosen on glass slides by another expert GI pathologist (observer 2). Using the same method as described above, two to three TIFF images were acquired with an Olympus DX27 camera at ×400 from each hotspot area. The acquired photomicrographs were manually counted in cellSens by observer 2, as described above.

Manual Counting of Hotspots Acquired From Digital Whole Slides

Snapshot images containing study hotspots from whole-slide images (WSIs; see below) were manually counted by two independent observers (observer 1, observer 3) using cellSens’s point-annotation function, as described above (Figure 2C and Figure 2F).

DIA-Based Ki-67 Measurement on WSI and Hotspot Photomicrographs

The Ki-67 immunohistochemistry slides were digitally scanned with the ×20 objective (0.24-μm/pixel resolution) by a P1000 Pannoramic scanner (3DHistech) to yield .mrxs digital WSIs.

Hotspot Selection.—On WSI slides, a ×400 high-power field–equivalent circular area (0.152 mm2), known from prior experience to contain approximately 1,000 tumor cells, was annotated by a third pathologist, independent of the prior two selections, using 3DHistech Caseviewer’s gradient visualization function, which facilitates identification of the most highly Ki-67–labeled areas (Figure 2D). This was carried out by the pathologist performing digital image analysis (DP), and the selected hotspots are termed study hotspots. In addition, the study hotspots were extracted using Caseviewer’s Snapshot function as 1,920 × 1,017-pixel JPEG images containing the circular annotation.

Automated Counting by DIA.—The DP-annotated study hotspots were analyzed in 3DHistech QuantCenter with the CellQuant algorithm to digitally enumerate nuclei on WSIs (Figure 2E). Because of a lack of cross-compatibility between 3DHistech Caseviewer and HALO (Indica Labs), digital whole-slide annotated study hotspots could not be directly opened in HALO. Instead, study hotspot circular annotated areas were exported using Caseviewer’s Export function as standalone .mrxs WSI files, opened in HALO, and analyzed using the Cytonuclear algorithm. Similarly, observer 2–chosen TIFF hotspot images acquired by photomicrography from glass slides were imported and analyzed in HALO using the Cytonuclear algorithm (Figure 2B).

The selection of hotspots and counting methods is depicted in Figure 1, Figure 2, and Table 2. The CellQuant and Cytonuclear algorithms were independently optimized for tumor nuclear detection and positive-nucleus classification using multiple areas from nonstudy neuroendocrine tumor and other WSI slides. Within each platform, the same algorithm settings were used whether analyzing WSIs or imported hotspot photomicrographs. Nuclear count data were exported as .xls files. The optimized algorithm parameters for 3DHistech CellQuant and HALO Cytonuclear and example results from both platforms are included in the supplementary data (Supplementary Figure S1, Supplementary Figure S2, and Supplementary Figure S3; all supplemental materials can be found at American Journal of Clinical Pathology online).

Table 2

Summary of Ki-67 Measurement Experiments Performed or Included in the Studya

Hotspot Selector | Manual Counting | Analysis Platform
DIA pathologist | Observer 3, Observer 1 | QuantCenter, HALO
Observer 1 | Observer 1 | —
Observer 2 | Observer 2 | HALO

aEach row indicates the hotspot selector and the counting methods to which the selected hotspots were subjected. The digital image analysis (DIA) pathologist study hotspots and observer 2–chosen hotspots were counted by both DIA platforms. Observer 1 counts on observer 1–chosen hotspots were performed as part of clinical reporting of each case.


Five cases with higher clinically reported Ki-67 index values were selected and analyzed separately. For these slides, hotspots were chosen and annotated on WSIs by an expert GI pathologist (observer 2). The hotspots were extracted using Caseviewer’s Snapshot function as JPEG images containing the circular annotation. These were manually counted by two pathologists (observer 1, observer 2) in Olympus cellSens. The hotspots were also analyzed in 3DHistech QuantCenter with the CellQuant algorithm to digitally enumerate nuclei on WSIs. Ki-67 measurement by observer 3 manual counting and by HALO was not performed. Because the five cases were not subjected to the same uniform measurement process as the remaining 20, their data are analyzed and presented separately (Supplemental Figure S4).

Analysis

Ki-67 proliferation indices were calculated by the standard method: positive tumor nuclei/total tumor nuclei. For every comparison, the Pearson correlation coefficient (r) was calculated. Concordance within methodologic and observer groups was assessed by ICC measures. As a measure of the closeness of match between measured Ki-67 indices, a preset threshold was assessed: the number of cases for each method that fell within ±0.2x of its comparator. In all comparisons, where applicable, the manual counts were used to derive close-match thresholds. In addition, for each comparison pair, κ statistics and the number of WHO GEP-NET grade-discrepant cases were assessed. Confidence intervals (CIs) for mean Ki-67 (%) were estimated using 1,000 bootstrap samples. Differences between mean Ki-67 values were compared using the Student t test, and differences in proportions were analyzed using the Fisher exact test. Statistical analysis was performed using SPSS version 26 and RStudio version 1.2.1335 running R 64-bit version 3.6.1.
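As an illustration of the calculations above, the sketch below (Python, standard library only; the function names are ours, not the authors') computes the proliferation index and a percentile bootstrap CI for a mean Ki-67 using 1,000 resamples, mirroring the bootstrap approach described here.

```python
import random
import statistics

def ki67_index(positive_tumor_nuclei: int, total_tumor_nuclei: int) -> float:
    """Ki-67 proliferation index (%): positive tumor nuclei / total tumor nuclei."""
    return 100.0 * positive_tumor_nuclei / total_tumor_nuclei

def bootstrap_ci_mean(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (1,000 resamples by default)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical hotspot: 31 Ki-67-positive nuclei of 1,000 tumor nuclei -> 3.1%.
assert abs(ki67_index(31, 1000) - 3.1) < 1e-9
lo, hi = bootstrap_ci_mean([1.2, 2.0, 3.4, 5.5, 9.0])
assert lo <= hi
```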

Results

Profile of Cases

The ages of the patients (n = 20) ranged from 51 to 78 years. There were 7 men and 13 women (male/female ratio, 1:1.8). Sections from 25 tumors from these patients were studied; 15 were primary tumors, and the remaining 10 were metastatic foci. The primary sites studied included pancreas (n = 3), duodenum (n = 5), ileum (n = 5), and stomach (n = 2). There were four liver metastases, with nodal metastases (n = 3) and peritoneal nodules (n = 3) forming the remainder. The clinically reported Ki-67 proliferation index values ranged from 0.70% to 36% (mean, 5.5% for the 20 fully analyzed cases), with 9 G1, 14 G2, and 2 G3 (WHO) tumors. The five cases analyzed separately showed a mean clinically reported Ki-67 of 16.42%, with three reported with Ki-67 values greater than 10%. The details of the included cases and slides are presented in Table 1.

Correlation Analysis

The measured Ki-67 values are shown in Table 3 and Supplementary Table S1. Ki-67 measurement by both DIA platforms showed excellent correlation with their human counterparts (Figure 3 and Table 4). Table 4 displays the findings for the Ki-67 tumor slides subjected to the full analysis (n = 20) described above. The Pearson correlation coefficient (r) ranged from 0.881 to 0.980 (P < .001, all possible pairs), indicating a high degree of correlation between different observers and between methods (Table 4). The ICC was excellent in both within-hotspot (0.94 and 0.89, respectively) and case-level assessments (0.926; all P < .001). In addition, ICC values were high when examined expert vs expert (0.97; 95% CI, 0.95-0.98) and expert vs DIA (0.91; 95% CI, 0.84-0.96). Reproducibility between different DIA platforms (ie, DIA-vs-DIA assessments) was 0.88 (95% CI, 0.77-0.94).
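For readers wishing to reproduce the correlation analysis, Pearson's r can be computed directly from paired measurements. The sketch below (Python; the helper name is ours) uses five paired values from Table 3 (observer 1 [DP] manual counts vs QuantCenter [DP]) and confirms they are highly correlated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Paired Ki-67 values (%) for tumors 1, 2, 4, 5, and 11 from Table 3.
manual_obs1_dp = [6.31, 1.95, 24.67, 0.60, 8.13]
quantcenter_dp = [5.70, 3.34, 24.33, 0.52, 4.36]
assert pearson_r(manual_obs1_dp, quantcenter_dp) > 0.9
```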

Table 3

Ki-67 Index Values (%) Calculated by Different Methodsa

Tumor No. | Observer 1 (DP) | Observer 1 (Ob 1) | Observer 2 (Ob 2) | Observer 3 (DP) | QuantCenter (DP) | HALO (DP) | HALO (Ob 2)
1 | 6.31 | 6.20 | 5.85 | 5.92 | 5.70 | 4.70 | 5.72
2 | 1.95 | 2.00 | 2.40 | 1.72 | 3.34 | 2.49 | 3.40
3 | 1.85 | 2.00 | 1.24 | 1.04 | 2.16 | 2.06 | 1.85
4 | 24.67 | 23.00 | 23.84 | 22.61 | 24.33 | 20.90 | 36.50
5 | 0.60 | 1.20 | 1.24 | 0.60 | 0.52 | 0.43 | 1.04
6 | 1.37 | 1.60 | 1.64 | 1.26 | 2.75 | 1.13 | 1.69
7 | 3.81 | 4.00 | 4.74 | 3.79 | 3.46 | 3.14 | 4.23
8 | 4.80 | 5.50 | 4.39 | 5.38 | 5.48 | 4.80 | 3.78
9 | 6.42 | 9.00 | 5.77 | 7.06 | 6.62 | 2.11 | 5.08
10 | 2.12 | 3.00 | 2.44 | 2.12 | 1.95 | 1.32 | 1.88
11 | 8.13 | 9.00 | 4.27 | 7.95 | 4.36 | 3.34 | 3.42
12 | 5.95 | 8.50 | 7.57 | 6.61 | 6.07 | 4.80 | 6.09
13 | 0.44 | 0.70 | 0.98 | 0.44 | 1.16 | 0.90 | 1.04
14 | 11.73 | 10.70 | 9.51 | 11.40 | 10.29 | 3.82 | 7.91
15 | 8.08 | 10.10 | 8.58 | 7.71 | 6.05 | 4.61 | 8.01
16 | 2.41 | 4.50 | 2.13 | 2.34 | 2.63 | 2.36 | 1.49
17 | 3.61 | 2.30 | 4.65 | 4.11 | 4.24 | 4.13 | 4.04
18 | 1.72 | 1.90 | 2.51 | 1.71 | 1.61 | 1.64 | 1.91
19 | 3.41 | 2.99 | 4.11 | 3.13 | 1.94 | 1.51 | 2.82
20 | 2.09 | 2.00 | 2.00 | 1.96 | 1.60 | 1.38 | 1.56
21 | 14.9 | 15 | 20.2 | — | 23.15 | — | —
22 | 7.9 | 7.1 | 9.8 | — | 13.48 | — | —
23 | 7.1 | 8 | 8.1 | — | 6.45 | — | —
24 | 26 | 16 | 29.6 | — | 27.56 | — | —
25 | 21.1 | 36 | 26.5 | — | 24.5 | — | —

DP, digital image analysis pathologist; Ob, observer; —, data unavailable.

aThe column names list the method of counting (manual counting [observers 1-3] vs QuantCenter vs HALO) and the observer or platform performing the count. Listed in parentheses are the observers who chose the hotspots. For cases 21 to 25, hotspots were chosen by observer 2.


Table 4

Comparisons Between Different Methods of Ki-67 Measurementa

Comparison Pair | Pearson Correlation Coefficient | Closeness of Match: No. (%) of Cases Within ±0.2xKi-67 Index | No. of Grade-Discordant Cases | κ
Manual counting vs manual counting
Observer 1 (DP) vs observer 2 (Ob 2) (case level) | 0.976 | 10/20 (50) | 4 | 1
Observer 1 (DP) vs observer 1 (Ob 1) (case level) | 0.977 | 12/20 (60) | 4 | 0.63
Observer 1 (Ob 1) vs observer 2 (Ob 2) (case level) | 0.952 | 10/20 (50) | 0 | 0.63
Manual counting vs DIA
Observer 2 (Ob 2) vs HALO (Ob 2) (hotspot level) | 0.971 | 11/20 (55) | 2 | 0.81
Observer 2 (Ob 2) vs HALO (DP) (case level) | 0.949 | 7/20 (35) | 2 | 0.81
Observer 2 (Ob 2) vs QuantCenter (DP) (case level) | 0.978 | 10/20 (50) | 2 | 0.81
Observer 1 (Ob 1) vs HALO (Ob 2) (case level) | 0.902 | 7/20 (35) | 2 | 0.81
Observer 1 (Ob 1) vs HALO (DP) (case level) | 0.881 | 7/20 (35) | 4 | 0.81
Observer 1 (Ob 1) vs QuantCenter (DP) (case level) | 0.946 | 8/20 (40) | 4 | 0.81
Observer 1 (DP) vs HALO (Ob 2) (case level) | 0.942 | 9/20 (45) | 2 | 0.63
Observer 1 (DP) vs HALO (DP) (hotspot level) | 0.922 | 9/20 (45) | 2 | 0.63
Observer 1 (DP) vs QuantCenter (DP) (hotspot level) | 0.976 | 13/20 (65) | 2 | 0.63
DIA vs DIA
QuantCenter (DP) vs HALO (DP) (hotspot level) | 0.953 | 10/20 (50) | 2 | 0.81
QuantCenter (DP) vs HALO (Ob 2) (case level) | 0.969 | 9/20 (45) | 0 | 1
HALO (DP) vs HALO (Ob 2) (case level) | 0.980 | 6/20 (30) | 2 | 0.81

DIA, digital image analysis; DP, digital image analysis pathologist; Ob, observer.

aThe comparison pairs are grouped under three headings. For each method, the name/label in the leftmost column indicates the observer or software platform performing the count, followed in parentheses by the person who chose the hotspot. The grade-discordant cases column enumerates the number of cases that fall under different World Health Organization grades (G1/G2/G3) for each comparison, and κ indicates the Cohen κ statistic value for the pair.

Table 4

Comparisons Between Different Methods of Ki-67 Measurementa

| Comparison Pair | Pearson Correlation Coefficient | Closeness of Match: No. (%) of Cases Within ±0.2xKi-67 Index | No. of Grade-Discordant Cases | κ |
| --- | --- | --- | --- | --- |
| Manual counting vs manual counting | | | | |
| Observer 1 (DP) vs observer 2 (Ob 2) (case level) | 0.976 | 10/20 (50) | 4 | 1 |
| Observer 1 (DP) vs observer 1 (Ob 1) (case level) | 0.977 | 12/20 (60) | 4 | 0.63 |
| Observer 1 (Ob 1) vs observer 2 (Ob 2) (case level) | 0.952 | 10/20 (50) | 0 | 0.63 |
| Manual counting vs DIA | | | | |
| Observer 2 (Ob 2) vs HALO (Ob 2) (hotspot level) | 0.971 | 11/20 (55) | 2 | 0.81 |
| Observer 2 (Ob 2) vs HALO (DP) (case level) | 0.949 | 7/20 (35) | 2 | 0.81 |
| Observer 2 (Ob 2) vs QuantCenter (DP) (case level) | 0.978 | 10/20 (50) | 2 | 0.81 |
| Observer 1 (Ob 1) vs HALO (Ob 2) (case level) | 0.902 | 7/20 (35) | 2 | 0.81 |
| Observer 1 (Ob 1) vs HALO (DP) (case level) | 0.881 | 7/20 (35) | 4 | 0.81 |
| Observer 1 (Ob 1) vs QuantCenter (DP) (case level) | 0.946 | 8/20 (40) | 4 | 0.81 |
| Observer 1 (DP) vs HALO (Ob 2) (case level) | 0.942 | 9/20 (45) | 2 | 0.63 |
| Observer 1 (DP) vs HALO (DP) (hotspot level) | 0.922 | 9/20 (45) | 2 | 0.63 |
| Observer 1 (DP) vs QuantCenter (DP) (hotspot level) | 0.976 | 13/20 (65) | 2 | 0.63 |
| DIA vs DIA | | | | |
| QuantCenter (DP) vs HALO (DP) (hotspot level) | 0.953 | 10/20 (50) | 2 | 0.81 |
| QuantCenter (DP) vs HALO (Ob 2) (case level) | 0.969 | 9/20 (45) | 0 | 1 |
| HALO (DP) vs HALO (Ob 2) (case level) | 0.980 | 6/20 (30) | 2 | 0.81 |

DIA, digital image analysis; DP, digital image analysis pathologist; Ob, observer.

aThe comparison pairs are grouped under three headings. For each method, the name/label in the leftmost column indicates the observer or software device performing the count, followed by the person choosing the hotspot in parentheses. The grade-discordant cases column enumerates the number of cases that fall under different World Health Organization grades (G1/G2/G3) for each comparison, and κ indicates the Cohen’s κ statistic value for the pair.

Figure 3

Correlation coefficients between manual counting and all digital image analysis (DIA) methods. In each panel (A, observer 2; B, observer 1; C, observer 1 [clinical reporting]), manual counts are plotted along the x-axis against all corresponding DIA Ki-67 values (both hotspot and case level) from both platforms (QuantCenter and HALO) on the y-axis. A, P < .001; Pearson r = 0.93; 99% confidence interval [CI], 0.87-0.97. B, P < .001; Pearson r = 0.91; 99% CI, 0.83-0.95. C, P < .001; Pearson r = 0.88; 99% CI, 0.77-0.94.

Close-Match Analysis

Among possible meaningful pairs (expert vs DIA, DIA vs DIA, expert vs expert), the number of cases that were within the ±0.2xKi-67 index of each other ranged from 6 of 20 to 13 of 20, with the lowest closeness of match occurring in HALO measurements from different hotspots (Table 4). The median close match was 45%. The highest (13/20; 65%) occurred with QuantCenter measurements compared with the expert pathologist’s manual counting of study hotspots. Mean close-match levels between experts were comparable to expert-vs-DIA and DIA-vs-DIA matches (53% vs 45% vs 41% of cases, respectively). The number of closely matched cases was not significantly different between within-hotspot and case-level measurement comparisons (mean, 8.8 ± 1.3 vs 10.75 ± 2.7; P = .09, t test).

The five separately analyzed cases showed a mean close match of 40% for expert-vs-expert comparisons and a slightly higher mean close match of 46% for expert-vs-DIA comparisons.

Close concordance by strict criteria (>80% cases within ±0.2xKi-67) was not seen with DIA-vs-expert or expert-vs-expert comparisons (Table 4).
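The ±0.2x closeness-of-match criterion used throughout these comparisons can be stated concretely in a few lines. The sketch below is illustrative only (not the study's code); the function names and Ki-67 values are our own, hypothetical choices.

```python
def is_close_match(ki67_value: float, comparator: float, tol: float = 0.2) -> bool:
    """True if ki67_value lies within +/- tol * comparator of the comparator."""
    return abs(ki67_value - comparator) <= tol * comparator

def close_match_count(method_a, method_b, tol=0.2):
    """Number of cases in which method A is within +/-0.2x of method B's Ki-67."""
    return sum(is_close_match(a, b, tol) for a, b in zip(method_a, method_b))

# Hypothetical case-level Ki-67 (%) values (not study data):
expert = [5.0, 2.0, 10.0, 1.0]
dia = [5.5, 2.3, 16.0, 1.1]
print(close_match_count(dia, expert))  # -> 3 (3 of 4 cases within +/-0.2x)
```

Note that the criterion is relative to the comparator's value, so a fixed absolute difference counts as a close match at high Ki-67 but not at low Ki-67.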

Comparison of Dispersion of Values Between Manual Counting and DIA Methods

Mean Ki-67 (%) values by DIA (three sets of counting) and manual counting (four sets of measurements) fell within the 95% CIs of each other in 7 (35%) of 20 cases (Table 5 and Figure 4). Overall, manual counting showed a lower dispersion of counts, with a mean 95% CI width of 1.14%, compared with 2.057% for the DIA methods. Among the nonoverlapping cases, two patterns were seen: manual counting values with wide CIs for the means (eg, cases 9, 11, 12, 14, and 15; Figure 4) and tightly constrained manual and DIA counts whose confidence intervals did not overlap at all (eg, cases 2, 13, and 20; Figure 4).
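The intervals in Table 5 are percentile bootstrap CIs with 1,000 resamples. As a hedged sketch of that idea (the authors' exact resampling procedure is not specified here, and the input values below are hypothetical), a bootstrap CI for a mean Ki-67 can be computed with the standard library alone:

```python
import random

def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean: resample with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]          # 2.5th percentile
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]  # 97.5th percentile
    return sum(values) / n, (lo, hi)

# Hypothetical repeated Ki-67 (%) measurements for a single case (not study data):
mean, (lo, hi) = bootstrap_mean_ci([6.1, 5.9, 6.3, 6.0])
print(f"{mean:.2f} ({lo:.2f}-{hi:.2f})")
```

With only a handful of measurements per case, the bootstrap interval cannot extend beyond the observed minimum and maximum, which is one reason the CIs in Table 5 stay narrow where the underlying counts agree.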

Table 5

Mean Ki-67 Proliferation Index (%) and Bootstrapped 95% Confidence Intervals (1,000 Samples) by Manual Counting and Digital Image Analysis

| Case No. | Manual Counting, Mean (95% CI) | Digital Image Analysis, Mean (95% CI) |
| --- | --- | --- |
| 1 | 6.07 (5.88-6.26) | 5.38 (4.70-5.72) |
| 2 | 2.02 (1.79-2.29) | 3.08 (2.49-3.40) |
| 3 | 1.53 (1.14-1.96) | 2.02 (1.85-2.12) |
| 4 | 23.53 (22.80-24.25) | 27.25 (20.90-36.50) |
| 5 | 0.91 (0.60-1.22) | 0.66 (0.43-1.04) |
| 6 | 1.47 (1.32-1.62) | 1.86 (1.13-2.40) |
| 7 | 4.09 (3.80-4.51) | 3.61 (3.14-4.23) |
| 8 | 5.02 (4.59-5.47) | 4.69 (3.78-5.25) |
| 9 | 7.06 (6.10-8.52) | 4.60 (2.11-6.11) |
| 10 | 2.42 (2.12-2.78) | 1.72 (1.32-1.93) |
| 11 | 7.34 (5.24-8.78) | 3.71 (3.34-4.04) |
| 12 | 7.16 (6.28-8.04) | 5.65 (4.80-6.09) |
| 13 | 0.64 (0.44-0.85) | 1.03 (0.90-1.12) |
| 14 | 10.84 (9.98-11.57) | 7.34 (3.82-9.49) |
| 15 | 8.62 (7.90-9.60) | 6.22 (4.61-8.01) |
| 16 | 2.85 (2.20-3.98) | 2.16 (1.49-2.54) |
| 17 | 3.67 (2.63-4.39) | 4.14 (4.04-4.20) |
| 18 | 1.96 (1.72-2.31) | 1.72 (1.62-1.91) |
| 19 | 3.41 (3.02-3.87) | 2.09 (1.51-2.82) |
| 20 | 2.01 (1.97-2.07) | 1.51 (1.38-1.59) |

CI, confidence interval.


Figure 4

A, Mean Ki-67 proliferation index (%) and bootstrapped 95% confidence intervals (1,000 samples). Black horizontal bars depict Ki-67 by manual counting (four observers) and blue bars depict digital image analysis (three sets). Open circles indicate the upper and lower limits and means. For values, see Table 5; case 4 is excluded for ease of plotting. B, Ki-67 proliferation index values by all methods. Colored lines depict each method of Ki-67 measurement, and dots depict the values obtained. For each, the hotspot selector is listed in parentheses (Ob 1, observer 1; Ob 2, observer 2). DP, pathologist performing digital image analysis.

WHO Tumor Grade Comparisons

Interobserver agreement between experts on the assigned WHO grade according to the Ki-67 index (<3%, grade 1; 3%-20%, grade 2; >20%, grade 3) showed κ values ranging from 0.63 to 1 (good to excellent). Agreement in expert-vs-DIA and DIA-vs-DIA comparisons was 0.63 to 0.81 and 0.81 to 1, respectively (Table 4). Despite a slightly better close match between experts, grade-discordant cases in expert-vs-DIA comparisons occurred at a rate no greater than in expert-vs-expert comparisons (P = .6, Fisher exact test) (Table 6).
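The grade assignments and κ values above follow mechanically from the Ki-67 values. The sketch below, with hypothetical Ki-67 values and our own function names, applies the WHO cutoffs used in the study and computes an unweighted Cohen κ between two sets of grades:

```python
from collections import Counter

def who_grade(ki67: float) -> str:
    """WHO grade from Ki-67 (%): <3 -> G1, 3-20 -> G2, >20 -> G3."""
    if ki67 < 3:
        return "G1"
    return "G2" if ki67 <= 20 else "G3"

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement from each rater's marginal label frequencies
    p_exp = sum(c1[g] * c2[g] for g in set(rater1) | set(rater2)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical Ki-67 (%) values from two methods (not study data):
grades_a = [who_grade(k) for k in [1.5, 4.0, 25.0, 2.9, 19.0]]
grades_b = [who_grade(k) for k in [2.0, 3.5, 30.0, 3.1, 15.0]]
print(grades_a, grades_b, round(cohens_kappa(grades_a, grades_b), 2))
```

Because grading only discretizes the underlying Ki-67 value, two methods can disagree substantially in Ki-67 (%) yet remain grade-concordant whenever both values fall on the same side of a cutoff.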

Table 6

Tumor Grade According to the World Health Organization Cutoffs (<3%, Grade 1; 3%-20%, Grade 2; >20%, Grade 3) From Different Methodsa

Columns 2 to 5 are manual counting; columns 6 to 8 are digital image analysis.

| Case No. | Observer 1 (DP) | Observer 1 (Ob 1) | Observer 2 (Ob 2) | Observer 3 (DP) | QuantCenter (DP) | HALO (DP) | HALO (Ob 2) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 2 | G1 | G1 | G1 | G1 | G2 | G1 | G2 |
| 3 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |
| 4 | G3 | G3 | G3 | G3 | G3 | G3 | G3 |
| 5 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |
| 6 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |
| 7 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 8 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 9 | G2 | G2 | G2 | G2 | G2 | G1 | G2 |
| 10 | G1 | G2 | G1 | G1 | G1 | G1 | G1 |
| 11 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 12 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 13 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |
| 14 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 15 | G2 | G2 | G2 | G2 | G2 | G2 | G2 |
| 16 | G1 | G2 | G1 | G1 | G1 | G1 | G1 |
| 17 | G2 | G1 | G2 | G2 | G2 | G2 | G2 |
| 18 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |
| 19 | G2 | G1 | G2 | G2 | G1 | G1 | G1 |
| 20 | G1 | G1 | G1 | G1 | G1 | G1 | G1 |

DP, digital image analysis pathologist; Ob, observer.

aSee Table 3 for Ki-67 proliferation index (%) values. The column names list the method of counting (manual counting vs QuantCenter vs HALO) and the observer performing the count.


Discussion

We assessed and characterized the level of agreement between manual counting–based and semiautomated DIA-based measurement of the Ki-67 proliferation index (%) by several metrics. Ki-67 measurements by DIA showed excellent agreement with expert manual counting values and with each other. To our knowledge, this is the first study to perform an interplatform comparison between two widely used commercial DIA platforms, 3DHistech QuantCenter and HALO, in GEP-NETs. The high interplatform reproducibility is similar to results obtained in Ki-67 measurement in breast carcinoma using the same two platforms.21

The employment of a threshold-based requirement yielded several insights into the nature of Ki-67 assessment by manual counting. While high correlation coefficients suggest a high degree of agreement, requiring the Ki-67 index to fall within ±0.2x of a target value showed that expert-vs-expert close-match reproducibility averaged only ~53%, comparable to the 45% average close-match rate for expert-vs-DIA comparisons. When strict criteria for the proportion of cases meeting the threshold are added (≥80% of cases within ±0.2x the Ki-67 index by manual counting), close concordance is not seen with either method (Table 4). In other words, the performance of the de facto gold standard was equivalent to that of DIA.

The ostensible superiority of manual counting rests on the notion that expert morphologic assessment directly underpins the quantitative evaluation. The meaningful separation of positive and negative tumor nuclei from stromal, inflammatory, and endothelial cells and other noncellular elements is thought to contribute to the accuracy of the Ki-67 proliferation index.10 On a per-nucleus basis, however, decisions to include or exclude objects in the count are driven by many factors: the morphologic features, whether the object lies in the plane of focus and should be included, whether it represents a faintly immunopositive vs a darkly counterstained negative nucleus, and whether it truly represents a tumor nucleus. At this level, fatigue, interpretative variation, differing stringency levels for object identification, and the limits of print/monitor resolution (all known causes of variability in visual assessment) have a greater impact. With a perfect image substrate, say a fully white background bearing a small number of widely spaced, well-defined dark circular objects as nuclei, decisions could be expected to be highly reproducible. With imperfectly defined, poorly contrasted objects, numerous admixed mimickers, and the ~1,000 decisions required in the evaluation of a hotspot, manual counting is itself a noisy process. The resulting variability ensures that the same image or field counted by two different human observers will yield slightly different counts.
Analogously, similar variability is present in image analysis algorithms, whose parameters (eg, size limits, color characteristics, and edge contrast) function as the machine vision counterparts of human morphologic assessment and determine the inclusion or exclusion of objects as nuclei.26 In both instances, the uncertainty involved in within-hotspot enumeration is additive to that imposed by the morphologic variation of nuclei between different hotspots. The net effect, as the results above show, is a variability inherent in nuclear object enumeration that is comparable in extent between manual counting and semiautomated image analysis, and possibly irreducible.

The findings suggest the existence of a “concordance ceiling” independent of the method of assessment. This has implications for the setting up and calibration of DIA algorithms in routine practice: typically, an expert user “tweaks” algorithm settings based on visual inspection of nuclear identification to finalize an algorithm state. Even if an algorithm instance can be thereby set to achieve high or near-perfect concordance with the expert, its expected rate of close agreement with other independent observers would be no more than what was possible between the observers themselves.

Interobserver variability in case-level evaluation of Ki-67 has been examined in the literature. In a recent large-scale study28 in which a panel of experts (n = 23) performed manual Ki-67 measurement in breast carcinoma on step-sectioned glass slides, there was a ~10% range of variability in median Ki-67 between observers across 30 cases; in the range of Ki-67 index (%) values comparable to the present study (0%-20%), however, their raw data show a much higher average difference between observers (18.5%). Saadeh et al26 reported interobserver variability in the counts of negative (counterstained, or blue) nuclei between three observers in gastrointestinal neuroendocrine tumors but did not report significantly large differences in the Ki-67 index. Volynskaya et al29 compared manual counting with automated analysis in primary and metastatic gastrointestinal neuroendocrine tumors: four observers performed manual counts on printed hotspot images from 20 cases that were also subject to DIA counting. A range of variation of up to ~15% in absolute terms was noted between observers in Ki-67 counts, but this was mainly confined to cases with high Ki-67 values, with low dispersion among counts at lower Ki-67 values. Apart from the breast carcinoma study, the variability reported in these studies is comparable to the findings of the present study.

Our study has certain limitations. Although each slide was subjected to multiple rounds of Ki-67 measurement by different methods, the total number of cases subject to full analysis (n = 20) is low. The wider 95% CIs for mean Ki-67 (%) from manual counting and the relative inability of DIA methods to attain Ki-67 values close to manual counting could be a reflection of the low numbers of cases. There are additional identifiable sources of variability arising from the methodology. First, the Ki-67 values obtained at clinical reporting were included in the analysis. While this incorporates data from real-world reporting and was performed by a single pathologist (observer 1) using the same method throughout, the measurements were performed over a long period of time and under different conditions compared with the study slides. Second, both 3DHistech QuantCenter and HALO CellQuant algorithms were tuned to identify neuroendocrine tumor nuclei by a single pathologist. Another possible deficiency in the data could be the low number of cases with Ki-67 values close to the G2/G3 thresholds. Further studies with larger sample sizes and/or different software platforms would be needed to explore whether a higher “close match” with manual counting is possible with DIA Ki-67 measurement.

It is frequently forgotten that hotspot manual counting owes its emergence as a reproducible and clinically adopted method largely to the limitations of what is possible in routine practice.30,31 With WSI analysis, a proliferation index can be calculated for much larger samples of tumor nuclei, approaching what is possible in quantitative analyses such as flow cytometry. There may be a need to evolve new thresholds based on WSI analysis that permit a more granular correlation with clinical outcomes.

In conclusion, based on our findings, we propose a possible validation framework for digital Ki-67 measurement in tumors that examines whether (1) the DIA Ki-67 index (%) values exhibit a high degree of correlation (r > 0.9) with manual counting on the same hotspot images under uniform conditions of immunostaining, counterstaining, and WSI acquisition and (2) a close match, defined as a DIA Ki-67 index (%) within ±0.2x the Ki-67 index by manual counting, is achieved in 50% or more of cases. Our results show that expert-vs-expert Ki-67 measurements exhibit the same variability and reproducibility as DIA: κ values and close-match rates were no worse for manual count–DIA comparisons than for manual counts between experts. Given this noninferiority and the substantial time savings, we support more widespread adoption of DIA Ki-67 measurement in routine clinical practice.
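The proposed two-part framework lends itself to a simple automated check. The following sketch is our own illustration (the paired values and helper names are hypothetical): it tests criterion 1 (Pearson r > 0.9 against manual counts) and criterion 2 (at least 50% of cases within ±0.2x of the manual Ki-67).

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def validate_dia(manual, dia, r_min=0.9, close_frac_min=0.5, tol=0.2):
    """Criterion 1: r > r_min; criterion 2: >= close_frac_min of cases within +/-tol*manual."""
    close = sum(abs(d - m) <= tol * m for m, d in zip(manual, dia))
    return pearson_r(manual, dia) > r_min and close / len(manual) >= close_frac_min

# Hypothetical paired case-level Ki-67 (%) values (not study data):
manual = [1.5, 2.0, 4.1, 5.0, 7.2, 10.8, 23.5]
dia = [1.4, 2.3, 3.8, 4.7, 6.0, 9.9, 27.0]
print(validate_dia(manual, dia))  # -> True
```

A laboratory adapting this check would substitute its own validation cohort and could tighten `close_frac_min` as local optimization improves.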

This work was supported by NIH grant P50 CA174521-01A1 (A.M.B.).

References

1. Rindi G, Klimstra DS, Abedi-Ardekani B, et al. A common classification framework for neuroendocrine neoplasms: an International Agency for Research on Cancer (IARC) and World Health Organization (WHO) expert consensus proposal. Mod Pathol. 2018;31:1770-1786.
2. Martin-Perez E, Capdevila J, Castellano D, et al. Prognostic factors and long-term outcome of pancreatic neuroendocrine neoplasms: Ki-67 index shows a greater impact on survival than disease stage. The large experience of the Spanish National Tumor Registry (RGETNE). Neuroendocrinology. 2013;98:156-168.
3. Richards-Taylor S, Ewings SM, Jaynes E, et al. The assessment of Ki-67 as a prognostic marker in neuroendocrine tumours: a systematic review and meta-analysis. J Clin Pathol. 2016;69:612-618.
4. Makkink-Nombrado SV, Baak JP, Schuurmans L, et al. Quantitative immunohistochemistry using the CAS 200/486 image analysis system in invasive breast carcinoma: a reproducibility study. Anal Cell Pathol. 1995;8:227-245.
5. Pinder SE, Wencyk P, Sibbering DM, et al. Assessment of the new proliferation marker MIB1 in breast carcinoma using image analysis: associations with other prognostic factors and survival. Br J Cancer. 1995;71:146-149.
6. Klimstra DS, Modlin IR, Adsay NV, et al. Pathology reporting of neuroendocrine tumors: application of the Delphic consensus process to the development of a minimum pathology data set. Am J Surg Pathol. 2010;34:300-313.
7. Keck KJ, Choi A, Maxwell JE, et al. Increased grade in neuroendocrine tumor metastases negatively impacts survival. Ann Surg Oncol. 2017;24:2206-2212.
8. Tang LH, Gonen M, Hedvat C, et al. Objective quantification of the Ki67 proliferative index in neuroendocrine tumors of the gastroenteropancreatic system: a comparison of digital image analysis with manual methods. Am J Surg Pathol. 2012;36:1761-1770.
9. Young HT, Carr NJ, Green B, et al. Accuracy of visual assessments of proliferation indices in gastroenteropancreatic neuroendocrine tumours. J Clin Pathol. 2013;66:700-704.
10. Reid MD, Bagci P, Ohike N, et al. Calculation of the Ki67 index in pancreatic neuroendocrine tumors: a comparative analysis of four counting methodologies. Mod Pathol. 2016;29:93.
11. Cottenden J, Filter ER, Cottreau J, et al. Validation of a cytotechnologist manual counting service for the Ki67 index in neuroendocrine tumors of the pancreas and gastrointestinal tract. Arch Pathol Lab Med. 2018;142:402-407.
12. Tuominen VJ, Ruotoistenmäki S, Viitanen A, et al. ImmunoRatio: a publicly available web application for quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki-67. Breast Cancer Res. 2010;12:R56.
13. Jin M, Roth R, Gayetsky V, et al. Grading pancreatic neuroendocrine neoplasms by Ki-67 staining on cytology cell blocks: manual count and digital image analysis of 58 cases. J Am Soc Cytopathol. 2016;5:286-295.
14. Fitzgibbons PL, Bradley LA, Fatheree LA, et al.; College of American Pathologists Pathology and Laboratory Quality Center. Principles of analytic validation of immunohistochemical assays: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2014;138:1432-1443.
15. Koopman T, Buikema HJ, Hollema H, et al. Digital image analysis of Ki67 proliferation index in breast cancer using virtual dual staining on whole tissue sections: clinical validation and inter-platform agreement. Breast Cancer Res Treat. 2018;169:33-42.
16. Pham DT, Skaland I, Winther TL, et al. Correlation between digital and manual determinations of Ki-67/MIB-1 proliferative indices in human meningiomas. Int J Surg Pathol. 2020;28:273-279.
17. Sugita S, Hirano H, Hatanaka Y, et al. Image analysis is an excellent tool for quantifying Ki-67 to predict the prognosis of gastrointestinal stromal tumor patients. Pathol Int. 2018;68:7-11.
18. Liu SZ, Staats PN, Goicochea L, et al. Automated quantification of Ki-67 proliferative index of excised neuroendocrine tumors of the lung. Diagn Pathol. 2014;9:174.
19. Wang M, McLaren S, Jeyathevan R, et al. Laboratory validation studies in Ki-67 digital image analysis of breast carcinoma: a pathway to routine quality assurance. Pathology. 2019;51:246-252.
20. Wang HY, Li ZW, Sun W, et al. Automated quantification of Ki-67 index associates with pathologic grade of pulmonary neuroendocrine tumors. Chin Med J (Engl). 2019;132:551-561.
21. Acs B, Pelekanou V, Bai Y, et al. Ki67 reproducibility using digital image analysis: an inter-platform and inter-operator study. Lab Invest. 2019;99:107-117.
22. Del Rosario Taco Sanchez M, Soler-Monsó T, Petit A, et al. Digital quantification of KI-67 in breast cancer. Virchows Arch. 2019;474:169-176.
23. Kroneman TN, Voss JS, Lohse CM, et al. Comparison of three Ki-67 index quantification methods and clinical significance in pancreatic neuroendocrine tumors. Endocr Pathol. 2015;26:255-262.
24. Basile ML, Kuga FS, Del Carlo Bernardi F. Comparation of the quantification of the proliferative index KI67 between eyeball and semi-automated digital analysis in gastro-intestinal neuroendrocrine tumors. Surg Exp Pathol. 2019;2:21.
25. Dogukan FM, Yilmaz Ozguven B, Dogukan R, et al. Comparison of monitor-image and printout-image methods in Ki-67 scoring of gastroenteropancreatic neuroendocrine tumors. Endocr Pathol. 2019;30:17-23.
26. Saadeh H, Abdullah N, Erashdi M, et al. Histopathologist-level quantification of Ki-67 immunoexpression in gastroenteropancreatic neuroendocrine tumors using semiautomated method. J Med Imaging (Bellingham). 2020;7:012704.
27. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
28. Leung SCY, Nielsen TO, Zabaglo LA, et al.; International Ki67 in Breast Cancer Working Group of the Breast International Group and North American Breast Cancer Group (BIG-NABCG). Analytical validation of a standardised scoring protocol for Ki67 immunohistochemistry on breast cancer excision whole sections: an international multicentre collaboration. Histopathology. 2019;75:225-235.
29. Volynskaya Z, Mete O, Pakbaz S, et al. Ki67 quantitative interpretation: insights using image analysis. J Pathol Inform. 2019;10:8.
30. Cavalcanti MS, Gönen M, Klimstra DS. The ENETS/WHO grading system for neuroendocrine neoplasms of the gastroenteropancreatic system: a review of the current state, limitations and proposals for modifications. Int J Endocr Oncol. 2016;3:203-219.
31. Scarpa A, Mantovani W, Capelli P, et al. Pancreatic endocrine tumors: improved TNM staging and histopathological grading permit a clinically efficient prognostic stratification of patients. Mod Pathol. 2010;23:824-833.
