Sarag A Boukhar, Matthew D Gosse, Andrew M Bellizzi, Anand Rajan K D, Ki-67 Proliferation Index Assessment in Gastroenteropancreatic Neuroendocrine Tumors by Digital Image Analysis With Stringent Case and Hotspot Level Concordance Requirements, American Journal of Clinical Pathology, Volume 156, Issue 4, October 2021, Pages 607–619, https://doi.org/10.1093/ajcp/aqaa275
Abstract
The Ki-67 proliferation index is integral to gastroenteropancreatic neuroendocrine tumor (GEP-NET) assessment. Automated Ki-67 measurement would aid clinical workflows, but adoption has lagged owing to concerns of nonequivalency. We sought to address this concern by comparing 2 digital image analysis (DIA) platforms to manual counting with same-case/different-hotspot and same-hotspot/different-methodology concordance assessment.
We assembled a cohort of GEP-NETs (n = 20) from 16 patients. Two sets of Ki-67 hotspots were manually counted by three observers and by two DIA platforms, QuantCenter and HALO. Concordance between methods and observers was assessed using intraclass correlation coefficient (ICC) measures. For each comparison pair, the number of cases within ±0.2xKi-67 of its comparator was assessed.
DIA Ki-67 showed excellent correlation with manual counting, and ICC was excellent in both within-hotspot and case-level assessments. In expert-vs-DIA, DIA-vs-DIA, or expert-vs-expert comparisons, the best-performing was DIA Ki-67 by QuantCenter, which showed 65% cases within ±0.2xKi-67 of manual counting.
Ki-67 measurement by DIA is highly correlated with expert-assessed values. However, close concordance by strict criteria (>80% within ±0.2xKi-67) is not seen with DIA-vs-expert or expert-vs-expert comparisons. The results show analytic noninferiority and support widespread adoption of carefully optimized and validated DIA Ki-67.
Expert Ki-67 measurement in gastroenteropancreatic neuroendocrine tumors by manual counting exhibits the same variability and reproducibility as digital image analysis (DIA).
Measurement of Ki-67 proliferation index by whole-slide DIA is a robust method that can be adopted in routine clinical practice.
We propose a validation framework for digital Ki-67 that (1) ensures DIA Ki-67% exhibits high correlation (r > 0.9) with manual counting and (2) shows more than 50% cases within ±0.2xKi-67 of manual counting.
The Ki-67 proliferation index is integral to the diagnosis and prognosis of gastroenteropancreatic neuroendocrine tumors (GEP-NETs) and in many instances guides therapy.1 The Ki-67 index has been shown to be more powerful than tumor stage as a prognostic factor in pancreatic tumors,2 and there are significant overall survival differences between World Health Organization (WHO) G1 and G3 tumors.3 Manual counting of camera-captured images represents the de facto gold standard for Ki-67 assessment. Computer-assisted systems (eg, the CAS-200)4,5 dating back to the mid-1990s did not widely penetrate clinical practice. Given increased access to whole-slide imaging and technological leaps in computing, the time is ripe for more widespread adoption of Ki-67 digital image analysis (DIA), although concerns about nonequivalency persist.6
The University of Iowa Hospitals and Clinics has a large neuroendocrine tumor program; we follow over 2,000 patients and see 200 new patients a year. We perform Ki-67 immunohistochemistry on all neuroendocrine tumors (NETs) (ie, primary, recurrent, metastatic), and on resections we separately evaluate primary, regional, and distant disease.7 Although we gestalt (aka “eyeball estimate”) rare Ki-67 proliferation indices (eg, those close to 0%), we manually count most (especially those around the WHO G1/2 and 2/3 thresholds), which is exceedingly time intensive. We perform manual counting of camera-captured images, as eyeball estimation and manual counting under the microscope have been shown to be clearly inferior.8-11 Some in our group have used the ImmunoRatio web application,12,13 which is an automated counting method with specific image input requirements. Observers have contended that DIA-based quantification and immunohistochemical assessment carry limitations. Reid et al,10 for example, ascribed high costs to implementing digital quantification. In addition, they contended that miscounting due to inclusion of nontarget cells was a contributor to inaccuracy with digital methods. This study partly grew from our Ki-67 DIA validation process. One of us is an immunohistochemistry laboratory director, and we drew heavily on that experience and the College of American Pathologists guideline for the analytic validation of immunohistochemical assays (eg, cohort size of 20 cases, expected concordance between comparator and the gold standard of 90%, and investigation of sources of lower-level concordance).14
The Ki-67 proliferation index is unique in surgical pathology in that it represents a continuous variable ranging from 0% to 100%. Studies evaluating Ki-67 DIA assessment across tumor types,15-22 including GEP-NETs,9-11,13,23-26 typically report the Pearson correlation coefficient (r) or the intraclass correlation coefficient (ICC).27 Although Ki-67 is a continuous variable, NET grades are assigned at (arbitrary) proliferation index thresholds (ie, G1/2 at 3%, G2/3 at 20%), and crossing a grade threshold can have clinical import (eg, clinical trial eligibility). Two assays can be highly correlated yet consistently produce results on opposite sides of these thresholds. We therefore concluded that correlation coefficients alone were an insufficient metric for a validation and, more broadly, that a simpler index of agreement was needed for histopathologic methods that produce continuous data. The 3% GEP-NET G1/G2 threshold is frequently encountered in clinical practice; counts within ±10% of this value range from 2.7% to 3.3%, but a ±10% band was intuitively and by consensus deemed too difficult to reproduce. The ±20% range around this value, 2.4% to 3.6%, represented our reproducibility target, and we refer to comparisons within ±0.2xKi-67 as "close matches."
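To make these definitions concrete, the WHO grade cutoffs and the ±0.2xKi-67 close-match band can be written out as a short sketch (illustrative code of ours, not part of the study; the function names are hypothetical):

```python
def who_grade(ki67: float) -> str:
    """Assign a WHO GEP-NET grade from a Ki-67 proliferation index (%).

    Cutoffs as used in the study: <3% is G1, 3%-20% is G2, >20% is G3.
    """
    if ki67 < 3.0:
        return "G1"
    if ki67 <= 20.0:
        return "G2"
    return "G3"


def is_close_match(reference: float, comparator: float) -> bool:
    """True when the comparator lies within +/-20% of the reference index.

    For a reference of 3.0%, the close-match range is 2.4%-3.6%.
    """
    return abs(comparator - reference) <= 0.2 * reference
```

Note that the band scales with the index itself, so the absolute tolerance is narrow near the G1/G2 threshold (±0.6 at 3%) and wider at higher proliferation indices (±4 at 20%).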
With the above, we sought to perform an in-depth examination of the reproducibility of digital Ki-67 measurement in GEP-NET with two widely available commercial whole-slide image–based DIA platforms with both hotspot- and case-level concordance assessment in comparison to manual counting. We examined the effect of strict predetermined thresholds to quantify the proximity of Ki-67 indices obtained by manual and digital methods. In addition, we examined the extent of grade-discrepant Ki-67 values in cases obtained by manual and digital methods.
Materials and Methods
Case Inclusion
The validation study was carried out as part of a quality improvement project. The University of Iowa Institutional Review Board determined the validation to be exempted from a full review by the board. We initially assembled a cohort of 20 GEP-NETs, including a mix of primary and metastatic tumors, from 16 patients. The cohort was assembled with consideration of the grade distribution in our patient population (66% G1, 30% G2, <5% G3). We intentionally overrepresented G2 tumors (50% of the cohort) (as G1 tumors with proliferation indices frequently close to 0% were not considered a "fair test") and intentionally challenged the G1/G2 threshold (35% of the cohort with reported proliferation indices from 2%-5%). The cases were selected from 2017 and 2018 to reduce variation in counterstain (hematoxylin) quality and intensity. They were all initially signed out by a single expert gastrointestinal (GI) pathologist with 11 years of experience (observer 1), who had performed manual counting of camera-captured Ki-67 hotspot images at the time of clinical reporting. Mitotic activity was not considered in grade assignment.
G3 GEP-NETs are relatively infrequently encountered, although they are overrepresented in our institutional consults. At the recommendation of the reviewers, we evaluated an additional group of five tumors from four patients, targeting Ki-67 proliferation indices in the 10% to 30% range. Clinicopathologic features of these 25 total cases are presented in Table 1.
| Patient No. | Slide No. | Age, y | Sex | Tumor Location | Primary or Metastatic | Clinical Reported Ki-67 PI | WHO Tumor Grade |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 51 | F | Distal pancreas | Primary | 6.20 | G2 |
| 2 | 2 | 58 | F | Terminal ileum | Primary | 2.00 | G1 |
| 3 | 3 | 75 | M | Duodenum | Primary | 2.00 | G1 |
| 4 | 4 | 62 | M | Distal pancreas | Primary | 23.00 | G3 |
| 5 | 5 | 51 | F | Duodenum | Primary | 1.20 | G1 |
|  | 6 | 51 | F | Lymph node | Metastatic | 1.60 | G1 |
| 6 | 7 | 75 | F | CBD lymph node | Metastatic | 4.00 | G2 |
|  | 8 |  |  | Duodenum | Primary | 5.50 | G2 |
| 7 | 9 | 69 | F | Duodenum | Primary | 9.00 | G2 |
| 8 | 10 | 54 | F | Proximal ileum | Primary | 3.00 | G2 |
|  | 11 |  |  | Ileal tumor nodule | Metastatic | 9.00 | G2 |
|  | 12 |  |  | Liver | Metastatic | 8.50 | G2 |
| 9 | 13 | 54 | F | Proximal ileum | Primary | 0.70 | G1 |
| 10 | 14 | 70 | F | Pararenal node | Metastatic | 10.70 | G2 |
|  | 15 |  |  | Liver | Metastatic | 10.10 | G2 |
| 12 | 16 | 68 | M | Duodenum | Primary | 4.50 | G2 |
| 13 | 17 | 63 | M | Liver | Metastatic | 2.30 | G1 |
| 14 | 18 | 56 | M | Terminal ileum | Primary | 1.90 | G1 |
| 15 | 19 | 67 | M | Diaphragmatic nodule | Metastatic | 2.99 | G1 |
| 16 | 20 | 59 | F | Stomach | Primary | 2.00 | G1 |
| 17 | 21 | 77 | M | Stomach | Primary | 15 | G2 |
| 18 | 22 | 67 | F | Liver | Metastatic | 7.1 | G2 |
| 19 | 23 | 60 | F | Ileum | Primary | 8 | G2 |
| 20 | 24 | 78 | F | Liver metastasis | Metastatic | 16 | G2 |
|  | 25 |  |  | Pancreas | Primary | 36 | G3 |
CBD, common bile duct; PI, proliferation index; WHO, World Health Organization.
ᵃKi-67 proliferation index values at clinical reporting and corresponding tumor grades according to the WHO cutoffs (<3%, grade 1; 3%-20%, grade 2; >20%, grade 3) are shown.
Ki-67 Counting and Digital Measurement
Manual Counting of Hotspot Photomicrographs From Glass Slides
Immunohistochemistry for Ki-67 was performed using standard techniques (MIB1; 1:200 dilution, heat-induced epitope retrieval; Dako). In routine reporting, Ki-67 proliferation index values were measured by manual counting of immunostained nuclei (all by observer 1).

Hotspot Selection.—In brief, hotspot selection was performed by light microscopy on glass slides, using combined low- and high-magnification examination of tumor areas to identify the focus with the highest apparent density of immunopositive tumor nuclei (Figure 1). Depending on the proportions of tumor and stroma, two to three nonoverlapping photomicrographs were acquired (×400; 2,448 × 1,920 pixels; 72 pixels per inch) from each hotspot area with an Olympus DX27 CCD camera (Olympus) (Figure 2A). Together, the captured images were known from routine practice to contain approximately 1,000 tumor cells.

Manual Counting.—Manual counting was performed with cellSens version 1.7 (Olympus) using the point-annotation function applied to the hotspot images viewed in cellSens on computer monitors (Figure 2C). Immunonegative tumor nuclei were included, while stromal, inflammatory, and endothelial cells were excluded. Ki-67–positive tumor nuclei were counted separately, with positive nontumor elements (inflammatory, stromal, and endothelial nuclei) excluded by morphologic assessment. The historical/retrospective data from routine reporting, obtained as described above, were included in the analysis (Table 2). The remainder of the measurements were obtained prospectively.
Ki-67 measurement by manual and digital image analysis (DIA) methods. A representative case (A) with segmented nuclei in HALO (B) and the same field with manual counting annotation of nuclei performed in Olympus cellSens (C). An annotated hotspot on a digital whole-slide image (D). This is subject to DIA measurement in 3DHistech QuantCenter (E) and manual counting in cellSens (F). The open plus symbols in C and F are from manual counting.
Study design. The circles indicate hotspots and the site of hotspot selection (glass vs whole-slide image [WSI]) and the connected rectangles the method of Ki-67 measurement and the image substrate used to perform counting (photomicrograph [PMG]). “Study hotspots” were annotated by the digital image analysis (DIA) pathologist on WSIs and were counted by observer 1 and observer 3 and measured by HALO and QuantCenter. Observer 1 and observer 2 hotspots were identified on glass slides and counted using on-screen photomicrographs in Olympus cellSens. The dashed lines (green, expert vs expert; blue, expert vs DIA; red, DIA vs DIA) indicate the intercomparisons performed in the analysis.
In the study, a second independent set of hotspots was prospectively chosen on glass slides by another expert GI pathologist (observer 2). Using the same method as described above, two to three TIFF images were acquired with an Olympus DX27 camera at ×400 from each hotspot area. The acquired photomicrographs were manually counted in cellSens by observer 2, as described above.
Manual Counting of Hotspots Acquired From Digital Whole Slides
Snapshot images containing study hotspots from whole-slide images (WSIs; see below) were manually counted by two independent observers (observer 1, observer 3) using cellSens’s point-annotation function, as described above (Figure 2C and Figure 2F).
DIA-Based Ki-67 Measurement on WSI and Hotspot Photomicrographs
The Ki-67 immunohistochemistry slides were digitally scanned with the ×20 objective (0.24-μm/pixel resolution) on a P1000 Pannoramic scanner (3DHistech) to yield .mrxs digital WSIs.

Hotspot Selection.—On WSIs, a ×400 high-power field–equivalent circular area (0.152 mm²), known from prior experience to contain approximately 1,000 tumor cells, was annotated by a third pathologist, independent of the prior two selections, using 3DHistech CaseViewer's gradient visualization function, which facilitates identification of the most highly Ki-67–labeled areas (Figure 2D). This was carried out by the pathologist performing digital image analysis (DP), and the selected hotspots are termed study hotspots. In addition, the study hotspots were extracted using CaseViewer's Snapshot function as 1,920 × 1,017-pixel JPEG images containing the circular annotation.

Automated Counting by DIA.—The DP-annotated study hotspots were analyzed in 3DHistech QuantCenter with the CellQuant algorithm to digitally enumerate nuclei on WSIs (Figure 2E). Because of a lack of cross-compatibility between 3DHistech CaseViewer and HALO (Indica Labs), digitally annotated study hotspots could not be opened directly in HALO. Instead, the circular annotated study hotspot areas were exported using CaseViewer's Export function as standalone .mrxs WSI files, opened in HALO, and analyzed using the CytoNuclear algorithm. Similarly, observer 2–chosen TIFF hotspot images acquired by photomicrography from glass slides were imported into HALO and analyzed with the CytoNuclear algorithm (Figure 2B).
The selection of hotspots and counting methods are depicted in Figure 1, Figure 2, and Table 2. The CellQuant and CytoNuclear algorithms were independently optimized for tumor nuclear detection and positive nucleus classification using multiple areas from nonstudy neuroendocrine tumor and other WSI slides. Within each platform, identical algorithm settings were used whether analyzing WSIs or imported hotspot photomicrographs. Nuclear count data were exported as .xls files. The optimized algorithm parameters for 3DHistech CellQuant and HALO CytoNuclear, and example results from both platforms, are included in the supplementary data (Supplementary Figure S1, Supplementary Figure S2, and Supplementary Figure S3; all supplemental materials can be found at American Journal of Clinical Pathology online).
Summary of Ki-67 Measurement Experiments Performed or Included in the Studyᵃ
| Hotspot Selector | Manual Counting | Analysis Platform |
|---|---|---|
| DIA pathologist | Observer 3, Observer 1 | QuantCenter, HALO |
| Observer 1 | Observer 1 |  |
| Observer 2 | Observer 2 | HALO |
ᵃEach row indicates the hotspot selector and the methods of counting to which each of the selected hotspots was subjected. The digital image analysis (DIA) pathologist study hotspots and the observer 2–chosen hotspots were counted by both DIA platforms. Observer 1 counts on observer 1–chosen hotspots were performed as part of clinical reporting of each case.
Five cases with higher clinically reported Ki-67 index values were selected and analyzed separately. For these slides, hotspots were chosen and annotated on WSIs by an expert GI pathologist (observer 2). The hotspots were extracted using CaseViewer's Snapshot function as JPEG images containing the circular annotation. These were manually counted by two pathologists (observer 1, observer 2) in Olympus cellSens. The hotspots were also analyzed in 3DHistech QuantCenter with the CellQuant algorithm to digitally enumerate nuclei on WSIs. Ki-67 measurement by observer 3 manual counting and by HALO was not performed. Because the five cases were not subjected to the same uniform measurement process as the remaining 20, their data are analyzed and presented separately (Supplementary Figure S4).
Analysis
Ki-67 proliferation indices were calculated by the standard method: positive tumor nuclei/total tumor nuclei. For every comparison, the Pearson correlation coefficient (r) was calculated. Concordance within methodologic and observer groups was assessed by ICC measures. As a measure of the closeness of match between measured Ki-67 indices, a preset threshold was assessed, namely, the number of cases for each method that fell within ±0.2x of its comparator. In all comparisons, where applicable, the manual counts were used to derive close-match thresholds. In addition, for each comparison pair, κ statistics and the number of cases that were WHO GEP-NET grade discrepant were assessed. Confidence intervals (CIs) for mean Ki-67 (%) were estimated using 1,000 bootstrap samples. Differences between mean Ki-67 values were compared using the Student t test, and differences in proportions were analyzed using the Fisher exact test. Statistical analysis was performed using SPSS version 26 and RStudio version 1.2.1335 running R 64-bit version 3.6.1.
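As an illustration of the resampling step, a percentile bootstrap CI for mean Ki-67 with 1,000 resamples might look like the following (a minimal sketch of ours; the study used SPSS and R, and this Python function is only a stand-in):

```python
import random


def bootstrap_ci_mean(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Resamples `values` with replacement n_boot times and returns the
    (alpha/2, 1 - alpha/2) percentiles of the resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)          # eg, index 25 of 1,000
    hi_idx = int((1 - alpha / 2) * n_boot) - 1  # eg, index 974 of 1,000
    return means[lo_idx], means[hi_idx]
```

With a list of per-case Ki-67 percentages as input, this returns the lower and upper bounds of an approximate 95% CI for the mean.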
Results
Profile of Cases
Taken together, the age of the patients (n = 20) ranged from 51 to 78 years. There were 7 men and 13 women (male/female = 1:1.8). Sections from 25 tumors from these patients were studied; of these, 15 were primary tumors, and the remaining 10 were metastatic foci. The primary sites studied included pancreas (n = 3), duodenum (n = 5), ileum (n = 5), and stomach (n = 2). There were four liver metastases, with nodal metastases (n = 3) and peritoneal nodules (n = 3) forming the remainder. The clinically reported Ki-67 proliferation index values ranged from 0.70% to 36%, with a mean of 5.5%, with 9 G1, 14 G2, and 2 G3 (WHO) tumors. The five cases analyzed separately showed a mean Ki-67 of 16.42%, with three reported with Ki-67 values more than 10%. The details of the included cases and slides are presented in Table 1.
Correlation Analysis
The measured Ki-67 values are shown in Table 3 and Supplementary Table S1. Ki-67 measurement by both DIA platforms showed excellent correlation with their human counterparts (Figure 3 and Table 4). Table 4 displays findings for the Ki-67 tumor slides subjected to the full analysis (n = 20) described above. The Pearson correlation coefficient (r) ranged from 0.881 to 0.980 (P < .001, all possible pairs), indicating a high degree of correlation between different observers and between methods (Table 4). The ICC was excellent in both within-hotspot (0.94 and 0.89, respectively) and case-level assessments (0.926; all P < .001). In addition, ICC values were high when examined expert vs expert (0.97; 95% CI, 0.95-0.98) and expert vs DIA (0.91; 95% CI, 0.84-0.96). Reproducibility between different DIA platforms (ie, DIA-vs-DIA assessments) was 0.88 (95% CI, 0.77-0.94).
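For reference, the Pearson correlation coefficient (r) reported in these comparisons follows the standard formula; a minimal sketch (ours, not the authors' code) over paired Ki-67 measurements:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to any two columns of Table 3 (eg, observer 1 manual counts vs QuantCenter), this yields the pairwise r values summarized in Table 4.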
| Tumor No. | Manual: Observer 1 (DP) | Manual: Observer 1 (Ob 1) | Manual: Observer 2 (Ob 2) | Manual: Observer 3 (DP) | DIA: QuantCenter (DP) | DIA: HALO (DP) | DIA: HALO (Ob 2) |
|---|---|---|---|---|---|---|---|
| 1 | 6.31 | 6.20 | 5.85 | 5.92 | 5.70 | 4.70 | 5.72 |
| 2 | 1.95 | 2.00 | 2.40 | 1.72 | 3.34 | 2.49 | 3.40 |
| 3 | 1.85 | 2.00 | 1.24 | 1.04 | 2.16 | 2.06 | 1.85 |
| 4 | 24.67 | 23.00 | 23.84 | 22.61 | 24.33 | 20.90 | 36.50 |
| 5 | 0.60 | 1.20 | 1.24 | 0.60 | 0.52 | 0.43 | 1.04 |
| 6 | 1.37 | 1.60 | 1.64 | 1.26 | 2.75 | 1.13 | 1.69 |
| 7 | 3.81 | 4.00 | 4.74 | 3.79 | 3.46 | 3.14 | 4.23 |
| 8 | 4.80 | 5.50 | 4.39 | 5.38 | 5.48 | 4.80 | 3.78 |
| 9 | 6.42 | 9.00 | 5.77 | 7.06 | 6.62 | 2.11 | 5.08 |
| 10 | 2.12 | 3.00 | 2.44 | 2.12 | 1.95 | 1.32 | 1.88 |
| 11 | 8.13 | 9.00 | 4.27 | 7.95 | 4.36 | 3.34 | 3.42 |
| 12 | 5.95 | 8.50 | 7.57 | 6.61 | 6.07 | 4.80 | 6.09 |
| 13 | 0.44 | 0.70 | 0.98 | 0.44 | 1.16 | 0.90 | 1.04 |
| 14 | 11.73 | 10.70 | 9.51 | 11.40 | 10.29 | 3.82 | 7.91 |
| 15 | 8.08 | 10.10 | 8.58 | 7.71 | 6.05 | 4.61 | 8.01 |
| 16 | 2.41 | 4.50 | 2.13 | 2.34 | 2.63 | 2.36 | 1.49 |
| 17 | 3.61 | 2.30 | 4.65 | 4.11 | 4.24 | 4.13 | 4.04 |
| 18 | 1.72 | 1.90 | 2.51 | 1.71 | 1.61 | 1.64 | 1.91 |
| 19 | 3.41 | 2.99 | 4.11 | 3.13 | 1.94 | 1.51 | 2.82 |
| 20 | 2.09 | 2.00 | 2.00 | 1.96 | 1.60 | 1.38 | 1.56 |
| 21 | 14.9 | 15 | 20.2 | — | 23.15 | — | — |
| 22 | 7.9 | 7.1 | 9.8 | — | 13.48 | — | — |
| 23 | 7.1 | 8 | 8.1 | — | 6.45 | — | — |
| 24 | 26 | 16 | 29.6 | — | 27.56 | — | — |
| 25 | 21.1 | 36 | 26.5 | — | 24.5 | — | — |
DP, digital image analysis pathologist; Ob, observer; —, data unavailable.
ᵃThe column names list the method of counting (manual counting vs QuantCenter vs HALO) and the observer performing the count; listed in parentheses is the observer who chose the hotspots. For cases 21 to 25, hotspots were chosen by observer 2.
Comparison Pair | Pearson Correlation Coefficient | Closeness of Match: No. (%) of Cases Within ±0.2xKi-67 Index | No. of Grade-Discordant Cases | κ
---|---|---|---|---
Manual counting vs manual counting | | | |
Observer 1 (DP) vs observer 2 (Ob 2) (case level) | 0.976 | 10/20 (50) | 4 | 1
Observer 1 (DP) vs observer 1 (Ob 1) (case level) | 0.977 | 12/20 (60) | 4 | 0.63
Observer 1 (Ob 1) vs observer 2 (Ob 2) (case level) | 0.952 | 10/20 (50) | 0 | 0.63
Manual counting vs DIA | | | |
Observer 2 (Ob 2) vs HALO (Ob 2) (hotspot level) | 0.971 | 11/20 (55) | 2 | 0.81
Observer 2 (Ob 2) vs HALO (DP) (case level) | 0.949 | 7/20 (35) | 2 | 0.81
Observer 2 (Ob 2) vs QuantCenter (DP) (case level) | 0.978 | 10/20 (50) | 2 | 0.81
Observer 1 (Ob 1) vs HALO (Ob 2) (case level) | 0.902 | 7/20 (35) | 2 | 0.81
Observer 1 (Ob 1) vs HALO (DP) (case level) | 0.881 | 7/20 (35) | 4 | 0.81
Observer 1 (Ob 1) vs QuantCenter (DP) (case level) | 0.946 | 8/20 (40) | 4 | 0.81
Observer 1 (DP) vs HALO (Ob 2) (case level) | 0.942 | 9/20 (45) | 2 | 0.63
Observer 1 (DP) vs HALO (DP) (hotspot level) | 0.922 | 9/20 (45) | 2 | 0.63
Observer 1 (DP) vs QuantCenter (DP) (hotspot level) | 0.976 | 13/20 (65) | 2 | 0.63
DIA vs DIA | | | |
QuantCenter (DP) vs HALO (DP) (hotspot level) | 0.953 | 10/20 (50) | 2 | 0.81
QuantCenter (DP) vs HALO (Ob 2) (case level) | 0.969 | 9/20 (45) | 0 | 1
HALO (DP) vs HALO (Ob 2) (case level) | 0.980 | 6/20 (30) | 2 | 0.81
DIA, digital image analysis; DP, digital image analysis pathologist; Ob, observer.
aThe comparison pairs are grouped under three headings. For each method, the label in the leftmost column indicates the observer or software platform performing the count, followed by the person who chose the hotspot in parentheses. The grade-discordant cases column enumerates the number of cases that fall under different World Health Organization grades (G1/G2/G3) for each comparison, and κ indicates the Cohen’s κ statistic value for the pair.
Figure 3. Correlation coefficients between manual counting and all digital image analysis (DIA) methods. In each panel (A, observer 2; B, observer 1; C, observer 1 [clinical reporting]), manual counts are plotted along the x-axis against all corresponding DIA Ki-67 values (both hotspot and case level) from both platforms (QuantCenter and HALO) on the y-axis. A, P < .001; Pearson r = 0.93; 99% confidence interval [CI], 0.87-0.97. B, P < .001; Pearson r = 0.91; 99% CI, 0.83-0.95. C, P < .001; Pearson r = 0.88; 99% CI, 0.77-0.94.
Close-Match Analysis
Among possible meaningful pairs (expert vs DIA, DIA vs DIA, expert vs expert), the number of cases within ±0.2xKi-67 index of each other ranged from 6 of 20 to 13 of 20, with the lowest closeness of match occurring in HALO measurements from different hotspots (Table 4). The median close match was 45%. The highest (13/20; 65%) occurred with QuantCenter measurements compared with the expert pathologist’s manual counting of study hotspots. Mean close-match levels between experts were comparable to expert-vs-DIA and DIA-vs-DIA matches (53% vs 45% vs 41% of cases, respectively). The number of closely matched cases was not significantly different between within-hotspot and case-level measurement comparisons (mean, 8.8 ± 1.3 vs 10.75 ± 2.7; P = .09, t test).
The five separately analyzed cases showed a mean close match of 40% for expert-vs-expert comparisons and a slightly higher mean close match of 46% for expert-vs-DIA comparisons.
Close concordance by strict criteria (>80% cases within ±0.2xKi-67) was not seen with DIA-vs-expert or expert-vs-expert comparisons (Table 4).
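The ±0.2xKi-67 close-match criterion used throughout can be made concrete with a few lines of Python. This is an illustrative sketch; the paired values are invented for demonstration and are not the study data.

```python
def close_match_rate(test, reference, tol=0.2):
    """Fraction of paired Ki-67 indices within +/- tol x the reference value."""
    assert len(test) == len(reference)
    matches = sum(
        abs(t - r) <= tol * r  # the +/-0.2xKi-67 criterion against the comparator
        for t, r in zip(test, reference)
    )
    return matches / len(test)

# Illustrative paired Ki-67 indices (%) from two methods (not study data)
dia_values = [6.31, 1.95, 24.67, 0.60, 8.13]
manual_values = [5.85, 2.40, 23.84, 1.24, 4.27]
rate = close_match_rate(dia_values, manual_values)  # 3 of 5 pairs match -> 0.6
```

Note that the criterion is asymmetric: it is evaluated relative to the comparator (here, the manual count), so swapping the arguments can change the result slightly.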
Comparison of Dispersion of Values Between Manual Counting and DIA Methods
Mean Ki-67 (%) values by DIA (three sets of counting) and manual counting (four sets of measurements) fell within the 95% CIs of each other in 7 (35%) of 20 cases (Table 5 and Figure 4). Manual counting overall showed a lower dispersion of counts, with a mean 95% CI width of 1.14%, compared with 2.057% for the DIA methods. Among the nonoverlapping cases, two patterns were seen: manual counting values with wide CIs for the means (eg, cases 9, 11, 12, 14, and 15; Figure 4) and tightly constrained manual and DIA counts with entirely nonoverlapping CIs (eg, cases 2, 13, and 20; Figure 4).
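The bootstrapped 95% CIs for the mean Ki-67 reported in Table 5 can be approximated with a simple percentile bootstrap over the repeated measurements per case. The sketch below assumes plain resampling with replacement and illustrative counts; the study's exact resampling scheme may differ.

```python
import random

def bootstrap_ci_mean(values, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap 95% CI for the mean of repeated Ki-67 counts."""
    rng = random.Random(seed)
    k = len(values)
    # Resample with replacement, compute the mean of each resample, then sort
    means = sorted(
        sum(rng.choices(values, k=k)) / k for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Illustrative four manual counts for a single case (not study data)
counts = [6.31, 6.20, 5.85, 5.92]
lo, hi = bootstrap_ci_mean(counts)  # CI brackets the observed mean (~6.07)
```

With only 3 or 4 measurements per case, such bootstrap intervals are coarse, which is consistent with the wide CIs observed for some cases in Figure 4.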
Mean Ki-67 Proliferation Index (%) and Bootstrapped 95% Confidence Intervals (1,000 Samples) by Manual Counting and Digital Image Analysis
Case No. | Manual Counting, Mean (95% CI) | Digital Image Analysis, Mean (95% CI)
---|---|---
1 | 6.07 (5.88-6.26) | 5.38 (4.70-5.72)
2 | 2.02 (1.79-2.29) | 3.08 (2.49-3.40)
3 | 1.53 (1.14-1.96) | 2.02 (1.85-2.12)
4 | 23.53 (22.80-24.25) | 27.25 (20.90-36.50)
5 | 0.91 (0.60-1.22) | 0.66 (0.43-1.04)
6 | 1.47 (1.32-1.62) | 1.86 (1.13-2.40)
7 | 4.09 (3.80-4.51) | 3.61 (3.14-4.23)
8 | 5.02 (4.59-5.47) | 4.69 (3.78-5.25)
9 | 7.06 (6.10-8.52) | 4.60 (2.11-6.11)
10 | 2.42 (2.12-2.78) | 1.72 (1.32-1.93)
11 | 7.34 (5.24-8.78) | 3.71 (3.34-4.04)
12 | 7.16 (6.28-8.04) | 5.65 (4.80-6.09)
13 | 0.64 (0.44-0.85) | 1.03 (0.90-1.12)
14 | 10.84 (9.98-11.57) | 7.34 (3.82-9.49)
15 | 8.62 (7.90-9.60) | 6.22 (4.61-8.01)
16 | 2.85 (2.20-3.98) | 2.16 (1.49-2.54)
17 | 3.67 (2.63-4.39) | 4.14 (4.04-4.20)
18 | 1.96 (1.72-2.31) | 1.72 (1.62-1.91)
19 | 3.41 (3.02-3.87) | 2.09 (1.51-2.82)
20 | 2.01 (1.97-2.07) | 1.51 (1.38-1.59)
CI, confidence interval.

A, Mean Ki-67 proliferation index (%) and bootstrapped 95% confidence intervals (1,000 samples). Black horizontal bars depict Ki-67 by manual counting (four observers) and blue digital image analysis (three sets). Open circles indicate the upper and lower limits and means. For values, see Table 4: case 4 is excluded for ease of plotting. B, Ki-67 proliferation index values by all methods. Colored lines depict each method of Ki-67 measurement and dots depict values obtained. For each, the hotspot selector is listed in parentheses (Ob 1, observer 1; Ob 2, observer 2). DP, pathologist performing digital image analysis.
WHO Tumor Grade Comparisons
Interobserver agreement between experts on the assigned WHO grade according to the Ki-67 index (<3%, grade 1; 3%-20%, grade 2; >20%, grade 3) showed κ values ranging from 0.63 to 1 (good to excellent). Agreement in expert-vs-DIA and DIA-vs-DIA comparisons was 0.63 to 0.81 and 0.81 to 1, respectively (Table 4). Despite a slightly better close match between experts, grade-discordant cases in expert-vs-DIA comparisons were seen at a rate no greater than in expert-vs-expert comparisons (P = .6, Fisher exact test; Table 6).
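The grade assignment from the WHO Ki-67 cutoffs and the unweighted Cohen's κ used for these comparisons can be sketched as follows. The Ki-67 values here are illustrative, not the study data.

```python
def who_grade(ki67):
    """WHO grade from Ki-67 index (%): <3 is G1, 3-20 is G2, >20 is G3."""
    if ki67 < 3:
        return "G1"
    return "G2" if ki67 <= 20 else "G3"

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement minus chance agreement, normalized
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative Ki-67 indices (%) from an expert and a DIA platform (not study data)
expert = [6.31, 1.95, 24.67, 2.12, 8.13]
dia = [5.70, 3.34, 24.33, 1.95, 4.36]
expert_grades = [who_grade(v) for v in expert]  # ['G2', 'G1', 'G3', 'G1', 'G2']
dia_grades = [who_grade(v) for v in dia]        # ['G2', 'G2', 'G3', 'G1', 'G2']
kappa = cohens_kappa(expert_grades, dia_grades)
```

Note how a small Ki-67 difference straddling a cutoff (1.95 vs 3.34 here) produces a grade discordance even when the raw values match closely, which is why grade agreement and close-match rate can diverge.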
Tumor Grade According to the World Health Organization Cutoffs (<3%, Grade 1; 3%-20%, Grade 2; >20%, Grade 3) From Different Methodsa
Case No. | Manual: Observer 1 (DP) | Manual: Observer 1 (Ob 1) | Manual: Observer 2 (Ob 2) | Manual: Observer 3 (DP) | DIA: QuantCenter (DP) | DIA: HALO (DP) | DIA: HALO (Ob 2)
---|---|---|---|---|---|---|---
1 | G2 | G2 | G2 | G2 | G2 | G2 | G2
2 | G1 | G1 | G1 | G1 | G2 | G1 | G2
3 | G1 | G1 | G1 | G1 | G1 | G1 | G1
4 | G3 | G3 | G3 | G3 | G3 | G3 | G3
5 | G1 | G1 | G1 | G1 | G1 | G1 | G1
6 | G1 | G1 | G1 | G1 | G1 | G1 | G1
7 | G2 | G2 | G2 | G2 | G2 | G2 | G2
8 | G2 | G2 | G2 | G2 | G2 | G2 | G2
9 | G2 | G2 | G2 | G2 | G2 | G1 | G2
10 | G1 | G2 | G1 | G1 | G1 | G1 | G1
11 | G2 | G2 | G2 | G2 | G2 | G2 | G2
12 | G2 | G2 | G2 | G2 | G2 | G2 | G2
13 | G1 | G1 | G1 | G1 | G1 | G1 | G1
14 | G2 | G2 | G2 | G2 | G2 | G2 | G2
15 | G2 | G2 | G2 | G2 | G2 | G2 | G2
16 | G1 | G2 | G1 | G1 | G1 | G1 | G1
17 | G2 | G1 | G2 | G2 | G2 | G2 | G2
18 | G1 | G1 | G1 | G1 | G1 | G1 | G1
19 | G2 | G1 | G2 | G2 | G1 | G1 | G1
20 | G1 | G1 | G1 | G1 | G1 | G1 | G1
DP, digital image analysis pathologist; Ob, observer.
aSee Table 3 for Ki-67 proliferation index (%) values. The column names list the method of counting (manual counting vs QuantCenter vs HALO) and the observer performing the count.
Discussion
We assessed and characterized the level of agreement between manual counting–based and semiautomated DIA-based measurement of the Ki-67 proliferation index (%) by several metrics. Ki-67 measurements by DIA showed excellent agreement with expert-performed manual counts and with each other. To our knowledge, this is the first study to perform an interplatform comparison between two widely used commercial DIA platforms, 3DHistech QuantCenter and HALO, in GEP-NETs. The high interplatform reproducibility is similar to results obtained for Ki-67 measurement in breast carcinoma using the same two platforms.21
The employment of a threshold-based requirement yielded several insights into the nature of Ki-67 assessment by manual counting. While high correlation coefficients suggest a high degree of agreement, introducing the requirement that a Ki-67 index fall within 0.2x of a target value showed that expert-vs-expert close-match reproducibility was met in only ~53% of cases on average, comparable to the 45% average close-match rate in expert-vs-DIA reproducibility. When strict criteria for the number of cases meeting the threshold are added (>80% of cases within ±0.2xKi-67 index by manual counting), close concordance is not achieved by either method (Table 4). In other words, the performance of the de facto gold standard was equivalent to that of DIA.
The ostensible superiority of manual counting rests on the notion that expert morphologic assessment directly underpins the quantitative evaluation. The task of meaningfully separating positive and negative tumor nuclei from stromal, inflammatory, and endothelial cells and other noncellular elements is thought to contribute to the accuracy of the Ki-67 proliferation index.10 On a per-nucleus basis, however, decisions to include or exclude objects in the count are driven by a multitude of factors: the object's morphologic features, judgment of whether it lies in the plane of focus, whether it represents a faintly immunopositive vs a darkly counterstained negative nucleus, and whether it truly represents a tumor nucleus at all. At this level, fatigue, interpretative variation, differing stringency in object identification, and the limitations imposed by print/monitor resolution, all causes of variability in visual assessment, have a greater impact. With a perfect image substrate, say a fully white background with a small number of widely spaced, perfectly dark, circular objects as nuclei, decisions could be expected to be highly reproducible. With imperfectly defined, poorly contrasted objects, numerous admixed mimickers, and the sheer number of decisions required in evaluating a hotspot (~1,000), manual counting is itself a noisy process. The resulting variability ensures that the same image/field counted by two different human observers will yield slightly different counts.
Analogously, similar variability is present in image analysis algorithms, whose parameters, such as size limits, color characteristics, and edge contrast, function as the machine vision counterparts of human morphologic assessment and result in the inclusion or exclusion of objects as nuclei.26 In both instances, the uncertainty involved in within-hotspot enumeration is additive to that imposed by the morphologic variation of nuclei between different hotspots. The net effect, as the results above show, is the emergence of variability inherent in nuclear object enumeration. This variability is comparable in extent between manual counting and semiautomated image analysis and is possibly irreducible.
The findings suggest the existence of a “concordance ceiling” independent of the method of assessment. This has implications for the setting up and calibration of DIA algorithms in routine practice: typically, an expert user “tweaks” algorithm settings based on visual inspection of nuclear identification to finalize an algorithm state. Even if an algorithm instance can be thereby set to achieve high or near-perfect concordance with the expert, its expected rate of close agreement with other independent observers would be no more than what was possible between the observers themselves.
Interobserver variability in case-level evaluation of Ki-67 has been examined in the literature. In a recent large-scale study28 in which a panel of experts (n = 23) performed manual Ki-67 measurement in breast carcinoma on step-sectioned glass slides, there was a ~10% range of variability in median Ki-67 between observers across 30 cases. In the range comparable to the Ki-67 index values in the present study (0%-20%), however, examination of their raw data shows a much higher average difference between observers (18.5%). Saadeh et al26 reported interobserver variability in the counts of negative (counterstained, or blue) nuclei among three observers in gastrointestinal neuroendocrine tumors; they did not, however, report significantly large differences in the Ki-67 index. Volynskaya et al29 compared manual counting with automated analysis in primary and metastatic gastrointestinal neuroendocrine tumors: four observers performed manual counts on printed hotspot images from 20 cases that were also subject to DIA counting. A range of variation of up to ~15% in absolute terms was noted between observers' Ki-67 counts, but this was mainly confined to cases with high Ki-67 values, with low dispersion among counts at lower Ki-67 values. Apart from the breast carcinoma study, the variability reported in these studies is comparable to the findings of the present study.
Our study has certain limitations. Although each slide was subjected to multiple rounds of Ki-67 measurement by different methods, the total number of cases subject to full analysis (n = 20) is low. The wider 95% CIs for mean Ki-67 (%) from manual counting and the relative inability of DIA methods to attain Ki-67 values close to manual counting could reflect this low number of cases. There are additional identifiable sources of variability arising from the methodology. First, the Ki-67 values obtained at clinical reporting were included in the analysis. Although this incorporates data from real-world reporting and was performed by a single pathologist (observer 1) using the same method throughout, the measurements were made over a long period and under conditions that differed from those of the study slides. Second, both 3DHistech QuantCenter and HALO CellQuant algorithms were tuned to identify neuroendocrine tumor nuclei by a single pathologist. Another possible deficiency in the data is the low number of cases with Ki-67 values close to the G2/G3 thresholds. Further studies with larger sample sizes and/or different software platforms would be needed to explore whether a higher close-match rate with manual counting is possible with DIA Ki-67 measurement.
It is frequently overlooked that hotspot manual counting owes its emergence as a reproducible and clinically adopted method more to the limitations of what is practically achievable30,31 in routine practice than to any inherent methodologic superiority. With WSI analysis, a proliferation index can be calculated over much larger samples of tumor nuclei, approaching what is possible in quantitative analyses such as flow cytometry. New thresholds based on WSI analysis may need to evolve, permitting a more granular correlation with clinical outcomes.
In conclusion, based on our findings, we propose a possible validation framework for digital Ki-67 measurement in tumors that examines whether (1) the DIA Ki-67 index (%) values exhibit a high degree of correlation (r > 0.9) with manual counting of the same hotspot images under uniform conditions of immunostaining, counterstaining, and WSI acquisition and (2) a close match, defined as a DIA Ki-67 index (%) within ±0.2x the Ki-67 index by manual counting, is achieved in 50% or more of cases. Our results show that expert-vs-expert Ki-67 measurements exhibit the same variability and reproducibility as DIA: κ and close-count rates were no worse for manual count–DIA comparisons than for manual counts between experts. Given the noninferiority and substantial time savings, we support more widespread adoption of DIA Ki-67 measurement in routine clinical practice.
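The proposed two-criterion framework can be expressed directly in code. The sketch below uses a hand-rolled Pearson r and the ±0.2x close-match rule; it is illustrative only and does not reference any particular platform's implementation.

```python
from math import sqrt

def pearson_r(x, y):
    """Plain Pearson correlation coefficient for paired measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def validate_dia_ki67(manual, dia, r_min=0.9, match_min=0.5, tol=0.2):
    """Apply the two proposed validation criteria to a paired set of cases."""
    r = pearson_r(manual, dia)  # criterion 1: r > 0.9 on the same hotspots
    close = sum(
        abs(d - m) <= tol * m for m, d in zip(manual, dia)
    ) / len(manual)             # criterion 2: close match in >= 50% of cases
    return (r > r_min and close >= match_min), r, close

# Illustrative sanity check: identical measurements trivially pass both criteria
manual = [6.31, 1.95, 24.67, 2.12, 8.13]
passed, r, close = validate_dia_ki67(manual, manual)
```

In practice, `manual` and `dia` would hold per-case Ki-67 indices (%) from the same hotspot images, and a failed check would prompt retuning of the DIA algorithm before clinical use.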
This work was supported by NIH grant P50 CA174521-01A1 (A.M.B.).
References