Digital quantification of somatostatin receptor subtype 2a immunostaining: a validation study

Objective: The aim of this study was to develop an open-source and reproducible digital quantitative analysis (DIA) of somatostatin receptor subtype 2a (SST2) staining in formalin-fixed paraffin-embedded tissues of pancreatic neuroendocrine tumors (panNETs) and growth hormone (GH)-secreting pituitary adenomas (GHomas). Design: SST2 immunostaining of 18 panNETs and 39 GHomas was assessed using a novel DIA protocol and compared with a widely used semi-quantitative immunoreactivity score (IRS). Methods: The DIA software calculates the staining intensity/area and the percentage of positive cells (%PC). Four representative images were selected for each sample by two independent selectors (S1 and S2), with the analysis performed by two independent analyzers (A1 and A2). Agreement between observers was calculated using the concordance correlation coefficient (CCC). Results: In panNETs, the CCC ranged from 0.935 to 0.977 for intensity/area and from 0.942 to 0.983 for %PC. In GHomas, the CCC ranged from 0.963 to 0.997 for intensity/area and from 0.979 to 0.990 for %PC. In both panNETs and GHomas, the DIA staining intensity was strongly correlated with the IRS (Spearman rho: 0.916–0.969, P < 0.001), as was the DIA %PC with the IRS %PC (Spearman rho: 0.826–0.881, P < 0.001). In GHomas, the biochemical response to somatostatin receptor ligands correlated with SST2 expression, evaluated both as DIA intensity/area (Spearman rho: −0.448 to −0.527, P = 0.007–0.004) and DIA %PC (Spearman rho: −0.558 to −0.644, P ≤ 0.001). Conclusions: The DIA has an excellent inter-observer agreement and showed a strong correlation with the widely used semi-quantitative IRS. The DIA protocol is an open-source, highly reproducible tool and provides a reliable quantitative evaluation of SST2 immunohistochemistry.


Correspondence should be addressed to L J Hofland. Email: l.hofland@erasmusmc.nl. European Journal of Endocrinology (2022) 187, 399-411

Introduction
Somatostatin receptors (SSTs) belong to a family of G-protein-coupled receptors widely expressed in the endocrine system and play a crucial role in the regulation of hormonal secretion and cell growth (1). Five different SST subtypes have been identified. Among these, subtype 2 (SST2) represents the main target of first-generation somatostatin receptor ligands (fg-SRLs) (1). Tumors arising from the endocrine system mostly retain SST expression, and therefore, fg-SRLs represent a valuable treatment option (2,3).
Fg-SRLs are widely used in the management of neuroendocrine tumors (NETs) to control hormonal hypersecretion (if any) and tumor growth (4,5). Positive SST2 immunostaining is a favorable independent prognostic factor in NETs (6,7,8), and a correlation between SST2 expression assessed by immunohistochemistry and functional imaging has been demonstrated as well (6,9).
Moreover, fg-SRLs represent the first-line medical treatment for GH-secreting pituitary adenomas (GHomas) (10). Several studies have evaluated the clinical, radiological and molecular factors that are able to predict the response to fg-SRLs (11). As expected, the expression of SST2 evaluated by immunohistochemistry has a well-recognized role (12,13). In this light, the 2017 WHO classification proposed a routine evaluation of SST2 immunostaining on paraffin-embedded tissues of GHomas. However, the subsequent Consensus Statements on acromegaly management reiterated its investigational role, due to the lack of validation and harmonization among the different scoring systems reported in the literature (14,15). Multiple methods have been proposed to assess SST2 immunostaining, but no consensus has yet been reached. Furthermore, the scoring systems currently in use rely on the subjective estimation of staining intensity and/or percentage of positive cells and, most importantly, are all expressed as semi-quantitative scales (16).
Here, we propose an open-source, novel digital quantification of SST2 immunostaining performed on paraffin-embedded tissues by use of standard immunohistochemistry. We investigated the reproducibility of the method by evaluating the agreement between different operators. Finally, we compared the results of the digital analysis with the widely used immunoreactivity score (IRS), which has been routinely used in our laboratory over the last 10 years (17,18,19,20,21).

Patients
Eighteen patients diagnosed with pancreatic NETs (panNETs), who underwent surgical resection of the primary tumor between 2004 and 2012 at the Erasmus Medical Center, were analyzed. Furthermore, 39 pituitary tumors from patients diagnosed with acromegaly, who underwent neurosurgery at our institution, were evaluated. Data regarding patients' clinical characteristics, medical treatment before surgery and disease control (reported as age-adjusted IGF-1 values, normalized to the upper limit of normality (ULN)) were already described in a previous publication from our group (17).
A detailed description of patients' characteristics is reported in Table 1 and in the Results section.
Permission from the Institutional Review Board of the Erasmus MC was obtained. The study was performed retrospectively and according to the guidelines of the Central Committee on Research involving Human Subjects.
The GHoma tissue samples included in the current validation study were previously stained for SST2 using a fully automated method (Ventana BenchMark ULTRA stainer, Tucson, Arizona, USA) and scored by use of the semi-quantitative IRS method, as already reported by Franck and colleagues (17). The anti-SST2 rabbit monoclonal antibody (BioTrend) was used at a dilution of 1:25. One patient of the original cohort was excluded due to the low quality of the hematoxylin staining.
SST2 staining was quantified using both the IRS and the novel digital image analysis (DIA) method. For the DIA, the tissue slides were digitized using the NanoZoomer 2.0 HT (Hamamatsu, Naka-ku, Hamamatsu City, Japan).

Digital image analysis
The DIA was performed with the open-source software CellProfiler version 4.0.7 (the pipeline can be freely downloaded from https://cellprofiler.org/publishedpipelines) (23).
Before starting the software analysis, two manual steps were performed (Fig. 1): (i) independent selection of four representative images (including positive and negative areas, when applicable) for each slide by two investigators, hereafter defined as selectors S1 and S2; (ii) independent definition of the region of interest (ROI; outline of the tumoral area) for each image by two investigators, hereafter defined as analyzers A1 and A2 (Fig. 2B).
Then, the DIA software converted the images into grayscale pictures, according to the selected staining method (Fig. 2D, E and Supplementary Fig. 1, see section on supplementary materials given at the end of this article) and quantified: number of cells, number of stained cells, intensity of the staining (total intensity: sum of all pixel intensity values) and ROI area (number of analyzed pixels). Finally, two measures were computed: %PC (calculated as (number of stained cells/total number of cells) × 100) and intensity/area (total intensity/ROI area). The intensity/area values ranged between 0 (no staining) and 1 (maximum staining) arbitrary units/pixel. For these two variables, the value representing the measure of a single sample was calculated as the mean of each set of four images analyzed. Additionally, the mean of the measurements of the different selector/analyzer combinations was calculated.
To validate the DIA, we compared this novel method with the widely used IRS. In particular, the intensity/area was correlated with the 'total' IRS because both measures consider the staining intensity (DIA: total intensity; IRS: intensity score component) together with a parameter reflecting the distribution of the staining (DIA: ROI area, including both DAB positive and negative pixels; IRS: %PC). Similarly, we correlated the DIA percentage of positive cells with the corresponding component of the IRS.
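The two DIA measures defined above can be illustrated with a minimal Python sketch. It is not the published CellProfiler pipeline: the function names, the dictionary keys and the example numbers are hypothetical, and the sketch assumes that the per-image counts (cells, stained cells, total intensity, ROI pixels) have already been exported from the software.

```python
from statistics import mean


def image_measures(n_cells, n_stained, total_intensity, roi_area_px):
    """Return (%PC, intensity/area) for a single image.

    %PC            = (number of stained cells / total number of cells) x 100
    intensity/area = total intensity / ROI area (arbitrary units per pixel)
    """
    if n_cells <= 0 or roi_area_px <= 0:
        raise ValueError("an image needs at least one cell and a non-empty ROI")
    pct_pc = n_stained / n_cells * 100.0
    intensity_per_area = total_intensity / roi_area_px
    return pct_pc, intensity_per_area


def sample_measures(images):
    """Average the per-image measures over the images of one sample
    (four images per sample in the protocol, fewer for small tissues)."""
    per_image = [image_measures(**img) for img in images]
    mean_pc = mean(pc for pc, _ in per_image)
    mean_ia = mean(ia for _, ia in per_image)
    return mean_pc, mean_ia


def combination_mean(per_combination):
    """Mean of one measure over the selector/analyzer combinations,
    e.g. {"S1-A1": ..., "S1-A2": ..., "S2-A1": ..., "S2-A2": ...}."""
    return mean(per_combination.values())


# Hypothetical counts for two images of one sample (illustration only).
example_images = [
    {"n_cells": 200, "n_stained": 50, "total_intensity": 500.0, "roi_area_px": 10000},
    {"n_cells": 100, "n_stained": 75, "total_intensity": 1500.0, "roi_area_px": 10000},
]
example_pc, example_ia = sample_measures(example_images)
```

Because each pixel intensity lies between 0 and 1, the intensity/area of an image stays within the 0 to 1 range stated in the text.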
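The comparison between DIA and IRS relies on a rank correlation (Spearman's rho, as specified in the statistical analysis). The following is a minimal Python sketch of that coefficient, written without external libraries; the function names are hypothetical and the study itself used GraphPad Prism and R. Tied values receive the average of their ranks, which matters here because the IRS is categorical and many samples share the same score.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` would hold the DIA intensity/area (or %PC) of each sample and `y` the corresponding total IRS (or IRS %PC score).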

Statistical analysis
Categorical data are presented as frequencies and percentages, while quantitative data are reported as mean ± S.D. or as median and interquartile range (IQR) where appropriate. The agreement between selectors/analyzers was assessed by use of the Kruskal-Wallis test (or one-way ANOVA, where appropriate), as well as the concordance correlation coefficient (CCC). The CCC ranges between −1 and 1, where −1 represents complete disagreement, 0 absence of agreement, and 1 complete agreement. Complete agreement can only be obtained if the two vectors are identical. As previously reported, we considered a CCC lower than 0.800 as absence of agreement, a CCC equal to or higher than 0.800 as acceptable agreement and a CCC equal to or higher than 0.950 as strong agreement (24). The validation of the DIA with respect to the IRS and the association of the DIA with the biochemical data of the GHomas were performed using Spearman's correlation coefficient. Assessment of the predictive discrimination of SST2 expression for IGF-1 normalization during treatment with fg-SRLs, evaluated both with DIA and IRS, was performed using the receiver operating characteristic (ROC) curve. The best-fitting cut-offs were then computed using the Youden index.
Statistical evaluation was performed using GraphPad Prism version 5.01 (GraphPad Software) and the R software, version 4.0.4. Differences were considered statistically significant at P < 0.05.
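As an illustration of the agreement measure, the sketch below computes Lin's CCC for two observers and applies the thresholds quoted above. It is a minimal example under stated assumptions: the function names are hypothetical, variances and covariance use the 1/n form of Lin's original definition (the R implementation used in the study may differ in this detail), and confidence intervals are not computed.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two raters.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    Assumption: population (1/n) variances and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)


def agreement_label(ccc):
    """Classify a CCC value with the thresholds used in this study."""
    if ccc >= 0.950:
        return "strong"
    if ccc >= 0.800:
        return "acceptable"
    return "absent"
```

Unlike a correlation coefficient, the CCC penalizes systematic shifts between raters: two observers whose values differ by a constant offset are perfectly correlated but obtain a CCC below 1, which is why only identical vectors reach complete agreement.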
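The cut-off selection by the Youden index can be sketched as follows. This is an illustrative Python example, not the analysis code of the study: the function name is hypothetical, and it assumes that a sample is predicted to normalize IGF-1 when its SST2 measure is equal to or higher than the cut-off.

```python
def youden_cutoff(scores, outcomes):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J.

    scores   : SST2 measure per patient (e.g. DIA %PC or intensity/area)
    outcomes : True if IGF-1 normalized during treatment, else False
    J = sensitivity + specificity - 1, evaluated at every observed score.
    Assumption: prediction is positive when score >= cutoff.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    positives = sum(1 for o in outcomes if o)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, o in zip(scores, outcomes) if o and s >= cutoff)
        true_neg = sum(1 for s, o in zip(scores, outcomes) if not o and s < cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Keep the first (lowest) cutoff in case of ties on J.
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best
```

Each candidate cut-off corresponds to one point of the ROC curve, and the returned one is the point with the largest vertical distance from the diagonal.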

Pancreatic neuroendocrine tumors
Patient characteristics are presented in Table 1. Briefly, ten patients (56%) were females, and the mean age at diagnosis was 52.9 ± 11.9 years. Most patients were stage II (n = 6, 33%) or stage IV (n = 8, 44%). Most panNETs were low-grade tumors (G1 n = 12, 67%). The majority of panNETs included in our study did not show hormonal hypersecretion (n = 12, 67%), although our cohort also included four insulinomas, one gastrinoma and one glucagonoma. Six patients (33%) were naïve to previous treatments at the time of surgery, four cases (22%) received fg-SRL therapy before surgery, while neoadjuvant PRRT was administered in five cases (28%).
The median SST2 IRS was 5 out of the maximal score of 12 (IQR: 1.5-8.0), with a median IRS %PC of 3.5 out of 4 (IQR: 1.25-4.0) (Fig. 3 and Table 2). DIA of the different samples resulted in a median %PC between 40.4 and 49.8%, while the median intensity/area varied between 0.051 and 0.077 among the different combinations of selectors/analyzers (Fig. 3 and Table 2).
The CCC between selectors/analyzers for DIA ranged between 0.935 and 0.977 for the percentage of positive cells and between 0.942 and 0.983 for the intensity/area (Fig. 4A). The agreement between the independent selectors (S1 vs S2) was generally acceptable for the percentage of positive cells, whereas it was strong for the intensity/area (in all but one case: S1-A2 vs S2-A1, CCC 0.942; CI 0.857-0.977). Considering the same selector, concordance between different analyzers (A1 vs A2) was strong in all combinations for both the percentage of positive cells and the intensity/area values (Fig. 4A and Supplementary Figs 2, 3).

Correlation between DIA and IRS in panNETs
The DIA results showed a strong positive correlation with both the IRS %PC and the total IRS (Table 3). The correlations between DIA and IRS for the S1-A1 analysis are depicted in Fig. 5A and C as a representative evaluation, while the results of the other combinations are reported in Supplementary Fig. 4. The Spearman's rho for the correlation between the DIA %PC and the IRS %PC value ranged between 0.826 and 0.874 for each selector/analyzer combination (P < 0.001 for all). Moreover, the correlation coefficient computed using the mean of the measurements of the different selector/analyzer combinations was comparable to that obtained from the single evaluations (Spearman's rho, 0.881; P < 0.001, Table 3).
As expected, the evaluation of the percentage of positive cells in the tumor tissues did not completely overlap between DIA and IRS. Two samples that were categorized as negative by the IRS %PC showed about 20% of positive cells by the DIA (with a very low median value of the intensity/area of 0.007). On the other hand, the majority of the samples categorized as highly positive by the IRS %PC (score 4: >80% positive cells) were also highly positive on the DIA (median value of 80.7%) (Fig. 5A and Supplementary Fig. 4B, D, F, H).
The DIA intensity/area and the total IRS showed a very strong correlation for each selector/analyzer combination (Spearman's rho ranged between 0.916 and 0.963, P < 0.001). Again, the mean of the measurements of the different selector/analyzer combinations showed a correlation coefficient comparable to those obtained with the single evaluations (Spearman's rho, 0.969; P < 0.001, Table 3). Notably, the DIA allowed a better discrimination of the different samples included in the same IRS category. In particular, the two patients classified as IRS 12 showed a clear difference in the DIA intensity/area (values ranging from 0.174 to 0.364; Fig. 5C and Supplementary Fig. 4A, C, E, G).

GH-secreting pituitary adenomas
Patient characteristics are reported in Table 1. Briefly, 15 patients (40%) were females, the mean age at surgery was 40.8 ± 12.2 years and the vast majority (n = 36, 95%) had a macroadenoma at the time of diagnosis. Most patients (n = 23, 60.5%) were naïve to medical treatment, while eight subjects (21%) were treated with fg-SRL monotherapy and the remaining seven patients (18%) were treated with fg-SRLs in combination with the GH receptor antagonist pegvisomant before surgery (17). The median SST2 IRS was 6.0 out of 12 (IQR, 2.5-12.0), with a median %PC of 3.0 out of 4 (IQR, 1.5-4.0) (Fig. 3 and Table 2). For DIA, the median %PC in the different samples ranged between 59.5 and 64.2%, while the median intensity/area varied between 0.114 and 0.120 among the different combinations of selectors/analyzers (Fig. 3 and Table 2). No significant association of SST2 expression was observed with clinical variables such as age, gender and adenoma size. No significant difference was observed between the patients pretreated with fg-SRL monotherapy and those naïve to medical treatment before surgery (both for intensity/area and %PC; data not shown).
As shown in Fig. 4B, the CCC showed a strong agreement between the different analyzer/selector combinations, ranging between 0.959 and 0.997 for the %PC and between 0.976 and 0.990 for the intensity/area (see also Supplementary Figs 5 and 6).

Correlation between DIA and IRS in GH-secreting pituitary adenomas
Similar to the findings in the panNET cohort, the DIA showed a strong positive correlation with the IRS in GHoma tissues as well (Table 4). The correlations between DIA and IRS for the S1-A1 analysis are depicted in Fig. 5B and D as a representative evaluation, while the results of the other combinations are reported in Supplementary Fig. 7.
The Spearman's rho for the correlation between DIA %PC and the corresponding IRS %PC value ranged between 0.840 and 0.876 for each selector/analyzer combination (P < 0.001 for all).
Despite the strong correlations observed, the tissues categorized by the IRS %PC as score 2 (20-50% of positive cells) and score 3 (50-80% of positive cells) showed a [...] (Fig. 5B and Supplementary Fig. 7B, D, F, H).
As observed in the panNET cohort, the DIA intensity/area and the total IRS showed a very strong correlation for each selector/analyzer combination (Spearman's rho ranged between 0.921 and 0.928, P < 0.001). The mean of the measurements of all the different selector/analyzer combinations showed a correlation coefficient comparable to those obtained with the single evaluations (Spearman's rho, 0.924; P < 0.001, Table 4).
The DIA allowed a better discrimination of the different samples included in the same category by use of the IRS. In particular, among those samples classified with a total IRS of 12 (maximum score), we observed a wide range in the DIA intensity/area (values from 0.215 to 0.507; Fig. 5D and Supplementary Fig. 7A, C, E, G).

Correlations between DIA and biochemical response to fg-SRLs in GH-secreting pituitary adenomas
The DIA confirmed the negative correlation already observed between SST2 IRS and the age-adjusted IGF-1 values reached after long-term fg-SRL treatment (namely, the higher the SST2 expression, the lower the IGF-1 levels) (17). The correlations obtained by the S1-A1 analysis are depicted in Fig. 6B and D as a representative evaluation, while the other correlations are reported in Supplementary Fig. 8.

Discussion
In the present study, for the first time, we have performed a quantitative digital evaluation of SST2 immunoreactivity in both panNET and GHoma samples, by use of widely available open-source software.
In order to validate the DIA protocol, we compared the performance of this novel method to the widely used semi-quantitative IRS. The IRS was first established for the quantification of estrogen receptors in breast cancers (22), and it has been subsequently used by different groups for SST2 evaluation in both GHomas (19,25) and NETs (26,27). SST2 IRS has shown a strong correlation with the biochemical response to fg-SRL treatment in GHomas (17,19,25), as well as a moderate correlation with the results of functional imaging (namely, 68Ga-DOTA-NOC PET/CT) in NETs (26).
The strong positive correlation that we observed between the IRS and the DIA parameters supports the reliability of the DIA. The DIA performed equally well with respect to IRS in both panNETs and GHomas, despite the immunohistochemistry being performed using different protocols in the two disease groups (manual vs automated, respectively).
Our results highlight DIA as a highly reproducible method, since the CCC values observed between the different combinations of selectors/analyzers were overall highly satisfactory. Unlike the currently available scoring systems, which rely on the subjective evaluation of the staining intensity and/or the percentage of positive cells (prone to inter-observer variability and difficult to standardize), the DIA is by definition less dependent on subjective estimation (28).
However, we have to acknowledge that our DIA protocol includes two non-automated steps, consisting of manual image selection and manual ROI definition, which may represent a potential source of bias. The selection of representative tumor areas was necessary because the CellProfiler software does not support analysis of whole slide scans. Since it is known that SST2 expression can exhibit intratumoral heterogeneity in both panNETs and GHomas (29,30), we opted for the selection of four images for each slide as a balance between the complete representation of the tissue slide and the feasibility of the downstream analysis. The data showed an excellent agreement between selectors, again demonstrating the reproducibility of the DIA protocol. This was also confirmed for the second step, as the CCC showed an excellent agreement between the different analyzers in both panNETs and GHomas, supporting the low impact of the manual ROI definition.
Notably, using the mean of the measurements obtained from the different selectors/analyzers did not significantly improve the correlations observed between DIA and IRS, compared to the use of a single selector/analyzer evaluation. This finding reflects the robustness of the CCC analysis and the reproducibility of the DIA.
In this light, we conclude that the analysis can be performed by a single operator without any major impact on the quality of the results. This could represent a further advantage compared to the routinely used semi-quantitative methods, which should be performed by two experienced pathologists due to their potentially high inter- and intra-observer variability.
Interestingly, the DIA showed a stronger discriminatory power compared to the IRS. Among the tissues classified as IRS 12 (the maximum achievable score), the DIA intensity/area provided a wide range of high values. Indeed, the DIA expresses the results as a continuous variable with a wide dynamic range, thus allowing a better characterization of the receptor staining, while the IRS has an intrinsic semi-quantitative nature and a more limited dynamic range.
As described above, our group previously reported that SST2 IRS correlated with the age-adjusted IGF-1 levels after fg-SRL therapy in the cohort of acromegaly patients we analyzed by DIA in the present study (17). This correlation was confirmed when evaluating SST2 expression using our novel DIA protocol. Given the satisfactory sensitivity and specificity shown by the DIA in predicting IGF-1 normalization during fg-SRL treatment, together with the high reproducibility of the method, this novel analysis can represent an important tool for patient selection. Interestingly, while the DIA intensity/area showed a correlation and accuracy comparable to those previously described with the IRS, a stronger correlation and a better accuracy were found using the DIA %PC values. These results suggest that the %PC relates to the hormonal response during fg-SRL treatment in GHomas more closely than the staining intensity does. Therefore, we hypothesize that the use of a continuous variable with a more precise discrimination (intrinsic characteristics of the DIA) allowed us to better define this association compared to the IRS categorical classification. Due to the heterogeneous expression of SST2 in GHomas, the proportion of receptor-expressing cells seems to be a key determinant of the overall therapeutic response to fg-SRLs.
Watanabe and colleagues recently reported a quantitative digital analysis evaluation of SST2 in panNET and gastro-intestinal (GI) NET tissues, performed using commercially available software (namely, HALO Membrane) (31,32). The authors compared the DIA with different semi-quantitative scores (i.e. IRS and Volante score for GI-NETs and HER2 score for panNETs). An overall good concordance between the DIA and the manual semi-quantitative scores was reported in both studies. However, as acknowledged by the authors themselves, the high cost of the software may represent a major limitation of this method. The optimization and validation of open-source software (e.g. CellProfiler) could make the DIA more affordable and widely available.
Finally, our study has some limitations. Although we demonstrated that the two manual steps of our protocol do not affect the data reproducibility, the DIA evaluation using the CellProfiler software is not completely automated, and the expertise of a skilled pathologist cannot be dispensed with when transposing this method into daily clinical practice. Furthermore, additional studies aimed at assessing the value of the DIA protocol in terms of correlations with clinical data are strongly needed, particularly with respect to NET patients. In the present cohort, only a minority of patients had functional imaging (namely, 111In-pentetreotide scintigraphy) or SRL treatment; therefore, no definitive conclusion could be drawn. In addition, unlike the HALO Membrane algorithm used by Watanabe and colleagues, we have evaluated the whole-cell staining intensity, considering both membranous and cytoplasmic staining. Although some manual scores in use only evaluate the extent of SST2 membranous staining (e.g. Volante score), whether the whole receptor expression needs to be considered (irrespective of the subcellular localization) is still a matter of debate. In our opinion, in case of pre-surgical treatment with SRLs, the evaluation of the membranous staining alone could lead to an underestimation of the whole receptor pool of the tumor cells (at least in the setting of GHomas and functioning NETs). However, we are aware that only comparative studies aimed at addressing this issue, evaluating the best correlate with the clinical outcomes, can provide more robust guidance.
In conclusion, the DIA protocol showed an excellent agreement between the different operators involved. Furthermore, the DIA showed a strong agreement with the widely used IRS, as well as a good correlation with the biochemical response to fg-SRL treatment in GH-secreting pituitary adenomas.
The DIA has a wide dynamic range and is expressed as a continuous variable, allowing us to perform a more detailed characterization of the receptor staining. Therefore, it can provide a more reliable quantitative evaluation of SST2 immunostaining compared to the currently available semi-quantitative methods.

Figure 1
Figure 1 Outline of the validation study. When the tissue was too small to allow the selection of four different images, the maximum possible number of images was acquired. SST2, somatostatin receptor subtype 2; ROI, region of interest; S1, selector 1; S2, selector 2; A1, analyzer 1; A2, analyzer 2.

Figure 2
Figure 2 Schematic representation of the digital analysis. Panel A: image identified by the selector (10× magnification). Panel B: region of interest (ROI) definition by the analyzer, consisting of the manual outlining of the tumor tissue with the exclusion of fibrotic tissue, vascular structures and all potential staining artifacts that may affect SST2 quantification. Panel C: higher magnification of Panel B. Panel D: magnification of the grayscale image produced by the software based on 3,3′-diaminobenzidine (DAB) staining (the areas with the weaker staining correspond to the darker areas on the grayscale image). Panel E: magnification of the grayscale image produced by the software, based on the hematoxylin staining (the stained objects (nuclei) are converted to white and used for counting the nuclei) and automatic cell delimitation (green lines).

Figure 3
Figure 3 Representative images of SST2 expression in panNETs and GH-secreting pituitary adenomas. Intensity/area and percentage of positive cells (%PC) are expressed as the mean of the different selector/analyzer combinations. Images were acquired at 20× magnification.

Figure 4
Figure 4 Detailed representation of the concordance correlation coefficients between the different combinations of selectors/analyzers in panNETs (panel A) and GH-secreting pituitary adenomas (panel B).

Figure 5
Figure 5 Representative images of the positive correlation between the percentage of positive cells component of the IRS and the DIA's

Figure 7
Figure 7 Accuracy of the DIA and IRS for predicting IGF-1 normalization during fg-SRL monotherapy. Panel A shows the ROC curves obtained with DIA intensity/area (gray line) and IRS (black line); panel B shows the ROC curves obtained with DIA %PC (gray line) and IRS %PC (black line). The dots represent the best-fitting cut-offs. Specificity and sensitivity values are reported in brackets. AUC, area under the curve; IGF-1, insulin-like growth factor 1; IRS, immunoreactivity score; %PC, percentage of positive cells.

Table 1
General and clinical characteristics of patients diagnosed with panNET or acromegaly included in the study. Data are presented as n (%) or as mean ± S.D.
a Age at time of diagnosis; b Three patients received multiple treatments (one PRRT and long-term SRL therapy; one long-term SRL therapy and perioperative SRL administration; and one PRRT and perioperative SRL administration); no information on treatment before surgery in one patient; c Continued until the surgery; d Age at time of surgery. PRRT, peptide receptor radionuclide therapy; SRL, somatostatin receptor ligand.
European Journal of Endocrinology 187:3, 402. Original Research: C Campana and others, SST2 digital image quantification. https://eje.bioscientifica.com

Table 2
Median values of the digital image analysis (DIA) and the immunoreactivity score (IRS).

Table 3
Spearman's correlation coefficients between DIA and IRS in panNETs.

Table 4
Spearman's correlation coefficients between DIA and IRS in GH-secreting pituitary adenomas.