This interlaboratory study evaluated the reproducibility of the assessments of neuritic plaques and neurofibrillary tangles (NFTs), the hallmark lesions of Alzheimer disease, and compared the staining between the BrainNet Europe centers. To reduce the topography-related inconsistencies in assessments, we used a 2-mm tissue microarray (TMA) technique. The TMA block included 42 core samples taken from 21 paraffin blocks. The assessments were done on Bielschowsky and Gallyas silver stains and on immunohistochemical (IHC) stains with antibodies directed to beta-amyloid (IHC/Aβ) and hyperphosphorylated tau (IHC/HPtau). The staining quality and the assessments differed between the participants, being most diverse with Bielschowsky (good/acceptable stain in 53% of centers) followed by Gallyas (good/acceptable stain in 57%) and IHC/Aβ (good/acceptable stain in 71%). The most uniform staining quality and assessment was obtained with the IHC/HPtau method (good/acceptable stain in 94% of centers). The neuropathologic diagnostic protocol (Consortium to Establish a Registry for Alzheimer Disease, Braak and Braak, and the National Institute on Aging and Reagan [NIA-Reagan] Institute) that was used significantly influenced the agreement, being highest with NIA-Reagan (54%) recommendations. This agreement was improved by visualization of NFTs using the IHC/HPtau method. Therefore, the IHC/HPtau methodology to visualize NFTs and neuropil threads should be considered as a method of choice in a future diagnostic protocol for Alzheimer disease.
The neuropathologic diagnosis of Alzheimer disease (AD) is based on the detection and distribution of hallmark lesions such as neuritic plaques (NPs) and neurofibrillary tangles (NFTs). In 1985, the Diagnosis of Alzheimer Disease Research Workshop neuropathology panel stated that both senile plaques and NFTs were helpful diagnostic markers. This group recommended the use of Bielschowsky silver impregnation as one possible staining technique (1). Various staining techniques used for senile plaques or NPs and NFTs were evaluated in the 1980s, and the results indicated that the highest number of both NPs and NFTs were detected using modified Bielschowsky silver impregnation (2, 3). Subsequently, in 1991, to produce more accurate and reliable neuropathologic criteria for AD, the Consortium to Establish a Registry for Alzheimer Disease (CERAD) published more detailed instructions (4). This protocol recommended the use of 6- to 8-μm-thick sections, silver stains such as the modified Bielschowsky, and a semiquantitative assessment of neocortical NPs (4), and defined senile plaques of neuritic type (NPs) as being those plaques with thickened silver-positive neurites. Also in 1991, Braak and Braak launched the staging of AD-related changes in which the emphasis was on the regional distribution of Gallyas silver-stained NFTs in 50-μm-thick sections (5). Six years later, in 1997, the consensus recommendations by the National Institute on Aging and Reagan (NIA-Reagan) Institute Working Group recommended that assessment of NPs and NFTs should be carried out according to both the CERAD protocol (4) and Braak and Braak staging (5) to estimate the likelihood that AD pathologic changes would underlie the symptoms of dementia (6). It was also recommended by the NIA-Reagan protocol that, in addition to silver-impregnation techniques (modified Bielschowsky or Gallyas), specific immunohistochemical (IHC) stains should also be used (6).
In summary, the likelihood that dementia is the result of AD lesions is high, intermediate, or low based on the detection and semiquantitative assessment of AD-related lesions in the postmortem brain. This emphasizes the importance of the staining used because it might be influenced by nonpathology-related conditions such as postmortem delay, fixation, embedding and technical practices, and the reliability in the quantification of lesions used by different evaluators.
The BrainNet Europe (BNE) consortium includes 20 centers. Brain banking for research purposes and neuropathologic diagnostic examination is carried out in 17 of these centers. It is imperative for a brain-banking consortium that the neuropathologic evaluations are comparable between the centers. A previous pilot study done by the BNE consortium assessed the comparability of the neuropathologic diagnosis of AD following the CERAD (4) and Braak and Braak (5) recommendations. The results of this pilot study indicated poor agreement in the evaluations among the participants (data not published). Therefore, this interlaboratory study was designed to compare the reliability of the semiquantitative assessment of NPs and NFTs among the members of the BNE consortium using the detection methods recommended by the NIA-Reagan (6), CERAD (4), and the Braak and Braak (5) protocols.
The emphasis of this study was on the quantification of NPs and NFTs by different evaluators using a 2-mm tissue microarray (TMA) technique (7, 8). The TMA block included numerous brain samples received from the 17 participating BNE centers. This report summarizes the results obtained when participating neuropathologists semiquantitatively assessed AD-related lesions in 42 core samples, each measuring 2 mm in diameter, using 4 different staining techniques.
Materials and Methods
Tissue Microarray Block Construction and Sectioning
Each participating BNE center (n = 17) was asked to provide to the coordinating center a routinely processed, paraffin-embedded block of temporal cortex with various AD-related lesions as well as details of the postmortem delay, the fixatives used, and the storage time (Fig. 1; Table 1). Briefly, the male/female distribution was approximately equal. The age at death ranged from 57 to 96 years. The postmortem delay varied from 3 hours to 5 days and the fixation time from 1 day to 14 months. The maximum temperature of the embedding medium ranged from 54°C to 62°C and the storage duration for blocks ranged from 1 day to 5.5 years.
Paraffin blocks (at least 3-mm-thick) from a total of 21 cases were obtained for the construction of the TMA block (Table 1), which was constructed as described elsewhere (9). Briefly, to give a wide range of detectable lesions, 2 core samples were taken from each case, one from the sulcal and the other from gyral grey matter (n = 42) (Fig. 2). The core samples were taken using a Manual Tissue Arrayer 1 instrument (Beecher Instruments, Inc., Sun Prairie, WI). Representative samples were obtained using a 2.0-mm-diameter needle. The core samples were placed into the recipient TMA block. Serial 7-μm-thick sections were cut from the TMA block with a rotating microtome without the use of supportive methods such as a tape-transfer system (8, 10). The serial sections were placed on commercial SuperFrost Plus microscope slides and dried overnight.
BNE Participant Efforts
Each BNE participant received 5 consecutive slides to be stained by modified Bielschowsky (2, 11), Gallyas silver-impregnation techniques (12), and IHC methodology using antibodies directed to beta-amyloid (Aβ) and hyperphosphorylated tau (HPtau) (Figs. 1, 2). BNE participants also received the protocol with recommended staining practices and data sheets for recording assessments (Fig. 2). According to the assessment instructions, Bielschowsky silver stain was to be used only for assessment of NPs (i.e. plaques with dark brown to black silver-positive thickened neurites) (4). Gallyas silver stain was to be used for assessment of NFTs and NTs (neuropil threads/neurites), and IHC/Aβ staining was to be used for assessment of protein aggregates (i.e. plaques) and IHC/HPtau for NFTs and NTs. For the assessment, each detectable lesion (regardless of the size) within the core sample of 2 mm in diameter was to be counted. If the core sample was only partly grey matter, this had to be taken into account; that is, if there were 4 lesions in a core sample but only 50% of the core was grey matter, the estimated count was twice as high, or 8 lesions. The counts were reported by the participating neuropathologists as semiquantitative scores on a 5-step scale (Fig. 2).
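The grey matter correction in the assessment instructions amounts to dividing the raw lesion count by the grey matter fraction of the core. The following Python sketch illustrates this under stated assumptions: the function name `estimated_lesion_count` and its parameters are hypothetical, and the conversion of counts to the 5-step scale (defined in Fig. 2) is deliberately not reproduced.

```python
def estimated_lesion_count(raw_count, grey_matter_fraction):
    """Scale a raw lesion count to a core consisting entirely of grey matter.

    grey_matter_fraction is the share of the 2-mm core that is grey
    matter, expressed as a number in (0, 1]. Both names are illustrative
    and do not come from the study protocol.
    """
    if not 0 < grey_matter_fraction <= 1:
        raise ValueError("grey_matter_fraction must be in (0, 1]")
    return raw_count / grey_matter_fraction


# Worked example from the instructions: 4 lesions counted in a core that
# is only 50% grey matter give an estimated count of 8.
print(estimated_lesion_count(4, 0.5))
```

A core that is entirely grey matter (fraction 1.0) keeps its raw count unchanged.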
Coordinating Center Efforts
The coordinating center assessed the comparability of the sections, ensuring that the shipped sections were alike. Every 20th slide of the serial TMA sections (n = 6) was IHC/HPtau-stained, inspected, and assessed (Fig. 1).
The slides stained by the participating BNE centers, data sheets, and staining protocols were collected and analyzed (Fig. 1). The IHC methods are listed and the results of the assessments are identified based on the assessment code in Table 2. The missing core samples were calculated from the stained TMA sections by light microscopy at 25× magnification. In addition, the damage in the core samples was assessed. To enable the comparison of the staining results, only those core samples were included in which ≥75% of the tissue was available for analysis (Fig. 1). In addition to loss or damage of cores during the staining procedures, some core samples were excluded because there was not enough material in the recipient block. In total, assessments of 10 of the initial 42 core samples (3D, 3G, 4E, 4G, 4H, 5C, 5D, 6A, 6C, and 6E) were excluded (Fig. 2).
The quality of staining of most of the core samples in a TMA section was estimated on a 3-step scale (good, acceptable, or poor) under light microscopy at 25× to 100× original magnification. Briefly, the quality of staining was assessed as good when the staining clearly labeled the lesion in most of the core samples in a TMA section (Fig. 2). The staining was acceptable when the lesions were detectable, but counting was laborious as a result of partial staining of the lesions or excessive background staining. The staining was assessed as poor when the lesions were not at all detectable or they were detectable but not stained as required (Fig. 2).
Reassessment of all TMA sections for each staining was carried out in one sitting by 2 evaluators following the given assessment instructions (Fig. 2). During the reassessment, strict criteria were applied and only those lesions that fulfilled the required staining criteria were included. The reassessment was carried out in the following order: Bielschowsky, Gallyas, IHC/Aβ, and lastly IHC/HPtau with a few days between the viewings.
The results are reported as mean values of the semiquantitative scores of lesions of the 32 core samples in a TMA section (at the TMA section level) for both the primary (each BNE participant) and the reassessments (coordinating center). Moreover, to analyze the variability in the assessments without the interference of staining quality, only TMA sections having good or acceptable staining quality were selected in a core-to-core comparison (at the core sample level). However, because the nature of the core sample (i.e. grey matter/white matter) might have influenced the results, only those core samples that were constructed mostly of grey matter (75-100%) were included in this core-to-core assessment.
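As an illustration of the TMA section-level summary, the sketch below computes the mean and standard error of a list of semiquantitative scores using only the Python standard library. The function name and the example scores are hypothetical, and treating the ordinal 0 to 4 scores as numbers simply mirrors the mean values reported here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Standard error = sample standard deviation / sqrt(number of scores).
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Hypothetical scores (0-4) for a few core samples in one TMA section.
section_scores = [4, 3, 4, 2, 4, 3, 0, 4]
m, se = mean_and_standard_error(section_scores)
print(f"{m:.1f} +/- {se:.1f}")
```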
Additionally, to evaluate the impact of the assessments of various lesions (NPs and NFTs) on the neuropathologic diagnosis of AD, a comparison of diagnoses using different protocols (CERAD, Braak and Braak, NIA-Reagan) was carried out on selected representative core samples with good staining quality as described previously (4-6).
Statistical Analysis and Photography
For statistical analyses, the SPSS program for Windows (version 11.5) was used. The statistical difference in comparability of the shipped sections was estimated by the nonparametric Kruskal-Wallis (K-W) test. The statistical difference in both primary assessments and in reassessments at the TMA section level was estimated by the nonparametric K-W test. The agreement in the assessments of lesions between the primary assessment and the reassessment was estimated using the nonparametric Wilcoxon test (scores ranging from 0-4) or Fisher exact test (scores ranging from 0-1). In addition, the value of absolute agreement (%) was calculated, that is, the proportion of core samples assessed equally in the primary and reassessments. At the core sample level, the proportion of the most frequent score was calculated and given as a percentage of agreement. In addition, a value for the absolute agreement (%) between the primary and reassessments was calculated. Digital images were taken using a Leica DM4000 B microscope equipped with a Leica DFC 320 digital camera (Leica Microsystems Wetzlar Ltd., Heerbrugg, Germany).
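The two descriptive agreement measures defined above can be written down in a few lines. This is a minimal Python sketch, not the SPSS procedure used in the study; the function names and the example scores are hypothetical.

```python
from collections import Counter


def absolute_agreement(primary, reassessment):
    """Percentage of core samples scored identically in both assessments."""
    if len(primary) != len(reassessment) or not primary:
        raise ValueError("need two non-empty score lists of equal length")
    equal = sum(p == r for p, r in zip(primary, reassessment))
    return 100.0 * equal / len(primary)


def modal_agreement(scores_for_one_core):
    """Percentage of evaluators who gave the most frequent score for a core."""
    if not scores_for_one_core:
        raise ValueError("need at least one score")
    _, count = Counter(scores_for_one_core).most_common(1)[0]
    return 100.0 * count / len(scores_for_one_core)


# Hypothetical scores (0-4) for four core samples in one TMA section.
primary = [4, 3, 4, 0]
reassessment = [4, 2, 4, 1]
print(absolute_agreement(primary, reassessment))  # 2 of 4 cores match

# Hypothetical scores given to a single core by four evaluators.
print(modal_agreement([4, 4, 3, 4]))  # 3 of 4 evaluators chose score 4
```

Both measures ignore how far apart two differing scores are, so a 3 versus 4 disagreement counts the same as a 0 versus 4 disagreement.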
The semiquantitative assessments of IHC/HPtau-labeled NFTs in the 6 stained consecutive TMA sections did not differ significantly (K-W test, p = 0.9); mean values (± standard error) ranged from 3.2 ± 0.2 to 2.8 ± 0.3, indicating that the number of lesions did not change in a significant manner over the depth of the tissue.
Bielschowsky Silver Impregnation
Details regarding the staining results are given in Figure 3 and Table 3. Bielschowsky staining was done in 15 of 17 centers. The staining intensity of both the lesions and the background varied extensively not only between different BNE centers, but also within a single TMA section. According to the strict assessment criteria, the staining of most of the core samples in a TMA section was assessed as being good or acceptable in 8 of 15 stainings (53%). Only once did the Bielschowsky method stain all core samples within the TMA section evenly. Core samples 1C, 1D, 6B, and 6D were usually understained, whereas core samples 3A, 3B, 4C, and 4D were usually overstained. During the reassessment, it was noted that the NPs in each core sample were diffusely dispersed and no overlapping or confluent lesions were noted.
At the TMA section level, when all assessments were included (Table 3), the primary assessment and the reassessment of NPs differed significantly (K-W test, p < 0.05), with mean values (± standard error) ranging from 3.6 ± 0.2 to 1.2 ± 0.5 and from 3.1 ± 0.3 to 0, respectively. However, when only the good/acceptable stainings were included, the primary assessments did not differ significantly (K-W test, p = 0.05), whereas the difference in the reassessments was still significant (K-W test, p = 0.02). The reassessments differed significantly (Wilcoxon, p < 0.05) from the primary assessments in 63% of the good/acceptable stainings (5 of 8). Finally, absolute agreement between the primary and reassessment in the good/acceptable stainings ranged from 50% to 84%, being good (≥75%) in 3 of 8 assessments.
The core-to-core comparison of the good/acceptable stainings (Table 4), including only the 13 core samples composed mostly of grey matter (75-100%), revealed that there was still some variability in the assessments of NPs. The assessment ranged from zero to the highest score value of 4 in 3 of 13 core samples (23%). One of these core samples was repeatedly overstained (3A) and the other understained (6D). The score value of 4 was the most frequent assessment in 62% of the core samples (8 of 13). The agreement in the assessments of these 13 core samples ranged from 25% to 100%, being good (≥75%) in 7 of the core samples (54%). It is notable that in 38% of the core samples (5 of 13), the agreement was less than 50%. The absolute agreement between the primary and reassessments for the selected 13 core samples ranged from 17% to 100%, being good (≥75%) in only 46% of the assessments (6 of 13).
Gallyas Silver Impregnation
Details regarding the staining results are seen in Figure 4 and Table 3. Gallyas silver impregnation was carried out by 14 of the 17 BNE centers. Within most of the TMA sections, the staining of both lesions and the background in core samples was rather uniform. However, when comparing sections stained by different BNE centers, the staining of lesions and the background varied. According to the strict assessment criteria, the Gallyas staining (i.e. staining of most of the core samples in a TMA section) was good or acceptable in 57% of the cases (8 of 14). The poor stainings were either too dark or too light with almost no detectable staining. Another common problem with Gallyas staining was the high amount of grainy silver precipitate diffusely spread over the core sample.
At the TMA section level, when all assessments were included (Table 3), both the primary and reassessments of Gallyas silver-stained NFTs differed significantly (K-W test, p < 0.05), with the mean values (± standard error) ranging from 2.7 ± 0.3 to 1.4 ± 0.3 and from 2.6 ± 0.3 to 0.7 ± 0.2, respectively. However, when only good/acceptable stainings were evaluated, there were no significant differences between the primaries or the reassessments (K-W test, p > 0.05). Comparison of the primary and reassessments of NFTs indicated only minimal differences and no statistically significant differences (Wilcoxon, p > 0.05) in most of the assessments (6 of 8); however, the absolute agreement still ranged from 23% to 71%.
The core-to-core comparison of good/acceptable stainings (Table 4), including only the 13 core samples composed mostly of grey matter (75-100%), revealed that the evaluators still disagreed in the assessment of NFTs. In 23% of the selected core samples (3 of 13), the primary assessments ranged from zero to the highest score value of 4. The score value of 4 was the most frequent assessment in 54% of the core samples (7 of 13). The agreement between primary assessments ranged from 43% to 100%, being good (≥75%) in 54% of the core samples (7 of 13). Finally, the absolute agreement between the primary and reassessments of NFTs ranged from 14% to 100%, being good (≥75%) in only 38% of the selected core samples (5 of 13). Moreover, in 15% of these core samples (2 of 13), the absolute agreement was no better than 25%.
The assessment of Gallyas silver-stained NTs was carried out by 13 of 17 centers (Table 5). At the TMA section level, when all 13 stainings were included, NTs were seen in 37% to 88% of the core samples in primary assessments and in 42% to 88% in reassessments. When only good/acceptable stainings (7 of 13) were evaluated, the range was similar (37-87% and 65-75%, respectively). In most of these cases, the primary assessments were significantly related to the reassessments (Fisher exact test, p < 0.05). The absolute agreement between primary and reassessments, when only good/acceptable stainings were included, ranged from 67% to 91%, being good (≥75%) in most of the assessments (6 of 7).
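Because NTs were scored as present or absent (0-1), the relation between primary and reassessment reduces to a 2 × 2 table. The Python sketch below shows one common two-sided form of the Fisher exact test, which sums the probabilities of all tables that are no more likely than the observed one. It is an illustration with hypothetical function names and data, and it is an assumption that this matches the variant computed by SPSS in the study.

```python
from math import comb


def contingency_2x2(primary, reassessment):
    """Cross-tabulate paired 0/1 scores into the cells (a, b, c, d).

    a: both 1, b: primary 1 / reassessment 0,
    c: primary 0 / reassessment 1, d: both 0.
    """
    pairs = list(zip(primary, reassessment))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical presence/absence scores for six core samples.
primary = [1, 1, 1, 0, 0, 0]
reassessment = [1, 1, 1, 0, 0, 0]
print(fisher_exact_two_sided(*contingency_2x2(primary, reassessment)))
```

With only six cores, even perfect concordance does not reach p < 0.05, which shows why the test needs the full set of core samples in a section.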
The core-to-core comparison of good/acceptable stainings (Table 4), including only the 13 core samples composed mostly of grey matter (75-100%), revealed that in 46% of the core samples (6 of 13), the evaluators either noted NTs or did not detect them at all. The agreement within the assessments ranged from 57% to 100%, being good (≥75%) in 11 of 13 core samples (85%). Similarly, the absolute agreement between the primary and reassessments ranged from 40% to 100%, being good (≥75%) in 10 of 13 core samples (77%).
Details regarding the IHC/Aβ staining results are given in Figure 5 and Table 6. IHC/Aβ staining was carried out by all BNE centers (n = 17). There was considerable variation in the staining protocols, for example, 3 different antibodies and 15 different pretreatment and dilution combinations were used (Table 2). According to the strict assessment criteria, the IHC/Aβ staining of most of the core samples in a TMA section was good or acceptable in 71% of the stainings (12 of 17). In 3 of the good stainings (assessment codes 2, 12, and 17 in Table 2), the intensity of the lesions was quite uniform between various core samples within the TMA section. In the other good stainings, some of the core samples were less intensively stained, but the Aβ-labeled aggregates were still detectable at the 100× magnification. Three of the participants failed completely with the staining. During the reassessment, it was noted that no confluent stained aggregates were noticed and all aggregates were larger than a single cell.
At the TMA section level (Table 6), both primary and reassessments of IHC/Aβ-labeled plaques differed significantly (K-W test, p < 0.05), with the mean values (± standard error) ranging from 3.8 ± 0.1 to 0.6 ± 0.1 and from 3.7 ± 0.2 to 0.2 ± 0.1, respectively. When only good/acceptable stainings were included, the primary and reassessments still differed significantly (K-W test, p < 0.05), with the mean values (± standard error) ranging from 3.8 ± 0.1 to 1.6 ± 0.2 and from 3.7 ± 0.2 to 2.2 ± 0.3, respectively. When comparing the primary with the reassessments including only good/acceptable stainings, the results differed significantly in 5 of 12 assessments (Wilcoxon, p < 0.05). Furthermore, the absolute agreement between the primary and the reassessments ranged from 36% to 100%, being good (≥75%) in only 6 of 12 cases (50%).
The core-to-core comparison of good/acceptable stainings (Table 7), including only the 13 core samples composed mostly of grey matter (75-100%), revealed that the primary assessments of IHC/Aβ-labeled plaques still varied. In 23% of these selected core samples (3 of 13), the assessments ranged from zero to the highest score value of 4, whereas the score value of 4 was the most frequent assessment in 77% of the core samples (10 of 13). The agreement in the primary assessments was good (≥75%) in 8 of 13 core samples (62%). The absolute agreement between primary and reassessments of IHC/Aβ aggregates, in turn, ranged from 55% to 92% and was good (≥75%) in 62% of the core samples (8 of 13).
All BNE centers carried out the IHC/HPtau staining (Fig. 6; Table 6). Again, there was considerable variation in the staining protocols; for example, 2 antibodies were used and some of the participants used some kind of pretreatment, although most centers did not (Table 2). According to strict assessment criteria, the IHC/HPtau staining (i.e. staining of most of the core samples in a TMA section) was good or acceptable in as many as 94% of the stainings (16 of 17). In these good specimens, the staining intensity of the lesions was uniform between different core samples within the TMA section and there was no disturbing background staining. In 4 of the acceptable stainings (assessment codes 3, 7, 9, and 13 in Table 2), the individual core samples were variably stained within the TMA section, whereas in one of the acceptable stainings (assessment code 11 in Table 2), the quality of immunostaining was good but the absence of counterstaining complicated the assessment of the TMA section.
When good/acceptable stainings were included (16 of 17), the distribution of the mean values (± standard error) of primary and reassessments of IHC/HPtau-labeled NFTs (3.4 ± 0.2 to 2.7 ± 0.3 and 3.4 ± 0.2 to 2.6 ± 0.3, respectively) at the TMA section level was rather uniform and the results did not differ statistically significantly (K-W test, p > 0.05) (Table 6). It was noteworthy that in only 4 of 16 good/acceptable stainings (25%) did the primary and reassessments differ significantly (Wilcoxon, p < 0.05). Furthermore, the absolute agreement between the primary and reassessments ranged from 63% to 100%, being good (≥75%) in 63% of cases (10 of 16).
The core-to-core comparison of good/acceptable stainings (Table 7), including only the 13 core samples composed mostly of grey matter (75-100%), indicated that some disagreement was still noted in the primary assessments of IHC/HPtau-labeled NFTs. In 62% of these selected core samples (8 of 13), the evaluators were unable to decide between the 2 highest scores (3 or 4), and this accounted for most of the disagreement. However, in only one of the 13 core samples did the primary assessments range from zero to the highest score of 4. The highest score value of 4 was the most frequent assessment in as many as 85% of the core samples (11 of 13). The agreement between the primary assessments ranged from 43% to 100% and was good (≥75%) in 11 of 13 core samples (85%). Lastly, the absolute agreement between the primary and reassessments of HPtau-labeled NFTs ranged from 29% to 100%, being good (≥75%) in as many as 92% of the core samples (12 of 13).
The assessment of IHC/HPtau-labeled NTs was carried out by all BNE centers (n = 17). At the TMA section level when all 17 stainings were included, the IHC/HPtau-labeled NTs were seen in primary assessments in 47% to 94%, and in reassessment, the range was from 68% to 97% of the core samples (Table 5). When only the 16 good/acceptable stainings were evaluated, NTs were seen in 67% to 94% of the core samples in primary assessments and in 68% to 97% of the core samples in reassessments. In most of these cases, the primary assessments were significantly related to the reassessments (Fisher exact test, p < 0.05). The absolute agreement between the primary and reassessments, including the 16 good/acceptable stainings, ranged from 63% to 100% and it was good (≥75%) in 94% of the assessments.
The core-to-core comparison of good/acceptable stainings (Table 7), including only the 13 core samples composed mostly of grey matter (75-100%), revealed that NTs were either noted or not detected at all in 3 of 13 core samples (23%). The agreement within the primary assessments ranged from 81% to 100% and the absolute agreement was 100% in 6 of 13 core samples (46%).
Neuropathologic Diagnosis of Alzheimer Disease
In Figure 7, neuropathologic diagnoses of AD following different protocols (i.e. CERAD, Braak and Braak, and NIA-Reagan) were given for the 13 core samples (Tables 4, 7) composed mostly of grey matter (75-100%) and that were of good/acceptable staining quality. Six to 8 assessments were available for different core samples. Each core sample was obtained from a demented subject with cognitive impairment. Following the CERAD protocol in 5 of 13 (38%) cases, Braak and Braak staging in 3 of 13 cases (23%), and the NIA-Reagan protocol in 7 of 13 cases (54%), full agreement in diagnosis was obtained. When the results obtained using IHC/HPtau stainings were included in all core samples except 2 (3E and 6D), moderate to numerous labeled NFTs were seen, indicating that these 11 cores were from subjects with dementia, likely being AD. Core sample 3E lacked any AD-associated lesions and in core sample 6D, using the additional IHC/HPtau method, moderate or numerous NFTs were seen with either high (50%) or intermediate (50%) likelihood.
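Full agreement, as used above, means that every available assessment of a core sample led to the same diagnostic category. A minimal Python sketch of that tally follows; the function name, the core labels, and the diagnoses are hypothetical and are not the study data.

```python
def full_agreement_rate(diagnoses_by_core):
    """Return (cores in full agreement, total cores, percentage).

    diagnoses_by_core maps a core label to the list of diagnostic
    categories assigned to that core by the individual evaluators.
    """
    n_cores = len(diagnoses_by_core)
    if n_cores == 0:
        raise ValueError("need at least one core sample")
    # A core is in full agreement when all evaluators chose one category.
    n_full = sum(
        1 for diagnoses in diagnoses_by_core.values()
        if len(set(diagnoses)) == 1
    )
    return n_full, n_cores, 100.0 * n_full / n_cores


# Hypothetical likelihood categories from three evaluators per core.
example = {
    "core_a": ["high", "high", "high"],
    "core_b": ["high", "intermediate", "high"],
    "core_c": ["low", "low", "low"],
    "core_d": ["intermediate", "high", "low"],
}
n_full, n_cores, percent = full_agreement_rate(example)
print(f"{n_full} of {n_cores} cores ({percent:.0f}%)")
```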
The neuropathologic diagnosis of AD includes both the semiquantitative assessment of NPs and/or NFTs (1, 4, 6) and/or the evaluation of the regional distribution of NFTs (5). The variability existing in the visualization of hallmark lesions such as NPs and NFTs may, however, constitute a severe obstacle. The Gallyas silver stain, as recommended for use by Braak and Braak (5), is selective for NFTs and NTs and is less suitable for NPs (13). It is sensitive to laboratory protocols and is known to be capricious (3, 13). The Bielschowsky stain, which is recommended for use in assessing NPs (4), is also sensitive to laboratory routines and conditions such as ambient temperature (3, 11). Furthermore, irrespective of which silver staining technique was used, staining results have been shown to be influenced by the storage time in formalin (14). In addition, IHC techniques to visualize both NFTs and NPs have been recommended recently (6). However, IHC staining results can also be influenced by the methodology and the antibody used (15-20). The present interlaboratory study was designed to assess the quality of staining and to evaluate the level of agreement in assessing NPs and NFTs in postmortem brain tissue using both silver impregnation and IHC methods.
In a previous interlaboratory comparison of assessment of AD lesions, the reported high interrater variability among the participating 24 neuropathologists was believed to be attributable to the lack of specific guidelines for selecting assessment fields with each neuropathologist assessing a different neuroanatomic region (21). Our previous pilot study yielded similar results (data not published), and to avoid problems related to neuroanatomy, we used the 2-mm TMA technique (7, 8) for construction of the section to be assessed. Therefore, each participant evaluated the same region, with only minor variations resulting from the sectioning.
Because no systematic analysis regarding the influence of presectioning protocol on the stainings have been published, each center supplied their own tissue for the construction of the TMA block. Surprisingly, the material was indeed heterogeneous for postmortem delay and fixatives and fixation time, and this variability in the presectioning protocols was noted to influence the staining results. This was clearly notable at the TMA section level where the individual core samples processed by the same center stained differently. This is in line with a previous report (14) indicating that the fixation time can be crucial, particularly influencing the results obtained with silver stains. Similar results have also been reported for immunohistology (19). Additionally, there are probably also still unknown differences between individual cases that influence the staining results. It is noteworthy that there is no information addressing why specific structures are labeled with silver stains; thus, the influence of presectioning protocol on these stainings is difficult to analyze. Regarding the immunohistochemical stainings, more knowledge is available and systematic analysis of the preservation of some antigens and retrieval has been done. However, based on our results, it appears that good staining can be obtained irrespective of postmortem delay (range, 3-120 hours), fixation time (range, 1 day to 420 days), and storage time (range, 1 day to 5 years). One should, however, be cautious regarding the fixative used. Our results indicated that the use of formic acid in fixation solution seemed to influence the IHC/AT8 staining by diminishing the labeling. The reasons why and how the presectioning protocol is so critical lie beyond the scope of this article, but our results indicate that these steps are of importance and a more systematic evaluation of these issues is currently underway.
It was noted that among the participating BNE centers, there was a variation in the quality of stain, especially with regard to the silver stains (approximately 45% poor quality). As seen in Figure 3, the silver-labeled NPs were visualized differently in different stainings, and different core samples stained differently within the same TMA section. Similarly, as seen in Figure 4, various results were seen when Gallyas staining was used for visualization of NFTs. Variation in the staining quality was also noted in the IHC/Aβ stain (approximately 29% poor quality). As seen in Figure 5, the IHC/Aβ-labeled protein aggregates were visualized differently in different stainings and different core samples stained differently within the same TMA section. In contrast, the IHC/HPtau staining quality was quite uniform (approximately 6% poor quality), although some variability was noted (Fig. 6). Although the majority of participants used identical staining protocols, the variable staining quality with using silver techniques might also be attributable to the proficiency of the BNE center performing the technique. Some centers use the staining routinely, whereas others only rarely.
The primary assessment of lesions in the TMA sections varied significantly. This is partly explained by the methodological aspects discussed here. However, some level of agreement was reached when the poor stainings were excluded for all stains except the IHC/Aβ stain. The extensive variability in visualization of IHC/Aβ aggregates might be the result of the many different types of pretreatment used in IHC/Aβ staining by the participating BNE centers. This is in line with a previous study (15) that demonstrated that the IHC/Aβ staining is especially sensitive to the pretreatment conditions. However, the variability in the staining protocols does not alone explain the unevenness in the IHC/Aβ staining quality. The precisely same protocol yielded both good as well as poor quality of staining. This might be the result of the proficiency of the BNE center performing the techniques, but the results might also be influenced by factors such as buffers and detection kits used. Obviously, not only the presectioning protocol, but also the way in which the IHC staining is conducted is a major obstacle to reliably visualize the IHC/Aβ-labeled aggregates. Further refinement of the methodology, including both use of a panel of antibodies and a strict pretreatment protocol, should be undertaken.
When only the good/acceptable silver stains were compared, the primary assessments and the reassessments differed. It should be noted, however, that the good/acceptable stained TMA sections also included some poorly stained (i.e. over- and understained) core samples that might have influenced the results. In the Bielschowsky silver stain, the significant difference seen in most of the assessments was attributed to the strictness of the criteria used during the reassessment; thus, although a plaque was perceivable, it was not counted if silver-stained, thickened neurites were not seen. Moreover, some variance was noted after repeated assessments by the same primary assessor and reassessor, indicating that some difficulties were encountered in identification of plaque subtypes, such as overstained diffuse plaques being counted as neuritic plaques and vice versa. In agreement with previous reports (3, 13, 21), there are both methodological and assessment-related pitfalls associated with the Bielschowsky silver-impregnation technique, making this method susceptible to serious errors. In the Gallyas silver-stained sections, the significant difference between the primary assessments and reassessments of NFTs was less striking. However, the absolute agreement (i.e. the proportion of equally assessed core samples) was less than 75%. This was probably the result of different staining of the NFTs in core samples, although the staining of the TMA section was assessed as being rather uniform. Some of the cores were overstained and some were understained, with both affecting the identification as well as the rating of NFTs. However, the absolute agreement regarding the assessment of NTs was much better. It should, however, be noted that the Gallyas stain was performed by only 14 of 17 participants, and in 6 of these 14 stains the quality of staining was assessed as being poor, indicating that this technique is both difficult to standardize and capricious.
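The absolute agreement mentioned above (the proportion of equally assessed core samples) can be illustrated with a minimal sketch. The function name and the example scores are hypothetical and are not taken from the study data; the sketch only assumes that each core sample receives one ordinal score from the primary assessor and one from the reassessor.

```python
def absolute_agreement(primary, reassessment):
    """Proportion of core samples given the same score in both assessments.

    Both arguments are equal-length sequences of ordinal scores (e.g. 0-4),
    one entry per core sample. Illustrative helper, not the study's code.
    """
    if len(primary) != len(reassessment):
        raise ValueError("both assessments must cover the same core samples")
    if len(primary) == 0:
        raise ValueError("at least one core sample is required")
    matches = sum(p == r for p, r in zip(primary, reassessment))
    return matches / len(primary)


# Hypothetical scores for eight core samples (not study data).
primary_scores = [0, 1, 2, 3, 4, 2, 3, 1]
reassessment_scores = [0, 1, 3, 3, 4, 2, 2, 1]

# Six of the eight cores were scored identically.
print(absolute_agreement(primary_scores, reassessment_scores))  # 0.75
```

This measure counts only exact matches, so a disagreement between two adjacent score values weighs as much as one spanning the whole scale; it is not chance-corrected.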
The most reproducible staining technique proved to be IHC/HPtau in terms of both the staining quality and the assessments. In most cases, the major disagreement in the assessments was related to selecting between the 2 highest score values (range, 0-4), whereas scores ranging from 0 to 4 were awarded in only one core sample. This core sample (6D) originated from a tissue block that had been fixed in 4% formaldehyde including formic acid, and it was poorly stained (pale) in most of the stainings. Two centers used formic acid pretreatment, and it was noted that with the monoclonal antibody (AT8), the staining result was poor, whereas when applying the polyclonal antibody, the staining result was acceptable. These results are in contrast to a previous report (17) indicating that the formic acid pretreatment enhanced the labeling specifically of neuropil threads. However, these results are not fully comparable because not all details regarding presectioning and postsectioning protocols are available. This highlights the importance of reporting methodological details when capricious immunolabeling procedures are used.
A current Medline search reveals that 3 major protocols are used for the neuropathologic diagnosis of AD. These include the CERAD protocol based on semiquantitative assessment of NPs, the Braak and Braak staging based on the regional distribution of NFTs, and the NIA-Reagan protocol combining the previous ones (4-6). When only the good/acceptable stained sections and the 13 representative cores (≥75% grey matter; 6-8 evaluations) were included, the same diagnosis of definite AD was given in 35% of the core samples. It is noteworthy that 3 core samples were assessed as being either not from AD cases by 6% of evaluators or from definite AD cases by 14% of evaluators. Based on the variability in assessment of Bielschowsky-stained NPs, we cannot recommend the use of the CERAD protocol in an interlaboratory setting. Similar diagnostic variability was noted following the Braak and Braak protocol. The NIA-Reagan protocol combines the CERAD and Braak and Braak protocols, so there is a high likelihood of AD in cases with numerous NPs and isocortical NFTs. Regarding our 13 core samples, in 25% of the received assessments, the NIA-Reagan protocol (when strictly followed) could not be applied because the CERAD and Braak and Braak assessment results were not consistent with each other. In some cases with CERAD-definite AD, no NFTs were noted (plaque-predominant) and vice versa (tangle-predominant). However, when IHC/HPtau methodology was applied, in 11 of the 13 core samples, all evaluators detected numerous NFTs, indicating a high likelihood of AD.
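Why the NIA-Reagan protocol cannot be applied when the CERAD and Braak and Braak results disagree can be sketched as a simple lookup. The function below is a simplified reading of the published NIA-Reagan categories (frequent NPs with Braak stage V/VI, moderate NPs with stage III/IV, sparse NPs with stage I/II); the function name, the string labels, and the choice to return `None` for discordant combinations are assumptions made for illustration, not part of the study.

```python
def nia_reagan_likelihood(cerad_np, braak_stage):
    """Simplified NIA-Reagan likelihood that dementia is due to AD.

    cerad_np:    CERAD neuritic plaque score: "none", "sparse",
                 "moderate" or "frequent".
    braak_stage: Braak and Braak stage as an integer from 0 to 6.

    Returns "high", "intermediate" or "low" for concordant findings and
    None when the two assessments do not fit a single category
    (e.g. plaque-predominant or tangle-predominant cases).
    """
    if cerad_np not in ("none", "sparse", "moderate", "frequent"):
        raise ValueError("unknown CERAD neuritic plaque score")
    if braak_stage not in range(0, 7):
        raise ValueError("Braak stage must be an integer from 0 to 6")

    if cerad_np == "frequent" and braak_stage in (5, 6):
        return "high"
    if cerad_np == "moderate" and braak_stage in (3, 4):
        return "intermediate"
    if cerad_np == "sparse" and braak_stage in (1, 2):
        return "low"
    return None  # discordant: strict protocol not applicable


print(nia_reagan_likelihood("frequent", 6))  # high
print(nia_reagan_likelihood("frequent", 0))  # None (plaque-predominant)
print(nia_reagan_likelihood("none", 5))      # None (tangle-predominant)
```

Under this strict reading, any assessment in which the plaque score and the tangle stage fall into different tiers yields no category, which corresponds to the situation described above for a quarter of the received assessments.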
In conclusion, some variability in the staining quality of silver techniques was noted, although almost all the participants followed the same staining procedure. This led to undesirable variation in the assessments of AD-related lesions and consequently neuropathologic diagnoses; therefore, these stainings are not recommended for use in interlaboratory settings. The variability found with respect to IHC/Aβ labeling is attributed to extensive methodological differences influencing the rating and, therefore, further methodological adjustments are required. Our study indicated that there is one method (the IHC/HPtau labeling of NFTs and NTs) that yields almost uniform quality of staining and appropriate assessment of lesions, although the IHC/HPtau methodology was not uniform. These findings suggest that the IHC/HPtau labeling can be recommended for assessing AD lesions, especially in interlaboratory settings. This method is consistent with the recommendations in http://www.ICDNS.org, which emphasize the use of immunocytochemistry in AD diagnostics. Furthermore, our findings point out the need for a neuropathologic diagnostic protocol of AD that is primarily based on the IHC/HPtau methodology.
Based on the number of poor stainings and the variability in the assessments of AD lesions found in this study, it can be further concluded that there is a need for continual quality control and standardization of methodology.
The authors thank Frances Carnie, Tarja Kauppinen, Maria Kemerli, M. Kooreman, Tanja Treutlein, A. Van den Berg, and J. Wouda as well as the other laboratory technicians of the individual brain banks for their skillful technical assistance and Vesa Kiviniemi for his assistance in statistics. MBG is grateful to the Parkinson's Disease Society Tissue Bank at Imperial College London, funded by the Parkinson's Disease Society of the U.K., registered charity 948776. The study has been authorized by the Ethics Committee of Kuopio University Hospital.