Effective and efficient active learning for deep learning-based tissue image analysis

Abstract Motivation Deep learning has recently attained excellent results in digital pathology. A challenge with its use is that high-quality, representative training datasets are required to build robust models. Data annotation in this domain is labor intensive and demands a substantial time commitment from expert pathologists. Active learning (AL) is a strategy to minimize annotation: the goal is to select, from the pool of unlabeled data, the samples for annotation that most improve model accuracy. However, AL is a very compute-demanding approach. The benefits for model learning may vary according to the strategy used, and it may be hard for a domain specialist to fine-tune the solution without an integrated interface. Results We developed a framework that includes a user-friendly interface along with run-time optimizations to reduce annotation and execution time in AL in digital pathology. Our solution implements several AL strategies along with our Diversity-Aware Data Acquisition (DADA) function, which enforces data diversity to improve the prediction performance of a model. In this work, we employed a model simplification strategy [Network Auto-Reduction (NAR)] that significantly improves AL execution time when coupled with DADA. NAR produces less compute-demanding models, which replace the target models during the AL process to reduce processing demands. An evaluation with a tumor-infiltrating lymphocyte classification application shows that: (i) DADA attains superior performance compared to state-of-the-art AL strategies for different convolutional neural networks (CNNs), (ii) NAR improves the AL execution time by up to 4.3×, and (iii) target models trained with patches/data selected by the NAR-reduced versions achieve classification quality similar or superior to using the target CNNs for data selection. Availability and implementation Source code: https://github.com/alsmeirelles/DADA.

The slides listed in Tables S3 and S4 were randomly selected from the 5,202 images used in Saltz et al. (2018). These slides were divided into patches, yielding a total of 525,431, of which 15,000 form the test set (originating from the WSIs in Table S4), 100 form a validation set, and the remaining patches form the pool of unlabeled data.
The number of generated patches is very large, making it impractical to use them all in an AL scheme. Thus, in the early stages of DADA (Meirelles et al., 2022), a fixed pool of 100,000 randomly selected patches was created. This selection can be seen as an individual TIL dataset, and different selections can be drawn from the overall extracted patch set (Tables S5 to S7).
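For concreteness, this pooling step can be sketched as follows. This is an illustrative reconstruction, not the repository's actual data-loading code; the function name, the file-list input, and the seed are our assumptions.

```python
import random

def build_fixed_pool(patch_paths, pool_size=100_000, seed=42):
    """Draw a reproducible fixed pool of unlabeled patches.

    patch_paths : list of paths to all extracted patches
    pool_size   : size of the fixed pool (100,000 in the paper)
    seed        : RNG seed; a different seed yields a different TIL dataset
    """
    rng = random.Random(seed)
    return rng.sample(patch_paths, pool_size)
```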

This section presents complementary information about the CNNs used during the experiments and their reduced versions.
It is important to point out that, since ResNet50 is a shallower network than Inception and its block-structured architecture would have its blocks collapsed into sequential layers if higher φ values were used, NAR was limited to φ = 3 for this model.
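For intuition only, the sketch below shows one way a reduction factor φ could shrink a block-structured CNN by dividing both the number of block repetitions and the filter widths by φ. This is an illustrative stand-in, not the actual NAR algorithm; NAR's reduction rules are described in the main paper and are model-specific.

```python
from tensorflow.keras import layers, models

def reduced_cnn(input_shape, n_classes, block_filters=(64, 128, 256),
                repeats=3, phi=1):
    """Toy block-structured CNN shrunk by a reduction factor phi.

    Dividing the per-block filter count and the block repetitions by phi
    mimics the kind of depth/width reduction NAR performs; the real NAR
    rules differ.
    """
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in block_filters:
        for _ in range(max(1, repeats // phi)):  # fewer blocks as phi grows
            x = layers.Conv2D(max(8, filters // phi), 3, padding="same",
                              activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```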

S3 Cluster configuration
Defining the number of clusters to be used with DADA requires experimentation and/or knowledge of dataset characteristics. For the MNIST dataset, which consists of handwritten decimal digits, it is natural to assume that 10 clusters may be a good choice, although that does not necessarily correspond to the best configuration. The TIL dataset has no obvious first choice for this parameter, and experimentation was needed to identify a candidate. Figure S1 presents the results of these experiments. The average AUC for the 20-, 40- and 80-cluster configurations was, respectively, 0.735, 0.739 and 0.732, showing minor differences, while 10 and 200 clusters averaged 0.724 and 0.697. Given these findings, the 20-cluster configuration was chosen, but other values with similar AUC could also be used.

To illustrate the effect of DADA on patch selection, Figure S2 shows the 5 patches with the highest uncertainty in DADA's 20 clusters, used during image selection. These patches come from iteration 19 of an Ensemble DADA experiment in which 20 iterations were carried out (the last iteration does not execute patch selection). In this iteration, a subpool of 1,800 patches was available to DADA, distributed among the 20 patch clusters. The number of patches selected from each cluster depends on the cluster's average uncertainty.
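The per-cluster allocation just described can be sketched as below. This is a simplified rendition of the idea, assuming patch feature vectors and per-patch uncertainty scores are already computed; the function and variable names are ours, not the repository's.

```python
import numpy as np
from sklearn.cluster import KMeans

def dada_select(features, uncertainties, n_clusters=20, budget=200, seed=0):
    """Allocate the acquisition budget across clusters by mean uncertainty.

    features      : (n, d) array of patch feature vectors
    uncertainties : (n,) array of per-patch uncertainty scores
    Returns indices of the selected patches.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(features)
    # Each cluster's share of the budget is proportional to its mean
    # uncertainty (floor rounding may leave a few acquisitions unused).
    means = np.array([uncertainties[labels == c].mean()
                      for c in range(n_clusters)])
    shares = np.floor(budget * means / means.sum()).astype(int)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Within a cluster, take its most uncertain patches.
        top = idx[np.argsort(uncertainties[idx])[::-1][:shares[c]]]
        selected.extend(top.tolist())
    return selected
```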
Additionally, patches within clusters usually span multiple slides, and even when a subset of them comes from the same slide they are not necessarily spatially close to each other. This is a result of the subpooling/sampling approach, in which only a subset of the pool is available to the selection strategy at each iteration, with subpool regeneration carried out after every α acquisitions. Table S8 shows the maximum and average distance (in pixels) between pairs of patches within each cluster of the same experiment used to produce the images in Figure S2. Slides with at least 2 selected patches were analyzed with respect to the maximum distance between every patch pair and also the mean distance between pairs. These measurements confirm that DADA produces patch clusters that group similar visual characteristics but are spread across the slide.
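The distance statistics reported in Table S8 can be reproduced with a computation of this form (a sketch; we assume each selected patch carries its top-left (x, y) coordinate in the original slide):

```python
import itertools
import numpy as np

def patch_distance_stats(coords):
    """Max and mean Euclidean distance (in pixels) between patch pairs.

    coords : list of (x, y) positions of patches from one slide/cluster;
             needs at least two patches.
    """
    dists = [np.hypot(x1 - x2, y1 - y2)
             for (x1, y1), (x2, y2) in itertools.combinations(coords, 2)]
    return max(dists), float(np.mean(dists))
```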
In contrast to the diversity of patches produced by the clustering system used by ENS DADA, the ENS method acquires the most uncertain images from the unlabeled pool. Figure S3 shows the 30 most uncertain images from the pool, which are the ones selected by the ENS approach, and it is clear that they have very similar visual characteristics, as expected. This difference between DADA and the other approaches is one of the key aspects of our solution: uncertainty and diversity are combined in the selection strategy to accelerate model learning (a minimal sketch of the ENS-style selection follows the captions below).

Table S8: Cluster setup for the last acquisition iteration of an ENS DADA experiment. Maximum and mean distances are given in pixels in the original slide and correspond to the cluster average.

Figure S3: Panel of the 30 most uncertain patches, acquired with the ENS method in the last acquisition of a 14-iteration experiment.
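For comparison with the DADA sketch above, ENS-style acquisition reduces to a plain top-k over an ensemble uncertainty score. A minimal sketch, assuming per-member positive-class probabilities are available; the paper's exact uncertainty measure may differ (e.g., it could use the entropy of the mean prediction), so variance across members is used here only for brevity:

```python
import numpy as np

def ens_select(member_probs, k=30):
    """Top-k acquisition by ensemble disagreement.

    member_probs : (m, n) array, m ensemble members x n pool patches,
                   each entry a predicted positive-class probability
    Returns indices of the k most uncertain patches.
    """
    uncertainty = member_probs.var(axis=0)  # disagreement across members
    return np.argsort(uncertainty)[::-1][:k]
```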

S4 Impact of φ on Other CNNs in AL
The φ parameter serves as a trade-off between execution cost and target network AUC. As presented in the main paper, there are circumstances where higher values of φ bring considerable reductions in execution time with only marginal to no loss in AUC when using the Inception V4 network. In this section, experiments conducted with other models, Xception and ResNet50 V2, are presented; similar behavior was observed. A first observation is that these architectures have different classification performance, as demonstrated in Figure S4a, and varying iteration execution times, as shown in Figure S4b.
When NAR is used with these models, iteration times are drastically cut, as was observed with the Inception network, with only marginal differences in AUC compared to the full model (Figures S5 and S6).

Figure S5: Impact of NAR on classification quality in AL using MC DADA with Xception and ResNet50 V2 for multiple φ values.

S5 Varying TIL Datasets
The experiments conducted in the main paper use the same initial subpool and the same fixed pool of 100,000 patches (when applicable). If these sets are changed, a new TIL dataset is formed and used in the experiments. Although they represent the same application (TIL classification), these datasets contain different patches and may pose different difficulties to the model.
Here we also executed the AL with a new pool and a new test set, which also originated from the slides listed in Table S4. The results are presented in Figure S7. As may be observed, behavior similar to that of the experiments in the main text was seen: both the MC and Ensemble DADA approaches reach close AUC levels and produce successive gains over time. This corroborates the first experiments, which showed learning gains along training despite the different difficulty levels presented by these datasets.