Lucio Calandriello, Simon LF Walsh, The evolution of computer-based analysis of high-resolution CT of the chest in patients with IPF, British Journal of Radiology, Volume 95, Issue 1132, 1 April 2022, 20200944, https://doi.org/10.1259/bjr.20200944
In patients with idiopathic pulmonary fibrosis (IPF), there is an urgent need for biomarkers which can predict disease behaviour or response to treatment. Most published studies report results based on continuous data, which can be difficult to apply to individual patients in clinical practice. The availability of antifibrotic therapies makes it even more important that we can accurately diagnose and prognosticate in IPF patients. Advances in computer technology over the past decade have provided computer-based methods for objectively quantifying fibrotic lung disease on high-resolution CT (HRCT) of the chest with greater strength than visual CT analysis scores. These computer-based methods and, more recently, the arrival of deep learning-based image analysis might provide a response to these unsolved problems. The purpose of this commentary is to provide insights into the problems associated with visual interpretation of HRCT, describe the current technologies used to provide quantification of disease on HRCT and prognostication in IPF patients, and discuss challenges to the implementation of this technology and future directions.
Introduction
Idiopathic pulmonary fibrosis (IPF) is an inexorably progressive disease with a variable and unpredictable disease course.1 This makes prognostication for individual patients challenging and hampers drug development. In 2014, regulatory agencies approved two novel IPF therapies, which slow FVC decline.2 However, the use of FVC as a biomarker in IPF has limitations: it is prone to missing data and may miss treatment effects. Moreover, since antifibrotic therapy slows FVC decline, the incremental benefit of new drugs will require increasingly sensitive biomarkers to capture treatment response. Therefore, there is an urgent need for sensitive biomarkers in IPF.3 High-resolution CT (HRCT) of the chest is an essential part of the diagnostic pathway of IPF4 and is widely available, making it a promising source of digital biomarkers in IPF. Considerable research effort has been expended on developing visual and computer-based approaches for analysing HRCT images.5
CT technique
Most available tools for the analysis of CT images are highly dependent on the technical parameters of the HRCT scan. Standardisation of the CT acquisition protocol is mandatory to improve reproducibility of image analysis. Volumetric acquisitions with contiguous or overlapped reconstruction are essential, with a slice thickness of around 1 mm.6 Radiation dose ranges are variable but can be low, and the use of dose reduction techniques should be encouraged. Iterative reconstructions are not recommended until their effect on computer analysis is clearly understood. A neutral or “soft” kernel is usually advocated for image reconstruction to avoid excessive noise; the use of a sharper kernel is acceptable for quantitative analysis if image normalisation techniques are used.7 An important source of variability in HRCT images is the depth of inspiration, which requires coaching the patient to comply with standardised breathing instructions. To avoid measurement errors due to technical parameters, patients should be imaged on the same CT scanner at all time points.
Quantitative CT analysis in patients with IPF
Visual scoring
Visual HRCT assessment, particularly the identification of the UIP (usual interstitial pneumonia) pattern, has prognostic value in patients with IPF. Patients with histologically proven UIP/IPF and a typical UIP pattern on HRCT have increased mortality when compared to patients with non-UIP patterns.8 Furthermore, the extent of fibrosis on HRCT has consistently been shown to be an important predictor of mortality in IPF.9 However, visual HRCT assessment has limitations: it is susceptible to high inter-observer variability and is relatively insensitive to subtle changes in disease extent over time.10 Specifically, honeycombing, which is a pivotal feature of IPF/UIP on HRCT and a consistent predictor of survival (Figure 1), is often misclassified, in particular when there is coexistent emphysema.11 Composite scoring systems which integrate HRCT pattern extents with objective pulmonary function have been devised to overcome this variability and improve clinical applicability,12 but these are not routinely used in clinical practice or implemented in clinical trials. These shortcomings represent the primary motivation for computer-based CT analysis, which can objectively analyse and quantify disease on HRCT of the chest.

Figure 1. HRCT image showing subpleural honeycombing (red arrows) and traction bronchiectasis (green arrows) in a patient with IPF.
Computerised texture analysis
The use of computerised tools to quantify disease on CT, known as quantitative CT (QCT), has been tested in a variety of fibrotic lung diseases, outperforming human-based CT evaluation due to its sensitivity and reproducibility. Both baseline disease extent and longitudinal changes in disease extent based on QCT evaluation have been reported in IPF.5
The first QCT applications in pulmonary fibrosis relied on simple measures of lung density, producing a histogram showing the distribution of lung density per voxel in a CT image. Since the presence of lung fibrosis causes changes in lung density and, therefore, the histogram-related features including mean lung attenuation, skewness and kurtosis, these parameters have been reported as surrogate markers of fibrosis extent on CT.13,14
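The histogram-related features named above can be illustrated with a minimal sketch. It assumes the lung has already been segmented and that voxel densities are available as a flat list of Hounsfield unit (HU) values; the function name `histogram_features` and the toy values are hypothetical and are not taken from any of the cited tools.

```python
def histogram_features(hu_values):
    """Return mean attenuation, skewness and excess kurtosis of voxel densities (HU)."""
    n = len(hu_values)
    if n < 2:
        raise ValueError("need at least two voxels")
    mean = sum(hu_values) / n
    # Central moments in population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in hu_values) / n
    m3 = sum((v - mean) ** 3 for v in hu_values) / n
    m4 = sum((v - mean) ** 4 for v in hu_values) / n
    if m2 == 0:
        raise ValueError("all voxels have the same density")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    return {"mean": mean, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical toy data, not patient measurements: a cluster of low-density
# voxels, then the same cluster with a few denser voxels added.
aerated = [-850, -840, -860, -845, -855, -850, -848, -852]
with_dense_voxels = aerated + [-600, -500, -400]

baseline = histogram_features(aerated)
shifted = histogram_features(with_dense_voxels)
print(baseline["mean"], shifted["mean"])
```

In this toy case the denser voxels raise the mean attenuation, which is the kind of whole-lung shift these early measures capture. The sketch says nothing about which pattern produced the shift.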
However, analysing lung density over the entire lung does not allow discrimination between different patterns of disease such as ground glass opacification, honeycombing or emphysema. This limitation was overcome by more sophisticated QCT methods that allow pattern discrimination by training the computer to recognise individual HRCT patterns based on mathematical descriptions of the visual characteristics of an image.15
The QCT tools tested in IPF include the Adaptive Multiple Feature Method (AMFM), Computer-Aided Lung Informatics for Pathology Evaluation and Ratings (CALIPER), Quantitative Lung Fibrosis (QLF) and Automated Quantification System (AQS) (Table 1).
Tool name | Approach | Main results in IPF patients
---|---|---
Adaptive Multiple Feature Method (AMFM) | Feature Engineering | AMFM quantification of baseline fibrosis and progression of fibrosis on HRCT independently correlate with disease progression and change in FVC respectively |
Computer-Aided Lung Informatics for Pathology Evaluation and Ratings (CALIPER) | Feature Engineering | CALIPER related scores provide superior prediction of outcome when compared to conventional visual assessments of fibrosis. CALIPER can quantify a parameter called vessel-related structures which is the strongest predictor of mortality among all CALIPER and visual scores |
Quantitative Lung Fibrosis (QLF) | Feature Engineering | Changes in QLF scores across longitudinal CTs correlate with change in pulmonary function tests |
Automated Quantification System (AQS) | Feature Engineering | AQS scores correlate with pulmonary function tests. A cut-off value of AQS derived reticular opacity score has been identified below which FVC is likely to be stable at 1 year follow up in IPF patients |
Data-Driven Texture Analysis (DTA) | Deep Learning | Changes in DTA fibrosis scores correlate with changes in pulmonary function. Increasing extent of DTA scores is associated with an increased risk of disease progression and hospitalisation |
The Adaptive Multiple Feature Method (AMFM) is a computer-based texture analysis tool that quantifies lung parenchymal patterns on CT images and has demonstrated good levels of agreement with human observers for different interstitial HRCT patterns.16 AMFM quantification of baseline fibrosis and progression of fibrosis on HRCT of IPF patients independently correlated with disease progression and decrease in FVC, respectively.16
CALIPER-derived variables have been reported as more accurate for outcome prediction in IPF patients than corresponding semiquantitative CT scores, including in patients with early-stage disease.17,18 A unique feature of CALIPER is its ability to quantify a novel HRCT parameter, vessel-related structures (VRS), which loosely corresponds to the volume of pulmonary vessels and associated structures, such as perivascular fibrosis, and has no visually scored equivalent. In one study, CALIPER-derived scores were the only independent predictors of mortality in IPF when analysed alongside visually based scores. In this analysis, VRS was the strongest predictor of mortality.17,19
The QLF score allows an objective and reproducible quantification of lung fibrosis in patients with IPF. Significant correlations between change in QLF values across longitudinal CTs and change in pulmonary function tests have been shown.20,21
AQS variables correlate with functional data and can predict FVC decline. Moreover, a cut-off value for AQS-calculated reticular opacities has been identified as the level below which FVC is likely to be stable at 1-year follow-up in IPF patients.22
However, these methods also have limitations. First, they require human input during training, which introduces a degree of subjectivity and therefore reduces the reliability of the training process. Second, feature engineering requires that the image features that are optimal for the task, in this case assessment of disease extent, progression or response to therapy, are known a priori. This approach does not allow inclusion of patterns that may be indiscernible to the human eye but are clinically significant and machine detectable. Last, software training requires manual labelling of images, which is time-consuming and requires high-level domain expertise.
Deep learning
The limitations of feature engineered QCT tools can be overcome using deep learning (DL), a type of machine learning which can autonomously and simultaneously optimise feature extraction and pattern discrimination, given a large well-labelled imaging dataset.23 The pivotal step in the development of a DL algorithm is the training process. During training, the algorithm iterates over a large number of labelled images and compares its prediction on each image to the image label. The prediction error is calculated and, using a process known as backpropagation, the algorithm adjusts its internal parameters to reduce the error on the next image. This procedure is repeated many times over the entire dataset of images, leading to gradual improvements in algorithmic predictions.24 Deep learning allows selection and amplification of subtle but highly discriminatory features while at the same time ignoring irrelevant variation (including HRCT technical parameters). An important advantage of this approach is that it does not require that the important CT features are known a priori; the process learns the most important features for predicting the desired outcome autonomously. Data-Driven Texture Analysis (DTA) is a QCT tool which avoids feature engineering by employing a deep learning approach to disease classification on HRCT.25 This DL algorithm has been trained to quantify changes in specific CT patterns over time for monitoring disease progression and therapeutic response.26 SOFIA (Systematic Objective Fibrotic lung disease Analysis Algorithm)27 is a deep learning algorithm which, instead of quantifying disease on HRCT images, has been trained to provide a radiological diagnosis based on the joint IPF guideline criteria for a UIP pattern.28 Its results were validated against a panel of thoracic radiologists, achieving expert thoracic radiologist-level performance.
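The training cycle described above (predict, compare with the label, adjust parameters, repeat over the dataset) can be sketched in a few lines. This is a deliberately reduced illustration and not the method used by DTA or SOFIA: it assumes a single-layer logistic model on two made-up numeric inputs per sample, for which the gradient update is the simplest special case of backpropagation. A real DL algorithm stacks many layers and operates on image data. All names and values below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=200, learning_rate=0.5):
    """Repeatedly pass over the labelled data, nudging parameters to reduce error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):                      # many passes over the dataset
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y  # prediction vs. label
            # Gradient step for cross-entropy loss on a logistic output.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy dataset: two numeric inputs per sample, binary label.
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
labels = [0, 0, 1, 1]

weights, bias = train(samples, labels)
print(predict(weights, bias, [0.9, 0.8]), predict(weights, bias, [0.1, 0.1]))
```

Nothing in the loop states which input matters more; the weights are learned from the labelled examples alone, which is the property the text contrasts with feature engineering.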
Challenges to implementation
The implementation of DL algorithms in IPF may address several unmet needs in both research settings and clinical practice; however, their adoption in routine clinical practice faces several challenges.
Deep learning algorithms are to some extent black boxes; the complexity of the algorithm obscures its reasoning, making it difficult to understand how it reaches its conclusions. This is a major limitation in cases where a radiologist and the algorithm disagree or when evaluating features that are undetectable to human observers.24 Understanding how a deep learning algorithm classifies specific cases is also critically important during algorithm training, especially in cases where the algorithm makes an incorrect prediction. Class activation maps and saliency mapping can be used to highlight the pixels in an image that have particular influence on algorithm predictions (Figure 2). More research is needed to improve algorithm interpretability.

Figure 2. (a) Four-slice axial montage generated from a patient with usual interstitial pneumonia, depicting honeycombing in the apical segment of the right lower lobe (upper right image slice), apical segment of the left lower lobe (lower left image slice) and in the right upper lobe (upper and lower left images). (b) Saliency map following application of a Gaussian smoothing filter, generated by a deep learning algorithm (SOFIA, Systematic Objective Fibrotic lung disease Analysis Algorithm 9), highlighting the pixels in (a) that led to a diagnosis of usual interstitial pneumonia (outputted probabilities, UIP: 0.979, probable UIP: 0.011, indeterminate: 0.003, alternative diagnosis: 0.007). The map shows that regions of honeycombing in the lower lobes contributed most to the algorithm’s diagnosis.
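The idea behind a saliency map, scoring each pixel by how strongly it influences the prediction, can be illustrated with a small sketch. Several assumptions apply. `toy_score` is a hypothetical stand-in for a trained classifier and is unrelated to SOFIA. The gradient is estimated by finite differences, whereas deep learning frameworks normally compute it analytically through the network. The Gaussian smoothing step mentioned in the caption is omitted.

```python
import math


def toy_score(image):
    """Hypothetical classifier stand-in: responds only to the bottom row of pixels."""
    bottom_row_sum = sum(image[-1])
    return 1.0 / (1.0 + math.exp(-(bottom_row_sum - 1.0)))


def saliency_map(score_fn, image, eps=1e-4):
    """Absolute sensitivity of the score to a small change in each pixel."""
    base = score_fn(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            perturbed = [list(rw) for rw in image]  # copy so the input is untouched
            perturbed[r][c] += eps
            saliency_row.append(abs(score_fn(perturbed) - base) / eps)
        saliency.append(saliency_row)
    return saliency


# Hypothetical 3x3 "image" with intensities between 0 and 1.
image = [[0.2, 0.9, 0.1],
         [0.4, 0.3, 0.8],
         [0.5, 0.5, 0.5]]

for row in saliency_map(toy_score, image):
    print([round(v, 3) for v in row])
```

Only the bottom row receives non-zero saliency here, because it is the only region the toy classifier looks at. A map of this kind lets a reader check whether an algorithm is attending to plausible regions of the image.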
Another limitation of deep learning is the constraint imposed by current computer memory capabilities; the resources required to train a deep learning algorithm using full volumetric HRCT are not available at many, if not most, research centres. This means that downsampling of HRCT scans for training is necessary, which reduces the amount of data available for training (particularly relationships between pixel data in contiguous slices) and may introduce bias.27
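The trade-off can be made concrete with a minimal sketch. It assumes the simplest possible scheme, keeping every n-th voxel along each axis of a volume stored as nested lists indexed `[slice][row][column]`. The source does not specify how scans are downsampled in practice, so both the scheme and the function names are illustrative assumptions.

```python
def downsample(volume, factor):
    """Keep every `factor`-th voxel along the slice, row and column axes."""
    return [[row[::factor] for row in sl[::factor]] for sl in volume[::factor]]


def voxel_count(volume):
    return sum(len(row) for sl in volume for row in sl)


# Hypothetical 4x4x4 volume whose voxel values encode their own position.
volume = [[[s * 16 + r * 4 + c for c in range(4)] for r in range(4)]
          for s in range(4)]

small = downsample(volume, 2)
print(voxel_count(volume), voxel_count(small))
```

Halving the resolution along each axis leaves one eighth of the voxels, which eases the memory burden. It also discards every second slice, so neighbouring slices in the reduced volume were not adjacent in the original scan.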
A further obstacle to the training and development of DL algorithms in this setting is the need for large datasets of HRCT of fibrotic lung disease. Currently, these datasets do not exist in IPF. In an effort to overcome this limitation, the Open-Source Imaging Consortium (OSIC) was founded in 2018 with the aim of generating a large repository of chest CT images and clinical data from patients with fibrotic lung disease. OSIC also aims to facilitate collaborations between academia and industry for the development of machine learning algorithms focused on digital biomarker research in progressive fibrotic lung disease.
Finally, the results obtained with the above-mentioned computerised tools refer to continuous data in study populations. Demonstrating a correlation between a biomarker and prognosis in a large group of patients does not mean that the biomarker can be applied in clinical practice to a single patient. A possible approach to overcome this limitation is to use thresholds to stage the disease. This approach has been used with visual scoring, providing good results,12 but has not been fully explored using computer-based imaging analysis in IPF.
Future perspectives
IPF has a variable and unpredictable disease course. This unpredictability hampers management decisions and slows drug development. In particular, accurate early diagnosis and outcome prediction in an individual patient would allow the initiation of appropriate therapy at the earliest opportunity. This represents one of the most urgent unmet needs in patients with IPF. For more than three decades, HRCT has been reported as having prognostic utility in IPF. The evolution of HRCT as a prognostic tool in IPF has occurred in stages. The subjectivity and poor reproducibility of visual assessment of HRCT gave way to early studies involving simple computer-based densitometry. This approach led to the development of more sophisticated methods for quantifying individual HRCT patterns across a wide range of fibrotic lung diseases, including IPF. More recently, rapid advances in machine learning and especially deep learning algorithms have created exciting opportunities for novel digital biomarker research in IPF. To harness this technology, improved algorithm interpretability as well as the development of large imaging datasets to drive algorithm training are needed.
REFERENCES