Claudia Mello-Thoms, Carlos A B Mello, Clinical applications of artificial intelligence in radiology, British Journal of Radiology, Volume 96, Issue 1150, 1 October 2023, 20221031, https://doi.org/10.1259/bjr.20221031
Abstract
The rapid growth of medical imaging has placed increasing demands on radiologists. In this scenario, artificial intelligence (AI) has become an attractive partner, one that may complement case interpretation and may aid in various non-interpretive aspects of work in the radiological clinic. In this review, we discuss interpretative and non-interpretative uses of AI in clinical practice, and report on the barriers to AI's adoption in the clinic. We show that AI currently has modest to moderate penetration in clinical practice, with many radiologists still unconvinced of its value and of the return on investment. Moreover, we discuss radiologists' liability for AI decisions, and explain that there is currently no regulation to guide the implementation of explainable AI or of self-learning algorithms.
Introduction
The rapid growth of medical imaging has placed increasing demands on radiologists, because the number of images per case has grown significantly with volumetric imaging. Radiologists may also be expected to diagnose ~20,000 diseases and over 50,000 causal relations,1 a cognitive burden that can lead to diagnostic errors, intra- and inter-reader variability, and burnout. In this scenario, artificial intelligence (AI) has become an attractive partner, one that may complement case interpretation2,3 and may aid in various non-interpretive aspects of work in the radiological clinic. Although AI will not ultimately replace radiologists, many in radiology believe that radiologists who work with AI will replace those who do not.4–7
In 2020, the American College of Radiology (ACR) conducted a survey to understand how its members were using AI in their clinical practices.8 The survey received 1427 responses and found that AI was being used in the clinic by 33.5% of radiologists. Of those not using AI, 80% reported that they saw “no benefit” in the technology, and a third stated that they could not justify the purchase. Among the radiologists using AI, 94.3% reported that AI performance was “inconsistent”, 5.7% that it “always worked” and 2% that it “never worked”.8
More recently, a survey was conducted among radiologist members of the European Society of Radiology.9 This survey had 690 respondents, 40% of whom reported experience with AI in their clinical practice. When asked whether they wanted to acquire AI for their practice, only 13.3% said yes, 52.6% said no, and 34.1% did not answer. Regarding the effect of AI on workload, 4% felt that AI reduced it, whereas 48% felt that it increased it and 46% that it stayed the same.9
While many publications combining “AI” and “radiology” can be found in the literature (a recent search of PubMed yielded 5104 results), significantly fewer papers are retrieved when the words “clinical practice” are added to the search terms (227 results, i.e. only 4.4%). As the literature has shown, AI algorithms that perform well on internal validation data may show a substantial decrease in performance when deployed in the clinical workflow.10,11 This is because clinical standards and guidelines may change over time, disease prevalence or incidence may change, new imaging equipment may be added to the department, and so on; while humans can easily adapt to these changes, AI systems may fail because US Food and Drug Administration (FDA)-approved models cannot be significantly adjusted without losing certification.12 In other words, AI models are brittle.13 Reports show that they can fail unpredictably when applied to conditions different from the environment in which they were trained.14 Moreover, without adaptive learning (i.e. the ability to learn as they go), AI model performance declines over time because of changes to imaging equipment or protocols, software updates, or changes in patient demographics.15 Finally, FDA clearance does not guarantee that AI algorithms are generalizable, and the majority of external validations of AI algorithms show reduced performance when applied to external data sets.13,16,17 For this reason, it is highly recommended that AI products be locally validated with data from the environment in which they are intended to work.13,16,17
Furthermore, few randomized clinical trials have been carried out to compare the performance of AI algorithms with that of radiologists: a recent search of ClinicalTrials.gov yielded 128 clinical trials related to “radiology” and “AI”, most still in the recruiting stage. On the other hand, most of the non-randomized trials supporting the results in the literature are at very high risk of bias.18 A meta-analysis covering 10 years of studies18 showed that, of the six studies (out of 37 publications) that evaluated AI algorithms in a representative clinical environment, 80% demonstrated no change in radiologists' performance with AI, whereas in the remaining 20% performance improved.18 In addition, in the 19 studies where a comparison was possible, use of AI was more often associated with improved performance in junior clinicians. Radiologists also changed the AI algorithms' standalone performance in 93% of cases: compared to standalone AI performance, human intervention improved performance in 60% of cases and decreased it in 33%.18
These results show modest to moderate penetration of AI in clinical practice. To date, the FDA has approved 201 AI algorithms (from 2008 to 2022).19,20 However, estimates are that the market for AI in medical imaging will grow 10-fold over the next 10 years,8 from US$ 21.48 billion in 2018 to US$ 264.85 billion in 2028,21 so understanding how radiologists are currently using AI is of paramount importance in order to fully integrate this technology into future clinical practice. As this is a fast-growing market, an online overview of FDA-approved and Conformité Européenne (CE)-marked AI algorithms, based on vendor information, has been created (www.aiforradiology.com).22 The site is searchable by imaging modality (CT, MR, etc.), by specialty (MSK, neuro, etc.) and by whether the algorithm is FDA-approved or CE-marked. Its authors report that most AI algorithms address neuroradiology and chest radiology, followed by breast and musculoskeletal radiology.22 In addition, they report that only 36 out of 100 CE-marked products had peer-reviewed evidence of the AI algorithm's efficacy. This evidence addressed technical and clinical feasibility in 27% of cases and standalone performance in 65%, while only a few studies (27%) addressed the clinical impact of the AI algorithm.22 In an examination of AI algorithms approved by the FDA,23 it was reported that only 16% had a multireader, multicenter validation study. Except for one algorithm, none described whether patients from geographically or racially diverse populations had been included in the training or validation data sets.24 In fact, most FDA-approved AI algorithms were evaluated in only a small number of centers (one or two) without a diverse population, which significantly impacted their generalizability.25,26
In this paper, we discuss some clinical applications of AI in radiology. We cover both interpretative and non-interpretative uses of AI, as well as barriers to adoption of AI.
Interpretative uses of AI
For all the examples of AI algorithms given in this section, we consulted www.aiforradiology.com and selected only examples that provide peer-reviewed evidence of performance and that are both FDA-approved and CE-marked. The examples presented are by no means an exhaustive list of AI products available on the market, as they are the result of convenience sampling.
Breast imaging
Breast cancer is still the most common cancer among females worldwide. In 2020, GLOBOCAN estimated that 2,261,419 new breast cancer cases were diagnosed around the world, corresponding to 11.7% of all cancers diagnosed.27 Screening for breast cancer reduces cancer-related mortality and allows for less aggressive treatment.24 The most commonly used imaging modality for breast cancer screening is digital mammography, which has limited sensitivity (particularly in dense breasts) and a relatively high number of false positives. Because of this, there has historically been much interest in developing computer-assisted tools to aid radiologists in the task of detecting early cancerous lesions.
Fast-forward to 2022, and breast imaging has been a major area of interest for AI developers, accounting for approximately 14.8% of all AI applications on the market.28 According to the American College of Radiology Data Science Institute's (ACRDSI) AI Central,19 to date there are 30 FDA-approved AI algorithms for breast imaging, divided as follows: (i) nine for breast lesion characterization in mammography; (ii) 10 for breast density estimation in mammography; (iii) one for image quality improvement in mammography; (iv) three for breast lesion characterization in ultrasound; and (v) two for breast lesion characterization in MRI.19 For example, in mammography, Kim et al29 showed that their AI (Lunit's INSIGHT MMG) improved the performance of six breast radiologists and six general radiologists reading 320 mammograms (AUC without AI = 0.810, AUC with AI = 0.881, p < 0.0001). Pacilè et al,30 also analyzing mammograms, showed that AI (Therapixel's MammoScreen) helped 14 breast radiologists improve performance when reading 240 digital mammograms (AUC without AI = 0.769, AUC with AI = 0.797, p = 0.035). Finally, using digital breast tomosynthesis (DBT), Romero-Martín et al31 showed that, with ScreenPoint Medical's Transpara, the AI had a sensitivity of 77% when performing as a single reader and 81.4% when performing as a double reader, but the AI's recall rate was 12.3% higher than that of the radiologists.
Thoracic radiology
Lung cancer has the highest mortality among all cancer types. In 2020, GLOBOCAN estimated 2,206,771 new lung cancer cases (11.4% of all cancers), with high mortality: 1,796,144 deaths (18% of all cancer-related deaths).27 Because of this, many computer-assisted systems have been developed to detect lung nodules, initially on chest radiographs, which are not the optimal imaging modality for this task, and more recently on chest CT.
Chest imaging is one of the areas most frequently targeted by AI algorithms, accounting for approximately 26.6% of the market, with a total of 54 FDA-approved algorithms.19 Thirty-three algorithms focus on chest CT and, despite the plethora of possible findings, most address detection of lung nodules.28 Seventeen algorithms focus on chest radiographs; of these, eight aim to detect pneumothorax and three focus on detection of pleural effusion, while the remainder address detection of lines and tubes (three), pneumoperitoneum (one), lung nodules (one) and image quality improvement (one). Finally, five focus on MRI for lesion measurement, surgical planning, cardiac measurements, etc. For example, Ahn et al31 had six observers (two thoracic radiologists, two thoracic imaging fellows and two residents) read a case set of 497 frontal chest radiographs looking for four targets (pneumonia, nodule, pneumothorax and pleural effusion), with and without AI; the AI used was Lunit's INSIGHT CXR.32 For all findings, observers had higher sensitivity with AI than without it, although not all differences reached statistical significance. Using a preliminary version of Aidoc's pulmonary embolism (PE) AI algorithm, Cheikh et al33 evaluated 1202 patients, of whom 190 had PE. The AI algorithm alone detected 219 suspicious PEs, of which 176 were true PEs, including 19 true PEs that had been missed by the radiologists. The AI had higher sensitivity than the radiologists (92.6% vs 90%) and a higher negative-predictive value (98.6% vs 98.1%). However, the radiologists had higher specificity than the AI algorithm (99.1% vs 95.8%) and a higher positive-predictive value (95% vs 80.4%).
Neuroradiology
In the USA, ischemic strokes (IS) account for 87% of all strokes,34 whereas subarachnoid hemorrhages (SAH) have a mortality rate of 23–51%.35,36 However, misdiagnosis rates range from 5 to 51% for SAH and from 30 to 42% for IS.37 As such, this is an area where the use of AI could be very important. According to the ACRDSI's AI Central,19 there are currently 74 FDA-approved AI algorithms in neuroradiology (corresponding to 36.5% of the market). Twenty-six of them are in CT for detection of stroke and hemorrhage, and 21 are in CT for detection of aneurysms, brain anatomy, calcium scoring, surgical planning, etc. Thirty-one are in MR, 19 of which are for assessment of brain anatomy, with the remainder for image quality improvement, high-grade brain glioma detection, surgical planning, etc. Finally, two are in PET for image denoising. For example, Fasen et al,38 using Nicolab's StrokeViewer, compared the performance of the AI with the readings of 4 neuroradiologists, 15 general radiologists and 10 senior residents, who assessed in the clinic the CT angiography of 474 patients, 75 of whom had an arterial occlusion in the proximal anterior circulation (readers read different cases). The sensitivity of StrokeViewer was not significantly lower than that of the radiologists (77.3% vs 78.7%, p = 1.00), but the specificity of the AI algorithm was significantly lower than that of the radiologists (88.5% vs 100%, p < 0.001). StrokeViewer correctly identified 40 of 42 large vessel occlusions and 18 of 33 medium vessel occlusions.38 In a different application, Bash et al39 evaluated the impact on image quality of 60% accelerated volumetric MR imaging sequences processed with Cortechs.ai's NeuroQuant AI algorithm. Forty patients underwent brain MRI on six scanners at five different institutions. Standard-of-care and accelerated data sets were acquired for each subject, and the accelerated scans were enhanced with deep learning processing. For the image quality assessment, two neuroradiologists compared the 40 paired side-by-side multiplanar 3D T1 weighted series data sets, standard of care vs FAST-DL. The authors reported 100% agreement in clinical disease classification between the standard-of-care and FAST-DL data sets (n = 29 healthy/mild cognitive impairment and n = 11 dementia). FAST-DL was superior to standard of care in perceived SNR, perceived spatial resolution, artifact reduction, anatomic/lesion conspicuity, and image contrast.
Musculoskeletal radiology
Despite the many AI algorithms developed in labs across the world for musculoskeletal radiology (examples include detection of fractures of the proximal humerus, hand, wrist, and ankle on radiographs,40,41 detection of hip osteoarthritis on radiographs,42 and quantitative bone imaging for the assessment of bone strength and quality43–45), not as many have been translated into FDA-approved products. According to the ACRDSI's AI Central,14 only 22 algorithms have received approval thus far. Ten use radiographs, for diagnosis of knee osteoarthritis (three), detection of fractures (four), surgical planning (one), etc. Five use MR, for anatomic descriptors (one), surgical planning (one), image quality improvement (one), and so forth. Finally, nine use CT, mostly for image quality improvement (three), spine display (two) and surgical planning (two). For example, Regnard et al46 assessed the performance of Gleamer's BoneView AI for the detection of limb and pelvic fractures, dislocations, focal bone lesions, and elbow effusions on trauma radiographs, and compared the AI's performance to that of 40 board-certified radiologists who had originally read the 4774 exams. The 40 radiologists read different images and had different levels of expertise and different specialties. BoneView showed higher sensitivity than the radiologists for fracture detection (98.1% vs 73.7%), dislocation detection (89.9% vs 63.3%), elbow effusion detection (91.5% vs 84.7%), and focal bone lesion detection (98.1% vs 16.1%). The radiologists' specificity was 100%, whereas BoneView's was 88%, 99.1%, 99.8% and 95.6% for fractures, dislocations, elbow effusions, and focal bone lesions (FBLs), respectively. The radiologists' performance was better than the AI's for more severe disease such as complete dislocations and aggressive FBLs, whereas benign FBLs and diastases were rarely reported by them, likely because of their low clinical relevance.46 In a separate study examining whether deep learning can improve fracture detection on radiographs in the emergency room, Reichert et al47 used Azmed's Rayvolve to detect traumatic fractures on the radiographs of 125 patients. These patients were selected by two emergency physicians, and the radiographs were also analyzed by a radiologist, whose annotations served as the reference against which Rayvolve was compared. For the 125 patients, the radiologist identified 25 fractures (20%), of which the AI identified 24, for a sensitivity of 96%. Rayvolve incorrectly predicted a fracture in 14 of the 100 patients without a fracture, for a specificity of 86%.
Non-interpretative uses of AI
AI can be used to solve a wide range of tasks that are non-interpretive in nature but that can significantly help radiologists and their patients.48 Several reviews have been written on this topic.48–51 Here, we cite the most common applications.
Improving workflow
One of the most useful applications of AI and machine learning (ML) is in creating study protocols and developing hanging protocols tailored to each radiologist. Typically, radiologists create study protocols based on selected clinical parameters.50 This entails reviewing clinical and ordering information stored in the patient's electronic medical record, referencing relevant lab values, and reviewing prior images and radiology reports. It is a time-consuming task (on average, it takes each division of a radiology department 1–2 h per day to protocol studies52), but research has shown that ML algorithms are capable of determining clinical protocols for studies in both brain and body MRI,50 as well as musculoskeletal MRI.53,54
Hanging protocols are also critical for optimal reading of the images, and radiologists have personal preferences for how images should be displayed. Most Picture Archival and Communication Systems (PACS) offer some form of automated hanging protocol, but this is generally one-size-fits-all: all radiologists see the images displayed in the same way.48 This is slowly changing, in particular with software like GE's “Smart Reading Protocols”,55 which learns the user's preferences and displays the images accordingly. This field is ripe for the use of AI, because different vendors use different sequence names and there are inconsistencies in the DICOM metadata, which preclude the use of a simple rules-based algorithm to optimize a hanging protocol, as illustrated below.
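As a rough illustration of why free-text series metadata defeat simple rules, the sketch below trains a small text classifier to map vendor-specific DICOM SeriesDescription strings to canonical sequence labels that a hanging protocol could then act on. This is a minimal sketch under stated assumptions: the series names, labels and classifier choice are hypothetical and do not describe any commercial product.

# Minimal sketch: mapping inconsistent DICOM SeriesDescription strings to
# canonical sequence labels with a simple text classifier, so that a hanging
# protocol could place each series in its preferred viewport.
# The example series names and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical vendor-specific series descriptions and their canonical labels
train_descriptions = [
    "AX T1 SE", "t1_tse_tra", "Ax T2 FSE", "t2_tse_tra",
    "AX FLAIR", "t2_tirm_tra_dark-fluid", "DWI b1000", "ep2d_diff_3scan_trace",
]
train_labels = [
    "T1_AXIAL", "T1_AXIAL", "T2_AXIAL", "T2_AXIAL",
    "FLAIR_AXIAL", "FLAIR_AXIAL", "DWI", "DWI",
]

# Character n-grams are robust to vendor-specific abbreviations and separators
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_descriptions, train_labels)

# A new, previously unseen description is mapped to a canonical label,
# which the viewer could then use to pick the correct viewport.
print(model.predict(["T2W_TSE ax"]))  # expected to map to 'T2_AXIAL'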
Improving image quality, reducing radiation dose in CT, decreasing scan time in MRI
Image quality can be improved by noise and/or artifact reduction, and contrast enhancement, which improves visualization of different tissues in the body. Deep learning (DL) has been used to accomplish these goals.56–59 AI can help improve image quality in two ways: (1) acting on the processed image; or (2) acting directly on the raw image.48
Another potential application of AI is in the automatic evaluation of image quality. Currently, this is done by technologists, but there is a need for sequence repetition in as many as 55% of abdominal MRIs, which impacts healthcare costs, patient emotional wellbeing, and scanner time.60 A DL network can be trained to recognize non-diagnostic image quality with an average negative-predictive value (relative to two radiologists) of 90%.61
In addition, there is a need to reduce radiation dose in CT, but dose reduction usually results in lower quality images with more image noise. AI has a role here as well: it can be trained with paired “noisy” (low-dose) and “high-quality” (routine-dose) images, learning what pathology and normal structures look like at low doses compared to regular doses.50 It can then reconstruct low-dose images as if they were regular-dose images.59,62
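The sketch below illustrates this idea of learning from paired low-dose/routine-dose images. It is a minimal sketch, assuming a small residual convolutional network trained with an L2 loss; random tensors stand in for real paired acquisitions, and the architecture is not that of any specific cited product.

# Minimal sketch (PyTorch): training a small CNN to map simulated low-dose
# (noisy) CT slices to their routine-dose counterparts.
# Random tensors stand in for real paired acquisitions.
import torch
import torch.nn as nn

class DenoisingCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        # Predict the noise residual and subtract it (residual learning)
        return x - self.net(x)

model = DenoisingCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-ins for paired data: "routine-dose" targets and their noisy versions
routine_dose = torch.rand(8, 1, 64, 64)
low_dose = routine_dose + 0.1 * torch.randn_like(routine_dose)

for step in range(100):
    optimizer.zero_grad()
    denoised = model(low_dose)
    loss = loss_fn(denoised, routine_dose)  # learn to recover routine-dose appearance
    loss.backward()
    optimizer.step()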
In positron emission tomography (PET) imaging, noise reduction algorithms have been used to reconstruct images generated with a low-dose radiotracer so that they resemble images generated with a full-dose radiotracer.52 AI has been able to reduce the tracer dose to one two-hundredth of the standard dose and to reduce scan time by up to 75% while achieving image quality comparable to that of the full tracer dose.63,64
MRI is well known for its long acquisition times, which often force a compromise between image quality and scanning time. DL has been used to reconstruct MR images from undersampled k-space data by training a convolutional neural network (CNN) to map between zero-filled (undersampled) and fully sampled images.65 In addition, ML has been used to achieve a 10-fold reduction in gadolinium-based contrast administration with no reduction in image quality or contrast information.66
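To make the zero-filled/fully sampled pairing concrete, the sketch below simulates retrospective undersampling of k-space for a synthetic phantom. It is a minimal sketch under stated assumptions: the phantom, the 4x undersampling pattern and the fully sampled central band are illustrative only, and the CNN training step (mapping the zero-filled image back to the fully sampled one) is omitted.

# Minimal sketch (NumPy): simulating the zero-filled input that a CNN-based
# MR reconstruction network would be trained to map back to the fully
# sampled image.
import numpy as np

# A simple synthetic "fully sampled" image (stand-in for an MR slice)
image = np.zeros((128, 128))
image[32:96, 48:80] = 1.0

# Fully sampled k-space via 2D FFT
kspace = np.fft.fftshift(np.fft.fft2(image))

# Retrospective 4x undersampling: keep every 4th phase-encode line
# plus a fully sampled low-frequency band in the centre of k-space
mask = np.zeros_like(kspace, dtype=bool)
mask[::4, :] = True
mask[60:68, :] = True
undersampled = np.where(mask, kspace, 0)

# "Zero-filled" reconstruction: the aliased image that serves as the CNN input;
# the fully sampled `image` serves as the training target
zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))
print(zero_filled.shape, float(np.abs(zero_filled - image).mean()))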
Scheduling scanners and patients
In the United States, between 2012 and 2016, demand for advanced imaging (CT and MRI) increased by 1 to 5% per year.67 But MRI examinations are time-consuming to acquire, and significant time variability can exist among similarly protocoled MRI examinations.68 AI applications are being developed to optimize scanner time and to reduce patient waiting times. Data from the Radiology Information System (RIS) can be used to predict examination delays, most successfully for ultrasound, followed in order by radiography, CT and MRI.69
In addition, in North America, about 23.5% of scheduled hospital appointments are missed by patients, and these rates have remained relatively unchanged over the past 10 years.70 Simple models do not improve these rates, because too many variables could explain a patient's behavior; AI, however, produces complex models that can handle hundreds of features at once. In a study that used AI to predict whether a given patient would show up for his or her MRI appointment, an AUC of 0.852 was achieved.70
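The sketch below shows, in outline, how such a no-show model might be built and evaluated with ROC AUC. It is a minimal sketch: the feature names (lead time, age, prior no-shows, distance, early-morning slot), the synthetic data and the choice of a gradient-boosted classifier are all assumptions for illustration, not the design of the cited study.

# Minimal sketch: a gradient-boosted classifier predicting whether a patient
# will miss a scheduled MRI appointment, evaluated with ROC AUC.
# The feature names and synthetic data below are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: days since booking, patient age, prior no-show count,
# distance to the facility (km), and an early-morning slot indicator
X = np.column_stack([
    rng.integers(0, 90, n),
    rng.integers(18, 90, n),
    rng.poisson(0.3, n),
    rng.exponential(10, n),
    rng.integers(0, 2, n),
])
# Synthetic outcome loosely driven by lead time and prior no-shows
p = 1 / (1 + np.exp(-(0.02 * X[:, 0] + 0.8 * X[:, 2] - 2.5)))
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))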
Finally, AI can be used to triage patients: the AI algorithm identifies exams that are more likely to contain critical findings and puts them at the top of the radiologist's reading list.71 For example, an ML model was prospectively implemented to prioritize detection of intracranial hemorrhage on brain CT,72 and it flagged 94 of 347 routine cases as emergencies. The results were 60 true positives and 34 false positives, with the detection of five new hemorrhages and a reduction in reporting time for those cases from 8.5 h to 19 min.72 Nonetheless, questions have arisen about how the AI should prioritize studies coming from different imaging modalities and different parts of the body.73 For example, should a head CT with intracranial hemorrhage be prioritized above or below a chest radiograph with tension pneumothorax and pneumonia? This is an issue that currently does not have a solution.
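The worklist re-ordering step itself is simple, as the sketch below shows; the study identifiers and model scores are hypothetical. Note that sorting by a single per-study score is exactly why the cross-modality question above remains open: a head CT score and a chest radiograph score produced by different models are not directly comparable.

# Minimal sketch: reordering a reading worklist so that studies flagged by an
# AI triage model as likely to contain a critical finding rise to the top.
# Study identifiers and model scores below are hypothetical.
from dataclasses import dataclass

@dataclass
class Study:
    study_id: str
    arrival_order: int          # position in the original first-in-first-out list
    ai_critical_prob: float     # triage model's estimated probability of a critical finding

worklist = [
    Study("CT-head-001", 0, 0.12),
    Study("CT-head-002", 1, 0.91),
    Study("CXR-003",     2, 0.05),
    Study("CT-head-004", 3, 0.67),
]

# Highest estimated probability first; ties broken by arrival order
prioritized = sorted(worklist, key=lambda s: (-s.ai_critical_prob, s.arrival_order))
for s in prioritized:
    print(s.study_id, s.ai_critical_prob)
# Expected order: CT-head-002, CT-head-004, CT-head-001, CXR-003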
Barriers to adoption of AI
Fear of AI
Unfortunately, the publicity surrounding AI is such that it is actively discouraging medical students from pursuing radiology as a career,74–77 which is projected to lead to a significant shortage of radiologists in years to come. Studies that investigated the link between fear of AI and medical students' knowledge of AI found that those who estimated themselves to have little or no knowledge feared AI the most.74,78 For their part, a significant proportion of radiologists say that they will not use AI because they do not understand how it works.79
According to multiple surveys carried out in the UK, Canada and Germany74,75,80 and recently in the USA,77,81 medical students fear that AI will replace diagnostic radiologists to such an extent that they are choosing not to pursue careers in radiology. For example, 20% more Canadian medical students would choose radiology were it not for AI.75 In the UK, fewer than 50% of medical students surveyed would consider radiology as their specialty of choice because of the perceived success of AI,74 whereas in the USA, 44% of those surveyed said that the perceived influence of AI on radiology would make them less likely to specialize in this domain.77 These fears have led many to call for training in AI in medical schools,76,80,82 but with a curriculum already packed with other didactic material, such calls will likely remain unheeded. These fears are stoked not by the actual performance of AI, which has rarely been compared with radiologists' performance in prospective clinical trials,18,83 but by exaggerated press releases claiming that AI will replace radiologists84 and that AI performs better than radiologists at selected tasks. Nonetheless, given the high number of diseases and causal relations that radiologists are expected to diagnose, they could benefit from befriending this technology.2 In other words, they should position themselves in the driver's seat of this new technology. This is in agreement with a recent survey85 of radiology residents in the USA, in which 83% agreed or strongly agreed that AI/ML education should be part of the radiology residency curriculum. Interestingly, the aspect of AI that residents were most interested in was acquiring the knowledge to troubleshoot an AI tool in clinical practice, that is, to determine whether the AI algorithm is working as it should (82%).85
Large data sets and algorithm training
Currently, most applications of ML in radiology require labeled data for training, i.e. they require annotated inputs and outputs to learn from. This process is called supervised learning, and it is equivalent to learning with a teacher. Unfortunately, data labeling is a time-consuming and labor-intensive process, and ML algorithms are very data-hungry, which means that large, annotated data sets must be available to train, validate and test the algorithms. But there are other types of learning: in unsupervised learning, the algorithm itself learns the patterns in the data, without the need for data annotations, whereas in semi-supervised learning the algorithm is trained with partly labeled and partly unlabeled data.86 Several techniques are currently being employed to reduce the burden of supervised learning.86 Among them, we can cite transfer learning, where knowledge from one domain is used to learn in a different domain (see the sketch below); active learning, where the algorithm queries a human user on a selected subset of the data; and weakly supervised learning, where the labels are treated as imprecise or noisy.
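The sketch below illustrates the most common form of transfer learning: reusing a backbone pretrained on natural images and retraining only its final layer on a small labeled medical data set. It is a minimal sketch under stated assumptions: the hypothetical binary chest-radiograph task, the random tensors standing in for data, and the torchvision >= 0.13 weights API are illustrative choices, not a description of any specific product or study.

# Minimal sketch (PyTorch/torchvision): transfer learning, i.e. reusing an
# ImageNet-pretrained backbone and retraining only the final layer for a
# hypothetical binary chest-radiograph classification task.
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on a different domain (natural images)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the new, label-scarce task
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a small labeled medical data set (e.g. normal vs abnormal)
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

model.train()
for step in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()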
Other issues during algorithm training pertain to over- and underfitting and to data set bias: overfitting occurs when the algorithm fits variations in the training data so closely that it identifies random noise as an abnormality, whereas a training data set that is biased (in terms of patients' ethnic/racial background, age or gender) leads the algorithm to misidentify data from a different population.51
Currently, efforts are in place to create large data repositories for training AI algorithms. In particular, we can cite The Cancer Imaging Archive (http://www.cancerimagingarchive.net/) at the National Institutes of Health and the Radiological Society of North America's Medical Imaging and Data Resource Center (https://www.nibib.nih.gov/medical-imaging-and-data-resource-center). These were created to promote collaborative AI research through the sharing of high-quality clinical data.87 In Europe, the CHAIMELEON project has started, a collaboration of 18 European centers to collect high-quality medical imaging and patient information data for the development of AI algorithms.88 Data collection is expected to take 4 years and to accrue thousands of cases of lung, breast, colorectal and prostate cancer.88
Dealing with AI’s “black-box” nature
Because the models developed by ML are complex and high-dimensional, it is difficult to explain them in simple terms.49 This makes it difficult to interpret the reasoning process of ML models, an issue that is particularly critical when the algorithm disagrees with the radiologist. However, as the radiologist is the one who has to explain the findings to the prescribing clinician, and who bears any legal consequences of ignoring or accepting AI's marks, understanding the algorithm's logic is critical. As a result, the American Medical Association has issued policy asking AI developers to produce algorithms that are explainable and transparent,89,90 as opposed to traditional “black-box” AI. This has been mirrored by the European Commission High-Level Expert Group (ECHLEG) on AI, which published its Ethics Guidelines for Trustworthy AI;91 explainability and transparency are among these guidelines.48
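One common research approach to making an image classifier's reasoning inspectable is a saliency map, which highlights the input pixels that most influenced a prediction. The sketch below is a minimal example of a plain input-gradient saliency map; the randomly initialized ResNet and random input are stand-ins, and this is not a method required by, or specific to, the policies cited above.

# Minimal sketch (PyTorch): a simple input-gradient saliency map, one basic
# way of probing which pixels drive a "black-box" classifier's prediction.
# The model and input below are stand-ins, not a specific commercial product.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)

score = model(image)[0].max()      # score of the top predicted class
score.backward()                   # gradients of that score w.r.t. input pixels

# Pixels with large gradient magnitude most influenced the prediction
saliency = image.grad.abs().max(dim=1)[0]   # shape: (1, 224, 224)
print(saliency.shape)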
Moreover, the “black-box” nature of many AI applications can cause the algorithms to perpetuate human biases.92 If an algorithm is trained on data that inherit biases from its data engineers, or that do not properly represent underrepresented populations, existing disparities may be reinforced. This is a serious problem with AI that has been encountered in fields outside of Medicine as well; radiologists should be aware of what type of training data a given AI algorithm has learned from, and should avoid as much as possible algorithms whose training data do not include underrepresented populations.
Regulation
In the United States, the federal Food, Drug & Cosmetic Act defines a device as “an instrument […] intended for use in the diagnosis […] or in the cure, mitigation, treatment, or prevention of disease […] which does not achieve its primary intended purposes through chemical action”.93 There are three established pathways by which devices are evaluated before they are allowed on the market. However, specifically for software as a medical device (SaMD), the FDA has outlined a novel pathway for clearance.94
In line with this, the 21st Century Cures Act amends the Food, Drug & Cosmetic Act to include a new section in which several software types are excluded from the statutory definition of device and hence from FDA regulation.93 Among them, we can cite software products that: (1) are intended for administrative support; (2) are employed in the maintenance of a healthy lifestyle; (3) serve as electronic patient records; (4) store or display clinical data; and (5) other software, “unless the function is intended to acquire, process, or analyze a medical image […] for the purpose of (i) displaying, analyzing or printing medical information […] (ii) supporting or providing recommendations to a healthcare professional about prevention, diagnosis or treatment and […] (iii) enabling such healthcare professional to independently review the basis for such recommendations to make a clinical diagnosis or treatment decision regarding an individual patient”.93 Note that “independently review” is the key to this provision: to avoid the designation as a device, and the approval process that it entails, healthcare professionals must “be able to reach the same recommendation […] without relying primarily on the software function”.95 Moreover, the FDA holds AI developers responsible for explaining the purpose and intended use of the software function, the intended user, the inputs used to generate the recommendation, and the rationale or support for the recommendation. At this time, it is not clear how the FDA will interpret the latter provision, given the “black-box” nature of most AI algorithms.
In Europe, unlike in the USA, medical devices are not approved by a centralized agency.96 For the lowest risk class of devices (Class I), manufacturers do not undergo a review process and are themselves responsible for ensuring that their products comply with the regulations. Medical devices in higher risk classes (Classes IIa, IIb and III) are handled by Notified Bodies, organizations that have been accredited to perform a conformity assessment and issue a Conformité Européenne (CE) mark.96 All European countries recognize the CE mark, even if it was issued in a different European country.
Medicolegal issues
One of the main advantages of AI is that it may incorporate into its decision-making process new features that radiologists currently do not use (e.g. because they cannot visually detect or quantify them).97 The medicolegal issue that arises is who is responsible for the diagnosis, particularly if the diagnosis is incorrect.51 In other words, if radiologists are not the primary interpreters of the image, will they still carry the burden of an incorrect decision, or will the manufacturers of the AI system, and its data scientists and engineers, bear it? Some would argue that the manufacturers of the AI system cannot bear this burden, as the algorithms may eventually keep learning with continuous use in clinical practice.49 Thus, radiologists must be the ones to take ownership of the medical diagnoses and treatment choices delivered to the patient.98 This is a difficult ask, because even if radiologists monitor the output of AI systems and decide whether to accept or reject the AI system's prompts, how can they be held ultimately responsible for its decisions, particularly in the case of black-box AI systems that cannot explain their reasoning?
Mezrich99 argues that the legal handling of AI will greatly depend on the degree of autonomy given to the algorithm. In other words, if the primary use of AI is to highlight findings for the radiologist, and the radiologist then makes the final decisions on the case, then the law is simple: the radiologist bears the burden when a mistake occurs. If, however, the AI is to be used as a primary reader, there simply are no established laws regarding AI's autonomy, and this uncertainty creates an “inherent bias towards limiting the role of AI to that of a tool and holding the human user – the radiologist – primarily responsible” when a mistake is made.99 But even in this capacity AI may create issues for the radiologist, particularly when the AI prompt and the radiologist disagree. If it turns out that the AI prompt was correct, the radiologist may have to justify his or her decision in front of a jury, and the AI finding would become the “built-in expert witness” against the radiologist.100
Interestingly, in high-volume settings or during off hours, if the AI system is used as the primary reader, it would be considered to be acting as the radiologist's agent or subordinate. Hence, if the AI makes a mistake, this brings into play the legal doctrine of vicarious liability, whereby the mistakes of the assistant are assigned to the supervisor even if the supervisor was not physically present when the decision was made.99
Ethical issues
In 2019, a joint statement from the American College of Radiology, the European Society of Radiology, the Radiological Society of North America, the Society for Imaging Informatics in Medicine, the European Society of Medical Imaging Informatics, the Canadian Association of Radiologists and the American Association of Physicists in Medicine was published,101 highlighting the ethical considerations for AI in radiology. In it, the authors noted that there is still little experience with the use of AI for patient care across the full range of clinical settings, so much research remains to be done to understand AI's use in clinical practice and what operational characteristics AI should have.101 Furthermore, the authors pointed out that radiologists have a duty to understand the risks of AI, to alert patients and stakeholders to these risks, and to monitor AI products to guard against harm. Moreover, radiologists should ensure that the benefits of AI are applied equally to all populations, and should make sure that the negative consequences of the use of AI are not made worse by unethical behavior.101
In addition, issues with data ownership and privacy should be considered before implementation of an AI algorithm. In the USA, all medical devices must be compliant with the Health Insurance Portability and Accountability Act guidelines for protection of patient identity and other confidential information. In Europe, stricter regulations are in place through the General Data Protection Regulation (GDPR).102 A complete analysis of the laws related to the GDPR and the use of personal data for AI algorithms applied to healthcare is provided by Forcier et al.103 The paper discusses improvements to public health due to big data, the paradigm of a learning health system, and the implications for data protection, with focus on the USA, Canada and Europe.103
However, risks to patient privacy can depend on how the AI is implemented.28 Currently, two types of implementation are possible: locally, on the computers/servers of the hospital or clinic, or in the cloud. A cloud-based implementation has several advantages: it decreases implementation costs, facilitates software updates, and allows companies to scale more easily.28 Its drawback is that patient data must leave the hospital, which may carry a higher risk of a data breach.
Conclusions
In this review, we discussed many aspects of AI's application in the radiological clinic. From interpretative to non-interpretative uses of AI, we determined that AI currently has modest to moderate penetration in clinical practice, with many radiologists still unconvinced of its value and of the return on investment. We also explored the many barriers to AI's adoption, from its “black-box” nature to regulatory and ethical issues. We have attempted to present a picture of AI in the clinical practice of radiology that is accurate, but one that reflects only the elements currently in place rather than future applications (such as self-learning AI algorithms and explainable AI), because not only are these not implemented in the clinic today, but there are also no regulations pertaining to them. In conclusion, we can say that AI is in radiology to stay, and we advocate for more educational opportunities for radiologists to learn about this technology so that they can recognize algorithms' biases and mistakes, because ultimately, under today's legislation, in many scenarios radiologists will be held responsible for the decisions made by the AI system.
The authors declare no competing interests.
This research was partially supported by a grant (1R01 CA259048) from the National Institutes of Health/National Cancer Institute
This paper did not involve directly any patients.
This paper involved a retrospective review of the published literature, as such it did not require ethical approval.
The authors have no disclosures to make.
REFERENCES