Abstract

Background

Use of artificial intelligence (AI), or machine learning, to assess dermoscopic images of skin lesions to detect melanoma has, in several retrospective studies, shown levels of diagnostic accuracy on par with – or even exceeding – those of experienced dermatologists. However, the enthusiasm around these algorithms has not yet been matched by prospective clinical trials performed in authentic clinical settings. In several European countries, including Sweden, the initial clinical assessment of suspected skin cancer is principally conducted in the primary healthcare setting by primary care physicians, with or without access to teledermoscopic support from dermatology clinics.

Objectives

To determine the diagnostic performance of an AI-based clinical decision support tool for cutaneous melanoma detection, operated by a smartphone application (app), when used prospectively by primary care physicians to assess skin lesions of concern due to some degree of melanoma suspicion.

Methods

This prospective multicentre clinical trial was conducted at 36 primary care centres in Sweden. Physicians used the smartphone app to photograph skin lesions of concern dermoscopically, and the app returned a dichotomous decision support text regarding evidence for melanoma. Regardless of the app outcome, all lesions underwent standard diagnostic procedures (surgical excision or referral to a dermatologist). After investigations were complete, lesion diagnoses were collected from the patients’ medical records and compared with the app’s outcome and other lesion data.

Results

In total, 253 lesions of concern in 228 patients were included, of which 21 proved to be melanomas, with 11 thin invasive melanomas and 10 melanomas in situ. The app’s accuracy in identifying melanomas was reflected in an area under the receiver operating characteristic curve (AUROC) of 0.960 [95% confidence interval (CI) 0.928–0.980], corresponding to a maximum sensitivity and specificity of 95.2% and 84.5%, respectively. For invasive melanomas alone, the AUROC was 0.988 (95% CI 0.965–0.997), corresponding to a maximum sensitivity and specificity of 100% and 92.6%, respectively.

Conclusions

The clinical decision support tool evaluated in this investigation showed high diagnostic accuracy when used prospectively in primary care patients, which could add significant clinical value for primary care physicians assessing skin lesions for melanoma.

Linked Article: Jones et al. Br J Dermatol 2024; 191:13.

Author Video: https://youtu.be/jtzN34U9cG0

Plain language summary

Skin cancer is one of the most common forms of cancer worldwide. Melanoma is a serious type of skin cancer that can be difficult to differentiate from other skin lesions, even for experienced physicians.

In Sweden, the initial assessment of suspected melanoma is performed by primary care physicians. We aimed to investigate the capability of a diagnostic decision support tool to detect melanoma when used by primary care physicians to assess skin lesions of concern. The device is based on artificial intelligence (AI) assessment of dermoscopic images via a smartphone app.

Thirty-six primary care centres in Sweden took part. If a patient had a skin lesion that was suspected to be a melanoma, they were asked to participate. The physician used the app by photographing the lesion with a smartphone connected to a dermoscope (a magnifying lens). The app provides guidance on whether to suspect melanoma or not. However, independently of this guidance, all lesions underwent regular diagnostic investigation, and the results were compared.

Of 253 skin lesions of concern, 11 malignant melanomas and 10 in situ (precancerous) melanoma lesions were found. The app correctly identified all these as melanomas, except for one of the in situ melanomas. The negative predictive value of the app was 99.5%, which meant that if the tool suggested a lesion was not a melanoma, there was a 99.5% probability it was correct. Overall, our study findings suggest that decision support might help primary care physicians to avoid unnecessary surgical excision of benign skin lesions.

What is already known about this topic?
  • Use of artificial intelligence (AI) for melanoma detection has shown high diagnostic accuracy levels.

  • Prospective studies investigating the diagnostic accuracy in detecting melanoma in true clinical settings are lacking, particularly in primary care, where the initial assessment of skin lesions most often takes place.

 
What does this study add?
  • The present study investigated the diagnostic accuracy of an AI-based decision support for melanoma detection when used in primary care.

  • The study indicates that automated decision support may increase the diagnostic accuracy of primary care physicians’ ability to differentiate melanomas from other skin lesions, potentially reducing the number of benign lesions subjected to unnecessary excision.

For cutaneous melanoma (henceforth ‘melanoma’), detection and excision at an early stage of the disease is crucial for prognosis and survival.1,2 However, discriminating melanomas from benign lesions can be challenging for the examining physician, demanding considerable training and clinical experience. The addition of dermoscopy to the naked eye examination increases the diagnostic accuracy, especially when applying structured pattern recognition algorithms.3 Dermatologists generally achieve a higher diagnostic accuracy than primary care physicians (PCPs) in detecting melanomas, with a sensitivity of 73–96% vs. 50–84% and a specificity of 73–98% vs. 71–89%.4–8 In cases of diagnostic uncertainty, the lethal potential of melanoma necessitates histopathological investigation or at least lesion monitoring when the degree of suspicion is low. Consequently, a significant proportion of benign lesions are excised or, in primary care, referred to a dermatologist for assessment, often by teledermoscopy.6–8

Artificial intelligence (AI) in medical imaging has gained increasing attention and interest.9–14 This includes AI processing of dermoscopic images for melanoma recognition. Several studies have been conducted using various image databases, consistently reporting levels of diagnostic accuracy comparable to those achieved by experienced dermatologists.12–19 However, few studies have investigated the performance of AI prospectively with patients in real-life clinical settings.20–24 Of these, four were conducted at dermatology clinics,20–23 and only one in primary care, the latter including only one melanoma.24 As the initial assessment for the majority of patients in many countries takes place at primary care centres (PCCs), large prospective studies on the use of AI for melanoma detection in primary care settings are needed.13,14,25 Furthermore, the potential benefit of adding AI support is likely to be greatest among PCPs, owing to their lower levels of diagnostic accuracy.

The aim of this investigation was to determine the diagnostic performance of an AI-based clinical decision support tool for melanoma detection, operated by a smartphone application (app), when used prospectively by PCPs to assess skin lesions of concern due to some degree of melanoma suspicion.

Materials and methods

A prospective real-life clinical trial was conducted to assess the diagnostic performance of the smartphone app Dermalyser® (AI Medical Technology, Stockholm, Sweden), an AI-based decision support tool (henceforth ‘the app’) to detect melanoma.26 The study was performed in accordance with the STARD 2015 guidelines for diagnostic accuracy studies and approved by the Swedish Medical Products Agency and the Swedish Ethical Review Authority. The trial was registered with ClinicalTrials.gov (NCT05172232). The machine learning algorithm behind the decision support had been trained in silico, prior to the start of the study, on an extensive number of dermoscopic images of skin lesions, showing an area under the receiver operating characteristic (ROC) curve (AUROC) of 0.94. The algorithm was fixed prior to the start of data collection (i.e. no further training or modification of its performance was done during the study). When applied to a dermoscopic image, the algorithm produces a value between 0 and 1, corresponding to a calculated probability (or risk) of the lesion being a melanoma; in its communication with the user, however, the decision support is presented as a dichotomous outcome (‘evidence of melanoma detected’ or ‘no evidence of melanoma detected’) based on this probability. Thus, prior to data inclusion we decided on a cutoff level for the dichotomy that would entail a reasonable risk of false negatives without leading to an excessive proportion of false positives; on the ROC curve from the aforementioned in silico pretrial, this cutoff corresponded to a sensitivity and specificity of 95% and 78%, respectively.
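
As a minimal illustration of how such a probability-to-guidance dichotomization works (a sketch only, in Python; the threshold value and names below are hypothetical and do not represent the app’s internal implementation):

```python
# Sketch: mapping a model's melanoma probability to the dichotomous guidance
# text, using a cutoff fixed before data collection. The numeric threshold is
# purely illustrative and is not the app's actual value.
MELANOMA_CUTOFF = 0.15  # hypothetical cutoff chosen from a pretrial ROC curve

def guidance_text(probability: float) -> str:
    """Return the dichotomous decision support text for a probability in [0, 1]."""
    if probability >= MELANOMA_CUTOFF:
        return "Evidence of melanoma detected"
    return "No evidence of melanoma detected"

print(guidance_text(0.62))  # Evidence of melanoma detected
print(guidance_text(0.03))  # No evidence of melanoma detected
```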

Study population

Overall, 36 PCCs located in seven regions of southern Sweden participated. At these centres, 138 PCPs (90 certified general practitioners and 48 resident trainees) were trained to be able to enrol study participants. Patients aged ≥ 18 years, visiting any of the study units and presenting with one or more skin lesions for which the PCP had any suspicion of melanoma (ranging from ‘appears as benign, but cannot with full certainty exclude melanoma’ to ‘undoubtedly a melanoma’) were eligible for inclusion. Lesions exclusively suspected of being any other cutaneous malignancy (such as basal cell or squamous cell carcinoma) were not included. Lesions on damaged or tattooed skin, located on inaccessible body locations (e.g. between the fingers) or covered with dense hair making it difficult to provide sufficient image quality, were excluded. Having a melanin-rich skin type (i.e. Fitzpatrick skin type V–VI)27 was also an exclusion criterion, owing to the lack of such images in the training data of the app’s convolutional neural network. Finally, poor-quality dermoscopic images (e.g. out-of-focus images or lesions not covered entirely by the image) were excluded. No remuneration was given to the participating centres or patients.

Study procedure/data collection

Eligible patients identified by the examining PCP received oral and written information about the study and gave their written informed consent to participate before being included. Because a fundamental inclusion criterion was that the examining PCP had some degree of melanoma suspicion, PCPs decided on medical action according to their degree of suspicion and in accordance with ordinary clinical routine before applying the app. The PCPs recorded their degree of melanoma suspicion as either ‘high’ or ‘low, but cannot rule out melanoma’, as well as their decision on action as (i) excision at the PCC, (ii) referral for excision by another surgeon, (iii) referral to a dermatologist for further clinical evaluation (with or without the use of teledermoscopy) or (iv) other action taken (which they were asked to specify). They also reported whether they had used any kind of established diagnostic algorithm to support their evaluation, such as the clinical ABCDE criteria or any dermoscopic algorithms (e.g. classic or modified pattern analysis, 3- or 7-point checklists).28 Furthermore, the body location of the lesion and the patient’s Fitzpatrick skin type,27 age and sex were recorded. The PCP then used the app on the included skin lesion (or a maximum of three lesions if a patient had more than one suspected melanoma) and recorded its outcome (‘evidence of melanoma detected’ or ‘no evidence of melanoma detected’). Importantly, this was done without letting the app’s outcome affect the decision on action already taken and without communicating the outcome to the patient. Instead, the PCPs were asked whether they believed that the app outcome would have changed their degree of suspicion and, if so, in what direction.

Technical equipment

All participating PCCs routinely use teledermoscopy in the assessment of skin tumours and were thus already equipped with polarized light contact dermoscopes [either a Heine iC1 (Heine Optotechnik, Gilching, Germany) or a DermLite DL3 (DermLite, Aliso Viejo, CA, USA), with occasional exceptions]. Prior to the study, all units were provided with a camera-equipped smartphone (iPhone SE, 2020; iOS 14 or 15; Apple, Cupertino, CA, USA) with the app installed, together with a phone case adapter compatible with the dermoscope model available at the unit. Before study initiation, on-site instruction and education on how to use the app were given to the PCPs at each unit.

Data evaluation

For each lesion, the app outcome was compared with the final clinical or histopathological tumour diagnosis collected from the patient record. In the clinical setting where the study was performed (i.e. Swedish primary care), the standard routine for skin tumour diagnostics is either excision (at the PCC or surgical clinic) or referral to a dermatologist for further evaluation. In the latter case, if the dermatologist assesses the lesion as undoubtedly benign, excision is generally not performed, owing to the absence of melanoma suspicion at the higher level of clinical expertise, resulting in a clinical diagnosis. If excised, all lesions undergo histopathological analysis, irrespective of who performs the excision. This standard clinical routine for diagnosis was thus also applied for tumour diagnosis in the study.29

Results

Participant recruitment was performed from May to December 2022. An intended sample size of 500 lesions was considered necessary to detect at least 12 melanomas and thereby reach sufficient statistical power. Recruitment took longer than expected; however, as the number of detected melanomas reached the target more quickly than anticipated, inclusion was terminated when 253 lesions in 228 patients (125 women, 103 men) had been included (Figure 1). The mean age of the overall study population was 54 years (51 years for women and 55 years for men). The clinical ABCDE criteria were the most commonly used assessment tool by the PCPs (59.8% of all lesions), followed by classic or modified dermoscopic pattern analysis algorithms (49.4%) or a combination of both (18.7%); other assessment methods were used for 10.1% of lesions. Of the 36 participating PCCs, 7 did not include any lesions. Among the remaining 29 centres, the number of included lesions varied between 1 and 33. In total, 194 lesions were managed by referral to a dermatologist, 54 by primary excision at the PCC and 5 by referral for excision at a surgical clinic. For 134 lesions, tumour diagnoses were based on histopathology, while 119 lesions were clinically diagnosed as benign by the dermatologist (Table 1). Overall, 21 melanomas (11 invasive melanomas and 10 melanomas in situ) were detected in 20 patients (Figure S1; see Supporting Information). The invasive melanomas were all of superficial spreading type, exhibited a Breslow thickness ranging from 0.1 to 1.1 mm (median 0.5) and were of histopathological class T1a, with the exception of two lesions (T1b and T2a). The posterior torso was the most common melanoma location.

Figure 1

Flowchart illustrating the study sample. Data are presented as number of lesions (number of patients). *Reasons for exclusion: two lesions were, at monitoring, no longer suspected of melanoma by the primary care physician (PCP), and for one lesion the photo taken for app analysis was not dermoscopic.

Table 1

Different clinical and (for melanomas) histopathological characteristics of included lesions

                                        Melanoma, invasive | Melanoma in situ | Melanoma, total | Nonmelanoma | All lesions
Sex (n)
  Female                                8 | 6 | 14 | 126 | 140
  Male                                  3 | 4 | 7 | 106 | 112
Age (years)
  Mean                                  71 | 62 | 66 | 53 | 55
  Median (IQR)                          70 (63–76) | 60 (50–76) | 69 (58–76) | 53 (38–68) | 56 (55–69)
  Range                                 57–86 | 39–78 | 39–86 | 20–90 | 20–90
Fitzpatrick skin type (n)
  I                                     1 | 0 | 1 | 19 | 20
  II                                    5 | 7 | 12 | 153 | 165
  III                                   5 | 3 | 8 | 47 | 65
  IV                                    0 | 0 | 0 | 13 | 13
Lesion size, clinically assessed (mm)
  Mean                                  8.4 | 7.6 | 8.0 | 7.1 | 7.2
  Median (IQR)                          5 (4–10) | 6.5 (5–9) | 6 (4.5–9.5) | 6 (4–10) | 6 (4–10)
  Range                                 3–30 | 1–20 | 1–35 | 1–30 | 1–35
Body location (n)
  Face                                  0 | 0 | 0 | 16 | 16
  Head and neck                         0 | 0 | 0 | 21 | 21
  Anterior torso                        3 | 1 | 4 | 50 | 54
  Posterior torso                       2 | 4 | 6 | 76 | 82
  Lateral torso                         1 | 0 | 1 | 10 | 11
  Upper extremities                     3 | 1 | 4 | 20 | 24
  Lower extremities                     2 | 4 | 6 | 32 | 38
  Palms/soles                           0 | 0 | 0 | 5 | 5
  Groin/genital region                  0 | 0 | 0 | 2 | 2

Lesion characteristics

Melanomas (n = 21)                      Melanoma, invasive | Melanoma in situ
  Histopathological characteristics
    Breslow thickness (mm)
      Mean                              0.54 | –
      Median (IQR)                      0.5 (0.3–0.7) | –
      Range                             0.2–1.1 | –
    Tumour width (mm)
      Mean                              7.9 | 7.2
      Median (IQR)                      6.5 (4–11) | 6.5 (5–9)
      Range                             3–17 | 4–13
    Tumour length (mm)
      Mean                              10.7 | 9.5
      Median (IQR)                      9.5 (5–13) | 8 (6–12)
      Range                             3–30 | 5–19

Nonmelanomas (n = 221)
  Diagnosis, total: n (dermatologist assessment/histopathological assessment)
    Nondysplastic melanocytic naevus    66 (40/26)
    Congenital melanocytic naevus       11 (9/2)
    Dysplastic melanocytic naevus       30 (0/30)
    Seborrhoeic keratosis               69 (45/24)
    Actinic keratosis                   2 (2/0)
    Solar lentigo                       8 (6/2)
    Dermatofibroma                      8 (3/5)
    Haemangioma                         7 (3/4)
    BCC                                 11 (2/9)
    SCC (incl. in situ)                 4 (0/4)
    Other                               13 (7/6)

BCC, basal cell carcinoma; IQR, interquartile range; SCC, squamous cell carcinoma.


Diagnostic accuracy of the app

The diagnostic accuracy of the app in differentiating melanoma from nonmelanoma lesions is presented as ROC curves (Figure 2). The AUROC for the app’s capability of differentiating all melanomas from other lesions was 0.960 [95% confidence interval (CI) 0.928–0.980], corresponding to, at best, 95.2% sensitivity and 84.5% specificity, a positive predictive value (PPV) of 35.9% and a negative predictive value (NPV) of 99.5% (Figure 2a). The sensitivity and specificity of the app’s predefined cutoff level applied in the study (marked as blue squares on the curves) were 95.2% and 60.3%, respectively (PPV 17.9%, NPV 99.3%).
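
For readers wishing to reproduce the accuracy measures, the sketch below shows how sensitivity, specificity, PPV and NPV follow from the four confusion-matrix counts. The counts used are approximate back-calculations from the reported cutoff-level figures and are shown for illustration only:

```python
# Sketch: diagnostic accuracy measures from confusion-matrix counts.
# tp/fp/tn/fn below are approximate reconstructions of the cutoff-level
# results (20 of 21 melanomas flagged; ~60% specificity among 232 nonmelanomas).
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # proportion of melanomas flagged
        "specificity": tn / (tn + fp),  # proportion of nonmelanomas cleared
        "ppv": tp / (tp + fp),          # probability a flagged lesion is melanoma
        "npv": tn / (tn + fn),          # probability a cleared lesion is not melanoma
    }

print(diagnostic_metrics(tp=20, fp=92, tn=140, fn=1))
# ≈ {'sensitivity': 0.952, 'specificity': 0.603, 'ppv': 0.179, 'npv': 0.993}
```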

Figure 2

Receiver operating characteristic curves for agreement between app outcome and final diagnosis (a) for all melanomas vs. nonmelanomas and (b) for invasive melanomas vs. all other lesions (including melanoma in situ). The blue squares mark the values for the predefined cutoff level used in the study to communicate the app’s guidance to the user. AUC, area under the curve.

The number needed to excise is commonly used to describe how many lesions suspected to be melanoma need to undergo histopathological investigation for one melanoma to be detected (i.e. number of excised lesions/number of melanomas found). As lesion diagnoses in our study were based on either excision or dermatologist assessment, we instead explored the number needed to investigate (NNI), i.e. the number of lesions indicated by the app as ‘evidence of melanoma detected’ that needed further investigation for one melanoma to be detected. At the app’s best-performing point on the ROC curve, the NNI was 2.8, and at the predefined cutoff level it was 5.5.
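
Since the NNI counts app-positive lesions per melanoma found, it is simply the reciprocal of the PPV; a quick illustrative check against the predictive values reported above (not study code):

```python
# Sketch: number needed to investigate (NNI) as the reciprocal of the PPV.
for label, ppv in [("best-performing ROC point", 0.359), ("predefined cutoff", 0.179)]:
    print(f"{label}: NNI ≈ {1 / ppv:.1f}")
# best-performing ROC point: NNI ≈ 2.8
# predefined cutoff: NNI ≈ 5.6 (reported as 5.5; the difference reflects rounding of the PPV)
```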

Concentrating exclusively on invasive melanomas, the AUROC was 0.988 (95% CI 0.965–0.997) (Figure 2b). The app’s sensitivity and specificity in detecting these were, at best, 100% and 92.6%, respectively (PPV 38.2%, NPV 100%, NNI 5.1).

Associations between app guidance and primary care physicians’ degree of melanoma suspicion

Figure 3 illustrates the relationship between the PCPs’ reported degree of melanoma suspicion (high/low), the final diagnosis (melanoma/not melanoma) and the guidance communicated to the user on the screen (i.e. the app outcome at the predefined cutoff level), displayed from both perspectives. Table 2 shows to what extent the app’s guidance was concordant with the PCPs’ degree of melanoma suspicion (i.e. the app indicating ‘no evidence of melanoma’ when the PCP’s degree of suspicion was low, or the app indicating ‘evidence of melanoma detected’ when the PCP’s degree of suspicion was high). When the PCPs’ degree of melanoma suspicion was theoretically added to the app guidance (i.e. when the two were in agreement), the combined PPV for predicting melanoma would have increased vs. that of each parameter alone (Table 3), whereas the NPV was clearly lower for the PCPs’ degree of suspicion alone than for the app or for the two parameters combined. For the single in situ melanoma missed by the app, the PCP’s degree of suspicion was also low.
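
The combined predictive values reported in Table 3 can be reproduced directly from the agreement counts in Table 2; a minimal sketch of that calculation:

```python
# Sketch: predictive values when app guidance and PCP suspicion agree,
# using the lesion counts reported in Table 2.
both_positive = {"melanoma": 12, "nonmelanoma": 26}   # PCP suspicion high + app positive
both_negative = {"melanoma": 1, "nonmelanoma": 127}   # PCP suspicion low + app negative

ppv_combined = both_positive["melanoma"] / sum(both_positive.values())
npv_combined = both_negative["nonmelanoma"] / sum(both_negative.values())
print(f"combined PPV = {ppv_combined:.1%}, combined NPV = {npv_combined:.1%}")
# combined PPV = 31.6%, combined NPV = 99.2% (as in Table 3)
```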

Figure 3

Graph showing the relationships between the primary care physicians’ (PCPs') reported degree of melanoma suspicion, final tumour diagnosis and the app’s outcome (at the cutoff level applied in the study to communicate the app’s guidance to the user).

Table 2

Interactions between app guidance (at the predefined cutoff level) and the primary care physician (PCP)-reported degree of melanoma suspicion: distribution of agreement between app guidance and PCP-reported degree of melanoma suspicion, and its association with final diagnosis

Categories of agreement between PCP degree of melanoma suspicion and app guidance (n = 253 lesions), with final diagnosis

PCP degree of suspicion high + app indicates ‘Evidence of melanoma detected’: 38 (15.0)
  Melanoma: 12 (4.7)
  Nonmelanoma: 26 (10.3)
PCP degree of suspicion high + app indicates ‘No evidence of melanoma detected’: 13 (5.1)
  Melanoma: 0 (0)
  Nonmelanoma: 13 (5.1)
PCP degree of suspicion low + app indicates ‘Evidence of melanoma detected’: 74 (29.2)
  Melanoma: 8 (3.2)
  Nonmelanoma: 66 (26.1)
PCP degree of suspicion low + app indicates ‘No evidence of melanoma detected’: 128 (50.6)
  Melanoma: 1 (0.4)
  Nonmelanoma: 127 (50.2)

Data presented as n (%).


Table 3

Interactions between app guidance (at the predefined cutoff level) and primary care physician (PCP)-reported degree of melanoma suspicion: effect on positive (PPV) and negative predictive value (NPV) when adding the physician’s degree of melanoma suspicion to the app’s guidance, if congruent with each other (i.e. both methods agree on suspicious for melanoma or not)

Lesion assessment method                                              PPV (%)   NPV (%)
App outcome alone                                                     17.9      99.3
PCPs’ degree of melanoma suspicion alone                              23.5      95.5
App outcome when combined with PCPs’ degree of melanoma suspicion     31.6      99.2

When asked how the app’s outcome would have affected their degree of melanoma suspicion had they been allowed to let it influence their clinical management, the PCPs reported that their suspicion would have increased for 61 lesions (24.1%), decreased for 56 lesions (22.1%) and remained unchanged for 136 lesions (53.8%).

Table 4 presents the associations between different lesion variables, including the app’s guidance, and the final diagnosis. As shown, the app’s guidance was also the strongest predictor of melanoma when adjusted for the clinical variables lesion size, patient age and sex.
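
As an illustration of how such an adjusted odds-ratio analysis can be set up, the sketch below fits a binary logistic regression on synthetic data with the same variable structure; it is not the study’s analysis code and the generated data are purely illustrative:

```python
# Sketch: binary logistic regression yielding adjusted odds ratios, as in Table 4.
# The data frame below is synthetic and only illustrates the model structure;
# it is not the study data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 253
df = pd.DataFrame({
    "app_positive": rng.integers(0, 2, n),
    "pcp_high_suspicion": rng.integers(0, 2, n),
    "lesion_size_mm": rng.uniform(1, 35, n),
    "age_years": rng.uniform(20, 90, n),
    "female": rng.integers(0, 2, n),
})
# Synthetic outcome loosely tied to the predictors, for illustration only.
linear_predictor = (-4 + 2.5 * df["app_positive"]
                    + 1.0 * df["pcp_high_suspicion"] + 0.03 * df["age_years"])
df["melanoma"] = rng.binomial(1, 1 / (1 + np.exp(-linear_predictor)))

X = sm.add_constant(df[["app_positive", "pcp_high_suspicion",
                        "lesion_size_mm", "age_years", "female"]])
fit = sm.Logit(df["melanoma"], X).fit(disp=False)

odds_ratios = np.exp(fit.params)   # adjusted odds ratios
conf_int = np.exp(fit.conf_int())  # 95% confidence intervals on the OR scale
print(pd.concat([odds_ratios.rename("OR"), conf_int, fit.pvalues.rename("p")], axis=1))
```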

Table 4

Odds ratios (ORs) for the app’s guidance (at the predefined cutoff level) and primary care physician (PCP) degree of melanoma suspicion in predicting a melanoma diagnosis, analysed with binary logistic regression analysis (enter method) and adjusted for lesion size, patient age and sex

                                                      OR      P-value   95% CI
App outcome (evidence of melanoma detected)a          26.55   0.002     3.29–213.96
PCPs’ degree of melanoma suspicion (high)a            3.35    0.02      1.19–9.44
Lesion size (large)b                                  0.98    0.56      0.89–1.06
Age (high)b                                           1.06    0.002     1.02–1.11
Sex (female)a                                         1.87    0.25      0.64–5.47

CI, confidence interval. aCategorical variable; bcontinuous variable.


Discussion

Diagnosing melanoma in primary care is a challenging task. In this trial we investigated the diagnostic performance of an AI-based clinical decision support tool to detect or dismiss melanoma when used by PCPs on skin lesions of concern. The results showed that – based on the app’s high NPV (100% for invasive melanomas and 99.5% for all melanomas) – a large proportion of the benign lesions excised or referred to a dermatologist could have been declared benign at the primary care level if the app’s outcome had been applied, without increasing the risk of missing a melanoma. As illustrated in Figure 3, as many as 140 (127 + 13; 55.3%) of the 253 lesions either excised or referred to a dermatologist may not have needed further assessment if the PCP had followed the app’s guidance. In turn, this could reduce the demand for dermatologist and pathologist assessment, thereby increasing accessibility to secondary care for patients who do, in fact, present with a melanoma. It could also reduce the inequities in melanoma diagnostics that are likely to arise from variation in diagnostic skills among PCPs. Notably, only one melanoma in situ was missed by the app; however, this lesion presented with an unremarkable dermoscopic appearance (Figure S1, lesion #16).

The app was applied strictly to lesions with some degree of suspicion for melanoma. We believe this enhances the significance of the study’s outcome, as this scenario corresponds to the real-life setting in primary care. Although it is known from previous studies that some melanomas are initially missed by standard clinical assessment (in primary care, as well as by dermatologists),5,7,30 the solution is not to apply this type of app indiscriminately to all of a patient’s skin lesions, as this could lead to an unmanageable number of false-positive results. However, the app’s high NPV and ease of use might promote its use on lesions with a lower degree of suspicion for melanoma that might otherwise best be managed with lesion monitoring.

Despite a large number of studies on the ability of AI algorithms to recognize skin cancer from dermoscopic images,12–24 surprisingly few have a prospective study design.20–24 Most algorithms have been tested on varying numbers of already sampled images, often comparing AI performance with that of a group of clinicians (e.g. dermatologists).12–15,17–19 However, such an approach has important limitations. Firstly, in clinical reality the examining physician (dermatologist or PCP) considers not only the dermoscopic image, but also other important information such as personal or family history, evolution of the lesion and its appearance in comparison with the patient’s other lesions. Secondly, any sampled series of images will inevitably suffer from a varying degree of selection bias and will not be fully representative of how, and on which lesions, the AI device would be used in practice. Finally, the diagnostic label applied to a lesion in a simulated situation does not necessarily correspond to how it would be handled in clinical practice (e.g. how convinced the doctor needs to be that a lesion is benign in order not to excise it just for safety’s sake).23

The four previous prospective studies conducted in dermatology clinics reported an AUROC of 0.76–0.94 for detecting melanoma, and a melanoma prevalence rate of 48%, 23%, 56% and 16%, respectively.20–23 In our study, the AUROC was 0.96, despite a melanoma prevalence rate of only 8% among the investigated lesions. The proportion of melanomas of histopathological class pT2 or higher in our study was low, which might be considered a limitation. In comparison, Phillips et al. included 31% pT2–pT4 lesions.21 However, thin melanomas are not only the most common, but also often the most difficult to distinguish from benign melanocytic lesions on dermoscopic examination. Moreover, they are important to detect before they eventually become thicker and worsen the prognosis.31 Finally, if a diagnostic aid is to be implemented in a specific clinical setting, a key element in its validation process is to evaluate it adequately in that particular setting. For example, the lower specificity for the predefined cutoff level vs. that of the in silico material (60.3% vs. 78%), despite the overall higher AUROC value, is likely to reflect this.

The choice to use both histopathology and dermatologist assessment as the diagnostic reference mirrors the standard diagnostic routine for skin tumours assessed in Swedish primary care. Consequently, not all included lesions diagnosed as benign were histopathologically confirmed as such. Excision of this subset of lesions solely for the purpose of the study was not considered ethically justifiable (with regard to patient discomfort and unnecessary scarring) and could also be associated with a risk of dropouts and selection bias due to patients potentially declining surgical excision. We believe that by choosing this approach the results are as close to a true clinical situation as possible, strengthening the study’s generalizability.

In the present study design, the PCPs were instructed not to let the app guidance affect their clinical management. This was owing to insufficient evidence of the app’s diagnostic capacity and reliability, which at that point rested solely on the in silico, pretrial retrospective training. However, considering the favourable outcome of the study, the next step should be to proceed with a randomized study design, evaluating the app when it is actually being used to guide a PCP in the diagnostic process and comparing it with ordinary clinical routine. How physicians tend to rely, emotionally and intellectually, on advice given by an app when diagnosing a condition as serious as cancer has rarely been explored, but the presence of evidence-based knowledge supporting the app’s reliability has emerged as an enhancing factor.26,32–34

Another limitation of the study was the varying inclusion rate among the participating PCCs, with a few centres not contributing any lesions at all. This reflects one of the challenges of performing prospective real-life clinical trials in primary care, which is affected by varying degrees of time pressure, staff shortages and heavy workloads. However, this also illustrates the actual circumstances of daily primary care practice, possibly contributing to the authenticity of the study. It also emphasizes the importance of developing novel, practical and useful diagnostic tools that could potentially reduce unnecessary work tasks (e.g. unnecessary excisions of benign skin lesions), not least since understaffing in primary care settings is reported to be associated with an increased risk of missing melanomas at examination.32 A minor study limitation is the absence of data on PCP characteristics, such as clinical experience or workplace staffing conditions, which may be of relevance for the interpretation of the results. Of note, the dominance of individuals with Fitzpatrick skin types I–II in the study population (representative of the general population in Sweden), as well as the exclusion of patients with Fitzpatrick skin types V–VI, limits the applicability of the study results to other populations.

In conclusion, the AI-based decision support tool to detect melanoma evaluated in this study appears to be clinically reliable and of potential clinical benefit in the management of skin lesions of concern assessed in primary care and can improve the identification of lesions in need of dermatological or histopathological assessment. Further research, preferably with a randomized study design, is warranted to determine the tool’s actual usefulness and diagnostic safety over time.

Acknowledgements

Aigora GmbH provided image data for the in silico training of the convolutional neural network used in the clinical decision support prior to the clinical trial.

Funding sources

The study was funded by grants from Region Östergötland, Sweden, and the AIDA network (MedTech4Health).

Data availability

The data underlying this article will be shared upon reasonable request to the corresponding author, with the exception of lesion images other than those presented in Figure S1.

Ethics statement

The clinical trial was approved by the Swedish Ethical Review Authority (approval number Dnr. 2022-00895-01) and by the Swedish Medical Products Agency (CIV-21-12-038346).

Supporting Information

Additional Supporting Information may be found in the online version of this article at the publisher’s website.

References

1. Trager MH, Queen D, Samie FH et al. Advances in prevention and surveillance of cutaneous malignancies. Am J Med 2020; 133:417–23.

2. Geller AC, Dickerman BA, Taber JM et al. Skin cancer interventions across the cancer control continuum: a review of experimental evidence (1/1/2000–6/30/2015) and future research directions. Prev Med 2018; 111:442–50.

3. Weber P, Tschandl P, Sinz C, Kittler H. Dermatoscopy of neoplastic skin lesions: recent advances, updates, and revisions. Curr Treat Options Oncol 2018; 19:56.

4. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol 2002; 3:159–65.

5. Harkemanne E, Baeck M, Tromme I. Training general practitioners in melanoma diagnosis: a scoping review of the literature. BMJ Open 2021; 11:e043926.

6. Herschorn A. Dermoscopy for melanoma detection in family practice. Can Fam Physician 2012; 58:740–5.

7. Vestergaard ME, Macaskill P, Holt PE, Menzies SW. Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: a meta-analysis of studies performed in a clinical setting. Br J Dermatol 2008; 159:669–76.

8. Wolner ZJ, Yélamos O, Liopyris K et al. Enhancing skin cancer diagnosis with dermoscopy. Dermatol Clin 2017; 35:417–37.

9. Chan HP, Samala RK, Hadjiiski LM, Zhou C. Deep learning in medical image analysis. Adv Exp Med Biol 2020; 1213:3–21.

10. Egger J, Gsaxner C, Pepe A et al. Medical deep learning – a systematic meta-review. Comput Methods Programs Biomed 2022; 221:106874.

11. Gao J, Jiang Q, Zhou B, Chen D. Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: an overview. Math Biosci Eng 2019; 16:6536–61.

12. Phillips M, Greenhalgh J, Marsden H, Palamaras I. Detection of malignant melanoma using artificial intelligence: an observational study of diagnostic accuracy. Dermatol Pract Concept 2019; 10:e2020011.

13. Haggenmüller S, Maron RC, Hekler A et al. Skin cancer classification via convolutional neural networks: systematic review of studies involving human experts. Eur J Cancer 2021; 156:202–16.

14. Jones OT, Matin RN, van der Schaar M et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review. Lancet Digit Health 2022; 4:e466–76.

15. Combalia M, Codella N, Rotemberg V et al. Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge. Lancet Digit Health 2022; 4:e330–9.

16. Jain A, Way D, Gupta V et al. Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Netw Open 2021; 4:e217249.

17. Lee S, Chu YS, Yoo SK et al. Augmented decision-making for acral lentiginous melanoma detection using deep convolutional neural networks. J Eur Acad Dermatol Venereol 2020; 34:1842–50.

18. Maron RC, Utikal JS, Hekler A et al. Artificial intelligence and its effect on dermatologists’ accuracy in dermoscopic melanoma image classification: web-based survey study. J Med Internet Res 2020; 22:e18091.

19. Grignaffin F, Burbuto F, Piazzo L et al. Machine learning approaches for skin cancer classification from dermoscopic images: a systematic review. Algorithms 2022; 15:438.

20. MacLellan AN, Price EL, Publicover-Brouwer P et al. The use of noninvasive imaging techniques in the diagnosis of melanoma: a prospective diagnostic accuracy study. J Am Acad Dermatol 2021; 85:353–9.

21. Phillips M, Marsden H, Jaffe W et al. Assessment of accuracy of an artificial intelligence algorithm to detect melanoma in images of skin lesions. JAMA Netw Open 2019; 2:e1913436.

22. Marchetti MA, Cowen EA, Kurtansky NR et al. Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study). NPJ Digit Med 2023; 6:127.

23. Menzies SW, Sinz C, Menzies M et al. Comparison of humans versus mobile phone-powered artificial intelligence for the diagnosis and management of pigmented skin cancer in secondary care: a multicentre, prospective, diagnostic, clinical trial. Lancet Digit Health 2023; 5:e679–91.

24. Escalé-Besa A, Yélamos O, Vidal-Alaball J et al. Exploring the potential of artificial intelligence in improving skin lesion diagnosis in primary care. Sci Rep 2023; 13:4293.

25. Liopyris K, Gregoriou S, Dias J, Stratigos AJ. Artificial intelligence in dermatology: challenges and perspectives. Dermatol Ther 2022; 12:2637–51.

26. Helenason J, Ekstrom C, Falk M, Papachristou P. Exploring the feasibility of an artificial intelligence based clinical decision support system for melanoma detection in primary care. Scand J Prim Health Care 2023; doi: https://doi.org/10.1080/02813432.2023.2283190 (Epub ahead of print).

27. Fitzpatrick TB. The validity and practicality of sun-reactive skin types I through VI. Arch Dermatol 1988; 124:869–71.

28. Williams NM, Rojas KD, Reynolds JM et al. Assessment of diagnostic accuracy of dermoscopic structures and patterns used in melanoma detection: a systematic review and meta-analysis. JAMA Dermatol 2021; 157:1078–88.

29. Swedish National Guidelines for Malignant Melanoma. Regionala Cancercentrum i Samverkan. Available at: https://kunskapsbanken.cancercentrum.se/diagnoser/melanom/vardprogram (last accessed 1 February 2024; in Swedish).

30. Nartey Y, Sneyd MJ. The presenting features of melanoma in New Zealand: implications for earlier detection. Aust N Z J Public Health 2018; 42:567–71.

31. Hynes MC, Nguyen P, Groome PA et al. A population-based validation study of the 8th edition UICC/AJCC TNM staging system for cutaneous melanoma. BMC Cancer 2022; 22:720.

32. Meyer J, Khademi A, Têtu B et al. Impact of artificial intelligence on pathologists’ decisions: an experiment. J Am Med Inform Assoc 2022; 29:1688–95.

33. Buck C, Doctor E, Hennrich J et al. General practitioners’ attitudes toward artificial intelligence-enabled systems: interview study. J Med Internet Res 2022; 24:e28916.

34. Fleming NH, Grade MM, Bendavid E. Impact of primary care provider density on detection and diagnosis of cutaneous melanoma. PLOS ONE 2018; 13:e0200097.

Author notes

Conflicts of interest P.P. is a co-founder of the clinical decision support (Dermalyser®) studied in the clinical trial. The other authors declare no conflicts of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
