Panagiotis Papachristou, My Söderholm, Jon Pallon, Marina Taloyan, Sam Polesie, John Paoli, Chris D Anderson, Magnus Falk, Evaluation of an artificial intelligence-based decision support for the detection of cutaneous melanoma in primary care: a prospective real-life clinical trial, British Journal of Dermatology, Volume 191, Issue 1, July 2024, Pages 125–133, https://doi.org/10.1093/bjd/ljae021
Abstract
Background
Use of artificial intelligence (AI), or machine learning, to assess dermoscopic images of skin lesions to detect melanoma has, in several retrospective studies, shown high levels of diagnostic accuracy on par with – or even outperforming – experienced dermatologists. However, the enthusiasm around these algorithms has not yet been matched by prospective clinical trials performed in authentic clinical settings. In several European countries, including Sweden, the initial clinical assessment of suspected skin cancer is principally conducted in the primary healthcare setting by primary care physicians, with or without access to teledermoscopic support from dermatology clinics.
Objectives
To determine the diagnostic performance of an AI-based clinical decision support tool for cutaneous melanoma detection, operated by a smartphone application (app), when used prospectively by primary care physicians to assess skin lesions of concern due to some degree of melanoma suspicion.
Methods
This prospective multicentre clinical trial was conducted at 36 primary care centres in Sweden. Physicians used the smartphone app on skin lesions of concern by photographing them dermoscopically, which resulted in a dichotomous decision support text regarding evidence for melanoma. Regardless of the app outcome, all lesions underwent standard diagnostic procedures (surgical excision or referral to a dermatologist). After investigations were complete, lesion diagnoses were collected from the patients’ medical records and compared with the app’s outcome and other lesion data.
Results
In total, 253 lesions of concern in 228 patients were included, of which 21 proved to be melanomas, with 11 thin invasive melanomas and 10 melanomas in situ. The app’s accuracy in identifying melanomas was reflected in an area under the receiver operating characteristic (AUROC) curve of 0.960 [95% confidence interval (CI) 0.928–0.980], corresponding to a maximum sensitivity and specificity of 95.2% and 84.5%, respectively. For invasive melanomas alone, the AUROC was 0.988 (95% CI 0.965–0.997), corresponding to a maximum sensitivity and specificity of 100% and 92.6%, respectively.
Conclusions
The clinical decision support tool evaluated in this investigation showed high diagnostic accuracy when used prospectively in primary care patients, which could add significant clinical value for primary care physicians assessing skin lesions for melanoma.

Linked Article: Jones et al. Br J Dermatol 2024; 191:13.
Author Video: https://youtu.be/jtzN34U9cG0
Plain language summary
Skin cancer is one of the most common forms of cancer worldwide. Melanoma is a serious type of skin cancer that can be difficult to differentiate from other skin lesions, even for experienced physicians.
In Sweden, the initial assessment of suspected melanoma is performed by primary care physicians. We aimed to investigate the capability of a diagnostic decision support tool to detect melanoma when used by primary care physicians to assess skin lesions of concern. The device is based on artificial intelligence (AI) assessment of dermoscopic images via a smartphone app.
Thirty-six primary care centres in Sweden took part. If a patient had a skin lesion that was suspected to be a melanoma, they were asked to participate. The physician used the app by photographing the lesion with a smartphone connected to a dermoscope (a magnifying lens). The app provides guidance on whether to suspect melanoma or not. However, independently of this guidance, all lesions underwent regular diagnostic investigation, and the results were compared.
Of 253 skin lesions of concern, 11 malignant melanomas and 10 in situ (precancerous) melanoma lesions were found. The app correctly identified all these as melanomas, except for one of the in situ melanomas. The negative predictive value of the app was 99.5%, which meant that if the tool suggested a lesion was not a melanoma, there was a 99.5% probability it was correct. Overall, our study findings suggest that decision support might help primary care physicians to avoid unnecessary surgical excision of benign skin lesions.
What is already known about this topic?
Use of artificial intelligence (AI) for melanoma detection has shown high diagnostic accuracy levels.
Prospective studies investigating the diagnostic accuracy in detecting melanoma in true clinical settings are lacking, particularly in primary care, where the initial assessment of skin lesions most often takes place.
What does this study add?
The present study investigated the diagnostic accuracy of an AI-based decision support for melanoma detection when used in primary care.
The study indicates that automated decision support may increase the diagnostic accuracy of primary care physicians’ ability to differentiate melanomas from other skin lesions, potentially reducing the number of benign lesions subjected to unnecessary excision.
For cutaneous melanoma (henceforth ‘melanoma’), detection and excision at an early stage of the disease is crucial for prognosis and survival.1,2 However, discriminating melanomas from benign lesions can be challenging for the examining physician, demanding considerable training and clinical experience. The addition of dermoscopy to the naked eye examination increases the diagnostic accuracy, especially when applying structured pattern recognition algorithms.3 Dermatologists generally achieve a higher diagnostic accuracy than primary care physicians (PCPs) in detecting melanomas, with a sensitivity of 73–96% vs. 50–84% and a specificity of 73–98% vs. 71–89%.4–8 In cases of diagnostic uncertainty, the lethal potential of melanoma necessitates histopathological investigation or at least lesion monitoring when the degree of suspicion is low. Consequently, a significant proportion of benign lesions are excised or, in primary care, referred to a dermatologist for assessment, often by teledermoscopy.6–8
Artificial intelligence (AI) in medical imaging has gained increasing attention and interest.9–14 This includes AI processing of dermoscopic images for melanoma recognition. Several studies have been conducted using various image databases, consistently reporting levels of diagnostic accuracy comparable to those achieved by experienced dermatologists.12–19 However, few studies have investigated the performance of AI prospectively with patients in real-life clinical settings.20–24 Of these, four were conducted at dermatology clinics,20–23 and only one in primary care, a study that included only a single melanoma.24 As the initial assessment for the majority of patients in many countries takes place at primary care centres (PCCs), large prospective studies on the use of AI for melanoma detection in primary care settings are needed.13,14,25 Furthermore, the potential benefit of adding AI support is likely to be greatest among PCPs, owing to their lower levels of diagnostic accuracy.
The aim of this investigation was to determine the diagnostic performance of an AI-based clinical decision support tool for melanoma detection, operated by a smartphone application (app), when used prospectively by PCPs to assess skin lesions of concern due to some degree of melanoma suspicion.
Materials and methods
A prospective real-life clinical trial was conducted to assess the diagnostic performance of the smartphone app Dermalyser® (AI Medical Technology, Stockholm, Sweden), an AI-based decision support tool (henceforth ‘the app’) to detect melanoma.26 The study was performed in accordance with the STARD 2015 guidelines for diagnostic accuracy studies and approved by the Swedish Medical Products Agency and the Swedish Ethical Review Authority. The trial was registered with ClinicalTrials.gov (NCT05172232). The machine learning algorithm behind the decision support had been trained in silico, prior to the start of the study, on an extensive number of dermoscopic images of skin lesions, showing an area under the receiver operating characteristic (ROC) curve (AUROC) of 0.94. The algorithm was fixed prior to the start of data collection (i.e. no further training or modification of its performance was done during the study). When applied to a dermoscopic image, the algorithm produces a value between 0 and 1, corresponding to a calculated probability (or risk) of the lesion being a melanoma; in its communication with the user, however, the decision support is designed to present a dichotomous outcome (‘evidence of melanoma detected’ or ‘no evidence of melanoma detected’) based on this probability. Thus, prior to data inclusion, we decided on a cutoff level for this dichotomy that would keep the risk of false negatives low without producing an excessive proportion of false positives; on the ROC curve from the aforementioned in silico pretrial, this cutoff corresponded to a sensitivity of 95% and a specificity of 78%.
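To make the dichotomization step concrete, the sketch below shows how such a cutoff can be read off an ROC curve so that sensitivity stays at or above a chosen target. It is a minimal illustration in Python with scikit-learn and randomly generated placeholder data; the variable names and data are hypothetical and do not represent the app’s actual training or calibration procedure.

```python
# Illustrative sketch only: selecting a probability cutoff from an ROC curve
# so that sensitivity stays at or above a target value, as described for the
# app's predefined dichotomous output. Placeholder data, not the study data.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                # 1 = melanoma, 0 = non-melanoma
y_prob = np.clip(0.6 * y_true + rng.normal(0.3, 0.2, size=500), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("AUROC:", round(roc_auc_score(y_true, y_prob), 3))

target_sensitivity = 0.95
idx = int(np.argmax(tpr >= target_sensitivity))      # first ROC point meeting the target
print("cutoff:", thresholds[idx],
      "sensitivity:", round(tpr[idx], 3),
      "specificity:", round(1 - fpr[idx], 3))
```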
Study population
Overall, 36 PCCs located in seven regions of southern Sweden participated. At these centres, 138 PCPs (90 certified general practitioners and 48 resident trainees) were trained to be able to enrol study participants. Patients aged ≥ 18 years, visiting any of the study units and presenting with one or more skin lesions for which the PCP had any suspicion of melanoma (ranging from ‘appears as benign, but cannot with full certainty exclude melanoma’ to ‘undoubtedly a melanoma’) were eligible for inclusion. Lesions exclusively suspected of being any other cutaneous malignancy (such as basal cell or squamous cell carcinoma) were not included. Lesions on damaged or tattooed skin, located on inaccessible body locations (e.g. between the fingers) or covered with dense hair making it difficult to provide sufficient image quality, were excluded. Having a melanin-rich skin type (i.e. Fitzpatrick skin type V–VI)27 was also an exclusion criterion, owing to the lack of such images in the app’s convolutional neural network training data. Finally, poor-quality dermoscopic images (e.g. out-of-focus images or lesions not covered entirely by the image) were excluded. No remuneration was given to the participating centres or patients.
Study procedure/data collection
Eligible patients identified by the examining PCP received oral and written information about the study and gave their written informed consent before being included. Because a fundamental criterion for inclusion was that the examining PCP had some degree of melanoma suspicion, the PCPs decided on the medical action according to their degree of suspicion, in accordance with ordinary clinical routine, before applying the app. The PCPs recorded their degree of melanoma suspicion as either ‘high’ or ‘low, but cannot rule out melanoma’, as well as their decision on action as (i) excision at the PCC, (ii) referral for excision by another surgeon, (iii) referral to a dermatologist for further clinical evaluation (with or without the use of teledermoscopy) or (iv) other action taken (which they were asked to specify). They also reported whether they had used any kind of established diagnostic algorithm to support their evaluation, such as the clinical ABCDE criteria or any dermoscopic algorithms (e.g. classic or modified pattern analysis, 3- or 7-point checklists).28 Furthermore, the body location of the lesion and the patient’s Fitzpatrick skin type,27 age and sex were recorded. The PCP then used the app on the included skin lesion (or on a maximum of three lesions if a patient had more than one suspected melanoma) and recorded its outcome (‘evidence of melanoma detected’ or ‘no evidence of melanoma detected’). Importantly, the app’s outcome was neither allowed to affect the decision on action already taken nor communicated to the patient. Instead, the PCPs were asked whether they believed that the app outcome would have changed their degree of suspicion and, if so, in what direction.
Technical equipment
All participating PCCs routinely use teledermoscopy in the assessment of skin tumours and were thus already equipped with polarized light contact dermoscopes [either a Heine iC1 (Heine Optotechnik, Gilching, Germany) or a DermLite DL3 (DermLite, Aliso Viejo, CA, USA), with occasional exceptions]. Prior to the study, all units were provided with a camera-equipped smartphone (iPhone SE, 2020; iOS 14 or 15; Apple, Cupertino, CA, USA) with the app installed, together with a phone case adapter compatible with the dermoscope model available at the unit. Before study initiation, on-site instruction and training on how to use the app were given to the PCPs at each unit.
Data evaluation
For each lesion, the app outcome was compared with the final clinical or histopathological tumour diagnosis collected from the patient record. In the clinical setting where the study was performed (i.e. Swedish primary care), the standard routine for skin tumour diagnostics is either excision (at the PCC or surgical clinic) or referral to a dermatologist for further evaluation. In the latter case, if the dermatologist assesses the lesion as undoubtedly benign, excision is generally not performed, owing to the absence of melanoma suspicion at the higher level of clinical expertise, resulting in a clinical diagnosis. If excised, all lesions undergo histopathological analysis, irrespective of who performs the excision. This standard clinical routine for diagnosis was thus also applied for tumour diagnosis in the study.29
Results
Participant recruitment took place from May to December 2022. A sample size of 500 lesions was initially considered necessary to detect at least 12 melanomas and thereby reach sufficient statistical power. Recruitment proceeded more slowly than expected; however, because the number of detected melanomas reached the target more quickly than anticipated, inclusion was terminated when 253 lesions in 228 patients (125 women, 103 men) had been included (Figure 1). The mean age of the overall study population was 54 years (51 years for women and 55 years for men). The clinical ABCDE criteria were the most commonly used assessment tool (applied to 59.8% of all lesions), followed by classic or modified dermoscopic pattern analysis (49.4%); a combination of both was used for 18.7% of lesions, and other assessment methods for 10.1%. Of the 36 participating PCCs, 7 did not include any lesions; among the remaining 29 centres, the number of included lesions varied between 1 and 33. In total, 194 lesions were managed by referral to a dermatologist, 54 by primary excision at the PCC and 5 by referral for excision at a surgical clinic. For 134 lesions, the tumour diagnosis was based on histopathology, while 119 lesions were clinically diagnosed as benign by the dermatologist (Table 1). Overall, 21 melanomas (11 invasive melanomas and 10 melanomas in situ) were detected in 20 patients (Figure S1; see Supporting Information). The invasive melanomas were all of superficial spreading type, exhibited a Breslow thickness ranging from 0.1 to 1.1 mm (median 0.5) and were of histopathological class T1a, with the exception of two lesions (T1b and T2a). The posterior torso was the most common melanoma location.

Figure 1. Flowchart illustrating the study sample. Data are presented as number of lesions (number of patients). *Reasons for exclusion: at monitoring, two lesions proved not to have been suspected of melanoma by the primary care physician (PCP), and for one lesion the photograph taken for app analysis was not dermoscopic.
Table 1. Clinical and (for melanomas) histopathological characteristics of the included lesions
| | Melanoma, invasive | Melanoma in situ | Melanoma, total | Nonmelanoma | All lesions |
|---|---|---|---|---|---|
| Sex (n) | | | | | |
| Female | 8 | 6 | 14 | 126 | 140 |
| Male | 3 | 4 | 7 | 106 | 112 |
| Age (years) | | | | | |
| Mean | 71 | 62 | 66 | 53 | 55 |
| Median (IQR) | 70 (63–76) | 60 (50–76) | 69 (58–76) | 53 (38–68) | 56 (55–69) |
| Range | 57–86 | 39–78 | 39–86 | 20–90 | 20–90 |
| Fitzpatrick skin type (n) | | | | | |
| I | 1 | 0 | 1 | 19 | 20 |
| II | 5 | 7 | 12 | 153 | 165 |
| III | 5 | 3 | 8 | 47 | 55 |
| IV | 0 | 0 | 0 | 13 | 13 |
| Lesion size, clinically assessed (mm) | | | | | |
| Mean | 8.4 | 7.6 | 8.0 | 7.1 | 7.2 |
| Median (IQR) | 5 (4–10) | 6.5 (5–9) | 6 (4.5–9.5) | 6 (4–10) | 6 (4–10) |
| Range | 3–30 | 1–20 | 1–35 | 1–30 | 1–35 |
| Body location (n) | | | | | |
| Face | 0 | 0 | 0 | 16 | 16 |
| Head and neck | 0 | 0 | 0 | 21 | 21 |
| Anterior torso | 3 | 1 | 4 | 50 | 54 |
| Posterior torso | 2 | 4 | 6 | 76 | 82 |
| Lateral torso | 1 | 0 | 1 | 10 | 11 |
| Upper extremities | 3 | 1 | 4 | 20 | 24 |
| Lower extremities | 2 | 4 | 6 | 32 | 38 |
| Palms/soles | 0 | 0 | 0 | 5 | 5 |
| Groin/genital region | 0 | 0 | 0 | 2 | 2 |
| Lesion characteristics | | | | | |
| Melanomas (n = 21): histopathological characteristics | | | | | |
| Breslow thickness (mm) | | | | | |
| Mean | 0.54 | | | | |
| Median (IQR) | 0.5 (0.3–0.7) | | | | |
| Range | 0.2–1.1 | | | | |
| Tumour width (mm) | | | | | |
| Mean | 7.9 | 7.2 | | | |
| Median (IQR) | 6.5 (4–11) | 6.5 (5–9) | | | |
| Range | 3–17 | 4–13 | | | |
| Tumour length (mm) | | | | | |
| Mean | 10.7 | 9.5 | | | |
| Median (IQR) | 9.5 (5–13) | 8 (6–12) | | | |
| Range | 3–30 | 5–19 | | | |
| Nonmelanomas (n = 221): diagnosis, total (based on dermatologist/histopathologic assessment) | | | | | |
| Nondysplastic melanocytic naevus | | | | 66 (40/26) | |
| Congenital melanocytic naevus | | | | 11 (9/2) | |
| Dysplastic melanocytic naevus | | | | 30 (0/30) | |
| Seborrhoeic keratosis | | | | 69 (45/24) | |
| Actinic keratosis | | | | 2 (2/0) | |
| Solar lentigo | | | | 8 (6/2) | |
| Dermatofibroma | | | | 8 (3/5) | |
| Haemangioma | | | | 7 (3/4) | |
| BCC | | | | 11 (2/9) | |
| SCC (incl. in situ) | | | | 4 (0/4) | |
| Other | | | | 13 (7/6) | |
BCC, basal cell carcinoma; IQR, interquartile range; SCC, squamous cell carcinoma.
Diagnostic accuracy of the app
The diagnostic accuracy of the app in differentiating melanoma from nonmelanoma lesions is presented as ROC curves (Figure 2). The AUROC for the app’s capability of differentiating all melanomas from other lesions was 0.960 [95% confidence interval (CI) 0.928–0.980], corresponding to, at best, 95.2% sensitivity and 84.5% specificity, a positive predictive value (PPV) of 35.9% and a negative predictive value (NPV) of 99.5% (Figure 2a). The sensitivity and specificity of the app’s predefined cutoff level applied in the study (marked as blue squares on the curves) were 95.2% and 60.3%, respectively (PPV 17.9%, NPV 99.3%).

Figure 2. Receiver operating characteristic (ROC) curves for agreement between the app outcome and the final diagnosis (a) for all melanomas vs. nonmelanomas and (b) for invasive melanomas vs. all other lesions (including melanoma in situ). The blue squares mark the values for the predefined cutoff level used in the study to communicate the app’s guidance to the user. AUC, area under the curve.
The number needed to excise is commonly used to describe the number of lesions suspected to be melanoma that need to undergo histopathological investigation to detect at least one case (i.e. number of excised lesions/number of melanomas found). As lesion diagnoses in our study were based on either excision or dermatologist assessment, we instead explored the number needed to investigate (NNI), i.e. the number of lesions indicated by the app as ‘evidence of melanoma detected’ per melanoma found. For the app’s best-performing point on the ROC curve, the NNI was 2.8; for the predefined cutoff level, it was 5.5.
Concentrating exclusively on invasive melanomas, the AUROC was 0.988 (95% CI 0.965–0.997) (Figure 2b). The app’s sensitivity and specificity in detecting these were, at best, 100% and 92.6%, respectively (PPV 38.2%, NPV 100%, NNI 5.1).
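As a concrete cross-check, the predefined-cutoff metrics quoted above can be reproduced from the 2 × 2 counts that can be read out of Table 2, and the NNI follows as the reciprocal of the PPV. The snippet below is an illustrative recalculation in Python, not the authors’ analysis code.

```python
# Recomputing the app's predefined-cutoff metrics from the 2x2 counts in
# Table 2 (illustrative check only, not the authors' analysis script).
tp, fn = 20, 1      # melanomas flagged / missed by the app at the cutoff
fp, tn = 92, 140    # nonmelanomas flagged / correctly dismissed

sensitivity = tp / (tp + fn)   # 20/21   -> 95.2%
specificity = tn / (tn + fp)   # 140/232 -> 60.3%
ppv = tp / (tp + fp)           # 20/112  -> 17.9%
npv = tn / (tn + fn)           # 140/141 -> 99.3%

# The NNI is the number of app-positive lesions per melanoma found, i.e. 1/PPV;
# at the best-performing ROC point (PPV 35.9%) this gives the reported 2.8.
nni_best = 1 / 0.359

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
print(f"PPV {ppv:.1%}, NPV {npv:.1%}, NNI (best point) {nni_best:.1f}")
```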
Associations between app guidance and primary care physicians’ degree of melanoma suspicion
Figure 3 illustrates the relationship between the PCPs’ reported degree of melanoma suspicion (high/low), the final diagnosis (melanoma/nonmelanoma) and the guidance communicated to the user on screen (i.e. the app outcome at the predefined cutoff level), displayed from both perspectives. Table 2 shows to what extent the app’s guidance was coherent with the PCPs’ degree of melanoma suspicion (i.e. the app indicating ‘no evidence of melanoma’ when the PCP’s degree of suspicion was low, or ‘evidence of melanoma detected’ when it was high). When the PCPs’ degree of melanoma suspicion was theoretically added to the app guidance (i.e. when the two were in agreement), the combined PPV for predicting melanoma would have increased compared with that of each parameter alone (Table 3), whereas the NPV was clearly lower for the PCPs’ degree of suspicion alone than for the app or the two parameters combined. For the single in situ melanoma missed by the app, the PCP’s degree of suspicion was also low.

Figure 3. Graph showing the relationships between the primary care physicians’ (PCPs') reported degree of melanoma suspicion, final tumour diagnosis and the app’s outcome (at the cutoff level applied in the study to communicate the app’s guidance to the user).
Table 2. Interactions between app guidance (at the predefined cutoff level) and the primary care physician (PCP)-reported degree of melanoma suspicion: distribution of agreement between app guidance and PCP-reported degree of melanoma suspicion, and its association with final diagnosis
| Categories of agreement between PCP degree of melanoma suspicion and app guidance (n = 253 lesions) | n (%) | Final diagnosis | n (%) |
|---|---|---|---|
| PCP degree of suspicion high + app indicates ‘Evidence of melanoma detected’ | 38 (15.0) | Melanoma | 12 (4.7) |
| | | Nonmelanoma | 26 (10.3) |
| PCP degree of suspicion high + app indicates ‘No evidence of melanoma detected’ | 13 (5.1) | Melanoma | 0 (0) |
| | | Nonmelanoma | 13 (5.1) |
| PCP degree of suspicion low + app indicates ‘Evidence of melanoma detected’ | 74 (29.2) | Melanoma | 8 (3.2) |
| | | Nonmelanoma | 66 (26.1) |
| PCP degree of suspicion low + app indicates ‘No evidence of melanoma detected’ | 128 (50.6) | Melanoma | 1 (0.4) |
| | | Nonmelanoma | 127 (50.2) |
Data presented as n (%).
Table 3. Interactions between app guidance (at the predefined cutoff level) and primary care physician (PCP)-reported degree of melanoma suspicion: effect on the positive (PPV) and negative predictive value (NPV) of adding the physician’s degree of melanoma suspicion to the app’s guidance, when congruent with each other (i.e. both methods agree on whether the lesion is suspicious for melanoma or not)
| Lesion assessment method | PPV (%) | NPV (%) |
|---|---|---|
| App outcome alone | 17.9 | 99.3 |
| PCPs’ degree of melanoma suspicion alone | 23.5 | 95.5 |
| App outcome when combined with PCPs’ degree of melanoma suspicion | 31.6 | 99.2 |
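As a check on how the combined figures arise, they can be reproduced from the counts in Table 2: when the PCP suspicion was high and the app flagged the lesion, 12 of 38 lesions were melanomas (PPV 12/38 = 31.6%); when both were negative, 127 of 128 lesions were nonmelanoma (NPV 127/128 = 99.2%). For the PCPs’ suspicion alone, 12 of 51 high-suspicion lesions were melanomas (PPV 23.5%) and 193 of 202 low-suspicion lesions were nonmelanoma (NPV 95.5%).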
When asked how the app’s outcome would have affected their degree of melanoma suspicion had they been allowed to let it influence their clinical management, the PCPs reported that their suspicion would have increased for 61 lesions (24.1%), decreased for 56 lesions (22.1%) and remained unchanged for 136 lesions (53.8%).
Table 4 presents the associations between different lesion variables, including the app’s guidance, and the final diagnosis. As shown, the app’s guidance was also the strongest predictor of melanoma when adjusted for the clinical variables lesion size, patient age and sex.
Table 4. Odds ratios (ORs) for the app’s guidance (at the predefined cutoff level) and primary care physician (PCP) degree of melanoma suspicion in predicting a melanoma diagnosis, analysed with binary logistic regression analysis (enter method) and adjusted for lesion size, patient age and sex
| | OR | P-value | 95% CI |
|---|---|---|---|
| App outcome (evidence of melanoma detected)a | 26.55 | 0.002 | 3.29–213.96 |
| PCPs’ degree of melanoma suspicion (high)a | 3.35 | 0.02 | 1.19–9.44 |
| Lesion size (large)b | 0.98 | 0.56 | 0.89–1.06 |
| Age (high)b | 1.06 | 0.002 | 1.02–1.11 |
| Sex (female)a | 1.87 | 0.25 | 0.64–5.47 |
CI, confidence interval. aCategorical variable; bcontinuous variable.
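For readers who want to reproduce this kind of analysis, the sketch below shows an adjusted binary logistic regression of the form summarized in Table 4, implemented with statsmodels on simulated placeholder data. The data frame, variable names and simulated effect sizes are hypothetical and are not the study data or the authors’ code.

```python
# Minimal sketch of an adjusted logistic regression as in Table 4
# (simulated placeholder data; not the study dataset or the authors' code).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 253
df = pd.DataFrame({
    "app_positive": rng.integers(0, 2, n),        # app: 'evidence of melanoma detected'
    "pcp_high_suspicion": rng.integers(0, 2, n),  # PCP degree of suspicion: high
    "lesion_size_mm": rng.normal(7, 3, n).round(1),
    "age": rng.integers(20, 91, n),
    "female": rng.integers(0, 2, n),
})
# Simulated outcome loosely linked to the app result and age, for illustration only
linpred = -4.0 + 2.5 * df["app_positive"] + 0.03 * df["age"]
df["melanoma"] = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

model = smf.logit(
    "melanoma ~ app_positive + pcp_high_suspicion + lesion_size_mm + age + female",
    data=df,
).fit()

print(np.exp(model.params))      # adjusted odds ratios
print(np.exp(model.conf_int()))  # 95% confidence intervals for the ORs
```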
Discussion
Diagnosing melanoma in primary care is a challenging task. In this trial we investigated the diagnostic performance of an AI-based clinical decision support tool to detect or dismiss melanoma, when used by PCPs on skin lesions of concern. The results showed that – based on the app’s high NPV (100% for invasive melanomas and 99.5% for all melanomas) – a large proportion of benign lesions excised or referred to a dermatologist could have been declared as benign at the primary care level if the app’s outcome had been applied, without increasing the risk of missing a melanoma. As illustrated in Figure 3, as many as 140 (127 + 13; 55.3%) of the 253 lesions either excised or referred to a dermatologist may not have needed further assessment if the PCP had followed the app’s guidance. In turn, this could reduce the demand for dermatologist and pathologist assessment, thereby increasing accessibility to secondary care for patients who do, in fact, present with a melanoma. It could also reduce inequities in melanoma diagnostics due to variation in diagnostic skills among PCPs, which is likely to be present. Notably, only one melanoma in situ was missed by the app; however, this lesion presented with an unremarkable dermoscopic appearance (Figure S1, lesion #16).
The app was applied strictly to lesions with some degree of melanoma suspicion. We believe this enhances the significance of the study’s outcome, as this scenario corresponds to the real-life setting in primary care. Although it is known from previous studies that some melanomas are initially missed by standard clinical assessment (in primary care, as well as by dermatologists),5,7,30 the solution is not to apply this type of app indiscriminately to all of a patient’s skin lesions, as this could lead to an unmanageable number of false-positive results. However, the app’s high NPV and ease of use might promote its use on lesions with a lower degree of melanoma suspicion that might best be managed with lesion monitoring.
Despite a large number of studies on the ability of AI algorithms to recognize skin cancer from dermoscopic images,12–24 surprisingly few have a prospective study design.20–24 Most algorithms have been tested on varying numbers of already sampled images, often comparing AI performance with that of a group of clinicians (e.g. dermatologists).12–15,17–19 However, such an approach has important limitations. Firstly, in clinical reality the examining physician (dermatologist or PCP) considers not only the dermoscopic image, but also other important information such as personal or family history, the evolution of the lesion and its appearance in comparison with the patient’s other lesions. Secondly, any sampled series of images will inevitably suffer from some degree of selection bias and will not be fully representative of how, and on which lesions, the AI device would be used in practice. Finally, the diagnostic label applied to a lesion in a simulated situation does not necessarily correspond to how it would be handled in clinical practice (e.g. how convinced the doctor needs to be that a lesion is benign in order not to excise it just for safety’s sake).23
The four previous prospective studies conducted in dermatology clinics reported an AUROC of 0.76–0.94 for detecting melanoma, and a melanoma prevalence rate of 48%, 23%, 56% and 16%, respectively.20–23 In our study, the AUROC was 0.96, despite a melanoma prevalence rate of only 8% among the investigated lesions. The proportion of melanomas of histopathological class pT2 or higher in our study was low, which might be considered a limitation. In comparison, Phillips et al. included 31% pT2–pT4 lesions.21 However, thin melanomas are not only the most common, but also often the most difficult to distinguish from benign melanocytic lesions on dermoscopic examination. Moreover, they are important to detect before they eventually become thicker and worsen the prognosis.31 Finally, if a diagnostic aid is to be implemented in a specific clinical setting, a key element in its validation process is to evaluate it adequately in that particular setting. For example, the lower specificity for the predefined cutoff level vs. that of the in silico material (60.3% vs. 78%), despite the overall higher AUROC value, is likely to reflect this.
The choice to use both histopathology and dermatologist assessment as the diagnostic reference mirrors the standard diagnostic routine for skin tumours assessed in Swedish primary care. Consequently, not all included lesions diagnosed as benign were histopathologically confirmed as such. Excision of this subset of lesions solely for the purpose of the study was not considered ethically justifiable (with regard to patient discomfort and unnecessary scarring) and could also be associated with a risk of dropouts and selection bias due to patients potentially declining surgical excision. We believe that by choosing this approach the results are as close to a true clinical situation as possible, strengthening the study’s generalizability.
In the present study design, the PCPs were instructed not to let the app guidance affect their clinical management. This was because the evidence for the app’s diagnostic capacity and reliability was, at that point, based solely on the in silico, pretrial retrospective training. However, considering the favourable outcome of the study, the next step should be to proceed with a randomized study design, evaluating the app when it is actually being used to guide the PCP in the diagnostic process and comparing this with ordinary clinical routine. How physicians relate, emotionally and intellectually, to advice given by an app when diagnosing a condition as serious as cancer has rarely been explored, but the presence of evidence-based knowledge supporting the app’s reliability has emerged as a factor that strengthens such reliance.26,32–34
Another limitation of the study was the varying inclusion rate among the participating PCCs, with a few centres not contributing any lesions at all. This reflects one of the challenges of performing prospective real-life clinical trials in primary care, which is affected by varying degrees of time pressure, staff shortages and heavy workloads. However, it also illustrates the actual situation and circumstances of daily primary care practice, possibly contributing to the authenticity of the study. It further emphasizes the importance of developing novel, practical and useful diagnostic tools that could help to reduce unnecessary work tasks (e.g. unnecessary excisions of benign skin lesions), not least since understaffing in primary care settings is reported to be associated with an increased risk of missing melanomas at examination.32 A minor study limitation is the absence of data on PCP characteristics, such as clinical experience or workplace staffing conditions, which may be of relevance for the interpretation of the study results. Of note, the dominance of individuals with Fitzpatrick skin types I–II in the study population (representative of the general population in Sweden), as well as the exclusion of patients with Fitzpatrick skin types V–VI, limits the applicability of the study results to other populations.
In conclusion, the AI-based decision support tool for melanoma detection evaluated in this study appears to be clinically reliable and of potential clinical benefit in the management of skin lesions of concern assessed in primary care, and can improve the identification of lesions in need of dermatological or histopathological assessment. Further research, preferably with a randomized study design, is warranted to determine the tool’s actual usefulness and diagnostic safety over time.
Acknowledgements
Aigora GmbH provided image data for the in silico training of the convolutional neural network used in the clinical decision support prior to the clinical trial.
Funding sources
The study was funded by grants from Region Östergötland, Sweden, and the AIDA network (MedTech4Health).
Data availability
The data underlying this article will be shared upon reasonable request to the corresponding author, with the exception of lesion images other than those presented in Figure S1.
Ethics statement
The clinical trial was approved by the Swedish Ethical Review Authority (approval number Dnr. 2022-00895-01) and by the Swedish Medical Products Agency (CIV-21-12-038346).
Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s website.
References
Author notes
Conflicts of interest P.P. is a co-founder of the clinical decision support (Dermalyser®) studied in the clinical trial. The other authors declare no conflicts of interest.