Quality of reporting of robot-assisted cholecystectomy in relation to the IDEAL recommendations: systematic review

Abstract
Introduction: Robotic cholecystectomy (RC) is a recent innovation in minimally invasive gallbladder surgery. The IDEAL (idea, development, exploration, assessment, long-term study) framework aims to provide a safe method for evaluating innovative procedures. This study aimed to understand how RC was introduced, in accordance with IDEAL guidelines.
Methods: Systematic searches were used to identify studies reporting RC. Eligible studies were classified according to IDEAL stage and data were collected on general study characteristics, patient selection, governance procedures, surgeon/centre expertise, and outcome reporting.
Results: Of 1425 abstracts screened, 90 studies were included (5 case reports, 38 case series, 44 non-randomized comparative studies, and 3 randomized clinical trials). Sixty-four were single-centre and 15 were prospective. No authors described their work in the context of IDEAL. One study was classified as IDEAL stage 1, 43 as IDEAL 2a, 43 as IDEAL 2b, and three as IDEAL 3. Sixty-four and 51 provided inclusion and exclusion criteria respectively. Ethical approval was reported in 51 and conflicts of interest in 34. Only 21 reported provision of training for surgeons in RC. A total of 864 outcomes were reported; 198 were used in only one study. Only 30 reported a follow-up interval which, in 13, was 1 month or less.
Conclusion: The IDEAL framework was not followed during the adoption of RC. Few studies were conducted within a research setting, many were retrospective, and outcomes were heterogeneous. There is a need to implement appropriate tools to facilitate the incremental evaluation and reporting of surgical innovation.


Introduction
Approximately 70 000 cholecystectomies are undertaken each year in England at a cost of around £111 million 1 . More than 90 per cent of these are performed using laparoscopic techniques 1 . Laparoscopic cholecystectomy (LC) typically results in less postoperative pain, faster recovery, improved cosmesis, and a shorter hospital stay compared with open surgery 2 . Single-incision laparoscopic cholecystectomy (SILC) was developed in 2010 3 in an attempt to further improve cosmesis and decrease postoperative pain 4 ; however, ergonomic limitations and a lack of clear clinical benefit have hindered its adoption into routine practice [5][6][7] .
Robotic cholecystectomy (RC) is the most recent technological innovation for minimally invasive gallbladder surgery. It is performed through single or multiple small incisions, by an operating surgeon seated at a console away from the sterile field 8 . RC has perceived benefits, including enhanced tactile feedback, reduced musculoskeletal strain on the surgeon, better exposure, easier manipulation of the instruments, high-definition three-dimensional visualization, and fewer instrument collisions 5,8 . Due to these purported advantages, it is becoming increasingly popular; in the USA, rates of RC increased from 0.02 per cent of all cholecystectomies performed in 2008 to 3.2 per cent in 2017 9 . This increase may in part be due to surgeons using RC as a means of developing their robotic skills for more complex operations 10 ; however, convincing evidence of clinical benefit over conventional laparoscopic methods has not been forthcoming 11,12 . The disparity between adoption of new techniques and robust evaluation has been observed in other areas of gastrointestinal surgery 13 , leading to calls for tighter regulation of the field 14 . Although surgical robots are considered devices and subject to regulatory approval, there is currently no requirement for individual procedures such as RC to undergo robust clinical evaluation before implementation in clinical practice.
The idea, development, exploration, assessment, long-term follow-up (IDEAL) framework was developed in 2009 and updated in 2019 15 , to provide a stepwise approach for the evaluation and reporting of innovative surgical procedures (Table 1).
Specific recommendations include details about patient selection, governance measures, surgeon expertise, and standardized outcome reporting, all of which are critical to the safe introduction of new surgical procedures. By providing a stepwise framework to report the evolution of innovations, IDEAL seeks to facilitate incremental learning 17 , whereby researchers build on previous reports and add value to the existing evidence base. It is presently unclear whether this process occurred during the adoption and evaluation of RC.
The aims of this study are to understand how RC has been adopted into clinical practice, and to establish whether the evaluation and reporting of RC occurred in accordance with IDEAL guidelines.

Methods
The methods are based on a previously published protocol that aimed to investigate the introduction of a robotic procedure for diseases of the oropharynx 5 . Reporting was conducted in line with PRISMA 2020 guidelines 18 (Tables S1 and S2).

Search strategy and study selection
Searches were undertaken in MEDLINE, Embase, Cochrane Library, and Web of Science databases, from inception to February 2020. Searches consisted of subject headings and text words, combining terms for 'robotic surgery' with 'cholecystectomy' using the Boolean operator 'AND' (Table S3).
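The way the two concepts were combined can be illustrated with a minimal sketch. The term lists below are hypothetical placeholders chosen for illustration only; the actual subject headings and text words are those given in Table S3. The sketch assumes the common convention of joining synonyms within a concept with 'OR' and joining the two concepts with 'AND'.

```python
def or_block(terms):
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(robot_terms, procedure_terms):
    """Combine the two concept blocks with the Boolean operator AND."""
    return or_block(robot_terms) + " AND " + or_block(procedure_terms)


# Hypothetical example terms; not the published search strategy.
robot_terms = ["robotic surgery", "robot*"]
procedure_terms = ["cholecystectomy"]

query = build_query(robot_terms, procedure_terms)
print(query)
```

Exact syntax (truncation symbols, field tags, subject-heading notation) differs between databases, so a string of this kind would need adapting for each one.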

Study eligibility
Searches were limited to studies of adults aged 18 years or older and written in English. All primary research study designs (such as case reports, case series, and comparative studies) were eligible for inclusion. Presentations and conference abstracts were excluded because of the high probability of incomplete data. Also excluded were studies where the main focus was not the surgical procedure (such as anaesthesia, perioperative physiotherapy, or nutrition); studies describing indications for cholecystectomy other than cholelithiasis or polyps (such as cancer); studies where a combination of robotic procedures was described (such as when results of RC were reported alongside other robotic procedures and could not be separated); and studies investigating robotic camera holders rather than RC itself.

Identification and selection of papers
Search results were de-duplicated and uploaded to Rayyan software (Rayyan, a web and mobile app for systematic reviews) 19 . Titles and abstracts were screened independently by at least two authors. The full-text versions of papers retained after title and abstract screening were further assessed for eligibility. Disagreements were first discussed between the reviewers, and any unresolved conflicts referred to the senior authors (N.B. and S.P.); the final decision was the majority opinion. Data from full-text papers were extracted independently by at least two assessors.
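The screening decision rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the function name and vote labels are hypothetical, the discussion step between reviewers is not modelled, and the majority is assumed to be taken over all reviewer and senior-author votes combined.

```python
from collections import Counter


def resolve(reviewer_votes, senior_votes=None):
    """Return 'include', 'exclude', or 'unresolved' for one record.

    reviewer_votes: independent decisions from at least two reviewers.
    senior_votes: decisions from senior authors, consulted only on conflict.
    """
    # Agreement between the independent reviewers settles the record.
    if len(set(reviewer_votes)) == 1:
        return reviewer_votes[0]
    # A conflict with no senior input stays open.
    if not senior_votes:
        return "unresolved"
    # Assumption: majority opinion across all votes cast.
    tally = Counter(list(reviewer_votes) + list(senior_votes))
    (top, top_n), *rest = tally.most_common()
    if rest and rest[0][1] == top_n:
        return "unresolved"  # tie, no majority
    return top
```

For example, `resolve(["include", "exclude"], ["exclude", "exclude"])` returns `"exclude"`, whereas a two-against-two split remains `"unresolved"`.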

Data collection
Data collection was based on IDEAL recommendations and included information about general study characteristics, patient selection, regulatory and governance arrangements, centre and operator expertise, and outcome reporting 13,20 .

General study characteristics and identification of IDEAL stage
The study design, year, and journal of publication, country of origin, and number of participating centres and patients were extracted. The presence and nature of comparison interventions and the type of robotic device used in each study were documented.
Where authors reported an IDEAL stage, it was recorded. Where this information was not provided, a flow diagram designed by the IDEAL Collaboration was used to establish the IDEAL stage 21 . Any difficulties assigning IDEAL stages to papers were recorded. Risk of bias assessments were undertaken for randomized clinical trials (RCTs) using the revised Cochrane Risk of Bias tool 22 .
Any reported rationale for why the study was undertaken was documented in the following categories: assessment of safety and efficacy; support for regulatory approval (such as the Medicines and Healthcare Products Regulatory Agency); description of technique; evaluation of learning curves; description of a centre's experience; prediction of patient outcomes; and/or 'other'.

Patient selection
Inclusion and exclusion criteria for patients undergoing RC were documented for each study. The number of patients declining RC was recorded, along with any stated reasons.

Regulatory and governance arrangements
The reporting of conflicts of interest, study funding and governance approvals (such as ethics committees, institutional review boards, or clinical effectiveness committees) was collected. Statements relating to patient consent, and whether patients were specifically informed of the innovative nature of RC, or of modifications made to the surgical technique, were recorded.

Centre and surgeon expertise
Information about centre expertise, such as the volume of robotic and non-robotic cholecystectomies undertaken at the institution(s), was recorded. Information about the number of surgeons performing the operation, and the expertise of those surgeons was also extracted, including their grade and experience with RC, and any details of specific training and mentorship in RC.

Outcome selection, measurement, and reporting
Outcomes reported in each manuscript were recorded verbatim and categorized into domains by two researchers (E.K. and C.S.J.; Table S4). To determine the number of distinct outcomes, those with the same meaning but different wording were rationalized within each domain. Where reported, the duration of follow-up for each study was documented.
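The rationalization step can be illustrated with a minimal sketch. The records and the synonym map below are invented for illustration and are not data from this review; in practice the mapping was a judgement made by two researchers, and grouping into domains is omitted here for brevity.

```python
# Hypothetical (study, verbatim outcome) pairs; not data from the review.
records = [
    ("A", "Operating time"),
    ("B", "Duration of surgery"),
    ("B", "Conversion to open"),
    ("C", "Operative time"),
    ("C", "Incisional hernia"),
]

# Hypothetical map from alternative wordings to one canonical label.
canonical = {
    "operating time": "operative time",
    "duration of surgery": "operative time",
}


def distinct_outcomes(records, canonical):
    """Map each rationalized outcome to the set of studies reporting it."""
    usage = {}
    for study, verbatim in records:
        label = verbatim.strip().lower()
        label = canonical.get(label, label)
        usage.setdefault(label, set()).add(study)
    return usage


usage = distinct_outcomes(records, canonical)
# Outcomes reported by exactly one study.
single_use = sorted(o for o, studies in usage.items() if len(studies) == 1)
print(len(usage), single_use)
```

Counting studies per rationalized outcome in this way is what allows outcomes used in only one study to be identified.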

Data synthesis
Results were summarized in a narrative synthesis, with descriptive statistics where appropriate. The study did not aim to investigate the effectiveness of RC; meta-analyses were therefore not performed. To evaluate whether studies' rationale and outcomes evolved over time, data were presented by IDEAL stage.
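Presenting data by IDEAL stage amounts to a simple cross-tabulation, sketched below. The study records, field names, and rationale labels are hypothetical and serve only to show the shape of the tabulation.

```python
from collections import Counter, defaultdict

# Hypothetical extracted records; not data from the review.
studies = [
    {"id": "S1", "ideal_stage": "1", "rationale": "description of technique"},
    {"id": "S2", "ideal_stage": "2a", "rationale": "safety and efficacy"},
    {"id": "S3", "ideal_stage": "2a", "rationale": "centre experience"},
    {"id": "S4", "ideal_stage": "2b", "rationale": "safety and efficacy"},
    {"id": "S5", "ideal_stage": "3", "rationale": "safety and efficacy"},
]


def tabulate_by_stage(studies):
    """Count how often each rationale appears within each IDEAL stage."""
    table = defaultdict(Counter)
    for study in studies:
        table[study["ideal_stage"]][study["rationale"]] += 1
    return table


table = tabulate_by_stage(studies)
for stage in sorted(table):
    print(stage, dict(table[stage]))
```

If rationale evolved with advancing stage, the dominant categories would be expected to shift from one row of such a table to the next.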

Results
Of 1425 abstracts and 303 full-text articles screened, a total of 90 articles, published between 2001 and 2020, were included (Fig. 1).

General study characteristics
Among the 90 studies there were five case reports, 38 case series, 44 non-randomized comparative studies, and three RCTs. Most studies were single-centre (n = 64) and only 15 were prospective (Table 2). All three RCTs compared RC with LC, and were published in 2014 23 (single-centre, n = 22), 2015 24 (single-centre, n = 60), and 2017 25 (multicentre, n = 136; Table S5). The risk of bias was unclear in two 23,25 , and in one 24 there was a large (more than 20 per cent) loss to follow-up. The most commonly used robots were Da Vinci systems (Intuitive Surgical, California, USA; n = 66). Seventeen studies provided no description of the system used.
No studies reported an IDEAL stage. The first study (a case series of 20 patients published in 2001) was considered to be IDEAL stage 1. Forty-three studies were identified as IDEAL 2a, 43 as IDEAL 2b, and three as IDEAL 3 (the RCTs), with no studies meeting the criteria for IDEAL stage 4. We experienced difficulties assigning IDEAL stages to many of the included papers. Overall, 49 studies were retrospective in nature and therefore did not strictly meet the IDEAL criteria; a further problem was the lack of detail about technique description or modifications, which made it difficult to differentiate between stage 2a and 2b. Although two studies undertook data analysis from large databases, they only included information about short-term adverse events and, as such, did not meet the criteria for IDEAL stage 4. Although the number of IDEAL 2b studies has increased over time, only three were conducted prospectively. IDEAL stage 2a studies are still being conducted, despite the fact that the first RCT was published in 2014. There is, therefore, minimal evidence of evolution of study design as per the IDEAL recommendations (Fig. 2).
Of the 90 studies, 73 reported a rationale. Most commonly, this was to assess safety, efficacy, and adverse events (n = 38). Others included descriptions of a centre's experience (n = 18), prediction of outcomes (n = 13), evaluation of the learning curve (n = 11), and/or descriptions of the surgical technique (n = 7). There was no correlation between study rationale and IDEAL stage (the rationale did not evolve despite advancing IDEAL stage; Table 3).

Patient selection
Sixty-four and 51 studies provided inclusion and exclusion criteria respectively (Tables 4 and 5). Eight studies reported that there were no exclusion criteria. A total of 15 studies described how patients were selected for robotic surgery over conventional approaches: availability of the robot (n = 8), surgeon's discretion (n = 4), willingness to pay (n = 1), the time interval of recruitment (before and after the robot became available, n = 1), and one study stated that there were no formal selection criteria. No studies specifically commented on the number of patients declining RC.

Regulatory and governance arrangements
Ethical approval was reported in 51 of the 90 studies (institutional review boards, n = 41 and ethics committee, n = 10) and four reported registration within a trials register (ClinicalTrials.gov, n = 3 and Australian New Zealand Clinical Trials Registry, n = 1). Conflicts of interest were common, with 11 studies funded by the robot manufacturer and a further 23 reporting conflicts of interest between the author(s) and the manufacturer.
Although patient consent for study participation was explicitly documented in 42 studies, just four stated that patients were informed of the innovative nature of RC 10,26-28 . Of the 10 studies reporting modifications to the robotic technique during the study, none reported that patients were informed of this.

Centre and operator expertise
Four studies defined the participating centres' usual caseload for RC (range 50-500 per year). The number of surgeons performing robotic surgery was reported in 51 studies (median 2, range 1-42). The grade of operating surgeon(s) was reported in 12 studies (consultant/attending, n = 2 and mixed trainee and consultant, n = 10). Provision of training in RC was reported in 21 studies, mostly consisting of animal-based (n = 12), simulation (n = 10), and dry laboratory (n = 6) training (Table 6). Proctorship and dual-consultant operating were each reported in four studies.

Discussion
This comprehensive review of the reporting of the adoption of RC summarizes information from 90 studies published between 2001 and 2020. The current evidence base for RC is formed largely by retrospective observational studies from single centres. Although three RCTs were identified, they were small and poorly designed. Most studies aimed to assess the safety of RC, with little of the evolution of study rationale or design that would be expected based on synthesis of preceding evidence. Details of regulatory and governance arrangements were infrequently reported, and conflicts of interest were common. Selection criteria were inconsistently reported, limiting understanding of which patients were offered the new procedure and why. Provision of training in RC was poorly reported, with only four studies reporting any ongoing monitoring or proctorship. Outcome selection and reporting were heterogeneous, with 198 of the outcomes used just once. This review highlights that RC has been adopted into clinical practice without adequate comparative or prospective evidence and outside the parameters of the IDEAL recommendations. This means that uncertainties about the efficacy, effectiveness, and cost-effectiveness of RC remain, which has inherent risks for clinical practice. More rigorous methods for evaluation of surgical innovation are therefore recommended.
Two meta-analyses comparing RC and LC have been undertaken. The first (2016) included one RCT and 12 observational studies. The second (2017) included five RCTs (two of which were outside the inclusion criteria for our review) and 21 observational studies 11,12 . Neither identified any significant difference in complications, readmission rates, or hospital stay, although operating time and the incidence of postoperative incisional hernia were higher after RC 11 ; however, these meta-analyses were based primarily on retrospective observational studies and therefore must be interpreted with caution due to the presence of confounders, selection bias, and differences in study design 22 . Both studies highlighted the issue of heterogeneous outcomes, which reduced the number of studies available for meta-analysis. This finding is consistent with our own study and illustrates how heterogeneous outcomes can impair evidence synthesis [49][50][51] . The COMET (Core Outcome Measures in Effectiveness Trials) Initiative 52 recommends the development of core outcome sets (an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population) 53 with an expectation that core outcomes will be collected and reported, making it easier for the results of studies to be compared, contrasted, and combined as appropriate 52,54 . Core outcome sets are increasingly mandated by journals before publication; streamlining the outcomes reported in robotic surgery would enable the efficacy and effectiveness of robotic procedures to be clearly detailed, subsequently optimizing transparency, maximizing patient benefit, and reducing harms.
To our knowledge, this review represents the first in-depth case study to summarize published evidence of how a robot-assisted procedure was adopted into clinical practice. Although the inclusion of all study types allowed a comprehensive review of the evidence base for RC, this study has some limitations. First, the exclusion of non-English language papers may have resulted in some relevant papers being missed. Second, reporting standards and expectations change with time; 19 of the included studies were published before the introduction of the IDEAL framework in 2009 and benchmarking such studies against these criteria may be considered unfair, although the principles underpinning IDEAL represent the foundations of evidence-based surgery. A third limitation is that the IDEAL Collaboration's flow chart for determining stage of innovation was challenging to use because most papers did not provide information about technique descriptions or modifications, creating difficulties in distinguishing between 2a and 2b studies. Furthermore, many of the studies were difficult to classify given their retrospective nature; however, aside from the temporality of the study, other criteria to classify the IDEAL stage were met and they were therefore assigned stages while acknowledging this limitation. Retrospective categorization of studies to IDEAL stages has been recorded in the literature in line with this 55 . It is widely recognized that the quality of surgical research still needs to improve, including by reducing the heavy reliance on retrospective study designs, given their inherent limitations.

Disclosure
The views expressed are those of the authors and not necessarily those of the UK National Health Service, NIHR, or Department of Health. J.B. is an NIHR Senior Investigator. N.B. is a Medical Research Council Clinician Scientist. This work was not preregistered with an analysis plan in an independent, institutional registry.

Supplementary material
Supplementary material is available at BJS Open online.

Data availability
We are willing to make our data, analytic methods, and study materials available to other researchers on request.