CNV-ClinViewer: enhancing the clinical interpretation of large copy-number variants online

Abstract Motivation Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype–phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the integration and analysis of information from multiple scattered sources by experts. Results Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly designed interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Subsequently, the CNV-ClinViewer enhances for clinical investigators’ patient care and for basic scientists’ translational genomic research. Availability and implementation The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.


Introduction
One of the reasons for genetic disorders is copy-number changes of one or more genes, resulting from deletions, duplications, or other genomic rearrangements. Most diseaseassociated CNVs are unique, cause severe complex disorders, and are often hard to distinguish from benign CNVs frequently found in the general population (Riggs et al. 2020).
Although the average size of pathogenic/likely pathogenic CNVs in ClinVar (Landrum et al. 2018) of 9.7 Mb (95% CI 9.3-10.1 Mb, median ¼ 2.6 Mb) is likely overestimated due to frequent imprecise discovery of CNVs with arrays, diseaseassociated CNVs typically cover tens to hundreds of genes. Thus, pinpointing the underlying disease driver and modifier genes represents a major challenge. Given the complexity of CNV interpretation, the American College of Medical Genetics and Genomics (ACMG) has developed technical standards (a quantitative, evidence-based scoring framework) to standardize the evaluation process of the genomic content of a CNV region, and to promote consistency and transparency in classification and reporting across clinical laboratories (Riggs et al. 2020). The application of those standards involves clinical and genetic CNV data that are scattered across registries, databases, and the literature. Therefore, it is difficult and highly time-consuming to annotate, analyze, and interpret CNVs manually in a clinical setting.
Several bioinformatics tools, often fully-automated, have been developed for clinicians and researchers to facilitate large-scale CNV classification according to the ACMG guidelines (for example see (Geoffroy et al. 2018, Gurbich and Ilinsky 2020, Requena et al. 2021). However, no algorithm or even classification framework can perfectly capture data, i.e. usually only available to experts, such as deep clinical, CNV, or gene-level information. This specifically affects data that cannot be extracted from public online resources, such as detailed information about the phenotype, including family history, CNV inheritance pattern, and many other criteria currently not integrated in the clinical significance classification. For the semi-automated tools (Fan et al. 2021), the extracted and manually entered information is not explorable within the same web-application interface and comparative visual inspection of CNVs with various genomic and clinical data sources is not possible because current tools do not provide such functionality (Geoffroy et al. 2018, Gurbich and Ilinsky 2020, Fan et al. 2021, Requena et al. 2021). While the current focus of those existing tools is on clinical significance classification, they are not designed to perform ad-hoc evaluations of new data such as genotype-phenotype analyses to expose undiscovered patterns of CNV localization in patient cohorts. To show a correlation of patient CNVs with known disease genes at a specific locus or to narrow down dosagesensitive genes that are likely driver genes in particular CNVs, the interactive visual inspection of affected genomic content combined with crosslinked data from various sources can enhance the interpretational analysis of CNVs. Paradoxically, the individuals who collect the most relevant clinical information for CNV clinical significance classification and interpretation, such as genetic counselors, treating physicians, and patients' families, often lack the required bioinformatical expertise to examine several patient CNVs simultaneously and integrate information from multiple scattered sources timeefficiently. To overcome current limitations for biomedical CNV interpretation of those individuals and researchers, we developed the CNV-ClinViewer, a fully open-source exploration and interpretation platform that integrates analysis, annotation, guideline based, classification and clinical evaluation of CNVs and provides users, even families with CNVs of uncertain significance, with an interface to curate, visualize, interact with, and continuously re-evaluate the CNV data.

Materials and methods
Details about the data sources used in the CNV-ClinViewer can be found in Supplementary Table S1.

Genetic CNV variants and regions datasets
CNVs with annotated clinical significance were obtained from ClinVar (Landrum et al. 2018) (updated quarterly), and CNVs identified in the general population were obtained from the UK biobank (Aguirre et al. 2019) as well as from gnomAD (Collins et al. 2020) (controls-only dataset, version 2.1). GnomAD CNVs were reduced to those with a filter equal to PASS, and SVTYPE equivalent to "DEL" or "DUP", and all CNVs from the general population were filtered for large CNVs >50 kb. Specifically, we collected 52 344 CNVs from ClinVar (10 654 pathogenic/likely pathogenic CNVs), 195 916 CNVs from the UK-biobank (>50 kb), and 10 370 CNVs from gnomAD (>50 kb). The genome coordinates of CNVs from the UK biobank and gnomAD were converted to GRCh38 using the UCSC LiftOver tool (Kent et al. 2002, Riggs et al. 2020.
To annotate neutral as well as clinically relevant regions by use of a visual summary of the variants, we further processed the variants by counting the number of deletions or duplications in the different datasets using a sliding window approach with a window size of 200 kb and step size of 100 kb. For the cohort-level data from the UK-biobank and gnomAD, we divided the counts by the number of samples in the cohort to retrieve region-specific allele frequencies.
Known CNV syndromes and genomic regions with evaluated dosage sensitivity were obtained from DECIPHER (Firth et al. 2009) and ClinGen (Rehm et al. 2015) (updated quarterly) databases.

Gene-level annotations
We retrieved a list of 18 792 protein-coding gene symbols from the HUGO Gene Nomenclature Committee (HGNC) (Tweedie et al. 2021). These genes were annotated by genomic boundaries of one transcript from RefSeq (O'Leary et al. 2016) (GRCh37/hg19 and GRCh38/hg38), either the MANE select transcript or alternatively the longest transcript. In addition, we annotated gene-level scores for sequence constraint and dosage sensitivity in humans (Table 1) and gene-disease associations from ClinGen (Rehm et al. 2015) (updated quarterly).

CNV-ClinViewer web server development
The CNV-ClinViewer was developed with the Shiny framework of R studio software (v.1.7.1, https://shiny.rstu dio.com/) which transforms regular R code into an interactive environment that can follow and "react" to remote-user instructions. The pre-processed data alongside the R/Shiny code was uploaded as a stand-alone Ubuntu image with Google Cloud services. The image was deployed into a Google Virtual Machine (VM) using the google ComputeEngineR package (v.0.3.0, https://github.com/clou dyr/googleComputeEngineR). The CNV-ClinViewer web server (https://cnv-ClinViewer.broadinstitute.org/) is compatible with all commonly used Internet browsers.
The CNV-ClinViewer is an open-source project, and its code will continue to grow and improve through version control in the GitHub repository (https://github.com/LalResearch Group/CNV-clinviewer).

Results
We present the CNV-ClinViewer, a user-friendly web application that semi-automatedly enhances the ACMG guidelinebased CNV classification (Kuleshov et al. 2016) for caregivers and facilitates the biomedical interpretation of gene content at a locus of interest for scientists. Development, workflow, and main features are presented in Fig. 1.

Upload of one up to thousands of CNVs for clinical interpretation and genetic reports
The CNV-ClinViewer allows simultaneous analysis of single or multiple CNVs, irrespective of the technology used to identify them. It requires as input the genomic coordinates of CNVs based on the human reference genome GRCh37/hg19 (https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_00000 1405.13/) or GRCh38/hg38 (https://www.ncbi.nlm.nih.gov/ data-hub/genome/GCF_000001405.40/), either by copypasting or by uploading a tab-separated text or Excel file (max. number of CNVs ¼ 10 000). Minimal required information for each CNV, including whole chromosome trisomies and monosomies, is the chromosome, start, end, and CNV type (deletion or duplication). Optionally, the user can provide sample IDs, phenotype information for filtering and a score of manually assessed evidence categories of the 2019 ACMG standard guidelines (see below). Details and example files can be found in the help section and on the about page of the CNV-ClinViewer.
After CNV submission, the user is directed to the analysis interface. Here, five different analysis panels are available. In the first analysis panel, the uploaded CNVs are displayed in a downloadable table comprising their annotated clinical significance and details about the scoring ( Fig. 2A). The CNVs are classified by ClassifyCNV (Gurbich and Ilinsky 2020), a command-line tool integrated into the CNV-ClinViewer infrastructure that automatically evaluates the evidence categories 1A/B, 2A-H, 3A-C, 4O for copy-number losses and 1A/B, 2A-H, 2J-L, 3A-C, 4O for copy-number gains from the 2019 ACMG/ClinGen Technical Standards for CNVs (Riggs et al. 2020). In case the user uploaded additional scores of relevant

>0.9
Sequence constraint score as proxy for haploinsufficiency loeuf  Loss-of-function observed/expected upper bound fraction. Low LOEUF scores indicate strong selection against predicted loss-of-function (pLoF) variation in a given gene.

<0.35
Haploinsufficiency pHI (Collins et al. 2022) Predicted probability of haploinsufficiency of autosomal genes by predictive model from a cross-disorder dosage sensitivity map of the human genome (analysis of rare CNVs from >700k individuals). Genes with high pHI scores are considered more likely triplosensitive.

>0.833
Triplosensitivity pTS (Collins et al. 2022) Predicted probability of triplsensitivity of autosomal genes by predictive model from a cross-disorder dosage sensitivity map of the human genome (analysis of rare CNVs from >700k individuals). Genes with high pTS scores are considered more likely triplosensitive.

(sufficient evidence)
Triplosensitivity TS Score ClinGen (Rehm et al. 2015) The ClinGen Dosage Sensitivity expert group collects evidence supporting/refuting the triplosensitivity of genes.

(sufficient evidence)
a The given thresholds are used in the CNV-ClinViewer to annotate dosage sensitivity of genes. ACMG clinical significance criteria for which additional information, such as family history or the de novo status, is required, the score is added to the score from ClassifyCNV to refine the final score and classification. In the second analysis panel, a comprehensive report of the individual CNVs can be downloaded (Fig. 2B). The report includes details about the clinical significance classification and the overlap with established/predicted haploinsufficient/triplosensitive and clinically relevant genes and genomic regions.

Uploaded CNV data can be visually inspected alongside publicly available data
The third panel of the analysis interface is a genomic viewer ( Fig. 3 and Supplementary Fig. S1A-F). Here, the uploaded CNVs and their genomic region can be inspected alongside biomedical annotations and other pathogenic and general population CNV datasets. Included are five data tracks that the user can interactively navigate by zooming in/out, moving horizontally, and selecting genomic regions of interest. In the first track of the genomic viewer ( Supplementary Fig. S1B), the CNV-ClinViewer integrates seamless evaluation of the gene content by visualization of all protein-coding genes.
Here, the genes are highlighted by a selection of gene dosage sensitivity scores (Table 1) Fig. S1D). The latter are displayed in an interactive plot grouped by their assigned clinical significance (pathogenic/likely pathogenic versus uncertain significance versus benign/likely benign) and can be filtered based on their type and clinical significance. Details about the ClinVar CNVs, such as their reported phenotypes or allele origin and the link to the ClinVar variant website, are shown in a table and can be downloaded for further analyses.

Additional features and analyses provide users with advanced CNV insights
Below the genomic viewer, the user can retrieve and download information about the intersecting genes, classified genedisease associations from ClinGen (Rehm et al. 2015), known dosage-sensitive regions and CNV syndromes (Fig. 4A)   Here, the user can visually compare and dynamically filter uploaded CNVs, perform a seamless evaluation of the gene content, identify diseaseprone regions and find unknown patterns of CNV localization to generate hypotheses for further research. An extended view of the genomic viewer can be found in Supplementary Fig. S1. genomic region by comparing it to more than 180 annotated gene sets representing prior biological knowledge such as pathways, Human Phenotype Ontology (HPO) terms (Kö hler et al. 2021), and known (rare) diseases (Fig. 4B).

Example analysis: 9q33.3q34.11 microdeletions
To illustrate the utility of the CNV-ClinViewer, we performed an example analysis of 14 previously published CNVs (9q33.3q34.11 microdeletions) from patients with complex developmental disorders from the literature (n ¼ 14) (Campbell et al. 2012, Saitsu et al. 2012, Ehret et al. 2015, Nambot et al. 2016. Here, the CNVs were first uploaded and automatically classified as (likely) pathogenic ( Supplementary  Figs S2 and S3). Next, the smallest region of overlap with its dosage sensitive gene STXBP1 could be immediately identified due to the interactive visualization of the overlapping patient CNVs ( Supplementary Figs S4 and S5). In addition, the CNV-ClinViewer could resolve the clinical heterogeneity of patients by enabling the user to filter the CNVs based on assigned phenotypes (Supplementary Fig. S7) and retrieve region-specific information about intersecting clinically relevant genes and their associated phenotypes ( Supplementary  Fig. S6) as well as overlapping patient CNVs from ClinVar ( Supplementary Fig. S8). Overall, we demonstrate the CNV-ClinViewer's fast and seamless biomedical interpretation of CNVs and its potential to replicate research findings due to a large number of annotations and visual inspection capabilities. To perform a comparable analysis without the CNV-ClinViewer, the user would be required to classify the CNVs manually one at a time or with the support of an existing tool, visualize the CNVs with yet another tool or genomic viewer, visit several webpages, and process and annotate information from various databases. The example analysis, including an example of benign CNVs of chromosome 9, can be found in more detail and with step-by-step instructions illustrated with images in the Supplementary File.

Discussion
By aggregating more than 250 000 population and patient CNVs, annotating various gene scores and clinical annotations, and integrating useful existing bioinformatics tools, we developed a novel and user-friendly interface as a decision support tool for the clinical evaluation of CNVs and comparative visual inspection.
CNV analysis and interpretation tools and, in general, clinical decision support tools can make patient care more efficient, cost-effective, and guideline concordant. However, although the prediction results from many approaches are promising, their value is limited by their lack of interpretability and human intuition. The black-box nature of some pathogenicity predictions can further exacerbate trust issues and worsen the overall experience (Cai et al. 2019). To address this issue, human-centered tools with interactive mechanisms can grant end-users more agency in guiding the interpretation and can be used for critical decision-making purposes beyond an algorithm (Cai et al. 2019, Babione et al. 2020. Therefore, in the development of the CNV-ClinViewer, we emphasized interactive analyses of the CNVs beyond a classification score and allow the user to dynamically inspect and compare the results in detail. In comparison, existing opensource tools (Geoffroy et al. 2018, Gurbich and Ilinsky 2020, Fan et al. 2021) focus more on the classification scores than on the interactive inspection and further investigations. We believe that the comprehensive approach of the CNV-ClinViewer will empower users with more trust in the results, the agency to test hypotheses, and enable them to apply their domain knowledge while simultaneously leveraging the benefits of automation.
To ensure that a tool is easy to use and is designed for target users at each development stage, human-centered design methods and principles should be applied (Babione et al. 2020, Bates et al. 2003. Such methods draw from wellestablished design principles in many disciplines, including usability, visualizations, and interface design, and consider users' context and experience. As the effectiveness of clinical tools may be severely limited without the application of those principles and could even contribute to adverse medical events (Graham et al. 2008), we put great emphasis on developing a user-friendly interface with simple data upload with minimal requirements, a clear analysis workflow, and real-time results, while giving the user the ability to retrieve details and help and modify the analysis when required. We also focused on the visual and intuitive inspection of CNV data and its interaction with the user. In particular, for genomic data, the integration of visualizations is essential for interpretation and hypothesis generation, essential for combining different data sources and discovering unexpected localization patterns of genomic annotations in a short amount of time, as well as a valuable aid in communicating discoveries (Nusrat et al. 2019).
On the one hand, the CNV-ClinViewer will assist clinicians, clinical geneticists, and genetic counselors in analyzing and interpreting CNV data and improving objectivity and accuracy in the clinical work-up. On the other hand, the collected and integrated knowledge combined with clinical judgment will enable clinicians and researchers to formulate novel hypotheses and guide their decision-making process.
The current features of the tool are focused on large CNVs that intersect several genes when gene prioritization plays a role in the analysis. Future versions will increase the granularity of the tool, e.g. by providing a more detailed gene track that includes exons and introns, and different transcripts with expression data. With that, it will bring more value for interpreting intragenic deletions and duplications. Also, to integrate with existing workflows, we plan to develop an application programming interface (API) to query the system and fetch the results without using a graphical user interface. Overall, the CNV-ClinViewer is the first framework that enables interactive exploration of CNV data and semi-automated variant classification. With its scalable architecture, the web application is well suited for both genome-wide (re-)evaluations of large datasets and the small number of CNVs. The CNV-ClinViewer is an open-source tool (https://www.cnv-ClinViewer.broadinstitute.org/), including complete documentation on the use of the website.

Supplementary data
Supplementary data is available at Bioinformatics online.

Ethics declaration
This study does not involve human subjects. We use as reference datasets copy number variants from public resources such as databases and published research studies, in which individual investigators or organizations have consented their individuals. For example, the CNV-ClinViewer contains individual copy-number variant (CNV) data from ClinVar, which is de-identified and publicly available (https://www.ncbi.nlm. nih.gov/clinvar/). CNV data from gnomAD (https://gnomad. broadinstitute.org/downloads) and the UK Biobank (https:// biobankenginedev.stanford.edu/downloads) are aggregated and de-identified data. The submitters of all databases had obtained appropriate consent for data sharing.