REEV: review, evaluate and explain variants

Abstract In the era of high throughput sequencing, special software is required for the clinical evaluation of genetic variants. We developed REEV (Review, Evaluate and Explain Variants), a user-friendly platform for clinicians and researchers in the field of rare disease genetics. Supporting data was aggregated from public data sources. We compared REEV with seven other tools for clinical variant evaluation. REEV (semi-)automatically fills individual ACMG criteria facilitating variant interpretation. REEV can store disease and phenotype data related to a case to use these for phenotype similarity measures. Users can create public permanent links for individual variants that can be saved as browser bookmarks and shared. REEV may help in the fast diagnostic assessment of genetic variants in a clinical as well as in a research context. REEV (https://reev.bihealth.org/) is free and open to all users and there is no login requirement.


Introduction
In recent years, high throughput genetic testing has fundamentally changed the diagnostic paradigm in clinical genetics (1,2). Previously, clinical geneticists tested a very limited number of genes selected according to a patient's phenotype. If a variant with severe consequences was found in patients but not in healthy controls, it was likely deemed pathogenic. By now, however, high throughput genetic analyses such as exome sequencing (ES) or genome sequencing (GS) can be implemented in standard health care (3,4). Accordingly, clinical geneticists face new challenges. For one, they now have to interpret a considerably larger number of variants from screening assays, hoping to identify the underlying disease-causing mutation among them (5,6). Specialized software is required to achieve this goal. Such tools for variant prioritization usually use two sources of information: variant-specific (molecular, evolutionary and population genetic) features and gene-to-phenotype associations (7-9). However, collecting, connecting and integrating these data about a variant of interest from multiple online sources is time consuming, and individual tools frequently lack an easily accessible and comprehensive output (10,11). Also, existing platforms for gathering gene and variant information often fall short by either missing critical data or predictions, or by restricting access to these features to their commercial, full-version offerings (10,12-14). In particular, while several tools exist for the analysis and interpretation of small coding variants, albeit with the limitations mentioned above, there are currently only few platforms for the analysis and evaluation of structural variants (15). Finally, critical to the process of clinical variant evaluation is a robust final assessment of the variant's pathogenicity. To this end, the American College of Medical Genetics and Genomics (ACMG) has developed standardized guidelines for the classification of both sequence and structural variants (16,17). Although there are tools that offer individual or automated ACMG classification, an easy-to-use tool that integrates all these needs in a single platform is lacking.
Here, we introduce such a tool for the analysis, interpretation and ACMG classification of both small and structural variants. REEV (Review, Evaluate and Explain Variants; https://reev.bihealth.org/) provides an automated classification proposal, which can be easily understood, adapted, and amended. We also compare REEV with other available web tools for clinical variant evaluation.

Method outline
All components of the REEV backend pipeline, including REEV's specific original code as well as components using or referring to third-party software and services (Table 1), are free to use in academic and commercial settings (open source, MIT license). All code, tests and data required to run REEV can be found at https://github.com/bihealth/reev. Comprehensive user documentation including a quickstart and a tutorial can be found at https://reev.readthedocs.io/. REEV is technically designed to speed up diagnostic variant evaluation and classification in both research and clinical settings. We offer REEV as a complete open-source software suite including comprehensive automated tests as well as deployment scripts following FAIR4RS principles. Thus, REEV can also be individually extended and tailored to the specific needs of the laboratory or clinical institution using it. In several jurisdictions, software used for clinical diagnostics needs to be certified in accordance with national legal regulations. Any laboratory or institution using REEV should be aware that REEV has not been formally certified for diagnostic use in a clinical setting. Responsibility for using REEV or derivatives of REEV in a clinical setting, and for certifying them for clinical use, lies with the institutions using them.

Data storage and preprocessing
On the server, static data is stored in files. Smaller datasets are stored in text files or compressed binary Protocol Buffers format and loaded into memory on startup. To allow for reduced on-disk storage, low main memory footprint and fast lookup, larger datasets are generally stored in RocksDB (an embedded key/value store).
All data is downloaded from public sources using a Snakemake (18,19) workflow. The resulting files are publicly available from our S3 server (see also Data availability).

Software architecture and backend
REEV is designed to ensure full transparency and reproducibility of variant analysis in concordance with the FAIR4RS principles (20) and to enable timely updates from quickly evolving data sources. REEV is actively maintained with updated releases scheduled once a quarter.
The overall architecture is a typical 'microservice based' web application, as roughly depicted in the graphical abstract. The 'REEV' server is a web server consisting of two layers. The actual user interface consists of a TypeScript/Vue single-page app (SPA) front-end (served through the web server) and a Python/FastAPI based backend for the SPA. The FastAPI server provides functionality for user login and for persisting and managing data of logged-in users in a PostgreSQL database. It also functions as a reverse proxy to a number of backend services that provide the actual data and functionality.
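The reverse-proxy role described above can be illustrated with a minimal routing sketch. It only shows how a request path might be mapped to one of the backend services named in this section; the path prefixes, host names and ports are assumptions made for illustration and are not taken from the REEV code base.

```python
# Hypothetical routing table: the service names come from the text,
# the "/proxy/..." prefixes, host names and ports are illustrative assumptions.
BACKENDS = {
    "/proxy/annonars/": "http://annonars:8080/",
    "/proxy/mehari/": "http://mehari:8080/",
    "/proxy/viguno/": "http://viguno:8080/",
    "/proxy/cada-prio/": "http://cada-prio:8080/",
    "/proxy/dotty/": "http://dotty:8080/",
}


def resolve_backend(path: str) -> str:
    """Map an incoming request path to the URL of the backend service.

    The matching prefix is stripped and the remainder of the path is
    appended to the base URL of the selected service.
    """
    for prefix, base_url in BACKENDS.items():
        if path.startswith(prefix):
            return base_url + path[len(prefix):]
    raise KeyError(f"no backend service registered for {path!r}")


if __name__ == "__main__":
    print(resolve_backend("/proxy/mehari/seqvars/csq"))
```

In a real deployment the proxy would additionally forward headers, query strings and response bodies; the sketch only covers the routing decision.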
These 'microservices' are: Annonars provides access to data on genes and variants which are stored in RocksDB databases. Mehari provides transcript-based variant consequence annotations (i.e. it projects a genomic variant to the transcript level and computes whether the variant leads to a missense or frameshift variant at protein level). Viguno provides access to HPO terms (21-23), a full-text index thereof, and standard ontology algorithms based on information content. The services above are implemented in the Rust programming language. The cada-prio service provides phenotype similarity queries based on the CADA (24) algorithm. The dotty service provides functionality for transforming variant descriptions between different notation systems, including HGVS (25), using the hgvs Python package (26). These two services are implemented in the Python programming language.
These microservices run (together with utility services such as nginx, traefik, PostgreSQL, Redis, and RabbitMQ) in Docker containers and are orchestrated with Docker Compose.
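To illustrate what an ontology algorithm "based on information content" can look like, the following sketch computes Resnik similarity on a toy ontology. The term names, the gene annotations and the function names are all made up for this example; the sketch does not reproduce Viguno's implementation or real HPO data.

```python
import math

# Toy ontology (child -> parents); term names are illustrative, not HPO IDs.
PARENTS = {
    "root": [],
    "limb": ["root"],
    "head": ["root"],
    "polydactyly": ["limb"],
    "syndactyly": ["limb"],
    "macrocephaly": ["head"],
}

# Hypothetical gene -> annotated terms.
ANNOTATIONS = {
    "GENE_A": {"polydactyly", "macrocephaly"},
    "GENE_B": {"syndactyly"},
    "GENE_C": {"macrocephaly"},
    "GENE_D": {"polydactyly", "syndactyly"},
}


def ancestors(term: str) -> set:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term: str) -> float:
    """IC(t) = -ln(fraction of genes annotated with t or one of its descendants)."""
    annotated = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return -math.log(annotated / len(ANNOTATIONS))


def resnik(term_a: str, term_b: str) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


if __name__ == "__main__":
    print(resnik("polydactyly", "syndactyly"))    # share the ancestor "limb"
    print(resnik("polydactyly", "macrocephaly"))  # share only the root
```

Terms that share only the root have similarity zero, because the root annotates every gene and therefore carries no information.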

Frontend overview
In the following, we focus on the features of the software from the perspective of the user.
REEV allows searching for genes as well as sequence and structural variants. A short display of supported input styles is shown on the REEV start page, while a quickstart and full tutorial on how to navigate REEV are provided in the REEV documentation available at https://reev.readthedocs.io. The REEV frontend interfaces with semi-automated ACMG scoring tools, phenotype similarity applications and PubMed interfaces. To further support variant evaluation, validation and interpretation, REEV also links out to many additional external resources, e.g. to the GA4GH Beacon network, VariantValidator or MutationTaster. Current versions of the sources and tools integrated with REEV can be found at https://reev.cubi.bihealth.org/info#data-versions.
Here, we summarize the features implemented in REEV. For a complete list of integrated sources see Table 1. An overview of the REEV frontend for the query of sequence and structural variants can also be found in Supplementary Figures S1 and S2, respectively.
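As a rough illustration of the coordinate-style inputs used in this article (for example chr7:42012159:T:G for a sequence variant and DEL:chr14:37131998:37133815 for a structural variant), the following sketch parses both forms. It is a simplified, hypothetical parser written for this example; REEV itself supports further notations (e.g. HGVS and ISCN), which are not handled here.

```python
import re
from typing import NamedTuple, Union


class SeqVar(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


class StrucVar(NamedTuple):
    sv_type: str
    chrom: str
    start: int
    stop: int


# Simplified patterns for the two coordinate styles quoted in the text.
_SEQVAR = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT):(\d+):([ACGT]+):([ACGT]+)$")
_STRUCVAR = re.compile(r"^(DEL|DUP):(?:chr)?(\d{1,2}|X|Y):(\d+):(\d+)$")


def parse_variant(text: str) -> Union[SeqVar, StrucVar]:
    """Parse a coordinate-style variant description into a typed record."""
    text = text.strip()
    match = _SEQVAR.match(text)
    if match:
        chrom, pos, ref, alt = match.groups()
        return SeqVar(chrom, int(pos), ref, alt)
    match = _STRUCVAR.match(text)
    if match:
        sv_type, chrom, start, stop = match.groups()
        if int(start) > int(stop):
            raise ValueError("start must not exceed stop")
        return StrucVar(sv_type, chrom, int(start), int(stop))
    raise ValueError(f"unsupported variant description: {text!r}")


if __name__ == "__main__":
    print(parse_variant("chr7:42012159:T:G"))
    print(parse_variant("DEL:chr14:37131998:37133815"))
```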

Genes
First, REEV provides basic information about a selected gene, including a short summary from NCBI Gene (27), a list of and link-outs to alternative identifiers for this gene, and link-outs to useful resources on gene level, e.g. DECIPHER (28), OMIM (29), PubMed (27), etc. If applicable for the gene of interest, REEV offers links to locus-specific databases and NCBI's References Into Functions (RIFs). Next, REEV gives an overview of the gene's potential pathogenicity by providing haploinsufficiency and triplosensitivity scores, e.g. ClinGen (31), DECIPHER (28), gnomAD pLI and pLOEUF (6,30). The user is shown information about associated phenotypes (HPO terms (21-23)) and diseases (OMIM (29), Orphanet (32), Genomics England PanelApp (33)). In the case of a gene with an as-yet-unknown disease association, REEV also displays gene expression data from the GTEx project (34). Also, REEV offers aggregated ClinVar variant statistics (35). This includes a summary of variant counts, a visualization of variant population frequencies separated by their ACMG class (16), and a plot of the variants' positions and their ACMG classes. Finally, REEV summarizes available information on the gene of interest from the literature by providing the ten most relevant hits from PubTator3 (36,37); full data are available via link-outs to PubTator3 and PubMed.

Sequence variants
When querying a sequence variant, REEV first provides the above-mentioned details on the respective gene. Following this gene information, the user is presented with a table showing the impact of the variant on different transcripts (according to NCBI GenBank (27,38)). REEV also provides an overview of the variant's ClinVar (35) entries, including whether the variant is present, its respective ClinVar reference assertion, its most pathogenic significance, and its review status. Users can also fold out this ClinVar card and look at the individual reference assertion. Furthermore, in this section REEV provides gnomAD v4 (6) population frequencies of the queried variant in different populations as well as in the different sexes (XX vs. XY genotype) and the variant's UCSC 100 vertebrate conservation on protein level (39). Additionally, link-outs to genome browsers (ENSEMBL (40), UCSC (41)) and various external tools (DGV (42), Genoox Franklin (franklin.genoox.com), gnomAD (6,30), MutationTaster (43,44), Varsome (12)) help users to further assess their variant of interest. REEV not only comes with an integrated display of variant pathogenicity scores from different tools (aggregated by dbNSFP (45)) but also provides a color-coded suggestion of the ACMG criterion PP3 as well as raw pathogenicity scores calibrated following Pejaver et al. (46) (see also ACMG classification). Finally, users can query the GA4GH Beacon network (47) for entries of the variant by other institutions. Users may also submit the variant to VariantValidator (48) to obtain a gold standard HGVS representation. We provide an example query for the sequence variant chr7:42012159:T:G (i.e. GLI3(NM_000168.6):c.1880A>C, p.(His627Pro)) using REEV in Figure 2, and online in our REEV tutorial (reev.readthedocs.io).
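The idea of a calibrated PP3 suggestion can be sketched as a simple threshold lookup. The REVEL cut-offs below are the values commonly quoted from the calibration by Pejaver et al. (46) and should be checked against that publication before any use; the function name and data structure are assumptions for this example and do not reflect REEV's code. Benign (BP4) intervals are deliberately left out.

```python
from typing import Optional

# Lower bounds of the REVEL intervals for PP3 as commonly quoted from
# Pejaver et al.; treat these numbers as an assumption to be verified.
REVEL_PP3_THRESHOLDS = [
    (0.932, "strong"),
    (0.773, "moderate"),
    (0.644, "supporting"),
]


def pp3_strength_from_revel(score: float) -> Optional[str]:
    """Return the suggested PP3 evidence strength, or None if PP3 does not apply."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("REVEL scores lie between 0 and 1")
    for lower_bound, strength in REVEL_PP3_THRESHOLDS:
        if score >= lower_bound:
            return strength
    return None


if __name__ == "__main__":
    # REVEL score of the GLI3 example variant as given in the Figure 2 caption.
    print(pp3_strength_from_revel(0.96))
```

With the REVEL score of 0.96 reported for the example variant, the lookup yields the strong level, in line with the assignment described in the Figure 2 caption.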

Structural variants
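Besides sequence variants, REEV also allows users to examine structural variants. These can be provided either in gnomAD (6,30) (GRCh37 or GRCh38 coordinates) or ISCN style (49). As for sequence variants, REEV first shows general information on the affected gene followed by variant-specific information. Here, REEV depicts an overview of the affected genes in the form of a list of genes that overlap with or are close to the structural variant. REEV displays how the variant affects each gene (i.e. gene fully or only partially contained; breakpoints exonic, intronic or extragenic). In the case of multiple overlapping genes, users may sort this list of genes by, e.g. a gnomAD (6,30) score or the ClinGen (31) haploinsufficiency or triplosensitivity assessment. REEV also depicts details on overlapping variants in ClinVar (35) and the individual reference ClinVar assertions. ClinVar variants are sorted by reciprocal overlap (the fraction of overlap between the variant studied and the ClinVar variant). Users can see the genomic location of the variant in an integrated IGV genome browser (50) with tracks for interpreting the variant. Users are again provided with link-outs to known external genome browsers (ENSEMBL (40), UCSC (41)) and other external tools (DGV (42), Genoox Franklin (franklin.genoox.com), gnomAD (6,30), Varsome (12)) for further analysis. For an example query in REEV invoking the structural variant DEL:chr14:37131998:37133815, see also Figure 3 as well as the REEV tutorial (reev.readthedocs.io).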
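Reciprocal overlap between a queried structural variant and a ClinVar variant can be sketched as follows. The sketch uses the common definition (the smaller of the two overlap fractions) with 1-based inclusive coordinates; whether REEV computes exactly this quantity is an assumption, and the function name is chosen for this example.

```python
def reciprocal_overlap(start_a: int, stop_a: int, start_b: int, stop_b: int) -> float:
    """Reciprocal overlap of two intervals with 1-based inclusive coordinates.

    Returns the smaller of (overlap / length of A) and (overlap / length of B),
    so a value of 1.0 means the intervals are identical and 0.0 means disjoint.
    """
    if start_a > stop_a or start_b > stop_b:
        raise ValueError("start must not exceed stop")
    overlap = min(stop_a, stop_b) - max(start_a, start_b) + 1
    if overlap <= 0:
        return 0.0
    length_a = stop_a - start_a + 1
    length_b = stop_b - start_b + 1
    return min(overlap / length_a, overlap / length_b)


if __name__ == "__main__":
    # The example deletion compared with itself and with a made-up interval.
    print(reciprocal_overlap(37131998, 37133815, 37131998, 37133815))
    print(reciprocal_overlap(100, 199, 150, 249))
```

Taking the minimum of both fractions prevents a very large variant from scoring highly merely because it fully contains a much smaller one.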

ACMG classification
REEV provides tools for the fast interpretation of sequence and structural variants using the ACMG guidelines for variant interpretation, including commonly used modifications. For semi-automated variant classification, REEV considers three rule systems: the original ACMG 2015 guidelines (16), the ACGS 2020 rules (17) and the 2020 point system described by Tavtigian et al. (51). REEV offers a short explanation of the definition and application of each of their criteria. For the classification of sequence variants, REEV uses a semi-automated ACMG variant class assessment based on the InterVar (52) tool (Figure 1A, see also below for details). Users can modify and complete this assessment by checking and unchecking every single criterion or altering its level of evidence (supporting, moderate, strong, or very strong), and they can clear all criteria or reset them to auto-fill (Figure 1A, B). Users can customize the application of the PP3 criterion (e.g. adapt the level of evidence according to the applied pathogenicity prediction tool) (Figure 1C). Semi-automated classification of structural variants is performed following ACMG and ClinGen standards (17) using the AutoCNV (53) tool (see below for details). As for sequence variants, users can modify each criterion by checking and unchecking it or by adapting the points given for the respective criterion (Figure 1D). Since both InterVar and AutoCNV can provide inaccurate pathogenicity predictions, not only manual completion of unassigned ACMG criteria but also checking of prefilled ACMG criteria is crucial for the correct assessment of a variant. Therefore, we designed REEV as a diagnostic decision support system with a focus on semi-automated classification by the users themselves. To this end, REEV assists the user with the aforementioned explanation of the definition and application of every ACMG criterion and the easy option to adapt them.
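The point system of Tavtigian et al. (51) lends itself to a short worked sketch. The point values per evidence level and the class thresholds below follow that publication as commonly summarized; the function and variable names are assumptions for this example, and stand-alone criteria (such as BA1) are not modeled.

```python
from typing import Sequence, Tuple

# Points per evidence level; benign evidence counts negatively.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def classify_by_points(
    pathogenic: Sequence[str], benign: Sequence[str] = ()
) -> Tuple[int, str]:
    """Sum evidence points and map the total to a five-tier class."""
    total = sum(POINTS[level] for level in pathogenic)
    total -= sum(POINTS[level] for level in benign)
    if total >= 10:
        label = "Pathogenic"
    elif total >= 6:
        label = "Likely pathogenic"
    elif total >= 0:
        label = "Uncertain significance"
    elif total >= -6:
        label = "Likely benign"
    else:
        label = "Benign"
    return total, label


if __name__ == "__main__":
    # Hypothetical evidence combination: one strong, one moderate, one supporting.
    print(classify_by_points(["strong", "moderate", "supporting"]))
```

Raising or lowering the level of a single criterion, as the interface allows, simply changes its contribution to the total.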
AutoCNV (53) automatically classifies CNVs using the ACMG/ClinGen CNV criteria (17). AutoCNV implements automated evaluation of all criteria in Sections 1 and 3, as well as most of Section 2 (excluding 2J and 2K) and criterion 4O in Section 4. DGV (42) and gnomAD (6,30) were used to obtain frequency information. Haploinsufficiency and triplosensitivity information were obtained through DECIPHER (28). AutoPVS1 (54) is used to evaluate the impact of gene-level duplications and deletions. All sections and criteria concerned with phenotype specificity, segregation or patient phenotyping require manual evaluation by the user.
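The additive CNV scoring can be traced with the values given in the Figure 3 caption. In the sketch below, the class thresholds follow the ACMG/ClinGen CNV standards (17) as commonly summarized; the points for L2E, L4E and L5A are taken from the caption, while the zero points for L1A and L3A are an assumption based on that scoring scheme. The function name is chosen for this example.

```python
from typing import Dict, Tuple


def classify_cnv(points: Dict[str, float]) -> Tuple[float, str]:
    """Sum the points of all applied CNV criteria and map the total to a class."""
    total = round(sum(points.values()), 2)  # rounding avoids float artifacts
    if total >= 0.99:
        label = "Pathogenic"
    elif total >= 0.90:
        label = "Likely pathogenic"
    elif total > -0.90:
        label = "Uncertain significance"
    elif total > -0.99:
        label = "Likely benign"
    else:
        label = "Benign"
    return total, label


# Automated criteria for the example deletion (L1A and L3A assumed to score 0).
automated = {"L1A": 0.0, "L2E": 0.45, "L3A": 0.0}
# Manually added criteria as described in the Figure 3 caption.
manual = {**automated, "L4E": 0.3, "L5A": 0.45}

if __name__ == "__main__":
    print(classify_cnv(automated))
    print(classify_cnv(manual))
```

The automated criteria alone stay in the uncertain range, and the two manually assessed criteria lift the total above the pathogenic threshold, which mirrors the progression described for the use case.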

Additional features for logged-in users
While REEV is fully functional without login, a few features are only available after login. Storing ACMG assessments and bookmarks on the server requires a login so that they can be uniquely assigned to users and accessed across computers. Users can always save the persistent URLs pointing to the REEV server as browser bookmarks and share these URLs without registration, e.g. via email. ClinVar (35) uploads can only be made available after login, as this requires depositing the institution's ClinVar API key on the server, which should be protected by login.
Logging in is possible via an ORCID account (55) or LifeScience RI (56) (which allows researchers in the European Union to use their home institution accounts).

Storing bookmarks and ACMG assessments
Logged-in users can store bookmarks on genes and variants on the server. In addition, they can store the phenotype information they entered for their case as well as the ACMG variant assessment scores (both the automated analysis results and their adjustments).

ClinVar upload
Logged-in users can deposit their ClinVar (35) API key and use it to submit their clinical variant assessments to ClinVar. To the best of our knowledge, REEV is the first graphical tool to allow for such uploads via the API. Submission via the API has the advantage that ClinVar submission accession identifiers can be obtained within a few hours.
We plan to add further functionality for logged-in users such as publicly sharing comments on variants.

Benchmarking and software testing
We benchmarked REEV's performance with regard to accuracy and speed using a defined set of ten different variants investigated independently by six different clinicians. These ten variants comprise two null variants, two missense variants, two splice site variants, two deletions and two duplications (Supplementary Table S1). Correctness of displayed information (e.g. gnomAD frequencies, ClinVar entries, pathogenicity predictions, etc.) was checked. Classification of variants was carried out according to current ACMG and ACGS guidelines (16,17,51). Time until reaching the final ACMG classification was measured and compared between using REEV and single look-up of the information required for classification. Statistical analysis was performed using a one-sided paired t-test.
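For reference, the test statistic of a one-sided paired t-test can be written as follows. The notation is introduced here for illustration: $d_i$ denotes the difference in classification time for pair $i$ (single look-up minus REEV) and $n$ the number of pairs; how pairs were formed across variants and clinicians is not specified in this section.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```

Under the null hypothesis that the mean difference is at most zero, $t$ follows a Student's t-distribution with $n-1$ degrees of freedom, and the one-sided p-value is the upper-tail probability of the observed $t$.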

Results and discussion
Here we present REEV, a free and open tool for the user-friendly automated clinical evaluation of genetic variants.

Accuracy and performance in comparison to single tools
Testing REEV's accuracy using ten different benchmarking variants showed a correct presentation of the displayed information. When this information was used to classify the respective variant with the help of the semi-automated ACMG classification offered by REEV, the resulting classification overall matched the clinicians' classification reached without using REEV. For small variants, REEV significantly reduced the time required to reach the final ACMG classification (compared to single look-up of information). For structural variants, we could only detect a significant reduction of the time required for classification for deletions (Supplementary Figure S3).

Comparison to existing integrative tools
To benchmark REEV's potential use in the clinical routine, we compared REEV and seven similar state-of-the-art web tools (Table 2). Of the seven evaluated tools gathering variant effect predictions, only five provide (detailed) information on gene-disease associations. The only other tool enabling annotation with case-specific phenotype information and returning gene-to-phenotype ranks is the commercial platform Genoox Franklin. A frequent shortcoming of existing variant interpretation platforms is their restriction to small coding variants.
Apart from REEV, we found only DECIPHER and the two purely commercial services, Genoox Franklin and Varsome, supporting both small variants and structural variants. The ability to submit variants directly to the ClinVar database is only available in one other software: the commercial offering Varsome.

Distinctive features and use cases
With REEV we present a free and open, easy-to-use, versatile web application that integrates all relevant information on genes, sequence variants, and structural variants into one single platform. By combining extensive information on genes and their clinical relevance (gene-disease associations, dosage sensitivity scores, etc.) as well as compiling variant-specific pathogenicity predictions and database information (gnomAD v4, ClinVar, etc.), REEV sets the basis for the rapid and thorough evaluation of a variant of interest. As described above, a unique feature of REEV among academic variant interpretation platforms is its capability of taking case-specific phenotype information and returning a gene-to-phenotype ranking, helping the user to rate a variant's significance. To achieve this, we implemented semi-automated predictions for small (InterVar) and structural (AutoCNV) variants and combined them with a flexible, easy-to-use interactive system for manual interpretation and completion of ACMG criteria (Figure 1). However, REEV does not only aim to support the daily work of assessing a variant's relevance in routine NGS diagnostics but also aims to provide valuable additional information when it comes to rating a variant's possible significance in a research setting. We summarize and demonstrate these features in two use cases, one for a sequence variant (Figure 2) and one for a structural variant (Figure 3).
Nucleic Acids Research, 2024, Vol. 52, Web Server issue

Conclusion
REEV is a free, versatile web tool that integrates all the information needed for a fast and complete assessment of sequence as well as structural variants. By allowing for case-specific phenotype-to-gene ranking, REEV assists the user in a fast prioritization of clinically relevant variants. With a special focus on ACMG classification and a user-friendly way of submitting the evaluated variant to the ClinVar database, REEV is useful in a clinical geneticist's daily routine. REEV also provides valuable additional information necessary when further evaluating variants in a research context.
REEV is used in the authors' daily work and actively maintained.We welcome questions, comments, and suggestions via email or the GitHub project's issue tracker and discussion forum.

Figure 1. Semi-automated ACMG classification. Overview of semi-automated variant classification using REEV. (A-C) ACMG classification of a sequence variant. (A) Semi-automated classification based on the InterVar tool. The user can (de)activate a terse mode (1), show or hide failed ACMG criteria (2) and clear or reset chosen criteria to auto (3), meaning they are computed rather than user set. Logged-in users can also load and save the classification of variants (4). Every criterion can be (de)selected manually (5) and set at the chosen level of confidence (6). (B) Example of this manual setting of the level of confidence for the variable pathogenicity prediction criterion PP3. (C) Overview of variant pathogenicity scores from different tools (aggregated by dbNSFP) as well as color coding and calibrated scores where applicable for an easy and correct usage of the modified PP3 criterion following Pejaver et al. (46). (D) Semi-automated ACMG classification of a structural variant based on AutoCNV (applicable for ACMG/ClinGen CNV criteria 1-3 of 5 (17)), which can also be reset (7), (de)selected manually (8) and set at the chosen level of confidence (9).

Table 2. Feature comparison of REEV with existing tools

Figure 2. Use case: sequence variant. Example query for the sequence variant chr7:42012159:T:G. Variants can be entered either as genomic variants or on cDNA level providing the corresponding GenBank transcript variant, e.g. NM_000168.6(GLI3):c.1880A>C. Here, we highlight key features provided by REEV; for a full use case demonstration, e.g. including an example of the overview given for the affected gene, see the REEV quickstart and tutorial at reev.readthedocs.io. (A) Logged-in users can provide a case-specific pseudonym/id (1) and phenotype information in the form of HPO terms or OMIM diseases (2). In our example, we provided a patient phenotype consisting of preaxial hand and foot polydactyly, foot postaxial polydactyly, syndactyly and macrocephaly. REEV automatically suggests possible genes linked to the given phenotype, as well as genes in which variants might be causative of this phenotype (as provided by CADA-prio HPO-gene-ranking) (3). Note that this phenotypic information does not influence automated variant predictions by InterVar and AutoCNV, but the results of these gene-to-phenotype rankings can be taken into account by the user for the manual completion of variant classification according to ACMG guidelines. GLI3, the gene affected by the likely causative variant, is listed in 4th position, and possible differential diagnoses (LMBR1, HOXD13, etc.) are also shown. (B) Overview of the variant and its consequences summarized by REEV. (C) Associated conditions (shown here: Orphanet; but OMIM, HPO and Genomics England PanelApp are also available in REEV) for the gene affected by the respective variant. This list can also be sorted either by name or level of confidence (4). Note that when logged-in users have provided case-specific phenotype information (see A), they are shown a gene-to-phenotype rank here (according to CADA ranking) (5). (D) Semi-automated ACMG classification based on InterVar, which classifies this variant as of uncertain significance. (E) Manual input of further information allows for the final classification as likely pathogenic: REEV shows that this variant is listed once in the ClinVar database as likely pathogenic (*). We thus can additionally assign the ACMG criterion PS4 on supporting level. Using REVEL as our reference pathogenicity prediction tool, and with this variant yielding a REVEL score of 0.96, we can use the PP3 criterion at strong level according to Pejaver et al. (46) (see also Figure 1A-C).

Figure 3. Use case: structural variant. Example query of the structural variant DEL:chr14:37131998:37133815. Again, here we demonstrate the unique features provided by REEV; for a full use case demonstration see the REEV quickstart and tutorial at reev.readthedocs.io. (A) Logged-in users can provide case-specific phenotype information (here we provided our patient's phenotype, which is oligodontia). Again, REEV suggests possible genes linked to the given phenotype (as provided by CADA-prio HPO-gene-ranking) (PAX9, the gene affected by our variant, is listed 1st). (B) Full list of genes affected by our variant as well as different dosage sensitivity scores (provided by ClinGen, gnomAD, RCNV) of the respective gene. This list can also be sorted either by gene symbol or score of interest. (C) Associated conditions for the gene affected by the respective variant. This list can be sorted either by name or level of confidence. Again, when a logged-in user has provided case-specific phenotype information (see A), they are shown a gene-to-phenotype rank (according to CADA ranking), in our example the rank for a variant in PAX9. (D) Semi-automated ACMG classification of our structural variant based on AutoCNV (see also Figure 1D) yields a variant of uncertain significance with the CNV criteria L1A true (deletion contains part of a protein coding gene), L2E true with 0.45 points (corresponding to the PVS1 criterion; our variant abrogates >10% of the protein and PVS1 is to be used at strong level, equaling 0.45 points for L2E) and L3A true (as the number of contained genes is 0-24). (E) Manual input of further information allows for the final classification as pathogenic: ClinVar information on this variant provided by REEV revealed several (likely) pathogenic loss of function sequence variants in PAX9, therefore criterion L4E is true with 0.3 points. Also, we identified this variant in a trio-genome sequencing as a de novo variant, so criterion L5A is true with 0.45 points.

Table 1. Overview of sources and tools integrated with REEV. The REEV backend integrates state-of-the-art databases containing gene and variant related information including condition/disease/phenotype related information, population frequencies, pathogenicity predictions as well as expression and cross-species conservation
https://github.com/varfish-org/annonars, https://github.com/varfish-org/mehari, https://github.com/varfish-org/viguno, https://github.com/varfish-org/cada-prio, https://github.com/bihealth/dotty