shinyseg: a web application for flexible cosegregation and sensitivity analysis

Abstract

Motivation
Cosegregation analysis is a powerful tool for identifying pathogenic genetic variants, but its implementation remains challenging. Existing software is either limited in scope or too demanding for many end users. Moreover, current solutions lack methods for assessing the robustness of cosegregation evidence, which is important due to its reliance on uncertain estimates.

Results
We present shinyseg, a comprehensive web application for clinical cosegregation analysis. Our app streamlines penetrance specification based on either liability classes or epidemiological data such as risks, hazard ratios, and age of onset distribution. In addition, it incorporates sensitivity analyses to assess the robustness of cosegregation evidence, and offers support in clinical interpretation.

Availability and implementation
The shinyseg app is freely available at https://chrcarrizosa.shinyapps.io/shinyseg, with documentation and complete R source code on https://chrcarrizosa.github.io/shinyseg and https://github.com/chrcarrizosa/shinyseg.


CoSeg
CoSeg (Ranola and Shirts 2016) is an R implementation of Mohammadi et al.'s (2009) algorithm for computing the CSLR. It is limited to autosomal dominant inheritance and to pedigrees compatible with the method's assumptions, including the absence of loops or inbreeding.
Furthermore, it can only handle one phenotype at a time and may be computationally prohibitive for large families (50+ members). The variant penetrances follow a normal model, with CoSeg providing estimates for a few common cancer genes.

segregatr
segregatr (Ratajska et al. 2023) is an R implementation of Thompson et al.'s (2003) approach for computing the FLB. Free from the constraints of CoSeg's algorithm, segregatr is also more accessible thanks to its availability via CRAN. Together with shinyseg, it uniquely handles X-linked inheritance and most consanguineous cases, including pedigrees with parent-child matings. Moreover, it offers the most informative error reporting. Besides requiring users to write R code, segregatr's main limitation is that it lacks guidance for defining the liability classes on which penetrance depends.

COOL
COOL v2 (Belman et al. 2020) is a website that also implements Thompson et al.'s (2003) method. As such, it shares several qualities with segregatr, while offering increased accessibility through a more convenient interface. COOL's main strength lies in the inclusion of built-in cancer incidence data and even relative risk estimates for multiple genes, simplifying penetrance specification for those cases. For others, however, this step is less intuitive due to the lack of guidance, inaccurate error messages, and unavailable source code. COOL's online-only nature may also pose challenges for clinical use in many countries.

shinyseg
shinyseg is an R Shiny app designed to address segregatr's shortcomings by wrapping its features in a user-friendly interface. As a result, it is the most interactive tool, uniquely providing step-by-step feedback, real-time pedigree visualizations, and direction for the clinical interpretation of results. Additionally, shinyseg allows users to test cosegregation assumptions through sensitivity analyses, and introduces a general and fully parametric version of COOL's cancer penetrances. Its limitations are that it does not provide estimates for any particular disease or gene, and that sensitivity analyses can be slow.

Parametric penetrances
shinyseg offers different ways of defining the variant-disease model. For simple cases a manual specification of liability classes can suffice, but this becomes difficult when onset age and multiple phenotypes must be accounted for. Here, we detail the parametric model that the app provides to streamline these more complex scenarios.

Overview
With its relative risk mode, shinyseg enables a parametric specification of the survival penetrances described in Belman et al. (2020). This model-based approach relies on two types of inputs for each disease phenotype d:
• The baseline parameters describe the incidence of d in non-carriers and heterozygous carriers in recessive inheritance. These include a baseline lifetime risk $r^0_d$, and the mean $\mu^0_d$ and standard deviation $\sigma^0_d$ of their age of onset.
• The hazard ratios define the relative risk of d in homo-, hemi-, and heterozygous carriers in dominant inheritance, compared to the baseline. They can be constant or age-dependent, and may be specified either directly or through the variant-associated lifetime risk $r^1_d$.
Briefly, these parameters are used to calculate the baseline and variant-associated hazards for ages t = 1, ..., 100 years, and each phenotype d; subsequently, these hazards are used to derive the penetrances. The procedure is detailed in the following sections. Note that a sex-specific specification is also possible, in which case the computations are performed separately for each sex.

Baseline hazards
To define the baseline hazards $h^0_{d,t}$, we follow an approach akin to Jonker et al. (2003) and Mohammadi et al. (2009), using the cumulative distribution function (CDF) of a Normal distribution. One key difference is that shinyseg also incorporates truncation for greater control of the cumulative incidence within the parametric framework.
The relevant inputs are the baseline parameters: baseline lifetime risk $r^0_d$, mean $\mu^0_d$ and standard deviation $\sigma^0_d$ of the age of onset. With these, the baseline cumulative incidences $\mathrm{CI}^0_{d,t}$ are taken from the CDF of a truncated Normal($\mu^0_d$, $\sigma^0_d$) multiplied by $r^0_d$:

$$\mathrm{CI}^0_{d,t} = r^0_d \cdot \frac{\Phi\left(\frac{t-\mu^0_d}{\sigma^0_d}\right) - \Phi\left(\frac{0-\mu^0_d}{\sigma^0_d}\right)}{\Phi\left(\frac{100-\mu^0_d}{\sigma^0_d}\right) - \Phi\left(\frac{0-\mu^0_d}{\sigma^0_d}\right)},$$

where $\Phi$ is the CDF of a standard Normal. The truncation, at 0 and 100 years, ensures that the cumulative incidences at these time points equal 0 and $r^0_d$.
The baseline hazards $h^0_{d,t}$ are then derived from these cumulative incidences.
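As a rough illustration of this step, the following Python sketch computes the truncated-Normal cumulative incidences at ages 1 to 100 and converts them into yearly hazards. The function names and example values (lifetime risk 0.1, mean 60, standard deviation 15) are hypothetical. The discrete-time hazard definition used here, the probability of onset in year t given no onset before, is an assumption made for illustration and may differ from the expression used by shinyseg.

```python
from math import erf, sqrt


def norm_cdf(x):
    """CDF of a standard Normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def baseline_cumulative_incidence(risk, mean, sd, lower=0.0, upper=100.0):
    """Cumulative incidence at ages 1..100: truncated Normal CDF times lifetime risk."""
    lo = norm_cdf((lower - mean) / sd)
    hi = norm_cdf((upper - mean) / sd)
    return [risk * (norm_cdf((t - mean) / sd) - lo) / (hi - lo)
            for t in range(1, 101)]


def hazards_from_cumulative_incidence(ci):
    """ASSUMPTION: hazard in year t = P(onset in year t | no onset before year t)."""
    previous = [0.0] + ci[:-1]
    return [(c - p) / (1.0 - p) for c, p in zip(ci, previous)]


def lifetime_risk(hazards):
    """Cumulative incidence at the last age implied by discrete-time hazards."""
    surviving = 1.0
    for h in hazards:
        surviving *= 1.0 - h
    return 1.0 - surviving


# Hypothetical baseline parameters for one phenotype.
ci0 = baseline_cumulative_incidence(risk=0.1, mean=60.0, sd=15.0)
h0 = hazards_from_cumulative_incidence(ci0)
```

Under this convention the hazards reproduce the input exactly: `lifetime_risk(h0)` recovers the specified baseline lifetime risk, which is the property the truncation at 100 years is meant to guarantee.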

Variant-associated hazards
The variant-associated hazards $h^1_{d,t}$ result from multiplying the baseline $h^0_{d,t}$ by the user-specified hazard ratios. While this approach aligns with Belman et al. (2020), shinyseg introduces a distinctive model-based specification for these relative risks.
To elaborate, users provide a vector $b_d$ as input, where values loosely represent the hazard ratios at equidistant ages from 1 to 100 years; this vector has a variable length $H_d$, allowing a flexible number of age points to be specified. These values are smoothed across the entire age range using B-splines, resulting in the age-specific ratios:

$$\mathrm{HR}_{d,t} = \sum_{j=1}^{H_d} b_{d,j} \, B_{d,j}(t).$$

Here, $b_{d,j}$ and $B_{d,j}(t)$ refer to the j-th element of $b_d$ and the j-th basis function, respectively.
If a single value is provided, the basis function is set to 1, leading to a constant, i.e. age-independent, hazard ratio $b_d$.
These smoothed hazard ratios are then used to calculate $h^1_{d,t}$:

$$h^1_{d,t} = h^0_{d,t} \sum_{j=1}^{H_d} b_{d,j} \, B_{d,j}(t).$$
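The smoothing step can be sketched in Python as a weighted sum of B-spline basis functions evaluated at ages 1 to 100. The source does not state the spline degree or knot placement, so the choices below (a clamped basis on [1, 100] of degree min(3, H - 1) with equally spaced interior knots) are assumptions for illustration only, as are the function names and the constant toy baseline hazard.

```python
def bspline_basis(j, degree, knots, t):
    """Value at t of the j-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        # Close the last non-empty interval on the right so t = knots[-1] is covered.
        is_last = knots[j + 1] == knots[-1] and knots[j] < knots[j + 1]
        inside = knots[j] <= t < knots[j + 1] or (is_last and t == knots[-1])
        return 1.0 if inside else 0.0
    left = right = 0.0
    if knots[j + degree] > knots[j]:
        left = ((t - knots[j]) / (knots[j + degree] - knots[j])
                * bspline_basis(j, degree - 1, knots, t))
    if knots[j + degree + 1] > knots[j + 1]:
        right = ((knots[j + degree + 1] - t) / (knots[j + degree + 1] - knots[j + 1])
                 * bspline_basis(j + 1, degree - 1, knots, t))
    return left + right


def clamped_knots(n_basis, degree, lower=1.0, upper=100.0):
    """ASSUMPTION: clamped knot vector with equally spaced interior knots."""
    n_inner = n_basis - degree - 1
    inner = [lower + (upper - lower) * i / (n_inner + 1) for i in range(1, n_inner + 1)]
    return [lower] * (degree + 1) + inner + [upper] * (degree + 1)


def smoothed_hazard_ratios(b, ages=range(1, 101)):
    """Age-specific hazard ratios: sum over j of b[j] * B_j(t)."""
    degree = min(3, len(b) - 1)  # a single value gives a constant basis equal to 1
    knots = clamped_knots(len(b), degree)
    return [sum(b_j * bspline_basis(j, degree, knots, float(t))
                for j, b_j in enumerate(b))
            for t in ages]


# Hypothetical inputs: four hazard ratio values and a flat toy baseline hazard.
hr = smoothed_hazard_ratios([1.0, 5.0, 10.0, 4.0])
h0 = [0.001] * 100
h1 = [ratio * h for ratio, h in zip(hr, h0)]
```

With a clamped basis the functions sum to 1 at every age, so a vector of identical values yields a constant hazard ratio, and a single value reduces to the age-independent case described above.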

Variant-associated lifetime risk
To facilitate hazard ratio specification and enable further sensitivity analyses, we provide the variant-associated lifetime risk $r^1_d$. This parameter represents the cumulative incidence of d in variant carriers (homo-, hemi-, and heterozygous carriers in dominant inheritance) at 100 years of age, and it is dynamically updated based on the other inputs.
More importantly, it can also be directly modified, resulting in the scaling of the specified hazard ratios to align with the new value. In practical terms, this allows users to use the hazard ratio input $b_d$ to define a relative risk 'pattern' (whether constant or age-dependent) and subsequently adjust $r^1_d$ to tailor it to the desired lifetime risk. Internally, this involves optimizing a scaling factor $\lambda_d$ to minimize the discrepancy between the target $r^1_d$ and the lifetime risk implied by the scaled hazard ratios, which is achieved using a quasi-Newton algorithm via the optim() R function.
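The rescaling idea can be illustrated with a short Python sketch that searches for a scaling factor such that the scaled hazards reach a target lifetime risk. This is not the app's implementation: it swaps the quasi-Newton optimizer for plain bisection, and it assumes a discrete-time relation between hazards and lifetime risk (one minus the product of yearly survival probabilities). All names and example values are hypothetical.

```python
def lifetime_risk(hazards):
    """ASSUMPTION: lifetime risk = 1 - product over ages of (1 - yearly hazard)."""
    surviving = 1.0
    for h in hazards:
        surviving *= 1.0 - min(h, 1.0)  # cap each yearly hazard at 1
    return 1.0 - surviving


def scale_to_lifetime_risk(h0, hr, target, tol=1e-10):
    """Bisection for lam such that hazards lam * hr * h0 give the target lifetime risk."""
    if not 0.0 < target < 1.0:
        raise ValueError("target lifetime risk must lie strictly between 0 and 1")
    if not any(r * h > 0.0 for r, h in zip(hr, h0)):
        raise ValueError("at least one hazard must be positive")

    def risk(lam):
        return lifetime_risk([lam * r * h for r, h in zip(hr, h0)])

    low, high = 0.0, 1.0
    while risk(high) < target:  # expand the bracket until it contains the target
        high *= 2.0
    while high - low > tol * max(1.0, high):
        mid = 0.5 * (low + high)
        if risk(mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical example: flat baseline hazard, constant hazard ratio of 2,
# rescaled so that carriers reach a lifetime risk of 0.5.
h0 = [0.001] * 100
hr = [2.0] * 100
lam = scale_to_lifetime_risk(h0, hr, target=0.5)
scaled_hr = [lam * r for r in hr]
```

Because the implied lifetime risk increases monotonically with the scaling factor, a bracketing search is sufficient here; the 'pattern' of the hazard ratios across ages is preserved and only its overall level changes.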

Survival penetrances
The last step involves calculating the penetrances following a survival model.
First, the overall baseline and variant-associated cumulative incidences, $\mathrm{CI}^0_t$ and $\mathrm{CI}^1_t$, are computed by treating the D disease phenotypes as independent and summing their individual contributions.
The baseline and variant-associated (survival) penetrances, denoted as $\mathrm{SP}^0_{d,t}$ and $\mathrm{SP}^1_{d,t}$, are finally calculated as described in Belman et al. (2020).
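To make the survival logic concrete, the Python sketch below combines phenotype-specific hazards into an overall cumulative incidence and derives survival-type probabilities from it. It follows one common survival-model convention and is an assumption throughout: the exact expressions in Belman et al. (2020) and in shinyseg may differ. Function names and the two toy hazard vectors are hypothetical.

```python
def overall_cumulative_incidence(hazards_by_phenotype):
    """ASSUMPTION: yearly hazards add across phenotypes;
    CI_t = 1 - product over s <= t of (1 - sum_d h[d][s])."""
    surviving, ci = 1.0, []
    for yearly in zip(*hazards_by_phenotype):
        surviving *= 1.0 - min(sum(yearly), 1.0)
        ci.append(1.0 - surviving)
    return ci


def survival_penetrances(hazards_by_phenotype):
    """ASSUMED convention:
    unaffected[t] = P(no phenotype by that age) = 1 - CI_t
    onset[d][t]   = P(first onset is phenotype d at that age) = h[d][t] * (1 - CI_{t-1})."""
    ci = overall_cumulative_incidence(hazards_by_phenotype)
    unaffected = [1.0 - c for c in ci]
    surviving_before = [1.0] + unaffected[:-1]
    onset = [[h * s for h, s in zip(hazards, surviving_before)]
             for hazards in hazards_by_phenotype]
    return unaffected, onset


# Hypothetical hazards for two phenotypes over ages 1..100 (list index 0 is age 1).
h_by_d = [[0.001] * 100, [0.002] * 100]
ci = overall_cumulative_incidence(h_by_d)
unaffected, onset = survival_penetrances(h_by_d)
```

Under this convention the probabilities are coherent: the onset probabilities summed over all phenotypes and ages, plus the probability of remaining unaffected at the last age, add up to 1.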

Table 1. A comparison of available cosegregation analysis tools for variant interpretation.