Abstract

Background

In descriptive epidemiology, there are strong similarities between incidence and survival analyses. Because of the success of multidimensional penalized splines (MPSs) in incidence analysis, this pedagogical paper aims to show that MPSs are also well suited to survival and net survival studies.

Methods

The use of MPSs is illustrated in cancer epidemiology in the context of survival trend studies, which require specific statistical modelling. We focus on two examples (cervical and colon cancers) using survival data from the French cancer registries (cases 1990–2015). The dynamic of the excess mortality hazard according to time since diagnosis was modelled using an MPS of time since diagnosis, age at diagnosis and year of diagnosis. Multidimensional splines bring the flexibility needed to capture any trend pattern, while penalization ensures that only the complexities necessary to describe the data are retained.

Results

For cervical cancer, the dynamic of the excess mortality hazard changed with the year of diagnosis in opposite ways according to age, which led to a net survival that improved in young women and worsened in older women. For colon cancer, regardless of age, excess mortality decreased with the year of diagnosis, but only at the start of follow-up.

Conclusions

MPSs make it possible to describe the dynamic of the mortality hazard and how this dynamic changes with the year of diagnosis, or more generally with any covariates of interest: this gives essential epidemiological insights for interpreting results. We use the R package survPen to do this type of analysis.

Key Messages
  • In cancer epidemiology, detailed survival trend analyses are essential for health policies and require an appropriate statistical method able to model simple as well as complex trends.

  • The dynamic of the mortality hazard according to time since diagnosis and the way this dynamic changes with the year of diagnosis, or more generally with any covariates of interest, are essential information for epidemiologists to interpret survival results.

  • Multidimensional penalized splines (MPSs) form a powerful statistical tool for modelling the dynamic of the mortality hazard: multidimensional splines allow a flexible effect of the year of diagnosis (non-linear, time- and age-dependent), while penalization avoids overfitting.

  • For survival analyses, MPSs can easily be used through the R package survPen, which includes both hazard models and excess hazard models, for survival and net survival analyses, respectively.

Introduction

The statistical analyses of survival data1,2 present strong similarities with the analyses of incidence or mortality data (population count data)3: both examine the instantaneous risk of occurrence of an event over a follow-up time (often called the ‘hazard’ in survival studies or the ‘rate’ in incidence studies) and the effects of factors that influence this risk. In survival models, the event considered is death (or another event, e.g. remission) among patients suffering from the studied disease whereas, in incidence models, the event considered is diagnosis of the disease among the whole general population. Besides, in the survival field, the follow-up time is the time elapsed since diagnosis whereas it is time since birth in the incidence field (i.e. age). An important feature to note is that, in both fields (incidence and survival), ‘time’ itself is a primary ‘factor of interest’ because the instantaneous risk changes over time. Thus, the common aim of survival and incidence analyses is to describe the ‘dynamics’ of the event occurrence according to the follow-up time (i.e. the change in risk over time) and how risk factors affect this dynamic.

Generalized additive models,4,5 as developed by Simon Wood, form a powerful statistical framework that provides epidemiological insights when analysing incidence data (Poisson regression). These models rely on multidimensional penalized splines (MPSs), which provide flexibility while avoiding overfitting thanks to penalization. In particular, Uhry et al.6 highlighted the value of MPSs in detailed incidence and mortality trend analyses and the richness of the resulting descriptions. Notably, trends could vary smoothly with age, thanks to the flexible age × year interaction allowed by MPSs. Using the efficient and robust statistical package mgcv,5,7 these models performed well for analysing cancer incidence and mortality trends in France (1990–2018) for >70 sites or subsites.6,8,9

Inspired by the similarity between the incidence and survival fields, we recently carried out statistical developments to make MPSs available in both hazard and excess hazard models for survival analyses.10,11 In the context of survival trend analyses by year of diagnosis, for example, this new method allows the excess mortality hazard to be modelled with an MPS of time since diagnosis, age at diagnosis and year of diagnosis.10,11 Parameters are estimated in a penalized likelihood framework, and simulations showed that MPSs are well suited to analysing survival trends. Moreover, MPSs for survival and net survival models are available in the R package survPen12 and were used in the latest national French cancer survival trend study to analyse 59 cancer sites or subsites.13

In the same spirit as the Uhry et al.6 paper, this pedagogical article aims to illustrate the richness of the information that can be derived from survival data using MPSs. These models are able to reflect potentially complex effects of several covariates (non-proportionality, non-linearity and interactions) and may model simple as well as complex trend patterns.

The illustrative examples stem from the recent French cancer survival trend study13,14 and focus on net survival and excess mortality hazard (EMH), as they are key indicators in cancer epidemiology.

Material

The method is illustrated with two examples of cancer: cervical cancer to represent a complex setting and colon cancer to represent a ‘typical’ setting. Survival data of cases diagnosed between 1990 and 2015 (age >15 years at diagnosis) were provided by the French population-based cancer registries. The end of follow-up was 30 June 2018. Cancer cases were coded according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). The colon cancer analysis combined both sexes (no sex effect was considered).

To obtain the expected mortality hazards needed for the excess hazard models, we smoothed (and projected for 2018) the all-cause mortality hazards observed in the general French population by calendar year (1975–2017), sex, 1-year age class and residential area (Département), provided by the French national statistical office.

Methods

EMH and net survival

The EMH corresponds to mortality due to a given cancer and is defined as the difference between the observed mortality in cancer patients and the expected mortality in the matched general population.15 Net survival is derived directly from the EMH and is the survival that would be observed if the studied cancer were the only cause of death. As these two indicators are not affected by other causes of death, they allow comparisons between sexes, ages, years of diagnosis and countries.

More precisely, excess hazard models15 assume that the overall mortality hazard $h_o$ in cancer patients is the sum of the expected mortality hazard $h_p$ in the general population (obtained as described above and assumed known) and the EMH $h_e$ due directly or indirectly to the studied cancer. Within the context of trend analyses, this may be written as:

$$h_o(t, a, y, \mathbf{z}) = h_p(a + t, y + t, \mathbf{z}) + h_e(t, a, y)$$

where t is the time since diagnosis, a is the age at diagnosis, y is the year of diagnosis and z is a vector of the demographic covariates that affect the expected mortality (here, sex and place of residence). The expected hazard is evaluated at the attained age a + t and calendar year y + t.
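As a small numerical illustration of this additive decomposition, and of the fact that, for fixed a and y, net survival is the exponential of minus the cumulative EMH, here is a NumPy sketch on a time grid. The hazard values are toy inputs, not registry estimates, and the trapezoidal rule is simply one convenient way to integrate the hazard; in the study, $h_e$ is the modelled quantity rather than obtained by subtraction.

```python
import numpy as np

def excess_hazard(observed_hazard, expected_hazard):
    """Additive decomposition h_e = h_o - h_p (all in deaths per person-year)."""
    return np.asarray(observed_hazard, dtype=float) - np.asarray(expected_hazard, dtype=float)

def net_survival(times, excess_hazard_values):
    """Net survival exp(-integral_0^t h_e(u) du); the integral uses the trapezoidal rule."""
    times = np.asarray(times, dtype=float)
    h = np.asarray(excess_hazard_values, dtype=float)
    increments = 0.5 * (h[1:] + h[:-1]) * np.diff(times)
    cumulative = np.concatenate(([0.0], np.cumsum(increments)))
    return np.exp(-cumulative)

# Toy, hypothetical hazards (deaths per person-year), not registry values.
times = np.linspace(0.0, 5.0, 51)
h_o = np.full_like(times, 0.12)   # observed mortality hazard in patients
h_p = np.full_like(times, 0.02)   # expected hazard in the matched population
h_e = excess_hazard(h_o, h_p)     # 0.10 everywhere
print(net_survival(times, h_e)[-1])  # exp(-0.5), about 0.607
```

With a constant EMH of 0.10, the 5-year net survival is exp(−0.5); with a time-varying EMH the same function applies, provided the grid is fine enough for the trapezoidal rule.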

Penalized excess hazard models for trend analyses

These models are introduced hereafter; we refer the reader to Remontet et al.10 for a complete pedagogical presentation of these models and to Fauvernier et al.11 for mathematical details.

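The EMH $h_e$ is the key indicator of interest and is modelled on the log scale as:

$$\log h_e(t, a, y) = f(t, a, y)$$

where f(t, a, y) is a function of time since diagnosis, age at diagnosis and year of diagnosis that remains to be specified.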
Specifying flexible effects (non-linearity, non-proportionality, interactions) of the variables needed to describe the trends may be achieved by using an MPS of time, age and year of diagnosis.5,10,11 This tri-variate MPS is defined as the tensor product te(t, a, y) of the marginal bases of time $\{m_i(t)\}_{i=1..I}$, age $\{q_j(a)\}_{j=1..J}$ and year of diagnosis $\{b_k(y)\}_{k=1..K}$, i.e. the term-by-term product of the marginal basis functions. The EMH is then modelled as:

$$\log h_e(t, a, y) = te(t, a, y) = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \beta_{i,j,k}\, m_i(t)\, q_j(a)\, b_k(y)$$
where $\beta = (\beta_{i,j,k})$ is the vector of parameters (of size I × J × K) to be estimated. Restricted cubic splines (i.e. cubic splines constrained to be linear beyond the boundary knots) are commonly used for the marginal bases.5,10,11 To analyse survival trends between 1990 and 2015 with up to 10 years of follow-up, we used marginal bases with seven knots for time, five for age and five for year of diagnosis, which led to 175 (7 × 5 × 5) parameters to estimate.
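To make the tensor-product construction concrete, here is a minimal NumPy sketch. Assumptions: the marginal bases use Harrell's truncated-power parameterization of restricted cubic splines (with an intercept, so each basis has as many columns as knots), which may differ from the parameterization used internally by survPen; the knot positions and the toy data are hypothetical, since the text only gives the number of knots.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's truncated-power form) with an intercept.

    Returns len(knots) columns: 1, x and len(knots) - 2 non-linear terms,
    all of which are linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    k = np.sort(np.asarray(knots, dtype=float))
    span = k[-1] - k[-2]
    scale = (k[-1] - k[0]) ** 2  # rescaling keeps the cubic terms on a comparable scale

    def pos_cube(u):
        return np.clip(u, 0.0, None) ** 3

    cols = [np.ones_like(x), x]
    for j in range(len(k) - 2):
        term = (pos_cube(x - k[j])
                - pos_cube(x - k[-2]) * (k[-1] - k[j]) / span
                + pos_cube(x - k[-1]) * (k[-2] - k[j]) / span)
        cols.append(term / scale)
    return np.column_stack(cols)

def tensor_basis(m, q, b):
    """Row-wise tensor product: column i*J*K + j*K + k equals m_i(t) * q_j(a) * b_k(y)."""
    n = m.shape[0]
    return np.einsum("ni,nj,nk->nijk", m, q, b).reshape(n, -1)

rng = np.random.default_rng(0)
t = rng.uniform(0, 10, 4)       # toy times since diagnosis (years)
a = rng.uniform(20, 90, 4)      # toy ages at diagnosis
y = rng.uniform(1990, 2015, 4)  # toy years of diagnosis
time_knots = [0, 0.5, 1, 2, 4, 7, 10]       # hypothetical knot placement (7 knots)
age_knots = [20, 45, 60, 72, 90]            # hypothetical (5 knots)
year_knots = [1990, 1996, 2002, 2008, 2015]  # hypothetical (5 knots)
X = tensor_basis(rcs_basis(t, time_knots), rcs_basis(a, age_knots), rcs_basis(y, year_knots))
print(X.shape)  # (4, 175)
```

The log-EMH of each observation is then the matrix product of X with the 175 coefficients. Because each marginal basis contains 1 and x, the product columns include the constant, the linear main effects and their products, which are the unpenalized linear terms mentioned later in the text.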

With this high number of parameters, the model is very flexible; to avoid overfitting, the parameters are estimated by maximizing a penalized likelihood. This ensures a good compromise between fit (likelihood) and smoothness of the estimated hazard (penalization). The compromise is controlled by three smoothing parameters (one for each of the time, age and year-of-diagnosis directions) that are estimated using the Laplace approximate marginal likelihood (LAML) criterion.11 One major strength of penalized models, as highlighted by Wood, is that smoothing parameter estimation acts as a model selection procedure5 (p. 301); for example, in a given direction, a high smoothing parameter results in a linear effect.
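The statement that a high smoothing parameter yields a linear effect can be checked on a deliberately simplified example. The sketch below swaps the penalized survival likelihood for a Gaussian penalized least-squares smoother with a second-order difference penalty (a Whittaker-type smoother), and the smoothing parameter `lam` is set by hand rather than estimated by LAML; only the behaviour of the penalty is illustrated.

```python
import numpy as np

def penalized_smooth(y, lam):
    """Minimize ||y - f||^2 + lam * ||D2 f||^2, where D2 takes second-order differences.

    D2 f is zero for any straight line, so as lam grows the fit shrinks to a line.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second-difference operator
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, y)

x = np.linspace(0.0, 1.0, 50)
y = x + 0.3 * np.sin(2 * np.pi * x)    # toy data: a line plus a wiggle
rough = penalized_smooth(y, lam=1e-6)   # tiny smoothing parameter: follows the data
linear = penalized_smooth(y, lam=1e8)   # huge smoothing parameter: essentially a straight line
```

In this toy setting, `linear` is practically the ordinary least-squares line through the data, while `rough` reproduces the data; intermediate values of `lam` trade fit against smoothness, which is the compromise the penalized likelihood makes in each direction.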

The fundamental principles of the penalized framework are therefore: (i) use highly flexible models able to capture any trend pattern and (ii) let the penalization do its job by retaining only the complexities necessary to describe the data.

Here, a technical point should be mentioned: this methodology was adopted in the recent French survival study14 with an additional refinement, because many different cancers had to be analysed (with numbers of cases ranging from 1500 to 125 000).13 Although the above tri-variate MPS performs much of the model selection, its linear terms (main effects of t, a and y and interactions t:a, t:y, a:y and t:a:y) remain unpenalized.11 Moreover, in some cases, smoothing can be slightly improved by comparing this complex model with simpler models and by introducing different smoothing parameters for the main effect and for the interactions (whereas these are shared in the tri-variate MPS). For these reasons, instead of using the single tri-variate MPS model described above, we compared it with four simpler penalized models that differ in the year effect and its interactions with time and age (see Supplementary Table S1, available as Supplementary data at IJE online); the final model retained was the one with the smallest Akaike information criterion (AIC). This refinement completes the selection process, especially when the sample size is small (less frequent cancers), in which case the unpenalized terms may lead to some variability in the estimates. It aims to provide trends that are free of small, non-significant variations and are thus easier to interpret. We present here results obtained with this refined methodology, as adopted in the French survival study.
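The AIC comparison step can be sketched as follows, assuming the usual form for penalized models, AIC = −2 log L + 2 edf, where edf is the effective number of degrees of freedom of each fit. The exact log-likelihood and edf that survPen reports should be checked in its documentation; the model labels and numbers below are made up and are not the study's fits.

```python
def aic(log_likelihood, edf):
    """AIC = -2 log L + 2 * edf, with edf the effective degrees of freedom of a penalized fit."""
    return -2.0 * log_likelihood + 2.0 * edf

def select_by_aic(fits):
    """fits maps a model label to (log_likelihood, edf); returns (best label, all AIC values)."""
    scores = {name: aic(ll, edf) for name, (ll, edf) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical candidate fits, for illustration only.
toy_fits = {"simple": (-1000.0, 10.0), "intermediate": (-992.0, 12.0), "full": (-990.0, 20.0)}
best, scores = select_by_aic(toy_fits)
print(best, scores)
```

Using edf rather than the raw number of coefficients matters here: a model with 175 coefficients may have far fewer effective degrees of freedom once penalized, so the comparison is not automatically biased against the tri-variate MPS.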

Indicators estimated from the selected model

The selected model provides estimates of the EMH and of net survival for any value of time, age and year of diagnosis. EMH ratios can also be derived; e.g. the ratio according to the year of diagnosis is $\mathrm{EHR}_y(a, t, y) = h_e(t, a, y) / h_e(t, a, 2005)$, with 2005 as the reference year. The ages shown in the figures are based on the age distributions of the cancer cases (10th, 50th and 90th percentiles, capped at 80 years).
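The ratio and the choice of displayed ages are simple post-processing steps. The sketch below assumes the fitted model is available as a function returning $\log h_e(t, a, y)$; `toy_log_he` is a hypothetical stand-in with a log-linear year effect and no time × year interaction, so its ratio does not depend on t.

```python
import numpy as np

def excess_hazard_ratio(log_he, t, a, y, ref_year=2005):
    """EHR_y(a, t, y) = h_e(t, a, y) / h_e(t, a, ref_year), via a difference of log hazards."""
    return np.exp(log_he(t, a, y) - log_he(t, a, ref_year))

def display_ages(ages_at_diagnosis, cap=80.0):
    """10th, 50th and 90th percentiles of age at diagnosis, each capped at `cap` years."""
    return np.minimum(np.percentile(ages_at_diagnosis, [10, 50, 90]), cap)

def toy_log_he(t, a, y):
    """Hypothetical log-EMH: log-linear year effect, no time x year interaction."""
    return -2.0 - 0.3 * t + 0.02 * a + 0.01 * (y - 2005)

print(excess_hazard_ratio(toy_log_he, t=0.5, a=80, y=2015))
print(excess_hazard_ratio(toy_log_he, t=5.0, a=80, y=2015))  # same value: exp(0.1), about 1.105
```

With a real fit, the ratio generally varies with t and a; it is constant in t only when the selected model contains no time × year interaction, as reported for cervical cancer.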

The EMH is expressed in deaths per person-year. When its value is low (say, ≤0.1), it can be read directly as an annual probability of death, because under a constant hazard h the probability of dying within a year is 1 − exp(−h), which is close to h when h is small (e.g. an EMH equal to 0.05 corresponds to an annual probability of death of about 5%). When its value is high, it can first be converted into deaths per person-month and then read as a monthly probability of death.
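The conversion rule assumes that the hazard is roughly constant over the interval, in which case the probability of dying within an interval of Δ years is 1 − exp(−hΔ). A minimal sketch of that conversion, using the EMH values quoted in this paper:

```python
import math

def probability_of_death(hazard_per_year, interval_months=12.0):
    """Probability of dying within the interval, assuming a constant hazard over it."""
    return 1.0 - math.exp(-hazard_per_year * interval_months / 12.0)

print(round(probability_of_death(0.05), 4))     # 0.0488: close to the EMH itself (about 5%)
print(round(probability_of_death(0.39, 1), 4))  # 0.032: about 3% per month
print(round(probability_of_death(0.39), 4))     # 0.3229: far from 0.39, hence the monthly reading
```

The last line shows why a high EMH should not be read directly as an annual probability: 0.39 deaths per person-year corresponds to roughly a 32% annual probability, not 39%.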

Goodness of fit of the models

To evaluate the goodness of fit of the models, the estimates were compared with those obtained from two methods: the non-parametric Pohar–Perme estimator for net survival16 and a piecewise constant model for the EMH. These two methods, applied to data by age group and period, are ‘assumption-free’, so their estimates can be viewed as ‘raw data’. Marginal net survivals by age group and period were calculated from the model for comparison with the Pohar–Perme estimates. We considered the goodness of fit satisfactory when the model smoothed and fitted these ‘raw data’ correctly.
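A rough idea of what such interval-level ‘raw data’ look like can be given by the crude ratio (observed deaths − expected deaths) / person-years in each follow-up interval. The sketch below implements that crude estimator, which is swapped in for illustration: it is neither the maximum-likelihood piecewise constant excess hazard model nor the Pohar–Perme estimator, and it holds each patient's expected hazard constant over follow-up, whereas in practice the expected hazard changes with attained age and calendar year.

```python
import numpy as np

def crude_piecewise_excess_hazard(time, dead, expected_rate, breaks):
    """Crude excess hazard per interval [lo, hi): (observed - expected deaths) / person-years.

    Simplification: each subject's expected (population) hazard is constant over follow-up.
    """
    time = np.asarray(time, dtype=float)
    dead = np.asarray(dead, dtype=bool)
    expected_rate = np.asarray(expected_rate, dtype=float)
    out = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        exposure = np.clip(np.minimum(time, hi) - lo, 0.0, None)  # time at risk inside [lo, hi)
        person_years = exposure.sum()
        observed = np.sum(dead & (time >= lo) & (time < hi))
        expected = np.sum(expected_rate * exposure)
        out.append((observed - expected) / person_years if person_years > 0 else np.nan)
    return np.array(out)

# Toy follow-up data (years), not registry data.
rates = crude_piecewise_excess_hazard(
    time=[0.5, 1.5, 2.0, 3.0], dead=[1, 1, 0, 0], expected_rate=[0.1] * 4, breaks=[0, 1, 3])
print(rates)
```

Estimates of this kind are noisy in small strata, which is precisely why the smooth model estimates are compared with them rather than replaced by them.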

Implementation

All analyses were performed in R, version 3.5.2, using survPen package version 1.2.0.12

Results

The cervical cancer analysis included 7878 cases (median age at diagnosis: 53 years) and 3393 deaths at 10 years. The colon cancer analysis included 70 341 cases (median age at diagnosis: 73 years) and 43 543 deaths at 10 years.

For cervical cancer, the penalized model selected via the AIC was Model M3 (see Supplementary Table S1, available as Supplementary data at IJE online): it allowed for a non-linear effect of the year of diagnosis and for the age × year and time × year interactions. For colon cancer, the selected model was M4 (i.e. the tri-variate MPS).

The results for cervical cancer are shown in Figures 1 and 2 and those for colon cancer are shown in Figures 3 and 4. Rows ‘a’ of Figures 1 and 3 show the surfaces of the EMH according to the time elapsed since diagnosis and age for three different years of diagnosis: these 3D plots show that MPSs provide smooth surfaces in age and time directions and various surface patterns for the two studied cancers.

Figure 1. Excess mortality hazard (EMH, in deaths per person-years) for cervical cancer. Row a: 3D plots of the EMH according to the time since diagnosis (in years) and age at diagnosis, for years of diagnosis 1995, 2005 and 2015 (one plot per year). Row b: dynamics of the EMH at different ages at diagnosis, for years of diagnosis 1995, 2005 and 2015 (one plot per year). Row c: dynamics of the EMH at different years of diagnosis, for ages at diagnosis of 30, 50 and 80 years (one plot per age)

Figure 2. Focus on the trends for cervical cancer. Row a: excess mortality hazard ratio (EHR) according to the year of diagnosis (reference year: 2005) at different times since diagnosis, for ages at diagnosis of 30, 50 and 80 years (one plot per age). Row b: net survivals at 1, 5 and 10 years according to the year of diagnosis at different ages at diagnosis (one plot per time since diagnosis)

Figure 3. Excess mortality hazard (EMH, in deaths per person-years) for colon cancer. Row a: 3D plots of the EMH according to the time since diagnosis (in years) and age at diagnosis, for years of diagnosis 1995, 2005 and 2015 (one plot per year). Row b: dynamics of the EMH at different ages at diagnosis, for years of diagnosis 1995, 2005 and 2015 (one plot per year). Row c: dynamics of the EMH at different years of diagnosis, for ages at diagnosis of 50, 70 and 80 years (one plot per age)

Figure 4. Focus on the trends for colon cancer. Row a: excess mortality hazard ratio (EHR) according to the year of diagnosis (reference year: 2005) at different times since diagnosis and for ages at diagnosis of 50, 70 and 80 years (one plot per age). Row b: net survivals at 1, 5 and 10 years according to the year of diagnosis, for different ages at diagnosis (one plot per time since diagnosis)

We describe below the cross-sectional cuts of these figures that allow detailed descriptions of EMH dynamics at different ages and years of diagnosis.

Dynamics of the EMH by age for cervical cancer

For cervical cancer, Figure 1b presents, for 3 years of diagnosis (one plot per year), the EMH dynamics (i.e. the variation in the EMH with time since diagnosis) at ages 30, 50 and 80 years. These plots set the scene by describing how the EMH dynamic varies with age (at a fixed year). Age at diagnosis is often a strong prognostic factor that affects both the values and the dynamics of the EMH. This was the case for cervical cancer: the EMH values increased with age (whatever the time elapsed since diagnosis and the year of diagnosis) and their dynamics differed according to age. For example, in 1995, the EMH decreased with time since diagnosis in women aged 80 years at diagnosis (from 0.39 deaths per person-years at diagnosis to 0.04 at 10 years). Conversely, in women aged 30 years at diagnosis, the EMH increased over 1.5 years (from 0.01 deaths per person-years at diagnosis to 0.7 at 1.5 years) and then decreased to 0.006 at 10 years. The dynamic of the EMH observed in women aged 80 years in 1995 is classical in the cancer field: the EMH is high at diagnosis [here, a 3% ‘monthly’ probability of death (0.39/12)] and decreases rapidly; at 10 years after diagnosis, the EMH may seem low (0.04), but it still corresponds to a 4% ‘annual’ probability of death.

Focus on the trends for cervical cancer

If our interest lies in trend analysis, the most relevant information is given in Figures 1c and 2a. Figure 1c shows, for three ages at diagnosis (one plot per age), how the dynamics of the EMH changed between 1995, 2005 and 2015. Figure 2a shows, for the same three ages (one plot per age), the effect of the year of diagnosis by plotting the EMH ratios according to the year of diagnosis, $\mathrm{EHR}_y(a, t, y)$ (reference year 2005), at different times since diagnosis. Figure 1c is on the hazard scale and thus highlights the trends in terms of ‘differences’ in EMH values, whereas Figure 2a highlights the trends in terms of EMH ‘ratios’.

Figure 1c shows opposite trends in the EMH dynamics according to age at diagnosis: the EMH increased with the year of diagnosis in women aged 80 years, whereas it remained stable in women aged 50 years and decreased in women aged 30 years. In women aged 80 years at diagnosis, the difference in EMH between those diagnosed in 2015 (or 2005) and those diagnosed in 1995 decreased with time since diagnosis and was highest in the first 2 years after diagnosis.

Figure 2a clearly shows these opposite age-specific trends and gives complementary information through the EMH ratio. At all ages at diagnosis, the EMH ratio according to the year of diagnosis was linear and identical whatever the time elapsed since diagnosis. Unlike the differences in EMH values (Figure 1c), the EMH ratio did not depend on time since diagnosis. For example, in women aged 80 years at diagnosis, the EMH ratio for 2015 vs 2005 (reference year) was 1.15 whatever the time since diagnosis: the EMH of women diagnosed in 2015 was 1.15 times that of women diagnosed in 2005, at any time since diagnosis.

Finally, the changes in the EMH dynamics led to opposite net survival trends according to age (Figure 2b): net survival decreased in women aged 80 years at diagnosis, remained stable in women aged 50 years and increased slightly in women aged 30 years.

Dynamics of the EMH by age for colon cancer

For colon cancer, Figure 3b shows that, whatever the year of diagnosis, the EMH was very high just after diagnosis in patients aged 80 years at diagnosis. The differences in EMH according to age decreased rapidly with the time elapsed since diagnosis and disappeared after 2 years.

Focus on the trends for colon cancer

Figure 3c shows the decrease in the EMH with the year of diagnosis on the hazard scale. This decrease was observed over 5 years after diagnosis in patients aged 50 years, but only over 1.5 years in patients aged 80 years. These results were confirmed by the trends in the EMH ratio (Figure 4a), which decreased with the year of diagnosis whatever the time since diagnosis (≤5 years) in patients aged 50 years. In patients aged 80 years, it decreased at the start of follow-up (times 0 and 1) but remained stable at 5 years.

These decreases in EMH led to an improvement in net survival over the years of diagnosis, whatever the age and the time since diagnosis (Figure 4b). Because net survival reflects the cumulative effect of the EMH, this colon cancer example illustrates that a large decrease in the EMH observed only at the start of follow-up results in improved net survival at all follow-up times.
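This cumulative mechanism can be checked on a toy EMH made of an early, decaying component plus a constant late component, for which the cumulative hazard has a closed form. Halving only the early component (a hypothetical scenario, not the colon cancer estimates) leaves the late hazard unchanged, yet net survival stays higher at every later time, because net survival is exp(−cumulative EMH).

```python
import numpy as np

def net_survival_two_phase(t, early_amplitude, late_rate=0.05, decay=2.0):
    """Net survival for the toy EMH h_e(u) = early_amplitude * exp(-decay * u) + late_rate.

    Cumulative EMH: early_amplitude / decay * (1 - exp(-decay * t)) + late_rate * t.
    """
    t = np.asarray(t, dtype=float)
    cumulative = early_amplitude / decay * (1.0 - np.exp(-decay * t)) + late_rate * t
    return np.exp(-cumulative)

t = np.array([1.0, 5.0, 10.0])
before = net_survival_two_phase(t, early_amplitude=0.5)
after = net_survival_two_phase(t, early_amplitude=0.25)  # early excess mortality halved
print(after / before)
```

The ratio after/before rises during the first years and then levels off at about exp(0.125) ≈ 1.13: once the early hazards coincide again, the survival gain obtained at the start of follow-up is carried forward unchanged.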

Goodness of fit of the models

The goodness of fit of each model is presented in Supplementary Figures S1–S3 (available as Supplementary data at IJE online) for cervical cancer and Supplementary Figures S4–S6 (available as Supplementary data at IJE online) for colon cancer: in both cancers, the goodness of fit was deemed acceptable.

Discussion

This paper presents the epidemiological insights that are provided by our new methodology based on MPSs for two cancers with different trend patterns. For colon cancer, one of the most frequent cancers,17 the net survival increased over the study period for all ages. For cervical cancer, a far less frequent cancer, despite the relatively low number of cases, our method had enough power to show that trends were quite complex and in opposite directions depending on age.

Regarding cervical cancer, the penalization (together with the AIC) performed model selection very successfully. As described in the ‘Methods’ section, the key principle of penalized models is to use very flexible models with all possible non-linear effects and interactions and then let the penalization retain only the complexities that are necessary to describe the data. Despite the complexity of the selected model M3, penalization retained only a simple linear effect of the year of diagnosis, kept a strong age × year interaction and eliminated the unnecessary time × year interaction. This process led to a clear picture of the EMH dynamics and the survival of women over the period 1990–2015.

As illustrated in this paper, the rich information derived from the model can be presented in many ways, each emphasizing a specific view, and the choice of the most relevant depends on the study objective. In particular, EMH trends can be described with either EMH values or EMH ratios. The EMH ratio (Figures 2a and 4a) is a classical indicator in log-hazard models (especially in a Cox survival model) and directly shows the type of year effect retained in the model (for cervical cancer, a linear effect with no time × year interaction). However, this indicator masks the EMH level, which is very important for epidemiological interpretation. In our view, the dynamic of the excess hazard over time since diagnosis should be described first (Figures 1 and 3), as it is a key indicator in survival studies, and then complemented if relevant (see e.g. Figures 2a and 4a). In addition, summary indicators may also be easily derived (with their CIs), such as marginal net survivals (by age group or all ages), age-standardized net survival and differences in net survival between 1990 and 2015 (by age or all ages).18

The striking result for cervical cancer was that net survival decreased in elderly women but increased slightly in young women. The epidemiologists hypothesized that the net survival decrease in the elderly results from a ‘paradoxical’ effect of screening.19 More specifically, cervical screening in France has led to a dramatic decrease in cervical cancer incidence because it detects pre-invasive lesions before they develop into cancer. This means that, currently, a high proportion of cancer cases stem from unscreened women, probably leading to a higher overall proportion of late-stage tumours now than in the past (the population ‘at risk of cervical cancer’ having shifted from ‘all women’ to more deprived women, as the latter participate less in cervical screening).20 The reason why this paradoxical effect was not observed in young women is still unclear.

Regarding colon cancer, the improvements in net survival at 1, 5 and 10 years at all ages were mainly due to an EMH decrease during the first 2 years of follow-up, a decrease that probably results from improvements in surgery since 1990 that reduced post-surgical morbidity and mortality.

In the most recent French study on cancer survival, trend analyses were performed for 41 solid tumours (23 cancers and 18 topographical or histological subsites) and 18 haematological malignancies (14 entities and 4 sub-entities).13 Thanks to the penalization framework, MPSs proved well suited to capturing the various (net or overall) survival trends of this wide variety of epidemiological profiles. Moreover, an MPS avoids categorizing continuous variables, which results in accurate descriptions of their effects. International cancer survival trend studies,21,22 as well as the previous French study,19 used non-parametric methods16,23 that require stratifying on continuous variables, which, in trend analyses, are age and year of diagnosis. Stratification has several drawbacks: arbitrary choices of the strata, loss of information due to categorization, possible inconsistencies between estimates from adjacent strata, imprecise estimates in small strata and difficulties in studying covariate interactions. These drawbacks are illustrated by the comparisons we made with Pohar–Perme estimates (as part of goodness-of-fit assessments). In cervical cancer, the trend pattern did not appear clearly with the Pohar–Perme estimates because of their high variability (Supplementary Figure S3, available as Supplementary data at IJE online). In contrast, the penalized model provided a clear picture of the trends, and its estimates were more precise than those of non-parametric methods (see the CI widths in the Supplementary Figures, available as Supplementary data at IJE online).

Another advantage of flexible hazard models using MPSs is that they provide detailed survival trends (according to covariates of interest) and various hazard graphics (hazard dynamics, trends in these dynamics, hazard ratios, etc.); these indicators cannot be obtained with non-parametric approaches, yet they provide essential additional information that enriches epidemiological interpretation.

As mentioned in the ‘Methods’ section, we opted for AIC selection (rather than using model M4 alone) to further improve, in some cases, the smoothness of the excess hazard and net survival estimates. The trends are then stripped of small, uncertain variations not supported by the AIC, which gives a clearer pattern and makes the results easier for epidemiologists to interpret. This is only a minor refinement, which may be debated, and its cost is a moderate underestimation of the standard errors (by a factor of 0.9 on average in simulation explorations). The advantages of AIC selection are more pronounced when estimating the excess hazard ratio.

The MPS approach detailed here is also suitable for analysing covariates other than the year of diagnosis, such as a spatial dimension or a deprivation index, or for adding such covariates to a trend analysis (e.g. a spatio-temporal model). One limitation, though, is that only a restricted number of variables may be entered in a single MPS; consequently, when the number of continuous variables is too high, a model-building strategy is required.24 For categorical covariates such as sex, the penalization framework allows, for instance, penalizing the sex hazard ratio,12 which seems an interesting option for modelling men and women jointly.

Finally, MPSs have proved particularly well suited to making survival projections,25 which are important for guiding public health policies.

Conclusion

Penalized hazard models are a powerful statistical tool for survival studies. In survival trend studies, they can reveal simple as well as complex effects of the year of diagnosis. Describing the dynamics of the EMH provides additional, valuable epidemiological information for interpreting survival trends. The maturity of the theoretical statistical framework developed by Wood and the availability of MPSs in the survPen R package make MPSs well suited to routine use in epidemiological survival studies, including large multisite cancer studies.

Ethics approval

This study is based on legal authorization provided by the National Data Protection Authority (Commission Nationale de l’Informatique et des Libertés, no. 903324).

Data availability

The data underlying this article cannot be shared publicly due to European General Data Protection Regulation (GDPR).

Supplementary data

Supplementary data are available at IJE online.

Author contributions

E.D., Z.U., M.F., L.Ro. and L.Re. conceived the overall design. E.D. and Z.U. implemented the statistical analyses. E.D. wrote the first draft of the manuscript, supervised by Z.U. and L.Re. G.C., M.M., B.T. and F.M. interpreted the epidemiological results. All authors reviewed the final manuscript.

Funding

This work was supported by the Institut National du Cancer (INCa) (grant number 2019–164) and Santé publique France (SpF) (grant number 19DMNA021-0).

Acknowledgements

The authors thank Jean Iwaz for his thorough proofreading.

Conflict of interest

None declared.

References

1

Cox DR. Regression models and life tables (with discussion). J Royal Stat Soc 1972;34:187–20.

2

Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data, 2nd edn. New York: John Wiley & Sons, 2002.

3

Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: age–period–cohort models. Stat Med 1987;6:469–81.

4

Wood SN, Augustin NH. GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecol Modell 2002;157:157–77.

5. Wood SN. Generalized Additive Models: An Introduction with R, 2nd edn. London: Chapman & Hall/CRC, 2017.

6. Uhry Z, Chatignoux E, Dantony E et al. Multidimensional penalized splines for incidence and mortality-trend analyses and validation of national cancer-incidence estimates. Int J Epidemiol 2020;49:1294–306.

7. Wood SN, Pya N, Säfken B. Smoothing parameter and model selection for general smooth models. J Am Stat Assoc 2016;111:1548–63.

8. Defossez G, Uhry Z, Delafosse P et al.; French Network of Cancer Registries (FRANCIM). Cancer incidence and mortality trends in France over 1990–2018 for solid tumors: the sex gap is narrowing. BMC Cancer 2021;21:726–14.

9. Defossez G, Guyader L, Peyrou S, Uhry Z et al. National Estimates of Cancer Incidence and Mortality in Metropolitan France Between 1990 and 2018. Overview. Saint-Maurice: Santé Publique France, 2019. https://www.santepubliquefrance.fr/content/download/190600/2335091?version=1 (8 June 2023, date last accessed).

10. Remontet L, Uhry Z, Bossard N et al.; CENSUR Working Survival Group. Flexible and structured survival model for a simultaneous estimation of non-linear and non-proportional effects and complex interactions between continuous variables: Performance of this multidimensional penalized spline approach in net survival trend analysis. Stat Methods Medical Res 2019;28:2368–84.

11. Fauvernier M, Roche L, Uhry Z, Tron L, Bossard N, Remontet L; and the Challenges in the Estimation of Net Survival Working Survival Group. Multidimensional penalized hazard model with continuous covariates: applications for studying trends and social inequalities in cancer survival. J R Stat Soc Ser C Appl Stat 2019;68:1233–57.

12. Fauvernier M, Remontet L, Uhry Z, Bossard N, Roche L. survPen: an R package for hazard and excess hazard modelling with multidimensional penalized splines. J Open Source Softw 2019;4:1434.

13. Coureau G, Mounier M, Tretarre B et al. Survival Among Individuals Diagnosed with Cancer in Mainland France 1989–2018—Summary of results. Boulogne-Billancourt: Institut National du Cancer, 2021. https://www.santepubliquefrance.fr/content/download/411680/3356847?version=1 (8 June 2023, date last accessed).

14. Uhry Z, Dantony E, Roche L et al. Survie des Personnes Atteintes de Cancer en France Métropolitaine 1989–2018—Matériel et méthodes. Boulogne-Billancourt: Institut National du Cancer, 2020. https://www.santepubliquefrance.fr/content/download/384305/3200494?version=1 (8 June 2023, date last accessed).

15. Estève J, Benhamou E, Croasdale M, Raymond L. Relative survival and the estimation of net survival: elements for further discussion. Stat Med 1990;9:529–38.

16. Perme MP, Stare J, Estève J. On estimation in relative survival. Biometrics 2012;68:113–20.

17. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394–424.

18. Bouvier A-M, Jooste V, Launoy G et al. Survie des Personnes Atteintes de Cancer en France Métropolitaine 1989–2018—Côlon. Boulogne-Billancourt: Institut National du Cancer, 2020. https://www.santepubliquefrance.fr/content/download/322953/2936263?version=1 (8 June 2023, date last accessed).

19. Cowppli-Bony A, Uhry Z, Remontet L et al.; French Network of Cancer Registries (FRANCIM). Survival of solid cancer patients in France, 1989–2013: a population-based study. Eur J Cancer Prev 2017;26:461–68.

20. Kelly DM, Estaquio C, Leon C, Arwidson P, Nabi H. Temporal trend in socioeconomic inequalities in the uptake of cancer screening programmes in France between 2005 and 2010: results from the Cancer Barometer surveys. BMJ Open 2017;7:e016941.

21. Allemani C, Matsuda T, Di Carlo V et al.; CONCORD Working Group. Global surveillance of trends in cancer survival 2000–14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. Lancet 2018;391:1023–75.

22. Rossi S, Baili P, Capocaccia R et al.; EUROCARE-5 Working Group. The EUROCARE-5 study on cancer survival in Europe 1999–2007: Database, quality checks and statistical analysis methods. Eur J Cancer 2015;51:2104–19.

23. Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr 1961;6:101–21.

24. Rodríguez-Girondo M, Kneib T, Cadarso-Suárez C, Abu-Assi E. Model building in nonproportional hazard regression. Stat Med 2013;32:5301–14.

25. Currie ID, Durban M, Eilers PH. Smoothing and forecasting mortality rates. Stat Modelling 2004;4:279–98.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
