Abstract

Motivation

Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered.

Results

We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods.

Availability and implementation

The R package ‘xwf’ is available at the CRAN repository: https://cran.r-project.org/package=xwf.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Trying to relate functional data with a scalar outcome is a common goal in applications. For instance, images and time series data can be interpreted as functions. One might want to understand what differentiates images that belong to different groups, or what aspects of stock index trajectories are unique to developing versus developed economies. Even though pictures and the stock indices are both functional data, their nature is clearly different, showing the diversity in functional data analysis (FDA) applications and consequently methods.

This article presents a new FDA method motivated by clinical questions regarding blood pressure measurements. Intraoperatively, blood pressure is measured either continuously via invasive monitors or at intervals of 30 s to 3 min. See Figure 1 for example trajectories. The effects of various blood pressure levels are extensively studied. Nonetheless, questions from clinicians remain, especially regarding the relevance of various degrees of blood pressure variability (Mancia and Grassi, 2000).

Fig. 1.

The mean arterial blood pressure measurements of three surgeries during their first 3 h

Fig. 1.

The mean arterial blood pressure measurements of three surgeries during their first 3 h

Variability has been established as an important biomarker (Parati et al., 2013), even though often not fully considered. For instance, it is suggested that high average blood pressure can be a result of blood pressure variability such that established associations between high blood pressure and certain outcomes might be driven by variability. For instance, Rothwell et al. (2010) found that blood pressure variability can predict stroke better than mean blood pressure, and that the effect of variability interacts with the mean. Sternbach et al. (2002) show that anesthesiologists have a big influence on perioperative blood pressure variability.

It is not clear what features of the dynamic variations in blood pressure are most relevant (Aronson et al., 2010). For example, anesthesiologists want to identify those aspects of the dynamics that predict adverse postoperative outcomes, such as postoperative mortality. The main goal is to identify, in real time, those patients that are experiencing adverse blood pressure dynamics. Anesthesiologists can then intervene and modify the dynamics to hopefully prevent subsequent organ injury. It is thought that abrupt changes in blood pressure, especially when associated with severe systolic hypertension and shear stress, significantly increase this risk (Aronson et al., 2011). However, this is mostly based on poor quality evidence and there is no clear idea what types of dynamics are most or least harmful.

Previous studies have used simple metrics of blood pressure variability, such as the standard deviation of measured values, which ignores the order in which the measurements were obtained, and average real variability (Hansen et al., 2010; Mena et al., 2005). These features capture only a very specific type of blood pressure dynamic. Moreover, they do not allow for blood pressure values at the extremes to be treated differently from those in the normal blood pressure ranges.

Trying to relate functions, here blood pressure trajectories, with an outcome, such as postoperative mortality, is often done in FDA. See, for instance, Ramsay and Silverman (2005). Usually, one extracts features from the functional predictors, which are then used inside a regression or classification model. Popular methods for feature extraction are functional principal components analysis (Jones and Rice, 1992) or basis expansions, which include wavelets (Morris et al., 2003, 2008) and Fourier transforms.

Giraldo and Domínguez (2010) describe feature extraction jointly with dimension reduction to motivate a weighting of features extracted from functional data. Chen et al. (2014) consider weighting of functions for FDA. They focus on distances between functions while we focus on weighted feature extraction. Moreover, while also data-dependent, their weighting functions and their motivation differ from those used here. Hall and Hooker (2016) estimate a truncation of the domain on which the functional parameter in functional linear regression is nonzero. This allows varying domains in a limited fashion. Gellar et al. (2014) propose a variable-domain functional regression which builds upon the generalized functional linear model by allowing smooth changes in the functional parameter as the time domain changes. Despite the large literature, existing FDA methods are clearly unsatisfactory for our motivating blood pressure application, as well as for other related applications.

Most existing approaches assume that the various functions share the same domain, while the time and progression of surgery varies substantially across patients, even when stratifying by type of procedure. One could naively normalize the lengths of surgeries, but such standardization removes valuable information on time and has no clinical justification. The reason to place the curves on a common time domain is that usual FDA methods focus on characterizing shared versus individual-specific components of variability in this trajectory specific to each time. If there are similarities in the dynamics over time (e.g. a spike upwards in the curve), then the curves need to be aligned so that the dynamic features (e.g. the spike) occur at the same time to enable detection. Similar limitations hold for the domain-varying method from Gellar et al. (2014) as the functional linear model with smoothing with respect to changes in the domain can still only detect features that are in some way aligned in the various domains. That method, for instance, reduces to standard functional regression in the absence of variation in the domains of the functional predictors. For the blood pressure data, such time alignment is artificial; instead, we want to characterize similarities and differences in the dynamics in the trajectories even if these dynamics occur at varying times during surgeries of very different lengths.

These shortcomings of existing methods for this application and the interest of clinicians in local effects, with the extremes of particular importance, provide motivation for this article. Our focus is on proposing a new approach for feature extraction for functional predictors that address these issues. Its main building blocks are a flexible definition of variability, or more generally local effects, and a framework that allows for extremes to contribute more than average values. Functions measuring local effects, or variability, are averaged over the domain of the functional predictor. This is a weighted average where the weight is influenced at each point in the domain by how extreme the functional predictor is there. This average for a specific local-effect function is referred to as an extrema-weighted feature (XWF).

1.1 The blood pressure data and the outcome

We have access to the blood pressure data from 80 065 surgeries performed at the Duke University Health System between 2000 and 2014. We consider the mean arterial pressure (MAP). All MAP measurements below 10 and above 250 are discarded as measurement errors. Most measurements are 30 s apart but not all. The methods presented can deal with unevenly space measurements but might behave unexpectedly in the presence of large gaps in measurements. Therefore, cases with gaps of >5 min in their blood pressure measurements are removed, also because they potentially involve unusual circumstances. Additionally, we remove blood pressure trajectories that are shorter than 30 min.

The outcome of interest is 30-day postoperative mortality. The medical database includes mortality records which indicate whether someone died within 30 days after surgery. Clinically, certain postoperative complications, such as infection or reduced renal function, are more interesting to be related with intraoperative blood pressures. However, mortality data are more reliable and have less missing data issues than other outcomes, and hence provide a clearer application to test our new method on.

To obtain a more homogeneous population of procedures, we restrict ourselves to non-cardiac surgeries. Other features considered are gender, age and body mass index (BMI). Any cases with these missing, or a BMI above 400, most likely a data error, are excluded. This leaves us with 18 657 surgeries. Figure 1 shows a couple of these blood pressure trajectories.

2 Materials and methods

This section describes the proposed XWF extraction as well as two alternative approaches we will use for comparison.

2.1 Notation and proposed model

Let yi denote the response for subject i, for i = 1,…, n. Let xi:Ti, where TiT, denote a functional predictor for subject i, here the MAP trajectory, with Ti the time domain across which the predictor is observed for that subject. We restrict ourselves for simplicity to functions that are curves over time, though extensions to functions in more than one dimension are possible. Let zi=(zi1,,ziq)T denote q other covariates that may be important in predicting yi. For the blood pressure application, we consider gender, age, BMI and duration of surgery such that q = 4.

We assume dense measurements of xi(t) are available at successive locations ti=(ti1,,tini)T within Ti. In the presence of short blocks of time with intermittent missingness within Ti, we use linear interpolation to fill in the gaps. Although we could estimate our features without this interpolation, simply using the times that are available for a patient, we wish to account for the possibility that the observed trajectories at adjacent times to the missing data block are informative about the occurrence of missing values. One can alternatively use more advanced interpolation or model the measurements as noisy observations of the underlying function xi, though we do not consider such approaches in this article.

We let f denote the marginal density of xi(t). In the application, f is the marginal density of blood pressure measurements. We define this marginal density to satisfy xi(t) ∼ f, with {xi,Ti} sampled from the population and t sampled uniformly from Ti. In addition, let F denote the corresponding cumulative distribution function (CDF). Motivated by the application in Section 1.1, we want to define a method for upweighting the importance of local features of xi for times t where xi(t) is in the tails of f. In the application, these tails correspond to unusually high or low blood pressure values calibrated relative to the distribution of blood pressure values across individuals having non-cardiac surgical procedures. Let ui(t) = F{xi(t)}, for tTi, denote trajectories that are transformed to have standard uniform marginal density.

2.1.1 Local features

Let ψj(xi, t), for j = 1,…, p, be functions that characterize different aspects of the local dynamics of the trajectory xi at time t. Let us define (c)+ = max(0, c) and (c)- = max(0, –c). In this article, p = 4, ψ1(xi,t)=1, ψ2(xi,t)=xi(t) as the level of the trajectory at that time point, ψ3(xi,t)=(xi(t))+ as the rate of increase, and ψ4(xi,t)=(xi(t)) as the rate of decrease. Here, we use linear interpolation such that the derivative follows as:  
xi(t)=xi(tik)xi(ti(k1))tikti(k1)for t(ti(k1),tik].

2.1.2 Extrema weighting

Weight functions ωj(u) on [0, 1] allow differential weights of the local functional dynamic features ψj according to the quantile ui(t). This is motivated by the hypothesis that a specific blood pressure local functional feature affects the outcome in varying degrees depending on whether it appears at low, medium or high blood pressures.

Due to the difference in nature of abnormally high versus low blood pressures, we split ω into two parts; one upweighting low (ωL)and one upweighting high (ωR)blood pressures. Specifically, we choose  
ωj(u)=ωLj(u)+ωRj(u)
with  
ωLj(u)={1,ubLj(1u1bLj)2,u>bLj
and  
ωRj(u)={1,ubRj(ubRj)2,u<bRj.
Here, bLj ∈ (0, 1∕2] and bRj ∈ [1∕2, 1) are parameters controlling the weighting near zero and one. Figure 2 contains examples. Weight functions that are more concentrated near zero or one will weight values in the further extremes of the tail more highly. We will estimate bLj and bRj based on the data as described in Section 2.1.4.
Fig. 2.

Examples of ωLj and ωRj with bLj = 0.3 and bRj = 0.9, respectively

Fig. 2.

Examples of ωLj and ωRj with bLj = 0.3 and bRj = 0.9, respectively

Supplementary Section S2 compares the results from Section 3.3 with this quadratically decaying choice for ωj to linearly and exponentially decaying weighting. The sensitivity analysis reveals that, while the choice is important, our main finding is consistent across these different choices for ωj.

2.1.3 Extrema-weighted features

Employing the aforementioned components, we define the XWFs as  
wij=1TiTiωj{ui(t)}ψj(xi,t)dt=1TiTiωj[F{xi(t)}]ψj(xi,t)dt=1TiTiωLj{ui(t)}ψj(xi,t)dtwLij+1TiTiωRj{ui(t)}ψj(xi,t)dtwRij
(1)
where Ti=|Ti| is the total time across which the trajectory is observed for individual i. Since this time may be informative about the outcome yi, we included Ti as one of the covariates in zi.

The XWFs have an interpretation as weighted averages. If ωj(⋅) = 1, then the XWFs are time averages of the ψs. Since ωj varies, they become weighted: wL2 is a time average of blood pressure values where low values are weighted more. Similarly, wR3 is a time average of increases in blood pressure where times with high blood pressures are weighted more.

Finally, the response yi is modeled as a generalized additive model (GAM) (Hastie and Tibshirani, 1986; Wood, 2017) with predictors xi, in terms of wLi, and wRi, and zi. That is, yi ∼Π (μi) independently for i = 1,…, n where Π(μi) is a distribution from an exponential family with mean μi. For some link function g,  
g(μi)=α+j=1qγj(zij)+j=1pβLj(wLij)+j=1pβRj(wRij)
(2)
with α an intercept, γ1,…,γq (non-linear) functions on the covariates zi, and βL1, βR1,…,βLp, βRp functions on the XWFs. In the blood pressure application, we use logistic regression. That means that Π(μi) is a Bernoulli distribution with success probability μi and g is the logit function.

2.1.4 Estimation algorithm

We estimate the density f through a weighted kernel smoothing approach. In particular, we treat the data as arising from a stratified sampling design, and assign each blood pressure measurement xi(tik), k = 1,…, ni, i = 1,…, n, sampling weight 1∕Ti, which is inversely proportional to the surgery duration. From an estimate of the density f, we can obtain an estimate of the CDF F.

The integral in defining the XWFs in (1) is one-dimensional and can be accurately approximated using a variety of numerical methods. Then, once accurate approximations of the feature vectors wLi and wRi are produced, conditionally on the unknown bL and bR in the weight functions ω, we can easily obtain penalized maximum likelihood estimates for the coefficients in (2) using the R package ‘mgcv’. The smoothness of the estimated functions is determined by minimizing the generalized cross-validation (GCV) score, when the scale parameter is unknown, or the unbiased risk estimator (UBRE) score, when the scale parameter is known such as here in logistic regression (Wood, 2017). Therefore, a grid search can find the bL and bR that minimize the GAM UBRE score. We employ an adaptive grid search akin to binary search, detailed in Algorithm 1. We use M = 3 in this article such that we get bLjs and bRjs at a resolution of 24.

2.2 Alternative approaches for comparison

To evaluate the XWF extraction in perspective, we consider two other approaches: A measure from the medical literature, average real variability and a method based on the power spectrum of blood pressure trajectories which is closer to traditional FDA approaches.

2.2.1 Average real variability

Mena et al. (2005) proposed average real variability (ARV) as an improvement over standard deviation as a measure of blood pressure variability. ARV has been used to measure the effect of variability on health outcomes (Hansen et al., 2010). We consider the version from Hansen et al. (2010) that takes into account differences in times between subsequent measurements. That is,  
ARVi=1tiniti1k=2ni(tikti(k1)) |xi(tik)xi(ti(k1))|
in our notation.

To compare the XWFs with this more straight-forward measure, we fit the data in a generalized additive model with one of the predictors being the ARV. Specifically, the alternative approach consists of replacing the 2p XWFs in (2) by the ARV.

2.2.2 Power spectrum

Being able to capture local dynamics rather than global effects is an important characteristic of XWFs. An issue with global effects is that they require some kind of alignment across observations of the functional predictor. By transforming the functional predictor to the frequency domain, one also moves away from this alignment requirement. Therefore, we consider principal components analysis on the power spectrum of the functional predictor.

We compute the power spectrum of each xi via a fast Fourier transform. Since a spectrum is a function itself, we treat it as a functional predictor. Specifically, we compute its supervised principal components (PCs) (Bair and Tibshirani, 2004; Bair et al., 2006). The supervised PC threshold is determined by cross-validation. As detailed in Ramsay and Silverman (2005, Chap. 8), principal component computation for the spectra, being discretized functions, does not differ from vector-valued data. The three largest PCs replace the XWFs in (2). That is, supervised PCs analysis on the power spectrum of the functional predictor provides features for a generalized additive model.

Algorithm 1

Adaptive grid search to optimize the weighting functions

Initialization: Set bLj = 0.25 and bRj = 0.75 for j = 1,…, P.

For m = 1,…, M, do the following for j = 1,…, P:

  1. Compute the UBRE score for when bLj is 21–m larger than its current value, equal to its current value, and when it is 21–m smaller than its current value, while not changing any other parameters. Increase or decrease bLj by 22–m if the first or the third UBRE score, respectively, is the smallest. Otherwise, do nothing.

  2. Compute the UBRE score for when bRj is 21–m larger than its current value, equal to its current value, and when it is 21–m smaller than its current value, while not changing any other parameters. Increase or decrease bRj by 22–m if the first or the third UBRE score, respectively, is the smallest. Otherwise, do nothing.

Return: Values for bLj and bRj, j = 1,…, P.

2.3 Inference

Our XWFs algorithm estimates the weighting functions by minimizing the UBRE score. The generalized additive model is then fit with the resulting XWFs as predictors. The P-values that the GAM provides (Wood, 2013) however do not take into account any estimation uncertainty in the weighting functions. To obtain principled measures of statistical significance, we obtain P-values via a permutation test where the weighting functions are re-estimated for each permutation.

We reorder the elements in the outcome vector y at random, fit the XWF model to the resulting data, and record the reported P-values. By construction, any association between the predictors and the outcome y in these data is by chance. Also, the recorded P-values are sample statistics. Let us denote them by Ps, s = 1,…, S, with S the number of permutations. Let P0 be the P-value from the original data. Then, we compute the P-value as:  
P=1+s=1S1[PsP0]1+S
which is conservatively very slightly greater than the exact P-value of a permutation test (Phipson and Smyth, 2010). We use S = 100.

For consistency, the P-values for the alternative approaches, the ARV and the power spectrum, are obtained in the same fashion. An added benefit of this approach is that it does not require assumptions on the data generating process for the inference to be valid.

3 Results

This section starts with two simulation studies to understand the performance and interpretation of all three methods considered. It concludes with the results from the blood pressure application. Supplementary Section S1 contains additional simulations.

3.1 Frequency simulation

Let n = 200, and, for i = 1,…, n; ni = 60 and tik = k for k = 1,…, ni. We draw ϕi from a uniform distribution on (2, 5) and τi uniformly from {2, 3,…, 29} independently for i = 1,…, n. The functional predictors are defined by:  
xi(t)={sin(πt30),tτisin(π(τi+(tτi)ϕi)30)+2,t(τi,τi+30]sin(π(τi+30ϕi+t)30),t>+30
for i = 1,…, n. That is, the xi have a fixed frequency and are centered at zero except for a randomly sampled interval of size 30 in their domain where they have a higher, random frequency ϕi and are centered at two. Let q = 2 in (2) and sample the zij i.i.d. from a standard normal distribution. The outcome yi ∈{0, 1} is sampled according to  
P(yi=1)=11+exp(median({ϕj}j=1n)ϕi),
independently for i = 1,…, n.

Supplementary Table S1 contains the P-values for the XWFs, ARV and the power spectrum. The P-values for the zij were not significant and were left out. Figure 3 contains the principal components used in the power spectrum method. The weighting functions of the statistically significant XWFs (with P <0.05) were estimated as bR1 = 0.84 bL2 = 0.03, bR3 = 0.53 and bR4 = 0.75. The corresponding smooth GAM functions are shown in Figure 4. The confidence region is from the GAM and ignores uncertainty in our estimation of the weighting functions. The horizontal axes range from the 1st to the 99th percentile of the XWF.

Fig. 3.

Top three supervised principal components of the power spectra from the frequency simulated data

Fig. 3.

Top three supervised principal components of the power spectra from the frequency simulated data

Fig. 4.

Statistically significant smooth functions from the generalized additive model for the frequency simulation with 95% confidence region in gray

Fig. 4.

Statistically significant smooth functions from the generalized additive model for the frequency simulation with 95% confidence region in gray

Since ARV increases with ϕi, ARV finds a trend between variability and the outcome in the simulated data. Similarly, ϕi directly relates to the frequencies in the power spectrum. Therefore, the power spectrum method also finds an effect. However, it is hard to interpret the first spectrum principal component, even when knowing the data-generating process.

Extrema-weighted feature selection picks up that variability at higher values increases P(yi = 1): The magnitude of the derivative of xi is proportional to ϕi if xi(t) > 1. The XWFs wR3 and wR4 measure the magnitude of the derivative since ψ3(xi,t)=(xi(t))+ and ψ4(xi,t)=(xi(t)). What happens at higher values of xi(t) contributes more to the values of wR3 and wR4. According to the data-generating process, yi is positively associated with ϕi. Figure 4 shows that yi is also positively associated with wR3 and wR4. This association is thus reflective of the data-generating process. The other two significant XWFs, wR1 and wL2, do not vary with ϕi. Their association with yi as implied by their statistical significance is not consistent with the true model where yi depends on ϕi only. The results from the XWF-based analysis yield observations consistent with the underlying data-generating process via wR3 and wR4 with richer and clearer interpretations than the alternative methods.

3.2 XWF simulation

Let n = 1000 and ni = 50 for i = 1,…, n. The zij in (2) and t are defined analogous to the frequency simulation from the previous section. We draw ϕi and μi from a uniform distribution on (0.9, 0.99) and (2(1ϕi),4(1ϕi)), respectively, independently for i = 1,…, n. Sample the measurements of xi as follows: Set xi(1)=μi1ϕi and  
xi(k)N(μi+ϕixi(k1),0.052),k=2,,ni,
independently for i = 1,…, n. We set bL1 = 0.2 and bR3 = 0.7 and compute the corresponding wL1 and wR3. Set θi=wL1/swL1+wR3/swR3 where swL1 and swR3 are the sample standard deviations of wL1 and wR3, respectively. Then, the outcome yi ∈{0, 1} is sampled independently for i = 1,…, n according to  
P(yi=1)=11+exp(median({θj}j=1n)θi).
Supplementary Table S2 of the contains the P-values for the three approaches under consideration. As in Section 3.1, the P-values for the zij were not significant and were left out. Figure 5 contains the supervised principal components used in the power spectrum method. The weighting functions of the statistically significant XWFs were estimated as bL1 = 0.03, bR3 = 0.72 and bL4 = 0.25. The corresponding smooth GAM functions are shown in Figure 6.
Fig. 5.

The three supervised principal components of the power spectra from the XWF simulated data

Fig. 5.

The three supervised principal components of the power spectra from the XWF simulated data

Fig. 6.

Statistically significant smooth functions from the generalized additive model for the XWF simulation with 95% confidence region in gray

Fig. 6.

Statistically significant smooth functions from the generalized additive model for the XWF simulation with 95% confidence region in gray

Average real variability measures variation which is associated with yi via wR3. As a result, ARV finds a trend. The supervised principal components fail to find a signal with the frequency spectrum dominated by the autoregressive process: The autoregressive parameters do not influence y directly. Figure 6 shows that wL1 and wR3 capture most of the variation in y. This is consistent with how y was simulated from wL1 and wR3. Also, the estimates for bL1 and bR3 are close to the values used for simulation.

3.3 Blood pressure data

Table 1 contains the results for the three methods when applied to the blood pressure data. The P-values for the zij were omitted, being all.01. Figure 7 contains the supervised principal components used in the power spectrum method. For computational stability of the supervised principal components, we truncated the spectra at a frequency of a 1000. The weighting function of the statistically significant XWF was estimated as bL2 = 0.03. The corresponding smooth GAM function is shown in Figure 8.

Table 1.

Results from the blood pressure data

MethodParameterP-value
XWFs βL1 0.13 
 βR1 0.07 
 βL2 0.01 
 βR2 0.14 
 βL3 0.78 
 βR3 0.49 
 βL4 0.45 
 βR4 0.63 
Average real variability ARV 0.01 
Power spectrum PC 1 0.01 
 PC 2 0.31 
 PC 3 0.02 
MethodParameterP-value
XWFs βL1 0.13 
 βR1 0.07 
 βL2 0.01 
 βR2 0.14 
 βL3 0.78 
 βR3 0.49 
 βL4 0.45 
 βR4 0.63 
Average real variability ARV 0.01 
Power spectrum PC 1 0.01 
 PC 2 0.31 
 PC 3 0.02 
Table 1.

Results from the blood pressure data

MethodParameterP-value
XWFs βL1 0.13 
 βR1 0.07 
 βL2 0.01 
 βR2 0.14 
 βL3 0.78 
 βR3 0.49 
 βL4 0.45 
 βR4 0.63 
Average real variability ARV 0.01 
Power spectrum PC 1 0.01 
 PC 2 0.31 
 PC 3 0.02 
MethodParameterP-value
XWFs βL1 0.13 
 βR1 0.07 
 βL2 0.01 
 βR2 0.14 
 βL3 0.78 
 βR3 0.49 
 βL4 0.45 
 βR4 0.63 
Average real variability ARV 0.01 
Power spectrum PC 1 0.01 
 PC 2 0.31 
 PC 3 0.02 

Fig. 7.

Top three principal components of the power spectra from the blood pressure trajectories

Fig. 7.

Top three principal components of the power spectra from the blood pressure trajectories

Fig. 8.

Statistically significant smooth function from the generalized additive model for the blood pressure application with 95% confidence region in gray

Fig. 8.

Statistically significant smooth function from the generalized additive model for the blood pressure application with 95% confidence region in gray

Figure 9 shows the blood pressure trajectories with the most extreme values for the XWFs, out of all 18 657 trajectories. It is striking how the trajectories vary in length: One of the goals of the XWFs was to be well-defined over, and be comparable across, functions with greatly varying sizes of domain.

Fig. 9.

The blood pressure trajectories with the highest (black) and lowest (gray) values for the XWFs

Fig. 9.

The blood pressure trajectories with the highest (black) and lowest (gray) values for the XWFs

While the XWFs are defined in an interpretable fashion, their values are not always readily gleaned from a trajectory plot. It is for instance hard to eyeball that the black trajectory has a higher XWF value than the gray trajectory for wL2 and wL3 in Figure 9. However, that wL1 and wR1 measure difference in level is clearly visible with the black and gray trajectories separated in terms of average value. Similarly, both trajectories plotted for wR3 and wR4 show a clear difference in variability.

Average real variability and the power spectrum method find an association between the blood pressure trajectories and postoperative mortality. However, only the XWFs reveal that low blood pressure values are predictive. Figure 8 shows that higher and more blood pressure values at the lower extreme (wL2) are associated with increased mortality. This interpretation is different from how one would interpret ARV. This shows the importance of considering richer features than ARV.

3.3.1 Relation with existing findings

Our result for ARV is consistent with previous literature that showed that an increased ARV is associated with adverse outcomes (Hansen et al., 2010; Mena et al., 2005). ARV is, to our knowledge, the most advanced method used previously to assess the effect of the blood pressure variability.

As XWFs provide richer insights, they cannot be contrasted one-to-one with former findings. We can instead look at how the richer insights from the XWF results fit into previously found associations. Having a larger proportion of blood pressure values in the extremes was associated with mortality. Extreme values are more common in the presence of high blood pressure standard deviation or ARV which previous study also found to be associated with increased mortality (Parati et al., 2013).

3.3.2 Predictive performance

Using specific predefined features instead of general methods that are less constrained in how they capture variability in the data, might hurt the overall explanatory power. To assess whether XWFs suffer from this in the motivating application, we look at predictive performance. We randomly select 1000 cases, 100 with yi = 1 and 900 with y1 = 0, as test data leaving training data of size 17 657. We compute the XWFs, ARV and supervised principal components as before on the training data. Three predictive models are considered: A GAM with the zij and only XWFs, with XWFs and ARV, and with XWFs and supervised principal components. We compare the performance of these on the test data to assess any loss in explanatory power by being constrained to XWFs.

Area under the curve (AUC) of the receiver operating characteristic quantifies predictive performance. Using 10 repeated random splits of the data into training and test set, we obtain 10 AUCs for each of the three models considered. They range from 0.56 to 0.64 while the absolute differences in AUC between the models are <0.01, none of which are statistically significant. Additionally, Supplementary Section S3 similarly does not show a notable difference in the precision-recall curves. The usage of the more interpretable predefined XWFs does not seem to lessen explanatory power compared to approaches which have a greater focus on predictive performance and less on interpretability.

4 Discussion

Existing functional feature extraction methods focus on functions that share the same domain and mostly extract global features. Motivated by rich blood pressure data from surgeries, we introduced a new feature extraction method, called XWFs, allowing for differing domains while aiming at functional features that are defined in a local nature, like variability. We compared XWFs to ARV, a functional feature from the medical literature, and to supervised principal components analysis on the power spectrum of the functional predictor.

Simulations reveal that XWFs are not only more easily interpretable due to the way they are defined, but they are also more flexible in detecting associations that do not fit traditional models. This was instrumental in finding a relationship between blood pressure trajectories during surgery and the outcome postoperative mortality. Whereas the power spectrum principal components detected a hard to interpret association and ARV only detects that variability is predictive, the proposed XWF model found that increased blood pressure at both extremes of the observed blood pressure range are associated with increased mortality. The XWF results directly address questions of interest to the clinician collaborators. Despite their predefined and specific nature, XWFs had similar explanatory power as competing methods.

This article presented a set of well-defined features applicable to the data at hand. The ideas behind them and the general framework of allowing varying domains in the functional predictors and taking weighted averages of local phenomena can however be applied in a larger variety of ways, making the methods from this article widely applicable to functional data.

Many parts of the XWF model provide choices to adapt the method to other applications. Weighting functions can be chosen and parametrized as desired. Similarly, the generalized additive model shows one way to use XWFs but the features can be broadly used within different models and machine learning methods.

We presented a novel functional feature extraction method motivated by modeling the effect of intraoperative blood pressure trajectories on postoperative mortality. The XWFs proposed are by design suited for functional predictors whose domains vary. The XWFs outperform more traditional methods and FDA tools, both in simulations and in the application of interest. The proposed model constitutes a modifiable framework for interpretable functional feature extraction.

Acknowledgements

We would like to thank the anonymous referees for suggestions that substantially improved the article and Dr Michael W. Manning and Dr Tracy L. Setji for their discussions of the medical data.

Funding

This work was supported by Accenture PLC.

Conflict of Interest: none declared.

References

Aronson
S.
et al.  (
2010
)
Intraoperative systolic blood pressure variability predicts 30-day mortality in aortocoronary bypass surgery patients
.
Anesthesiology
,
113
,
305
312
.

Aronson
S.
et al.  (
2011
)
Does perioperative systolic blood pressure variability predict mortality after cardiac surgery? An exploratory analysis of the ECLIPSE trials
.
Anesth. Analg
.,
113
,
19
30
.

Bair
E.
,
Tibshirani
R.
(
2004
)
Semi-supervised methods to predict patient survival from gene expression data
.
PLoS Biol
.,
2
,
e108.

Bair
E.
et al.  (
2006
)
Prediction by supervised principal components
.
J. Am. Stat. Assoc
.,
101
,
119
137
.

Chen
H.
et al.  (
2014
)
Optimally weighted L2 distance for functional data
.
Biometrics
,
70
,
516
525
.

Gellar
J.E.
et al.  (
2014
)
Variable-domain functional regression for modeling ICU data
.
J. Am. Stat. Assoc
.,
109
,
1425
1439
.

Giraldo
L.G.S.
,
Domínguez
G.C.
(
2010
)
Weighted feature extraction with a functional data extension
.
Neurocomput
,
73
,
1760
1773
.

Hall
P.
,
Hooker
G.
(
2016
)
Truncated linear models for functional data
.
J. R. Stat. Soc. Series B (Stat. Methodol.)
,
78
,
637
653
.

Hansen
T.W.
et al.  (
2010
)
Prognostic value of reading-to-reading blood pressure variability over 24 hours in 8938 subjects from 11 populations
.
Hypertension
,
55
,
1049
1057
.

Hastie
T.
,
Tibshirani
R.
(
1986
)
Generalized additive models
.
Stat. Sci
.,
1
,
297
310
.

Jones
M.C.
,
Rice
J.A.
(
1992
)
Displaying the important features of large collections of similar curves
.
Am. Stat
.,
46
,
140
.

Mancia
G.
,
Grassi
G.
(
2000
)
Mechanisms and clinical implications of blood pressure variability
.
J. Cardiovasc. Pharmacol
.,
34 (7 Suppl 4)
,
S15
19
.

Mena
L.
et al.  (
2005
)
A reliable index for the prognostic significance of blood pressure variability
.
J. Hypertens
.,
23
,
505
511
.

Morris
J.S.
et al.  (
2003
)
Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis
.
J. Am. Stat. Assoc
.,
98
,
573
583
.

Morris
J.S.
et al.  (
2008
)
Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional mixed models
.
Biometrics
,
64
,
479
489
.

Parati
G.
et al.  (
2013
)
Assessment and management of blood-pressure variability
.
Nat. Rev. Cardiol
.,
10
,
143
155
.

Phipson
B.
,
Smyth
G.K.
(
2010
)
Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn
.
Stat. Appl. Genet. Mol. Biol
.,
9
, Article 39. doi: 10.2202/1544-6115.1585.

Ramsay
J.
,
Silverman
B.W.
(
2005
)
Functional Data Analysis
. 2nd edn.
Springer-Verlag
,
New York
.

Rothwell
P.M.
et al.  (
2010
)
Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension
.
Lancet
,
375
,
895
905
.

Sternbach
Y.
et al.  (
2002
)
Hemodynamic benefits of regional anesthesia for carotid endarterectomy
.
J. Vasc. Surg
.,
35
,
333
339
.

Wood
S.N.
(
2013
)
On P-values for smooth components of an extended generalized additive model
.
Biometrika
,
100
,
221
228
.

Wood
S.N.
(
2017
)
Generalized Additive Models: An Introduction with R
. 2nd edn (Chapman & Hall/CRC Texts in Statistical Science).
Chapman and Hall/CRC, Boca Raton, Florida
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Associate Editor: Jonathan Wren
Jonathan Wren
Associate Editor
Search for other works by this author on:

Supplementary data