-
PDF
- Split View
-
Views
-
Cite
Cite
Willem van den Boom, Callie Mao, Rebecca A Schroeder, David B Dunson, Extrema-weighted feature extraction for functional data, Bioinformatics, Volume 34, Issue 14, 15 July 2018, Pages 2457–2464, https://doi.org/10.1093/bioinformatics/bty120
Close -
Share
Abstract
Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered.
We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods.
The R package ‘xwf’ is available at the CRAN repository: https://cran.r-project.org/package=xwf.
Supplementary data are available at Bioinformatics online.
1 Introduction
Trying to relate functional data with a scalar outcome is a common goal in applications. For instance, images and time series data can be interpreted as functions. One might want to understand what differentiates images that belong to different groups, or what aspects of stock index trajectories are unique to developing versus developed economies. Even though pictures and the stock indices are both functional data, their nature is clearly different, showing the diversity in functional data analysis (FDA) applications and consequently methods.
This article presents a new FDA method motivated by clinical questions regarding blood pressure measurements. Intraoperatively, blood pressure is measured either continuously via invasive monitors or at intervals of 30 s to 3 min. See Figure 1 for example trajectories. The effects of various blood pressure levels are extensively studied. Nonetheless, questions from clinicians remain, especially regarding the relevance of various degrees of blood pressure variability (Mancia and Grassi, 2000).
The mean arterial blood pressure measurements of three surgeries during their first 3 h
The mean arterial blood pressure measurements of three surgeries during their first 3 h
Variability has been established as an important biomarker (Parati et al., 2013), even though often not fully considered. For instance, it is suggested that high average blood pressure can be a result of blood pressure variability such that established associations between high blood pressure and certain outcomes might be driven by variability. For instance, Rothwell et al. (2010) found that blood pressure variability can predict stroke better than mean blood pressure, and that the effect of variability interacts with the mean. Sternbach et al. (2002) show that anesthesiologists have a big influence on perioperative blood pressure variability.
It is not clear what features of the dynamic variations in blood pressure are most relevant (Aronson et al., 2010). For example, anesthesiologists want to identify those aspects of the dynamics that predict adverse postoperative outcomes, such as postoperative mortality. The main goal is to identify, in real time, those patients that are experiencing adverse blood pressure dynamics. Anesthesiologists can then intervene and modify the dynamics to hopefully prevent subsequent organ injury. It is thought that abrupt changes in blood pressure, especially when associated with severe systolic hypertension and shear stress, significantly increase this risk (Aronson et al., 2011). However, this is mostly based on poor quality evidence and there is no clear idea what types of dynamics are most or least harmful.
Previous studies have used simple metrics of blood pressure variability, such as the standard deviation of measured values, which ignores the order in which the measurements were obtained, and average real variability (Hansen et al., 2010; Mena et al., 2005). These features capture only a very specific type of blood pressure dynamic. Moreover, they do not allow for blood pressure values at the extremes to be treated differently from those in the normal blood pressure ranges.
Trying to relate functions, here blood pressure trajectories, with an outcome, such as postoperative mortality, is often done in FDA. See, for instance, Ramsay and Silverman (2005). Usually, one extracts features from the functional predictors, which are then used inside a regression or classification model. Popular methods for feature extraction are functional principal components analysis (Jones and Rice, 1992) or basis expansions, which include wavelets (Morris et al., 2003, 2008) and Fourier transforms.
Giraldo and Domínguez (2010) describe feature extraction jointly with dimension reduction to motivate a weighting of features extracted from functional data. Chen et al. (2014) consider weighting of functions for FDA. They focus on distances between functions while we focus on weighted feature extraction. Moreover, while also data-dependent, their weighting functions and their motivation differ from those used here. Hall and Hooker (2016) estimate a truncation of the domain on which the functional parameter in functional linear regression is nonzero. This allows varying domains in a limited fashion. Gellar et al. (2014) propose a variable-domain functional regression which builds upon the generalized functional linear model by allowing smooth changes in the functional parameter as the time domain changes. Despite the large literature, existing FDA methods are clearly unsatisfactory for our motivating blood pressure application, as well as for other related applications.
Most existing approaches assume that the various functions share the same domain, while the time and progression of surgery varies substantially across patients, even when stratifying by type of procedure. One could naively normalize the lengths of surgeries, but such standardization removes valuable information on time and has no clinical justification. The reason to place the curves on a common time domain is that usual FDA methods focus on characterizing shared versus individual-specific components of variability in this trajectory specific to each time. If there are similarities in the dynamics over time (e.g. a spike upwards in the curve), then the curves need to be aligned so that the dynamic features (e.g. the spike) occur at the same time to enable detection. Similar limitations hold for the domain-varying method from Gellar et al. (2014) as the functional linear model with smoothing with respect to changes in the domain can still only detect features that are in some way aligned in the various domains. That method, for instance, reduces to standard functional regression in the absence of variation in the domains of the functional predictors. For the blood pressure data, such time alignment is artificial; instead, we want to characterize similarities and differences in the dynamics in the trajectories even if these dynamics occur at varying times during surgeries of very different lengths.
These shortcomings of existing methods for this application and the interest of clinicians in local effects, with the extremes of particular importance, provide motivation for this article. Our focus is on proposing a new approach for feature extraction for functional predictors that address these issues. Its main building blocks are a flexible definition of variability, or more generally local effects, and a framework that allows for extremes to contribute more than average values. Functions measuring local effects, or variability, are averaged over the domain of the functional predictor. This is a weighted average where the weight is influenced at each point in the domain by how extreme the functional predictor is there. This average for a specific local-effect function is referred to as an extrema-weighted feature (XWF).
1.1 The blood pressure data and the outcome
We have access to the blood pressure data from 80 065 surgeries performed at the Duke University Health System between 2000 and 2014. We consider the mean arterial pressure (MAP). All MAP measurements below 10 and above 250 are discarded as measurement errors. Most measurements are 30 s apart but not all. The methods presented can deal with unevenly space measurements but might behave unexpectedly in the presence of large gaps in measurements. Therefore, cases with gaps of >5 min in their blood pressure measurements are removed, also because they potentially involve unusual circumstances. Additionally, we remove blood pressure trajectories that are shorter than 30 min.
The outcome of interest is 30-day postoperative mortality. The medical database includes mortality records which indicate whether someone died within 30 days after surgery. Clinically, certain postoperative complications, such as infection or reduced renal function, are more interesting to be related with intraoperative blood pressures. However, mortality data are more reliable and have less missing data issues than other outcomes, and hence provide a clearer application to test our new method on.
To obtain a more homogeneous population of procedures, we restrict ourselves to non-cardiac surgeries. Other features considered are gender, age and body mass index (BMI). Any cases with these missing, or a BMI above 400, most likely a data error, are excluded. This leaves us with 18 657 surgeries. Figure 1 shows a couple of these blood pressure trajectories.
2 Materials and methods
This section describes the proposed XWF extraction as well as two alternative approaches we will use for comparison.
2.1 Notation and proposed model
Let yi denote the response for subject i, for i = 1,…, n. Let , where , denote a functional predictor for subject i, here the MAP trajectory, with the time domain across which the predictor is observed for that subject. We restrict ourselves for simplicity to functions that are curves over time, though extensions to functions in more than one dimension are possible. Let denote q other covariates that may be important in predicting yi. For the blood pressure application, we consider gender, age, BMI and duration of surgery such that q = 4.
We assume dense measurements of xi(t) are available at successive locations within . In the presence of short blocks of time with intermittent missingness within , we use linear interpolation to fill in the gaps. Although we could estimate our features without this interpolation, simply using the times that are available for a patient, we wish to account for the possibility that the observed trajectories at adjacent times to the missing data block are informative about the occurrence of missing values. One can alternatively use more advanced interpolation or model the measurements as noisy observations of the underlying function xi, though we do not consider such approaches in this article.
We let f denote the marginal density of xi(t). In the application, f is the marginal density of blood pressure measurements. We define this marginal density to satisfy xi(t) ∼ f, with sampled from the population and t sampled uniformly from . In addition, let F denote the corresponding cumulative distribution function (CDF). Motivated by the application in Section 1.1, we want to define a method for upweighting the importance of local features of xi for times t where xi(t) is in the tails of f. In the application, these tails correspond to unusually high or low blood pressure values calibrated relative to the distribution of blood pressure values across individuals having non-cardiac surgical procedures. Let ui(t) = F{xi(t)}, for , denote trajectories that are transformed to have standard uniform marginal density.
2.1.1 Local features
2.1.2 Extrema weighting
Weight functions ωj(u) on [0, 1] allow differential weights of the local functional dynamic features ψj according to the quantile ui(t). This is motivated by the hypothesis that a specific blood pressure local functional feature affects the outcome in varying degrees depending on whether it appears at low, medium or high blood pressures.
Examples of ωLj and ωRj with bLj = 0.3 and bRj = 0.9, respectively
Examples of ωLj and ωRj with bLj = 0.3 and bRj = 0.9, respectively
Supplementary Section S2 compares the results from Section 3.3 with this quadratically decaying choice for ωj to linearly and exponentially decaying weighting. The sensitivity analysis reveals that, while the choice is important, our main finding is consistent across these different choices for ωj.
2.1.3 Extrema-weighted features
The XWFs have an interpretation as weighted averages. If ωj(⋅) = 1, then the XWFs are time averages of the ψs. Since ωj varies, they become weighted: wL2 is a time average of blood pressure values where low values are weighted more. Similarly, wR3 is a time average of increases in blood pressure where times with high blood pressures are weighted more.
2.1.4 Estimation algorithm
We estimate the density f through a weighted kernel smoothing approach. In particular, we treat the data as arising from a stratified sampling design, and assign each blood pressure measurement xi(tik), k = 1,…, ni, i = 1,…, n, sampling weight 1∕Ti, which is inversely proportional to the surgery duration. From an estimate of the density f, we can obtain an estimate of the CDF F.
The integral in defining the XWFs in (1) is one-dimensional and can be accurately approximated using a variety of numerical methods. Then, once accurate approximations of the feature vectors wLi and wRi are produced, conditionally on the unknown bL and bR in the weight functions ω, we can easily obtain penalized maximum likelihood estimates for the coefficients in (2) using the R package ‘mgcv’. The smoothness of the estimated functions is determined by minimizing the generalized cross-validation (GCV) score, when the scale parameter is unknown, or the unbiased risk estimator (UBRE) score, when the scale parameter is known such as here in logistic regression (Wood, 2017). Therefore, a grid search can find the bL and bR that minimize the GAM UBRE score. We employ an adaptive grid search akin to binary search, detailed in Algorithm 1. We use M = 3 in this article such that we get bLjs and bRjs at a resolution of 2–4.
2.2 Alternative approaches for comparison
To evaluate the XWF extraction in perspective, we consider two other approaches: A measure from the medical literature, average real variability and a method based on the power spectrum of blood pressure trajectories which is closer to traditional FDA approaches.
2.2.1 Average real variability
To compare the XWFs with this more straight-forward measure, we fit the data in a generalized additive model with one of the predictors being the ARV. Specifically, the alternative approach consists of replacing the 2p XWFs in (2) by the ARV.
2.2.2 Power spectrum
Being able to capture local dynamics rather than global effects is an important characteristic of XWFs. An issue with global effects is that they require some kind of alignment across observations of the functional predictor. By transforming the functional predictor to the frequency domain, one also moves away from this alignment requirement. Therefore, we consider principal components analysis on the power spectrum of the functional predictor.
We compute the power spectrum of each xi via a fast Fourier transform. Since a spectrum is a function itself, we treat it as a functional predictor. Specifically, we compute its supervised principal components (PCs) (Bair and Tibshirani, 2004; Bair et al., 2006). The supervised PC threshold is determined by cross-validation. As detailed in Ramsay and Silverman (2005, Chap. 8), principal component computation for the spectra, being discretized functions, does not differ from vector-valued data. The three largest PCs replace the XWFs in (2). That is, supervised PCs analysis on the power spectrum of the functional predictor provides features for a generalized additive model.
Adaptive grid search to optimize the weighting functions
Initialization: Set bLj = 0.25 and bRj = 0.75 for j = 1,…, P.
For m = 1,…, M, do the following for j = 1,…, P:
Compute the UBRE score for when bLj is 2–1–m larger than its current value, equal to its current value, and when it is 2–1–m smaller than its current value, while not changing any other parameters. Increase or decrease bLj by 2–2–m if the first or the third UBRE score, respectively, is the smallest. Otherwise, do nothing.
Compute the UBRE score for when bRj is 2–1–m larger than its current value, equal to its current value, and when it is 2–1–m smaller than its current value, while not changing any other parameters. Increase or decrease bRj by 2–2–m if the first or the third UBRE score, respectively, is the smallest. Otherwise, do nothing.
Return: Values for bLj and bRj, j = 1,…, P.
2.3 Inference
Our XWFs algorithm estimates the weighting functions by minimizing the UBRE score. The generalized additive model is then fit with the resulting XWFs as predictors. The P-values that the GAM provides (Wood, 2013) however do not take into account any estimation uncertainty in the weighting functions. To obtain principled measures of statistical significance, we obtain P-values via a permutation test where the weighting functions are re-estimated for each permutation.
For consistency, the P-values for the alternative approaches, the ARV and the power spectrum, are obtained in the same fashion. An added benefit of this approach is that it does not require assumptions on the data generating process for the inference to be valid.
3 Results
This section starts with two simulation studies to understand the performance and interpretation of all three methods considered. It concludes with the results from the blood pressure application. Supplementary Section S1 contains additional simulations.
3.1 Frequency simulation
Supplementary Table S1 contains the P-values for the XWFs, ARV and the power spectrum. The P-values for the zij were not significant and were left out. Figure 3 contains the principal components used in the power spectrum method. The weighting functions of the statistically significant XWFs (with P <0.05) were estimated as bR1 = 0.84 bL2 = 0.03, bR3 = 0.53 and bR4 = 0.75. The corresponding smooth GAM functions are shown in Figure 4. The confidence region is from the GAM and ignores uncertainty in our estimation of the weighting functions. The horizontal axes range from the 1st to the 99th percentile of the XWF.
Top three supervised principal components of the power spectra from the frequency simulated data
Top three supervised principal components of the power spectra from the frequency simulated data
Statistically significant smooth functions from the generalized additive model for the frequency simulation with 95% confidence region in gray
Statistically significant smooth functions from the generalized additive model for the frequency simulation with 95% confidence region in gray
Since ARV increases with ϕi, ARV finds a trend between variability and the outcome in the simulated data. Similarly, ϕi directly relates to the frequencies in the power spectrum. Therefore, the power spectrum method also finds an effect. However, it is hard to interpret the first spectrum principal component, even when knowing the data-generating process.
Extrema-weighted feature selection picks up that variability at higher values increases P(yi = 1): The magnitude of the derivative of xi is proportional to ϕi if xi(t) > 1. The XWFs wR3 and wR4 measure the magnitude of the derivative since and . What happens at higher values of xi(t) contributes more to the values of wR3 and wR4. According to the data-generating process, yi is positively associated with ϕi. Figure 4 shows that yi is also positively associated with wR3 and wR4. This association is thus reflective of the data-generating process. The other two significant XWFs, wR1 and wL2, do not vary with ϕi. Their association with yi as implied by their statistical significance is not consistent with the true model where yi depends on ϕi only. The results from the XWF-based analysis yield observations consistent with the underlying data-generating process via wR3 and wR4 with richer and clearer interpretations than the alternative methods.
3.2 XWF simulation
The three supervised principal components of the power spectra from the XWF simulated data
The three supervised principal components of the power spectra from the XWF simulated data
Statistically significant smooth functions from the generalized additive model for the XWF simulation with 95% confidence region in gray
Statistically significant smooth functions from the generalized additive model for the XWF simulation with 95% confidence region in gray
Average real variability measures variation which is associated with yi via wR3. As a result, ARV finds a trend. The supervised principal components fail to find a signal with the frequency spectrum dominated by the autoregressive process: The autoregressive parameters do not influence y directly. Figure 6 shows that wL1 and wR3 capture most of the variation in y. This is consistent with how y was simulated from wL1 and wR3. Also, the estimates for bL1 and bR3 are close to the values used for simulation.
3.3 Blood pressure data
Table 1 contains the results for the three methods when applied to the blood pressure data. The P-values for the zij were omitted, being all.01. Figure 7 contains the supervised principal components used in the power spectrum method. For computational stability of the supervised principal components, we truncated the spectra at a frequency of a 1000. The weighting function of the statistically significant XWF was estimated as bL2 = 0.03. The corresponding smooth GAM function is shown in Figure 8.
Results from the blood pressure data
| Method . | Parameter . | P-value . |
|---|---|---|
| XWFs | βL1 | 0.13 |
| βR1 | 0.07 | |
| βL2 | 0.01 | |
| βR2 | 0.14 | |
| βL3 | 0.78 | |
| βR3 | 0.49 | |
| βL4 | 0.45 | |
| βR4 | 0.63 | |
| Average real variability | ARV | 0.01 |
| Power spectrum | PC 1 | 0.01 |
| PC 2 | 0.31 | |
| PC 3 | 0.02 |
| Method . | Parameter . | P-value . |
|---|---|---|
| XWFs | βL1 | 0.13 |
| βR1 | 0.07 | |
| βL2 | 0.01 | |
| βR2 | 0.14 | |
| βL3 | 0.78 | |
| βR3 | 0.49 | |
| βL4 | 0.45 | |
| βR4 | 0.63 | |
| Average real variability | ARV | 0.01 |
| Power spectrum | PC 1 | 0.01 |
| PC 2 | 0.31 | |
| PC 3 | 0.02 |
Results from the blood pressure data
| Method . | Parameter . | P-value . |
|---|---|---|
| XWFs | βL1 | 0.13 |
| βR1 | 0.07 | |
| βL2 | 0.01 | |
| βR2 | 0.14 | |
| βL3 | 0.78 | |
| βR3 | 0.49 | |
| βL4 | 0.45 | |
| βR4 | 0.63 | |
| Average real variability | ARV | 0.01 |
| Power spectrum | PC 1 | 0.01 |
| PC 2 | 0.31 | |
| PC 3 | 0.02 |
| Method . | Parameter . | P-value . |
|---|---|---|
| XWFs | βL1 | 0.13 |
| βR1 | 0.07 | |
| βL2 | 0.01 | |
| βR2 | 0.14 | |
| βL3 | 0.78 | |
| βR3 | 0.49 | |
| βL4 | 0.45 | |
| βR4 | 0.63 | |
| Average real variability | ARV | 0.01 |
| Power spectrum | PC 1 | 0.01 |
| PC 2 | 0.31 | |
| PC 3 | 0.02 |
Top three principal components of the power spectra from the blood pressure trajectories
Top three principal components of the power spectra from the blood pressure trajectories
Statistically significant smooth function from the generalized additive model for the blood pressure application with 95% confidence region in gray
Statistically significant smooth function from the generalized additive model for the blood pressure application with 95% confidence region in gray
Figure 9 shows the blood pressure trajectories with the most extreme values for the XWFs, out of all 18 657 trajectories. It is striking how the trajectories vary in length: One of the goals of the XWFs was to be well-defined over, and be comparable across, functions with greatly varying sizes of domain.
The blood pressure trajectories with the highest (black) and lowest (gray) values for the XWFs
The blood pressure trajectories with the highest (black) and lowest (gray) values for the XWFs
While the XWFs are defined in an interpretable fashion, their values are not always readily gleaned from a trajectory plot. It is for instance hard to eyeball that the black trajectory has a higher XWF value than the gray trajectory for wL2 and wL3 in Figure 9. However, that wL1 and wR1 measure difference in level is clearly visible with the black and gray trajectories separated in terms of average value. Similarly, both trajectories plotted for wR3 and wR4 show a clear difference in variability.
Average real variability and the power spectrum method find an association between the blood pressure trajectories and postoperative mortality. However, only the XWFs reveal that low blood pressure values are predictive. Figure 8 shows that higher and more blood pressure values at the lower extreme (wL2) are associated with increased mortality. This interpretation is different from how one would interpret ARV. This shows the importance of considering richer features than ARV.
3.3.1 Relation with existing findings
Our result for ARV is consistent with previous literature that showed that an increased ARV is associated with adverse outcomes (Hansen et al., 2010; Mena et al., 2005). ARV is, to our knowledge, the most advanced method used previously to assess the effect of the blood pressure variability.
As XWFs provide richer insights, they cannot be contrasted one-to-one with former findings. We can instead look at how the richer insights from the XWF results fit into previously found associations. Having a larger proportion of blood pressure values in the extremes was associated with mortality. Extreme values are more common in the presence of high blood pressure standard deviation or ARV which previous study also found to be associated with increased mortality (Parati et al., 2013).
3.3.2 Predictive performance
Using specific predefined features instead of general methods that are less constrained in how they capture variability in the data, might hurt the overall explanatory power. To assess whether XWFs suffer from this in the motivating application, we look at predictive performance. We randomly select 1000 cases, 100 with yi = 1 and 900 with y1 = 0, as test data leaving training data of size 17 657. We compute the XWFs, ARV and supervised principal components as before on the training data. Three predictive models are considered: A GAM with the zij and only XWFs, with XWFs and ARV, and with XWFs and supervised principal components. We compare the performance of these on the test data to assess any loss in explanatory power by being constrained to XWFs.
Area under the curve (AUC) of the receiver operating characteristic quantifies predictive performance. Using 10 repeated random splits of the data into training and test set, we obtain 10 AUCs for each of the three models considered. They range from 0.56 to 0.64 while the absolute differences in AUC between the models are <0.01, none of which are statistically significant. Additionally, Supplementary Section S3 similarly does not show a notable difference in the precision-recall curves. The usage of the more interpretable predefined XWFs does not seem to lessen explanatory power compared to approaches which have a greater focus on predictive performance and less on interpretability.
4 Discussion
Existing functional feature extraction methods focus on functions that share the same domain and mostly extract global features. Motivated by rich blood pressure data from surgeries, we introduced a new feature extraction method, called XWFs, allowing for differing domains while aiming at functional features that are defined in a local nature, like variability. We compared XWFs to ARV, a functional feature from the medical literature, and to supervised principal components analysis on the power spectrum of the functional predictor.
Simulations reveal that XWFs are not only more easily interpretable due to the way they are defined, but they are also more flexible in detecting associations that do not fit traditional models. This was instrumental in finding a relationship between blood pressure trajectories during surgery and the outcome postoperative mortality. Whereas the power spectrum principal components detected a hard to interpret association and ARV only detects that variability is predictive, the proposed XWF model found that increased blood pressure at both extremes of the observed blood pressure range are associated with increased mortality. The XWF results directly address questions of interest to the clinician collaborators. Despite their predefined and specific nature, XWFs had similar explanatory power as competing methods.
This article presented a set of well-defined features applicable to the data at hand. The ideas behind them and the general framework of allowing varying domains in the functional predictors and taking weighted averages of local phenomena can however be applied in a larger variety of ways, making the methods from this article widely applicable to functional data.
Many parts of the XWF model provide choices to adapt the method to other applications. Weighting functions can be chosen and parametrized as desired. Similarly, the generalized additive model shows one way to use XWFs but the features can be broadly used within different models and machine learning methods.
We presented a novel functional feature extraction method motivated by modeling the effect of intraoperative blood pressure trajectories on postoperative mortality. The XWFs proposed are by design suited for functional predictors whose domains vary. The XWFs outperform more traditional methods and FDA tools, both in simulations and in the application of interest. The proposed model constitutes a modifiable framework for interpretable functional feature extraction.
Acknowledgements
We would like to thank the anonymous referees for suggestions that substantially improved the article and Dr Michael W. Manning and Dr Tracy L. Setji for their discussions of the medical data.
Funding
This work was supported by Accenture PLC.
Conflict of Interest: none declared.









