
40 Machine-Learning Approach to Discover Novel Biomarkers for Posttraumatic Epilepsy
Published: May 2024
Abstract
In this chapter, machine learning was applied to detect traumatic brain injury (TBI), posttraumatic epilepsy (PTE), and severity of PTE in a rat TBI model using animal weight, parameters related to TBI induction, multiple behavioral tests, and cortical lesion volumes calculated from magnetic resonance imaging (MRI). The combined data from all assays allowed perfect separation of naïve and sham-operated control animals from TBI animals. However, PTE was only detectable when the threshold for PTE was set at three or more observed seizures. These results imply that less severe forms of PTE were poorly reflected in the assays used.
Introduction
Globally, an estimated 2.4 million people are diagnosed with epilepsy each year. Thus, a new person is diagnosed with epilepsy every 13 seconds (WHO, 2019). In 60% of those affected, epileptogenesis is initiated by structural causes such as traumatic brain injury (TBI; Hauser et al., 1993; Scheffer et al., 2017). Over 10 hypothesis-driven monotherapy approaches have demonstrated some disease-modifying effects in animal models of posttraumatic epileptogenesis (Dulla and Pitkänen, 2021). Currently, however, no clinical treatments are available to stop or alleviate epileptogenesis in at-risk patients after TBI or to alleviate the course of posttraumatic epilepsy (PTE) after its diagnosis. One major reason for the stalled progression of compounds showing proof-of-concept evidence in animal models to clinical antiepileptogenesis trials is the lack of prognostic biomarkers for epileptogenesis. Such biomarkers could be used to stratify patient populations for antiepileptogenesis trials and reduce study costs, making sufficiently powered clinical trials affordable (Engel et al., 2013; Pitkänen et al., 2018).
Our previous work using univariate statistical analyses indicated that behavioral tests and assessments of cortical lesion volume performed poorly in separating rats with and without PTE (Manninen et al., 2020; Lapinlampi et al., 2021). Conventional statistical analyses emphasize explanation of the phenomenon under study, for example, by null hypothesis testing for a presumed univariate difference between groups. Conversely, machine learning (ML) excels at prediction, deriving complex multivariate and possibly nonlinear relationships between patterns in data and outcomes of interest. The relationships are encapsulated in the form of a model, whose defining parameters are learned from the data without domain expertise built into the model. The ability to derive predictive models without human bias allows screening of data for the presence of patterns that can be further refined into biomarkers, by assessing the performance of models learned from the data. Once learned and validated, ML models can be reverse engineered to gain insights into how variables in the data relate to PTE. The variables relevant to classification can be further refined in downstream analyses to transform them into a human-interpretable formula or encapsulated directly in an ML model.
We utilized ML to assess whether animals with epilepsy after lateral fluid-percussion-induced brain injury (TBI+) can be separated from animals without epilepsy (TBI–), using animal weight during the follow-up, TBI induction parameters, behavioral measurements, and cortical lesion volumes extracted from magnetic resonance imaging (MRI). We applied the same approach to separate animals with TBI from naïve and sham-operated experimental controls. Because the large cohort of animals studied was generated by inducing TBI in eight consecutive subcohorts over a 3-year period, we also studied the degree of inter-cohort differences. Intuitively, if samples can be reliably assigned to their cohorts based on feature values, marked inter-cohort differences are present. Specifically, we address the following questions: (1) Can we separate naïve/sham animals from TBI animals based on given unimodal or multimodal feature sets? (2) Can we separate TBI+ animals from TBI– animals based on the same feature sets? (3) Which assay features differentiate the groups? (4) Will a linear combination of measurement variables suffice for PTE classification?
Materials and Methods
Animals and Experimental Procedures
The study outline is summarized in Figure 40–1A. The study population was generated in eight successive subcohorts of 18 to 22 rats per cohort over a period of 3 years. Rats were randomized into the naïve, sham-operated experimental control, or TBI groups. A total of 150 rats completed the 6-month follow-up and were included in the analysis. Of these, 13 were naïve animals, 23 were sham-operated experimental controls, and 114 were rats with lateral fluid-percussion injury (FPI)-induced TBI. Of the 114 TBI animals, 29 had unprovoked electrographic seizures during the sixth month of video-EEG monitoring and were diagnosed with epilepsy (TBI+). The remaining 85 rats with TBI did not show any seizures (TBI–). Of the TBI+ rats, 7 had 1 unprovoked seizure during the sixth postinjury month, 2 had 2 seizures, and 20 had ≥3 seizures. The procedures used for lateral FPI induction, behavioral assessment with neuroscore, elevated plus maze (EPM), open field (OF), Morris water maze (MWM), sucrose preference (SP), and forced swimming (FS) tests, and video-EEG monitoring were described by Lapinlampi et al. (2021). Cortical lesion volume was assessed as described by Manninen et al. (2020).

Figure 40–1. A. The study population was generated in eight successive subcohorts of 18 to 22 rats per cohort over a period of 3 years. Rats were randomized into the naïve, sham-operated experimental control, or TBI groups. A total of 150 rats completed the 6-month follow-up and were included in the analysis. Uniform Manifold Approximation and Projection (UMAP) for the combination of all modalities (B–E) and for neuroscore (G). B. Naïve, sham-operated, and TBI groups were clearly separated from each other in the feature space. C–E. Division of TBI animals into four separate groups is evident from the visualization. D. Complete homogeneity of one of the clusters was reached when the threshold of seizure frequency/month was increased to ≥2 (i.e., only the rats with more severe epilepsy were included in the TBI+ group). E. The number of TBI+ animals in the bottom right cluster decreases by two, but the cluster still consists of a mixture of TBI+ and TBI– animals. F. No clustering of the eight subcohorts of animals was evident from the cohort-wise UMAP. G. Neuroscore separated animals in the TBI group from those in the naïve and sham-operated groups well, with the exception of two naïve animals.
Data Preprocessing and Feature Engineering
We considered measurements from several different assays as modalities: body weight, neuroscore, EPM, OF, MWM, SP, FS, MRI lesion volume, and variables related to TBI induction (impact pressure, apnea time, duration of immediate postimpact seizure-like behavior). The modalities themselves were composed of features corresponding to assay measurements (e.g., animal weight on day 1). Furthermore, two multimodal feature sets were generated by combining all available modalities (termed “All”) and all behavioral assays (termed “Behavior”). The modalities and the features they contained are listed in Table 40–1. The total number of features per modality was 440 for All, 316 for Behavior, 54 for MRI, 50 for EPM test, 24 for OF test, 14 for MWM test, 14 for FS test, 206 for neuroscore test, 8 for SP test, 67 for body weight, and 3 for TBI induction parameters. Missing measurements were linearly imputed for animal weight. For other modalities, missing values were denoted with values outside the range attainable for the feature in the experiment. No imputation was performed for the feature-wise univariate statistical tests; instead, samples with missing values for the tested feature were excluded. The heading angle from MWM was split into two features representing the sine and cosine of the angle. For modalities for which time series of measurements were available, engineered temporal features were produced by calculating the time series mean, median, standard deviation, difference between subsequent measurements, and the slope of a linear model fitted through the measurements. Three different datasets for TBI– versus TBI+ classification were generated by varying the threshold between TBI– and TBI+ from 1 to 3 seizures. For the remainder of the chapter, these datasets are referred to as TBI1±, TBI2±, and TBI3±.
In the TBI1± dataset, the TBI– class consisted of animals without observed seizures and the TBI+ class consisted of animals with at least one observed seizure. In the TBI2± (TBI3±) dataset, the TBI– class consisted of animals with fewer than 2 (3) observed seizures and the TBI+ class consisted of animals with at least 2 (3) observed seizures. Thus, the datasets contained 150 samples for naïve/sham versus TBI classification (13 naïve, 23 sham, 114 TBI) and 114 samples for TBI– versus TBI+ classification (29 TBI1+, 85 TBI1–; 22 TBI2+, 92 TBI2–; 20 TBI3+, 94 TBI3–).
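The preprocessing steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the chapter's actual code: the function names are hypothetical, the sample standard deviation (`ddof=1`) is an assumption because the text does not say which estimator was used, and the seizure counts are synthetic values chosen only to reproduce the group sizes reported in the text.

```python
import numpy as np


def heading_to_sin_cos(heading_deg):
    """Encode a heading angle (degrees) as sine and cosine so that
    angles on either side of the 0/360 wrap-around stay close together."""
    rad = np.deg2rad(heading_deg)
    return np.sin(rad), np.cos(rad)


def temporal_features(days, values):
    """Summary features for one animal's time series of one measurement."""
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    slope = np.polyfit(days, values, deg=1)[0]  # slope of the least-squares line
    return {
        "mean": values.mean(),
        "median": np.median(values),
        "std": values.std(ddof=1),  # assumption: sample standard deviation
        "diffs": np.diff(values),   # differences between subsequent measurements
        "slope": slope,
    }


def label_tbi_plus(seizure_counts, threshold):
    """Return 1 for TBI+ (at least `threshold` seizures) and 0 for TBI-."""
    return (np.asarray(seizure_counts) >= threshold).astype(int)


# Synthetic counts matching the reported group sizes; animals with >=3
# seizures are represented by the value 3 for illustration only.
seizure_counts = np.array([0] * 85 + [1] * 7 + [2] * 2 + [3] * 20)
n_tbi_plus = {t: int(label_tbi_plus(seizure_counts, t).sum()) for t in (1, 2, 3)}

# Hypothetical body weights (g) on four consecutive days.
weight = temporal_features(days=[0, 1, 2, 3], values=[300.0, 290.0, 295.0, 305.0])
sin_h, cos_h = heading_to_sin_cos(90.0)
```

With these counts, the three thresholds give 29, 22, and 20 TBI+ animals, which agrees with the dataset sizes stated in the text.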
| Modality | Feature | Time Points |
|---|---|---|
| EPM | Total distance | D28, D126 |
| | Center duration | |
| | Latency to first center entry | |
| | South duration | |
| | South frequency | |
| | Latency to first south entry | |
| | East duration | |
| | East frequency | |
| | Latency to first east entry | |
| | West duration | |
| | West frequency | |
| | Latency to first west entry | |
| | North duration | |
| | North frequency | |
| | Latency to first north entry | |
| | Open arms duration | |
| | Open arms frequency | |
| | Latency to first open arms entry | |
| | Closed arms duration | |
| | Closed arms frequency | |
| | Latency to first closed arms entry | |
| | Mean velocity | |
| | Total entries | |
| | Open arms entries proportion | |
| FST | Climbing duration | D42, D132 |
| | Climbing frequency | |
| | Swimming duration | |
| | Swimming frequency | |
| | Immobility duration | |
| | Immobility frequency | |
| | Immobility latency | |
| MRI | Lesion size on threshold 1 | D2, D7, D21 |
| | Lesion size on threshold 2 | |
| | Lesion size on threshold 3 | |
| | Lesion size on threshold 4 | |
| | Lesion size on threshold 5 | |
| | Total lesion size | |
| MWM | Total distance moved | D41 |
| | Mean distance traveled to platform | |
| | Mean heading | |
| | Duration in platform zone | |
| | Entry frequency to platform zone | |
| | Latency to first entry to platform zone | |
| | Duration in zone 1 | |
| | Duration in northeast zone | |
| | Entry frequency to northeast zone | |
| | Latency to first entry to northeast zone | |
| | Duration in southeast zone | |
| | Duration in southwest zone | |
| | Duration in northwest zone | |
| | Rotation frequency | |
| | Rotation frequency 2 | |
| | Mean velocity | |
| Neuroscore | Left contraflexion | D0, D2, D6, D14 |
| | Right contraflexion | |
| | Left hindlimb flexion | |
| | Right hindlimb flexion | |
| | Left lateral pulsion | |
| | Right lateral pulsion | |
| | Angleboard left | |
| | Angleboard right | |
| | Angleboard score | |
| | Total score left | |
| | Total score right | |
| | Neuroscore | |
| OF | Total distance | D29, D127 |
| | Duration in outer zone | |
| | Outer zone entry frequency | |
| | Latency to first outer zone entry | |
| | Duration in middle zone | |
| | Middle zone entry frequency | |
| | Latency to first middle zone entry | |
| | Duration in inner zone | |
| | Inner zone entry frequency | |
| | Latency to first inner zone entry | |
| | Mean velocity | |
| | Total entries | |
| Sucrose consumption | Sugar consumption | D40, D140 |
| | Water consumption | |
| | Total consumption | |
| TBI | Hit pressure | D0 |
| | Apnea time | |
| | Seizure duration | |
| Lesion size | Total lesion size | - |
| Weight | Animal weight | D-1, D0, D1, D2, D3, D4, D5, D6, D7, D14, D21, D28, D29, D32, D35, D42, D49, D56, D77, D84, D112, D119, D126, D127, D130, D140, D141, D142, D143, D147, D168, D183 |
Exploratory Data Analysis and Statistical Hypothesis Testing
To assess the tendency of the data to form clusters, we used dimensionality reduction with uniform manifold approximation and projection (UMAP) to visualize modalities (McInnes et al., 2020). Univariate differences between naïve, sham, and TBI groups were assessed using the Kruskal-Wallis test followed by post hoc multiple comparisons with the Dunn test. A 95% bootstrap confidence interval for the difference between group means was calculated for each feature to gauge the size of the difference. A pairwise Mann-Whitney U test was applied to compare the TBI– and TBI+ groups, with the TBI+ membership criterion varying from ≥1 to ≥3 observed seizures during the follow-up. The Benjamini-Hochberg procedure was used to control the false discovery rate (FDR) in all tests (Benjamini & Hochberg, 1995).
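The univariate workflow described above can be illustrated on synthetic data. The sketch below is a hedged example, not the chapter's code: it assumes SciPy's `kruskal` and `mannwhitneyu` for the rank tests, implements the Benjamini-Hochberg adjustment and the percentile bootstrap by hand, and omits the Dunn post hoc test. The group sizes follow the text; the simulated feature values, the helper names, and the number of bootstrap resamples are assumptions.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])


def valid(x):
    """Drop missing values; samples missing the tested feature are excluded."""
    return x[~np.isnan(x)]


# Synthetic data: 5 features, only feature 0 differs in the TBI group.
rng = np.random.default_rng(0)
n_features = 5
naive = rng.normal(size=(13, n_features))
sham = rng.normal(size=(23, n_features))
tbi = rng.normal(size=(114, n_features))
tbi[:, 0] -= 2.0
tbi[3, 0] = np.nan  # one missing measurement

kw_p = np.array([
    stats.kruskal(valid(naive[:, j]), valid(sham[:, j]), valid(tbi[:, j])).pvalue
    for j in range(n_features)
])
kw_q = benjamini_hochberg(kw_p)  # FDR-adjusted across features

# The same two-sample call applies to a TBI- versus TBI+ comparison.
mwu_p = stats.mannwhitneyu(valid(sham[:, 0]), valid(tbi[:, 0]),
                           alternative="two-sided").pvalue
ci_low, ci_high = bootstrap_mean_diff_ci(valid(sham[:, 0]), valid(tbi[:, 0]))
```

Reporting the bootstrap interval next to the adjusted p-value, as the chapter does, conveys the size of a difference as well as its statistical significance.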
Machine Learning
Succinctly, ML solves the optimization problem

$$\hat{f} = \underset{f}{\arg\min} \sum_{i=1}^{n} L\big(y_i, f(\mathbf{x}_i)\big) + R(f),$$

where $f$ represents a mapping from an input vector $\mathbf{x}_i$ to the corresponding target variable $y_i$. The vectors $\mathbf{x}_i$, referred to as feature vectors, correspond to measurements from a single animal $i$, whereas the values $y_i$ encode class membership. The estimated function $\hat{f}$ is encapsulated in a mathematical structure known as a model, and the process of adjusting model parameters to minimize the mapping error, combined with a regularization term $R(f)$ representing model complexity, is referred to as training. The mapping error is expressed in terms of a loss function $L$, which measures the divergence between the known values of $y_i$ and the values mapped by the model.
The regularization term $R(f)$ is derived from model parameters. Regularization alleviates the risk of failing to estimate the actual general relationship between $\mathbf{x}_i$ and $y_i$ by overemphasizing experiment-specific details present in the training data. This focus on specificities of the training data is referred to as overfitting, and it manifests as a decrease in the model’s ability to correctly assign labels to vectors not utilized in training, that is, to generalize outside the training data to a whole population.
In this work, we utilized k-nearest neighbors (KNN; Cover and Hart, 1967), Random Forests (RF; Breiman, 1984; Breiman, 2001), and regularized logistic regression (LR; Tibshirani, 1996; Hastie et al., 2001) as classification models.
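As a concrete instance of a loss combined with a regularization term, consider regularized logistic regression. The formulation below assumes an L1 (lasso) penalty, which the citation of Tibshirani (1996) suggests but the text does not state explicitly. The weight vector $\mathbf{w}$, intercept $b$, and penalty strength $\lambda$ are notation introduced here for illustration, with feature vectors $\mathbf{x}_i$ and class labels coded as $y_i \in \{-1, +1\}$.

```latex
\[
(\hat{\mathbf{w}}, \hat{b})
  = \underset{\mathbf{w},\, b}{\arg\min}
    \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)}\right)
    + \lambda \lVert \mathbf{w} \rVert_1 ,
  \qquad y_i \in \{-1, +1\}
\]
```

Here the logistic loss plays the role of the loss function, the linear map $\mathbf{w}^{\top}\mathbf{x}_i + b$ is the mapping from features to class, and $\lambda \lVert \mathbf{w} \rVert_1$ is the regularization term. Larger values of $\lambda$ drive more weights to exactly zero, which trades training fit for a simpler model.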
Model Training and Evaluation
For model training and evaluation, nested cross-validation (CV) with grid search for hyperparameter optimization was performed using a stratified 10-fold split on both the outer and inner CV levels. During training, all features were standardized to zero mean and unit variance. For each CV fold, the test set was standardized using the mean and standard deviation of the training set to avoid bias from information leaking between the training and test sets. Features with zero variance, that is, features that were constant across samples, were excluded from each fold’s training and test sets.
In the inner training loop, a separate feature selection step was performed prior to model training, utilizing recursive feature elimination (RFE) and univariate filtering with the Kolmogorov-Smirnov test (KST). In RFE, the least informative feature in terms of the reduction in Gini index was repeatedly removed until the target number of retained features (set as an RFE hyperparameter) was reached. The number of retained features was jointly optimized with the model hyperparameters. To avoid bias induced by a preselected feature selection method, the complete set of features was also passed to the classifier as an alternative to RFE and KST filtering. The best-performing feature combination and model configuration in terms of area under the receiver operating characteristic curve (AUC) on the inner CV was utilized to evaluate the model. Separate models were trained for naïve/sham versus TBI and TBI– versus TBI+ classification.
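A minimal sketch of nested CV with fold-wise standardization, using scikit-learn on synthetic data, is shown below. It is not the chapter's pipeline: the data, the plain logistic regression classifier, and the grid over `C` are assumptions chosen to keep the example small. Placing the zero-variance filter and the scaler inside a `Pipeline` means their statistics are re-estimated on the training portion of every fold, which is what prevents leakage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 150-animal dataset with an imbalanced class split.
X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           n_clusters_per_class=1, class_sep=1.5,
                           weights=[0.25, 0.75], random_state=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])  # add one constant feature

pipe = Pipeline([
    ("drop_constant", VarianceThreshold()),  # removes zero-variance features
    ("scale", StandardScaler()),             # fitted on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

# Inner loop: grid search selects hyperparameters by AUC.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0]},
                      scoring="roc_auc", cv=inner)

# Outer loop: each sample is predicted by a model that never saw it.
proba = cross_val_predict(search, X, y, cv=outer, method="predict_proba")[:, 1]
pooled_auc = roc_auc_score(y, proba)  # AUC on pooled outer-fold predictions
```

Because `cross_val_predict` refits the whole grid search within every outer fold, hyperparameter selection never sees the samples used to score the model.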
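The two feature selection routes described above can be sketched as follows. `KSFilter` is a hypothetical transformer written for this example; it is not part of scikit-learn, and its fallback of keeping the single best feature when none passes the cutoff is an assumption. The RFE step uses scikit-learn's `RFE` with a Random Forest, whose `feature_importances_` are the Gini-based importances. The data are synthetic, with only the first two features carrying signal.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


class KSFilter(BaseEstimator, TransformerMixin):
    """Keep features whose two-sample KS test between classes gives p < p_cutoff."""

    def __init__(self, p_cutoff=0.05):
        self.p_cutoff = p_cutoff

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        classes = np.unique(y)
        self.pvalues_ = np.array([
            ks_2samp(X[y == classes[0], j], X[y == classes[1], j]).pvalue
            for j in range(X.shape[1])
        ])
        mask = self.pvalues_ < self.p_cutoff
        if not mask.any():
            # Assumption: fall back to the single best feature so that
            # downstream steps always receive at least one column.
            mask[np.argmin(self.pvalues_)] = True
        self.support_ = mask
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.support_]


# Synthetic data: 12 features, only the first two differ between classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 12))
X[y == 1, :2] += 2.0

ks = KSFilter(p_cutoff=0.05).fit(X, y)
X_filtered = ks.transform(X)

# RFE drops the least important feature one at a time until two remain.
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=2, step=1).fit(X, y)
```

Either selector can be placed as a step in a scikit-learn `Pipeline`, so that selection is repeated inside every CV fold and its settings can be tuned together with the model hyperparameters.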
Classification performance was evaluated by calculating AUC, accuracy, sensitivity, and specificity on pooled predictions from the outer CV loop (Airola et al., 2009). To alleviate randomness caused by the small sample size, nested CV was repeated 10 times with different random number generator seeds, and the performance metrics were averaged over the 10 repeats. Classification concordance was calculated for each animal as the proportion of the 10 nested CV repeats in which the animal was correctly classified. RF feature importance was calculated by averaging the feature-specific mean decreases in Gini index over the 10 CV folds, and p-values were assigned to RF importance using the permutation method (Altmann et al., 2010) with 500 iterations. Similarly, permutation testing with 500 iterations was used to assess the statistical significance of TBI– versus TBI+ classification scores reaching at least moderate AUC.
Differences in classifier performance were analyzed statistically (1) between RFs trained for TBI– versus TBI+ classification on the explored thresholds of TBI+ membership and (2) between LR and RF trained for TBI3– versus TBI3+ classification, using the 20 × 10cv paired t-test (Bouckaert and Frank, 2004) performed on averages over sorted runs. For the statistical analysis, nested CV was repeated 20 times using 10-fold nonstratified CV on the outer CV loop, and the t-test was performed over the means of classifier accuracies from ranked folds over the 20 runs. This approach has been previously shown to yield lower type I error and higher replicability than direct comparison over folds (Bouckaert, 2003, 2004). For the 20 × 10cv tests, correction for multiple comparisons over models and thresholds was performed using the Bonferroni-Holm method (Holm, 1979).
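Repeating CV with different seeds, averaging the pooled metrics, and computing per-animal concordance can be sketched as below. This is a simplified illustration on synthetic data: the inner hyperparameter search is omitted, the Random Forest size is arbitrary, and the 0.5 probability cutoff used to turn scores into class labels is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for 114 TBI animals with a minority positive class.
X, y = make_classification(n_samples=114, n_features=10, n_informative=4,
                           n_clusters_per_class=1, class_sep=1.5,
                           weights=[0.8, 0.2], random_state=0)

n_repeats = 10
correct = np.zeros((n_repeats, y.size), dtype=bool)
metrics = []
for seed in range(n_repeats):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)  # assumption: 0.5 decision cutoff
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    metrics.append([
        roc_auc_score(y, proba),  # AUC on predictions pooled over folds
        (tp + tn) / y.size,       # accuracy
        tp / (tp + fn),           # sensitivity
        tn / (tn + fp),           # specificity
    ])
    correct[seed] = pred == y

auc, accuracy, sensitivity, specificity = np.mean(metrics, axis=0)
# Concordance: fraction of repeats in which each animal was classified correctly.
concordance = correct.mean(axis=0)
```

AUC is computed from the ranking of predicted probabilities, whereas sensitivity and specificity depend on the chosen cutoff, so a moderate AUC can coexist with low sensitivity when the positive class is small.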
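One reading of the sorted-runs procedure is sketched below: within each run the fold accuracies of each classifier are ranked, the ranked values are averaged across runs to give one value per rank, and a paired t-test is applied to those averages. This interpretation, the function names, and the simulated accuracies are assumptions; Bouckaert and Frank (2004) should be consulted for the exact scheme. The Holm step-down adjustment is implemented by hand.

```python
import numpy as np
from scipy import stats


def sorted_runs_ttest(acc_a, acc_b):
    """Paired t-test on fold accuracies averaged over sorted runs.

    acc_a, acc_b: arrays of shape (n_runs, n_folds) holding the accuracy
    of each classifier on every outer fold of every CV run.
    """
    # Rank the folds within each run, then average each rank across runs.
    a = np.sort(np.asarray(acc_a, dtype=float), axis=1).mean(axis=0)
    b = np.sort(np.asarray(acc_b, dtype=float), axis=1).mean(axis=0)
    return stats.ttest_rel(a, b)  # n_folds - 1 degrees of freedom


def holm(pvals):
    """Holm step-down adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * (m - np.arange(m))
    adjusted = np.maximum.accumulate(scaled)  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


# Simulated accuracies for two classifiers over 20 runs of 10-fold CV.
rng = np.random.default_rng(0)
acc_rf = 0.80 + rng.normal(scale=0.05, size=(20, 10))
acc_lr = 0.70 + rng.normal(scale=0.05, size=(20, 10))

result = sorted_runs_ttest(acc_rf, acc_lr)
adjusted = holm([0.01, 0.04, 0.03])
```

Averaging over sorted runs reduces each classifier to one value per rank, so the test has as many paired observations as there are folds regardless of the number of runs.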
Cohort Similarity Assessment
To gauge the level of inter-cohort similarity in the analyzed modalities, multivariate intra-class correlation (ICC) and 1-versus-rest classification of cohorts were utilized. If no inter-cohort differences are present, within-cohort distances between samples in the measurement space should be similar to between-cohort distances, which results in an ICC near zero. Similarly, when inter-cohort differences are nonexistent, an ML classifier cannot differentiate a single cohort from the rest.
ICC 95% confidence intervals were computed using bootstrapping. In the classification experiments, a K-nearest neighbor (KNN) classifier was used, with leave-one-out CV in the outer loop of nested CV.
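The 1-versus-rest cohort classification can be sketched with scikit-learn as follows. The data are synthetic, the inner 3-fold grid over the number of neighbors is an assumption, and the multivariate ICC is not shown. In the null dataset no cohort differs from the others, whereas in the shifted dataset cohort 0 is displaced in every feature.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     StratifiedKFold, cross_val_predict)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_cohorts, per_cohort, n_features = 8, 8, 6
cohort = np.repeat(np.arange(n_cohorts), per_cohort)

X_null = rng.normal(size=(cohort.size, n_features))  # no cohort effect
X_shifted = X_null.copy()
X_shifted[cohort == 0] += 3.0                        # cohort 0 differs


def one_vs_rest_auc(X, cohort, target):
    """Pooled leave-one-out AUC for separating one cohort from the rest."""
    y = (cohort == target).astype(int)
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    search = GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [1, 3, 5]},
        scoring="roc_auc", cv=inner,
    )
    # Outer loop: every animal is predicted by a model trained without it.
    proba = cross_val_predict(search, X, y, cv=LeaveOneOut(),
                              method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


auc_null = one_vs_rest_auc(X_null, cohort, target=0)
auc_shifted = one_vs_rest_auc(X_shifted, cohort, target=0)
```

An AUC near 0.5 for every cohort is what one would expect when the cohorts are interchangeable, while an AUC approaching 1 flags a cohort that differs systematically from the others.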
Implementation
Data processing and analyses were performed using Python 3.9 and Cython 0.29.22. Kruskal-Wallis, Mann-Whitney U, Dunn, and Kolmogorov-Smirnov tests were conducted using statsmodels 0.13. Dimensionality reduction via UMAP was performed using umap-learn 0.5.1. For ML and CV, scikit-learn 1.0 was utilized. During grid search, the KST p-value cutoff was varied between 0.05 and 0.1, and the maximum number of features in RFE between 5 and 15. The number of trees in RF was varied between 100 and 1000, with the maximum number of features considered at each split kept at the default value, the square root of the number of features. The depth of trees was unrestricted, and the number of samples in the bootstrapped datasets during bagging was equal to the number of samples in the dataset.
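The Random Forest settings and search ranges described above map onto scikit-learn parameters roughly as follows. Only the end points of each range are given in the text, so the intermediate grid values below are assumptions, as are the step names of the pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

rf = RandomForestClassifier(
    max_features="sqrt",  # square root of the number of features
    max_depth=None,       # unrestricted tree depth
    bootstrap=True,
    max_samples=None,     # each bootstrap sample has as many rows as the data
    random_state=0,
)

pipe = Pipeline([
    ("select", RFE(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("clf", rf),
])

# End points follow the text; intermediate values are assumptions.
param_grid = {
    "select__n_features_to_select": [5, 10, 15],
    "clf__n_estimators": [100, 500, 1000],
}
ks_p_cutoffs = [0.05, 0.1]  # cutoffs for the alternative KS-test filter
```

A dictionary of this form can be passed to `GridSearchCV` together with the pipeline, so that the number of retained features and the forest size are tuned jointly.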
Results
Exploratory Data Analysis and Hypothesis Tests
Naïve versus Sham-Operated Experimental Controls versus TBI Groups
Figure 40–1B presents the UMAP visualization for clustering of naïve, sham, and TBI groups in the feature space of the “All” modality. The combination of all modalities separated TBI animals from naïve and sham-operated rats. Naïve and sham animals were also separated. In the feature space of the unimodal neuroscore only, TBI animals were well separated from naïve and sham animals (Fig. 40–1G), even though two naïve animals resembled the rats in the TBI group.

Figure 40–2. A. Box plot of mean accuracies from the paired 20 × 10cv test. RF classifiers differ in performance on different TBI+ inclusion thresholds, and the RF for TBI3– versus TBI3+ differs significantly from the corresponding LR model (*p < 0.001). B. Feature importance in terms of mean decrease of Gini index in permutation test (*p < 0.05, **p < 0.01). C. Ratio of correct classification (TBI3– versus TBI3+) of individual animals over the 10 CV folds. Green line indicates the cumulative proportion.
The Kruskal-Wallis test (KW) with Dunn post hoc analysis revealed that naïve and sham animals differed in terms of the slope of the line fitted through the time series of body weight measurements (H 14, difference CI95[–0.33;–0.16]), with the sham group showing a greater weight increase over time.
Naïve/sham and TBI animals showed differences of several points in the mean of almost all neuroscores (Dunn p < 0.001, H ≥ 38). Similarly, until day 42 post-injury, the mean body weight of TBI animals was up to 43.6 g less than in naïve rats (CI95[36.5;50.4], H 50, p < 0.001) and up to 34.5 g less than in sham rats (CI95[27.4;41.2], H 52.2, p < 0.001). In MWM, TBI animals traveled up to 18 cm farther to reach the platform (CI95[14.7;21.9], H 64.8, Dunn p < 0.001), spent up to 10.7 s less time in the northeast zone (CI95[7.9;13.7], H 46.7, p < 0.0001), and swam approximately 6 cm/s faster (CI95[4.3;9.3], H 43.5, p < 0.0001).
TBI+ versus TBI– Animals
Next, we assessed whether the TBI+ and TBI– animals clustered separately and whether the separation was affected by the seizure frequency. Four distinct clusters were visible in the feature space of the TBI group only (Fig. 40–1C–E). When the inclusion criterion for TBI+ was increased to ≥2 seizures (i.e., only the rats with more severe epilepsy were included in the TBI+ group), one of the clusters reached complete homogeneity, indicating the presence of a subgroup among TBI– samples discernible from TBI+.
The Mann-Whitney U (MWU) test showed that the TBI1– and TBI1+ groups differed mainly (p < 0.05) in terms of EPM latency to first visit in the north section (1049 s higher in TBI1–, CI95[474;1173], U 807), duration in the north section (12.5 s longer for TBI1+, CI95[1.7;29.4], U 826), total distance (256 cm longer for TBI1+, CI95[123.2;392.9], U 759), velocity (0.86 cm/s faster for TBI1+, CI95[0.41;1.34], U 761), and EPM north frequency (1 visit more for TBI1+, CI95[0.26;2.17], U 868.5). Similarly, in the OF test, TBI1+ animals showed a 2.26 cm/s faster mean velocity (CI95[1.26;3.48], U 756) and 674 cm longer total distance (CI95[365.5;996.1], U 757) compared to TBI1–. Additionally, TBI1+ animals differed in several weight-related variables, weighing up to 21.6 g (CI95[11.2;32.9], U 725.5) less before day 32.
The pattern persisted in the TBI2– versus TBI2+ comparisons, including differences in weight, EPM, and OF. TBI2+ rats had lower body weight at multiple time points (p < 0.01, U ≥ 449). In OF, they moved a 787 cm longer total distance (CI95[417.2;1140.4], U 559, p < 0.05) with a 2.6 cm/s higher mean velocity (CI95[1.53;3.95], U 558, p < 0.05). In EPM, TBI2+ rats showed a 4.5 higher center area visiting frequency (CI95[2;7], U 556, p < 0.05), a 2.1 higher west area visiting frequency (CI95[0.87;3.4], U 639, p < 0.05), and a 32 s longer west area visit duration (CI95[7.3;58.9], U 613, p < 0.05) compared to TBI2–. Additionally, in contrast to TBI1– versus TBI1+, TBI2+ showed a 0.42 points lower mean left (CI95[0.21;0.64], U 646, p < 0.05) and 0.32 points lower mean right neuroscore (CI95[0.15;0.56], U < 699, p < 0.05) than TBI2–.
TBI3+ rats showed a lower body weight (p < 0.01, U ≥ 402), made on average 4.1 (CI95[1.53;6.74], U 531, p < 0.01) and 2.1 (CI95[0.79;3.41], U 593, p < 0.05) more visits to the EPM center and west areas, respectively, and spent 32 s longer (CI95[9.9;62.7], U 551, p < 0.05) in the EPM center area than TBI3– rats. They had a 0.32 points lower mean left hindlimb flexion score on day 6 than rats in the TBI3– group (CI95[0.17;0.6], U 629, p < 0.05).
Classification
Naïve versus Sham-Operated Experimental Controls versus TBI Groups
A perfect naïve/sham versus TBI classification was possible with RF utilizing the multimodal All and Behavior modalities, resulting in a pooled AUC of 1.0 (Table 40–2). RF achieved a similar performance using neuroscore and MRI. Using only weight-related variables, RF separated naïve/sham animals from TBI animals with a pooled AUC of 0.96, and using MWM with a pooled AUC of 0.93. Similarly, LR showed a high performance when combined with the multimodal All and Behavior modalities, with pooled AUCs of 0.99 and 0.98, respectively. Utilizing neuroscore, LR reached a pooled AUC of 0.98, and using MWM, 0.91. In contrast to RF, LR was not able to correctly assign classes utilizing only MRI (pooled AUC 0.69).
| Modality | Model | Pooled AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| All | RF | 1.00 | 1.00 | 1.00 |
| | LR | 0.99 | 1.00 | 0.99 |
| Behavior | RF | 1.00 | 1.00 | 1.00 |
| | LR | 0.98 | 1.00 | 0.98 |
| MRI | RF | 1.00 | 1.00 | 1.00 |
| | LR | 0.69 | 1.00 | 0.36 |
| EPM | RF | 0.63 | 0.93 | 0.18 |
| | LR | 0.60 | 0.95 | 0.04 |
| OF | RF | 0.71 | 0.92 | 0.29 |
| | LR | 0.74 | 0.90 | 0.29 |
| MWM | RF | 0.93 | 0.95 | 0.73 |
| | LR | 0.91 | 0.90 | 0.83 |
| FS | RF | 0.73 | 0.92 | 0.38 |
| | LR | 0.69 | 0.81 | 0.47 |
| Neuroscore | RF | 1.00 | 1.00 | 1.00 |
| | LR | 0.98 | 1.00 | 0.98 |
| Sucrose preference | RF | 0.44 | 0.82 | 0.08 |
| | LR | 0.58 | 1.00 | 0.00 |
| Weight | RF | 0.96 | 0.96 | 0.79 |
| | LR | 0.98 | 0.95 | 0.86 |
Note: Random forest detected TBI perfectly using a combination of all modalities (All), combination of all behavioral modalities (Behavior) and neuroscores. High naïve/sham versus TBI classification performance was achieved on all modalities except sucrose intake, forced swimming (FS), elevated plus-maze (EPM), and open field (OF) test.
Abbreviations: EPM, elevated plus maze; FS, forced swimming; MRI, magnetic resonance imaging; MWM, Morris water maze; OF, open field.
A reliable naïve/sham versus TBI classification was not possible using EPM, OF, FS, or sucrose consumption with either RF or LR.
TBI+ versus TBI– Animals
In TBI1– versus TBI1+ and TBI2– versus TBI2+ classification, RF achieved pooled AUC 0.70 and 0.71, respectively, using the All modality (Table 40–3). When the threshold was increased to ≥3 seizures, RF separated TBI3– from TBI3+ moderately, with AUC 0.75 (permutation test, p < 0.05). Conversely, the AUC of LR on the same modality was 0.61, 0.62, and 0.60 on thresholds 1, 2, and 3, respectively. In unimodal classification, none of the individual modalities enabled distinguishing TBI+ from TBI– on any threshold (Table 40–4).
| Modality | Threshold for TBI+ | Model | AUC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| All | 1 | RF | 0.71 | 0.23 | 0.91 |
| | | LR | 0.61 | 0.43 | 0.74 |
| | 2 | RF | 0.70 | 0.18 | 0.93 |
| | | LR | 0.62 | 0.36 | 0.82 |
| | 3 | RF | 0.75 | 0.10 | 0.94 |
| | | LR | 0.60 | 0.33 | 0.83 |
| Behavior | 1 | RF | 0.67 | 0.23 | 0.89 |
| | | LR | 0.61 | 0.31 | 0.82 |
| | 2 | RF | 0.56 | 0.04 | 0.94 |
| | | LR | 0.55 | 0.13 | 0.89 |
| | 3 | RF | 0.49 | 0.13 | 0.85 |
| | | LR | 0.53 | 0.11 | 0.92 |
Note: Random forest consistently outperforms logistic regression. Moderate pooled area under curve (AUC) of 0.75 was reached by RF trained on All modality with threshold of three or more seizures.
Modality | Threshold for TBI+ | Model | AUC | Sensitivity | Specificity |
---|---|---|---|---|---|
MRI | 1 | RF | 0.49 | 0.11 | 0.89 |
MRI | 1 | LR | 0.40 | 0.05 | 0.88 |
MRI | 2 | RF | 0.37 | 0.06 | 0.88 |
MRI | 2 | LR | 0.44 | 0.02 | 0.92 |
MRI | 3 | RF | 0.44 | 0.12 | 0.91 |
MRI | 3 | LR | 0.50 | 0.09 | 0.92 |
EPM | 1 | RF | 0.63 | 0.23 | 0.93 |
EPM | 1 | LR | 0.58 | 0.27 | 0.85 |
EPM | 2 | RF | 0.65 | 0.10 | 0.96 |
EPM | 2 | LR | 0.59 | 0.34 | 0.86 |
EPM | 3 | RF | 0.58 | 0.06 | 0.94 |
EPM | 3 | LR | 0.57 | 0.31 | 0.84 |
OF | 1 | RF | 0.68 | 0.21 | 0.88 |
OF | 1 | LR | 0.67 | 0.11 | 0.93 |
OF | 2 | RF | 0.69 | 0.13 | 0.93 |
OF | 2 | LR | 0.67 | 0.12 | 0.95 |
OF | 3 | RF | 0.64 | 0.12 | 0.96 |
OF | 3 | LR | 0.60 | 0.05 | 0.98 |
MWM | 1 | RF | 0.58 | 0.19 | 0.93 |
MWM | 1 | LR | 0.62 | 0.22 | 0.87 |
MWM | 2 | RF | 0.51 | 0.04 | 0.96 |
MWM | 2 | LR | 0.54 | 0.09 | 0.92 |
MWM | 3 | RF | 0.64 | 0.12 | 0.96 |
MWM | 3 | LR | 0.60 | 0.05 | 0.98 |
FS | 1 | RF | 0.58 | 0.19 | 0.93 |
FS | 1 | LR | 0.62 | 0.22 | 0.87 |
FS | 2 | RF | 0.51 | 0.04 | 0.96 |
FS | 2 | LR | 0.54 | 0.09 | 0.92 |
FS | 3 | RF | 0.53 | 0.08 | 0.95 |
FS | 3 | LR | 0.58 | 0.01 | 0.99 |
Neuroscore | 1 | RF | 0.48 | 0.11 | 0.87 |
Neuroscore | 1 | LR | 0.55 | 0.20 | 0.84 |
Neuroscore | 2 | RF | 0.53 | 0.09 | 0.92 |
Neuroscore | 2 | LR | 0.53 | 0.13 | 0.90 |
Neuroscore | 3 | RF | 0.63 | 0.10 | 0.94 |
Neuroscore | 3 | LR | 0.47 | 0.10 | 0.86 |
Sucrose preference | 1 | RF | 0.41 | 0.17 | 0.75 |
Sucrose preference | 1 | LR | 0.44 | 0.00 | 0.99 |
Sucrose preference | 2 | RF | 0.40 | 0.10 | 0.79 |
Sucrose preference | 2 | LR | 0.44 | 0.00 | 0.99 |
Sucrose preference | 3 | RF | 0.37 | 0.11 | 0.82 |
Sucrose preference | 3 | LR | 0.46 | 0.00 | 0.99 |
Weight | 1 | RF | 0.62 | 0.19 | 0.93 |
Weight | 1 | LR | 0.57 | 0.40 | 0.77 |
Weight | 2 | RF | 0.68 | 0.22 | 0.94 |
Weight | 2 | LR | 0.61 | 0.39 | 0.81 |
Weight | 3 | RF | 0.62 | 0.13 | 0.94 |
Weight | 3 | LR | 0.67 | 0.39 | 0.83 |
TBI | 1 | RF | 0.62 | 0.19 | 0.93 |
TBI | 1 | LR | 0.57 | 0.40 | 0.77 |
TBI | 2 | RF | 0.68 | 0.22 | 0.94 |
TBI | 2 | LR | 0.61 | 0.39 | 0.81 |
TBI | 3 | RF | 0.62 | 0.13 | 0.94 |
TBI | 3 | LR | 0.67 | 0.39 | 0.83 |
Note: None of the modalities enabled reliable TBI– versus TBI+ classification at any threshold.
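The sensitivity and specificity columns in the tables above can be read with the following minimal sketch, which computes both quantities from binary predictions. The helper name and the toy labels are hypothetical and are not taken from the chapter's implementation.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    Sensitivity: fraction of TBI+ animals (label 1) predicted as TBI+.
    Specificity: fraction of TBI- animals (label 0) predicted as TBI-.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy example: 4 TBI+ and 6 TBI- animals (not chapter data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

In this toy case the classifier labels most animals as TBI–, which yields low sensitivity alongside high specificity, the same qualitative pattern seen throughout the tables.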
Subcohort Similarity
Because studies with large animal numbers typically need to be conducted in several successive subcohorts, we next assessed the similarity between the subcohorts. A moderate ICC was observed for weight (0.33, CI95 [0.26, 0.42]) and, to a lesser extent, for MRI (0.23, CI95 [0.17, 0.29]). A lower ICC was observed for FS (0.15, CI95 [0.1, 0.2]), sucrose preference (0.16, CI95 [0.09, 0.23]), TBI induction parameters (0.15, CI95 [0.06, 0.25]), and neuroscore (0.13, CI95 [0.09, 0.18]). For the remaining modalities, the ICC was near zero.
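As an illustration of how an ICC quantifies subcohort similarity, the sketch below uses the one-way random-effects ANOVA estimator with subcohort as the grouping factor. This estimator is an assumption made for illustration; the exact ICC variant and the confidence-interval method used in the chapter are not stated in this section, and the toy weights are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for possibly unbalanced groups.

    groups: list of lists, one list of measurements per subcohort.
    A value near 0 means subcohort membership explains little variance;
    a value near 1 means measurements cluster strongly by subcohort.
    """
    g = len(groups)
    n_total = sum(len(x) for x in groups)
    grand_mean = sum(sum(x) for x in groups) / n_total
    means = [sum(x) / len(x) for x in groups]
    ss_between = sum(len(x) * (m - grand_mean) ** 2
                     for x, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for x, m in zip(groups, means) for v in x)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)
    # Effective group size; equals the common size when groups are balanced.
    k0 = (n_total - sum(len(x) ** 2 for x in groups) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)

# Hypothetical body weights (g) in three subcohorts, not chapter data.
toy_weights = [[352, 360, 348, 355], [371, 380, 366, 375], [340, 351, 345]]
toy_icc = icc_oneway(toy_weights)
```

With this estimator, negative values can occur when the within-group variance exceeds the between-group variance; they are usually interpreted as no detectable group effect.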
Similarly, animals could be assigned to the correct cohort based on their weight with a mean pooled AUC of 0.89. However, reliable cohort-wise classification using MRI or other modalities was not possible.
Discussion
We have previously investigated behavioral and cortical MRI parameters using more traditional biomarker statistics (Manninen et al., 2020; Lapinlampi et al., 2021). Here, we applied an ML approach to discover single or combinatory biomarkers for PTE that may have gone undetected in previous analyses.
ML classifiers detect patterns specific to PTE during training to derive an optimal rule for classification. This allows hypothesis-free multivariate and nonlinear modeling of the relationship between measurements and PTE. When the data contain negligible amounts of information related to PTE, the extracted rules reflect measurement noise, leading to low classifier performance (close to chance level) during repeated CV. For this reason, ML provides a built-in check on the validity of biomarker discovery. The inability to combine a set of variables from an assay to detect PTE may indicate the following: (a) low relevance of the latent biological phenomenon measured by the assay to the PTE classification, (b) inability of the assay to measure the latent phenomenon with a sufficiently high signal-to-noise ratio, (c) inability of the ML methods used to determine the relationship between the variables and PTE, or (d) a sample that insufficiently represents the distribution to be estimated. Point (b) may result from the inherent nature of the assay or from inter-cohort differences due to, for example, experimenter-level differences, or from center-level differences in multicenter studies. In the present dataset, the differences between the cohorts were negligible, with the exception of differences in body weight and cortical MRI measures.
A single assay that is uninformative on its own can still provide complementary information when combined with measurements from another assay. For example, a moderate classification performance of AUC 0.75 was reached by combining modalities such as weight, EPM, sucrose intake, and neuroscore, which individually were not able to separate the TBI3– and TBI3+ classes. Nevertheless, classification accuracy showed relatively high variance, and a subset of animals was assigned to the correct class in fewer than 60% of the CV repeats. The inclusion of additional modalities such as EEG parameters or plasma markers could provide insight into the underlying pathology and further improve the classification performance.
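The per-animal stability check described above (the share of CV repeats in which an animal is classified correctly) could be computed along the following lines. This is a sketch under the assumption that each repeat yields one held-out prediction per animal; the function names, the 0.6 cutoff argument, and the toy predictions are hypothetical.

```python
def fraction_correct_per_animal(true_labels, repeat_predictions):
    """For each animal, the fraction of CV repeats with a correct prediction.

    repeat_predictions: one list of predicted labels per CV repeat,
    ordered like true_labels.
    """
    n_repeats = len(repeat_predictions)
    return [
        sum(1 for preds in repeat_predictions if preds[i] == y) / n_repeats
        for i, y in enumerate(true_labels)
    ]

def unstable_animals(true_labels, repeat_predictions, min_fraction=0.6):
    """Indices of animals classified correctly in fewer than min_fraction of repeats."""
    fractions = fraction_correct_per_animal(true_labels, repeat_predictions)
    return [i for i, f in enumerate(fractions) if f < min_fraction]

# Hypothetical toy data: 4 animals, 5 CV repeats (not chapter data).
toy_true = [1, 0, 1, 0]
toy_repeats = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
toy_fractions = fraction_correct_per_animal(toy_true, toy_repeats)
toy_unstable = unstable_animals(toy_true, toy_repeats)
```

Here only the third animal (index 2) falls below the cutoff, because it is classified correctly in two of the five repeats.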
There was limited overlap between the features highlighted by conventional statistics and RF feature importance. Statistical significance does not ensure separation; that is, statistical significance indicates that the class means of a feature differ, which is conceptually different from class separation. Because class separation is required for a pattern to qualify as a biomarker, and individual measurements may need to be contextualized with complementary information from other measurements, screening candidate biomarker features by their contribution to a multivariate predictive model for PTE is justified. The low classification performance of the linear combination of measurement variables used by LR implies that nonlinear effects need to be accounted for in PTE classification with the modalities used.
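The distinction between statistical significance and class separation can be made concrete with a small worked example. It assumes two classes with unit-variance normal distributions whose means differ by d standard deviations, for which the AUC equals Φ(d/√2), and it uses a z-test approximation for the difference in means. These modeling assumptions and the numbers are illustrative only and do not come from the chapter.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def auc_from_effect_size(d):
    """AUC for two unit-variance normal classes whose means differ by d."""
    return normal_cdf(d / math.sqrt(2.0))

def two_sided_p(d, n_per_class):
    """Two-sided z-test p-value for a mean difference of d SD units.

    Assumes known unit variance and equal class sizes.
    """
    z = d * math.sqrt(n_per_class / 2.0)
    return 2.0 * (1.0 - normal_cdf(abs(z)))

# Hypothetical scenario: small effect, large sample.
effect = 0.3
p_small_effect = two_sided_p(effect, n_per_class=200)   # about 0.003
auc_small_effect = auc_from_effect_size(effect)         # about 0.58
```

Under these assumptions, a feature can differ between classes with p well below 0.01 while separating individual animals only slightly better than chance.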
In these experiments, PTE was defined by the presence of one or more unprovoked seizures. This classification scheme does not take into account, for example, the number and duration of seizures, the behavioral characteristics of the seizures, or the presence of epileptiform activity in addition to seizures. Thus, animals with a single observed seizure may differ in terms of the severity of the underlying pathology. When the severity of PTE in the TBI+ group was increased through a stricter inclusion criterion of three or more electrographic seizures, an AUC of 0.75 was reached. Conversely, regarding animals with fewer than three seizures as “nonepileptic” did not negatively affect the classification. This suggests that a combination of weight, sucrose intake, neuroscore, and EPM measurements differentiates only severe PTE (i.e., animals with three seizures per month). However, as ML models are based solely on patterns present in the data, the observed performance may be affected by possible confounding factors. As is the case with ML generally, validation on independent datasets is necessary before the discovered patterns can be considered possible biomarkers.
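The effect of the seizure-count threshold on the class labels can be sketched as follows. The function name and the seizure counts are hypothetical; the sketch only shows how the same animals are relabeled as TBI+ or TBI– when the threshold moves from one to three seizures.

```python
def label_by_threshold(seizure_counts, threshold):
    """Label an animal TBI+ (1) if it had at least `threshold` seizures, else TBI- (0)."""
    return [1 if count >= threshold else 0 for count in seizure_counts]

# Hypothetical seizure counts for six TBI animals (not chapter data).
toy_counts = [0, 1, 2, 3, 7, 0]
toy_labels = {t: label_by_threshold(toy_counts, t) for t in (1, 2, 3)}
toy_n_positive = {t: sum(labels) for t, labels in toy_labels.items()}
```

Raising the threshold shrinks the positive class, so animals with one or two seizures move into the TBI– group. That changes the class balance as well as the severity of the cases the classifier is asked to detect.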
Acknowledgments
This study was supported by the Medical Research Council of the Academy of Finland (grants 272249 and 273909, AP), by the Research Council for Natural Sciences and Engineering of the Academy of Finland (grant 316258, JT), and by the European Union’s Seventh Framework Program (FP7/2007-2013) under grant agreement n°602102 (EPITARGET) (AP). The computational analyses were performed on servers provided by UEF Bioinformatics Center, University of Eastern Finland, Finland.
Disclosure Statement
The authors declare no relevant conflicts.
References
Airola, A., Pahikkala, T., Waegeman, W., De Baets, B., & Salakoski, T. A comparison of AUC estimators in small-sample studies. In S. Džeroski, P. Geurts, & J. Rousu (Eds.), Proceedings of the third International Workshop on Machine Learning in Systems Biology. 2009; 8: 3–13. PMLR. http://proceedings.mlr.press/v8/airola10a.html
Altmann, A., Toloşi, L., Sander, O., & Lengauer, T.
Benjamini, Y., & Hochberg, Y.
Bouckaert, R. R.
Bouckaert, R. R.
Bouckaert, R. R., & Frank, E. (
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J.
Breiman, L.
Cover, T., Hart, P.
Dulla, C.G., Pitkänen, A.
Engel, J.J., Pitkanen, A., Loeb, J.A., Dudek, F.E., Bertram, E.H. 3rd, Cole, A.J., Moshe, S.L., Wiebe, S., Jensen, F.E., Mody, I., Nehlig, A., Vezzani, A.
Hastie, T.; Tibshirani, R. & Friedman, J.
Hauser, W.A., Annegers, J.F., Kurland, L.T.
Holm, S.
Lapinlampi, N., Andrade, P., Paananen, T., Hämäläinen, E., Ekolle Ndode-Ekane, X., Puhakka, N., & Pitkänen, A.
Manninen, E., Chary, K., Lapinlampi, N., Andrade, P., Paananen, T., Sierra, A., Tohka, J., Gröhn, O., & Pitkänen, A. Early increase in cortical T2 relaxation is a prognostic biomarker for the evolution of severe cortical damage, but not for epileptogenesis, after experimental traumatic brain injury.
McInnes, L., Healy, J., & Melville, J. (
Pitkänen, A., Ekolle Ndode-Ekane, X., Lapinlampi, N., & Puhakka, N.
Scheffer, I.E., Berkovic, S., Capovilla, G., Connolly, M.B., French, J., Guilhoto, L., Hirsch, E., Jain, S., Mathern, G.W., Moshé, S.L., Nordli, D.R., Perucca, E., Tomson, T., Wiebe, S., Zhang, Y.-H., Zuberi, S.M.
Tibshirani, R.
WHO. Epilepsy, 2021. Accessed November 1, 2021. https://www.who.int/news-room/fact-sheets/detail/epilepsy