Discovery and Validation of Biomarkers to Guide Clinical Management of Pneumonia in African Children

Lipocalin 2 distinguishes severe and bacterial pneumonia from nonsevere and nonbacterial pneumonia with a high level of precision. The clinical impact of this biomarker requires large-scale clinical evaluation.


(lower chest wall indrawing or nasal flaring). Probable bacterial pneumonia was defined by the presence of a positive blood culture or by the presence of clear consolidation on the chest X-ray ("end-point" consolidation according to the WHO definition) and/or pleural effusion [4]. Probable non-bacterial pneumonia was defined by a normal chest X-ray and a low white blood cell count (<10,000 × 10⁹/L) in children without "end-point" consolidation on the chest X-ray.
All cases had a chest radiograph taken at recruitment. The radiographs were assessed according to WHO criteria by a trained clinician [5]. Blood cultures were obtained from all patients with non-severe and severe disease. Blood cultures were performed using the BACTEC 9050 system (BD, New Jersey, USA) by direct inoculation of culture media using standard microbiological procedures [6]. A blood culture was considered positive if a significant bacterial isolate was grown and identified from a colony sub-cultured on solid medium. Micrococcus spp. (N=3), coagulase-negative staphylococci (N=6) and Bacillus spp. (N=3) were considered culture contaminants. The blood volume inoculated for culture was not measured in this study. However, data from other studies using the same extraction protocol indicate that the volumes are within the recommended range, namely 1.9 mL (IQR 1.4-2.3) at the MRC Gambia Unit and 1.2 mL (0.6-1.9) in the Kenyan study [7]. The prevalence of occult bacteremia in a previous study of Kenyan children presenting with illnesses treated as outpatients was 0.5% [8].
To evaluate the association of Lpc-2 with blood culture positivity we defined two distinct etiological groups of patients: probable bacterial and probable non-bacterial pneumonia (most likely viral), based on radiological findings and white blood cell count.

Proteomic studies
The initial biomarker discovery phase was conducted on clinical samples from the Gambian cohort. The rationale for including 100 individuals per group was based on 1) the sample volume required for HPLC depletion, gel separation and mass spectrometry analysis, and 2) the need to ensure that the confidence interval around the estimated sensitivity/specificity is sufficiently narrow to derive meaningful conclusions. For example, for a candidate biomarker with a sensitivity and specificity of 75% or higher, the 95% confidence interval extends at most about 8 percentage points below and above the point estimate, namely from 67% to 83% at a point estimate of 75%. Samples were divided into three different groups of 100 children according to disease severity (see Supplementary Figure 1). Individual samples (5 µl of plasma) were pooled into 3 different groups (~165 µl of plasma per batch) in each disease category (control, non-severe and severe pneumonia). Pooled plasma samples were depleted of the top 14 highly abundant plasma proteins with a multiple affinity removal system (MARS) column (Agilent, UK) using a high-performance liquid chromatography (HPLC) 1200 series instrument (Agilent, UK). Proteins from depleted plasma were precipitated with trichloroacetic acid, quantified using a colorimetric assay (BCA Protein assay, Thermo Scientific, US) and further separated by size using SDS-PAGE. Protein bands (13 bands per sample) were cut and digested with trypsin.
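The sample-size reasoning above can be checked with a short calculation. The following is a minimal sketch that assumes a normal-approximation (Wald) interval; the source does not state which interval method was used, and the function name `wald_ci` is illustrative only.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the source does not name the interval method; Wald is
    used here only because it reproduces the quoted 67%-83% range.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Sensitivity (or specificity) of 75% estimated from 100 children per group
low, high = wald_ci(0.75, 100)
print(f"{low:.3f} to {high:.3f}")  # prints "0.665 to 0.835"
```

For point estimates above 75% the term p(1 - p) shrinks, so the interval narrows, which is why the figure quoted for 75% acts as the upper bound on the interval width.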
Peptide digests were purified using Sep-Pak C18 columns (Waters, Milford, MA). Samples were analysed by nano LC-MS/MS (Agilent 1200 HPLC with a Chip cube coupled to a 6520 Q-TOF, Agilent, UK) and searched against the human proteome with a false-discovery rate of 1% calculated from target-decoy hits. Relative (label-free) quantification was based on normalized spectral index quantitation (SINQ) [9,10]. Clinically and biologically relevant proteins that were up-regulated or down-regulated following a disease progression pattern (SP>NSP>C or vice versa) in two batches or more were selected as candidate biomarkers (see Supplementary Figure 1 and Supplementary Table 1).

μm material, 300 Å pore size) integrated in the HPLC Chip (G4240-62001). For each mass spectrometry experiment, peptides were loaded onto the enrichment column with 100% solvent A (2% acetonitrile with 0.1% formic acid). A two-step gradient generated at a flow rate of 0.6 μl/min was used for peptide elution. This included a linear gradient from 5% to 40% buffer B (95% acetonitrile with 0.1% formic acid) over 45 min followed by a sharp increase to 100% B within 10 min. The total run time, including column reconditioning, was 60 min. Full scans were acquired over a range of m/z 400-1700 at a rate of 5 spectra per second, and MS/MS scans over m/z 50-1700 at a rate of two spectra per second, selecting the six most abundant doubly or triply charged precursor ions per cycle. Auto-MS/MS was performed with a total cycle time of 3.3 seconds. Selected precursor masses were excluded for 0.9 minutes in order to avoid repeated sequencing. Spectra were deconvoluted and analysed using Mass Hunter
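The candidate-selection rule described in this section (a disease progression pattern, SP>NSP>C or vice versa, seen in two batches or more) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the protein labels and the abundance values are hypothetical, and the sketch assumes that the pattern must run in the same direction in the qualifying batches, which the source does not specify.

```python
def progression_direction(sp, nsp, ctrl):
    """Return 'up' if SP > NSP > C, 'down' if SP < NSP < C, else None."""
    if sp > nsp > ctrl:
        return "up"
    if sp < nsp < ctrl:
        return "down"
    return None

def select_candidates(abundance, min_batches=2):
    """Keep proteins whose pattern holds in at least `min_batches` pools.

    abundance maps a protein label to one (SP, NSP, C) tuple of
    normalized abundances per pooled batch.
    Assumption: the qualifying batches must agree on direction.
    """
    selected = {}
    for protein, batches in abundance.items():
        directions = [progression_direction(*b) for b in batches]
        for direction in ("up", "down"):
            if directions.count(direction) >= min_batches:
                selected[protein] = direction
    return selected

# Hypothetical values for three pooled batches, for illustration only
example = {
    "protein_A": [(9.0, 5.0, 1.0), (8.0, 4.0, 2.0), (3.0, 6.0, 1.0)],
    "protein_B": [(2.0, 5.0, 1.0), (1.0, 1.0, 1.0), (7.0, 3.0, 2.0)],
    "protein_C": [(1.0, 2.0, 3.0), (0.5, 1.5, 4.0), (1.0, 3.0, 5.0)],
}
result = select_candidates(example)
print(result)
```

In this made-up example, protein_A and protein_C pass the rule (up in two pools and down in three pools, respectively), while protein_B shows the pattern in only one pool and is dropped. In the study, final selection also took clinical and biological relevance into account, which this sketch does not model.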

Figures
Supplementary Figure 1| Proteomic study workflow. Plasma samples from Gambian children were randomly selected and used for shotgun proteomic studies. Samples were separated into 3 different groups of 100 children according to disease severity. Individual samples (5 µl of plasma) were pooled into 3 different groups (~165 µl of plasma per batch) in each disease category and depleted of the top 14 most abundant plasma proteins (albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin, fibrinogen, alpha2-macroglobulin, alpha1-acid glycoprotein, IgM, apolipoprotein AI, apolipoprotein AII, complement C3, and transthyretin). After protein precipitation and enzymatic digestion with trypsin, samples were analysed by LC-MS/MS (Agilent Q-TOF 6520) and searched against the human proteome with a false-discovery rate of 1% calculated from target-decoy hits. Relative (label-free) quantification was based on normalized spectral index quantitation (SINQ). Proteins that were up- or down-regulated following a disease progression pattern (SP>NSP>C or vice versa) in two batches or more were selected as candidate biomarkers. Final selection for enzyme immunoassay (EIA) validation was based on these criteria and on clinical and biological relevance.
Supplementary Figure 2| Relative abundance of plasma proteins identified and quantified by liquid chromatography tandem mass spectrometry in children with severe pneumonia (SP), non-severe pneumonia (NSP) and controls (Ctrl). Protein abundance was quantified using normalized spectral index quantitation (SINQ). Error bars indicate the standard error of the mean of three independent pools of samples. Red and green bars denote up- and down-regulation, respectively. Arrows indicate candidate markers of disease progression.

Tables
Supplementary Table 1| Clinical study sites and patients enrolled in the discovery and validation studies.