Assessment of diagnostic accuracy of biomarkers to assess lung consolidation in calves with induced bacterial pneumonia using receiver operating characteristic curves

Abstract
Bovine respiratory disease (BRD) is the most economically significant disease for cattle producers in the U.S. Cattle with advanced lung lesions at harvest have reduced average daily gain, yield grades, and carcass quality outcomes. The identification of biomarkers and clinical signs that accurately predict lung lesions could benefit livestock producers in determining a BRD prognosis. Receiver operating characteristic (ROC) curves are graphical plots that illustrate the diagnostic ability of a biomarker or clinical sign. Previously we used the area under the ROC curve (AUC) to identify cortisol, hair cortisol, and infrared thermography imaging as having acceptable (AUC > 0.7) diagnostic accuracy for detecting pain in cattle. Herein, we used ROC curves to assess the sensitivity and specificity of biomarkers and clinical signs associated with lung lesions after experimentally induced BRD. We hypothesized pain biomarkers and clinical signs assessed at specific time points after induction of BRD could be used to predict lung consolidation at necropsy. Lung consolidation of > 10% was retrospectively assigned at necropsy as a true positive indicator of BRD. Calves with a score of < 10% were considered negative for BRD. The biomarkers and clinical signs analyzed were serum cortisol; infrared thermography (IRT); mechanical nociceptive threshold (MNT); substance P; kinematic gait analysis; a visual analog scale (VAS); clinical illness score (CIS); computerized lung score (CLS); average activity levels; prostaglandin E2 metabolite (PGEM); serum amyloid A; and rectal temperature. A total of 5,122 biomarkers and clinical signs were collected from 26 calves, of which 18 were inoculated with M. haemolytica. All statistics were performed using JMP Pro 14.0.
Results comparing calves with significant lung lesions to those without yielded the best diagnostic accuracy (AUC > 0.75) for right front stride length at 0 h; gait velocity at 32 h; VAS, CIS, average activity and rumination levels, step count, and rectal temperature, all at 48 h; PGEM at 72 h; gait distance at 120 h; cortisol at 168 h; and IRT, right front force and serum amyloid A, all at 192 h. These results show ROC analysis can be a useful indicator of the predictive value of pain biomarkers and clinical signs in cattle with induced bacterial pneumonia. AUC values for VAS score, average activity levels, step count, and rectal temperature seemed to yield good diagnostic accuracy (AUC > 0.75) at multiple time points, while MNT values, substance P concentrations, and CLS did not (all AUC values < 0.75).


Introduction
Bovine respiratory disease (BRD) is the most economically significant disease for cattle producers in the United States, affecting 16.2% of cattle on feed. Fibrinous pleuropneumonia produced by Mannheimia haemolytica is the most common form of acute pneumonia in weaned, stressed calves (Andrews and Kennedy, 1997). Pleuritis is an indication of the aggressiveness of the lung infection and the extent of infection and inflammation; lipopolysaccharide and leukotoxin are the two factors responsible for the destructive lesions of M. haemolytica infection (Panciera and Confer, 2010). Lung lesion incidence has been recorded at 62% to 67% (Schneider et al., 2009; Blakebrough-Hall et al., 2020; Tennant et al., 2014). A biomarker is a molecular, histologic, or physiological characteristic "that is an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions" (FDA, 2017). The relationship between clinical illness score, computerized lung score, rectal temperature, facial thermography, behavior, and lung lesion scores has been examined in past studies to detect the presence of BRD (White et al., 2008; DeDonder et al., 2010; Baruch et al., 2019). However, which biomarkers and clinical signs yield good diagnostic accuracy when predicting lung lesions at varying time points throughout the disease state has not been examined.
Receiver operating characteristic (ROC) curves are graphical plots that illustrate the diagnostic ability of a test as its discrimination threshold is varied. Plotting the true positive rate (sensitivity) versus the false positive rate (1 − specificity) across possible cutoff values generates a ROC curve (Hajian-Tilaki, 2013). The area under the ROC curve (AUC) can be used to measure discriminative ability. The objective of this analysis was to use AUC values derived from ROC analysis to assess the predictive value of BRD biomarkers and clinical signs to evaluate the presence of lung lesions after experimentally induced bacterial pneumonia.
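To illustrate how a ROC curve and its AUC arise from a single biomarker, the following minimal Python sketch sweeps every observed value as a cutoff and integrates the resulting curve with the trapezoidal rule. It is not the procedure used in this study (all statistics here were performed in JMP Pro); the function names, the assumption that higher values indicate disease, and the rectal temperature values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, sensitivity) pairs, one per candidate cutoff.

    Assumption: a calf is test-positive when its score is >= the cutoff.
    labels: 1 = lesion-positive, 0 = lesion-negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: no calf is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under ROC points ordered by increasing false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical rectal temperatures (deg C) and lesion status for six calves.
    temps = [39.0, 39.4, 39.8, 40.1, 40.5, 40.9]
    lesion = [0, 0, 1, 0, 1, 1]
    print(round(auc(roc_points(temps, lesion)), 3))
```

An AUC of 0.5 corresponds to a biomarker that discriminates no better than chance, and an AUC of 1.0 to perfect separation of the two groups.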

Materials and Methods
This study was reviewed and approved by the Institutional Animal Care and Use Committee at Kansas State University (IACUC# 4465).

Study design
Twenty-six calves, 6 to 7 months of age and weighing an average of 185 ± 4 kg, were enrolled in the study. On arrival at the study site, calves were affixed with a three-axis accelerometer ear-tag (Allflex Livestock Intelligence, Madison, WI) to quantify activity. At 24 h prior to the start of the study, 18 calves were inoculated with a strain of Mannheimia haemolytica using bronchoalveolar lavage as described by Theurer et al. (2013). The right apical lung lobes were inoculated using broncho-selective endoscopy. A 10-mL dose of M. haemolytica serotype A1 at 1 × 10^9 cfu/mL was used to inoculate each calf, then the endoscope was flushed with 60 mL of phosphate-buffered solution to achieve a total volume of 70 mL. The eight control calves received 70 mL of phosphate-buffered solution as described above and were all inoculated prior to the calves inoculated with M. haemolytica. At any point while on study, if the rectal temperature of a calf was > 40.3 °C, tildipirosin was administered at a dosage of 4 mg/kg. Eight calves that were inoculated with M. haemolytica were administered flunixin transdermal at a dosage of 3.33 mg/kg at 0 and 24 h relative to disease onset. Across all treatments, calves were restrained, and samples were collected the same number of times.
Disease onset was defined as 24 h after inoculation and was designated time point 0 h. Biomarker and clinical sign variables were collected at −48 h (prior to disease onset), 0 h (disease onset), and 4, 8, 24, 32, 48, 72, 96, 120, 144, 168, and 192 h relative to disease onset; in addition, the three-axis accelerometer ear-tags collected data continuously throughout the study. A total of 5,122 biomarker and clinical sign measurements were collected, which included: infrared thermography (IRT) imaging, kinematic gait analysis, mechanical nociception threshold (MNT), visual analog scale (VAS) score, clinical illness score (CIS), computerized stethoscope (Whisper Veterinary Stethoscope, Merck Animal Health, Madison, NJ) lung score (CLS), average activity levels, rectal temperature, and blood sampling for cortisol, substance P, prostaglandin E2 metabolite (PGEM), and serum amyloid A (SAA) analysis. All trained evaluators were blinded to treatment for the duration of the study. Following the 192 h collection, calves were euthanized and transported to the Kansas State Veterinary Diagnostic Laboratory for necropsy and lung lesion scoring. The postmortem examination was performed by a board-certified veterinary pathologist (K.M.A.) to determine lung lesions, and a lung lesion score was assigned based on lung consolidation. The lung lesion score was determined using methods described by Fajt et al. (2003). Briefly, the extent of consolidated lung divided by the total lung volume was determined. A lung score > 10% was used as a positive indicator of BRD for the sake of this analysis. Regardless of therapeutic intervention, calves were grouped based on a lung score > 10% or < 10%. The biomarkers and sample collection time points are outlined in Table 1.

Physiological and behavioral parameters

Serum cortisol
A total of 312 cortisol samples made up the data set. The samples were obtained as described by Kleinhenz et al. (2017). Blood was obtained by jugular venipuncture using a syringe (Kleinhenz et al., 2018; 2019b) and was immediately transferred to a blood tube and centrifuged at 3,000 × g for 10 min. The serum was pipetted into cryovials, placed on dry ice, and stored at −80 °C until analysis. Cortisol concentrations were determined using a commercially available radioimmunoassay (MP Biomedicals, Santa Ana, CA).

Infrared thermography
A total of 312 IRT images made up the data set. Mean and percent change from baseline values from images of the medial canthus of the left eye were included in the analysis. Infrared thermography images were obtained using a research-grade infrared camera (Fluke TiX580, Fluke Corp, Everett, WA). The IRT camera was calibrated prior to being used with the ambient temperature and relative humidity. As described in Kleinhenz et al. (2017) and Kleinhenz et al. (2018), an image of the lateral aspect of the head was obtained so that the image contained the medial canthus of the eye. Infrared images were analyzed using research-grade computer software (SmartView v. 4.3, Fluke Thermography, Plymouth, MN).

Mechanical nociception threshold
A total of 208 MNT measures made up the data set. Using a handheld pressure algometer (Wagner Instruments, Greenwich, CT), force was applied perpendicularly at a rate of approximately 1 kg of force per second at one location on each side of the ribs of each calf over the sixth intercostal space, for a total of two locations, as described in Williams et al. (2020). A withdrawal response was indicated by an overt movement away from the applied pressure algometer, and values were recorded by a second investigator to prevent bias. Locations were tested three times in sequential order, and the values were averaged for statistical analysis.

Substance P
A total of 312 substance P samples made up the data set. As described in Kleinhenz et al. (2017), benzamidine hydrochloride (final concentration of 1 mM) was added to ethylenediaminetetraacetic acid (EDTA) blood tubes (BD Vacutainer, Franklin Lakes, NJ) 48 h prior to the start of the studies. During sample collection, 3 mL of blood was added to the spiked EDTA tube. The samples were immediately placed on ice, centrifuged within 30 min of collection, and the plasma was placed into cryovials. The cryovials were stored at −80 °C until analysis. Substance P levels were determined using the methods described by Van Engen et al. (2014) using nonextracted plasma.

Kinematic gait analysis
A total of 2,184 gait readings made up the data set. As described in Kleinhenz et al. (2019c), a commercially available kinematic gait system (Walkway, Tekscan, Inc., South Boston, MA) was used to record gait and biomechanical parameters. The gait system was calibrated, using a known mass, daily and before each use of the computer software to ensure accuracy of the measurements at each time point. Video synchronization was used to ensure consistent gait between and within calf at each time point. Using research specific software (Walkway 7.7, Tekscan, Inc.), force, contact pressure, and impulse were assessed.

Visual analog scale
A total of 312 VAS scores made up the data set. As described in Martin et al. (2020a), a daily VAS pain assessment was conducted by two trained evaluators blinded to treatment allocations. The VAS used was a 100-mm (10 cm) line anchored by descriptors of "No Pain" on the left (0 cm) and "Severe Pain" on the right (10 cm). Five parameters were used to assess pain: depression, tail swishing or flicking, stance, head carriage, and foot stomping or kicking. No pain was characterized by being alert and quick to show interest, no tail swishing, a normal stance, head held above spine level, and absence of foot stomping. Severe pain was characterized by being dull and showing no interest, more than three tail swishes per minute, legs abducted, head held below spine level, and numerous stomps. The evaluator marked the line between the two descriptors to indicate the pain intensity. A millimeter scale was used to measure the score from the zero anchor point to the evaluator's mark. The mean VAS measures of the two evaluators were combined into one score for statistical analysis.

Clinical illness score
A total of 312 CIS measures made up the data set. A CIS was assigned by two trained evaluators blinded to treatment allocations. The CIS scale was as follows: (1) normal, healthy animal; (2) slightly ill with mild depression or gauntness; (3) moderately ill, demonstrating severe depression/labored breathing/nasal or ocular discharge; and (4) severely ill and near death, showing minimal response to human approach. If either evaluator scored a calf > 1, that score was used for statistical analysis, with 1 being considered normal and greater than 1 considered abnormal.

Computerized lung score
A total of 312 computerized lung scores made up the data set. A computerized stethoscope (Whisper, Merck Animal Health, De Soto, KS) was used to analyze lung and heart sounds via a machine-learning algorithm that assigns a lung score from 1 to 5, with 1 being normal and 5 being severely compromised lung tissue (Nickell et al., 2020). The bell of the lung stethoscope was placed approximately two inches caudal and dorsal to the right elbow of each calf, and lung sounds were recorded for 8 s. If the recording was deemed acceptable by the computer program, the score was recorded.

Prostaglandin E2 metabolite
A total of 78 samples were analyzed for PGEM. Prostaglandin E2 metabolites were analyzed using a commercially available ELISA kit (cat. no. 514531, Cayman Chemical, Ann Arbor, MI) following manufacturer specifications with minor modifications. Sample input was adjusted to 375 µL with 1.5 mL of ice-cold acetone added for sample purification. Samples were incubated at −20 °C for 30 min, then centrifuged at 3,000 × g for 5 min. Supernatant was transferred to clean 13 × 100 mm glass tubes and evaporated using a CentriVap Concentrator (cat. no. 7810014, Labconco, Kansas City, MO) overnight (approx. 18 h). Samples were reconstituted with 375 µL of appropriate kit buffer. A 300-µL aliquot of the reconstituted sample was derivatized with proportionally adjusted kit components. Manufacturer protocol was then followed. Samples were diluted 1:2 and run in duplicate. Absorbance was measured at 405 nm after 60 min of development (SpectraMax i3, Molecular Devices, San Jose, CA). Sample results were excluded if the raw read exceeded the raw read of the highest standard (Standard 1; 50 pg/mL) or was below the lowest acceptable standard. The lowest acceptable standard was defined for each individual plate and was identified by excluding standards that had a ratio of absorbance of that standard to the maximum binding of any well (%B/B0) of ≥ 80% or ≤ 20%. Any individual sample outside the standard curve, with a %B/B0 outside the 20% to 80% range, or with a coefficient of variation (CV) > 15% was re-analyzed.
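The re-analysis criteria above can be expressed as a short check on duplicate wells. The sketch below is illustrative only and rests on several assumptions that are not stated in the methods: %B/B0 is computed as the plain ratio of the mean well absorbance to the maximum-binding absorbance (no blank or nonspecific-binding correction), the CV uses the sample standard deviation of the duplicate reads, and all function names and absorbance values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def percent_bound(absorbance: float, max_binding: float) -> float:
    """%B/B0: absorbance as a percentage of the maximum-binding (B0) absorbance."""
    return 100.0 * absorbance / max_binding


def replicate_cv(reads: Sequence[float]) -> float:
    """Coefficient of variation (%) of replicate reads, using the sample SD."""
    return 100.0 * stdev(reads) / mean(reads)


def needs_reanalysis(
    reads: Sequence[float],
    max_binding: float,
    low: float = 20.0,    # lower %B/B0 bound from the text
    high: float = 80.0,   # upper %B/B0 bound from the text
    max_cv: float = 15.0  # CV limit (%) from the text
) -> bool:
    """Flag a sample whose mean %B/B0 is outside [low, high] or whose CV exceeds max_cv."""
    pb = percent_bound(mean(reads), max_binding)
    return pb < low or pb > high or replicate_cv(reads) > max_cv


if __name__ == "__main__":
    b0 = 1.00  # hypothetical maximum-binding absorbance
    print(needs_reanalysis([0.50, 0.52], b0))  # within range, tight duplicates
    print(needs_reanalysis([0.90, 0.92], b0))  # %B/B0 above 80%
    print(needs_reanalysis([0.40, 0.60], b0))  # duplicates disagree
```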

Serum amyloid A
A total of 312 serum amyloid A samples made up the data set. Serum amyloid A concentrations were determined in serum samples using an ELISA (Phase Range Multispecies SAA ELISA kit; Tridelta Development Ltd., Cat no. TP802). Manufacturer specifications were followed, and samples were diluted as necessary. Absorbance was measured at 450 nm on a SpectraMax i3 plate reader (Molecular Devices). Raw data were analyzed using MyAssays Desktop software (version 7.0.211.1238) for concentration determination. Standard curves were plotted as a four-parameter logistic curve. Samples with a CV > 15% were re-analyzed.
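For readers unfamiliar with four-parameter logistic (4PL) standard curves, the sketch below shows one common parameterization and its inverse, which converts an absorbance reading back to a concentration. The parameterization, function names, and parameter values are assumptions made for illustration; the fitting itself was done in MyAssays Desktop, and the fitted parameters are not reported here.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4PL response at concentration x.

    a: response at zero concentration
    b: slope factor
    c: concentration at the inflection point
    d: response at infinite concentration
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> float:
    """Concentration that produces response y on the fitted 4PL curve."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response lies outside the range of the standard curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


if __name__ == "__main__":
    # Hypothetical fitted parameters for one plate.
    params = dict(a=0.05, b=1.2, c=50.0, d=2.5)
    absorbance = four_pl(20.0, **params)
    # Back-calculate the concentration; multiply by the dilution factor if the sample was diluted.
    print(inverse_four_pl(absorbance, **params))
```

Readings outside the asymptotes of the fitted curve cannot be converted to a concentration, which is why the inverse function rejects them rather than extrapolating.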

Rectal temperature
A total of 260 rectal temperature samples made up the data set. Rectal temperatures were taken by placing a digital thermometer (180 Innovations, Lakewood, CO) against the wall of the rectum until a temperature reading was produced on the screen of the thermometer.

Receiver operating characteristic curve determination
All statistics were performed using statistical software (JMP Pro 14.0, SAS Institute, Inc., Cary, NC). Receiver operating characteristic curves were created for each time point, with AUC values comparing > 10% lung lesions to < 10%, with > 10% as the positive control. The biomarker was plotted as the x-coordinate and the status (>10% or < 10%) was plotted as the y-coordinate. Bootstrapping via fractional weights was used to generate confidence intervals for each AUC value. AUC values ≥ 0.7 were considered to yield good diagnostic accuracy (Yang and Berdine, 2017). Specific cutoff values were selected based upon optimized specificity and sensitivity values. Positive and negative predictive values with confidence intervals were calculated for each AUC value (MedCalc Software Ltd., Ostend, Belgium).
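The cutoff selection and predictive value calculations can be sketched as follows. The methods do not state which criterion was used to optimize sensitivity and specificity, so the sketch assumes Youden's J (sensitivity + specificity − 1), which is one common choice; it also assumes that higher biomarker values indicate lung lesions, which would be reversed for measures such as activity. The function names and data are hypothetical, and confidence intervals (obtained in this study by bootstrapping and with MedCalc) are omitted.

```python
from typing import Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) when score >= cutoff is called test-positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Observed value that maximizes Youden's J (assumed criterion)."""
    def youden(cutoff: float) -> float:
        tp, fp, tn, fn = confusion(scores, labels, cutoff)
        return tp / (tp + fn) + tn / (tn + fp) - 1.0
    return max(sorted(set(scores)), key=youden)


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Positive and negative predictive values from 2 x 2 counts."""
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical rectal temperatures (deg C); 1 = > 10% lung consolidation.
    temps = [39.0, 39.2, 39.4, 39.6, 39.8, 40.1, 40.3, 40.5, 40.9]
    lesion = [0, 0, 0, 0, 1, 0, 1, 1, 1]
    cut = best_cutoff(temps, lesion)
    print(cut, predictive_values(*confusion(temps, lesion, cut)))
```

Predictive values depend on the proportion of lesion-positive calves in the sample, so values computed in a challenge study will not necessarily transfer to populations with a different prevalence.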

Results
Sixteen of the 18 calves that were inoculated with M. haemolytica had lung lesion scores > 10%, with all 18 calves having extensive lung consolidation in their right apical lung lobe. Three of the eight control calves also had lung lesion scores > 10%, with four of the eight having extensive lung consolidation in their right apical lung lobe. Five calves received tildipirosin as an intervention and all had lung lesion scores > 10%. Seven of the eight calves that received flunixin had lung lesion scores > 10%. The biomarker parameter AUC values are outlined in Table 2, with rankings in Table 3 and cutoff values in Table 4.

Discussion
The BRD induction model used in the present study was expected to produce pleuritis, and gross necropsy findings revealed that pleuritis was present in inoculated calves with significant lung consolidation. Cattle with advanced lung lesions at harvest have been shown to have reduced average daily gain; hot carcass weight; kidney, pelvic, and heart fat; 12th rib fat; calculated yield grades; marbling scores and percentage choice carcasses (Schneider et al., 2009;Tennant et al., 2014;Blakebrough-Hall et al., 2020), all resulting in economic losses. This creates a need for the development of robust biomarkers to objectively predict BRD and the presence of lung lesions.
Accelerometers have been used to monitor animal behavior without the presence of human evaluators which can reduce subjectivity (White et al., 2008). Our results showed the highest AUC values to be associated with average activity levels and step count indicating that they may yield the best diagnostic accuracy. Diagnostic accuracy remained strong throughout the duration of the study, indicating that activity levels and step count may be viable options for quantifying differences early in the BRD disease process as well as during the disease progression.
Rectal temperatures yielded good diagnostic accuracy, particularly between 24 and 72 h after disease onset. In previous BRD challenge studies, rectal temperature differed between days when animals were diseased when compared with healthy (Baruch et al., 2019) and calves exhibited fever as part of the sickness response on days 3 to 7 (Toaff-Rosenstein et al., 2016).
Visual analog scale assessment is a method of evaluating pain intensity based on behavioral parameters. Our results revealed good diagnostic accuracy for VAS values out to 9 d following inoculation which may be due to the nature of BRD causing more chronic changes that VAS scoring captured 7 to 9 d postinoculation. Cutoff values did not decrease throughout the study, supporting the idea that calves were painful for the study duration.
Cortisol results from the present study revealed higher AUC values later in the study, indicating that meaningful differences in cortisol levels between calves with and without lung lesions may arise chronically rather than acutely. These results are similar to our previous findings for AUC values for lameness, which is also a more chronic condition (Kleinhenz et al., 2019b). Cutoff values were lower than cutoff values from previous findings, indicating that BRD may not cause as acute or as large a cortisol response as some procedures such as castration and dehorning.
Kinematic gait analysis in cattle has not been well characterized in the past but is becoming more widely used as a diagnostic tool (Pairis-Garcia et al., 2015; Kleinhenz et al., 2018, 2019a). A commercially available floor mat-based gait system can be used to assess variables such as gait distance and weight distribution. Our results yielded good diagnostic accuracy for right front stride length. In the present study, the right apical lung lobes were inoculated, indicating that gait measurements more specific to the area of trauma may yield better results.
Bacterial infections such as pneumonia cause the activation of monocytes and macrophages and release of inflammatory mediators such as prostaglandin E2 (Idoate et al., 2015). Our results revealed good diagnostic accuracy for PGEM concentrations, with cutoff values decreasing over the duration of the study. Concentrations of PGEM were only quantified at three time points throughout the study (0, 72, and 192 h), all of which yielded good diagnostic accuracy, indicating that PGEM concentrations maintained good diagnostic accuracy throughout the study.
Clinical illness scores yielded good diagnostic accuracy 48 and 96 h after BRD onset but did not continue to yield good diagnostic accuracy throughout the study duration. Amrine et al. (2013) found that interobserver agreement for assigning CIS was low and CIS accuracy relative to pulmonary consolidation varied based upon the severity of consolidation.
The AUC values from this analysis did not indicate a clear pattern for when IRT may yield good diagnostic accuracy. An acute epinephrine release would be more likely to occur shortly after an acutely painful procedure such as dehorning than after a respiratory disease event. Environmental factors along with distance from the animal can be very influential upon IRT readings (Church et al., 2014). Because sampling time points fell at different times throughout the day, ambient temperatures varied and likely influenced results.
Determining mechanical nociception threshold via a pressure algometer can establish the minimal amount of pressure that produces a response. The AUC values indicated poorer diagnostic accuracy relative to previous findings when MNT was used in a dehorning study. Our results also revealed higher cutoff values relative to dehorning cutoff values. This may be due to calves being more sensitive to force around their horn buds and relatively less sensitive to force in their thoracic region.
Our results showed poor diagnostic accuracy for substance P when identifying calves with lung lesions. Previous cattle pain studies did not find a difference in substance P levels between calves likely experiencing pain from procedures such as dehorning and castration and sham calves that likely were not in pain (Kleinhenz et al., 2018). Higher cutoff values were observed in our results relative to previous findings, indicating that BRD may result in higher substance P levels relative to castration and dehorning studies (Kleinhenz et al., 2018).
Computerized lung scores only seemed to yield good diagnostic accuracy at 96 h after BRD onset. When examining the association between computerized lung score and lung consolidation, Baruch et al. (2019) found a trend toward significance, with a difference only between normal and moderate acute scores. This analysis did not differentiate between mild and moderate acute CLS. This analysis used a cutoff of < 10% consolidation, while Baruch et al. (2019) found a mean of 13.7% lung consolidation to be associated with a normal CLS score.
Our results yielded poor diagnostic accuracy for serum amyloid A concentrations, which was likely due to all calves having elevated SAA levels regardless of lung lesions. Higher levels of SAA were observed in the present study compared to levels in previously recorded naturally occurring BRD in calves (Joshi et al., 2018). In the present study, the cutoff values were most elevated at 8 h and steadily decreased for the remainder of the study, indicating that an SAA response was observed.
The impact on animal well-being from BRD is considerable (Hanzlicek et al., 2010). Pleuritic chest pain as a result of bacterial pneumonia is commonly reported in human medicine (Boyd et al., 2006); however, published literature regarding pain associated with lung lesions as a result of bacterial pneumonia in cattle is lacking. The biomarkers quantified in the present study varied in their specificity to BRD, objectivity, and association with pain. Through previous ROC analysis we identified cortisol, hair cortisol and infrared thermography imaging as having acceptable (AUC > 0.7) diagnostic sensitivity and specificity for detecting pain in cattle (Martin et al., 2020b). In the present study, average activity levels yielded good diagnostic accuracy for predicting lung lesions and were objective, but not specific to pain. Visual analog scale scores yielded good diagnostic accuracy and were somewhat specific to pain but were not specific to BRD and were subjective. Rectal temperatures yielded good diagnostic accuracy and were objective but were not specific to BRD or pain. Identifying biomarkers that are (1) predictive of lung lesions and (2) specific to pain will require further investigation into diagnostic tools and biomarkers to quantify pain from BRD.

Conclusions
In the first 72 h after onset of BRD, right front stride length, gait velocity, VAS, CIS, average activity level, step count, and rectal temperature yielded the best diagnostic accuracy (AUC > 0.75) for predicting calves with significant lung lesions (>10% consolidation) at necropsy compared with those with < 10% lung lesions. After 72 h postinduction, PGEM, gait distance, cortisol, IRT, right front force, average activity level, step count, and serum amyloid A yielded the best diagnostic accuracy (AUC > 0.75) for predicting the severity of lung lesions. Biomarkers and clinical signs with the best diagnostic accuracy early in the disease process would likely be the most valuable in field conditions. These results indicate that ROC analysis can be a useful indicator of the predictive value of biomarkers and clinical signs associated with pain and inflammation in cattle with induced bacterial pneumonia. AUC values for VAS score, average activity levels, step count, and rectal temperature seemed to yield very good diagnostic accuracy (AUC > 0.75) at multiple time points, while average and percent change MNT values, substance P concentrations, and CLS did not (all AUC values < 0.75). These results can be used to guide refinement of the optimal time points and biomarkers for the diagnosis of significant lung lesions after BRD.