Host Transcription Profile in Nasal Epithelium and Whole Blood of Hospitalized Children Under 2 Years of Age With Respiratory Syncytial Virus Infection

RSV infection induces a clearly different host response pattern compared with hRV and induced strong innate immune responses both locally and systemically. B cell lymphoma (BCL6) is a hub gene that positively correlates with RSV load and disease severity.

Respiratory syncytial virus (RSV) is the leading cause of lower respiratory tract infections (LRTIs) in young children. Although some risk factors associated with severe RSV disease have been identified (prematurity, underlying lung or heart diseases, immunodeficiencies), the majority of children with RSV that require hospitalization were previously healthy and lacked comorbidities [1,2]. The pathogenesis of RSV remains poorly understood, and no vaccines are available. Microarray techniques provide an excellent hypothesis-generating tool to explore the expression patterns of transcripts across entire biological pathways during the course of disease [3]. The advent of microarray technologies enables investigating the host factors involved in the immediate response to infection and those that modulate RSV disease severity. Several RSV host expression profiling studies have been conducted using in vitro models [4][5][6][7][8][9][10], animal models [11][12][13][14][15], and human subjects [16][17][18][19][20][21][22]. However, most in vivo studies only investigated systemic transcriptional profiles in blood [17-19, 21, 22], with only 1 study exploring local respiratory expression profiles by analysis of nasopharyngeal (NP) aspirates of hospitalized children (n = 30) [23], and 1 study, using an animal model, exploring overlapping components of respiratory and systemic responses [15]. Because the burden of RSV disease is greatest amongst children <2, and because patterns of RSV pathogenesis in young children are likely to differ significantly from immune responses of adults or animal models, specimens from pediatric cohorts represent a priority for in-depth analysis.
In this study, we applied microarray technology to acute and early recovery blood and NP swabs obtained from children <2 years hospitalized for LRTI with laboratory-confirmed RSV or rhinovirus (hRV) infections, which was reported as a second pathogen found in this population [24].
Using a combination of weighted gene coexpression network analysis (WGCNA) [25] and conventional analysis approaches [26], this unique data set allowed us to compare local and systemic host responses with RSV and hRV infections and to correlate these to clinical outcomes to identify key genes involved in RSV pathogenesis and severity.

Ethics
The following Institutional Review Boards approved the study: Children's Hospitals (CH)1 and 2 and the Hospital for Tropical Diseases (Ho Chi Minh City, Vietnam) and the Oxford University Tropical Research Ethics Committee (Oxford, UK). Written informed consent was obtained from parents or legal guardians of children before enrollment into the study.

Study Design, Setting, and Population
The clinical data and samples for gene expression profiling were collected from a previously reported cohort of 632 children with LRTI under 2 years of age hospitalized at CH1 and CH2 [24]. Severe cases were defined as being hospitalized in the intensive care unit or requiring supplemental oxygen/mechanical ventilation or having a peripheral capillary oxygen saturation (SpO2) <92%. In brief, at admission (acute phase) and discharge or day 7 of hospitalization (early recovery), blood was collected in ethylenediaminetetraacetic acid (EDTA), and PAXgene ribonucleic acid (RNA) tubes (PreAnalytiX, Hombrechtikon, Switzerland) and 2 NP swabs (Copan Diagnostics Inc, Murrieta, CA) were collected in RNAlater (Sigma, Singapore) and viral transport medium (VTM) [27]. Nasopharyngeal swabs in RNAlater and blood in PAXgene RNA tubes were processed for host gene expression profile analyses. The EDTA tubes and the NP swabs in VTM were processed for blood count and detection of 14 respiratory viruses, including RSV and hRV [28][29][30]. Based on the results of the viral detection on NP swabs at admission, and the availability of paired samples, 343 cases were selected and classified into 3 groups for the current study: single-RSV group (RSVsi) (n = 177 of 343, 52%), RSV coinfection group (RSVco) (n = 88 of 343, 26%), and single-hRV group (hRV) (n = 78 of 343, 22%) (Supplementary Figure 1). The RSV group includes patients from RSVsi and RSVco.

Gene Expression Profiling by Microarray
Total RNAs from whole blood and NP swabs were extracted using the PAXgene blood RNA isolation kit (PreAnalytiX) and the RNeasy Mini kit (QIAGEN, Hilden, Germany), respectively. Biotinylated amplified complementary RNA (cRNA) was generated by in vitro transcription of total RNA using Ambion Illumina TotalPrepRNA amplification kit (Ambion Inc., Austin, TX), according to the manufacturer's instructions. After purification, 800 ng of cRNA from NP samples and 2000-ng labeled cRNA from blood were hybridized to the HumanHT-12 v4 BeadChip array at 55°C for 18 hours and under washing, blocking, and streptavidine-Cy3 staining steps, according to the manufacturer's instructions. Each array on the HumanHT-12 v4 Expression BeadChip targets more than 31 000 annotated genes with 47 231 probes derived from the National Center for Biotechnology Information Reference Sequence RefSeq Release 38 (November 7, 2009) and other sources. The array was scanned using an iScan confocal scanner (Illumina Inc., San Diego, CA).

Preprocessing and Quality Control of Microarray Data
Using R statistical software version 3.2.1 (R Development Core Team) [31], raw expression data files from the Illumina iScan system were preprocessed using Bioconductor lumi package [32] for quality control, background correction, transformation, and normalization (robust spline normalization algorithm). The ArrayQualityMetric package [33] was used for summarizing quality of metrics data and detecting outlier arrays that differed substantially from other arrays in the dataset [34]. If an array from paired samples of a given patient was removed, this patient and all associated data were also removed from the final analysis. Based on the quality control analysis, only 83 cases were processed for the final microarray analysis (4 arrays from every case, 332 arrays), RSVsi (n = 38 of 83, 46%), RSVco (n = 15 of 332, 18%), and hRV (n = 30 of 83, 36%) (Supplementary Figure 1 and Supplementary Table 1). After normalization and filtering, the expression data of NP and blood consisted of transcript levels for 30 020 and 24 228 probes, respectively, being processed for downstream analysis of differentially expressed genes (DEGs) and WGCNA. Gene expression data are available at the GEO database (https://www. ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97743) under accession number GSE97743.

Identification of Differentially Expressed Genes
To investigate differences in gene expression between 2 time points (early recovery versus acute phase) for each group (RSV, RSVsi, RSVco, and hRV), the linear modeling framework in the limma package [26] was used. A DEG was defined by a fold change ≥2 or ≤−2 and Benjamini-Hochberg false discovery rate corrected P < .05. Due to the design of Illumina BeadArray, there were multiple probes for each gene, reflecting the possibility of multiple differentially spliced transcripts for each gene. We collapsed this information as such that each probe represented an average transcript for each gene.

Weighted Gene Coexpression Network Analysis
The WGCNA uses a stepwise analytical process to interrogate variations in gene coexpression patterns across all samples in the same group (RSV or hRV infection) to identify subsets of coexpressed genes with similar transcriptional patterns together to form a group of transcripts (coexpressed network modules). Because the probability for multiple transcripts to follow a complex pattern of expression across all the samples by chance is low, such sets of genes should constitute coherent and biologically meaningful transcriptional units. The coexpressed network modules that correlate to a clinical trait are assumed to have a coordinated biological function in the clinical condition under study (disease status) (Supplementary Table 1 represents clinical traits of patients at acute phase and early recovery phase) [25]. We focused on groups of transcripts significantly correlated to clinical and laboratory parameters that have been previously implicated in RSV outcomes [24] and that were also differentially expressed during RSV infection.

Functional Enrichment Analysis
For the biological interpretation of gene lists and WGCNA modules, InnateDB pathway analysis [35] and g:Profiler [36] were used to identify significantly enriched pathways.

Protein-Protein Interaction Network
To obtain insights into the interaction between DEGs of selected modules, a protein-protein interactions (PPIs) network was constructed by NetworkAnalyst [37] using the IMEx Interactome database [38]. The pipeline of data analysis is summarized in Supplementary Figure 2.

Demographics of Microarray Patients
There were no significant differences in sex and number of household members between the 3 selected patient groups as well, compared with the original cohort (Table 1). Patients with single RSV infection were the youngest group. There were no deaths and similar percentages of patients with a history of premature birth between hRV and single-RSV infection groups. It is interesting to note that the RSVco had been significantly under steroids treatment (Table 1).

Infection
Differentially expressed genes were identified in the expression data of NP swabs and blood of RSVsi (182 DEGs in blood and 992 in NP swabs), RSVco (106 DEGs in blood and 479 in NP swabs), and hRV (92 DEGs in blood and 325 in NP swabs) ( Table 2 and Supplementary Table 2).
When DEG lists were compared between NP and blood, 36 and 43 overlapping DEGs were observed in RSV single and RSV coinfection, respectively ( Unsupervised hierarchical clustering heatmaps using DEGs of RSV and hRV in blood and NP swabs confirmed differences in the pattern of gene expression at acute phase and early convalescent phase across 4 patient groups (Supplementary Figure 3). These heatmaps also show more variability between patients infected by the same virus than what is used in the analysis. This advocates for a more thorough analysis of this dataset that lies beyond the scope of this manuscript.

Functional Annotation and Classification of Differentially Expressed
Genes Induced During Respiratory Syncytial Virus and Rhinovirus

Infection
By using g:Profiler and InnateDB pathway analysis, we revealed the functional significance of the identified DEGs in all RSV infection cases and in hRV. Innate immune responses, ie, interferon (IFN) signaling (IFN-α/β and IFN-γ signaling), cytokine signaling, and the pathway related to Th17 cell population, ie, interleukin (IL)-23-mediated signaling, were the main pathways enriched from up-regulated genes in both NP swabs and blood of RSV cases. In contrast, the down-regulated genes were involved in specific metabolic processes of different cell types from each compartment (Figure 2a).
For hRV infection, all DEGs in blood arrays were up-regulated and enriched in pathways related to (1) innate immune responses, (2) proinflammatory cytokine production (IL-1 signaling pathway), and (3) Th17 cell population (IL-23mediated signaling) ( Figure 2b); however, there were no enriched pathways related to Th17 cell populations in hRV NP arrays.

Infection
We explored the coexpressed network modules in the RSV host response by using WGCNA, and we identified 20 and 37 coexpressed network modules in NP swabs and blood during RSV infection, respectively (Figure 3a and b). For hRV infection, 22 and 16 coexpressed network modules in NP swabs and blood were identified, respectively (Supplementary Figure 4a and b). We focused on the RSV-coexpressed gene modules that were significantly correlated with the course of illness, ie, acute versus early recovery, and with at least 3 clinical traits   related to severity, RSV subgroup, and RSV load. Hence, 7 of 20 modules in NP swabs and 5 of 37 modules in blood were selected and named as selected significantly coexpressed gene modules.

Expressed Genes Identified During RSV Infection
We hypothesized that the common set of genes between the DEGs identified over the course of the RSV infection and the significantly coexpressed gene modules could reveal leading biological pathways and identify key genes related to pathogenesis. Hence, we compared the selected significantly coexpressed gene modules and DEG lists (Supplementary Figure 5). Expression of these modules was either positively correlated with viral load, severity, and blood neutrophil counts or inversely correlated with time, load, and severity ( Table 3).
The RSV "severe" network (severe NP network) and the RSV "nonsevere" network (nonsevere NP network) in NP arrays were built from 309 genes and 498 genes, respectively (Supplementary Table 3, Figure 4). The RSV severe network in blood arrays (severe blood network) and the RSV nonsevere network were built from 118 genes and 37 genes, respectively (Supplementary Table 6, Figure 4). Genes in the severe and nonsevere networks were either up-regulated or down-regulated, respectively, at RSV acute infection in both sites (NP swabs and blood). However, only gene ORM1 (orosomucoid 1) in severe NP network was down-regulated during RSV acute infection (logFC = −1.2, adjusted P-value = 2.7E-03).
To assess the interaction between the DEGs in the selected modules, a PPI network was explored and visualized using NetworkAnalyst. As shown in Figure 4, the severe NP network consisted of 1803 nodes and 2894 edges, and the severe blood network consisted of 1027 nodes and 1323 edges. In the PPI networks, the nodes with a high degree of connection were

DISCUSSION
To our knowledge, this is the first study examining the host gene expression response in children with LRTI with RSV or hRV over the course of infection from 2 compartments, ie, NP swabs and blood samples, representing in situ and systemic responses, respectively. Using an unbiased analytical strategy, we found that RSV infection induced similar pathways in both compartments, whereas for hRV infection, no common pathways were shared between the two compartments. For RSV infection, the overall transcriptional responses in blood were reduced in comparison to the NP compartment, with fewer DEGs and enriched pathways as well as lower fold changes. This finding highlights the localized response to RSV infection and the importance of studying the NP compartment, which has also been previously shown to have a similar gene expression pattern as bronchial epithelium in response to cigarette smoking [40,41] or in inflammatory responses [42]. This study invites further explorations on the utility of NP specimens as a noninvasive surrogate for bronchial cells in a larger-scale gene expression analysis using the BeadArray platform [20,23,43].   . The x-axis represents clinical traits and the y-axis represents coexpressed modules. (*) shows the selected modules that were significantly correlated with the course of illness, ie, acute versus early recovery, and with at least 3 clinical traits related to severity, RSV subgroup, and RSV load.

Respiratory syncytial virus infection induced a different host response from hRV in terms of local and systemic responses.
Furthermore, RSV appears to be the dominant pathogen during coinfections because similar patterns were observed in RSV coinfection cases. Others also found that in bacterial carriages, the response to RSV was also predominant and independent of the microbiome B   [44]. We could not confirm this observation in our study, because we did not attempt to detect bacteria in NP samples. In contrast to hRV infection, we found that RSV induced a large number of chemokines/cytokines and IFN responses that still predominated on day 4 after illness onset. Others observed only a few of these responses in lung tissue from experimental mouse infections on day 5 [13] and after 24-hour post-RSV challenge in adult patients [14]. This highlighted the limitations of experimental infections in animals or human volunteers, because it may not reliably reflect RSV pathogenesis of severe disease in young children.  [22]. Also of note, for RSV infection only, WGCNA showed clear clusters of modules that significantly correlated with severity (Figure 3a and b), whereas for hRV infection, we did not observe this pattern (Supplementary Figure 4). The ISG15 was up-regulated in both blood and NP samples in acute RSV infection and belongs to modules significantly correlated with viral load. The positive correlation of the expression of ISG15 and RSV load has recently been observed in an ex vivo model of respiratory epithelium and confirmed in clinical samples of RSV-infected children [45]. Another gene positively correlating with RSV load and disease severity and significantly up-regulated in the acute phase was BCL6, which was found as a hub gene in blood. B cell lymphoma 6 mediates the development of T follicular helper (Tfh) cells and is an important regulator of Th2 responses. More importantly, it has been suggested that the short-lived, protective, neutralizing antibody response after RSV infection is due to an impaired development of Tfh cells, which are required for plasmablast generation and subsequent antibody formation [46]. B cell lymphoma 6 overexpression has recently been shown for the first time to promote inflammatory responses, but it inhibits antiviral signaling through repressing the IRF7 promoter activity [47]. The down-regulation of the ORM1 gene in the severe NP network suggests a possible correlation between severe RSV infection and uncontrolled inflammatory cytokine production. However, ORM1 is an anti-inflammatory molecule that inhibits neutrophil activation, complement activity, and modulates proinflammatory cytokine secretion by monocytes [48].
Our study has limitations. Our patients were all hospitalized with severe illness and not representative for the entire spectrum of the RSV infection phenotype, and gene expression patterns may not be similar for milder cases managed as outpatients. Only 83/632 patients from the original cohort were selected for the host expression analysis. The selection was based on availability of paired samples and RNA quality for microarray. The demographic characteristics of this subset of patients were not different from the original cohort. On the other end of the spectrum, because mortality was 0%, we also did not have samples from patients who died. A broader spectrum of clinical severity will facilitate the identification of host markers associated with RSV severity. Using steroids can have an impact on the innate immune response. However, the number of patients under this treatment is small (Table 1), and so we were unable to explore this impact. Although our aim was not to identify biomarkers for RSV and hRV infection, a healthy control group would provide us a baseline to correlate the dispersion of the gene expression induced by viral infection and disease severity. Despite the fact that the accuracy of gene expression data from BeadArray platform is highly improved by using multiple probes for each gene, measuring proteins derived from highly expressed genes, ie, cytokines/chemokines, in blood and in prenasal swabs is required to confirm our observations and the potential markers for studying RSV severity. Figure 4. (a) Analysis of acute network in nasopharyngeal (NP). Network analysis using common genes between modules positively correlated with respiratory syncytial virus (RSV) load/severity and those differentially expressed genes (DEGs) identified between acute versus early recovery phase. Red and green nodes represent genes showing increased and decreased expression, respectively. Nodes in gray are direct interaction partners. The size of nodes is proportional to their degree values. (b) Analysis of acute network in blood. Network analysis using common genes between modules positively correlated with RSV load/severity and those DEGs identified between acute versus early recovery phase. Nodes in gray are direct interaction partners. The size of nodes is proportional to their degree values, and the color of nodes are proportional to their betweenness centrality values.

CONCLUSIONS
In conclusion, the present study showed that RSV infection induces strong and prolonged innate immune responses with overlap between local and systemic samples, which was not observed in hRV infection. In severe RSV cases, RSV may cause an unbalanced inflammatory response and interact with the BCL6 gene to escape the host immune response.

Supplementary Data
Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.
B Figure 4. Continued