Longitudinal proteomic investigation of COVID-19 vaccination

Abstract Although the development of COVID-19 vaccines has been a remarkable success, the heterogeneous individual antibody generation and decline over time are unknown and still hard to predict. In this study, blood samples were collected from 163 participants who next received two doses of an inactivated COVID-19 vaccine (CoronaVac®) at a 28-day interval. Using TMT-based proteomics, we identified 1,715 serum and 7,342 peripheral blood mononuclear cells (PBMCs) proteins. We proposed two sets of potential biomarkers (seven from serum, five from PBMCs) at baseline using machine learning, and predicted the individual seropositivity 57 days after vaccination (AUC = 0.87). Based on the four PBMC’s potential biomarkers, we predicted the antibody persistence until 180 days after vaccination (AUC = 0.79). Our data highlighted characteristic hematological host responses, including altered lymphocyte migration regulation, neutrophil degranulation, and humoral immune response. This study proposed potential blood-derived protein biomarkers before vaccination for predicting heterogeneous antibody generation and decline after COVID-19 vaccination, shedding light on immunization mechanisms and individual booster shot planning.


Introduction
The global public health crisis and the social disruption caused by the coronavirus disease 2019 (COVID-19) pandemic have prompted the emergency use of speedily developed vaccines.As of October 2022, over 12 billion doses had been administered globally (Data source WHO COVID-19 Dashboard accessed on December 16, 2022), although the vaccination distribution is significantly unbalanced (van der Graaf et al., 2022).Previous studies have reported that NAbs responses elicited by an inactivated vaccine (CoronaVac ® ) and an mRNA vaccine (BNT162b2) persisted for 6-8 months after full-schedule vaccination and declined to varying degrees (Falsey et al., 2021;Zeng et al., 2022).Therefore, multiple vaccine boosters and prolonged intervals between vaccine doses are needed to maintain the immunity against SARS-CoV-2 (Zhao et al., 2022), and could induce a robust humoral immune response (Ai et al., 2022).Several studies reported the dynamics of NAbs generation and the molecules dysregulation occurring after vaccinations (Liu et al., 2021;Wang et al., 2022), and potential biomarkers for assessing the effectiveness of vaccination (Ma et al., 2021).Seroconversion rates and antibody titers after COVID-19 vaccines are significantly lower in immunocompromised patients than immunocompetent individuals (Lee et al., 2022), including immune-mediated inflammatory disorders, solid cancers, organ transplant recipients and hematological cancers.While to the best of our knowledge based on literature search, no study has systematically reported heterogeneous hematological host responses to vaccination in both PBMCs and serum.There is currently no known biomarker for predicting the effectiveness of vaccines before vaccination.

Machine learning model for predicting the antibody generation
We next developed a set of models for predicting the seropositivity of individuals 57 days after their first vaccination dose and 28 days after their second one (at D57) based on the proteomics and clinical indicators collected prior to both doses (at D0).Machine learning models were developed using XGBoost (Chen and Guestrin, 2016) (Fig. 2A).Proteins or clinical indicators with a significant difference (P-value < 0.05) between the two classes and with |log 2 (fold change)| > 0.25 in the discovery dataset were included in our final feature set.Then, some sparse proteins (NA rate > 50%) were also removed.We optimized the models' parameters in the discovery dataset (Cohort 1), and generated a model based on the five PBMC proteins and another based on the seven serum proteins.Using the test cohort (Cohort 2), the PBMC model achieved an Areas Under the Curve (AUC) score of 0.84, while the serum one of 0.82 (Fig. 2A).Next, we developed an ensemble model combining these two models, which led to better performance (AUC = 0.87) (Fig. 2A).
Five PBMC proteins (UNC45A, IGHM, FADD, NCK2, and DCPS) and seven serum proteins (SERPINA10, SOD3, LTA4H, SPP2, NAGLU, APLP2, and CHRDL2) were selected for our machine learning models.Most of the above PBMC biomarkers are expressed in immune cells, including B cells, macrophages, natural killer (NK) cells, and dendritic cells (Karlsson et al., 2021).They are thus associated with both innate and adaptive immunity (Fig. 2B and 2D).In particular, unc-45 myosin chaperone A (UNC45A) acts as a co-chaperone for HSP90 promoting progesterone receptor function in the cell, and is required for the NK cell cytotoxicity via lytic granule secretion's control (Iizuka et al., 2015).Immunoglobulin heavy constant mu (IGHM) is the constant region of immunoglobulin heavy chains and mediates the effector phase of humoral immunity, which eliminates the bound antigens.Fas associated via death domain (FADD) is an adaptor molecule that interacts with various cell surface receptors, mediates cell apoptotic signals, and is essential in early T cell development (Kabra et al., 2001).The seven serum biomarkers are associated with immunity and metabolism (Fig. 2C and 2E): serpin family A member 10 (SERPINA10) and secreted phosphoprotein 2 (SPP2) are secreted proteins associated with coagulation and metabolism; leukotriene A4 hydrolase (LTA4H) is enriched in Kupffer cells, monocytes, and neutrophils; N-acetyl-alpha-glucosaminidase (NAGLU) is mainly expressed in most immune cells; superoxide dismutase 3 (SOD3) and chordin like 2 (CHRDL2) can interact with the extracellular matrix (ECM) organization (Karlsson et al., 2021).
Based on our three models, only two participants from the test cohort (Nos.209 and 233) were mispredicted.Possibly, their predictions were affected by their drug treatments.Specifically, participant No. 209 was incorrectly predicted to be seronegative.This may be due to the long-term treatment with simvastatin and rosuvastatin against hyperlipidemia, which have been suggested to enhance the immune response (Guerra-De-Blas et al., 2019;Karmaus et al., 2019).Participant No. 233, whose atherosclerosis was treated with bisoprolol fumarate before vaccination, was predicted to be seropositive despite being seronegative at D57.Despite these two mispredictions, our results showed that PBMC and serum proteomics could well predict the individual host responses after vaccination.
In addition, predicting those being negative at D28 and then converting (Group 1) or never converting (Group 0) could allow for the earlier switch to another vaccine.Similarly, we developed a set of models based on proteins at D0 and achieved an AUC score of 0.842 (PBMC model), 0.847 (serum model) and 0.853 (ensemble model combining these two models) to predict the individual host responses after vaccination (Fig. S3A-C)., the first at D0 and the second at D28. Blood samples and PBMCs were collected at D0 (before vaccination), D28, and D57.Participants were divided into four groups based on the xenoreactivity of their NAbs on D28 and D57: Group 0, the seronegative group, included the participants that were seronegative on D28 and D57; Group 1, the late seropositive group, included the participants that were seronegative on Day 28 but seropositive on D57; Group 2, the early seropositive group, included the participants that were seropositive on D28 and D57.Groups 1 and 2 were combined into Group 1 + 2 to bring together all the seropositive participants.Group 1 + 2 was then divided into Group 3 (seronegative participants on D180) and Group 4 (the persistently seropositive group, seropositive participants on D180).120, 25, 13, 107, 5, and 20, respectively.Group 0, the seronegative group, included the participants that were seronegative on D28 and D57.Groups 1 + 2 were all the seropositive participants on D57.

Increased innate and adaptive immunity in the seropositive group
We next explored the differences between the seropositive and the seronegative groups using the PBMC data.Thirtyeight proteins were differentially expressed (DEPs) within the PBMC proteome between the two groups at three time points [Benjamini-Hochberg (B-H) adjusted P-value < 0.05, |log 2 (fold change)| > 0.25] (Table S3; Fig. 3A and 3B).In particular, 33 proteins were dysregulated at D0 or D28.This result suggests that the immune system of the two groups was different at baseline (D0) and was most strongly activated during the early stage after vaccination.
A gene-set variation analysis (GSVA) was then used to identify the most significantly enriched pathways in the seropositive and the seronegative groups.The resulting pathways [B-H adjusted P-value < 0.05, |log 2 (fold change)| > 0.25] were mainly involved in the immune system.They included the IFNγ, IFNα and IFNβ signaling, RNA and DNA modulation, and metabolic pathways.Most of these pathways were upregulated in the seropositive group (Fig. 3C).
The DEPs among the three immune response groups-Group 0, Group 1, and Group 2 (Fig. 1A)-were primarily involved in RNA metabolism, cellular processes, and cytoskeleton regulation-related pathways (Fig. S4A-D).Therefore, the dysregulation of these T2DM: Type 2 diabetes mellitus, which was defined as fasting glucose ≥ 7.0 mmol/L.MAFLD: metabolic associated fatty liver disease.Group 3: the group that became seronegative before D180; Group 4: the group that was seropositive at least until D180.

Protein & Cell
proteins may have contributed to elevated immunity.This agrees with the functional analysis between the seropositive and the seronegative groups.Over 80% of DEPs observed between Group 1 and Group 0 were also detected in the comparison between seropositive (Group 1 + 2) and seronegative (Group 0) groups (Fig. S3D and S3E; Table S3).
We next analyzed the immune cells composition of the PBMCs in our experiment using the deconvolution algorithm of CIBERSORT (Newman et al., 2015).The seropositive group showed an increase in memory B cells and a reduction of naïve B cells (Fig. 3D).In addition, CD8 + T cells and activated memory CD4 + T cells were significantly higher in the seropositive group than  in the seronegative one at D0.The adaptive immune responses were thus significantly enhanced at the baseline in the seropositive group (Fig. 3E).At D57, the memory B cells and the CD8 + T cells showed an upward trend over time and were significantly higher in the seropositive group.This result is consistent with the reported host responses to SARS-CoV-2 infection and vaccination (Chen et al., 2022;Sette and Crotty, 2021).Furthermore, some innate immune cells, such as monocytes, activated NK cells, and activated dendritic cells, also increased over time in the seropositive group (Fig. 3E).Moreover, the early seropositive group showed increased memory B cells, activated NK cells, M1 and M2 macrophages (Fig. S4F-G).These results show that the proportion of SARS-CoV-2-specific memory lymphocytes may increase after vaccination with CoronaVac ® .

The interaction between metabolism and immunity is linked with seroconversion
We next investigated the differences between the seropositive and the seronegative groups using the serum data.A total of 13 DEPs were found at D0 and D28, and two at D57 [B-H adjusted P-value < 0.05, |log 2 (fold change)| > 0.25] (Table S3; Fig. 4A and 4B).In line with our findings from the PBMC data, more DEPs were identified at D0 and D28 than at D57.All the DEPs were upregulated in the seropositive group except transthyretin (TTR).Misfolding and aggregation of TTR, causing amyloid thyroxine protein amyloidosis, has been reported associated with a higher risk of COVID-19 morbidity and mortality (Brannagan et al., 2021).This finding is consistent with our results, as TTR was downregulated in the seropositive group.Many of our serum DEPs are secreted proteins and include components of the immunoglobulin family: IGKV1-8, IGKV1-16, and IGHV3-15 (Schroeder and Cavacini, 2010).
Further functional analyses were performed on the DEPs between the seropositive and the seronegative groups, and among the three immune response groups.The significantly enriched functions were neutrophil degranulation, acute phase response signaling, and hemostasis (Figs.4B, 4C, S5A and S5B; Table S4).It has been shown that the enriched apolipoprotein family could induce the activation of leukocytes, especially the degranulation of neutrophils (Botham and Wheeler-Jones, 2013).Our analyses showed that 15 out of the 17 proteins involved in plasma lipoprotein remodeling and six out of the eight proteins involved in neutrophil degranulation were upregulated in the seropositive groups (Fig. 4C).This finding suggests that the interaction of metabolism and immunity is closely linked with seroconversion.

Predicting individual antibody persistence to guide booster shot planning
To predict whether the antibodies produced after the CoronaVac ® vaccination could last for at least 180 days, we generated machine learning models based on the proteomics data and the clinical indicators collected prior to vaccination.In this analysis, we excluded participants from Group 0 (the seronegative ones) and those without clinical indicators on D180.Then the remaining two cohorts (Table 2) were the discovery cohort (Cohort 3, N = 107) and the test cohort (Cohort 4, N = 20).Similar as before, proteins with a significant difference (P-value < 0.05) between two classes and with |log 2 (fold change)| > 0.25 in the discovery dataset were included in our final feature set.Then, some proteins (NA rate > 50%) were removed.We optimized the models' parameters in the discovery dataset.An AUC score of 0.79 was obtained using only the PBMC proteins (Fig. 5A), indicating that PBMC proteomics had an excellent prediction ability of the antibody response after both 57 and 180 days.
The four PBMC biomarkers (PYCARD, MTMR2, PPCDC, and BRAF) selected by machine learning showed different expression patterns of the immune cells between Groups 3 and 4 (Fig. 5C).In particular, PYD and CARD domain containing (PYCARD) and phosphopantothenoylcysteine decarboxylase (PPCDC) are mainly expressed in innate immune cells, like monocytes and dendritic cells.Myotubularin related protein 2 (MTMR2) and B-Raf proto-oncogene (BRAF), on the other hand, are expressed in innate and adaptive immune cells, including NK cells, monocytes, T cells, and B cells (Fig. 5D).
However, seven participants were incorrectly predicted using this model.Specifically, participants Nos.209 and 216 were incorrectly classified, probably because they both received simvastatin and rosuvastatin (Guerra-De-Blas et al., 2019;Karmaus et al., 2019).In addition, three participants (Nos. 212,222,and 225) with fatty liver disease and metabolic abnormalities were also wrongly predicted, probably due to their metabolic conditions.No. 226 was misclassified may because of receiving dexamethasone and amoxicillin.

Predicting the host response to CoronaVac ® vaccination using machine learning models
We conducted a TMT-based proteomics analysis to profile the PBMC and serum features that could affect the response to the CoronaVac ® vaccination.Using a set of biomarkers measured before vaccination, we built three models to predict individual NAb levels at D57 and another model to predict the persistence of NAbs until at least D180.These potential biomarkers, which were used to distinguish different host responses, were validated using an independent cohort, confirming that the changes in PBMC and serum proteins reflect the pathophysiological differences between seropositive and seronegative subjects.

Potential biomarkers for vaccine-induced antibody generation and persistence
The proteins used by our machine learning classifiers contain several known biomarkers for COVID-19 severity or viral infections.Of note, SERPINA10, predominantly expressed in the liver and subsequently secreted into plasma, inhibits the activity of the coagulation factors Xa and XIa in the presence of protein Z, calcium, and phospholipids (Han et al., 2000).SERPINA10 is a known discriminating feature between severe and non-severe COVID-19 (Shen et al., 2020), and can be used as a classifier of disease severity (Messner et al., 2020).Specifically, it is upregulated in the severe COVID-19 cases.In our data, SERPINA10 was downregulated in the seropositive group with a negative SHapley Additive exPlanations (SHAP) value (Fig. 2C and 2E).This result further highlights the role of coagulation during COVID-19 vaccination and indicates that SERPINA10 may contribute to reducing antibody generation.SOD3, an antioxidant enzyme, has been reported to be downregulated in the urine of severe COVID-19 cases (Bi et al., 2021).In our study, SOD3 was significantly upregulated in the seropositive group with a positive SHAP value, indicating that SOD3 may promote antibody generation.PYCARD, a key mediator of apoptosis and inflammation, is mainly involved in the innate immune response (Wang et al., 2017).It also contributes to T cell immunity stimulation and cytoskeletal rearrangements coupled to chemotaxis and antigen uptake during

Protein & Cell
adaptive immunity (de Souza et al., 2021).We found PYCARD was upregulated in the seropositive group with a positive SHAP value (Fig. 5B and 5C).And thus, we suggest this protein may also promote antibody persistence.These potential biomarkers may promote or reduce antibody generation or persistence, providing therapeutic guidance for vaccination strategy.

Mechanisms behind vaccine-induced immunity
To investigate the molecular mechanisms behind vaccine-induced immunity, we integrated our proteomics analyses and thus generated a summary of the dysregulated pathways between the seropositive and the seronegative groups (Figs. 6,S6 and S7).

Neutrophil degranulation
In our data, we found several proteins involved in activating the neutrophil degranulation-based innate immunity, in particular, LTA4H, LTF, MMP9, TTR, CAP1, PYCARD, and GDI2.Specifically, MMP9, LTF, and CAP1 were upregulated at D28 in the seropositive group.Previous studies have shown that the release of MMP9 from neutrophils stimulates the migration of inflammatory cells and promotes inflammation and the degradation of the alveolar-capillary barrier (Davey et al., 2011).In our seropositive data, MMP9 was upregulated at D28 and then downregulated at D57 (Fig. 6C).This result suggests MMP9 may contribute to a reduced antibody generation, and is consistent with this protein being an indicator of respiratory failure (Ueland et al., 2020) and enhanced mortality risk in COVID-19 patients (C et al., 2021).Indeed, evidence has shown that neutrophil activation is a hallmark of severe SARS-CoV-2 infection (Meizlish et al., 2021).Therefore, we speculate that a modest upregulation of neutrophil degranulation may contribute to immunity activation and vice versa.

Regulation of lymphocyte migration
Most regulators of leukocyte extravasation and lymphocyte migration were elevated at D0 or during the early stages in the seropositive group: MSN, MMP9, CXCL12, FADD, NCK2, and TMSB10.In particular, MSN interacts with members of the ezrin-radixin-moesin family and regulates lymphocyte egress from lymphoid organs (Serrador et al., 1998).TMSB10, CXCL12, and NCK2 regulate the cytoskeleton organization and are involved in transmigration (Fig. 6A).By secreting proteases like MMP9, leukocytes degrade the basement membrane and penetrate the tissue interstitial spaces (Sternlicht and Werb, 2001).T cell receptors are activated through the binding by FADD and the interaction with NCK2, consistently with our immune cell analysis (Fig. 6C).Except for antigen recognition, T cell migration was

Protein &
positively regulated by CXCL12, FADD, and PYCARD in our seropositive groups.Intriguingly, increased FADD at baseline possibly contributed to the enhanced antibody generation at D57 (Fig. 2D).Also, higher PYCARD at baseline led to a long antibody persistence based on our machine learning models (Fig. 5C).

Humoral immune response
The B cell receptor is a complex of surface immunoglobulin, and some of its accessory molecules, such as IGHM, IGHV3-15, IGKV1-8, and IGKV1-16, were upregulated in our seropositive group (Fig. 6C).Following the receptor cross-linking, a complex cascade of signaling molecules results in NF-κB complex and B cell receptor activation.These IgG and cytokines are expressed by JCHAIN and LTF, which were both significantly elevated in our data after vaccination.
We clustered PBMC and serum DEPs over time (D0, D28 and D57), and grouped them into several discrete clusters using mFuzz, respectively (Fig. S6).We focused on four DEP clusters after excluding DEPs in the seronegative group (Group 0): steadily increased PBMC DEPs, decreased PBMC DEPs, increased sera DEPs, and decreased sera DEPs.Then we compared the DEPs and immune response between PBMC and serum data of the seropositive groups in Fig. S7.There were some proteins in PBMC cells that can be secreted into the serum (Fig. S7D), which were downregulated in PBMC and upregulated in serum.This cluster of PBMC and serum overlap proteins mainly functions in neutrophil degranulation and protein-lipid complex remodeling.In addition, we overlapped the proteins from these selected mFuzz clusters of PBMC and serum, and found they are involved in neutrophil degranulation, protein-lipid complex remodeling, platelet degranulation, and complement system (Figs.S6 and S7).Our data show that the neutrophil degranulation-based activation of the innate immunity, the multiple immune cell migration enhancement, and the humoral immune response activation are dysregulated at baseline and during the early stages after vaccination (Fig. 6A-C).

Comparisons with other studies before/after vaccination
Several studies of COVID-19 vaccination have identified modulation of multiple proteins, metabolites, and gene expression after vaccination (Arunachalam et al., 2021;Liu et al., 2021;Zhang et al., 2021;Wang et al., 2022).Besides, multiple peptides and proteins have been identified to predict vaccination efficacy and differentiating COVID-19 patients from vaccinated individuals (Ma et al., 2021), which is promising for inactivated virus vaccine-related applications and translational medicine.Protein & Cell 2021).PBMCs analysis showed that CD8 + T cells, memory B cells, and activated NK cells were increasingly upregulated in the seropositive group.Previous study showed that mRNA vaccinations can significantly enhance the innate immune response, as proven by the greater frequency CD14 + CD16 + inflammatory monocytes and higher concentration of plasma IFNγ (Arunachalam et al., 2021).In our PBMC proteome, monocytes and IFNγ, IFNα and IFNβ signaling were elevated in the seropositive groups.A Proteins involved in the humoral immune response, neutrophil degranulation, network maps of SARS-CoV-2 signaling pathways, and regulation of lymphocyte migration are shown in this network with their corresponding expression levels in the seronegative and the seropositive groups.Group 0: the seronegative group; Group 1 + 2: the seropositive group; Group 3: the group that became seronegative before D180; Group 4: the group that was seropositive at least until D180.
single-cell RNA-sequencing study of the PBMCs of healthy subjects revealed that, after CoronaVac ® vaccination, the levels of B cells, T cells, NK cells, and myeloid cells better resembled those of COVID-19 recovery controls rather than their own before vaccination (Zhang et al., 2021).Similarly, our PBMCs analysis showed that CD8 + T cells, memory B cells, and activated NK cells were increasingly upregulated in the seropositive group.Consistent with other reports (Wang et al., 2022), our findings support that the humoral immune response, complement activation were induced by CoronaVac ® .What's unique in the seropositive participants is activated regulation of lymphocyte migration pathway, which suggests enhanced immunity.

Planning booster shot and their benefits
Due to the relatively high effectiveness of booster immunization against severe COVID-19, hospitalization, and even the Omicron variant (Xue et al., 2022), it should be strongly supported and administered at the appropriate time.The effectiveness and the safety of boosters have been assessed via large-scale randomized studies and individuals, proving the booster's benefits and the negligible impact of its immune-mediated side effects (Zeng et al., 2022).Boosting is particularly important for specific subpopulations: individuals who generate less or shorter-lived NAbs and those who are immunocompromised, such as our seronegative participants (Group 0).Moreover, the vaccination strategy may change for recipients with heterologous or homologous vaccinations (Costa Clemens et al., 2022).Our machine learning models predict the seropositivity of individuals at D57 and their NAbs persistence until at least D180 using potential blood-derived protein biomarkers.These tools can establish which populations or individuals may generate enough and persistent NAbs, and therefore help plan precise booster administrations.Furthermore, a better balance between primary vaccination and booster may benefit more countries in the global fight against COVID-19 ( Krause et al., 2021).
In summary, we performed a systematic PBMC and serum proteomic study of the heterogeneous hematological host responses to vaccination.We developed machine learning models based on panels of proteins expressed at baseline to predict antibody generation and decline after vaccination.The model can be potentially used to identify the individuals of high risk, and guide booster shot, or recommendation of other vaccines.Furthermore, our data also provides a panoramic view of the molecular changes in PBMCs and serum after vaccination.

Limitations of the study
The findings of this study have to be considered in light of some limitations.First, the predictive models need to be further validated in larger cohorts and multicenter samples, both biologically and clinically.This model was established for CoronaVac ® .Whether this model is applicable to other vaccines remains unclear.Nevertheless, the basic principle for different inactivated vaccines is similar.Therefore, we anticipate the findings may be of value to other inactivated vaccines.Second, the explanations for misclassifications are not very strong, may because of complex drug history.Confounding factors including age, sex, smoking history and diseases affecting immunity might influence the proteomic profiling in this study.We have analyzed some as shown in Fig. S1, and did not observe significant confounding factors, but there might be other clinical factors not included in our analysis, which could be studied in the future.Third, the B cell and T cell responses and the neutralization tests were analyzed in the mixed PBMCs but not assessed in vitro.Recently, the dominant strain is Omicron with evolved pathogenicity.The biological insights and predictive model established here developed may not be directly applicable to the changing viruses.However, the AI-empowered proteomic methodology established here could be directly applied to other vaccines.

Experimental design and statistical rationale
The overall goal was systematic investigation of host responses to COVID-19 vaccines, including the heterogeneity among the recipients, and machine learning models to predict the effectiveness of vaccination using potential biomarkers at baseline.Subject information for vaccinated recipients is summarized in Tables 1,  2 and S1.Study design of the TMT labeling-based quantitative proteomics analysis of the PBMCs and sera samples is depicted in Figs.1A and S2A.

Participants and samples
We recruited 163 vaccination recipients (>18 years) who were not infected with SARS-CoV-2 and some of them had stable chronic medical conditions, including hypertension, T2DM, and metabolic fatty liver disease, were eligible to be enrolled from the affiliated hospital of Hangzhou Normal University between January and February 2021, including a discovery (N = 137) and an independent test cohort (N = 26).All participants received two doses of CoronaVac ® (0.5 mL/dose, Sinovac life science, Beijing, China), an inactivated vaccine against SARS-CoV-2; the second dose 28 days after the first one.Blood samples were collected before vaccination (D0), then 28 (D28), 57 (D57).Blood mononuclear cells and serum were extracted from the blood samples.The xenoreactivity was also measured at D0, D28, D57 and 180 after the first dose vaccination (D180).The NAbs for the receptor-binding domain of the SARS-CoV-2 spike protein were detected using the iFlash 2019-nCoV NAb assay (SHENZHEN YHLO BIOTECH CO., LTD, Shenzhen, China, Cat#C86109), which is a paramagnetic particle chemiluminescent immunoassay for the qualitative detection of SARS-CoV-2 NAbs in human serum and plasma using the automated iFlash immunoassay system; the cut-off value for the antibody was 10.00 AU/mL.
The participants were classified into three groups based on the xenoreactivity of their NAbs on D28 and Day 57.Specifically, Group 0 included the participants that were seronegative on D28 and D57; Group 1 included the participants that were seronegative on D28 but were seropositive on D57; Group 2 included the participants that were seropositive on D28 and D57.Groups 1 and 2 were then merged into Group 1 + 2 (all the seropositive participants).After excluding participants without clinical indicators on D180, Group 1 + 2 was then split into Group 3 (seronegative at D180) and Group 4 (seropositive at D180).

Serum and PBMC protein extraction and digestion
From each sample, 4 µL of serum were depleted of 14 high abundant serum proteins using a human affinity depletion resin (Thermo Fisher Scientific™, San Jose, USA) and then concentrated into 50 μL through a 3K MWCO filtering unit (Thermo Fisher Scientific™, San Jose, USA).More details can be found in the manufacturer's protocols.The resulting serum samples were then prepared for mass spectrometry as described (Shen et al., 2020).Briefly, they were denatured in 8 mol/L urea at 31.5°C for 30 min.Next, the proteins were reduced with 10 mmol/L tris (2-carboxyethyl) phosphine (TCEP) and then alkylated with 40 mmol/L iodoacetamide (IAA).Finally, the protein extracts were diluted and digested using a double step trypsinization for 16 h totally (Hualishi Tech. Ltd, Beijing, China).
PBMCs were prepared as previously described (Gao et al., 2020).Briefly, 30 μL of lysis buffer in 100 mmol/L TEAB with 20 mmol/L TCEP, and 40 mmol/L IAA were added to the PCT-Microtubes for 60 min.The proteins were digested using a mixture of trypsin and Lys-C for 120 min.Then, the digestion was arrested by adding 10% trifluoroacetic acid (TFA).

LC-MS/MS analysis
The proteome analysis was performed similar as previously described (Shen et al., 2020).Digested peptides were cleaned-up and labeled using TMTpro 16plex label reagents (Thermo Fisher Scientific, San Jose, USA).Peptides were separated into 30 fractions, which were later combined into 15 fractions.Subsequently, the fractions were dried, redissolved in 2% ACN/0.1% formic acid.All the samples were analyzed liquid chromatography (LC)coupled tandem mass spectrometry (MS/MS) with a data-dependent acquisition mode on an Orbitrap 480 (Thermo Fisher Scientific, San Jose, USA).During each acquisition, peptides were analyzed using a 30 min-long LC gradient 7 to 30% buffer B).The m/z range of MS1 was 375-1,800, with a resolution of 60,000, normalized Automatic Gain Control (AGC) target of 300%, maximum ion injection time (max IT) of 50 ms, and compensation voltages of −48 V and −68 V for FAIMS Pro™.MS/MS experiments were performed with a resolution of 30,000, normalized AGC target of 200%, and 86 ms max IT for Serum and 100 ms for PBMC.The turbo-TMT and the advanced peak determination were enabled.

Database search for proteomics quantification
The mass spectrometric data were analyzed using Proteome Discoverer (Version 2.4.0.305,Thermo Fisher Scientific) and the Homo sapiens protein database downloaded from UniProtKB on 27 April 2020 (Fasta file containing 20,301 reviewed protein sequences).The database search was performed as previously described (Shen et al., 2020), including Carbamidomethyl (C) as a fixed modification and oxidation (M) as a variable modification.The false discovery rate (FDR) was set as 0.01.Data normalization was performed against the total peptide amount.Other parameters followed the default setup.

Quality control of the proteome data
The quality of the proteomics data was ensured at multiple levels.A pool of samples labeled by TMTpro-134N was used as the control for aligning the data from different batches.Also, we assessed the reproducibility of the data using technical replicates, water samples (buffer A) as blanks every four injections to avoid carry-over.
After removing the proteins with over 90% missing values, 6331 proteins of PBMC and 961 of serum underwent quality controls.We then assessed the coefficient of variation in the pooled samples (Fig. S2B).Finally, the Pearson's correlation values of the technical replicates (17 PBMC samples and three serum samples) were used to evaluate the reproducibility of the data (Fig. S2C).

Statistical analysis of clinical indicators
Continuous variables were calculated by Student's t-test or Welch's t-test, Pearson χ 2 test or Fisher's exact test for the analysis of categorical outcomes.We calculated Geometric Mean Titers (GMT) of the neutralizing antibody titers and the overall anti-Spike IgG levels, using the t-test method to compare the difference.Statistical analysis was performed by IBM SPSS Statistics 26 (Armonk, NY: IBM Corp).

Differential expression analysis
A set of statistical tools were used to process and analyze our proteomics data.First, the batch effect of the serum proteome was removed using the R package combat.No other significant batch effect was highlighted by principal component analysis (Fig. S2D  and S2E).For comparing the protein expressions between groups, the log 2 (fold change) was calculated using the mean values of each group.A two-sided unpaired Welch's t-test was performed for each group pair.A one-way analysis of variance (ANOVA) was performed among three groups at three time points.Finally, the adjusted P-values were calculated using the B-H correction.
DEPs were selected by imposing the B-H adjusted P-values to be less than 0.05 and the absolute log 2 (fold change) larger than 0.25.Next, a soft clustering of the time series data was performed using MFuzz (version 2.48.0).We clustered the PBMC and serum DEPs expression along time using default settings (Fig. S6).The single-cell RNA expression of PBMCs was derived from the Human Proteins Atlas (Karlsson et al., 2021).

Estimation of the immune cell type fractions
CIBERSORT is an analytical tool for estimating the cell composition of tissues using their gene expression profiles (Newman et al., 2015).In CIBERSORT, the relative amounts of 20 human immune cell types (including naïve and memory B cells, seven T cell types, NK cells, plasma cells, monocytes, etc.) were estimated in our PBMC bulk cells using the leukocyte gene signature matrix.In addition, vaccinated individuals were divided into seronegative and seropositive groups, and the fraction of each immune cell type was investigated and visualized with bar plots using R software (R 4.0.5).

Machine learning
For prediction of NAbs generation on D57, we used the samples from a discovery cohort (Cohort 1, N = 137) to optimize the model's parameters, the discovery dataset was randomly split into a training (80%) and a validation (20%) dataset.To establish the features for our machine learning models, we used a differential protein expression analysis which returned a set of biomarkers from the PBMCs and the serum (Figs.2A and 5A).Proteins with a significant difference (P-value < 0.05) between two classes and with |log 2 (fold change)| > 0.25 in the training dataset were included in our final feature set.Then, sparse proteins (NA rate > 50%) were removed.The missing values were imputed with the minimum of each protein.We decided on the top N best features as the final feature set for our model, as well as the optimal parameters, by searching for the highest AUC in the validation dataset.All individual models for these two tasks can achieve an AUC of 1.0 in the validation dataset.Finally, the results illustrated in this paper were derived from the model with the best features and parameters.The implementation of machine learning was done using Python 3.8.10 and xgboost 1.4.2python package (Chen and Guestrin, 2016).The model was then tested using an independent test cohort (Cohort 2, N = 26): the first based on PBMC biomarkers and the second on serum biomarkers.We next developed a third model that was an ensemble of the two previous ones.This third model led to an AUC of 0.87, which was higher than using PBMC or serum proteins individually.
For prediction of NAbs persistence till D180, we discarded participants from Group 0 (the seronegative ones) and those without clinical indicators on D180, and the remaining two cohorts: a training cohort (Cohort 3, N = 107) and a test cohort for the validation (Cohort 4, N = 20).We optimized the models' parameters in the training and a validation dataset.Similarly, we tested in Cohort 4, and an AUC score of 0.79 was obtained using only the PBMC proteins (Fig. 5A).

Functional analyses
Specifically, we investigated 38 PBMC DEPs and 14 serum DEPs from different immune response groups using a two-sided unpaired Welch's t-test, and 985 DEPs from PBMCs and 129 DEPs from serum were evaluated using the ANOVA test, biomarker proteins were also included.Several pathway analysis tools were used to perform the functional analysis of our significantly DEPs.Enrichments analyses based on Gene Ontology processes, KEGG pathways, Reactome gene sets, and Wiki pathways were performed using the web-based platform of Metascape (Zhou et al., 2019).With an Ingenuity pathway analysis (Kramer et al., 2014) of the dysregulated proteins, we identified the most significantly dysregulated pathways; P-values were based on a right-tailed Fisher's Exact Test, and the enriched pathways' overall activation/inhibition state was predicted using the z-score.A pathway's regulation was significant if its P-value < 0.05.Gene Set Variation Analysis (GSVA) was performed using the R package GSVA (version 3.11) (Hanzelmann et al., 2013) to identify the most dysregulated pathways (Canonical pathways) between the seronegative and the seropositive groups (B-H adjusted P-value < 0.05).The functional network images generated by Metascape were visualized with Cytoscape (version 3.9.0) to generate the network of predicted associations for a specific group of proteins (Otasek et al., 2019).

Figure 1 .
Figure 1.Study design and overview of clinical indicators.(A) Study design of the TMT labeling-based quantitative proteomics analysis of the PBMCs and sera samples.Vaccination recipients were vaccinated with two doses of 500 µL CoronaVac® , the first at D0 and the second at D28. Blood samples and PBMCs were collected at D0 (before vaccination), D28, and D57.Participants were divided into four groups based on the xenoreactivity of their NAbs on D28 and D57: Group 0, the seronegative group, included the participants that were seronegative on D28 and D57; Group 1, the late seropositive group, included the participants that were seronegative on Day 28 but seropositive on D57; Group 2, the early seropositive group, included the participants that were seropositive on D28 and D57.Groups 1 and 2 were combined into Group 1 + 2 to bring together all the seropositive participants.Group 1 + 2 was then divided into Group 3 (seronegative participants on D180) and Group 4 (the persistently seropositive group, seropositive participants on D180).(B and C) Antibody titers of neutralizing antibodies (B) and Spike-specific IgG (C) to live SARS-CoV-2 at different time points after vaccination.The horizontal line represents the threshold of specific response.The bars represent the median and IQR values of titers.Sample comparisons were tested by Student's t-test or Welch's t-test.* Represents P < 0.05, ** represents P < 0.01, *** represents P < 0.001.
Figure 1.Study design and overview of clinical indicators.(A) Study design of the TMT labeling-based quantitative proteomics analysis of the PBMCs and sera samples.Vaccination recipients were vaccinated with two doses of 500 µL CoronaVac® , the first at D0 and the second at D28. Blood samples and PBMCs were collected at D0 (before vaccination), D28, and D57.Participants were divided into four groups based on the xenoreactivity of their NAbs on D28 and D57: Group 0, the seronegative group, included the participants that were seronegative on D28 and D57; Group 1, the late seropositive group, included the participants that were seronegative on Day 28 but seropositive on D57; Group 2, the early seropositive group, included the participants that were seropositive on D28 and D57.Groups 1 and 2 were combined into Group 1 + 2 to bring together all the seropositive participants.Group 1 + 2 was then divided into Group 3 (seronegative participants on D180) and Group 4 (the persistently seropositive group, seropositive participants on D180).(B and C) Antibody titers of neutralizing antibodies (B) and Spike-specific IgG (C) to live SARS-CoV-2 at different time points after vaccination.The horizontal line represents the threshold of specific response.The bars represent the median and IQR values of titers.Sample comparisons were tested by Student's t-test or Welch's t-test.* Represents P < 0.05, ** represents P < 0.01, *** represents P < 0.001.

Figure 2 .
Figure 2. Machine learning-based prediction of individuals' seronegative or seropositive status based on their PBMCs and serum proteins before vaccination.(A) Our machine learning-based predictor was based on PBMC, serum, and both types of proteins.We used the samples from a discovery cohort (Cohort 1, N = 137) to optimize the model's parameters, the discovery dataset was randomly split into a training (80%) and a validation (20%) dataset.The models were then tested using a test cohort (Cohort 2, N = 26): the first based on PBMC biomarkers and the second on serum biomarkers.We next developed a third model that was an ensemble of the two previous ones.This third model led to an AUC of 0.87, which was higher than using PBMC or serum proteins individually.(B) The SHAP values of the five PBMC proteins were prioritized using the machine learning model.(C) The SHAP values of the seven serum proteins were prioritized using the machine learning model.(D) Boxplots of the selected biomarker proteins from the PBMC samples.(E) Boxplots of the selected biomarker proteins from the serum samples.Asterisks in (D) and (E) indicate the statistical significance based on the unpaired two-sided Welch's t-test.Specifically, the P-values are: * < 0.05; ** < 0.01; *** < 0.001.Group 0: the seronegative group; Group 1 + 2: the seropositive group.

Figure 3 .
Figure 3.Comparison of the immune responses in the seropositive and the seronegative groups using the PBMCs proteome.(A) Identification of NAb status-associated proteins in PBMC using volcano plot analysis on D0, D28, and D57 (two-sided unpaired Welch's t-test).Proteins with B-H adjusted P-value < 0.05 with |log 2 (fold change)| > 0.25 were considered as significantly differential expression.(B) Heatmap of the proteins that were significantly regulated in (A).The expression of each protein is shown for both immune response groups and at D0, D28, and D57.(C) Heatmap of the most significantly differentially enriched pathways between the seropositive and the seronegative groups generated using GSVA [B-H adjusted P-value < 0.05, |log 2 (fold change)| > 0.25].(D) Barplots visualizing the inferred proportions (mean ± standard error of mean) of 20 immune cell types.(E) Barplots visualizing the average proportions (mean ± standard error of mean) of B cells, T cells, and several innate immune cells, in the seronegative (Group 0) and seropositive (Group 1 + 2) groups at three time points.Asterisks in (D) and (E) indicate the statistical significance based on the Mann-Whitney rank test.P-value: * < 0.05; ** < 0.01; *** < 0.001.

Figure 4 .
Figure 4. Comparison of the immune responses in the seropositive and the seronegative groups using the serum proteome.(A) Identification of NAb status-associated proteins in serum using volcano plot analysis on D0, D28, and D57 by the two-sided unpaired Welch's t-test.Proteins with B-H adjusted P-value < 0.05 with |log 2 (fold change)| > 0.25 were considered as significantly differential expression.(B) Heatmap of the proteins that were significantly regulated in (A).The expression of each protein is shown for both immune response groups and at D0, D28, and D57.(C) The most significantly enriched networks generated using significantly dysregulated proteins from the serum proteome.Proteins involved in plasma lipoprotein remodeling and neutrophil degranulation are shown with their expression levels in the seropositive and the seronegative groups at three time points.The cut-off of the dysregulated proteins was set at P-value < 0.05 and |log 2 (fold change)| > 0.25.The proteins highlighted with a red * had B-H adjusted P-values < 0.05, while those with a black * were selected from our optimized machine learning models.Group 0: the seronegative group; Group 1 + 2: the seropositive group.
Figure 5. Proteomics of seropositive and seronegative individuals 180 days after CoronaVac ® vaccination.(A) Workflow for generating a model to predict the antibody persistence till D180.We discarded participants from Group 0 (the seronegative ones) and those without clinical indicators on D180, the remaining two cohorts: a training cohort (Cohort 3, N = 107) and a test cohort for the validation (Cohort 4, N = 20).(B) SHAP values of the machine learning classifier trained with selected PBMC proteins.(C) Expression of the selected proteins from the PBMC samples.The asterisks indicate the statistical significance based on the unpaired two-sided Welch's t-test.P-value: * < 0.05; ** < 0.01; *** < 0.001.Group 3: the seronegative group on D180; Group 4: the persistently seropositive group.(D) Relative expression of the proteins selected for our model in the different cell type clusters of PBMCs [data from the Human Proteins Atlas (Karlsson et al., 2021)].

Figure 6 .
Figure 6.Functional and network analyses of the seropositive and seronegative groups' immune responses: a comparison between PBMC and serum data.(A) Chord diagrams of the most enriched pathways based on the significantly dysregulated proteins and the potential biomarker proteins.(B) Network analysis of the most significantly enriched pathways (with their P-values) based on the DEPs and the potential biomarker proteins.(C) Key PBMC and serum proteins characterized in seronegative and seropositive recipients.Proteins involved in the humoral immune response, neutrophil degranulation, network maps of SARS-CoV-2 signaling pathways, and regulation of lymphocyte migration are shown in this network with their corresponding expression levels in the seronegative and the seropositive groups.Group 0: the seronegative group; Group 1 + 2: the seropositive group; Group 3: the group that became seronegative before D180; Group 4: the group that was seropositive at least until D180.

Table 2 . Clinical metadata of the subjects for Group 3 and Group 4.
Data are shown as mean values (standard deviation in parentheses).Pearson χ 2 test or Fisher's exact test were used to analyze the categorical outcomes and Student's t-test or Welch's t-test for continuous outcomes.Hypertension was defined as systolic blood pressure ≥ 140 or diastolic blood pressure ≥ 90 mmHg.