Single Cell and Plasma RNA Sequencing for RNA Liquid Biopsy for Hepatocellular Carcinoma

BACKGROUND: Human plasma contains RNA transcripts released by multiple cell types within the body. Single-cell transcriptomic analysis allows the cellular origin of circulating RNA molecules to be elucidated at high resolution and has been successfully utilized in the pregnancy context. We explored the application of a similar approach to develop plasma RNA markers for cancer detection. METHODS: Single-cell RNA sequencing was performed to decipher transcriptomic proﬁles of single cells from hepatocellular carcinoma (HCC) samples. Cell-type-speciﬁc transcripts were identiﬁed and used for deducing the cell-type-speciﬁc gene signature (CELSIG) scores of plasma RNA from patients with and without HCC. RESULTS: Six major cell clusters were identiﬁed, including hepatocyte-like, cholangiocyte-like, myoﬁbroblast, endothelial, lymphoid, and myeloid cell clusters based on 4 HCC tumor tissues as well as their paired adjacent nontumoral tissues. The CELSIG score of hepatocyte-like cells was signiﬁcantly increased in preoperative plasma RNA samples of patients with HCC ( n ¼ 14) compared with non-HCC participants ( n ¼ 49). The CELSIG score of hepatocyte-like cells declined in plasma RNA samples of patients with HCC within 3 days after tumor resection. Compared with the dis-criminating power between patients with and without HCC using the abundance of ALB transcript in plasma [area under curve (AUC) 0.72)], an improved perfor-mance (AUC: 0.84) was observed using the CELSIG score. The hepatocyte-speciﬁc transcript markers in plasma RNA were further validated by ddPCR assays. The CELSIG scores of hepatocyte-like cell and cholangiocyte trended with patients’ survival. CONCLUSIONS: The combination of single-cell transcriptomic analysis and plasma RNA sequencing represents an approach for the development of new noninvasive cancer markers.


Introduction
The genetics and epigenetics of circulating cell-free DNA are being actively studied, leading to numerous novel diagnostic approaches for noninvasive prenatal testing and cancer detection (1)(2)(3)(4)(5)(6)(7)(8)(9). This field can be generally referred to as "liquid biopsies." On the other hand, there is much less interest being paid to the diagnostic and research uses of plasma transcriptome, with early work directed to the development of noninvasive prenatal markers (10,11). Tsui et al. demonstrated that plasma RNA sequencing could provide a holistic view of the maternal plasma transcriptomic repertoire, revealing a set of pregnancy-associated transcripts as well as the relative contributions from the placenta and maternal sources (12). Another study further confirmed that the use of plasma transcriptome would allow the obtaining of the relative transcriptional contributions to plasma RNA pool from different tissues (13). The plasma transcriptome thus possesses a wealth of information related to the tissues of origin.
A number of challenges, however, still exist in exploiting plasma RNA for diagnostic purposes. First, the quantities of tissue-specific transcripts in plasma are highly variable (12), which could be attributed by the variable cellular compositions of different cell types in that tissue, hampering the discovery of a set of robust transcript biomarkers. Second, many organ-specific transcripts derived from a bulk tissue might not be specifically associated with certain diseases. For example, the liver-specific ALB messenger RNA (mRNA) molecules in the plasma were reported to be increased both in patients with hepatocellular carcinoma (HCC) and those with chronic hepatitis B virus (HBV) infection, compared with healthy control participants (14).
A bulk tissue consists of a number of cell types. If one could dissect the cell-type-specific transcripts and determine the correlations between those transcripts and diseases, it might improve the signal-to-noise ratio regarding the targeted RNA molecules in plasma. In a previous study, we combined single-cell RNA sequencing and cellfree RNA sequencing for tracing the placental cellular dynamics during pregnancy in a noninvasive manner (15). More importantly, the expression of transcripts associated with extravillous trophoblast was significantly higher in pregnant women affected by preeclampsia than unaffected pregnant women. On the other hand, the expression of transcripts associated with other cell types in the placenta such as syncytiotrophoblasts, decidual cells, and endothelial cells appeared to exhibit no significant difference between these 2 groups (15). Thus, based on single-cell RNA sequencing technology, the dissection of cell-type-specific transcripts in the plasma would enhance the noninvasive detection of diseases.
In this study, we explored the feasibility of using cell-type-specific transcript signals in the plasma RNA pool to inform the presence of cancer. We used HCC, the most common type of liver cancer, as a model. Celltype-specific transcripts in plasma were determined by single-cell sequencing of HCC tumor and paired normal tissues. On the basis of plasma RNA sequencing of patients with HCC, the plasma RNA molecules contributed by such cell-type-specific transcripts were investigated.

SAMPLE COLLECTION
This study was approved by the Joint Chinese University of Hong Kong-Hospital Authority New Territories East Clinical Research Ethics Committee. Patients with HCC (n ¼ 14) were recruited, 20 mL of peripheral blood was drawn into EDTA-anticoagulated blood tubes before (preoperative samples) and after (postoperative samples) surgical resections. Seven of the 14 patients with HCC had consented to the collection of postsurgical samples. Postoperative blood samples (n ¼ 7) were taken within 72 h after tumor resection. For healthy controls (n ¼ 8), participants with chronic HBV infection with cirrhosis (n ¼ 23) and without cirrhosis (n ¼ 18), 20 mL of peripheral blood from each participant was drawn into EDTA-anticoagulated tubes. For survival data analysis, day 0 was defined as the day of surgical resection. Details about the participants are shown in Table 1 in the online Data Supplement.
We obtained the independent tumoral tissue samples (n ¼ 4) and the paired adjacent nontumoral tissue samples (n ¼ 4) for single-cell RNA sequencing on the basis of the frozen-section from the Department of Surgery of The Chinese University of Hong Kong at the Peripheral blood samples were centrifuged at 1600g for 10 min at 4 C, and the plasma portion was recentrifuged at 16 000g for 10 min at 4 C (16). The RNA materials extracted from plasma were subjected to the digestion of DNA using the TURBO DNA-free TM kit (Thermo Fisher Scientific), minimizing the influence of residual DNA. Details of plasma RNA sequencing library preparation, sequencing data process and analysis, identification of cell-type-specific gene signature (i.e., a gene preferentially expressed in a particular cell type), and the deduction of the cell-type-specific gene signature (CELSIG) score are provided in the Materials and Methods in the online Data Supplement. In brief, the CELSIG score was calculated by the mean log 2 -transformed expression level of cell-type-specific gene signatures in plasma RNA using the formula: Because a sufficient quantity of empty droplets would be required to apply Poisson statistics to ddPCR data analysis, we independently collected only 8 mL of preoperative (n ¼ 15) and postoperative peripheral blood for HCC (n ¼ 12). We also collected 8 mL of peripheral blood for non-HCC participants, including participants with chronic HBV infection with (n ¼ 10) and without cirrhosis (n ¼ 9). Details of the ddPCR design and analysis are provided in Supplemental Table 2 as well as Materials and Methods in the online Data Supplement.

SEQUENCING DATA AVAILABILITY
Sequence data for the participants studied in this work have been deposited at the European Genome-Phenome Archive (17), hosted by the European Bioinformatics Institute (accession no. EGAS00001005194).

STATISTICAL ANALYSIS
Kaplan-Meier survival curves were analyzed using Prism 8 (GraphPad). Receiver operating characteristics curves were plotted using R programming language (18). The DeLong test was performed using R. The Mann-Whitney U tests were performed using R. A P value <0.05 was considered statistically significant.

SINGLE-CELL TRANSCRIPTOMIC PROFILING OF CELLS FROM HCC TUMOR AND PAIRED ADJACENT NORMAL TISSUES
We used high-throughput droplet-based single-cell RNA sequencing (10x Genomics) to profile a total of 17 176 cells from 4 patients with HCC, consisting of 5782 cells originating from HCC tumor samples and 11 394 cells from paired normal liver samples, based on the analytical framework described previously (15). Briefly, the dissociated single cells were individually compartmentalized into each droplet. The transcripts served as templates for generating complementary DNA by barcoded primers comprising an oligo (dT) sequence, a 10-nucleotide (nt) molecular index for uniquely determining each mRNA strand, 16-nt barcode unique to each cell, and a universal sequence. All droplets were pooled together and broken to collect barcoded complementary DNA molecules for sequencing. The sequencing reads were traced back to the original cells accordingly to the barcode sequences, allowing for determining cell types and identifying cell-type-specific gene signatures (Fig. 1, A).
The median number of analyzable cells per sample was 1903 (range: 391-5400). The median number of detectable transcripts per cell was 6340 (range: 2122-16 812), corresponding to a median of 1718 genes per cell (range: 750-2898) (Supplemental Table 1). We performed clustering analysis by t-stochastic neighborhood embedding (t-SNE) (15). Cells were clustered into different groups according to the similarities of transcriptomic profile between cells. As shown in the t-SNE plot (Fig. 1, B), each dot indicated one cell. The t-SNE analysis displayed 6 major cell clusters (Fig. 1, B). Cluster C1 was transcriptionally defined as hepatocyte-like cells with the expression of liver-specific genes such as ALB, F2, APOA2, and HP; cluster C2 was defined as cholangiocyte-like cells, with the expression of cytokeratins (KRT7, KRT19), claudins (CLDN4, CLDN10) and MMP7; cluster C3 was deemed myofibroblast-like cells with the expression of genes specific to hepatic stellate cells such as ACTA2, and collagen-related genes (COL1A2, COL3A1); cluster C4 corresponded to endothelial cells with the expression of TEK, DNASE1L3, CLEC4G, and FCN3; cluster C5 represented lymphoid cells expressing CD3D, CST7, GZMA, and NKG7; and cluster C6 would be myeloid cells expressing CD163, CD86, AIF1, S100A8, S100A9, and MARCO (Fig. 1, B). Figure 1, C shows the expression level of a representative marker of each cluster in the t-SNE map. The full list of cell-type-specific genes is shown in Supplemental Table 3.

CELL-TYPE-SPECIFIC GENE SIGNATURES
We attempted to identify the cell-type-specific gene signatures that were preferentially expressed in a particular cell cluster. The gene-expression z-score was used to measure the expression preference of a transcript in a particular cell type. The z-score quantified a transcript expression level in a cluster, equivalent to the number of standard deviations from the mean of the remaining clusters. A z-score cutoff of 3 was used for defining the cell-type-specific transcript. In total, we identified 39, 15,21,20,19, and 20 cell-type-specific gene signatures from hepatocyte-like, cholangiocyte-like, myofibroblast-like, endothelial, lymphoid, and myeloid cells, respectively (Fig. 1, D). For example, a number of previously reported liver-specific transcripts such as TF, ALDOB, and F2 were overexpressed in the hepatocyte-like cells. In addition, we identified numerous transcripts specific to the hepatocytelike cells, including KNG1, FGL1, and HPD. Among them, FGL1 (fibrinogen-like 1) gene expression was very recently reported to be higher in HCC tissues than in adjacent normal tissues (19), but FGL1 had not yet been well studied in the plasma RNA pool.

PLASMA RNA POOL
We further studied whether the use of cell-free-specific gene signatures in plasma would be able to inform the presence of cancer. Based on plasma RNA sequencing results, CELSIG scores were calculated for healthy controls (n ¼ 8), patients with HBV infection with (n ¼ 23) cirrhosis and without (n ¼ 18) cirrhosis (i.e., HBV carriers), and preoperative patients with HCC (n ¼ 14), with 7 paired postoperative patients collected within 72 h after the tumor surgery. The CELSIG score represented the transcriptional contributions to the plasma RNA pool from different cell clusters.
A median of 185 million sequencing reads (range: 103-255 million) were obtained. Interestingly, the plasma CELSIG score related to hepatocyte-like cells was significantly increased in patients with HCC (median: 3.60; range 0.90-6.97) compared to the healthy controls and HBV carriers with and without cirrhosis (median: 1.54; range 0.28-7.24) (P ¼ 0.0002; Mann-Whitney U-test) (Fig. 2, A). The plasma CELSIG score related to hepatocyte-like cells quickly declined within 72 h after surgical tumoral resection (median: 1.99; range 0.84-3.41) (P ¼ 0.03; Mann-Whitney U-test) (Fig. 2, A). In contrast, no such changing patterns of the CELSIG score were observed in other cell clusters including cholangiocyte-like, myofibroblast-like, endothelial, lymphoid, and myeloid cell clusters (Supplemental Fig. 1, A). Furthermore, the CELSIG score of lymphoid cells was significantly decreased (P ¼ 0.006; Mann-Whitney U-test) in patients with HCC (median: 7.47; range 5.28-9.69), compared with non-HCC groups (median: 8.46; range: 5.55-9.98) (Fig. 2, B). However, the CELSIG score of lymphoid cells paradoxically appeared to be further decreased in postoperative patients, perhaps suggesting the surgical removal of liver cancer would trigger the release of confounding transcripts from other cell types to the plasma RNA pool.
To further study whether the transcript signatures of hepatocyte-like cells were mainly derived from the platelets, we obtained the expression levels of 39 signatures of hepatocyte-like cells in platelets for healthy control participants from a published study (20) and the normal liver tissues from transcriptomic BodyMap (accession number: GSE30611; Gene Expression Omnibus database). We found that the expression levels of transcript signatures of hepatocyte-like cells were much higher in the liver tissues [median fragments per kilobase per million mapped fragments (FPKM): 375.2; IQR: 194.9-857.9], in comparison with platelets (median FPKM: 0.006; IQR: 0-0.07). These data indicated that there was a low probability for those signature signals of hepatocyte-like cells mainly originating from platelets.

VALIDATION OF HEPATOCYTE-LIKE GENE SIGNATURES
PRESENT IN PLASMA USING DDPCR Figure 3, A and Supplemental Fig. 1, B show the expression levels in plasma RNA for representative transcripts specific to hepatocyte-like cells. Such hepatocyte-like cell specific transcripts were found to be generally increased in the plasma RNA of patients with HCC compared with non-HCC participants, whose expression levels were decreased after the surgical removal of HCC tumors (Fig. 3, A and Supplemental Fig. 1, B). For example, the FGL1 gene displayed nearly no expression in plasma of HBV carriers with and without cirrhosis (mean expression level in FPKM: 0.90; range: 0-14.78) (Fig. 3, A), according to the plasma RNA sequencing results. The FGL1 expression level was sharply increased in plasma of preoperative patients with HCC (mean expression level: 4.92; range: 0-29.59), rapidly declining after the surgical resection of HCC tumors (mean expression level: 0.50; range: 0-3.32) (Fig. 3, A). In contrast, the pattern of the ALB gene expression appeared to be less specific to the preoperative patients with HCC (Fig. 3, B).
To orthogonally validate whether the cell-typespecific gene signatures would be robustly detectable in the plasma RNA pool, we designed ddPCR assays to quantify the expression levels for 7 representative genes (TF, GSTA1, ORM2, FGL1, KNG1, ALDOB, HPD) out of the 39 hepatocyte-like specific genes ( Table 1). The gene expressions were normalized to the reference gene ACTB in plasma (Supplemental Fig. 2). The ACTB gene expression was chosen as the reference gene in this study because its expression level was shown to be relatively high (median FPKM: 1830; range: 933-3027), with the smallest variation across different participants [coefficient of variation (CV): 24%] when compared to 8 genes with a comparable expression level in plasma relative to the ACTB (median CV: 44%). These genes included RPS11, SH3BGRL3, PPBP, RPS27, HBB, AC138811, AC006064, and AL138963. All 7 transcripts specific to hepatocyte-like cells showed generally consistent expression patterns in plasma RNA across groups measured by ddPCR, when compared with plasma RNA sequencing results (Supplemental Fig. 2). Figure 3, C and D shows that the patterns of the FGL1 and ALB gene expressions in plasma were confirmed by the corresponding ddPCR assays. The dot plots of FGL1 and ALB gene expressions for individual patients across groups are shown in Supplemental Fig.  3, A and B.

SIGNATURES
We further investigated the potential clinical applications of using those cell-type-specific gene signatures identified from this study. Because we observed that the CELSIG score of hepatocyte-like cells was increased in patients with HCC whereas the CELSIG score of lymphoid cells was decreased, the ratio of CELSIG score of hepatocyte-like cells to lymphoid cells (i.e., H/L ratio) was used for studying the clinical potential of HCC detection. Receiver operating characteristic curve analysis revealed that H/L ratio achieved an area under the receiver operating characteristic curve (AUC) of 0.84 for differentiating between patients with and without HCC, suggesting feasibility in differentiating between patients with and without HCC. H/L ratio-based analysis appeared to be superior to that based on plasma ALB expression level (AUC: 0.72) (P ¼ 0.03; DeLong test) (Fig. 4, A). More importantly, when differentiating between HCC patients and patients with chronic HBV infection, H/L ratio resulted in an AUC of 0.82 that was higher than the ALB expression-based analysis (AUC: 0.68) (P ¼ 0.02; DeLong test) (Fig. 4, B). These results indicated that the combined analysis of single-cell and plasma RNA sequencing data might be superior to analysis based on a single plasma RNA biomarker. As the CELSIG score reflected the transcriptional contribution of a cell type to the plasma RNA pool, it might be correlated with the turnover of cell death of particular tissues/organs that would contribute RNA transcripts to the blood circulation. We envisioned that the CELSIG score of a cell type might correlate with disease outcomes. Thus, we analyzed the correlation between the patient survival data and the CELSIG score of each cell type. We divided the preoperative patients with HCC (n ¼ 14) into 2 groups based on a CELSIG score below or above the median value. The H/L ratio appeared to be correlated with survival data (Fig. 4, C) (i.e., the higher the H/L ratio, the shorter the survival time). However, H/L ratio analysis did not achieve statistical significance (P ¼ 0.15; Gehan-Breslow-Wilcoxon tests) (Fig. 4, C), perhaps because of the small sample size available for this analysis. Of note, we observed a significantly longer survival time for patients with lower CELSIG scores of cholangiocyte-like cells than those with higher scores (P ¼ 0.03; Gehan-Breslow-Wilcoxon tests) (Fig. 4, D).

Discussion
Here we demonstrate that the plasma RNA transcriptomic profiling in combination with single-cell RNA sequencing can be used for the development of noninvasive tumor markers. Single-cell RNA sequencing enabled a high-resolution mapping of cell types, thus allowing the identification of those disease-associated cell-type-specific gene signatures that were preferentially present in the plasma RNA pool.
This proof-of-concept study illustrated that the analysis of multiple plasma transcript signals of hepatocyte-like cells (i.e., CELSIG score) led to a better power of differentiating patients with and without HCC, compared with a single plasma transcript marker. One possible reason may be that a single transcript marker may not be universally present in all HCC tumors, resulting in a poor clinical sensitivity, as HCC tumors were reported to be highly heterogeneous (21)(22)(23). Another possible reason may be that a single transcript marker is expressed in multiple cell types, resulting in a decrease of clinical specificity. In contrast, the use of multiple transcript markers specific to a cell type would offer a new possibility to not only increase the sensitivity with more transcript signals available, but also improve the clinical specificity because of the removal of confounding transcript signals from other cell types.
Among the 39 transcripts identified to be specific to hepatocyte-like cells, the FGL1 (fibrinogen-like 1) gene expression level quantified by plasma RNA sequencing was elevated in patients with HCC, showing a potential for further development as a biomarker. The FGL1 was recently reported to be transcriptionally upregulated in HCC tissues and the HCC cell line (HepG2) (24). Furthermore, there was a significant increase of FGL1-positive cells by multiplex immunofluorescence staining in HCC tumors compared with the adjacent nontumoral tissues (19), and FGL1 was suggested as a prognostic marker. Therefore, the circulating FGL1 mRNA concentration may be a useful marker for diagnosis and prognosis.
Because the sample size was relatively small in this proof-of-concept study, a validation study with a large sample size would be warranted. It would be of interest to design an assay for plasma RNA targeted sequencing based on those 39 hepatocyte-like signatures identified in this work, exploring a cost-effective approach for HCC detection.
The ratio of the CELSIG score of hepatocyte-like cells to lymphoid cells (i.e., the H/L ratio) demonstrated a trend of correlation with overall survival, although the results did not achieve statistical significance. On the other hand, we observed that patients with HCC who had lower CELSIG scores of cholangiocyte-like cells exhibited a longer overall survival time. It has been reported that HCC tumor tissues can acquire the characteristic of positivity of cholangiocyte markers during the process of tumor progression, showing a higher positivity of cholangiocyte markers in larger-sized and more poorly differentiated tumors (25,26). We speculated that one possible mechanism may be that the release of mRNA from dying transdifferentiated cholangiocytes from hepatocytes during focal damage repair in HCC.
It has also been reported that RNA profiles of blood platelets derived from patients with cancer (i.e., tumor-educated platelets) are distinct from platelets of healthy individuals (20). However, in this study, there was a low probability of those signature signals of hepatocyte-like cells present in plasma that were contributed by platelets, as the expression levels of those hepatocyte-derived signatures were much lower in platelets in compared with liver tissues.
Those tissue-specific RNA transcripts that are readily detectable in plasma might be attributed to the potential protection of RNA species from degradation in blood circulation. For example, certain plasma RNA would be contained in extracellular membrane vesicles (27), and thus might be protected from degradation. Yao et al. reported that many mRNA fragments in plasma would be associated with protein-binding sites that could render such RNA species resistant to plasma nucleases (28). Whether such RNA protection might be associated with tissue specificity will be an interesting avenue for future research. Since a human genome could encode more than 80 000 mRNA transcripts (The GENCODE Project) (29), i.e., one order of magnitude more pervasive than mature micro RNAs (miRNAs) (approximately 2600) (30). Thus, theoretically, the information available for the analysis based on plasma mRNA transcripts would be 30 times more than that based on mature miRNAs. On the other hand, the miRNA species would be commonly bound by the argo-naute2 protein. Such miRNA-argonaute2 complexes might specifically stabilize plasma miRNAs (31). The potential lengthening of the half-life of the miRNAs in circulation would be one possible advantage for biomarker development.
In summary, this proof-of-concept study has demonstrated an integrative analysis of single-cell and plasma RNA sequencing data, shedding light on an approach for the development of new cancer markers.