Method for the extraction of circulating nucleic acids based on MOF reveals cell-free RNA signatures in liver cancer

Abstract Cell-free RNA (cfRNA) allows assessment of the health, status, and phenotype of a variety of human organs and is a potential biomarker for the non-invasive diagnosis of numerous diseases. Nevertheless, highly efficient and bias-free cfRNA isolation technologies are lacking because of the low abundance and instability of cfRNA. Here, we developed a reproducible, high-efficiency isolation technology for different types of cell-free nucleic acids (including cfRNA and viral RNA) in serum/plasma, based on the inclusion of nucleic acids by metal-organic framework (MOF) materials, which greatly improved isolation efficiency and preserved RNA integrity compared with the most widely used research kit method. Importantly, the quality of cfRNA extracted by the MOF method is about 10-fold that of the kit method, and the MOF method isolates more than three times as many different RNA types as the kit method. Using this MOF method, the whole-transcriptome mapping characteristics of cfRNA in serum from patients with liver cancer were described, and a signature of six cfRNAs was identified that diagnoses liver cancer with high diagnostic efficiency (area under curve = 0.905 in the independent validation cohort). Thus, this new MOF isolation technique will advance the field of liquid biopsy, with the potential to diagnose liver cancer.


Blood sample preparation
Blood samples were collected in VACUETTE K3 EDTA tubes for plasma processing or Vacutainer tubes for serum processing. Blood samples were kept at room temperature and processed within 2 h after blood draw. The blood was centrifuged (10 min, 1,500g, 4 °C); the supernatant was transferred to fresh 1-2 mL tubes and centrifuged again (10 min, 3,000g, 4 °C); and the resulting supernatant was used as plasma or serum for follow-up experiments. The plasma or serum was used fresh or flash frozen and stored at -80 °C for long-term storage. Freeze/thaw cycles were avoided.

Commercial kit for circulating nucleic acid extraction
QIAamp ccfDNA/RNA Kit (QIAGEN, Germany)
The most widely used commercial method in scientific research for extracting cfNA from blood is the QIAamp ccfDNA/RNA Kit (QIAGEN, Germany, catalog no. 55184). The extraction technique is based on the physical properties of nucleic acids: their solubility is reduced in an ethanol environment, so they adsorb to the surface of the column material as a precipitate and are separated from other impurity molecules (proteins, salts, etc.). The nucleic acids are then eluted from the column material with water, achieving purification and enrichment; this is a common way of purifying and enriching nucleic acids. However, this method also has some problems. DNA and RNA differ in their physical properties, so their purification and recovery efficiencies differ, as do those of nucleic acids of different fragment lengths. This purification method therefore has a preference with respect to nucleic acid type and fragment length.

cfDNA/cfRNA extraction through MOF enrichment method
The extraction steps were as follows:
(1) Transfer 500 μL of plasma sample into a 1 mL EP tube, add guanidine thiocyanate solution to lyse the serum/plasma, vortex for 5 seconds, and let the mixture stand at room temperature for 3 minutes.
(2) Add ZnCl2 solution and immediately vortex for 1 min; a milky white insoluble matter appears. Place on ice for 3 min.
(3) Centrifuge at 12,000g for 5 min; a milky white precipitate can be seen, and the supernatant is a clear, light yellow liquid. Transfer the supernatant to a clean EP tube for later use (500 μL of plasma yields 500 μL of supernatant). Add 6 μL of Tris (1 M, pH = 10.89), mix well, and centrifuge briefly.
(4) Adsorption of cfDNA by the MOF material: add 60 μL of 10 mg/mL IRMOF-74-IV material to the solution obtained in step (3), and incubate the mixture on a rotator at room temperature for 2 hours.
(5) After the incubation, centrifuge the mixture at 12,000g for 15 minutes, remove the supernatant, and collect the precipitate (a pink solid).
(6) Destruction of the MOF material to release cfDNA: add 20 μL of 2 M HAc (pH 2.06) to the pellet, mix by pipetting for 1 min, and leave at room temperature for 5 min. The pink solid disappears and a white flocculent insoluble matter appears.
(7) Purification of the product of step (6) by ice-cold ethanol precipitation to obtain cfDNA/cfRNA:
(a) Add 80 μL of water to the mixture obtained in step (6), then add 10 μL of sodium acetate (3 mol/L, pH = 5.2) and 1 μL of glycogen, and mix well.
(b) Add 300 μL of pre-chilled ice-cold ethanol, mix well, and place at -80 °C for 30 minutes to 2 hours, or overnight.
(c) Centrifuge at 12,000g for 20 minutes, carefully remove the supernatant, and aspirate all droplets on the tube wall.
(d) Add ice-cold 75% ethanol to half of the centrifuge tube capacity, pipette three times, centrifuge at 12,000g for 10 minutes, carefully remove the supernatant, and aspirate all droplets on the tube wall.
(e) Leave the uncapped EP tube on the laboratory bench for 5 minutes at room temperature to evaporate the remaining liquid to dryness.
(f) Add 20 μL of RNase-free water to dissolve the cfDNA/cfRNA solid at the bottom of the EP tube, and store at -20 °C for later use or at -80 °C for long-term storage.
The resulting solution contains the circulating cell-free nucleic acids, including cfDNA and cfRNA, and was used in the downstream cfDNA experiments.

Purification of cfRNA
Add 1 μL of DNase I and 2 μL of DNase I buffer to the solution (obtained from part 3.1, step 7) and react at 37 °C for 1 h. Then purify the cfRNA with an RNA purification kit, RNA Clean & Concentrator™-5 (ZYMO RESEARCH, R1013, USA), following the manufacturer's instructions.

Characterization of the MOF material
The material used in this work is the same as that reported in our previous work, in which we fully characterized the material, including PXRD, N2 adsorption analysis, and the stability of the MOF materials in various buffer conditions. In this project, we also performed PXRD characterization for each synthesized material.
The material was incubated with lysed blood for 2 h and then centrifuged to obtain the MOF material with adsorbed circulating nucleic acids (cfNA). The peak patterns of its PXRD spectrum (Figure S1, blue line) are identical to those of the untreated material (Figure S1, red line), indicating that the crystal structure is unchanged and the material is stable during the adsorption of cfNA. The next step is the elution of cfNA from the MOF material. We took advantage of the instability of the MOF material under acidic conditions and used HAc to destroy its structure and release the nucleic acids. We therefore tested the stability of the material after incubation with HAc (2 M, pH = 2.06); the PXRD results (Figure S1, yellow line) showed that the crystal structure was completely destroyed after the addition of HAc. In summary, the MOF is stable and maintains its crystal structure during the adsorption of free nucleic acids from lysed plasma, whereas the addition of HAc during elution completely destroys the crystal structure so that the cfNA can be released.
Our previous work demonstrated that Co-IRMOF-74-IV has a high adsorption capacity for DNA/RNA and adsorbs nucleic acids inside the MOF pores [Nat. Commun., 2018, 9(1): 1293; J. Am. Chem. Soc., 2020, 142, 5049-5059], so we applied this capacity to adsorb the very low amounts of circulating nucleic acids in the blood. These nucleic acids can be used as markers for disease diagnosis, but because of their extremely low levels in the blood, silica gel column and magnetic bead methods do not extract them efficiently. Because the Co-IRMOF-74-IV material is unstable under acidic conditions, the circulating nucleic acids were released by destabilizing the structure of the MOF material with HAc (2 M, pH = 2.06). Finally, the released nucleic acids were purified by ice-cold ethanol precipitation and used for downstream analytical experiments. The nucleic acid extraction process is shown in Figure 1 in the main text.

Evaluation of recovery efficiency and fragment integrity using the MOF method
Recovery efficiency of short DNA/RNA added to plasma
Figure S3. Comparison of the recovery yield of oligonucleotides of different lengths added to plasma, using the MOF method and the kit method, by polyacrylamide gel electrophoresis. The sequences used in this experiment are listed in Table S8.

cfDNA detection
First, plasma from multiple individuals was pooled; cfDNA was then extracted from 250 μL and 1000 μL of plasma by the MOF enrichment method and the commercial kit, respectively. Finally, all cfDNA was dissolved in 20 μL of H2O and used as the template for qPCR detection and Qubit dsDNA quantification. These assays were performed as three biological repeats by three different laboratory staff members.

Quantitative detection of cfDNA by Qubit kit
The concentration of cfDNA was measured with the Qubit™ 1X dsDNA HS Assay Kit (Invitrogen™, Q33230) on a Qubit™ Flex Fluorometer, following the manufacturer's instructions.
Table S1. Total quantity of cfDNA measured by the Qubit™ 1X dsDNA HS Assay

Analyses of WGS data from cfDNA
Whole-genome NGS data for cfDNA samples, consisting of 150 bp paired-end reads from Illumina sequencing, were first subjected to adaptor and quality trimming using cutadapt (2). Reads shorter than 25 nt after trimming were excluded. Processed reads were then mapped to the human genome (hg38) by Bowtie2 with the default parameters. Coverage of each sample was calculated for each 100 kb bin, and the correlation between samples was calculated by plotCorrelation in deepTools 2.0.
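The binning and correlation step can be illustrated with a minimal sketch. This is not the deepTools implementation used in the study; it only shows, in plain Python, what counting reads per 100 kb bin and computing a Pearson correlation between two samples amounts to. The function names `bin_coverage` and `pearson`, and the representation of reads as a list of start positions on a single chromosome, are assumptions made for the example.

```python
from math import sqrt

BIN_SIZE = 100_000  # 100 kb bins, as stated in the text


def bin_coverage(read_starts, chrom_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin along one chromosome.

    read_starts: 0-based start positions, all smaller than chrom_length
    (hypothetical input format for this sketch).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In practice the per-bin counts would come from the aligned BAM files, and the comparison would run over all chromosomes; the sketch only makes the arithmetic explicit.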

cfRNA analysis
Quantitative detection of cfRNA by Qubit kit:
The concentration of cfRNA was measured with the Qubit™ microRNA Assay Kit (Invitrogen™, Q32880) on a Qubit™ Flex Fluorometer, following the manufacturer's instructions.

cfRNA library construction:
The cfRNA library was constructed with the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (Takara Bio USA, Inc., 635005), following the manufacturer's instructions.

Quantification of cfRNA fragments
The human hg38 genome and the list of transcripts (v31) were downloaded from Gencode (www.gencodegenes.org). Raw FASTQ reads were trimmed to remove adaptor contamination using cutadapt. A 10 bp random barcode (NNNNNNNNNN) was ligated to the fragments during library construction; this random barcode serves to distinguish PCR duplicates from genuinely different fragments with identical sequences. After removal of the random sequence, clean reads were aligned to the human reference genome using STAR 2.7.5a (3). Only properly paired and uniquely mapped alignments were retained for the downstream pipelines. featureCounts v2.0.1 was used to count reads per gene, and the numbers of different RNA types were counted based on the annotation file from Gencode.
6.1 Comparison of cfRNA quantity between the MOF method and the kit method (Qiagen)
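The role of the random barcode can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the 10 bp barcode sits at the start of the read and that a fragment is identified by its mapped coordinates plus its barcode. The function names and the tuple layout are hypothetical.

```python
UMI_LENGTH = 10  # length of the random barcode stated in the text


def split_umi(read_seq, umi_len=UMI_LENGTH):
    """Split a read into (barcode, remaining sequence).

    Assumes the barcode occupies the first umi_len bases of the read;
    the actual position depends on the library design.
    """
    return read_seq[:umi_len], read_seq[umi_len:]


def dedup_by_umi(alignments):
    """Keep one alignment per (chrom, start, end, umi) combination.

    alignments: iterable of (chrom, start, end, umi) tuples
    (hypothetical format). Reads sharing coordinates AND barcode are
    treated as PCR duplicates; reads sharing coordinates but carrying
    different barcodes are kept as distinct fragments.
    """
    seen = set()
    unique = []
    for aln in alignments:
        if aln not in seen:
            seen.add(aln)
            unique.append(aln)
    return unique
```

Exact matching of barcodes is the simplest possible rule; tolerating sequencing errors in the barcode would need an additional clustering step that is not shown here.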
To demonstrate that DNase I completely degraded the cfDNA, we introduced a 400 nt dsDNA to show the efficiency of DNase I degradation. We added the 400 nt dsDNA to a mixture of cfDNA and cfRNA extracted from plasma, then added DNase I and reacted at 37 °C for 1 h, and finally analyzed the reaction products by 1% agarose gel electrophoresis. As shown in Figure S7, the electrophoretic bands of the 400 nt dsDNA and cfDNA disappeared after incubation with DNase I, indicating that both were completely degraded by DNase I and that no DNA interfered in the system.
cfRNA consists of molecules of various lengths at low abundance, and the gel was stained with GelRed dye, which images single-stranded RNA less effectively; therefore, no cfRNA bands can be seen in the gel electrophoresis images. The above steps for the degradation of the 400 nt dsDNA and cfNA are consistent with the steps for obtaining cfRNA in the main text, indicating that DNase I degradation of DNA is complete and that pure cfRNA can be obtained. The quality of sequencing is closely related to the quantity of input RNA during library construction, so when we compared the two methods by sequencing, the input quantity was the same (calculated according to the Qubit microRNA assay). In response to the reviewer's comment, we recalculated the cfRNA comparison data of the two methods in the main text to obtain the sequencing depth and duplication rate. These new results have been added to the revised supplementary information (Figure S9). The data show that, for the same amount of input, the sequencing depth of the MOF method is higher than that of the kit method and its duplication rate is much lower. In other words, the kit method yields a low number of cfRNA species and results in libraries with low complexity. This detailed information about the library preparation of the two methods also confirms the better RNA diversity obtained by the MOF method. To increase the accuracy of the comparison, we repeated the analysis of cfRNA content, fragment length, and gene type in another batch of plasma samples, and the results are as follows.
The same mixed plasma sample was extracted by the MOF method and by VAHTS, and the amounts of cfDNA and cfRNA were then measured separately. As shown in Figure S15, the amount of cfDNA extracted by VAHTS is 1.18 times higher than that extracted by the MOF method (Figure S15A). The same result can be observed in the qPCR assay data (Figure S15B-C). For the cfRNA comparison (Figure S15D-F), the VAHTS method was not effective in extracting 18S rRNA of cfRNA from plasma (Ct values were the same as those of the no-template control group).
Such a result was expected because the operation manual of the VAHTS method states that it is only suitable for cfDNA extraction. We hypothesize that this is because the buffer conditions under which the magnetic beads adsorb and elute DNA and RNA differ, so RNA and DNA cannot be extracted at the same time.
The HCV copy numbers obtained by the two methods are shown in Table S2; we then tested the consistency of the two methods.
Table S2. HCV copy number detected by MOF method and clinical method.
In this study, the kappa statistical test was used to analyze the consistency of the two methods in detecting HCV copy number in 30 specimens, showing that the two methods had high consistency (kappa = 0.724, P = 0.000063). Both methods gave positive results for 10 samples and negative results for 17 samples. Three samples were negative by the clinical test but positive by the MOF method, and only 1 sample was positive by the clinical test but negative by the MOF method (Table S3). It should be noted that the HCV copy number of the 4 samples with inconsistent detection is very low, and the classification of these samples needs further verification. However, in samples with a high HCV copy number, the positive diagnostic consistency of the two methods was extremely high. We analyzed the correlation of HCV copy number between the two methods in the 10 samples positive by both, and the results were consistent (r = 0.754, P = 0.011788), as shown in Fig. S10.
The HCV copy number detected by the MOF method in the same sample is highly consistent with that of the clinical detection method, indicating that the MOF method for extracting HCV RNA from blood has high accuracy and good stability. In positive samples, the number of HCV copies detected by the MOF method is slightly lower than that of the clinical method, mainly because the clinical method consists only of a simple lysis followed by direct qPCR detection, whereas the MOF method also includes a nucleic acid purification step, which leads to partial loss. The purified nucleic acid is suitable for small-volume enzymatic ligation reactions for downstream high-throughput sequencing.
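For readers unfamiliar with the kappa statistic used in this comparison, the sketch below shows how Cohen's kappa is computed from a 2x2 agreement table of two binary tests. The function name and argument order are assumptions for illustration, and the counts in the usage note are placeholders, not the study data.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two binary tests A and B applied to the same samples.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement expected by chance from the marginal
    positive/negative rates of each test.
    """
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    p_observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_pos_b_neg
    b_pos = both_pos + a_neg_b_pos
    a_neg = n - a_pos
    b_neg = n - b_pos
    p_expected = (a_pos * b_pos + a_neg * b_neg) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

With placeholder counts of 20 concordant positives, 5 and 10 discordant samples, and 15 concordant negatives, `cohen_kappa(20, 5, 10, 15)` gives an observed agreement of 0.7, a chance agreement of 0.5, and therefore a kappa of about 0.4.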

HBV copy number detection by MOF-sequencing and comparison with clinical detection
Serum HBV DNA, hepatitis B surface antigen (HBsAg), and hepatitis B core-related antigens (HBcrAgs) have been shown to correlate with intrahepatic covalently closed circular DNA (cccDNA). Furthermore, serum HBV RNA has been considered a new biomarker, especially in virally suppressed patients with low detectable HBV DNA under nucleos(t)ide analogue therapy. Guo et al. summarized the reported correlations between serum HBV RNA and other serological markers and showed that serum HBV RNA and HBV DNA correlated well, with a correlation coefficient r of around 0.7. The overall correlation coefficient with serum HBsAg was lower than that with HBV DNA (6). It is worth mentioning that some patients in the CHB group had undetectable serum HBV DNA but detectable HBV RNA signals (Fig. 4E), probably because of the difference in extraction and detection methods: the clinical method used qPCR detection after serum lysis, whereas our method used transcriptome sequencing after extraction and purification. It may also be due to inconsistent changes in HBV RNA and DNA levels after drug treatment (6).

Identification of HCC-specific cfRNA candidates and evaluation of their potential diagnostic value for HCC
A considerable number of cfRNAs were upregulated in HCC samples compared with healthy controls (Figure S19). However, these upregulated cfRNAs rarely overlapped with the mRNAs upregulated in HCC tissues (Figure S20). We believe there are two reasons for the low overlap. First, cfRNA represents a mixture of transcripts reflecting the health status of multiple tissues, which affords broad clinical utility; however, several aspects of the physiological origins of cfRNA, including which cell types contribute cfRNA and which cellular RNA molecules are released into the bloodstream, remain unknown (Figure S17). Second, we set |logFC| > 1 to obtain cfRNAs with a greater degree of difference, and we required a base mean above 10 to obtain cfRNAs with relatively high baseline expression, thereby excluding a considerable number of cfRNAs that could potentially overlap with mRNAs in liver cancer tissues. Therefore, upregulated cfRNAs rarely overlapped with upregulated mRNAs in HCC tissues.
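The filtering criteria described here can be written as a short sketch. It assumes the differential expression results are available as a list of records with hypothetical field names (`gene`, `log_fc`, `base_mean`, `p_value`); the thresholds |logFC| > 1 and base mean > 10 come from the text, and P < 0.05 from the Figure S19 caption.

```python
def select_candidates(results, min_abs_log_fc=1.0, min_base_mean=10.0, max_p=0.05):
    """Return gene names passing all three differential expression cutoffs.

    results: list of dicts with keys 'gene', 'log_fc', 'base_mean',
    'p_value' (field names are assumptions for this sketch).
    """
    kept = []
    for row in results:
        if (
            abs(row["log_fc"]) > min_abs_log_fc
            and row["base_mean"] > min_base_mean
            and row["p_value"] < max_p
        ):
            kept.append(row["gene"])
    return kept
```

The base-mean cutoff is the step that removes lowly expressed cfRNAs, which is why tightening it reduces the overlap with tissue-derived gene lists.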
The characterization results are shown in Figure S1. The PXRD data of Co-IRMOF-74-IV were collected on a Rigaku Smartlab 9 kW diffractometer operated at 45 kV, 200 mA for Cu Kα (λ = 1.5406 Å). The good agreement between the pattern of activated Co-IRMOF-74-IV and that of the simulated model indicated that the material has the same crystal structure as the simulated one.

3.4. Stability of the MOF materials throughout the extraction-elution process and mechanisms of MOF nucleic acid extraction
First, we tested the PXRD of the MOF material throughout the whole extraction process.

Figure S4. Assessment of total RNA release efficiency and fragment integrity using the MOF method.
Thermal cycling was performed on a CFX-96™ Real-Time System (Bio-Rad, USA) under the following conditions: 95 °C for 5 minutes; 95 °C for 20 seconds; 55 cycles of 95 °C for 3 seconds and 60 °C for 30 seconds. Fluorescence was measured at 60 °C in each cycle (1). All assays were performed with at least three technical replicates.

Figure S5. Detection of ALU gene expression in cfDNA by qPCR using the MOF and kit methods (the experiment was repeated three times).
The cfDNA library was constructed with the NEBNext Ultra II DNA Library Prep Kit (NEB, E7645S); the solution from part 3.1, step 7 was used directly as input DNA without fragmentation.

Figure S6. Comparison of the fragment characteristics and correlation of cfDNA between the MOF and kit methods by sequencing. (A) Evaluation of the stability of the MOF method and its discrimination from the kit method using high-throughput sequencing. (B) Correlation of cfDNA expression levels obtained by the MOF method in three parallel experiments (MOF-1, MOF-2, MOF-3) and by the kit method. (C) Length distribution of these fragments.

Figure S7. 1% agarose gel electrophoresis of the reaction products of the cfDNA and 400 nt-dsDNA mixture after DNase I treatment.

Figure S8. Sequencing depth and duplication rate of cfRNA libraries prepared using the MOF method and the kit method, respectively.

Figure S9. Comparison of the quality, fragment distribution, and species distribution of cfRNA between the MOF and kit methods. (A) Correlation of cfRNA expression levels obtained by the MOF method in two parallel experiments (MOF-A, MOF-B) and by the kit method (different input serum volumes: 0.5 mL, 1 mL, and 2 mL). (B) Read counts of different types of RNA species for the MOF and kit methods. (C) Proportional distribution of different types of RNA species for the MOF and kit methods.

Figure S10. Total quantity of cfRNA obtained by the MOF and kit methods, measured with the Qubit™ microRNA Assay Kit.

Figure S11. Correlation of cfRNA expression levels obtained by the MOF method and the kit method from 0.5 mL plasma in two parallel experiments: MOF method (MOF-1, MOF-3) and kit method (kit-1, kit-3).

Figure S12. cfRNA fragment length distribution detected by the Agilent 2100 Bioanalyzer and calculated from sequencing data.

Figure S13. cfRNA fragment length distribution calculated from sequencing data.

Figure S14. Number of different RNA species detected by the MOF and kit methods (FPKM > 1).

Figure S16. Linear relationship between the quantification cycle (Ct) values and the log HCV RNA copy number in three different experiments.
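A linear Ct versus log copy number relationship of this kind is what allows a copy number to be read off from a measured Ct. The sketch below fits such a standard curve by ordinary least squares and inverts it; it is a generic illustration with hypothetical function names, and the calibration points in the usage note are invented for the example rather than taken from the study.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)
```

For example, invented calibration points at 10^2 to 10^5 copies with Ct values of 30.0, 26.7, 23.4, and 20.1 give a slope of about -3.3 and an intercept of about 36.6, so a sample with Ct 23.4 is estimated at roughly 10^4 copies.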

Figure S18. Gene Set Enrichment Analysis (GSEA) for differentially expressed genes between HCC and normal serum samples.

Figure S19. Volcano plots of differentially expressed genes between HCC and normal serum samples, HCC and CHB serum samples in the training set, and HCC and non-HCC tissue samples (TARGET GTEx dataset). |logFC| > 1 and P < 0.05 were used as the cutoff criteria for the differential expression analysis.

Figure S20. Overlapping features of differentially expressed genes between HCC and normal serum samples in the training set, HCC and CHB serum samples in the training set, and HCC and non-HCC tissue samples (TARGET GTEx dataset).

Figure S21. Distribution of the expression levels of the 14 cfRNA candidates in the HCC, CHB, and normal groups of the training set.

Figure S22. Selection of cfRNA candidates by lasso regression analysis with 1000 iterations for classifying patients with HCC and non-HCC.

Figure S23. ROC curves of the cfRNA signature for discriminating HCC from CHB and normal together, or from CHB and normal separately, in the training set.

Figure S24. Tissue-wise expression of the SETBP1 gene in different cancer types (data from the Gene Expression Profiling Interactive Analysis database, http://gepia2.cancer-pku.cn/#analysis).

Figure S27. Pearson's correlation between markers and age.

Table S3. Comparison of HCV-RNA detection results between the two methods

Table S4. HBV copy number detected by MOF assay and clinical assay

Table S5. Characteristics of the training population

Table S6. Characteristics of the validation population

Table S7. Literature survey on the function of the 6 genes and their association with liver disease

Table S8. Sequences of DNA/RNA used in this study

The interference of age (and gender) on the predictive ability of these markers and model score for HCC
Age may be a confounding factor in the diagnosis of HCC by the model in this work, and therefore the interference of age (and gender) on the predictive ability of these markers and the model score for HCC needs to be critically assessed. First, we assessed the correlation of the six markers and the model score with age and found that only C1QTNF4 and the model score showed a moderate, statistically significant correlation with age. The other markers showed some weak correlation with age that was not statistically significant (Figure S24). In the model equation, cfRNA-score = (-0.65310) * C1QTNF4 + 0.15879 * CYBA + 0.27373 * HMGA1 + 0.29307 * PCDHB3 + (-0.21822) * SETBP1 + 0.12482 * ZNF541 + (-2.25578), the score is contributed mainly by negatively weighted C1QTNF4 expression (the coefficient with the largest absolute value), which shows a negative correlation with age. Therefore, the effect of age on the predictive ability of the model score depends mainly on the effect of age on the predictive ability of C1QTNF4. Further, we used multifactorial logistic regression to assess the interference of age/sex on the predictive ability of the markers and the model score for HCC (Table S6). Without correction for any factors (model 1), age, CYBA, HMGA1, PCDHB3, and ZNF541 were risk factors for HCC, and C1QTNF4 and SETBP1 were protective factors for HCC. After correction for age (model 2) or for age and sex (model 3), the odds ratios (ORs) of the 6 markers and the score were only mildly altered. Probably because of the small sample size, CYBA and SETBP1 did not reach statistical significance after complete correction (model 3). Notably, C1QTNF4 remained an independent protective factor for HCC after correction for age (model 2) or for age and sex (model 3). In summary, our model was only slightly affected by age differences and demonstrated independent predictive value for hepatocellular carcinoma.
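The model equation can be evaluated directly, as in the sketch below. The coefficients and intercept are those given in the text; the function name, the dictionary input format, and the assumption that expression values are already on the scale used to train the model are illustrative only. No diagnostic cutoff is implied.

```python
# Coefficients and intercept as stated in the model equation above.
COEFFICIENTS = {
    "C1QTNF4": -0.65310,
    "CYBA": 0.15879,
    "HMGA1": 0.27373,
    "PCDHB3": 0.29307,
    "SETBP1": -0.21822,
    "ZNF541": 0.12482,
}
INTERCEPT = -2.25578


def cfrna_score(expression):
    """Linear cfRNA-score for one sample.

    expression: dict mapping each of the six marker names to its
    expression value (assumed to be normalized as in model training).
    """
    return INTERCEPT + sum(
        coef * expression[gene] for gene, coef in COEFFICIENTS.items()
    )
```

Because C1QTNF4 carries the coefficient with the largest absolute value, a one-unit change in its expression moves the score more than the same change in any other marker, which is the basis for the argument about age above.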

Table S9. The odds ratios (ORs) of the 6 markers and the score