Rh blood group D antigen genotyping using a portable nanopore based sequencing device: proof of principle

Abstract


Introduction
Nanopore sequencing, also known as third-generation sequencing (TGS) or single-molecule sequencing (SMS), enables fast and direct sequencing of single-stranded DNA molecules using biological pores. First proposed in the 1980s (1,2), nanopore sequencing overcomes limitations of sequencing-by-synthesis technologies, such as next-generation sequencing (NGS), and allows faster library preparation and real-time sequence data analysis. Although NGS has allowed for high-throughput sequencing while lowering the cost, the short reads generated during NGS library preparation have made de novo assembly of large genomes difficult because of repetitive DNA sequences (3-5). TGS does not rely on polymerase chain reaction (PCR) amplification but aims for SMS with real-time data analysis. The PCR-free approach in TGS abolishes sequencing biases introduced by PCR (6,7). The advancement of TGS has reduced the time needed for library preparation and sequencing from days to hours when compared to NGS (3,7).
In 2014, Oxford Nanopore Technologies introduced a small, portable nanopore-based sequencing device, the MinION (8,9), which offers different cost-efficient sequencing kits to meet various sequencing needs. MinION sequencer technology is based on a flow cell containing 512 pores derived from Escherichia coli curli (2,10), embedded in a synthetic membrane submerged in ionic solution (7,8). When a voltage is applied, a DNA molecule is driven through a pore, changing the ionic current running through the pore in a distinctive manner described as a "squiggle" (2,7). These changes are measured by a sensor thousands of times per second (11) and are then translated into nucleotides by software, in a process known as basecalling.
The MinION sequencer has been used in infectious agent surveillance and clinical diagnosis, since these areas would benefit the most from real-time sequencing technology (3). Studies have shown the great potential of the MinION, for example, during the Ebola and Zika virus outbreaks (14,15). The technology was also used to sequence SARS-CoV-2 during the COVID-19 pandemic (16-18).
Several studies have used the nanopore sequencer to detect DNA and RNA modifications, such as methylation in bacterial and mammalian genomes (19-21).
Although the use of nanopore sequencing has not been widely investigated in blood group genotyping (BGG), it has been shown to be effective for clinical genotyping of human leukocyte antigens (22-24). Another real-time SMS technology has been used to genotype the atypical chemokine receptor 1 (ACKR1) gene that encodes the Duffy blood group antigens, resulting in the establishment of ACKR1 allele-specific reference sequences (25). MinION has also been used for genotyping of the ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase (ABO) gene (26) by sequencing a 7 kb amplicon covering exons 6 and 7, which allowed the differentiation of six ABO genotypes.
The Rh blood group system (ISBT004) is the second most important blood group system after ABO (27,28) and one of the most polymorphic blood group systems. The Rh protein is expressed in a complex transmembrane structure, where the RhCcEe protein is encoded by the Rh blood group CcEe antigens (RHCE) gene and the RhD protein is encoded by the Rh blood group D antigen (RHD) gene (29,30). The D antigen is the most clinically significant antigen in the Rh system because of its high immunogenicity and because it is the main cause of hemolytic disease of the fetus and newborn. In 2018, the RHD gene was fully sequenced using NGS (31) on the Ion Personal Genome Machine (Ion PGM), and allele-specific reference sequences were established. In this study, we tested the suitability and efficiency of the MinION sequencer in BGG by fully sequencing the RHD gene and comparing the results to RHD genotyping results obtained previously using Ion PGM (31,32).

Sample processing
One blood donor sample (National Health Service Blood and Transplant (NHSBT), Bristol, United Kingdom) and 12 genomic DNA (gDNA) samples from Finnish pregnant females phenotyped as RhD negative, supplied by the Finnish Red Cross Blood Service (Helsinki, Finland) with full ethical approval, were genotyped for the RHD gene, and the results were published in 2020 (32). The serology testing performed on the samples from the Finnish pregnant females included a red cell antibody adsorption and elution test for the RHD blood group DEL phenotype, in which there is a low number of D antigens per red cell (32).
The blood donor sample was serologically phenotyped for different blood group antigens by the NHSBT. The sample was received in an ethylenediaminetetraacetic acid tube, which was centrifuged at 2500 g for 10 minutes at room temperature. The plasma in the top layer was carefully discarded and the buffy coat was collected into a 1.5-mL tube; the remaining content was discarded. gDNA was extracted from the buffy coat using the QIAamp DNA Blood Mini kit (Qiagen Ltd, United Kingdom) as previously described (31).
The RHD gene was amplified from gDNA samples in 6 overlapping amplicons using previously described primers (31). Amplicons were then purified using Agencourt AMPure XP reagent (Beckman Coulter, United Kingdom) and pooled in a quantitative manner, to ensure equal representation of each amplicon, to yield a final amount of 1800 ng in a final volume of 48 µL.
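The pooling arithmetic can be illustrated with a minimal sketch. It assumes that "equal representation" means an equimolar pool, so each amplicon contributes mass in proportion to its length; the amplicon names, lengths and concentrations below are hypothetical placeholders and are not values from this study.

```python
def pooling_plan(conc_ng_per_ul, length_bp, total_ng=1800.0, final_ul=48.0):
    """Return (volume per amplicon in uL, water top-up in uL) for an equimolar pool.

    Assumption: equal molar amounts, i.e. the mass share of each amplicon
    is proportional to its length. Inputs are hypothetical.
    """
    total_len = sum(length_bp.values())
    volumes = {}
    for name, length in length_bp.items():
        mass_ng = total_ng * length / total_len      # mass share of this amplicon
        volumes[name] = mass_ng / conc_ng_per_ul[name]
    used = sum(volumes.values())
    if used > final_ul:
        raise ValueError("amplicons too dilute to reach the target in the final volume")
    return volumes, final_ul - used


# Hypothetical example: six amplicons of equal length at 100 ng/uL each
lengths = {f"amplicon_{i}": 10_000 for i in range(1, 7)}
concs = {name: 100.0 for name in lengths}
volumes, water = pooling_plan(concs, lengths)
```

With six equal-length amplicons, each contributes 300 ng, which is one sixth of the 1800 ng target.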

Library preparation and sequencing
The sequencing library was prepared following the 1D Native barcoding gDNA protocol using the Native Barcoding kit 1-12 and the Ligation Sequencing kit (SQK-LSK109) with 1D flow cells, R9 version (Oxford Nanopore Technologies, UK). A flow cell was placed in the MinION, which was then plugged directly into a USB3 port on a laptop running Windows 10. MinKNOW v1.13 software (Oxford Nanopore Technologies) connected to the MinION and ran default control checks on the quality of the sequencing pores. The flow cell was primed as per the manufacturer's guidelines and the sequencing library was then added. The sequencing run was started in the MinKNOW v1.13 software and left running for 12 h. Raw signal data (FAST5 files) were then transferred to an external hard drive for analysis.

Data analysis
The sequencing run produced 49 FAST5 files, each containing about 4000 reads. Guppy basecaller v3.2.4 (Oxford Nanopore Technologies) was used for basecalling the raw data (FAST5) and divided the reads into pass and fail FASTQ files by comparing the quality score of each read to a threshold of ≥ 7 (35). Only pass reads were carried forward in the analysis. Files were subjected to sequencing quality analysis using EPI2ME software (Oxford Nanopore Technologies).
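The pass/fail split can be sketched as follows. This is an illustration and not Guppy's code: it assumes that the per-read score is the PHRED-scaled mean of the per-base error probabilities and that quality strings use the standard FASTQ offset of 33. The function names and the toy reads are hypothetical.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Per-read quality: average the per-base error probabilities, then convert back to PHRED."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def split_pass_fail(records, threshold=7.0):
    """Split (read_id, sequence, quality) tuples into pass and fail read IDs."""
    passed, failed = [], []
    for read_id, _seq, qual in records:
        (passed if mean_qscore(qual) >= threshold else failed).append(read_id)
    return passed, failed


# Toy records: '+' encodes Q10 and '$' encodes Q3 at offset 33
reads = [("read_a", "ACGT", "++++"), ("read_b", "ACGT", "$$$$")]
passed, failed = split_pass_fail(reads)
```

Averaging error probabilities rather than raw quality values means that a few very poor bases pull the read score down more strongly than a plain arithmetic mean would.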
Barcoded reads were then trimmed using SeqKit software v0.7.1. A script was written in Bash to automate the analysis. Nanopolish software v.0.9.0 was used to index the reference human genome build 38 (hg38) chromosome 1 reference sequence (NC_000001.11), index the FASTQ files, and then map the reads to the reference, which generated BAM files. BAM files were then sorted and indexed using Samtools v.1.4.1 to generate BAI files. Variants were then called using Nanopolish software v.0.9.0 (36). The reads were visualised using Integrative Genomics Viewer v.2.5.3 (Broad Institute and the Regents of the University of California, United States) and CLC Main Workbench 10 software (Qiagen Ltd, United Kingdom).
Variant calling was also performed using CLC Main Workbench 10 software at 100x minimum coverage. The data were then compared to the sequencing data obtained from Ion PGM. Exonic and intronic mutations detected on both platforms were compared, and the variant tracks of the same sample were aligned for comparison.

Results
All samples were tested for RHD gene zygosity using droplet digital PCR (ddPCR); all samples were hemizygous for the RHD gene (one copy), except for two samples that were homozygous (two copies). Samples (n=13) (Table 1) were serologically phenotyped as RhD negative or weak D. The RHD gene was fully sequenced on the MinION sequencer using overlapping long-range PCR (LR-PCR) amplicons. Quality assessment of the MinION reads showed a mean PHRED quality score of 11 and a read length mode of 10,450 bp. The PHRED quality score is an integer value representing the estimated probability of an error in the identification of a base; for example, a score of 10 indicates a 1/10 probability of an incorrect base, or 90% confidence in the called base.
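The relationship between a PHRED score Q and the error probability P is Q = -10 log10(P). A short sketch of the conversion, with hypothetical function names, applied to the score of 10 used as the example above and to the mean score of 11 reported here:

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """PHRED score corresponding to an error probability p."""
    return -10 * math.log10(p)


for q in (10, 11):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.3f}, confidence {100 * (1 - p):.1f}%")
```

Under this definition a score of 11 corresponds to an error probability of roughly 0.08 per base, assuming the reported score follows the standard PHRED scale.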
Data were analysed and mapped to the RHD hg38 reference sequence, and the alignments were visualised using Integrative Genomics Viewer software. As noted previously (31), the RHD human reference sequence in hg38 encodes a variant RHD allele, RHD*DAU0, defined by c.1136C>T (p.Thr379Met) in exon 9. Therefore, all 13 samples showed a homozygous SNP in exon 9, c.1136T>C (p.Met379Thr) (data not shown; 31).
Variant calling was performed and a variant track was generated for each sample, which was then compared to the variant track generated from the Ion PGM sequencing data for the same sample. All exonic single nucleotide polymorphisms (SNPs) detected in the 13 samples agreed with those detected in the Ion PGM data (Table 1). The RHD allele was determined in all samples sequenced except for one RHD homozygous sample, for which the results were inconclusive. In this sample, 4 heterozygous mutations were detected (c.48G>C, c.602C>G, c.667T>G, c.819G>A) (Table 1), suggesting the presence of a wild type RHD allele, which did not agree with the weak D serology result; the genotyping results therefore remained inconclusive. Intronic changes were also compared and agreed with the SNPs detected by Ion PGM (31), except for 6 SNPs. These 6 intronic SNPs, which were expected to be specific to the RHD reference sequence hg38 RHD*DAU0, were 25,286,520 T>C; 25,286,601 T>A; 25,286,605 A>T; 25,286,674 C>T; 25,286,732 A>G; and 25,295,850 A>G, and were mainly located in intron 2. These SNPs were most probably false positives arising from the assembly of the short reads generated during Ion PGM library preparation.
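The comparison of variant tracks between the two platforms amounts to a set comparison. The sketch below assumes that each call is reduced to a (position, reference base, alternative base) tuple; the function name and the toy calls are hypothetical and are not data from this study.

```python
def compare_calls(minion_calls, ion_pgm_calls):
    """Classify variant calls as concordant or platform-specific."""
    minion, ion_pgm = set(minion_calls), set(ion_pgm_calls)
    return {
        "concordant": sorted(minion & ion_pgm),
        "minion_only": sorted(minion - ion_pgm),
        "ion_pgm_only": sorted(ion_pgm - minion),
    }


# Toy calls with made-up positions, for illustration only
minion_track = [(100, "G", "C"), (250, "T", "G")]
ion_pgm_track = [(100, "G", "C"), (250, "T", "G"), (400, "A", "G")]
report = compare_calls(minion_track, ion_pgm_track)
```

Calls that appear on only one platform, such as the entry under "ion_pgm_only" in this toy example, are the ones that need manual review in a genome viewer.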

Discussion
In this study, SMS by MinION sequencing was used for RHD genotyping. Thirteen samples were sequenced and the results were compared to those obtained by Ion PGM. RHD genotyping using MinION proved to be successful, and the alleles determined agreed with those identified using NGS (Ion PGM). The RHD allele was determined for all samples except for one, in which the presence of two RHD variant alleles is expected.
Two samples showed the same novel variant (Val141fs/Val141Glu) but we confirmed that these samples were from two separate individuals.
In the sample where sequencing was inconclusive, which was determined to be RHD homozygous by ddPCR, 4 heterozygous exonic SNPs were detected: c.48G>C in exon 1, c.602C>G in exon 4, c.667T>G in exon 5, and c.819G>A in exon 6 (Table 1). These indicated the presence of two variant RHD alleles (compound heterozygote) (32): either RHD*09.03.01 (encoded by c.602C>G, c.667T>G, c.819G>A) with RHD*01.01 (encoded by c.48G>C; considered wild type), or RHD*09.04 (encoded by c.48G>C, c.602C>G, c.667T>G, c.819G>A) with RHD*01 (considered wild type) (Supplemental Figure 1). However, a wild type RHD allele that produced a normal RhD protein would not agree with the weak D reactivity in serology, as normal D would mask the weak D reactivity and the result would be RhD positive instead of weak D. Since the presence of an intact copy of either the RHD*01.01 or the RHD*01 allele is unlikely, genotyping results for this sample remain inconclusive. It is possible that the seemingly wild type copy of the RHD gene carried a deletion that was concealed by the presence of an intact copy of the mutated RHD gene (either RHD*09.03.01 or RHD*09.04). Because of the location of the RHD primers, variation in the promoter of the RHD gene cannot be ruled out for this sample. Variation in the Rh Associated Glycoprotein (RHAG) gene also cannot be excluded for this sample. Only DNA was available from this sample, so no mRNA sequencing could be performed from either cultured red cells or reticulocytes.
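The ambiguity for this sample can be shown with a small enumeration. The sketch assumes the allele definitions given in the Table 1 footnote and in the text (RHD*09.03.01: c.602C>G, c.667T>G, c.819G>A; RHD*01.01: c.48G>C; RHD*09.04: all four changes; RHD*01: none); the function name is hypothetical and the table is limited to these four alleles.

```python
from itertools import combinations_with_replacement

# Allele definitions as read from the text; not a complete allele table
ALLELES = {
    "RHD*01": frozenset(),
    "RHD*01.01": frozenset({"c.48G>C"}),
    "RHD*09.03.01": frozenset({"c.602C>G", "c.667T>G", "c.819G>A"}),
    "RHD*09.04": frozenset({"c.48G>C", "c.602C>G", "c.667T>G", "c.819G>A"}),
}


def candidate_pairs(het_variants, alleles=ALLELES):
    """List allele pairs that explain a set of unphased heterozygous variants."""
    observed = frozenset(het_variants)
    pairs = []
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        # Heterozygous call: each variant sits on exactly one of the two alleles
        if alleles[a] | alleles[b] == observed and not (alleles[a] & alleles[b]):
            pairs.append((a, b))
    return pairs


pairs = candidate_pairs(["c.48G>C", "c.602C>G", "c.667T>G", "c.819G>A"])
```

Two pairs remain, which is why unphased amplicon data cannot distinguish between them; reads spanning all four positions on a single molecule would be needed to resolve the phase.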
The advantages of using MinION over NGS are the faster library preparation, real-time sequencing analysis and sequencing of longer reads that allow for better assembly (3,6,36). In this study, MinION library preparation and sequencing time was reduced to one day, compared to 4 days for NGS, considering that library preparation started after the LR-PCR amplification and purification, which takes 3 days for 20 samples. The bioinformatics for basecalling and determination of variants with MinION sequencing takes ~1-2 days. Although LR-PCR amplicons were used to amplify the RHD gene for sequencing, direct sequencing of any target gene is the main goal with SMS. One prior study (37) employed target enrichment using biotinylated PCR-generated baits that allowed capturing of the targeted gene for MinION sequencing.
Errors in MinION sequencing occur in specific sequences, with an estimated error rate of 11% (38). At 40x depth of coverage, MinION may produce a false substitution or insertion every 10-50 kb and a false deletion every 1000 bp, which may hamper the detection of variants (38). According to Laver (12), the observed MinION error rate per base at a given quality value does not correspond to the error rate per base expected for that PHRED value. Even though the MinION sequencing quality score does not correspond to the PHRED score used for NGS technologies, it is still used as an error estimation score. With R6 MinION chemistry, MinION had an error rate of ~40% on single-read sequencing (12). In the current work, however, 1D flow cells of the R9 version were used for sequencing, which have shown a lower error rate (13).
We did not encounter any issue in calling variants, since high coverage across the gene was achieved, with up to 500x coverage in some regions. Exonic and intronic SNPs were detected and alleles were determined, which agreed with those found using NGS (Ion PGM). Variation in coverage is expected because multiple LR-PCR amplicons are sequenced. Eliminating the need for PCR amplification should speed up the library preparation process. This might be possible through targeted MinION sequencing using Cas9-guided adaptor ligation (39) or biotinylated PCR (38). Eliminating the PCR amplification step should also enable easier allele phasing, which is important in BGG for assigning alleles successfully in hemizygous samples and for identifying novel deletions, insertions or hybrid alleles.
Other challenges facing SMS are data handling, storage and analysis. The evolving nature of this sequencing technology makes it difficult to establish user-friendly software that would enable fast and accurate data analysis and make it suitable for clinical use. Currently, there are numerous published reports about the utilisation of MinION and about data handling and analysis. Most of these papers, however, focused on genome assembly and analysis for microorganisms (14,15). The human genome is larger and far more complex; therefore, more work is needed to explore the potential power of this approach in human genome sequencing and analysis, to improve sequencing accuracy and to develop user-friendly interfaces for data analysis (38).

Allele phasing was not possible due to the fact that PCR amplicons were used for sequencing. Possible alleles encoded by these exonic changes are either RHD*09.03.01 (encoded by c.602C>G, c.667T>G, c.819G>A) and RHD*01.01 (encoded by c.48G>C; considered wild type) or RHD*09.04

Authors' contributions: W.A.T. wrote the manuscript; W.A.T. performed experiments; W.A.T. and V.P.L. analysed data; S.M.T., S.S., and K.H. collected and processed Finnish samples; T.E.M. and N.D.A. supervised the study and revised the manuscript. All authors reviewed, edited and approved the manuscript.

Conflict-of-interest disclosure: N.D.A. is a consultant for Natera Inc. and has an expert testimony appointment with Wilmer-Hale.

Funding: The research was funded by King Abdulaziz University, Jeddah, Saudi Arabia.