C. Lloyd, D. Lowe, B. Edwards, F. Welsh, T. Dilks, C. Hardman, T. Vaughan; Modelling the human immune response: performance of a 10¹¹ human antibody repertoire against a broad panel of therapeutically relevant antigens, Protein Engineering, Design and Selection, Volume 22, Issue 3, 1 March 2009, Pages 159–168, https://doi.org/10.1093/protein/gzn058
Abstract
A large 1.29 × 10¹¹ antibody fragment library, based upon variable (V) genes isolated from human B-cells from 160 donors, has been constructed and its performance measured against a panel of 28 different clinically relevant antigens. Over 5000 different target-specific antibodies were isolated to the 28 antigens with 3340 identified as modulating the biological function (e.g. antagonism, agonism) of the target antigen. This represents an average of ∼120 different functionally active antibodies per target. Analysis of a sample of >800 antibodies from the unselected library indicates V gene usage is representative of the human immune system with no strong bias towards any particular VH–VL pairing. Germline diversity is broad with 45/49 functional VH germlines and 28/30 Vλ and 30/35 Vκ light-chain germlines represented in the sample. The number of functional VH germlines and Vκ light-chain germlines present is increased to 48/49 and 31/35, respectively, when selected V gene usage is included in the analysis. However, following selection on the antigen panel, VH1–Vλ1 germline family pairings are preferentially enriched and represent a remarkable 25% of the antigen-specific selected repertoire.
Introduction
Human antibodies are an important class of therapeutics for a wide range of diseases, from cancers to autoimmune conditions such as rheumatoid arthritis (Carter, 2006). In order to generate human antibodies the two most commonly used technologies are immunisation of transgenic mice (Gallo et al., 2000) and the generation and screening of large antibody fragment display libraries (McCafferty et al., 1990; Marks et al., 1991). Binding of these phage-displayed antibody fragments to an antigen of interest, followed by elution of the antibody–antigen complex, allows the enrichment of antigen-specific antibodies. Large libraries of phage-displayed antibody fragments have proved to be a rich source of fully human antibodies for potential clinical development (Edwards et al., 2003; Pukac et al., 2005; Thompson et al., 1999). The first large (>10¹⁰ distinct transformants) non-immunised phage display library capable of yielding high (sub-nanomolar) affinity antibody fragments to a given target was described in 1996 (CAT1.0 library) (Vaughan et al., 1996). This library was constructed from antibody variable (V) genes derived from 43 donors from B-cells derived from peripheral blood lymphocytes, tonsil and bone marrow. Antibody fragments with a Kd as low as 0.3 nM were subsequently isolated directly from this library. Over the following decade various groups have constructed similarly large libraries, using scFvs, Fab fragments (Knappik et al., 2000; Hoet et al., 2005), or VH–VL binding domains (Jespers et al., 2004), and sources of V genes ranging from immune tissues (Loset et al., 2005) to purely synthetic de novo constructs (Knappik et al., 2000). Selection of such libraries to a given antigen can give rise to several hundreds to thousands of different antibodies (Edwards et al., 2003; Miller et al., 2005).
As libraries increase in size to >10¹⁰ entities, it is predicted that the further increase in diversity will provide a greater number of antibodies with the desired properties such as epitope, affinity or potency (Perelson et al., 1979; Vaughan et al., 1996). This would likely enable the isolation of broad panels of antibodies that bind to functionally relevant epitopes that may result in disease modification. Further, some of these antibodies will likely have sufficient potency for therapeutic use without the need for further engineering. We have therefore designed and constructed an scFv phagemid library of >10¹¹ variants, incorporating V genes derived from human B-cells from spleen and foetal liver tissue (CAT2.0 library). The utility of this library as a source of specific antibody fragments with appropriate activity in biologically relevant assays, suitable for potential clinical development, is demonstrated across a range of antigen classes.
Materials and methods
CAT2.0 phagemid library
The CAT2.0 phage library is based upon the previously described CAT1.0 library (Vaughan et al., 1996). Spleen lymphocyte cDNA (Clontech) from 20 healthy donors was used as template for V gene amplification. VH and VL gene segments were amplified separately using germline-specific primers (Hutchings, 2001) and were subsequently cloned into a modified pCANTAB6 vector (McCafferty et al., 1994). This vector contains a [Gly₄Ser]₃ linker with flanking XhoI and ApaLI sites to facilitate independent heavy- and light-chain cloning. Over 100 independent bacterial transformations were carried out by electroporation, producing a total of 8.5 × 10¹⁰ transformants.
A second repertoire, from V genes amplified from foetal liver cDNA obtained from Clontech, Invitrogen and Stratagene (97 donors, age range: 18–40 weeks), was constructed in the same way. This resulted in a repertoire size of 3.3 × 10¹⁰ transformants. Combination of the original CAT1.0 with the spleen and foetal liver repertoires resulted in a combined library of 1.29 × 10¹¹ (CAT2.0). In addition to the germline-specific primers described previously (Hutchings, 2001), the following primers were also designed and used in the construction of the foetal liver repertoire:
HuK6BackApaL14: ACCGCCTCCACCAGTGCACTTGATGTTGTGATGACACAGTCTCC.
HuL3BackApaLm: ACCGCCTCCACCAGTGCACTTTCCTATG(AT)GCTGATGCAGCCACC.
HuL3BackApaLpae: ACCGCCTCCACCAGTGCACTTTCCTATG(AT)GCTGACACAGC(CT)ACC.
HuL4abBackApaL1: ACCGCCTCCACCAGTGCACAGCCTGTGCTGACTCAATC(AG)(TC)CC.
HuL4cBackApaL1: ACCGCCTCCACCAGTGCACTGCCTGTGCTGACTCAGCCCCCG.
HuL5bBackApaL1: ACCGCCTCCACCAGTGCACAGCCTGTGCTGACTCAGCCATC.
HuL5cBackApaL1: ACCGCCTCCACCAGTGCACAGGCTGTGCTGACTCAGCCGGC.
HuL5eBackApaL1: ACCGCCTCCACCAGTGCACAGCCTGTGCTGACTCAGCCACC.
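Several of the primers above contain parenthesised positions such as (AT) or (CT). Assuming this notation means "either base at this position" (the usual reading of a degenerate primer, though the source does not spell it out), the explicit sequences in each primer mix can be enumerated with a short Python sketch. The function name `expand_degenerate` is illustrative only.

```python
import itertools
import re


def expand_degenerate(primer: str) -> list[str]:
    """Expand '(XY)' alternatives in a primer into every explicit sequence."""
    # Each match is either a parenthesised set of alternatives or a fixed stretch.
    parts = re.findall(r"\(([ACGT]+)\)|([ACGT]+)", primer)
    # A parenthesised group contributes one base per option; a fixed stretch
    # contributes itself as the only option.
    choices = [list(alt) if alt else [fixed] for alt, fixed in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


# HuL4abBackApaL1 from the list above has two degenerate positions.
variants = expand_degenerate("ACCGCCTCCACCAGTGCACAGCCTGTGCTGACTCAATC(AG)(TC)CC")
for v in variants:
    print(v)
```

With two two-way positions the sketch yields four explicit sequences of equal length.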
Unselected libraries were sequenced to assess diversity and scFv VH–VL pairing frequency. Phage library aliquots were prepared using standard techniques (Sambrook et al., 1987).
Selection of phagemid libraries
Selections were performed on a variety of target proteins including peptides, soluble growth factors, cytokines and transmembrane receptors. Antigens were available as soluble recombinant protein or expressed on a cell surface. A range of presentation methods were utilised to perform panning and solution-phase selections as described previously (Hawkins et al., 1992; Vaughan et al., 1996). In each case two or three rounds of selection were carried out, followed by sequence analysis and screening.
Phage ELISA
Phage ELISA was carried out as described previously (Vaughan et al., 1996). ScFvs that bound specifically to the relevant antigen were defined as those giving a signal at least 3-fold over background and, in the large majority of cases, >10-fold higher than that seen on an irrelevant control antigen.
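The 3-fold cut-off described above amounts to a simple ratio test. The following Python sketch illustrates it; the function name, argument names and the example readings are hypothetical and are not taken from the study.

```python
def is_specific_binder(signal: float, background: float, min_fold: float = 3.0) -> bool:
    """Return True if the ELISA signal is at least `min_fold` times background."""
    if background <= 0:
        raise ValueError("background must be a positive reading")
    return signal / background >= min_fold


# Hypothetical absorbance readings, for illustration only.
print(is_specific_binder(signal=1.20, background=0.08))  # well above 3-fold
print(is_specific_binder(signal=0.15, background=0.08))  # below 3-fold
```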
Sequencing
ScFv DNA was amplified by PCR using vector primers pUC19 reverse and fdtetseq (Vaughan et al., 1996). Excess primer and dNTPs were removed using shrimp alkaline phosphatase (Promega) and Exonuclease I (New England Biolabs) according to the manufacturers' protocols. VH genes were sequenced with primers Gene 3 leader and LSeq. VL genes were sequenced with primer Myc Seq 10. Reactions were carried out using Big Dye Terminator V3.1 (Applied Biosystems). Samples were run and analysed on an Applied Biosystems 3700 DNA Analyser. ScFvs with novel amino acid sequences (having at least one amino acid different from the nearest related scFv) were retained for further analysis.
LSeq – GATTACGCCAAGCTTTGGAGC.
Gene 3 leader – TTATTATTCGCAATTCCTTTAGTTGTTCCT.
Myc seq 10 – CTCTTCTGAGATGAGTTTTTG.
pUC 19 reverse – AGCGGATAACAATTTCACACAGG.
FDTETSEQ 24 – TTTGTCGTCTTTCCAGACGTTAGT.
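The novelty criterion in the Sequencing paragraph (at least one amino acid different from the nearest related scFv) is equivalent to keeping one representative per distinct amino acid sequence, since any two non-identical sequences differ at one or more positions. A minimal Python sketch of that grouping follows; the clone names and the short sequences are made-up placeholders, not data from the study.

```python
def group_identical(clones: dict[str, str]) -> dict[str, list[str]]:
    """Group clone names by identical (case-insensitive) amino acid sequence."""
    groups: dict[str, list[str]] = {}
    for name, seq in clones.items():
        groups.setdefault(seq.upper(), []).append(name)
    return groups


# Hypothetical toy input: clone_2 duplicates clone_1, clone_3 differs by one residue.
clones = {
    "clone_1": "QVQLVQSGAEVKKPGS",
    "clone_2": "QVQLVQSGAEVKKPGS",
    "clone_3": "QVQLVQSGAEVKKPGA",
}
unique = group_identical(clones)
print(len(unique), "unique sequences")
```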
Sequence analysis and annotation
Antibody sequences were analysed and annotated using Blaze2 software. This application is designed for high-throughput analysis of large numbers of antibody sequences. Raw data from Applied Biosystems 3700 DNA analysers are fed into the server-side portion of the Blaze2 analysis pipeline. The first stage of analysis involves the identification of the read direction used for sequencing. This is achieved using BLAST (Altschul et al., 1990) to compare the query sequence with a database of known antibodies. If necessary Blaze2 will reverse complement the raw sequence depending upon the direction of the closest hit from the database. In the second stage of the analysis, the Kabat numbering scheme (Kabat and Wu, 1991) is applied to the antibody. The approach used is to annotate the antibody in a number of sections, filling in the gaps in the annotation (which represent the inserts) in a later stage. The heavy- and light-chain portions of the antibody are analysed separately and the results are stored in an Oracle database. The application of the Kabat numbering scheme to the antibody sequence allows the position of the CDR and framework regions to be inferred. In the third stage of the analysis, the germline assignments are determined. Blaze2 uses an internally curated database, containing a single allele per locus, to make the germline assignments. The closest germline match from the database is identified using BLAST.
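The first stage described above, orienting a raw read according to the strand of its closest database hit, can be illustrated in a few lines of Python. This is a sketch of the general technique only and not the Blaze2 implementation; the function names and the `hit_on_minus_strand` flag are assumptions, and the BLAST search that would supply that flag is not shown.

```python
# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(read: str, hit_on_minus_strand: bool) -> str:
    """Flip the read only when its best database hit was on the minus strand."""
    return reverse_complement(read) if hit_on_minus_strand else read


# Example using the Myc seq 10 primer sequence listed in the Sequencing section.
print(orient_read("CTCTTCTGAGATGAGTTTTTG", hit_on_minus_strand=True))
```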
Statistics
χ² tests were used to compare the distribution of VH and VL sub-families pre- and post-selection for CAT libraries.
Results
CAT2.0 phagemid scFv antibody fragment library construction
The CAT2.0 phagemid library was constructed in order to capture additional antibody diversity to the library previously described in 1996 (Vaughan et al., 1996). Splenic B-cells from non-immunised human donors, representing a secondary lymphoid tissue, and foetal liver B-cells, representing a primary lymphoid tissue, were used as alternative tissue sources for library construction. These repertoires of VH and VL genes were used to generate the CAT2.0 scFv library with a repertoire size of 1.29 × 10¹¹ (spleen repertoire = 8.5 × 10¹⁰; foetal liver repertoire = 3.3 × 10¹⁰; bone marrow, lymph node, peripheral blood from original 1996 repertoire = 1.1 × 10¹⁰). To assess the initial diversity of the library, bacterial colonies from the library transformation plates were picked at random and sequenced, generating 841 full-length V gene sequences which were aligned to human germline V gene segments (Figs 1 and 2). Extensive V gene diversity was observed in both the heavy- and light-chain repertoires. In total 45/49 of the functional VH germline gene segments were accounted for, with no particular gene segment dominating. Those utilising a segment from the VH3 family occurred most frequently, followed by the VH1 and VH4 families respectively. Within the light-chain repertoire there is a clear preference for usage of Vλ over Vκ and particular gene segments were represented frequently, e.g. Vλ1-c (6%), Vλ1-e (6%), Vλ2-a2 (6%), Vλ3-l (17%) and Vκ1-L12 (9%). Although these VL germline segments were observed in 44% of the antibodies sequenced, VL diversity in the remaining antibodies sequenced was high with a total of 28/30 Vλ and 30/35 Vκ functional germline segments present in this relatively small sample size.
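The combined repertoire size quoted above can be checked by adding the three component repertoires. A small Python sketch, using only the figures given in the text (the dictionary keys are illustrative labels):

```python
# Repertoire sizes (number of transformants) as reported in the text.
repertoires = {
    "spleen": 8.5e10,
    "foetal liver": 3.3e10,
    "original 1996 repertoire (CAT1.0)": 1.1e10,
}

total = sum(repertoires.values())
print(f"Combined CAT2.0 repertoire: {total:.2e} transformants")

# Share of the combined library contributed by each tissue source.
for source, size in repertoires.items():
    print(f"{source}: {size / total:.1%}")
```

The sum reproduces the stated library size of 1.29 × 10¹¹.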
Germline diversity of antibodies isolated from the CAT2.0 library: (A) VH germline gene usage, (B) Vλ germline gene usage and (C) Vκ germline gene usage. For each germline sequence, the proportion of antibodies isolated from the unselected library and the library post-selection are shown by blue and red bars, respectively. Figures are presented as a percentage of the total number of antibodies sequenced. Significant changes in frequency between the unselected and selected CAT2.0 library are highlighted (*P < 0.0001).
Two-dimensional dendrograms illustrating VH–VL pairing frequencies in the unselected CAT2.0 library (n = number of times each VH–VL pairing was observed).
Isolation of potent functional antibody fragments
The CAT2.0 library has been utilised on multiple antigen targets within our laboratory as a source for potent antibodies with biologically relevant functional activity. Table I summarises the use of CAT2.0 in 28 different antibody isolation programmes representing seven different antigen classes. With the exception of gp41 (HIV), all antigens described are human. Biologically relevant assay screens were established for each, based upon characteristics such as the ability to inhibit a particular ligand/receptor interaction (antagonist) or to mimic the bioactivity of a natural ligand (agonist). Assays were designed to facilitate large-scale screening of crude scFv preparations isolated directly from the library, as previously described (Edwards et al., 2003). Panels of between 5 and 636 unique functional scFv were identified per target antigen, with a mean of 119 scFv per antigen. The potencies of the lead scFv in vitro were all shown to be in the nanomolar range (EC50/IC50 values ranging from 0.09 to 250 nM), with many scFvs exhibiting single-digit nanomolar or sub-nanomolar potencies.
Table I. CAT2.0 library was used for selection upon 27 different human antigens and one viral peptide (HIV-gp41), representing seven different antigen classesᵃ
| Target name | Antigen class | No. of biologically active scFv | Assay potency [IC50/EC50 (nM)] |
|---|---|---|---|
| Ghrelin | Peptide | 68 | 0.09 |
| IL-21R | Receptor | 140 | 0.41 |
| LARC | Chemokine | 48 | 0.7 |
| GDF-8 | Growth factor | 23 | 1 |
| PSGL-1 | Receptor | 29 | 1 |
| PAI-1 | Protease inhibitor | 32 | 1 |
| GHR | Receptor | 35 | 1.2 |
| IL-6 | Cytokine | 468 | 1.2 |
| IGF1-R | Receptor | 21 | 1.4 |
| CXCL13 | Chemokine | 63 | 1.4 |
| GM-CSFRa | Receptor | 231 | 2.3 |
| TRAIL-R1 | Receptor | 7 | 3.4 |
| APRIL | Cytokine | 65 | 4 |
| BLys | Cytokine | 636 | 4 |
| IL-22 | Cytokine | 107 | 5 |
| TRAIL-R2 | Receptor | 13 | 6 |
| IL-15 | Cytokine | 132 | 10 |
| CD30L | Cytokine | 117 | 10 |
| LIGHT | Cytokine | 130 | 11 |
| IL-4R | Receptor | 182 | 12 |
| NKB | Peptide | 29 | 35 |
| NGF | Growth factor | 25 | 39 |
| IgE | Antibody | 270 | 39 |
| Type 1 transmembrane receptor | Receptor | 195 | 44 |
| TR2 | Receptor | 134 | 81 |
| VEGF-2 | Growth factor | 54 | 123 |
| gp41 | Peptide | 5 | 240 |
| PlGF | Growth factor | 81 | 250 |
| Range | | 5–636 | 0.09–250 |
ᵃ For each antigen, biochemical and/or biological assays were used to identify scFv with functional properties. The total number of unique biologically active scFv identified for each target antigen is shown alongside the highest potency (IC50/EC50) scFv isolated from the library.
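The summary figures quoted for Table I (3340 functional scFv in total, a mean of about 119 per antigen, and a range of 5 to 636) can be re-derived from the third column of the table. A minimal Python sketch, with the counts transcribed in table order; the variable names are illustrative:

```python
from statistics import mean

# "No. of biologically active scFv" column of Table I, in row order.
active_scfv = [
    68, 140, 48, 23, 29, 32, 35, 468, 21, 63, 231, 7, 65, 636,
    107, 13, 132, 117, 130, 182, 29, 25, 270, 195, 134, 54, 5, 81,
]

print("targets:", len(active_scfv))
print("total functional scFv:", sum(active_scfv))
print(f"mean per target: {mean(active_scfv):.1f}")
print("range:", min(active_scfv), "to", max(active_scfv))
```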
Analysis of VH and VL germline gene usage
Selection upon the 28 targets has generated sequencing information on vast numbers of distinct antibodies that specifically bind their target antigen. Sequence analysis and alignment has allowed the rapid assessment of V gene germline usage within this population. To date we have assembled unique VH and VL sequences from 5044 different antibodies isolated against a range of protein ligands, cell-surface receptors and peptides. Specific antibodies to these different antigens were identified by the biologically relevant assays described earlier and also by ELISA, where a positive antibody was identified as one which bound to its target with an assay signal at least 3-fold and, in the large majority of cases, >10-fold higher than that seen on an irrelevant control antigen (data not shown). Nucleotide sequences were determined for all biologically active and/or ELISA-positive antibodies and then organised into groups based on unique VH and VL amino acid sequences. This process resulted in a panel of 5044 antibodies that were all different in sequence by one or more amino acids. These were then aligned to human VH, Vλ and Vκ germline gene segments to fully assess the germline diversity. In addition, the germline usage of these selected antibodies was compared with the 841 antibodies picked at random from the unselected library (Fig. 1).
Virtually the entire repertoire of functional VH germline gene segments is captured in the CAT2.0 library, with 48/49 of the total VH gene segments observed (Fig. 1A). No antibodies utilising the VH gene segment, VH3-d, were identified in either the unselected or selected repertoires. This VH coverage was similar to that observed with the CAT1.0 library (data not shown). The 48 germline gene segments utilised were individually represented between 0.02 and 15.5% of the total. No one particular VH family was shown to dominate but a comparison of VH usage both before and after selection reveals that certain VH germline genes were preferentially enriched as a result of the selection process. For example, 32% of antibodies in the unselected library utilised a VH1 germline gene segment, increasing to 51% after selection. The most abundant VH1 germline gene segments were VH1-69 (from 5.5% in the unselected library to 15.5% after selection), VH1-46 (from 2.9 to 7.0%) and VH1-e (from 2.2 to 8.3%). VH3 germline usage remained largely consistent and was observed in 38% of antibodies in the unselected library and in 34% of antibodies after selection. VH4 family usage was seen to fall as a result of the selection process, from 20% in the unselected library to 10% after selection. Virtually every individual VH4 germline gene segment was observed at a lower frequency in the selected repertoire from its starting position in the unselected repertoire, the most striking examples being VH4-59 (from 7.9% of antibodies in the unselected library to 1.9% after selection), and VH4-39 (from 3.7 to 0.9%). Usage of the remaining VH4 segments fell ∼1–4-fold with only one exception, VH4-04, increasing in usage by ∼5-fold after selection.
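As an illustration of the kind of χ² comparison named in the Statistics section, the sketch below tests the shift in VH1 family usage (32% of 841 unselected antibodies versus 51% of 5044 selected antibodies). The counts are approximations reconstructed from those rounded percentages, not the authors' raw data, and the 2 × 2 layout (VH1 versus all other families) is a simplification of a full sub-family comparison. The function names are illustrative.

```python
import math


def chi_square(table: list[list[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Approximate counts: rows = unselected, selected; columns = VH1, other families.
vh1_unselected = round(0.32 * 841)
vh1_selected = round(0.51 * 5044)
table = [
    [vh1_unselected, 841 - vh1_unselected],
    [vh1_selected, 5044 - vh1_selected],
]

stat = chi_square(table)
print(f"chi-square = {stat:.1f}, P = {p_value_1df(stat):.2e}")
```

Under these assumptions the P value falls well below the 0.0001 threshold used in Fig. 1.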
VL germline gene usage in the CAT2.0 library was also extensive with a total of 28/30 Vλ and 31/35 Vκ functional germline segments observed (Fig. 1B and C). This represents a greater degree of VL diversity in the CAT2.0 library relative to the CAT1.0 library, since in the CAT1.0 library fewer Vλ (18 out of 30) and Vκ (10 out of 35) germline segments were observed (data not shown). As with the VH diversity, the frequency of VL germline gene usage in the unselected CAT2.0 library was different to that observed after selection. A clear preference for Vλ germlines was observed after selection, increasing from 64% of antibodies in the unselected library to 91% of antibodies after selection. The most prevalent Vλ family was Vλ1, observed in 20% of antibodies in the unselected library and 44% of antibodies post-selection. Vλ2 germline usage was also positively selected, from 9.2% of antibodies in the unselected library to 21.3% of antibodies after selection. Within these two families the most commonly observed germline segments after selection were Vλ1-c and Vλ2-a2, increasing in use from 5.8 to 15.8% and from 5.8 to 18.9%, respectively. Outside of these two common VL germline families other Vλ and Vκ gene segments were often observed, but in general these decreased in frequency relative to their starting levels in the unselected library, e.g. Vλ3-r and Vκ1-L12 both fell after selection (from 4.2 to 1.5% and from 9.1 to 5.0%, respectively). Furthermore, a total of three Vλ and nine Vκ germline gene segments present in the unselected library were not observed at all following selection, despite the much larger sample size.
VH and VL germline gene pairings
We have also assessed the frequency of particular VH–VL combinations observed in the unselected and selected CAT2.0 library (Figs 2 and 3). Theoretically, during library construction functional VH and VL germline genes could recombine in any one of 3185 possible combinations. We have analysed 841 antibodies from the unselected CAT2.0 library, a tiny fraction of the total library size. Despite this, a significant number of different VH–VL combinations were observed, with 490 of the possible different VH–VL combinations identified (Fig. 2). Of these only two (VH4-59/Vλ3-l and VH3-30.5/Vλ3-l) appeared on >10 occasions indicating that there is no strong bias towards any one particular VH–VL pairing within the unselected library. This is further exemplified by the pairing of two light-chain germline segments in the unselected CAT2.0 library, Vλ3-l and Vκ1-L12, to multiple VH germlines: Vλ3-l was observed in partnership with 35 of the 49 functional VH germline genes and Vκ1-L12 with 29 out of 49.
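The figure of 3185 possible pairings follows from combining each of the 49 functional VH germline genes with any of the 30 Vλ or 35 Vκ functional germline genes. A short Python sketch of that arithmetic, and of the share of pairings seen in the 841-antibody sample (variable names are illustrative):

```python
# Functional germline gene counts quoted in the text.
n_vh, n_vlambda, n_vkappa = 49, 30, 35

# Each VH can pair with any Vλ or Vκ segment.
possible_pairings = n_vh * (n_vlambda + n_vkappa)

observed_unselected = 490  # distinct VH–VL combinations seen in 841 antibodies
coverage = observed_unselected / possible_pairings

print("possible VH–VL pairings:", possible_pairings)
print(f"observed in unselected sample: {coverage:.1%}")
```

Roughly 15% of all theoretical pairings were therefore observed in the unselected sample.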
Two-dimensional dendrograms illustrating VH–VL pairing frequencies in the selected CAT2.0 library (n = number of times each VH–VL pairing was observed).
A different picture emerged when VH–VL combinations were studied following selection on antigen (Fig. 3). Again, the numbers of antibodies represent only a small fraction of the total library size (5044 antibodies), but after selection upon antigen 569 different VH–VL pairings were observed. There was a clear enrichment for particular VH and VL germline families, particularly VH1 and Vλ1, in preference to others such as VH4 and Vκ1. More detailed analysis identified the most commonly observed VH–VL pairing to be VH1–Vλ1 (in ∼25% of antibodies) which was enriched ∼5-fold relative to its starting frequency in the unselected libraries. Overall, a clear preference for Vλ as a partner for VH was observed with 91% of the observed VH–VL pairings utilising a Vλ light chain.
The two most common VL gene segments in the unselected library (Vλ3-l and Vκ1-L12) were also observed frequently after selection, in 17 and 5% of antibodies, respectively. However, other VLs that were not abundant in the unselected library were observed at much higher frequencies after selection, e.g. Vλ1-c, Vλ1-e and Vλ2-a2, which were found in 16, 12 and 19% of selected antibodies, respectively. Each of these VLs was observed to pair with virtually any VH, with the most promiscuous example, Vλ2-a2, observed in combination with 42 of the 49 functional VH germline genes. After selection on antigen the most common VH–VL pairings were VH1-69/Vλ1-c, VH1-69/Vλ1-b, VH1-e/Vλ1-1c, VH1-03/Vλ2-a2 and VH1-69/Vλ2-a2, which together accounted for 12% of all antibodies isolated. These common VH–VL pairings were all significantly enriched as a result of the selection process, since in the starting libraries they accounted for <1% of the antibodies sequenced. The frequency of other VH–VL pairings remained largely the same both before and after selection (e.g. VH4-28/Vλ1-c and VH3-30.5/Vλ2-a2), or decreased (e.g. VH4-59/Vλ3-l and VH1-24/Vλ3-r). Some VH–VL pairings observed in the unselected CAT2.0 library were not seen at all after selection, e.g. VH4-61/Vλ2-b2 and VH3-23/Vκ2-a19,a3.
Amino acid changes from germline
The CAT2.0 library was generated from human B lymphocytes isolated from both secondary lymphoid tissue (spleen, bone marrow, tonsil and peripheral blood) and primary lymphoid tissue (foetal liver).
To investigate antibody maturation in B-cells derived from both the primary and secondary tissue sets, we have compared the number of V gene mutations present in antibodies isolated from each tissue source. To simplify the analysis, the VH-CDR3 and VL-CDR3 domains, which are derived by VH–D–JH and VL–JL recombination events (Brack et al., 1978), were not included in this analysis because of their high inherent diversity.
Overall, antibody genes derived from secondary lymphoid tissues possessed a higher frequency of amino acid changes from germline compared with those from primary sources (Fig. 4A). An antibody from the CAT2.0 library possessed, on average, 11 framework mutations (seven in the VH, four in the VL) and eight CDR mutations (five in the VH, three in the VL). When normalised according to framework and CDR length, the analysis also demonstrates that it is the CDR regions, particularly VH-CDR1, VH-CDR2 and VL-CDR2, that contain a greater number of mutations by ∼2–3-fold relative to the framework regions (Fig. 4B).
Mean number of amino acid differences from germline observed in antibodies isolated from the secondary and primary lymphoid tissue components which make up the CAT2.0 library: (A) the mean number of amino acid differences from germline observed in both the VH (left of dotted line) and VL (right of dotted line) domains of antibodies isolated from the CAT2.0 library (framework regions (FR) – blue bars and CDR regions – red bars); (B) the mean number of amino acid differences from germline normalised to account for FR or CDR length [for VH, mean FR1 length across all germline sequences = 30 amino acids, FR2 = 14, FR3 = 32, CDR1 = 5.5 and CDR2 = 17. For VL (combined Vλ and Vκ), mean FR1 length = 22.5, FR2 = 15, FR3 = 32, CDR1 = 12.5 and CDR2 = 7.5]. Results are presented as percent mutation frequency, i.e. the likelihood of observing an amino acid change from germline within a given region of an antibody. VH and VL CDR3 and FR4 regions were not included in this analysis.
Antibodies derived from the primary lymphoid tissue component of the library possessed very few amino acid differences from germline and, where changes were observed, they were distributed equally amongst framework and CDR regions (Fig. 4B). On average an antibody derived from foetal liver B-cells possessed only four mutations in its framework regions (two in the VH, two in the VL) and only one mutation in the CDRs.
Discussion
The CAT2.0 human antibody library comprises >1011 individual members and has proven to be a useful resource for the generation of broad panels of distinct, functionally active antibodies against a wide spectrum of clinically relevant antigens. The library was generated from both spleen and foetal liver derived B-cells and is 10-fold larger than previously reported naïve repertoires (Vaughan et al., 1996; Loset et al., 2005). In addition, the Vλ and Vκ light-chain diversity is doubled relative to the original CAT1.0 library (Vaughan et al., 1996). The additional diversity in the CAT2.0 library has resulted in a more powerful phage antibody library from which to derive both greater numbers and greater quality of antibody candidates. For example, a further 460 antibodies specific for BLys were isolated from the CAT2.0 library compared with CAT1.0, and many of these were identified as functional inhibitors (Edwards et al., 2003).
Analysis of the VH and VL germline gene usage for antibody fragments selected from this library demonstrates broad coverage, with the large majority of segments utilised. Among the 841 unselected and 5044 selected antibodies sequenced, all but one of the 49 functional VH gene segments were observed. The only VH germline sequence not evident in our library sample, VH3-d, should have been amplified by the V gene-specific primers, suggesting that this particular gene was poorly represented in our donor pool and is used infrequently. Improved diversity in VL usage over the original CAT1.0 library was also achieved, with 28 of the 30 functional Vλ gene segments and 31 of the 35 functional Vκ gene segments seen in the combined unselected and selected library samples. Further light-chain segments might have been evident had greater numbers of scFvs been sequenced. Nevertheless, these data compare favourably with previously published in vivo V gene usage (Brezinschek et al., 1997; Knappik et al., 2000) and with observations in transgenic humanised mouse systems (Gallo et al., 2000). For example, in the unselected library, antibodies utilising VH3 germline segments occur most frequently, followed by VH1 and then VH4 family members. Within these families, genes such as VH3-30.5, VH3-23, VH4-59 and VH1-69 are most frequently observed. This ranking matches that reported in both of those settings, indicating that the diversity within the unselected library is appropriately representative.
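As a quick arithmetic check, the coverage figures quoted above can be converted to percentages. The sketch below is illustrative only; the dictionary keys and function name are chosen here and do not come from the original analysis.

```python
# (segments observed, functional segments known) for the combined unselected
# and selected samples, as quoted in the text.
OBSERVED = {"VH": (48, 49), "Vlambda": (28, 30), "Vkappa": (31, 35)}


def coverage_percent(observed, total):
    """Percentage of functional germline segments seen in the sample."""
    return 100.0 * observed / total


per_locus = {name: coverage_percent(o, t) for name, (o, t) in OBSERVED.items()}

# Pool the three loci to get an overall coverage figure.
total_observed = sum(o for o, _ in OBSERVED.values())
total_functional = sum(t for _, t in OBSERVED.values())
overall = coverage_percent(total_observed, total_functional)

for name, pct in per_locus.items():
    print(f"{name}: {pct:.1f}% of functional germline segments observed")
print(f"Overall: {total_observed}/{total_functional} = {overall:.1f}%")
```

Coverage is highest for VH and lowest for Vκ. Because far fewer Vκ than Vλ antibodies are recovered after selection, this is the locus where deeper sequencing would be most likely to reveal additional segments.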
After selection upon antigen, antibodies utilising a VH1 gene segment are most frequently observed, whereas those utilising VH3 genes remain at similar frequencies pre- and post-selection. This is contrary to a previous report analysing isolated human B-cells (Brezinschek et al., 1997), which found that the frequency of B-cells expressing VH1 antibodies remained consistent between the non-productive and productive repertoires, while that of B-cells expressing VH3 gene segments increased in the productive repertoire. The expansion of B-cells expressing VH3 antibodies in vivo has been attributed to possible exposure to superantigens, such as protein A, which preferentially bind VH3 variable domains (Zouali, 1995). We would therefore not expect to see preferential selection in vitro based on this attribute; a greater influence is likely to be the ability of an scFv antibody to express and fold in the periplasm of Escherichia coli (E. coli).
Analysis of the spectrum of specific gene segments within the library pre- and post-selection on antigen also reveals some broad similarities to in vivo usage. For example, although the VH1 family as a group is not enriched in productively rearranged antibodies in vivo (Brezinschek et al., 1997; Knappik et al., 2000), the VH1-69 gene segment is specifically enriched and is also seen at the greatest frequency (15.5%) in the antigen-selected population from the CAT2.0 library. Similarly, the proportion of VH4 gene segments falls as a result of the selection process, and a similar drop is observed in the productive repertoire of B-cells in vivo, indicating that VH4 gene segments are not enriched upon encountering antigen. For example, the segment VH4-59 drops in frequency between non-productive and productive rearrangements in vivo (Brezinschek et al., 1997), and drops 4-fold after selection on antigen using the CAT2.0 library. B-cells expressing VH4 gene segments are not enriched relative to VH3 genes or certain VH1 segments in vivo, possibly because antibodies utilising a VH4 segment appear to have a greater propensity for being autoantibodies (Pascual and Capra, 1991; Stewart et al., 1992). Taken together, these results suggest that the VH gene usage within the CAT2.0 library is broadly similar to that described in vivo, indicating that any potential biases due to bacterial expression of the library are minimised.
A major difference seen in the CAT2.0 library compared with in vivo light-chain usage is the clear preference for Vλ gene segments after selection (91% of antibodies isolated utilise Vλ as opposed to 9% Vκ), whereas in human serum, Vκ light chains account for 60–70% of immunoglobulins (Cohen and Porter, 1964; Farner et al., 1999). The clear preference for the expression and display of scFvs with a Vλ light chain, rather than the Vκ light chains that predominate in an in vivo immune response, suggests that scFvs containing Vλ chains are more stably expressed and displayed in E. coli. Ewert et al. (2003) have previously shown that Vκ3 is the most stable light chain as an individual domain in terms of its expression, oligomeric state in solution and folding. Importantly, however, Vλ families form more stable scFv antibodies when paired with different VH domains. A recently described naïve library (Loset et al., 2005) based upon V genes isolated from IgM- and IgD-expressing cells has also been found to favour VH–Vλ combinations over VH–Vκ, due to higher scFv–pIII fusion protein expression levels for the Vλ pairings. Knowledge of the optimal pairings of expressed and correctly folded antibodies has been exploited to help design libraries that maximise the number of functional antibodies. The synthetic HuCal library (Knappik et al., 2000) is made up of seven consensus heavy chains, paired with four kappa and three lambda consensus light chains. Analysis of the germline frequencies observed pre- and post-antigen selection with the CAT2.0 library indicates that the majority of the 14 HuCal consensus genes are frequently observed. For example, the VH1 consensus gene chosen, 1-69, is also the most frequently observed heavy-chain germline in the CAT2.0 library.
Interestingly, however, the VH4 family member 4-59, used as the consensus gene in the HuCal library, is widely represented in the unselected CAT2.0 library but significantly decreased in frequency post-antigen selection, indicating that the 4-59 germline may not be optimal for antigen recognition or for expression and folding in E. coli.
Comparison of the V gene framework amino acids of isolated scFvs with the closest germline V gene sequences identified significant numbers of mutations away from germline. Some of these mutations will be the result of either PCR errors or the use of an inappropriate 5′ primer during library construction. Others, however, will be the result of in vivo somatic hypermutation, which will be more prevalent in antibodies derived from B-cells from tissues such as spleen that are involved in secondary immune responses. Interestingly, however, we have not observed any notable correlation between affinity and the number of germline gene mutations in antibodies isolated from the CAT2.0 library (data not shown).
The CAT2.0 library, at 1.29 × 10¹¹ in size, represents an antibody repertoire with a magnitude many times that found in any individual organism at any one time. By exploiting this library across a variety of different human antigens, we have isolated large panels of potent functional antibody fragments, with an average of ∼120 per antigen, in a manner analogous to a humoral immune response. The analysis of the heavy- and light-chain family usage, as well as of the different pairings, is the most comprehensive yet reported, and demonstrates that V gene usage in this phage display library is broadly similar to in vivo antibody V gene usage in humans. The data presented here demonstrate that target-specific antibodies with function-modifying properties comprise a wide variety of combinations of heavy- and light-chain variable genes, but with some preference for a VH1–Vλ1 pairing, which should be explored further.
Acknowledgements
The authors would like to thank Sara Carmen, Ruth Featherstone, Thor Holtet, Catherine Hutchings, Jane Wilton and Katherine Vousden for their input in constructing the scFv phagemid libraries that are currently in use within the MedImmune Cambridge laboratories. We would also like to thank Richard Lowden for his help with the V gene usage analysis.




